🤖 Understanding the 3 Main Types of Machine Learning: A Hands-On Guide
Hey there! Ready to dive into the three main types of machine learning? This friendly guide walks you through supervised, unsupervised, and reinforcement learning step by step, with easy-to-follow, from-scratch examples, and then tours several other core algorithms. Perfect for beginners and pros alike!
💡 Supervised Learning: Linear Regression Implementation
Linear regression serves as a foundational supervised learning algorithm that models the relationship between dependent and independent variables. This example shows you how to create a simple linear regression model from scratch using numpy, focusing on the mathematical principles behind the algorithm.
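Under the hood, `fit` runs batch gradient descent on the mean squared error. For reference, the gradients it computes on each iteration are:

$$\frac{\partial J}{\partial w} = \frac{1}{n} X^T (\hat{y} - y), \qquad \frac{\partial J}{\partial b} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$$

where $\hat{y} = Xw + b$.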
Let’s make this concrete with a from-scratch implementation:
import numpy as np
import matplotlib.pyplot as plt

class LinearRegression:
    def __init__(self, learning_rate=0.01, iterations=1000):
        self.lr = learning_rate
        self.iterations = iterations
        self.weights = None
        self.bias = None

    def fit(self, X, y):
        n_samples, n_features = X.shape
        self.weights = np.zeros(n_features)
        self.bias = 0

        # Training loop: batch gradient descent on mean squared error
        for _ in range(self.iterations):
            y_pred = np.dot(X, self.weights) + self.bias

            # Calculate gradients
            dw = (1 / n_samples) * np.dot(X.T, (y_pred - y))
            db = (1 / n_samples) * np.sum(y_pred - y)

            # Update parameters
            self.weights -= self.lr * dw
            self.bias -= self.lr * db

    def predict(self, X):
        return np.dot(X, self.weights) + self.bias

# Example usage
X = np.random.randn(100, 1)
# Flatten y to shape (n_samples,) so it broadcasts correctly against y_pred
y = (2 * X + 1 + np.random.randn(100, 1) * 0.1).ravel()

model = LinearRegression(learning_rate=0.01, iterations=1000)
model.fit(X, y)
predictions = model.predict(X)
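A quick way to sanity-check the implementation (this snippet is an optional add-on) is to compare the learned parameters against the values used to generate the data:

# Optional sanity check: the learned parameters should land close to the
# generating values (weight ≈ 2, bias ≈ 1)
print("Learned weight:", model.weights[0])
print("Learned bias:", model.bias)
print("Training MSE:", np.mean((predictions - y) ** 2))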
🎉 Unsupervised Learning: K-Means Clustering
The k-means algorithm partitions n observations into k clusters by iteratively updating cluster centroids. This example shows how to build k-means from scratch, including centroid initialization and cluster assignment.
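Formally, k-means tries to minimize the within-cluster sum of squares:

$$J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$

where $\mu_k$ is the centroid of cluster $C_k$. The two alternating steps below (assign points, then recompute centroids) each never increase $J$, which is why the loop converges.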
Don’t worry, this is easier than it looks! Here’s the implementation:
import numpy as np

class KMeans:
    def __init__(self, n_clusters=3, max_iters=100):
        self.n_clusters = n_clusters
        self.max_iters = max_iters
        self.centroids = None

    def fit(self, X):
        # Initialize centroids by picking random data points
        idx = np.random.choice(len(X), self.n_clusters, replace=False)
        self.centroids = X[idx]

        for _ in range(self.max_iters):
            # Assign each point to its nearest centroid
            distances = np.sqrt(((X - self.centroids[:, np.newaxis]) ** 2).sum(axis=2))
            cluster_labels = np.argmin(distances, axis=0)

            # Update centroids; keep the old centroid if a cluster goes empty
            new_centroids = np.array([
                X[cluster_labels == k].mean(axis=0) if np.any(cluster_labels == k)
                else self.centroids[k]
                for k in range(self.n_clusters)
            ])

            # Stop on convergence (use allclose, not exact float equality)
            if np.allclose(self.centroids, new_centroids):
                break
            self.centroids = new_centroids

        return cluster_labels

    def predict(self, X):
        distances = np.sqrt(((X - self.centroids[:, np.newaxis]) ** 2).sum(axis=2))
        return np.argmin(distances, axis=0)

# Example usage
X = np.random.randn(300, 2) * 2
kmeans = KMeans(n_clusters=3)
labels = kmeans.fit(X)
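Since the toy data is two-dimensional, a scatter plot is the easiest way to eyeball the result. Here’s an optional visualization sketch, assuming matplotlib is installed:

import matplotlib.pyplot as plt

# Color each point by its assigned cluster and mark the final centroids
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=15)
plt.scatter(kmeans.centroids[:, 0], kmeans.centroids[:, 1],
            c='red', marker='x', s=100, label='centroids')
plt.legend()
plt.show()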
✨ Reinforcement Learning: Q-Learning Algorithm
Q-Learning is a model-free reinforcement learning algorithm that learns an optimal policy by maintaining a Q-table of state-action values. This example shows a simple Q-learning agent in a discrete environment.
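The update implemented in the `learn` method below is the standard Q-learning rule:

$$Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \big( r + \gamma \max_{a'} Q(s', a') \big)$$

where $\alpha$ is the learning rate and $\gamma$ is the discount factor.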
Here’s a minimal agent:
import numpy as np

class QLearningAgent:
    def __init__(self, states, actions, learning_rate=0.1, discount=0.95, epsilon=0.1):
        self.q_table = np.zeros((states, actions))
        self.lr = learning_rate
        self.gamma = discount
        self.epsilon = epsilon

    def choose_action(self, state):
        # Epsilon-greedy action selection
        if np.random.random() < self.epsilon:
            return np.random.randint(self.q_table.shape[1])
        return np.argmax(self.q_table[state])

    def learn(self, state, action, reward, next_state):
        old_value = self.q_table[state, action]
        next_max = np.max(self.q_table[next_state])

        # Q-learning update
        new_value = (1 - self.lr) * old_value + self.lr * (reward + self.gamma * next_max)
        self.q_table[state, action] = new_value

# Example usage
n_states = 10
n_actions = 4
agent = QLearningAgent(n_states, n_actions)

# Training loop example
state = 0
for _ in range(1000):
    action = agent.choose_action(state)
    next_state = min(state + action, n_states - 1)  # Simple environment
    reward = 1 if next_state == n_states - 1 else 0
    agent.learn(state, action, reward, next_state)
    state = next_state if next_state != n_states - 1 else 0
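After training, the agent’s greedy policy is just the argmax over each row of the Q-table. A small optional inspection snippet:

# Read the learned behavior straight out of the Q-table
greedy_policy = np.argmax(agent.q_table, axis=1)
print("Greedy action per state:", greedy_policy)
print("Q-values for state 0:", agent.q_table[0])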
🔥 Neural Network Implementation from Scratch
Neural networks form the backbone of deep learning, consisting of interconnected layers of neurons. This example shows how to create a basic feedforward neural network with backpropagation using only numpy.
Let’s break it down in code:
import numpy as np

class NeuralNetwork:
    def __init__(self, layers):
        # One weight matrix and bias vector per layer transition
        self.weights = [np.random.randn(y, x) * 0.01
                        for x, y in zip(layers[:-1], layers[1:])]
        self.biases = [np.zeros((y, 1)) for y in layers[1:]]

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        # Expects x to already be a sigmoid activation
        return x * (1 - x)

    def forward(self, X):
        self.activations = [X]
        for w, b in zip(self.weights, self.biases):
            net = np.dot(w, self.activations[-1]) + b
            self.activations.append(self.sigmoid(net))
        return self.activations[-1]

    def backward(self, X, y, learning_rate):
        m = X.shape[1]
        delta = self.activations[-1] - y

        # Walk backwards through the layers
        for l in range(len(self.weights) - 1, -1, -1):
            dW = np.dot(delta, self.activations[l].T) / m
            db = np.sum(delta, axis=1, keepdims=True) / m
            if l > 0:
                # Propagate the error before updating this layer's weights
                delta = np.dot(self.weights[l].T, delta) * \
                    self.sigmoid_derivative(self.activations[l])
            self.weights[l] -= learning_rate * dW
            self.biases[l] -= learning_rate * db

# Example usage
nn = NeuralNetwork([2, 4, 1])
X = np.random.randn(2, 100)
y = np.array([int(x1 > x2) for x1, x2 in X.T]).reshape(1, -1)

for _ in range(1000):
    output = nn.forward(X)
    nn.backward(X, y, 0.1)
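To check that training worked, we can threshold the sigmoid output at 0.5 and measure accuracy on the training data (a rough check, since there’s no held-out set here):

# Training-set accuracy; 'output' holds the final forward pass from the loop above
predicted_labels = (output > 0.5).astype(int)
print(f"Training accuracy: {np.mean(predicted_labels == y):.2%}")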
🚀 Support Vector Machine Implementation
Support Vector Machines find the best hyperplane that separates classes by maximizing the margin between them. This example shows you a simplified SVM using the Sequential Minimal Optimization (SMO) algorithm for binary classification.
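The quantity the code calls `_decision_function` is the SVM dual decision function:

$$f(x) = \sum_{i=1}^{n} \alpha_i y_i K(x_i, x) + b$$

where the $\alpha_i$ are the Lagrange multipliers that SMO optimizes and $K$ is the kernel (linear here). A point is classified by the sign of $f(x)$.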
Here’s a simplified version:
import numpy as np

class SVM:
    def __init__(self, C=1.0, max_iter=100):
        self.C = C
        self.max_iter = max_iter

    def kernel(self, x1, x2):
        return np.dot(x1, x2)  # Linear kernel

    def fit(self, X, y):
        self.n_samples, self.n_features = X.shape
        self.X = X
        self.y = y

        # Initialize alphas and bias
        self.alphas = np.zeros(self.n_samples)
        self.b = 0

        # Simplified SMO: pick a KKT-violating alpha_i, pair it with a random alpha_j
        for _ in range(self.max_iter):
            alpha_pairs_changed = 0
            for i in range(self.n_samples):
                Ei = self._decision_function(X[i]) - y[i]
                if (y[i] * Ei < -0.001 and self.alphas[i] < self.C) or \
                        (y[i] * Ei > 0.001 and self.alphas[i] > 0):
                    j = np.random.randint(0, self.n_samples)
                    while j == i:
                        j = np.random.randint(0, self.n_samples)
                    Ej = self._decision_function(X[j]) - y[j]

                    # Save old alphas
                    alpha_i_old = self.alphas[i]
                    alpha_j_old = self.alphas[j]

                    # Compute clipping bounds L and H
                    if y[i] != y[j]:
                        L = max(0, self.alphas[j] - self.alphas[i])
                        H = min(self.C, self.C + self.alphas[j] - self.alphas[i])
                    else:
                        L = max(0, self.alphas[i] + self.alphas[j] - self.C)
                        H = min(self.C, self.alphas[i] + self.alphas[j])
                    if L == H:
                        continue

                    # Compute eta
                    eta = 2 * self.kernel(X[i], X[j]) - \
                        self.kernel(X[i], X[i]) - \
                        self.kernel(X[j], X[j])
                    if eta >= 0:
                        continue

                    # Update and clip alpha j
                    self.alphas[j] -= y[j] * (Ei - Ej) / eta
                    self.alphas[j] = min(H, max(L, self.alphas[j]))
                    if abs(self.alphas[j] - alpha_j_old) < 1e-5:
                        continue

                    # Update alpha i
                    self.alphas[i] += y[i] * y[j] * (alpha_j_old - self.alphas[j])

                    # Update threshold b
                    b1 = self.b - Ei - y[i] * (self.alphas[i] - alpha_i_old) * \
                        self.kernel(X[i], X[i]) - \
                        y[j] * (self.alphas[j] - alpha_j_old) * \
                        self.kernel(X[i], X[j])
                    b2 = self.b - Ej - y[i] * (self.alphas[i] - alpha_i_old) * \
                        self.kernel(X[i], X[j]) - \
                        y[j] * (self.alphas[j] - alpha_j_old) * \
                        self.kernel(X[j], X[j])
                    self.b = (b1 + b2) / 2
                    alpha_pairs_changed += 1

            if alpha_pairs_changed == 0:
                break

    def _decision_function(self, X):
        return np.sum(self.alphas * self.y *
                      np.apply_along_axis(lambda x: self.kernel(x, X), 1, self.X)) + self.b

    def predict(self, X):
        return np.sign([self._decision_function(x) for x in X])

# Example usage
X = np.random.randn(100, 2)
y = np.array([1 if x[0] + x[1] > 0 else -1 for x in X])

svm = SVM(C=1.0)
svm.fit(X, y)
predictions = svm.predict(X)
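As a quick optional check, training accuracy on this linearly separable toy problem should be high:

# Fraction of training points classified correctly
print(f"Training accuracy: {np.mean(predictions == y):.2%}")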
🚀 Decision Tree Implementation
Decision trees are versatile machine learning algorithms that make predictions by learning simple decision rules from data. This example shows how to build a decision tree classifier from scratch using an information-gain splitting criterion.
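The two quantities driving the splits are entropy and information gain:

$$H(y) = -\sum_{c} p_c \log_2 p_c, \qquad IG = H(y) - \frac{n_L}{n} H(y_L) - \frac{n_R}{n} H(y_R)$$

where $y_L$ and $y_R$ are the labels that fall on each side of a candidate threshold. The split with the largest gain wins.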
Don’t worry, this is easier than it looks! Here’s the implementation:
import numpy as np
from collections import Counter

class Node:
    def __init__(self, feature=None, threshold=None, left=None,
                 right=None, value=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value

class DecisionTree:
    def __init__(self, max_depth=10):
        self.max_depth = max_depth
        self.root = None

    def _entropy(self, y):
        # Assumes integer class labels
        hist = np.bincount(y)
        ps = hist / len(y)
        return -np.sum([p * np.log2(p) for p in ps if p > 0])

    def _information_gain(self, y, X_column, threshold):
        parent_entropy = self._entropy(y)

        left_mask = X_column <= threshold
        right_mask = ~left_mask
        if len(y[left_mask]) == 0 or len(y[right_mask]) == 0:
            return 0

        n = len(y)
        n_l, n_r = len(y[left_mask]), len(y[right_mask])
        e_l, e_r = self._entropy(y[left_mask]), self._entropy(y[right_mask])
        child_entropy = (n_l / n) * e_l + (n_r / n) * e_r
        return parent_entropy - child_entropy

    def _best_split(self, X, y):
        # Require a strictly positive gain so degenerate splits are rejected
        best_gain = 0
        best_feature = None
        best_threshold = None
        n_features = X.shape[1]

        for feature in range(n_features):
            thresholds = np.unique(X[:, feature])
            for threshold in thresholds:
                gain = self._information_gain(y, X[:, feature], threshold)
                if gain > best_gain:
                    best_gain = gain
                    best_feature = feature
                    best_threshold = threshold
        return best_feature, best_threshold

    def _build_tree(self, X, y, depth=0):
        n_samples, n_features = X.shape
        n_classes = len(np.unique(y))

        # Stopping criteria
        if (self.max_depth is not None and depth >= self.max_depth) or \
                n_classes == 1 or n_samples < 2:
            leaf_value = max(Counter(y).items(), key=lambda x: x[1])[0]
            return Node(value=leaf_value)

        # Find best split; fall back to a leaf if nothing improves purity
        best_feature, best_threshold = self._best_split(X, y)
        if best_feature is None:
            leaf_value = max(Counter(y).items(), key=lambda x: x[1])[0]
            return Node(value=leaf_value)

        # Create child splits
        left_idxs = X[:, best_feature] <= best_threshold
        right_idxs = ~left_idxs
        left = self._build_tree(X[left_idxs], y[left_idxs], depth + 1)
        right = self._build_tree(X[right_idxs], y[right_idxs], depth + 1)
        return Node(best_feature, best_threshold, left, right)

    def fit(self, X, y):
        self.n_classes = len(np.unique(y))
        self.root = self._build_tree(X, y)

    def _traverse_tree(self, x, node):
        if node.value is not None:
            return node.value
        if x[node.feature] <= node.threshold:
            return self._traverse_tree(x, node.left)
        return self._traverse_tree(x, node.right)

    def predict(self, X):
        return np.array([self._traverse_tree(x, self.root) for x in X])

# Example usage
X = np.random.randn(100, 2)
y = np.array([0 if x[0] + x[1] > 0 else 1 for x in X])

tree = DecisionTree(max_depth=5)
tree.fit(X, y)
predictions = tree.predict(X)
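An optional quick check of training accuracy (keep in mind a deep enough tree can memorize its training data, so this number is optimistic):

print(f"Training accuracy: {np.mean(predictions == y):.2%}")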
🚀 Gradient Boosting Implementation
Gradient Boosting combines multiple weak learners into a strong predictor by iteratively fitting new models to the residuals of previous predictions. This example shows a basic gradient boosting regressor using decision trees.
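In symbols, the ensemble is built stage by stage:

$$F_0(x) = \bar{y}, \qquad F_m(x) = F_{m-1}(x) + \nu \, h_m(x)$$

where each weak learner $h_m$ is fit to the residuals $y - F_{m-1}(x)$ and $\nu$ is the learning rate that shrinks each tree’s contribution.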
Ready for some cool stuff? Here’s the implementation:
import numpy as np
from sklearn.tree import DecisionTreeRegressor

class GradientBoostingRegressor:
    def __init__(self, n_estimators=100, learning_rate=0.1, max_depth=3):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.max_depth = max_depth
        self.trees = []

    def fit(self, X, y):
        # Initialize predictions with the mean target value
        self.initial_prediction = np.mean(y)
        F = np.full_like(y, self.initial_prediction, dtype=np.float64)

        for _ in range(self.n_estimators):
            # Negative gradient of squared error = residuals
            residuals = y - F

            # Fit a new tree to the residuals
            tree = DecisionTreeRegressor(max_depth=self.max_depth)
            tree.fit(X, residuals)

            # Update the running predictions
            F += self.learning_rate * tree.predict(X)
            self.trees.append(tree)

    def predict(self, X):
        # Start with the initial prediction
        predictions = np.full(X.shape[0], self.initial_prediction,
                              dtype=np.float64)
        # Add the scaled contribution of each tree
        for tree in self.trees:
            predictions += self.learning_rate * tree.predict(X)
        return predictions

# Example usage
np.random.seed(42)
X = np.random.randn(100, 2)
y = 3 * X[:, 0] + 2 * X[:, 1] + np.random.randn(100) * 0.1

gb = GradientBoostingRegressor(n_estimators=100, learning_rate=0.1, max_depth=3)
gb.fit(X, y)
predictions = gb.predict(X)

# Calculate MSE
mse = np.mean((predictions - y) ** 2)
print(f"Mean Squared Error: {mse:.4f}")
🚀 Random Forest Implementation
Random Forest is an ensemble learning method that trains many decision trees on bootstrap samples and combines them by majority vote (for classification) or averaging (for regression). This example shows how to build a random forest classifier from scratch.
Here’s the implementation:
import numpy as np
from collections import Counter

class RandomForestClassifier:
    def __init__(self, n_trees=10, max_depth=10, min_samples_split=2,
                 n_features=None):
        self.n_trees = n_trees
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.n_features = n_features
        self.trees = []

    def _bootstrap_samples(self, X, y):
        n_samples = X.shape[0]
        idxs = np.random.choice(n_samples, size=n_samples, replace=True)
        return X[idxs], y[idxs]

    def _get_random_features(self, n_features):
        # Sample a random subset of feature indices without replacement
        return np.random.choice(self.n_features_total,
                                size=n_features, replace=False)

    def fit(self, X, y):
        self.n_classes = len(np.unique(y))
        self.n_features_total = X.shape[1]
        if self.n_features is None:
            self.n_features = int(np.sqrt(self.n_features_total))

        # Create trees
        for _ in range(self.n_trees):
            tree = DecisionTree(max_depth=self.max_depth,
                                min_samples_split=self.min_samples_split)
            # Train each tree on a bootstrap sample with a random feature subset
            X_sample, y_sample = self._bootstrap_samples(X, y)
            feature_idxs = self._get_random_features(self.n_features)
            tree.fit(X_sample[:, feature_idxs], y_sample)
            self.trees.append((tree, feature_idxs))

    def predict(self, X):
        tree_predictions = []
        for tree, feature_idxs in self.trees:
            tree_predictions.append(tree.predict(X[:, feature_idxs]))

        # Take a majority vote across trees for each sample
        tree_predictions = np.array(tree_predictions).T
        predictions = [Counter(pred).most_common(1)[0][0]
                       for pred in tree_predictions]
        return np.array(predictions)

class DecisionTree:
    def __init__(self, max_depth=None, min_samples_split=2):
        self.max_depth = max_depth
        self.min_samples_split = min_samples_split
        self.root = None

    def fit(self, X, y):
        self.n_classes = len(np.unique(y))
        self.root = self._grow_tree(X, y)

    def _entropy(self, y):
        # Assumes integer class labels
        ps = np.bincount(y) / len(y)
        return -np.sum([p * np.log2(p) for p in ps if p > 0])

    def _information_gain(self, y, X_column, threshold):
        left_mask = X_column <= threshold
        right_mask = ~left_mask
        if not np.any(left_mask) or not np.any(right_mask):
            return 0
        n, n_l, n_r = len(y), left_mask.sum(), right_mask.sum()
        child_entropy = (n_l / n) * self._entropy(y[left_mask]) + \
                        (n_r / n) * self._entropy(y[right_mask])
        return self._entropy(y) - child_entropy

    def _best_split(self, X, y, feat_idxs):
        # Require a strictly positive gain so degenerate splits are rejected
        best_gain, best_feat, best_thresh = 0, None, None
        for feat in feat_idxs:
            for threshold in np.unique(X[:, feat]):
                gain = self._information_gain(y, X[:, feat], threshold)
                if gain > best_gain:
                    best_gain, best_feat, best_thresh = gain, feat, threshold
        return best_feat, best_thresh

    def _grow_tree(self, X, y, depth=0):
        n_samples, n_features = X.shape
        n_labels = len(np.unique(y))

        # Stopping criteria
        if (self.max_depth is not None and depth >= self.max_depth) or \
                n_labels == 1 or n_samples < self.min_samples_split:
            return Node(value=self._most_common_label(y))

        # Find best split (feature randomness is handled at the forest level,
        # so each tree considers all of its features here)
        feat_idxs = np.arange(n_features)
        best_feat, best_thresh = self._best_split(X, y, feat_idxs)
        if best_feat is None:  # No split improves purity
            return Node(value=self._most_common_label(y))

        # Create child splits
        left_idxs = X[:, best_feat] <= best_thresh
        right_idxs = ~left_idxs
        left = self._grow_tree(X[left_idxs], y[left_idxs], depth + 1)
        right = self._grow_tree(X[right_idxs], y[right_idxs], depth + 1)
        return Node(best_feat, best_thresh, left, right)

    def _most_common_label(self, y):
        return Counter(y).most_common(1)[0][0]

    def predict(self, X):
        return np.array([self._traverse_tree(x, self.root) for x in X])

    def _traverse_tree(self, x, node):
        if node.is_leaf():
            return node.value
        if x[node.feature] <= node.threshold:
            return self._traverse_tree(x, node.left)
        return self._traverse_tree(x, node.right)

class Node:
    def __init__(self, feature=None, threshold=None, left=None, right=None,
                 value=None):
        self.feature = feature
        self.threshold = threshold
        self.left = left
        self.right = right
        self.value = value

    def is_leaf(self):
        return self.value is not None

# Example usage
X = np.random.randn(100, 4)
y = np.array([0 if np.sum(x) > 0 else 1 for x in X])

rf = RandomForestClassifier(n_trees=10, max_depth=5)
rf.fit(X, y)
predictions = rf.predict(X)
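An optional quick check, with the usual caveat that training accuracy overstates real-world performance:

print(f"Random forest training accuracy: {np.mean(predictions == y):.2%}")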
🚀 K-Nearest Neighbors Implementation
K-Nearest Neighbors is a simple yet powerful algorithm that makes predictions based on the majority class or average value of the k closest training examples. This example shows both classification and regression capabilities.
Here’s the implementation:
import numpy as np
from collections import Counter

class KNN:
    def __init__(self, k=3, weighted=True):
        self.k = k
        self.weighted = weighted

    def fit(self, X, y):
        # Lazy learner: just memorize the training data
        self.X_train = X
        self.y_train = y

    def _euclidean_distance(self, x1, x2):
        return np.sqrt(np.sum((x1 - x2) ** 2))

    def _get_neighbors(self, x):
        # Calculate distances between x and all examples in the training set
        distances = [self._euclidean_distance(x, x_train)
                     for x_train in self.X_train]

        # Indices, distances, and labels of the k nearest neighbors
        k_indices = np.argsort(distances)[:self.k]
        k_distances = np.array(distances)[k_indices]
        k_nearest_labels = self.y_train[k_indices]
        return k_nearest_labels, k_distances

    def predict(self, X):
        return np.array([self._predict(x) for x in X])

    def _predict(self, x):
        k_labels, k_distances = self._get_neighbors(x)

        # Regression: float targets are averaged
        # (checking the array dtype, so integer labels are treated as classes)
        if np.issubdtype(self.y_train.dtype, np.floating):
            if self.weighted:
                # Inverse-distance weights; epsilon avoids division by zero
                weights = 1 / (k_distances + 1e-10)
                return np.sum(k_labels * weights) / np.sum(weights)
            return np.mean(k_labels)

        # Classification
        if self.weighted:
            # Weight votes by inverse distance
            weights = 1 / (k_distances + 1e-10)
            weighted_votes = {}
            for label, weight in zip(k_labels, weights):
                weighted_votes[label] = weighted_votes.get(label, 0) + weight
            return max(weighted_votes.items(), key=lambda x: x[1])[0]

        # Unweighted majority vote
        return Counter(k_labels).most_common(1)[0][0]

# Example usage - Classification
X = np.random.randn(100, 2)
y = np.array([0 if x[0] + x[1] > 0 else 1 for x in X])

knn_clf = KNN(k=3, weighted=True)
knn_clf.fit(X, y)
clf_predictions = knn_clf.predict(X[:5])

# Example usage - Regression
y_reg = X[:, 0] * 2 + X[:, 1] * 3 + np.random.randn(100) * 0.1

knn_reg = KNN(k=3, weighted=True)
knn_reg.fit(X, y_reg)
reg_predictions = knn_reg.predict(X[:5])

print("Classification predictions:", clf_predictions)
print("Regression predictions:", reg_predictions)
🚀 Principal Component Analysis Implementation
PCA is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving maximum variance. This example shows how to compute PCA from scratch using eigendecomposition.
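Concretely, PCA centers the data, forms the covariance matrix, and eigendecomposes it:

$$C = \frac{1}{n-1} X_c^T X_c, \qquad C v_i = \lambda_i v_i$$

The eigenvectors $v_i$ with the largest eigenvalues are the principal components, and $\lambda_i / \sum_j \lambda_j$ is the fraction of variance each one explains.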
Don’t worry, this is easier than it looks! Here’s the implementation:
import numpy as np

class PCA:
    def __init__(self, n_components=None):
        self.n_components = n_components
        self.components = None
        self.mean = None
        self.explained_variance = None

    def fit(self, X):
        # Center the data
        self.mean = np.mean(X, axis=0)
        X_centered = X - self.mean

        # Compute covariance matrix
        cov_matrix = np.cov(X_centered.T)

        # Eigendecomposition (eigh, since the covariance matrix is symmetric)
        eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)

        # Sort eigenvectors by eigenvalues in descending order
        idx = np.argsort(eigenvalues)[::-1]
        eigenvalues = eigenvalues[idx]
        eigenvectors = eigenvectors[:, idx]

        # Store explained variance
        self.explained_variance = eigenvalues

        # Store first n_components eigenvectors
        if self.n_components is None:
            self.n_components = X.shape[1]
        self.components = eigenvectors[:, :self.n_components]

    def transform(self, X):
        # Center, then project onto the principal components
        X_centered = X - self.mean
        return np.dot(X_centered, self.components)

    def inverse_transform(self, X):
        # Project back to the original space
        return np.dot(X, self.components.T) + self.mean

    def get_explained_variance_ratio(self):
        return self.explained_variance / np.sum(self.explained_variance)

# Example usage
np.random.seed(42)
X = np.random.randn(100, 5)
# Add some correlation between features
X[:, 1] = X[:, 0] * 2 + np.random.randn(100) * 0.1
X[:, 2] = X[:, 0] * -0.5 + X[:, 1] * 0.8 + np.random.randn(100) * 0.1

pca = PCA(n_components=2)
pca.fit(X)

# Transform data
X_transformed = pca.transform(X)

# Get explained variance ratio
explained_variance_ratio = pca.get_explained_variance_ratio()
print("Explained variance ratio:", explained_variance_ratio[:2])
print("Transformed data shape:", X_transformed.shape)

# Reconstruct the original data from the 2D projection
X_reconstructed = pca.inverse_transform(X_transformed)
reconstruction_error = np.mean((X - X_reconstructed) ** 2)
print("Reconstruction error:", reconstruction_error)
🚀 Neural Network with Backpropagation: Mathematical Foundation
The mathematical foundation of neural networks relies on forward propagation for predictions and backpropagation for learning. This example shows you the core mathematical concepts using matrix operations and chain rule.
Let’s walk through it:
import numpy as np

class NeuralNetworkMath:
    def __init__(self, layer_sizes):
        """
        Mathematical implementation showing the detailed computations.
        layer_sizes: list of integers representing neurons per layer
        """
        self.weights = []
        self.biases = []
        for i in range(len(layer_sizes) - 1):
            # Initialize weights with He initialization
            self.weights.append(np.random.randn(layer_sizes[i+1], layer_sizes[i]) *
                                np.sqrt(2.0 / layer_sizes[i]))
            self.biases.append(np.random.randn(layer_sizes[i+1], 1))

    def sigmoid(self, z):
        """Sigmoid activation function"""
        return 1 / (1 + np.exp(-z))

    def sigmoid_prime(self, z):
        """Derivative of the sigmoid function"""
        s = self.sigmoid(z)
        return s * (1 - s)

    def cost_derivative(self, output_activations, y):
        r"""
        Cost function derivative for MSE:
        $$\frac{\partial C}{\partial a} = (a - y)$$
        """
        return output_activations - y

    def feedforward(self, a):
        r"""
        Forward propagation:
        $$a^{l+1} = \sigma(w^l a^l + b^l)$$
        """
        self.zs = []            # Store all z vectors
        self.activations = [a]  # Store all activations
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w, a) + b
            self.zs.append(z)
            a = self.sigmoid(z)
            self.activations.append(a)
        return a

    def backprop(self, x, y):
        r"""
        Backpropagation; returns gradients for weights and biases.
        Key equations:
        $$\delta^L = \nabla_a C \odot \sigma'(z^L)$$
        $$\delta^l = ((w^{l+1})^T \delta^{l+1}) \odot \sigma'(z^l)$$
        $$\frac{\partial C}{\partial w^l} = \delta^l (a^{l-1})^T$$
        $$\frac{\partial C}{\partial b^l} = \delta^l$$
        """
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]

        # Forward pass
        output = self.feedforward(x)

        # Backward pass: delta for the output layer
        delta = self.cost_derivative(output, y) * \
            self.sigmoid_prime(self.zs[-1])
        nabla_b[-1] = delta
        nabla_w[-1] = np.dot(delta, self.activations[-2].T)

        # Delta for the hidden layers
        for l in range(2, len(self.weights) + 1):
            delta = np.dot(self.weights[-l+1].T, delta) * \
                self.sigmoid_prime(self.zs[-l])
            nabla_b[-l] = delta
            nabla_w[-l] = np.dot(delta, self.activations[-l-1].T)
        return nabla_w, nabla_b

    def update_mini_batch(self, mini_batch, learning_rate):
        r"""
        Update weights and biases using mini-batch gradient descent:
        $$w^l \rightarrow w^l - \frac{\eta}{m} \sum \frac{\partial C}{\partial w^l}$$
        $$b^l \rightarrow b^l - \frac{\eta}{m} \sum \frac{\partial C}{\partial b^l}$$
        """
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]
        for x, y in mini_batch:
            delta_nabla_w, delta_nabla_b = self.backprop(x, y)
            nabla_w = [nw + dnw for nw, dnw in zip(nabla_w, delta_nabla_w)]
            nabla_b = [nb + dnb for nb, dnb in zip(nabla_b, delta_nabla_b)]
        self.weights = [w - (learning_rate / len(mini_batch)) * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - (learning_rate / len(mini_batch)) * nb
                       for b, nb in zip(self.biases, nabla_b)]

# Example usage
# Create a network with 2 inputs, 3 hidden neurons, and 1 output
nn = NeuralNetworkMath([2, 3, 1])

# Training data: the XOR problem
X = np.array([[[0], [0]], [[0], [1]], [[1], [0]], [[1], [1]]])
y = np.array([[[0]], [[1]], [[1]], [[0]]])

# Train for a number of epochs
for epoch in range(1000):
    for i in range(len(X)):
        nn.update_mini_batch([(X[i], y[i])], learning_rate=0.1)

# Test the network
for x in X:
    prediction = nn.feedforward(x)
    print(f"Input: {x.T}, Prediction: {prediction.T}")
🚀 Convolutional Neural Network Operations
Convolutional Neural Networks are specialized for processing grid-like data. This example shows the fundamental operations of convolution and pooling layers from scratch without using deep learning frameworks.
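One formula worth keeping in mind is the spatial output size of a convolution with kernel size $k$, padding $p$, and stride $s$:

$$H_{out} = \left\lfloor \frac{H_{in} + 2p - k}{s} \right\rfloor + 1$$

For the example below (6×6 input, 3×3 kernel, padding 1, stride 1) this gives 6×6, so the spatial size is preserved.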
Here’s the implementation:
import numpy as np

class CNNOperations:
    @staticmethod
    def conv2d(input_volume, kernel, stride=1, padding=0):
        """
        Performs a 2D convolution.
        Parameters:
        - input_volume: shape (height, width, channels)
        - kernel: shape (kernel_height, kernel_width, in_channels, out_channels)
        """
        if padding > 0:
            input_volume = np.pad(
                input_volume,
                ((padding, padding), (padding, padding), (0, 0)),
                mode='constant'
            )

        h_in, w_in, c_in = input_volume.shape
        k_h, k_w, _, c_out = kernel.shape

        # Calculate output dimensions
        h_out = (h_in - k_h) // stride + 1
        w_out = (w_in - k_w) // stride + 1

        # Initialize output volume
        output = np.zeros((h_out, w_out, c_out))

        # Slide the kernel over every spatial position
        for h in range(h_out):
            for w in range(w_out):
                h_start = h * stride
                w_start = w * stride
                # Extract patch from input volume
                patch = input_volume[h_start:h_start + k_h,
                                     w_start:w_start + k_w, :]
                # Compute convolution for all output channels
                for c in range(c_out):
                    output[h, w, c] = np.sum(patch * kernel[:, :, :, c])
        return output

    @staticmethod
    def max_pooling2d(input_volume, pool_size=2, stride=2):
        """
        Performs 2D max pooling.
        Parameters:
        - input_volume: shape (height, width, channels)
        - pool_size: size of pooling window
        - stride: stride of pooling operation
        """
        h_in, w_in, c = input_volume.shape

        # Calculate output dimensions
        h_out = (h_in - pool_size) // stride + 1
        w_out = (w_in - pool_size) // stride + 1

        # Initialize output volume
        output = np.zeros((h_out, w_out, c))

        # Take the channel-wise maximum over each window
        for h in range(h_out):
            for w in range(w_out):
                h_start = h * stride
                w_start = w * stride
                patch = input_volume[h_start:h_start + pool_size,
                                     w_start:w_start + pool_size, :]
                output[h, w, :] = np.max(np.max(patch, axis=0), axis=0)
        return output

    @staticmethod
    def relu(x):
        """ReLU activation function"""
        return np.maximum(0, x)

    @staticmethod
    def softmax(x):
        """Numerically stable softmax activation"""
        exp_x = np.exp(x - np.max(x, axis=-1, keepdims=True))
        return exp_x / np.sum(exp_x, axis=-1, keepdims=True)

# Example usage
# Create sample input volume (6x6 image with 3 channels)
input_volume = np.random.randn(6, 6, 3)
# Create sample kernels (3x3 kernels, 3 input channels, 2 output channels)
kernels = np.random.randn(3, 3, 3, 2)

# Demonstrate the convolution operation
conv_output = CNNOperations.conv2d(input_volume, kernels, stride=1, padding=1)
print("Convolution output shape:", conv_output.shape)

# Apply ReLU activation
relu_output = CNNOperations.relu(conv_output)
print("ReLU output shape:", relu_output.shape)

# Apply max pooling
pooling_output = CNNOperations.max_pooling2d(relu_output, pool_size=2, stride=2)
print("Max pooling output shape:", pooling_output.shape)

# Flatten and apply softmax
flattened = pooling_output.reshape(-1)
softmax_output = CNNOperations.softmax(flattened)
print("Softmax output shape:", softmax_output.shape)
🚀 Autoencoders: Dimensionality Reduction and Feature Learning
Autoencoders are neural networks that learn to compress and reconstruct data. This example shows a simple autoencoder with customizable architecture for unsupervised feature learning.
This next part is really neat! Here’s the implementation:
import numpy as np

class Autoencoder:
    def __init__(self, input_dim, encoding_dims, learning_rate=0.01):
        """
        Parameters:
        - input_dim: dimension of input data
        - encoding_dims: list of dimensions for encoder layers
        - learning_rate: learning rate for gradient descent
        """
        self.input_dim = input_dim
        self.encoding_dims = encoding_dims
        self.learning_rate = learning_rate

        # Initialize weights and biases
        self.weights = []
        self.biases = []

        # Encoder weights
        prev_dim = input_dim
        for dim in encoding_dims:
            self.weights.append(
                np.random.randn(prev_dim, dim) * np.sqrt(2.0 / prev_dim))
            self.biases.append(np.zeros((dim, 1)))
            prev_dim = dim

        # Decoder weights (mirror of the encoder)
        for i in range(len(encoding_dims) - 2, -1, -1):
            dim = encoding_dims[i]
            self.weights.append(
                np.random.randn(prev_dim, dim) * np.sqrt(2.0 / prev_dim))
            self.biases.append(np.zeros((dim, 1)))
            prev_dim = dim

        # Output layer
        self.weights.append(
            np.random.randn(prev_dim, input_dim) * np.sqrt(2.0 / prev_dim))
        self.biases.append(np.zeros((input_dim, 1)))

    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

    def sigmoid_derivative(self, x):
        s = self.sigmoid(x)
        return s * (1 - s)

    def forward(self, x):
        """Forward pass through the full autoencoder"""
        self.activations = [x]
        self.zs = []
        a = x
        for w, b in zip(self.weights, self.biases):
            z = np.dot(w.T, a) + b
            self.zs.append(z)
            a = self.sigmoid(z)
            self.activations.append(a)
        return a

    def backward(self, x, output):
        """Backward pass to compute gradients"""
        nabla_w = [np.zeros(w.shape) for w in self.weights]
        nabla_b = [np.zeros(b.shape) for b in self.biases]

        # Output error for the reconstruction loss
        delta = (output - x) * self.sigmoid_derivative(self.zs[-1])

        # Backpropagate the error layer by layer
        for l in range(len(self.weights)):
            nabla_b[-l-1] = delta
            nabla_w[-l-1] = np.dot(self.activations[-l-2], delta.T)
            if l < len(self.weights) - 1:
                delta = np.dot(self.weights[-l-1], delta) * \
                    self.sigmoid_derivative(self.zs[-l-2])
        return nabla_w, nabla_b

    def train_step(self, x):
        """Perform one training step and return the reconstruction error"""
        output = self.forward(x)
        nabla_w, nabla_b = self.backward(x, output)
        self.weights = [w - self.learning_rate * nw
                        for w, nw in zip(self.weights, nabla_w)]
        self.biases = [b - self.learning_rate * nb
                       for b, nb in zip(self.biases, nabla_b)]
        return np.mean((output - x) ** 2)

# Example usage
# Create synthetic data (columns are samples)
data_dim = 10
n_samples = 1000
data = np.random.randn(data_dim, n_samples)

# Create an autoencoder that compresses 10 dimensions down to 2
autoencoder = Autoencoder(input_dim=data_dim, encoding_dims=[8, 4, 2])

# Train autoencoder
n_epochs = 100
for epoch in range(n_epochs):
    total_error = 0
    for i in range(n_samples):
        x = data[:, i:i+1]
        total_error += autoencoder.train_step(x)
    if (epoch + 1) % 10 == 0:
        print(f"Epoch {epoch+1}, Average Error: {total_error/n_samples:.4f}")

# Encode some data: forward() returns the reconstruction, while the actual
# encoding is the bottleneck activation in the middle of the pass
sample_data = data[:, :5]
reconstruction = autoencoder.forward(sample_data)
encoded_data = autoencoder.activations[len(autoencoder.encoding_dims)]

print("\nOriginal data shape:", sample_data.shape)
print("Encoded data shape:", encoded_data.shape)
print("Reconstruction shape:", reconstruction.shape)
🚀 Natural Language Processing: Basic Implementations
Natural Language Processing involves various techniques for processing and analyzing text data. This example shows fundamental NLP operations including tokenization, TF-IDF, and basic text classification.
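The TF-IDF weighting implemented below uses the smoothed variant:

$$\text{tf}(t, d) = \frac{\text{count}(t, d)}{|d|}, \qquad \text{idf}(t) = \log \frac{N}{\text{df}(t) + 1} + 1$$

where $N$ is the number of documents and $\text{df}(t)$ is how many of them contain term $t$.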
Ready for some cool stuff? Here’s the implementation:
import numpy as np
from collections import Counter, defaultdict
import re

class NLPToolkit:
    def __init__(self):
        self.vocab = set()
        self.word2idx = {}
        self.idx2word = {}
        self.idf = {}

    def tokenize(self, text):
        """Basic tokenization: lowercase, then extract word-character runs"""
        return re.findall(r'\w+', text.lower())

    def build_vocabulary(self, texts):
        """Build vocabulary from a list of texts"""
        word_counts = Counter()
        for text in texts:
            word_counts.update(self.tokenize(text))

        # Create vocabulary (words appearing at least twice)
        self.vocab = {word for word, count in word_counts.items()
                      if count >= 2}

        # Create word-to-index mappings
        self.word2idx = {word: idx for idx, word in enumerate(self.vocab)}
        self.idx2word = {idx: word for word, idx in self.word2idx.items()}

    def compute_tf(self, text):
        """Compute term frequency, normalized by document length"""
        tokens = self.tokenize(text)
        tf = Counter(tokens)
        total_terms = len(tokens)
        return {term: count / total_terms for term, count in tf.items()}

    def compute_idf(self, texts):
        """Compute smoothed inverse document frequency"""
        doc_count = len(texts)
        term_doc_count = defaultdict(int)
        for text in texts:
            # Count each term only once per document
            for term in set(self.tokenize(text)):
                term_doc_count[term] += 1
        self.idf = {term: np.log(doc_count / (count + 1)) + 1
                    for term, count in term_doc_count.items()}

    def compute_tfidf(self, text):
        """Compute TF-IDF weights for a document"""
        tf = self.compute_tf(text)
        return {term: tf_val * self.idf.get(term, 0)
                for term, tf_val in tf.items()}

    def text_to_bow(self, text):
        """Convert text to a bag-of-words vector"""
        bow = np.zeros(len(self.vocab))
        for token in self.tokenize(text):
            if token in self.word2idx:
                bow[self.word2idx[token]] += 1
        return bow

    def text_to_tfidf_vector(self, text):
        """Convert text to a TF-IDF vector"""
        tfidf = self.compute_tfidf(text)
        vector = np.zeros(len(self.vocab))
        for term, value in tfidf.items():
            if term in self.word2idx:
                vector[self.word2idx[term]] = value
        return vector

class NaiveBayesClassifier:
    def __init__(self, nlp_toolkit):
        self.nlp = nlp_toolkit
        self.class_probs = {}
        self.word_probs = defaultdict(dict)

    def train(self, texts, labels):
        """Train the Naive Bayes classifier"""
        # Compute class prior probabilities
        label_counts = Counter(labels)
        total_docs = len(labels)
        self.class_probs = {label: count / total_docs
                            for label, count in label_counts.items()}

        # Count words per class
        word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            word_counts[label].update(self.nlp.tokenize(text))

        # Compute word probabilities with add-one (Laplace) smoothing
        vocab_size = len(self.nlp.vocab)
        for label in self.class_probs:
            total_words = sum(word_counts[label].values())
            for word in self.nlp.vocab:
                count = word_counts[label][word]
                self.word_probs[label][word] = \
                    (count + 1) / (total_words + vocab_size)

    def predict(self, text):
        """Predict the class of a text using log probabilities"""
        tokens = self.nlp.tokenize(text)
        scores = {}
        for label in self.class_probs:
            # Start with the log of the class prior
            score = np.log(self.class_probs[label])
            # Add the log likelihood of each in-vocabulary word
            for token in tokens:
                if token in self.nlp.vocab:
                    score += np.log(self.word_probs[label][token])
            scores[label] = score
        # Return the label with the highest score
        return max(scores.items(), key=lambda x: x[1])[0]

# Example usage
# Sample texts and labels
texts = [
    "machine learning is fascinating",
    "deep neural networks are powerful",
    "natural language processing with python",
    "statistical analysis and data science",
    "artificial intelligence and robotics"
]
labels = ["ML", "DL", "NLP", "Stats", "AI"]

# Initialize the NLP toolkit
nlp = NLPToolkit()
nlp.build_vocabulary(texts)
nlp.compute_idf(texts)

# Create and train the classifier
classifier = NaiveBayesClassifier(nlp)
classifier.train(texts, labels)

# Test classification
test_text = "learning neural networks with python"
prediction = classifier.predict(test_text)
print(f"Predicted class for '{test_text}': {prediction}")

# Show the TF-IDF vector for a document
tfidf_vector = nlp.text_to_tfidf_vector(test_text)
print("\nTF-IDF vector shape:", tfidf_vector.shape)
print("Non-zero terms:")
for idx, value in enumerate(tfidf_vector):
    if value > 0:
        print(f"{nlp.idx2word[idx]}: {value:.4f}")
🚀 Additional Resources
- “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- “Machine Learning: A Probabilistic Perspective” by Kevin Murphy
- “Natural Language Processing with Deep Learning” (Stanford CS224n course materials)
- “A Tutorial on Support Vector Machines for Pattern Recognition” by Christopher J. C. Burges
- “Random Forests” by Leo Breiman
- “An Introduction to Statistical Learning” by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani
🎊 Awesome Work!
You’ve just learned some really powerful techniques! Don’t worry if everything doesn’t click immediately - that’s totally normal. The best way to master these concepts is to practice with your own data.
What’s next? Try implementing these examples with your own datasets. Start small, experiment, and most importantly, have fun with it! Remember, every data science expert started exactly where you are right now.
Keep coding, keep learning, and keep being awesome! 🚀