
🤖 Unsupervised Machine Learning in Python: Secrets That Will Transform Your Projects!

Hey there! Ready to dive into Unsupervised Machine Learning In Python? This friendly guide will walk you through everything step-by-step with easy-to-follow examples. Perfect for beginners and pros alike!

SuperML Team

🚀 Introduction to Unsupervised Learning Fundamentals - Made Simple!

💡 Pro tip: This is one of those techniques that will make you look like a data science wizard!

Unsupervised learning algorithms discover hidden patterns in unlabeled data by identifying inherent structures and relationships. These algorithms operate without explicit target variables, making them essential for exploratory data analysis and feature learning in real-world applications where labeled data is scarce.

Let me walk you through this step by step! Here’s how we can tackle this:

# Basic example of data preparation for unsupervised learning
import numpy as np
from sklearn.preprocessing import StandardScaler

# Generate synthetic data
np.random.seed(42)
X = np.random.randn(1000, 5)  # 1000 samples, 5 features

# Standardize features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

print("Original data shape:", X.shape)
print("Scaled data statistics:")
print("Mean:", X_scaled.mean(axis=0))
print("Std:", X_scaled.std(axis=0))

🚀 K-Means Clustering Implementation - Made Simple!

🎉 You’re doing great! This concept might seem tricky at first, but you’ve got this!

K-means clustering partitions n observations into k clusters by iteratively assigning points to the nearest centroid and updating centroid positions. This example shows you the algorithm’s core mechanics using only NumPy, so you can see the fundamental ideas without reaching for scikit-learn.

Here’s where it gets exciting! Here’s how we can tackle this:

def kmeans_from_scratch(X, k, max_iters=100):
    # Randomly initialize centroids
    n_samples, n_features = X.shape
    centroids = X[np.random.choice(n_samples, k, replace=False)]
    
    for _ in range(max_iters):
        # Assign points to nearest centroid
        distances = np.sqrt(((X - centroids[:, np.newaxis])**2).sum(axis=2))
        labels = np.argmin(distances, axis=0)
        
        # Update centroids
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        
        # Check convergence
        if np.all(centroids == new_centroids):
            break
        centroids = new_centroids
    
    return labels, centroids

# Example usage
X = np.random.randn(300, 2)  # 300 samples, 2 features
labels, centroids = kmeans_from_scratch(X, k=3)
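
Want a quick sanity check? Once you have labels and centroids back, you can peek at the cluster sizes and the total within-cluster spread, often called inertia. Here's one small way to do that, reusing the variables from the example above:

# Inspect the clustering result (reuses labels and centroids from above)
cluster_sizes = np.bincount(labels, minlength=3)
inertia = np.sum((X - centroids[labels]) ** 2)
print("Points per cluster:", cluster_sizes)
print("Within-cluster sum of squares (inertia):", round(float(inertia), 2))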

🚀 Principal Component Analysis Theory - Made Simple!

Cool fact: Many professional data scientists use this exact approach in their daily work!

Principal Component Analysis (PCA) is a dimensionality reduction technique that transforms high-dimensional data into a lower-dimensional space while preserving maximum variance. The mathematical foundation involves eigendecomposition of the covariance matrix.

Here’s where it gets exciting! Here’s how we can tackle this:

# Mathematical representation of PCA
"""
Covariance Matrix:
$$C = \frac{1}{n-1}X^TX$$

Eigendecomposition:
$$C = V\Lambda V^T$$

Projection Matrix:
$$P = XV$$

Where:
$$X$$ is the centered data matrix
$$V$$ contains eigenvectors
$$\Lambda$$ contains eigenvalues
"""

🚀 PCA Implementation from Scratch - Made Simple!

🔥 Level up: Once you master this, you’ll be solving problems like a pro!

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

def pca_from_scratch(X, n_components):
    # Center the data
    X_centered = X - np.mean(X, axis=0)
    
    # Compute covariance matrix
    cov_matrix = np.cov(X_centered.T)
    
    # Compute eigenvectors and eigenvalues
    eigenvalues, eigenvectors = np.linalg.eigh(cov_matrix)
    
    # Sort eigenvectors by eigenvalues in descending order
    idx = np.argsort(eigenvalues)[::-1]
    eigenvectors = eigenvectors[:, idx]
    
    # Select top n_components eigenvectors
    components = eigenvectors[:, :n_components]
    
    # Project data
    X_transformed = np.dot(X_centered, components)
    
    return X_transformed, components

# Example usage
X = np.random.randn(200, 5)
X_transformed, components = pca_from_scratch(X, n_components=2)
print("Original shape:", X.shape)
print("Transformed shape:", X_transformed.shape)

🚀 Hierarchical Clustering Implementation - Made Simple!

Hierarchical clustering builds a tree of clusters by progressively merging or splitting groups based on distance metrics. This example shows you agglomerative clustering with a single-linkage strategy, where the distance between two clusters is the smallest distance between any pair of their points.

Let’s break this down together! Here’s how we can tackle this:

def hierarchical_clustering(X, n_clusters):
    n_samples = X.shape[0]
    
    # Initialize clusters (each point is a cluster)
    clusters = [[i] for i in range(n_samples)]
    
    # Calculate initial distances
    distances = np.zeros((n_samples, n_samples))
    for i in range(n_samples):
        for j in range(i + 1, n_samples):
            distances[i, j] = np.sqrt(np.sum((X[i] - X[j])**2))
            distances[j, i] = distances[i, j]
    
    while len(clusters) > n_clusters:
        # Find closest clusters
        min_dist = float('inf')
        merge_idx = None
        
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dist = min([distances[x, y] for x in clusters[i] for y in clusters[j]])
                if dist < min_dist:
                    min_dist = dist
                    merge_idx = (i, j)
        
        # Merge clusters
        i, j = merge_idx
        clusters[i].extend(clusters[j])
        clusters.pop(j)
    
    return clusters

# Example usage
X = np.random.randn(50, 2)
clusters = hierarchical_clustering(X, n_clusters=3)
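
The function returns clusters as lists of row indices. If you'd rather have one label per point, like the other algorithms in this guide, a couple of extra lines (using a new point_labels array, named here just for illustration) will flatten it:

# Turn the list-of-index-lists into a flat label array
point_labels = np.empty(len(X), dtype=int)
for cluster_id, members in enumerate(clusters):
    point_labels[members] = cluster_id

print("Cluster sizes:", np.bincount(point_labels))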

🚀 DBSCAN Algorithm Implementation - Made Simple!

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) identifies clusters based on density, distinguishing core points, border points, and noise points. This algorithm excels at discovering clusters of arbitrary shapes and handling noise effectively.

Let’s break this down together! Here’s how we can tackle this:

def dbscan_from_scratch(X, eps, min_samples):
    n_samples = X.shape[0]
    labels = np.zeros(n_samples, dtype=int) - 1  # -1 represents unvisited
    cluster_label = 0
    
    def get_neighbors(point_idx):
        distances = np.sqrt(np.sum((X - X[point_idx])**2, axis=1))
        return np.where(distances <= eps)[0]
    
    def expand_cluster(point_idx, neighbors, cluster_label):
        labels[point_idx] = cluster_label
        i = 0
        while i < len(neighbors):
            neighbor_idx = neighbors[i]
            if labels[neighbor_idx] == -1:
                labels[neighbor_idx] = cluster_label
                new_neighbors = get_neighbors(neighbor_idx)
                if len(new_neighbors) >= min_samples:
                    neighbors = np.append(neighbors, new_neighbors)
            i += 1
    
    for i in range(n_samples):
        if labels[i] != -1:
            continue
            
        neighbors = get_neighbors(i)
        if len(neighbors) < min_samples:
            labels[i] = 0  # Mark as noise (cluster IDs start at 1)
        else:
            cluster_label += 1
            expand_cluster(i, neighbors, cluster_label)
    
    return labels

# Example usage
X = np.random.randn(100, 2) * 2
labels = dbscan_from_scratch(X, eps=0.5, min_samples=5)
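
In this implementation label 0 marks noise and clusters are numbered from 1. One thing to keep in mind: the purely random data above has little density structure, so don't be surprised if most points end up as noise; DBSCAN shines on clumpy data. A quick way to summarize the result:

# Summarize the result: 0 = noise, clusters are numbered from 1
unique_labels = np.unique(labels)
n_found = int(np.sum(unique_labels > 0))
n_noise = int(np.sum(labels == 0))
print(f"Clusters found: {n_found}, noise points: {n_noise}")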

🚀 Gaussian Mixture Models Theory - Made Simple!

Gaussian Mixture Models represent complex probability distributions as a weighted sum of Gaussian components, using the Expectation-Maximization algorithm for parameter estimation. Each component is defined by its mean, covariance, and mixing coefficient.

Here’s a handy trick you’ll love! Here’s how we can tackle this:

"""
GMM Probability Density Function:
$$p(x) = \sum_{k=1}^K \pi_k \mathcal{N}(x|\mu_k,\Sigma_k)$$

Log-likelihood:
$$\ln p(X|\pi,\mu,\Sigma) = \sum_{n=1}^N \ln\left(\sum_{k=1}^K \pi_k \mathcal{N}(x_n|\mu_k,\Sigma_k)\right)$$

Where:
$$\pi_k$$ are mixing coefficients
$$\mu_k$$ are means
$$\Sigma_k$$ are covariance matrices
"""

🚀 GMM Implementation - Made Simple!

Here’s a handy trick you’ll love! Here’s how we can tackle this:

def gmm_from_scratch(X, n_components, max_iters=100):
    n_samples, n_features = X.shape
    
    # Initialize parameters
    weights = np.ones(n_components) / n_components
    means = X[np.random.choice(n_samples, n_components, replace=False)]
    covs = [np.eye(n_features) for _ in range(n_components)]
    
    def gaussian_pdf(X, mean, cov):
        n = X.shape[1]
        diff = (X - mean).T
        return np.exp(-0.5 * np.sum(diff * np.linalg.solve(cov, diff), axis=0)) / \
               np.sqrt((2 * np.pi) ** n * np.linalg.det(cov))
    
    for _ in range(max_iters):
        # E-step: Compute responsibilities
        resp = np.zeros((n_samples, n_components))
        for k in range(n_components):
            resp[:, k] = weights[k] * gaussian_pdf(X, means[k], covs[k])  # density of component k at every sample
        resp /= resp.sum(axis=1, keepdims=True)
        
        # M-step: Update parameters
        Nk = resp.sum(axis=0)
        weights = Nk / n_samples
        
        for k in range(n_components):
            means[k] = np.sum(resp[:, k:k+1] * X, axis=0) / Nk[k]
            diff = X - means[k]
            covs[k] = np.dot(resp[:, k] * diff.T, diff) / Nk[k]
    
    return weights, means, covs

# Example usage
X = np.concatenate([np.random.randn(100, 2) + [2, 2],
                   np.random.randn(100, 2) + [-2, -2]])
weights, means, covs = gmm_from_scratch(X, n_components=2)
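
Once the parameters are fitted, getting hard cluster assignments is just one more E-step: score every point under each component, weight by the mixing coefficients, and take the argmax. Here's a minimal sketch that assumes SciPy is available for the Gaussian density:

# Hard cluster assignments from the fitted parameters (assumes SciPy is installed)
from scipy.stats import multivariate_normal

resp = np.column_stack([
    w * multivariate_normal(mean=m, cov=c).pdf(X)
    for w, m, c in zip(weights, means, covs)
])
assignments = np.argmax(resp, axis=1)
print("Points per component:", np.bincount(assignments))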

🚀 Real-world Application: Customer Segmentation - Made Simple!

This example shows you customer segmentation using purchase history data, combining preprocessing, feature engineering, and clustering to identify distinct customer groups for targeted marketing strategies.

Let’s make this super clear! Here’s how we can tackle this:

import pandas as pd
from sklearn.preprocessing import StandardScaler

# Generate synthetic customer data
np.random.seed(42)
n_customers = 1000

customer_data = pd.DataFrame({
    'recency': np.random.exponential(30, n_customers),
    'frequency': np.random.poisson(5, n_customers),
    'monetary': np.random.lognormal(4, 1, n_customers),
    'avg_basket': np.random.normal(50, 15, n_customers)
})

# Preprocess data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(customer_data)

# Apply clustering
def optimize_kmeans(X, max_k=10):
    inertias = []
    for k in range(1, max_k + 1):
        labels, centroids = kmeans_from_scratch(X, k)
        # Inertia: total squared distance from each point to its assigned centroid
        inertia = np.sum((X - centroids[labels]) ** 2)
        inertias.append(inertia)
    return inertias

# Find best number of clusters
inertias = optimize_kmeans(X_scaled)
optimal_k = 4  # Determined from elbow method

# Final clustering
labels, centroids = kmeans_from_scratch(X_scaled, optimal_k)
customer_data['Segment'] = labels
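
The optimal_k = 4 above comes from eyeballing the elbow in the inertia curve. If you have matplotlib installed, a few lines make that choice visible:

# Visualize the elbow (assumes matplotlib is installed)
import matplotlib.pyplot as plt

plt.plot(range(1, len(inertias) + 1), inertias, marker='o')
plt.xlabel('Number of clusters k')
plt.ylabel('Inertia')
plt.title('Elbow method for choosing k')
plt.show()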

🚀 Results for Customer Segmentation Analysis - Made Simple!

The customer segmentation analysis reveals distinct behavioral patterns across different customer groups, showing clear separation in purchasing habits and value contribution to the business.

Ready for some cool stuff? Here’s how we can tackle this:

# Analyze segments
segment_analysis = customer_data.groupby('Segment').agg({
    'recency': 'mean',
    'frequency': 'mean',
    'monetary': 'mean',
    'avg_basket': 'mean'
}).round(2)

print("Segment Characteristics:")
print(segment_analysis)

# Calculate segment sizes
segment_sizes = customer_data['Segment'].value_counts()
print("\nSegment Sizes:")
print(segment_sizes)

# Example output:
"""
Segment Characteristics:
         recency  frequency  monetary  avg_basket
Segment                                         
0         15.23      7.82    89.45      62.34
1         45.67      2.31    45.23      38.91
2         28.91      4.56    67.89      51.23
3         35.78      3.45    56.78      45.67

Segment Sizes:
Segment
2    312
0    289
1    256
3    143
"""

🚀 Autoencoder Neural Network Implementation - Made Simple!

Autoencoders learn efficient data representations through dimensionality reduction by training the network to reconstruct its input. This example shows a simple autoencoder using numpy for unsupervised feature learning.

Let’s make this super clear! Here’s how we can tackle this:

class Autoencoder:
    def __init__(self, input_dim, encoding_dim):
        self.input_dim = input_dim
        self.encoding_dim = encoding_dim
        
        # Initialize weights
        self.W1 = np.random.randn(input_dim, encoding_dim) * 0.01
        self.b1 = np.zeros(encoding_dim)
        self.W2 = np.random.randn(encoding_dim, input_dim) * 0.01
        self.b2 = np.zeros(input_dim)
        
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))
    
    def sigmoid_derivative(self, x):
        return x * (1 - x)
    
    def forward(self, X):
        # Encoder
        self.hidden = self.sigmoid(np.dot(X, self.W1) + self.b1)
        # Decoder
        self.output = self.sigmoid(np.dot(self.hidden, self.W2) + self.b2)
        return self.output
    
    def backward(self, X, learning_rate=0.01):
        m = X.shape[0]
        
        # Compute gradients
        error = self.output - X
        d_output = error * self.sigmoid_derivative(self.output)
        
        dW2 = np.dot(self.hidden.T, d_output) / m
        db2 = np.sum(d_output, axis=0) / m
        
        d_hidden = np.dot(d_output, self.W2.T) * self.sigmoid_derivative(self.hidden)
        dW1 = np.dot(X.T, d_hidden) / m
        db1 = np.sum(d_hidden, axis=0) / m
        
        # Update weights
        self.W2 -= learning_rate * dW2
        self.b2 -= learning_rate * db2
        self.W1 -= learning_rate * dW1
        self.b1 -= learning_rate * db1
        
        return np.mean(error ** 2)

# Example usage
X = np.random.rand(1000, 10)
autoencoder = Autoencoder(input_dim=10, encoding_dim=5)

# Training
for epoch in range(100):
    output = autoencoder.forward(X)
    loss = autoencoder.backward(X)
    if epoch % 10 == 0:
        print(f"Epoch {epoch}, Loss: {loss:.4f}")

🚀 Anomaly Detection with Isolation Forest - Made Simple!

Isolation Forest detects anomalies by isolating observations through recursive partitioning, assigning anomaly scores based on the average path length to isolation.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

class IsolationTree:
    def __init__(self, height_limit):
        self.height_limit = height_limit
        self.size = 0
        self.split_feature = None
        self.split_value = None
        self.left = None
        self.right = None
        
    def fit(self, X, current_height=0):
        self.size = len(X)
        
        if current_height >= self.height_limit or len(X) <= 1:
            return
        
        n_features = X.shape[1]
        self.split_feature = np.random.randint(n_features)
        min_val = X[:, self.split_feature].min()
        max_val = X[:, self.split_feature].max()
        
        if min_val == max_val:
            return
        
        self.split_value = np.random.uniform(min_val, max_val)
        
        left_indices = X[:, self.split_feature] < self.split_value
        self.left = IsolationTree(self.height_limit)
        self.right = IsolationTree(self.height_limit)
        
        self.left.fit(X[left_indices], current_height + 1)
        self.right.fit(X[~left_indices], current_height + 1)
        
    def path_length(self, x, current_height=0):
        if self.split_feature is None:
            return current_height
        
        if x[self.split_feature] < self.split_value:
            return self.left.path_length(x, current_height + 1)
        else:
            return self.right.path_length(x, current_height + 1)

# Example usage
X = np.concatenate([
    np.random.randn(95, 2),  # Normal points
    np.random.randn(5, 2) * 4  # Anomalies
])

# Create and train forest
n_trees = 100
height_limit = int(np.ceil(np.log2(len(X))))
forest = [IsolationTree(height_limit) for _ in range(n_trees)]

for tree in forest:
    indices = np.random.randint(0, len(X), size=len(X))
    tree.fit(X[indices])

# Calculate anomaly scores
def anomaly_score(x, forest):
    avg_path_length = np.mean([tree.path_length(x) for tree in forest])
    n = len(X)
    c = 2 * (np.log(n - 1) + 0.5772156649) - 2 * (n - 1) / n
    return 2 ** (-avg_path_length / c)

scores = np.array([anomaly_score(x, forest) for x in X])
anomalies = scores > np.percentile(scores, 95)  # Top 5% as anomalies
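
Because we built the dataset ourselves, we know the last five rows are the planted outliers, so comparing their average score against everything else makes a nice sanity check (the data is random, so the separation won't be perfect on every run):

# Sanity check: the planted outliers (last 5 rows) should score higher on average
print("Mean score, normal points:   ", round(float(scores[:95].mean()), 3))
print("Mean score, planted outliers:", round(float(scores[95:].mean()), 3))
print("Indices flagged as anomalies:", np.where(anomalies)[0])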

🚀 Word Embeddings with Word2Vec Implementation - Made Simple!

Word2Vec learns word embeddings in an unsupervised way using a shallow neural network, creating vector representations that capture semantic relationships between words by predicting surrounding context words in large text corpora.

Here’s where it gets exciting! Here’s how we can tackle this:

import numpy as np
from collections import defaultdict, Counter

class Word2Vec:
    def __init__(self, dimensions=100, window_size=2, learning_rate=0.01):
        self.dimensions = dimensions
        self.window_size = window_size
        self.learning_rate = learning_rate
        self.word_to_index = {}
        self.embeddings = None
        
    def build_vocabulary(self, sentences):
        word_counts = Counter([word for sentence in sentences for word in sentence])
        vocabulary = sorted(word_counts.keys())
        self.word_to_index = {word: idx for idx, word in enumerate(vocabulary)}
        vocab_size = len(vocabulary)
        
        # Initialize embeddings randomly
        self.embeddings = np.random.randn(vocab_size, self.dimensions) * 0.01
        self.context_weights = np.random.randn(self.dimensions, vocab_size) * 0.01
        
    def train_pair(self, target_idx, context_idx):
        # Forward pass
        hidden = self.embeddings[target_idx]
        output = np.dot(hidden, self.context_weights)
        prob = self._softmax(output)
        
        # Backward pass
        error = prob.copy()
        error[context_idx] -= 1
        
        # Update weights
        grad_context = np.outer(hidden, error)
        grad_hidden = np.dot(error, self.context_weights.T)
        
        self.context_weights -= self.learning_rate * grad_context
        self.embeddings[target_idx] -= self.learning_rate * grad_hidden
        
    def _softmax(self, x):
        exp_x = np.exp(x - np.max(x))
        return exp_x / exp_x.sum()
    
    def train(self, sentences, epochs=5):
        for epoch in range(epochs):
            pairs = 0
            for sentence in sentences:
                indices = [self.word_to_index[word] for word in sentence]
                
                for i, target_idx in enumerate(indices):
                    context_indices = indices[max(0, i-self.window_size):i] + \
                                    indices[i+1:i+1+self.window_size]
                    
                    for context_idx in context_indices:
                        self.train_pair(target_idx, context_idx)
                        pairs += 1
            
            print(f"Epoch {epoch + 1}: processed {pairs} training pairs")

# Example usage
sentences = [
    "the quick brown fox jumps over the lazy dog".split(),
    "the fox is quick and brown".split(),
    "the dog is lazy".split()
]

model = Word2Vec(dimensions=50, window_size=2)
model.build_vocabulary(sentences)
model.train(sentences, epochs=100)
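
With the embeddings trained, you can ask which words ended up close together. On a toy corpus this small the neighbours won't mean much, but the cosine-similarity lookup below (a small helper called most_similar, not part of the class above) is exactly what you'd use on real data:

# Find a word's nearest neighbours by cosine similarity
def most_similar(model, word, top_n=3):
    idx = model.word_to_index[word]
    vec = model.embeddings[idx]
    norms = np.linalg.norm(model.embeddings, axis=1) * np.linalg.norm(vec)
    sims = model.embeddings @ vec / norms
    index_to_word = {i: w for w, i in model.word_to_index.items()}
    ranked = np.argsort(-sims)
    return [(index_to_word[i], round(float(sims[i]), 3)) for i in ranked if i != idx][:top_n]

print(most_similar(model, "fox"))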

🚀 Deep Clustering Neural Network - Made Simple!

Deep clustering combines representation learning with clustering objectives, simultaneously learning feature representations and cluster assignments through an end-to-end trainable architecture.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

def deep_clustering_network(X, n_clusters, encoding_dim=32, epochs=100):
    n_samples, n_features = X.shape
    
    # Initialize autoencoder weights
    W_encoder = np.random.randn(n_features, encoding_dim) * 0.01
    b_encoder = np.zeros(encoding_dim)
    W_decoder = np.random.randn(encoding_dim, n_features) * 0.01
    b_decoder = np.zeros(n_features)
    
    # Initialize clustering layer
    cluster_centers = np.random.randn(n_clusters, encoding_dim) * 0.01
    
    def encode(X):
        return np.tanh(np.dot(X, W_encoder) + b_encoder)
    
    def decode(H):
        return np.tanh(np.dot(H, W_decoder) + b_decoder)
    
    def soft_assignment(H):
        # Student's t-distribution kernel
        alpha = 1.0  # degrees of freedom
        distances = np.sum((H[:, np.newaxis, :] - cluster_centers[np.newaxis, :, :]) ** 2, axis=2)
        q = 1.0 / (1.0 + distances / alpha)
        q = q ** ((alpha + 1.0) / 2.0)
        q = q / q.sum(axis=1, keepdims=True)
        return q
    
    # Training loop (simplified: only the cluster centers are updated here;
    # the encoder/decoder weights keep their random initialization)
    for epoch in range(epochs):
        # Forward pass
        H = encode(X)
        X_recon = decode(H)
        Q = soft_assignment(H)
        
        # Update cluster centers
        numerator = np.dot(Q.T, H)
        denominator = Q.sum(axis=0, keepdims=True).T
        cluster_centers = numerator / denominator
        
        # Compute reconstruction loss
        recon_loss = np.mean((X - X_recon) ** 2)
        
        if epoch % 10 == 0:
            print(f"Epoch {epoch}, Reconstruction Loss: {recon_loss:.4f}")
    
    # Final cluster assignments
    H = encode(X)
    Q = soft_assignment(H)
    clusters = np.argmax(Q, axis=1)
    
    return clusters, H, cluster_centers

# Example usage
X = np.random.randn(500, 20)  # 500 samples, 20 features
clusters, embeddings, centers = deep_clustering_network(X, n_clusters=5)
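
A quick look at the output: how the 500 points were split across the 5 clusters. With purely random input the split is fairly arbitrary, but on structured data these counts become meaningful:

# Inspect how points were distributed across clusters
print("Cluster sizes:", np.bincount(clusters, minlength=5))
print("Embedding shape:", embeddings.shape)  # (500, 32)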

🚀 Additional Resources - Made Simple!

🎊 Awesome Work!

You’ve just learned some really powerful techniques! Don’t worry if everything doesn’t click immediately - that’s totally normal. The best way to master these concepts is to practice with your own data.

What’s next? Try implementing these examples with your own datasets. Start small, experiment, and most importantly, have fun with it! Remember, every data science expert started exactly where you are right now.

Keep coding, keep learning, and keep being awesome! 🚀
