Data Science

🚀 Principal Curves for Nonlinear Data Analysis: An Expert Guide

Hey there! Ready to dive into principal curves for nonlinear data analysis? This friendly guide will walk you through everything step by step with easy-to-follow examples. Perfect for beginners and pros alike!

SuperML Team

🚀 Understanding Principal Curves with Simple Datasets - Made Simple!

💡 Pro tip: This is one of those techniques that will make you look like a data science wizard!

Principal curves provide a nonlinear generalization of principal components analysis, offering a smooth, self-consistent curve that passes through the middle of a data distribution. The implementation starts with synthetic data generation and visualization to understand the concept.

This next part is really neat! Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

# Generate synthetic spiral data
def generate_spiral_data(n_points=1000, noise=0.5):
    t = np.linspace(0, 2*np.pi, n_points)
    x = t * np.cos(2*t) + np.random.normal(0, noise, n_points)
    y = t * np.sin(2*t) + np.random.normal(0, noise, n_points)
    return np.column_stack((x, y))

# Generate and plot data
data = generate_spiral_data()
plt.figure(figsize=(10, 10))
plt.scatter(data[:, 0], data[:, 1], alpha=0.5)
plt.title('Synthetic Spiral Dataset')
plt.xlabel('X'); plt.ylabel('Y')
plt.show()

🚀 Basic Principal Curve Implementation - Made Simple!

🎉 You’re doing great! This concept might seem tricky at first, but you’ve got this!

The core algorithm iteratively projects points onto the curve and updates the curve to minimize the average squared distance to the projected points. This example shows you the fundamental concepts without optimization techniques.

Let me walk you through this step by step! Here’s how we can tackle this:

class SimplePrincipalCurve:
    def __init__(self, n_segments=10):
        self.n_segments = n_segments
        self.curve_points = None
        
    def initialize_curve(self, X):
        # Initialize with linear interpolation between extremes
        start = X.min(axis=0)
        end = X.max(axis=0)
        t = np.linspace(0, 1, self.n_segments)
        self.curve_points = np.array([start + ti*(end-start) for ti in t])
        
    def project_point(self, point):
        # Find closest point on curve
        distances = np.linalg.norm(self.curve_points - point, axis=1)
        return np.argmin(distances)
    
    def fit(self, X, max_iter=10):
        self.initialize_curve(X)
        
        for _ in range(max_iter):
            # Project all points
            projections = np.array([self.project_point(p) for p in X])
            
            # Update curve points
            for i in range(self.n_segments):
                mask = projections == i
                if np.any(mask):
                    self.curve_points[i] = X[mask].mean(axis=0)
        
        return self

# Example usage
pc = SimplePrincipalCurve(n_segments=20)
pc.fit(data)

plt.figure(figsize=(10, 10))
plt.scatter(data[:, 0], data[:, 1], alpha=0.5)
plt.plot(pc.curve_points[:, 0], pc.curve_points[:, 1], 'r-', linewidth=2)
plt.title('Principal Curve Fitted to Spiral Data')
plt.show()

🚀 Advanced Principal Curve Implementation - Made Simple!

Cool fact: Many professional data scientists use this exact approach in their daily work!

This example incorporates local polynomial smoothing and adaptive segment updates, providing better curve estimation for complex data structures. The update step bins points by their projection parameter and interpolates any empty segments.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

class AdvancedPrincipalCurve:
    def __init__(self, n_segments=20, smooth_factor=0.3):
        self.n_segments = n_segments
        self.smooth_factor = smooth_factor
        self.curve_points = None
        self.segment_lengths = None
        
    def smooth_curve(self):
        # Local polynomial smoothing
        smoothed = np.zeros_like(self.curve_points)
        for i in range(len(self.curve_points)):
            weights = np.exp(-self.smooth_factor *
                             (np.arange(self.n_segments) - i)**2)
            weights = weights / weights.sum()
            smoothed[i] = np.average(self.curve_points, 
                                   weights=weights, axis=0)
        self.curve_points = smoothed
        
    def update_segments(self, X, projections):
        # Bin points by their projection parameter and average within each segment
        segments = np.zeros((self.n_segments, X.shape[1]))
        counts = np.zeros(self.n_segments)
        
        for i, proj in enumerate(projections):
            segment = int(proj * (self.n_segments-1))
            segments[segment] += X[i]
            counts[segment] += 1
            
        # Update non-empty segments
        mask = counts > 0
        segments[mask] /= counts[mask, np.newaxis]
        
        # Interpolate empty segments
        empty = ~mask
        if np.any(empty):
            valid_indices = np.where(~empty)[0]
            empty_indices = np.where(empty)[0]
            for dim in range(X.shape[1]):
                segments[empty, dim] = np.interp(
                    empty_indices, 
                    valid_indices, 
                    segments[valid_indices, dim]
                )
        
        self.curve_points = segments
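
The class above only shows the smoothing and segment-update pieces, so here’s a quick, hedged sketch of how you might use the smoother on top of the simple fit from earlier. It reuses the spiral `data` and `SimplePrincipalCurve` defined above; the usage itself is my illustration, not part of the original class.

# Hypothetical usage: smooth the curve produced by the simple fit above
apc = AdvancedPrincipalCurve(n_segments=20, smooth_factor=0.3)
apc.curve_points = SimplePrincipalCurve(n_segments=20).fit(data).curve_points.copy()
apc.smooth_curve()

plt.figure(figsize=(10, 10))
plt.scatter(data[:, 0], data[:, 1], alpha=0.5)
plt.plot(apc.curve_points[:, 0], apc.curve_points[:, 1], 'g-', linewidth=2)
plt.title('Smoothed Principal Curve (Sketch)')
plt.show()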

🚀 Implementation of Distance Metrics - Made Simple!

🔥 Level up: Once you master this, you’ll be solving problems like a pro!

The accuracy of principal curves heavily depends on proper distance calculations. This example showcases various distance metrics and their impact on curve fitting quality.

Ready for some cool stuff? Here’s how we can tackle this:

def calculate_distances(curve_points, data, metric='euclidean'):
    """Calculate distances between points and curve segments."""
    
    if metric == 'euclidean':
        return np.array([
            [np.linalg.norm(p - c) for c in curve_points]
            for p in data
        ])
    
    elif metric == 'mahalanobis':
        # Calculate covariance matrix
        cov = np.cov(data.T)
        inv_cov = np.linalg.inv(cov)
        
        distances = np.zeros((len(data), len(curve_points)))
        for i, p in enumerate(data):
            for j, c in enumerate(curve_points):
                diff = p - c
                distances[i, j] = np.sqrt(diff.dot(inv_cov).dot(diff))
        return distances
    
    elif metric == 'projection':
        # Calculate projection distances
        distances = np.zeros((len(data), len(curve_points)-1))
        for i in range(len(curve_points)-1):
            segment = curve_points[i+1] - curve_points[i]
            segment_length = np.linalg.norm(segment)
            unit_segment = segment / segment_length
            
            for j, point in enumerate(data):
                vec = point - curve_points[i]
                proj = vec.dot(unit_segment)
                proj = np.clip(proj, 0, segment_length)
                projected_point = curve_points[i] + proj * unit_segment
                distances[j, i] = np.linalg.norm(point - projected_point)
                
        return distances
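
Want a quick sanity check? Here’s a hedged sketch that compares the point-to-node and point-to-segment views on the spiral data, reusing `data` and the fitted `pc` from the earlier snippets:

# Compare distance metrics on the spiral data (reuses `data` and `pc` from above)
d_euclid = calculate_distances(pc.curve_points, data, metric='euclidean')
d_proj = calculate_distances(pc.curve_points, data, metric='projection')

print('Mean distance to nearest curve node:   ', d_euclid.min(axis=1).mean())
print('Mean distance to nearest curve segment:', d_proj.min(axis=1).mean())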

🚀 Principal Curves for High-Dimensional Data - Made Simple!

When dealing with high-dimensional data, principal curves require specialized techniques for efficient computation and visualization. This example includes dimensionality reduction and projection methods.

Here’s where it gets exciting! Here’s how we can tackle this:

import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

class HighDimPrincipalCurve:
    def __init__(self, n_segments=20, init_method='pca'):
        self.n_segments = n_segments
        self.init_method = init_method
        self.pca = None
        self.curve_points = None
        self.projection_matrix = None
        
    def initialize_curve(self, X):
        if self.init_method == 'pca':
            # Initialize using first principal component
            self.pca = PCA(n_components=2)
            X_reduced = self.pca.fit_transform(X)
            
            # Create curve points along first PC
            t = np.linspace(-3, 3, self.n_segments)
            curve_2d = np.column_stack([t, np.zeros_like(t)])
            
            # Project back to original space
            self.curve_points = self.pca.inverse_transform(curve_2d)
            
        elif self.init_method == 'tsne':
            # Initialize using t-SNE
            tsne = TSNE(n_components=2, random_state=42)
            X_reduced = tsne.fit_transform(X)
            
            # Fit curve in reduced space
            pc = SimplePrincipalCurve(n_segments=self.n_segments)
            pc.fit(X_reduced)
            
            # Map curve points back (approximate)
            self.curve_points = self._map_to_original_space(
                X, X_reduced, pc.curve_points)
    
    def _map_to_original_space(self, X_orig, X_reduced, curve_points_reduced):
        # Use locally weighted regression to map points back
        curve_points = np.zeros((len(curve_points_reduced), X_orig.shape[1]))
        
        for i, p in enumerate(curve_points_reduced):
            distances = np.linalg.norm(X_reduced - p, axis=1)
            weights = np.exp(-distances / distances.mean())
            weights /= weights.sum()
            
            curve_points[i] = np.average(X_orig, weights=weights, axis=0)
            
        return curve_points
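
The class above only covers initialization, so here’s a small sketch that lifts the 2-D spiral into 10 dimensions with a random linear map (an assumption purely for illustration) and checks the PCA-based initialization:

# Sketch: embed the spiral in 10 dimensions and initialize a curve there
rng = np.random.default_rng(42)
lift = rng.normal(size=(2, 10))
X_high = data @ lift + rng.normal(0, 0.05, size=(len(data), 10))

hd_curve = HighDimPrincipalCurve(n_segments=20, init_method='pca')
hd_curve.initialize_curve(X_high)
print(hd_curve.curve_points.shape)  # expected: (20, 10)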

🚀 Optimization Techniques for Principal Curves - Made Simple!

Advanced optimization methods significantly improve the convergence and stability of principal curve fitting. This example uses gradient descent with momentum and a simple convergence check to optimize curve positions.

This next part is really neat! Here’s how we can tackle this:

class OptimizedPrincipalCurve:
    def __init__(self, n_segments=20, learning_rate=0.01, momentum=0.9):
        self.n_segments = n_segments
        self.lr = learning_rate
        self.momentum = momentum
        self.velocity = None
        
    def optimize_curve(self, X, max_iter=100, tol=1e-6):
        if self.velocity is None:
            self.velocity = np.zeros_like(self.curve_points)
            
        prev_loss = float('inf')
        
        for iteration in range(max_iter):
            # Calculate gradients
            gradients = np.zeros_like(self.curve_points)
            assignments = self._assign_points_to_segments(X)
            
            for i in range(self.n_segments):
                mask = assignments == i
                if np.any(mask):
                    diff = X[mask] - self.curve_points[i]
                    gradients[i] = np.mean(diff, axis=0)
            
            # Update velocity and positions
            self.velocity = (self.momentum * self.velocity + 
                           self.lr * gradients)
            self.curve_points += self.velocity
            
            # Calculate loss
            current_loss = self._calculate_loss(X, assignments)
            
            # Check convergence
            if abs(prev_loss - current_loss) < tol:
                break
                
            prev_loss = current_loss
            
    def _calculate_loss(self, X, assignments):
        total_loss = 0
        for i in range(self.n_segments):
            mask = assignments == i
            if np.any(mask):
                diff = X[mask] - self.curve_points[i]
                total_loss += np.sum(np.square(diff))
        return total_loss / len(X)

    # --- Minimal helpers (not in the original snippet) so the later examples
    # --- that call fit(), project_points() and score() run end to end.

    def _assign_points_to_segments(self, X):
        # Assign each point to its nearest curve point
        distances = np.linalg.norm(
            X[:, np.newaxis, :] - self.curve_points[np.newaxis, :, :], axis=2)
        return np.argmin(distances, axis=1)

    def project_points(self, X):
        # Alias used by the later examples
        return self._assign_points_to_segments(X)

    def fit(self, X, max_iter=100, tol=1e-6):
        # Initialize the curve as a straight line between the data extremes,
        # then refine it with the momentum-based optimizer above
        start, end = X.min(axis=0), X.max(axis=0)
        t = np.linspace(0, 1, self.n_segments)
        self.curve_points = np.array([start + ti * (end - start) for ti in t])
        self.velocity = np.zeros_like(self.curve_points)
        self.optimize_curve(X, max_iter=max_iter, tol=tol)
        return self

    def score(self, X):
        # Mean squared distance from each point to its assigned curve point
        return self._calculate_loss(X, self._assign_points_to_segments(X))
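
Here’s a minimal usage sketch on the spiral data. It relies on the fit and score helpers added to the class above, so treat it as an illustration rather than a canonical API:

# Fit the optimized curve to the spiral data and report its score
opt_pc = OptimizedPrincipalCurve(n_segments=30, learning_rate=0.05)
opt_pc.fit(data)
print('Mean squared projection distance:', opt_pc.score(data))

plt.figure(figsize=(10, 10))
plt.scatter(data[:, 0], data[:, 1], alpha=0.5)
plt.plot(opt_pc.curve_points[:, 0], opt_pc.curve_points[:, 1], 'r-', linewidth=2)
plt.title('Optimized Principal Curve (Sketch)')
plt.show()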

🚀 Cross-Validation for Principal Curves - Made Simple!

Cross-validation helps determine the optimal hyperparameters and prevents overfitting. This example includes methods for k-fold cross-validation and hyperparameter tuning.

Here’s a handy trick you’ll love! Here’s how we can tackle this:

class CrossValidatedPrincipalCurve:
    def __init__(self, n_segments_range=(5, 50), n_folds=5):
        self.n_segments_range = n_segments_range
        self.n_folds = n_folds
        self.best_n_segments = None
        self.best_score = float('inf')
        
    def cross_validate(self, X):
        from sklearn.model_selection import KFold
        
        kf = KFold(n_splits=self.n_folds, shuffle=True, random_state=42)
        segment_scores = {}
        
        # Try different numbers of segments
        for n_segments in range(
            self.n_segments_range[0], 
            self.n_segments_range[1]+1, 
            5):
            
            fold_scores = []
            
            for train_idx, val_idx in kf.split(X):
                X_train, X_val = X[train_idx], X[val_idx]
                
                # Fit principal curve
                pc = OptimizedPrincipalCurve(n_segments=n_segments)
                pc.fit(X_train)
                
                # Calculate validation score
                val_score = pc.score(X_val)
                fold_scores.append(val_score)
                
            segment_scores[n_segments] = np.mean(fold_scores)
            
            # Update best parameters
            if segment_scores[n_segments] < self.best_score:
                self.best_score = segment_scores[n_segments]
                self.best_n_segments = n_segments
                
        return segment_scores
    
    def plot_validation_curve(self, scores):
        plt.figure(figsize=(10, 6))
        segments = list(scores.keys())
        values = list(scores.values())
        
        plt.plot(segments, values, 'bo-')
        plt.axvline(self.best_n_segments, color='r', linestyle='--')
        plt.xlabel('Number of Segments')
        plt.ylabel('Validation Score')
        plt.title('Cross-Validation Results')
        plt.grid(True)
        plt.show()
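
A quick run on the spiral data might look like this (a hedged sketch that reuses `data` and the OptimizedPrincipalCurve helpers from above):

# Cross-validate the number of segments on the spiral data
cv_pc = CrossValidatedPrincipalCurve(n_segments_range=(5, 30), n_folds=3)
scores = cv_pc.cross_validate(data)
cv_pc.plot_validation_curve(scores)
print('Best number of segments:', cv_pc.best_n_segments)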

🚀 Handling Missing Data in Principal Curves - Made Simple!

Real-world datasets often contain missing values. This example provides methods for handling missing data through imputation and robust curve fitting.

Ready for some cool stuff? Here’s how we can tackle this:

class RobustPrincipalCurve(OptimizedPrincipalCurve):
    def __init__(self, n_segments=20, missing_strategy='mean'):
        # Inherit the optimized fit so super().fit() below has something to call
        super().__init__(n_segments=n_segments)
        self.missing_strategy = missing_strategy
        self.feature_means = None
        
    def _handle_missing_data(self, X):
        # Create mask for missing values
        missing_mask = np.isnan(X)
        
        if self.missing_strategy == 'mean':
            if self.feature_means is None:
                # Calculate feature means excluding NaN
                self.feature_means = np.nanmean(X, axis=0)
            
            # Impute missing values with means
            X_imputed = X.copy()
            for j in range(X.shape[1]):
                mask = missing_mask[:, j]
                X_imputed[mask, j] = self.feature_means[j]
                
            return X_imputed
            
        elif self.missing_strategy == 'iterative':
            # Iterative imputation using current curve
            X_imputed = X.copy()
            max_iter = 10
            
            for _ in range(max_iter):
                # Project complete points onto curve
                complete_mask = ~np.any(missing_mask, axis=1)
                if np.any(complete_mask):
                    self.fit(X_imputed[complete_mask])
                
                # Update missing values based on projections
                for i in range(len(X)):
                    if np.any(missing_mask[i]):
                        proj_point = self.project_point(
                            X_imputed[i], only_observed=True)
                        X_imputed[i][missing_mask[i]] = proj_point[
                            missing_mask[i]]
                        
            return X_imputed
    
    def fit(self, X):
        X_imputed = self._handle_missing_data(X)
        super().fit(X_imputed)
        return self
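
Here’s a small sketch with artificially missing values. Note that the 'iterative' strategy above also assumes an observed-coordinates projection helper that isn’t shown, so this example sticks to mean imputation:

# Sketch: drop ~5% of values at random, then fit with mean imputation
data_missing = data.copy()
nan_mask = np.random.rand(*data_missing.shape) < 0.05
data_missing[nan_mask] = np.nan

robust_pc = RobustPrincipalCurve(n_segments=20, missing_strategy='mean')
robust_pc.fit(data_missing)
print('Curve points shape:', robust_pc.curve_points.shape)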

🚀 Principal Curves for Time Series Analysis - Made Simple!

Principal curves can effectively capture temporal patterns in time series data. This example includes specialized methods for handling sequential data and temporal dependencies.

Let’s make this super clear! Here’s how we can tackle this:

class TimeSeriesPrincipalCurve:
    def __init__(self, n_segments=20, window_size=5):
        self.n_segments = n_segments
        self.window_size = window_size
        self.curve_points = None
        self.temporal_weights = None
        
    def create_temporal_windows(self, X):
        n_samples = len(X)
        windows = []
        for i in range(n_samples - self.window_size + 1):
            windows.append(X[i:i + self.window_size].flatten())
        return np.array(windows)
    
    def fit(self, X, timestamps=None):
        if timestamps is None:
            timestamps = np.arange(len(X))
            
        # Create temporal weights
        self.temporal_weights = np.exp(
            -0.5 * (np.arange(self.window_size) / self.window_size)**2
        )
        
        # Create windowed data
        X_windowed = self.create_temporal_windows(X)
        
        # Initialize curve with temporal consideration
        self.initialize_temporal_curve(X_windowed)
        
        # Fit curve with temporal constraints
        for _ in range(10):  # Number of iterations
            projections = self.project_temporal_points(X_windowed)
            self.update_curve_points(X_windowed, projections)
            
        return self
    
    def initialize_temporal_curve(self, X_windowed):
        # Minimal helper (added): straight-line initialization in window space
        start, end = X_windowed.min(axis=0), X_windowed.max(axis=0)
        t = np.linspace(0, 1, self.n_segments)
        self.curve_points = np.array([start + ti * (end - start) for ti in t])

    def update_curve_points(self, X_windowed, projections):
        # Minimal helper (added): move each curve point to the mean of its windows
        for i in range(self.n_segments):
            mask = projections == i
            if np.any(mask):
                self.curve_points[i] = X_windowed[mask].mean(axis=0)

    def project_temporal_points(self, X_windowed):
        distances = np.zeros((len(X_windowed), self.n_segments))
        for i, point in enumerate(X_windowed):
            for j, curve_point in enumerate(self.curve_points):
                diff = point - curve_point
                # Apply temporal weights across the window (rows are time steps)
                weighted_diff = (diff.reshape(self.window_size, -1) *
                                 self.temporal_weights[:, np.newaxis])
                distances[i, j] = np.sum(weighted_diff**2)
        return np.argmin(distances, axis=1)
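
As a hedged sketch, here’s how you might fit it to a noisy sine wave (a made-up series, just for illustration):

# Sketch: fit to sliding windows of a noisy sine wave
t_axis = np.linspace(0, 4 * np.pi, 300)
series = (np.sin(t_axis) + np.random.normal(0, 0.1, len(t_axis))).reshape(-1, 1)

ts_pc = TimeSeriesPrincipalCurve(n_segments=15, window_size=5)
ts_pc.fit(series)
print('Curve points shape:', ts_pc.curve_points.shape)  # (15, 5)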

🚀 Robust Error Metrics for Principal Curves - Made Simple!

Implementing robust error metrics helps evaluate the quality of principal curve fits and detect potential issues in the fitting process.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

class PrincipalCurveMetrics:
    def __init__(self):
        self.metrics = {}
        
    def calculate_reconstruction_error(self, X, curve, projections):
        """Calculate mean squared reconstruction error."""
        total_error = 0
        for i, point in enumerate(X):
            proj_point = curve[projections[i]]
            error = np.sum((point - proj_point)**2)
            total_error += error
        return total_error / len(X)
    
    def calculate_curve_smoothness(self, curve):
        """Measure curve smoothness using second derivatives."""
        diff1 = np.diff(curve, axis=0)
        diff2 = np.diff(diff1, axis=0)
        return np.mean(np.sum(diff2**2, axis=1))
    
    def calculate_coverage(self, X, curve, threshold=0.1):
        """Calculate percentage of points well-represented by curve."""
        min_distances = np.zeros(len(X))
        for i, point in enumerate(X):
            distances = np.linalg.norm(curve - point, axis=1)
            min_distances[i] = np.min(distances)
        
        coverage = np.mean(min_distances < threshold)
        return coverage
    
    def evaluate_curve(self, X, curve, projections):
        """complete evaluation of curve quality."""
        self.metrics['reconstruction_error'] = \
            self.calculate_reconstruction_error(X, curve, projections)
        self.metrics['smoothness'] = \
            self.calculate_curve_smoothness(curve)
        self.metrics['coverage'] = \
            self.calculate_coverage(X, curve)
        
        return self.metrics
    
    def plot_error_distribution(self, X, curve, projections):
        """Visualize distribution of reconstruction errors."""
        errors = []
        for i, point in enumerate(X):
            proj_point = curve[projections[i]]
            error = np.linalg.norm(point - proj_point)
            errors.append(error)
            
        plt.figure(figsize=(10, 6))
        plt.hist(errors, bins=50, density=True)
        plt.xlabel('Reconstruction Error')
        plt.ylabel('Density')
        plt.title('Distribution of Reconstruction Errors')
        plt.show()
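
To tie it together, here’s a short hedged example that scores the simple fit from earlier, reusing `pc` and `data` from the first snippets:

# Evaluate the simple principal curve fitted earlier
metrics = PrincipalCurveMetrics()
projections = np.array([pc.project_point(p) for p in data])
print(metrics.evaluate_curve(data, pc.curve_points, projections))
metrics.plot_error_distribution(data, pc.curve_points, projections)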

🚀 Hierarchical Principal Curves - Made Simple!

This example extends the basic principal curve concept to handle hierarchical structures in data through a multi-level approach.

Ready for some cool stuff? Here’s how we can tackle this:

class HierarchicalPrincipalCurve:
    def __init__(self, n_levels=3, n_segments_base=5):
        self.n_levels = n_levels
        self.n_segments_base = n_segments_base
        self.curves = []
        self.residuals = []
        
    def fit(self, X):
        current_data = X.copy()
        
        for level in range(self.n_levels):
            # Increase segments exponentially with level
            n_segments = self.n_segments_base * (2**level)
            
            # Fit principal curve at current level
            pc = OptimizedPrincipalCurve(n_segments=n_segments)
            pc.fit(current_data)
            
            # Store curve
            self.curves.append(pc)
            
            # Calculate and store residuals
            projections = pc.project_points(current_data)
            projected_points = pc.curve_points[projections]
            residuals = current_data - projected_points
            self.residuals.append(residuals)
            
            # Update data for next level
            current_data = residuals
            
        return self
    
    def reconstruct(self, level):
        """Reconstruct data up to specified level."""
        reconstruction = np.zeros_like(self.residuals[0])
        
        for l in range(min(level + 1, self.n_levels)):
            pc = self.curves[l]
            projections = pc.project_points(reconstruction)
            reconstruction += pc.curve_points[projections]
            
        return reconstruction
    
    def plot_hierarchy(self, X, max_level=None):
        """Visualize hierarchical curve structure."""
        if max_level is None:
            max_level = self.n_levels
            
        fig, axes = plt.subplots(1, max_level + 1, 
                                figsize=(5*(max_level + 1), 5))
        
        # Plot original data
        axes[0].scatter(X[:, 0], X[:, 1], alpha=0.5)
        axes[0].set_title('Original Data')
        
        # Plot reconstructions at each level
        for level in range(max_level):
            reconstruction = self.reconstruct(level)
            axes[level + 1].scatter(reconstruction[:, 0], 
                                  reconstruction[:, 1], 
                                  alpha=0.5)
            axes[level + 1].set_title(f'Level {level + 1}')
            
        plt.tight_layout()
        plt.show()
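
A quick sketch on the spiral data (reusing `data` and the OptimizedPrincipalCurve helpers defined earlier):

# Fit a two-level hierarchy to the spiral data and visualize the levels
h_pc = HierarchicalPrincipalCurve(n_levels=2, n_segments_base=10)
h_pc.fit(data)
h_pc.plot_hierarchy(data, max_level=2)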

🚀 Principal Curves for Dataset Visualization - Made Simple!

This example focuses on advanced visualization techniques for principal curves, including confidence regions and density estimation along the curve.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

class VisualizationPrincipalCurve(OptimizedPrincipalCurve):
    def __init__(self, n_segments=20):
        # Inherit the optimized fit so curve_points are set before plotting
        super().__init__(n_segments=n_segments)
        self.curve_points = None
        self.density_estimates = None
        self.confidence_regions = None
        
    def estimate_density(self, X, bandwidth=0.1):
        """Estimate density along the principal curve."""
        from scipy.stats import gaussian_kde
        
        densities = np.zeros(self.n_segments)
        for i, curve_point in enumerate(self.curve_points):
            distances = np.linalg.norm(X - curve_point, axis=1)
            kernel = gaussian_kde(distances, bw_method=bandwidth)
            densities[i] = kernel(0)[0]  # gaussian_kde returns a length-1 array
            
        self.density_estimates = densities / np.max(densities)
        return self.density_estimates
    
    def compute_confidence_regions(self, X, confidence=0.95):
        """Compute confidence regions around the curve."""
        from scipy.stats import chi2
        
        threshold = chi2.ppf(confidence, df=2)
        regions = []
        
        for i in range(self.n_segments):
            # Find points close to current segment
            distances = np.linalg.norm(X - self.curve_points[i], axis=1)
            local_points = X[distances < np.percentile(distances, 20)]
            
            if len(local_points) > 2:
                # Compute local covariance
                cov = np.cov(local_points.T)
                eigenvals, eigenvecs = np.linalg.eigh(cov)
                
                # Create ellipse parameters
                angle = np.arctan2(eigenvecs[1, 0], eigenvecs[0, 0])
                width, height = 2 * np.sqrt(eigenvals * threshold)
                regions.append((width, height, angle))
            else:
                regions.append((0, 0, 0))
                
        self.confidence_regions = regions
        return self.confidence_regions
    
    def plot_enhanced_curve(self, X):
        """Create enhanced visualization with density and confidence regions."""
        plt.figure(figsize=(12, 8))
        
        # Plot original data
        plt.scatter(X[:, 0], X[:, 1], alpha=0.3, c='gray')
        
        # Plot principal curve with density-based coloring
        if self.density_estimates is None:
            self.estimate_density(X)
            
        for i in range(self.n_segments - 1):
            plt.plot([self.curve_points[i, 0], self.curve_points[i+1, 0]],
                    [self.curve_points[i, 1], self.curve_points[i+1, 1]],
                    color=plt.cm.viridis(self.density_estimates[i]),
                    linewidth=3)
            
        # Add confidence regions
        if self.confidence_regions is None:
            self.compute_confidence_regions(X)
            
        from matplotlib.patches import Ellipse
        for i, (width, height, angle) in enumerate(self.confidence_regions):
            if width > 0 and height > 0:
                ellip = Ellipse(xy=self.curve_points[i],
                              width=width, height=height,
                              angle=np.degrees(angle),
                              alpha=0.2, color='blue')
                plt.gca().add_patch(ellip)
                
        sm = plt.cm.ScalarMappable(cmap='viridis')
        sm.set_array(self.density_estimates)
        plt.colorbar(sm, ax=plt.gca(), label='Density')
        plt.title('Enhanced Principal Curve Visualization')
        plt.xlabel('X'); plt.ylabel('Y')
        plt.axis('equal')
        plt.show()
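
Putting it together on the spiral data (a hedged sketch; the class above reuses the optimized fit, so fit() sets the curve points before plotting):

# Fit and draw the density-colored curve with confidence ellipses
viz_pc = VisualizationPrincipalCurve(n_segments=25)
viz_pc.fit(data)
viz_pc.plot_enhanced_curve(data)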

🚀 Real-World Application: Gene Expression Analysis - Made Simple!

Principal curves effectively capture the progression of gene expression patterns. This example includes specialized methods for biological data analysis.

Here’s a handy trick you’ll love! Here’s how we can tackle this:

class GeneExpressionPrincipalCurve:
    def __init__(self, n_segments=20, min_expressed_samples=5):
        self.n_segments = n_segments
        self.min_expressed_samples = min_expressed_samples
        self.curve_points = None
        self.gene_loadings = None
        self.pseudotime = None
        
    def preprocess_data(self, expression_matrix):
        """Preprocess gene expression data."""
        # Filter lowly expressed genes
        expressed_samples = np.sum(expression_matrix > 0, axis=0)
        kept_genes = expressed_samples >= self.min_expressed_samples
        
        # Log transform and normalize
        normalized = np.log2(expression_matrix[:, kept_genes] + 1)
        normalized = (normalized - normalized.mean(axis=0)) / normalized.std(axis=0)
        
        return normalized
    
    def fit(self, expression_matrix):
        """Fit principal curve to gene expression data."""
        # Preprocess data
        X = self.preprocess_data(expression_matrix)
        
        # Fit curve
        pc = OptimizedPrincipalCurve(n_segments=self.n_segments)
        pc.fit(X)
        self.curve_points = pc.curve_points
        
        # Calculate pseudotime
        projections = pc.project_points(X)
        self.pseudotime = projections / (self.n_segments - 1)
        
        # Calculate gene loadings
        self.calculate_gene_loadings(X)
        
        return self
    
    def calculate_gene_loadings(self, X):
        """Calculate contribution of each gene to the curve."""
        self.gene_loadings = np.zeros(X.shape[1])
        
        for i in range(X.shape[1]):
            # Correlation between gene expression and pseudotime
            correlation = np.corrcoef(X[:, i], self.pseudotime)[0, 1]
            self.gene_loadings[i] = abs(correlation)
            
    def plot_gene_trajectory(self, expression_matrix, gene_index):
        """Plot expression trajectory for a specific gene."""
        plt.figure(figsize=(10, 6))
        
        # Sort by pseudotime
        sort_idx = np.argsort(self.pseudotime)
        expression = np.log2(expression_matrix[:, gene_index] + 1)
        
        plt.scatter(self.pseudotime, expression, alpha=0.5)
        
        # Add smoothed trajectory
        from scipy.signal import savgol_filter
        smoothed = savgol_filter(expression[sort_idx], 
                               window_length=11, 
                               polyorder=3)
        plt.plot(self.pseudotime[sort_idx], 
                smoothed, 'r-', linewidth=2)
        
        plt.xlabel('Pseudotime')
        plt.ylabel('Log2 Expression')
        plt.title(f'Gene Expression Trajectory (Gene {gene_index})')
        plt.show()
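
Since no real expression data is bundled with this post, here’s a sketch that simulates a small cells-by-genes matrix (purely made up for illustration) and runs the pipeline end to end:

# Sketch: simulate 100 cells x 50 genes with a hidden progression signal
rng = np.random.default_rng(0)
progression = np.sort(rng.random(100))
rates = np.outer(progression, rng.random(50) * 5) + 1
expression = rng.poisson(lam=rates)

ge_pc = GeneExpressionPrincipalCurve(n_segments=15)
ge_pc.fit(expression)
ge_pc.plot_gene_trajectory(expression, gene_index=0)
print('Top gene loading:', ge_pc.gene_loadings.max())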

🚀 Principal Surfaces Extension - Made Simple!

Extending principal curves to principal surfaces allows for more complex manifold learning. This example provides methods for fitting and analyzing principal surfaces in high-dimensional data.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

class PrincipalSurface:
    def __init__(self, grid_size=20, learning_rate=0.01):
        self.grid_size = grid_size
        self.learning_rate = learning_rate
        self.surface_points = None
        self.topology = None
        
    def initialize_surface(self, X):
        """Initialize surface grid using PCA."""
        from sklearn.decomposition import PCA
        
        # Use first two principal components
        pca = PCA(n_components=2)
        projections = pca.fit_transform(X)
        
        # Create grid in projection space
        x_range = np.linspace(projections[:, 0].min(), 
                            projections[:, 0].max(), 
                            self.grid_size)
        y_range = np.linspace(projections[:, 1].min(), 
                            projections[:, 1].max(), 
                            self.grid_size)
        
        grid_x, grid_y = np.meshgrid(x_range, y_range)
        grid_points = np.column_stack((grid_x.ravel(), grid_y.ravel()))
        
        # Project grid back to original space
        self.surface_points = pca.inverse_transform(grid_points)
        self.topology = (grid_x.shape[0], grid_x.shape[1])
        
    def project_point(self, point):
        """Project point onto surface."""
        distances = np.linalg.norm(self.surface_points - point, axis=1)
        closest_idx = np.argmin(distances)
        grid_pos = np.unravel_index(closest_idx, self.topology)
        return grid_pos, self.surface_points[closest_idx]
    
    def fit(self, X, max_iter=100):
        """Fit principal surface to data."""
        self.initialize_surface(X)
        
        for _ in range(max_iter):
            # Project all points
            projections = [self.project_point(p)[0] for p in X]
            
            # Update surface points
            new_surface = np.zeros_like(self.surface_points)
            counts = np.zeros(len(self.surface_points))
            
            for i, proj in enumerate(projections):
                idx = np.ravel_multi_index(proj, self.topology)
                new_surface[idx] += X[i]
                counts[idx] += 1
                
            # Update non-empty points
            mask = counts > 0
            new_surface[mask] /= counts[mask, np.newaxis]
            
            # Smooth surface
            self.surface_points = self.smooth_surface(new_surface)
            
    def smooth_surface(self, surface):
        """Apply Laplacian smoothing to surface."""
        smoothed = surface.reshape(self.topology + (-1,))
        kernel = np.array([[0.1, 0.2, 0.1],
                           [0.2, 0.8, 0.2],
                           [0.1, 0.2, 0.1]])
        kernel = kernel / kernel.sum()  # normalize so smoothing preserves scale
        
        from scipy.ndimage import convolve
        for dim in range(smoothed.shape[-1]):
            smoothed[..., dim] = convolve(smoothed[..., dim], 
                                        kernel, 
                                        mode='reflect')
            
        return smoothed.reshape(surface.shape)
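
For a quick test, scikit-learn’s Swiss roll makes a reasonable 3-D benchmark. This is a sketch, not from the original post:

# Sketch: fit a principal surface to the Swiss roll dataset
from sklearn.datasets import make_swiss_roll

X_roll, _ = make_swiss_roll(n_samples=1000, noise=0.3)
surface = PrincipalSurface(grid_size=10)
surface.fit(X_roll, max_iter=20)
print('Surface points shape:', surface.surface_points.shape)  # (100, 3)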

🚀 Final Results and Additional Resources - Made Simple!

For further reading, look up Hastie and Stuetzle's original 1989 paper that introduced principal curves, along with the follow-up literature on principal manifolds and curve fitting.

The implementations provided here demonstrate various aspects of principal curves, from basic concepts to advanced applications. These methods can be extended and modified for specific use cases in data analysis, visualization, and pattern recognition.

🎊 Awesome Work!

You’ve just learned some really powerful techniques! Don’t worry if everything doesn’t click immediately - that’s totally normal. The best way to master these concepts is to practice with your own data.

What’s next? Try implementing these examples with your own datasets. Start small, experiment, and most importantly, have fun with it! Remember, every data science expert started exactly where you are right now.

Keep coding, keep learning, and keep being awesome! 🚀
