Data Science

🤖 Master Misleading Accuracy in Machine Learning: Evaluation Techniques That Will Boost Your Skills!

Hey there! Ready to dive into misleading accuracy in machine learning? This friendly guide will walk you through everything step by step, with easy-to-follow examples. Perfect for beginners and pros alike!

SuperML Team

🚀 Why Accuracy Can Be Misleading in Machine Learning - Made Simple!

💡 Pro tip: This is one of those techniques that will make you look like a data science wizard!

Accuracy is often used as a primary metric for evaluating machine learning models. However, relying solely on accuracy can lead to incorrect conclusions about a model’s performance, especially in certain scenarios. This presentation will explore why accuracy can be misleading and introduce alternative metrics and techniques to better evaluate model performance.

Here’s where it gets exciting! Here’s how we can tackle this:

import numpy as np
from sklearn.metrics import accuracy_score

# Example of a misleading accuracy score: 90% of the true labels are class 0,
# so a model that blindly predicts class 0 still looks accurate
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1])
y_pred = np.array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

accuracy = accuracy_score(y_true, y_pred)
print(f"Accuracy: {accuracy}")  # Output: Accuracy: 0.9

# Despite 90% accuracy, the model never identifies class 1

🚀 The Class Imbalance Problem - Made Simple!

🎉 You’re doing great! This concept might seem tricky at first, but you’ve got this!

Class imbalance occurs when one class in a dataset significantly outnumbers the other(s). In such cases, accuracy can be misleading as a model that always predicts the majority class will have high accuracy but fail to identify the minority class.

Here’s a handy trick you’ll love! Here’s how we can tackle this:

import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Create an imbalanced dataset
X = np.random.rand(1000, 10)
y = np.zeros(1000)
y[:50] = 1  # Only 5% of the data belongs to class 1

# Train a dummy classifier that always predicts the majority class
dummy_clf = DummyClassifier(strategy="most_frequent")
dummy_clf.fit(X, y)

# Make predictions
y_pred = dummy_clf.predict(X)

# Calculate accuracy
accuracy = accuracy_score(y, y_pred)
print(f"Accuracy: {accuracy}")  # Output: Accuracy: 0.95

# Display confusion matrix
print("Confusion Matrix:")
print(confusion_matrix(y, y_pred))
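
One handy alternative metric (an addition here, not part of the original dummy-classifier example) is balanced accuracy, which averages the recall of each class. For the same always-majority model it drops to 0.5, exposing the problem:

from sklearn.metrics import balanced_accuracy_score

# Recall is 1.0 for class 0 but 0.0 for class 1, so the average is 0.5
balanced_acc = balanced_accuracy_score(y, y_pred)
print(f"Balanced Accuracy: {balanced_acc}")  # Output: Balanced Accuracy: 0.5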

🚀 Real-Life Example: Medical Diagnosis - Made Simple!

Cool fact: Many professional data scientists use this exact approach in their daily work!

Consider a rare disease affecting 1% of the population. A model that always predicts “no disease” would have 99% accuracy but fail to identify any actual cases.

Here’s where it gets exciting! Here’s how we can tackle this:

import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Simulating a population of 10,000 people
population_size = 10000
disease_prevalence = 0.01

# Generate true labels
y_true = np.random.choice([0, 1], size=population_size, p=[1-disease_prevalence, disease_prevalence])

# Model always predicting "no disease"
y_pred = np.zeros(population_size)

# Calculate metrics
accuracy = accuracy_score(y_true, y_pred)
conf_matrix = confusion_matrix(y_true, y_pred)

print(f"Accuracy: {accuracy:.2f}")
print("\nConfusion Matrix:")
print(conf_matrix)
print("\nClassification Report:")
print(classification_report(y_true, y_pred))

🚀 Precision and Recall - Made Simple!

🔥 Level up: Once you master this, you’ll be solving problems like a pro!

To address the limitations of accuracy, we introduce precision and recall. Precision measures the proportion of correct positive predictions, while recall measures the proportion of actual positives correctly identified.

This next part is really neat! Here’s how we can tackle this:

from sklearn.metrics import precision_score, recall_score

# Using the medical diagnosis example from the previous slide
precision = precision_score(y_true, y_pred, zero_division=0)
recall = recall_score(y_true, y_pred)

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")

# Interpreting the results
print("\nInterpretation:")
print("Precision is reported as 0.00 because the model made no positive predictions (it is undefined here).")
print("Recall of 0.00 means no actual positive cases were identified.")
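
To make the definitions concrete, here is a tiny sketch with made-up counts (not taken from the example above) that computes both metrics straight from confusion-matrix entries:

# Hypothetical counts for illustration only
tp, fp, fn = 30, 10, 20

precision = tp / (tp + fp)  # 30 / 40 = 0.75: of all positive predictions, 75% were correct
recall = tp / (tp + fn)     # 30 / 50 = 0.60: of all actual positives, 60% were found

print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")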

🚀 F1 Score: Balancing Precision and Recall - Made Simple!

The F1 score is the harmonic mean of precision and recall, providing a single metric that balances both. It’s particularly useful when you have an uneven class distribution.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

from sklearn.metrics import f1_score

# Calculate F1 score
f1 = f1_score(y_true, y_pred)

print(f"F1 Score: {f1:.2f}")

# Demonstrate how F1 score changes with different precision-recall trade-offs
precisions = [0.5, 0.8, 0.9]
recalls = [0.8, 0.5, 0.1]

for p, r in zip(precisions, recalls):
    f1 = 2 * (p * r) / (p + r)
    print(f"Precision: {p:.2f}, Recall: {r:.2f}, F1 Score: {f1:.2f}")

🚀 ROC Curve and AUC - Made Simple!

The Receiver Operating Characteristic (ROC) curve and Area Under the Curve (AUC) provide a complete view of a model’s performance across different classification thresholds.

Ready for some cool stuff? Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_classes=2, weights=[0.9, 0.1], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train a logistic regression model
model = LogisticRegression()
model.fit(X_train, y_train)

# Get predicted probabilities
y_pred_proba = model.predict_proba(X_test)[:, 1]

# Calculate ROC curve and AUC
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
roc_auc = auc(fpr, tpr)

# Plot ROC curve
plt.figure()
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (AUC = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()
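
As a companion to ROC (this is an addition, not part of the original walkthrough), the precision-recall curve is often more informative when positives are rare, since it ignores the flood of true negatives. A sketch reusing y_test and y_pred_proba from above:

from sklearn.metrics import precision_recall_curve, average_precision_score

# Precision-recall curve and average precision for the same classifier
precision, recall, _ = precision_recall_curve(y_test, y_pred_proba)
ap = average_precision_score(y_test, y_pred_proba)

plt.figure()
plt.plot(recall, precision, color='darkorange', lw=2, label=f'PR curve (AP = {ap:.2f})')
plt.xlabel('Recall')
plt.ylabel('Precision')
plt.title('Precision-Recall Curve')
plt.legend(loc='lower left')
plt.show()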

🚀 Confusion Matrix Visualization - Made Simple!

A confusion matrix provides a detailed breakdown of correct and incorrect classifications for each class. Visualizing it can offer insights into model performance.

Let me walk you through this step by step! Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix

# Generate sample predictions
y_true = np.array([0, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([0, 1, 0, 1, 0, 1, 0, 1])

# Create confusion matrix
cm = confusion_matrix(y_true, y_pred)

# Visualize confusion matrix
plt.figure(figsize=(8, 6))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()

# Print classification metrics
tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * (precision * recall) / (precision + recall)

print(f"Accuracy: {accuracy:.2f}")
print(f"Precision: {precision:.2f}")
print(f"Recall: {recall:.2f}")
print(f"F1 Score: {f1:.2f}")

🚀 Cross-Validation: Assessing Model Stability - Made Simple!

Cross-validation helps assess how well a model generalizes to unseen data by splitting the dataset into multiple training and testing sets.

Let me walk you through this step by step! Here’s how we can tackle this:

import matplotlib.pyplot as plt
from sklearn.model_selection import cross_val_score
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Generate a synthetic dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Create an SVM classifier
svm = SVC(kernel='rbf', random_state=42)

# Perform 5-fold cross-validation
cv_scores = cross_val_score(svm, X, y, cv=5)

print("Cross-validation scores:", cv_scores)
print(f"Mean CV Score: {cv_scores.mean():.3f} (+/- {cv_scores.std() * 2:.3f})")

# Visualize cross-validation results
plt.figure(figsize=(8, 6))
plt.boxplot(cv_scores)
plt.title('5-Fold Cross-Validation Results')
plt.ylabel('Accuracy')
plt.show()
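
Note that cross_val_score scores classifiers with accuracy by default, which is exactly the metric this article warns about; it accepts other scorers through the scoring parameter. A small sketch reusing svm, X, and y from above:

# Score each fold with F1 instead of accuracy
cv_f1_scores = cross_val_score(svm, X, y, cv=5, scoring='f1')
print("Cross-validation F1 scores:", cv_f1_scores)
print(f"Mean CV F1: {cv_f1_scores.mean():.3f}")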

🚀 Stratified Sampling for Imbalanced Datasets - Made Simple!

Stratified sampling ensures that the proportion of samples for each class is roughly the same in train and test sets, which is super important for imbalanced datasets.

Here’s where it gets exciting! Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from collections import Counter

# Create an imbalanced dataset
X = np.random.rand(1000, 5)
y = np.zeros(1000)
y[:100] = 1  # Only 10% of the data belongs to class 1

# Perform stratified split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, stratify=y, random_state=42)

# Check class distribution
print("Original dataset class distribution:")
print(Counter(y))

print("\nTraining set class distribution:")
print(Counter(y_train))

print("\nTest set class distribution:")
print(Counter(y_test))

# Visualize class distribution
fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize=(15, 5))

ax1.pie(Counter(y).values(), labels=Counter(y).keys(), autopct='%1.1f%%')
ax1.set_title('Original Dataset')

ax2.pie(Counter(y_train).values(), labels=Counter(y_train).keys(), autopct='%1.1f%%')
ax2.set_title('Training Set')

ax3.pie(Counter(y_test).values(), labels=Counter(y_test).keys(), autopct='%1.1f%%')
ax3.set_title('Test Set')

plt.tight_layout()
plt.show()
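
The same idea carries over to cross-validation: StratifiedKFold preserves the class ratio in every fold. A minimal sketch reusing X, y, and Counter from above:

from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # Each fold keeps the roughly 90/10 class ratio
    print(f"Fold {fold}: test class counts {Counter(y[test_idx])}")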

🚀 Real-Life Example: Spam Detection - Made Simple!

In spam detection, accuracy alone can be misleading. A model that classifies all emails as non-spam might have high accuracy but fail to detect actual spam.

Ready for some cool stuff? Here’s how we can tackle this:

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Simulating an email dataset (0: non-spam, 1: spam)
emails = 10000
spam_ratio = 0.2

y_true = np.random.choice([0, 1], size=emails, p=[1-spam_ratio, spam_ratio])

# Model A: Always predicts non-spam
y_pred_A = np.zeros(emails)

# Model B: More balanced predictions
y_pred_B = np.random.choice([0, 1], size=emails, p=[0.85, 0.15])

def evaluate_model(y_true, y_pred, model_name):
    accuracy = accuracy_score(y_true, y_pred)
    precision = precision_score(y_true, y_pred)
    recall = recall_score(y_true, y_pred)
    f1 = f1_score(y_true, y_pred)
    
    print(f"{model_name} Performance:")
    print(f"Accuracy: {accuracy:.3f}")
    print(f"Precision: {precision:.3f}")
    print(f"Recall: {recall:.3f}")
    print(f"F1 Score: {f1:.3f}\n")

evaluate_model(y_true, y_pred_A, "Model A (Always Non-Spam)")
evaluate_model(y_true, y_pred_B, "Model B (Balanced)")

🚀 Handling Class Imbalance: Oversampling and Undersampling - Made Simple!

When dealing with imbalanced datasets, techniques like oversampling the minority class or undersampling the majority class can help improve model performance.

Let me walk you through this step by step! Here’s how we can tackle this:

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler
from sklearn.svm import SVC

# Generate imbalanced dataset
X, y = make_classification(n_samples=10000, n_classes=2, weights=[0.9, 0.1], random_state=42)

# Split the data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train SVM on imbalanced data
svm = SVC()
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)
print("Original imbalanced dataset:")
print(classification_report(y_test, y_pred))

# Apply SMOTE (oversampling)
smote = SMOTE(random_state=42)
X_train_smote, y_train_smote = smote.fit_resample(X_train, y_train)

svm_smote = SVC()
svm_smote.fit(X_train_smote, y_train_smote)
y_pred_smote = svm_smote.predict(X_test)
print("\nAfter SMOTE oversampling:")
print(classification_report(y_test, y_pred_smote))

# Apply Random Undersampling
rus = RandomUnderSampler(random_state=42)
X_train_rus, y_train_rus = rus.fit_resample(X_train, y_train)

svm_rus = SVC()
svm_rus.fit(X_train_rus, y_train_rus)
y_pred_rus = svm_rus.predict(X_test)
print("\nAfter Random Undersampling:")
print(classification_report(y_test, y_pred_rus))
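
A lighter-weight alternative (added here as a sketch, not part of the resampling workflow above) is to leave the data untouched and let the classifier reweight classes instead; many sklearn estimators accept class_weight='balanced'. Reusing the train/test split from above:

# Reweight classes inversely proportional to their frequencies
svm_weighted = SVC(class_weight='balanced')
svm_weighted.fit(X_train, y_train)
y_pred_weighted = svm_weighted.predict(X_test)
print("\nWith class_weight='balanced':")
print(classification_report(y_test, y_pred_weighted))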

🚀 Learning Curves: Diagnosing Bias and Variance - Made Simple!

Learning curves help visualize how model performance changes with increasing training data, aiding in diagnosing underfitting or overfitting issues.

Let’s make this super clear! Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Generate a dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)

# Define the model
model = SVC(kernel='rbf', random_state=42)

# Calculate learning curves
train_sizes, train_scores, test_scores = learning_curve(
    model, X, y, cv=5, n_jobs=-1, 
    train_sizes=np.linspace(0.1, 1.0, 10),
    scoring='accuracy'
)

# Calculate mean and standard deviation
train_mean = np.mean(train_scores, axis=1)
train_std = np.std(train_scores, axis=1)
test_mean = np.mean(test_scores, axis=1)
test_std = np.std(test_scores, axis=1)

# Plot learning curves
plt.figure(figsize=(10, 6))
plt.plot(train_sizes, train_mean, label='Training score', color='blue')
plt.fill_between(train_sizes, train_mean - train_std, train_mean + train_std, alpha=0.1, color='blue')
plt.plot(train_sizes, test_mean, label='Cross-validation score', color='red')
plt.fill_between(train_sizes, test_mean - test_std, test_mean + test_std, alpha=0.1, color='red')

plt.title('Learning Curves')
plt.xlabel('Training Examples')
plt.ylabel('Score')
plt.legend(loc='best')
plt.grid(True)
plt.show()

🚀 Feature Importance: Understanding Model Decisions - Made Simple!

Feature importance helps identify which features contribute most to a model’s predictions, providing insights into its decision-making process.

Here’s a handy trick you’ll love! Here’s how we can tackle this:

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import numpy as np
import matplotlib.pyplot as plt

# Generate a dataset with known feature importances
X, y = make_classification(n_samples=1000, n_features=10, n_informative=3, 
                           n_redundant=0, n_repeated=0, n_classes=2, 
                           random_state=42)

# Train a Random Forest classifier
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X, y)

# Get feature importances
importances = rf.feature_importances_
feature_names = [f'Feature {i}' for i in range(X.shape[1])]

# Sort features by importance
feature_importance = sorted(zip(importances, feature_names), reverse=True)

# Plot feature importances
plt.figure(figsize=(10, 6))
plt.bar(range(X.shape[1]), [imp for imp, _ in feature_importance])
plt.xticks(range(X.shape[1]), [name for _, name in feature_importance], rotation=45)
plt.title('Feature Importances')
plt.xlabel('Features')
plt.ylabel('Importance')
plt.tight_layout()
plt.show()

# Print feature importances
for importance, name in feature_importance:
    print(f'{name}: {importance:.4f}')
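
Impurity-based importances from tree ensembles can be biased toward high-cardinality features, so permutation importance is a useful cross-check (an addition here, computed on the training data for brevity):

from sklearn.inspection import permutation_importance

# Shuffle each feature in turn and measure how much the score drops
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=42)
for i in perm.importances_mean.argsort()[::-1]:
    print(f"Feature {i}: {perm.importances_mean[i]:.4f} +/- {perm.importances_std[i]:.4f}")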

🚀 Ensemble Methods: Combining Models for Better Performance - Made Simple!

Ensemble methods combine multiple models to create a more reliable and accurate predictor, often outperforming individual models.

This next part is really neat! Here’s how we can tackle this:

from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier, VotingClassifier
from sklearn.svm import SVC
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Generate a dataset
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define individual models
rf = RandomForestClassifier(n_estimators=100, random_state=42)
gb = GradientBoostingClassifier(n_estimators=100, random_state=42)
svm = SVC(kernel='rbf', probability=True, random_state=42)

# Create a voting classifier
voting_clf = VotingClassifier(
    estimators=[('rf', rf), ('gb', gb), ('svm', svm)],
    voting='soft'
)

# Train and evaluate individual models and the ensemble
models = [rf, gb, svm, voting_clf]
model_names = ['Random Forest', 'Gradient Boosting', 'SVM', 'Voting Classifier']

for model, name in zip(models, model_names):
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    accuracy = accuracy_score(y_test, y_pred)
    print(f'{name} Accuracy: {accuracy:.4f}')

🚀 Additional Resources - Made Simple!

For further exploration of machine learning evaluation metrics and techniques, consider the following resources:

  1. “A Survey of Predictive Modelling under Imbalanced Distributions” by Branco et al. (2016) ArXiv: https://arxiv.org/abs/1505.01658
  2. “Beyond Accuracy: Precision and Recall” by Powers (2011) ArXiv: https://arxiv.org/abs/2101.08458
  3. “ROC Analysis in Pattern Recognition: A Tutorial” by Fawcett (2004) ArXiv: https://arxiv.org/abs/cs/0303033

These papers provide in-depth discussions on various aspects of model evaluation and handling imbalanced datasets in machine learning.

🎊 Awesome Work!

You’ve just learned some really powerful techniques! Don’t worry if everything doesn’t click immediately - that’s totally normal. The best way to master these concepts is to practice with your own data.

What’s next? Try implementing these examples with your own datasets. Start small, experiment, and most importantly, have fun with it! Remember, every data science expert started exactly where you are right now.

Keep coding, keep learning, and keep being awesome! 🚀
