
⚡ Exploring Recurrent Neural Networks for Sequence Learning in Python: Secrets Every Expert Uses!

Hey there! Ready to dive into recurrent neural networks for sequence learning in Python? This friendly guide walks you through everything step by step with easy-to-follow examples. Perfect for beginners and pros alike!

SuperML Team

🚀 Introduction to Recurrent Neural Networks (RNNs) - Made Simple!

💡 Pro tip: this is one of those techniques that will make you look like a data science wizard!

Recurrent Neural Networks (RNNs) are a class of neural networks designed to process sequential data. Unlike traditional feedforward networks, RNNs have connections that form directed cycles, allowing them to maintain an internal state or “memory” of previous inputs. This architecture makes RNNs particularly well-suited for tasks involving time series, natural language processing, and other sequence-based problems.
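
At each time step t, a simple RNN applies just two equations: it mixes the current input with its previous hidden state, then reads an output off the new hidden state (the names below match the weights used in the code that follows):

h(t) = tanh(Wxh · x(t) + Whh · h(t-1) + bh)
y(t) = Why · h(t) + by

Here x(t) is the current input, h(t-1) is the hidden state carried over from the previous step, and the W matrices and b vectors are the learned parameters.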

Let’s make this super clear! Here’s how we can tackle this:

import numpy as np

class SimpleRNN:
    def __init__(self, input_size, hidden_size, output_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        self.output_size = output_size
        # Weight matrices (small random init) and biases
        self.Wxh = np.random.randn(hidden_size, input_size) * 0.01   # input -> hidden
        self.Whh = np.random.randn(hidden_size, hidden_size) * 0.01  # hidden -> hidden
        self.Why = np.random.randn(output_size, hidden_size) * 0.01  # hidden -> output
        self.bh = np.zeros((hidden_size, 1))
        self.by = np.zeros((output_size, 1))

    def forward(self, inputs, h=None):
        # h is the initial hidden state; defaults to zeros
        if h is None:
            h = np.zeros((self.hidden_size, 1))
        outputs = []
        for x in inputs:
            h = np.tanh(np.dot(self.Wxh, x) + np.dot(self.Whh, h) + self.bh)
            y = np.dot(self.Why, h) + self.by
            outputs.append(y)
        return outputs, h

# Example usage
rnn = SimpleRNN(input_size=10, hidden_size=20, output_size=5)
inputs = [np.random.randn(10, 1) for _ in range(5)]
outputs, final_hidden_state = rnn.forward(inputs)
print(f"Number of outputs: {len(outputs)}")
print(f"Shape of final output: {outputs[-1].shape}")
print(f"Shape of final hidden state: {final_hidden_state.shape}")

🚀 The Basic Structure of an RNN - Made Simple!

🎉 You're doing great! This concept might seem tricky at first, but you've got this!

An RNN processes sequences of inputs by iterating through them one at a time. At each step, it takes the current input and the previous hidden state to produce an output and update the hidden state. This recurrent connection allows the network to maintain information about previous inputs, enabling it to learn and recognize patterns in sequential data.

Let’s break this down together! Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt

def rnn_step(x, h_prev, W_xh, W_hh, W_hy, b_h, b_y):
    h = np.tanh(np.dot(W_xh, x) + np.dot(W_hh, h_prev) + b_h)
    y = np.dot(W_hy, h) + b_y
    return h, y

# Visualize RNN structure
def plot_rnn_structure():
    fig, ax = plt.subplots(figsize=(10, 6))
    ax.set_xlim(0, 4)
    ax.set_ylim(0, 3)
    ax.axis('off')

    # Draw nodes
    circle = plt.Circle((1, 1.5), 0.3, fill=False)
    ax.add_artist(circle)
    ax.text(1, 1.5, 'h(t-1)', ha='center', va='center')

    circle = plt.Circle((2, 1.5), 0.3, fill=False)
    ax.add_artist(circle)
    ax.text(2, 1.5, 'h(t)', ha='center', va='center')

    circle = plt.Circle((3, 1.5), 0.3, fill=False)
    ax.add_artist(circle)
    ax.text(3, 1.5, 'h(t+1)', ha='center', va='center')

    # Draw arrows
    ax.arrow(1.3, 1.5, 0.4, 0, head_width=0.05, head_length=0.1, fc='k', ec='k')
    ax.arrow(2.3, 1.5, 0.4, 0, head_width=0.05, head_length=0.1, fc='k', ec='k')
    ax.arrow(2, 1.8, 0, 0.5, head_width=0.05, head_length=0.1, fc='k', ec='k')
    ax.arrow(1, 0.7, 0, 0.5, head_width=0.05, head_length=0.1, fc='k', ec='k')
    ax.arrow(2, 0.7, 0, 0.5, head_width=0.05, head_length=0.1, fc='k', ec='k')
    ax.arrow(3, 0.7, 0, 0.5, head_width=0.05, head_length=0.1, fc='k', ec='k')

    # Add labels
    ax.text(1, 0.5, 'x(t-1)', ha='center', va='center')
    ax.text(2, 0.5, 'x(t)', ha='center', va='center')
    ax.text(3, 0.5, 'x(t+1)', ha='center', va='center')
    ax.text(2, 2.5, 'y(t)', ha='center', va='center')

    plt.title('Basic RNN Structure')
    plt.show()

plot_rnn_structure()

🚀 Backpropagation Through Time (BPTT) - Made Simple!

Cool fact: many professional data scientists use this exact approach in their daily work!

Backpropagation Through Time (BPTT) is the algorithm used to train RNNs. It’s an extension of the standard backpropagation algorithm used in feedforward networks. BPTT “unrolls” the RNN through time, treating each time step as a layer in a very deep neural network. This allows the gradients to flow backwards through time, updating the weights based on the errors at each time step.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

import numpy as np

def rnn_forward(x, h_prev, params):
    Wxh, Whh, Why = params['Wxh'], params['Whh'], params['Why']
    bh, by = params['bh'], params['by']
    
    h = np.tanh(np.dot(Wxh, x) + np.dot(Whh, h_prev) + bh)
    y = np.dot(Why, h) + by
    return h, y

def rnn_backward(dh_next, cache, params):
    # Backward pass for a single time step.
    # dh_next is the gradient of the loss w.r.t. this step's hidden state,
    # including anything flowing back from later time steps.
    x, h_prev, h = cache
    Wxh, Whh = params['Wxh'], params['Whh']

    dh_raw = (1 - h * h) * dh_next       # backprop through tanh
    dbh = dh_raw
    dWxh = np.dot(dh_raw, x.T)
    dWhh = np.dot(dh_raw, h_prev.T)
    dx = np.dot(Wxh.T, dh_raw)
    dh_prev = np.dot(Whh.T, dh_raw)      # gradient passed to the previous time step

    return dx, dh_prev, dWxh, dWhh, dbh

# Example usage
params = {
    'Wxh': np.random.randn(5, 3),
    'Whh': np.random.randn(5, 5),
    'Why': np.random.randn(2, 5),
    'bh': np.zeros((5, 1)),
    'by': np.zeros((2, 1))
}

x = np.random.randn(3, 1)
h_prev = np.zeros((5, 1))

# Forward pass
h, y = rnn_forward(x, h_prev, params)

# Backward pass (assuming some gradient from the next layer)
dh_next = np.random.randn(5, 1)
cache = (x, h_prev, h)

dx, dh_prev, dWxh, dWhh, dbh = rnn_backward(dh_next, cache, params)

print("Shape of dx:", dx.shape)
print("Shape of dh_prev:", dh_prev.shape)
print("Shape of dWxh:", dWxh.shape)
print("Shape of dWhh:", dWhh.shape)
print("Shape of dbh:", dbh.shape)

🚀 Vanishing and Exploding Gradients - Made Simple!

🔥 Level up: once you master this, you'll be solving problems like a pro!

One of the main challenges in training RNNs is the problem of vanishing or exploding gradients. As the network processes long sequences, the gradients can either become extremely small (vanishing) or extremely large (exploding). This makes it difficult for the network to learn long-term dependencies in the data.

Let me walk you through this step by step! Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt

def simulate_gradient_flow(sequence_length, gradient_factor):
    gradients = [1.0]
    for _ in range(sequence_length - 1):
        gradients.append(gradients[-1] * gradient_factor)
    return gradients

# Simulate vanishing and exploding gradients
seq_length = 100
vanishing_gradients = simulate_gradient_flow(seq_length, 0.9)
exploding_gradients = simulate_gradient_flow(seq_length, 1.1)

# Plotting
plt.figure(figsize=(12, 6))
plt.plot(range(seq_length), vanishing_gradients, label='Vanishing Gradients')
plt.plot(range(seq_length), exploding_gradients, label='Exploding Gradients')
plt.xlabel('Time Steps')
plt.ylabel('Gradient Magnitude')
plt.title('Vanishing and Exploding Gradients in RNNs')
plt.legend()
plt.yscale('log')
plt.grid(True)
plt.show()

print(f"Final vanishing gradient: {vanishing_gradients[-1]:.2e}")
print(f"Final exploding gradient: {exploding_gradients[-1]:.2e}")

🚀 Long Short-Term Memory (LSTM) Networks - Made Simple!

Long Short-Term Memory (LSTM) networks are a type of RNN designed to mitigate the vanishing gradient problem. LSTMs use a more complex structure with gates that control the flow of information. This allows them to learn long-term dependencies more effectively than simple RNNs.

Here’s a handy trick you’ll love! Here’s how we can tackle this:

import numpy as np

class LSTMCell:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        
        # Initialize weights and biases
        self.Wf = np.random.randn(hidden_size, input_size + hidden_size)
        self.Wi = np.random.randn(hidden_size, input_size + hidden_size)
        self.Wc = np.random.randn(hidden_size, input_size + hidden_size)
        self.Wo = np.random.randn(hidden_size, input_size + hidden_size)
        self.bf = np.zeros((hidden_size, 1))
        self.bi = np.zeros((hidden_size, 1))
        self.bc = np.zeros((hidden_size, 1))
        self.bo = np.zeros((hidden_size, 1))
        
    def forward(self, x, h_prev, c_prev):
        # Concatenate input and previous hidden state
        z = np.vstack((x, h_prev))
        
        # Compute gates
        f = self.sigmoid(np.dot(self.Wf, z) + self.bf)
        i = self.sigmoid(np.dot(self.Wi, z) + self.bi)
        c_tilde = np.tanh(np.dot(self.Wc, z) + self.bc)
        o = self.sigmoid(np.dot(self.Wo, z) + self.bo)
        
        # Update cell state and hidden state
        c = f * c_prev + i * c_tilde
        h = o * np.tanh(c)
        
        return h, c
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

# Example usage
input_size = 10
hidden_size = 20
lstm_cell = LSTMCell(input_size, hidden_size)

x = np.random.randn(input_size, 1)
h_prev = np.zeros((hidden_size, 1))
c_prev = np.zeros((hidden_size, 1))

h, c = lstm_cell.forward(x, h_prev, c_prev)
print(f"Shape of hidden state: {h.shape}")
print(f"Shape of cell state: {c.shape}")

🚀 Gated Recurrent Units (GRUs) - Made Simple!

Gated Recurrent Units (GRUs) are another variant of RNNs designed to solve the vanishing gradient problem. GRUs are similar to LSTMs but with a simpler structure, using only two gates: a reset gate and an update gate. This simplification can make GRUs faster to train and more memory-efficient than LSTMs, while still maintaining good performance on many tasks.

This next part is really neat! Here’s how we can tackle this:

import numpy as np

class GRUCell:
    def __init__(self, input_size, hidden_size):
        self.input_size = input_size
        self.hidden_size = hidden_size
        
        # Initialize weights and biases
        self.Wz = np.random.randn(hidden_size, input_size + hidden_size)
        self.Wr = np.random.randn(hidden_size, input_size + hidden_size)
        self.Wh = np.random.randn(hidden_size, input_size + hidden_size)
        self.bz = np.zeros((hidden_size, 1))
        self.br = np.zeros((hidden_size, 1))
        self.bh = np.zeros((hidden_size, 1))
        
    def forward(self, x, h_prev):
        # Concatenate input and previous hidden state
        z = np.vstack((x, h_prev))
        
        # Compute gates
        z_t = self.sigmoid(np.dot(self.Wz, z) + self.bz)
        r_t = self.sigmoid(np.dot(self.Wr, z) + self.br)
        
        # Compute candidate hidden state
        h_tilde = np.tanh(np.dot(self.Wh, np.vstack((x, r_t * h_prev))) + self.bh)
        
        # Update hidden state
        h = (1 - z_t) * h_prev + z_t * h_tilde
        
        return h
    
    def sigmoid(self, x):
        return 1 / (1 + np.exp(-x))

# Example usage
input_size = 10
hidden_size = 20
gru_cell = GRUCell(input_size, hidden_size)

x = np.random.randn(input_size, 1)
h_prev = np.zeros((hidden_size, 1))

h = gru_cell.forward(x, h_prev)
print(f"Shape of hidden state: {h.shape}")

🚀 Bidirectional RNNs - Made Simple!

Bidirectional RNNs process sequences in both forward and backward directions, allowing the network to capture context from both past and future elements in the sequence. This is particularly useful in tasks where the entire sequence is available at once, such as language translation or speech recognition.

This next part is really neat! Here’s how we can tackle this:

import numpy as np

class BidirectionalRNN:
    def __init__(self, input_size, hidden_size, output_size):
        # Reuses the SimpleRNN class defined earlier. Each direction projects to
        # hidden_size so the two outputs can be concatenated before the final layer.
        self.forward_rnn = SimpleRNN(input_size, hidden_size, hidden_size)
        self.backward_rnn = SimpleRNN(input_size, hidden_size, hidden_size)
        self.Wy = np.random.randn(output_size, 2 * hidden_size) * 0.01
        self.by = np.zeros((output_size, 1))
        
    def forward(self, inputs):
        forward_outputs, _ = self.forward_rnn.forward(inputs)
        backward_outputs, _ = self.backward_rnn.forward(inputs[::-1])
        backward_outputs = backward_outputs[::-1]
        
        combined_outputs = []
        for f, b in zip(forward_outputs, backward_outputs):
            combined = np.concatenate((f, b), axis=0)
            output = np.dot(self.Wy, combined) + self.by
            combined_outputs.append(output)
        
        return combined_outputs

# Example usage
input_size, hidden_size, output_size = 10, 20, 5
bi_rnn = BidirectionalRNN(input_size, hidden_size, output_size)
inputs = [np.random.randn(input_size, 1) for _ in range(5)]
outputs = bi_rnn.forward(inputs)

print(f"Number of outputs: {len(outputs)}")
print(f"Shape of each output: {outputs[0].shape}")

🚀 Sequence-to-Sequence Models - Made Simple!

Sequence-to-sequence (seq2seq) models are a type of RNN architecture designed for tasks that involve transforming one sequence into another, such as machine translation or text summarization. These models typically consist of an encoder RNN that processes the input sequence and a decoder RNN that generates the output sequence.

Here’s where it gets exciting! Here’s how we can tackle this:

import numpy as np

class Seq2SeqModel:
    def __init__(self, input_size, hidden_size, output_size):
        # Reuses the SimpleRNN class defined earlier
        self.encoder = SimpleRNN(input_size, hidden_size, hidden_size)
        self.decoder = SimpleRNN(output_size, hidden_size, output_size)

    def forward(self, input_seq, target_seq_len):
        # Encoder: compress the input sequence into a final hidden state
        _, encoder_state = self.encoder.forward(input_seq)

        # Decoder: generate the output sequence, starting from the encoder state
        decoder_input = np.zeros((self.decoder.input_size, 1))
        decoder_state = encoder_state
        outputs = []

        for _ in range(target_seq_len):
            step_outputs, decoder_state = self.decoder.forward([decoder_input], decoder_state)
            outputs.append(step_outputs[0])
            decoder_input = step_outputs[0]  # Use previous output as next input

        return outputs

# Example usage
input_size, hidden_size, output_size = 10, 20, 5
seq2seq = Seq2SeqModel(input_size, hidden_size, output_size)

input_seq = [np.random.randn(input_size, 1) for _ in range(4)]
target_seq_len = 3

outputs = seq2seq.forward(input_seq, target_seq_len)
print(f"Number of outputs: {len(outputs)}")
print(f"Shape of each output: {outputs[0].shape}")

🚀 Attention Mechanism in RNNs - Made Simple!

The attention mechanism allows RNNs to focus on different parts of the input sequence when generating each element of the output sequence. This is particularly useful for long sequences where standard RNNs might struggle to capture all relevant information in a fixed-size hidden state.

Don’t worry, this is easier than it looks! Here’s how we can tackle this:

import numpy as np

def attention(query, keys, values):
    # keys and values: (hidden_size, seq_len); query: (hidden_size, 1)
    scores = np.dot(keys.T, query)                        # (seq_len, 1)
    scores = scores - np.max(scores)                      # numerical stability
    attention_weights = np.exp(scores) / np.sum(np.exp(scores))
    context = np.dot(values, attention_weights)           # (hidden_size, 1)
    return context, attention_weights

class AttentionRNN:
    def __init__(self, input_size, hidden_size, output_size):
        self.rnn = SimpleRNN(input_size, hidden_size, hidden_size)
        self.Wa = np.random.randn(hidden_size, hidden_size) * 0.01
        self.Wy = np.random.randn(output_size, hidden_size) * 0.01
        self.by = np.zeros((output_size, 1))

    def forward(self, inputs):
        rnn_outputs, _ = self.rnn.forward(inputs)
        H = np.hstack(rnn_outputs)  # (hidden_size, seq_len) matrix of all step outputs
        outputs = []

        for t in range(len(inputs)):
            query = np.dot(self.Wa, rnn_outputs[t])
            context, _ = attention(query, H, H)
            output = np.dot(self.Wy, context) + self.by
            outputs.append(output)

        return outputs

# Example usage
input_size, hidden_size, output_size = 10, 20, 5
attention_rnn = AttentionRNN(input_size, hidden_size, output_size)
inputs = [np.random.randn(input_size, 1) for _ in range(5)]
outputs = attention_rnn.forward(inputs)

print(f"Number of outputs: {len(outputs)}")
print(f"Shape of each output: {outputs[0].shape}")

🚀 Real-life Example: Sentiment Analysis - Made Simple!

Sentiment analysis is a common application of RNNs in natural language processing. In this example, we’ll use a simple RNN to classify movie reviews as positive or negative based on their text content.

Let’s make this super clear! Here’s how we can tackle this:

import numpy as np

# Simplified tokenization and embedding
def tokenize(text):
    return text.lower().split()

vocab = {'good': 0, 'great': 1, 'bad': 2, 'awful': 3, 'movie': 4, 'film': 5, 'the': 6, 'a': 7}

def embed(tokens):
    # One-hot column vectors; tokens outside the vocabulary are skipped
    return [np.eye(len(vocab))[vocab[token]].reshape(-1, 1) for token in tokens if token in vocab]

class SentimentRNN:
    def __init__(self, vocab_size, hidden_size):
        self.rnn = SimpleRNN(vocab_size, hidden_size, 1)
    
    def forward(self, inputs):
        outputs, _ = self.rnn.forward(inputs)
        return outputs[-1]  # Use the last output for classification

# Example usage
sentiment_rnn = SentimentRNN(len(vocab), 10)

positive_review = "The movie was good and great"
negative_review = "The film was bad and awful"

pos_tokens = embed(tokenize(positive_review))
neg_tokens = embed(tokenize(negative_review))

pos_sentiment = sentiment_rnn.forward(pos_tokens)
neg_sentiment = sentiment_rnn.forward(neg_tokens)

print(f"Positive review sentiment: {pos_sentiment[0][0]:.4f}")
print(f"Negative review sentiment: {neg_sentiment[0][0]:.4f}")

🚀 Real-life Example: Time Series Prediction - Made Simple!

RNNs are widely used for time series prediction tasks, such as weather forecasting or stock price prediction. In this example, we’ll use an RNN to predict future values in a simple time series.

Let me walk you through this step by step! Here’s how we can tackle this:

import numpy as np
import matplotlib.pyplot as plt

# Generate a simple time series
def generate_time_series(length):
    t = np.linspace(0, 10, length)
    series = np.sin(t) + np.random.normal(0, 0.1, length)
    return series

# Prepare data for RNN
def prepare_data(series, look_back):
    X, y = [], []
    for i in range(len(series) - look_back):
        X.append(series[i:i+look_back])
        y.append(series[i+look_back])
    return np.array(X), np.array(y)

# Simple RNN for time series prediction
class TimeSeriesRNN:
    def __init__(self, input_size, hidden_size):
        self.rnn = SimpleRNN(input_size, hidden_size, 1)
    
    def forward(self, inputs):
        outputs, _ = self.rnn.forward(inputs)
        return outputs[-1]

# Generate and prepare data
series = generate_time_series(1000)
X, y = prepare_data(series, look_back=10)

# Create and use the model
model = TimeSeriesRNN(1, 20)

# Make predictions (note: the model is untrained here, so these use random weights)
predictions = [model.forward([x.reshape(-1, 1) for x in X_seq])[0][0] for X_seq in X[:100]]

# Plot results
plt.figure(figsize=(12, 6))
plt.plot(range(110), series[:110], label='Actual')
plt.plot(range(10, 110), predictions, label='Predicted')
plt.legend()
plt.title('Time Series Prediction with RNN')
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
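
Since the model above has random weights, the plot mostly demonstrates the pipeline rather than a good forecast. Once you add training, a simple way to score the forecast is the mean squared error between the predictions and the actual next values:

# Evaluate the one-step-ahead predictions against the true targets
mse = np.mean((np.array(predictions) - y[:100]) ** 2)
print(f"Mean squared error over the first 100 predictions: {mse:.4f}")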

🚀 Training RNNs: Practical Considerations - Made Simple!

Training RNNs effectively requires careful consideration of several factors:

  1. Gradient clipping: To prevent exploding gradients, clip the gradients to a maximum value.
  2. Proper initialization: Initialize weights carefully to avoid vanishing or exploding gradients.
  3. Choosing the right architecture: Select between simple RNNs, LSTMs, or GRUs based on the task.
  4. Regularization: Use techniques like dropout to prevent overfitting.
  5. Learning rate scheduling: Adjust the learning rate during training for better convergence.


Let’s make this super clear! Here’s how we can tackle this:

import numpy as np

def clip_gradients(gradients, max_value):
    return [np.clip(grad, -max_value, max_value) for grad in gradients]

def initialize_weights(input_size, hidden_size, output_size):
    scale = np.sqrt(2.0 / (input_size + hidden_size))
    return {
        'Wxh': np.random.randn(hidden_size, input_size) * scale,
        'Whh': np.random.randn(hidden_size, hidden_size) * scale,
        'Why': np.random.randn(output_size, hidden_size) * scale,
        'bh': np.zeros((hidden_size, 1)),
        'by': np.zeros((output_size, 1))
    }

def apply_dropout(x, keep_prob):
    mask = (np.random.rand(*x.shape) < keep_prob) / keep_prob
    return x * mask

def adjust_learning_rate(initial_lr, epoch, decay_rate):
    return initial_lr / (1 + decay_rate * epoch)

# Example usage
input_size, hidden_size, output_size = 10, 20, 5
weights = initialize_weights(input_size, hidden_size, output_size)

gradients = [np.random.randn(*w.shape) * 10 for w in weights.values()]
clipped_gradients = clip_gradients(gradients, max_value=5.0)

x = np.random.randn(hidden_size, 1)
x_with_dropout = apply_dropout(x, keep_prob=0.8)

initial_lr = 0.01
for epoch in range(10):
    lr = adjust_learning_rate(initial_lr, epoch, decay_rate=0.1)
    print(f"Epoch {epoch}, Learning Rate: {lr:.4f}")

🚀 Advanced RNN Architectures - Made Simple!

As research in deep learning progresses, more advanced RNN architectures have been developed to address specific challenges and improve performance:

  1. Transformer models: While not strictly RNNs, Transformers have largely replaced traditional RNNs in many NLP tasks due to their ability to process entire sequences in parallel.
  2. Neural Turing Machines: These combine RNNs with an external memory, allowing for more complex reasoning and computation.
  3. Differentiable Neural Computers: An extension of Neural Turing Machines with a more sophisticated memory addressing mechanism.
  4. Clockwork RNNs: These modify the standard RNN architecture to update different parts of the hidden state at different frequencies, allowing for better modeling of long-term dependencies.


This next part is really neat! Here’s how we can tackle this:

import numpy as np

class TransformerBlock:
    def __init__(self, d_model, num_heads):
        self.attention = MultiHeadAttention(d_model, num_heads)
        self.feed_forward = FeedForward(d_model)
        self.layer_norm1 = LayerNorm(d_model)
        self.layer_norm2 = LayerNorm(d_model)

    def forward(self, x):
        # Attention sub-layer with residual connection and layer norm
        attention_output = self.attention.forward(x)
        x = self.layer_norm1.forward(x + attention_output)
        # Feed-forward sub-layer with residual connection and layer norm
        ff_output = self.feed_forward.forward(x)
        return self.layer_norm2.forward(x + ff_output)

class MultiHeadAttention:
    def __init__(self, d_model, num_heads):
        self.num_heads = num_heads
        self.d_model = d_model
        self.d_k = d_model // num_heads
        self.W_q = np.random.randn(d_model, d_model) * 0.01
        self.W_k = np.random.randn(d_model, d_model) * 0.01
        self.W_v = np.random.randn(d_model, d_model) * 0.01
        self.W_o = np.random.randn(d_model, d_model) * 0.01

    def forward(self, x):
        # Simplified single-head scaled dot-product attention; x is (seq_len, d_model)
        Q, K, V = np.dot(x, self.W_q), np.dot(x, self.W_k), np.dot(x, self.W_v)
        scores = np.dot(Q, K.T) / np.sqrt(self.d_model)
        scores = scores - np.max(scores, axis=-1, keepdims=True)
        weights = np.exp(scores) / np.sum(np.exp(scores), axis=-1, keepdims=True)
        return np.dot(np.dot(weights, V), self.W_o)

class FeedForward:
    def __init__(self, d_model):
        self.W1 = np.random.randn(d_model, d_model) * 0.01
        self.W2 = np.random.randn(d_model, d_model) * 0.01

    def forward(self, x):
        # Two-layer feed-forward network with a ReLU in between
        return np.dot(np.maximum(np.dot(x, self.W1), 0), self.W2)

class LayerNorm:
    def __init__(self, d_model):
        self.gamma = np.ones(d_model)
        self.beta = np.zeros(d_model)

    def forward(self, x):
        # Normalize each token vector over the feature dimension
        mean = np.mean(x, axis=-1, keepdims=True)
        std = np.std(x, axis=-1, keepdims=True)
        return self.gamma * (x - mean) / (std + 1e-8) + self.beta

# Example usage
d_model, num_heads = 64, 8
transformer_block = TransformerBlock(d_model, num_heads)
x = np.random.randn(10, d_model)  # Sequence of 10 tokens
output = transformer_block.forward(x)
print(f"Input shape: {x.shape}")
print(f"Output shape: {output.shape}")

🚀 Additional Resources - Made Simple!

For those interested in diving deeper into RNNs and sequence learning, here are some valuable resources:

  1. “Sequence Models” course by Andrew Ng on Coursera
  2. “Deep Learning” book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (Chapter 10 on Sequence Modeling)
  3. “Attention Is All You Need” paper by Vaswani et al. (2017) (arXiv:1706.03762)
  4. “Long Short-Term Memory” paper by Hochreiter and Schmidhuber (1997) (doi:10.1162/neco.1997.9.8.1735)
  5. TensorFlow and PyTorch documentation on RNN implementations
  6. “The Unreasonable Effectiveness of Recurrent Neural Networks” blog post by Andrej Karpathy

These resources provide a mix of theoretical foundations and practical implementations to further your understanding of RNNs and their applications in sequence learning.

🎊 Awesome Work!

You’ve just learned some really powerful techniques! Don’t worry if everything doesn’t click immediately - that’s totally normal. The best way to master these concepts is to practice with your own data.

What’s next? Try implementing these examples with your own datasets. Start small, experiment, and most importantly, have fun with it! Remember, every data science expert started exactly where you are right now.

Keep coding, keep learning, and keep being awesome! 🚀
