This is a GPT model I built and trained from scratch following "Build a Large Language Model (From Scratch)" by Sebastian Raschka.
Every component was implemented by hand - no pre-built transformers library!
🔧 Key Components
1. Tokenization
- Text is split into tokens using BPE (Byte Pair Encoding)
- Each token is mapped to an ID from a 50,257-token vocabulary
2. Embeddings
- Token IDs are converted to dense 768-dimensional vectors
- Positional encodings are added so the model knows word order
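In PyTorch, both lookups are plain `nn.Embedding` layers and their outputs are summed (a sketch with this model's dimensions; variable names are illustrative):

```python
import torch

vocab_size, emb_dim, context_len = 50257, 768, 256

tok_emb = torch.nn.Embedding(vocab_size, emb_dim)   # token ID -> 768-dim vector
pos_emb = torch.nn.Embedding(context_len, emb_dim)  # learned positional embeddings

token_ids = torch.tensor([[15496, 11, 995]])        # batch of 1, sequence length 3
positions = torch.arange(token_ids.shape[1])        # 0, 1, 2
x = tok_emb(token_ids) + pos_emb(positions)         # positions broadcast over the batch
print(x.shape)  # torch.Size([1, 3, 768])
```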
3. Attention Mechanism
- Self-attention allows tokens to focus on relevant context
- Multi-head attention (12 heads) captures different relationships simultaneously
- Causal masking prevents "seeing the future" during training
4. Transformer Blocks
- 12 stacked layers of attention + feedforward networks
- Layer normalization stabilizes training
- Residual connections let gradients flow through all layers
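Wiring those three ideas together gives the pre-LayerNorm block below (a sketch that leans on PyTorch's built-in `nn.MultiheadAttention` for brevity, unlike the hand-coded version in the repo):

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Pre-LayerNorm block: attention + feedforward, each wrapped in a residual."""
    def __init__(self, d_model=768, num_heads=12):
        super().__init__()
        self.norm1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),   # expand to 4x the embedding dim
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),   # project back down
        )

    def forward(self, x):
        t = x.shape[1]
        causal = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        h = self.norm1(x)
        a, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + a                       # residual connection around attention
        x = x + self.ff(self.norm2(x))  # residual connection around feedforward
        return x
```

The full model stacks 12 of these blocks; the residual additions give gradients a direct path back to the earliest layers.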
5. Text Generation
- Autoregressive: generates one token at a time
- Each token is fed back as input to generate the next
- Temperature and top-k sampling control randomness
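The sampling loop above can be sketched as follows (an illustrative function, not the repo's exact code; `model` is any callable mapping token IDs to logits):

```python
import torch

@torch.no_grad()
def generate(model, ids, max_new_tokens, context_len=256, temperature=1.0, top_k=50):
    """Autoregressive sampling: generate one token, append it, repeat."""
    for _ in range(max_new_tokens):
        logits = model(ids[:, -context_len:])    # (batch, seq, vocab), crop to context
        logits = logits[:, -1, :] / temperature  # last position; higher temp = more random
        if top_k is not None:
            # Keep only the k most likely tokens; mask out the rest
            kth = torch.topk(logits, top_k).values[:, -1, None]
            logits = logits.masked_fill(logits < kth, float("-inf"))
        probs = torch.softmax(logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)   # feed the new token back in
    return ids
```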
📊 Model Specifications
- Architecture: GPT-2 small (the "124M" configuration)
- Parameters: 162,419,712 trainable weights (more than 124M because the output head is not weight-tied to the token embedding)
- Layers: 12 transformer blocks
- Attention Heads: 12 per layer
- Embedding Dimension: 768
- Context Length: 256 tokens
- Vocabulary Size: 50,257 tokens (GPT-2 tokenizer)
- Training Data: "The Verdict" by Edith Wharton
- Training Time: ~10 minutes (10 epochs)
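The parameter count follows directly from the specs above (assuming the book's defaults: no bias on the q/k/v projections, a bias-free output head untied from the token embedding, and a 4x feedforward expansion):

```python
V, D, T, L = 50257, 768, 256, 12              # vocab, emb dim, context, layers

tok_emb = V * D                               # token embedding table
pos_emb = T * D                               # positional embedding table
attn = 3 * D * D + (D * D + D)                # q,k,v (no bias) + output projection
ff = (D * 4 * D + 4 * D) + (4 * D * D + D)    # 4x-expansion feedforward
norms = 2 * 2 * D                             # two LayerNorms (scale + shift each)
block = attn + ff + norms

total = tok_emb + pos_emb + L * block + 2 * D + V * D  # + final norm + output head
print(f"{total:,}")  # 162,419,712
```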
🛠️ Built With
- Backend: Python, PyTorch, FastAPI
- Frontend: Vanilla JavaScript, HTML5, CSS3
- Hosting: Hugging Face Spaces (backend), GitHub Pages (frontend)
- Implementation: Hand-coded from scratch (no transformers library!)