Programming Practical 4: Tokenizers
This practical takes a deep dive into tokenizers, the critical first step in any NLP pipeline. You will compare pre-trained tokenizers (BPE, WordPiece, SentencePiece), train your own tokenizer from scratch, learn how padding, batching, and attention masking work, and see how the tokenizer fits into a full LLM pipeline.
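As a small preview of the padding and masking part, here is a minimal pure-Python sketch of how a batch of variable-length token-id sequences can be padded to a common length, with an attention mask marking which positions are real tokens. It assumes right-padding and a hypothetical pad id of `0`. The helper name `pad_batch` and the token ids are illustrative only. Real tokenizers define their own pad token and padding side, so treat this as a conceptual sketch rather than any library's behavior.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token id; real tokenizers define their own


def pad_batch(
    batch: List[List[int]], pad_id: int = PAD_ID
) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad token-id sequences to the longest length and build attention masks."""
    max_len = max(len(seq) for seq in batch)
    input_ids, attention_mask = [], []
    for seq in batch:
        n_pad = max_len - len(seq)
        # Append pad ids so every sequence has length max_len
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token the model should attend to, 0 marks padding
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Arbitrary illustrative token ids, not taken from any real vocabulary
ids, mask = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(ids)   # [[101, 7592, 102, 0, 0], [101, 7592, 2088, 999, 102]]
print(mask)  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

The mask is what lets a model ignore the padded positions, so sequences of different lengths can share one rectangular batch without the padding affecting the result.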