Tokenize Text (Easy) | AI Code Lab

Tokenize Text

Implement a **basic whitespace tokenizer**, a common first step in NLP preprocessing.

Rules

1. Lowercase the entire string

2. Remove punctuation: `. , ! ? ; : " '`

3. Split on whitespace

4. Remove empty strings

Example

tokenize("Hello, World!")       →  ["hello", "world"]
tokenize("The quick brown fox") →  ["the", "quick", "brown", "fox"]
tokenize("AI is great!")        →  ["ai", "is", "great"]
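
One possible way to satisfy the four rules is sketched below. It is not an official solution. The names `PUNCTUATION` and `_DELETE_PUNCT` are illustrative helpers, not part of the exercise, and the sketch assumes that "remove" in rule 2 means deleting the character outright rather than replacing it with a space.

```python
from typing import List

# Punctuation characters listed in rule 2: . , ! ? ; : " '
PUNCTUATION = ".,!?;:\"'"

# Translation table that deletes every character in PUNCTUATION.
_DELETE_PUNCT = str.maketrans("", "", PUNCTUATION)


def tokenize(text: str) -> List[str]:
    """Lowercase, strip the listed punctuation, and split on whitespace."""
    lowered = text.lower()                        # rule 1
    stripped = lowered.translate(_DELETE_PUNCT)   # rule 2
    # Rules 3 and 4: split() with no arguments splits on runs of
    # whitespace and never returns empty strings.
    return stripped.split()


if __name__ == "__main__":
    print(tokenize("Hello, World!"))
    print(tokenize("The quick brown fox"))
    print(tokenize("AI is great!"))
```

Because punctuation is deleted rather than replaced, an input such as `"Hello,World"` would come out as the single token `"helloworld"` under this reading of the rules.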
