Python proficiency is essential for ML engineers. Interviews test data structures, PyTorch internals, async programming, memory management, and production-quality code.
14 Interview Questions
What is asyncio and how can it speed up LLM API calls?
Model Answer
asyncio enables concurrent execution of I/O-bound tasks in a single thread using an event loop and async/await syntax. For LLM APIs: instead of issuing calls sequentially, where each one waits for the previous response, run many calls concurrently. Example with an async client, where `acall` is your own coroutine wrapping the SDK call: `async def main(): tasks = [acall(prompt) for prompt in prompts]; results = await asyncio.gather(*tasks)`. Speedup: if each call takes 2 seconds and you have 100 calls, sequential execution takes about 200s, while fully concurrent execution takes about 2s in the ideal case. In practice, provider rate limits and connection limits cap how many calls can be in flight, so the real figure lands somewhere in between. Use `asyncio.Semaphore` to bound concurrency and stay within rate limits. Both the openai and anthropic SDKs ship async clients.
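The pattern described above can be sketched without any network access. In this minimal sketch, `fake_llm_call` is a hypothetical stand-in that sleeps instead of calling a real SDK, and `max_concurrency` is an illustrative parameter name, not a setting from any provider.

```python
import asyncio


async def fake_llm_call(prompt: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an async SDK call; sleeps instead of using the network."""
    await asyncio.sleep(delay)
    return f"echo: {prompt}"


async def run_all(prompts: list[str], max_concurrency: int = 5) -> list[str]:
    # The semaphore caps how many calls are in flight at the same time.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        async with semaphore:
            return await fake_llm_call(prompt)

    # gather returns results in the same order as the prompts.
    return await asyncio.gather(*(bounded(p) for p in prompts))


if __name__ == "__main__":
    results = asyncio.run(run_all([f"prompt {i}" for i in range(20)]))
    print(len(results), results[0])
```

With a real client, the body of `bounded` would await the SDK's async method instead of the stand-in; the semaphore and `gather` structure would stay the same.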
What is the difference between list comprehension and map/filter in Python for data processing?
Model Answer
A list comprehension `[f(x) for x in lst if condition]` is Pythonic, readable, and generally faster than an explicit loop that appends. map() applies a function to every element and returns a lazy iterator, which is useful for large datasets or chaining. filter() selects the elements matching a predicate and is also lazy. For ML code, list comprehensions are usually preferred for readability: compare `[x**2 for x in data]` with `list(map(lambda x: x**2, data))`. Performance: a list comprehension is typically fastest when the operation would otherwise need a lambda, while map() can be faster when it is handed an existing built-in or C-implemented function, as in `map(str.upper, words)`. For large datasets, prefer a generator expression `(f(x) for x in lst)` so results are produced one at a time instead of being materialized as a full list in memory.
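A small side-by-side may make the comparison concrete. The variable names and sample values here are illustrative only.

```python
data = [1, 2, 3, 4, 5, 6]

# List comprehension: filter and transform in one readable expression.
squares_of_evens = [x**2 for x in data if x % 2 == 0]

# Same result with map/filter; both are lazy, so list() forces evaluation.
same_with_map = list(map(lambda x: x**2, filter(lambda x: x % 2 == 0, data)))

# map() with an existing built-in method needs no lambda.
words = ["adam", "sgd"]
upper = list(map(str.upper, words))

# Generator expression: values are produced one at a time and never stored as a list.
lazy_total = sum(x**2 for x in data)
```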
Explain Python's context managers and how they're used in ML training.
Model Answer
Context managers define setup/teardown logic using __enter__ and __exit__ or the @contextmanager decorator. In ML: 1) `torch.no_grad()` disables gradient tracking during inference (saves memory and speeds things up), 2) `torch.autocast("cuda")` enables automatic mixed precision (runs ops in float16 or bfloat16 where safe and keeps float32 where needed), 3) `with open(file) as f:` gives safe file handling. Note that model.train() and model.eval() are not context managers: they are plain method calls that switch the module's mode until you call the other one, so any restore-on-exit behavior has to be wrapped by hand. Implementation: a class whose __enter__ returns the resource and whose __exit__ handles cleanup, even when the block raises. @contextmanager with yield gives a simpler generator-based implementation.
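The two implementation styles can be sketched side by side. `ToyModel`, `EvalMode`, and `eval_mode` are hypothetical names for illustration; `ToyModel` is a plain object with a `training` flag, not a `torch.nn.Module`, and neither helper is part of PyTorch.

```python
from contextlib import contextmanager


class ToyModel:
    """Hypothetical stand-in for a model that carries a training flag."""

    def __init__(self):
        self.training = True


class EvalMode:
    """Class-based form: __enter__ does setup, __exit__ does teardown."""

    def __init__(self, model):
        self.model = model

    def __enter__(self):
        self.previous = self.model.training
        self.model.training = False
        return self.model

    def __exit__(self, exc_type, exc, tb):
        self.model.training = self.previous
        return False  # do not swallow exceptions raised inside the block


@contextmanager
def eval_mode(model):
    """Generator-based form: code before yield is setup, finally is teardown."""
    previous = model.training
    model.training = False
    try:
        yield model
    finally:
        model.training = previous
```

Either form restores the previous flag even if the body of the `with` block raises, which is the main reason to prefer a context manager over paired manual calls.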
What is the difference between asyncio.gather and asyncio.as_completed?
Model Answer
asyncio.gather(*tasks) waits for ALL tasks and returns results in the SAME ORDER they were passed in, which is useful when you need all results before continuing. asyncio.as_completed(tasks) returns an iterator of awaitables that resolve in completion order, letting you process the fastest results first; this suits race patterns and progress bars. Performance: both run tasks concurrently; the difference is in the result API. By default, gather propagates the first exception to the caller while the remaining tasks keep running; pass return_exceptions=True to have failures returned in the result list alongside successes. For LLM batch calls: gather is usually right; switch to as_completed when you want to stream partial results to the user as each model returns.
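The ordering difference is easiest to see with tasks of different durations. In this sketch, `work` and `fail` are hypothetical coroutines that sleep or raise; the delays are arbitrary and only need to be far enough apart to make completion order unambiguous.

```python
import asyncio


async def work(name: str, delay: float) -> str:
    await asyncio.sleep(delay)
    return name


async def fail() -> str:
    raise RuntimeError("boom")


async def demo():
    specs = [("slow", 0.15), ("fast", 0.01), ("medium", 0.08)]

    # gather: results come back in submission order.
    gathered = await asyncio.gather(*(work(n, d) for n, d in specs))

    # as_completed: results arrive in completion order.
    tasks = [asyncio.create_task(work(n, d)) for n, d in specs]
    completed = [await fut for fut in asyncio.as_completed(tasks)]

    # return_exceptions=True puts the exception object in the result list.
    mixed = await asyncio.gather(work("ok", 0.01), fail(), return_exceptions=True)
    return gathered, completed, mixed


if __name__ == "__main__":
    print(asyncio.run(demo()))
```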
How do you profile and optimize Python code for ML training pipelines?
Model Answer
Profiling tools: cProfile/profile (function-level), line_profiler (@profile decorator, line-level), memory_profiler (memory usage), PyTorch Profiler (GPU + CPU timeline). Common bottlenecks: data loading (fix: increase num_workers, set pin_memory=True, prefetch), inefficient preprocessing (vectorize with numpy/pandas instead of loops), Python loops in the forward pass (use tensor operations), and unnecessary CPU-GPU transfers. Other tools: torch.utils.bottleneck, the TensorBoard profiler plugin, NVIDIA Nsight. Rule of thumb: profile before optimizing, and fix the top bottleneck first.
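As a starting point, function-level profiling needs only the standard library. In this sketch, `slow_preprocess` and `pipeline` are hypothetical stand-ins for a real preprocessing step and training loop.

```python
import cProfile
import io
import pstats


def slow_preprocess(n: int) -> int:
    # Deliberately loop-heavy pure-Python work, the kind a profiler should flag.
    total = 0
    for i in range(n):
        total += i * i
    return total


def pipeline() -> int:
    return sum(slow_preprocess(20_000) for _ in range(20))


def profile_pipeline() -> str:
    profiler = cProfile.Profile()
    profiler.enable()
    pipeline()
    profiler.disable()

    # Render the ten most expensive entries by cumulative time into a string.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(10)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_pipeline())
```

Sorting by cumulative time surfaces the call path that dominates the run, which is usually the first thing worth fixing.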
How does Python's GIL affect multi-threaded ML workloads?
Model Answer
The Global Interpreter Lock (GIL) prevents multiple Python threads from executing Python bytecode simultaneously. Impact on ML: CPU-bound work inside native libraries (many NumPy and tensor operations) is largely unaffected, because those libraries release the GIL while their C/C++ code runs, so threads can overlap there. Pure-Python CPU work (data preprocessing loops) is bottlenecked by the GIL. Solutions: 1) multiprocessing, which uses separate processes with separate GILs and is what PyTorch DataLoader does when num_workers>0, 2) asyncio, which gives concurrent I/O without threads but does not speed up CPU-bound code, 3) Cython/C extensions that release the GIL. PyTorch operations release the GIL, so Python threads are workable for GPU-bound ML training.
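One way to observe the effect is to run the same pure-Python CPU-bound function under a thread pool and a process pool. This is a rough sketch: `count_primes` and `run_with` are hypothetical helpers, and actual timings depend on the machine, core count, and Python build, so none are claimed here.

```python
import time
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """Pure-Python CPU-bound work; it holds the GIL while it runs."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n**0.5) + 1)):
            count += 1
    return count


def run_with(executor_cls: type[Executor], limits: list[int], workers: int = 4) -> list[int]:
    # Same workload, different executor: threads share one GIL, processes do not.
    with executor_cls(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


if __name__ == "__main__":
    limits = [30_000] * 4
    for cls in (ThreadPoolExecutor, ProcessPoolExecutor):
        start = time.perf_counter()
        run_with(cls, limits)
        print(f"{cls.__name__}: {time.perf_counter() - start:.2f}s")
```

On a standard CPython build with several cores, the process pool would be expected to finish this workload faster, since the threads take turns holding the GIL.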
How does Python's ContextVar differ from threading.local, and when do you use it?
Model Answer
threading.local stores values per OS thread. ContextVar stores values per context, and asyncio runs each task in its own copy of the current context. Under asyncio, threading.local breaks because all tasks share the same OS thread and overwrite each other's values, whereas ContextVar keeps them isolated. Use ContextVar for: request-scoped data (request ID, user ID) in FastAPI / aiohttp servers, OpenTelemetry trace propagation, and tenant context in multi-tenant async services. Rule: if your service uses async/await anywhere, replace threading.local with contextvars.
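The per-task isolation can be checked with a few lines of standard-library code. In this sketch, `request_id`, `handle_request`, and `serve` are hypothetical names standing in for a real server's request handling.

```python
import asyncio
from contextvars import ContextVar

# Hypothetical request-scoped variable with a default for code outside any request.
request_id: ContextVar[str] = ContextVar("request_id", default="unset")


async def log_line(message: str) -> str:
    await asyncio.sleep(0)  # yield to the event loop so tasks interleave
    return f"[{request_id.get()}] {message}"


async def handle_request(rid: str) -> str:
    # Each task runs in its own copy of the context, so this set() is task-local.
    request_id.set(rid)
    await asyncio.sleep(0.01)
    return await log_line("done")


async def serve() -> list[str]:
    return await asyncio.gather(*(handle_request(f"req-{i}") for i in range(3)))


if __name__ == "__main__":
    print(asyncio.run(serve()))
```

Even though the three handlers interleave on one thread, each log line carries its own request ID, and the value set inside a task does not leak back to the caller.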
What are Python generators and when are they useful in ML?
Model Answer
Generators are lazy iterators that yield values one at a time instead of storing them all in memory. They are defined with the yield keyword. Use cases in ML: 1) data pipelines that process large datasets in batches without loading everything into memory (yield batch), 2) infinite streams of augmented data, 3) custom data loaders. Example: a `data_loader(paths)` function that loops over `paths` and does `yield preprocess(load(p))` for each one. Benefits: memory use is bounded by one item or batch rather than by dataset size, and I/O can be interleaved with computation. Generators work with Python's for loop and next(). Generator expressions such as `(x**2 for x in range(1000000))` give the same laziness where a list comprehension would build the whole list.
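The batching use case can be sketched as a short generator. `batched` here is a hypothetical helper written for illustration, and the integer items stand in for real samples.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(items: Iterable[int], batch_size: int) -> Iterator[list[int]]:
    """Yield lists of up to batch_size items without materializing the input."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


if __name__ == "__main__":
    for batch in batched(range(7), 3):
        print(batch)
```

Because only one batch exists at a time, the same function works on an input far larger than memory, such as `range(10**9)`.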