DistillPrep

GenAI & LLMs

Curriculum Engine

Knowledge Tracks

Mastery Insight

"Focus on topics where you've failed edge-case questions. MAANG interviewers look for conceptual depth, not speed."

Live Engine
Inference Optimization (easy)

A developer deploys a 7B LLM for an API service. They notice that generating the first token of a response takes 80ms (prefill latency), while each subsequent token takes 12ms (decode latency). Users complain that short responses feel slow despite the total latency being acceptable for long responses. A colleague says: "The first-token latency and per-token latency have completely different bottlenecks — optimizing one doesn't optimize the other." What are the two different bottlenecks, and why does the KV cache matter for one but not the other?
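One way to see the asymmetry the question describes is a toy attention loop. The sketch below is illustrative only (shapes, names, and random data are assumptions, not any real serving stack): prefill processes the whole prompt in one batched, compute-bound pass that *builds* the key/value cache, so the cache cannot speed it up; decode then generates one token at a time, and the cache lets each step attend over stored K/V rows instead of recomputing them for every previous token.

```python
import numpy as np

# Toy single-head attention illustrating why the KV cache matters for
# decode but not prefill. All dimensions and names are illustrative.

d = 8  # head dimension
rng = np.random.default_rng(0)

def attend(q, K, V):
    """Scaled dot-product attention for one query vector."""
    scores = K @ q / np.sqrt(d)        # (n,) one score per cached token
    w = np.exp(scores - scores.max())  # stable softmax
    w /= w.sum()
    return w @ V                       # (d,) weighted sum of values

# --- Prefill: the whole prompt is processed in one batched pass.
# Compute is O(n^2) in prompt length, and nothing is cached yet --
# this pass is what *creates* the KV cache.
prompt_len = 16
K_cache = rng.normal(size=(prompt_len, d))  # keys from the prefill pass
V_cache = rng.normal(size=(prompt_len, d))  # values from the prefill pass

# --- Decode: one token per step. With the cache, each step only
# computes the new token's query and reads cached K/V (memory-bound).
# Without it, every step would recompute K/V for all previous tokens.
generated = []
for step in range(4):
    q = rng.normal(size=d)             # query for the new token
    out = attend(q, K_cache, V_cache)  # reads the cache, no recompute
    generated.append(out)
    # Append this token's K/V so the next step can reuse them.
    K_cache = np.vstack([K_cache, rng.normal(size=(1, d))])
    V_cache = np.vstack([V_cache, rng.normal(size=(1, d))])

print(K_cache.shape)  # → (20, 8): cache grows one row per decoded token
```

The loop makes the colleague's point concrete: prefill latency is dominated by the one big matrix-multiply pass over the prompt, while per-token decode latency is dominated by reading the growing cache each step, so the two are optimized by different techniques.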

Progress: 0% (0 of 350 concepts cleared) · Accuracy: 0% · Solved: 0


Interview Tips

  1. Concepts over memorization.
  2. Identify trade-offs in every solution.