A team builds a RAG pipeline by embedding each PDF document (50–200 pages) as a single vector and retrieving the top-3 most similar documents for each query. Answer quality is poor: the LLM frequently misses specific facts that are present in the retrieved documents. A colleague suggests: "The embedding model is too weak." A RAG engineer diagnoses a different root cause. What is it?
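
For reference, the setup described in the question can be sketched roughly as below. This is a minimal illustration, not the team's actual code: the function names (`embed`, `cosine`, `build_index`, `retrieve`) and the sample documents are hypothetical, and a toy bag-of-words vector stands in for a real embedding model.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): term counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(documents: dict[str, str]) -> dict[str, Counter]:
    """One vector per whole document, as in the question's setup."""
    return {doc_id: embed(text) for doc_id, text in documents.items()}


def retrieve(index: dict[str, Counter], query: str, k: int = 3) -> list[str]:
    """Return the ids of the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda d: cosine(index[d], q), reverse=True)
    return ranked[:k]


# Hypothetical sample data; real inputs would be full 50-200 page PDFs.
documents = {
    "a": "refund policy for orders and returns",
    "b": "shipping times and carriers",
    "c": "employee handbook vacation policy",
    "d": "server maintenance schedule",
}
index = build_index(documents)
top = retrieve(index, "refund policy", k=3)
print(top)
```

In this sketch each retrieved id maps to an entire document, so whatever is passed to the LLM as context is the full text behind that single vector.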