Your production RAG system processes 1 million queries per day. You use LangChain with OpenAI embeddings and GPT-4o. Your monthly bill is $45,000. An engineer proposes reducing costs by 70%. Which combination of optimizations is realistic and risk-appropriate?