A team fine-tunes a pre-trained GPT-2 model on 50,000 customer support transcripts using the same causal LM objective used during pre-training (next-token prediction). After fine-tuning, the model generates fluent customer-support-style text but still fails to follow explicit instructions like "Summarize this ticket in one sentence." A product manager asks: "Didn't fine-tuning teach it to follow instructions?" What is the precise reason the model still cannot follow instructions?
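To make the scenario concrete, here is a minimal sketch of the kind of fine-tuning loop the question describes, assuming the Hugging Face `transformers` library and a hypothetical in-memory list of transcript strings (the real team's data pipeline is not specified). Note that the labels are simply the input ids, so the loss is plain next-token prediction over the transcripts; nothing in this objective treats an instruction differently from any other span of text.

```python
# Sketch of causal-LM fine-tuning on support transcripts (illustrative only).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

# Hypothetical stand-in for the 50,000 transcripts.
transcripts = [
    "Customer: My invoice total looks wrong. Agent: Sorry about that, let me check...",
]

model.train()
for text in transcripts:
    batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
    # labels == input_ids: the model is optimized only to predict each next token,
    # exactly as in pre-training, just on a narrower text distribution.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```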