A team builds a customer service agent and evaluates it using only the final outcome metric: "Did the customer's issue get resolved? (Y/N)." Their success rate is 71%. A researcher says: "End-to-end success rate is necessary but insufficient — you're missing trajectory evaluation." What is trajectory evaluation, and why does it matter even when end-to-end success is achieved?