System Design Practice

Live Engine

Select Topic

hardMonitoring And Observability

A deployed text classification model suddenly shows increased loss across all classes simultaneously. Infrastructure metrics are normal (no latency spike, no error rate). Feature distribution PSI scores are below alert thresholds. The team suspects a silent failure but can't identify the cause. What diagnostic framework systematically identifies the root cause, and what class of monitoring gap does this reveal?

Live Engine

Select Topic

hardMonitoring And Observability

A deployed text classification model suddenly shows increased loss across all classes simultaneously. Infrastructure metrics are normal (no latency spike, no error rate). Feature distribution PSI scores are below alert thresholds. The team suspects a silent failure but can't identify the cause. What diagnostic framework systematically identifies the root cause, and what class of monitoring gap does this reveal?