Live Engine
Select Topic
hardCI/CD for ML
A team's CI/CD pipeline for ML has the following stages: (1) data validation, (2) model training, (3) offline evaluation against a holdout set, (4) model registration if evaluation passes, (5) deployment to production. A critical bug slips through: a feature engineering bug introduces training-serving skew — the preprocessing at training time differs from serving time. All CI gates pass. Why did the CI pipeline fail to catch training-serving skew, and what specific test type closes this gap?
Code
raw_sample = {"x": 100.0, "y": 50.0}
train_features = training_preprocessor.transform(pd.DataFrame([raw_sample]))
serve_features = serving_preprocessor.transform(raw_sample) # or gRPC/REST call
assert train_features == serve_features, \
f"Training-serving skew detected: {train_features} != {serve_features}"