2025-02-14 · validation, tabular, curriculum

Designing Validation Splits for Tabular ML in Production

By Dr. Yuki Tanaka

When teams first move models from notebooks to scheduled jobs, the validation strategy is often an afterthought. A random 80/20 split can look excellent in development while failing quietly once new data arrives with different seasonality or product mix.

In our Machine Learning Foundations Program labs, we ask learners to document three questions before choosing a split: Is the target leaked through future information? Does the business care about ranking or calibration? Will the model be retrained on a fixed cadence?

We teach blocked time splits for ordered data, group splits when multiple rows belong to the same entity, and stratified splits when class balance matters. Each approach gets a one-page decision tree learners can attach to portfolio README files.
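
To make the first two ideas concrete, here is a minimal sketch in plain Python. It is an illustration, not the program's lab code: the function names `blocked_time_split` and `group_split`, their parameters, and the toy data are all hypothetical. The blocked split trains on an expanding window of earlier rows and validates on the block that follows; the group split sends whole entities to one side or the other.

```python
import random
from collections import defaultdict


def blocked_time_split(timestamps, n_folds=3):
    """Yield (train_idx, valid_idx) pairs; validation rows always come later in time.

    Hypothetical helper: expanding training window, equal-sized blocks.
    """
    # Row indices ordered from oldest to newest.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    block = len(order) // (n_folds + 1)
    if block == 0:
        raise ValueError("not enough rows for the requested number of folds")
    for k in range(1, n_folds + 1):
        train = order[: k * block]
        # The last fold absorbs any remainder rows.
        end = (k + 1) * block if k < n_folds else len(order)
        valid = order[k * block : end]
        yield train, valid


def group_split(groups, valid_fraction=0.2, seed=0):
    """Return (train_idx, valid_idx) so that no group appears on both sides.

    Hypothetical helper: whole groups are moved into validation, in a
    seeded random order, until roughly valid_fraction of rows is reached.
    """
    members = defaultdict(list)
    for i, g in enumerate(groups):
        members[g].append(i)
    keys = sorted(members)              # stable starting order
    random.Random(seed).shuffle(keys)   # reproducible shuffle
    target = valid_fraction * len(groups)
    train, valid = [], []
    for g in keys:
        if len(valid) < target:
            valid.extend(members[g])
        else:
            train.extend(members[g])
    return train, valid


# Toy data (made up for illustration): unordered timestamps and customer ids.
timestamps = [5, 1, 4, 2, 3, 8, 7, 6]
customers = ["a", "a", "b", "c", "c", "c", "d", "e"]

for train_idx, valid_idx in blocked_time_split(timestamps, n_folds=3):
    print("time fold:", sorted(train_idx), "->", sorted(valid_idx))

train_idx, valid_idx = group_split(customers, valid_fraction=0.25)
print("group split:", sorted(train_idx), "|", sorted(valid_idx))
```

Because groups differ in size, the group split only approximates the requested validation fraction. In real projects, scikit-learn's `TimeSeriesSplit`, `GroupKFold`, and `StratifiedKFold` cover the same three cases with more options; the sketch above leaves stratification out for brevity.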

The limitation we state openly: no split strategy fixes bad labels. If annotation drift is the root issue, validation metrics computed against those same labels will only tell part of the story. That is why week 8 includes a label audit exercise, not just metric tuning.
