Model Selection Analogy

Aravind Brahmadevara
2 min read · Dec 5, 2022

Trust me! It is gonna be simple.

Scenario: You are trying to select one candidate out of 10 for a job.

You want to test how candidates perform in a variety of scenarios.

Similarly, you want to train and test ML models in different scenarios.

Analogy 1: You can give them a take-home test. A take-home test has many aspects that will test a candidate's ability in multiple scenarios.

Take-home test <=> a large training set and test set in machine learning (large sets are roughly equivalent to the population).
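To make the take-home idea concrete, here is a minimal sketch (my own illustration, assuming scikit-learn and a synthetic dataset, not anything from this article): one large train/test split, fit once, score once.

```python
# Hedged sketch: one big train/test split, the "take-home test".
# The dataset, model, and split sizes are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# A large synthetic dataset stands in for "roughly the population".
X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```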

Analogy 2: You want to test candidates over 4 or 5 rounds of 45 minutes each. Similarly, you want to train and test a model on different small training sets.

Rounds of interviews <=> multiple small training sets (in statistical learning, small sets are roughly equivalent to sample data).
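If it helps, here is a small sketch of the "interview rounds" version (again my own illustration): several small train/test splits instead of one big one, using scikit-learn's ShuffleSplit.

```python
# Hedged sketch: 5 "interview rounds", each a small random train/test split.
# ShuffleSplit and the sample sizes are my own illustrative choices.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import ShuffleSplit

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# Five "rounds", each training on only a small sample of the data.
rounds = ShuffleSplit(n_splits=5, train_size=100, test_size=100, random_state=0)

for i, (train_idx, test_idx) in enumerate(rounds.split(X), start=1):
    model = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    print(f"round {i}: accuracy = {model.score(X[test_idx], y[test_idx]):.3f}")
```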

Analogy 2.1: Against the same set of questions, you want to pick the candidate who performs best.

But the human world is sometimes NOT FAIR :-). Some candidates might get harder questions, and companies are OKAY with losing a good candidate.

Let's come back from the human world to the machine world. Let's be fair.

To be fair to ML models, all of them have to be tested against the same training and test sets. But if you randomly split (train-test) separately for each model, each model does not get the same training set, and the evaluation is not FAIR.
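To see why this is unfair, here is a small sketch (illustrative only; the models and data are assumptions): two models, each evaluated on its own random split, so part of the score difference comes from the split rather than the model.

```python
# Hedged sketch of the UNFAIR setup: each model gets its own random split.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=1000),
    "decision tree": DecisionTreeClassifier(random_state=0),
}

for name, model in models.items():
    # No shared random_state: every model sees a different train/test split,
    # like candidates getting different interview questions.
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3)
    print(name, "accuracy on its own split:",
          model.fit(X_tr, y_tr).score(X_te, y_te))
```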

Here comes cross-validation. All models get the same training and test sets across k folds. (I have overly simplified CV just for explanation's sake :-). Please understand.)
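Here is roughly what the fair version looks like in code (a simplified sketch with scikit-learn; the models and dataset are just placeholders): one fixed k-fold split, reused for every model being compared.

```python
# Hedged sketch of the FAIR setup: all models are scored on the same k folds.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

# One fixed fold assignment, shared by every model under comparison.
cv = KFold(n_splits=5, shuffle=True, random_state=42)

for name, model in [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("decision tree", DecisionTreeClassifier(random_state=0)),
]:
    scores = cross_val_score(model, X, y, cv=cv)
    print(f"{name}: mean accuracy = {scores.mean():.3f} over the same 5 folds")
```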

Caution: There might still be randomness inherent in a model. Example: weight initialization is random, and convergence is not always at the same point.
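For example, here is a tiny sketch of that caution (an illustration, not a benchmark; the network size and seeds are my own choices): the same data and the same split, but different random weight initializations in a small neural network can give slightly different scores.

```python
# Hedged sketch: model-inherent randomness via random weight initialization.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for seed in (0, 1, 2):
    # random_state controls the random weight initialization.
    mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=seed)
    mlp.fit(X_tr, y_tr)
    print(f"seed {seed}: accuracy = {mlp.score(X_te, y_te):.3f}")
```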

There is still more to come from me.

You can reach me at

Aravind Brahmadevara | LinkedIn

aravind-deva (aravind brahmadevara) (github.com)
