Which teams have the best-balanced lineups this World Cup? Let's see if machine learning can help us find out
May 20
At the beginning of the month, I did a piece for the Hindustan Times on the six kinds of batsmen in the IPL (the piece is not online so I’ve pasted a screenshot at the bottom of this email). Essentially I looked at the distribution of ball-wise outcomes (dots, singles, twos, fours, sixes and outs) for all batsmen who’ve faced at least 750 balls across all the editions of the IPL, and threw the data into a K-means clustering algorithm.
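For readers who want to see what a “distribution of ball-wise outcomes” looks like in practice, here is a minimal sketch. It is not the code behind the piece: the function name, the outcome labels and the sample data are all made up for illustration, and I am assuming that any ball not in the six listed categories (threes, say) is simply left out.

```python
from collections import Counter

# The six outcome categories named in the text; the ordering is arbitrary.
OUTCOMES = ("dot", "single", "two", "four", "six", "out")

def outcome_distribution(balls):
    """Return the share of each outcome among the balls a batsman faced.

    Assumption: balls with any other label are ignored.
    """
    counts = Counter(b for b in balls if b in OUTCOMES)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no classifiable balls")
    return [counts[o] / total for o in OUTCOMES]

# Made-up ball-by-ball record for one hypothetical batsman.
sample = ["dot", "single", "single", "four", "dot",
          "two", "six", "single", "out", "dot"]
profile = outcome_distribution(sample)
print(dict(zip(OUTCOMES, profile)))
```

Each batsman then becomes one six-number vector that sums to 1, and those vectors are what go into the clustering.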
I got meaningful results when I set the number of clusters at 6, and thus we got “hard-hitting openers”, “steady openers”, “classicists”, “collectors”, “finishers” and “man-beasts”. Since we are in World Cup season now, I won’t delve much more into the IPL, but while performing this clustering I came across an interesting quirk - pretty much all of the Indian one-day international top order was classified by the algorithm as “collectors”.
Virat Kohli, Mahendra Singh Dhoni, Rohit Sharma, Suresh Raina, Dinesh Karthik, KL Rahul, Kedar Jadhav and Ambati Rayudu are all collectors if you go by their IPL batting. I had mentioned in the copy (which later got edited out) that it is worrisome that the Indian batting lineup ahead of the World Cup has a sort of sameness to it.
Fortunately, while they all bat the same way in T20 cricket, they are all different kinds of beasts when it comes to One Day Internationals.
The clustering process remains the same. We only consider “current batsmen” (those who have batted at least once in an ODI since the beginning of 2018; this strange definition is there to let David Warner and Steve Smith into the dataset), and among them we look at batsmen who have faced at least 1000 balls in the last 5 years (that is, since the beginning of June 2014).
To evaluate the “current versions” of the batsmen, we only look at their statistics from the same time period (since June 2014). So, for example, MS Dhoni is more likely to be classified as a “collector” than as the “hitter” he once was. Also, since this period has seen several games involving teams that haven’t qualified for the World Cup, we restrict our analysis to games in which both teams will be playing in the World Cup.
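The filters described above can be sketched roughly as follows. This is only an illustration: the tuple layout, the function name and the toy data are hypothetical, and I am assuming that “current” is judged on any ODI since 2018 while the ball count only includes games between two World Cup teams since June 2014.

```python
from collections import Counter
from datetime import date

# The ten teams playing in the 2019 World Cup.
WORLD_CUP_TEAMS = {
    "Afghanistan", "Australia", "Bangladesh", "England", "India",
    "New Zealand", "Pakistan", "South Africa", "Sri Lanka", "West Indies",
}
WINDOW_START = date(2014, 6, 1)   # start of the five-year window
CURRENT_SINCE = date(2018, 1, 1)  # cut-off for being a "current" batsman

def eligible_batsmen(balls, min_balls=1000):
    """balls: iterable of (batsman, match_date, batting_team, bowling_team),
    one tuple per ball faced (a hypothetical layout)."""
    faced = Counter()
    current = set()
    for batsman, match_date, batting_team, bowling_team in balls:
        if match_date >= CURRENT_SINCE:
            current.add(batsman)  # batted in some ODI since 2018
        both_in_cup = (batting_team in WORLD_CUP_TEAMS
                       and bowling_team in WORLD_CUP_TEAMS)
        if match_date >= WINDOW_START and both_in_cup:
            faced[batsman] += 1   # counts towards the ball threshold
    return {b for b, n in faced.items() if n >= min_balls and b in current}

# Made-up data, with a tiny threshold so the example stays readable.
toy_balls = (
    [("Batsman A", date(2017, 9, 17), "India", "Australia")] * 3
    + [("Batsman A", date(2018, 7, 12), "India", "England")]
    + [("Batsman B", date(2016, 6, 11), "India", "Zimbabwe")] * 5
    + [("Batsman B", date(2018, 7, 12), "India", "England")]
    + [("Batsman C", date(2015, 3, 29), "Australia", "New Zealand")] * 4
    + [("Batsman D", date(2013, 6, 8), "England", "Australia")] * 4
    + [("Batsman D", date(2019, 5, 11), "England", "Pakistan")]
)
result = eligible_batsmen(toy_balls, min_balls=3)
print(result)
```

In the toy data only Batsman A survives: B's balls are mostly against a non-World Cup team, C has not batted since 2018, and D's balls are mostly from before the window.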
This restriction (played at least 1000 balls against another World Cup team since June 2014) means that we’re left with a field of 61 batsmen, and six clusters do the job once again (this is the tricky thing with K-means clustering - you need to pre-specify how many clusters you want). In fact, the clusters come close enough to what we found for the T20 batsmen that I’m inclined to borrow some of the nomenclature that my HT editor Rudraneil Sengupta came up with.
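To make the “pre-specify the number of clusters” point concrete, here is a bare-bones K-means (Lloyd's algorithm) in plain Python. It is a sketch, not the implementation used for this analysis: the function names, the seed and the toy three-number profiles (dot share, running share, boundary share) are invented, and I use k = 2 on six made-up batsmen rather than k = 6 on the real 61.

```python
import random

def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))

def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's algorithm; k has to be chosen up front."""
    rng = random.Random(seed)
    # Start from k distinct data points picked at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its old centroid
                centroids[i] = [sum(col) / len(members)
                                for col in zip(*members)]
    return centroids, labels

# Made-up profiles: three "blockers" and three "boundary hitters".
toy_profiles = [
    [0.60, 0.35, 0.05], [0.62, 0.33, 0.05], [0.58, 0.36, 0.06],
    [0.35, 0.40, 0.25], [0.33, 0.41, 0.26], [0.36, 0.38, 0.26],
]
centroids, labels = kmeans(toy_profiles, k=2)
print(labels)
```

The algorithm will happily return whatever number of clusters you ask for, which is why the choice of six is a judgment call made by looking at whether the resulting groups make cricketing sense.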
There are two kinds of openers again - “Pinch Hitters” who score rapidly with lots of boundaries, and “Steady Openers” who play more dot balls but also play a longer innings. There is a class of “Hitters” (though not hard enough hitters to be titled “man beasts”).
And then there are three kinds of middle order batsmen. “Accumulators” score at a fast clip by playing few dot balls and generally being busy, while both “Classicists” and “Rotators” focus on holding one end up. Classicists are more likely to score through boundaries while Rotators keep the scoreboard ticking through singles. Indeed, an argument can be made that these two clusters (Classicists and Rotators) could be merged.
So what does India’s batting order look like at the World Cup? To get a complete picture we need to classify the remaining batsmen as well (those who haven’t faced 1000 balls against other World Cup-qualified nations in the last 5 years). This we do through “out of sample” application of the K-means classifier.
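One common way to apply K-means “out of sample” is to leave the fitted cluster centres untouched and assign each new batsman to whichever centre is nearest. I am assuming that is roughly what happens here; the sketch below uses invented centres, neutral cluster names and a toy three-number profile (dot share, running share, boundary share), none of which come from the actual analysis.

```python
def assign_cluster(profile, centroids):
    """Return the name of the centroid closest (Euclidean) to the profile.

    centroids: dict mapping a cluster name to its centre vector.
    """
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(centroids,
               key=lambda name: squared_distance(profile, centroids[name]))

# Invented centres, standing in for the ones fitted on the main sample.
fitted_centroids = {
    "cluster_0": [0.60, 0.35, 0.05],
    "cluster_1": [0.35, 0.40, 0.25],
}

# A hypothetical batsman with too few balls to be part of the fit.
newcomer = [0.57, 0.36, 0.07]
print(assign_cluster(newcomer, fitted_centroids))
```

The fewer balls a batsman has faced, the noisier his profile, so such assignments deserve less confidence than those for batsmen in the main sample.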
India will open with one Pinch Hitter (Shikhar Dhawan) and one Classicist (Rohit Sharma). Sharma’s classification as a middle order batsman is interesting, though it might be explained by his proclivity for either playing extra-long innings or getting out early.
Accumulator Virat Kohli will lead the middle order. Dhoni at 5 is a Classicist and Jadhav at 6 is a Hitter. Hardik Pandya at 7, interestingly, has been classified as an Accumulator. So what does that leave for number 4?
KL Rahul is a Rotator, a skill India currently doesn’t possess. Vijay Shankar, based on limited data, is an Accumulator, while Dinesh Karthik is a Classicist. Since we already have two other Classicists in the eleven (Sharma and Dhoni), Karthik is possibly not the best choice for the position. It will be interesting to see if the team goes with the Rotator or with another Accumulator.
What are the other teams doing? I don’t know yet what the likely lineups will be, but based on my guess here’s what some of the other teams will look like.
Going by these guessed lineups, I like the England line-up the best. They start with two pinch hitters and end with two hitters. South Africa also seems to have a pretty good strategy - again with two pinch hitters up top and then a middle order of rotators. Pakistan will again rely on Fakhar Zaman for the momentum. West Indies is all “bang bang control” - I expect them to be either spectacularly good or spectacularly bad in individual matches. The tournament format, with two knockout rounds, might suit them.
What do you think about this classification? Does the clustering make sense? And which team do you think has the best balanced lineup to do well in the World Cup? Remember that it is going to be a run-fest!
If you liked this, please consider subscribing. And share it with whoever you think will like it. Also, write back with your comments and suggestions on the newsletter, and today’s edition.