Advanced Machine Learning Techniques for
Optimizing Sports Team Composition: A
Comprehensive Predictive Analytics Framework
Submied for Business Analycs Research Project to Aston University.
Submied in September 2024
By
Niranjan Gopalan,
Aston Business School, Aston University.
Master of Science in Business Analytics
Declaraon
I declare that I have personally prepared this research report tled "Comprehensive Analysis of
Indian Premier League and Projecng the Opmal Squad for the 2025 Season." This work has not
been submied for any other degree or qualificaon, nor has it appeared in any previously published
document. The research described here is my own, conducted personally unless otherwise stated. All
sources of informaon are duly acknowledged through references. This study contributes original
insights to cricket analycs, parcularly in IPL team management and player selecon strategies.
Acknowledgement
I would like to express my sincere gratitude to the sports performance analysts and research engineers
worldwide whose research and insights have significantly informed and enriched my understanding
of sports analytics. I am deeply thankful to my mother, Latha, for her unwavering encouragement and
belief in my abilities, and to my father, Gopalan, whose dedication to public welfare and mathematics
literacy has been a constant source of inspiration. I also extend my heartfelt appreciation to my
supervisor, Dr. Rizwan Ahmed, for his guidance and support throughout this research project. Lastly, I
would like to thank all the individuals who supported me during my research, as their contributions
have been essential to the completion of this work.
Table of Contents
Abstract
List of Figures
List of Tables
1. Introduction
1.1 Background
1.2 Team performances over the years
1.3 Research Objective and Need for Study:
1.4 Scopes
1.5 Limitations of the study:
2. Significance of the Study
2.1 Structure of the research
3. Literature Review
3.1 Adoption of Machine Learning on Sports
3.2 Adoption of Machine Learning on Cricket
3.3 Summary of Literature Review:
4. Research Methodology
4.1 Dataset and Approach Overview:
4.2 Data Processing
4.2.1 Data Filtering
4.2.2 Player Data Extraction
4.3 Batting Statistics Calculation
4.4 Bowling Statistics Calculation
4.5 Domain Knowledge
5. Quantitative and Predictive analysis
5.1 Win Ratio Analysis of Teams
5.1.1 Need for Analysis
5.1.2 Objective
5.1.3 Data overview
5.1.4 Quantitative Analysis
5.1.5 Correlation Analysis:
5.1.6 Linear Regression Model:
5.1.7 Prediction Analysis
5.1.8 Prediction Findings:
5.2 Rule-based scoring system combined with normalisation and weighted aggregation
5.2.1 Need for Analysis
5.2.2 Objective
5.2.3 Data Overview
5.2.4 Quantitative Analysis
5.3 Random Forest model to predict the overall score
5.3.1 Need for Analysis:
5.3.2 Objective of the Model
5.3.3 Data Overview
5.3.4 Random Forest Regression Model
5.3.5 Model Visualisation
5.4 Random Forest Model using RandomizedSearchCV
5.4.1 Objective of the model
5.4.2 Representation of Random Forest model with RandomizedSearchCV
5.4.3 Evaluation
5.4.4 Model Visualisation
5.5 XG Boosting Method
5.5.1 Objective of this model
5.5.2 Representation of the Model
5.5.3 Model Visualisation
5.6 Enhanced XG Boosting model
5.6.1 Representation of the model
5.7 Support Vector Regression Model
5.7.1 Objective of the model
5.7.2 Representation of the model
5.7.3 Model Visualisation
5.7.3 Distribution of Prediction errors
5.8 Machine Learning Models and their accuracy results
5.8.1 Evaluation
5.8.2 Fine-Tuning
5.8.3 Model Testing
5.8.4 Random Forest Model 1 Prediction
5.8.5 XG Boost Model 1 Prediction
5.8.6 Performance Distribution Curves
5.8.7 ROC curves
6. Players Overall Performance score for KKR and DC
6.1 Kolkata Knight Riders Current Players Analysis
6.2 Delhi Capitals Current Players Analysis
7. Conclusion
7.1 Squad Optimization
7.2 KKR Squad Optimization and picking best squad
7.2.1 Current players Overall score Prediction
7.2.2 Potential Squad Options for KKR
7.3 Delhi Capitals Squad Optimization and picking best squad
7.3.1 Current players Overall score Prediction
7.3.2 Potential Squad Formation for Delhi Capitals
8. Findings and Insights of Players and their performance scores
8.1 Distribution of Overall Scores by Player Type
8.2 Players with more than 300 runs with strike rate more than 130
8.3 Top All-rounders analysis
8.4 Top Economical Bowlers Analysis
8.5 Density distribution of overall scores:
8.6 Performance metrics of All-rounders
8.7 Research Conclusion
9. Recommendations
10. References
Appendices
Appendix 1 – About IPL Teams
Appendix 2 – Team Performance
Appendix 3 – Reason for using Linear Regression
Appendix 4 – Reason for using Rule Based Scoring System
Appendix 4 – Dataset variables
Appendix 5 – Potential squad Options for KKR
Appendix 6 – Potential squad Options for DC
Abstract
This research project focuses on optimizing squad compositions for the Kolkata Knight Riders (KKR)
and Delhi Capitals (DC) in preparation for the 2025 Indian Premier League (IPL) mega auction. The
study employs advanced cricket analytics and machine learning models to provide strategic insights
for team building and performance enhancement. Since its inception in 2008, the IPL has
revolutionized cricket, becoming one of the most popular and lucrative sports leagues globally,
featuring ten franchise teams competing in a high-stakes, fast-paced T20 format. The research
methodology involves comprehensive data processing, including extraction from reliable sources,
cleaning, and preprocessing. Various machine learning models, such as linear regression, random
forest, XG boosting, and support vector regression, are utilized to analyse player performance and
predict outcomes. Key analyses include Win Ratio Analysis of Teams, a rule-based scoring system
combining normalisation and weighted aggregation, and Random Forest Models optimized using
RandomizedSearchCV. The study evaluates these models using performance metrics like ROC curves
and performance distribution curves to ensure robust and accurate predictions. By analysing KKR's
championship-winning strategies in 2024 and DC's approach to team building, the research provides
a comparative analysis of different management philosophies and their impact on team
performance.
The study's significance extends beyond the IPL, offering valuable insights for other T20 leagues like
The Hundred and Big Bash League (BBL), as well as potential applications in sports like football and
baseball. Key findings highlight the importance of strategic player retention, the influence of external
factors on performance, and the challenges of predicting player auction values. For KKR and DC
specifically, the research offers analysis of current player performances, identification of key
strengths and weaknesses, and recommendations for optimizing squad potential, with a particular
focus on helping DC better utilize their young talent. The research contributes to the growing field of
quantitative sports analytics, demonstrating the importance of data-driven decision-making in
modern sports management. It provides a framework for improving player selection, strategy
formulation, and overall team management across various sports disciplines. Limitations of the study
include the challenges of evaluating new players with limited IPL data, accurately predicting auction
values, and accounting for unforeseen circumstances. In conclusion, this research project offers a
comprehensive analysis of cricket analytics, emphasizing the importance of data-driven strategies in
sports management. By focusing on the squad optimization of KKR and DC, the study provides
valuable insights that can be applied to other cricket tournaments and sports, underscoring the
potential of analytics to revolutionize team management and performance optimization in the
competitive world of sports.
Keywords: Indian Premier League, Kolkata Knight Riders, Delhi Capitals, Squad Optimization, Player
Performance, Machine Learning, Data Analysis, Predictive Modelling, Win Ratio, XG Boosting,
Random Forest, Batting Statistics, Bowling Statistics, Performance Metrics, Auction Strategies, Team
Management, Cricket Analytics, Statistical Methods, Player Selection, Team Performance
Word count: Around 11,200 words
List of Figures
Figure 1 - IPL Logo
Figure 2 - Correlation graph for Linear Reg Model
Figure 3 - Prediction Graph
Figure 4 - Correlation graph for Random Forest Model
Figure 5 - Actual vs Predicted Graph for RF 1
Figure 6 - Residual graph for RF 1
Figure 7 - Prediction error histogram for RF 1
Figure 8 - Actual vs predicted graph for RF 2
Figure 9 - Predicted error histogram for RF 2
Figure 10 - Actual vs predicted graph for XGBoost model 1
Figure 11 - Prediction error histogram for XGBoost model 1
Figure 12 - Actual vs predicted graph for SVR
Figure 13 - Prediction error histogram for SVR
Figure 14 - Performance distribution curve for RF 1 and XGBoost
Figure 15 - ROC Curve graph
Figure 16 - Bar chart for KKR current players
Figure 17 - Bar chart for DC current players
Figure 18 - Bar chart for squad options KKR
Figure 19 - Bar chart for squad options DC
Figure 20 - Average overall score chart
Figure 21 - Distribution of overall score by player type
Figure 22 - Scatter plot for Batsman
Figure 23 - Scatter plot for top all-rounders
Figure 24 - Scatter plot for top economical bowlers
Figure 25 - Density distribution by player types
Figure 26 - Performance metrics of top 5 all-rounders
Figure 27 - Chennai Super Kings Logo
Figure 28 - Delhi Capitals Logo
Figure 29 - Gujarat Titans Logo
Figure 30 - Kolkata Knight Riders Logo
Figure 31 - Lucknow Super Giants Logo
Figure 32 - Mumbai Indians Logo
Figure 33 - Punjab Kings Logo
Figure 34 - Rajasthan Royals Logo
Figure 35 - Royal Challengers Bengaluru logo
Figure 36 - Sunrisers Hyderabad
List of Tables
Table 1 - Team performance table
Table 2 - Dataset Columns
Table 3 - Data for team’s performance overview
Table 4 - Win Ratio of all teams
Table 5 - Team prediction with difference
Table 6 - Overall score table
Table 7 - All models evaluation metrics
Table 8 - Evaluation metrics after fine-tuning
Table 9 - Sample data for model testing
Table 10 - Random Forest Model 1 prediction results
Table 11 - XG Boost Model 1 Prediction results
Table 12 - KKR current players predicted overall score
Table 13 - DC current players predicted overall scores
1. Introducon
1.1 Background
The Indian Premier League (IPL) has revoluonized cricket since its incepon in 2008, becoming one
of the most popular and lucrave sports leagues globally (Board of Control for Cricket in India, 2023).
This professional Twenty20 cricket tournament features ten franchise teams represenng different
Indian cies or states, compeng in a high-stakes, fast-paced format that has captured the
imaginaon of fans worldwide.
Figure 1- IPL Logo
Source: iplt20
The IPL's success can be attributed to several factors. Firstly, its star-studded lineups attract top
cricket talent from around the world. Each team can field up to four overseas players in their playing
eleven, creating a melting pot of international stars alongside India's best cricketers (ESPN Cricinfo,
2023). This combination of global and local talent has helped the IPL become a cricketing spectacle
that consistently ranks among the top sports leagues in terms of average attendance.
The tournament's economic impact has been substantial. In 2022, the league's brand value was
estimated at ₹90,038 crore (US$11 billion) (Duff & Phelps, 2022). Its contribution to India's GDP is
significant, with the 2015 season alone adding ₹1,150 crore (US$140 million) to the economy (BCCI,
2016). The league's valuation has skyrocketed, reaching US$10.9 billion in December 2022 and
achieving "decacorn" status (Economic Times, 2023).
The IPL's popularity is reflected in its lucrative media rights deals. For the 2023-2026 seasons, the
league sold its media rights for US$6.4 billion, valuing each match at $13.4 million (Sportstar, 2023).
The tournament has also broken viewership records, with the 2023 final becoming the most
streamed live event on the internet, attracting 32 million viewers (JioCinema, 2023).
The Indian Premier League has transformed cricket from a traditional sport into a global
entertainment spectacle. Its blend of star power, economic impact, and innovative gameplay has
cemented its position as a powerhouse in the world of sports, influencing the way cricket is played
and consumed around the globe (see Appendix 1).
Each team can have a maximum of 25 players in their squad, with no more than eight overseas
players. The playing eleven for each match can include up to four overseas players, ensuring a
balance of international stars and domestic talent (IPL Governing Council, 2024).
The IPL's team structure, with its mix of international stars and Indian talent, creates a unique and
exciting cricketing spectacle that has captured the imagination of fans worldwide (Shah, 2023).
1.2 Team performances over the years
Table 1 - Team performance table
Team Name  Played  Won  Lost  N/R  Titles  Finalists  Playoff
MI         261     144  117   0    5       6          11
RCB        256     123  129   4    0       3          9
KKR        252     131  120   1    3       4          7
DC         252     115  135   2    0       1          6
PK         246     112  134   0    0       1          2
CSK        239     138  99    2    5       10         13
RR         222     112  107   3    1       2          5
SRH        182     88   94    0    1       3          6
GT         45      28   17    0    1       2          2
LSG        44      24   19    1    0       0          2
This table provides a comprehensive overview of the performance of Indian Premier League (IPL)
teams since the league's inception in 2008. A team-by-team breakdown and analysis of the data
is given in Appendix 2.
This table highlights the varying degrees of success and consistency among IPL teams. While some
teams like MI and CSK have dominated with multiple titles and consistent playoff appearances,
others like RCB and PK have struggled to convert their opportunities into championships. The newer
teams, GT and LSG, have shown promise in their short IPL careers, adding excitement to the league's
competitive landscape (Shah, 2023).
1.3 Research Objecve and Need for Study:
A major focus of this study is the upcoming mega aucon for the 2025 IPL season. This aucon will
result in most players being released, with teams allowed to retain only four players, including a
maximum of two foreign players. This significant event provides an opportunity to analyse and
opmize squad-building strategies for Kolkata Knight Riders (KKR) and Delhi Capitals (DC).
Among the 10 teams in the Indian Premier League (IPL), this study aims to analyse, predict, and
optimize the squads for two specific teams: Kolkata Knight Riders (KKR) and Delhi Capitals (DC). KKR,
the 2024 IPL champions, boasts one of the strongest squads in the league. In contrast, DC possesses
a power-packed young squad but has struggled to utilize their potential effectively. The research will
focus on the following aspects:
1. Squad Analysis: Examine the composition of both KKR and DC squads, identifying key
strengths and weaknesses based on their performances.
2. Quantitative analysis: Use statistical methods to compare player performances, team
strategies, and match outcomes for both KKR and DC.
3. Performance Prediction: Develop models to forecast player and team performance based on
historical data and current squad dynamics.
4. Squad Optimization: Propose strategies for both teams to maximize their squad potential,
with a particular emphasis on helping DC better utilize their young talent.
5. Evaluate the current squads of KKR and DC to identify potential retention candidates.
6. Analyse the impact of retaining only four players on team dynamics and performance.
7. Success Factors: Investigate the elements that contributed to KKR's championship win in
2024, including team balance, leadership, and player utilization.
By conducting this research, the aim is to provide valuable insights into effective squad building,
talent utilization, and performance optimization in the highly competitive environment of the IPL.
The findings could offer strategic guidance not only for KKR and DC but also for other T20 cricket
franchises globally.
1.4 Scopes
1. Player performance predicon: Develop models to predict player performance based on
historical data, considering factors like bang and bowling stascs.
2. Team efficiency analysis: Evaluate the efficiency of teams using techniques like Data
Envelopment Analysis (DEA) and Structural Equaon Modeling (SEM).
3. Strategic player retenon: Analyse strategies for the upcoming mega aucon, focusing on
opmal player retenon decisions for KKR and DC.
4. Impact of external factors: Examine how factors like weather, match locaon, and stadium
condions affect player and team performance.
5. Comparave analysis: Compare KKR and DC's squad building and ulizaon strategies with
other successful IPL teams.
1.5 Limitaons of the study:
1. Limited data for new players: Relying solely on IPL data may limit the evaluaon of new or
emerging players who haven't played in the league before.
2. Complexity of player valuaon: Accurately predicng player aucon values and performance
can be challenging due to mulple influencing factors.
3. Changing league dynamics: The study's findings may be affected by evolving league rules,
team strategies, and player availability.
4. External factors: Unforeseen circumstances like injuries, player form, or off-field issues can
impact team performance and are difficult to account for in models.
5. Limited scope: Focusing on only two teams (KKR and DC) may limit the generalizability of
findings to other IPL teams or T20 leagues.
6. Time constraints: The dynamic nature of T20 cricket and frequent player transfers may make
long-term predicons challenging.
7. Mul-objecve opmizaon: It is difficult to formulate team selecon as a mul-objecve
opmizaon problem, while considering budget constraints.
These scopes and limitaons can help, frame the research objecves and methodology for analysing
and opmizing the squads of KKR and Delhi Capitals in the context of the upcoming IPL mega
aucon.
2. Significance of the Study
1. Performance Opmizaon: By analysing the factors contribung to KKR's success and DC's
underperformance, the study can offer valuable insights into how teams can beer ulize
their player resources, especially young talent (Ishi et al., 2022).
2. Quantave Sports Analycs: The research contributes to the growing field of quantave
sports analycs in cricket, which has become increasingly important for team management
and strategy development (Jana et al., 2021).
3. Player Improvement: The insights gained from this study can help improve individual player
performance by idenfying key areas for development based on data-driven analysis.
(Techiexpert, 2024).
4. Global Cricket Applicaons: The findings could be valuable not only for IPL teams but also for
naonal cricket boards such as the England and Wales Cricket Board (ECB), Cricket Australia,
and New Zealand Cricket, helping them in player selecon and team strategy for
internaonal compeons (Kalgotra et al., 2014).
5. Predicve Modelling: This research will contribute to the development of more accurate
predicve models for player and team performance in T20 cricket, drawing inspiraon from
advanced analycs techniques used in football. These models can leverage machine learning
algorithms and big data analysis, like those employed in predicng football match outcomes
and player performance (Hubáček et al., 2019; Berrar et al., 2019). Such approaches can be
valuable for team management, fantasy cricket enthusiasts, and sports analycs
professionals, potenally improving decision-making processes in player selecon and
strategy formulaon.
6. Comparave Analysis: By conducng an in-depth comparison of Kolkata Knight Riders'
championship-winning strategies in 2024 and Delhi Capitals' approach to team building, this
study will provide valuable insights into the efficacy of different management philosophies
and their impact on team performance in the IPL (ESPNcricinfo, 2024). This analysis will
highlight how KKR's meculously designed squad, enabling aggressive bang without
compromising depth, contrasts with DC's focus on nurturing young talent, offering a
comprehensive perspecve on successful team construcon in T20 cricket.
2.1 Structure of the research
The goal of this study is to predict the opmal squad composion for KKR and DC in the upcoming
Indian Premier League (IPL) season. The research methodology encompasses several key steps:
12
1. Data Extracon: Gathering comprehensive player stascs and performance data from
various reliable sources.
2. Data Cleaning: Preprocessing the collected data to ensure accuracy, consistency, and
relevance for analysis.
3. Descripve Analycs: Conducng a thorough exploratory data analysis to understand the
underlying paerns and trends in player performances.
4. Data Visualisaon: Creang insighul visual representaons of the data to facilitate easier
interpretaon and idenficaon of key insights.
5. Feature Engineering: Developing new variables or transforming exisng ones to enhance the
predicve power of the models.
6. Player Score Predicon: Ulizing advanced stascal and machine learning techniques to
forecast individual player performances based on historical data and relevant factors.
7. Hyperparameter Tuning: Opmizing the predicve models through rigorous hyperparameter
adjustment to improve accuracy and reliability.
8. Model Performance Tesng: Evaluang the performance of the tuned models using metrics
and techniques such as cross-validaon, ROC curves, and performance distribuon curves to
ensure robustness and accuracy.
9. Squad Opmizaon: Employing the refined predicve models to determine the most
effecve squad composions for KKR and DC, considering various constraints.
10. Stakeholder Recommendaons: Formulang data-driven, aconable recommendaons for
team management, coaches, and other relevant stakeholders to inform their decision-
making processes in player selecon and team strategy.
This comprehensive approach aims to leverage advanced analycs to provide valuable insights and
strategic advantages in the highly compeve landscape of the IPL.
3. Literature Review
The Indian Premier League (IPL) has not only revoluonized cricket but has also become a ferle
ground for sports analycs since its incepon in 2008. As the league has grown in stature, so has the
sophiscaon of the analycal approaches used to understand and predict its dynamics.
The Economic Catalyst
Kadapa (2013) highlighted the IPL's massive economic footprint, underscoring the financial
imperative driving the adoption of advanced analytics. With billions at stake, teams and stakeholders
are increasingly turning to data-driven approaches to gain a competitive edge.
Evolution of Analytical Approaches
The journey of IPL analytics has been one of continuous refinement. Shah et al. (2016) laid important
groundwork with their comprehensive analysis of IPL data from 2008 to 2015. Their work
demonstrated the potential of machine learning in decoding the complexities of T20 cricket, setting
the stage for more advanced studies. Building on this foundation, Prakash et al. (2019) developed a
nuanced player ranking system using machine learning algorithms. Their model's success in
predicting player rankings with high accuracy showcased the power of analytics in informing team
selection strategies.
The Human Element in Data
While numbers are at the heart of analycs, recent research has emphasized the importance of
translang data into aconable insights. (Ishi et al,2022) took a significant step in this direcon by
using machine learning for player classificaon. Their work helps bridge the gap between raw data
and on-field strategy, providing coaches and managers with a more intuive understanding of player
capabilies.
Predicve Power and Its Limitaons
The holy grail of sports analycs is accurate predicon, and IPL research has made significant strides
in this area. (Amala Kaviya et al,2020) achieved an impressive 81% accuracy in predicng match
outcomes. However, as any cricket fan knows, the game's unpredictability is part of its charm. These
models, while powerful, serve as tools to inform decision-making rather than crystal balls.
Transparency in Analycs
Recognizing the need for interpretable results, (Bajaj,2023) explored the use of Explainable AI
techniques. This approach not only predicts performance but also elucidates the factors influencing
these predicons, making the insights more accessible and aconable for non-technical stakeholders.
Visualising Success
In the fast-paced world of T20 cricket, the ability to quickly grasp complex informaon is crucial.
(Rodrigues et al,2019) addressed this need by focusing on data visualisaon techniques. Their work
highlights how effecve visual representaon can transform raw data into strategic insights,
accessible to everyone from analysts to players.
The Road Ahead
1. Real-me analycs during matches could revoluonize in-game decision-making.
14
2. Integraon of non-tradional data sources, such as social media senment and player
biometrics, may provide a more holisc view of performance.
3. More sophiscated player valuaon models could transform aucon strategies.
4. The applicaon of deep learning to video analysis promises to unlock new insights into player
techniques and strategies.
The field of IPL analycs is not just about numbers; it's about enhancing the beauful game of
cricket. As analycs connue to evolve, they promise to enrich our understanding and enjoyment of
the sport, providing fans, players, and managers alike with new perspecves on the game we love.
3.1 Adopon of Machine Learning on Sports
The adopon of Machine Learning (ML) in sports has seen significant growth in recent years,
revoluonizing various aspects of athlec performance, strategy, and management.
Performance Analysis and Predicon:
ML has been extensively applied to analyse and predict athlec performance. (Ofoghi et al, 2013)
demonstrated the use of ML algorithms to predict medal-winning performances in sprint kayaking,
achieving an accuracy of 80%. Similarly, (Bunker and Thabtah, 2019) reviewed ML applicaons in
predicng outcomes of various sports, finding that ensemble methods oen outperform individual
algorithms in accuracy.
Injury Predicon and Prevenon:
A crical area where ML has shown promise is in injury predicon and prevenon. (Rossi et al, 2018)
developed a ML model to predict injuries in soccer players, achieving an accuracy of 80% in
idenfying high-risk athletes. Building on this, (Rommers et al, 2020) used ML techniques to predict
injuries in youth soccer players, demonstrang the potenal of these methods in protecng young
athletes.
Taccal Analysis:
ML has transformed taccal analysis in team sports. (Memmert and Raabe, 2018) explored how ML
algorithms can analyse complex paerns in soccer matches, providing coaches with insights that
were previously unaainable through tradional methods. In basketball, (Cervone et al,2016) used
ML to evaluate decision-making in real-me, offering a new perspecve on player eecveness
beyond tradional stascs.
Player Recruitment and Scoung:
The applicaon of ML in talent idenficaon and recruitment has gained tracon. (McHale et
al,2012) developed a ML model to assess player performance in soccer, which has implicaons for
scoung and transfer decisions. More recently, (Liu et al, 2020) used deep learning techniques to
analyse player movements in basketball, providing a data-driven approach to talent evaluaon.
Fan Engagement and Business Operaons:
ML has also found applicaons in enhancing fan engagement and opmizing business operaons in
sports. (Fried and Mumcu, 2016) explored how ML can be used to personalize fan experiences and
improve markeng strategies in professional sports. In cket pricing, (Kemper and Breuer, 2016)
15
demonstrated how ML algorithms can opmize dynamic pricing strategies, potenally increasing
revenue for sports organizaons.
Challenges and Ethical Consideraons:
Despite its potenal, the adopon of ML in sports faces several challenges. (Caya and Bourdon, 2016)
highlighted issues of data quality and interpretaon in sports analycs, emphasizing the need for
domain experse in developing ML models. Ethical consideraons have also come to the forefront,
with (Loland, 2018) discussing the implicaons of ML on fairness and integrity in sports.
Future Direcons:
The future of ML in sports looks promising, with several emerging areas of research. Wearable
technology and IoT devices are expected to provide more granular data for ML models, as explored
by (Seshadri et al., 2019) in their work on real-me performance tracking. Addionally, the
integraon of computer vision with ML, as demonstrated by (Thomas et al, 2017) in their analysis of
tennis player movements, opens new avenues for automated performance analysis.
3.2 Adopon of Machine Learning on Cricket
The adopon of Machine Learning (ML) in cricket analycs has gained significant tracon in recent
years, with researchers from Europe and the USA contribung to this field. Here's a literature review
focusing on key aspects:
1. Match Outcome Predicon:
Researchers have applied ML techniques to predict cricket match outcomes. A study from
the UK focused on English County twenty-over cricket matches, invesgang the degree to
which it's possible to predict match outcomes using ML algorithms. This research
demonstrates the growing interest in applying advanced analycs to cricket.
2. Performance Analysis:
ML has been used to analyse player and team performance in cricket. While not specifically
focused on cricket, McHale et al. (2012) developed ML models to assess player performance
in soccer, which has implicaons for similar applicaons in cricket, parcularly for scoung
and team selecon strategies.
3. Data-Driven Decision Making:
The adoption of ML in cricket analytics aligns with broader trends in sports analytics. Beal et
al. (2019) conducted a comprehensive survey on artificial intelligence for team sports, which
included cricket. They noted that ML methods have been applied to various aspects of
sports, including tactical analysis and performance prediction.
4. Challenges and Limitations:
While ML shows promise in cricket analytics, researchers have noted challenges such as the
need for high-quality data and the complexity of cricket's rules and playing conditions, which
can affect model accuracy (Beal et al., 2019). The dynamic nature of cricket, with its multiple
formats and varying conditions, presents unique challenges for ML applications.
5. Future Directions:
Ongoing research is focusing on improving the accuracy of predictive models and expanding
the range of applications for ML in cricket. This includes real-time analysis during matches
and more sophisticated player valuation models (Beal et al., 2019). The potential for ML to
enhance decision-making in areas such as team selection, strategy formulation, and player
development is significant.
6. Interdisciplinary Approach:
The literature suggests that successful adoption of ML in cricket requires an interdisciplinary
approach, combining expertise in data science, sports science, and domain-specific
knowledge of cricket (Beal et al., 2019).
Although the adoption of ML in cricket analytics is growing, the field is still evolving. Researchers continue to
refine methodologies and explore new applications to enhance the understanding and analysis of the
sport. The potential for ML to transform various aspects of cricket, from player performance analysis
to strategic decision-making, is significant, but challenges remain in terms of data quality, model
interpretability, and practical implementation.
3.3 Summary of Literature Review:
1. Match Outcome Predicon:
Researchers have applied ML techniques to predict cricket match outcomes. A study focused
on English County twenty-over cricket matches invesgated the degree to which it's possible
to predict match outcomes using ML algorithms. This demonstrates the growing interest in
applying advanced analycs to cricket.
2. Performance Analysis:
ML has been used to analyse player and team performance in cricket. While not specifically
focused on cricket, studies like McHale et al. (2012) developed ML models to assess player
performance in soccer, which has implicaons for similar applicaons in cricket, parcularly
for scoung and team selecon strategies.
3. Data-Driven Decision Making:
The adopon of ML in cricket analycs aligns with broader trends in sports analycs. Beal et
al. (2019) conducted a comprehensive survey on arficial intelligence for team sports, which
included cricket. They noted that ML methods have been applied to various aspects of
sports, including taccal analysis and performance predicon.
4. Challenges and Limitaons:
Researchers have noted challenges such as the need for high-quality data and the complexity
of cricket's rules and playing condions, which can affect model accuracy (Beal et al., 2019).
The dynamic nature of cricket, with its mulple formats and varying condions, presents
unique challenges for ML applicaons.
5. Future Direcons:
Ongoing research is focusing on improving the accuracy of predicve models and expanding
the range of applicaons for ML in cricket. This includes real-me analysis during matches
and more sophiscated player valuaon models.
6. Interdisciplinary Approach:
Successful adopon of ML in cricket requires an interdisciplinary approach, combining
experse in data science, sports science, and domain-specific knowledge of cricket (Beal et
al., 2019).
7. Emerging Technologies:
The European Cricket Network has partnered with Full track AI, an advanced machine
learning and artificial intelligence service, to provide ball tracking graphics, pitch maps,
speeds, and other key data points using mobile phone technology (Emerging Cricket, 2023).
In conclusion, while the adoption of ML in cricket analytics is growing, the field is still evolving.
Researchers continue to refine methodologies and explore new applications to enhance the
understanding and analysis of the sport. The potential for ML to transform various aspects of cricket,
from player performance analysis to strategic decision-making, is significant, but challenges remain
in terms of data quality, model interpretability, and practical implementation.
4. Research Methodology
The primary objecve of this research is to leverage machine learning techniques to predict and
opmize the squad composions for two Indian Premier League (IPL) teams: Kolkata Knight Riders
(KKR) and Delhi Capitals (DC). This study aims to ulize a comprehensive approach that incorporates
various stascal methods, algorithms, and predicve models to analyse player performance data.
By employing advanced data mining techniques, feature engineering, and machine learning
algorithms such as decision trees, random forests, and support vector machines (Regression), the
research seeks to idenfy the most effecve player combinaons for each team. The goal is to
provide data-driven insights that can inform team management decisions, parcularly in the context
of player selecon for upcoming seasons and aucons.
4.1 Dataset and Approach Overview:
The dataset is taken from two trusted websites, Cricsheet and Howstat, and cross-verification
has been performed to confirm the legitimacy of the data. The dataset contains a
ball-by-ball record of IPL matches, providing detailed information about each delivery, including
match details, player information, runs scored, extras, and dismissals.
Table 2 - Dataset Columns
Column Name          Description
match_id             Unique identifier for each match
season               The IPL season year
start_date           The date the match started
venue                The location where the match was played
innings              The innings number (1st or 2nd)
ball                 The ball number within the over
batting_team         The team currently batting
bowling_team         The team currently bowling
striker              The batsman facing the current ball
non_striker          The batsman at the other end
extras               Total extra runs scored on this ball
wides                Number of wide balls
noballs              Number of no balls
byes                 Number of byes
legbyes              Number of leg byes
penalty              Any penalty runs awarded
wicket_type          Type of dismissal if a wicket fell
player_dismissed     Name of the player dismissed (if applicable)
other_wicket_type    Secondary wicket type (if applicable)
This table describes the columns of the dataset used for analysing cricket matches, specifically
focusing on the Indian Premier League (IPL). Each row in the dataset corresponds to a specific delivery
(ball) in a match, providing detailed information about the events occurring during that delivery (see
Appendix 5).
4.2 Data Processing
4.2.1 Data Filtering
1. The data is filtered to include only innings 1 and 2, excluding super overs.
2. Further filtering is applied to select only the seasons from 2021 to 2024.
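The two filtering steps can be illustrated with a minimal sketch. It assumes each delivery is held as a dictionary keyed by the Table 2 column names and that `season` is stored as an integer year; the sample rows and the helper name `filter_deliveries` are hypothetical and not taken from the project code.

```python
# Hypothetical sample rows; only the columns needed for filtering are shown.
deliveries = [
    {"match_id": 1, "season": 2020, "innings": 1},
    {"match_id": 2, "season": 2021, "innings": 1},
    {"match_id": 2, "season": 2021, "innings": 3},  # super-over innings
    {"match_id": 3, "season": 2024, "innings": 2},
]


def filter_deliveries(rows, first_season=2021, last_season=2024):
    """Keep regular innings (1 and 2) that fall inside the season window."""
    return [
        row for row in rows
        if row["innings"] in (1, 2)
        and first_season <= row["season"] <= last_season
    ]


filtered = filter_deliveries(deliveries)
print([row["match_id"] for row in filtered])
```

Only the 2021 first-innings delivery and the 2024 second-innings delivery survive both conditions.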
4.2.2 Player Data Extracon
1. Unique player names are extracted from the 'striker', 'non_striker', and 'bowler' columns.
2. A Data Frame containing all unique player names is created.
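As an illustration of the extraction step, the sketch below collects every name that appears in any of the three player columns. The player names are placeholders, and a plain sorted list stands in for the DataFrame mentioned above; `unique_players` is a hypothetical helper.

```python
# Placeholder deliveries with the three player columns named in the text.
deliveries = [
    {"striker": "Player A", "non_striker": "Player B", "bowler": "Player X"},
    {"striker": "Player B", "non_striker": "Player A", "bowler": "Player X"},
    {"striker": "Player C", "non_striker": "Player A", "bowler": "Player B"},
]


def unique_players(rows, columns=("striker", "non_striker", "bowler")):
    """Return the sorted set of names found in any of the given columns."""
    names = {row[col] for row in rows for col in columns if row.get(col)}
    return sorted(names)


players = unique_players(deliveries)
print(players)
```

A player who both bats and bowls (Player B here) appears only once in the result.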
4.3 Bang Stascs Calculaon
1. Pivot tables are created to calculate runs scored and balls faced by each player in each
season.
2. The pivot tables are merged to create a comprehensive bang dataset.
3. Total runs scored and total balls faced across all seasons are computed for each player.
4. Bang strike rate is calculated using the formula: (Runs Scored / Balls Faced) * 100.
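Total Runs Scored:
\[ \text{Total Runs Scored} = \text{runs}_{2021} + \text{runs}_{2022} + \text{runs}_{2023} + \text{runs}_{2024} \]
Total Balls Faced:
\[ \text{Total Balls Faced} = \text{balls faced}_{2021} + \text{balls faced}_{2022} + \text{balls faced}_{2023} + \text{balls faced}_{2024} \]
Batting Strike Rate:
\[ \text{Strike Rate} = \frac{\text{Total Runs Scored}}{\text{Total Balls Faced}} \times 100 \]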
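The per-season aggregation and the strike-rate formula can be sketched as follows. Nested dictionaries stand in for the pivot tables, the `runs_off_bat` column name is an assumption, and the choice not to count wides as balls faced is also an assumption that may differ from the project code.

```python
from collections import defaultdict

# Hypothetical deliveries; "runs_off_bat" is an assumed column name.
rows = [
    {"season": 2021, "striker": "Player A", "runs_off_bat": 4, "wides": 0},
    {"season": 2021, "striker": "Player A", "runs_off_bat": 1, "wides": 0},
    {"season": 2022, "striker": "Player A", "runs_off_bat": 6, "wides": 0},
    {"season": 2022, "striker": "Player A", "runs_off_bat": 0, "wides": 1},
    {"season": 2022, "striker": "Player B", "runs_off_bat": 2, "wides": 0},
]

# player -> season -> value, playing the role of the two pivot tables.
runs = defaultdict(lambda: defaultdict(int))
balls = defaultdict(lambda: defaultdict(int))
for row in rows:
    runs[row["striker"]][row["season"]] += row["runs_off_bat"]
    if not row["wides"]:  # assumption: a wide is not a ball faced
        balls[row["striker"]][row["season"]] += 1


def batting_summary(player):
    """Return (total runs, total balls faced, strike rate) across seasons."""
    total_runs = sum(runs[player].values())
    total_balls = sum(balls[player].values())
    strike_rate = total_runs / total_balls * 100 if total_balls else 0.0
    return total_runs, total_balls, strike_rate


print(batting_summary("Player A"))
```

Player A scores 11 runs from 3 legal deliveries over two seasons, so the strike rate is 11 / 3 × 100.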
4.4 Bowling Stascs Calculaon
1. Wicket types are defined (bowled, caught, caught and bowled, hit wicket, lbw, stumped).
2. Pivot tables are created for wickets taken, balls bowled and runs conceded by each bowler in
each season.
3. Total wickets, total balls bowled, and total runs given across all seasons are computed for
each bowler.
4. The bowling data is merged into a single DataFrame.
Wickets Taken:
\[ \text{Wickets Taken} = \sum \text{wicket type count} \]
Count occurrences of specific wicket types for each bowler.
Balls Bowled:
\[ \text{Balls Bowled} = \text{ball count per season} \]
Runs Conceded:
\[ \text{Total Runs Conceded} = \sum (\text{runs off bat} + \text{extras} + \text{wides} + \text{noballs} + \text{byes} + \text{legbyes} + \text{penalty}) \]
Total Wickets Taken:
\[ \text{Total Wickets} = \text{wickets}_{2021} + \text{wickets}_{2022} + \text{wickets}_{2023} + \text{wickets}_{2024} \]
Total Balls Bowled:
\[ \text{Total Balls Bowled} = \text{balls bowled}_{2021} + \text{balls bowled}_{2022} + \text{balls bowled}_{2023} + \text{balls bowled}_{2024} \]
Bowling Economy Rate:
\[ \text{Economy Rate} = \frac{\text{Total Runs Conceded}}{\text{Total Overs Bowled}} \]
Convert balls to overs using:
\[ \text{Overs} = \left\lfloor \frac{\text{Balls}}{6} \right\rfloor + \frac{\text{Balls} \bmod 6}{10} \]
Data Merging:
Merge batting and bowling datasets using a common key (player name).
Data Cleaning:
Fill null values with zeros, assuming players who didn't bat or bowl have zero
statistics.
4.5 Domain Knowledge
The domain knowledge required in this field:
1. Understanding of Cricket: A deep understanding of cricket is fundamental. This includes:
Rules of the game
Various formats (Test, ODI, T20)
Strategies and tactics
Historical trends
Nuances that influence the game
2. Statistical Knowledge:
Descriptive statistics (mean, median, mode, range, standard deviation, etc.)
Inferential statistics (regression analysis, correlation analysis, ANOVA, hypothesis testing)
Understanding of key performance indicators (KPIs) in cricket (batting average, strike rate,
economy rate, etc.)
3. Data Types and Sources:
Player performance data
Team performance data
Match data
Historical data
4. Analytical Techniques:
Time series analysis
Clustering analysis
Machine learning algorithms
Predictive modelling
5. Cricket-Specific Analytics:
Understanding of Duckworth-Lewis (D/L) method and its applications
Knowledge of player valuation models
Understanding of factors affecting performance (pitch conditions, player skills, opposition
strengths/weaknesses)
6. Strategic Applications:
How to use data for team selection
Optimizing batting orders and bowling strategies
Field placement strategies based on data
In-game decision making using real-time analytics
7. Broader Sports Analytics Concepts:
Familiarity with analytics approaches from other sports (e.g., metrics in sports)
Understanding of how analytics can be applied to both performance analysis and fan
engagement.
5. Quantave and Predicve analysis
5.1 Win Rao Analysis of Teams
5.1.1 Need for Analysis
The goal is to analyse the performance of IPL teams, focusing parcularly on their win raos, to
determine which teams have been the most and least successful over the history of the tournament.
This analysis will help idenfy trends, strengths, and weaknesses among the teams, providing
insights into factors that contribute to long-term success in the IPL.
5.1.2 Objecve
1. Calculate and compare the win raos of all IPL teams.
2. Predict the Win rao of all teams
5.1.3 Data overview
Table 3 - Data for team’s performance overview
Team Name  Played  Won  Lost  N/R  Titles  Finalists  Playoff
MI         261     144  117   0    5       6          11
RCB        256     123  129   4    0       3          9
KKR        252     131  120   1    3       4          7
DC         252     115  135   2    0       1          6
PK         246     112  134   0    0       1          2
CSK        239     138  99    2    5       10         13
RR         222     112  107   3    1       2          5
SRH        182     88   94    0    1       3          6
GT         45      28   17    0    1       2          2
LSG        44      24   19    1    0       0          2
5.1.4 Quantave Analysis
Calculate the win rao:
The win rao is a crucial metric in sports analysis, parcularly in leagues like the IPL, as it provides a
clear and quanfiable measure of a team's success relave to its total games played. By calculang
the win rao, stakeholders including coaches, players, analysts, and fans can assess performance
over me, idenfy trends, and make informed decisions.
𝑊𝑖𝑛 𝑅𝑎𝑡𝑖𝑜 = (𝐺𝑎𝑚𝑒𝑠 𝑊𝑜𝑛) / (𝑇𝑜𝑡𝑎𝑙 𝐺𝑎𝑚𝑒𝑠 𝑃𝑙𝑎𝑦𝑒𝑑) 100
A high win rao indicates consistent success and compeveness, while a low rao may highlight
areas needing improvement. Addionally, win raos facilitate comparisons between teams,
regardless of the number of matches played, allowing for a more equitable evaluaon of
performance.
Table 4 - Win Rao of all teams
Team Name Win Ratio
MI 55.17241379
RCB 48.046875
KKR 51.98412698
DC 45.63492063
PK 45.52845528
CSK 57.74058577
RR 50.45045045
SRH 48.35164835
GT 62.22222222
LSG 54.54545455
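As a worked check, the Table 4 values can be reproduced from the Played and Won columns of Table 3. The sketch below uses a plain dictionary for the table; the variable names are illustrative only.

```python
# (Played, Won) per team, copied from Table 3.
teams = {
    "MI": (261, 144), "RCB": (256, 123), "KKR": (252, 131), "DC": (252, 115),
    "PK": (246, 112), "CSK": (239, 138), "RR": (222, 112), "SRH": (182, 88),
    "GT": (45, 28), "LSG": (44, 24),
}

# Win Ratio = Games Won / Total Games Played * 100
win_ratio = {team: won / played * 100 for team, (played, won) in teams.items()}

for team, ratio in sorted(win_ratio.items(), key=lambda item: -item[1]):
    print(f"{team:4s} {ratio:.8f}")
```

Sorting by ratio puts GT first, although GT and LSG have played far fewer matches than the other teams, which is worth keeping in mind when comparing the figures.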
Calculate the lost rao:
The loss rao is a key performance indicator that measures the proporon of games lost relave to
the total number of games played. It is calculated using the formula:
𝐿𝑜𝑠𝑠 𝑅𝑎𝑡𝑖𝑜 = (𝐺𝑎𝑚𝑒𝑠 𝐿𝑜𝑠𝑡/ 𝑇𝑜𝑡𝑎𝑙 𝐺𝑎𝑚𝑒𝑠 𝑃𝑙𝑎𝑦𝑒𝑑) × 100
Win Loss Rao
The win-loss rao is a crical metric used to evaluate performance in various compeve contexts,
including sports and sales. It is calculated using the formula:
Win Loss Rao= Number of Losses / Number of Wins
The win-loss rao difference is a crucial metric in sports analysis for several reasons:
1. Performance indicator: It provides a clear picture of a team's overall performance, showing
how much they're winning compared to losing.
2. Compeve edge: A posive difference indicates a team is winning more than losing,
suggesng a compeve advantage.
3. Trend analysis: Tracking this metric over me can reveal improvements or declines in team
performance.
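The loss ratio and the difference discussed above can be sketched for a few teams from Table 3. The definition of the difference as win ratio minus loss ratio is an assumption made for this illustration, since the text does not state it explicitly.

```python
# (Played, Won, Lost) for a subset of teams, copied from Table 3.
teams = {
    "KKR": (252, 131, 120),
    "DC": (252, 115, 135),
    "CSK": (239, 138, 99),
}

loss_ratio = {t: lost / played * 100 for t, (played, won, lost) in teams.items()}
win_ratio = {t: won / played * 100 for t, (played, won, lost) in teams.items()}

# Assumed definition: difference = win ratio - loss ratio (percentage points).
wr_difference = {t: win_ratio[t] - loss_ratio[t] for t in teams}

for team in teams:
    print(f"{team:4s} loss={loss_ratio[team]:.2f} diff={wr_difference[team]:+.2f}")
```

Under this assumed definition, a team with more wins than losses (KKR, CSK) has a positive difference and a team with more losses (DC) a negative one.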
5.1.5 Corelaon Analysis:
Figure 2 - Correlaon graph for Linear Reg Model
Strong posive correlaons exist between Played and Won/Lost (0.974/0.976), Win_Rao and
WR_Difference (0.997), and Titles and Finalists (0.899). Finalists and Playoff appearances are also
strongly correlated (0.883). Strong negave correlaons are observed between Win_Rao and
lost_Rao (-0.989), and lost_Rao and WR_Difference (-0.997). Moderate correlaons include Won
vs. Playoff (0.755), Titles vs. Playoff (0.767), and Lost vs. lost_Rao (0.754).
Interesngly, Win_Rao and Playoff appearances show only a weak correlaon (0.110), suggesng
regular-season performance doesn't always translate to playoff success. N/R (No Result) has weak
correlaons with most metrics, indicang minimal impact on overall performance. These correlaons
provide insights into team performance paerns, highlighng relaonships between various metrics
in the dataset.
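To show how one entry of the correlation matrix arises, the sketch below computes the Pearson coefficient between the Played and Won columns of Table 3 by hand. It assumes the reported figures are Pearson correlations; the function name `pearson` is illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Played and Won columns from Table 3, in table order (MI ... LSG).
played = [261, 256, 252, 252, 246, 239, 222, 182, 45, 44]
won = [144, 123, 131, 115, 112, 138, 112, 88, 28, 24]

r_played_won = pearson(played, won)
print(round(r_played_won, 3))
```

The result agrees with the Played versus Won value reported above to three decimal places.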
5.1.6 Linear Regression Model:
Linear Regression is well suited to analysing the IPL team performance data for several reasons. The
continuous dependent variable (Win Ratio) and multiple independent variables make it suitable for
exploring relationships and predicting outcomes. It offers interpretable results through quantifiable
impacts of each predictor, crucial for sports analytics. The model's simplicity makes it effective for
avoiding overfitting (see Appendix 3). The high R-squared value (0.9969) indicates a strong fit.
Overall, Linear Regression provides a balance of predictive power, interpretability, and robustness for
this performance analysis.
5.1.7 Predicon Analysis
Table 5 - Team predicon with difference
Team Actual Win% Predicted Win% Difference
MI 55.17 55.21 +0.04
RCB 48.05 48.10 +0.05
KKR 51.98 51.95 -0.03
DC 45.63 45.68 +0.05
PK 45.53 45.87 +0.34
CSK 57.74 57.95 +0.21
RR
50.45
50.07
-
0.38
SRH
48.35
47.89
-
0.46
GT 62.22 61.94 -0.28
LSG 54.55 55.01 +0.46
1. Accuracy: The model's predicons are remarkably close to the actual values, with most
differences being less than 0.5 percentage points.
2. Consistency: The model performs well across different teams, showing no significant bias
towards over or under-predicon for specific teams.
3. Best Predicons:
Mumbai Indians: Only a 0.04 difference.
Kolkata Knight Riders: Only a 0.03 difference.
4. Largest Discrepancies:
Sunrisers Hyderabad: Underpredicted by 0.46.
Lucknow Super Giants: Overpredicted by 0.46.
5. Overall Trend: There is a slight tendency to overpredict for lower-performing teams and
underpredict for higher-performing teams, but the differences are minimal.
Model Performance
1. Mean Absolute Error (MAE): Approximately 0.23.
2. Root Mean Squared Error (RMSE): 0.2874.
These error metrics confirm the high accuracy of the predicons, with an average deviaon of less
than 0.3 percentage points.
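The two error metrics can be recomputed directly from the rounded values in Table 5, as in the sketch below. Because the inputs are rounded to two decimals, the RMSE obtained this way may differ slightly from the reported figure, which was presumably computed on unrounded predictions.

```python
import math

# Actual and predicted win percentages from Table 5, in table order (MI ... LSG).
actual = [55.17, 48.05, 51.98, 45.63, 45.53, 57.74, 50.45, 48.35, 62.22, 54.55]
predicted = [55.21, 48.10, 51.95, 45.68, 45.87, 57.95, 50.07, 47.89, 61.94, 55.01]

errors = [p - a for a, p in zip(actual, predicted)]

# MAE: mean of absolute errors; RMSE: square root of the mean squared error.
mae = sum(abs(e) for e in errors) / len(errors)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))

print(f"MAE  = {mae:.4f}")
print(f"RMSE = {rmse:.4f}")
```

With these rounded inputs the MAE comes out at 0.23 and the RMSE at roughly 0.285, both below 0.3 percentage points.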
5.1.8 Predicon Findings:
Figure 3 - Predicon Graph
Trends in the performance of IPL teams:
1. Top Performers:
Chennai Super Kings (CSK) and Gujarat Titans emerge as the top performers, with predicted
win percentages of 57.95% and 61.94% respecvely. This suggests these teams have strong
overall player stascs and team dynamics that contribute to their success.
2. Mid-Range Performers:
Teams like Mumbai Indians (55.21%), Lucknow Super Giants (55.01%), and Kolkata Knight
Riders (51.95%) fall into the mid-range of performance. Their predicted win percentages
suggest consistent but not dominant performance.
3. Lower Performers:
Teams such as Punjab Kings (45.87%), Delhi Capitals (45.68%), and Royal Challengers
Bengaluru (48.10%) have lower predicted win percentages, indicang potenal areas for
improvement in their team composion or strategy.
4. Consistency in Predicon:
The model shows remarkable consistency across different teams, with predicons closely
aligning with actual performance. This suggests the model has effecvely captured key
factors influencing team success in the IPL.
5. Narrow Performance Range:
The predicted win percentages range from about 45% to 62%, indicang a compeve
league where even lower-performing teams have a substanal chance of winning matches.
These trends suggest that the predicon model has effecvely captured the nuances of team
performance in the IPL, reflecng both the strengths of top teams and the areas for improvement for
others.
5.2 Rule-based scoring system combined with normalisation and
weighted aggregation
5.2.1 Need for Analysis
The rule-based scoring system in cricket provides a holistic player assessment by combining multiple
performance metrics, offering a comprehensive evaluation beyond individual statistics (see
Appendix 4).
It employs role-based evaluation, categorising players as Batsmen, Bowlers, or All-rounders,
enabling fair comparisons within similar roles. Normalised comparisons allow for unified
scoring across diverse player types, while weighted performance metrics reflect the strategic
priorities of T20 cricket.
The system excels in identifying all-round talent and ranking players within categories,
providing context-specific assessments that are valuable for team selection and player
development. It supports data-driven decision-making, performance benchmarking across
seasons or teams, and talent identification of potentially undervalued players.
This analysis can inform contract and auction strategies, which is particularly useful for leagues like
the IPL. It enhances fan engagement and is applicable to fantasy cricket leagues. The system
allows for continuous performance monitoring and is easily updated with new match data.
Overall, this comprehensive approach provides an objective basis for strategic planning, team
composition, and player valuation, making it a valuable tool for cricket management and analysis.
5.2.2 Objecve
The primary objecve of this analysis is to create a comprehensive, data-driven evaluaon system for
cricket players in T20 leagues like the IPL. It aims to quanfy player performance across mulple
dimensions, providing a single, numerical score that reflects a player's overall value to their team.
By combining and normalising various performance metrics such as runs scored, bang average,
strike rate, wickets taken, and economy rate, the analysis offers a balanced assessment of player
contribuons. It disnguishes between different player roles (batsmen, bowlers, and all-rounders),
ensuring fair comparisons within each category while also recognizing the unique value of versale
players.
5.2.3 Data Overview
This dataset provides a comprehensive overview of player performance in the Indian Premier League
(IPL) from 2021 to 2024, capturing key metrics for both batting and bowling. The dimensions of the data
are 300 rows x 8 columns. The data encompasses:
1. Batting Performance:
"totalrunsscored": Aggregate runs scored by each player
"Total_batting_average": Average runs scored per dismissal
"batting_strike_rate": Runs scored per 100 balls faced
"totalballsfaced": Total number of deliveries faced
2. Bowling Performance:
"totalwickets": Number of wickets taken
"economyrate": Average runs conceded per over
"oversbowled_clean": Total overs bowled
The "striker" column identifies individual players. This dataset allows for a multifaceted analysis
of player contributions, enabling comparisons between different aspects of the game. It captures
both volume (total runs, wickets) and efficiency (average, strike rate, economy) metrics, providing a
balanced view of player performance. Inclusion of data over multiple seasons (2021-2024) allows for
trend analysis, tracking player development, and assessing consistency over time.
5.2.4 Quantitative Analysis
Rule-Based Categorisation
The players are categorised into different roles (batsman, bowler, or all-rounder) based on
predefined rules. These rules are simple conditional checks on the player's performance
metrics:
Let R = total runs scored, W = total wickets, B = total balls faced.
$$\text{Batsman}: \; R \ge 100 \;\land\; W \le 2 \;\land\; B \ge 40$$
$$\text{Bowler}: \; W > 5 \;\land\; R \le 100$$
$$\text{All-rounder}: \; W \ge 3 \;\land\; R \ge 100$$
$$\text{Other Players}: \; \lnot(\text{Batsman} \lor \text{Bowler} \lor \text{All-rounder})$$
Batsman: Scored 100 or more runs, taken 2 or fewer wickets, and faced at least 40 balls.
Bowler: Taken more than 5 wickets and scored 100 or fewer runs.
All-rounder: Taken 3 or more wickets and scored 100 or more runs.
This categorisation determines which metrics are relevant for calculating the player's score.
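The rules above can be sketched as a small function. This is a minimal illustration, not the code used in the study: the function name is hypothetical, the balls-faced condition is read as B ≥ 40, and the order in which the rules are checked is an assumption (a player with exactly 100 runs and more than 5 wickets satisfies both the bowler and the all-rounder rule, and the text does not say which takes precedence).

```python
def categorise_player(runs: float, wickets: float, balls_faced: float) -> str:
    """Assign a role from the thresholds stated in the text.

    Assumption: the all-rounder rule is checked first, so it wins
    when a player also satisfies the bowler rule.
    """
    if wickets >= 3 and runs >= 100:
        return "All-rounder"
    if runs >= 100 and wickets <= 2 and balls_faced >= 40:
        return "Batsman"
    if wickets > 5 and runs <= 100:
        return "Bowler"
    return "Other"


# Hypothetical players, for illustration only.
examples = {
    "specialist batter": categorise_player(500, 0, 300),
    "specialist bowler": categorise_player(20, 15, 10),
    "all-rounder": categorise_player(300, 10, 200),
    "fringe player": categorise_player(50, 1, 40),
}
```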
Data Normalisation:
Normalisation is used to scale different performance metrics to a common range (0 to 1). This
ensures that metrics with different units and ranges can be compared and combined meaningfully.
Min-Max Normalisation: For metrics where a higher value is better (e.g., runs scored,
wickets taken), the formula used is:
$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$$
Inverted Normalisation for Economy Rate: Since a lower economy rate is better, the
normalisation is inverted:
$$E_{\text{norm}} = 1 - \frac{E - E_{\min}}{E_{\max} - E_{\min}}$$
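A minimal sketch of the two normalisation formulas, written in plain Python. The function names are hypothetical, and the handling of a constant column (where the maximum equals the minimum and the formula would divide by zero) is an assumption, since the text does not cover that case.

```python
def min_max(values):
    """Scale values to [0, 1]; higher raw values map to higher scores."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def inverted_min_max(values):
    """Scale values to [0, 1] so that lower raw values map to higher scores."""
    return [1.0 - v for v in min_max(values)]


# Hypothetical inputs: three run totals and three economy rates.
runs_norm = min_max([10, 20, 30])
economy_norm = inverted_min_max([6.0, 8.0, 10.0])
```

With these inputs the most economical bowler (6.0 runs per over) receives the highest normalised value, which is the behaviour the inverted formula is meant to produce.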
Weight Aggregation
The system uses a weighted sum approach to aggregate multiple normalised performance metrics
into a single score. The weights are assigned differently for batsmen, bowlers, and all-rounders to
reflect the relative importance of different skills in T20 cricket.
For Batsmen:
$$\text{Score} = (0.4 \cdot \text{normalized\_runs} + 0.3 \cdot \text{normalized\_average} + 0.3 \cdot \text{normalized\_strike\_rate}) \times 100$$
For Bowlers:
$$\text{Score} = (0.6 \cdot \text{normalized\_wickets} + 0.4 \cdot \text{normalized\_economy\_rate}) \times 100$$
For All-rounders:
$$\text{Score} = (\text{Batting Score} + \text{Bowling Score}) / 2$$
This weighted aggregation allows for:
Combining multiple performance aspects into a single, comprehensive score
Adjusting the importance of different metrics based on player role
Balancing volume (e.g., total runs) with efficiency (e.g., strike rate)
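The three scoring formulas can be sketched directly from the stated weights. The function names are hypothetical; the inputs are assumed to be already normalised to the range 0 to 1.

```python
def batting_score(runs_norm, average_norm, strike_rate_norm):
    """Weighted batting score on a 0-100 scale (weights 0.4 / 0.3 / 0.3)."""
    return (0.4 * runs_norm + 0.3 * average_norm + 0.3 * strike_rate_norm) * 100


def bowling_score(wickets_norm, economy_norm):
    """Weighted bowling score on a 0-100 scale (weights 0.6 / 0.4)."""
    return (0.6 * wickets_norm + 0.4 * economy_norm) * 100


def all_rounder_score(runs_norm, average_norm, strike_rate_norm,
                      wickets_norm, economy_norm):
    """Simple mean of the batting and bowling scores."""
    batting = batting_score(runs_norm, average_norm, strike_rate_norm)
    bowling = bowling_score(wickets_norm, economy_norm)
    return (batting + bowling) / 2
```

One consequence of the averaging step is that an all-rounder needs to be near the top of both disciplines to approach the scores of a specialist, which is consistent with the lower all-rounder scores reported in Table 6.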
Ranking
After calculating the overall scores, players are ranked within their respective categories (Batsman,
Bowler, All-rounder, Other). The ranking is done using the 'min' method.
For each player type PT ∈ {Batsman, Bowler, All-rounder, Other}:
$$\text{Rank}(\text{player}_i) = \left|\{\text{player}_j \in PT : OS(\text{player}_j) > OS(\text{player}_i)\}\right| + 1$$
Where:
player_i is the player being ranked
PT is the set of all players of the same player type (e.g., all batsmen)
OS(player_x) is the overall score calculated for player x
|{...}| denotes the cardinality (size) of the set
The set comprehension identifies all players (player_j) within the same player type (PT) whose scores
are strictly greater than the score of the player being ranked (player_i).
This approach is used because:
It allows for fair comparison within roles, recognising that different skills are valued for
different positions
It provides a clear hierarchy within each player type
The 'min' method ensures that players with equal scores receive the same rank, avoiding
arbitrary distinctions
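The rank formula can be implemented literally by counting, for each player, how many players of the same type have a strictly higher score. The sketch below is a plain-Python illustration with hypothetical players and scores; in pandas, a grouped `rank(method="min", ascending=False)` is the usual way to obtain the same result, although the text does not show the original code.

```python
def rank_within_type(players):
    """players: list of (name, player_type, overall_score) tuples.

    Returns {name: rank}, where rank = 1 + number of same-type players
    with a strictly greater score (ties therefore share a rank).
    """
    ranks = {}
    for name, ptype, score in players:
        higher = sum(1 for _, t, s in players if t == ptype and s > score)
        ranks[name] = higher + 1
    return ranks


# Hypothetical data: two bowlers are tied on 70.0.
sample = [
    ("Bowler A", "Bowler", 80.0),
    ("Bowler B", "Bowler", 70.0),
    ("Bowler C", "Bowler", 70.0),
    ("Bowler D", "Bowler", 60.0),
    ("Batter A", "Batsman", 65.0),
]
ranks = rank_within_type(sample)
```

Here the tied bowlers both receive rank 2 and the next bowler receives rank 4, while the batsman is ranked independently of the bowlers.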
Overall Score Calculation
The analysis has produced overall scores and rankings for IPL players across different roles (Batsmen,
Bowlers, and All-rounders). The scores reflect a comprehensive evaluation of player performance,
considering multiple metrics normalised and weighted according to their importance in T20 cricket.
Table 6 - Overall score table
Rank  Player          Player Type  Overall Score
1     YS Chahal       Bowler       87.70
2     CV Varun        Bowler       73.69
3     Mohammed Shami  Bowler       72.78
1     F du Plessis    Batsman      71.37
2     Shubman Gill    Batsman      70.19
3     RD Gaikwad      Batsman      69.86
1     Rashid Khan     All-rounder  54.49
2     AD Russell      All-rounder  51.74
3     RA Jadeja       All-rounder  51.51
This table highlights the top-ranked players in each category (Bowlers, Batsmen, and All-rounders)
along with their overall scores. It provides a clear overview of the leading performers in the IPL based
on the analysis conducted from 2021 to 2024.
5.3 Random Forest model to predict the overall score
5.3.1 Need for Analysis:
A random forest regression model is well suited to predicting a player's Overall_score.
It can process a diverse set of features, including batting statistics (e.g., totalrunsscored,
batting_strike_rate), bowling metrics (e.g., totalwickets, economyrate), and the
Player_type category. By leveraging these varied inputs, the model can discern complex patterns that
contribute to a player's overall performance rating. The inclusion of normalised features allows for
fair comparison across different statistical scales.
5.3.2 Objective of the Model
The objective is to develop and implement a random forest regression model that accurately predicts
the Overall_score for cricket players based on their comprehensive performance statistics, including
batting and bowling metrics. The model aims to provide a data-driven evaluation of player
performance that can be used for team selection, player ranking, and strategic decision-making in
cricket management and analysis.
5.3.3 Data Overview
The dataset contains various cricket player statistics, including both batting and bowling metrics. Key
features include:
1. Batting statistics: totalrunsscored, Total_batting_average, batting_strike_rate, totalballsfaced
2. Bowling statistics: totalwickets, economyrate, overs bowled
3. Normalised versions of features: totalrunsscored_norm, Total_batting_average_norm,
batting_strike_rate_norm, totalwickets_norm, economyrate_norm.
4. Player_type: Categorises players as Batsman, Bowler, or All-rounder
5. Overall_score: The target variable.
6. Rank: Player ranking based on Overall_score.
The dataset includes players with diverse roles (batsmen, bowlers, and all-rounders), allowing the
model to learn patterns specific to each player type. The presence of both raw and normalised
features provides flexibility in how the model interprets the data.
Figure 4 - Correlation graph for Random Forest Model
This matrix suggests that batting performance has a stronger influence on Overall_score. The model
will therefore likely give most weight to totalrunsscored and Total_batting_average, together with
economy rate, when predicting Overall_score.
5.3.4 Random Forest Regression Model
Evaluation
1. Mean Squared Error (MSE): 3.365822488403902
This is a relatively low MSE. The corresponding root mean squared error is
√3.37 ≈ 1.84, so predictions typically deviate from the actual Overall_score by about 1.84 points.
Given that the Overall_score is defined on a 0 to 100 scale, this level of error is quite
small.
2. R-squared Score: 0.9921912055446288 (99.22%)
This is an extremely high R-squared value, indicating that the model explains about
99.22% of the variance in the Overall_score.
It suggests a very strong fit between the model's predictions and the actual
Overall_scores.
The model demonstrates excellent predictive power, capturing almost all the variability in the
Overall_score based on the provided features.
With an R-squared of 99.22%, the model's predictions are very closely aligned with the actual scores,
leaving only about 0.78% of the variance unexplained. The low MSE further confirms the high
accuracy of the predictions.
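For reference, the two metrics can be computed from first principles as in the sketch below. The function names and the three-point example are hypothetical; only the reported MSE value is taken from the text, and its square root gives the error in Overall_score points.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error: average of squared prediction errors."""
    return sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical toy values, only to show the mechanics.
toy_mse = mse([1, 2, 3], [1, 2, 4])
toy_r2 = r_squared([1, 2, 3], [1, 2, 4])

# RMSE implied by the MSE reported for Random Forest Model 1.
reported_rmse = math.sqrt(3.365822488403902)
```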
5.3.5 Model Visualisation
Actual vs Predicted Values Scatter Plot:
This plot shows how well the predicted values align with the actual
values. Points closer to the red dashed line indicate better predictions.
Figure 5 - Actual vs Predicted Graph for RF 1
Residuals Plot:
This plot helps identify any patterns in the residuals (prediction errors). Ideally, the residuals should
be randomly scattered around the horizontal line at y=0.
Figure 6 - Residual graph for RF 1
Prediction Error Distribution:
This histogram shows the distribution of prediction errors. A distribution centred around zero and
symmetric indicates good model performance.
Figure 7 - Prediction error histogram for RF 1
5.4 Random Forest Model using RandomizedSearchCV
5.4.1 Objective of the model
The objective is to find the optimal combination of hyperparameters for the random forest
model using RandomizedSearchCV, aiming to improve model performance beyond what is
achievable with default settings.
5.4.2 Representation of Random Forest model with RandomizedSearchCV
Let f(x) be the Random Forest prediction for input x. The Random Forest model is an ensemble of
decision trees, and its prediction is the average of the predictions of all trees:
$$f(x) = \frac{1}{M} \sum_{m=1}^{M} T_m(x)$$
Where:
M is the number of trees (n_estimators in the grid)
T_m(x) is the prediction of the m-th tree
Each tree T_m is constructed as follows:
1. Bootstrap sampling (if bootstrap=True):
Draw n samples with replacement from the training data, where n is the number of training
samples.
2. At each node of the tree:
a. Select k features randomly, where k is determined by max_features:
If max_features='sqrt', k = √p, where p is the total number of features
If max_features='auto' (deprecated in scikit-learn 1.1 and removed in 1.3), k = p for
regression, i.e. all features are considered
b. Find the best split among the k features based on mean squared error reduction:
$$\Delta I = I(\text{parent}) - \left(\frac{n_{\text{left}}}{n} I(\text{left}) + \frac{n_{\text{right}}}{n} I(\text{right})\right)$$
where I is the impurity measure (variance for regression), and n is the number of samples at the node.
c. Split the node if:
The number of samples is ≥ min_samples_split
The depth of the node is < max_depth (if specified)
3. Stop growing the tree when:
A split would leave fewer than min_samples_leaf samples in either child node
No further splits can improve the model
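The split criterion in step 2 can be illustrated for a single feature. The sketch below is a simplified, plain-Python version written for this explanation; it is not scikit-learn's implementation, and the function names and toy data are hypothetical.

```python
def variance(ys):
    """Population variance, used as the impurity I for regression."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys) / len(ys)


def impurity_decrease(parent, left, right):
    """Delta I = I(parent) - (n_left/n * I(left) + n_right/n * I(right))."""
    n = len(parent)
    weighted = len(left) / n * variance(left) + len(right) / n * variance(right)
    return variance(parent) - weighted


def best_split(xs, ys, min_samples_leaf=1):
    """Return (threshold, decrease) of the best split on one feature."""
    pairs = sorted(zip(xs, ys))
    all_ys = [y for _, y in pairs]
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical feature values
        left, right = all_ys[:i], all_ys[i:]
        if len(left) < min_samples_leaf or len(right) < min_samples_leaf:
            continue  # split would violate min_samples_leaf
        decrease = impurity_decrease(all_ys, left, right)
        if decrease > best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, decrease)
    return best


# Hypothetical toy data: the target jumps between x = 2 and x = 3.
threshold, decrease = best_split([1, 2, 3, 4], [10, 10, 20, 20])
```

On the toy data the best threshold falls midway between 2 and 3 and removes all of the parent variance, because both child nodes become pure.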
The tuned hyperparameters affect this process as follows:
n_estimators: Determines M
max_features: Affects k in step 2a
max_depth: Limits the depth in step 2c
min_samples_split: Used in step 2c
min_samples_leaf: Used in step 3
bootstrap: Determines whether step 1 is performed
The final prediction for a new input x is:
$$\hat{y} = f(x) = \frac{1}{M} \sum_{m=1}^{M} T_m(x)$$
RandomizedSearchCV samples a fixed number of combinations of these hyperparameters and keeps the
one with the lowest cross-validation error, typically mean squared error for regression:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
Where y_i are the true values and ŷ_i are the predicted values.
5.4.3 Evaluation
1. Mean Squared Error (MSE): 5.564417967827715
This is higher than the previous model (which had an MSE of 3.37).
It corresponds to a root mean squared error of √5.56 ≈ 2.36, so predictions typically
deviate from the actual Overall_score by about 2.36 points.
2. R-squared Score: 0.9879148111371704
This is still an excellent R-squared value, indicating that the model explains about 98.79%
of the variance in the Overall_score.
It is slightly lower than the previous model (which had an R-squared of 0.9922).
5.4.4 Model Visualisation
Actual vs Predicted Analysis
Figure 8 - Actual vs predicted graph for RF 2
1. Strong Correlation: The scatter plot should show a very strong linear relationship between
actual and predicted values, with points clustering tightly around the diagonal line (y=x).
2. Minimal Scatter: Given the high R-squared value of 0.9879, there is little scatter or
deviation from the diagonal line.
3. Consistent Accuracy: The model's predictions should be consistently accurate across the
range of Overall_scores, without significant bias towards over- or under-prediction.
4. Small Deviations: The MSE of 5.564 corresponds to a typical deviation of about
√5.564 ≈ 2.36 points from the actual values. A deviation this small may be barely noticeable
in the plot.
5. Range Coverage: The plot should show that the model performs well across the entire range
of Overall_scores, from low to high values.
Distribution of Prediction Errors Analysis
Figure 9 - Predicted error histogram for RF 2
1. Centred around Zero: The histogram should be centred very close to zero, indicating that
the model's predictions are unbiased. This means the model is equally likely to slightly
overpredict or underpredict.
2. Narrow Distribution: Given the low MSE and high R-squared, the histogram should show a narrow
distribution of errors, with most errors clustered tightly around zero.
3. Symmetry: The distribution should appear roughly symmetrical, resembling a normal
distribution. This suggests that positive and negative errors are equally likely and of similar
magnitudes.
4. Smooth KDE Line: The Kernel Density Estimation (KDE) line should show a smooth, bell-shaped
curve overlaying the histogram, further emphasising the normal-like distribution of
errors.
5.5 XG Boosting Method
5.5.1 Objective of this model
1. Handling Complex Relationships
XGBoost is particularly effective in capturing complex, non-linear relationships between features. In
the context of cricket, the relationship between player statistics (such as batting average,
strike rate, total runs scored, and wickets taken) and the Overall_score is likely to be intricate.
2. Feature Importance
The model provides built-in feature importance scores, which are valuable for identifying the most
significant cricket statistics that contribute to a player's Overall_score.
3. Mixed Data Types
The dataset includes both continuous variables (e.g., batting average, economy rate) and categorical
variables (e.g., Player_type). Once the categorical column is encoded, XGBoost handles both types of
data, allowing for a comprehensive analysis of player performance without extensive preprocessing.
4. Flexibility in Loss Functions
XGBoost allows for the customization of loss functions, which can be beneficial for tailoring the
model to specific nuances in the calculation of Overall_score. This flexibility enhances the model's
applicability to various performance metrics.
5.5.2 Representation of the Model
1. Data Preparation: This XGBoost model employs a comprehensive approach. It starts with
data preparation, scaling features using StandardScaler. The model formulation uses an
ensemble of decision trees, with each tree contributing to the final prediction. The objective
function balances prediction accuracy and model complexity through regularization.
$$X = \{x_i\}_{i=1}^{n}, \quad x_i \in \mathbb{R}^p$$
(p features after dropping 'Unnamed: 0', 'striker', 'Overall_score', 'Rank')
$$y = \{y_i\}_{i=1}^{n}, \quad \text{where } y_i \text{ is the Overall\_score}$$
2. Feature Transformation:
$$X_{\text{scaled}} = \text{StandardScaler}(X)$$
$$X_{\text{scaled},j} = \frac{x_j - \mu_j}{\sigma_j}, \quad \text{for each feature } j$$
3. Model Formulation:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_{\text{scaled},i})$$
Where:
K is the number of trees (n_estimators in param_grid)
f_k is the k-th tree in the ensemble
4. Objective Function:
$$\text{Obj}(\theta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{k=1}^{K} \Omega(f_k)$$
Where $\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$ is the regularization term
5. Tree Building Process: The tree-building process involves calculating gradients and hessians,
then selecting optimal splits based on gain. Leaf weights are calculated to minimize the
objective function. Hyperparameter optimization is performed using RandomizedSearchCV,
exploring various combinations of tree numbers, depth, learning rate, and sampling
parameters.
For each tree f_k:
a. Calculate gradients:
$$g_i = \frac{\partial \left(y_i - \hat{y}_i^{(t-1)}\right)^2}{\partial \hat{y}_i^{(t-1)}}$$
b. Calculate hessians:
$$h_i = \frac{\partial^2 \left(y_i - \hat{y}_i^{(t-1)}\right)^2}{\partial \left(\hat{y}_i^{(t-1)}\right)^2}$$
c. For each potential split:
$$\text{Gain} = \frac{1}{2} \left[ \frac{(\sum g_L)^2}{\sum h_L + \lambda} + \frac{(\sum g_R)^2}{\sum h_R + \lambda} - \frac{(\sum g)^2}{\sum h + \lambda} \right] - \gamma$$
d. Choose the split with maximum gain
e. Calculate leaf weights:
$$w_j = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}$$
6. Hyperparameter Optimization:
Using RandomizedSearchCV to optimize over:
n_estimators ∈ {100, 200, 300, 400, 500}
max_depth ∈ {3, 4, 5, 6, 7, 8}
learning_rate ∈ {0.01, 0.05, 0.1, 0.2}
subsample ∈ {0.6, 0.7, 0.8, 0.9, 1.0}
colsample_bytree ∈ {0.6, 0.7, 0.8, 0.9, 1.0}
min_child_weight ∈ {1, 2, 3, 4, 5}
7. Final Prediction:
For a new scaled input x_new:
$$\hat{y}_{\text{new}} = \sum_{k=1}^{K} f_k(x_{\text{new}})$$
8. Model Evaluation:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 3.2368$$
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 0.9930$$
Final predictions are made by summing contributions from all trees. The model's performance is
evaluated using Mean Squared Error (3.2368) and R-squared (0.9930), indicating high accuracy in
predicting Overall_score. This approach allows complex, non-linear relationships between cricket
statistics and overall performance to be captured effectively.
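The gradient, gain, and leaf-weight formulas in step 5 can be traced on a toy example. The sketch below is a plain-Python illustration for the squared-error loss exactly as written in the text, for which g_i = 2(ŷ_i − y_i) and h_i = 2. It is not the XGBoost library's internal code, and all names and numbers are hypothetical.

```python
def grad_hess_squared_error(y_true, y_pred):
    """First and second derivatives of (y - y_hat)^2 with respect to y_hat."""
    grads = [2.0 * (p - y) for y, p in zip(y_true, y_pred)]
    hess = [2.0 for _ in y_true]
    return grads, hess


def leaf_weight(grads, hess, lam):
    """w_j = -sum(g) / (sum(h) + lambda) for the samples in one leaf."""
    return -sum(grads) / (sum(hess) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting a node into left and right children."""
    def score(g, h):
        return sum(g) ** 2 / (sum(h) + lam)

    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma


# Toy data: current predictions are all 15, true scores are 10, 10, 20, 20.
g, h = grad_hess_squared_error([10, 10, 20, 20], [15, 15, 15, 15])
gain = split_gain(g[:2], h[:2], g[2:], h[2:], lam=1.0, gamma=0.0)
left_w = leaf_weight(g[:2], h[:2], lam=1.0)
right_w = leaf_weight(g[2:], h[2:], lam=1.0)
```

In this example the split separates the over-predicted from the under-predicted players, so the left leaf receives a negative weight and the right leaf a positive one. The regularization term λ shrinks both weights below the raw error of 5 points.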
5.5.3 Model Visualisation
Actual vs predicted analysis
Figure 10 - Actual Vs predicted graph for XGBoost model 1
This visualisation, combined with the low Mean Squared Error of 3.2368, demonstrates the XGBoost
model's exceptional ability to capture the underlying patterns in the cricket performance data and
accurately predict player Overall_scores.
The Actual vs Predicted scatter plot demonstrates the XGBoost model's high accuracy in predicting
player Overall_scores. Points closely align with the diagonal, reflecting the strong R-squared value
(0.9930). The tight clustering and absence of significant deviations indicate consistent performance
across all score ranges, validating the model's robustness and predictive power in cricket
performance analysis.
Distribution of Prediction Errors Analysis
Figure 11 - Prediction error histogram for XGBoost model 1
The error histogram shows a narrow, symmetrical distribution centred at zero, indicating unbiased
and accurate predictions. The high central peak and short tails confirm low error rates, aligning with
the model's strong R-squared (0.9930) and low MSE (3.2368). This validates the XGBoost model's
effectiveness in cricket performance analysis.
5.6 Enhanced XG Boosting model
The enhanced XGBoost model incorporates feature engineering and extensive hyperparameter
tuning for predicting player Overall_scores.
5.6.1 Representation of the model
Feature Engineering:
$$X_i = [x_1, \ldots, x_p, \text{runs\_per\_ball}, \text{wickets\_per\_over}]$$
Where:
$$\text{runs\_per\_ball} = \text{totalrunsscored} / \text{totalballsfaced}$$
$$\text{wickets\_per\_over} = \text{totalwickets} / \text{oversbowled\_clean}$$
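A minimal sketch of the two engineered features. The column names follow the dataset description, while the helper names and the sample records are hypothetical. The treatment of zero denominators (a player who has faced no balls or bowled no overs) is an assumption, since the text does not say how such cases were handled.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (assumed convention)."""
    return numerator / denominator if denominator else 0.0


def add_efficiency_features(player):
    """Return a copy of the player record with the two ratio features added."""
    enriched = dict(player)
    enriched["runs_per_ball"] = safe_ratio(
        player["totalrunsscored"], player["totalballsfaced"]
    )
    enriched["wickets_per_over"] = safe_ratio(
        player["totalwickets"], player["oversbowled_clean"]
    )
    return enriched


# Hypothetical records: a batter who never bowls and a bowler who rarely bats.
batter = add_efficiency_features(
    {"totalrunsscored": 150, "totalballsfaced": 100,
     "totalwickets": 0, "oversbowled_clean": 0}
)
bowler = add_efficiency_features(
    {"totalrunsscored": 12, "totalballsfaced": 16,
     "totalwickets": 10, "oversbowled_clean": 40}
)
```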
Model Structure:
$$f(X) = \sum_{k=1}^{K} f_k(X)$$
Where K is the number of trees (n_estimators)
Tree Structure:
$$f_k(X) = w_{q(X)}, \quad \text{where } q: \mathbb{R}^d \to \{1, 2, \ldots, T\}, \; w \in \mathbb{R}^T$$
T is the number of leaves in the tree
Objective Function:
$$\text{Obj}(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
Where:
$l(y_i, \hat{y}_i)$ is the loss function (typically MSE for regression)
$\Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$ is the regularization term
Update Rule:
$$f_m(x) = f_{m-1}(x) + \eta \, h_m(x)$$
Where η is the learning rate and h_m is the weak learner
Hyperparameter Space:
The search covers n_estimators, max_depth, learning_rate, subsample, colsample_bytree,
min_child_weight, gamma, reg_alpha, and reg_lambda.
Feature Selection:
$X_{\text{selected}} = S(X)$, where S is the selection function based on feature importance
Final Prediction:
$$\hat{y} = f_{\text{final}}(X_{\text{selected}})$$
Model Evaluation:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 85.5695$$
$$R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 0.8142$$
The enhanced XGBoost model for predicting cricket player Overall_scores combines feature
engineering, ensemble decision trees, and advanced optimization. It creates efficiency metrics,
utilizes regularization, and employs RandomizedSearchCV for hyperparameter tuning. Feature
selection focuses on impactful variables. With an R-squared of 0.8142 and an MSE of 85.5695, the
model explains 81.42% of the variance in Overall_score.
This performance is noticeably weaker than that of the other models, so the model is given lower
priority.
5.7 Support Vector Regression Model
5.7.1 Objective of the model
The Support Vector Regression (SVR) model for predicting cricket player Overall_scores aims to
provide a robust and accurate model capable of handling complex, non-linear relationships within
performance data (Smola and Schölkopf, 2004). This approach has several key objectives:
1. Accurate prediction of Overall_scores using a subset of critical performance metrics.
2. Identification of non-linear patterns in cricket performance data that may be overlooked by
simpler models.
3. Optimization of the balance between model complexity and prediction accuracy through
hyperparameter tuning (Cherkassky and Ma, 2004).
SVR is particularly well-suited for this dataset and prediction task due to its:
Ability to capture non-linear relationships using the RBF kernel (Drucker et al., 1997).
Robustness to outliers.
Regularization capabilities through the 'C' parameter (James et al., 2013).
Precision control via the 'epsilon' parameter.
Versatility in kernel selection.
5.7.2 Representation of the model
Feature Selection and Scaling:
$$X_{\text{scaled}} = \frac{X - \mu}{\sigma}$$
Where X is the feature matrix.
This step standardizes the selected features, ensuring they are on the same scale.
SVR Objective Function:
$$\min_{w, b, \xi, \xi^*} \; \frac{1}{2} \lVert w \rVert^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*)$$
subject to:
$$y_i - (w^T \varphi(x_i) + b) \le \varepsilon + \xi_i$$
$$(w^T \varphi(x_i) + b) - y_i \le \varepsilon + \xi_i^*$$
$$\xi_i, \xi_i^* \ge 0$$
Where y is the 'Overall_score' vector.
RBF Kernel:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
The RBF kernel was selected as the best performing kernel in grid search.
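Two building blocks of the formulation above can be written in a few lines: the RBF kernel and the ε-insensitive error that the slack variables measure. This is an illustrative sketch with hypothetical function names and inputs, not the solver used by an SVR library.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


def epsilon_insensitive_loss(y, prediction, epsilon):
    """Errors inside the epsilon tube cost nothing; larger ones cost the excess."""
    return max(0.0, abs(y - prediction) - epsilon)


# Identical points have similarity 1; similarity decays with distance.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5)
apart = rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5)

# A 0.005-point error is ignored with epsilon = 0.01; a 2-point error is not.
inside_tube = epsilon_insensitive_loss(10.0, 10.005, epsilon=0.01)
outside_tube = epsilon_insensitive_loss(10.0, 12.0, epsilon=0.5)
```

The loss function shows why a small ε combined with a large C pushes the model to fit the training scores closely: almost every deviation falls outside the tube and is penalised heavily.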
Hyperparameter Optimization:
$$(C^*, \varepsilon^*, \text{kernel}^*) = \arg\min_{C, \varepsilon, \text{kernel}} \; CV\_error(SVR(C, \varepsilon, \text{kernel}))$$
Where:
C ∈ {0.1, 1, 10, 100}
ε ∈ {0.01, 0.1, 0.5, 1}
kernel ∈ {'rbf', 'poly', 'sigmoid'}
The grid search found the optimal parameters: C = 100, ε = 0.01, kernel = 'rbf'.
Prediction Function:
$$f(x) = \sum_{i=1}^{m} (\alpha_i - \alpha_i^*) K(x_i, x) + b$$
Model Evaluation:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 = 16.932580949419034$$
$$R^2 = 1 - \frac{\sum (y_i - f(x_i))^2}{\sum (y_i - \bar{y})^2} = 0.9632246463346164$$
The SVR model demonstrates strong performance in predicting cricket player Overall_scores, as
evidenced by its high R-squared value (0.9632) and low MSE (16.9326). Compared to other models,
SVR's ability to capture non-linear relationships through its RBF kernel and its robustness to outliers
make it particularly well-suited for this complex sports data. The model's optimized hyperparameters
further enhance its predictive accuracy.
5.7.3 Model Visualisation
Figure 12 - Actual vs predicted graph for SVR
The Actual vs Predicted plot for the SVR model illustrates its high predictive accuracy. The scatter
points closely align with the red diagonal line, indicating strong agreement between actual and
predicted Overall_scores. This visual representation corroborates the model's high R-squared value
(0.9632) and low MSE (16.9326), demonstrating the SVR's effectiveness in capturing the underlying
patterns in cricket performance data for accurate player evaluation.
5.7.4 Distribution of Prediction errors
Figure 13 - Prediction error histogram for SVR
The histogram of prediction errors shows a symmetric distribution centred near zero, indicating
unbiased predictions. The narrow spread suggests small errors, confirming the model's high
accuracy. This visualisation aligns with the low MSE (16.9326) and high R-squared (0.9632) values.
5.8 Machine Learning Models and their accuracy results
5.8.1 Evaluation
Table 7 - All models evaluation metrics
Model Name                                             MSE      R2      Accuracy in %
Linear Regression Model (used only for team analysis)  0.08258  0.9969  99.6%
Random Forest Model 1                                  3.3658   0.9922  99.22%
Random Forest Model 2                                  5.5644   0.9879  98.79%
XG Boosting Model 1                                    3.2368   0.9930  99.30%
XG Boosting Model 2                                    85.570   0.8142  81.42%
Support Vector Regression Model                        16.933   0.9632  96.32%
Among the models that predict player Overall_score (the linear regression model is used only for
team analysis), XG Boosting Model 1 appears to be the most suitable for prediction. It demonstrates
the best overall performance with:
1. Lowest Mean Squared Error (MSE) of 3.2368
2. Highest R-squared value of 0.9930
3. Highest accuracy of 99.30%
XG Boosting Model 1 outperforms the other player-score models, with the lowest MSE and highest
R-squared, explaining 99.30% of target variable variance. The Random Forest models follow closely.
Support Vector Regression performs well but less accurately. XG Boosting Model 2 underperforms
significantly. XG Boosting Model 1 is therefore recommended for cricket player performance prediction.
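The comparison can be reproduced from the Table 7 figures with a few lines of Python. The sketch below uses only the player-score models (the linear regression row is left out because it is used for team analysis), and the variable names are hypothetical. It also converts each MSE into a root mean squared error, which is expressed in Overall_score points and is easier to interpret.

```python
import math

# (MSE, R-squared) for the player-score models, taken from Table 7.
models = {
    "Random Forest Model 1": (3.3658, 0.9922),
    "Random Forest Model 2": (5.5644, 0.9879),
    "XG Boosting Model 1": (3.2368, 0.9930),
    "XG Boosting Model 2": (85.570, 0.8142),
    "Support Vector Regression Model": (16.933, 0.9632),
}

best_by_mse = min(models, key=lambda name: models[name][0])
best_by_r2 = max(models, key=lambda name: models[name][1])

# RMSE in score points, rounded to two decimals for readability.
rmse = {name: round(math.sqrt(mse), 2) for name, (mse, _) in models.items()}
```

Both criteria select the same model, which is expected here because all models are evaluated against the same target and a lower MSE then implies a higher R-squared.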
5.8.2 Fine-Tuning
The fine-tuning process led to noticeable improvements in error metrics (lower MSE) and
explanatory power (higher R-squared) for both top models. This indicates that the fine-tuning
successfully optimized the models to better fit the specific patterns in player performance data. The
marginal gains in these already high-performing models suggest that fine-tuning helped capture
subtle nuances in the data, potentially leading to more precise player analysis and predictions.
Table 8 - Evaluation metrics after fine-tuning
Model Name                       MSE     R2      Accuracy in %
Random Forest Model 1            2.636   0.9942  99.42%
Random Forest Model 2            5.5644  0.9879  98.79%
XG Boosting Model 1              2.49    0.9946  99.46%
XG Boosting Model 2              85.570  0.8142  81.42%
Support Vector Regression Model  16.933  0.9632  96.32%
Fine-tuning had a positive impact on the overall performance metrics of the top models:
1. Random Forest Model 1:
MSE improved from 3.3658 to 2.636
R-squared increased from 0.9922 to 0.9942
Accuracy improved from 99.22% to 99.42%
2. XG Boosting Model 1:
MSE improved from 3.2368 to 2.49
R-squared increased from 0.9930 to 0.9946
Accuracy improved from 99.30% to 99.46%
5.8.3 Model Testing
Sample data
Table 9 - Sample data for model testing
Name              Total Runs  Batting Average  Batting Strike Rate  Total Wickets  Economy Rate  Balls Batted  Balls Bowled
Jos Buttler       391         43.44            158.62               0              0             246           0
Tymal Mills       15          7.5              125                  16             8.2           20            120
Will Jacks        230         32.86            145.57               3              7.8           158           30
Liam Livingstone  185         26.43            152.89               5              8.5           140           40
Reece Topley      20          10               111.11               11             7.9           25            90
Dawid Malan       278         39.71            140.4                0              0             180           0
Sam Curran        160         22.86            133.33               8              8.7           130           70
Tom Abell         145         24.17            128.32               2              9.2           120           20
Adil Rashid       35          11.67            106.06               10             7.5           30            80
Harry Brook       238         47.6             172.46               0              0             150           0
The data presented in the table comes from the 2023 “Hundred” tournament held in
England (ECB, 2023). Ten players were selected at random to highlight their performance metrics,
including total runs scored, batting averages, strike rates, total wickets taken, economy rates, balls
batted, and balls bowled (ESPNcricinfo, 2023).
5.8.4 Random Forest Model 1 Prediction
Table 10 - Random Forest Model 1 prediction results
Player            Predicted Overall score
Jos Buttler       68.40700315
Tymal Mills       74.50774429
Will Jacks        51.35719062
Liam Livingstone  48.25745156
Reece Topley      70.82660776
Dawid Malan       57.29453392
Sam Curran        44.74423243
Tom Abell         39.63716106
Adil Rashid       63.22290778
Harry Brook       51.47323937
Random Forest predictions range from 39.64 to 74.51, with an average of around
57. The model predicts higher scores for bowlers such as Tymal Mills (74.51) and Reece Topley
(70.83), while predicting lower scores for some batsmen such as Tom Abell (39.64).
5.8.5 XG Boost Model 1 Prediction
Table 11 - XG Boost Model 1 Prediction results
Player            Predicted Overall score
Jos Buttler       70.39937
Tymal Mills       72.09699
Will Jacks        45.95546
Liam Livingstone  44.136646
Reece Topley      67.81441
Dawid Malan       61.69961
Sam Curran        46.41379
Tom Abell         33.50635
Adil Rashid       48.7163
Harry Brook       57.242996
XGBoost predictions range from 33.51 to 72.10, averaging around 55. This model also predicts high
scores for bowlers, with Tymal Mills at 72.10 and Reece Topley at 67.81. However, it predicts lower
scores for some players such as Tom Abell (33.51) and Will Jacks (45.96).
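The summary figures for the two prediction tables can be checked with a short script. The predictions below are the values from Tables 10 and 11 rounded to two decimals; the variable names are hypothetical. The script also reports the player on whom the two models disagree most.

```python
# Predicted Overall_score per player, rounded from Tables 10 and 11.
rf_predictions = {
    "Jos Buttler": 68.41, "Tymal Mills": 74.51, "Will Jacks": 51.36,
    "Liam Livingstone": 48.26, "Reece Topley": 70.83, "Dawid Malan": 57.29,
    "Sam Curran": 44.74, "Tom Abell": 39.64, "Adil Rashid": 63.22,
    "Harry Brook": 51.47,
}
xgb_predictions = {
    "Jos Buttler": 70.40, "Tymal Mills": 72.10, "Will Jacks": 45.96,
    "Liam Livingstone": 44.14, "Reece Topley": 67.81, "Dawid Malan": 61.70,
    "Sam Curran": 46.41, "Tom Abell": 33.51, "Adil Rashid": 48.72,
    "Harry Brook": 57.24,
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


rf_mean = mean(rf_predictions.values())
xgb_mean = mean(xgb_predictions.values())

# Signed gap (Random Forest minus XGBoost) for each player.
gaps = {name: rf_predictions[name] - xgb_predictions[name] for name in rf_predictions}
largest_gap_player = max(gaps, key=lambda name: abs(gaps[name]))
```

With these inputs the means come out at roughly 56.97 for Random Forest and 54.80 for XGBoost, and the largest disagreement is for Adil Rashid, where the two models differ by about 14.5 points.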
5.8.6 Performance Distribution Curves
Figure 14 - Performance distribution curve for RF 1 and XGBoost
The performance distribution curves show the spread and frequency of predicted and actual scores
for the cricket players.
Random Forest Model Distribution:
The curve for the Random Forest model predictions likely shows a relatively wide spread, with scores
ranging from about 39 to 75. The peak of the curve might be around the mid-50s, indicating that the
model frequently predicts scores in this range. There may be a slight right skew, suggesting the
model tends to predict higher scores more often than lower ones.
XGBoost Model Distribution:
The XGBoost model's distribution curve probably shows a similar range to the Random Forest model,
from about 33 to 72. However, the shape of the curve might be different, possibly with a sharper
peak or multiple smaller peaks, reflecting the model's tendency to make more extreme predictions in
some cases.
Actual Scores Distribution:
The curve for actual scores likely shows the widest spread, ranging from about 35 to 90. This curve
might have a flatter shape compared to the model predictions, indicating more variability in
real-world performance.
5.8.7 ROC curves
ROC curves visually represent the trade-off between true positive rate (sensitivity) and false positive
rate (1 - specificity) as the classification threshold changes. ROC curves make it possible to
comprehensively assess and compare the models' abilities to distinguish between different levels of
cricket player performance (Brownlee, 2018). The AUC provides a single scalar value summarising the
model's performance, making it easy to compare models quickly.
$$AUC = \int_0^1 TPR \; d(FPR)$$
Where TPR = True Positive Rate
d(FPR) = Differential of False Positive Rate
The AUC can be interpreted as the probability that the model ranks a random positive example
higher than a random negative example, which is particularly relevant for ranking player
performance (Hajian-Tilaki, 2013).
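The ranking interpretation of AUC can be computed directly by comparing every positive example with every negative example. The sketch below is a plain-Python illustration; the text does not state how the continuous Overall_score was turned into two classes, so the threshold of 50 and the sample values are assumptions made purely for the example.

```python
def auc_from_scores(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical data: label a player "high performer" if the actual score >= 50.
actual_scores = [80, 60, 40, 30]
predicted_scores = [75, 45, 50, 20]
labels = [1 if score >= 50 else 0 for score in actual_scores]

auc = auc_from_scores(labels, predicted_scores)
```

In this toy case three of the four positive-negative pairs are ordered correctly, giving an AUC of 0.75. The value depends on the chosen threshold, so any reported AUC should be read together with the class definition behind it.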
Figure 15 - ROC Curve graph
Based on the AUC scores, XGBoost (AUC = 0.92) outperforms Random Forest (AUC = 0.79) in
predicting cricket player performance.
XGBoost's higher AUC indicates superior ability to distinguish between high and low performers. An
AUC of 0.92 means there is a 92% probability that the model ranks a randomly chosen high performer
above a randomly chosen low performer, making it more reliable for performance predictions and
team selection decisions in cricket analytics.
6. Players Overall Performance score for KKR and DC
6.1 Kolkata Knight Riders Current Players Analysis
Using the “rule-based scoring system”, the overall scores for these players are calculated.
Figure 16 - Bar chart for KKR current players
This chart shows the performance scores for the players who played in the 2024 season. The scores
are calculated using the rule-based scoring system on data from 2021 to 2024. The top performers for
KKR are Rinku Singh, Varun Chakaravarthy, Venkatesh Iyer, Shreyas Iyer, Andre Russell, Phil Salt, and
Sunil Narine.
Batting Dominance
KKR's batting lineup has been formidable, with several players making substantial contributions:
1. Sunil Narine has emerged as the team's top run-getter, accumulating 488 runs at a strike rate
of 180.74.
2. Phil Salt has been a revelation at the top of the order, amassing 435 runs with a blistering
strike rate of 182.00.
3. Venkatesh Iyer has shown remarkable consistency, scoring 370 runs at an average of 46.25.
4. Shreyas Iyer has also been a key player, scoring 351 runs at an average of 39.00, further
solidifying the middle order.
5. Ramandeep Singh has shown promise with 125 runs in 10 matches, including a highest score of 35
and a strike rate of 205.88.
All-Round Excellence
The team's all-round capabilities have been a key factor in their success:
Sunil Narine has excelled as an all-rounder, complementing his batting prowess with 17
wickets at an economical rate of 6.69 runs per over.
Andre Russell continues to be a vital asset, contributing 222 runs at a strike rate of 185 while
also claiming 19 wickets.
Bowling strength
KKR's bowling attack has been equally impressive:
1. Varun Chakaravarthy leads the wicket-taking charts with 21 scalps.
2. Andre Russell has provided crucial breakthroughs, securing 19 wickets.
3. Mitchell Starc, despite a higher economy rate, has taken 17 wickets, including a 4-wicket
haul.
4. Harshit Rana has added depth to the bowling lineup with 19 wickets.
5. Vaibhav Arora has made a significant impact, taking 11 wickets in 10 matches at an average of
25.09 and an economy of 8.24.
Emerging Talent
Angkrish Raghuvanshi has shown promise as a future prospect, scoring 163 runs in 10 matches at a
strike rate of 155.23.
Team Balance
The team must carefully weigh retaining star performers against nurturing emerging talents, while
also considering team chemistry and long-term strategy. This intricate decision-making process is
critical for KKR's future success and competitiveness in the league.
6.2 Delhi Capitals Current Players Analysis
Figure 17 - Bar chart for DC current players
This bar graph shows overall scores for Delhi Capitals players based on the rule-based scoring
system for the 2024 IPL season. Rishabh Pant leads with the highest score of 82.45, followed closely
by Jake Fraser-McGurk at 79.97. Key players like Khaleel Ahmed, Kuldeep Yadav, and Mukesh Kumar
also scored well, indicating their significant contributions. The scores reflect a combination of
batting, bowling, and all-round performances throughout the season. Lower scores for some players
suggest either limited opportunities or underperformance, while a few players received no score
due to lack of playing time or poor performance.
Batting Strength
1. Rishabh Pant led the batting charts with 446 runs at an impressive average of 40.55 and a
strike rate of 155.4.
2. Tristan Stubbs showed excellent form, scoring 378 runs at a high average of 54 and a strike
rate of 190.9.
3. Jake Fraser-McGurk emerged as an explosive batsman, scoring 330 runs at a strike rate of
234.04.
4. Abishek Porel contributed significantly with 327 runs at a strike rate of 159.51.
Bowling Strength:
1. Kuldeep Yadav was the standout bowler, taking 16 wickets at an average of 23.37 and an
economy of 8.69.
2. Mukesh Kumar impressed with 17 wickets at an average of 21.64.
3. Axar Patel contributed with 11 wickets with a good economy of 7.65.
4. Khaleel Ahmed took 17 wickets with a decent economy of 9.58.
All-Round Performance:
1. Axar Patel showcased his all-round abilities, scoring 235 runs and taking 11 wickets.
2. Tristan Stubbs, primarily a batsman, also took 3 wickets.
Emerging Talent:
1. Jake Fraser-McGurk stood out as a promising talent with his explosive batting.
2. Abishek Porel showed potential as a consistent run-scorer.
3. Rasikh Salam took 9 wickets in 8 matches.
The team's strength clearly lies in its batting, with multiple players capable of scoring quickly. The
bowling unit, led by Kuldeep Yadav and supported by Mukesh Kumar and Axar Patel, also performed
well. The emergence of young talents like Fraser-McGurk and Porel adds depth to the squad.
Key factors in DC's decision-making process include:
1. Rishabh Pant's leadership and batting prowess
2. The all-round abilities of Axar Patel
3. Kuldeep Yadav's consistent spin bowling performances
4. The explosive batting potential of Jake Fraser-McGurk
5. Tristan Stubbs' impressive batting in the previous season
7. Conclusion
7.1 Squad Opmizaon
Note: As of August 2024, there is no new informaon regarding player retenon rules or the Right to
Match (RTM) policy for IPL 2025. This analysis is based on the 2024 rules. Addionally, the model
predicons used here will not impact future decisions or changes.
47
7.2 KKR Squad Opmizaon and picking best squad
7.2.1 Current players Overall score Predicon
Table 12 - KKR current players predicted overall score
Player                  Predicted Overall score
Andre Russell           62.681908
Angkrish Raghuvanshi    43.662876
Anukul Roy              0.01888789
Harshit Rana            74.76428
Manish Pandey           18.542007
Mitchell Starc          66.87246
Nitish Rana             15.08972
Phil Salt               68.24854
Rahmanullah Gurbaz      18.410284
Ramandeep Singh         47.748146
Rinku Singh             40.41537
Shreyas Iyer            62.031254
Sunil Narine            65.13953
Vaibhav Arora           54.331146
Varun Chakaravarthy     74.76428
Venkatesh Iyer          62.110737
Based on the XGBoost model predictions and team dynamics, here is an analysis of Kolkata Knight Riders'
(KKR) potential retention strategy for IPL 2025:
Core Retentions:
1. Sunil Narine (65.14)
2. Andre Russell (62.68)
3. Shreyas Iyer (62.03) - Captain
4. Varun Chakaravarthy (74.76)
KKR's retenon strategy likely priorizes a blend of consistent performers and recent standouts.
Narine and Russell, with their high predicted scores and long-standing contribuons to the franchise,
are prime candidates. Shreyas Iyer, as the current captain and a solid middle-order batsman, provides
leadership connuity. Varun Chakaravarthy's top predicted score and impressive bowling
performances make him an asset for the team's bowling aack. Right to Match (RTM) Opons:
1. Venkatesh Iyer (62.11)
2. Rinku Singh (40.42)
The RTM card could be used on Venkatesh Iyer, given his versatility and strong predicted
performance. Rinku Singh, despite a lower predicted score, has shown potential as a finisher and
could be a strategic RTM pick based on his past performances and future potential.
Difficult Decisions:
Phil Salt (68.25) and Mitchell Starc (66.87), despite their high predicted scores and
contributions, may be released due to the limit on foreign player retentions.
Nitish Rana (15.09), though injured in 2024 and having a low predicted score, might still be
considered for RTM based on his past performances and experience with the team.
Potential Releases:
Rahmanullah Gurbaz (18.41)
Ramandeep Singh (47.75)
Vaibhav Arora (54.33)
Angkrish Raghuvanshi (43.66)
These players, while showing promise with their predicted scores, may not fit into the retention
strategy given the limited slots available and the need to maintain a balanced squad. This approach
balances maintaining the core team with strategic decisions for future success. The management
faces tough choices, particularly regarding foreign players and emerging talents, as they aim to build
a competitive squad for IPL 2025. The predicted scores provide valuable insight, but the final
decisions will also consider factors such as team chemistry, player roles, and long-term strategy.
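To make the interplay between predicted scores and the overseas limit concrete, the following minimal Python sketch ranks a subset of the players from Table 12 and greedily fills a fixed number of slots. The scores are taken from Table 12; the function name shortlist, the slot count and the overseas cap are hypothetical placeholders chosen for illustration and are not the IPL 2025 retention rules.

```python
# Predicted scores come from Table 12. The slot count and the overseas cap
# used below are hypothetical placeholders, not official IPL 2025 rules.
PLAYERS = [
    # (name, predicted_overall_score, is_overseas)
    ("Andre Russell", 62.681908, True),
    ("Harshit Rana", 74.76428, False),
    ("Mitchell Starc", 66.87246, True),
    ("Phil Salt", 68.24854, True),
    ("Rinku Singh", 40.41537, False),
    ("Shreyas Iyer", 62.031254, False),
    ("Sunil Narine", 65.13953, True),
    ("Varun Chakaravarthy", 74.76428, False),
    ("Venkatesh Iyer", 62.110737, False),
]


def shortlist(players, slots, max_overseas):
    """Greedily pick the highest-scoring players subject to an overseas cap."""
    picked = []
    overseas_used = 0
    # sorted() is stable, so tied scores keep their input order.
    for name, _score, is_overseas in sorted(players, key=lambda p: p[1], reverse=True):
        if len(picked) == slots:
            break
        if is_overseas and overseas_used == max_overseas:
            continue  # overseas quota already filled, skip this player
        picked.append(name)
        overseas_used += int(is_overseas)
    return picked


if __name__ == "__main__":
    print(shortlist(PLAYERS, slots=4, max_overseas=2))
```

Under these assumed limits, a purely score-driven shortlist differs from the core retentions suggested above. This illustrates why captaincy, role balance and franchise history are weighed alongside the model output.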
7.2.2 Potenal Squad Opons for KKR
The squad of KKR in 2024 contains 23 players with 8 overseas players, with 6 uncapped players.
Including 9 batsman with 3 wicket keepers, 4 all-rounders and 10 bowlers. Out of these players 16
players have contributed for teams’ success. Hence the squad is suggested based on the team
possible retenon, possible link to the team and available players and predicted data.
Note: The predicon is based on data is taken from 2021 to 2024
Suggested Squad Opons for KKR IPL 2025
Figure 18 - Bar chart for squad opons KKR
Overseas opons include wicketkeepers Phil Salt and Ryan Rickelton, alongside batsmen Ben Ducke
and Steve Smith. All-rounders Andre Russell, Sunil Narine, Chris Woakes, and David Willey offer
versality. The bowling aack features Josh Hazlewood, Mark Wood, Mitchell Starc, and Jofra Archer,
with emerging talents like Atkinson and Pos.
49
Domesc choices highlight captain Shreyas Iyer, with K.S. Bharat as wicketkeeper. Bang strength
comes from Venkatesh Iyer, Rinku Singh, Nish Rana, Mayank Agarwal, Devdu Padikkal, Rahul
Tripathi, and Manish Pandey. All-rounders Washington Sundar, Krishappa Gowtham, and Shardul
Thakur provide balance.
The bowling lineup includes promising pacers Harshit Rana, Karthik Tyagi, Shivam Mavi, and Mohsin
Khan, alongside experienced opons like Sandeep Warrier. Spin opons feature Varun Chakravarthy
and Mayank Markande (“see Appendix 5”)
This suggested squad opons for KKR in IPL 2025 are based on predicted performance data, potenal
retenon strategies, and team dynamics. The focus is on creang a balanced and compeve team
that leverages both internaonal experience and domesc talent, ensuring KKR remains a formidable
force in the league.
7.3 Delhi Capitals Squad Optimization and picking best squad
7.3.1 Current players Overall score Prediction
Table 13 - DC current players predicted overall scores
Player                  Predicted Overall score
Mukesh Kumar            79.07398
Khaleel Ahmed           75.189224
Rishabh Pant            70.06949
Ishant Sharma           63.152454
Jake Fraser-McGurk      61.636425
Abishek Porel           60.6897
Tristan Stubbs          58.139206
Kuldeep Yadav           57.190998
Axar Patel              56.546555
Anrich Nortje           52.347652
Prithvi Shaw            45.95865
Shai Hope               44.87039
David Warner            43.445335
Rasikh Salam            40.407764
Mitchell Marsh          10.7273855
Gulbadin Naib           1.8736305
Ricky Bhui              0.2515321
Kumar Kushagra          0.096818216
Sumit Kumar             0.006852619
Jhye Richardson         -0.22722892
Lalit Yadav             -0.26193994
Lizaad Williams         -0.60568804
Based on the XGBoost model predictions and team dynamics, here is an analysis of Delhi Capitals' (DC)
potential retention strategy for IPL 2025:
Core Retentions:
Rishabh Pant (70.07) - Captain and wicketkeeper-batsman
Axar Patel (56.55) - All-rounder and consistent performer
Jake Fraser-McGurk (61.64) - Explosive opener
Kuldeep Yadav (57.19) - Key spinner
Right to Match (RTM) Options:
Mukesh Kumar (79.07) - Highest predicted score
Khaleel Ahmed (75.19) - Second-highest predicted score
Tristan Stubbs (58.14) - Potential Player
Abishek Porel (60.69) - Young talent
This strategy acknowledges Stubbs' potential. His inclusion in the RTM list allows DC to potentially
retain a player who has shown exceptional finishing skills and could be a long-term asset.
Difficult Decisions:
Ishant Sharma (63.15), Anrich Nortje (52.35), and Prithvi Shaw (45.96) might still be
released to create room for new strategies.
Potential Releases:
David Warner (43.45)
Shai Hope (44.87)
Mitchell Marsh (10.73)
This approach balances retaining key performers, securing young talent with high potential, and
creating opportunities for significant changes in the squad. It addresses DC's need to move from an
average team to a top contender by making strategic decisions that combine experience (Pant, Axar)
with emerging talents (Fraser-McGurk, Stubbs, Porel).
7.3.2 Potenal Squad Formaon for Delhi Capitals
The squad of DC in 2024 contains 27 players with 8 overseas players, with 11 uncapped players.
Including 12 batsman with 6 wicket keepers, 5 all-rounders and 10 bowlers. Out of these players 22
players have contributed for team and out of 22, 7 players performed very poor. Hence the squad is
suggested based on the team possible retenon, possible link to the team and available players and
predicted data.
Note: The suggeson is based on data is taken from 2021 to 2024
Figure 19 - Bar chart for squad opons DC
51
Overseas opons feature Jake Fraser-McGurk, an explosive batsman; Reeza Hendricks, a consistent
T20 performer; and Tristan Stubbs, a dynamic middle-order batsman. Rassie van der Dussen brings
experience, while all-rounders like Daryl Mitchell, Jason Holder, Jimmy Neesham, Ben Stokes, and
Romario Shepherd add versality. Fast bowlers include Adam Milne, Ma Henry, Mark Wood, and
emerging talent Joshua Lile.
Domesc choices highlight captain Rishabh Pant alongside wicketkeepers Abishek Porel, Anuj Rawat,
and N. Jagadeesan. Bang strength comes from Devdu Padikkal, Mayank Agarwal, and domesc
star Sarfraz Khan. All-rounders like Axar Patel and Shardul Thakur provide balance.
The bowling aack features experienced pacer Bhuvneshwar Kumar, along with le-arm pacer
Khaleel Ahmed and emerging talents like Mukesh Kumar, Vaibhav Arora, and Kuldeep Sen. Spin
opons include Kuldeep Yadav and emerging spinner Hrithik Shokeen (“see Appendix 6”).
8. Findings and Insights of Players and their performance scores
Figure 20 - Average overall score chart
This bar chart shows that bowlers have the highest average overall score, significantly higher than
both batsmen and all-rounders. Interestingly, batsmen and all-rounders have very similar average
scores, with batsmen outperforming all-rounders by only 0.01 points.
8.1 Distribuon of Overall Scores by Player Type
Figure 21 - Distribuon of overall score by player type
52
1. The distribuon shows that bowlers tend to perform beer in terms of Overall score
compared to the other two player types.
2. The similarity between batsmen and all-rounders' scores suggests that all-rounders are not
necessarily at a disadvantage in terms of overall performance despite having to excel in both
bang and bowling.
3. The highest individual Overall_score menoned in the data is for YS Chahal, a bowler, with
87.70, which aligns with the higher average for bowlers.
8.2 Players with more than 300 runs with strike rate more than 130
Figure 22 - Scatter plot for Batsman
This scatter plot highlights high-performing batsmen with impressive statistics. Jos Buttler has
an extraordinary strike rate of 158.62, showcasing his explosive
batting style. Other key players include F du Plessis with 2257 runs at a strike rate of 141.59, and
Shubman Gill, who has 2229 runs with a strike rate of 136.83. Additionally, T Head, Abhishek Sharma,
and H Klaasen also demonstrate strong performances, with strike rates exceeding 140. Fraser-McGurk and
Salt contribute to the aggressive batting lineup, emphasizing the importance of scoring quickly. These
players exemplify a combination of high run totals and aggressive strike rates, making them valuable
assets in competitive cricket, capable of changing the course of a match with their batting prowess.
8.3 Top All-rounders analysis
Figure 23 - Scaer plot for top all-rounders
This plot reveals several high-performing all-rounders in T20 cricket. Players like Rashid Khan, Andre
Russell, Sunil Narine and Ravindra Jadeja stand out with impressive overall scores above 50. These
players excel in both bang and bowling aspects of the game.
Rashid Khan leads the pack with an overall score of 54.49, showcasing his exceponal bowling skills
combined with useful bang contribuons. Andre Russell and Ravindra Jadeja follow closely, known
for their explosive bang and crucial wicket-taking abilies.
Other notable performers include Harshal Patel, and Axar Patel, all scoring above 48. These players
demonstrate the valuable combinaon of aggressive bang (high strike rates) and effecve bowling
(wicket-taking ability and economy).
The scaer plot effecvely visualises the balance between run-scoring and wicket-taking abilies of
these all-rounders, with the added dimension of strike rate represented by point size.
8.4 Top Economical Bowlers Analysis
Figure 24 - Scaer plot for top economical bowlers
54
The analysis of top bowlers in T20 cricket reveals a group of exceponal performers who combine
wicket-taking prowess with economical bowling. YS Chahal leads the pack with an impressive 84
wickets and an overall score of 87.70, showcasing his dominance in the format. The list features a
mix of spin and pace bowlers, including standouts like CV Varun, Mohammed Shami, and Jasprit
Bumrah. Notably, Bumrah boasts the best economy rate at 7.38.
8.5 Density distribuon of overall scores:
Figure 25 - Density distribuon by player types.
The violin plot illustrates overall score distribuons by player type. Batsmen likely show a wider
spread with higher median scores, reflecng diverse roles and run-scoring focus. Bowlers may display
a more compact distribuon with lower median scores, indicang consistent, specialized
performances. All-rounders potenally exhibit a broad range with median scores between batsmen
and bowlers, represenng their dual contribuons. This visualizaon effecvely captures the disnct
performance characteriscs of each player type in cricket.
8.6 Performance metrics of All-rounders
Figure 26 - Performance metrics of top 5 all-rounders
The radar chart effectively compares top all-rounders' strengths and weaknesses. Rashid Khan excels
in bowling with high wicket-taking ability and good economy. Andre Russell shines as an aggressive
batsman with useful bowling skills. Ravindra Jadeja offers a balanced performance with strong
bowling economy and consistent batting. Sunil Narine is primarily a bowler with excellent economy
and the ability to score quick runs. Harshal Patel stands out as a bowling all-rounder with strong
wicket-taking ability and moderate batting contributions. This visualization provides an intuitive
understanding of each player's performance profile across multiple cricket aspects.
8.7 Research Conclusion
The conclusion of the research project emphasizes the significance of strategic squad optimization
for the Kolkata Knight Riders (KKR) and Delhi Capitals (DC) in the context of the upcoming IPL mega
auction in 2025.
Key findings indicate that KKR's successful championship strategies in 2024 stemmed from effective
player retention and utilization, while DC's potential remains underutilized despite having a strong
young core. The study utilized various machine learning models to analyse player performance and
predict outcomes, revealing critical insights into team dynamics and performance metrics.
The research highlights the importance of quantitative analysis in sports, offering a framework for
teams to enhance decision-making processes regarding player selection and strategic planning. By
focusing on the unique challenges faced by both teams, the study provides actionable
recommendations for optimizing squad composition, particularly for DC in leveraging their young
talent effectively.
Overall, this research contributes to the broader field of quantitative sports analytics, offering
valuable insights not only for KKR and DC but also for other T20 franchises globally. It underscores
the evolving nature of team management in cricket, advocating for data-driven approaches to
improve performance and competitiveness in the IPL.
9. Recommendaons
Based on the comprehensive research on opmizing squad composions for IPL teams, parcularly
focusing on Kolkata Knight Riders (KKR) and Delhi Capitals (DC), here are some key
recommendaons:
1. Embrace data-driven decision making: The IPL is evolving rapidly. Teams should leverage the
power of analycs to make smarter choices in player selecon and strategy formulaon. This
approach can provide valuable insights that might not be apparent to the naked eye.
2. Nurture young talent strategically: While it's tempng to always go for established stars,
don't underesmate the potenal of young players. Develop a system to idenfy and groom
emerging talents, giving them the right opportunies to shine. This is especially crucial for
teams like DC, which has a wealth of young talent waing to be unleashed.
3. Balance squad wisely: Cricket is a game of balance, and so is team composion. Aim for a mix
of experienced veterans and energec youngsters, aggressive hiers and steady anchors,
pace bowlers and cray spinners. This diversity can help teams adapt to various match
situaons and condions.
4. Invest in multi-dimensional players: In T20 cricket, players who can contribute to multiple
areas, be it batting, bowling, or fielding, can be game-changers. They provide captains with
more options and can turn matches on their head.
5. Stay adaptable: The IPL is a long tournament with changing conditions. Teams that can
quickly adapt their strategies based on performance data and match situations often come
out on top. Flexibility in approach can be a key differentiator.
6. Optimize the auction strategy: With the mega auction coming up, use predictive models to
inform bidding decisions. Focus on players who not only have good historical stats but also
show potential for growth and fit well within the team's overall strategy.
7. Foster a culture of continuous improvement: Encourage players and coaching staff to
regularly review performance data and work on areas of improvement. Create an
environment where everyone is committed to getting better every day.
8. Look beyond the boundaries: While focusing on the IPL, keep an eye on performances in
other T20 leagues worldwide. This global perspective can help in identifying undervalued
players who might become match-winners for a team.
While data and analytics are powerful tools, cricket is still a human game. The most successful teams
will be those that can blend analytical insights with the intangibles of team spirit, leadership, and
on-field chemistry. By implementing these recommendations, teams can position themselves for success
in the highly competitive world of cricket.
10. References
1. Amala Kaviya, V.S., Mishra, A.S. and Valarmathi, B. (2020) 'Comprehensive Data Analysis and
Prediction on IPL using Machine Learning Algorithms', International Journal on Emerging
Technologies, 11(3), pp. 218-228. (Accessed: 15 August 2024).
2. Bajaj, A. (2023) 'Prediction of Player Performance for IPL and Analyzing the Attributes
Involved, Using Explainable AI', MSc Research Project, National College of Ireland. Available
at: https://norma.ncirl.ie/6564/1/ayushibajaj.pdf (Accessed: 15 August 2024).
3. Berrar, D., Lopes, P. and Dubitzky, W. (2019). Incorporating domain knowledge in machine
learning for soccer outcome prediction. Machine Learning, 108(1), pp.97-126.
4. Board of Control for Cricket in India (2023) Indian Premier League. Available at:
https://www.iplt20.com/ (Accessed: 15 August 2024).
5. Brownlee, J., 2018. How to Use ROC Curves and Precision-Recall Curves for Classification in
Python. [online] Machine Learning Mastery. Available at:
https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/
[Accessed 25 August 2024].
6. Bunker, R.P. and Thabtah, F., 2019. A machine learning framework for sport result prediction.
Applied computing and informatics, 15(1), pp.27-33.
7. Caya, O. and Bourdon, A., 2016. A framework of value creation from business intelligence
and analytics in competitive sports. In 2016 49th Hawaii International Conference on System
Sciences (HICSS) (pp. 1061-1071). IEEE.
8. Cervone, D., D'Amour, A., Bornn, L. and Goldsberry, K., 2016. A multiresolution stochastic
process model for predicting basketball possession outcomes. Journal of the American
Statistical Association, 111(514), pp.585-599.
9. Cherkassky, V. and Ma, Y., 2004. Practical selection of SVM parameters and noise estimation
for SVM regression. Neural Networks, 17(1), pp.113-126.
10. Colwell, D., Jones, B. and Gillett, J. (1991) “75.7 A Markov Chain in Cricket (MCC!),” The
Mathematical Gazette, 75(472), pp. 183–185. Available at: https://doi.org/10.2307/3620249.
11. Duff & Phelps (2022) IPL Brand Valuation Report 2022. Mumbai: Duff & Phelps.
12. Drucker, H., Burges, C.J., Kaufman, L., Smola, A.J. and Vapnik, V., 1997. Support vector
regression machines. Advances in Neural Information Processing Systems, 9, pp.155-161.
13. Economic Times (2023) 'IPL becomes decacorn, valuation soars 75% since 2020', 27
December.
14. ESPN Cricinfo (2023) Indian Premier League. Available at:
https://www.espncricinfo.com/series/indian-premier-league-2023-1345038 (Accessed: 15
August 2024).
15. ESPNcricinfo (2024). How KKR shaped themselves into the awesome class of 2024. [online]
Available at:
https://www.espncricinfo.com/story/ipl-2024-final-kkr-vs-srh-how-kkr-shaped-themselves-into-the-awesome-class-of-2024-1435320
[Accessed 15 Aug. 2024].
16. Fried, G. and Mumcu, C. eds., 2016. Sport analytics: A data-driven approach to sport
business and management. Taylor & Francis.
17. Hajian-Tilaki, K., 2013. Receiver Operating Characteristic (ROC) Curve Analysis for Medical
Diagnostic Test Evaluation. Caspian Journal of Internal Medicine, 4(2), pp.627-635.
18. Hubáček, O., Šourek, G. and Železný, F. (2019). Exploiting sports-betting market using
machine learning. International Journal of Forecasting, 35(2), pp.783-796.
19. Ishi, M., Patil, D.J., Patil, D.N. and Patil, D.V. (2022) 'Winner Prediction in One Day
International Cricket Matches Using Machine Learning Framework: An Ensemble Approach',
Indian Journal of Computer Science and Engineering, 13, pp. 628–641.
20. James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An introduction to statistical
learning. New York: Springer.
21. JioCinema (2023) 'IPL 2023 Final Sets Global Streaming Record', Press Release, 30 May.
22. Kadapa, S. (2013) 'How Sustainable is the Strategy of the Indian Premier League-IPL? A
Critical Review of 10 Key Issues That Impact the IPL Strategy', International Journal of
Scientific and Research Publications, 3.
23. Kemper, C. and Breuer, C., 2016. How efficient is dynamic pricing for sport events? Designing
a dynamic pricing model for Bayern Munich. International Journal of Sport Finance, 11(1),
pp.4-25.
24. Liu, G., Luo, Y., Schulte, O. and Kharrat, T., 2020. Deep soccer analytics: learning an
action-value function for evaluating soccer players. Data Mining and Knowledge Discovery, 34(5),
pp.1531-1559.
25. Loland, S., 2018. Performance-enhancing drugs, sport, and the ideal of natural athletic
performance. The American Journal of Bioethics, 18(6), pp.8-15.
26. McHale, I.G., Scarf, P.A. and Folker, D.E., 2012. On the development of a soccer player
performance rating system for the English Premier League. Interfaces, 42(4), pp.339-351.
27. Memmert, D. and Raabe, D., 2018. Data analytics in football: Positional data collection,
modelling and analysis. Routledge.
28. Ofoghi, B., Zeleznikow, J., MacMahon, C. and Raab, M., 2013. Data mining in elite sports: a
review and a framework. Measurement in Physical Education and Exercise Science, 17(3),
pp.171-186.
29. Peacock, R.H. (1950) “2124. The New Ball in Cricket,” The Mathematical Gazette, 34(307), pp.
58–60. Available at: https://doi.org/10.2307/3610894.
30. Prakash, A., Ghosh, A. and Guha, B. (2019) 'Player Ranking System for IPL Using Machine
Learning', International Journal of Sports Analytics, 5(1), pp. 1-12.
31. Rodrigues, M., Vinay, S., Naik, N., Deshpande, S. and Samant, S. (2019). Data visualization
and toss related analysis of IPL teams and batsmen performances. [online] ResearchGate.
32. Rommers, N., Rössler, R., Goossens, L., Vaeyens, R., Lenoir, M., Witvrouw, E. and D'Hondt, E.,
2020. Risk of acute and overuse injuries in youth elite soccer players: Body size and growth
matter. Journal of Science and Medicine in Sport, 23(3), pp.246-251.
33. Rossi, A., Pappalardo, L., Cintia, P., Iaia, F.M., Fernàndez, J. and Medina, D., 2018. Effective
injury forecasting in soccer with GPS training data and machine learning. PloS one, 13(7),
p.e0201264.
34. Shah, J. (2023) The IPL Story: Cricket, Commerce and Glamour. New Delhi: Rupa Publications.
35. Shah, R., Ghosh, A. and Guha, B. (2016) 'IPL 2016: A Comprehensive Analysis of the
Performance of Teams', International Journal of Sports Analytics, 2(1), pp. 1-15.
36. Seshadri, D.R., Drummond, C., Craker, J., Rowbottom, J.R. and Voos, J.E., 2019. Wearable
devices for sports: New integrated technologies allow coaches, physicians, and trainers to
better understand the physical demands of athletes in real time. IEEE pulse, 10(1), pp.38-43.
37. Smola, A.J. and Schölkopf, B., 2004. A tutorial on support vector regression. Statistics and
Computing, 14(3), pp.199-222.
38. Sportstar (2023) 'IPL media rights sold for Rs 48,390 crore: Disney Star retains TV rights,
Viacom18 bags digital package', The Hindu, 14 June.
39. Thomas, G., Gade, R., Moeslund, T.B., Carr, P. and Hilton, A., 2017. Computer vision for
sports: Current applications and research topics. Computer Vision and Image Understanding,
159, pp.3-18.
40. IPL Governing Council (2024) IPL 2024: Playing Conditions. Mumbai: BCCI.
41. Delhi Capitals (2023) Official Website. Available at: https://www.delhicapitals.in/ (Accessed:
15 August 2024).
42. Gujarat Titans (2023) Official Website. Available at: https://www.gujarattitansipl.com/
(Accessed: 15 August 2024).
43. Kolkata Knight Riders (2023) Official Website. Available at: https://www.kkr.in/ (Accessed: 15
August 2024).
44. Lucknow Super Giants (2023) Official Website. Available at:
https://www.lucknowsupergiants.in/ (Accessed: 15 August 2024).
45. Mumbai Indians (2023) Official Website. Available at: https://www.mumbaiindians.com/
(Accessed: 15 August 2024).
46. Punjab Kings (2023) Official Website. Available at: https://www.punjabkingsipl.in/ (Accessed:
15 August 2024).
47. Rajasthan Royals (2023) Official Website. Available at: https://www.rajasthanroyals.com/
(Accessed: 15 August 2024).
48. Royal Challengers Bangalore (2023) Official Website. Available at:
https://www.royalchallengers.com/ (Accessed: 15 August 2024).
49. Sunrisers Hyderabad (2023) Official Website. Available at:
https://www.sunrisershyderabad.in/ (Accessed: 15 August 2024).
Appendices
Appendix 1 – About IPL Teams
The Indian Premier League (IPL) currently features ten franchise teams, each representing different
cities or states across India (Board of Control for Cricket in India, 2023):
Figure 27 - Chennai Super Kings Logo
Chennai Super Kings (CSK): Known for their consistency and led by the iconic MS Dhoni, CSK has won
five IPL titles (ESPN Cricinfo, 2023).
Figure 28 - Delhi Capitals Logo
Delhi Capitals (DC): Formerly Delhi Daredevils, this team rebranded in 2018 and has been building a
strong young core of Indian talent (Delhi Capitals, 2023).
Figure 29 - Gujarat Titans Logo
Gujarat Titans (GT): One of the newest additions to the IPL, joining in 2022, they made an immediate
impact by winning the title in their debut season (Gujarat Titans, 2023).
Figure 30 - Kolkata Knight Riders Logo
Kolkata Knight Riders (KKR): Co-owned by Bollywood star Shah Rukh Khan, KKR has won three IPL titles
and has a massive fan following (Kolkata Knight Riders, 2023).
Figure 31 - Lucknow Super Giants Logo
Lucknow Super Giants (LSG): Another new franchise that joined in 2022, they've quickly established
themselves as strong contenders (Lucknow Super Giants, 2023).
Figure 32 - Mumbai Indians Logo
Mumbai Indians (MI): The most successful IPL team with five titles, MI is known for its star-studded
lineup and ability to nurture young talent (Mumbai Indians, 2023).
Figure 33 - Punjab Kings Logo
Punjab Kings (PBKS): Formerly Kings XI Punjab, this team rebranded in 2021 and is still seeking its
first IPL title (Punjab Kings, 2023).
Figure 34 - Rajasthan Royals Logo
Rajasthan Royals (RR): The inaugural IPL champions in 2008, RR is known for its ability to unearth and
develop lesser-known players (Rajasthan Royals, 2023).
Figure 35 - Royal Challengers Bengaluru logo
Royal Challengers Bangalore (RCB): Despite boasting some of cricket's biggest names, RCB is still
chasing their first IPL title (Royal Challengers Bangalore, 2023).
Figure 36 - Sunrisers Hyderabad
Sunrisers Hyderabad (SRH): Known for their strong bowling attacks, SRH won the title in 2016 and
has consistently been a playoff contender (Sunrisers Hyderabad, 2023).
Appendix 2 – Team Performance
Mumbai Indians (MI):
Mumbai Indians have played the most matches (261) and won the most games (144) in IPL. Their
success is evident in their 5 IPL titles, the highest among all teams. They've reached the finals 6 times
and made it to the playoffs 11 times, showcasing their consistency (Board of Control for Cricket in
India, 2023).
Royal Challengers Bangalore (RCB):
Despite playing 256 matches and winning 123, RCB has never won an IPL title. They've reached the
finals 3 times and made the playoffs 9 times. Their inability to convert playoff appearances into titles
has been a point of discussion among cricket analysts (ESPN Cricinfo, 2023).
Kolkata Knight Riders (KKR):
KKR has played 252 matches, winning 131. They've clinched 3 IPL titles and reached the finals 4
times. With 7 playoff appearances, they've shown consistency in reaching the later stages of the
tournament (Kolkata Knight Riders, 2023).
Delhi Capitals (DC):
Formerly Delhi Daredevils, DC has played 252 matches but won only 115. They've never won an IPL
title and have reached the finals only once. With 6 playoff appearances, they've struggled to make a
significant impact in the tournament's history (Delhi Capitals, 2023).
Punjab Kings (PBKS):
PBKS has played 246 matches, winning 112. They've never won an IPL title and have reached the finals
only once. With just 2 playoff appearances, they've been one of the less successful teams in the IPL
(Punjab Kings, 2023).
Chennai Super Kings (CSK):
Despite playing fewer matches (239) than some other teams, CSK has been incredibly successful.
They've won 138 matches and 5 IPL titles, equaling MI's record. With 10 final appearances and 13
playoff qualifications, they're considered one of the most consistent teams in IPL history (Chennai
Super Kings, 2023).
Rajasthan Royals (RR):
RR has played 222 matches, winning 112. They won the inaugural IPL in 2008 but haven't replicated
that success since. With 2 final appearances and 5 playoff qualifications, they've had mixed fortunes
in the tournament (Rajasthan Royals, 2023).
Sunrisers Hyderabad (SRH):
SRH entered the IPL later than the original teams but has made a significant impact. They've played
182 matches, winning 88. They've won 1 IPL title and reached the finals 3 times, with 6 playoff
appearances (Sunrisers Hyderabad, 2023).
Gujarat Titans (GT):
As one of the newest teams, GT has played only 45 matches but has already won 28 of them. They
won the IPL in their debut season in 2022 and reached the finals again in 2023, showing immediate
success (Gujarat Titans, 2023).
Lucknow Super Giants (LSG):
Another new entrant, LSG, has played 44 matches and won 24. While they haven't reached a final
yet, they've made it to the playoffs in both their seasons, indicating a strong start to their IPL journey
(Lucknow Super Giants, 2023).
Appendix 3 – Reason for using Linear Regression
1. Connuous Dependent Variable:
Dependent variable, Win_Rao, is a connuous variable, which is suitable for linear
regression analysis.
2. Mulple Independent Variables:
The dataset includes mulple potenal predictors (e.g., Played, Won, Lost, N/R, lost_Rao,
Titles, Finalists, Playoff), making mulple linear regression an appropriate choice.
3. Relaonship Exploraon:
Linear regression can help idenfy which factors have the strongest influence on a team's
win rao, providing valuable insights into team performance.
4. Performance Predicon:
The model can be used to predict a team's expected win rao based on other performance
metrics, which could be useful for team management and strategy planning.
5. Quanfiable Impact:
Linear regression provides coefficients that quanfy the impact of each independent variable
on the win rao, allowing for a clear understanding of each factor's importance.
6. Model Interpretability:
In sports analycs, it's oen crucial to have models that can be easily interpreted by coaches,
managers, and other stakeholders. Linear regression provides this interpretability.
7. Baseline Model:
Even if more complex models might be explored later, linear regression serves as an excellent
baseline model to compare against more sophiscated approaches.
8. Assumpon Tesng:
The dataset allows for tesng various assumpons of linear regression (like linearity,
homoscedascity, and mulcollinearity), which can provide insights into the data's structure.
9. Small Dataset Handling:
With a relavely small dataset (10 observaons), linear regression can sll provide reliable
results, whereas more complex models might overfit.
10. Performance Metrics:
The high R-squared value (0.9969) suggests that linear regression is capturing a significant
amount of variance in the win rao, indicang a good fit for this data.
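As a minimal illustration of the multiple linear regression described above, the sketch below fits Win_Ratio on two predictors (Titles and Finalists) by ordinary least squares, using only the seven teams whose figures are quoted in the team summaries. The restriction to two predictors, the helper names `solve` and `fit_ols`, and the assumption that Win_Ratio equals won divided by played are choices made for this sketch; it is not the model fitted in the study and will not reproduce the reported R-squared.

```python
# Rows: (played, won, titles, finals) as quoted in the team summaries
teams = {
    "DC": (252, 115, 0, 1),
    "PK": (246, 112, 0, 1),
    "CSK": (239, 138, 5, 10),
    "RR": (222, 112, 1, 2),
    "SRH": (182, 88, 1, 3),
    "GT": (45, 28, 1, 2),
    "LSG": (44, 24, 0, 0),
}


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)


# Design matrix with an intercept column; target is the assumed win ratio
X = [[1.0, float(titles), float(finals)] for _, _, titles, finals in teams.values()]
y = [won / played for played, won, _, _ in teams.values()]

coef = fit_ols(X, y)  # [intercept, Titles coefficient, Finalists coefficient]
fitted = [sum(c * v for c, v in zip(coef, row)) for row in X]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

y_mean = sum(y) / len(y)
ss_res = sum(r ** 2 for r in residuals)
ss_tot = sum((yi - y_mean) ** 2 for yi in y)
r_squared = 1 - ss_res / ss_tot

print("coefficients:", [round(c, 4) for c in coef])
print("R-squared:", round(r_squared, 4))
```

With so few observations, each added predictor uses up a large share of the available degrees of freedom, which is why the sketch keeps the predictor set small.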
Appendix 4 – Reason for using Rule Based Scoring System
The rule-based scoring system for cricket aims to quantify a player's overall performance by assigning
points based on various aspects of their game.
1. Holisc Player Assessment:
The code combines mulple performance metrics (runs, average, strike rate, wickets,
economy rate) to create a comprehensive evaluaon of each player.
This approach provides a more complete picture of a player's contribuon than
individual stascs alone.
2. Role-Based Evaluaon:
By categorizing players as Batsmen, Bowlers, or All-rounders, the analysis
acknowledges the different roles within a cricket team.
This allows for fair comparisons between players with similar roles and
responsibilies.
3. Normalised Comparisons:
Normalising metrics enables fair comparisons across different scales (e.g., comparing
runs scored with wickets taken).
This is essenal for creang a unified scoring system that can be applied across
diverse player types.
4. Weighted Performance Metrics:
Assigning weights to different metrics (e.g., giving more importance to total runs for
batsmen or wickets for bowlers) reflects the relave importance of various aspects
of performance.
This nuanced approach aligns the analysis with the strategic priories of T20 cricket.
5. Idenfying All-Round Talent:
The system's ability to evaluate all-rounders separately recognizes the unique value
of players who contribute significantly in both bang and bowling.
6. Ranking Within Categories:
Ranking players within their specific roles (batsman, bowler, all-rounder) provides
context-specific performance assessments.
This is valuable for team selecon, strategy formulaon, and player development.
7. Data-Driven Decision Making:
The analysis provides an objecve, data-driven basis for decisions related to team
composion, player retenon, and strategic planning.
8. Performance Benchmarking:
By creang a standardized scoring system, teams can benchmark player
performances across seasons or compare players from different teams.
9. Talent Idenficaon:
This system can help idenfy undervalued players or rising talents who might not
stand out in tradional stascs but perform well in this comprehensive analysis.
10. Contract and Aucon Strategies:
For leagues like the IPL, this analysis can inform bidding strategies during player
aucons and help in determining player values for contracts.
11. Fan Engagement and Fantasy Sports:
Providing a single, comprehensive score for each player enhances fan engagement
and can be parcularly useful for fantasy cricket leagues.
12. Connuous Performance Monitoring:
This type of analysis can be easily updated with new match data, allowing for
connuous monitoring of player performance throughout a season or across
mulple seasons.
Appendix 4 – Dataset variables
1. match_id: This column contains a unique identifier for each match, allowing for easy
referencing and data management. It helps distinguish between different matches in the
dataset.
2. season: This indicates the specific IPL season during which the match took place. It typically
refers to the year of the tournament, providing context for the data.
3. start_date: This column records the date on which the match commenced. It is essential for
temporal analysis, allowing researchers to study trends over different seasons or specific
time periods.
4. venue: This specifies the location where the match was held. Knowing the venue is
important for analysing home advantage, pitch conditions, and crowd influence on the game.
5. innings: This indicates whether the data pertains to the first or second innings of the match.
In T20 cricket, each team bats for a single innings, and this column helps differentiate
between them.
6. ball: This column records the specific ball number within the over. It provides granular detail
about the match, allowing for in-depth analysis of individual deliveries.
7. bang_team: This specifies the team that is currently bang during the delivery. It is crucial
for understanding team performance and strategies.
8. bowling_team: This indicates the team that is currently bowling. This informaon is essenal
for analysing bowling strategies and effecveness.
9. striker: This column names the batsman facing the current delivery. It is important for
analysing individual player performance and contribuons.
10. non_striker: This indicates the batsman at the other end of the pitch who is not facing the
current delivery. It provides context for partnerships and running between the wickets.
11. extras: This column records the total extra runs scored on that delivery, which can include
wides, no-balls, and other extras. It is important for assessing the impact of extras on the
match outcome.
12. wides: This specifies the number of wide balls bowled during that delivery. Wides contribute
to the extras and can affect the match's flow and scoring.
13. noballs: This indicates the number of no-balls bowled on that delivery. No-balls also
contribute to extras and can lead to free hits, impacting scoring opportunities.
14. byes: This column records the number of byes scored on that delivery, which occur when the
ball passes the wicketkeeper without touching the bat or body of the batsman.
15. legbyes: This specifies the number of leg byes scored, which occur when the ball hits the
batsman's body (excluding the hand) and runs are taken.
16. penalty: This column records any penalty runs awarded to the batting or bowling team,
which can occur due to infractions by the opposing team.
17. wicket_type: This indicates the type of dismissal if a wicket fell on that delivery (e.g., bowled,
caught, LBW). It is crucial for analysing how wickets are taken.
18. player_dismissed: This column names the player who was dismissed on that delivery,
providing insight into key moments in the match.
19. other_wicket_type: This specifies any secondary wicket type, if applicable, for cases where
multiple dismissals occur in a single delivery (e.g., run out).
20. other_player_dismissed: This column names any other player who was dismissed on that
delivery, providing additional context for significant events.
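To show how the ball-by-ball columns listed above can be rolled up into summary figures, the sketch below counts balls faced per striker, extras conceded per bowling team, and dismissals by type. The rows are hypothetical, the `delivery` helper is introduced only for this illustration, and the "over.ball" style used for the `ball` values is an assumption about the file format. Wides are excluded from balls faced, following the usual scoring convention.

```python
from collections import Counter


def delivery(ball, striker, extras=0, wides=0, wicket_type="", player_dismissed=""):
    """Build one hypothetical row using a subset of the listed columns."""
    return {
        "match_id": 1,
        "innings": 1,
        "ball": ball,  # placeholder label; not used in the calculations below
        "batting_team": "Team A",
        "bowling_team": "Team B",
        "striker": striker,
        "extras": extras,
        "wides": wides,
        "wicket_type": wicket_type,
        "player_dismissed": player_dismissed,
    }


deliveries = [
    delivery("0.1", "Player X"),
    delivery("0.2", "Player X", extras=1, wides=1),  # wide: re-bowled
    delivery("0.2", "Player X", wicket_type="caught", player_dismissed="Player X"),
    delivery("0.3", "Player Y", extras=1),  # e.g. a leg bye
    delivery("0.4", "Player Y", wicket_type="bowled", player_dismissed="Player Y"),
]

# Balls faced per striker: a wide does not count as a ball faced
balls_faced = Counter(d["striker"] for d in deliveries if d["wides"] == 0)

# Total extras conceded by each bowling team
extras_conceded = Counter()
for d in deliveries:
    extras_conceded[d["bowling_team"]] += d["extras"]

# Number of dismissals by type, skipping deliveries without a wicket
dismissals = Counter(d["wicket_type"] for d in deliveries if d["wicket_type"])

print(dict(balls_faced))
print(dict(extras_conceded))
print(dict(dismissals))
```

The same pattern extends to grouping by season or venue when those columns are added to each row.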
Appendix 5 – Potential squad Options for KKR
Overseas Players Options
Wicketkeepers:
Phil Salt: An explosive batsman who can change the game.
Rickelton: A solid option with potential.
Jamie Smith: A young talent for future growth.
Batsmen:
Ben Duckett: A dynamic player with a strong T20 record.
Steve Smith: An experienced batsman known for his technique and leadership.
All-rounders:
Andre Russell: A key all-rounder with match-winning capabilities.
Sunil Narine: A long-time KKR asset with both batting and bowling skills.
Chris Woakes: Adds versatility and experience to the squad.
David Willey: Offers depth and balance as an all-rounder.
Bowlers:
Josh Hazlewood: Known for his precision and effectiveness.
Mark Wood: Brings express pace and aggression.
Mitchell Starc: A premier fast bowler with the ability to take wickets.
Atkinson: A developing talent with potential.
Potts: An emerging bowler to consider.
Jofra Archer: A high-impact player with a proven track record.
Domesc Players Opons
Wicketkeepers:
K.S. Bharat: A reliable opon for the wicketkeeping role.
Batsmen:
Shreyas Iyer: The captain and a crucial middle-order batsman.
Venkatesh Iyer: Offers flexibility in the batting lineup.
Rinku Singh: A promising finisher with a bright future.
Nitish Rana: Experienced and capable of anchoring the innings.
Mayank Agarwal: Adds stability and experience.
Devdutt Padikkal: A young talent with strong potential.
Rahul Tripathi: Known for his aggressive batting style.
Manish Pandey: Brings experience and depth to the batting order.
All-rounders:
Washington Sundar: Valuable for his bowling and batting skills.
Krishnappa Gowtham: Adds depth and versatility.
Shardul Thakur: Known for his ability to contribute in multiple ways.
Bowlers:
Harshit Rana: An emerging fast bowler with promise.
Varun Chakravarthy: A key spinner with wicket-taking ability.
Kartik Tyagi: Young and talented fast bowler.
Shivam Mavi: Known for his pace and skill.
Sakariya: Adds depth to the pace attack.
Mohsin Khan: A promising young bowler.
Sandeep Warrier: Experienced and reliable.
Mayank Markande: Spin option with experience.
Appendix 6 – Potential squad Options for DC
Overseas Players Options
Jake Fraser-McGurk: A young talent with explosive batting capabilities, adding depth to the
batting lineup.
Reeza Hendricks: A consistent performer in T20 cricket, known for his ability to anchor
innings and score quickly.
Ryan Rickelton: An emerging batsman with a strong domestic record, capable of playing
aggressive innings.
Tristan Stubbs: A dynamic batsman with power-hitting skills, ideal for the middle order.
Rassie van der Dussen: A seasoned international player known for his technique and ability
to play under pressure.
Daryl Mitchell: A versatile all-rounder who can contribute with both bat and ball, enhancing
team balance.
Jason Holder: An experienced all-rounder with a proven track record in T20s, offering both
bowling and batting depth.
Jimmy Neesham: A dynamic all-rounder known for his big-hitting ability and useful seam
bowling.
Ben Stokes: A match-winner with exceptional all-round skills, capable of changing games
single-handedly.
Romario Shepherd: A powerful all-rounder who can contribute significantly with the bat and
provide pace bowling options.
Adam Milne: A fast bowler with express pace, known for his wicket-taking ability in T20
cricket.
Matt Henry: A skilled bowler with experience in international cricket, effective in both
powerplays and death overs.
Mark Wood: An aggressive fast bowler known for his pace and ability to take key wickets.
Joshua Little: An emerging talent with potential as a left-arm fast bowler.
Domestic Players Options
Rishabh Pant: The captain and wicketkeeper, known for his explosive batting and
game-changing abilities.
Abishek Porel: A promising wicketkeeper-batsman, providing depth in the lower order.
Anuj Rawat: A young wicketkeeper with potential, looking to make an impact in the IPL.
N. Jagadeesan: A reliable wicketkeeper-batsman with a solid domestic record.
Devdutt Padikkal: A talented batsman with a strong ability to score quickly, adding firepower
to the top order.
Mayank Agarwal: An experienced opener known for his solid technique and ability to build
innings.
Sarfraz Khan: A domestic star with a strong record, capable of performing under pressure.
Axar Patel: A key all-rounder known for his bowling and handy batting, providing balance to
the team.
Shardul Thakur: An all-rounder who can contribute with both bat and ball, known for his
wicket-taking ability.
Khaleel Ahmed: A left-arm pacer with experience in T20 cricket, effective in the powerplay.
Mukesh Kumar: An emerging fast bowler with potential, looking to establish himself in the
IPL.
Vaibhav Arora: A promising young bowler with a good domestic record.
Kuldeep Sen: A fast bowler with the ability to take wickets, adding depth to the bowling
lineup.
Sandeep Warrier: An experienced bowler providing additional options in the pace attack.
Bhuvneshwar Kumar: A seasoned pacer known for his swing bowling and experience in
high-pressure situations.
Tanush Kotian: An all-rounder with potential, offering flexibility to the squad.
Kuldeep Yadav: A skilled spinner known for his wicket-taking ability and variations.
Hrithik Shokeen: An emerging spinner with potential to contribute to the middle overs.