Advanced Machine Learning Techniques for Optimizing Sports Team Composition: A Comprehensive Predictive Analytics Framework

Submied for Business Analycs Research Project to Aston University.

Submied in September 2024

Niranjan Gopalan,

Aston Business School, Aston University.

Master of Science in Business Analytics

Declaration

I declare that I have personally prepared this research report titled "Comprehensive Analysis of Indian Premier League and Projecting the Optimal Squad for the 2025 Season." This work has not been submitted for any other degree or qualification, nor has it appeared in any previously published document. The research described here is my own, conducted personally unless otherwise stated. All sources of information are duly acknowledged through references. This study contributes original insights to cricket analytics, particularly in IPL team management and player selection strategies.

Acknowledgement

I would like to express my sincere gratitude to the sports performance analysts and research engineers worldwide whose research and insights have significantly informed and enriched my understanding of sports analytics. I am deeply thankful to my mother, Latha, for her unwavering encouragement and belief in my abilities, and to my father, Gopalan, whose dedication to public welfare and mathematics literacy has been a constant source of inspiration. I also extend my heartfelt appreciation to my supervisor, Dr. Rizwan Ahmed, for his guidance and support throughout this research project. Lastly, I would like to thank all the individuals who supported me during my research, as their contributions have been essential to the completion of this work.

Table of Contents

Abstract
List of Figures
List of Tables
1. Introduction
1.1 Background
1.2 Team performances over the years
1.3 Research Objective and Need for Study:
1.4 Scopes
1.5 Limitations of the study:
2. Significance of the Study
2.1 Structure of the research
3. Literature Review
3.1 Adoption of Machine Learning on Sports
3.2 Adoption of Machine Learning on Cricket
3.3 Summary of Literature Review:
4. Research Methodology
4.1 Dataset and Approach Overview:
4.2 Data Processing
4.2.1 Data Filtering
4.2.2 Player Data Extraction
4.3 Batting Statistics Calculation
4.4 Bowling Statistics Calculation
4.5 Domain Knowledge
5. Quantitative and Predictive analysis
5.1 Win Ratio Analysis of Teams
5.1.1 Need for Analysis
5.1.2 Objective
5.1.3 Data overview
5.1.4 Quantitative Analysis
5.1.5 Correlation Analysis:
5.1.6 Linear Regression Model:
5.1.7 Prediction Analysis
5.1.8 Prediction Findings:
5.2 Rule-based scoring system combined with normalisation and weighted aggregation
5.2.1 Need for Analysis
5.2.2 Objective
5.2.3 Data Overview
5.2.4 Quantitative Analysis
5.3 Random Forest model to predict the overall score
5.3.1 Need for Analysis:
5.3.2 Objective of the Model
5.3.3 Data Overview
5.3.4 Random Forest Regression Model
5.3.5 Model Visualisation
5.4 Random Forest Model using RandomizedSearchCV
5.4.1 Objective of the model
5.4.2 Representation of Random Forest model with RandomizedSearchCV
5.4.3 Evaluation
5.4.4 Model Visualisation
5.5 XG Boosting Method
5.5.1 Objective of this model
5.5.2 Representation of the Model
5.5.3 Model Visualisation
5.6 Enhanced XG Boosting model
5.6.1 Representation of the model
5.7 Support Vector Regression Model
5.7.1 Objective of the model
5.7.2 Representation of the model
5.7.3 Model Visualisation
5.7.3 Distribution of Prediction errors
5.8 Machine Learning Models and their accuracy results
5.8.1 Evaluation
5.8.2 Fine-Tuning
5.8.3 Model Testing
5.8.4 Random Forest Model 1 Prediction
5.8.5 XG Boost Model 1 Prediction
5.8.6 Performance Distribution Curves
5.8.7 ROC curves
6. Players Overall Performance score for KKR and DC
6.1 Kolkata Knight Riders Current Players Analysis
6.2 Delhi Capitals Current Players Analysis
7. Conclusion
7.1 Squad Optimization
7.2 KKR Squad Optimization and picking best squad
7.2.1 Current players Overall score Prediction
7.2.2 Potential Squad Options for KKR
7.3 Delhi Capitals Squad Optimization and picking best squad
7.3.1 Current players Overall score Prediction
7.3.2 Potential Squad Formation for Delhi Capitals
8. Findings and Insights of Players and their performance scores
8.1 Distribution of Overall Scores by Player Type
8.2 Players with more than 300 runs with strike rate more than 130
8.3 Top All-rounders analysis
8.4 Top Economical Bowlers Analysis
8.5 Density distribution of overall scores:
8.6 Performance metrics of All-rounders
8.7 Research Conclusion
9. Recommendations
10. References
Appendices
Appendix 1 – About IPL Teams
Appendix 2 – Team Performance
Appendix 3 – Reason for using Linear Regression
Appendix 4 – Reason for using Rule Based Scoring System
Appendix 4 – Dataset variables
Appendix 5 – Potential squad Options for KKR
Appendix 6 – Potential squad Options for DC

Abstract

This research project focuses on optimizing squad compositions for the Kolkata Knight Riders (KKR) and Delhi Capitals (DC) in preparation for the 2025 Indian Premier League (IPL) mega auction. The study employs advanced cricket analytics and machine learning models to provide strategic insights for team building and performance enhancement. Since its inception in 2008, the IPL has revolutionized cricket, becoming one of the most popular and lucrative sports leagues globally, featuring ten franchise teams competing in a high-stakes, fast-paced T20 format. The research methodology involves comprehensive data processing, including extraction from reliable sources, cleaning, and preprocessing. Various machine learning models, such as linear regression, random forest, XG boosting, and support vector regression, are utilized to analyse player performance and predict outcomes. Key analyses include Win Ratio Analysis of Teams, a rule-based scoring system combining normalisation and weighted aggregation, and Random Forest Models optimized using RandomizedSearchCV. The study evaluates these models using performance metrics like ROC curves and performance distribution curves to ensure robust and accurate predictions. By analysing KKR's championship-winning strategies in 2024 and DC's approach to team building, the research provides a comparative analysis of different management philosophies and their impact on team performance.

The study's significance extends beyond the IPL, offering valuable insights for other T20 leagues like The Hundred and Big Bash League (BBL), as well as potential applications in sports like football and baseball. Key findings highlight the importance of strategic player retention, the influence of external factors on performance, and the challenges of predicting player auction values. For KKR and DC specifically, the research offers analysis of current player performances, identification of key strengths and weaknesses, and recommendations for optimizing squad potential, with a particular focus on helping DC better utilize their young talent. The research contributes to the growing field of quantitative sports analytics, demonstrating the importance of data-driven decision-making in modern sports management. It provides a framework for improving player selection, strategy formulation, and overall team management across various sports disciplines. Limitations of the study include the challenges of evaluating new players with limited IPL data, accurately predicting auction values, and accounting for unforeseen circumstances. In conclusion, this research project offers a comprehensive analysis of cricket analytics, emphasizing the importance of data-driven strategies in sports management. By focusing on the squad optimization of KKR and DC, the study provides valuable insights that can be applied to other cricket tournaments and sports, underscoring the potential of analytics to revolutionize team management and performance optimization in the competitive world of sports.

Keywords: Indian Premier League, Kolkata Knight Riders, Delhi Capitals, Squad Optimization, Player Performance, Machine Learning, Data Analysis, Predictive Modelling, Win Ratio, XG Boosting, Random Forest, Batting Statistics, Bowling Statistics, Performance Metrics, Auction Strategies, Team Management, Cricket Analytics, Statistical Methods, Player Selection, Team Performance

Word count: Around 11,200 words

List of Figures

Figure 1 - IPL Logo
Figure 2 - Correlation graph for Linear Reg Model
Figure 3 - Prediction Graph
Figure 4 - Correlation graph for Random Forest Model
Figure 5 - Actual vs Predicted Graph for RF 1
Figure 6 - Residual graph for RF 1
Figure 7 - Prediction error histogram for RF 1
Figure 8 - Actual vs predicted graph for RF 2
Figure 9 - Predicted error histogram for RF 2
Figure 10 - Actual vs predicted graph for XGBoost model 1
Figure 11 - Prediction error histogram for XGBoost model 1
Figure 12 - Actual vs predicted graph for SVR
Figure 13 - Prediction error histogram for SVR
Figure 14 - Performance distribution curve for RF 1 and XGBoost
Figure 15 - ROC Curve graph
Figure 16 - Bar chart for KKR current players
Figure 17 - Bar chart for DC current players
Figure 18 - Bar chart for squad options KKR
Figure 19 - Bar chart for squad options DC
Figure 20 - Average overall score chart
Figure 21 - Distribution of overall score by player type
Figure 22 - Scatter plot for Batsman
Figure 23 - Scatter plot for top all-rounders
Figure 24 - Scatter plot for top economical bowlers
Figure 25 - Density distribution by player types
Figure 26 - Performance metrics of top 5 all-rounders
Figure 27 - Chennai Super Kings Logo
Figure 28 - Delhi Capitals Logo
Figure 29 - Gujarat Titans Logo
Figure 30 - Kolkata Knight Riders Logo
Figure 31 - Lucknow Super Giants Logo
Figure 32 - Mumbai Indians Logo
Figure 33 - Punjab Kings Logo
Figure 34 - Rajasthan Royals Logo
Figure 35 - Royal Challengers Bengaluru logo
Figure 36 - Sunrisers Hyderabad

List of Tables

Table 1 - Team performance table
Table 2 - Dataset Columns
Table 3 - Data for team's performance overview
Table 4 - Win Ratio of all teams
Table 5 - Team prediction with difference
Table 6 - Overall score table
Table 7 - All models evaluation metrics
Table 8 - Evaluation metrics after fine-tuning
Table 9 - Sample data for model testing
Table 10 - Random Forest Model 1 prediction results
Table 11 - XG Boost Model 1 Prediction results
Table 12 - KKR current players predicted overall score
Table 13 - DC current players predicted overall scores

1. Introduction

1.1 Background

The Indian Premier League (IPL) has revolutionized cricket since its inception in 2008, becoming one of the most popular and lucrative sports leagues globally (Board of Control for Cricket in India, 2023). This professional Twenty20 cricket tournament features ten franchise teams representing different Indian cities or states, competing in a high-stakes, fast-paced format that has captured the imagination of fans worldwide.

Figure 1 - IPL Logo

Source: iplt20

The IPL's success can be attributed to several factors. Firstly, its star-studded lineups attract top cricket talent from around the world. Each team can field up to four overseas players in their playing eleven, creating a melting pot of international stars alongside India's best cricketers (ESPN Cricinfo, 2023). This combination of global and local talent has helped the IPL become a cricketing spectacle that consistently ranks among the top sports leagues in terms of average attendance.

The tournament's economic impact has been substantial. In 2022, the league's brand value was estimated at ₹90,038 crore (US$11 billion) (Duff & Phelps, 2022). Its contribution to India's GDP is significant, with the 2015 season alone adding ₹1,150 crore (US$140 million) to the economy (BCCI, 2016). The league's valuation has skyrocketed, reaching US$10.9 billion in December 2022 and achieving "decacorn" status (Economic Times, 2023).

The IPL's popularity is reflected in its lucrative media rights deals. For the 2023-2026 seasons, the league sold its media rights for US$6.4 billion, valuing each match at $13.4 million (Sportstar, 2023). The tournament has also broken viewership records, with the 2023 final becoming the most streamed live event on the internet, attracting 32 million viewers (JioCinema, 2023).

The Indian Premier League has transformed cricket from a traditional sport into a global entertainment spectacle. Its blend of star power, economic impact, and innovative gameplay has cemented its position as a powerhouse in the world of sports, influencing the way cricket is played and consumed around the globe (see Appendix 1).

Each team can have a maximum of 25 players in their squad, with no more than eight overseas players. The playing eleven for each match can include up to four overseas players, ensuring a balance of international stars and domestic talent (IPL Governing Council, 2024).

The IPL's team structure, with its mix of international stars and Indian talent, creates a unique and exciting cricketing spectacle that has captured the imagination of fans worldwide (Shah, 2023).

1.2 Team performances over the years

Table 1 - Team performance table

Team Name   Played   Won   Lost   N/R   Titles   Finalists   Playoff
MI          261      144   117    0     5        6           11
RCB         256      123   129    4     0        3           9
KKR         252      131   120    1     3        4           7
DC          252      115   135    2     0        1           6
PK          246      112   134    0     0        1           2
CSK         239      138   99     2     5        10          13
RR          222      112   107    3     1        2           5
SRH         182      88    94     0     1        3           6
GT          45       28    17     0     1        2           2
LSG         44       24    19     1     0        0           2

This table provides a comprehensive overview of the performance of Indian Premier League (IPL) teams since the league's inception in 2008. A team-by-team breakdown and analysis of the data is given in Appendix 2.

This table highlights the varying degrees of success and consistency among IPL teams. While some teams like MI and CSK have dominated with multiple titles and consistent playoff appearances, others like RCB and PK have struggled to convert their opportunities into championships. The newer teams, GT and LSG, have shown promise in their short IPL careers, adding excitement to the league's competitive landscape (Shah, 2023).

1.3 Research Objective and Need for Study:

A major focus of this study is the upcoming mega auction for the 2025 IPL season. This auction will result in most players being released, with teams allowed to retain only four players, including a maximum of two foreign players. This significant event provides an opportunity to analyse and optimize squad-building strategies for Kolkata Knight Riders (KKR) and Delhi Capitals (DC).

Among the 10 teams in the Indian Premier League (IPL), this study aims to analyse, predict, and optimize the squads for two specific teams: Kolkata Knight Riders (KKR) and Delhi Capitals (DC). KKR, the 2024 IPL champions, boasts one of the strongest squads in the league. In contrast, DC possesses a power-packed young squad but has struggled to utilize their potential effectively. The research will focus on the following aspects:

1. Squad Analysis: Examine the composition of both KKR and DC squads, identifying key strengths and weaknesses based on their performances.
2. Quantitative analysis: Use statistical methods to compare player performances, team strategies, and match outcomes for both KKR and DC.
3. Performance Prediction: Develop models to forecast player and team performance based on historical data and current squad dynamics.
4. Squad Optimization: Propose strategies for both teams to maximize their squad potential, with a particular emphasis on helping DC better utilize their young talent.
5. Evaluate the current squads of KKR and DC to identify potential retention candidates.
6. Analyse the impact of retaining only four players on team dynamics and performance.
7. Success Factors: Investigate the elements that contributed to KKR's championship win in 2024, including team balance, leadership, and player utilization.

By conducting this research, the aim is to provide valuable insights into effective squad building, talent utilization, and performance optimization in the highly competitive environment of the IPL. The findings could offer strategic guidance not only for KKR and DC but also for other T20 cricket franchises globally.
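The retention rule described in this section (at most four retained players, of whom at most two may be overseas) can be illustrated with a small exhaustive search. This is a minimal sketch, not the method used in the study: the candidate names and scores below are invented placeholders, and summing a single score per player is an assumed stand-in for whatever overall score the later models produce.

```python
from itertools import combinations

# Retention limits taken from the text above.
MAX_RETAINED = 4
MAX_OVERSEAS_RETAINED = 2


def best_retention(players):
    """Return the names and total score of the best rule-compliant retention.

    players: list of (name, is_overseas, score) tuples.
    """
    best, best_score = (), float("-inf")
    for k in range(MAX_RETAINED + 1):
        for combo in combinations(players, k):
            # Skip combinations that break the overseas limit.
            if sum(p[1] for p in combo) > MAX_OVERSEAS_RETAINED:
                continue
            total = sum(p[2] for p in combo)
            if total > best_score:
                best, best_score = combo, total
    return [p[0] for p in best], best_score


# Hypothetical candidates with made-up scores (not real players or data).
candidates = [
    ("Overseas A", True, 90.0),
    ("Overseas B", True, 85.0),
    ("Overseas C", True, 80.0),
    ("Domestic A", False, 70.0),
    ("Domestic B", False, 60.0),
    ("Domestic C", False, 50.0),
]

retained, total = best_retention(candidates)
print(retained, total)
```

With these placeholder scores the three highest-rated candidates are all overseas, so the overseas limit forces the third of them out in favour of a domestic player. Exhaustive search is practical here because a squad of 25 yields only a few thousand combinations of four or fewer players.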

1.4 Scopes

1. Player performance prediction: Develop models to predict player performance based on historical data, considering factors like batting and bowling statistics.
2. Team efficiency analysis: Evaluate the efficiency of teams using techniques like Data Envelopment Analysis (DEA) and Structural Equation Modeling (SEM).
3. Strategic player retention: Analyse strategies for the upcoming mega auction, focusing on optimal player retention decisions for KKR and DC.
4. Impact of external factors: Examine how factors like weather, match location, and stadium conditions affect player and team performance.
5. Comparative analysis: Compare KKR and DC's squad building and utilization strategies with other successful IPL teams.

1.5 Limitations of the study:

1. Limited data for new players: Relying solely on IPL data may limit the evaluation of new or emerging players who haven't played in the league before.
2. Complexity of player valuation: Accurately predicting player auction values and performance can be challenging due to multiple influencing factors.
3. Changing league dynamics: The study's findings may be affected by evolving league rules, team strategies, and player availability.
4. External factors: Unforeseen circumstances like injuries, player form, or off-field issues can impact team performance and are difficult to account for in models.
5. Limited scope: Focusing on only two teams (KKR and DC) may limit the generalizability of findings to other IPL teams or T20 leagues.
6. Time constraints: The dynamic nature of T20 cricket and frequent player transfers may make long-term predictions challenging.
7. Multi-objective optimization: It is difficult to formulate team selection as a multi-objective optimization problem while also accounting for budget constraints.

These scopes and limitations help frame the research objectives and methodology for analysing and optimizing the squads of KKR and Delhi Capitals in the context of the upcoming IPL mega auction.

2. Significance of the Study

1. Performance Optimization: By analysing the factors contributing to KKR's success and DC's underperformance, the study can offer valuable insights into how teams can better utilize their player resources, especially young talent (Ishi et al., 2022).
2. Quantitative Sports Analytics: The research contributes to the growing field of quantitative sports analytics in cricket, which has become increasingly important for team management and strategy development (Jana et al., 2021).
3. Player Improvement: The insights gained from this study can help improve individual player performance by identifying key areas for development based on data-driven analysis (Techiexpert, 2024).
4. Global Cricket Applications: The findings could be valuable not only for IPL teams but also for national cricket boards such as the England and Wales Cricket Board (ECB), Cricket Australia, and New Zealand Cricket, helping them in player selection and team strategy for international competitions (Kalgotra et al., 2014).
5. Predictive Modelling: This research will contribute to the development of more accurate predictive models for player and team performance in T20 cricket, drawing inspiration from advanced analytics techniques used in football. These models can leverage machine learning algorithms and big data analysis, like those employed in predicting football match outcomes and player performance (Hubáček et al., 2019; Berrar et al., 2019). Such approaches can be valuable for team management, fantasy cricket enthusiasts, and sports analytics professionals, potentially improving decision-making processes in player selection and strategy formulation.
6. Comparative Analysis: By conducting an in-depth comparison of Kolkata Knight Riders' championship-winning strategies in 2024 and Delhi Capitals' approach to team building, this study will provide valuable insights into the efficacy of different management philosophies and their impact on team performance in the IPL (ESPNcricinfo, 2024). This analysis will highlight how KKR's meticulously designed squad, enabling aggressive batting without compromising depth, contrasts with DC's focus on nurturing young talent, offering a comprehensive perspective on successful team construction in T20 cricket.

2.1 Structure of the research

The goal of this study is to predict the optimal squad composition for KKR and DC in the upcoming Indian Premier League (IPL) season. The research methodology encompasses several key steps:

1. Data Extraction: Gathering comprehensive player statistics and performance data from various reliable sources.

2. Data Cleaning: Preprocessing the collected data to ensure accuracy, consistency, and relevance for analysis.

3. Descriptive Analytics: Conducting a thorough exploratory data analysis to understand the underlying patterns and trends in player performances.

4. Data Visualisation: Creating insightful visual representations of the data to facilitate easier interpretation and identification of key insights.

5. Feature Engineering: Developing new variables or transforming existing ones to enhance the predictive power of the models.

6. Player Score Prediction: Utilizing advanced statistical and machine learning techniques to forecast individual player performances based on historical data and relevant factors.

7. Hyperparameter Tuning: Optimizing the predictive models through rigorous hyperparameter adjustment to improve accuracy and reliability.

8. Model Performance Testing: Evaluating the performance of the tuned models using metrics and techniques such as cross-validation, ROC curves, and performance distribution curves to ensure robustness and accuracy.

9. Squad Optimization: Employing the refined predictive models to determine the most effective squad compositions for KKR and DC, considering various constraints.

10. Stakeholder Recommendations: Formulating data-driven, actionable recommendations for team management, coaches, and other relevant stakeholders to inform their decision-making processes in player selection and team strategy.

This comprehensive approach aims to leverage advanced analytics to provide valuable insights and strategic advantages in the highly competitive landscape of the IPL.

3. Literature Review

The Indian Premier League (IPL) has not only revolutionized cricket but has also become a fertile ground for sports analytics since its inception in 2008. As the league has grown in stature, so has the sophistication of the analytical approaches used to understand and predict its dynamics.

The Economic Catalyst

Kadapa (2013) highlighted the IPL's massive economic footprint, underscoring the financial imperative driving the adoption of advanced analytics. With billions at stake, teams and stakeholders are increasingly turning to data-driven approaches to gain a competitive edge.

Evolution of Analytical Approaches

The journey of IPL analytics has been one of continuous refinement. Shah et al. (2016) laid important groundwork with their comprehensive analysis of IPL data from 2008 to 2015. Their work demonstrated the potential of machine learning in decoding the complexities of T20 cricket, setting the stage for more advanced studies. Building on this foundation, Prakash et al. (2019) developed a nuanced player ranking system using machine learning algorithms. Their model's success in predicting player rankings with high accuracy showcased the power of analytics in informing team selection strategies.

The Human Element in Data

While numbers are at the heart of analytics, recent research has emphasized the importance of translating data into actionable insights. Ishi et al. (2022) took a significant step in this direction by using machine learning for player classification. Their work helps bridge the gap between raw data and on-field strategy, providing coaches and managers with a more intuitive understanding of player capabilities.

Predictive Power and Its Limitations

The holy grail of sports analytics is accurate prediction, and IPL research has made significant strides in this area. Amala Kaviya et al. (2020) achieved an impressive 81% accuracy in predicting match outcomes. However, as any cricket fan knows, the game's unpredictability is part of its charm. These models, while powerful, serve as tools to inform decision-making rather than crystal balls.

Transparency in Analytics

Recognizing the need for interpretable results, Bajaj (2023) explored the use of Explainable AI techniques. This approach not only predicts performance but also elucidates the factors influencing these predictions, making the insights more accessible and actionable for non-technical stakeholders.

Visualising Success

In the fast-paced world of T20 cricket, the ability to quickly grasp complex information is crucial. Rodrigues et al. (2019) addressed this need by focusing on data visualisation techniques. Their work highlights how effective visual representation can transform raw data into strategic insights, accessible to everyone from analysts to players.

The Road Ahead

1. Real-time analytics during matches could revolutionize in-game decision-making.

2. Integration of non-traditional data sources, such as social media sentiment and player biometrics, may provide a more holistic view of performance.

3. More sophisticated player valuation models could transform auction strategies.

4. The application of deep learning to video analysis promises to unlock new insights into player techniques and strategies.

The field of IPL analytics is not just about numbers; it is about enhancing the beautiful game of cricket. As analytics continue to evolve, they promise to enrich our understanding and enjoyment of the sport, providing fans, players, and managers alike with new perspectives on the game we love.

3.1 Adoption of Machine Learning on Sports

The adoption of Machine Learning (ML) in sports has seen significant growth in recent years, revolutionizing various aspects of athletic performance, strategy, and management.

Performance Analysis and Prediction:

ML has been extensively applied to analyse and predict athletic performance. Ofoghi et al. (2013) demonstrated the use of ML algorithms to predict medal-winning performances in sprint kayaking, achieving an accuracy of 80%. Similarly, Bunker and Thabtah (2019) reviewed ML applications in predicting outcomes of various sports, finding that ensemble methods often outperform individual algorithms in accuracy.

Injury Prediction and Prevention:

A critical area where ML has shown promise is in injury prediction and prevention. Rossi et al. (2018) developed an ML model to predict injuries in soccer players, achieving an accuracy of 80% in identifying high-risk athletes. Building on this, Rommers et al. (2020) used ML techniques to predict injuries in youth soccer players, demonstrating the potential of these methods in protecting young athletes.

Tactical Analysis:

ML has transformed tactical analysis in team sports. Memmert and Raabe (2018) explored how ML algorithms can analyse complex patterns in soccer matches, providing coaches with insights that were previously unattainable through traditional methods. In basketball, Cervone et al. (2016) used ML to evaluate decision-making in real time, offering a new perspective on player effectiveness beyond traditional statistics.

Player Recruitment and Scouting:

The application of ML in talent identification and recruitment has gained traction. McHale et al. (2012) developed an ML model to assess player performance in soccer, which has implications for scouting and transfer decisions. More recently, Liu et al. (2020) used deep learning techniques to analyse player movements in basketball, providing a data-driven approach to talent evaluation.

Fan Engagement and Business Operations:

ML has also found applications in enhancing fan engagement and optimizing business operations in sports. Fried and Mumcu (2016) explored how ML can be used to personalize fan experiences and improve marketing strategies in professional sports. In ticket pricing, Kemper and Breuer (2016) demonstrated how ML algorithms can optimize dynamic pricing strategies, potentially increasing revenue for sports organizations.

Challenges and Ethical Considerations:

Despite its potential, the adoption of ML in sports faces several challenges. Caya and Bourdon (2016) highlighted issues of data quality and interpretation in sports analytics, emphasizing the need for domain expertise in developing ML models. Ethical considerations have also come to the forefront, with Loland (2018) discussing the implications of ML for fairness and integrity in sports.

Future Directions:

The future of ML in sports looks promising, with several emerging areas of research. Wearable technology and IoT devices are expected to provide more granular data for ML models, as explored by Seshadri et al. (2019) in their work on real-time performance tracking. Additionally, the integration of computer vision with ML, as demonstrated by Thomas et al. (2017) in their analysis of tennis player movements, opens new avenues for automated performance analysis.

3.2 Adoption of Machine Learning on Cricket

The adoption of Machine Learning (ML) in cricket analytics has gained significant traction in recent years, with researchers from Europe and the USA contributing to this field. The key aspects are reviewed below:

1. Match Outcome Prediction:

Researchers have applied ML techniques to predict cricket match outcomes. A study from the UK focused on English County twenty-over cricket matches, investigating the degree to which it is possible to predict match outcomes using ML algorithms. This research demonstrates the growing interest in applying advanced analytics to cricket.

2. Performance Analysis:

ML has been used to analyse player and team performance in cricket. While not specifically focused on cricket, McHale et al. (2012) developed ML models to assess player performance in soccer, which has implications for similar applications in cricket, particularly for scouting and team selection strategies.

3. Data-Driven Decision Making:

The adoption of ML in cricket analytics aligns with broader trends in sports analytics. Beal et al. (2019) conducted a comprehensive survey on artificial intelligence for team sports, which included cricket. They noted that ML methods have been applied to various aspects of sports, including tactical analysis and performance prediction.

4. Challenges and Limitations:

While ML shows promise in cricket analytics, researchers have noted challenges such as the need for high-quality data and the complexity of cricket's rules and playing conditions, which can affect model accuracy (Beal et al., 2019). The dynamic nature of cricket, with its multiple formats and varying conditions, presents unique challenges for ML applications.

5. Future Directions:

Ongoing research is focusing on improving the accuracy of predictive models and expanding the range of applications for ML in cricket. This includes real-time analysis during matches and more sophisticated player valuation models (Beal et al., 2019). The potential for ML to enhance decision-making in areas such as team selection, strategy formulation, and player development is significant.

6. Interdisciplinary Approach:

The literature suggests that successful adoption of ML in cricket requires an interdisciplinary approach, combining expertise in data science, sports science, and domain-specific knowledge of cricket (Beal et al., 2019).

While the adoption of ML in cricket analytics is growing, the field is still evolving. Researchers continue to refine methodologies and explore new applications to enhance the understanding and analysis of the sport. The potential for ML to transform various aspects of cricket, from player performance analysis to strategic decision-making, is significant, but challenges remain in terms of data quality, model interpretability, and practical implementation.

3.3 Summary of Literature Review:

1. Match Outcome Prediction:

Researchers have applied ML techniques to predict cricket match outcomes. A study focused on English County twenty-over cricket matches investigated the degree to which it is possible to predict match outcomes using ML algorithms. This demonstrates the growing interest in applying advanced analytics to cricket.

2. Performance Analysis:

ML has been used to analyse player and team performance in cricket. While not specifically focused on cricket, studies like McHale et al. (2012) developed ML models to assess player performance in soccer, which has implications for similar applications in cricket, particularly for scouting and team selection strategies.

3. Data-Driven Decision Making:

The adoption of ML in cricket analytics aligns with broader trends in sports analytics. Beal et al. (2019) conducted a comprehensive survey on artificial intelligence for team sports, which included cricket. They noted that ML methods have been applied to various aspects of sports, including tactical analysis and performance prediction.

4. Challenges and Limitations:

Researchers have noted challenges such as the need for high-quality data and the complexity of cricket's rules and playing conditions, which can affect model accuracy (Beal et al., 2019). The dynamic nature of cricket, with its multiple formats and varying conditions, presents unique challenges for ML applications.

5. Future Directions:

Ongoing research is focusing on improving the accuracy of predictive models and expanding the range of applications for ML in cricket. This includes real-time analysis during matches and more sophisticated player valuation models.

6. Interdisciplinary Approach:

Successful adoption of ML in cricket requires an interdisciplinary approach, combining expertise in data science, sports science, and domain-specific knowledge of cricket (Beal et al., 2019).

7. Emerging Technologies:

The European Cricket Network has partnered with Full track AI, an advanced machine learning and artificial intelligence service, to provide ball tracking graphics, pitch maps, speeds, and other key data points using mobile phone technology (Emerging Cricket, 2023).

In conclusion, while the adoption of ML in cricket analytics is growing, the field is still evolving. Researchers continue to refine methodologies and explore new applications to enhance the understanding and analysis of the sport. The potential for ML to transform various aspects of cricket, from player performance analysis to strategic decision-making, is significant, but challenges remain in terms of data quality, model interpretability, and practical implementation.

4. Research Methodology

The primary objective of this research is to leverage machine learning techniques to predict and optimize the squad compositions for two Indian Premier League (IPL) teams: Kolkata Knight Riders (KKR) and Delhi Capitals (DC). This study aims to utilize a comprehensive approach that incorporates various statistical methods, algorithms, and predictive models to analyse player performance data. By employing advanced data mining techniques, feature engineering, and machine learning algorithms such as decision trees, random forests, and support vector machines (regression), the research seeks to identify the most effective player combinations for each team. The goal is to provide data-driven insights that can inform team management decisions, particularly in the context of player selection for upcoming seasons and auctions.

4.1 Dataset and Approach Overview:

The dataset is taken from two trusted websites, Cricsheet and Howstat, and was cross-verified to check the legitimacy of the data. It contains a ball-by-ball record of IPL matches, providing detailed information about each delivery, including match details, player information, runs scored, extras, and dismissals.

Table 2 - Dataset Columns

Column Name          Description
match_id             Unique identifier for each match
season               The IPL season year
start_date           The date the match started
venue                The location where the match was played
innings              The innings number (1st or 2nd)
ball                 The ball number within the over
batting_team         The team currently batting
bowling_team         The team currently bowling
striker              The batsman facing the current ball
non_striker          The batsman at the other end
extras               Total extra runs scored on this ball
wides                Number of wide balls
noballs              Number of no balls
byes                 Number of byes
legbyes              Number of leg byes
penalty              Any penalty runs awarded
wicket_type          Type of dismissal if a wicket fell
player_dismissed     Name of the player dismissed (if applicable)
other_wicket_type    Secondary wicket type (if applicable)

This table describes the dataset used for analysing cricket matches in the Indian Premier League (IPL). Each row in the dataset corresponds to a specific delivery (ball) in a match and records the events occurring during that delivery (see Appendix 5).

4.2 Data Processing

4.2.1 Data Filtering

1. The data is filtered to include only innings 1 and 2, excluding super overs.

2. Further filtering is applied to select only the seasons from 2021 to 2024.

4.2.2 Player Data Extraction

1. Unique player names are extracted from the 'striker', 'non_striker', and 'bowler' columns.

2. A DataFrame containing all unique player names is created.
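The filtering and extraction steps above can be sketched with pandas as follows. This is a minimal illustration rather than the study's actual code: the sample rows and player names are made up, and it assumes that `season` is stored as an integer year and that super overs appear as innings numbers greater than 2.

```python
import pandas as pd

# Tiny made-up sample in the ball-by-ball layout described in Table 2.
deliveries = pd.DataFrame({
    "match_id": [1, 1, 1, 2, 2],
    "season": [2020, 2021, 2021, 2024, 2024],   # assumed to be integer years
    "innings": [1, 1, 2, 2, 3],                 # assumed: innings > 2 are super overs
    "striker": ["A Batter", "A Batter", "B Batter", "C Batter", "C Batter"],
    "non_striker": ["B Batter", "B Batter", "A Batter", "D Batter", "D Batter"],
    "bowler": ["X Bowler", "X Bowler", "Y Bowler", "X Bowler", "Y Bowler"],
})

# 4.2.1: keep regular innings only, then the 2021-2024 seasons.
regular = deliveries[deliveries["innings"].isin([1, 2])]
recent = regular[regular["season"].between(2021, 2024)]

# 4.2.2: collect unique names across the three player columns.
name_columns = ["striker", "non_striker", "bowler"]
names = pd.unique(recent[name_columns].to_numpy().ravel())
players = pd.DataFrame({"player": sorted(names)})
```

If the season column is stored as text in the raw files, it would need to be converted to a year before the range filter is applied.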

4.3 Batting Statistics Calculation

1. Pivot tables are created to calculate runs scored and balls faced by each player in each season.

2. The pivot tables are merged to create a comprehensive batting dataset.

3. Total runs scored and total balls faced across all seasons are computed for each player.

4. Batting strike rate is calculated using the formula: (Runs Scored / Balls Faced) * 100.

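Total Runs Scored:

$$\text{Total Runs Scored} = \text{runs}_{2021} + \text{runs}_{2022} + \text{runs}_{2023} + \text{runs}_{2024}$$

Total Balls Faced:

$$\text{Total Balls Faced} = \text{balls faced}_{2021} + \text{balls faced}_{2022} + \text{balls faced}_{2023} + \text{balls faced}_{2024}$$

Batting Strike Rate:

$$\text{Strike Rate} = \frac{\text{Total Runs Scored}}{\text{Total Balls Faced}} \times 100$$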
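A hedged pandas sketch of the batting calculation is given below. The column name `runs_off_bat` is an assumption (it is not listed in Table 2), the sample data is invented, and treating wides as balls not faced is also an assumption about how the balls-faced count was built.

```python
import pandas as pd

# Invented ball-by-ball sample; `runs_off_bat` is an assumed column name.
balls = pd.DataFrame({
    "season": [2021, 2021, 2022, 2022, 2022],
    "striker": ["A Batter", "A Batter", "A Batter", "B Batter", "B Batter"],
    "runs_off_bat": [4, 1, 6, 0, 2],
    "wides": [0, 0, 0, 1, 0],
})

# Runs scored by each player in each season.
runs = balls.pivot_table(index="striker", columns="season",
                         values="runs_off_bat", aggfunc="sum", fill_value=0)

# Balls faced per player per season; wides are excluded (assumption).
legal = balls[balls["wides"] == 0]
faced = legal.pivot_table(index="striker", columns="season",
                          values="runs_off_bat", aggfunc="count", fill_value=0)

# Totals across seasons, then strike rate = runs / balls * 100.
batting = pd.DataFrame({
    "total_runs_scored": runs.sum(axis=1),
    "total_balls_faced": faced.sum(axis=1),
})
batting["batting_strike_rate"] = (
    batting["total_runs_scored"] / batting["total_balls_faced"] * 100
)
```

Summing across the season columns of each pivot table mirrors the per-season totals in the formulas above.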

4.4 Bowling Statistics Calculation

1. Wicket types are defined (bowled, caught, caught and bowled, hit wicket, lbw, stumped).

2. Pivot tables are created for wickets taken, balls bowled and runs conceded by each bowler in each season.

3. Total wickets, total balls bowled, and total runs given across all seasons are computed for each bowler.

4. The bowling data is merged into a single DataFrame.

Wickets Taken:

$$\text{Wickets Taken} = \sum \text{wicket type count}$$

• Count occurrences of the specified wicket types for each bowler.

Balls Bowled:

$$\text{Balls Bowled} = \sum \text{ball count per season}$$

Runs Conceded:

$$\text{Total Runs Conceded} = \sum (\text{runs off bat} + \text{extras} + \text{wides} + \text{noballs} + \text{byes} + \text{legbyes} + \text{penalty})$$

Total Wickets Taken:

$$\text{Total Wickets} = \text{wickets}_{2021} + \text{wickets}_{2022} + \text{wickets}_{2023} + \text{wickets}_{2024}$$

Total Balls Bowled:

$$\text{Total Balls Bowled} = \text{balls bowled}_{2021} + \text{balls bowled}_{2022} + \text{balls bowled}_{2023} + \text{balls bowled}_{2024}$$

Bowling Economy Rate:

$$\text{Economy Rate} = \frac{\text{Total Runs Conceded}}{\text{Total Overs Bowled}}$$

Convert balls to overs using:

$$\text{Overs} = \left\lfloor \frac{\text{Balls}}{6} \right\rfloor + \frac{\text{Balls} \bmod 6}{10}$$
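The conversion above yields the usual scorecard notation, in which 27 balls is written as 4.3 (four overs and three balls). The sketch below shows that conversion next to an economy rate computed from the exact number of overs (balls divided by 6). The function names are illustrative, and it is an assumption that an exact-overs divisor is wanted: dividing by the scorecard figure instead would slightly overstate the rate whenever an over is incomplete.

```python
def balls_to_overs_notation(balls: int) -> float:
    """Scorecard notation: completed overs plus leftover balls as a decimal digit."""
    return balls // 6 + (balls % 6) / 10


def economy_rate(runs_conceded: int, balls: int) -> float:
    """Runs conceded per over, using the exact number of overs (balls / 6)."""
    if balls == 0:
        return 0.0  # a player who has not bowled gets a zero placeholder
    return runs_conceded / (balls / 6)


overs_display = balls_to_overs_notation(27)   # four overs and three balls
exact_rate = economy_rate(36, 27)             # 36 runs from 4.5 overs
```

Here 36 runs from 27 balls gives exactly 8 runs per over, whereas dividing by the displayed 4.3 would give about 8.37.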

Data Merging:

• Merge batting and bowling datasets using a common key (player name).

Data Cleaning:

• Fill null values with zeros, assuming players who didn't bat or bowl have zero statistics.

4.5 Domain Knowledge

The domain knowledge required in this field:

1. Understanding of Cricket: A deep understanding of cricket is fundamental. This includes:

• Rules of the game

• Various formats (Test, ODI, T20)

• Strategies and tactics

• Historical trends

• Nuances that influence the game

2. Statistical Knowledge:

• Descriptive statistics (mean, median, mode, range, standard deviation, etc.)

• Inferential statistics (regression analysis, correlation analysis, ANOVA, hypothesis testing)

• Understanding of key performance indicators (KPIs) in cricket (batting average, strike rate, economy rate, etc.)

3. Data Types and Sources:

• Player performance data

• Team performance data

• Match data

• Historical data

4. Analytical Techniques:

• Time series analysis

• Clustering analysis

• Machine learning algorithms

• Predictive modelling

5. Cricket-Specific Analytics:

• Understanding of the Duckworth-Lewis (D/L) method and its applications

• Knowledge of player valuation models

• Understanding of factors affecting performance (pitch conditions, player skills, opposition strengths/weaknesses)

6. Strategic Applications:

• How to use data for team selection

• Optimizing batting orders and bowling strategies

• Field placement strategies based on data

• In-game decision making using real-time analytics

7. Broader Sports Analytics Concepts:

• Familiarity with analytics approaches from other sports (e.g., metrics in sports)

• Understanding of how analytics can be applied to both performance analysis and fan engagement.

5. Quantave and Predicve analysis

5.1 Win Rao Analysis of Teams

5.1.1 Need for Analysis

The goal is to analyse the performance of IPL teams, focusing parcularly on their win raos, to

determine which teams have been the most and least successful over the history of the tournament.

This analysis will help idenfy trends, strengths, and weaknesses among the teams, providing

insights into factors that contribute to long-term success in the IPL.

5.1.2 Objecve

1. Calculate and compare the win raos of all IPL teams.

2. Predict the Win rao of all teams

5.1.3 Data overview

Table 3 - Data for team's performance overview

Team Name  Played  Won  Lost  N/R  Titles  Finalists  Playoff
MI         261     144  117   0    5       6          11
RCB        256     123  129   4    0       3          9
KKR        252     131  120   1    3       4          7
DC         252     115  135   2    0       1          6
PK         246     112  134   0    0       1          2
CSK        239     138  99    2    5       10         13
RR         222     112  107   3    1       2          5
SRH        182     88   94    0    1       3          6
GT         45      28   17    0    1       2          2
LSG        44      24   19    1    0       0          2

5.1.4 Quantave Analysis

Calculate the win rao:

The win rao is a crucial metric in sports analysis, parcularly in leagues like the IPL, as it provides a

clear and quanﬁable measure of a team's success relave to its total games played. By calculang

the win rao, stakeholders including coaches, players, analysts, and fans can assess performance

over me, idenfy trends, and make informed decisions.

𝑊𝑖𝑛 𝑅𝑎𝑡𝑖𝑜 = (𝐺𝑎𝑚𝑒𝑠 𝑊𝑜𝑛) / (𝑇𝑜𝑡𝑎𝑙 𝐺𝑎𝑚𝑒𝑠 𝑃𝑙𝑎𝑦𝑒𝑑) ∗ 100

A high win rao indicates consistent success and compeveness, while a low rao may highlight

areas needing improvement. Addionally, win raos facilitate comparisons between teams,

regardless of the number of matches played, allowing for a more equitable evaluaon of

performance.

Table 4 - Win Ratio of all teams

Team Name  Win Ratio (%)
MI         55.17241379
RCB        48.046875
KKR        51.98412698
DC         45.63492063
PK         45.52845528
CSK        57.74058577
RR         50.45045045
SRH        48.35164835
GT         62.22222222
LSG        54.54545455
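As a check on the table above, the win ratios can be recomputed directly from the played and won counts in Table 3. The sketch below uses plain Python; the dictionary layout and function name are illustrative choices, not the study's code.

```python
# (played, won) pairs copied from Table 3.
records = {
    "MI": (261, 144), "RCB": (256, 123), "KKR": (252, 131), "DC": (252, 115),
    "PK": (246, 112), "CSK": (239, 138), "RR": (222, 112), "SRH": (182, 88),
    "GT": (45, 28), "LSG": (44, 24),
}


def win_ratio(won: int, played: int) -> float:
    """Games won as a percentage of games played."""
    return won / played * 100


win_ratios = {team: win_ratio(won, played)
              for team, (played, won) in records.items()}

# Teams ordered from highest to lowest win ratio.
ranked = sorted(win_ratios, key=win_ratios.get, reverse=True)
```

On these figures GT ranks first and PK last, although GT and LSG have played far fewer matches than the older franchises.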

Calculate the loss ratio:

The loss ratio is a key performance indicator that measures the proportion of games lost relative to the total number of games played. It is calculated using the formula:

$$\text{Loss Ratio} = \frac{\text{Games Lost}}{\text{Total Games Played}} \times 100$$

Win-Loss Ratio

The win-loss ratio is a critical metric used to evaluate performance in various competitive contexts, including sports and sales. It is calculated using the formula:

$$\text{Win-Loss Ratio} = \frac{\text{Number of Wins}}{\text{Number of Losses}}$$

The win-loss ratio difference is a crucial metric in sports analysis for several reasons:

1. Performance indicator: It provides a clear picture of a team's overall performance, showing how much they are winning compared to losing.

2. Competitive edge: A positive difference indicates a team is winning more than losing, suggesting a competitive advantage.

3. Trend analysis: Tracking this metric over time can reveal improvements or declines in team performance.
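A small worked example for the two study teams is sketched below. It assumes that the difference metric is the win ratio minus the loss ratio, which is consistent with a positive value meaning more wins than losses; the exact definition used in the study is not stated in this section.

```python
# (played, won, lost) for the two study teams, copied from Table 3.
records = {"KKR": (252, 131, 120), "DC": (252, 115, 135)}


def ratio_summary(played: int, won: int, lost: int) -> dict:
    """Win ratio, loss ratio, and their difference, all in percentage points."""
    win = won / played * 100
    loss = lost / played * 100
    # Assumed definition: difference = win ratio - loss ratio.
    return {"win_ratio": win, "loss_ratio": loss, "wr_difference": win - loss}


summary = {team: ratio_summary(*counts) for team, counts in records.items()}
```

Under this assumption KKR has a positive difference (more wins than losses) and DC a negative one.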

5.1.5 Correlation Analysis:

Figure 2 - Correlation graph for Linear Regression Model

Strong positive correlations exist between Played and Won/Lost (0.974/0.976), Win_Ratio and WR_Difference (0.997), and Titles and Finalists (0.899). Finalists and Playoff appearances are also strongly correlated (0.883). Strong negative correlations are observed between Win_Ratio and lost_Ratio (-0.989), and lost_Ratio and WR_Difference (-0.997). Moderate correlations include Won vs. Playoff (0.755), Titles vs. Playoff (0.767), and Lost vs. lost_Ratio (0.754).

Interestingly, Win_Ratio and Playoff appearances show only a weak correlation (0.110), suggesting regular-season performance doesn't always translate to playoff success. N/R (No Result) has weak correlations with most metrics, indicating minimal impact on overall performance. These correlations provide insights into team performance patterns, highlighting relationships between various metrics in the dataset.
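The coefficients quoted above are Pearson correlations, and two of them can be reproduced from Table 3 with a few lines of plain Python. The helper below is an illustrative implementation, not the code used to draw Figure 2.

```python
from math import sqrt


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Columns copied from Table 3, in the order MI, RCB, KKR, DC, PK, CSK, RR, SRH, GT, LSG.
played = [261, 256, 252, 252, 246, 239, 222, 182, 45, 44]
won = [144, 123, 131, 115, 112, 138, 112, 88, 28, 24]
titles = [5, 0, 3, 0, 0, 5, 1, 1, 1, 0]
finalists = [6, 3, 4, 1, 1, 10, 2, 3, 2, 0]

r_played_won = pearson(played, won)              # about 0.974
r_titles_finalists = pearson(titles, finalists)  # about 0.899
```

With only ten teams, two of which joined recently, each coefficient rests on very few points, so even large values should be read cautiously.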

5.1.6 Linear Regression Model:

Linear Regression is well suited to analysing the IPL team performance data for several reasons. The continuous dependent variable (Win Ratio) and multiple independent variables make it suitable for exploring relationships and predicting outcomes. It offers interpretable results through the quantifiable impact of each predictor, which is crucial for sports analytics. The model's simplicity helps to avoid overfitting (see Appendix 3). The high R-squared value (0.9969) indicates a strong fit. Overall, Linear Regression provides a balance of predictive power, interpretability, and robustness for this performance analysis.
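To make the idea concrete, the sketch below fits an ordinary least squares line in plain Python. It is not the study's model: the predictors actually used are not listed in this section, so a single predictor (loss ratio) is assumed purely for illustration, and the resulting fit will not match the reported R-squared of 0.9969.

```python
def fit_simple_ols(x: list, y: list) -> tuple:
    """Least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(actual: list, fitted: list) -> float:
    """Share of variance in `actual` explained by `fitted`."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((a - f) ** 2 for a, f in zip(actual, fitted))
    ss_tot = sum((a - mean_y) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# (played, won, lost) per team, copied from Table 3.
records = [(261, 144, 117), (256, 123, 129), (252, 131, 120), (252, 115, 135),
           (246, 112, 134), (239, 138, 99), (222, 112, 107), (182, 88, 94),
           (45, 28, 17), (44, 24, 19)]
loss_ratio = [lost / played * 100 for played, _, lost in records]
win_ratio = [won / played * 100 for played, won, _ in records]

# Assumed single predictor: loss ratio. The study's feature set may differ.
slope, intercept = fit_simple_ols(loss_ratio, win_ratio)
predicted = [intercept + slope * x for x in loss_ratio]
r2 = r_squared(win_ratio, predicted)
```

Because this fit is evaluated on the same ten teams it was trained on, a high R-squared here says little about how well the line would predict future seasons.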

5.1.7 Prediction Analysis

Table 5 - Team prediction with difference

Team  Actual Win%  Predicted Win%  Difference
MI    55.17        55.21           +0.04
RCB   48.05        48.10           +0.05
KKR   51.98        51.95           -0.03
DC    45.63        45.68           +0.05
PK    45.53        45.87           +0.34
CSK   57.74        57.95           +0.21
RR    50.45        50.07           -0.38
SRH   48.35        47.89           -0.46
GT    62.22        61.94           -0.28
LSG   54.55        55.01           +0.46

1. Accuracy: The model's predictions are remarkably close to the actual values, with all differences below 0.5 percentage points.

2. Consistency: The model performs well across different teams, showing no significant bias towards over- or under-prediction for specific teams.

3. Best Predictions:

• Mumbai Indians: Only a 0.04 difference.

• Kolkata Knight Riders: Only a 0.03 difference.

4. Largest Discrepancies:

• Sunrisers Hyderabad: Underpredicted by 0.46.

• Lucknow Super Giants: Overpredicted by 0.46.

5. Overall Trend: There is a slight tendency to overpredict for lower-performing teams and underpredict for higher-performing teams, but the differences are minimal.

Model Performance

1. Mean Absolute Error (MAE): Approximately 0.23.

2. Root Mean Squared Error (RMSE): 0.2874.

These error metrics confirm the high accuracy of the predictions, with an average deviation of less than 0.3 percentage points.
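Both error metrics can be recomputed from the rounded values in Table 5, as in the plain-Python sketch below. The row with actual 50.45 is taken to be RR, matching its win ratio in Table 4. Because the table values are rounded to two decimals, the RMSE obtained this way (about 0.285) is slightly lower than the reported 0.2874, which was presumably computed from unrounded predictions.

```python
from math import sqrt

# Actual and predicted win percentages from Table 5, in the order
# MI, RCB, KKR, DC, PK, CSK, RR, SRH, GT, LSG.
actual = [55.17, 48.05, 51.98, 45.63, 45.53, 57.74, 50.45, 48.35, 62.22, 54.55]
predicted = [55.21, 48.10, 51.95, 45.68, 45.87, 57.95, 50.07, 47.89, 61.94, 55.01]

# Signed error: positive means the model overpredicted.
errors = [p - a for p, a in zip(predicted, actual)]

mae = sum(abs(e) for e in errors) / len(errors)
rmse = sqrt(sum(e * e for e in errors) / len(errors))
```

RMSE is larger than MAE because squaring gives more weight to the largest misses, here the 0.46-point errors for SRH and LSG.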

5.1.8 Prediction Findings:

Figure 3 - Prediction Graph

Trends in the performance of IPL teams:

1. Top Performers:

Chennai Super Kings (CSK) and Gujarat Titans emerge as the top performers, with predicted win percentages of 57.95% and 61.94% respectively. This suggests these teams have strong overall player statistics and team dynamics that contribute to their success.

2. Mid-Range Performers:

Teams like Mumbai Indians (55.21%), Lucknow Super Giants (55.01%), and Kolkata Knight Riders (51.95%) fall into the mid-range of performance. Their predicted win percentages suggest consistent but not dominant performance.

3. Lower Performers:

Teams such as Punjab Kings (45.87%), Delhi Capitals (45.68%), and Royal Challengers Bengaluru (48.10%) have lower predicted win percentages, indicating potential areas for improvement in their team composition or strategy.

4. Consistency in Prediction:

The model shows remarkable consistency across different teams, with predictions closely aligning with actual performance. This suggests the model has effectively captured key factors influencing team success in the IPL.

5. Narrow Performance Range:

The predicted win percentages range from about 45% to 62%, indicating a competitive league where even lower-performing teams have a substantial chance of winning matches.

These trends suggest that the prediction model has effectively captured the nuances of team performance in the IPL, reflecting both the strengths of top teams and the areas for improvement for others.

5.2 Rule-based scoring system combined with normalisation and weighted aggregation

5.2.1 Need for Analysis

The rule-based scoring system in cricket provides a holistic player assessment by combining multiple performance metrics, offering a comprehensive evaluation beyond individual statistics (see Appendix 4).

• It employs role-based evaluation, categorising players as Batsmen, Bowlers, or All-rounders, enabling fair comparisons within similar roles. Normalised comparisons allow for unified scoring across diverse player types, while weighted performance metrics reflect the strategic priorities of T20 cricket.

• The system excels in identifying all-round talent and ranking players within categories, providing context-specific assessments that are valuable for team selection and player development. It supports data-driven decision-making, performance benchmarking across seasons or teams, and talent identification of potentially undervalued players.

• This analysis can inform contract and auction strategies, particularly useful for leagues like the IPL. It enhances fan engagement and is applicable to fantasy cricket leagues. The system allows for continuous performance monitoring, easily updated with new match data.

Overall, this comprehensive approach provides an objective basis for strategic planning, team composition, and player valuation, making it a valuable tool for cricket management and analysis.

5.2.2 Objective

The primary objective of this analysis is to create a comprehensive, data-driven evaluation system for cricket players in T20 leagues like the IPL. It aims to quantify player performance across multiple dimensions, providing a single numerical score that reflects a player's overall value to their team. By combining and normalising various performance metrics such as runs scored, batting average, strike rate, wickets taken, and economy rate, the analysis offers a balanced assessment of player contributions. It distinguishes between different player roles (batsmen, bowlers, and all-rounders), ensuring fair comparisons within each category while also recognizing the unique value of versatile players.

5.2.3 Data Overview

This dataset provides a comprehensive overview of player performance in the Indian Premier League (IPL) from 2021 to 2024, capturing key metrics for both batting and bowling. The data comprise 300 rows and 8 columns:

1. Batting Performance:

• "totalrunsscored": Aggregate runs scored by each player

• "Total_batting_average": Average runs scored per dismissal

• "batting_strike_rate": Runs scored per 100 balls faced

• "totalballsfaced": Total number of deliveries faced

2. Bowling Performance:

• "totalwickets": Number of wickets taken

• "economyrate": Average runs conceded per over

• "oversbowled_clean": Total overs bowled

The "striker" column identifies individual players. This dataset allows for a multifaceted analysis of player contributions, enabling comparisons between different aspects of the game. It captures both volume (total runs, wickets) and efficiency (average, strike rate, economy) metrics, providing a balanced view of player performance. Inclusion of data over multiple seasons (2021-2024) allows for trend analysis, tracking player development, and assessing consistency over time.

5.2.4 Quantitative Analysis

Rule-Based Categorisation

The players are categorised into different roles (batsman, bowler, or all-rounder) based on predefined rules. These rules are simple conditional checks on the player's performance metrics:

Let R = total runs scored, W = total wickets, B = total balls faced.

\[ \text{Batsman}: \; R \ge 100 \;\land\; W \le 2 \;\land\; B \ge 40 \]

\[ \text{Bowler}: \; W > 5 \;\land\; R \le 100 \]

\[ \text{All-rounder}: \; W \ge 3 \;\land\; R \ge 100 \]

\[ \text{Other Players}: \; \lnot(\text{Batsman} \lor \text{Bowler} \lor \text{All-rounder}) \]

• Batsman: Scored 100 or more runs, took 2 or fewer wickets, and faced at least 40 balls.

• Bowler: Took more than 5 wickets and scored 100 or fewer runs.

• All-rounder: Took 3 or more wickets and scored 100 or more runs.

• Other Players: Anyone who meets none of the three conditions above.

This categorisation determines which metrics are relevant for calculating the player's score.
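The threshold rules above can be sketched as a small function. This is an illustrative sketch rather than the project's actual code: the function name and the example stat lines are hypothetical, and the order in which the rules are tested is an assumption (the Bowler and All-rounder rules both match a player with exactly 100 runs and more than 5 wickets, so some order has to be chosen).

```python
def categorise_player(runs, wickets, balls_faced):
    """Return the role implied by the threshold rules in this section."""
    # Assumption: rules are tested in the order Batsman, Bowler, All-rounder,
    # so the first matching rule wins when two rules overlap.
    if runs >= 100 and wickets <= 2 and balls_faced >= 40:
        return "Batsman"
    if wickets > 5 and runs <= 100:
        return "Bowler"
    if wickets >= 3 and runs >= 100:
        return "All-rounder"
    return "Other"


# Hypothetical stat lines (runs, wickets, balls faced), for illustration only.
examples = {
    "opener": (450, 0, 300),
    "strike bowler": (20, 18, 25),
    "all-rounder": (250, 9, 160),
    "fringe player": (40, 1, 35),
}
for label, stats in examples.items():
    print(label, "->", categorise_player(*stats))
```

Testing the rules in a fixed order keeps the assignment deterministic at the boundary values.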

Data Normalisation:

Normalisation is used to scale different performance metrics to a common range (0 to 1). This ensures that metrics with different units and ranges can be compared and combined meaningfully.

• Min-Max Normalisation: For metrics where a higher value is better (e.g., runs scored, wickets taken), the formula used is:

\[ X_{norm} = \frac{X - X_{min}}{X_{max} - X_{min}} \]

• Inverted Normalisation for Economy Rate: Since a lower economy rate is better, the normalisation is inverted:

\[ E_{norm} = 1 - \frac{E - E_{min}}{E_{max} - E_{min}} \]
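A minimal sketch of the two normalisation formulas follows. The function names are hypothetical, and the handling of a constant column (where the maximum equals the minimum) is an assumption, since the text does not say how that case is treated.

```python
def min_max(values):
    """Scale values to the range 0..1 (higher raw value -> higher score)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column is mapped to 0.0 to avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def inverted_min_max(values):
    """Scale values to 0..1 so that the lowest raw value scores highest."""
    return [1.0 - v for v in min_max(values)]


economy_rates = [6.0, 7.5, 9.0]  # illustrative values only
print(inverted_min_max(economy_rates))  # [1.0, 0.5, 0.0]
```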

Weight Aggregation

The system uses a weighted sum approach to aggregate multiple normalised performance metrics into a single score. The weights are assigned differently for batsmen, bowlers, and all-rounders to reflect the relative importance of different skills in T20 cricket.

For Batsmen:

\[ \text{Score} = (0.4 \cdot \text{normalized\_runs} + 0.3 \cdot \text{normalized\_average} + 0.3 \cdot \text{normalized\_strike\_rate}) \times 100 \]

For Bowlers:

\[ \text{Score} = (0.6 \cdot \text{normalized\_wickets} + 0.4 \cdot \text{normalized\_economy\_rate}) \times 100 \]

For All-rounders:

\[ \text{Score} = \frac{\text{Batting Score} + \text{Bowling Score}}{2} \]

This weighted aggregation allows for:

• Combining multiple performance aspects into a single, comprehensive score

• Adjusting the importance of different metrics based on player role

• Balancing volume (e.g., total runs) with efficiency (e.g., strike rate)
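The three role-specific formulas can be written directly as functions. This is a sketch under the assumption that every input has already been normalised to the range 0 to 1 (with economy rate inverted); the function names and the sample inputs are hypothetical.

```python
def batting_score(runs_n, average_n, strike_rate_n):
    """Weighted batting score on a 0..100 scale."""
    return (0.4 * runs_n + 0.3 * average_n + 0.3 * strike_rate_n) * 100


def bowling_score(wickets_n, economy_n):
    """Weighted bowling score on a 0..100 scale."""
    return (0.6 * wickets_n + 0.4 * economy_n) * 100


def all_rounder_score(runs_n, average_n, strike_rate_n, wickets_n, economy_n):
    """Mean of the batting and bowling scores."""
    batting = batting_score(runs_n, average_n, strike_rate_n)
    bowling = bowling_score(wickets_n, economy_n)
    return (batting + bowling) / 2


# Illustrative normalised inputs, not taken from the dataset.
print(round(batting_score(0.5, 0.5, 0.5), 2))
print(round(bowling_score(1.0, 0.0), 2))
print(round(all_rounder_score(0.5, 0.5, 0.5, 1.0, 0.0), 2))
```

Because each weight set sums to 1 and each input lies between 0 and 1, every score falls between 0 and 100.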

Ranking

After calculating the overall scores, players are ranked within their respective categories (Batsman, Bowler, All-rounder, Other). The ranking is done using the 'min' method.

For each player type PT ∈ {Batsman, Bowler, All-rounder, Other}:

\[ \text{Rank}(player_i) = \left| \{ player_j \in PT : OS(player_j) > OS(player_i) \} \right| + 1 \]

Where:

• player_i is the player being ranked

• PT is the set of all players of the same player type (e.g., all batsmen)

• OS(player_x) is the overall score calculated for player x

• |{...}| denotes the cardinality (size) of the set

This set comprehension identifies all players (player_j) within the same player type (PT) whose scores are strictly greater than the score of the player being ranked (player_i).

This approach is used because:

• It allows for fair comparison within roles, recognizing that different skills are valued for different positions

• It provides a clear hierarchy within each player type

• The 'min' method ensures that players with equal scores receive the same rank, avoiding arbitrary distinctions
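The rank definition above translates almost word for word into code. The sketch below uses hypothetical player names and scores; in pandas, the same result would normally be obtained with a grouped `rank(method='min', ascending=False)`, which is presumably what the 'min' method refers to.

```python
def rank_within_type(players):
    """players: list of (name, player_type, overall_score) tuples.

    Rank = 1 + number of same-type players with a strictly higher score,
    so tied players share the best (lowest) rank.
    """
    ranks = {}
    for name, ptype, score in players:
        higher = sum(1 for _, t, s in players if t == ptype and s > score)
        ranks[name] = higher + 1
    return ranks


# Hypothetical scores, for illustration only.
players = [
    ("A", "Bowler", 80.0),
    ("B", "Bowler", 70.0),
    ("C", "Bowler", 70.0),
    ("D", "Bowler", 60.0),
    ("E", "Batsman", 50.0),
]
print(rank_within_type(players))
```

Players B and C tie on 70.0 and both receive rank 2, and the next bowler receives rank 4 rather than 3.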

Overall Score Calculation

The analysis has produced overall scores and rankings for IPL players across different roles (Batsmen, Bowlers, and All-rounders). The scores reflect a comprehensive evaluation of player performance, considering multiple metrics normalised and weighted according to their importance in T20 cricket.

Table 6 - Overall score table

Rank | Player | Player Type | Overall Score

1 | YS Chahal | Bowler | 87.70

2 | CV Varun | Bowler | 73.69

3 | Mohammed Shami | Bowler | 72.78

1 | F du Plessis | Batsman | 71.37

2 | Shubman Gill | Batsman | 70.19

3 | RD Gaikwad | Batsman | 69.86

1 | Rashid Khan | All-rounder | 54.49

2 | AD Russell | All-rounder | 51.74

3 | RA Jadeja | All-rounder | 51.51

This table highlights the top-ranked players in each category (Bowlers, Batsmen, and All-rounders)

along with their overall scores. It provides a clear overview of the leading performers in the IPL based

on the analysis conducted from 2021 to 2024.

5.3 Random Forest model to predict the overall score

5.3.1 Need for Analysis:

The random forest regression model is well suited to predicting a player's Overall_score. It can process the diverse set of features, including batting statistics (e.g., totalrunsscored, batting_strike_rate), bowling metrics (e.g., totalwickets, economyrate), and the crucial Player_type category. By leveraging these varied inputs, the model can discern complex patterns that contribute to a player's overall performance rating. The inclusion of normalised features allows for fair comparison across different statistical scales.

5.3.2 Objective of the Model

The objective is to develop and implement a random forest regression model that accurately predicts the Overall_score for cricket players based on their comprehensive performance statistics, including batting and bowling metrics. The model aims to provide a data-driven, unbiased evaluation of player performance that can be used for team selection, player ranking, and strategic decision-making in cricket management and analysis.

5.3.3 Data Overview

The dataset contains various cricket player statistics, including both batting and bowling metrics. Key features include:

1. Batting statistics: totalrunsscored, Total_batting_average, batting_strike_rate, totalballsfaced

2. Bowling statistics: totalwickets, economyrate, overs bowled

3. Normalised versions of features: totalrunsscored_norm, Total_batting_average_norm, batting_strike_rate_norm, totalwickets_norm, economyrate_norm

4. Player_type: Categorizes players as Batsman, Bowler, or All-rounder

5. Overall_score: The target variable

6. Rank: Player ranking based on Overall_score

The dataset includes players with diverse roles (batsmen, bowlers, and all-rounders), allowing the model to learn patterns specific to each player type. The presence of both raw and normalised features provides flexibility in how the model interprets the data.

Figure 4 - Correlation graph for Random Forest Model

This matrix suggests that batting performance has a stronger influence on Overall_score. The model will likely give more weight to batting and bowling statistics, especially totalrunsscored and Total_batting_average, and to overall economy when predicting Overall_score.

5.3.4 Random Forest Regression Model

Evaluation

1. Mean Squared Error (MSE): 3.365822488403902

• This is a relatively low MSE. The corresponding root-mean-square error shows that the model's predictions typically deviate from the actual Overall_score by about √3.3658 ≈ 1.83 points.

• Given that the Overall_score is scaled from 0 to 100, this level of error is small.

2. R-squared Score: 0.9921912055446288 (99.22%)

• This is an extremely high R-squared value, indicating that the model explains about 99.22% of the variance in the Overall_score.

• It suggests a very strong fit between the model's predictions and the actual Overall_scores.

The model demonstrates excellent predictive power, capturing almost all the variability in the Overall_score based on the provided features. With an R-squared of 99.22%, only about 0.78% of the variance is left unexplained. The low MSE further confirms the high accuracy of the predictions.
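For reference, the two evaluation metrics can be computed from a list of actual and predicted scores as sketched below. The function names and the toy values are hypothetical; the report's own figures would come from the held-out predictions of the fitted model.

```python
import math


def mse(y_true, y_pred):
    """Mean squared error between actual and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Share of the variance in y_true explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values, for illustration only.
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
print(mse(actual, predicted))             # one third
print(math.sqrt(mse(actual, predicted)))  # RMSE, in the same units as the score
print(r_squared(actual, predicted))       # 0.5
```

Taking the square root of the MSE gives an error in the same units as the Overall_score, which is why the text quotes it in points.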

5.3.5 Model Visualisation

Actual vs Predicted Values Scatter Plot:

This plot shows how well the predicted values align with the actual values. Points closer to the red dashed line indicate better predictions.

Figure 5 - Actual vs Predicted Graph for RF 1

Residuals Plot:

This plot helps identify any patterns in the residuals (prediction errors). Ideally, the residuals should be randomly scattered around the horizontal line at y=0.

Figure 6 - Residual graph for RF 1

Prediction Error Distribution:

This histogram shows the distribution of prediction errors. A distribution that is centred around zero and symmetric indicates good model performance.

Figure 7 - Prediction error histogram for RF 1

5.4 Random Forest Model using RandomizedSearchCV

5.4.1 Objective of the model

The objective is to find the optimal combination of hyperparameters for the random forest model using RandomizedSearchCV. This aims to improve model performance beyond what is achievable with default settings.
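The idea behind a randomized hyperparameter search can be sketched without any library. The code below is a conceptual illustration only: the grid values and the `toy_error` function are hypothetical stand-ins, and scikit-learn's RandomizedSearchCV additionally scores each sampled candidate by cross-validation on the training data.

```python
import random


def randomized_search(param_grid, evaluate, n_iter, seed=0):
    """Sample n_iter parameter combinations and keep the one with lowest error."""
    rng = random.Random(seed)
    best_params, best_error = None, float("inf")
    for _ in range(n_iter):
        # Draw one value per hyperparameter (simplification: with replacement).
        params = {name: rng.choice(values) for name, values in param_grid.items()}
        error = evaluate(params)
        if error < best_error:
            best_params, best_error = params, error
    return best_params, best_error


# Hypothetical grid; the values searched in the project are not listed here.
grid = {"n_estimators": [100, 200, 300], "max_depth": [3, 5, 7]}


def toy_error(params):
    """Stand-in for a cross-validated error; lowest at 200 trees and depth 5."""
    return abs(params["n_estimators"] - 200) / 100 + abs(params["max_depth"] - 5)


best, err = randomized_search(grid, toy_error, n_iter=20)
print(best, err)
```

Because only a sample of the combinations is evaluated, the search is cheaper than an exhaustive grid search but is not guaranteed to find the best combination.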

5.4.2 Representation of Random Forest model with RandomizedSearchCV

Let f(x) be the Random Forest prediction for input x. The Random Forest model is an ensemble of decision trees, and its prediction is the average of the predictions of all trees:

\[ f(x) = \frac{1}{M} \sum_{m=1}^{M} T_m(x) \]

Where:

• M is the number of trees (n_estimators in the grid)

• T_m(x) is the prediction of the m-th tree

Each tree T_m is constructed as follows:

1. Bootstrap sampling (if bootstrap=True):

Draw n samples with replacement from the training data, where n is the number of training samples.

2. At each node of the tree:

a. Select k features randomly, where k is determined by max_features:

• If max_features='sqrt', k = √p, where p is the total number of features

• If max_features='auto', k = p for a random forest regressor, i.e. all features are considered (this option was accepted by older scikit-learn releases and has since been removed)

b. Find the best split among the k features based on mean squared error reduction:

\[ \Delta I = I(parent) - \left( \frac{n_{left}}{n} I(left) + \frac{n_{right}}{n} I(right) \right) \]

where I is the impurity measure (variance for regression), and n is the number of samples at the node being split.

c. Split the node if:

• The number of samples is ≥ min_samples_split

• The depth of the node is < max_depth (if specified)

3. Stop growing the tree when:

• Any further split would leave a child node with fewer than min_samples_leaf samples

• No further splits can improve the model

The tuned hyperparameters affect this process as follows:

• n_estimators: Determines M

• max_features: Affects k in step 2a

• max_depth: Limits the depth in step 2c

• min_samples_split: Used in step 2c

• min_samples_leaf: Used in step 3

• bootstrap: Determines whether step 1 is performed

The final prediction for a new input x is:

\[ \hat{y} = f(x) = \frac{1}{M} \sum_{m=1}^{M} T_m(x) \]

RandomizedSearchCV will try different combinations of these hyperparameters to minimize the cross-validation error, typically mean squared error for regression:

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \]

Where y_i are the true values and ŷ_i are the predicted values.
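The split criterion ΔI used when growing each tree can be illustrated with a few lines of code. This is a sketch with hypothetical function names and toy target values; it evaluates a single candidate split, whereas a tree-building routine would repeat the calculation for every candidate threshold on each of the k selected features.

```python
def variance(values):
    """Population variance, used as the impurity measure I for regression."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def impurity_reduction(left, right):
    """Delta I = I(parent) - (n_left/n * I(left) + n_right/n * I(right))."""
    parent = left + right
    n = len(parent)
    weighted_children = (len(left) / n) * variance(left) + (len(right) / n) * variance(right)
    return variance(parent) - weighted_children


# Toy target values at a node, split into two children.
print(impurity_reduction([1.0, 1.0], [3.0, 3.0]))  # 1.0: the split removes all variance
print(impurity_reduction([1.0, 3.0], [1.0, 3.0]))  # 0.0: the split separates nothing
```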

5.4.3 Evaluation

1. Mean Squared Error (MSE): 5.564417967827715

• This is higher than the previous model (which had an MSE of 3.37), so the randomized search did not improve on the default settings.

• It indicates that predictions typically deviate from the actual Overall_score by about √5.56 ≈ 2.36 points (root-mean-square error).

2. R-squared Score: 0.9879148111371704

• This is still an excellent R-squared value, indicating that the model explains about 98.79% of the variance in the Overall_score.

• It is slightly lower than the previous model (which had an R-squared of 0.9922).

5.4.4 Model Visualisation

Actual vs Predicted Analysis

Figure 8 - Actual vs predicted graph for RF 2

1. Strong Correlation: The scatter plot should show a very strong linear relationship between actual and predicted values, with points clustering tightly around the diagonal line (y=x).

2. Minimal Scatter: Given the high R-squared value of 0.9879, there is little scatter or deviation from the diagonal line.

3. Consistent Accuracy: The model's predictions should be consistently accurate across the range of Overall_scores, without significant bias towards over- or under-prediction.

4. Small Deviations: The MSE of 5.564 suggests that predictions typically deviate from actual values by about √5.564 ≈ 2.36 points. This small deviation might be barely noticeable in the plot.

5. Range Coverage: The plot should show that the model performs well across the entire range of Overall_scores, from low to high values.

Distribution of Prediction Errors Analysis

Figure 9 - Predicted error histogram for RF 2

1. Centred around Zero: The histogram should be centred very close to zero, indicating that the model's predictions are unbiased. This means the model is equally likely to slightly overpredict or underpredict.

2. Narrow Distribution: Given the low MSE and high R-squared, one should see a narrow distribution of errors, with most errors clustered tightly around zero.

3. Symmetry: The distribution should appear roughly symmetrical, resembling a normal distribution. This suggests that positive and negative errors are equally likely and of similar magnitudes.

4. Smooth KDE Line: The Kernel Density Estimation (KDE) line should show a smooth, bell-shaped curve overlaying the histogram, further emphasizing the normal-like distribution of errors.

5.5 XG Boosting Method

5.5.1 Objective of this model

1. Handling Complex Relationships

XGBoost is particularly effective in capturing complex, non-linear relationships between features. In the context of cricket, the relationship between the Overall_score and player statistics such as batting average, strike rate, total runs scored, and wickets taken is likely to be intricate.

2. Feature Importance

The model provides built-in feature importance scores, which are valuable for identifying the most significant cricket statistics that contribute to a player's Overall_score.

3. Mixed Data Types

The dataset includes both continuous variables (e.g., batting average, economy rate) and categorical variables (e.g., Player_type). Once the categorical variables are encoded, XGBoost handles both types of data, allowing for a comprehensive analysis of player performance without extensive preprocessing.

4. Flexibility in Loss Functions

XGBoost allows for the customization of loss functions, which can be beneficial for tailoring the model to specific nuances in the calculation of Overall_score. This flexibility enhances the model's applicability to various performance metrics.

5.5.2 Representation of the Model

1. Data Preparation: This XGBoost model employs a comprehensive approach. It starts with data preparation, scaling features using StandardScaler. The model formulation uses an ensemble of decision trees, with each tree contributing to the final prediction. The objective function balances prediction accuracy and model complexity through regularization.

\[ X = \{x_i\}_{i=1}^{n}, \quad \text{where } x_i \in \mathbb{R}^p \]

(p features after dropping 'Unnamed: 0', 'striker', 'Overall_score', 'Rank')

\[ y = \{y_i\}_{i=1}^{n}, \quad \text{where } y_i \text{ is the Overall\_score} \]

2. Feature Transformation:

\[ X_{scaled} = \text{StandardScaler}(X) \]

\[ x_{scaled,j} = \frac{x_j - \mu_j}{\sigma_j}, \quad \text{for each feature } j \]

3. Model Formulation:

\[ \hat{y}_i = \sum_{k=1}^{K} f_k(x_{scaled,i}) \]

Where:

• K is the number of trees (n_estimators in param_grid)

• f_k is the k-th tree in the ensemble

4. Objective Function:

\[ Obj(\theta) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{k=1}^{K} \Omega(f_k) \]

Where \( \Omega(f) = \gamma T + \frac{1}{2} \lambda \|w\|^2 \) is the regularization term.

5. Tree Building Process: The tree-building process involves calculating gradients and hessians, then selecting optimal splits based on gain. Leaf weights are calculated to minimize the objective function. Hyperparameter optimization is performed using RandomizedSearchCV, exploring various combinations of tree numbers, depth, learning rate, and sampling parameters.

For each tree f_k:

a. Calculate gradients:

\[ g_i = \frac{\partial (y_i - \hat{y}_i^{(t-1)})^2}{\partial \hat{y}_i^{(t-1)}} \]

b. Calculate hessians:

\[ h_i = \frac{\partial^2 (y_i - \hat{y}_i^{(t-1)})^2}{\partial (\hat{y}_i^{(t-1)})^2} \]

c. For each potential split:

\[ Gain = \frac{1}{2} \left[ \frac{(\sum g_L)^2}{\sum h_L + \lambda} + \frac{(\sum g_R)^2}{\sum h_R + \lambda} - \frac{(\sum g)^2}{\sum h + \lambda} \right] - \gamma \]

d. Choose the split with maximum gain

e. Calculate leaf weights:

\[ w_j = - \frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda} \]

6. Hyperparameter Optimization:

Using RandomizedSearchCV to optimize over:

• n_estimators ∈ {100, 200, 300, 400, 500}

• max_depth ∈ {3, 4, 5, 6, 7, 8}

• learning_rate ∈ {0.01, 0.05, 0.1, 0.2}

• subsample ∈ {0.6, 0.7, 0.8, 0.9, 1.0}

• colsample_bytree ∈ {0.6, 0.7, 0.8, 0.9, 1.0}

• min_child_weight ∈ {1, 2, 3, 4, 5}

7. Final Prediction:

For a new scaled input x_new:

\[ \hat{y}_{new} = \sum_{k=1}^{K} f_k(x_{new}) \]

8. Model Evaluation:

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 3.2368 \]

\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 0.9930 \]

Final predictions are made by summing contributions from all trees. The model's performance is evaluated using Mean Squared Error (3.2368) and R-squared (0.9930), indicating high accuracy in predicting Overall_score. This approach allows complex, non-linear relationships between cricket statistics and overall performance to be captured effectively.
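The gradient, gain, and leaf-weight formulas from the tree-building step can be checked on a toy example. The sketch below follows the squared-error loss exactly as written in this section, (y − ŷ)², for which the gradient is 2(ŷ − y) and the hessian is 2; the function names and the toy targets are hypothetical, and the XGBoost library itself performs these steps internally.

```python
def grad_hess(y, y_pred):
    """First and second derivative of (y - y_pred)^2 with respect to y_pred."""
    return 2.0 * (y_pred - y), 2.0


def leaf_weight(grads, hessians, lam):
    """Optimal leaf weight: -sum(g) / (sum(h) + lambda)."""
    return -sum(grads) / (sum(hessians) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Gain of splitting a node into a left and a right child."""
    def term(g, h):
        return sum(g) ** 2 / (sum(h) + lam)

    parent = term(g_left + g_right, h_left + h_right)
    return 0.5 * (term(g_left, h_left) + term(g_right, h_right) - parent) - gamma


# Toy targets with an initial prediction of 0 for every sample.
targets = [1.0, 1.0, 3.0, 3.0]
pairs = [grad_hess(y, 0.0) for y in targets]
g = [p[0] for p in pairs]
h = [p[1] for p in pairs]

print(split_gain(g[:2], h[:2], g[2:], h[2:], lam=0.0, gamma=0.0))  # 4.0
print(leaf_weight(g[:2], h[:2], lam=0.0))  # 1.0, the mean of the left targets
print(leaf_weight(g[2:], h[2:], lam=0.0))  # 3.0, the mean of the right targets
```

With λ = 0 the leaf weights reduce to the mean target in each leaf; a positive λ shrinks them towards zero, which is how the regularization term limits model complexity.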

5.5.3 Model Visualisation

Actual vs predicted analysis

Figure 10 - Actual Vs predicted graph for XGBoost model 1

This visualisation, combined with the low Mean Squared Error of 3.2368, demonstrates the XGBoost model's exceptional ability to capture the underlying patterns in the cricket performance data and accurately predict player Overall_scores.

The Actual vs Predicted scatter plot demonstrates the XGBoost model's high accuracy in predicting player Overall_scores. Points closely align with the diagonal, reflecting the strong R-squared value (0.9930). The tight clustering and absence of significant deviations indicate consistent performance across all score ranges, validating the model's robustness and predictive power in cricket performance analysis.

Distribution of Prediction Errors Analysis

Figure 11 - Prediction error histogram for XGBoost model 1

The error histogram shows a narrow, symmetrical distribution centred at zero, indicating unbiased and accurate predictions. The high central peak and short tails confirm low error rates, aligning with the model's strong R-squared (0.9930) and low MSE (3.2368). This validates the XGBoost model's effectiveness in cricket performance analysis.

5.6 Enhanced XG Boosting model

The enhanced XGBoost model incorporates feature engineering and extensive hyperparameter tuning to predict player Overall_scores.

5.6.1 Representation of the model

Feature Engineering:

\[ X_i = [x_1, \dots, x_p, \; runs\_per\_ball, \; wickets\_per\_over] \]

Where:

\[ runs\_per\_ball = \frac{totalrunsscored}{totalballsfaced} \]

\[ wickets\_per\_over = \frac{totalwickets}{oversbowled\_clean} \]
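A short sketch of the two engineered ratios follows. The column names come from the dataset description, but the function name and the example row are hypothetical, and the treatment of a zero denominator (a player who has faced no balls or bowled no overs) is an assumption, because the text does not say how those rows are handled.

```python
def add_ratio_features(row):
    """Return a copy of the row with runs_per_ball and wickets_per_over added."""
    out = dict(row)
    balls = row["totalballsfaced"]
    overs = row["oversbowled_clean"]
    # Assumption: a zero denominator yields 0.0 instead of raising an error.
    out["runs_per_ball"] = row["totalrunsscored"] / balls if balls else 0.0
    out["wickets_per_over"] = row["totalwickets"] / overs if overs else 0.0
    return out


# Hypothetical specialist batsman who has never bowled.
row = {
    "totalrunsscored": 300,
    "totalballsfaced": 200,
    "totalwickets": 0,
    "oversbowled_clean": 0,
}
features = add_ratio_features(row)
print(features["runs_per_ball"], features["wickets_per_over"])  # 1.5 0.0
```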

Model Structure:

\[ f(X) = \sum_{k=1}^{K} f_k(X) \]

Where K is the number of trees (n_estimators)

Tree Structure:

\[ f_k(X) = w_{q(X)}, \quad \text{where } q: \mathbb{R}^d \to \{1, 2, \dots, T\}, \; w \in \mathbb{R}^T \]

T is the number of leaves in the tree

Objective Function:

\[ Obj(\theta) = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k) \]

Where:

\( l(y_i, \hat{y}_i) \) is the loss function (typically MSE for regression)

\( \Omega(f) = \gamma T + \frac{1}{2} \lambda \|w\|^2 \) is the regularization term

Update Rule:

\[ f_m(x) = f_{m-1}(x) + \eta \cdot h_m(x) \]

Where η is the learning rate and h_m is the weak learner

Hyperparameter Space:

θ ∈ {n_estimators, max_depth, learning_rate, subsample, colsample_bytree, min_child_weight, gamma, reg_alpha, reg_lambda}

Feature Selection:

\( X_{selected} = S(X) \), where S is the selection function based on feature importance

Final Prediction:

\[ \hat{y} = f_{final}(X_{selected}) \]

Model Evaluation:

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 85.5695 \]

\[ R^2 = 1 - \frac{\sum (y_i - \hat{y}_i)^2}{\sum (y_i - \bar{y})^2} = 0.8142 \]

The enhanced XGBoost model for predicting cricket player Overall_scores combines feature engineering, ensemble decision trees, and advanced optimization. It creates efficiency metrics, utilizes regularization, and employs RandomizedSearchCV for hyperparameter tuning. Feature selection focuses on impactful variables. With an R-squared of 0.8142 and an MSE of 85.5695, the model explains 81.42% of the variance in Overall_scores.

This performance is clearly weaker than that of the other models, so this model is given lower priority.

5.7 Support Vector Regression Model

5.7.1 Objective of the model

The Support Vector Regression (SVR) model for predicting cricket player Overall_scores aims to provide a robust and accurate model capable of handling complex, non-linear relationships within performance data (Smola and Schölkopf, 2004). This approach offers several key advantages:

1. Accurate prediction of Overall_scores using a subset of critical performance metrics.

2. Identification of non-linear patterns in cricket performance data that may be overlooked by simpler models.

3. Optimization of the balance between model complexity and prediction accuracy through hyperparameter tuning (Cherkassky and Ma, 2004).

SVR is particularly well-suited for this dataset and prediction task due to its:

• Ability to capture non-linear relationships using the RBF kernel (Drucker et al., 1997).

• Robustness to outliers.

• Regularization capabilities through the 'C' parameter (James et al., 2013).

• Precision control via the 'epsilon' parameter.

• Versatility in kernel selection.

5.7.2 Representation of the model

Feature Selection and Scaling:

\[ X_{scaled} = \frac{X - \mu}{\sigma} \]

Where X is the feature matrix.

This step standardizes the selected features, ensuring they are on the same scale.

SVR Objective Function:

\[ \min_{w, b, \xi, \xi^*} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{m} (\xi_i + \xi_i^*) \]

subject to:

\[ y_i - (w^T \varphi(x_i) + b) \le \varepsilon + \xi_i \]

\[ (w^T \varphi(x_i) + b) - y_i \le \varepsilon + \xi_i^* \]

\[ \xi_i, \xi_i^* \ge 0 \]

Where y is the 'Overall_score' vector.

RBF Kernel:

\[ K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2) \]

The RBF kernel was selected as the best performing kernel in grid search.
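The RBF kernel formula can be evaluated directly, which makes its behaviour easy to see. The sketch below uses a hypothetical function name and toy feature vectors; the value of gamma used by the fitted SVR is not stated in the text, so the values here are placeholders.

```python
import math


def rbf_kernel(x_i, x_j, gamma):
    """K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)


# Toy scaled feature vectors, for illustration only.
print(rbf_kernel([0.0, 0.0], [0.0, 0.0], gamma=0.5))  # 1.0 for identical points
print(rbf_kernel([1.0, 0.0], [0.0, 0.0], gamma=1.0))  # exp(-1), about 0.368
print(rbf_kernel([3.0, 4.0], [0.0, 0.0], gamma=1.0))  # near 0 for distant points
```

The kernel equals 1 for identical players and decays towards 0 as their scaled statistics move apart, which is why the features are standardized first.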

Hyperparameter Optimization:

\[ (C^*, \varepsilon^*, kernel^*) = \arg\min_{C, \varepsilon, kernel} CV\_error(SVR(C, \varepsilon, kernel)) \]

Where:

C ∈ {0.1, 1, 10, 100}

ε ∈ {0.01, 0.1, 0.5, 1}

kernel ∈ {'rbf', 'poly', 'sigmoid'}

The grid search found the optimal parameters: C = 100, ε = 0.01, kernel = 'rbf'.
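The size of this search space is easy to enumerate. The sketch below only lists the candidate combinations from the three sets given above; fitting and cross-validating an SVR for each one is left out, and the variable names are illustrative.

```python
from itertools import product

C_values = [0.1, 1, 10, 100]
epsilon_values = [0.01, 0.1, 0.5, 1]
kernels = ["rbf", "poly", "sigmoid"]

# Every combination that an exhaustive grid search would evaluate.
candidates = [
    {"C": c, "epsilon": eps, "kernel": k}
    for c, eps, k in product(C_values, epsilon_values, kernels)
]

print(len(candidates))  # 48
reported_optimum = {"C": 100, "epsilon": 0.01, "kernel": "rbf"}
print(reported_optimum in candidates)  # True
```

With 4 × 4 × 3 = 48 candidates, an exhaustive grid search is affordable here, unlike the much larger spaces explored by randomized search for the tree ensembles.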

Prediction Function:

\[ f(x) = \sum_{i=1}^{m} (\alpha_i - \alpha_i^*) K(x_i, x) + b \]

Model Evaluation:

\[ MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - f(x_i))^2 = 16.932580949419034 \]

\[ R^2 = 1 - \frac{\sum (y_i - f(x_i))^2}{\sum (y_i - \bar{y})^2} = 0.9632246463346164 \]

The SVR model demonstrates strong performance in predicting cricket player Overall_scores, as evidenced by its high R-squared value (0.9632), although its MSE (16.9326) is higher than that of the Random Forest models and XG Boosting Model 1. SVR's ability to capture non-linear relationships through its RBF kernel and its robustness to outliers make it well-suited for this complex sports data. The model's optimized hyperparameters further enhance its predictive accuracy.

5.7.3 Model Visualisation

Figure 12 - Actual vs predicted graph for SVR

The Actual vs Predicted plot for the SVR model illustrates its high predictive accuracy. The scatter points closely align with the red diagonal line, indicating strong agreement between actual and predicted Overall_scores. This visual representation corroborates the model's high R-squared value (0.9632) and its MSE of 16.9326, demonstrating the SVR's effectiveness in capturing the underlying patterns in cricket performance data for accurate player evaluation.

5.7.4 Distribution of Prediction errors

Figure 13 - Prediction error histogram for SVR

The histogram of prediction errors shows a symmetric distribution centred near zero, indicating unbiased predictions. The narrow spread suggests small errors, confirming the model's high accuracy. This visualisation aligns with the MSE (16.9326) and high R-squared (0.9632) values.

5.8 Machine Learning Models and their accuracy results

5.8.1 Evaluation

Table 7 - All models evaluation metrics

Model Name | MSE | R2 | Accuracy in %

Linear Regression Model (used only for team analysis) | 0.08258 | 0.9969 | 99.6%

Random Forest Model 1 | 3.3658 | 0.9922 | 99.22%

Random Forest Model 2 | 5.5644 | 0.9879 | 98.79%

XG Boosting Model 1 | 3.2368 | 0.9930 | 99.30%

XG Boosting Model 2 | 85.570 | 0.8142 | 81.42%

Support Vector Regression Model | 16.933 | 0.9632 | 96.32%

The accuracy column is the R-squared value expressed as a percentage. Among the models used for player score prediction (the Linear Regression Model is used only for team analysis), XG Boosting Model 1 appears to be the most suitable. It demonstrates the best overall performance with:

1. Lowest Mean Squared Error (MSE) of 3.2368

2. Highest R-squared value of 0.9930

3. Highest accuracy of 99.30%

XG Boosting Model 1 outperforms the other player-prediction models, explaining 99.30% of the target variable variance. The Random Forest models follow closely. Support Vector Regression performs well but less accurately. XG Boosting Model 2 underperforms significantly. XG Boosting Model 1 is therefore recommended for cricket player performance prediction.

5.8.2 Fine-Tuning

The fine-tuning process improved the error metrics (lower MSE) and explanatory power (higher R-squared) of both top models, Random Forest Model 1 and XG Boosting Model 1. This indicates that fine-tuning successfully optimized the models to better fit the specific patterns in player performance data. The gains in R-squared are marginal because both models were already high-performing, but they suggest that fine-tuning helped capture subtle nuances in the data, potentially leading to more precise player analysis and predictions.

Table 8 - Evaluation metrics after fine-tuning

Model Name | MSE | R2 | Accuracy in %

Random Forest Model 1 | 2.636 | 0.9942 | 99.42%

Random Forest Model 2 | 5.5644 | 0.9879 | 98.79%

XG Boosting Model 1 | 2.49 | 0.9946 | 99.46%

XG Boosting Model 2 | 85.570 | 0.8142 | 81.42%

Support Vector Regression Model | 16.933 | 0.9632 | 96.32%

Fine-tuning had a positive impact on the overall performance metrics of the top models:

1. Random Forest Model 1:

• MSE improved from 3.3658 to 2.636

• R-squared increased from 0.9922 to 0.9942

• Accuracy improved from 99.22% to 99.42%

2. XG Boosting Model 1:

• MSE improved from 3.2368 to 2.49

• R-squared increased from 0.9930 to 0.9946

• Accuracy improved from 99.30% to 99.46%

5.8.3 Model Testing

Sample data

Table 9 - Sample data for model testing

Name | Total Runs | Batting Average | Batting Strike Rate | Total Wickets | Economy Rate | Balls Batted | Balls Bowled

Jos Buttler | 391 | 43.44 | 158.62 | 0 | 0 | 246 | 0

Tymal Mills | 15 | 7.5 | 125 | 16 | 8.2 | 20 | 120

Will Jacks | 230 | 32.86 | 145.57 | 3 | 7.8 | 158 | 30

Liam Livingstone | 185 | 26.43 | 152.89 | 5 | 8.5 | 140 | 40

Reece Topley | 20 | 10 | 111.11 | 11 | 7.9 | 25 | 90

Dawid Malan | 278 | 39.71 | 140.4 | 0 | 0 | 180 | 0

Sam Curran | 160 | 22.86 | 133.33 | 8 | 8.7 | 130 | 70

Tom Abell | 145 | 24.17 | 128.32 | 2 | 9.2 | 120 | 20

Adil Rashid | 35 | 11.67 | 106.06 | 10 | 7.5 | 30 | 80

Harry Brook | 238 | 47.6 | 172.46 | 0 | 0 | 150 | 0

The data presented in the table comes from the 2023 "Hundred" tournament held in England (ECB, 2023). Ten players were selected at random to highlight their performance metrics, including total runs scored, batting averages, strike rates, total wickets taken, economy rates, balls batted, and balls bowled (ESPNcricinfo, 2023).

5.8.4 Random Forest Model 1 Prediction

Table 10 - Random Forest Model 1 prediction results

Player | Predicted Overall score

Jos Buttler | 68.40700315

Tymal Mills | 74.50774429

Will Jacks | 51.35719062

Liam Livingstone | 48.25745156

Reece Topley | 70.82660776

Dawid Malan | 57.29453392

Sam Curran | 44.74423243

Tom Abell | 39.63716106

Adil Rashid | 63.22290778

Harry Brook | 51.47323937

Random Forest predictions show a range of scores from 39.64 to 74.51, with an average of around 57. The model seems to predict higher scores for bowlers like Tymal Mills (74.51) and Reece Topley (70.83), while predicting lower scores for some batsmen like Tom Abell (39.64).

5.8.5 XG Boost Model 1 Prediction

Table 11 - XG Boost Model 1 Prediction results

Player | Predicted Overall score

Jos Buttler | 70.39937

Tymal Mills | 72.09699

Will Jacks | 45.95546

Liam Livingstone | 44.136646

Reece Topley | 67.81441

Dawid Malan | 61.69961

Sam Curran | 46.41379

Tom Abell | 33.50635

Adil Rashid | 48.7163

Harry Brook | 57.242996

XGBoost predictions range from 33.51 to 72.10, averaging about 54.8. This model also predicts high scores for bowlers, with Tymal Mills at 72.10 and Reece Topley at 67.81. However, it predicts lower scores for some players like Tom Abell (33.51) and Will Jacks (45.96).
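The summary figures for the two sets of predictions can be reproduced from Tables 10 and 11. The sketch below copies the predicted scores from those tables; the helper name and the comparison of the two models per player are additions for illustration.

```python
rf_pred = {
    "Jos Buttler": 68.40700315, "Tymal Mills": 74.50774429,
    "Will Jacks": 51.35719062, "Liam Livingstone": 48.25745156,
    "Reece Topley": 70.82660776, "Dawid Malan": 57.29453392,
    "Sam Curran": 44.74423243, "Tom Abell": 39.63716106,
    "Adil Rashid": 63.22290778, "Harry Brook": 51.47323937,
}
xgb_pred = {
    "Jos Buttler": 70.39937, "Tymal Mills": 72.09699,
    "Will Jacks": 45.95546, "Liam Livingstone": 44.136646,
    "Reece Topley": 67.81441, "Dawid Malan": 61.69961,
    "Sam Curran": 46.41379, "Tom Abell": 33.50635,
    "Adil Rashid": 48.7163, "Harry Brook": 57.242996,
}


def summarise(pred):
    """Return (minimum, maximum, mean) of the predicted scores."""
    values = list(pred.values())
    return min(values), max(values), sum(values) / len(values)


for name, pred in (("Random Forest", rf_pred), ("XGBoost", xgb_pred)):
    lo, hi, mean = summarise(pred)
    print(f"{name}: min={lo:.2f} max={hi:.2f} mean={mean:.2f}")

# Player on whom the two models disagree most.
gaps = {player: abs(rf_pred[player] - xgb_pred[player]) for player in rf_pred}
largest_gap_player = max(gaps, key=gaps.get)
print(largest_gap_player, round(gaps[largest_gap_player], 2))
```

The two models agree closely on most players; the per-player gap highlights where their predictions diverge and would merit a closer look.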

5.8.6 Performance Distribution Curves

Figure 14 - Performance distribution curve for RF 1 and XGBoost

The performance distribution curves show the spread and frequency of predicted and actual scores for the cricket players.

Random Forest Model Distribution:

The curve for the Random Forest model predictions likely shows a relatively wide spread, with scores ranging from about 39 to 75. The peak of the curve might be around the mid-50s, indicating that the model frequently predicts scores in this range. There may be a slight right skew, suggesting the model tends to predict higher scores more often than lower ones.

XGBoost Model Distribution:

The XGBoost model's distribution curve probably shows a similar range to the Random Forest model, from about 33 to 72. However, the shape of the curve might be different, possibly with a sharper peak or multiple smaller peaks, reflecting the model's tendency to make more extreme predictions in some cases.

Actual Scores Distribution:

The curve for actual scores likely shows the widest spread, ranging from about 35 to 90. This curve might have a flatter shape compared to the model predictions, indicating more variability in real-world performance.

5.8.7 ROC curves

ROC curves visually represent the trade-off between true positive rate (sensitivity) and false positive rate (1 - specificity) as the classification threshold changes. ROC curves make it possible to assess and compare the models' abilities to distinguish between different levels of cricket player performance (Brownlee, 2018). The AUC provides a single scalar value summarizing the model's performance, making it easy to quickly compare models.

\[ AUC = \int_0^1 TPR \; d(FPR) \]

Where TPR is the true positive rate and d(FPR) is the differential of the false positive rate.

The AUC can be interpreted as the probability that the model ranks a random positive example higher than a random negative example, which is particularly relevant for ranking player performance (Hajian-Tilaki, 2013).
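The ranking interpretation of the AUC can be computed directly by comparing every positive example with every negative example. The sketch below uses a hypothetical function name and toy scores and labels; how players were labelled as high or low performers for the ROC analysis is not specified in the text, so the labels here are purely illustrative.

```python
def pairwise_auc(scores, labels):
    """Probability that a random positive is scored above a random negative.

    Ties between a positive and a negative count as half a win.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy model scores; label 1 marks a high performer.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
print(pairwise_auc(scores, labels))  # 0.75: three of four pairs ranked correctly
```

An AUC of 1.0 means every high performer is ranked above every low performer, while 0.5 corresponds to random ordering.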

Figure 15 - ROC Curve graph

Based on the AUC scores, XGBoost (AUC = 0.92) outperforms Random Forest (AUC = 0.79) in predicting cricket player performance.

XGBoost's higher AUC indicates superior ability to distinguish between high and low performers. An AUC of 0.92 means there is a 92% probability that the model ranks a randomly chosen high performer above a randomly chosen low performer, making it more reliable for performance predictions and team selection decisions in cricket analytics.

6. Players Overall Performance score for KKR and DC

6.1 Kolkata Knight Riders Current Players Analysis

Using the “rule-based scoring system”, the overall scores for these players are calculated.

Figure 16 - Bar chart for KKR current players

This chart shows the performance score for the players who played in the 2024 season. The scores are calculated using the rule-based scoring system and data from 2021 to 2024. The top performers for KKR are Rinku Singh, Varun Chakaravarthy, Venkatesh Iyer, Shreyas Iyer, Andre Russell, Phil Salt, and Sunil Narine.

Batting Dominance

KKR's batting lineup has been formidable, with several players making substantial contributions:

1. Sunil Narine has emerged as the team's top run-getter, accumulating 488 runs at a strike rate of 180.74.

2. Phil Salt has been a revelation at the top of the order, amassing 435 runs with a blistering strike rate of 182.00.

3. Venkatesh Iyer has shown remarkable consistency, scoring 370 runs at an average of 46.25.

4. Shreyas Iyer has also been a key player, scoring 351 runs at an average of 39.00, further solidifying the middle order.

5. Ramandeep Singh has shown promise with 125 runs in 10 matches, including a highest score of 35 and a strike rate of 205.88.

All-Round Excellence

The team's all-round capabilities have been a key factor in their success:

• Sunil Narine has excelled as an all-rounder, complementing his batting prowess with 17 wickets at an economical rate of 6.69 runs per over.

• Andre Russell continues to be a vital asset, contributing 222 runs at a strike rate of 185 while also claiming 19 wickets.

Bowling strength

KKR's bowling attack has been equally impressive:


1. Varun Chakaravarthy leads the wicket-taking charts with 21 scalps.

2. Andre Russell has provided crucial breakthroughs, securing 19 wickets.

3. Mitchell Starc, despite a higher economy rate, has taken 17 wickets, including a 4-wicket

haul.

4. Harshit Rana has added depth to the bowling lineup with 19 wickets.

5. Vaibhav Arora has made a significant impact, taking 11 wickets in 10 matches at an average of 25.09 and an economy of 8.24.

Emerging Talent

Angkrish Raghuvanshi has shown promise as a future prospect, scoring 163 runs in 10 matches at a

strike rate of 155.23.

Team Balance

The team must carefully weigh retaining star performers against nurturing emerging talents, while also considering team chemistry and long-term strategy. This intricate decision-making process is critical for KKR's future success and competitiveness in the league.

6.2 Delhi Capitals Current Players Analysis

Figure 17 - Bar chart for DC current players

This bar graph shows the overall scores of Delhi Capitals players for the 2024 IPL season, calculated with the rule-based scoring system. Rishabh Pant leads with the highest score of 82.45, followed closely by Jake Fraser-McGurk at 79.97. Key players such as Khaleel Ahmed, Kuldeep Yadav, and Mukesh Kumar also scored well, indicating their significant contributions. The scores reflect a combination of batting, bowling, and all-round performances throughout the season. Lower scores for some players suggest either limited opportunities or underperformance, while a few players received no score because of a lack of playing time or poor performance.

Bang Strength

1. Rishabh Pant led the bang charts with 446 runs at an impressive average of 40.55 and a

strike rate of 155.4.

Overall_score

2. Tristan Stubbs showed excellent form, scoring 378 runs at a high average of 54 and a strike

rate of 190.9.

3. Jake Fraser-McGurk emerged as an explosive batsman, scoring 330 runs at a strike rate of

234.04.

4. Abishek Porel contributed signiﬁcantly with 327 runs at a strike rate of 159.51.

Bowling Strength:

1. Kuldeep Yadav was the standout bowler, taking 16 wickets at an average of 23.37 and an

economy of 8.69.

2. Mukesh Kumar impressed with 17 wickets at an average of 21.64.

3. Axar Patel contributed with 11 wickets with a good economy of 7.65.

4. Khaleel Ahmed took 17 wickets with a decent economy of 9.58.

All-Round Performance:

1. Axar Patel showcased his all-round abilities, scoring 235 runs and taking 11 wickets.

2. Tristan Stubbs, primarily a batsman, also took 3 wickets.

Emerging Talent:

1. Jake Fraser-McGurk stood out as a promising talent with his explosive batting.

2. Abishek Porel showed potential as a consistent run-scorer.

3. Rasikh Salam took 9 wickets in 8 matches.

The team's strength clearly lies in its batting, with multiple players capable of scoring quickly. The bowling unit, led by Kuldeep Yadav and supported by Mukesh Kumar and Axar Patel, also performed well. The emergence of young talents like Fraser-McGurk and Porel adds depth to the squad.

Key factors in DC's decision-making process include:

1. Rishabh Pant's leadership and batting prowess

2. The all-round abilities of Axar Patel

3. Kuldeep Yadav's consistent spin bowling performances

4. The explosive batting potential of Jake Fraser-McGurk

5. Tristan Stubbs' impressive batting in the previous season

7. Conclusion

7.1 Squad Optimization

Note: As of August 2024, there is no new information regarding player retention rules or the Right to Match (RTM) policy for IPL 2025. This analysis is based on the 2024 rules. Additionally, the model predictions used here will not impact future decisions or changes.

7.2 KKR Squad Optimization and Picking the Best Squad

7.2.1 Current Players Overall Score Prediction

Table 12 - KKR current players predicted overall score

Player Predicted Overall score

Andre Russell 62.681908

Angkrish Raghuvanshi 43.662876

Anukul Roy 0.01888789

Harshit Rana 74.76428

Manish Pandey 18.542007

Mitchell Starc 66.87246

Nitish Rana 15.08972

Phil Salt 68.24854

Rahmanullah Gurbaz 18.410284

Ramandeep Singh 47.748146

Rinku Singh 40.41537

Shreyas Iyer 62.031254

Sunil Narine 65.13953

Vaibhav Arora 54.331146

Varun Chakaravarthy 74.76428

Venkatesh Iyer 62.110737

Based on the XGBoost model predictions and team dynamics, the following is an analysis of Kolkata Knight Riders' (KKR) potential retention strategy for IPL 2025.

Core Retentions:

1. Sunil Narine (65.14)

2. Andre Russell (62.68)

3. Shreyas Iyer (62.03) - Captain

4. Varun Chakaravarthy (74.76)

KKR's retention strategy likely prioritizes a blend of consistent performers and recent standouts. Narine and Russell, with their high predicted scores and long-standing contributions to the franchise, are prime candidates. Shreyas Iyer, as the current captain and a solid middle-order batsman, provides leadership continuity. Varun Chakaravarthy's top predicted score and impressive bowling performances make him an asset for the team's bowling attack.

Right to Match (RTM) Options:

1. Venkatesh Iyer (62.11)

2. Rinku Singh (40.42)

The RTM card could be used on Venkatesh Iyer, given his versatility and strong predicted performance. Rinku Singh, despite a lower predicted score, has shown potential as a finisher and could be a strategic RTM pick based on his past performances and future potential.

Difficult Decisions:

• Phil Salt (68.25) and Mitchell Starc (66.87), despite their high predicted scores and contributions, may be released due to the limit on foreign player retentions.

• Nitish Rana (15.09), though injured in 2024 and having a low predicted score, might still be considered for RTM based on his past performances and experience with the team.

Potenal Releases:

 Rahmanullah Gurbaz (18.41)

 Ramandeep Singh (47.75)

 Vaibhav Arora (54.33)

 Angkrish Raghuvanshi (43.66)

These players, while showing promise with their predicted scores, may not ﬁt into the retenon

strategy given the limited slots available and the need to maintain a balanced squad. This approach

balances maintaining the core team with strategic decisions for future success. The management

faces tough choices, parcularly regarding foreign players and emerging talents, as they aim to build

a compeve squad for IPL 2025. The predicted scores provide valuable insight, but the ﬁnal

decisions will also consider factors such as team chemistry, player roles, and long-term strategy.
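
The score-driven part of this retention reasoning can be illustrated with a small Python sketch. It ranks the Table 12 predictions (rounded to two decimals) and fills a fixed number of retention slots subject to a cap on overseas players. The slot count (`max_retained=4`), the overseas cap (`max_overseas=2`), and the function name `shortlist_retentions` are illustrative assumptions, not the IPL 2025 rules or the procedure used in this study.

```python
# Predicted overall scores from Table 12, rounded to two decimals.
# Each value is (predicted_score, is_overseas_player).
KKR_PREDICTIONS = {
    "Andre Russell": (62.68, True),
    "Angkrish Raghuvanshi": (43.66, False),
    "Anukul Roy": (0.02, False),
    "Harshit Rana": (74.76, False),
    "Manish Pandey": (18.54, False),
    "Mitchell Starc": (66.87, True),
    "Nitish Rana": (15.09, False),
    "Phil Salt": (68.25, True),
    "Rahmanullah Gurbaz": (18.41, True),
    "Ramandeep Singh": (47.75, False),
    "Rinku Singh": (40.42, False),
    "Shreyas Iyer": (62.03, False),
    "Sunil Narine": (65.14, True),
    "Vaibhav Arora": (54.33, False),
    "Varun Chakaravarthy": (74.76, False),
    "Venkatesh Iyer": (62.11, False),
}


def shortlist_retentions(predictions, max_retained=4, max_overseas=2):
    """Greedy shortlist: highest predicted score first, respecting an overseas cap.

    Both limits are hypothetical parameters chosen for illustration.
    """
    ranked = sorted(predictions.items(), key=lambda item: item[1][0], reverse=True)
    retained, overseas_used = [], 0
    for name, (_score, is_overseas) in ranked:
        if len(retained) == max_retained:
            break
        if is_overseas and overseas_used == max_overseas:
            continue  # overseas quota already filled, skip this player
        retained.append(name)
        overseas_used += int(is_overseas)
    return retained


shortlist = shortlist_retentions(KKR_PREDICTIONS)
print(shortlist)
```

A purely score-driven shortlist of this kind differs from the core retentions listed above, which also weigh captaincy, tenure with the franchise, and team balance; the sketch is only a starting point for that discussion.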

7.2.2 Potenal Squad Opons for KKR

The squad of KKR in 2024 contains 23 players with 8 overseas players, with 6 uncapped players.

Including 9 batsman with 3 wicket keepers, 4 all-rounders and 10 bowlers. Out of these players 16

players have contributed for teams’ success. Hence the squad is suggested based on the team

possible retenon, possible link to the team and available players and predicted data.

Note: The predicon is based on data is taken from 2021 to 2024

Suggested Squad Opons for KKR IPL 2025

Figure 18 - Bar chart for squad opons KKR

Overseas opons include wicketkeepers Phil Salt and Ryan Rickelton, alongside batsmen Ben Ducke

and Steve Smith. All-rounders Andre Russell, Sunil Narine, Chris Woakes, and David Willey oﬀer

versality. The bowling aack features Josh Hazlewood, Mark Wood, Mitchell Starc, and Jofra Archer,

with emerging talents like Atkinson and Pos.

Domesc choices highlight captain Shreyas Iyer, with K.S. Bharat as wicketkeeper. Bang strength

comes from Venkatesh Iyer, Rinku Singh, Nish Rana, Mayank Agarwal, Devdu Padikkal, Rahul

Tripathi, and Manish Pandey. All-rounders Washington Sundar, Krishappa Gowtham, and Shardul

Thakur provide balance.

The bowling lineup includes promising pacers Harshit Rana, Karthik Tyagi, Shivam Mavi, and Mohsin

Khan, alongside experienced opons like Sandeep Warrier. Spin opons feature Varun Chakravarthy

and Mayank Markande (“see Appendix 5”)

This suggested squad opons for KKR in IPL 2025 are based on predicted performance data, potenal

retenon strategies, and team dynamics. The focus is on creang a balanced and compeve team

that leverages both internaonal experience and domesc talent, ensuring KKR remains a formidable

force in the league.

7.3 Delhi Capitals Squad Optimization and Picking the Best Squad

7.3.1 Current Players Overall Score Prediction

Table 13 - DC current players predicted overall scores

Player Predicted Overall score

Mukesh Kumar 79.07398

Khaleel Ahmed 75.189224

Rishabh Pant 70.06949

Ishant Sharma 63.152454

Jake Fraser-McGurk 61.636425

Abishek Porel 60.6897

Tristan Stubbs 58.139206

Kuldeep Yadav 57.190998

Axar Patel 56.546555

Anrich Nortje 52.347652

Prithvi Shaw 45.95865

Shai Hope 44.87039

David Warner 43.445335

Rasikh Salam 40.407764

Mitchell Marsh 10.7273855

Gulbadin Naib 1.8736305

Ricky Bhui 0.2515321

Kumar Kushagra 0.096818216

Sumit Kumar 0.006852619

Jhye Richardson -0.22722892

Lalit Yadav -0.26193994

Lizaad Williams -0.60568804
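
Table 13 contains a few slightly negative predictions (for example, Jhye Richardson at about -0.23). An unconstrained regression model can produce values outside the range of its training target, so one option is to clip predictions before ranking players. The sketch below assumes that overall scores are meant to lie between 0 and 100; that range and the helper name `clip_score` are assumptions made for illustration and are not stated in this report.

```python
def clip_score(score, low=0.0, high=100.0):
    """Clamp a predicted score into the assumed valid range [low, high]."""
    return max(low, min(high, score))


# A few rows from Table 13, including the three negative predictions.
dc_predictions = {
    "Mukesh Kumar": 79.07398,
    "Rishabh Pant": 70.06949,
    "Jhye Richardson": -0.22722892,
    "Lalit Yadav": -0.26193994,
    "Lizaad Williams": -0.60568804,
}

# Clipped copy used for ranking; the raw values are kept for auditing.
clipped = {name: clip_score(score) for name, score in dc_predictions.items()}
negatives = [name for name, score in dc_predictions.items() if score < 0]

print(clipped)
print(negatives)
```

Keeping the raw predictions alongside the clipped ones makes it easy to see which players the model had too little data to score sensibly.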

Based on the XGBoost model predictions and team dynamics, the following is an analysis of Delhi Capitals' (DC) potential retention strategy for IPL 2025.

Core Retentions:

• Rishabh Pant (70.07) - Captain and wicketkeeper-batsman

• Axar Patel (56.55) - All-rounder and consistent performer

• Jake Fraser-McGurk (61.64) - Explosive opener

• Kuldeep Yadav (57.19) - Key spinner

Right to Match (RTM) Options:

• Mukesh Kumar (79.07) - Highest predicted score

• Khaleel Ahmed (75.19) - Second-highest predicted score

• Tristan Stubbs (58.14) - Potential player

• Abishek Porel (60.69) - Young talent

This strategy acknowledges Stubbs' potential. Including him in the RTM list allows DC to potentially retain a player who has shown exceptional finishing skills and could be a long-term asset.

Difficult Decisions:

• Ishant Sharma (63.15), Anrich Nortje (52.35), and Prithvi Shaw (45.96) might still be released to create room for new strategies.

Potential Releases:

• David Warner (43.45)

• Shai Hope (44.87)

• Mitchell Marsh (10.73)

This approach balances retaining key performers, securing young talent with high potential, and creating opportunities for significant changes in the squad. It addresses DC's need to move from an average team to a top contender by making strategic decisions that combine experience (Pant, Axar) with emerging talents (Fraser-McGurk, Stubbs, Porel).

7.3.2 Potenal Squad Formaon for Delhi Capitals

The squad of DC in 2024 contains 27 players with 8 overseas players, with 11 uncapped players.

Including 12 batsman with 6 wicket keepers, 5 all-rounders and 10 bowlers. Out of these players 22

players have contributed for team and out of 22, 7 players performed very poor. Hence the squad is

suggested based on the team possible retenon, possible link to the team and available players and

predicted data.

Note: The suggeson is based on data is taken from 2021 to 2024

Figure 19 - Bar chart for squad options DC

Overseas options feature Jake Fraser-McGurk, an explosive batsman; Reeza Hendricks, a consistent T20 performer; and Tristan Stubbs, a dynamic middle-order batsman. Rassie van der Dussen brings experience, while all-rounders like Daryl Mitchell, Jason Holder, Jimmy Neesham, Ben Stokes, and Romario Shepherd add versatility. Fast bowlers include Adam Milne, Matt Henry, Mark Wood, and emerging talent Joshua Little.

Domestic choices highlight captain Rishabh Pant alongside wicketkeepers Abishek Porel, Anuj Rawat, and N. Jagadeesan. Batting strength comes from Devdutt Padikkal, Mayank Agarwal, and domestic star Sarfraz Khan. All-rounders like Axar Patel and Shardul Thakur provide balance.

The bowling attack features experienced pacer Bhuvneshwar Kumar, along with left-arm pacer Khaleel Ahmed and emerging talents like Mukesh Kumar, Vaibhav Arora, and Kuldeep Sen. Spin options include Kuldeep Yadav and emerging spinner Hrithik Shokeen (see Appendix 6).

8. Findings and Insights of Players and their Performance Scores

Figure 20 - Average overall score chart

This bar chart shows that bowlers have the highest average overall score, significantly higher than both batsmen and all-rounders. Interestingly, batsmen and all-rounders have very similar average scores, with batsmen outperforming all-rounders by only 0.01 points.

8.1 Distribuon of Overall Scores by Player Type

Figure 21 - Distribuon of overall score by player type

1. The distribuon shows that bowlers tend to perform beer in terms of Overall score

compared to the other two player types.

2. The similarity between batsmen and all-rounders' scores suggests that all-rounders are not

necessarily at a disadvantage in terms of overall performance despite having to excel in both

bang and bowling.

3. The highest individual Overall_score menoned in the data is for YS Chahal, a bowler, with

87.70, which aligns with the higher average for bowlers.

8.2 Players with more than 300 runs with strike rate more than 130

Figure 22 - Scaer plot for Batsman

This scaer plot illustrates the analysis of high-performing batsmen reveals notable players with

impressive stascs. Jos Buler an extraordinary strike rate of 158.62, showcasing his explosive

bang style. Other key players include F du Plessis with 2257 runs at a strike rate of 141.59, and

Shubman Gill, who has 2229 runs with a strike rate of 136.83. Addionally, T Head, Abhishek Sharma,

and H Klassen also demonstrate strong performances, with strike rates exceeding 140. Fraser and

Salt contribute to the aggressive bang lineup, emphasizing the importance of scoring quickly. These

players exemplify a combinaon of high run totals and aggressive strike rates, making them valuable

assets in compeve cricket, capable of changing the course of a match with their bang prowess.

8.3 Top All-rounders analysis

Figure 23 - Scaer plot for top all-rounders

This plot reveals several high-performing all-rounders in T20 cricket. Players like Rashid Khan, Andre

Russell, Sunil Narine and Ravindra Jadeja stand out with impressive overall scores above 50. These

players excel in both bang and bowling aspects of the game.

Rashid Khan leads the pack with an overall score of 54.49, showcasing his exceponal bowling skills

combined with useful bang contribuons. Andre Russell and Ravindra Jadeja follow closely, known

for their explosive bang and crucial wicket-taking abilies.

Other notable performers include Harshal Patel, and Axar Patel, all scoring above 48. These players

demonstrate the valuable combinaon of aggressive bang (high strike rates) and eﬀecve bowling

(wicket-taking ability and economy).

The scaer plot eﬀecvely visualises the balance between run-scoring and wicket-taking abilies of

these all-rounders, with the added dimension of strike rate represented by point size.

8.4 Top Economical Bowlers Analysis

Figure 24 - Scaer plot for top economical bowlers

The analysis of top bowlers in T20 cricket reveals a group of exceponal performers who combine

wicket-taking prowess with economical bowling. YS Chahal leads the pack with an impressive 84

wickets and an overall score of 87.70, showcasing his dominance in the format. The list features a

mix of spin and pace bowlers, including standouts like CV Varun, Mohammed Shami, and Jasprit

Bumrah. Notably, Bumrah boasts the best economy rate at 7.38.

8.5 Density distribuon of overall scores:

Figure 25 - Density distribuon by player types.

The violin plot illustrates overall score distribuons by player type. Batsmen likely show a wider

spread with higher median scores, reﬂecng diverse roles and run-scoring focus. Bowlers may display

a more compact distribuon with lower median scores, indicang consistent, specialized

performances. All-rounders potenally exhibit a broad range with median scores between batsmen

and bowlers, represenng their dual contribuons. This visualizaon eﬀecvely captures the disnct

performance characteriscs of each player type in cricket.

8.6 Performance metrics of All-rounders

Figure 26 - Performance metrics of top 5 all-rounders

The radar chart effectively compares top all-rounders' strengths and weaknesses. Rashid Khan excels in bowling with high wicket-taking ability and good economy. Andre Russell shines as an aggressive batsman with useful bowling skills. Ravindra Jadeja offers a balanced performance with strong bowling economy and consistent batting. Sunil Narine is primarily a bowler with excellent economy and the ability to score quick runs. Harshal Patel stands out as a bowling all-rounder with strong wicket-taking ability and moderate batting contributions. This visualization provides an intuitive understanding of each player's performance profile across multiple cricket aspects.

8.7 Research Conclusion

The conclusion of the research project emphasizes the significance of strategic squad optimization for the Kolkata Knight Riders (KKR) and Delhi Capitals (DC) in the context of the upcoming IPL mega auction in 2025.

Key findings indicate that KKR's successful championship strategies in 2024 stemmed from effective player retention and utilization, while DC's potential remains underutilized despite having a strong young core. The study utilized various machine learning models to analyse player performance and predict outcomes, revealing critical insights into team dynamics and performance metrics.

The research highlights the importance of quantitative analysis in sports, offering a framework for teams to enhance decision-making processes regarding player selection and strategic planning. By focusing on the unique challenges faced by both teams, the study provides actionable recommendations for optimizing squad composition, particularly for DC in leveraging their young talent effectively.

Overall, this research contributes to the broader field of quantitative sports analytics, offering valuable insights not only for KKR and DC but also for other T20 franchises globally. It underscores the evolving nature of team management in cricket, advocating for data-driven approaches to improve performance and competitiveness in the IPL.

9. Recommendaons

Based on the comprehensive research on opmizing squad composions for IPL teams, parcularly

focusing on Kolkata Knight Riders (KKR) and Delhi Capitals (DC), here are some key

recommendaons:

1. Embrace data-driven decision making: The IPL is evolving rapidly. Teams should leverage the

power of analycs to make smarter choices in player selecon and strategy formulaon. This

approach can provide valuable insights that might not be apparent to the naked eye.

2. Nurture young talent strategically: While it's tempng to always go for established stars,

don't underesmate the potenal of young players. Develop a system to idenfy and groom

emerging talents, giving them the right opportunies to shine. This is especially crucial for

teams like DC, which has a wealth of young talent waing to be unleashed.

3. Balance squad wisely: Cricket is a game of balance, and so is team composion. Aim for a mix

of experienced veterans and energec youngsters, aggressive hiers and steady anchors,

pace bowlers and cray spinners. This diversity can help teams adapt to various match

situaons and condions.

4. Invest in mul-dimensional players: In T20 cricket. Players who can contribute to mulple

areas – be it bang, bowling, or ﬁelding – can be game-changers. They provide captains with

more opons and can turn matches on their head.

5. Stay adaptable: The IPL is a long tournament with changing condions. Teams that can

quickly adapt their strategies based on performance data and match situaons oen come

out on top. Flexibility in approach can be a key diﬀerenator.

6. Opmize the aucon strategy: With the mega aucon coming up, use predicve models to

inform bidding decisions. Focus on players who not only have good historical stats but also

show potenal for growth and ﬁt well within the team's overall strategy.

7. Foster a culture of connuous improvement: Encourage players and coaching staﬀ to

regularly review performance data and work on areas of improvement. Create an

environment where everyone is commied to geng beer every day.

8. Look beyond the boundaries: While focusing on the IPL, keep an eye on performances in

other T20 leagues worldwide. This global perspecve can help in idenfying undervalued

players who might become match-winners for a team.

While data and analycs are powerful tools, cricket is sll a human game. The most successful teams

will be those that can blend analycal insights with the intangibles of team spirit, leadership, and on-

ﬁeld chemistry. By implemenng these recommendaons, teams can posion themselves for success

in the highly compeve world of the Cricket.

10. References

1. Amala Kaviya, V.S., Mishra, A.S. and Valarmathi, B. (2020) 'Comprehensive Data Analysis and Prediction on IPL using Machine Learning Algorithms', International Journal on Emerging Technologies, 11(3), pp. 218-228. (Accessed: 15 August 2024).

2. Bajaj, A. (2023) 'Prediction of Player Performance for IPL and Analyzing the Attributes Involved, Using Explainable AI', MSc Research Project, National College of Ireland. Available at: https://norma.ncirl.ie/6564/1/ayushibajaj.pdf (Accessed: 15 August 2024).

3. Berrar, D., Lopes, P. and Dubitzky, W. (2019). Incorporating domain knowledge in machine learning for soccer outcome prediction. Machine Learning, 108(1), pp.97-126.

4. Board of Control for Cricket in India (2023) Indian Premier League. Available at: https://www.iplt20.com/ (Accessed: 15 August 2024).

5. Brownlee, J., 2018. How to Use ROC Curves and Precision-Recall Curves for Classification in Python. [online] Machine Learning Mastery. Available at: https://machinelearningmastery.com/roc-curves-and-precision-recall-curves-for-classification-in-python/ [Accessed 25 August 2024].

6. Bunker, R.P. and Thabtah, F., 2019. A machine learning framework for sport result prediction. Applied computing and informatics, 15(1), pp.27-33.

7. Caya, O. and Bourdon, A., 2016. A framework of value creation from business intelligence and analytics in competitive sports. In 2016 49th Hawaii International Conference on System Sciences (HICSS) (pp. 1061-1071). IEEE.

8. Cervone, D., D'Amour, A., Bornn, L. and Goldsberry, K., 2016. A multiresolution stochastic process model for predicting basketball possession outcomes. Journal of the American Statistical Association, 111(514), pp.585-599.

9. Cherkassky, V. and Ma, Y., 2004. Practical selection of SVM parameters and noise estimation for SVM regression. Neural Networks, 17(1), pp.113-126.

10. Colwell, D., Jones, B. and Gillett, J. (1991) “75.7 A Markov Chain in Cricket (MCC!),” The Mathematical Gazette, 75(472), pp. 183–185. Available at: https://doi.org/10.2307/3620249.

11. Duff & Phelps (2022) IPL Brand Valuation Report 2022. Mumbai: Duff & Phelps.

12. Drucker, H., Burges, C.J., Kaufman, L., Smola, A.J. and Vapnik, V., 1997. Support vector regression machines. Advances in Neural Information Processing Systems, 9, pp.155-161.

13. Economic Times (2023) 'IPL becomes decacorn, valuation soars 75% since 2020', 27 December.

14. ESPN Cricinfo (2023) Indian Premier League. Available at: https://www.espncricinfo.com/series/indian-premier-league-2023-1345038 (Accessed: 15 August 2024).

15. ESPNcricinfo (2024). How KKR shaped themselves into the awesome class of 2024. [online] Available at: https://www.espncricinfo.com/story/ipl-2024-final-kkr-vs-srh-how-kkr-shaped-themselves-into-the-awesome-class-of-2024-1435320 [Accessed 15 Aug. 2024].

16. Fried, G. and Mumcu, C. eds., 2016. Sport analytics: A data-driven approach to sport business and management. Taylor & Francis.

17. Hajian-Tilaki, K., 2013. Receiver Operating Characteristic (ROC) Curve Analysis for Medical Diagnostic Test Evaluation. Caspian Journal of Internal Medicine, 4(2), pp.627-635.

18. Hubáček, O., Šourek, G. and Železný, F. (2019). Exploiting sports-betting market using machine learning. International Journal of Forecasting, 35(2), pp.783-796.

19. Ishi, M., Patil, D.J., Patil, D.N. and Patil, D.V. (2022) 'Winner Prediction in One Day International Cricket Matches Using Machine Learning Framework: An Ensemble Approach', Indian Journal of Computer Science and Engineering, 13, pp. 628–641.

20. James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An introduction to statistical learning. New York: Springer.

21. JioCinema (2023) 'IPL 2023 Final Sets Global Streaming Record', Press Release, 30 May.

22. Kadapa, S. (2013) 'How Sustainable is the Strategy of the Indian Premier League-IPL? A Critical Review of 10 Key Issues That Impact the IPL Strategy', International Journal of Scientific and Research Publications, 3.

23. Kemper, C. and Breuer, C., 2016. How efficient is dynamic pricing for sport events? Designing a dynamic pricing model for Bayern Munich. International Journal of Sport Finance, 11(1), pp.4-25.

24. Liu, G., Luo, Y., Schulte, O. and Kharrat, T., 2020. Deep soccer analytics: learning an action-value function for evaluating soccer players. Data Mining and Knowledge Discovery, 34(5), pp.1531-1559.

25. Loland, S., 2018. Performance-enhancing drugs, sport, and the ideal of natural athletic performance. The American Journal of Bioethics, 18(6), pp.8-15.

26. McHale, I.G., Scarf, P.A. and Folker, D.E., 2012. On the development of a soccer player performance rating system for the English Premier League. Interfaces, 42(4), pp.339-351.

27. Memmert, D. and Raabe, D., 2018. Data analytics in football: Positional data collection, modelling and analysis. Routledge.

28. Ofoghi, B., Zeleznikow, J., MacMahon, C. and Raab, M., 2013. Data mining in elite sports: a review and a framework. Measurement in Physical Education and Exercise Science, 17(3), pp.171-186.

29. Peacock, R.H. (1950) “2124. The New Ball in Cricket,” The Mathematical Gazette, 34(307), pp. 58–60. Available at: https://doi.org/10.2307/3610894.

30. Prakash, A., Ghosh, A. and Guha, B. (2019) 'Player Ranking System for IPL Using Machine Learning', International Journal of Sports Analytics, 5(1), pp. 1-12.

31. Rodrigues, M., Vinay, S., Naik, N., Deshpande, S. and Samant, S. (2019). Data visualization and toss related analysis of IPL teams and batsmen performances. [online] ResearchGate.

32. Rommers, N., Rössler, R., Goossens, L., Vaeyens, R., Lenoir, M., Witvrouw, E. and D'Hondt, E., 2020. Risk of acute and overuse injuries in youth elite soccer players: Body size and growth matter. Journal of Science and Medicine in Sport, 23(3), pp.246-251.

33. Rossi, A., Pappalardo, L., Cintia, P., Iaia, F.M., Fernàndez, J. and Medina, D., 2018. Effective injury forecasting in soccer with GPS training data and machine learning. PloS one, 13(7), p.e0201264.

34. Shah, J. (2023) The IPL Story: Cricket, Commerce and Glamour. New Delhi: Rupa Publications.

35. Shah, R., Ghosh, A. and Guha, B. (2016) 'IPL 2016: A Comprehensive Analysis of the Performance of Teams', International Journal of Sports Analytics, 2(1), pp. 1-15.

36. Seshadri, D.R., Drummond, C., Craker, J., Rowbottom, J.R. and Voos, J.E., 2019. Wearable devices for sports: New integrated technologies allow coaches, physicians, and trainers to better understand the physical demands of athletes in real time. IEEE pulse, 10(1), pp.38-43.

37. Smola, A.J. and Schölkopf, B., 2004. A tutorial on support vector regression. Statistics and Computing, 14(3), pp.199-222.

38. Sportstar (2023) 'IPL media rights sold for Rs 48,390 crore: Disney Star retains TV rights, Viacom18 bags digital package', The Hindu, 14 June.

39. Thomas, G., Gade, R., Moeslund, T.B., Carr, P. and Hilton, A., 2017. Computer vision for sports: Current applications and research topics. Computer Vision and Image Understanding, 159, pp.3-18.

40. IPL Governing Council (2024) IPL 2024: Playing Conditions. Mumbai: BCCI.

41. Delhi Capitals (2023) Official Website. Available at: https://www.delhicapitals.in/ (Accessed: 15 August 2024).

42. Gujarat Titans (2023) Official Website. Available at: https://www.gujarattitansipl.com/ (Accessed: 15 August 2024).

43. Kolkata Knight Riders (2023) Official Website. Available at: https://www.kkr.in/ (Accessed: 15 August 2024).

44. Lucknow Super Giants (2023) Official Website. Available at: https://www.lucknowsupergiants.in/ (Accessed: 15 August 2024).

45. Mumbai Indians (2023) Official Website. Available at: https://www.mumbaiindians.com/ (Accessed: 15 August 2024).

46. Punjab Kings (2023) Official Website. Available at: https://www.punjabkingsipl.in/ (Accessed: 15 August 2024).

47. Rajasthan Royals (2023) Official Website. Available at: https://www.rajasthanroyals.com/ (Accessed: 15 August 2024).

48. Royal Challengers Bangalore (2023) Official Website. Available at: https://www.royalchallengers.com/ (Accessed: 15 August 2024).

49. Sunrisers Hyderabad (2023) Official Website. Available at: https://www.sunrisershyderabad.in/ (Accessed: 15 August 2024).

Appendices

Appendix 1 – About IPL Teams

The Indian Premier League (IPL) currently features ten franchise teams, each representing different cities or states across India (Board of Control for Cricket in India, 2023):

Figure 27 - Chennai Super Kings Logo

Chennai Super Kings (CSK): Known for their consistency and led by the iconic MS Dhoni, CSK has won five IPL titles (ESPN Cricinfo, 2023).

Figure 28 - Delhi Capitals Logo

Delhi Capitals (DC): Formerly Delhi Daredevils, this team rebranded in 2018 and has been building a strong young core of Indian talent (Delhi Capitals, 2023).

Figure 29 - Gujarat Titans Logo

Gujarat Titans (GT): One of the newest additions to the IPL, joining in 2022, they made an immediate impact by winning the title in their debut season (Gujarat Titans, 2023).

Figure 30 - Kolkata Knight Riders Logo

Kolkata Knight Riders (KKR): Co-owned by Bollywood star Shah Rukh Khan, KKR has won three IPL titles and has a massive fan following (Kolkata Knight Riders, 2023).

Figure 31 - Lucknow Super Giants Logo

Lucknow Super Giants (LSG): Another new franchise that joined in 2022, they've quickly established themselves as strong contenders (Lucknow Super Giants, 2023).

Figure 32 - Mumbai Indians Logo

Mumbai Indians (MI): The most successful IPL team with five titles, MI is known for its star-studded lineup and ability to nurture young talent (Mumbai Indians, 2023).

Figure 33 - Punjab Kings Logo

Punjab Kings (PBKS): Formerly Kings XI Punjab, this team rebranded in 2021 and is still seeking its first IPL title (Punjab Kings, 2023).

Figure 34 - Rajasthan Royals Logo

Rajasthan Royals (RR): The inaugural IPL champions in 2008, RR is known for its ability to unearth and develop lesser-known players (Rajasthan Royals, 2023).

Figure 35 - Royal Challengers Bengaluru Logo

Royal Challengers Bangalore (RCB): Despite boasting some of cricket's biggest names, RCB is still chasing their first IPL title (Royal Challengers Bangalore, 2023).

Figure 36 - Sunrisers Hyderabad Logo

Sunrisers Hyderabad (SRH): Known for their strong bowling attacks, SRH won the title in 2016 and has consistently been a playoff contender (Sunrisers Hyderabad, 2023).

Appendix 2 – Team Performance

Mumbai Indians (MI):

Mumbai Indians have played the most matches (261) and won the most games (144) in IPL. Their

success is evident in their 5 IPL tles, the highest among all teams. They've reached the ﬁnals 6 mes

and made it to the playoﬀs 11 mes, showcasing their consistency (Board of Control for Cricket in

India, 2023).

Royal Challengers Bangalore (RCB):

Despite playing 256 matches and winning 123, RCB has never won an IPL tle. They've reached the

ﬁnals 3 mes and made the playoﬀs 9 mes. Their inability to convert playoﬀ appearances into tles

has been a point of discussion among cricket analysts (ESPN Cricinfo, 2023).

Kolkata Knight Riders (KKR):

KKR has played 252 matches, winning 131. They've clinched 3 IPL tles and reached the ﬁnals 4

mes. With 7 playoﬀ appearances, they've shown consistency in reaching the later stages of the

tournament (Kolkata Knight Riders, 2023).

Delhi Capitals (DC):

Formerly Delhi Daredevils, DC has played 252 matches but won only 115. They've never won an IPL title and have reached the finals only once. With 6 playoff appearances, they've struggled to make a significant impact in the tournament's history (Delhi Capitals, 2023).

Punjab Kings (PBKS):

PBKS has played 246 matches, winning 112. They've never won an IPL title and have reached the finals only once. With just 2 playoff appearances, they've been one of the less successful teams in the IPL (Punjab Kings, 2023).

Chennai Super Kings (CSK):

Despite playing fewer matches (239) than some other teams, CSK has been incredibly successful. They've won 138 matches and 5 IPL titles, equaling MI's record. With 10 final appearances and 13 playoff qualifications, they're considered one of the most consistent teams in IPL history (Chennai Super Kings, 2023).

Rajasthan Royals (RR):

RR has played 222 matches, winning 112. They won the inaugural IPL in 2008 but haven't replicated that success since. With 2 final appearances and 5 playoff qualifications, they've had mixed fortunes in the tournament (Rajasthan Royals, 2023).

Sunrisers Hyderabad (SRH):

SRH entered the IPL later than the original teams but has made a significant impact. They've played 182 matches, winning 88. They've won 1 IPL title and reached the finals 3 times, with 6 playoff appearances (Sunrisers Hyderabad, 2023).

Gujarat Titans (GT):

As one of the newest teams, GT has played only 45 matches but has already won 28 of them. They

won the IPL in their debut season in 2022 and reached the ﬁnals again in 2023, showing immediate

success (Gujarat Titans, 2023).

Lucknow Super Giants (LSG):

Another new entrant, LSG has played 44 matches and won 24. While they haven't reached a final yet, they've made it to the playoffs in both their seasons, indicating a strong start to their IPL journey (Lucknow Super Giants, 2023).
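The played and won figures quoted in this appendix can be turned into win ratios for a like-for-like comparison across teams. The sketch below is illustrative only: it assumes the win ratio is simply matches won divided by matches played, which may differ from how the report's Win_Ratio column treats tied or no-result matches, and the names `records` and `win_ratio` are chosen here for illustration.

```python
# Matches played and won per team, as quoted in Appendix 2.
records = {
    "MI": (261, 144),
    "RCB": (256, 123),
    "KKR": (252, 131),
    "DC": (252, 115),
    "PBKS": (246, 112),
    "CSK": (239, 138),
    "RR": (222, 112),
    "SRH": (182, 88),
    "GT": (45, 28),
    "LSG": (44, 24),
}


def win_ratio(played: int, won: int) -> float:
    """Share of matches won, as a fraction between 0 and 1 (assumed definition)."""
    return won / played


ratios = {team: win_ratio(played, won) for team, (played, won) in records.items()}

# List teams from the highest to the lowest win ratio.
for team, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{team:5s} {ratio:.3f}")
```

On these figures GT has the highest ratio (about 0.62), ahead of CSK (about 0.58), although GT's figure rests on far fewer matches.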

Appendix 3 – Reason for using Linear Regression

1. Connuous Dependent Variable:

Dependent variable, Win_Rao, is a connuous variable, which is suitable for linear

regression analysis.

2. Mulple Independent Variables:

The dataset includes mulple potenal predictors (e.g., Played, Won, Lost, N/R, lost_Rao,

Titles, Finalists, Playoﬀ), making mulple linear regression an appropriate choice.

3. Relaonship Exploraon:

Linear regression can help idenfy which factors have the strongest inﬂuence on a team's

win rao, providing valuable insights into team performance.

4. Performance Predicon:

The model can be used to predict a team's expected win rao based on other performance

metrics, which could be useful for team management and strategy planning.

5. Quanﬁable Impact:

Linear regression provides coeﬃcients that quanfy the impact of each independent variable

on the win rao, allowing for a clear understanding of each factor's importance.

6. Model Interpretability:

In sports analycs, it's oen crucial to have models that can be easily interpreted by coaches,

managers, and other stakeholders. Linear regression provides this interpretability.

7. Baseline Model:

Even if more complex models might be explored later, linear regression serves as an excellent

baseline model to compare against more sophiscated approaches.

8. Assumpon Tesng:

The dataset allows for tesng various assumpons of linear regression (like linearity,

homoscedascity, and mulcollinearity), which can provide insights into the data's structure.

9. Small Dataset Handling:

With a relatively small dataset (10 observations), linear regression can still provide usable results because it has few parameters to estimate, whereas more complex models are more likely to overfit.

10. Performance Metrics:

The high R-squared value (0.9969) indicates that the linear regression explains about 99.7% of the variance in the win ratio, a very close fit to this data.
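As an illustration of the approach described in this appendix, the sketch below fits a multiple linear regression by ordinary least squares using only the standard library. It is a minimal sketch under stated assumptions, not the model used in this report: it takes only Titles and Finalists as predictors (using the figures quoted in Appendix 2), defines the win ratio as won divided by played, and will therefore not reproduce the R-squared of 0.9969. The function names (`solve`, `fit_ols`, `predict`, `r_squared`) are illustrative.

```python
# (played, won, titles, finals) per team, taken from the Appendix 2 text.
teams = {
    "MI": (261, 144, 5, 6), "RCB": (256, 123, 0, 3), "KKR": (252, 131, 3, 4),
    "DC": (252, 115, 0, 1), "PBKS": (246, 112, 0, 1), "CSK": (239, 138, 5, 10),
    "RR": (222, 112, 1, 2), "SRH": (182, 88, 1, 3), "GT": (45, 28, 1, 2),
    "LSG": (44, 24, 0, 0),
}


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ols(rows, y):
    """Return [intercept, coef_1, ...] from the normal equations (X'X) b = X'y."""
    design = [[1.0] + list(row) for row in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def predict(beta, row):
    """Predicted value for one observation."""
    return beta[0] + sum(b * v for b, v in zip(beta[1:], row))


def r_squared(beta, rows, y):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((t - predict(beta, r)) ** 2 for r, t in zip(rows, y))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


rows = [(titles, finals) for _, _, titles, finals in teams.values()]
y = [won / played for played, won, _, _ in teams.values()]

beta = fit_ols(rows, y)
r2 = r_squared(beta, rows, y)
print("intercept, Titles, Finalists:", [round(b, 4) for b in beta])
print("R-squared:", round(r2, 4))
```

Each coefficient is read as the expected change in win ratio for a one-unit increase in that predictor with the other held fixed, which is the interpretability benefit noted in points 5 and 6. With only 10 observations, the estimates are sensitive to individual teams and should be treated as indicative.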

Appendix 4 – Reason for using Rule Based Scoring System

The rule-based scoring system for cricket aims to quantify a player's overall performance by assigning points based on various aspects of their game.

1. Holisc Player Assessment:

 The code combines mulple performance metrics (runs, average, strike rate, wickets,

economy rate) to create a comprehensive evaluaon of each player.

 This approach provides a more complete picture of a player's contribuon than

individual stascs alone.

2. Role-Based Evaluaon:

 By categorizing players as Batsmen, Bowlers, or All-rounders, the analysis

acknowledges the diﬀerent roles within a cricket team.

 This allows for fair comparisons between players with similar roles and

responsibilies.

3. Normalised Comparisons:

 Normalising metrics enables fair comparisons across diﬀerent scales (e.g., comparing

runs scored with wickets taken).

 This is essenal for creang a uniﬁed scoring system that can be applied across

diverse player types.

4. Weighted Performance Metrics:

 Assigning weights to diﬀerent metrics (e.g., giving more importance to total runs for

batsmen or wickets for bowlers) reﬂects the relave importance of various aspects

of performance.

 This nuanced approach aligns the analysis with the strategic priories of T20 cricket.

5. Idenfying All-Round Talent:

 The system's ability to evaluate all-rounders separately recognizes the unique value

of players who contribute signiﬁcantly in both bang and bowling.

6. Ranking Within Categories:

• Ranking players within their specific roles (batsman, bowler, all-rounder) provides context-specific performance assessments.

• This is valuable for team selection, strategy formulation, and player development.

7. Data-Driven Decision Making:

• The analysis provides an objective, data-driven basis for decisions related to team composition, player retention, and strategic planning.

8. Performance Benchmarking:

• By creating a standardized scoring system, teams can benchmark player performances across seasons or compare players from different teams.

9. Talent Identification:

• This system can help identify undervalued players or rising talents who might not stand out in traditional statistics but perform well in this comprehensive analysis.

10. Contract and Auction Strategies:

• For leagues like the IPL, this analysis can inform bidding strategies during player auctions and help in determining player values for contracts.

11. Fan Engagement and Fantasy Sports:

• Providing a single, comprehensive score for each player enhances fan engagement and can be particularly useful for fantasy cricket leagues.

12. Continuous Performance Monitoring:

• This type of analysis can be easily updated with new match data, allowing for continuous monitoring of player performance throughout a season or across multiple seasons.

Appendix 4 – Dataset variables

1. match_id: This column contains a unique identifier for each match, allowing for easy referencing and data management. It helps distinguish between different matches in the dataset.

2. season: This indicates the specific IPL season during which the match took place. It typically refers to the year of the tournament, providing context for the data.

3. start_date: This column records the date on which the match commenced. It is essential for temporal analysis, allowing researchers to study trends over different seasons or specific time periods.

4. venue: This specifies the location where the match was held. Knowing the venue is important for analysing home advantage, pitch conditions, and crowd influence on the game.

5. innings: This indicates whether the data pertains to the first or second innings of the match. In T20 cricket, each team bats for one innings, and this column helps differentiate between them.

6. ball: This column records the specific ball number within the over. It provides granular detail about the match, allowing for in-depth analysis of individual deliveries.

7. batting_team: This specifies the team that is currently batting during the delivery. It is crucial for understanding team performance and strategies.

8. bowling_team: This indicates the team that is currently bowling. This information is essential for analysing bowling strategies and effectiveness.

9. striker: This column names the batsman facing the current delivery. It is important for analysing individual player performance and contributions.

10. non_striker: This indicates the batsman at the other end of the pitch who is not facing the current delivery. It provides context for partnerships and running between the wickets.

11. extras: This column records the total extra runs scored on that delivery, which can include wides, no-balls, and other extras. It is important for assessing the impact of extras on the match outcome.

12. wides: This specifies the number of wide balls bowled during that delivery. Wides contribute to the extras and can affect the match's flow and scoring.

13. noballs: This indicates the number of no-balls bowled on that delivery. No-balls also contribute to extras and can lead to free hits, impacting scoring opportunities.

14. byes: This column records the number of byes scored on that delivery, which occur when the ball passes the wicketkeeper without touching the bat or body of the batsman.

15. legbyes: This specifies the number of leg byes scored, which occur when the ball hits the batsman's body (excluding the hand) and runs are taken.

16. penalty: This column records any penalty runs awarded to the batting or bowling team, which can occur due to infractions by the fielding team.

17. wicket_type: This indicates the type of dismissal if a wicket fell on that delivery (e.g., bowled, caught, LBW). It is crucial for analysing how wickets are taken.

18. player_dismissed: This column names the player who was dismissed on that delivery, providing insight into key moments in the match.

19. other_wicket_type: This specifies any secondary wicket type, if applicable, for cases where multiple dismissals occur in a single delivery (e.g., run out).

20. other_player_dismissed: This column names any other player who was dismissed on that delivery, providing additional context for significant events.

Appendix 5 – Potenal squad Opons for KKR

Overseas Players Opons

 Wicketkeepers:

 Phil Salt: An explosive batsman who can change the game.

 Rickelton: A solid opon with potenal.

 Jamie Smith: A young talent for future growth.

 Batsmen:

 Ben Ducke: A dynamic player with a strong T20 record.

 Steve Smith: An experienced batsman known for his technique and leadership.

 All-rounders:

 Andre Russell: A key all-rounder with match-winning capabilies.

 Sunil Narine: A long-me KKR asset with both bang and bowling skills.

 Chris Woakes: Adds versality and experience to the squad.

 David Willey: Oﬀers depth and balance as an all-rounder.

 Bowlers:

 Josh Hazlewood: Known for his precision and eﬀecveness.

 Mark Wood: Brings express pace and aggression.

 Mitchell Starc: A premier fast bowler with the ability to take wickets.

 Atkinson: A developing talent with potenal.

 Pos: An emerging bowler to consider.

 Jofra Archer: A high-impact player with a proven track record.

Domesc Players Opons

 Wicketkeepers:

 K.S. Bharat: A reliable opon for the wicketkeeping role.

 Batsmen:

 Shreyas Iyer: The captain and a crucial middle-order batsman.

 Venkatesh Iyer: Oﬀers ﬂexibility in the bang lineup.

 Rinku Singh: A promising ﬁnisher with a bright future.

 Nish Rana: Experienced and capable of anchoring the innings.

 Mayank Agarwal: Adds stability and experience.

 Devdu Padikkal: A young talent with strong potenal.

 Rahul Tripathi: Known for his aggressive bang style.

 Manish Pandey: Brings experience and depth to the bang order.

 All-rounders:

 Washington Sundar: Valuable for his bowling and bang skills.

 Krishappa Gowtham: Adds depth and versality.

 Shardul Thakur: Known for his ability to contribute in mulple ways.

 Bowlers:

 Harshit Rana: An emerging fast bowler with promise.

 Varun Chakravarthy: A key spinner with wicket-taking ability.

 Karthik Tyagi: Young and talented fast bowler.

 Shivam Mavi: Known for his pace and skill.

 Sakariya: Adds depth to the pace aack.

 Mohsin Khan: A promising young bowler.

 Sandeep Warrier: Experienced and reliable.

 Mayank Markande: Spin opon with experience.

Appendix 6 – Potenal squad Opons for DC

Overseas Players Opons

 Jake Fraser-McGurk: A young talent with explosive bang capabilies, adding depth to the

bang lineup.

 Reeza Hendricks: A consistent performer in T20 cricket, known for his ability to anchor

innings and score quickly.

 Ryan Rickelton: An emerging batsman with a strong domesc record, capable of playing

aggressive innings.

 Tristan Stubbs: A dynamic batsman with power-hing skills, ideal for the middle order.

 Rassie van der Dussen: A seasoned internaonal player known for his technique and ability

to play under pressure.

 Daryl Mitchell: A versale all-rounder who can contribute with both bat and ball, enhancing

team balance.

 Jason Holder: An experienced all-rounder with a proven track record in T20s, oﬀering both

bowling and bang depth.

 Jimmy Neesham: A dynamic all-rounder known for his big-hing ability and useful seam

bowling.

 Ben Stokes: A match-winner with exceponal all-round skills, capable of changing games

single-handedly.

 Romario Shepherd: A powerful all-rounder who can contribute signiﬁcantly with the bat and

provide pace bowling opons.

 Adam Milne: A fast bowler with express pace, known for his wicket-taking ability in T20

cricket.

 Ma Henry: A skilled bowler with experience in internaonal cricket, eﬀecve in both

powerplays and death overs.

 Mark Wood: An aggressive fast bowler known for his pace and ability to take key wickets.

 Joshua Lile: An emerging talent with potenal as a le-arm fast bowler.

Domesc Players Opons

 Rishabh Pant: The captain and wicketkeeper, known for his explosive bang and game-

changing abilies.

 Abishek Porel: A promising wicketkeeper-batsman, providing depth in the lower order.

 Anuj Rawat: A young wicketkeeper with potenal, looking to make an impact in the IPL.

 N. Jagadeesan: A reliable wicketkeeper-batsman with a solid domesc record.

 Devdu Padikkal: A talented batsman with a strong ability to score quickly, adding ﬁrepower

to the top order.

 Mayank Agarwal: An experienced opener known for his solid technique and ability to build

innings.

 Sarfraz Khan: A domesc star with a strong record, capable of performing under pressure.

 Axar Patel: A key all-rounder known for his bowling and handy bang, providing balance to

the team.

 Shardul Thakur: An all-rounder who can contribute with both bat and ball, known for his

wicket-taking ability.

 Khaleel Ahmed: A le-arm pacer with experience in T20 cricket, eﬀecve in the powerplay.

 Mukesh Kumar: An emerging fast bowler with potenal, looking to establish himself in the

IPL.

 Vaibhav Arora: A promising young bowler with a good domesc record.

 Kuldeep Sen: A fast bowler with the ability to take wickets, adding depth to the bowling

lineup.

 Sandeep Warrier: An experienced bowler providing addional opons in the pace aack.

 Bhuvneshwar Kumar: A seasoned pacer known for his swing bowling and experience in high-

pressure situaons.

 Tanush Koan: An all-rounder with potenal, oﬀering ﬂexibility to the squad.

 Kuldeep Yadav: A skilled spinner known for his wicket-taking ability and variaons.

 Hrithik Shokeen: An emerging spinner with potenal to contribute to the middle overs.