Prediction of Rising Stars in the Game of Cricket

Among all sports, cricket, which originated in England, is the second most popular game and now has roots around the globe. Rising Star Prediction (RSP) is based on the current contributions of an emerging player. A star in cricket is an experienced player with extraordinary performance throughout his career, whereas a rising cricketer, or rising star (RS), is an emerging player who currently has a low profile but could become a star cricketer in the future on the basis of consistent improvements in performance. Players can be categorized into different classes based on their performance evolution.

The figure gives the different classes of performance evolution. Finding RSs in academic networks, predicting future citation counts, and temporal expert finding are among the related proposals, along with many other results. However, these proposals did not consider the sports domain.

Within the sports domain, some works address sports-related issues, but their authors do not present a proper prediction mechanism. There are further proposals for ranking batsmen based on performance metrics and for ranking cricket teams by employing the h-index and PageRank. The criterion for RSP is defined so that the early-career performance of an emerging player and the performances of his co-players are both incorporated.

In this way, an emerging player has the opportunity to learn from the playing strategies of co-players under the same playing conditions in order to improve his performance. There are three categories of features for batsmen (Co-batsmen, Team and Opposite teams) and three similar categories for bowlers (Co-bowlers, Team and Opposite teams). The contributions of the work are briefly as follows:

1) An efficient methodology for RSP within the cricket domain is presented that incorporates the concept of co-players. A set of 9 features is formulated for RSP of batsmen, as well as a set of 11 features for bowlers.
2) By testing different classification algorithms on the datasets, four appropriate machine learning classification algorithms are selected for binary classification.
3) The performance of the employed machine learning algorithms is critically examined during the evaluation phase.
4) RSP is made with high accuracy, and rankings of leading RSs from both domains based on three defined metrics are presented and compared with the ICC rankings of players from 2013-2016.
5) This innovative idea can be used for RSP in other sports domains such as baseball, football and basketball.

The basic concepts and terminology, including a brief introduction to the game of cricket, its rules and regulations, and its ranking metrics, are explained. The ICC issues rankings on a regular basis.

Each team gains or loses points by winning or losing cricket matches, respectively. Over a particular time span, these points are used by the ICC to rank the teams. Some of the basic metrics for ranking batsmen are: 1) runs, 2) batting average and 3) batting strike rate. Similarly, the metrics for ranking bowlers are: 1) bowling average, 2) bowling strike rate and 3) bowling economy.

Rising Star Prediction

The training data is a set of $n$ examples $(X_1, y_1), (X_2, y_2), \ldots, (X_n, y_n)$, where $X_i$ denotes the feature vector of cricketer $c_i$, $X_i \in \mathbb{R}^m$, $\mathbb{R}^m$ is the real-valued feature space, $m$ is the total number of features, and $n$ is the total number of underlying cricketers.
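As a concrete illustration of the basic batting and bowling metrics listed in this section, the following minimal sketch computes them using their conventional cricket definitions. The record classes and field names are hypothetical and are not taken from the original work; the ICC's own rating formulas are more involved and are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class BattingRecord:
    runs: int         # total runs scored
    balls_faced: int  # total balls faced
    dismissals: int   # number of times the batsman was out


@dataclass
class BowlingRecord:
    runs_conceded: int  # total runs conceded
    balls_bowled: int   # total legal deliveries bowled
    wickets: int        # total wickets taken


def batting_average(rec):
    # Runs scored per dismissal; undefined (None) if never dismissed.
    return rec.runs / rec.dismissals if rec.dismissals else None


def batting_strike_rate(rec):
    # Runs scored per 100 balls faced.
    return 100.0 * rec.runs / rec.balls_faced if rec.balls_faced else None


def bowling_average(rec):
    # Runs conceded per wicket taken; undefined (None) with no wickets.
    return rec.runs_conceded / rec.wickets if rec.wickets else None


def bowling_strike_rate(rec):
    # Balls bowled per wicket taken; undefined (None) with no wickets.
    return rec.balls_bowled / rec.wickets if rec.wickets else None


def bowling_economy(rec):
    # Runs conceded per over, assuming six legal deliveries per over.
    return 6.0 * rec.runs_conceded / rec.balls_bowled if rec.balls_bowled else None
```

For batting, higher averages and strike rates are better; for bowling, lower values of all three metrics are better.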

Moreover, for RSP a prediction function $P_{RS}$ is defined as follows:

$$F = P_{RS}(c_i \mid X), \tag{1}$$

where

$$P_{RS}(c_i \mid X) \begin{cases} < 0 & \text{if } y = -1 \text{ (not RS)}, \\ > 0 & \text{if } y = +1 \text{ (RS)}. \end{cases}$$

A. Generative Models

The important characteristics for a comparative analysis of two widespread generative classifiers, BN and NB, are reviewed.

Bayesian Network (BN): A BN is a directed acyclic graph representing a joint probability distribution over a set of random variables in terms of their conditional dependencies.

Naïve Bayesian (NB): NB is the first successor classifier of BN, differing in its additional assumption that the features are independent of one another given the class.

B. Discriminative Models

1) Support Vector Machines (SVM): Among state-of-the-art binary classifiers based on supervised machine learning, Support Vector Machines (SVM) have gained broad popularity due to their efficient investigation of data while identifying patterns. More precisely, to separate two different classes efficiently, the SVM model constructs the optimal hyperplane with the largest margin. Moreover, it can handle both linear and non-linear data.

2) Classification And Regression Tree (CART): CART is fundamentally a non-parametric model used for making predictions on the underlying data. CART comprises three main steps: 1) construction of the maximum tree, 2) selection of the right tree size and 3) classification of unseen data based on the previously trained tree.

Performance Evaluation Metrics

Precision, Recall and the balanced F-measure are standard metrics employed to check the performance of binary classification models.

The impact of the defined features on RSP is examined, and all the state-of-the-art classifiers are found to show outstanding performance. The underlying subsection provides the analysis for learning RSs from the defined features for the WA (B) measure based batting dataset. Thus, the proposed features can be generalized for RSP in the cricket domain.
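The evaluation metrics named above (Precision, Recall and the balanced F-measure) can be computed from binary predictions as in the following minimal sketch. It assumes the label convention of the prediction function, +1 for RS and -1 for not RS; the function name and the choice to return 0.0 when a ratio is undefined are assumptions made for this illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted RS and actually RS.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted RS but actually not RS.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: actually RS but predicted not RS.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Balanced F-measure: harmonic mean of precision and recall.
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Precision penalizes players wrongly predicted to be RSs, Recall penalizes true RSs that are missed, and the balanced F-measure weighs the two equally.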

For different numbers of instances, the classifiers reach up to 100% accuracy in predicting RSs. Overall, however, NB dominates the remaining classifiers, achieving an average learning accuracy of 94.5% for 10-100 instances. The second-best performance is shown by the SVM model, with an average accuracy of 92.6%. BN stands third with an average learning accuracy of 91.1%, while CART ranks last with an average accuracy of 90.1% for 10-100 instances.

The influence of the defined features on RSP is examined, and all the state-of-the-art classifiers are found to show excellent performance.

The underlying subsection provides the analysis for learning RSs from the defined features for the WA (Bow) measure based bowling dataset. Thus, the proposed features can be generalized for RSP in the cricket domain.

RANKINGS OF RSs

A. Batting Domain

For the batting domain, the top 10 RS batsmen are ranked based on WA (B), PE (B) and RS (B). The metrics WA (B) and PE (B) are formally presented. The third metric, RS (B), is an aggregate score calculated by adding all the features that are positively correlated with batting performance and subtracting the negatively correlated features.

More precisely, among the 9 defined features for RS batsmen, the 6 features belonging to the Co-batsmen and Team categories are positively correlated with RS batsman performance, because higher values of these features indicate a higher chance for an emerging batsman of becoming an RS. On the contrary, the three features of the Opposite teams category are negatively correlated with the performance of the batsman.

B. Bowling Domain

For the bowling domain, the top 10 RS bowlers are ranked based on WA (Bow), PE (Bow) and RS (Bow). The metrics WA (Bow) and PE (Bow) are formally presented in the former subsection. The third metric, RS (Bow), is an aggregate score calculated by adding all the features that are positively correlated with bowling performance and subtracting the negatively correlated features.
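The aggregate RS score described above (positively correlated features added, negatively correlated features subtracted) can be sketched as follows for the batting case. The feature names are hypothetical placeholders for the 6 Co-batsmen and Team features and the 3 Opposite teams features, since the individual features are not named here; the sketch also assumes that feature values are already on comparable scales.

```python
# Hypothetical placeholder names; the actual features are not listed in this text.
POSITIVE_FEATURES = [
    "co_batsmen_1", "co_batsmen_2", "co_batsmen_3",
    "team_1", "team_2", "team_3",
]
NEGATIVE_FEATURES = ["opposite_1", "opposite_2", "opposite_3"]


def rs_score(features, positive=POSITIVE_FEATURES, negative=NEGATIVE_FEATURES):
    # Add positively correlated features, subtract negatively correlated ones.
    gained = sum(features[name] for name in positive)
    lost = sum(features[name] for name in negative)
    return gained - lost


def rank_rising_stars(players, top_k=10):
    # players maps a player name to that player's feature dictionary.
    scored = [(name, rs_score(feats)) for name, feats in players.items()]
    # Highest aggregate score first.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

The same pattern would apply to the bowling domain with its 11 features, given the corresponding split into positively and negatively correlated features.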

Measures are explicitly adopted for rising star prediction in the batting and bowling domains. More precisely, three categories (Co-players, Team and Opposite teams) are incorporated, in which 9 and 11 features are defined for the prediction of batting and bowling rising stars, respectively. Two types of datasets are generated, based on the weighted average and performance evolution metrics. The defined features are tested by employing generative (BN and NB) and discriminative (SVM and CART) machine learning algorithms. For the batting domain, the Co-batsmen category surpasses the remaining two categories, while in the bowling domain the Team category performs best for rising star prediction.

Overall, it is observed that NB outperforms the remaining models. Finally, ranking lists of rising stars based on weighted average, performance evolution and rising star score are presented for both domains. These rankings are compared with the ICC rankings during 2013-16, and the presented approaches are found to be effective for rising star prediction.

Therefore, these features can also be used for rising star prediction in the Test and T20 formats. Moreover, some additional features, such as opposite team diversity, home or away, 100s and 50s (for batsmen) and 4- and 5-wicket hauls (for bowlers), can also be incorporated in order to get even better results. Finding RSs within cricket and other domains is quite useful, because the authorities (coaches, managers, etc.) can then put effort into maximizing the expertise of such RSs in order to get optimal performances in the future. A similar methodology can be adopted for RSP in different sports domains and other organizations.