TY - JOUR
T1 - Boosted decision trees in the era of new physics
T2 - a smuon analysis case study
AU - Cornell, Alan S.
AU - Doorsamy, Wesley
AU - Fuks, Benjamin
AU - Harmsen, Gerhard
AU - Mason, Lara
N1 - Publisher Copyright: © 2022, The Author(s).
PY - 2022/4
Y1 - 2022/4
N2 - Machine learning algorithms are growing increasingly popular in particle physics analyses, where they are used for their ability to solve difficult classification and regression problems. While the tools are very powerful, they may often be under- or mis-utilised. In the following, we investigate the use of gradient boosting techniques as applicable to a generic particle physics problem. We use as an example a Beyond the Standard Model smuon collider analysis which applies to both current and future hadron colliders, and we compare our results to a traditional cut-and-count approach. In particular, we interrogate the use of metrics in imbalanced datasets which are characteristic of high energy physics problems, offering an alternative to the widely used area under the curve (AUC) metric through a novel use of the F-score metric. We present an in-depth comparison of feature selection and investigation using a principal component analysis, Shapley values, and feature permutation methods in a way which we hope will be widely applicable to future particle physics analyses. Moreover, we show that a machine learning model can extend the 95% confidence level exclusions obtained in a traditional cut-and-count analysis, while potentially bypassing the need for complicated feature selections. Finally, we discuss the possibility of constructing a general machine learning model which is applicable to probe a two-dimensional mass plane.
AB - Machine learning algorithms are growing increasingly popular in particle physics analyses, where they are used for their ability to solve difficult classification and regression problems. While the tools are very powerful, they may often be under- or mis-utilised. In the following, we investigate the use of gradient boosting techniques as applicable to a generic particle physics problem. We use as an example a Beyond the Standard Model smuon collider analysis which applies to both current and future hadron colliders, and we compare our results to a traditional cut-and-count approach. In particular, we interrogate the use of metrics in imbalanced datasets which are characteristic of high energy physics problems, offering an alternative to the widely used area under the curve (AUC) metric through a novel use of the F-score metric. We present an in-depth comparison of feature selection and investigation using a principal component analysis, Shapley values, and feature permutation methods in a way which we hope will be widely applicable to future particle physics analyses. Moreover, we show that a machine learning model can extend the 95% confidence level exclusions obtained in a traditional cut-and-count analysis, while potentially bypassing the need for complicated feature selections. Finally, we discuss the possibility of constructing a general machine learning model which is applicable to probe a two-dimensional mass plane.
KW - Supersymmetry Phenomenology
UR - http://www.scopus.com/inward/record.url?scp=85127767365&partnerID=8YFLogxK
U2 - 10.1007/JHEP04(2022)015
DO - 10.1007/JHEP04(2022)015
M3 - Article
AN - SCOPUS:85127767365
SN - 1126-6708
VL - 2022
JO - Journal of High Energy Physics
JF - Journal of High Energy Physics
IS - 4
M1 - 15
ER -