TY - GEN
T1 - TSN vs I3D vs Pose-C3D
T2 - 7th International Conference on Computational Intelligence and Intelligent Systems, CIIS 2024
AU - Chiura, Tafadzwa Blessing
AU - Van Der Haar, Dustin
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s).
PY - 2025/2/7
Y1 - 2025/2/7
N2 - Most research in action recognition focuses on general human action recognition, and most available datasets support studies in that general setting. In more specific contexts, such as basketball, comprehensive and publicly available datasets are limited. This study takes three popular and mature action recognition methods, namely Temporal Segment Networks (TSN), a Two-Stream CNN using Inflated 3D Convolutional Neural Networks (I3D), and Pose-C3D, and applies them to the SpaceJam dataset, a basketball-specific action dataset. All three experiments used models pre-trained on ImageNet, which were then fine-tuned on the SpaceJam dataset. TSN, the oldest of the three methods, obtained the best results, with top-1 and top-5 accuracies of 59% and 96%, respectively. I3D came second, with top-1 and top-5 accuracies of 41% and 85%, respectively. Pose-C3D came third, with top-1 and top-5 accuracies of 15% and 50%, respectively. The results show that the models cannot clearly distinguish between some actions, such as ball in hand, pass and dribble. The study shows that context-specific fine-grained action recognition is feasible, but more work is needed to discriminate between similar actions.
KW - Action Recognition
KW - Basketball
KW - Computer Vision
UR - http://www.scopus.com/inward/record.url?scp=85219532892&partnerID=8YFLogxK
U2 - 10.1145/3708778.3708788
DO - 10.1145/3708778.3708788
M3 - Conference contribution
AN - SCOPUS:85219532892
T3 - CIIS 2024 - 2024 the 7th International Conference on Computational Intelligence and Intelligent Systems
SP - 65
EP - 71
BT - CIIS 2024 - 2024 the 7th International Conference on Computational Intelligence and Intelligent Systems
PB - Association for Computing Machinery, Inc
Y2 - 22 November 2024 through 24 November 2024
ER -