In spring 2025 I took 6.S058 (now 6.4300): Introduction to Computer Vision. For our final project, a partner and I built a system to detect and classify MMA strikes from video footage, motivated by the discrepancy between real-time and post-match strike tallies. We also wrote a paper and gave a presentation summarizing our work.
We had trouble finding usable existing datasets, so my partner, who has martial arts experience, recorded himself moving around while kicking, punching, and standing, with 30 minutes of footage per action. We used a pre-trained YOLOv8 model to extract pose keypoints from the videos, then trained a CNN to predict whether a kick, a punch, or neither occurred within a one-second window of sequential pose data.
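To make the "one second of sequential pose data" step concrete, here is a minimal sketch of how per-frame keypoints could be cut into fixed-length windows before being passed to a classifier. It is an illustration under stated assumptions, not our actual code: the 30 fps frame rate, the half-second stride, the normalization scheme, and the names `normalize_frame` and `make_windows` are all hypothetical. The only fixed detail is that each pose has 17 (x, y) keypoints, which is the COCO layout that YOLOv8 pose models output.

```python
from typing import List, Sequence, Tuple

Keypoint = Tuple[float, float]   # (x, y) position of one joint
Frame = Sequence[Keypoint]       # one pose: 17 keypoints in COCO layout

FPS = 30        # assumed frame rate of the footage
WINDOW = FPS    # one second of frames per training example
STRIDE = 15     # assumed hop between windows (half a second)


def normalize_frame(frame: Frame) -> List[Keypoint]:
    """Center a pose on its mean keypoint and scale it by its bounding-box size.

    This makes the features independent of where the person stands in the
    image and how far they are from the camera (an assumed preprocessing step).
    """
    xs = [x for x, _ in frame]
    ys = [y for _, y in frame]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)
    # Fall back to 1.0 if all keypoints coincide, to avoid dividing by zero.
    scale = max(max(xs) - min(xs), max(ys) - min(ys)) or 1.0
    return [((x - cx) / scale, (y - cy) / scale) for x, y in frame]


def make_windows(
    frames: Sequence[Frame], window: int = WINDOW, stride: int = STRIDE
) -> List[List[List[float]]]:
    """Slice a pose sequence into overlapping windows of flattened keypoints.

    Each window has `window` rows, and each row holds 2 * num_keypoints
    values: x0, y0, x1, y1, ... Videos shorter than one window yield no windows.
    """
    features = [
        [coord for keypoint in normalize_frame(frame) for coord in keypoint]
        for frame in frames
    ]
    return [
        features[start:start + window]
        for start in range(0, len(features) - window + 1, stride)
    ]
```

Each window is a 30 × 34 grid (time × keypoint coordinates) that would carry a single label: kick, punch, or neither. A CNN can then convolve along the time axis of that grid to pick up motion patterns.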
We evaluated our model on a variety of video footage. Future work could include curating our dataset, detecting finer-grained action classes, and using a graph-based architecture.