BasketTube - AI Game Analyzer

A dedicated repository with a full application (not a notebook) is available at aegean-ai/basket-tube. It exposes a UI that shows how this kind of application can be engineered end-to-end. Please note that the app is still a work in progress and lags behind some of the choices made in the notebook referenced below.

BasketTube application UI showing video analysis with player tracking and action pipeline

Basketball games are complex, dynamic environments where players must make quick decisions based on their understanding of the game state. In this assignment, you will explore how to extract player actions from a basketball game video.

You can use the following Colab notebook to complete this assignment: Basketball State Extraction

Task 1 - Analyzing Player Performance from Commentary

Build a chat-based assistant that answers questions about player performance using the spoken commentary from the game video. The system should respond to user queries in a chat interface that mimics Google’s responses to queries such as:

Q: “Analyze the player that scored the most in this game”

A: While the video does not provide a final leading scorer for the entire game, Stephen Curry is noted as having the most points among the individual player scores mentioned in the first half, reaching 12 points by (37:12). Here’s an analysis of his performance in the first half:

Early Struggles and Defensive Coverage: The Lakers made it difficult for Curry early in the game (16:22). He took only three shots initially (15:17) as the Lakers forced him to get rid of the ball, often with two defenders when he came off a pick and roll (15:10-15:16).
First Field Goal: Curry scored his first field goal later in the first half (11:06).
Offensive Impact: By the end of the first half, he came “alive from three-point range” (42:22-42:25), hitting back-to-back three-pointers (37:09-37:12, 42:01-42:08), helping the Warriors extend their lead to 13 points by halftime (42:10).
Overall Role: He was described as the oldest scoring champion in the NBA in 23 years, showcasing his elite scoring ability (19:09-19:12). LeBron James even called him the MVP, stating he had the best season all year (19:23-19:29).
Underrated Aspects: Commentators highlighted his rebounding and finishing around the rim as underrated aspects of his game (18:54-19:06).

You may use either:

a local LLM running on your machine, or
a hosted LLM/API

If you choose a local LLM, you can run it through a tool such as Ollama, LM Studio, or vLLM. In either case, the model must answer using the commentary transcript as its main source of evidence. Your system should:

extract or load the video commentary as timestamped text
organize the commentary so relevant parts can be retrieved for a user query
answer natural-language questions such as “Who scored the most?”, “How did Curry perform in the first half?”, or “When did LeBron start impacting the game?”
support each answer with timestamps from the commentary
clearly separate directly stated facts from higher-level summaries or interpretations

For each answer, the assistant should provide:

a direct response to the question
a short summary of the player’s performance
2 to 5 commentary-based pieces of evidence with timestamps
a brief note that the answer is based only on commentary and may not fully reflect what actually happened in the game

A recommended workflow is:

Obtain the commentary transcript from the video.
Split it into timestamped chunks.
Retrieve the chunks most relevant to the user’s question.
Pass those chunks to the LLM.
Prompt the model to answer only from the retrieved commentary and cite timestamps.
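The workflow above can be sketched in plain Python as follows. This is a minimal illustration, not a reference implementation: the `Chunk` class, the function names, the 30-second window, and the word-overlap scoring are all assumptions made for the example, and the scoring is a simple stand-in for whatever retrieval method you choose (for example embeddings). The transcript is assumed to be available as `(start_seconds, end_seconds, text)` tuples.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    start: float  # seconds from the start of the video
    end: float
    text: str


def chunk_transcript(segments, window=30.0):
    """Group (start, end, text) segments into chunks spanning at most ~`window` seconds."""
    chunks, current = [], None
    for start, end, text in segments:
        if current is None or end - current.start > window:
            if current is not None:
                chunks.append(current)
            current = Chunk(start, end, text)
        else:
            current.end = end
            current.text += " " + text
    if current is not None:
        chunks.append(current)
    return chunks


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(chunks, query, k=3):
    """Return up to k chunks ranked by word overlap with the query."""
    q = tokenize(query)
    scored = [(len(q & tokenize(c.text)), c) for c in chunks]
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties stay in time order
    return [c for _, c in scored[:k]]


def fmt(seconds):
    """Format seconds as MM:SS for citation in the answer."""
    m, s = divmod(int(seconds), 60)
    return f"{m:02d}:{s:02d}"


def build_prompt(question, chunks):
    """Assemble a grounded prompt that carries timestamps with each excerpt."""
    evidence = "\n".join(f"[{fmt(c.start)}-{fmt(c.end)}] {c.text}" for c in chunks)
    return (
        "Answer using ONLY the commentary excerpts below. "
        "Cite timestamps for every claim, and say so if the excerpts "
        "do not contain the answer.\n\n"
        f"Commentary:\n{evidence}\n\nQuestion: {question}"
    )
```

The string returned by `build_prompt` can then be sent to whichever local or hosted model you selected. Keeping the timestamps inside the evidence lines is what lets the model cite them in its answer.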

The commentary is provided in the video and may not reflect reality (see Task 2). The goal of this task is not to verify whether the commentators are correct but to build a grounded question-answering system that can retrieve, organize, and summarize commentary into useful player analysis.

If you use a hosted LLM or API, set spending limits on your account before running any experiments. API calls can accumulate costs quickly, especially when processing long transcripts or making repeated queries. Also, never commit API keys or tokens to a public repository. Use environment variables or a .env file instead, and make sure the .env file is listed in .gitignore.
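One way to keep keys out of the source code is to read them from the environment at startup and fail early if they are missing. The sketch below uses only the standard library; the variable name `LLM_API_KEY` and the helper `load_api_key` are hypothetical, so substitute whatever name your provider's client expects.

```python
import os


def load_api_key(name="LLM_API_KEY"):
    """Read an API key from the environment and fail loudly if it is missing.

    `LLM_API_KEY` is a placeholder name chosen for this example.
    """
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {name} environment variable before running.")
    return key
```

If you prefer a .env file, a package such as python-dotenv can load it into the environment before this function is called.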

Task 2 - Verifying Player Actions from Video

In Task 1, you analyzed player performance using the spoken commentary. In this task, you must verify player actions by analyzing the video footage itself. The goal is to determine what a player actually does on the court and support your conclusions with visual evidence.

Your system should examine the game video, identify short plays or action segments, and recognize the actions performed by a player of interest. Rather than producing a generic summary, the system should return structured evidence about what happened, when it happened, and why the prediction is reasonable based on the visible content of the video.

You may use any suitable approach, including vision-language models, video-language models, object detection, tracking, pose estimation, or a combination of methods. You do not need to build a perfect end-to-end sports analytics system. A strong solution is one that focuses on a clear method, produces interpretable outputs, and explains its limitations.

Your output should be grounded in the visual content of the video. For each detected action, you should aim to provide:

the player name
the action category
the start timestamp
optionally the end timestamp
a short explanation of the evidence visible in the clip
one or more representative frames or visual summaries
if possible, a bird’s-eye or court-level representation of the play

Examples of action categories include:

shooting
passing
dribbling
rebounding
defending
assisting
screening
cutting
stealing
blocking
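The output fields and action categories listed above could be captured in a small record type so that every detection has the same shape. This is only one possible layout: the class name `ActionRecord`, its field names, and the choice of seconds for timestamps are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional

# Categories taken from the list above; extend or trim as needed.
ACTIONS = {
    "shooting", "passing", "dribbling", "rebounding", "defending",
    "assisting", "screening", "cutting", "stealing", "blocking",
}


@dataclass
class ActionRecord:
    player: str
    action: str
    start: float                       # seconds from the start of the video
    end: Optional[float] = None        # optional end timestamp
    explanation: str = ""              # visible evidence, in plain language
    frame_paths: List[str] = field(default_factory=list)  # representative frames
    court_view_path: Optional[str] = None                  # bird's-eye view, if any

    def __post_init__(self):
        # Reject unknown categories and inverted time ranges early.
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action category: {self.action!r}")
        if self.end is not None and self.end < self.start:
            raise ValueError("end timestamp precedes start timestamp")


def to_json(records):
    """Serialize a list of ActionRecord objects to a JSON string."""
    return json.dumps([asdict(r) for r in records], indent=2)
```

Validating the category and the time range at construction keeps malformed detections out of the final report.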

A recommended workflow is:

Split the game video into smaller clips or candidate plays.
Select a player of interest.
Track that player, and if possible the ball, across the clip.
Extract visual evidence from frames, motion, or trajectories.
Predict the action performed.
Return the result with timestamps and supporting evidence.

Task 2.1 - Chunking the Video into a Series of Plays

Before recognizing player actions, you should divide the game video into smaller units that are easier to analyze. A full basketball game is too long and visually complex to process effectively in one pass, so your first step is to segment the video into short plays, possessions, or candidate action windows. In basketball, a play is a coordinated sequence of actions intended to create a scoring opportunity or prevent one. Plays may be offensive or defensive, and they often involve multiple players moving at the same time. For this assignment, you do not need to produce a perfect tactical classification of every play. Instead, you should aim to identify meaningful video segments that can later be analyzed for player actions. Examples of useful play boundaries include:

a possession beginning after a rebound or inbound
a clear offensive setup followed by a shot attempt
a transition play
a defensive sequence ending in a steal, rebound, or whistle
a short interval around a highlighted player action

You may use different strategies for chunking, such as:

fixed-length windows
scene or shot change detection
scoreboard or clock changes
commentary cues
model-based identification of action peaks or possession changes
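Two of the simpler strategies above, fixed-length windows and commentary cues, can be sketched without any video library. The function names, the 8-second window, the 4-second stride, and the padding around each cue are illustrative assumptions rather than recommended values, and all times are in seconds.

```python
def fixed_windows(duration, window=8.0, stride=4.0):
    """Return overlapping (start, end) windows that cover [0, duration]."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    windows = []
    start = 0.0
    while start < duration:
        end = min(start + window, duration)
        windows.append((start, end))
        if end >= duration:
            break  # the last window already reaches the end of the video
        start += stride
    return windows


def cue_windows(cue_times, duration, before=6.0, after=3.0):
    """Build a window around each cue time and merge windows that overlap."""
    raw = sorted(
        (max(0.0, t - before), min(duration, t + after)) for t in cue_times
    )
    merged = []
    for start, end in raw:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

Overlapping windows reduce the chance that an action is cut in half at a boundary, at the cost of processing some frames twice. Cue times could come from the timestamps retrieved in Task 1.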

The objective of this step is to create short, coherent video segments that are easier to process in Task 2.2.

Task 2.2 - Analyzing Specific Player Actions

Once the video has been divided into short segments, focus on a player of interest and identify the actions that player performs. Your system should attempt to recognize specific basketball actions and provide evidence for each prediction. Examples of player actions include:

Shooting: attempting to score by releasing the ball toward the basket
Passing: moving the ball to a teammate
Dribbling: controlling the ball while moving
Rebounding: gaining possession after a missed shot
Defending: pressuring or preventing an opponent’s movement or shot
Assisting: making the pass that directly leads to a score
Screening: positioning to block or delay a defender
Cutting: moving quickly into open space to receive the ball
Stealing: taking possession from an opponent
Blocking: deflecting or stopping a shot attempt

For each identified action, provide:

the timestamp
the player
the action category
a short explanation of why the action was assigned
a representative frame, clip, or bird’s-eye view if available

For example:

Bird's eye view of basketball court with player positions

Timestamp	Player	Action	Bird's-eye view
12:34	Stephen Curry	3-Point Shot
15:20	LeBron James	Assist

Some possible technical approaches include:

using a VLM on key frames sampled from each play
using a video-language model to process short clips directly
detecting and tracking players and the ball
using pose estimation to capture motion patterns for shooting or defending
reasoning over trajectories to infer passes, cuts, and drives
combining commentary-based retrieval with video verification
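As an illustration of reasoning over trajectories, the sketch below infers candidate passes from tracked positions. It assumes you already have, for each frame, a timestamp, the ball position, and each player's position in court coordinates. The 1.5-unit possession radius, the function names, and the rule "the nearest player within the radius holds the ball" are simplifying assumptions, not an established method.

```python
import math


def holder(ball, players, radius=1.5):
    """Return the id of the closest player within `radius` of the ball, else None."""
    best_id, best_d = None, radius
    for pid, (x, y) in players.items():
        d = math.hypot(x - ball[0], y - ball[1])
        if d <= best_d:
            best_id, best_d = pid, d
    return best_id


def possession_changes(frames, teams, radius=1.5):
    """Detect changes of ball holder.

    frames: list of (time, (ball_x, ball_y), {player_id: (x, y)})
    teams:  {player_id: team_name}
    A change between teammates is labeled "pass"; otherwise "turnover".
    """
    events, last = [], None  # last = (time, player_id) of the most recent holder
    for t, ball, players in frames:
        pid = holder(ball, players, radius)
        if pid is None:
            continue  # ball in flight or not close to anyone
        if last is not None and pid != last[1]:
            kind = "pass" if teams[pid] == teams[last[1]] else "turnover"
            events.append(
                {"start": last[0], "end": t, "from": last[1], "to": pid, "type": kind}
            )
        last = (t, pid)
    return events
```

Events found this way are only candidates. A shot, a rebound, or a tracking error can also change the nearest player, so each event should still be checked against the frames before it is reported.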

You do not need to recognize every possible basketball action. A good submission may focus on:

one player at a time
a limited set of action categories
short clips rather than the full game
interpretable evidence rather than perfect accuracy

You should also discuss the limitations of your approach. Common challenges include:

small or fast-moving ball trajectories
player occlusion
unclear jersey numbers
camera cuts and zoom changes
actions that require longer context, such as assists or defensive pressure

The goal of this task is to show that your system can use visual evidence from the game footage to recognize and justify player actions, even if the final predictions are imperfect.

Deliverables

Submit the following via your project repository:

Code repository: a private GitHub repo containing all source code, notebooks, and configuration files needed to reproduce your results. Add the course TAs as collaborators. The repo must include a README.md with setup instructions.
Task 1 notebook: a Jupyter notebook at notebooks/task1.ipynb demonstrating your commentary-based question-answering system. Include at least 3 example queries with full outputs showing retrieved evidence, timestamps, and generated answers.
Task 2 notebook: a Jupyter notebook at notebooks/task2.ipynb demonstrating your video-based action recognition pipeline. Include:
- examples of chunked video segments (Task 2.1)
- at least 5 detected player actions with timestamps, action categories, explanations, and representative frames or visual summaries (Task 2.2)
Written report, covering:
- the approach and tools chosen for each task, with justification
- key results and example outputs
- a comparison between commentary-based analysis (Task 1) and video-based verification (Task 2): where do they agree and where do they diverge?
- limitations of your approach and what you would improve with more time

Separate notebooks per task are optional. You may use the provided Basketball State Extraction Colab notebook as your starting point and include your code, analysis, and written report inline as markdown cells. Save a copy to your repo at notebooks/project.ipynb and clearly label the Task 1, Task 2, and report sections with markdown headings.

Demo video: a short screen recording (3-5 minutes) walking through both tasks, showing your system in action on at least one play sequence.
