Automated sports production could be the next big thing in sports broadcasting.
Combined with OTT distribution, it could open the floodgates for around 200 million sporting events that are not broadcast due to limited resources. To reach mainstream adoption, automated production technology needs to meet the quality thresholds spectators expect.
Although many variations of automated capture technologies exist in the market, this article will focus on the following two:
• Robotic capture (a.k.a. PTZ, pan-tilt-zoom) – this technology uses a robotic camera that follows the action by panning and zooming to present a closer perspective. One key challenge is for the system to understand in real time what to focus on while maintaining smooth, human-like movements. If a goalkeeper in a soccer game takes a long-range kick at the beginning of a play, the robotic camera will have a hard time keeping the ball in frame without fast movements.
• Panoramic capture with latency – panoramic capture uses a wide-angle camera, or multiple wide-angle cameras whose images are stitched together. The technology uses advanced auto-tracking algorithms to follow the flow of play within the high-resolution panoramic capture. These artificial intelligence (AI) decisions need to be made within a 5-second latency buffer to allow accurate, human-like capturing. Because much of today’s high-quality internet broadcasting already incurs a latency of 20 seconds, this 5-second latency does not have a negative impact. The latency buffer, in essence, is like a camera operator who looks into the future to understand the flow of the play and then makes a conscious decision about how to shoot it.
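The look-ahead idea behind the latency buffer can be illustrated with a minimal Python sketch. It is not taken from any product: the class name LookAheadFramer, the one-dimensional ball position, and the plain averaging rule are all assumptions made for illustration. The sketch delays its output by a fixed number of frames and centers each emitted frame on the average of the ball positions around it, including positions that arrived later.

```python
from collections import deque


class LookAheadFramer:
    """Delay output by `lookahead` frames and center each emitted frame
    using ball positions observed both before and after it."""

    def __init__(self, lookahead):
        self.lookahead = lookahead
        # Holds up to 2 * lookahead + 1 positions: past, current, future.
        self.window = deque(maxlen=2 * lookahead + 1)

    def push(self, ball_x):
        """Feed the newest ball position. Return the crop center for the
        frame `lookahead` steps in the past, or None while still buffering."""
        self.window.append(ball_x)
        if len(self.window) <= self.lookahead:
            return None
        # The average includes the "future" positions that arrived after
        # the frame being emitted, so the crop can move in anticipation.
        return sum(self.window) / len(self.window)


framer = LookAheadFramer(lookahead=2)
centers = [framer.push(x) for x in [0, 0, 0, 10, 10, 10]]
print(centers)
```

With a 5-second buffer and an assumed frame rate of 25 frames per second, lookahead would be 125. In the toy input above the ball jumps at the fourth frame, yet the crop center emitted for the second frame has already started to drift toward the new position.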
Automatic Capture Technology
Because each sport has its own game logic, each solution requires completely different algorithms. Even so, while varying in their specific algorithmic implementation, most automatic capture technologies share the following principles:
• Automatic ball detection – in ball-based sports (e.g. soccer and basketball), the ball is usually located at the heart of the action. To capture the action, the algorithm tries to detect and follow the ball.
• Player detection – in more advanced technologies, automatic ball detection is complemented by the detection of players. This allows a better understanding of the action and serves as a basis for game state detection. Both ball and player detection rely on the ability to analyze the images and distinguish the objects of interest (i.e. the ball and the players) from the background.
One of the challenges in player detection arises when a player stands still for a while. For example, in soccer, during a free kick, some of the players may not move for about 30 seconds. When this occurs, the algorithm must ensure that the player does not blend into the background.
In addition, the algorithm should be able to distinguish inactive players (e.g. those waiting on the sidelines) from active players, even when some of the active players are far from the ball. The algorithm should also be able to identify the referee, who is not part of the gameplay.
• Game state detection – based on the ball and player detection, the algorithm needs to identify the game state, that is, the type of play currently happening. Examples include a corner kick (in soccer), a counterattack, a free throw, and a penalty kick. Each game state has its own visual characteristics. By understanding the game state, the algorithm can predict certain behaviors and make smarter decisions about how to best capture the action. Every sport has a long list of different game states, making this a challenging task for an automated system.
To overcome this challenge, game state detection may be based on deep learning algorithms, which can automatically learn how to identify a corner kick from a data set of perhaps 50 examples. With deep learning, the system trainer doesn't have to come up with rules for identifying the corner kick. The system automatically generates its own rules and selects its own characteristics to identify this specific game state.
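The player detection item above notes that a player who stands still must not blend into the background. One common way to handle this in a background-subtraction approach is a selective update: the background model is refreshed only at pixels currently classified as background. The sketch below illustrates that idea on a tiny list of grayscale values. It is an assumption for illustration, not the method of any system described here; the function name update_background and the threshold and alpha values are hypothetical.

```python
def update_background(background, frame, threshold=30, alpha=0.05):
    """Return (new_background, foreground_mask) for one grayscale frame.

    background, frame: lists of pixel intensities of equal length.
    A pixel is foreground when it differs from the model by more than
    `threshold`. Only background pixels are blended into the model, so
    an object that stops moving is not absorbed over time.
    """
    new_background = []
    mask = []
    for bg, px in zip(background, frame):
        is_foreground = abs(px - bg) > threshold
        mask.append(is_foreground)
        if is_foreground:
            new_background.append(bg)  # freeze the model under the object
        else:
            # Slowly follow gradual changes such as lighting drift.
            new_background.append((1 - alpha) * bg + alpha * px)
    return new_background, mask


# A "player" (value 200) stands on the middle pixel for 750 frames,
# i.e. 30 seconds at an assumed 25 frames per second.
model = [100.0, 100.0, 100.0]
for _ in range(750):
    model, mask = update_background(model, [100, 200, 102])
print(mask)
```

The middle pixel is still flagged as foreground after the full 30 seconds, while the third pixel, which changed only slightly, has been absorbed into the model. A real system would also need a way to recover when the model is wrong under a frozen pixel, which this sketch does not address.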
By taking all these parameters into account, the system can decide how to capture each frame. The image below shows a visual representation of this decision process. It’s a panoramic image stitched together from multiple cameras. The red rectangle represents the desired frame to capture. The system recognized the play as an attack on the right. Notice how the players who are not participating in the play are marked with X's, while the players participating in the play are marked within the frame, with data such as their speed.
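As a rough illustration of the framing decision, the following Python sketch picks a crop rectangle inside the panorama that covers the ball and the active players. It assumes detection has already produced pixel positions and that inactive players were filtered out. The function name choose_frame, the 16:9 aspect ratio, the margin, and the minimum width are hypothetical choices, and game state is ignored for brevity.

```python
def choose_frame(ball, active_players, pano_w, pano_h,
                 aspect=16 / 9, margin=50, min_width=640):
    """Return (left, top, width, height) of a crop inside the panorama.

    ball and each player are (x, y) pixel positions. The crop covers the
    ball and all active players plus a margin, is enlarged to the target
    aspect ratio, and is clamped to the panorama bounds.
    """
    points = [ball] + list(active_players)
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    width = max(max(xs) - min(xs) + 2 * margin, min_width)
    height = max(ys) - min(ys) + 2 * margin
    # Grow one dimension so the crop matches the output aspect ratio.
    if width / height < aspect:
        width = height * aspect
    else:
        height = width / aspect
    # Never ask for more pixels than the panorama has.
    scale = min(1.0, pano_w / width, pano_h / height)
    width, height = width * scale, height * scale
    # Center on the action, then push the crop back inside the panorama.
    cx = (max(xs) + min(xs)) / 2
    cy = (max(ys) + min(ys)) / 2
    left = min(max(cx - width / 2, 0), pano_w - width)
    top = min(max(cy - height / 2, 0), pano_h - height)
    return left, top, width, height


print(choose_frame((3000, 500), [(2800, 450), (3300, 600)], 4000, 1000))
```

The rectangle returned here plays the role of the red rectangle described above. If the action spreads across more than the panorama can show at the target aspect ratio, the crop is scaled down and may no longer contain every player.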
The Baseline Characteristics
To provide an engaging viewing experience, the technology needs to simulate a human camera operator's capture with smooth, non-robotic movements, preferably simulating the movement of a video tripod with a fluid head.
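One simple way to approximate the damped feel of a fluid head is to pass the desired crop position through two cascaded exponential smoothing stages. This is a generic signal-smoothing technique chosen here for illustration, not the method of any particular system; the function name smooth_pan and the alpha value are assumptions.

```python
def smooth_pan(targets, alpha=0.1):
    """Smooth a sequence of desired crop centers (one per frame).

    Two exponential smoothing stages in series give a gentle start and
    a gentle stop. A smaller alpha means slower, heavier movement.
    """
    stage1 = stage2 = float(targets[0])
    path = []
    for target in targets:
        stage1 += alpha * (target - stage1)  # first stage follows the target
        stage2 += alpha * (stage1 - stage2)  # second stage follows the first
        path.append(stage2)
    return path


# The desired center jumps from 0 to 1000 pixels after the first frame.
path = smooth_pan([0] + [1000] * 200)
print([round(p, 1) for p in path[:5]], round(path[-1], 2))
```

For a sudden change of target, the output starts moving slowly, speeds up, and then eases into the new position without overshooting it. The price is lag, which is one reason a look-ahead buffer pairs well with this kind of smoothing.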
Automated Capture Scenarios
Scenario #1 – additional unreal ball (not in play)
A common scenario in lower-tier leagues is the presence of a second ball that is not part of the game, such as another ball used for practice during the match. Focusing on this ball instead of the real ball is a mistake. In this scenario, the primary ball disappears (it is obstructed by something or someone), and a second, unreal ball appears.
• Human camera operator – the human will undoubtedly notice that this ball is not relevant.
• Robotic capture – a robotic capture camera that follows the ball can mistake the second ball for the real ball and jump quickly to follow it. When the real ball appears again, the camera may jump back to it.
• Panoramic with latency – using the 5-second latency, the system can learn that the second ball is not real. During this buffer, the AI may think the second ball is real, but when the real ball reappears and the second ball is out of play, the AI, like the human camera operator, will know that the first ball is the only real ball. From the spectators’ perspective, nothing happened, as this mistake is resolved during the 5-second latency buffer.
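The way a buffer lets the system correct itself can be sketched in Python as follows. The sketch assumes one ball detection per buffered frame and treats the first detection as trusted. A later detection is trusted only if the ball could plausibly have traveled there, and frames in between are filled by interpolation. The function name resolve_ball_track, the max_jump speed limit, and the rule of holding the last trusted position at the end of the buffer are all hypothetical simplifications, not a description of any real product.

```python
import math


def resolve_ball_track(detections, max_jump=30.0):
    """Clean a buffered ball track using look-ahead.

    detections: one (x, y) ball detection per buffered frame, in time order.
    max_jump: largest plausible ball movement between two frames (pixels).
    A detection is trusted when it is reachable from the last trusted one
    at that speed. Untrusted frames between two trusted ones are replaced
    by linear interpolation; untrusted frames at the end of the buffer
    hold the last trusted position.
    """
    cleaned = list(detections)
    last = 0  # index of the last trusted detection
    for i in range(1, len(detections)):
        gap = i - last
        if math.dist(detections[i], detections[last]) <= max_jump * gap:
            x0, y0 = detections[last]
            x1, y1 = detections[i]
            # Overwrite the skipped frames with points on the straight
            # line between the two trusted detections.
            for k in range(1, gap):
                t = k / gap
                cleaned[last + k] = (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
            last = i
    for k in range(last + 1, len(detections)):
        cleaned[k] = detections[last]
    return cleaned


# Frames 3 and 4 lock onto a practice ball far away from the real one.
raw = [(100, 100), (110, 100), (900, 400), (905, 400), (140, 100), (150, 100)]
print(resolve_ball_track(raw))
```

In the cleaned track the two practice-ball detections are replaced by positions between the last and next sightings of the real ball, so a frame chosen from it never jumps across the field. The correction is only possible because the later frames are already in the buffer when the earlier ones are emitted.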