Camera Calibration for the Analysis of Sport Sequences

A semantic analysis of sport sequences requires camera calibration in order to obtain player and ball positions in real-world coordinates. Calibration is carried out by matching a line model of the playing field with the lines in the input image. The usually large number of lines detected in the input results in a challenging combinatorial optimization problem. We describe a new calibration system that combines a calibration-parameter initialization step with a model-tracking step to achieve real-time performance. Our results show that robust calibration of, e.g., tennis and soccer sequences is possible with a computation time of only about 6 ms per frame.

The calibration system consists of the following processing steps:

Court-line pixel detection. This step identifies the pixels that belong to court lines. Since court lines are usually white, this step is essentially a white-pixel detector. Additional constraints are that large white areas (e.g., a player's white clothing) and fine-textured areas (e.g., in the audience) should not be selected. The fine-texture detector can be disabled to decrease the computation time in cases where it does not influence the result.


input frame	white-pixel detection without texture filter (note that the white player is not marked)	white-pixel detection with texture filter (the audience is not marked anymore)
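
A minimal sketch of such a white-pixel detector, assuming a grayscale image given as a list of rows. The thresholds bright and contrast and the distance tau are hypothetical values chosen for illustration, and the darker-neighbor test is only one plausible way to reject large white areas; it is not necessarily the rule used in the system. The fine-texture filter is omitted.

```python
def court_line_pixels(gray, bright=180, contrast=40, tau=3):
    """Return a boolean mask of candidate court-line pixels.

    A pixel is kept if it is bright and if both pixels at distance `tau`
    on either side (horizontally or vertically) are clearly darker.
    Large white areas fail this test because their neighbors are also white.
    All thresholds are illustrative assumptions.
    """
    h, w = len(gray), len(gray[0])
    mask = [[False] * w for _ in range(h)]
    for y in range(tau, h - tau):
        for x in range(tau, w - tau):
            v = gray[y][x]
            if v < bright:
                continue
            # Thin vertical line: darker pixels to the left and right.
            horiz = (v - gray[y][x - tau] > contrast and
                     v - gray[y][x + tau] > contrast)
            # Thin horizontal line: darker pixels above and below.
            vert = (v - gray[y - tau][x] > contrast and
                    v - gray[y + tau][x] > contrast)
            mask[y][x] = horiz or vert
    return mask
```

With tau chosen slightly larger than the expected line width, a thin line passes the test while the interior of a white shirt does not.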

Line Extraction. Because of the high computational complexity of a Hough-transform-based line detector, we apply a RANSAC-based line detector instead. Additionally, the extent of each line is determined to obtain line segments instead of infinite lines.
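
The RANSAC idea can be sketched as follows for a single dominant line. This is a generic textbook version, not the system's implementation: the iteration count, the inlier tolerance tol, and the function names are assumptions, and a real detector would repeat the search after removing the inliers of each line found.

```python
import math
import random

def ransac_line(points, iterations=200, tol=1.5, rng=None):
    """Fit one dominant line to 2-D points with RANSAC.

    Returns ((a, b, c), inliers) where a*x + b*y + c = 0 and a^2 + b^2 = 1.
    `iterations` and `tol` (inlier distance in pixels) are assumed values.
    """
    rng = rng or random.Random(0)
    best_line, best_inliers = None, []
    for _ in range(iterations):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        a, b = y2 - y1, x1 - x2           # normal of the line through both samples
        norm = math.hypot(a, b)
        if norm == 0:
            continue                       # identical points define no line
        a, b = a / norm, b / norm
        c = -(a * x1 + b * y1)
        inliers = [p for p in points if abs(a * p[0] + b * p[1] + c) <= tol]
        if len(inliers) > len(best_inliers):
            best_line, best_inliers = (a, b, c), inliers
    return best_line, best_inliers

def segment_extent(line, inliers):
    """End points of the segment: the inliers that lie furthest apart along the line."""
    a, b, _ = line
    # (-b, a) is the direction vector of the line.
    ordered = sorted(inliers, key=lambda p: -b * p[0] + a * p[1])
    return ordered[0], ordered[-1]
```

Taking the extreme inliers along the line direction is the simplest way to turn an infinite line into a segment; it ignores gaps inside the segment.
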
Line Tracking. The line extraction step usually detects more lines than just the court lines. Most of the additional lines are spurious detections that are temporally unstable because they are part of foreground objects. The line-segment tracking step observes the temporal behaviour of the line segments and discards the unstable detections. Conversely, court lines that are occasionally missed are filled in with their previous estimates.

lines and line segments detected in one frame	the current line hypotheses (id:age:confidence)
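
One way to maintain such (id, age, confidence) hypotheses is sketched below. The line parameterization (normal angle and distance), the matching thresholds, and the confidence update rule are all assumptions made for illustration; the source does not specify them. Angle wrap-around is ignored for brevity.

```python
from dataclasses import dataclass
from itertools import count

@dataclass
class Hypothesis:
    id: int
    angle: float              # angle of the line normal in radians
    dist: float               # distance of the line from the origin in pixels
    age: int = 0              # number of frames since creation
    confidence: float = 0.3   # assumed starting confidence

class LineTracker:
    """Keeps line hypotheses alive across frames (all constants are assumptions)."""

    def __init__(self, max_dangle=0.05, max_ddist=5.0, gain=0.2,
                 drop_below=0.1, pass_above=0.8):
        self.hyps = []
        self._ids = count(1)
        self.max_dangle, self.max_ddist = max_dangle, max_ddist
        self.gain, self.drop_below, self.pass_above = gain, drop_below, pass_above

    def update(self, detections):
        """`detections` is a list of (angle, dist) lines found in the current frame."""
        unmatched = list(detections)
        for h in self.hyps:
            h.age += 1
            match = next((d for d in unmatched
                          if abs(d[0] - h.angle) <= self.max_dangle
                          and abs(d[1] - h.dist) <= self.max_ddist), None)
            if match is not None:
                unmatched.remove(match)
                h.angle, h.dist = match
                h.confidence += self.gain * (1.0 - h.confidence)  # reinforce
            else:
                h.confidence *= 1.0 - self.gain  # decay, but keep the old estimate
        self.hyps = [h for h in self.hyps if h.confidence >= self.drop_below]
        self.hyps += [Hypothesis(next(self._ids), a, d) for a, d in unmatched]
        # Only stable hypotheses are handed on to the court-model fitting.
        return [h for h in self.hyps if h.confidence >= self.pass_above]
```

A line that is detected in every frame slowly gains confidence, a line seen only once fades out after a few frames, and a stable line that is missed in a single frame survives with its previous parameters.
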
Court-Model Fitting. This step finds the correspondences between the lines detected in the image and the lines in the model of the court. The assignment is obtained with a combinatorial optimization, in which different assignment configurations are evaluated. The support of each configuration is evaluated by back-projecting the court model into the input image and searching for a matching line segment for each of the lines in the model.

Four lines in the court model are mapped to their corresponding lines in the image.
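
For a single candidate assignment of four lines, the model-to-image transformation can be computed as sketched below. The sketch assumes a planar court and a homography with its last entry fixed to 1, intersects the two "horizontal" and two "vertical" lines to obtain four point correspondences, and solves the resulting 8x8 linear system. All function names are hypothetical, and the evaluation of each candidate against the image is not shown.

```python
def intersect(l1, l2):
    """Intersection of two lines (a, b, c) with a*x + b*y + c = 0."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    w = a1 * b2 - a2 * b1               # zero if the lines are parallel
    return ((b1 * c2 - b2 * c1) / w, (a2 * c1 - a1 * c2) / w)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= f * M[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def homography(model_pts, image_pts):
    """3x3 homography (H[2][2] = 1) mapping four model points to four image points."""
    A, b = [], []
    for (x, y), (u, v) in zip(model_pts, image_pts):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = solve(A, b) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]

def project(H, p):
    """Apply the homography H to a 2-D point."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

def homography_from_lines(model_lines, image_lines):
    """Each argument holds (h1, h2, v1, v2): two 'horizontal' and two 'vertical' lines."""
    def corners(lines):
        h1, h2, v1, v2 = lines
        return [intersect(h, v) for h in (h1, h2) for v in (v1, v2)]
    return homography(corners(model_lines), corners(image_lines))
```

With the homography of a candidate assignment, project can be used to back-project every model line into the image and check whether a detected segment supports it.
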
Refinement / Tracking. When an approximate position of the court is known, the calibration parameters are refined such that the back-projected court model best matches the court-line pixels in the input image. This refinement operation is also capable of tracking the court if the court is displaced by only a few pixels from its previous position. Hence, we use the same algorithm for updating the calibration parameters while the camera is moving.
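
The quantity minimized during refinement can be sketched as the mean distance between projected model points and the nearest court-line pixel. The sketch below assumes a city-block distance map computed with the classic two-pass scheme and a 3x3 model-to-image matrix H; the function names, the sampling of the model lines into points, and the out-of-image penalty are assumptions, and the optimizer that adjusts H is not shown.

```python
def distance_map(mask):
    """Two-pass city-block distance to the nearest True pixel in `mask`."""
    h, w = len(mask), len(mask[0])
    inf = h + w                              # larger than any distance in the image
    d = [[0 if mask[y][x] else inf for x in range(w)] for y in range(h)]
    for y in range(h):                       # forward pass: top-left to bottom-right
        for x in range(w):
            if y > 0:
                d[y][x] = min(d[y][x], d[y - 1][x] + 1)
            if x > 0:
                d[y][x] = min(d[y][x], d[y][x - 1] + 1)
    for y in reversed(range(h)):             # backward pass: bottom-right to top-left
        for x in reversed(range(w)):
            if y < h - 1:
                d[y][x] = min(d[y][x], d[y + 1][x] + 1)
            if x < w - 1:
                d[y][x] = min(d[y][x], d[y][x + 1] + 1)
    return d

def fitting_error(dist, H, model_points):
    """Mean distance-map value at the model points projected with H."""
    h, w = len(dist), len(dist[0])
    total = 0.0
    for x, y in model_points:
        s = H[2][0] * x + H[2][1] * y + H[2][2]
        u = round((H[0][0] * x + H[0][1] * y + H[0][2]) / s)
        v = round((H[1][0] * x + H[1][1] * y + H[1][2]) / s)
        inside = 0 <= u < w and 0 <= v < h
        total += dist[v][u] if inside else h + w   # penalize points outside the image
    return total / len(model_points)
```

Because the distance map changes smoothly around the court lines, a transformation that is off by a few pixels still yields a small, informative error, which is what makes the same cost usable for tracking.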

The model-to-image transformation is predicted from the previous two transforms.
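
A minimal sketch of such a prediction, assuming constant image motion between frames: the motion from the previous to the current frame, H_curr * inverse(H_prev), is applied once more to the current transform. This constant-motion rule is an assumption for illustration; the source only states that the previous two transforms are used.

```python
def matmul(A, B):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def inverse(M):
    """Inverse of a 3x3 matrix via the adjugate."""
    (a, b, c), (d, e, f), (g, h, i) = M
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[v / det for v in row] for row in adj]

def predict(H_prev, H_curr):
    """Extrapolate the next model-to-image transform from the last two."""
    motion = matmul(H_curr, inverse(H_prev))   # image motion from previous to current
    H = matmul(motion, H_curr)
    s = H[2][2]                                 # normalize; assumed to be non-zero
    return [[v / s for v in row] for row in H]
```

For a camera that pans by a constant amount per frame, the predicted transform is simply shifted by that amount again, which gives the refinement step a starting point close to the true court position.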

Calibration System Architecture

The calibration system operates in two modes: initialization and tracking. In initialization mode, all steps up to "model fitting" are executed until a good initialization is found (see the figure below). Note that there is a short delay in the court detection because the lines have to reach a high reliability in the line tracking before they are passed on to the model fitting. When the court is detected, the calibration parameters are refined and the system enters tracking mode. In tracking mode, only the court-line pixel detection and the tracking steps are executed. Because the tracking result is practically unaffected when white pixels in textured areas remain in the input, the computationally intensive texture filter is disabled in tracking mode.

In the tracking algorithm, the error is continuously monitored in order to detect when the tracking quality degrades. When the tracking quality becomes low, line extraction and line-segment tracking are enabled again at a lower frame rate. Every n-th frame, the model-fitting step is executed to check whether a better court location can be found. If the fitting error of the new initialization is lower than the error of the tracked court, the parameters are re-initialized to the new set of parameters.

Framework diagram of the calibration algorithm.
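
The mode switching described above can be sketched as a small controller. The vision steps are passed in as callables, and their signatures, the error threshold, and the refit interval are hypothetical; the sketch only illustrates the control flow, not the system's actual code.

```python
from enum import Enum

class Mode(Enum):
    INITIALIZATION = 1
    TRACKING = 2

class CalibrationController:
    """Mode switching only; the vision steps are injected as callables (hypothetical)."""

    def __init__(self, fit_court, track_court, error_threshold=2.0, refit_every=5):
        self.fit_court = fit_court          # frame -> (params, error) or None
        self.track_court = track_court      # (frame, params) -> (params, error)
        self.error_threshold = error_threshold
        self.refit_every = refit_every
        self.mode = Mode.INITIALIZATION
        self.params = None
        self.low_quality_frames = 0

    def process(self, frame):
        if self.mode is Mode.INITIALIZATION:
            result = self.fit_court(frame)
            if result is not None:           # court found: switch to tracking
                self.params, _ = result
                self.mode = Mode.TRACKING
            return self.params
        self.params, error = self.track_court(frame, self.params)
        if error <= self.error_threshold:
            self.low_quality_frames = 0
            return self.params
        # Tracking quality is low: retry the model fitting every n-th frame.
        self.low_quality_frames += 1
        if self.low_quality_frames % self.refit_every == 0:
            result = self.fit_court(frame)
            if result is not None and result[1] < error:
                self.params = result[0]      # re-initialize with the better fit
        return self.params
```

Keeping the expensive model fitting out of the normal tracking path is what allows the per-frame cost to stay low while still recovering from tracking failures.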

Results

detected court model (movie)	internal algorithm-trace (movie)
Example results for a tennis sequence, shown as two different outputs. While the court is not yet initialized, the line hypotheses are shown. Darker lines indicate lines with higher confidence. Only red lines (very high confidence) are passed on to the model fitting. In the internal algorithm-trace movie, you can see the model fitting. Once the court is initialized, the Chamfer distance map to the white pixels is shown. The court position is optimized such that the court lines cover low distance values. In the middle of the sequence, you can notice a case where the court tracking loses the correct position. Shortly afterwards, the court position is reinitialized. The color indicator in the top left shows the current tracking quality.
example with occlusions (movie)	example with fast motion (movie)	badminton example (movie)
volleyball examples	soccer examples	more tennis examples

References

Dirk Farin, Susanne Krabbe, Wolfgang Effelsberg, Peter H. N. de With, "Robust Camera Calibration for Sport Videos using Court Models", SPIE Storage and Retrieval Methods and Applications for Multimedia, vol. 5307, pp. 80-91, January 2004, San Jose (CA), USA (pdf)
Dirk Farin, Jungong Han, Peter H. N. de With, "Fast Camera Calibration for the Analysis of Sport Sequences", IEEE International Conference on Multimedia and Expo (ICME), July 2005, Amsterdam, Netherlands (pdf)

