Knowing the background image of a video scene simplifies the video-object segmentation problem, and several automatic segmentation algorithms therefore require it. The algorithm described on this page was developed with MPEG-4 object coding in mind. As such, it is an offline algorithm that attempts to generate the best possible background for a given short video sequence; for an online background computation, it would have to be modified.
The algorithm is based on the observation that each small block in the input image falls into one of three categories:

- static background,
- static foreground (a temporarily motionless foreground object), or
- moving content.
In order to find the static background for each block, the algorithm employs a similarity matrix. Each element M_{a,b} rates the visual similarity of the block in frames a and b (a value of 0 indicates maximum similarity, while 1 indicates minimum similarity). Currently, this is simply the sum of absolute differences over the block. For frames a in which the block contains motion, the matrix elements M_{a,x} and M_{x,a} are set to 1 (no similarity) for all frames x.
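A minimal sketch of how such a matrix could be built (NumPy-based; the normalization of the SAD to the [0, 1] range and all names are assumptions, since the page only specifies SAD and that values lie between 0 and 1):

```python
import numpy as np

def similarity_matrix(blocks, moving):
    """Build the block-dissimilarity matrix M for one block position.

    blocks : (F, H, W) array, the same block cut out of each of F frames
    moving : length-F boolean array, True where the block contains motion

    M[a, b] is 0 for maximum similarity and 1 for minimum similarity.
    Here the sum of absolute differences is normalized by its worst case
    for 8-bit pixels (this normalization is an assumption of the sketch).
    """
    F = blocks.shape[0]
    M = np.zeros((F, F))
    max_sad = 255.0 * blocks[0].size          # worst-case SAD for 8-bit pixels
    for a in range(F):
        for b in range(a + 1, F):
            sad = np.abs(blocks[a].astype(float) - blocks[b].astype(float)).sum()
            M[a, b] = M[b, a] = sad / max_sad
    # Frames in which the block moves are declared dissimilar to everything.
    M[moving, :] = 1.0
    M[:, moving] = 1.0
    np.fill_diagonal(M, 0.0)                  # M[k, k] = 0, as used by the cost update
    return M
```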
The subsequent optimization step computes the set of frames T that most probably show background. It does so by maximizing the similarity between pairs of frames in T, while minimizing the similarity of pairs with at least one frame not in T. More specifically, a cost function C built from the matrix elements M_{a,b} is minimized over the set T.
The optimization is carried out iteratively by adding or removing individual frames k from the set T until convergence. Since M is symmetric and M_{k,k} = 0, the change in cost ΔC_k when adding a frame k to T can be computed efficiently in incremental form. Conversely, the change in cost when removing a frame k from T is simply ΔC_{-k} = -ΔC_k.
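The page does not reproduce the exact cost function, so the sketch below scores a candidate set T by the similarity (1 - M) within T minus the similarity between T and the remaining frames, which matches the stated intent but is a stand-in, not the published formula. Frames are toggled in and out of T until no single toggle improves the score; the incremental ΔC computation is replaced by a full re-evaluation for clarity:

```python
import numpy as np

def select_background_frames(M, init=None):
    """Greedy local search for the set T of frames showing background.

    M : symmetric block-dissimilarity matrix (0 = identical, 1 = no similarity).
    init : optional boolean start set, e.g. taken from a neighboring block.

    The scoring function is an illustrative stand-in for the page's cost C.
    """
    F = M.shape[0]
    S = 1.0 - M                      # similarity: 1 = identical, 0 = none
    np.fill_diagonal(S, 0.0)         # self-similarity carries no information
    in_T = np.zeros(F, dtype=bool) if init is None else init.astype(bool)
    if not in_T.any():
        in_T[np.argmin(M.sum(axis=1))] = True   # seed with the most "typical" frame

    def score(mask):
        within = S[np.ix_(mask, mask)].sum()    # similarity inside T
        cross = S[np.ix_(mask, ~mask)].sum()    # similarity between T and the rest
        return within - cross

    best = score(in_T)
    improved = True
    while improved:
        improved = False
        for k in range(F):
            in_T[k] = not in_T[k]               # tentatively toggle frame k
            s = score(in_T) if in_T.any() else -np.inf
            if s > best:
                best, improved = s, True        # keep the toggle
            else:
                in_T[k] = not in_T[k]           # revert
    return in_T
```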
Since neighboring blocks will often show background during similar periods of time, one can use the solution found for a neighboring block as initialization for the optimization of the current block. This not only increases computational efficiency, but also reduces the risk of converging to a local optimum. In some cases, it may even steer the optimization towards the "correct optimum" even though this is not the global optimum.
We initialize the optimization with the set of frames that are marked as background for both the block to the left and the block above. As one can see in a real-world example, these sets are highly correlated:
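Under the assumption that the per-frame background classification of each neighboring block is stored as a boolean mask, this initialization could look like the following (helper name and representation are illustrative, not from the original program):

```python
def initial_set(bg_left, bg_top):
    """Initialize T for the current block from already-optimized neighbors.

    bg_left, bg_top : boolean sequences over the frames, True where the
    left/top neighbor block was classified as background. A frame enters
    the initial set only if both neighbors marked it as background.
    """
    return [l and t for l, t in zip(bg_left, bg_top)]
```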
Generation of the final background image simply applies a pixel-wise temporal median filter, but includes only data from static background blocks. This step further reduces occasional optimization errors.
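A sketch of this compositing step, assuming the per-block classification has already been expanded to a per-pixel boolean mask (names and data layout are illustrative):

```python
import numpy as np

def composite_background(frames, is_background):
    """Composite the final background by a per-pixel temporal median.

    frames        : (F, H, W) array of grayscale frames
    is_background : (F, H, W) boolean mask, True where the block containing
                    the pixel was classified as static background in frame f

    Only pixels from static-background blocks enter the median; pixels that
    are never seen as background are left at 0 in this sketch.
    """
    F, H, W = frames.shape
    out = np.zeros((H, W), dtype=frames.dtype)
    for y in range(H):
        for x in range(W):
            vals = frames[is_background[:, y, x], y, x]
            if vals.size:
                out[y, x] = np.median(vals)
    return out
```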
The following images show example results for the per-block classification. The classification may not look very accurate for the first image. Note, however, that accidentally marking a background block as static foreground or moving hardly hurts the overall performance, as long as not too many foreground blocks are classified as static background. Hence, the parameters have been tuned so that classification as foreground is preferred.
Note that the background of the last sequence is impossible to reconstruct, as not all parts of the image are visible in the sequence.
Below, you can find a Linux binary of the algorithm for evaluation purposes. In addition to the described algorithm, the program also implements several baseline algorithms (e.g., median, aging, mode), each with the option to exclude moving regions.
The full list of command line options is:
BackgroundEstimator 0.1 (02 Mar 2004) / (c) Dirk Farin
Usage: BackgroundEstimator [OPTIONS]... [FILES]...
-h --help Print help and exit
-V --version Print version and exit
-aSTRING --algo=STRING background reconstruction algorithm (--algo=list for available options)
-nINT --history=INT number of input frames the algorithm may use
-GDOUBLE --aging-factor=DOUBLE set aging factor for 'aging' algorithm
-FINT --fuzzyness=INT set fuzzyness factor for 'mode' algorithm
-bINT --blocksize=INT set block size for 'simmat' algorithm
-2STRING --algo2=STRING second background reconstruction algorithm
--history2=INT number of input frames the second algorithm may use (overrides -n)
--aging-factor2=DOUBLE set aging factor for 'aging' algorithm (overrides -G)
--fuzzyness2=INT set fuzzyness factor for 'mode' algorithm (overrides -F)
--blocksize2=INT set block size for 'simmat' algorithm (overrides -b)
-dINT --decimate=INT only use every 'n'th input frame
-sINT --speedup=INT show output only after each 'n'th step
-rSTRING --resolution=STRING resolution of input frames (YUV input only)
-fSTRING --format=STRING input format (yuv, mpeg)
-oSTRING --save-output=STRING write sequence of generated backgrounds to PPM files
-l --offline only generate one background image for the whole sequence (overrides -s)
-ASTRING --alpha=STRING use input file with alpha-channel information
-t --transparent-motion make areas with image motion transparent
-x --no-x11 disable X11 output
and the list of available algorithms is:
Available algorithms: aging, average, median, mode, first, last, simmat
Aging: A current background image is maintained. After each input frame 'f',
the background 'b' is updated according to 'b <- a*b + (1-a)*f'.
The aging factor 'a' can be adjusted.
Average: Compute pixel-wise average in temporal direction (excluding
transparent pixels).
Median: Compute median filter in temporal direction (excluding transparent
pixels).
Mode: Select the most frequently occurring pixel value (equality is considered
up to a fuzzyness difference that can be adjusted)
First: Take first non-transparent pixel into background
Last: Take last non-transparent pixel into background
SimMat: Algorithm based on block-similarity matrices and foreground time-range
prediction (see Farin, ICIP'03 paper for longer explanation).
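The 'aging' update quoted in the list above is a standard exponential moving average. A minimal sketch of one update step, following the stated rule b <- a*b + (1-a)*f:

```python
import numpy as np

def aging_update(background, frame, a=0.95):
    """One step of the 'aging' background update: b <- a*b + (1-a)*f.

    background, frame : arrays of the same shape; a is the aging factor
    (the default value here is an arbitrary choice for illustration).
    """
    return a * background + (1.0 - a) * frame
```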
The program reads YUV and MPEG sequences. However, for reading MPEG sequences, you will need to have DVDview installed. A typical command line looks like this:
./bkgEstimator.x86.static.bin -f yuv -r 352x288 -o output -l -a simmat hall_cif.yuv
Writing the output into output0300.ppm
or similar.