Two-frame motion estimation based on polynomial expansion
In this work, the author wants to estimate the displacement field from two frames while compensating for the background motion.
To achieve this, each pixel’s neighborhood is approximated by a polynomial \(f(x) = x^T Ax + b^T x + c\), where \(A\) is a symmetric matrix, \(b\) a vector, \(c\) a scalar and \(x\) is coordinate.
These coefficients are estimated by a least squares fit. (Note: done by a separable convolution in practice.)
From the first and second images, they create \(A_1\) and \(A_2\) and calculate the mean to get a new matrix \(A\).
They also introduce \(\delta b = - \frac{1}{2}(b_2 (x) - b_1 (x))\). The problem is then to solve \(A(x)d(x) = \delta b(x)\) where \(d(x)\) is the displacement field.
The result from this equation is too noisy. In consequence, the actual \(d(x)\) is calculated from a weighted average of all displacement fields from the neighborhood.
Adding an a priori
They add an a priori \(\bar d (x)\), we only need to add it to the \(\delta b(x)\) estimation.
\(\delta b = - \frac{1}{2}(b_2 (\bar x) - b_1 (x)) + A(x)\bar d (x)\) where \(\bar x = x + \bar d (x)\).
As we can see, the method can be iterative and process multiple frames in a sequence.
Results
Using a \(39x39, N(0,6)\) gaussian weighting function and a \(11 \times 11, N(0,1.5)\) gaussian for the polynomial expansion, the author shows great results on the Yosemite dataset.
- 1.58 average error from Mémin & Perez
- 1.40 average error for Farneback
The method doesn’t work well for huge displacements (as we can see in low frame-rate cameras).