Camera matching

Code for this is in CameraMatcher5.java.

Suppose that we have a still photo (or a video where the camera setup does not change). We want to insert some computer-generated (CG) objects into the scene. Thus the composite scene has contributions from two cameras, the original one and the CG one. For the insertion to be seamless, the two cameras must have matching settings. While we have complete control over the CG camera, the specifications of the real camera are more difficult (if not impossible) to obtain. Here we shall discuss how to achieve camera matching even when the real-life camera parameters are completely unknown. For this we need a calibration shot. We shall use Art of Illusion (AoI) for the CG part.

The set up

The two worlds (real and CG) are to be matched. For this we need some common reference object whose position w.r.t. both worlds is known. This will be a square frame that we hold in the calibration shot. We shall consider the frame to be the unit square in the $xz$-plane in the CG world.

Using some mathematics (given below) we shall work out the position and orientation of the real camera w.r.t. the CG coordinate system. Then we shall build our CG object and place a CG camera at that position with that orientation. The rendered image will then be ready to be inserted into the scene (with just a 2D dilation and translation).

The mathematics

We want to find the position and orientation of the real camera. That means 6 numbers (3 for position and 3 for orientation, say as Euler angles). We have 4 points (the 4 vertices of the square) for which we know the CG coordinates and the pixel coordinates. So we get 8 equations (each pixel means 2 coordinates). These involve the 6 unknown quantities plus the unknown image resolution. All the 7 unknown quantities are hopelessly confounded in the 8 equations. When you add the additional fact that the Euler angles are not unique (i.e., there are multiple solutions), the task looks truly daunting.

However, we simplify the task using a crucial observation:

The 4 points are chosen in such a way that we can find the vanishing points (VPs) in the $x$- and $z$-directions of the CG world. These VPs involve only the orientation and the pixel resolution, not the camera position. So basically this eliminates 3 parameters.

Our input consists of $(o_1,o_2),$ $(p_1,p_2),$ $(q_1,q_2)$ and $(r_1,r_2).$ We compute the VPs $(x_1,x_2)$ and $(z_1,z_2).$ All pixel coordinates are shown in blue. All CG coordinates are shown in red.
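As a minimal sketch of this step, each VP can be computed as the intersection of the images of two opposite edges of the square. The corner ordering is an assumption made here for illustration (the figure fixes it in the original): $o,$ $p,$ $q,$ $r$ are taken to go around the square, with edges $op$ and $rq$ parallel to the CG $x$-axis and edges $or$ and $pq$ parallel to the CG $z$-axis. The function names are hypothetical.

```python
def line_through(a, b):
    """Homogeneous line (A, B, C), with A*x + B*y + C = 0, through 2D points a and b."""
    return (a[1] - b[1], b[0] - a[0], a[0] * b[1] - b[0] * a[1])


def intersect(l1, l2):
    """Intersection point of two homogeneous lines."""
    w = l1[0] * l2[1] - l2[0] * l1[1]
    if abs(w) < 1e-12:
        # The two edges are parallel in the image, so the VP is at infinity.
        raise ValueError("edges are parallel in the image")
    x = (l1[1] * l2[2] - l2[1] * l1[2]) / w
    y = (l1[2] * l2[0] - l2[2] * l1[0]) / w
    return (x, y)


def vanishing_points(o, p, q, r):
    """VPs of the CG x- and z-directions (assumed corner order: o, p, q, r)."""
    vp_x = intersect(line_through(o, p), line_through(r, q))
    vp_z = intersect(line_through(o, r), line_through(p, q))
    return vp_x, vp_z


if __name__ == "__main__":
    # Made-up pixel coordinates, measured from the image centre.
    print(vanishing_points((0.0, -5.0), (4.0, -3.0), (0.0, -15.0 / 7.0), (-4.0, -3.0)))
```

If two opposite edges happen to be exactly parallel in the photo, the VP is at infinity and this sketch raises an error; tilting the frame slightly in the calibration shot avoids that case.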

Next we shall link the position of the VPs with the camera orientation.

A camera has 7 parameters (3 for position, 3 for orientation, and 1 for pixel resolution). We need to find all these. Here we shall employ a trick: we shall work w.r.t. a coordinate system attached to the camera. Since this is going to be a source of confusion, let us write down the three coordinate systems we are using:

Pixel coordinate system: This is a 2D system to identify points on the image. Origin is at the centre of the image (assumed to be the point straight in front of the camera, i.e., the image has not been cropped). $y$-axis is vertically upwards, $x$-axis is towards the right.
CG coordinate system: The coordinate system shown in red above. This is the 3D coordinate system used when building the CG object.
Camera coordinate system: This is the 3D coordinate system attached to the camera. See the diagram below.

The camera lens is at the origin of this coordinate system.

The up direction is along the $y$-axis. The $x$-axis is towards the right. The $z$-axis points backwards, behind the camera, so the camera looks along the negative $z$-direction. The screen is considered to be the plane $z=-\sigma$ for some $\sigma>0.$ Any point $(x,y,z)$ in front of the camera (i.e., with $z<0$) is projected to $(p,q,-\sigma)$ on this plane. The camera reports this as $(p, q)$ in pixel coordinates. The pixel coordinate system has its origin at the centre of the image (which is assumed not to be cropped). Clearly $$ p = -\frac{\sigma x}{z}, \quad q = -\frac{\sigma y}{z}. $$ Now take any 3D line not parallel to the screen. Its slope (direction vector) can be expressed as $(m,n,-1),$ since the $z$-component must be nonzero. Pick any point $(a,b,c)$ on the line. Then the parametric form of the line is $$ (a,b,c)+t(m,n,-1) = (a+tm,b+tn,c-t), $$ for $t\in{\mathbb R}.$ This projects to pixel $$ \sigma \left( -\frac{a+tm}{c-t}, -\frac{b+tn}{c-t}\right). $$ As $t\rightarrow\infty,$ this approaches $\sigma (m, n).$

Thus the VP depends only on the slope. This is as expected. Parallel lines having the same slope "meet" at the same VP.
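The limit can be checked numerically. The following sketch (with hypothetical function names and made-up numbers) projects a far-away point on each of two parallel lines and shows that both land near $\sigma(m,n),$ whatever the base point $(a,b,c)$ is.

```python
def project(point, sigma):
    """Project a camera-space point (x, y, z) onto the screen z = -sigma."""
    x, y, z = point
    return (-sigma * x / z, -sigma * y / z)


def point_on_line(base, m, n, t):
    """Point (a, b, c) + t * (m, n, -1) on a line with slope (m, n, -1)."""
    a, b, c = base
    return (a + t * m, b + t * n, c - t)


if __name__ == "__main__":
    sigma, m, n = 2.0, 0.5, -0.25
    # Two parallel lines through different base points.
    for base in [(1.0, 2.0, -3.0), (-4.0, 0.5, -1.0)]:
        far_pixel = project(point_on_line(base, m, n, 1e9), sigma)
        print(base, "->", far_pixel, "compare with", (sigma * m, sigma * n))
```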

Now specialise to our case. We are working with two VPs, those of the $x$-axis and the $z$-axis of the CG world. Let the slopes of these axes be $(m_x,n_x,-1)$ and $(m_z,n_z,-1).$ Since the axes are mutually perpendicular, the dot product of the slopes vanishes: $$ m_x m_z + n_x n_z + 1 = 0. $$ We have already found $$ (x_1,x_2) = \sigma (m_x,n_x) \mbox{ and } (z_1,z_2) = \sigma (m_z,n_z). $$ Hence $$ (x_1,x_2)\bullet (z_1,z_2) = \sigma^2 (m_x m_z + n_x n_z) = -\sigma^2. $$ Since $\sigma>0,$ this gives $\sigma = \sqrt{-(x_1 z_1 + x_2 z_2)},$ and so we have found one parameter of the camera.
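In code, this step is a single square root. The sketch below uses a hypothetical function name and also guards against a non-negative dot product, which cannot happen for exact VPs of perpendicular directions but may occur with badly measured pixels or a cropped image.

```python
import math


def sigma_from_vps(vp_x, vp_z):
    """Recover sigma from the VPs of two perpendicular directions.

    Both VPs are in pixel coordinates measured from the image centre.
    """
    dot = vp_x[0] * vp_z[0] + vp_x[1] * vp_z[1]
    if dot >= 0:
        raise ValueError("dot product of the VPs must be negative")
    return math.sqrt(-dot)


if __name__ == "__main__":
    print(sigma_from_vps((10.0, 0.0), (-10.0, 0.0)))
```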

Now that $\sigma$ is known, we also know $m_x,n_x, m_z$ and $n_z,$ i.e., the orientation of the axes. Indeed, we can write down the rotation matrix $R = [\vec u_1 ~~\vec u_2 ~~ \vec u_3],$ where $\vec u_1$ and $\vec u_3$ are unit vectors along the $x$- and $z$-axes, respectively, of the CG coordinate system, while $\vec u_2 = \vec u_3\times \vec u_1.$ This requires a bit of careful handling, as there are two unit vectors along any straight line. Basically, one has to judge from the photo whether each axis is coming towards the camera or going away from it, and choose the sign accordingly.
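A minimal sketch of this construction follows, taking $\vec u_1$ and $\vec u_3$ along the CG $x$- and $z$-axes and $\vec u_2 = \vec u_3\times\vec u_1.$ The function names and the `sign_x`, `sign_z` flags are hypothetical: a flag of $+1$ is assumed to mean that the axis points away from the camera (negative $z$-component in camera coordinates), and $-1$ that it points towards the camera. The flags still have to be chosen by looking at the photo.

```python
import math


def unit(v):
    """Scale a 3D vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def cg_axes(vp_x, vp_z, sigma, sign_x=1, sign_z=1):
    """Unit vectors u1, u2, u3 of the CG axes in camera coordinates.

    The direction (m, n, -1) is proportional to (sigma*m, sigma*n, -sigma),
    i.e. to (VP_1, VP_2, -sigma).
    """
    u1 = unit((sign_x * vp_x[0], sign_x * vp_x[1], -sign_x * sigma))
    u3 = unit((sign_z * vp_z[0], sign_z * vp_z[1], -sign_z * sigma))
    u2 = cross(u3, u1)  # completes a right-handed system
    return u1, u2, u3


if __name__ == "__main__":
    for u in cg_axes((10.0, 0.0), (-10.0, 0.0), 10.0):
        print(u)
```

The columns of $R$ are then `u1`, `u2`, `u3`. With measured (noisy) VPs, `u1` and `u3` are exactly perpendicular only if $\sigma$ was computed from the same two VPs.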

So we have obtained the complete orientation of the CG world w.r.t. the camera coordinate system.

All that remains to be done is to locate the origin of the CG world w.r.t. the camera coordinate system.

Since the origin projects to pixel $(o_1,o_2),$ it must be of the form $\left(\frac{o_1}{\sigma}, \frac{o_2}{\sigma},-1\right)t$ for some $t>0$ (the origin is in front of the camera).

We shall find that $t.$

The point at the tip of the $x$-axis of the CG world is $\left(\frac{o_1}{\sigma}, \frac{o_2}{\sigma},-1\right)t +\vec u_1$.

This point projects to image pixel $(p_1,p_2).$

So $$ \frac{o'_1t+u_{11}}{t-u_{13}} = p'_1, $$ where $o'_1 = \frac{o_1}{\sigma}$ and $p'_1 = \frac{p_1}{\sigma}.$

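Solving for $t,$ $$ t = \frac{p'_1 u_{13} +u_{11}}{p'_1 - o'_1} = \frac{p_1 u_{13}+u_{11} \sigma }{p_1 - o_1}. $$ So we get a value of $t.$ Notice that there are actually two equations here: the second pixel coordinate gives an equation of the same form (with $o_2,$ $p_2$ and $u_{12}$ in place of $o_1,$ $p_1$ and $u_{11}$), and both lead to the same value of $t.$ Similarly, we can also work out the (same) value by considering $(q_1,q_2)$ or $(r_1,r_2).$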
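The formula for $t$ can be sketched as follows. The function names are hypothetical. Both pixel coordinates are used, and a coordinate is skipped when $p_i - o_i$ is too close to zero for the division to be reliable. With real measurements the two estimates will differ slightly, so one may average them.

```python
def depth_parameter(o, p, u1, sigma):
    """Estimates of t from the two pixel equations for the tip of the CG x-axis.

    o, p  : pixel coordinates of the CG origin and of the tip of the x-axis
    u1    : unit vector of the CG x-axis in camera coordinates
    """
    estimates = []
    for i in (0, 1):
        denom = p[i] - o[i]
        if abs(denom) > 1e-9:
            estimates.append((p[i] * u1[2] + u1[i] * sigma) / denom)
    return estimates


def cg_origin_in_camera(o, sigma, t):
    """The CG origin t * (o1/sigma, o2/sigma, -1) in camera coordinates."""
    return (t * o[0] / sigma, t * o[1] / sigma, -t)


if __name__ == "__main__":
    # Synthetic check: build a pixel p from a known t, then recover t.
    sigma, t_true, o = 10.0, 5.0, (0.0, -5.0)
    s = 0.5 ** 0.5
    u1 = (s, 0.0, -s)
    origin = cg_origin_in_camera(o, sigma, t_true)
    tip = tuple(a + b for a, b in zip(origin, u1))
    p = (-sigma * tip[0] / tip[2], -sigma * tip[1] / tip[2])
    print(depth_parameter(o, p, u1, sigma))
```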

Let us summarise our final products so far:

The location of the CG origin $\vec o$ w.r.t. the camera coordinates.
The CG axes unit vectors $\vec u_1, \vec u_2,\vec u_3$ w.r.t. the camera coordinates.

Back to CG coordinates

So far we have expressed the CG coordinates in terms of the camera coordinates. Our original aim was just the opposite, viz., expressing the camera position and orientation in terms of the CG coordinate system.

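A point with CG coordinates $\vec x$ has camera coordinates $\vec o + R\vec x.$ The camera sits at the origin of the camera coordinates, so its location in CG coordinates is just $-R'\vec o,$ where $R'$ denotes the transpose of $R.$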

The orientation matrix is $R ^{-1} = R'$ .
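A short sketch of this inversion, assuming $R$ has the unit vectors $\vec u_1,\vec u_2,\vec u_3$ as its columns, so that its transpose $R'$ has them as rows. The function name is hypothetical.

```python
def camera_in_cg(u1, u2, u3, o_cam):
    """Camera position and orientation matrix in CG coordinates.

    u1, u2, u3 : CG axes as unit vectors in camera coordinates (columns of R)
    o_cam      : CG origin in camera coordinates
    Returns (-R' o, R'), with R' stored as a list of rows.
    """
    r_transpose = [list(u1), list(u2), list(u3)]
    position = [-sum(row[j] * o_cam[j] for j in range(3)) for row in r_transpose]
    return position, r_transpose


if __name__ == "__main__":
    # CG x-axis pointing away from the camera, CG origin 5 units in front.
    pos, rt = camera_in_cg((0.0, 0.0, -1.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0),
                           (0.0, 0.0, -5.0))
    print(pos)
    print(rt)
```

As a sanity check, substituting the returned position $\vec x$ into $\vec o + R\vec x$ should give the zero vector, since the camera is at the origin of its own coordinate system.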

Understanding rotation in AoI?

There are two ways to specify the orientation of an object in AoI: either via the Object Layout panel or via the Object Transform dialogue. These behave differently. Use the Object Layout panel to set initial values (typically 0,180,0 for the camera). All subtle changes must be done via the Object Transform dialogue, as the transform propagates to the children. Here the $z$-rotation is done first (positive is counterclockwise), then the $y$-rotation around the global $y$-axis, and finally the $x$-rotation around the global $x$-axis.

This may be demonstrated using the following R script:

# Rotation by theta1 degrees about the x-axis
R1 = function(theta1) {
    theta1 = theta1*pi/180
    c1 = cos(theta1)
    s1 = sin(theta1)
    rbind( c(1,0,0), c(0,c1,-s1), c(0,s1,c1))
}

# Rotation by theta2 degrees about the y-axis
R2 = function(theta2) {
    theta2 = theta2*pi/180
    c2 = cos(theta2)
    s2 = sin(theta2)
    rbind( c(c2,0,s2), c(0,1,0), c(-s2,0,c2))
}

# Rotation by theta3 degrees about the z-axis
R3 = function(theta3) {
    theta3 = theta3*pi/180
    c3 = cos(theta3)
    s3 = sin(theta3)
    rbind( c(c3,-s3,0), c(s3,c3,0) , c(0,0,1))
}

# The z-rotation acts first, then y, then x
R = R1(30) %*% R2(40) %*% R3(50)
> R                      
       [,1]   [,2]   [,3]
[1,] 0.4924 -0.587  0.643
[2,] 0.8700  0.310 -0.383
[3,] 0.0252  0.748  0.663

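In AoI we create a line (a thin cylinder, actually) along the $y$-axis from $-1$ to $1.$ Also create a dot (a small sphere) at $(-0.587,0.310,0.748)$, which is the second column of $R,$ i.e., the image of $(0,1,0)$ under $R.$ Apply the rotation with angles $(30,40,50)$ about the $x$-, $y$- and $z$-axes (the angles used in the script) to the line, and you'll see a tip of the line just touch the dot.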

Next, take an axis-aligned unit cube with centre at the origin. The camera looks at it along the negative $z$-direction from $(0,0,10).$

Rotate the cube by the same angles $(30,40,50)$ and render the scene.

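Now reset the cube to its original orientation. Set the camera position to $R'(0,0,10)' = (0.252,7.478,6.634)'$. Also transform its orientation by $R'.$ Rendering should now give the same image as before, since rotating the object by $R$ is equivalent to rotating the camera by $R'.$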

What to do in AoI?

Create the CG object using the unit square as the reference. It is a good idea to locate the model on or very close to the unit square. Then place the camera at the location $-R'\vec o$ using the position panel. Also set the orientation to 0,180,0. This corresponds to looking in the negative $z$-direction. Next, use the transform panel to set the rotation angles to the Euler angles for $R'.$
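The Euler angles for a given rotation matrix can be extracted as sketched below. This assumes the convention of the R script above, i.e., a matrix of the form $R_1(\theta_1)R_2(\theta_2)R_3(\theta_3)$ with the $z$-rotation acting first. For such a matrix the first row is $(c_2c_3, -c_2s_3, s_2)$ and the third column is $(s_2, -s_1c_2, c_1c_2)',$ which is what the code reads off. The function names are hypothetical, and it is worth checking the result against the cube demonstration before trusting it in AoI.

```python
import math


def rot_x(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(deg):
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_angles(m):
    """Angles (x, y, z) in degrees with m = rot_x(x) * rot_y(y) * rot_z(z).

    Returns the solution with the y-angle between -90 and 90 degrees.
    """
    s2 = max(-1.0, min(1.0, m[0][2]))  # clamp against rounding error
    theta2 = math.asin(s2)
    if abs(math.cos(theta2)) < 1e-9:
        raise ValueError("y-angle is +/-90 degrees: the angles are not unique")
    theta1 = math.atan2(-m[1][2], m[2][2])
    theta3 = math.atan2(-m[0][1], m[0][0])
    return tuple(math.degrees(a) for a in (theta1, theta2, theta3))


if __name__ == "__main__":
    r = mat_mul(rot_x(30), mat_mul(rot_y(40), rot_z(50)))
    r_transpose = [list(row) for row in zip(*r)]
    print(euler_angles(r))
    print(euler_angles(r_transpose))
```

The Euler angles are not unique, as noted earlier; this sketch always picks the branch with the $y$-angle in $[-90,90]$ degrees, and it refuses the degenerate case where that angle is exactly $\pm 90$ degrees.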