So there’s a compositing technique called motion tracking, where you have a program (After Effects, for example) follow a specific point in the footage and use its movement as a motion path later. A simple example would be to follow a person walking down the street, then put, say, an Arby’s logo above their head.
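To make the point-following idea a bit more concrete, here is a minimal sketch in plain Python. It is not how After Effects does it internally (I don't know the details of its tracker); it just shows the general idea of template matching: grab a small patch around the point, then look for the best-matching patch in the next frame. The function names, the tiny synthetic frames, and the patch and search sizes are all made up for illustration.

```python
def make_frame(width, height, cx, cy):
    """Synthetic grayscale frame: black background, bright 3x3 blob at (cx, cy)."""
    frame = [[0.0] * width for _ in range(height)]
    for y in range(cy - 1, cy + 2):
        for x in range(cx - 1, cx + 2):
            frame[y][x] = 1.0
    return frame


def patch(frame, cx, cy, r):
    """Cut out the (2r+1) x (2r+1) square of pixels centred on (cx, cy)."""
    return [row[cx - r:cx + r + 1] for row in frame[cy - r:cy + r + 1]]


def ssd(a, b):
    """Sum of squared differences between two patches (0 means identical)."""
    return sum((pa - pb) ** 2 for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def track_point(prev_frame, next_frame, point, r=2, search=4):
    """Find where the patch around `point` in prev_frame ended up in next_frame."""
    px, py = point
    template = patch(prev_frame, px, py, r)
    height, width = len(next_frame), len(next_frame[0])
    best, best_score = point, float("inf")
    # Only look within `search` pixels of the old position, staying inside the frame.
    for cy in range(max(r, py - search), min(height - r, py + search + 1)):
        for cx in range(max(r, px - search), min(width - r, px + search + 1)):
            score = ssd(template, patch(next_frame, cx, cy, r))
            if score < best_score:
                best, best_score = (cx, cy), score
    return best


def track(frames, start):
    """Follow one point through every frame, returning its motion path."""
    path = [start]
    for prev_frame, next_frame in zip(frames, frames[1:]):
        path.append(track_point(prev_frame, next_frame, path[-1]))
    return path


def attach_overlay(path, offset=(0, -5)):
    """Place an overlay (the logo, say) at a fixed offset from the tracked point."""
    return [(x + offset[0], y + offset[1]) for x, y in path]


if __name__ == "__main__":
    # A blob drifting across a 20x20 frame stands in for the person walking.
    positions = [(5, 10), (7, 10), (9, 11), (12, 12)]
    frames = [make_frame(20, 20, x, y) for x, y in positions]
    path = track(frames, start=positions[0])
    print("tracked path:", path)
    print("logo positions:", attach_overlay(path))
```

The overlay just rides along at a fixed offset from the tracked point, which is all the "logo above their head" trick really is. Real trackers also deal with sub-pixel motion, lighting changes, and the point going out of view, none of which this sketch attempts.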
Well, another technique is to take footage that’s been shot and use a program to work out what the camera was doing in 3D space. It’s a simple idea that’s very complicated to do. Essentially, the program runs through the footage and selects as many points as it can to follow. Then, by figuring out how those points shift relative to one another from frame to frame, it can make an accurate judgement of where the camera was when it shot the footage.
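Here is a heavily simplified sketch of that "work backwards to the camera" step, again in plain Python. It cheats in three big ways, all of them my own assumptions: the 3D positions of the tracked points are already known, the camera only moves and never rotates, and the lens is an ideal pinhole. A real solver has to estimate the points, the rotation, and the lens along with the camera position, so treat this as an illustration of the principle and not as what SynthEyes or any other tracker actually does. All names and numbers are hypothetical.

```python
def project(point, camera, focal=1.0):
    """Pinhole projection of a 3D point for a camera at `camera` looking down +Z."""
    X, Y, Z = point
    cx, cy, cz = camera
    depth = Z - cz
    return (focal * (X - cx) / depth, focal * (Y - cy) / depth)


def solve_3x3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def solve_camera_position(points_3d, points_2d, focal=1.0):
    """Least-squares camera position from known 3D points and their 2D tracks.

    Rearranging u = f (X - cx) / (Z - cz) gives f*cx - u*cz = f*X - u*Z,
    and likewise for v, so each tracked point adds two linear equations
    in the unknowns (cx, cy, cz).
    """
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        rows.append([focal, 0.0, -u])
        rhs.append(focal * X - u * Z)
        rows.append([0.0, focal, -v])
        rhs.append(focal * Y - v * Z)
    # Normal equations: (A^T A) c = A^T b
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    Atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(3)]
    return solve_3x3(AtA, Atb)


if __name__ == "__main__":
    # Made-up scene points and a made-up "true" camera position.
    scene = [(-1.0, 0.0, 5.0), (1.0, 0.5, 6.0), (0.0, 1.0, 8.0),
             (2.0, -1.0, 10.0), (-2.0, 1.0, 7.0)]
    true_camera = (0.3, -0.2, 0.5)
    tracks = [project(p, true_camera) for p in scene]  # what the tracker would see
    print("recovered camera:", solve_camera_position(scene, tracks))
```

The reason this works at all is parallax: points at different depths slide across the frame by different amounts when the camera moves, and that difference pins down where the camera must have been. The more points you track, the more equations you get, which is why the program grabs as many as it can.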
To illustrate the point, I have two pieces of footage below. The first is the footage that was shot. The second is what you get when you work out where the camera was (using a program called SynthEyes), export that camera to a 3D program (in my case, Cinema 4D), build a very simple animation there, export the various layers to a compositing program (After Effects), and render out the final take. All in all, it takes about an hour.
Source Video:
Motion tracked video with simple 3D elements:
So as you can see, when you watch something like Transformers, you now know how they did it.