The main idea is that the CAD model used to augment the video also provides important information to the tracking algorithm. The ESM tracking algorithm can determine the 6-DOF pose of a calibrated camera throughout a video sequence, given one or more faces and their initial orientations. Assuming the tracking is correct, we can use the camera pose to project a model of the tracked object and augment the images. Examples can be seen below.
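As an illustration of the projection step, here is a minimal sketch assuming a standard pinhole camera model. The intrinsics K come from the calibration, and the 6-DOF pose is assumed to be a rotation vector (3 DOF) plus a translation (3 DOF) that maps model coordinates into the camera frame. The function names `rodrigues` and `project_points` are hypothetical and not taken from the tracker itself.

```python
import numpy as np

def rodrigues(w):
    """Convert a rotation vector w (unit axis * angle in radians) to a 3x3 rotation matrix."""
    w = np.asarray(w, dtype=float)
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return np.eye(3)  # no rotation
    k = w / theta
    # Skew-symmetric cross-product matrix of the unit axis k
    kx = np.array([[0.0, -k[2], k[1]],
                   [k[2], 0.0, -k[0]],
                   [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * kx + (1.0 - np.cos(theta)) * (kx @ kx)

def project_points(K, w, t, X):
    """Project Nx3 model points X into Nx2 pixel coordinates.

    K    : 3x3 intrinsic matrix of the calibrated camera.
    w, t : pose as rotation vector (3,) and translation (3,), model -> camera frame.
    Assumes all points lie in front of the camera (positive depth).
    """
    R = rodrigues(w)
    Xc = np.asarray(X, dtype=float) @ R.T + np.asarray(t, dtype=float)  # camera frame
    uvw = Xc @ K.T                        # apply intrinsics (homogeneous pixels)
    return uvw[:, :2] / uvw[:, 2:3]       # perspective division

# Usage: a camera 2 units in front of the model origin, no rotation.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
uv = project_points(K, [0, 0, 0], [0, 0, 2], [[0, 0, 0], [0.2, 0, 0]])
# uv is approximately [[320, 240], [370, 240]]
```

Drawing the model's edges between these projected vertices is then enough to overlay the CAD model on the frame.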
The projected model not only augments the images but also indicates where the faces of the model lie in the current frame. Faces that are trackable, i.e. not occluded and large enough, can be incorporated into the tracking. To avoid drift, a database of already tracked faces and a template update scheme are used. Details are described here and in the paper.
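One possible form of the "large enough and visible" test is sketched below, under explicit assumptions: faces are planar polygons whose vertices are ordered so that the normal (from the cross product of the first two edges) points out of the object; the pose is given directly as a rotation matrix R and translation t; and `min_area` is a hypothetical pixel threshold, not a value from the paper. The sketch only performs a back-face test, a frame-bounds check and a projected-area check. Occlusion by other faces is not handled; that would additionally require something like a depth buffer.

```python
import numpy as np

def polygon_area(uv):
    """Area of a simple polygon (Nx2 vertices in order) via the shoelace formula."""
    x, y = uv[:, 0], uv[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

def is_face_trackable(K, R, t, face, image_size, min_area=400.0):
    """Hypothetical visibility/size test for one planar model face.

    face       : Nx3 vertices in model coordinates, ordered so the normal points outward.
    image_size : (width, height) in pixels.
    min_area   : assumed minimum projected area in square pixels.
    """
    Xc = np.asarray(face, dtype=float) @ R.T + np.asarray(t, dtype=float)  # camera frame
    if np.any(Xc[:, 2] <= 0):
        return False  # at least one vertex is behind the camera
    # Back-face test: the outward normal must point toward the camera centre (origin).
    n = np.cross(Xc[1] - Xc[0], Xc[2] - Xc[0])
    if np.dot(n, -Xc[0]) <= 0:
        return False
    uvw = Xc @ K.T
    uv = uvw[:, :2] / uvw[:, 2:3]
    w, h = image_size
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    if not np.all(inside):
        return False  # face (partially) leaves the frame
    return polygon_area(uv) >= min_area
```

Requiring the whole face to stay inside the frame is a conservative simplification; a tracker could instead clip the polygon and use the visible part.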