Dec. 12, 1998

The following shall become, in time, an introduction to the main aspects of perspective relevant to 3D reconstruction, as they pertain to human perception and its use of perspective cues. This "page" also serves as a notebook to keep track of ideas and concepts that may prove useful to include in the SkiP system.

One definition ought to be:

- A simple model of image formation on a retina, i.e. through a central projection of light rays, where all rays are assumed to pass through a nodal or focal point (e.g. the pupil of the human eye or the aperture of a camera).
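As a minimal sketch of this central projection model, the snippet below maps 3D points onto a picture plane. The function name `project`, the parameter `f` (distance from the nodal point to the picture plane), and the choice of placing the nodal point at the origin looking along +Z are assumptions made for illustration; they are not taken from SkiP.

```python
import numpy as np

def project(points, f=1.0):
    """Central projection of 3D points onto the picture plane Z = f.

    Assumption: the nodal point sits at the origin and looks along +Z.
    """
    pts = np.asarray(points, dtype=float)
    if np.any(pts[:, 2] <= 0):
        raise ValueError("points must lie in front of the nodal point (Z > 0)")
    # Every ray passes through the origin, so x = f*X/Z and y = f*Y/Z.
    return f * pts[:, :2] / pts[:, 2:3]

# Two points on the same ray through the nodal point share one image point.
image_points = project([[1.0, 2.0, 4.0], [2.0, 4.0, 8.0]], f=2.0)
print(image_points)
```

Both points land at (0.5, 1.0), which is the depth ambiguity that perspective cues help to resolve.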

The traditional way of representing the effect of perspective in a picture is by specifying the type of vanishing points/lines that may occur in 3D space. The generic case is when each of the 3 directions of space defines its own vanishing point (VP). Pair-wise, these VPs define 3 vanishing lines, each of which can be taken as an equivalent horizon.
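The generic case can be sketched with homogeneous coordinates, where a VP is the image of a direction and a vanishing line is the cross product of two VPs. The helper names, the camera convention (nodal point at the origin, picture plane Z = f), and the three sample directions are hypothetical choices for illustration.

```python
import numpy as np

def vanishing_point(direction, f=1.0):
    """Homogeneous picture-plane point (x, y, w) where lines along `direction` vanish.

    Assumption: nodal point at the origin, picture plane Z = f.
    w == 0 means the VP is at infinity (direction parallel to the picture plane).
    """
    dx, dy, dz = np.asarray(direction, dtype=float)
    return np.array([f * dx, f * dy, dz])

def vanishing_line(vp_a, vp_b):
    """Homogeneous line (a, b, c), with a*x + b*y + c*w = 0, through two VPs."""
    return np.cross(vp_a, vp_b)

# Three mutually orthogonal directions in a generic orientation (made-up values).
d1, d2, d3 = (2, 1, 2), (-2, 2, 1), (1, 2, -2)
v1, v2, v3 = (vanishing_point(d) for d in (d1, d2, d3))

# Pair-wise, the 3 VPs define 3 vanishing lines ("equivalent horizons").
lines = [vanishing_line(v1, v2), vanishing_line(v2, v3), vanishing_line(v3, v1)]
for line in lines:
    print(line)
```

Each line is incident to the two VPs that define it, so the dot product of a line with either of its VPs is zero.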

It may happen that we view an object in a more special way, e.g. by standing right in front of it and looking at it parallel-wise. Then the picture plane (our retina, flattened out) may either be at an angle with respect to the main orientation of the object (two-point perspective, 2PtP) or be parallel to it (one-point perspective, 1PtP).

Perspective in 3D space (from cs123 notes).

These 3 cases are usually illustrated as above by considering the case of a box, which has the advantage of incorporating the 3 directions of space implicitly (a box is a frame is a box!).
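The three cases of the box can be told apart numerically by counting how many of its axes have a finite VP. This is only a sketch under stated assumptions: the box axes are given as the columns of a rotation matrix in camera coordinates, the viewing direction is Z, and the function names are hypothetical.

```python
import numpy as np

def rot_x(angle):
    """Rotation about the camera's horizontal axis (tilting up or down)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])

def rot_y(angle):
    """Rotation about the vertical axis (turning left or right)."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def perspective_type(box_axes, tol=1e-9):
    """Return '1PtP', '2PtP' or '3PtP' for a box whose axes are the columns of box_axes.

    An axis has a finite VP only if it is not parallel to the picture plane,
    i.e. if its Z component (third row) is non-zero.
    """
    finite_vps = int(np.sum(np.abs(box_axes[2, :]) > tol))
    return f"{finite_vps}PtP"

print(perspective_type(np.eye(3)))                                      # 1PtP: box faces the viewer
print(perspective_type(rot_y(np.radians(30))))                          # 2PtP: box turned about the vertical
print(perspective_type(rot_x(np.radians(20)) @ rot_y(np.radians(30))))  # 3PtP: view also tilted
```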

Thus, if it happens that we have box-like objects in a scene, their perspective cues will be very helpful to retrieve the parameters of perspective for the scene at once. This is called "inverse perspective" in mathematics, and "camera calibration" in photogrammetry/vision: the process of retrieving or fine-tuning camera position, orientation, focal length, etc.
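One standard calibration step of this kind recovers the focal length from the VPs of two perpendicular directions, provided the principal point is known. The sketch below uses the orthogonality constraint (v1 - p)·(v2 - p) + f² = 0; it is not claimed to be the method used in SkiP, and the function name and sample values are hypothetical.

```python
import numpy as np

def focal_from_orthogonal_vps(v1, v2, pp=(0.0, 0.0)):
    """Focal length from the VPs of two perpendicular directions.

    v1, v2: finite VPs in picture-plane coordinates; pp: principal point,
    assumed known. The viewing rays (v - pp, f) towards the two VPs must be
    perpendicular, which gives f^2 = -(v1 - pp) . (v2 - pp).
    """
    v1, v2, pp = (np.asarray(a, dtype=float) for a in (v1, v2, pp))
    f_squared = -np.dot(v1 - pp, v2 - pp)
    if f_squared <= 0:
        raise ValueError("VPs are not consistent with perpendicular directions")
    return float(np.sqrt(f_squared))

# Made-up example: a box turned 30 degrees about the vertical, seen with f = 800.
true_f = 800.0
v_left = (-true_f * np.sqrt(3.0), 0.0)   # VP of the direction (cos 30, 0, -sin 30)
v_right = (true_f / np.sqrt(3.0), 0.0)   # VP of the direction (sin 30, 0, cos 30)
print(focal_from_orthogonal_vps(v_left, v_right))
```

The printed value matches the focal length of 800 used to build the example, up to floating-point rounding.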

Boxes and objects delimited by boxes are in fact quite commonly encountered. You may think of objects like cylinders/poles, pavements/bricks, corridors (box inside-out), buildings/walls/windows, etc.

We are really thinking of a *generalized box* here; i.e., 3 sets of
mutually orthogonal parallel lines. Certain objects can define 1, 2 or
all 3 sets of such lines. The "lines" do not necessarily have
to physically exist: they can be defined by the alignment of objects
which in turn define parallels of the scene.
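Given one such set of parallels as image segments, their common VP can be estimated as the point closest to all the supporting lines. The following is a rough least-squares sketch with a hypothetical function name and made-up segments; a real system would also need to handle noise and outliers.

```python
import numpy as np

def vanishing_point_from_segments(segments):
    """Least-squares VP (homogeneous x, y, w) of image segments assumed parallel in 3D.

    Each segment is ((x1, y1), (x2, y2)) with distinct endpoints. Its supporting
    line is the cross product of the endpoints in homogeneous coordinates.
    """
    lines = []
    for (x1, y1), (x2, y2) in segments:
        line = np.cross([x1, y1, 1.0], [x2, y2, 1.0])
        lines.append(line / np.linalg.norm(line[:2]))  # scale so residuals are distances
    # The VP minimizes |L v| over unit vectors v: the last right singular vector.
    _, _, vt = np.linalg.svd(np.asarray(lines))
    return vt[-1]

# Three segments whose supporting lines all pass through (10, 5).
segments = [((0, 0), (5, 2.5)), ((0, 10), (5, 7.5)), ((0, 5), (4, 5))]
vp = vanishing_point_from_segments(segments)
print(vp[:2] / vp[2])
```

A near-zero third component `w` would signal a VP at infinity, i.e. lines that also remain parallel in the picture.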

As long as objects in a scene provide us with 3 such sets of lines, we are happy with using perspective cues. And thus far (i.e., for versions 1 and 2), SkiP has relied solely on such cues to interactively resolve camera calibration from a single view.

Are the 3 types of perspective disjoint? Not really.

1PtP and 2PtP really are two facets of the same reality: the eye/camera
is pointing perpendicularly at objects lying on the **same ground
plane or floor**. Note that we change the emphasis from the
objects to the nature of the scene projection, i.e., how the camera/eye
is oriented with respect to an important scene structure: the Horizon
Line. Indeed, in 1PtP and 2PtP, there is only a single Horizon Line
and the camera/eye is necessarily looking at it: i.e., the Principal
Point (PP) must intercept this Horizon.

The difference between 1PtP and 2PtP then merely helps us to determine where on this line, and how, we shall locate the PP.

The similarities of 1PtP and 2PtP (from [Ching90]).

N.B.: For an angle of 45°, 2PtP is really a 1PtP for the
quadrilateral seen as a diamond.

It will be convenient then, once such a Horizon Line (HL) is identified, to make sure the ground plane or floor, upon which objects are built, vanishes at this HL. This is what was initially provided in SkiP (v1).

3PtP can then be defined as those cases where the camera/eye has its PP not intercepting the HL, but instead piercing the ground plane (either in front of the viewer, a bird's eye view, or behind, a worm's eye view).

In summary, there are really 2 distinct types of perspective in 3D space, depending on whether the camera/eye looks at the "horizon" or not.
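This distinction can be sketched as a simple test: build the Horizon Line from two VPs of ground-plane directions and check whether the principal point falls on it. The function name, the tolerance, and the sample VPs are hypothetical; VPs are passed in homogeneous form so that a VP at infinity (as in 1PtP) can be represented.

```python
import numpy as np

def scene_perspective_type(vp_a, vp_b, pp=(0.0, 0.0), tol=1e-6):
    """Return 1 if the principal point lies on the Horizon Line, else 2.

    vp_a, vp_b: homogeneous VPs (x, y, w) of two distinct ground-plane
    directions, not both at infinity. pp: principal point (x, y).
    tol is in picture-plane units.
    """
    horizon = np.cross(np.asarray(vp_a, float), np.asarray(vp_b, float))
    # Distance from pp to the line a*x + b*y + c = 0.
    distance = abs(horizon @ np.array([pp[0], pp[1], 1.0])) / np.linalg.norm(horizon[:2])
    return 1 if distance <= tol else 2

# 1PtP: one VP at the picture centre, the other at infinity along x.
print(scene_perspective_type((0, 0, 1), (1, 0, 0)))          # 1
# Tilted view: the horizon sits 120 units away from the principal point.
print(scene_perspective_type((-300, 120, 1), (500, 120, 1)))  # 2
```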

Note also that, in general (in practice), most objects will be attached or oriented with respect to the ground (due to the constraint/influence of gravity). Hence the perspective of the camera with respect to the scene will be shared by these objects.

This may not be the case for objects "flying" or "in the air", i.e. not attached directly to a ground floor and oriented in 3D space irrespective of this ground.

We may argue then that:

- There are 2 types of scene perspective, depending on whether the camera/eye intercepts the Horizon or not.
- There are 2 types of object perspective, depending on whether an object is attached to a ground/floor plane or not.

The above conjecture has a direct impact on how one reconstructs objects in a scene. For scene perspective of type 1 (PP intercepts HL), the ground plane is parallel to the camera/eye horizon plane. Thus, one can proceed directly to reconstruction, once camera calibration is solved (the ground plane is simply a copy of the camera horizon plane, rotated about the horizon, if needed).

For type 2, the two planes are independent. Only the ground plane is relevant for reconstruction, and thus one extra step is required, after camera calibration, to specify this very ground plane.

In both 1PtP and 2PtP, the 2 VPs for parallel directions in the ground plane at right angles define the "arc capable". The segment of the Horizon Line joining both VPs defines the diameter of a "visual sphere" on which the camera/eye must lie. Furthermore, since the PP must also lie on this diameter, the visual plane, perpendicular to the picture plane, defines a great-circle arc directly above the HL, along which the CS must lie. In 1PtP, the CS is exactly at mid-distance along this arc from either VP.
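The construction can be checked numerically. The sketch below reads CS as the camera/eye position (an assumption, since the abbreviation is not spelled out here), takes the picture plane as z = 0, and uses hypothetical function names and made-up VPs. The height of the CS above the PP follows from the altitude theorem in the right triangle formed by the two VPs and the CS.

```python
import numpy as np

def camera_station(v1, v2, pp):
    """3D position of the camera/eye, with the picture plane taken as z = 0.

    v1, v2: VPs of two perpendicular ground directions (2D points on the HL).
    pp: principal point, assumed to lie on the HL strictly between v1 and v2.
    """
    v1, v2, pp = (np.asarray(a, dtype=float) for a in (v1, v2, pp))
    height_squared = np.dot(pp - v1, v2 - pp)  # altitude theorem: h^2 = |pp v1| * |pp v2|
    if height_squared <= 0:
        raise ValueError("pp must lie strictly between v1 and v2")
    return np.array([pp[0], pp[1], np.sqrt(height_squared)])

def on_visual_sphere(cs, v1, v2, tol=1e-9):
    """True if cs lies on the sphere whose diameter is the segment v1-v2."""
    v1, v2 = np.asarray(v1, dtype=float), np.asarray(v2, dtype=float)
    center = np.append((v1 + v2) / 2.0, 0.0)
    radius = np.linalg.norm(v2 - v1) / 2.0
    return abs(np.linalg.norm(cs - center) - radius) <= tol * max(radius, 1.0)

cs = camera_station((-900.0, 0.0), (400.0, 0.0), (0.0, 0.0))
print(cs, on_visual_sphere(cs, (-900.0, 0.0), (400.0, 0.0)))
```

With VPs placed symmetrically about the PP, e.g. (-600, 0) and (600, 0), the same function puts the CS at the top of the arc, at equal distance from both VPs.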

* Need a good picture here ...

In 3PtP, the situation is both more complex and simpler ... that is, more constrained. Three mutually perpendicular directions are represented by 3 VPs. These, taken in pairs, define 3 diameters, and hence 3 spheres, and the camera/eye must lie at their intersection: a uniquely defined point in space (in oriented projective geometry, that is, in front of the picture plane). This CS is also the apex of the visual tetrahedron with base defined by the 3 diameters.
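A minimal sketch of this 3PtP case follows, using the classical result that the principal point is the orthocenter of the triangle formed by the 3 VPs. It reads CS as the camera/eye position and takes the picture plane as z = 0; both are assumptions, as are the function name and the sample VPs.

```python
import numpy as np

def camera_from_three_vps(v1, v2, v3):
    """Principal point and focal length from the VPs of 3 perpendicular directions.

    The orthogonality constraints (vi - p).(vj - p) = -f^2 for each pair imply
    (v1 - p).(v2 - v3) = 0 and (v2 - p).(v1 - v3) = 0: p is the orthocenter.
    """
    v1, v2, v3 = (np.asarray(v, dtype=float) for v in (v1, v2, v3))
    A = np.array([v2 - v3, v1 - v3])
    b = np.array([np.dot(v1, v2 - v3), np.dot(v2, v1 - v3)])
    pp = np.linalg.solve(A, b)
    f_squared = -np.dot(v1 - pp, v2 - pp)
    if f_squared <= 0:
        raise ValueError("VPs are not consistent with perpendicular directions")
    return pp, float(np.sqrt(f_squared))

# Made-up VPs, built from the directions (2,1,2), (-2,2,1), (1,2,-2)
# with f = 600 and the principal point at (320, 240).
offset = np.array([320.0, 240.0])
vps = [offset + np.array(v) for v in ((600.0, 300.0), (-1200.0, 1200.0), (-300.0, -600.0))]
pp, f = camera_from_three_vps(*vps)
cs = np.array([pp[0], pp[1], f])  # apex of the visual tetrahedron
print(pp, f)
```

The recovered point `cs` sees each pair of VPs at a right angle, which is the same as lying on all 3 spheres at once.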

* Need another good picture here ...

Page created & maintained by Frederic Leymarie, 1998.

Comments, suggestions, etc., mail to: leymarie@lems.brown.edu