Computing & the Arts

M.C. Escher (1898 - 1972), Bond Of Union, 1956.

1. Computational theories of Visual Perception

David Marr's model

(circa 1980)

"Vision can be understood as an information processing task which converts a numerical image representation into a symbolic shape-oriented representation."

Marr [Marr:Vision:1982] proposed three levels at which an information processing system can be understood, using vision systems as the target example:

1. computational theory;

2. representation and algorithm; and

3. hardware implementation.

One of Marr's most important contributions was made at the level of representation and algorithm, where he proposed a representational framework for vision. He concentrated on the visual task of deriving shape information from images.

After D. Marr, in "Vision," 1982.

From S. Lehar.

The intensities perceived by any visual system are a function of four main factors:

1. the geometry (meaning shape and relative placement);
2. the reflectance and absorption properties of the visible surfaces (physical properties);
3. the illumination (light sources); and
4. the camera (viewpoint, optics).
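As a rough illustration of how these four factors combine, the sketch below computes one pixel intensity under a Lambertian (perfectly diffuse) reflectance model. The Lambertian model, the function name lambertian_intensity and the parameters light_power and camera_gain are assumptions made for this example; they are not part of Marr's text.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def lambertian_intensity(normal, light_dir, albedo,
                         light_power=1.0, camera_gain=1.0):
    """Intensity of one surface point under an assumed Lambertian model.

    normal      -- surface normal (geometry)
    albedo      -- fraction of light reflected, 0..1 (reflectance)
    light_dir   -- direction from the surface towards the light (illumination)
    light_power -- hypothetical source strength (illumination)
    camera_gain -- hypothetical sensor gain (camera)
    """
    n = normalize(normal)
    l = normalize(light_dir)
    # Cosine of the angle between normal and light; clamped so that
    # surfaces facing away from the light receive no illumination.
    cos_theta = max(0.0, sum(a * b for a, b in zip(n, l)))
    return camera_gain * albedo * light_power * cos_theta
```

Under this model a change in any single factor (tilting the surface, darkening its albedo, moving the light, or changing the sensor gain) changes the recorded intensity, which is why recovering shape from intensities alone is ambiguous.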

The Primal Sketch

The primal sketch is generated by detecting intensity changes, representing and analyzing local geometric structures, and detecting illumination effects. At this stage, independent spatial organizations of the viewed intensities in a scene reflect the structure of the visible surfaces. Marr proposed to capture these organizations by using a set of "place tokens," or low-level features, which correspond to:

 1. oriented edges,

 2. bars,

 3. ends and

 4. blobs

Each of these was represented by a 5-tuple: (type, position, orientation, scale, contrast).
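The 5-tuple can be written down as a small data structure. The sketch below is only an illustration: the names TokenKind, PlaceToken and tokens_of_kind, the units, and the sample values are all assumptions, not Marr's notation.

```python
from dataclasses import dataclass
from enum import Enum

class TokenKind(Enum):
    """The four kinds of place token listed above."""
    EDGE = "edge"
    BAR = "bar"
    END = "end"
    BLOB = "blob"

@dataclass(frozen=True)
class PlaceToken:
    kind: TokenKind       # the "type" field of the 5-tuple
    position: tuple       # (x, y) in image coordinates (assumed: pixels)
    orientation: float    # assumed: radians, measured from the x axis
    scale: float          # assumed: size of the filter that found the token
    contrast: float       # assumed: signed intensity difference

    def as_tuple(self):
        """Return the token as the plain 5-tuple described in the text."""
        return (self.kind, self.position, self.orientation,
                self.scale, self.contrast)

def tokens_of_kind(tokens, kind):
    """Select all tokens of one kind from a primal sketch."""
    return [t for t in tokens if t.kind is kind]

# A made-up primal sketch with three tokens.
sketch = [
    PlaceToken(TokenKind.EDGE, (12.0, 40.0), 1.57, 2.0, 0.8),
    PlaceToken(TokenKind.BLOB, (55.0, 18.0), 0.0, 6.0, -0.3),
    PlaceToken(TokenKind.EDGE, (13.0, 44.0), 1.55, 2.0, 0.7),
]
edges = tokens_of_kind(sketch, TokenKind.EDGE)
```

Here a primal sketch is simply a list of such tokens; grouping nearby tokens with similar orientation (such as the two edges above) is one way the spatial organizations mentioned earlier could be made explicit.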

A set of examples:

The 2.5-D Sketch

The 2.5-D sketch is intended to represent the orientation and depth of the visible surfaces, as well as their discontinuities. It is composed of local surface orientation primitives, distance from the viewer, and discontinuities in depth and surface orientation. Like the primal sketch, it is specified in a viewer-centered coordinate system.

It also takes into account visual information from motion, shading, shape, and texture.
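The ingredients of the 2.5-D sketch can be illustrated on a small depth map. The sketch below assumes the depth map is already given as a grid of distances from the viewer and derives the other two ingredients from it: local surface orientation (by central differences) and depth discontinuities (by thresholding). The function names and the threshold are hypothetical, and this is not Marr's own procedure, which derives the sketch from image cues rather than from known depth.

```python
import math

def surface_normal(depth, x, y):
    """Unit surface normal at interior pixel (x, y) of a depth map.

    depth[y][x] is the distance from the viewer, so the coordinate
    system is viewer-centered and z grows away from the viewer.
    """
    dzdx = (depth[y][x + 1] - depth[y][x - 1]) / 2.0
    dzdy = (depth[y + 1][x] - depth[y - 1][x]) / 2.0
    # For a surface z = depth(x, y), the normal facing the viewer
    # is proportional to (dz/dx, dz/dy, -1).
    norm = math.sqrt(dzdx ** 2 + dzdy ** 2 + 1.0)
    return (dzdx / norm, dzdy / norm, -1.0 / norm)

def depth_discontinuities(depth, threshold):
    """Mark pixels whose right or lower neighbour differs in depth
    by more than `threshold`."""
    h, w = len(depth), len(depth[0])
    marks = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            right = x + 1 < w and abs(depth[y][x + 1] - depth[y][x]) > threshold
            below = y + 1 < h and abs(depth[y + 1][x] - depth[y][x]) > threshold
            marks[y][x] = right or below
    return marks

# A flat wall facing the viewer, and a scene with a step in depth.
flat = [[2.0] * 3 for _ in range(3)]
step = [[1.0, 1.0, 5.0, 5.0],
        [1.0, 1.0, 5.0, 5.0]]
```

For the flat wall the normal at the centre pixel points straight back at the viewer, and for the step scene only the column just before the jump in depth is marked as a discontinuity.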

What is in the ".5"? It does not refer to a fractal dimension; rather, it is a metaphor for the idea that we never see all of our surroundings. For example, consider someone with her back turned to you. You can see only half of her body, although you assume there is a front part to it (with a face, etc.).

The point Marr is making here is that we are not actually aware of all our surroundings and so construct details to fill in the gaps.

From Benjamin Kimia, Brown University.

The 3-D Model


The recognition process uses a catalogue of 3-D models: a collection of stored 3-D model descriptions, together with various indices into the collection that allow a new description to be associated with the appropriate stored one.

All 3-D model descriptions can be organized in a hierarchy according to the specificity of the information they carry. The top level of such a hierarchy is a model which has no component decomposition and describes only the model's principal axis. At the next level in the hierarchy, more details are added to the model, such as the number and distribution of subcomponent axes along the principal axis. At the lower levels, each individual object's model receives a more precise description, and the models can be distinguished by the angles and lengths of their components.
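The specificity hierarchy can be sketched as a tree in which each node refines its parent. Everything in the example below is illustrative: the class ModelDescription, the function path_to, the model names and all numeric values are assumptions chosen to mirror the three levels described above, not entries from Marr's catalogue.

```python
from dataclasses import dataclass, field

@dataclass
class ModelDescription:
    name: str
    principal_axis_length: float
    # Each component axis: (position along principal axis 0..1,
    #                       angle in degrees, length)
    component_axes: list = field(default_factory=list)
    # More specific models that refine this description.
    refinements: list = field(default_factory=list)

def path_to(root, name):
    """Return the names from the coarsest model down to `name`,
    or None if `name` is not in the hierarchy."""
    if root.name == name:
        return [root.name]
    for child in root.refinements:
        sub = path_to(child, name)
        if sub is not None:
            return [root.name] + sub
    return None

# Lowest level: components with specific angles and lengths.
human = ModelDescription("human", 1.0, [
    (0.0, 0.0, 0.15),     # head
    (0.2, 160.0, 0.45),   # arm
    (0.2, -160.0, 0.45),  # arm
    (1.0, 175.0, 0.55),   # leg
    (1.0, -175.0, 0.55),  # leg
])
# Middle level: only the number and rough placement of components.
biped = ModelDescription("biped", 1.0, [
    (0.0, 0.0, 0.2), (0.2, 150.0, 0.5), (0.2, -150.0, 0.5),
    (1.0, 180.0, 0.5), (1.0, 180.0, 0.5),
], refinements=[human])
# Top level: a principal axis and no component decomposition.
catalogue = ModelDescription("cylinder", 1.0, refinements=[biped])
```

Descending such a tree from the root mirrors the coarse-to-fine association of a new description with a stored one; for instance, path_to(catalogue, "human") passes through "cylinder" and "biped" before reaching "human".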

From David Marr's book: Vision, 1982.

Another famous example of related ideas in human cognition (primitive-based representation of objects) is the geons of Biederman et al.:


Object-centered representation:

Observer-based (or view-based) representation:


Edelman, S. & Vaina, L., "David Marr: a short biography," in the International Encyclopaedia of Social and Behavioral Sciences, 2001

Guo, C., Zhu, S.-C., & Wu, Y., "A Mathematical Theory of Primal Sketch and Sketchability," Proc. of the Int. Conf. on Computer Vision (ICCV), pp. 1228-1235, Nice, France, 2003.

Marr, D., Vision: A computational investigation into the human representation and processing of visual information, W.H. Freeman publ., 1982.

Watt, R.J., Visual Processing: Computational, Psychophysical and Cognitive Research, Lawrence Erlbaum publ., 1988.


Last update: Oct. 10, 2006.