In the first post, you were introduced to point clouds. This second blog post explains how 3D scanners work. The two most important methods to determine distance during scanning are Stereo Imaging (how different an object looks from two different viewpoints) and Time-of-Flight (how long it takes light to reflect off the scene and return to the scanner).

Stereo imaging

Stereo measures depth from how different an object looks when seen from two different viewpoints. These techniques can be divided into two categories:

  1. Direct (or passive) stereo imaging
  2. Active stereo (or structured light) imaging

Direct stereo

Figure 1: My index finger from both my left eye (left image) and right eye (middle image). See how my finger shifts against the background. The right image illustrates that the displacement angle θ is larger when an object is closer to the viewpoints.

Direct stereo is how our own eyes see depth. When looking at an object against a background first with the left and then the right eye, the object appears to have shifted. The larger the shift, the closer the object is (see Figure 1). Stereo is best known with two views but can be generalized to more.
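The inverse relation between shift and distance can be sketched with the standard pinhole stereo formula; the numbers below are made up purely for illustration:

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Depth of a point seen by two parallel pinhole cameras.

    focal_px:     focal length in pixels
    baseline_m:   distance between the two viewpoints in meters
    disparity_px: how far the point shifts between the two images, in pixels
    """
    return focal_px * baseline_m / disparity_px

# Two viewpoints 6.5 cm apart (roughly the spacing of human eyes),
# focal length 800 px: a larger shift means a closer point.
depth_from_disparity(800, 0.065, 40)  # -> 1.3 m (large shift, close)
depth_from_disparity(800, 0.065, 10)  # -> 5.2 m (small shift, far)
```
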

You can also find the following related terms: MultiView Stereo (MVS) and Structure from Motion (SfM).

MVS is the computation of the 3D points in the scene from 2D images using known camera positions. PMVS is an academic software package that implements this principle.

SfM estimates the camera positions first (when not known) which can then be used as input for MVS. VSFM is an example of academic software in this category.

It is essential to find corresponding points between images from different views. The easiest points to match are well-defined points in images, like corners with high contrast. Figure 1 shows an example of such a point: the tip of my index finger.

In practice, it is not always possible to find such easy points, especially when scanned objects consist of large smooth surfaces without texture, e.g. the floor and walls in Figure 1.

Some well-known techniques to find such matching points in a series of images are SIFT, SURF, etc. Using the same set of corresponding points in different images allows us to compute 3D points in a scene.
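Once corresponding points are found, each 3D point follows from intersecting the viewing rays through the matched image points. A minimal top-down (2D) sketch of this triangulation idea, with hypothetical camera positions and bearing angles:

```python
import math

def triangulate_2d(cam1, angle1, cam2, angle2):
    """Intersect two viewing rays in the plane (a 2D sketch of triangulation).

    cam1, cam2:     (x, y) camera positions
    angle1, angle2: direction of the ray from each camera to the point, radians
    """
    d1 = (math.cos(angle1), math.sin(angle1))
    d2 = (math.cos(angle2), math.sin(angle2))
    # Solve cam1 + t1*d1 == cam2 + t2*d2 for t1 via the 2D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    t1 = ((cam2[0] - cam1[0]) * d2[1] - (cam2[1] - cam1[1]) * d2[0]) / denom
    return (cam1[0] + t1 * d1[0], cam1[1] + t1 * d1[1])

# Two cameras 1 m apart, both seeing the same point at (0.5, 2.0):
triangulate_2d((0, 0), math.atan2(2.0, 0.5), (1, 0), math.atan2(2.0, -0.5))
# -> approximately (0.5, 2.0)
```

Real MVS pipelines do this in 3D with full camera matrices and many views, but the principle is the same: known viewpoints plus matched points yield 3D coordinates.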

Active stereo

Active stereo (or structured light) is a technique that makes it easier to find matching points between different views. It projects a known pattern, in visible or infrared light, onto the scene and observes the scene with a camera. It then analyses the displacement between the projected pattern and the pattern observed by the camera (Figure 2).

Figure 2: (a) The IR pattern that in this case a Kinect sensor projects on the scene; (b) An example of how the Kinect sensor observes the IR pattern. (Image with courtesy of BBZippo)

 

The Structure Sensor, the first generation of the Kinect and the DotProduct sensors all use this active stereo principle. An advantage of this technique is that it can record both depth and visible color simultaneously.

Laser scanners (Lidar)

Lidar (or 3D laser scanning) measures distances in a different way. The word Lidar is a contraction of Light-Radar. In Lidar, there are again two different principles: pulse-based Time-of-Flight (TOF) and phase-shift-based measurement.

Figure 3: Principle of Time-Of-Flight laser scanning. The red beam is the outgoing beam, the green one the reflected beam. In (a), the scanner measures the distance d simply by taking half the time t between emitting a laser pulse and receiving back its reflection times the speed of light.

Time-of-Flight

This type is the easiest to understand. It is a bit like using an echo to measure a distance: the time between you shouting and you hearing the echo tells you how far away the surface that reflected your voice is. Now replace “voice” with “laser pulse” and the rest is the same: the time between the pulse leaving the device and the reflection coming back gives you the distance to the reflection spot. Given that light travels at 299 792 458 m/s, these round trips are over in mere nanoseconds to microseconds!
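In code, the computation is one line; the one-microsecond round trip below is just an illustrative value:

```python
C = 299_792_458  # speed of light, m/s

def tof_distance(round_trip_seconds):
    """Distance to the reflection spot: half the round-trip time, times c."""
    return C * round_trip_seconds / 2

# A pulse whose reflection arrives 1 microsecond after emission
# bounced off something roughly 150 m away.
tof_distance(1e-6)  # -> ~149.9 m
```
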

This same principle is the basis for radar (using radio waves), Lidar (using light waves) and sonar (using sound waves).

Compared to phase-shift, Time-of-Flight typically has a larger range, up to 200–300 meters (656–984 ft), but a lower accuracy. It also measures fewer points per second (~50,000) because it can only send one pulse of light at a time.
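Why does sending one pulse at a time limit the rate? A back-of-envelope calculation using the 300 m range mentioned above (real scanners stay well below this theoretical ceiling for other engineering reasons):

```python
C = 299_792_458          # speed of light, m/s
max_range_m = 300        # upper end of the typical Time-of-Flight range

# The scanner must wait for the pulse to travel out and back before
# it can unambiguously send the next one.
round_trip_s = 2 * max_range_m / C   # ~2 microseconds per pulse
ceiling = 1 / round_trip_s           # theoretical maximum pulses per second

print(round(ceiling))  # -> ~500,000; practical rates (~50,000) are far lower
```
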

Phase-shift

In this type of scanner, a so-called “modulator” varies the intensity of the laser beam as a function of time. From the intensity of the returning beam, the scanner can tell when that part of the beam was emitted, and uses the resulting travel time to calculate the distance to the reflection point. This is shown in Figure 4.

Figure 4: Principle of phase-shift laser scanning. A laser pulse is emitted from the source with an intensity that is a function of time (I0 was emitted at time t0, I1 at time t1 and so on). The colored bars indicate the intensity of the pulse at a given time. So when the sensor (or camera) sees a sequence of intensities I0 I1 I2, it can determine when the pulse was sent, thus measure how long it took to return, and from this time compute the distance.

Because the scanner measures the intensity of the returning beam, it knows which part of the emitted signal it is receiving at any given time. Even if reflections return in a different order than they were sent (because some reflection points are further away), the scanner can still tell them apart. This means it can send out more pulses per second, as it does not have to wait for the light to return before sending the next pulse.
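For a sinusoidally modulated beam, this intensity bookkeeping boils down to measuring a phase shift between the outgoing and returning signal. The standard relation between phase shift and distance can be sketched as follows (the 10 MHz modulation frequency is just an example value):

```python
import math

C = 299_792_458  # speed of light, m/s

def phase_shift_distance(phase_shift_rad, modulation_freq_hz):
    """Distance from the phase shift of an intensity-modulated beam.

    The beam travels to the target and back, hence the extra factor 2
    (4*pi instead of 2*pi) in the denominator.
    """
    return C * phase_shift_rad / (4 * math.pi * modulation_freq_hz)

# At 10 MHz modulation, one full cycle spans c / (2f) ~= 15 m of unambiguous
# range; a quarter-cycle shift puts the target about 3.75 m away.
phase_shift_distance(math.pi / 2, 10e6)  # -> ~3.75 m
```

The unambiguous range c / (2f) is why phase-based scanners trade maximum distance for speed: a higher modulation frequency gives finer resolution but a shorter range before the phase “wraps around”.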

As a result, a phase-based scanner can measure around a million points per second. A disadvantage of this method is that the range is limited to about 60 to 200 meters (197–656 ft).

Combinations

Combinations of the two principles also exist, like Leica’s WaveForm Digitizer technology. This technology makes a tradeoff between measuring speed, distance and accuracy.

Static or mobile?

Mobile scanners are also becoming more popular. They can never be as accurate as static scanners, but they can cover more area in less time while keeping the accuracy acceptable: < 2 cm (0.8″).

To scan a large project using static scanners, the operators need many different scan positions to cover the whole project (see also Figure 5). Only when they carefully register the scans from all these positions (which is very time consuming) can they merge all the data without introducing errors.

Figure 5: From scanner 1, the left side of object A is visible, but everything in the light blue area behind object A (including object B) is invisible to scanner 1. From scanner 2, the bottoms of A and B are visible, but everything in the light red areas is invisible to scanner 2. The purple cross sections are invisible to both scanners.

 

Mobile scanners are faster, thus cheaper, but a bit less accurate. However, they are typically accurate enough for most applications in architecture. These solutions use clever SLAM (Simultaneous Localisation and Mapping) algorithms to keep possible alignment errors within an acceptable range.

An example of such a mobile scanner is the Geoslam Zeb-REVO. An operator walks the scanner through the location(s) to be captured while it measures up to 50,000 points per second. SLAM methods then bring all measured points together in a single point cloud, with an accuracy of ~1.5 cm (0.6″).

Another mobile scanner is the NavVis M6, a kind of trolley that an operator pushes around. NavVis claims that this system offers a good trade-off between scanning speed and accuracy: with the M6, an operator can scan ~2,000 m² (21,527 ft²) in a day, about 10 times as much area as with a static scanner. Thanks to SLAM algorithms, accuracy stays better than 1 cm (0.4″).

Next Week

Come back next week to learn how to plan a point cloud scanning project.