What is a plenoptic camera? How does it work? What are its applications?

Plenoptic cameras, also known as light field cameras, are purpose-built to capture light fields. How do robotic systems use light field data? What new capabilities does this type of camera offer? How is it calibrated properly? In this article, we take a closer look at the concept.

Light field

A light field is a 4D dataset with great potential to improve the perception of future robots. Its precision and level of detail can provide key advantages for automated object recognition. The 4D data arises as follows: imagine the sensor's 2D pixel matrix being displaced through 3D space while recording, which would at first give five dimensions. Since it makes no difference to the result if the camera is moved along the direction of a light ray, one dimension can be dropped, leaving a 4D dataset.

Compared to conventional 2D images, a 4D light field thus has two additional dimensions, which result from the spatial displacement and reflect the depth of the scene. From these, additional information can be obtained and different data products derived. Most often, industrial applications of the light field exploit 2D images focused at a certain distance or 3D depth images (point clouds).

Features of the light field using the plenoptic function

Conventional cameras mimic the vision of the human eye: the camera views a scene from a fixed position and focuses on a specific object. It therefore bundles a selection of light rays from the scene into a single image.

In contrast, a light field contains not only the light intensity at the position where a light ray hits the camera sensor, but also the direction from which the ray arrives. The intensity, i.e. the amount of light measured by the sensor at position (u, v), together with the directional angles (Θ, ζ), is the information that a 4D light field records and that rendering later draws on.
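
Written compactly, and using the symbols from this article (a minimal formulation; the exact parameterisation varies between authors), the recorded quantity is the plenoptic function reduced from five to four dimensions:

    L(x, y, z, \Theta, \zeta) \rightarrow L(u, v, \Theta, \zeta)

Since the radiance does not change while a ray travels through free space, the position along the ray can be dropped.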

Thanks to the two additional dimensions, other data products become possible. 2D images are the most frequent; for example, they can be focused at a certain distance or given an extended depth of field. A further possibility is to create a 3D depth image from the 4D light field data. With a single image capture, it is thus feasible to create multiple, directly related image products.

To put the light field into application and use its information productively, associated software or an algorithm is required that evaluates this data. Due to their four dimensions, light field images are very large and therefore demand a lot of computing power. For this reason, the industrial use of plenoptic cameras is still in its infancy: only recently have powerful computers made it possible to process this data in sufficiently short periods of time. From the raw image to the 3D point cloud, this currently takes about 50 ms.

But how exactly must a camera be designed so that it can capture the light field in this way?

Plenoptic cameras

Plenoptic cameras are created by modifying conventional cameras. Thanks to a theoretically infinite depth of field and the possibility of refocusing, the plane of focus can be shifted in object space after the image has been taken. Because of this additional depth information, a plenoptic camera can also be used as a 3D camera.

To image the plenoptic function, there are basically two physical possibilities:

Light field acquisition with micro lenses

In a micro lens array (MLA), a plate with micro lenses sits in front of the camera sensor. A micro lens is a miniaturized lens used to steer and focus light. It bundles the light optimally so that light rays do not hit the edge of the image sensor, and thus prevents distortions or differences in brightness from occurring. Depending on the wavelength range to be captured, the micro lens must be made of a different material:

  1. Wavelength between 150 nm and 4 μm: silicon dioxide
  2. Wavelength between 1.2 μm and 15 μm (infrared light): silicon

The MLA is a matrix of individual micro lenses, which allows it to capture the scene from different angles. Each of these lenses has a diameter between a few micrometres and a few millimetres, and its micro image on the sensor covers several hundred pixels. Incident light rays are refracted by the micro lenses so that, depending on the direction of incidence, each ray falls on a particular sensor pixel below the micro lens. In this way, the position (u, v) corresponds to a micro lens and the direction angles (Θ, ζ) correspond to a sensor pixel below this lens.
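
As a rough illustration of this mapping, the sketch below rearranges an idealised raw sensor image into a 4D array indexed by the micro lens (u, v) and the pixel below it, i.e. the direction of incidence. It assumes square micro images perfectly aligned with the pixel grid and ignores vignetting and hexagonal lens packing; real decoders are considerably more involved.

    import numpy as np

    def decode_light_field(raw, pitch):
        """Rearrange an idealised raw plenoptic image into a 4D array.

        raw   : 2D numpy array holding the raw sensor image
        pitch : pixels per micro image along each axis (assumed square)

        The first two axes of the result select the micro lens (u, v); the
        last two select the sensor pixel below that lens, standing in for
        the direction angles (Θ, ζ).
        """
        H, W = raw.shape
        U, V = H // pitch, W // pitch        # number of micro lenses per axis
        crop = raw[:U * pitch, :V * pitch]   # drop incomplete border lenses
        return crop.reshape(U, pitch, V, pitch).transpose(0, 2, 1, 3)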

Due to their small size in the sub-millimeter range, light-field microlens arrays are used in particular in the life sciences – e.g. installed in microscopes.

Light field acquisition with camera arrays

A camera array is essentially the micro lens approach scaled up to macroscopic dimensions. The individual cameras are controlled, for example, via an Ethernet switch, and the resulting data is retrieved over an Ethernet interface in order to achieve high processing speeds.

To enable the image data from the various cameras to be combined into a data product, the cameras are arranged in a known, regular pattern. Only then can the downstream algorithm calculate the shifts between the views correctly, as outlined below.
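
As a reminder of the underlying geometry (the standard stereo relation, not the specific algorithm of any particular product), two neighbouring cameras of the grid with focal length f and baseline B observe a scene point at depth Z with a pixel shift (disparity) of

    d = f \cdot B / Z

which is why the relative camera positions must be known precisely.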

Through the multiple cameras, a scene is observed from different positions, and each camera can be operated with its own acquisition parameters.

With the help of a camera array, the complete inspection of a scene (e.g. 3D survey), the determination of spectral properties (e.g. color and chromatic aberration) or the acquisition of dielectric properties (e.g. polarization) is possible.

Camera arrays are not suitable for the microscopic range due to their size. Instead, they show their advantages in larger applications: in industrial production, special camera arrays are used for detection in the range of a few centimeters (e.g. screws) to meters (e.g. for pallets).

Distributed camera arrays

Distributed camera arrays consist of several cameras that are still modeled as individual cameras. This means that the entire camera array cannot be described with common extrinsic parameters (i.e. the pose of the camera in the world coordinate system). In addition, the spatial coverage areas often do not overlap. Typical application areas of these camera arrays are surveillance systems for different premises or industrial inspection, where different object areas have to be covered.

Such systems can contain homogeneous (e.g. surveillance systems) as well as heterogeneous (e.g. inspection systems) sensors. Here, the recorded data complement each other. To avoid unnecessary overlap, the number of cameras should always be chosen to be minimal with respect to the task.

Compact camera arrays

The cameras of a compact camera array are modeled together and therefore have additional extrinsic parameters by which it is possible to describe the position of the entire camera array with respect to the scene. In this case, the spatial coverage areas usually overlap considerably.
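
One possible way to formalise this (an illustrative convention, not one prescribed by the text) is to give the array a single pose with respect to the world, described by a rotation R_array and translation t_array, and to keep only fixed relative poses (R_i, t_i) for the individual cameras:

    x_{\mathrm{cam},i} = R_i (R_{\mathrm{array}} \, x_{\mathrm{world}} + t_{\mathrm{array}}) + t_i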

Such a system usually contains homogeneous sensors. The acquired information can be complementary as well as distributed (only a joint evaluation of the images provides the desired information). Compact camera arrays are also capable of capturing univariate (variation of a single acquisition parameter) and multivariate (variation of multiple acquisition parameters) image series.

Compact camera arrays are becoming increasingly important for many applications because they offer comprehensive capabilities to fully capture the visually detectable information of a scene.

Calibration of Plenoptic Cameras

In practical applications, it is often not only optical detection that matters, but also precise measurement of the detected workpieces. Correctly calibrated, plenoptic cameras can also be used as measuring systems: the metric information required for this comes directly from the calibrated light field data and the general properties of the plenoptic function.

However, commercially available plenoptic cameras usually provide distance values in non-metric units. This is a hurdle for robotics applications, where metric distance values are required. By separating the traditional camera parameters from the specifically plenoptic properties, traditional camera calibration methods can be reused, which simplifies the calibration process and increases accuracy. Currently, accuracies in the sub-millimetre range can be achieved. For this step, the pinhole camera model is used, just as for a conventional camera.

The calibration uses two different types of input data for its two steps: 2D images with an extended depth of field and 3D depth images.

Thus, the noise of the depth estimation no longer affects the estimation of traditional parameters, such as the focal length or the radial lens distortion. This results in further advantages:

  1. Application of different optimizations to the respective input data. This makes it possible to reduce outliers for the particular data type in a more targeted way.
  2. The division into two steps is easier to handle, making novel and faster initialization models for all parameters realistic.
  3. Mixing of models for lens and depth distortion as well as those for internal quantities (focal length f, distance b between MLA and sensor, distance h between MLA and objective lens) is avoided.
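
For the first of these steps, the pinhole parameters can be estimated from the 2D total-focus images with standard tooling. The sketch below uses OpenCV's checkerboard-based calibration as one possible example; total_focus_images is a placeholder for the extended-depth-of-field images, and this is a generic pinhole calibration, not the vendor-specific pipeline described above.

    import cv2
    import numpy as np

    pattern = (9, 6)     # inner corner count of the checkerboard (example value)
    square = 0.010       # checkerboard square size in metres (example value)

    # 3D coordinates of the checkerboard corners in the board's own frame
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

    obj_points, img_points = [], []
    for img in total_focus_images:   # greyscale 2D total-focus images (placeholder)
        found, corners = cv2.findChessboardCorners(img, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Estimates focal length, principal point and radial distortion; the depth
    # distortion is handled in the separate second calibration step.
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, total_focus_images[0].shape[::-1], None, None)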

Plenoptic cameras are therefore able to shift the plane of focus retrospectively and to generate different data products from their images. In order to evaluate the data correctly, the calibration must be divided into two steps. This makes it possible to reduce the noise component, which in turn improves the depth estimation. Plenoptic imaging techniques are therefore disruptive technologies that open up new application areas and allow traditional imaging techniques to be developed further.

Unlike conventional cameras, a Standard Plenoptic Camera comes with an array of micro lenses statically placed in front of the image sensor at a distance equal to the effective micro lens focal length fs.

Each micro lens projects a micro image onto the sensor plane. For simplicity, the animated figure above depicts only 6 micro lenses with rays piercing through their optical centres, a property that defines them as so-called chief rays. If the main lens is modelled as a thin lens, a special type of chief ray crosses the optical centre of the main lens just as it crosses that of a micro lens. Closer inspection reveals that each of these plenoptic chief rays (yellow) impinges on the centre of its respective micro image. Taking this centre as a reference, adjacent micro image positions can be found which are separated by a constant width, commonly known as the pixel pitch. Given micro image positions one pixel below the centre, the animation indicates how chief rays (blue) at the respective positions are traced back to object space with the aid of geometrical optics. The same applies to chief rays impinging on positions above the micro image centres (green).

A basic principle of the Standard Plenoptic Ray Tracing Model relies on the behaviour of collimated (i.e. parallel) light rays passing through a convex lens. It is a well-known fact in geometrical optics that parallel light rays travelling through a convex lens converge at the focal point on the image side of that lens. This may also be applied the other way around: light rays emitted from points along the focal plane diverge before entering the lens and propagate in a parallel manner after passing through it.

Taking advantage of the latter statement, it is possible to trace not only chief rays but all kinds of light rays starting from any point on the image sensor. Rays that impinge on an arbitrary position u0 are considered to be parallel in the range between micro lens s and main lens U. Such collimated light beams are refracted when travelling through the main lens and, due to their parallel alignment, each beam focuses at the main lens' focal plane FU. The height of the intersection point with the focal plane FU depends on the light beam's angle and therefore on u. Similar to the micro image position u0, adjacent positions (u1, u2) are traced through the lens system. For the sake of clarity, only the chief rays of the light beams are depicted in the model.
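
In thin-lens terms (a textbook relation, stated here only to make the dependence explicit, with f_U denoting the focal length of the main lens), a collimated beam reaching the main lens at an angle φ to the optical axis converges in the focal plane FU at the lateral height

    y = f_U \tan\varphi

so the beam's angle, and therefore the sensor position u it originates from, determines where it focuses in FU.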

Refocusing

In paraxial optics, a well-focused conventional camera projects a far-distant object point onto the image plane such that its image point is sharply focused and infinitesimally small. Given the same focus setting, a point projected from a much closer object would focus behind the sensor, meaning that its directional rays are distributed over a larger portion of the image plane and cause a blurred spot in the output image. With traditional imaging, it is difficult to resolve such blurred image areas. The plenoptic camera overcomes this limitation with the aid of a micro lens array and the image processing technique indicated below.

As highlighted in the animation, the light intensity of an image point E'0(s0) emitted from M0 is distributed over the image sensor locations Efs(u0, s0), Efs(u1, s0) and Efs(u2, s0). Hence, by summing up these Efs values, the intensity at point E'0(s0) is retrieved. Upon closer inspection, it becomes apparent that each image point in plane E'0 is obtained by calculating the sum of all pixel values within the respective micro image s.

As suggested by the term refocusing, another object plane, e.g. M1, may be computationally brought into focus from the same raw image capture. For instance, light rays emanating from a plane M1 can be thought of as projecting corresponding image points at E'1 behind the image sensor. The intensity of E'1(s0) is then recovered by integrating the Efs intensities at positions (u2, s0), (u1, s1) and (u0, s2). From this example, it can be seen that the pixels selected to form image points behind the sensor are distributed over several micro images. In general, the closer the refocused object plane Ma is to the camera, the larger the gap between the micro lenses from which pixels have to be merged. Recalling the second figure shown in the Sub-aperture section, an analogous representation is depicted below, showing how refocusing can be accomplished from previously extracted sub-aperture images.
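
A minimal computational sketch of this shift-and-sum principle is given below. It assumes the sub-aperture views have already been extracted into an array (one image per relative micro image position, see the Sub-aperture section), uses integer pixel shifts for simplicity, and is not the exact integration scheme of any particular implementation.

    import numpy as np

    def refocus(views, shift):
        """Synthetic refocusing by shifting and summing sub-aperture views.

        views : array of shape (U, V, height, width); views[u, v] is the
                sub-aperture image for relative micro image position (u, v)
        shift : pixel shift per unit offset from the central view; each value
                of shift brings a different object plane Ma into focus
                (shift = 0 reproduces the plane focused at capture time)
        """
        U, V, H, W = views.shape
        uc, vc = U // 2, V // 2
        out = np.zeros((H, W), dtype=np.float64)
        for u in range(U):
            for v in range(V):
                du = int(round((u - uc) * shift))
                dv = int(round((v - vc) * shift))
                out += np.roll(views[u, v], (du, dv), axis=(0, 1))
        return out / (U * V)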

When investigating the image refocusing process, the question arises of what the distance to a refocused object plane Ma is and how large its depth of field might be. A generic solution is discussed in the following.

Refocusing Distance and Depth of Field

According to the section Refocusing, it is possible to trace chief rays back to object space planes where they have been emitted from. Given all lens parameters of the Standard Plenoptic Camera, the slope of each chief ray within the camera as well as in object space may be retrieved and described as a linear function in a certain interval (e.g. from sensor image plane to micro lens). To approximate the metric distance of a refocusing plane Ma, the system of two arbitrary chief ray functions intersecting at plane Ma is solved.
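
As a sketch of this idea (with made-up coefficients; the actual values follow from the lens parameters of the camera model), each chief ray in object space is written as a straight line y = m*z + c, and the refocusing distance is the z value at which two such lines intersect:

    def intersect_rays(m0, c0, m1, c1):
        """Intersection of two object-space chief rays y = m*z + c.

        Returns (z, y): z approximates the distance of the refocusing plane
        Ma along the optical axis, y the lateral height of the intersection
        point.  Assumes the rays are not parallel (m0 != m1).
        """
        z = (c1 - c0) / (m0 - m1)
        return z, m0 * z + c0

    # Example call with purely illustrative coefficients:
    # z_a, _ = intersect_rays(-0.02, 0.001, 0.01, -0.0005)

The same intersection, applied to rays traced from the pixel boundaries rather than the pixel centre, yields the near and far planes da- and da+ discussed below.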

Based on the propositions made in the model section above, a point u at the image sensor plane has so far been treated as infinitesimally small. However, lenses are known to diffract light and project an Airy pattern onto the image plane. Therefore, the width of u, which is due to lens diffraction, needs to be taken into consideration. In addition, sensor pixels have a finite size, which is assumed to be greater than, or at least equal to, the extent of the lens diffraction pattern.

Tracing rays from the sensor pixel boundaries in the same manner as described above yields intersections in object space, denoted as da- and da+, in front of and behind the refocusing slice Ma, respectively. These planes delimit the depth of field of a single refocusing slice Ma: object surfaces located within that depth range are 'in focus', meaning that objects in that distance interval exhibit the least blur.

Plenoptisign – Refocusing Distance Estimation

This section features a program (Plenoptisign) to compute the aforementioned light field parameters for any Standard Plenoptic Camera. Its purpose is to evaluate the impact of optical parameters on the depth resolution capabilities of your plenoptic camera. This may be useful in (but is not limited to) an early design phase, a prototyping stage, or the calibration-free conceptualisation of a plenoptic camera that requires metric outputs.

Sub-aperture

Similar to an array of cameras, the Standard Plenoptic Camera allows for multi-view image acquisition. However, in contrast to a camera array, a plenoptic camera requires an additional image processing procedure to extract so-called sub-aperture images, which correspond to the multi-view images of a camera array. As seen in the animation below, a sub-aperture image is composed of pixels sharing the same relative micro image position u (highlighted by colour). Further details on triangulation, baseline and tilt angle can be found hereafter.

The sub-aperture extraction procedure collects the pixels with the same relative position u under each micro lens and places them in a new image array, where each selected pixel is arranged according to the position s of its micro lens. For instance, the central position u1, highlighted in yellow, corresponds to the central view, whereas surrounding micro image positions u, e.g. blue or green, represent adjacent views from different perspectives. The extraction process implies that the number of sub-aperture views equals the number of pixels per micro image. Consequently, the effective resolution of a sub-aperture image equals the number of micro lenses of the plenoptic camera it was captured with. Below you can find an alternative scheme illustrating the rearrangement of pixels in the sub-aperture extraction process.
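
For an idealised sensor layout (square micro images of pitch x pitch pixels, perfectly aligned with the pixel grid, no hexagonal packing), the extraction reduces to strided slicing of the raw image; real cameras additionally require micro image centre detection, interpolation and vignetting correction.

    def sub_aperture_view(raw, u, v, pitch):
        """Collect the pixel at relative position (u, v) under every micro lens.

        raw is the 2D raw sensor image (e.g. a numpy array).  The result has
        one pixel per micro lens, so its resolution equals the number of
        micro lenses of the camera.
        """
        return raw[u::pitch, v::pitch]

    # e.g. the central view of a camera with 15 x 15 pixel micro images:
    # centre_view = sub_aperture_view(raw, 7, 7, 15)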

Triangulation

In stereoscopy and multi-view camera systems, it is well studied how the cameras' positions affect the quality of a depth map. The key parameter in stereoscopic vision is the 'baseline', a technical term describing the distance separating the optical centres of two objective lenses. While it is straightforward to identify real camera positions in traditional stereoscopy, it is not obvious how plenoptic baselines and tilt angles are determined. Baseline and tilt angle parameters are needed, for example, when screening stereo or light field content on autostereoscopic displays. Besides, it is interesting to see the impact of a plenoptic lens design on the acquired depth information. Based on the Standard Plenoptic Camera model provided above, the explanations below answer these questions.

Baseline

Examination of the proposed model suggests that tracing the paths of light field rays into object space yields intersections along the entrance pupil A”. Since all chief rays that form a sub-aperture light beam travel through the same point A”i, this point can be regarded as the optical centre of a virtual camera lens. Accordingly, a virtual optical centre A”i is found by calculating the slopes of the respective object-space rays and intersecting them at the entrance pupil. Similar to stereoscopic imaging, baselines BG in the Standard Plenoptic Camera are then obtained as the distance between two virtual optical centres, i.e. BG = A”i+G - A”i.
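
A small sketch of this construction (illustrative only; the ray coefficients would come from the camera model) intersects two object-space chief rays of the same sub-aperture beam to locate its virtual optical centre on the entrance pupil, and takes the separation of two such centres as the baseline:

    def intersect_rays(m0, c0, m1, c1):
        """Intersection (z, y) of two rays y = m*z + c (assumes m0 != m1)."""
        z = (c1 - c0) / (m0 - m1)
        return z, m0 * z + c0

    def baseline(beam_i, beam_ig):
        """Distance between the virtual optical centres of two sub-aperture beams.

        Each argument is a pair of object-space chief rays ((m0, c0), (m1, c1));
        the rays of one beam meet at the entrance pupil, and this meeting point
        is the virtual optical centre A'' of the corresponding virtual camera.
        """
        _, y_i = intersect_rays(*beam_i[0], *beam_i[1])
        _, y_ig = intersect_rays(*beam_ig[0], *beam_ig[1])
        return abs(y_ig - y_i)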

Tilt angle

If each sub-aperture light beam belongs to a virtual camera lens A”i, its chief ray z'i can be thought of as its optical axis. These optical axes z'i may be used to indicate the tilt angle of each virtual camera lens. As depicted in the figure below, shifting the main lens along zU causes the optical axes z'i to change their slope with respect to zU, except for the central optical axis z'0. This behaviour can therefore be seen as tilting the virtual camera lenses.

Plenoptisign – Triangulation Estimation

As in the refocusing case, the Plenoptisign program also computes the baseline and tilt angle parameters described above for any Standard Plenoptic Camera, making it possible to evaluate the impact of the optical design on the acquired depth information.

Please feel free to contact us for further details.