How can I convert 3D world coordinates to 2D image coordinates and vice versa?

The ZED SDK provides registered depth and RGB information: the pixel at coordinates (u,v) of the depth map contains the depth value of the pixel at coordinates (u,v) of the RGB image.

The ZED SDK also provides a point cloud, which stores 3D and RGB information for each pixel (u,v).

What are the formulas to get the 3D coordinates of an image pixel?

Given the 2D coordinates (u,v) of a pixel, the 3D coordinates (X,Y,Z) in the image frame (Z forward, X right, Y down) can be calculated with the following formulas:

Z = depth value of the pixel (u,v) from the depth map
X = ((u - c_x) * Z) / f_x
Y = ((v - c_y) * Z) / f_y
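The back-projection above can be sketched in a few lines of Python. The intrinsic values below are illustrative placeholders; in practice they come from the SDK's CameraParameters structure, and Z comes from the depth map.

```python
def pixel_to_3d(u, v, z, fx, fy, cx, cy):
    """Back-project a pixel (u, v) with known depth z into the image frame."""
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return (x, y, z)

# Illustrative intrinsics for a 1280x720 image (placeholders, not calibrated values)
fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0

# Pixel (800, 400) with a measured depth of 2 meters
print(pixel_to_3d(800.0, 400.0, 2.0, fx, fy, cx, cy))
```

Note that the result is expressed in the same unit as the depth value (e.g. meters), in the camera's image frame, not in the world frame.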

What are the formulas to get the pixel coordinates of a 3D coordinate?

Given the 3D coordinates (X,Y,Z) in the image frame (Z forward, X right, Y down), the 2D coordinates (u,v) of the corresponding pixel can be calculated with the following formulas:

u = (X / Z) * f_x + c_x
v = (Y / Z) * f_y + c_y
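As a minimal sketch (again with placeholder intrinsics), projection is the inverse of the back-projection formulas, so projecting a back-projected pixel recovers the original coordinates:

```python
def point_to_pixel(x, y, z, fx, fy, cx, cy):
    """Project a 3D point (x, y, z) in the image frame to pixel coordinates."""
    u = (x / z) * fx + cx
    v = (y / z) * fy + cy
    return (u, v)

# Illustrative intrinsics (placeholders, not calibrated values)
fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0

# Round trip: back-project pixel (800, 400) at depth 2, then project it again
x = (800.0 - cx) * 2.0 / fx
y = (400.0 - cy) * 2.0 / fy
print(point_to_pixel(x, y, 2.0, fx, fy, cx, cy))  # approximately (800.0, 400.0)
```

The computed (u,v) are real-valued; round or truncate them to index an actual pixel, and check that they fall inside the image bounds.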

What are f_x, f_y, c_x, and c_y?

They are the intrinsic camera parameters obtained from the camera calibration procedure, based on the pinhole camera model:

  • (f_x, f_y): focal lengths in pixel units
  • (c_x, c_y): coordinates of the principal point

They are available at runtime in the CameraParameters (C++, Python) data structure of the ZED SDK.