Visual Servoing Platform  version 3.3.0
Tutorial: AprilTag marker detection

Introduction

This tutorial shows how to detect one or more AprilTag markers with ViSP. To this end, we provide the vpDetectorAprilTag class, a wrapper over the AprilTag third-party library. Note that there is no need to install this third-party library, since the AprilTag source code is embedded in ViSP.

The vpDetectorAprilTag class inherits from vpDetectorBase, a generic class dedicated to detection. For each detected tag, it allows retrieving characteristics such as the tag id and, in the image, the polygon formed by the tag's 4 corner coordinates, the bounding box and the center of gravity of the tag.

Moreover, the vpDetectorAprilTag class allows estimating the 3D pose of the tag. To this end, the camera parameters as well as the size of the tag are required.
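As a quick preview of this API, here is a minimal sketch (using the image shipped with this tutorial) that runs detection only, without pose estimation, and prints the characteristics listed above:

#include <visp3/detection/vpDetectorAprilTag.h>
#include <visp3/io/vpImageIo.h>

int main()
{
#ifdef VISP_HAVE_APRILTAG
  vpImage<unsigned char> I;
  vpImageIo::read(I, "AprilTag.pgm");

  vpDetectorAprilTag detector(vpDetectorAprilTag::TAG_36h11);
  if (detector.detect(I)) { // detection only, no pose estimation
    for (size_t i = 0; i < detector.getNbObjects(); i++) {
      // getMessage(i) contains the family and id, e.g. "36h11 id: 0"
      std::cout << detector.getMessage(i) << " cog: " << detector.getCog(i) << std::endl;
    }
  }
#endif
}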

In the next sections you will find examples that show how to detect tags in a single image or in images acquired from a camera connected to your computer.

Note that all the material (source code and image) described in this tutorial is part of the ViSP source code and can be downloaded using the following command:

$ svn export https://github.com/lagadic/visp.git/trunk/tutorial/detection/tag

Print an AprilTag marker

We provide a ready-to-print 36h11 tag that is 9.5 by 9.5 cm square [download].

If you prefer, you can also download pre-generated tag families directly from the AprilTag website.

In each archive you will find a PNG image of each tag, a mosaic in PNG containing every tag and a ready-to-print postscript file with one tag per page. If you want to print an individual tag, you can manually scale the corresponding PNG image in one of two ways (a worked example of the resulting print size follows the list):

  • on Unix with ImageMagick, e.g.:
    $ convert tag36_11_00000.png -scale 5000% tag36_11_00000_big.png
  • or open the image with Gimp:
    • then from the pulldown menu, select Image > Scale Image
    • set the unit and the size
    • set the Interpolation mode to None
    • click on the Scale button
    • From the pulldown menu, select File > Export As
    • Save the image as a new PNG image, e.g., /tmp/tag36_11_00000-rescaled.png
    • Send the PNG file to your printer
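As a rough guide for the scale factor: the master tag images are small (10 x 10 pixels for the 36h11 family), so the 5000% scaling above produces a 500 x 500 pixel image, which printed at 300 DPI comes out to roughly 500 / 300 x 2.54 ≈ 4.2 cm per side. Whatever scaling you choose, measure the side length of the printed tag, since this is the tag size that must be passed to the pose estimation below.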

AprilTag detection and pose estimation (single image)

The following example, also available in tutorial-apriltag-detector.cpp, detects tags in a single image.

#include <visp3/detection/vpDetectorAprilTag.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/io/vpImageIo.h>
#ifdef VISP_HAVE_PUGIXML
#include <visp3/core/vpXmlParserCamera.h>
#endif
int main(int argc, const char **argv)
{
#if defined(VISP_HAVE_APRILTAG) && (defined(VISP_HAVE_X11) || defined(VISP_HAVE_GDI) || defined(VISP_HAVE_OPENCV))
  vpDetectorAprilTag::vpAprilTagFamily tagFamily = vpDetectorAprilTag::TAG_36h11;
  vpDetectorAprilTag::vpPoseEstimationMethod poseEstimationMethod = vpDetectorAprilTag::HOMOGRAPHY_VIRTUAL_VS;
  std::string input_filename = "AprilTag.pgm";
  double tagSize = 0.053;
  float quad_decimate = 1.0;
  int nThreads = 1;
  std::string intrinsic_file = "";
  std::string camera_name = "";
  bool display_tag = false;
  int color_id = -1;
  unsigned int thickness = 2;
  bool z_aligned = false;

  for (int i = 1; i < argc; i++) {
    if (std::string(argv[i]) == "--pose_method" && i + 1 < argc) {
      poseEstimationMethod = (vpDetectorAprilTag::vpPoseEstimationMethod)atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--tag_size" && i + 1 < argc) {
      tagSize = atof(argv[i + 1]);
    } else if (std::string(argv[i]) == "--input" && i + 1 < argc) {
      input_filename = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--quad_decimate" && i + 1 < argc) {
      quad_decimate = (float)atof(argv[i + 1]);
    } else if (std::string(argv[i]) == "--nthreads" && i + 1 < argc) {
      nThreads = atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--intrinsic" && i + 1 < argc) {
      intrinsic_file = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--camera_name" && i + 1 < argc) {
      camera_name = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--display_tag") {
      display_tag = true;
    } else if (std::string(argv[i]) == "--color" && i + 1 < argc) {
      color_id = atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--thickness" && i + 1 < argc) {
      thickness = (unsigned int)atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--tag_family" && i + 1 < argc) {
      tagFamily = (vpDetectorAprilTag::vpAprilTagFamily)atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--z_aligned") {
      z_aligned = true;
    } else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
      std::cout << "Usage: " << argv[0]
                << " [--input <input file>] [--tag_size <tag_size in m>]"
                   " [--quad_decimate <quad_decimate>] [--nthreads <nb>]"
                   " [--intrinsic <intrinsic file>] [--camera_name <camera name>]"
                   " [--pose_method <method> (0: HOMOGRAPHY, 1: HOMOGRAPHY_VIRTUAL_VS, "
                   " 2: DEMENTHON_VIRTUAL_VS, 3: LAGRANGE_VIRTUAL_VS, "
                   " 4: BEST_RESIDUAL_VIRTUAL_VS, 5: HOMOGRAPHY_ORTHOGONAL_ITERATION) (default: 0)]"
                   " [--tag_family <family> (0: TAG_36h11, 1: TAG_36h10 (DEPRECATED), 2: TAG_36ARTOOLKIT (DEPRECATED),"
                   " 3: TAG_25h9, 4: TAG_25h7 (DEPRECATED), 5: TAG_16h5, 6: TAG_CIRCLE21h7, 7: TAG_CIRCLE49h12,"
                   " 8: TAG_CUSTOM48h12, 9: TAG_STANDARD41h12, 10: TAG_STANDARD52h13) (default: 0)]"
                   " [--display_tag] [--color <color_id (0, 1, ...)>]"
                   " [--thickness <thickness>] [--z_aligned]"
                   " [--help]"
                << std::endl;
      return EXIT_SUCCESS;
    }
  }

  // Default intrinsics, overwritten below if an XML file is given
  vpCameraParameters cam;
  cam.initPersProjWithoutDistortion(615.1674805, 615.1675415, 312.1889954, 243.4373779);
#ifdef VISP_HAVE_PUGIXML
  vpXmlParserCamera parser;
  if (!intrinsic_file.empty() && !camera_name.empty())
    parser.parse(cam, intrinsic_file, camera_name, vpCameraParameters::perspectiveProjWithoutDistortion);
#endif

  std::cout << cam << std::endl;
  std::cout << "poseEstimationMethod: " << poseEstimationMethod << std::endl;
  std::cout << "tagFamily: " << tagFamily << std::endl;
  std::cout << "nThreads : " << nThreads << std::endl;
  std::cout << "Z aligned: " << z_aligned << std::endl;

  try {
    vpImage<unsigned char> I;
    vpImageIo::read(I, input_filename);

#ifdef VISP_HAVE_X11
    vpDisplayX d(I);
#elif defined(VISP_HAVE_GDI)
    vpDisplayGDI d(I);
#elif defined(VISP_HAVE_OPENCV)
    vpDisplayOpenCV d(I);
#endif

    vpDetectorAprilTag detector(tagFamily);
    detector.setAprilTagQuadDecimate(quad_decimate);
    detector.setAprilTagPoseEstimationMethod(poseEstimationMethod);
    detector.setAprilTagNbThreads(nThreads);
    detector.setDisplayTag(display_tag, color_id < 0 ? vpColor::none : vpColor::getColor(color_id), thickness);
    detector.setZAlignedWithCameraAxis(z_aligned);

    vpDisplay::display(I);

    double t = vpTime::measureTimeMs();
    std::vector<vpHomogeneousMatrix> cMo_vec;
    detector.detect(I, tagSize, cam, cMo_vec);
    t = vpTime::measureTimeMs() - t;

    std::stringstream ss;
    ss << "Detection time: " << t << " ms for " << detector.getNbObjects() << " tags";
    vpDisplay::displayText(I, 40, 20, ss.str(), vpColor::red);

    for (size_t i = 0; i < detector.getNbObjects(); i++) {
      std::vector<vpImagePoint> p = detector.getPolygon(i);
      vpRect bbox = detector.getBBox(i);
      std::string message = detector.getMessage(i);
      std::size_t tag_id_pos = message.find("id: ");
      if (tag_id_pos != std::string::npos) {
        int tag_id = atoi(message.substr(tag_id_pos + 4).c_str());
        ss.str("");
        ss << "Tag id: " << tag_id;
        vpDisplay::displayText(I, (int)(bbox.getTop() - 10), (int)bbox.getLeft(), ss.str(), vpColor::red);
      }
      for (size_t j = 0; j < p.size(); j++) {
        std::ostringstream number;
        number << j;
        vpDisplay::displayText(I, p[j] + vpImagePoint(15, 5), number.str(), vpColor::blue);
      }
    }

    vpDisplay::displayText(I, 20, 20, "Click to display tag poses", vpColor::red);
    vpDisplay::flush(I);
    vpDisplay::getClick(I);

    vpDisplay::display(I);
    for (size_t i = 0; i < cMo_vec.size(); i++) {
      vpDisplay::displayFrame(I, cMo_vec[i], cam, tagSize / 2, vpColor::none, 3);
    }
    vpDisplay::displayText(I, 20, 20, "Click to quit.", vpColor::red);
    vpDisplay::flush(I);
    vpDisplay::getClick(I);
  } catch (const vpException &e) {
    std::cerr << "Catch an exception: " << e.getMessage() << std::endl;
  }

  return EXIT_SUCCESS;
#else
  (void)argc;
  (void)argv;
  return 0;
#endif
}

The default behavior is to detect 36h11 markers in the AprilTag.pgm image, but the --tag_family <family> option allows considering other tag families. To see all the options, just run:

$ ./tutorial-apriltag-detector --help

To detect multiple 36h11 tags in the provided AprilTag.pgm image, just run:

$ ./tutorial-apriltag-detector

You will get the following result:

After a user click in the image, you will get the following image where the frames correspond to the 3D pose of each tag.

Now we explain the main lines of the source.

First we have to include the header corresponding to vpDetectorAprilTag class that allows detecting one or multiple tags.

#include <visp3/detection/vpDetectorAprilTag.h>

Then, in the main() function, before going further we need to check if ViSP was built with AprilTag third-party support. We also check if ViSP is able to display images using either X11, the Graphical Device Interface (GDI) under Windows, or OpenCV.

#if defined(VISP_HAVE_APRILTAG) && (defined(VISP_HAVE_X11) || defined(VISP_HAVE_GDI) || defined(VISP_HAVE_OPENCV))

After reading the input image AprilTag.pgm and creating a display device to visualize it, a vpDetectorAprilTag detector is constructed with the requested tag family.

vpDetectorAprilTag detector(tagFamily);

Then we apply some settings. In particular, the vpDetectorAprilTag::setAprilTagQuadDecimate() function can be used to decimate the input image in order to speed up the detection.

detector.setAprilTagQuadDecimate(quad_decimate);
detector.setAprilTagPoseEstimationMethod(poseEstimationMethod);
detector.setAprilTagNbThreads(nThreads);
detector.setDisplayTag(display_tag, color_id < 0 ? vpColor::none : vpColor::getColor(color_id), thickness);
detector.setZAlignedWithCameraAxis(z_aligned);

We are now ready to detect any 36h11 tags in the image. The vpDetectorAprilTag::detect(const vpImage<unsigned char> &) function detects tags in the image, but since here we also want to estimate the 3D pose of the tags, we rather call vpDetectorAprilTag::detect(const vpImage<unsigned char> &, const double, const vpCameraParameters &, std::vector<vpHomogeneousMatrix> &), which returns the pose of each tag as a vector of vpHomogeneousMatrix in the cMo_vec variable.

std::vector<vpHomogeneousMatrix> cMo_vec;
detector.detect(I, tagSize, cam, cMo_vec);
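
Each vpHomogeneousMatrix in cMo_vec holds the full transformation from the camera frame to the tag frame. As a sketch of how it may be exploited (this printout is our addition, not part of the tutorial source):

for (size_t i = 0; i < cMo_vec.size(); i++) {
  // Print the 4x4 homogeneous matrix; translation is in meters.
  std::cout << "Tag " << i << " pose (cMo):\n" << cMo_vec[i] << std::endl;
  // The translation part is also available through cMo_vec[i].getTranslationVector(),
  // the rotation part through cMo_vec[i].getRotationMatrix().
}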

If one or more tags are detected, we can retrieve the number of detected tags in order to loop over them.

for (size_t i = 0; i < detector.getNbObjects(); i++) {

For each tag, we can then get the location of the 4 points that define the polygon that contains the tag and the corresponding bounding box.

std::vector<vpImagePoint> p = detector.getPolygon(i);
vpRect bbox = detector.getBBox(i);

And finally, we are also able to get the tag id by calling vpDetectorAprilTag::getMessage() and parsing the returned message.

std::string message = detector.getMessage(i);
std::size_t tag_id_pos = message.find("id: ");
if (tag_id_pos != std::string::npos) {
  int tag_id = atoi(message.substr(tag_id_pos + 4).c_str());
  ss.str("");
  ss << "Tag id: " << tag_id;
  vpDisplay::displayText(I, (int)(bbox.getTop() - 10), (int)bbox.getLeft(), ss.str(), vpColor::red);
}
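
The detector also exposes all the detected ids directly through vpDetectorAprilTag::getTagsId(), the method used later in the RGB-D example, which avoids the string parsing:

std::vector<int> tags_id = detector.getTagsId();
for (size_t i = 0; i < tags_id.size(); i++) {
  std::cout << "Detected tag id: " << tags_id[i] << std::endl;
}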

Next in the code we display the 3D pose of each tag as an RGB frame.

for (size_t i = 0; i < cMo_vec.size(); i++) {
  vpDisplay::displayFrame(I, cMo_vec[i], cam, tagSize / 2, vpColor::none, 3);
}
Note
  • To get the absolute pose (not up to a scale factor), you have to provide the real size of the marker (the length of a marker side).
  • To calibrate your camera, you can follow this tutorial: Tutorial: Camera intrinsic calibration

AprilTag detection and pose estimation (live camera)

This other example, also available in tutorial-apriltag-detector-live.cpp, shows how to couple the AprilTag detector to an image grabber in order to detect tags on each new image acquired by a camera connected to your computer.

#include <visp3/core/vpConfig.h>
#ifdef VISP_HAVE_MODULE_SENSOR
#include <visp3/sensor/vpV4l2Grabber.h>
#include <visp3/sensor/vp1394CMUGrabber.h>
#include <visp3/sensor/vp1394TwoGrabber.h>
#include <visp3/sensor/vpFlyCaptureGrabber.h>
#include <visp3/sensor/vpRealSense2.h>
#endif
#include <visp3/detection/vpDetectorAprilTag.h>
#include <visp3/gui/vpDisplayGDI.h>
#include <visp3/gui/vpDisplayOpenCV.h>
#include <visp3/gui/vpDisplayX.h>
#include <visp3/core/vpXmlParserCamera.h>
//#undef VISP_HAVE_V4L2
//#undef VISP_HAVE_DC1394
//#undef VISP_HAVE_CMU1394
//#undef VISP_HAVE_FLYCAPTURE
//#undef VISP_HAVE_REALSENSE2
//#undef VISP_HAVE_OPENCV
int main(int argc, const char **argv)
{
#if defined(VISP_HAVE_APRILTAG) && \
    (defined(VISP_HAVE_V4L2) || defined(VISP_HAVE_DC1394) || defined(VISP_HAVE_CMU1394) || (VISP_HAVE_OPENCV_VERSION >= 0x020100) || \
     defined(VISP_HAVE_FLYCAPTURE) || defined(VISP_HAVE_REALSENSE2))
  int opt_device = 0; // For OpenCV and V4l2 grabber to set the camera device
  vpDetectorAprilTag::vpAprilTagFamily tagFamily = vpDetectorAprilTag::TAG_36h11;
  vpDetectorAprilTag::vpPoseEstimationMethod poseEstimationMethod = vpDetectorAprilTag::HOMOGRAPHY_VIRTUAL_VS;
  double tagSize = 0.053;
  float quad_decimate = 1.0;
  int nThreads = 1;
  std::string intrinsic_file = "";
  std::string camera_name = "";
  bool display_tag = false;
  int color_id = -1;
  unsigned int thickness = 2;
  bool align_frame = false;

#if !(defined(VISP_HAVE_X11) || defined(VISP_HAVE_GDI) || defined(VISP_HAVE_OPENCV))
  bool display_off = true;
  std::cout << "Warning: There is no 3rd party (X11, GDI or OpenCV) to display images..." << std::endl;
#else
  bool display_off = false;
#endif

  for (int i = 1; i < argc; i++) {
    if (std::string(argv[i]) == "--pose_method" && i + 1 < argc) {
      poseEstimationMethod = (vpDetectorAprilTag::vpPoseEstimationMethod)atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--tag_size" && i + 1 < argc) {
      tagSize = atof(argv[i + 1]);
    } else if (std::string(argv[i]) == "--camera_device" && i + 1 < argc) {
      opt_device = atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--quad_decimate" && i + 1 < argc) {
      quad_decimate = (float)atof(argv[i + 1]);
    } else if (std::string(argv[i]) == "--nthreads" && i + 1 < argc) {
      nThreads = atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--intrinsic" && i + 1 < argc) {
      intrinsic_file = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--camera_name" && i + 1 < argc) {
      camera_name = std::string(argv[i + 1]);
    } else if (std::string(argv[i]) == "--display_tag") {
      display_tag = true;
    } else if (std::string(argv[i]) == "--display_off") {
      display_off = true;
    } else if (std::string(argv[i]) == "--color" && i + 1 < argc) {
      color_id = atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--thickness" && i + 1 < argc) {
      thickness = (unsigned int)atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--tag_family" && i + 1 < argc) {
      tagFamily = (vpDetectorAprilTag::vpAprilTagFamily)atoi(argv[i + 1]);
    } else if (std::string(argv[i]) == "--z_aligned") {
      align_frame = true;
    } else if (std::string(argv[i]) == "--help" || std::string(argv[i]) == "-h") {
      std::cout << "Usage: " << argv[0]
                << " [--camera_device <camera device> (default: 0)]"
                << " [--tag_size <tag_size in m> (default: 0.053)]"
                   " [--quad_decimate <quad_decimate> (default: 1)]"
                   " [--nthreads <nb> (default: 1)]"
                   " [--intrinsic <intrinsic file> (default: empty)]"
                   " [--camera_name <camera name> (default: empty)]"
                   " [--pose_method <method> (0: HOMOGRAPHY, 1: HOMOGRAPHY_VIRTUAL_VS, "
                   " 2: DEMENTHON_VIRTUAL_VS, 3: LAGRANGE_VIRTUAL_VS, "
                   " 4: BEST_RESIDUAL_VIRTUAL_VS, 5: HOMOGRAPHY_ORTHOGONAL_ITERATION) (default: 0)]"
                   " [--tag_family <family> (0: TAG_36h11, 1: TAG_36h10 (DEPRECATED), 2: TAG_36ARTOOLKIT (DEPRECATED),"
                   " 3: TAG_25h9, 4: TAG_25h7 (DEPRECATED), 5: TAG_16h5, 6: TAG_CIRCLE21h7, 7: TAG_CIRCLE49h12,"
                   " 8: TAG_CUSTOM48h12, 9: TAG_STANDARD41h12, 10: TAG_STANDARD52h13) (default: 0)]"
                   " [--display_tag] [--z_aligned]";
#if (defined(VISP_HAVE_X11) || defined(VISP_HAVE_GDI) || defined(VISP_HAVE_OPENCV))
      std::cout << " [--display_off] [--color <color id>] [--thickness <line thickness>]";
#endif
      std::cout << " [--help]" << std::endl;
      return EXIT_SUCCESS;
    }
  }

  try {
    // Default intrinsics, overwritten below if an XML file or a RealSense device is available
    vpCameraParameters cam;
    cam.initPersProjWithoutDistortion(615.1674805, 615.1675415, 312.1889954, 243.4373779);
#ifdef VISP_HAVE_PUGIXML
    vpXmlParserCamera parser;
    if (!intrinsic_file.empty() && !camera_name.empty())
      parser.parse(cam, intrinsic_file, camera_name, vpCameraParameters::perspectiveProjWithoutDistortion);
#endif

    vpImage<unsigned char> I;

#if defined(VISP_HAVE_V4L2)
    vpV4l2Grabber g;
    std::ostringstream device;
    device << "/dev/video" << opt_device;
    std::cout << "Use Video 4 Linux grabber on device " << device.str() << std::endl;
    g.setDevice(device.str());
    g.setScale(1);
    g.open(I);
#elif defined(VISP_HAVE_DC1394)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use DC1394 grabber" << std::endl;
    vp1394TwoGrabber g;
    g.open(I);
#elif defined(VISP_HAVE_CMU1394)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use CMU1394 grabber" << std::endl;
    vp1394CMUGrabber g;
    g.open(I);
#elif defined(VISP_HAVE_FLYCAPTURE)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use FlyCapture grabber" << std::endl;
    vpFlyCaptureGrabber g;
    g.open(I);
#elif defined(VISP_HAVE_REALSENSE2)
    (void)opt_device; // To avoid non used warning
    std::cout << "Use Realsense 2 grabber" << std::endl;
    vpRealSense2 g;
    rs2::config config;
    config.disable_stream(RS2_STREAM_DEPTH);
    config.disable_stream(RS2_STREAM_INFRARED);
    config.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_RGBA8, 30);
    g.open(config);
    g.acquire(I);
    std::cout << "Read camera parameters from Realsense device" << std::endl;
    cam = g.getCameraParameters(RS2_STREAM_COLOR, vpCameraParameters::perspectiveProjWithoutDistortion);
#elif defined(VISP_HAVE_OPENCV)
    std::cout << "Use OpenCV grabber on device " << opt_device << std::endl;
    cv::VideoCapture g(opt_device); // Open the default camera
    if (!g.isOpened()) {            // Check if we succeeded
      std::cout << "Failed to open the camera" << std::endl;
      return -1;
    }
    cv::Mat frame;
    g >> frame; // get a new frame from camera
    vpImageConvert::convert(frame, I);
#endif

    std::cout << cam << std::endl;
    std::cout << "poseEstimationMethod: " << poseEstimationMethod << std::endl;
    std::cout << "tagFamily: " << tagFamily << std::endl;
    std::cout << "nThreads : " << nThreads << std::endl;
    std::cout << "Z aligned: " << align_frame << std::endl;

    vpDisplay *d = NULL;
    if (!display_off) {
#ifdef VISP_HAVE_X11
      d = new vpDisplayX(I);
#elif defined(VISP_HAVE_GDI)
      d = new vpDisplayGDI(I);
#elif defined(VISP_HAVE_OPENCV)
      d = new vpDisplayOpenCV(I);
#endif
    }

    vpDetectorAprilTag detector(tagFamily);
    detector.setAprilTagQuadDecimate(quad_decimate);
    detector.setAprilTagPoseEstimationMethod(poseEstimationMethod);
    detector.setAprilTagNbThreads(nThreads);
    detector.setDisplayTag(display_tag, color_id < 0 ? vpColor::none : vpColor::getColor(color_id), thickness);
    detector.setZAlignedWithCameraAxis(align_frame);

    std::vector<double> time_vec;
    for (;;) {
#if defined(VISP_HAVE_V4L2) || defined(VISP_HAVE_DC1394) || defined(VISP_HAVE_CMU1394) || defined(VISP_HAVE_FLYCAPTURE) || defined(VISP_HAVE_REALSENSE2)
      g.acquire(I);
#elif defined(VISP_HAVE_OPENCV)
      g >> frame;
      vpImageConvert::convert(frame, I);
#endif

      vpDisplay::display(I);

      double t = vpTime::measureTimeMs();
      std::vector<vpHomogeneousMatrix> cMo_vec;
      detector.detect(I, tagSize, cam, cMo_vec);
      t = vpTime::measureTimeMs() - t;
      time_vec.push_back(t);

      std::stringstream ss;
      ss << "Detection time: " << t << " ms for " << detector.getNbObjects() << " tags";
      vpDisplay::displayText(I, 40, 20, ss.str(), vpColor::red);

      for (size_t i = 0; i < cMo_vec.size(); i++) {
        vpDisplay::displayFrame(I, cMo_vec[i], cam, tagSize / 2, vpColor::none, 3);
      }

      vpDisplay::displayText(I, 20, 20, "Click to quit.", vpColor::red);
      vpDisplay::flush(I);
      if (vpDisplay::getClick(I, false))
        break;
    }

    std::cout << "Benchmark computation time" << std::endl;
    std::cout << "Mean / Median / Std: " << vpMath::getMean(time_vec) << " ms"
              << " ; " << vpMath::getMedian(time_vec) << " ms"
              << " ; " << vpMath::getStdev(time_vec) << " ms" << std::endl;

    if (!display_off)
      delete d;
  } catch (const vpException &e) {
    std::cerr << "Catch an exception: " << e.getMessage() << std::endl;
  }

  return EXIT_SUCCESS;
#else
  (void)argc;
  (void)argv;
#ifndef VISP_HAVE_APRILTAG
  std::cout << "Enable Apriltag support, configure and build ViSP to run this tutorial" << std::endl;
#else
  std::cout << "Install a 3rd party dedicated to frame grabbing (dc1394, cmu1394, v4l2, OpenCV, FlyCapture, Realsense2), configure and build ViSP again to use this example" << std::endl;
#endif
#endif
  return EXIT_SUCCESS;
}

The usage of this example is similar to the previous one:

  • with option --tag_family you select the kind of tag that you want to detect.
  • if more than one camera is connected to your computer, with option --camera_device you can select which camera to use. The first camera that is found has number 0.

To detect 36h11 tags on images acquired by a second camera connected to your computer, use:

$ ./tutorial-apriltag-detector-live --tag_family 0 --camera_device 1

The source code of this example is very similar to the previous one, except that here we use camera framegrabber devices (see Tutorial: Image frame grabbing). Two different grabbers may be used:

  • If ViSP was built with Video For Linux (V4L2) support, available for example on Fedora or Ubuntu distributions, the VISP_HAVE_V4L2 macro is defined. In that case, images coming from a USB camera are acquired using the vpV4l2Grabber class.
  • If ViSP wasn't built with V4L2 support but with OpenCV, we use the cv::VideoCapture class to grab the images. Notice that when images are acquired with OpenCV, an additional conversion from cv::Mat to vpImage is needed.

#if defined(VISP_HAVE_V4L2)
vpV4l2Grabber g;
std::ostringstream device;
device << "/dev/video" << opt_device;
std::cout << "Use Video 4 Linux grabber on device " << device.str() << std::endl;
g.setDevice(device.str());
g.setScale(1);
g.open(I);
#elif defined(VISP_HAVE_DC1394)
(void)opt_device; // To avoid non used warning
std::cout << "Use DC1394 grabber" << std::endl;
vp1394TwoGrabber g;
g.open(I);
#elif defined(VISP_HAVE_CMU1394)
(void)opt_device; // To avoid non used warning
std::cout << "Use CMU1394 grabber" << std::endl;
vp1394CMUGrabber g;
g.open(I);
#elif defined(VISP_HAVE_FLYCAPTURE)
(void)opt_device; // To avoid non used warning
std::cout << "Use FlyCapture grabber" << std::endl;
vpFlyCaptureGrabber g;
g.open(I);
#elif defined(VISP_HAVE_REALSENSE2)
(void)opt_device; // To avoid non used warning
std::cout << "Use Realsense 2 grabber" << std::endl;
vpRealSense2 g;
rs2::config config;
config.disable_stream(RS2_STREAM_DEPTH);
config.disable_stream(RS2_STREAM_INFRARED);
config.enable_stream(RS2_STREAM_COLOR, 640, 480, RS2_FORMAT_RGBA8, 30);
g.open(config);
g.acquire(I);
std::cout << "Read camera parameters from Realsense device" << std::endl;
cam = g.getCameraParameters(RS2_STREAM_COLOR, vpCameraParameters::perspectiveProjWithoutDistortion);
#elif defined(VISP_HAVE_OPENCV)
std::cout << "Use OpenCV grabber on device " << opt_device << std::endl;
cv::VideoCapture g(opt_device); // Open the default camera
if (!g.isOpened()) {            // Check if we succeeded
  std::cout << "Failed to open the camera" << std::endl;
  return -1;
}
cv::Mat frame;
g >> frame; // get a new frame from camera
vpImageConvert::convert(frame, I);
#endif

Then, in the while loop, at each iteration we acquire a new image:

#if defined(VISP_HAVE_V4L2) || defined(VISP_HAVE_DC1394) || defined(VISP_HAVE_CMU1394) || defined(VISP_HAVE_FLYCAPTURE) || defined(VISP_HAVE_REALSENSE2)
g.acquire(I);
#elif defined(VISP_HAVE_OPENCV)
g >> frame;
vpImageConvert::convert(frame, I);
#endif

This new image is then given as input to the AprilTag detector.

Improve pose estimation accuracy using a RGB-D camera

When the tag is small in the image or when the tag corner locations are extracted poorly, you may experience z-axis flipping if you carefully analyze the pose of the tag. The following images illustrate this behavior.

These are 3 successive images acquired by the Realsense D435 color camera with the tag pose in overlay. In the middle image you can see the z-axis flipping phenomenon, where the z-axis (in blue) is not oriented as expected.

The pose is computed from the location of the 4 tag corners, assuming a planar object, and this behavior is inherent to the planar pose estimation ambiguity; see Dementhon.
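
When no depth sensor is available, one mitigation is to select a pose estimation method that scores the competing candidate poses, such as BEST_RESIDUAL_VIRTUAL_VS or HOMOGRAPHY_ORTHOGONAL_ITERATION from the --pose_method option above; this does not remove the ambiguity in general, but it can reduce how often the wrong solution is picked:

detector.setAprilTagPoseEstimationMethod(vpDetectorAprilTag::BEST_RESIDUAL_VIRTUAL_VS);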

To lift this ambiguity, we propose to use the depth map of an RGB-D sensor in order to bring additional 3D information. The pose estimation can then be seen as a 3D-3D optimization process, implemented in vpPose::computePlanarObjectPoseFromRGBD(). As shown in the following images, using this method with a Realsense D435 sensor makes it possible to overcome this ambiguity.

These are the same 3 successive images acquired by the Realsense color camera with the resulting pose in overlay, but here the depth map, aligned with the color image, was given as input to vpPose::computePlanarObjectPoseFromRGBD(). As you can see in the 3 images, the z-axis flipping phenomenon no longer occurs.

An example that shows how to use this function is given in tutorial-apriltag-detector-live-rgbd-realsense.cpp. In this example we are using a Realsense D435 or equivalent RGB-D sensor, but it could be adapted to any other RGB-D sensor as long as you can align the depth map with the color image.

std::vector<std::vector<vpImagePoint> > tags_corners = detector.getPolygon();
std::vector<int> tags_id = detector.getTagsId();
std::map<int, double> tags_size;
tags_size[-1] = tagSize; // Default tag size
std::vector<std::vector<vpPoint> > tags_points3d = detector.getTagsPoints3D(tags_id, tags_size);
for (size_t i = 0; i < tags_corners.size(); i++) {
  vpHomogeneousMatrix cMo;
  double confidence_index;
  if (vpPose::computePlanarObjectPoseFromRGBD(depthMap, tags_corners[i], cam, tags_points3d[i], cMo, &confidence_index)) {
    if (confidence_index > 0.5) {
      vpDisplay::displayFrame(I_color2, cMo, cam, tagSize / 2, vpColor::none, 3);
    }
    else if (confidence_index > 0.25) {
      vpDisplay::displayFrame(I_color2, cMo, cam, tagSize / 2, vpColor::orange, 3);
    }
    else {
      vpDisplay::displayFrame(I_color2, cMo, cam, tagSize / 2, vpColor::red, 3);
    }
    std::stringstream ss;
    ss << "Tag id " << tags_id[i] << " confidence: " << confidence_index;
    vpDisplay::displayText(I_color2, 35 + i * 15, 20, ss.str(), vpColor::red);
  }
}
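
Note that vpPose::computePlanarObjectPoseFromRGBD() expects the depth map as a vpImage<float> expressed in meters and aligned with the color image. As a minimal sketch of how it may be built (I_depth_raw, a 16-bit raw depth image, and depth_scale, the unit-to-meter factor queried from the device, are assumptions here):

vpImage<float> depthMap(I_depth_raw.getHeight(), I_depth_raw.getWidth());
for (unsigned int i = 0; i < I_depth_raw.getHeight(); i++) {
  for (unsigned int j = 0; j < I_depth_raw.getWidth(); j++) {
    // Convert raw depth units to meters; invalid pixels (0) stay at 0
    depthMap[i][j] = depth_scale * static_cast<float>(I_depth_raw[i][j]);
  }
}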

Next tutorial

You are now ready to see the Tutorial: Bar code detection, that illustrates how to detect QR codes in an image.
