Introduction
Many robotic systems rely on depth sensors or lasers to avoid collisions while moving. Drones, for example, receive depth information from stereo cameras or use lasers for better coordination. The main goal is to build a 3D map of a room, orient quickly within it, and update the map whenever something changes. But how simple can such a system be? To find out, we ran an experiment on monocular video with no additional information: a GoPro recording of a walk through a hardware store.
What systems exist?
Several SLAM (simultaneous localization and mapping) and SVO (semi-direct visual odometry) algorithms require additional packages like ROS and specialized sensors. We wanted a universal open-source system that was easy to use and flexible.
SLAM and SVO algorithms analyze their surroundings, extract key features, and build a map of their environment. They can then locate themselves and update the map in real time. SLAM operates on keyframes for high accuracy, while SVO focuses on direct image alignment for speed.
ORB-SLAM2 strikes a balance, using ORB features for efficient and accurate localization and mapping. It is widely used in real-time applications such as autonomous robots and drones. It can work with or without ROS and supports monocular, stereo, and RGB-D cameras.
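To make the idea of matching features between frames more concrete, here is a minimal sketch of brute-force matching for binary descriptors such as ORB's, which are compared by Hamming distance. This is an illustration only and not how ORB-SLAM2 searches for matches internally; the function names and the `max_distance` threshold are assumptions chosen for the example.

```python
from typing import List, Tuple


def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length binary descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def match_descriptors(
    query: List[bytes], train: List[bytes], max_distance: int = 64
) -> List[Tuple[int, int, int]]:
    """For each query descriptor, find the closest train descriptor.

    Returns (query_index, train_index, distance) triples and keeps only
    matches within max_distance (an illustrative threshold, not a tuned one).
    """
    matches = []
    for qi, q in enumerate(query):
        best_index, best_dist = None, None
        for ti, t in enumerate(train):
            d = hamming(q, t)
            if best_dist is None or d < best_dist:
                best_index, best_dist = ti, d
        if best_index is not None and best_dist <= max_distance:
            matches.append((qi, best_index, best_dist))
    return matches


# Two toy "frames" with 256-bit (32-byte) descriptors, as ORB produces.
frame_a = [bytes([0b11110000] * 32), bytes([0b00001111] * 32)]
frame_b = [bytes([0b00001111] * 32), bytes([0b11110001] * 32)]
print(match_descriptors(frame_a, frame_b))  # [(0, 1, 32), (1, 0, 0)]
```

When too few descriptors find a close partner in the next frame, there is not enough shared structure to estimate camera motion, which is why the number of extracted features matters so much for monocular tracking.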
Data Preparation
The original ORB-SLAM is built entirely around ROS, and ORB-SLAM3 is geared toward real-time input, much like a live broadcast. ORB-SLAM2, by contrast, needs very little as input: at a minimum, a folder with one PNG file per frame, plus a file with the camera calibration and distortion parameters. Converting a video into a set of images takes only a few lines of Python.
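A minimal sketch of the video-to-frames conversion, assuming OpenCV's Python bindings (`cv2`) are installed. The helper names `frame_filename` and `extract_frames` and the zero-padded naming scheme are assumptions made for this example, not part of ORB-SLAM2.

```python
from pathlib import Path


def frame_filename(index: int, digits: int = 6) -> str:
    """Zero-padded PNG name so that frames sort in playback order."""
    return f"{index:0{digits}d}.png"


def extract_frames(video_path: str, out_dir: str) -> int:
    """Write every frame of video_path into out_dir as PNG; return the count."""
    import cv2  # imported here so frame_filename works without OpenCV installed

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    capture = cv2.VideoCapture(video_path)
    if not capture.isOpened():
        raise IOError(f"cannot open video: {video_path}")

    count = 0
    try:
        while True:
            ok, frame = capture.read()
            if not ok:  # end of stream or read error
                break
            cv2.imwrite(str(out / frame_filename(count)), frame)
            count += 1
    finally:
        capture.release()
    return count
```

Calling `extract_frames("store_walk.mp4", "frames")` would fill the `frames` folder with `000000.png`, `000001.png`, and so on; both paths here are hypothetical.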
The camera calibration and distortion parameters are the more interesting part, because wrong values can produce incorrect rotation angles in the estimated trajectory. OpenCV can determine all of these parameters from a separate calibration video, but our goal was to use only the footage from the store, so we took approximate parameters found on the internet.
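One way to sanity-check approximate intrinsics is the pinhole camera model, where the focal length in pixels follows from the image width and the horizontal field of view. The sketch below is not the procedure used in this experiment; it assumes square pixels, a principal point at the image centre, and no lens distortion, and the resolution and field of view in the usage lines are illustrative values only.

```python
import math
from typing import Dict


def approx_intrinsics(width: int, height: int, hfov_deg: float) -> Dict[str, float]:
    """Pinhole intrinsics from image size and horizontal field of view.

    Assumes square pixels (fx == fy) and a principal point at the image
    centre. Lens distortion is ignored, which is a rough simplification
    for a wide-angle action camera.
    """
    if not 0.0 < hfov_deg < 180.0:
        raise ValueError("horizontal field of view must be between 0 and 180 degrees")
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    return {"fx": fx, "fy": fx, "cx": width / 2.0, "cy": height / 2.0}


# Illustrative values only: a 1920x1080 frame with a 90-degree horizontal FOV.
k = approx_intrinsics(1920, 1080, 90.0)
print({name: round(value, 1) for name, value in k.items()})
```

The distortion coefficients cannot be derived this way; they still have to come from a calibration run or from published values for the camera model.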
Prototype Assembly
This straightforward solution is portable across environments, from a Raspberry Pi to a PC. ORB-SLAM2 is written in C++ and depends on specific libraries, which can cause dependency issues; it works best on Ubuntu 18.04. That does not mean we need a dedicated computer, though: a Docker container with that OS gives us the same setup in any environment. Because ORB-SLAM2 ships by default as a desktop application, we have to configure an X server to see its windows on a display outside the container. To install ORB-SLAM2 natively, you need Linux, a GPU, a camera, compatible hardware, and sufficient processing power. The Docker option requires GPU support, an installed Docker, and adequate system resources.
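As a rough sketch of the container setup, the helper below only assembles a `docker run` command that passes a `DISPLAY` variable into the container and mounts a data folder. The image name, the `/data` mount point, and the default `host.docker.internal:0` display (a common pattern on Docker Desktop with an X server running on the host) are assumptions for illustration, not the exact configuration used here.

```python
from typing import List


def docker_run_command(
    image: str,
    data_dir: str,
    display: str = "host.docker.internal:0",
    cpus: int = 2,
) -> List[str]:
    """Build a `docker run` argument list that forwards GUI output to an X server."""
    return [
        "docker", "run", "--rm", "-it",
        f"--cpus={cpus}",            # limit the container to a fixed number of cores
        "-e", f"DISPLAY={display}",  # tell X clients where the host display is
        "-v", f"{data_dir}:/data",   # expose frames and settings inside the container
        image,
    ]


# Hypothetical image name and host path, shown only to illustrate the output.
print(" ".join(docker_run_command("orbslam2:latest", "/tmp/frames")))
```

On a Linux host, the usual alternative is to pass the host's own `DISPLAY` value and mount the X11 socket directory instead; in both cases the host X server must be configured to accept the connection.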
Performance
In monocular mode, processing a 35-second video at 50 frames per second (1752 frames in total) on a MacBook with an i5 processor and 2 cores dedicated to Docker takes approximately 7 minutes and 44 seconds (464 seconds). This translates to about 3.8 frames per second, with 10,000 features extracted from each frame. Processing time scales proportionally with the number of features per frame, a parameter that can be changed in the settings. To show the result at a “real-time” pace, the example videos are accelerated 20x. With fewer than 4,000 features, the system fails to build a reliable map because it cannot match features across frames. The average CPU load is 200%, with a peak of 250%. Memory usage for processing 10,000 features per frame over 1752 frames is 850 MiB.
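The throughput follows directly from the frame count and the wall-clock time quoted above. The short check below recomputes it and also shows how far the processing rate is from the video's playback rate; the function names are made up for this example.

```python
def throughput(frames: int, seconds: float) -> float:
    """Average number of frames processed per second."""
    return frames / seconds


def slowdown(video_fps: float, processing_fps: float) -> float:
    """How many times slower than playback the processing runs."""
    return video_fps / processing_fps


processing_fps = throughput(1752, 7 * 60 + 44)  # 1752 frames in 464 seconds
print(f"{processing_fps:.2f} frames per second")                  # 3.78 frames per second
print(f"{slowdown(50, processing_fps):.1f}x slower than playback")  # 13.2x slower than playback
```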
From this video the system builds a point cloud of 27,000 points, which is detailed enough to make out specific features on the shelves rather than just their outlines.
Future of Indoor Navigation
ORB-SLAM2 has the potential to change the way big companies manage their storage spaces. By enabling drones to navigate autonomously and efficiently, it can streamline processes, improve inventory management, and enhance worker safety. For example, drones equipped with ORB-SLAM2 can quickly and accurately scan large warehouses, identifying and tracking inventory items. This information can then be used to optimize the storage layout, improve picking and packing efficiency, and reduce errors.
The applications are not limited to business: you could check what your cat destroyed while you were away, find something you forgot to put back, offer a virtual tour of your office or event space, or check whether a shelf is the right size for your room.