Abstract:
The sense of space provided by our eyes gives humans a basic ability to interpret the world, yet modeling space remains a challenging task for machines in computer vision. Unlike scene reconstruction, which attempts to recover the correct 3D position of every point in an image, sensing space aims to delineate the empty region of the scene. With this notion, we can also distinguish whether an object lies inside or outside that space.
This thesis presents an intuitive way to infer the space of a scene using stereo cameras. We first segmented the ground from the image by adaptively learning a ground model, and then approximated the scene space with a convex hull. Objects within the scene were also detected with the stereo cameras. Finally, we organized the scene space and the objects within it into a graphical model, and used particle filters to approximate the solution.
Experiments were conducted to evaluate the accuracy of the ground segmentation and the precision and recall of object detection within the scene. The results showed promising ground segmentation accuracy in an indoor environment, and the segmentation was visualized in an occupancy grid map. The precision and recall of object detection were both about 50 percent in our system; with additional tracking of objects, the recall could improve by approximately 5 percent. Finally, we also showed the possibility of improving human detection results, as many false detections can be filtered out by our system.
We demonstrate a novel way to interpret the space of a scene using simple convex contours, and the possibility of detecting objects within the scene without a classifier. The result can serve as prior knowledge for subsequent vision tasks, e.g., obstacle avoidance or object recognition.