Researchers at Carnegie Mellon University have developed a way for computer vision systems to interpret outdoor images more clearly by giving these systems a better understanding of the physical constraints of a particular scene.

Computer vision systems analyze outdoor scenes using virtual building blocks to create three-dimensional models based on mass and volume. The problem is that these systems can struggle to analyze single images, and they often fail to make sense of certain objects or spaces, especially in outdoor images. Earlier computer vision systems were simply programmed to identify objects such as buildings and cars, but this does not help the computer understand the geometry of the image, such as where walkable surfaces are located. Another approach maps the planar surfaces of an image to create a pop-up-book-style model, but the results can be physically impossible and out of scale.

Abhinav Gupta, a post-doctoral fellow in CMU's Robotics Institute; Alexei A. Efros, associate professor of robotics and computer science at CMU; and Martial Hebert, professor of robotics at CMU, have developed a new system to help computers gain a better understanding of outdoor images.

To do this, the researchers developed a method that breaks an image down into parts corresponding to objects in the scene. The system then identifies the sky and the ground, assigns geometric shapes to the remaining parts, and classifies each part as light or heavy. Finally, the computer reconstructs the scene with virtual blocks, using the correct geometric shapes and its knowledge of each object's weight.

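
To make the idea of "physical constraints" more concrete, here is a toy Python sketch of one such check: a block must rest on the ground or on another block, and a heavy block should not be supported only by light ones. This is a hypothetical illustration, not the CMU researchers' method; the `Block` class, the flat 2D side view, the ground height `GROUND_Y` and the two rules are all assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Hypothetical 2D block seen from the side: extent [left, right] x [bottom, top]."""
    name: str
    left: float
    right: float
    bottom: float
    top: float
    heavy: bool

GROUND_Y = 0.0  # assumed height of the ground plane

def supporters(block, blocks):
    """Blocks whose top face touches this block's bottom face and overlap it horizontally.

    Uses exact equality on heights, so coordinates are assumed to be snapped to a grid.
    """
    return [
        other for other in blocks
        if other is not block
        and other.top == block.bottom
        and min(other.right, block.right) > max(other.left, block.left)
    ]

def violations(blocks):
    """Return (name, reason) pairs for blocks that break the toy physical rules."""
    problems = []
    for block in blocks:
        if block.bottom == GROUND_Y:
            continue  # resting directly on the ground is always allowed
        below = supporters(block, blocks)
        if not below:
            problems.append((block.name, "floating"))
        elif block.heavy and all(not b.heavy for b in below):
            problems.append((block.name, "heavy on light"))
    return problems

# A made-up scene: one plausible block, one heavy block on a light one, one floating block.
scene = [
    Block("building", 0, 10, 0, 30, heavy=True),
    Block("tree", 12, 14, 0, 8, heavy=False),
    Block("water tower", 12, 14, 8, 12, heavy=True),
    Block("sign", 20, 22, 5, 7, heavy=False),
]
print(violations(scene))
```

A reconstruction that produces any violations would be rejected in favor of a more plausible arrangement; in this sketch the check simply reports them.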
According to Gupta, this qualitative volumetric system is "better than 70 percent accurate," but the approach is so new that no established evaluation methods or datasets exist for it yet.

"When people look at a photo, they understand that the scene is geometrically constrained," said Gupta. "We know that the buildings aren't infinitely thin, that most towers do not lean, and that heavy objects require support. It might not be possible to know the three-dimensional size and shape of all the objects in the photo, but we can narrow the possibilities. In the same way, if a computer can replicate an image, block by block, it can better understand the scene."

Gupta and his fellow researchers hope that giving computers this level of understanding will serve many purposes, such as helping robots interpret their surroundings and know where to walk.

Gupta is presenting the research at the European Conference on Computer Vision, held September 5-11 in Crete, Greece.