The Vision Thing, continued

What Robots Can Do

    Much of Bajcsy’s own research at GRASP has focused on connecting machine perception with action. To understand what that means for robots, consider the problem of computer vision: How do robots look at things? Sixteen years ago, Bajcsy had the idea that a robot’s visual perception, its knowledge of the environment, would be improved if it could actively adapt to the scene around it. Bajcsy offers this analogy with human vision: "We do not just see; we look. Our pupils adjust to the level of illumination; our eyes bring the world into sharp focus; we move our heads or change our position to get a better view of something; and sometimes we even put on spectacles." Percepts–those impressions of objects that we collect with our senses–don’t just fall onto our eyes like rain falls onto the ground, Bajcsy points out. The same should hold true for a sensor on a robot.

Photo by Candace diCarlo

    How would that work in practice? Think of a camera mounted on a robot. The robot moves its head or its body–there’s the action–to help the camera collect information–that’s the perception. In turn the camera’s visual information is fed to a computer that controls the robot’s motion: a tidy feedback loop. The overriding idea is that actively changing a sensor’s parameters (like the camera’s position or focus) helps the robot adapt to an uncertain environment.
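The feedback loop described above can be sketched in code. The following toy autofocus routine is purely illustrative (the sharpness model and all names are invented, not taken from the GRASP lab's software): the controller repeatedly measures image sharpness at nearby focus settings and moves the sensor parameter in whichever direction improves perception.

```python
def sharpness(focus: float, target: float = 3.0) -> float:
    """Toy sharpness measure: peaks when the focus setting matches the scene depth."""
    return 1.0 / (1.0 + (focus - target) ** 2)

def autofocus(focus: float, step: float = 0.5, iterations: int = 50) -> float:
    """Hill-climb on sharpness by actively adjusting the focus parameter."""
    for _ in range(iterations):
        # Perception: sample the scene at nearby sensor settings.
        here = sharpness(focus)
        ahead = sharpness(focus + step)
        behind = sharpness(focus - step)
        # Action: move the parameter in the direction that improves perception.
        if ahead > here:
            focus += step
        elif behind > here:
            focus -= step
        else:
            step /= 2.0  # near the peak; refine the search
    return focus

print(round(autofocus(0.0), 2))  # converges to the target depth, 3.0
```

The point of the sketch is the loop itself: sense, act on the sensor, sense again. Any sensor parameter (position, zoom, aperture) could stand in for focus here.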
That link between perception and action was a new way of looking at computer vision in the early 1980s. In 1983 Bajcsy worked her theory into a new research paradigm that she called "active perception." The concept has become Bajcsy’s signature in the robotics community and just two years ago was a factor in her election to the National Academy of Engineering.
In recent years, Bajcsy and her co-workers in the GRASP lab have taken the concept of computer vision a step further with another question: How can you use computer vision to gather information about the objects in a three-dimensional world? The answer to this question is making reality out of "tele-immersion," a new technology that seems the fabric of fantasy. Using tele-immersion, you could have a meeting with people scattered across the country or across the globe, and each of you would feel as if you were in the same physical space. You’d be in a shared, simulated environment, but the person you "see" across the room would look real, not like an animated image. Real enough to make out the textures and vagaries of hair and skin and clothes. Real enough to touch.
The principle that will make this technology work is called "stereo-reconstruction." In each of the remote locations, a set of video cameras–at least two–takes pictures of the people and the objects in the room, capturing movements and measuring distances from the camera. The next step is putting together a three-dimensional reconstruction of the scene, incorporating changes as quickly as they happen. For that the GRASP tele-immersion group is refining a special algorithm that recovers all of the image information from the cameras. Then all the reconstructions can be projected into a virtual world. Or the 3-D reconstructions can be integrated into a real environment for a mind-bending taste of mixed reality. (Visit the lab’s Web site at www.cis.upenn.edu/~grasp/research.htm for a sample.)
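The geometry at the heart of stereo reconstruction is triangulation: a point in the scene lands at slightly different positions in the two cameras' images, and that shift (the "disparity") reveals the point's depth. A minimal sketch, with made-up numbers for illustration:

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Triangulate a point's depth (in meters) from its pixel disparity
    between two parallel cameras a known baseline apart."""
    return focal_px * baseline_m / disparity_px

# Example: cameras 0.1 m apart, focal length 800 pixels, and a point that
# shifts 20 pixels between the left and right images:
z = depth_from_disparity(800, 0.1, 20)
print(z)  # 4.0 -- the point is 4 meters away
```

Doing this for every matched point in the two images, frame after frame, is what turns a pair of flat video streams into the moving 3-D model that tele-immersion projects into the shared space.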
Dr. Kostas Daniilidis, Bajcsy’s colleague in this project, says engineers could use tele-immersion in the collaborative design of mechanical parts. He sees other applications as well–for example, in the entertainment industry: A dancer in Chicago and a dancer in New York could train together in a virtual ballroom. But those are in the future. For now, tele-immersion is seen as a test. The GRASP lab is part of the National Tele-Immersion Initiative, an academic consortium with plans to use tele-immersion as the toughest available technical challenge for Internet2, the university-led research and education network meant to develop pre-commercial technologies and enhance the federal government’s Next Generation Internet. One reason tele-immersion will be such a good test of Internet2 is that it requires far more bandwidth (data-transfer rate) than current Internet technology can support.
The reconstruction issue comes up again in another project of Bajcsy’s, this one on cooperative systems, which looks at how robots interact with each other and with people. Consider, say, a team of robots marching through an unknown environment. "The question is how much autonomy you want to give a system when the different members are supposed to cooperate," Bajcsy says. "It’s like in the Boy Scouts: Meet me at place X. How much do you need to communicate to get there? And what if one of your members gets stuck? What do you do then?"
Bajcsy has developed some mathematical models of this cooperation and autonomy and implemented them in the robots. But smoothing the robots’ paths to point X calls for a very accurate reconstruction of the environment by their sensors. And the familiar duet of action and perception comes into play here too. It turns out that different "perception strategies" will apply in different dynamic situations; in other words, the robot has to figure out the best way to avoid running into a wall.
Using motion parallax–the apparent change in an object’s position when you view it from two different spots–works pretty well in open spaces. For a robot making a turn, stereo–the use of two cameras that work like a pair of eyes–is a better way to keep from tripping over obstacles. Our own biological systems work in much the same way. The challenge with a robot, says Bajcsy, is designing an automatic switch that, depending on the changing environment, will select the best perception strategy to help the robot reach its goal. "Once we have good perception," she adds, "the control of the robots is easy."
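The "automatic switch" Bajcsy describes can be caricatured as a simple dispatcher: given the robot's current situation, pick the perception strategy best suited to it. The thresholds and strategy names below are invented for illustration, not drawn from the GRASP lab's actual controllers.

```python
def choose_strategy(nearest_obstacle_m: float, turning: bool) -> str:
    """Select a perception strategy from the robot's current situation."""
    if turning or nearest_obstacle_m < 1.0:
        # Two cameras acting as a pair of eyes: reliable range estimates
        # up close and while the robot is changing direction.
        return "stereo"
    # A single moving camera exploiting motion parallax: works well
    # in open space, where there is room to move and observe.
    return "motion_parallax"

print(choose_strategy(5.0, turning=False))  # motion_parallax
print(choose_strategy(0.5, turning=False))  # stereo
```

A real switch would weigh many more cues (speed, lighting, sensor confidence), but the shape of the decision is the same: the environment, not a fixed program, determines how the robot looks.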
Here’s one example of what a cooperative system has actually done in the GRASP lab: two mobile robots work together to carry a large object–a box or a pipe, say–and move across a room. As they move, first one robot leads the way; then they change their configuration and move side by side; then one robot takes the lead again. And all the while they hold the box together and traverse obstacles in their path.
Understanding active cooperation like this is important, says Bajcsy, because it will help us understand what makes intelligent behavior. When computer scientists talk about making intelligent machines, they use the term "artificial intelligence." Bajcsy views artificial intelligence as a discipline that tries to understand what intelligence really is and includes "a whole gamut of activities," like perception, action, reasoning and learning. AI does not mean copying humans, she believes: "Machines can do some activities better than humans. They can multiply faster, do mathematical operations, remember more. But humans are more flexible, and that’s the puzzle. How do we understand this flexibility?"
Bajcsy’s mentor at Stanford, John McCarthy, agrees that you can’t point to one thing that will answer the question of whether or not a machine is intelligent. Though we define intelligence by relating it to human capabilities, "You have to ask which aspects of intelligence have been understood well enough to put into computer programs," he says. "AI researchers are free to use methods that are not observed in people or that involve much more computing than people can do."
Most of us would probably agree that a machine can be called intelligent at some level. But there’s another element of human intelligence called consciousness. Can machines have that? That’s a murky area, says Bajcsy, and one that the philosopher Daniel Dennett, author of Consciousness Explained, among other books, would call a frontier of science.
AI was built on the premise that we can model and implement whatever behavior is rational, she says. Now researchers are making progress toward understanding the emotional aspect. And that’s where consciousness comes in: "We all know that your emotional state can influence your rational behavior. But how do you put it into mathematical or formal terms? That’s really the question. When you cry or smile, we can measure your heartbeat, your perspiration, your temperature. All the lie detectors, for example, are based on this. So with emotions we have all these indirect observables. But are they the right observables? That is really the issue–that we cannot crawl inside you and measure what is going on."
McCarthy views consciousness from a different angle. Conscious machines, he says, will need the power of self-observation: "Computer programs will have to be able to look at themselves and evaluate their own intentions. They’ll have to decide whether their line of thinking is successful or unsuccessful, the way people do."



Copyright 1999 The Pennsylvania Gazette Last modified 7/7/99