How can code-driven robots interact better with humans? Brown University's Humans to Robots Laboratory recently tested a new AI-enabled system that aims to let robots understand human commands given in everyday language and carry out the corresponding tasks accurately.
The key point of the research is a new system that enables robots to perform complex tasks without thousands of hours of training data. Traditional training requires a large number of examples to show a robot how to interpret and execute instructions in different places; the new system instead lets the robot operate in a new environment given only a detailed map of the area.
The researchers describe how the large language model embedded in their system lets robots understand and perform tasks by breaking instructions down, without large amounts of training data. The system not only accepts natural-language instructions; it also works out the logical steps the robot needs from the context of its environment, reducing an instruction to something much simpler and clearer: what the robot can do, what it cannot do, and in what order.
Stefanie Tellex, one of the principal researchers on the project and a professor of computer science at Brown University, said: "In choosing our setting, we specifically considered a mobile robot moving around an environment, and we wanted a way for the robot to understand the complex spoken instructions a person gives it, like walking down Thayer Street in Providence to meet me at the coffee shop, but avoiding CVS and stopping at the bank first, and to follow those instructions exactly."
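To make that decomposition concrete, here is a minimal, purely illustrative sketch of how an instruction like the one Tellex describes might break down into those three kinds of constraints. The field names and structure are assumptions made for explanation, not the team's actual representation; the formula in the final comment uses linear temporal logic, which is introduced further below.

```python
# Illustrative only: a hand-written breakdown of the instruction
# "walk down Thayer Street to the coffee shop, avoid CVS, stop at the bank first".
# The field names are assumptions for explanation, not the Brown team's code.

instruction = ("Walk down Thayer Street in Providence and meet me at the coffee shop, "
               "but avoid CVS and stop at the bank first.")

constraints = {
    "goal":  ["coffee_shop"],           # what the robot should do
    "avoid": ["CVS"],                   # what it must not do
    "order": [("bank", "coffee_shop")], # the bank must come before the coffee shop
}

# In linear temporal logic (introduced below), this reads roughly as:
#   F coffee_shop  &  G !CVS  &  (!coffee_shop U bank)
```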
If the research bears out, it could be applied to many kinds of mobile robots in cities, including drones, self-driving cars, and unmanned delivery vehicles. You would only need to communicate with the robot the way you communicate with people, and it would understand your instructions accurately, making mobile robots practical in complex environments.
To test the system, the researchers ran simulations using OpenStreetMap in 21 cities and showed that it performed tasks accurately 80 percent of the time, a much higher rate than comparable systems, which typically achieve only about 20 percent accuracy and cannot handle complex instructions and tasks.
The team also ran indoor tests on the Brown University campus with Spot, Boston Dynamics' quadruped, which is considered one of the world's leading general-purpose quadruped robots; success on Spot should make it easier to carry the system over to robots from other manufacturers.
Jason Xinyu, a PhD student in computer science and a lead member of the research team, explains how the system works with an example.
Suppose the user tells a drone to go to the "store" on "Main Street" but to visit the "bank" first. Once the instruction is entered, the software first identifies the two locations, and the language model then matches these abstract places to concrete locations on the robot's map. It also analyzes location metadata, such as an address or the type of place, to help the system decide: in this case there are several stores nearby, but only one is on Main Street, so the system knows which one is meant. Next, the language model translates the command into linear temporal logic, a mathematical notation for expressing such commands. Finally, the system plugs the mapped locations into that formula, which tells the robot to go to point A, but only after point B.
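The steps above can be sketched in code. The following is a minimal illustration under stated assumptions: the map entries, the hard-coded landmark extraction, and the function names (extract_landmarks, ground, to_ltl) are hypothetical stand-ins, and in the actual system a language model performs the extraction and translation rather than the hand-written rules shown here.

```python
# A minimal sketch of the pipeline described above, under stated assumptions:
# the map data, the hard-coded extraction step, and the function names are
# hypothetical; a language model handles these steps in the real system.
from __future__ import annotations
from dataclasses import dataclass

@dataclass
class Place:
    name: str    # concrete location on the map
    street: str  # metadata: address
    kind: str    # metadata: type of location

# Hypothetical slice of an OpenStreetMap-style map of the area.
MAP = [
    Place("Corner Store", "Main Street", "store"),
    Place("Quick Mart",   "Elm Street",  "store"),
    Place("First Bank",   "Main Street", "bank"),
]

def extract_landmarks(command: str) -> list[tuple[str, str | None]]:
    """Step 1 (done by the language model in the real system): identify the
    abstract places in the command and any qualifiers. Hard-coded here."""
    return [("store", "Main Street"), ("bank", None)]

def ground(kind: str, street: str | None) -> Place:
    """Step 2: match an abstract place to a concrete map location, using
    metadata (type, street) to disambiguate between candidates."""
    matches = [p for p in MAP
               if p.kind == kind and (street is None or p.street == street)]
    return matches[0]  # several stores exist, but only one is on Main Street

def to_ltl(goal: Place, first: Place) -> str:
    """Step 3: express the command in linear temporal logic: eventually reach
    the goal, but do not reach it until 'first' has been visited."""
    return f"F '{goal.name}' & (!'{goal.name}' U '{first.name}')"

command = "Go to the store on Main Street, but go to the bank first."
(store_kind, store_street), (bank_kind, bank_street) = extract_landmarks(command)
store = ground(store_kind, store_street)   # -> Corner Store
bank = ground(bank_kind, bank_street)      # -> First Bank
print(to_ltl(store, bank))
# F 'Corner Store' & (!'Corner Store' U 'First Bank')
```

The "until" operator (U) is what captures the ordering requirement: the formula forbids reaching the store before the bank has been visited.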
A simulation based on OpenStreetMap will be posted online in November, allowing users to test the system for themselves. Users will be able to enter natural-language commands on a web page to guide a simulated drone through a navigation task, helping the researchers fine-tune the software.
In other words, an "AI + robot" project trained jointly with the public is on its way.