|Schedule||Speakers||Future Star Talks||Description|
|08:40-08:45||Workshop Introduction and Welcome|
|12:15-12:30||Closing Remarks and Panel|
Autonomous robotics has recently gained attention in many areas, from the success of autonomous vehicles (for example, the Google Car and Tesla), to flying drones, to industrial assistant robots such as Baxter. The ultimate goal is for robots to learn automatically through interaction with their environment and with people, and to reduce the burden on people in a variety of industrial and household tasks.
This makes learning a critical component of these autonomous systems. Among the many learning methods, deep learning aims at learning hierarchical feature representations. Given enough labeled data, it is possible to train large, deep networks on big datasets. In closely related areas such as computer vision, such deep networks have steadily outperformed classic approaches in recent studies (e.g., recognition on ImageNet). The continuously increasing interest in the intersection of autonomous robotics and deep learning motivates us to organize this workshop to study the learning of feature representations for robotics problems. Deep learning for autonomous robotics is a broad area, ranging from learning methods (Recurrent Neural Nets, Convolutional Neural Nets), to software and tools (notably Caffe, Theano, Torch and LightNet), to hardware implementations with sensors (autonomous driving technology).
We encourage researchers to formulate innovative learning theories, feature representations, and end-to-end robotics systems based on deep learning, and to offer preliminary answers to open questions such as how to tailor deep learning methods to various robotics tasks and what the limitations of deep learning in robotics are. We also encourage new theories for dealing with large-scale robotic learning datasets by leveraging information from deep learning architectures.
We expect the workshop to be of interest to researchers and research scientists from both academia and industry. Deep learning has by now displaced many earlier algorithms, and the major companies working with large-scale learning systems are interested in it in one way or another. Our main goal is to encourage discussion and collaboration between research and industry applications.
Abstract: Current methods for processing visual data tend to break down when faced with occlusions, viewpoint changes, poor lighting, and other challenging but common situations. I will show that we can train robots to handle these variations by modeling the causes behind changes in visual appearance. If we model how the world changes over time, we can be robust to the kinds of changes that objects often undergo. I will show how to use this idea to improve performance on two different tasks: tracking and object recognition with neural networks. By modeling the causes of appearance changes over time, we can make our methods more robust to a variety of challenging situations that commonly occur in the real world.
Abstract: There are two major paradigms for vision-based autonomous driving systems: mediated perception approaches which parse an entire scene to make a driving decision and behavior reflex approaches which directly map an input image to a driving action by a regressor. In this talk, I will discuss our recently proposed third paradigm: a direct perception approach to estimate the affordance for driving. An input image is mapped to a small number of key indicators that directly relate to the affordance of a road/traffic state for driving. This representation provides a compact yet complete description of the scene to enable a simple controller to drive autonomously. Falling in between the two extremes of mediated perception and behavior reflex, our direct perception representation may provide the right level of abstraction. To test this, we trained a deep convolutional neural network to estimate the affordance indicators using 12 hours of human driving in a video game and found that our model can learn to correctly drive a car in a diverse set of virtual environments. Additional experiments indicate that our direct perception approach generalizes well to real driving videos. (Joint work with Chenyi Chen, Alain Kornhauser, and Jianxiong Xiao)
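To illustrate what "a simple controller" driven by a small set of affordance indicators might look like, here is a minimal Python sketch. It is not the speakers' implementation: the indicator names (`angle`, `dist_center`, `dist_ahead`), the sign conventions, and all constants (`road_width`, `gain`, `cruise`, `safe_gap`) are hypothetical choices made for this example, and the perception network that would produce the indicators from an image is left out.

```python
from dataclasses import dataclass


@dataclass
class Affordance:
    """Hypothetical affordance indicators, as a perception model might output them."""
    angle: float        # heading error relative to the road direction, radians
    dist_center: float  # lateral offset from the lane center, meters
    dist_ahead: float   # distance to the preceding car, meters


def steer(a: Affordance, road_width: float = 4.0, gain: float = 1.0) -> float:
    """Proportional steering command in [-1, 1].

    Assumed convention: the command reduces the heading error and pulls the
    car back toward the lane center, with the offset normalized by lane width.
    """
    cmd = gain * (a.angle - a.dist_center / road_width)
    return max(-1.0, min(1.0, cmd))


def target_speed(a: Affordance, cruise: float = 20.0, safe_gap: float = 30.0) -> float:
    """Scale the cruise speed (m/s) down linearly as the gap ahead closes."""
    return cruise * min(1.0, max(0.0, a.dist_ahead / safe_gap))


if __name__ == "__main__":
    state = Affordance(angle=0.0, dist_center=2.0, dist_ahead=15.0)
    print(steer(state), target_speed(state))
```

The point of the sketch is only that, once the scene is summarized by a few numbers, the driving logic can be a handful of arithmetic rules rather than a learned policy.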
Abstract: Action segmentation is important for many applications in human-robot interaction, video monitoring, and human skill analysis. The goal is to temporally segment a sequence of actions occurring in a scene, such as cutting, peeling, and mixing vegetables in a cooking task. Despite substantial prior work, action segmentation is still ill-suited for many real-world applications due to accurate but impractical solutions from the robotics community and inaccurate but general solutions from the computer vision community. In this talk, I will introduce a video-based solution, using a segmental spatiotemporal CNN, that leverages domain knowledge, like a fixed-view camera, to achieve substantially improved performance compared to other recent computer vision models. The spatial component of our network captures the relationships between latent objects in the scene, the temporal component captures how relationships change over time, and the segmental component ensures temporal consistency. In addition, we improve upon this model by leveraging sensor data, like robot kinematics, during the training phase to learn a better video-based model that also infers sensor positions from images with high accuracy. We evaluate on the 50 Salads and JIGSAWS datasets. This is joint work with Christian Rupprecht, Austin Reiter, Rene Vidal, and Greg Hager.
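As a small illustration of the task's output format, and not of the segmental spatiotemporal CNN itself, the Python sketch below converts a sequence of per-frame action labels into temporal segments. The function name `frames_to_segments` and the half-open `(label, start, end)` convention are assumptions made for this example.

```python
from itertools import groupby


def frames_to_segments(labels):
    """Collapse per-frame labels into (label, start, end) segments.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))  # number of consecutive frames with this label
        segments.append((label, start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    frames = ["cut", "cut", "peel", "mix", "mix", "mix"]
    print(frames_to_segments(frames))
```

A frame-wise classifier without any temporal consistency tends to produce many short spurious runs in such a sequence, which is the kind of over-segmentation a segmental component is meant to suppress.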