Research themes and questions

Industrial robots have mostly been limited to carefully controlled environments and driven by meticulously hand-coded scripts [1]. While this approach works for certain high-volume products like automobiles, most tasks are prohibitively expensive to automate this way. Humans, by comparison, adapt intelligently, master new skills quickly, and move easily between environments. At Vicarious, we are building systems to bring human-like intelligence to the world of robots.

To reach this challenging goal, research at Vicarious is organized differently from mainstream deep learning and computer vision research. That work has traditionally been organized around a set of standard benchmarks, which let researchers steadily improve their techniques and compare results. However, using standard datasets as the primary measure of progress is controversial [2], and many researchers have noted its downsides. The main objection is that the best strategy for beating a benchmark is to take the winning model from one year and make incremental modifications on top of it. This creates a significant barrier to entry for novel methods [2] and makes it difficult to understand why a particular model does well. In addition, large datasets mix together many different modes of error, so fundamental improvements to some aspect of a model often yield only marginal improvements in the overall performance of the system [3]. Finally, benchmark-beating techniques tend to overfit to the idiosyncrasies of the data [4, 5].

In contrast, our research is organized around a set of questions and themes, and we use datasets designed specifically to probe those questions. When we do test on standard benchmarks, we try to carefully identify and characterize the sources of error [3] rather than trying to beat the current state-of-the-art algorithm. While this approach may not beat many standard benchmarks in the short term, we believe that executing it properly will lead to significantly better understanding, and significantly better-performing systems, in the long term.

The following themes and constraints are emphasized in the work we do:


Data efficiency

Generalizing from a limited number of training examples: deep neural networks and other machine learning algorithms are known to require enormous amounts of training data, whereas humans are able to generalize from as little as a single example [6]. We believe that generalizing from a few examples lies at the core of intelligence. So far our models have required only a handful of examples to train.


Unsupervised learning

Although deep learning research has made advances in unsupervised learning techniques, its recent successes are attributable to supervised learning with large amounts of data. We believe that unsupervised learning will be important for a large class of problems, and most of our efforts are focused on unsupervised learning techniques.


Neuro & cognitive sciences

We use the neocortex as a source of inductive biases and constraints. It is a widely held view that the learning efficiency and generalization of the brain come from its inductive biases [7]. The organization of circuits in the neocortex provides rich clues about these inductive biases and inference algorithms, and investigating those clues in light of the deficiencies of existing models can lead to the discovery of new network architectures, learning algorithms, and inference mechanisms.


Network structure

Like many other researchers, we believe that network architecture plays a significant role in generalization [8]. Many of our experiments are designed to uncover insights about network micro-architecture. We emphasize parts-based representations and compositionality, in the spirit of other researchers building grammar-based models. In addition, the organization of cortical micro-circuitry provides rich clues about the nature of potential modifications. Our efforts in this direction have already been fruitful, yielding a new network architecture that provides tight control over the invariance-selectivity tradeoff.



References

[1] B. Siciliano and O. Khatib, eds., Springer Handbook of Robotics, Springer, 2016.

[2] A. Yuille, "Computer vision needs a core and foundations," Image and Vision Computing, 2012.

[3] D. Hoiem, Y. Chodpathumwan, and Q. Dai, "Diagnosing Error in Object Detectors," ECCV, 2012.

[4] A. Torralba and A. A. Efros, "Unbiased Look at Dataset Bias," CVPR, 2011.

[5] J. Ponce, T. L. Berg, M. Everingham, D. A. Forsyth, M. Hebert, S. Lazebnik, M. Marszalek, C. Schmid, B. C. Russell, A. Torralba, C. K. I. Williams, J. Zhang, and A. Zisserman, "Dataset Issues in Object Recognition," Toward Category-Level Object Recognition, 2006.

[6] J. B. Tenenbaum, C. Kemp, T. L. Griffiths, and N. D. Goodman, "How to Grow a Mind: Statistics, Structure, and Abstraction," Science, vol. 331, no. 6022, pp. 1279-1285, 2011.

[7] J. Pearl, Causality: Models, Reasoning, and Inference, Cambridge University Press, 2000.

[8] T. Poggio, J. Mutch, J. Leibo, and A. Tacchetti, "The computational magic of the ventral stream: sketch of a theory (and why some deep architectures work)," MIT CSAIL Technical Report, 2012.