Robotics, Machine Learning & Computer Vision - Guest Post

By Stephanie Casola • Aug 13, 2020

We recently connected with an awesome blogger in the computer vision space. We wanted to highlight his blog to our community because he publishes relevant and interesting articles about Computer Vision and Robotics. It is called the Serious Computer Vision Blog, and it is well worth checking out.

Some background information about the blogger, Li Yang Ku:

Li Yang Ku is currently a researcher at Vicarious (a robotics company with some pretty famous investors, including Jeff Bezos, Elon Musk and Mark Zuckerberg). He received his PhD in Computer Science at the University of Massachusetts Amherst and was a member of both the Laboratory of Perceptual Robotics and the Computer Vision Lab. He was also a NASA Space Technology Research Fellow and collaborated with the Robonaut team at Johnson Space Center from 2014 to 2018. Before that, he was a researcher at HRL Laboratories and was involved in several robotics and computer vision projects. Li’s research mostly focuses on integrating robotic action, perception, and memory.

Li also published one of our articles on his blog! He wanted to highlight to his audience a tool that would make it easy for them to build computer vision applications, and used our Posture Corrector App tutorial as an example. You can check out this article here.

We would also like to highlight one of his articles below. It is evident from his background that he is highly knowledgeable when it comes to Computer Vision and Robotics. We would like to share this particular article about robotics, computer vision and machine learning, and how these three fields relate to each other:

Machine Learning, Computer Vision & Robotics

By Li Yang Ku

Having TA’d for Machine Learning this semester and worked in the field of Computer Vision and Robotics for the past few years, I always have this feeling that the more I learn the less I know. Therefore, it's sometimes good to just sit back and look at the big picture. This post will talk about how I see the relations between these three fields at a high level.

First of all, Machine Learning is more a brand than a name. Just like Deep Learning and AI, the name is used to get funding once the previous name has lost its hype. In this case, the name became popular after AI projects failed in the 70s. As a result, Machine Learning covers a wide range of problems and approaches that may look quite different at first glance. AdaBoost and support vector machines were the hot topics in Machine Learning when I was doing my master’s degree, but now it is deep neural networks that get all the attention.

Despite the wide variety of research in Machine Learning, the approaches usually share one common assumption: the existence of a set of data. The goal is then to learn a model based on this set of data. There is a wide range of variations here. The data could be labeled or unlabeled, resulting in supervised or unsupervised approaches; the data could be labeled with a category or a real number, resulting in classification or regression problems; the model can be limited to a certain form, such as a class of probability models, or can have fewer constraints, as in the case of deep neural networks. Once the model is learned, there is also a wide range of possible uses.

It can be used for predicting outputs given new inputs, filling in missing data, generating new samples, or providing insights into hidden relationships between data entries. Data is so fundamental in Machine Learning that people in the field don’t really ask why we learn from data. Many datasets from different fields are collected or labeled, and the learned models are compared based on accuracy, computation speed, generalizability, etc. Therefore, Machine Learning people often consider Computer Vision and Robotics as areas for applying Machine Learning techniques.
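To make the classification versus regression distinction concrete, here is a minimal sketch using only the Python standard library. The toy data, the function names, and the choice of a least-squares line and a nearest-centroid classifier are all assumptions made for illustration; they are not from the original post and stand in for whatever model a real project would use.

```python
from statistics import mean

def fit_line(xs, ys):
    """Regression: least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def fit_centroids(xs, labels):
    """Classification: store the mean input seen for each category."""
    return {
        label: mean(x for x, l in zip(xs, labels) if l == label)
        for label in set(labels)
    }

def classify(centroids, x):
    """Predict the category whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(x - centroids[label]))

# Hypothetical toy dataset: the same inputs, labeled two different ways.
xs = [1.0, 2.0, 3.0, 4.0]

# Labeled with real numbers -> regression.
slope, intercept = fit_line(xs, [2.0, 4.0, 6.0, 8.0])
prediction = slope * 5.0 + intercept

# Labeled with categories -> classification.
centroids = fit_centroids(xs, ["small", "small", "large", "large"])
label = classify(centroids, 3.6)
```

In both cases the workflow is the same: start from a dataset, learn a model, then use the model to predict an output for a new input.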

Robotics, on the other hand, comes from a very different background. There is usually no data to start with in robotics. If you cannot control your robot, or if your robot crashes on its first move, how are you going to collect any data? Therefore, classical robotics is about designing models based on physics and geometry. You build models that describe how the input and the current observation of the robot change the robot state. Based on this model, you can infer the input that will safely control the robot to reach a certain state.
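As a rough illustration of this model-based view, the sketch below uses a deliberately simple, hypothetical motion model (a robot on a line whose position changes by a commanded velocity times a time step). The function names, the speed limit, and the numbers are assumptions for illustration only. No dataset is involved: the controller inverts the hand-written model to choose an input, and feedback comes from re-reading the position at every step.

```python
def step(position, velocity_cmd, dt):
    """Hypothetical motion model: next position given a velocity command."""
    return position + velocity_cmd * dt

def control(position, target, dt, max_speed):
    """Invert the model: choose the velocity that would reach the target
    in one step, clipped to a safe speed limit."""
    desired = (target - position) / dt
    return max(-max_speed, min(max_speed, desired))

def drive_to(position, target, dt=0.1, max_speed=1.0, max_steps=1000):
    """Closed-loop control: observe the position, pick an input, apply it."""
    trajectory = [position]
    for _ in range(max_steps):
        if abs(target - position) < 1e-9:
            break
        position = step(position, control(position, target, dt, max_speed), dt)
        trajectory.append(position)
    return trajectory

trajectory = drive_to(0.0, 0.25)
```

A real robot model would include dynamics, noise, and constraints, but the structure is the same: a model of how inputs change the state, and a rule that uses it to pick the next input.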

Once you can command your robot to reach a certain state, a wide variety of problems emerge. The robot will then have to do obstacle avoidance and path planning to reach a certain goal. You may need to find a goal state that satisfies a set of restrictions while optimizing a set of properties.
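Path planning with obstacle avoidance can be sketched in a few lines. The example below assumes a hypothetical grid map where `#` marks an obstacle, and uses breadth-first search, which is just one simple planner among many and is chosen here only for brevity. The grid, the function name, and the start and goal cells are illustrative assumptions.

```python
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search on a 4-connected grid; '#' cells are obstacles.
    Returns a shortest list of (row, col) cells, or None if unreachable."""
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back through the parents to recover the path.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] != "#" and nxt not in parents:
                parents[nxt] = cell
                queue.append(nxt)
    return None

# Hypothetical map: a wall in column 2 forces a detour through the bottom row.
grid = [
    "..#.",
    "..#.",
    "....",
]
path = plan_path(grid, (0, 0), (0, 3))
```

Like the control example, this needs a map and a model of which moves are legal, but no training data.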

Simultaneous localization and mapping (SLAM) may be needed if no maps are given. In addition, sensor fusion is required when multiple sensors with different properties are used. There may also be uncertainties in robot states, where belief space planning may be helpful. For robots with a gripper, you may also need to be able to identify stable grasps and recognize the type and pose of an object for manipulation. And of course, there is a whole different set of problems in designing the mechanics and hardware of the robot. Unlike in Machine Learning, many of these problems are solved without a set of data. However, most of these robotics problems (excluding mechanical and hardware problems) share a common goal of determining the robot input based on feedback. Some roboticists view robotics as the field that has the ultimate goal of creating machines that act like humans, and see Machine Learning and Computer Vision as fields that can provide methods to help accomplish such a goal.

The field of Computer Vision started under AI in the 60s with the goal of helping robots achieve intelligent behavior, but left this goal behind in the internet era, when tons of images on the internet were waiting to be classified. In this age, computer vision applications are no longer restricted to physical robots.

In the past decade, the field of Computer Vision has been driven by datasets. The implicit agreement on evaluation based on standardized datasets helped the field advance at a reasonably fast pace (at the cost of millions of grad student hours spent tweaking models to get a 1% improvement). Given these datasets, the field of Computer Vision inevitably left the Robotics community and embraced data-driven Machine Learning approaches.

Most Computer Vision problems have a common goal of learning models for visual data. The model is then used to do classification, clustering, sample generation, etc. on images or videos. The big picture of Computer Vision can be seen in my previous post.

Some Computer Vision scientists consider vision different from other senses and believe that the development of vision is fundamental to the evolution of intelligence (which could be true… experiments do show 50% of our brain neurons are vision related). Nowadays, Computer Vision and Machine Learning are deeply entangled; Machine Learning techniques help foster Computer Vision solutions, while successful models in Computer Vision contribute back to the field of Machine Learning.

For example, the success story of Deep Learning started with Machine Learning models being applied to the ImageNet challenge, and ended up with a wide range of architectures that can be applied to other problems in Machine Learning. Robotics, on the other hand, is a field that Computer Vision folks are gradually moving back to. Several well-known Computer Vision scientists, such as Jitendra Malik, have started to consider how Computer Vision can help the field of Robotics, based on the recent success of data-driven approaches in Computer Vision.

If you feel inspired by this article to start your own robotics project, you can use the alwaysAI platform to give your robot sight. Just sign up here.

If you are interested in learning more about robotics, we have several articles on the topic on our blog.

Don’t forget to check out the Serious Computer Vision Blog for more articles on robotics as well!
