Core Computer Vision Models Power Advanced Applications

Computer vision applications rely on complex algorithms that let cameras see and interpret the physical world. Nearly all computer vision solutions, however, are built from four types of models.

Classification

Classification is fundamental to computer vision: the model assigns one or more identifying labels or tags (e.g., cat, dog, bicycle) to an image as a whole. Classification does not localize objects in the image; it only identifies what the image contains.

It’s like holding up a flash card and asking the model if there is a dog in the image. Of the four primary computer vision models, classification requires the least amount of processing and is ideal for use cases like room occupancy where you simply need to know if a person is in the frame.
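As a rough illustration of the flash-card idea, the sketch below turns raw model scores into a single label for the whole frame. The scores, the label names, and the 0.5 threshold are made-up values for illustration; they do not come from any particular model or library.

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(scores, labels):
    """Return the most likely label for the whole image and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]

# Hypothetical labels and scores for a room-occupancy camera frame.
LABELS = ["empty_room", "person"]
label, confidence = classify([0.3, 2.1], LABELS)

# Assumed decision rule: treat the room as occupied only if "person" wins
# with at least 50% probability.
occupied = label == "person" and confidence >= 0.5
```

Note that the result says nothing about where the person is in the frame, only that one is likely present.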

Object Detection

Object detection adds another layer of sophistication to classification. Like classification, it labels what appears in an image; however, it also locates each object within the image and provides spatial information, typically as a labeled bounding box drawn around the object.

For example, object detection could identify a hard hat on a construction worker’s head and report where in the frame it appears. Because it yields more specific analysis than classification, it has wide-ranging use cases.
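To make the bounding-box idea concrete, here is a minimal sketch of how detections might be represented and compared. The `Detection` type, the pixel coordinates, and the "upper third of the person box" rule for the hard-hat check are assumptions chosen for illustration; intersection over union (IoU) is a common way to measure how much two boxes overlap.

```python
from typing import NamedTuple, Tuple

class Detection(NamedTuple):
    label: str
    confidence: float
    box: Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels

def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def hat_on_person(person, hat):
    """Assumed rule: the hat's center falls in the upper third of the person box."""
    cx = (hat.box[0] + hat.box[2]) / 2
    cy = (hat.box[1] + hat.box[3]) / 2
    x_min, y_min, x_max, y_max = person.box
    head_bottom = y_min + (y_max - y_min) / 3
    return x_min <= cx <= x_max and y_min <= cy <= head_bottom

# Hypothetical detector output for one frame.
person = Detection("person", 0.92, (100, 50, 200, 350))
hard_hat = Detection("hard_hat", 0.88, (125, 45, 175, 85))
wearing_hat = hat_on_person(person, hard_hat)
```

A real application would usually also filter detections by confidence before applying a rule like this.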

Semantic Segmentation

Semantic segmentation is even more specific than object detection in that it assigns every pixel in an image to a class (a defined object category). Whereas a bounding box may include background details along with the annotated object, a semantic segmentation mask follows the object’s outline closely.

Analyzing details at the pixel level allows for extremely precise object identification, which is why semantic segmentation is useful for detecting defects and anomalies. When pixels within a single object (like a bottle cap) do not match what is expected, it indicates a defect.
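The pixel-level view can be sketched with a tiny hand-written mask. The class IDs, the mask values, and the idea of a separate "defect" class are assumptions for illustration; an actual model would produce the mask, and the class scheme depends on how it was trained.

```python
from collections import Counter

# Hypothetical class IDs for a bottle-cap inspection model.
BACKGROUND, CAP, DEFECT = 0, 1, 2

# A 5x6 semantic mask: one class ID per pixel.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 2, 2, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

def class_pixel_counts(mask):
    """Count how many pixels were assigned to each class."""
    return Counter(pixel for row in mask for pixel in row)

def defect_ratio(mask):
    """Fraction of the object's pixels (cap plus defect) labeled as defect."""
    counts = class_pixel_counts(mask)
    object_pixels = counts[CAP] + counts[DEFECT]
    return counts[DEFECT] / object_pixels if object_pixels else 0.0

counts = class_pixel_counts(mask)
ratio = defect_ratio(mask)
```

Comparing `ratio` against a tolerance chosen for the product is one simple way to flag a part for rejection.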

Instance Segmentation

Instance segmentation provides the highest level of detail in identifying objects in an image. Like semantic segmentation, it assigns each pixel in an image to an object mask; background pixels are typically left unassigned.

However, instance segmentation goes a step further and distinguishes separate instances of one class in one image (e.g., four different dogs in the same image). To do this, it combines object detection and semantic segmentation techniques. Because of this sophisticated level of detail, instance segmentation is often used in the medical industry.
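The difference from semantic segmentation can be shown with a deliberately simplified sketch: a semantic mask marks every dog pixel with the same class ID, and detection boxes are then used to give each dog its own instance ID. The mask, the boxes, and the `split_instances` helper are hypothetical; real instance segmentation models predict per-instance masks directly and handle overlapping objects far more carefully.

```python
DOG = 1  # hypothetical class ID

# Semantic mask: both dogs share class ID 1, so they are indistinguishable.
semantic_mask = [
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]

# One detection box per dog: (x_min, y_min, x_max, y_max), max edges exclusive.
dog_boxes = [(1, 0, 3, 2), (5, 0, 7, 2)]

def split_instances(mask, boxes, class_id):
    """Give pixels of class_id inside each box a distinct instance ID (1, 2, ...)."""
    instance_map = [[0] * len(row) for row in mask]
    for instance_id, (x0, y0, x1, y1) in enumerate(boxes, start=1):
        for y in range(y0, y1):
            for x in range(x0, x1):
                # Only claim pixels of the right class that no earlier box took.
                if mask[y][x] == class_id and instance_map[y][x] == 0:
                    instance_map[y][x] = instance_id
    return instance_map

instance_map = split_instances(semantic_mask, dog_boxes, DOG)
num_instances = len({pixel for row in instance_map for pixel in row if pixel})
```

Here the two dogs end up with IDs 1 and 2, while background pixels stay at 0.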

While these models increase in specificity and processing time, that does not mean one model is always better than another.

Choosing a model depends on a number of factors, including the level of accuracy needed, the goal of the use case, and the processing speed required.

alwaysAI uses a multi-model approach to create the most efficient and effective applications.