Glossary of Terms


Annotation

(Verb). The process of labeling data by marking which areas of an image contain the relevant object(s).

(Noun). The actual files that contain the information about the areas of interest for a particular image. Annotations are sometimes referred to as the ground truth. They are used in supervised learning, where the model repeatedly compares its predictions against the annotations in order to improve.


Augmentation

The process of altering existing images to create new images that are sufficiently different from the originals. Augmentation can include blurring, cropping, brightening, darkening, rotating, and more. It is used to increase the size of a dataset.
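
As a rough illustration of augmentation, the sketch below applies a mirror flip, a brightness change, and a crop to a tiny grayscale image stored as nested lists. The function names and the 0 to 255 pixel range are assumptions made for this example; a real project would normally use an image library instead.

```python
import random

def flip_horizontal(image):
    """Mirror the image left to right."""
    return [row[::-1] for row in image]

def brighten(image, amount):
    """Add `amount` to every pixel, clamped to the assumed 0-255 range.

    A negative amount darkens the image.
    """
    return [[max(0, min(255, p + amount)) for p in row] for row in image]

def crop(image, top, left, height, width):
    """Keep only the rectangle starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def augment(image, rng):
    """Return a randomly altered copy of `image` (hypothetical recipe)."""
    out = image
    if rng.random() < 0.5:
        out = flip_horizontal(out)
    return brighten(out, rng.randint(-30, 30))

original = [
    [10, 20, 30],
    [40, 50, 60],
]

rng = random.Random(0)  # fixed seed so the run is repeatable
augmented = [augment(original, rng) for _ in range(3)]
```

Note that geometric changes such as flips and crops move the object within the image, so the matching annotation coordinates have to be transformed in the same way.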


Batch Size

The number of images the model trains on in a single step.


Difficult

In annotation, ‘difficult’ is set to 1 when the object is not easily recognized; otherwise it is set to 0.


Epoch

One complete pass through the training dataset, in which the model trains on each image one time.


Loss

A quantification of how different the model’s prediction is from the ground truth.
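
One simple way to put a number on that difference is the mean squared error, sketched below for a predicted bounding box versus its ground-truth box. This is only an illustrative choice: the glossary does not say which measure is used, and the box format `[x_min, y_min, x_max, y_max]` is an assumption.

```python
def mean_squared_error(predicted, ground_truth):
    """Average of the squared differences between two equal-length lists."""
    if len(predicted) != len(ground_truth):
        raise ValueError("predicted and ground_truth must have the same length")
    squared = [(p - t) ** 2 for p, t in zip(predicted, ground_truth)]
    return sum(squared) / len(squared)

# Assumed box format: [x_min, y_min, x_max, y_max]
truth_box = [10, 20, 110, 220]
close_guess = [12, 18, 108, 222]
poor_guess = [40, 60, 90, 150]

loss_close = mean_squared_error(close_guess, truth_box)
loss_poor = mean_squared_error(poor_guess, truth_box)
```

The closer prediction yields the smaller value, and a perfect prediction yields zero.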

Learning Rate

How much the weights in the model are adjusted each time they are updated. A larger learning rate means bigger changes per update.
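
To make the effect of the learning rate concrete, the sketch below runs plain gradient descent on a toy one-weight problem, minimizing (w - 3)^2. The toy function and the three rates are assumptions chosen for illustration, not values taken from any particular model.

```python
def gradient(w):
    """Slope of the toy loss (w - 3)^2 at the current weight."""
    return 2 * (w - 3.0)

def train(learning_rate, steps, w=0.0):
    """Repeatedly nudge the weight against the gradient."""
    for _ in range(steps):
        # The learning rate scales how far the weight moves per update.
        w = w - learning_rate * gradient(w)
    return w

small = train(learning_rate=0.01, steps=20)      # moves slowly toward 3
moderate = train(learning_rate=0.1, steps=20)    # gets close to 3
too_large = train(learning_rate=1.1, steps=20)   # overshoots and diverges
```

With the same number of updates, the small rate is still far from the best weight, the moderate rate is close to it, and the overly large rate moves further away with every step.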


Overfitting

When the model performs well on the training dataset but poorly on new test data.


Step

Training on one batch of images. For instance, if the batch size is 16, then in one step the model trains on 16 images.
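
The arithmetic connecting batch size, steps, and a full pass over every image (an epoch) can be sketched as follows. The dataset size of 100 images, the file names, and the three passes are made-up values for the example.

```python
import math

def steps_per_epoch(num_images, batch_size):
    """Number of steps needed to see every image once."""
    return math.ceil(num_images / batch_size)

def batches(images, batch_size):
    """Yield consecutive groups of at most `batch_size` images."""
    for start in range(0, len(images), batch_size):
        yield images[start:start + batch_size]

images = [f"img_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
batch_size = 16
epochs = 3

total_steps = 0
for epoch in range(epochs):
    for batch in batches(images, batch_size):
        # A real training loop would update the model here.
        total_steps += 1  # one step = one batch
```

With 100 images and a batch size of 16, the last batch of each pass holds only the 4 leftover images, so one pass takes 7 steps rather than 6.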

Train, Validation, Test Split

Training a neural network involves three activities: the training itself, tuning hyperparameters such as the learning rate, and testing the model. To support this, the original dataset is typically split into a training dataset and a test dataset, usually with an 80/20 split, respectively. The training dataset is then split again into a training dataset and a validation dataset. As the model trains, it compares its predictions to the annotations in the training data and adjusts its weights accordingly. The validation data is used to check performance during training and to tune the hyperparameters. When the model is done training, it is evaluated on the test data, which it has not seen before, to see how well it performs.
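
A minimal sketch of such a split is shown below. The 80/20 train/test ratio comes from the description above; the 20% validation share, the fixed seed, and the function name are assumptions for this example.

```python
import random

def train_val_test_split(items, test_fraction=0.2, val_fraction=0.2, seed=0):
    """Shuffle, set aside a test set, then carve validation out of the rest."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split

    n_test = round(len(shuffled) * test_fraction)
    test = shuffled[:n_test]
    remaining = shuffled[n_test:]

    # The validation share is taken from the remaining (non-test) data.
    n_val = round(len(remaining) * val_fraction)
    val = remaining[:n_val]
    train = remaining[n_val:]
    return train, val, test

images = [f"img_{i:03d}.jpg" for i in range(100)]  # hypothetical file names
train, val, test = train_val_test_split(images)
```

For 100 images this yields 64 training, 16 validation, and 20 test images, with no image appearing in more than one set.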


Truncated

When annotating, this describes whether the object being annotated is completely visible. If the object is fully visible (i.e., not truncated), this value is set to 0; otherwise it is set to 1. Typically, if 20% or more of the object is obscured, it should be marked as truncated.
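
The two annotation flags in this glossary, ‘truncated’ and ‘difficult’, could be set along the lines of the sketch below. The dictionary layout, the `label` and `box` fields, and the helper name are hypothetical; only the 0/1 values and the 20% guideline come from the definitions in this glossary.

```python
TRUNCATION_THRESHOLD = 0.20  # the 20% guideline from the definition

def make_annotation(label, box, obscured_fraction, hard_to_recognize):
    """Build one annotation record (hypothetical layout) with both flags."""
    return {
        "label": label,
        "box": box,  # assumed format: [x_min, y_min, x_max, y_max]
        "truncated": 1 if obscured_fraction >= TRUNCATION_THRESHOLD else 0,
        "difficult": 1 if hard_to_recognize else 0,
    }

fully_visible = make_annotation("dog", [10, 20, 110, 220], 0.0, False)
half_hidden = make_annotation("dog", [0, 40, 60, 200], 0.5, True)
```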


Underfitting

When the model performs poorly on the training data as well as on the validation and test data.
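
As a rough rule of thumb, comparing the loss on the training data with the loss on held-out data can hint at which problem a model has. The helper below is a hypothetical sketch: its name and both thresholds are arbitrary values chosen for illustration, and sensible values depend on the model and the loss being used.

```python
def diagnose(train_loss, val_loss, high_loss=1.0, gap=0.5):
    """Guess the fitting problem from two loss values (illustrative thresholds)."""
    if train_loss >= high_loss:
        # Poor even on the data it trained on.
        return "underfitting"
    if val_loss - train_loss >= gap:
        # Good on training data, much worse on unseen data.
        return "overfitting"
    return "ok"

print(diagnose(train_loss=1.8, val_loss=1.9))
print(diagnose(train_loss=0.1, val_loss=1.2))
print(diagnose(train_loss=0.2, val_loss=0.3))
```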


Weights

Weights are a way of quantifying how important a given input is to a neural network and how much it contributes to the output.
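
The idea can be sketched with a single artificial neuron that multiplies each input by its weight and adds up the results. The input values, the weights, and the omission of an activation function are simplifications assumed for this example.

```python
def neuron_output(inputs, weights, bias=0.0):
    """Weighted sum of the inputs plus an optional bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias

inputs = [1.0, 1.0, 1.0]
weights = [0.9, 0.1, 0.0]  # first input matters most, third not at all

# How much each input contributes to the output.
contributions = [x * w for x, w in zip(inputs, weights)]
output = neuron_output(inputs, weights)
```

All three inputs have the same value, yet the first one accounts for most of the output because its weight is the largest, and the third contributes nothing because its weight is zero.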