4.5 Data Augmentation

Neural network models generalize better when they are trained on more data. The more data the network sees during training, the better it learns to model the true distribution of the data. However, collecting and labeling more data is typically difficult and costly in many application domains. One way to increase the size of the training set is to generate artificial data. This is especially easy in some applications, such as object recognition.

For a sample image of a butterfly, we can obtain more versions of the butterfly by transforming the image in ways that preserve its semantics. Such transformations include flipping the image vertically or horizontally, rotating it by some angle, cropping part of the image, removing or changing the background, adding noise, removing color, or enhancing textures. The augmented images still represent the same class (butterfly), but we now have potentially an order of magnitude more training data to feed the neural network. By performing data augmentation, we teach the neural network to be invariant to different versions of the same object, hence improving the network’s generalizability.
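The transformations listed above can be sketched in a few lines of code. The following is a minimal illustration, not a production pipeline: it assumes a grayscale image stored as a list of rows of integer pixel values in [0, 255], and the function names (`flip_horizontal`, `random_crop`, `augment`, and so on), the noise level, and the crop size are all hypothetical choices made for this example. Plain Python lists are used so that the sketch runs without dependencies; in practice an array or image library would normally be used.

```python
import random

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]

def rotate_90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]

def random_crop(img, height, width, rng):
    """Cut out a randomly placed window of size height x width."""
    top = rng.randint(0, len(img) - height)
    left = rng.randint(0, len(img[0]) - width)
    return [row[left:left + width] for row in img[top:top + height]]

def add_noise(img, sigma, rng):
    """Add Gaussian noise and clamp each pixel to [0, 255]."""
    return [
        [min(255, max(0, round(p + rng.gauss(0, sigma)))) for p in row]
        for row in img
    ]

def augment(img, label, rng):
    """Return the original sample plus transformed copies, all with the same label."""
    h, w = len(img), len(img[0])
    variants = [
        img,
        flip_horizontal(img),
        flip_vertical(img),
        rotate_90(img),
        random_crop(img, h - 1, w - 1, rng),  # crop size is an arbitrary choice
        add_noise(img, 10.0, rng),            # sigma = 10 is an arbitrary choice
    ]
    return [(v, label) for v in variants]

# A tiny synthetic 4 x 4 "image" stands in for the butterfly photo.
rng = random.Random(0)
image = [[(16 * (r + c)) % 256 for c in range(4)] for r in range(4)]
samples = augment(image, "butterfly", rng)
```

Each call to `augment` turns one labeled sample into six, and every variant keeps the label of the original. Composing transformations (for example, flipping and then cropping) would multiply the number of variants further.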

We must take care not to accidentally change the class or semantics of a sample when augmenting it. For example, flipping the letter ‘b’ could turn it into the letter ‘d’, and rotating the number ‘6’ could turn it into the number ‘9’, changing their labels. Furthermore, one should take the contribution of data augmentation into account when comparing two deep learning methods where one uses augmentation and the other does not, since the performance improvement might come from the augmentation rather than from the design of the algorithm itself.
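The label-changing pitfall can be made concrete with a toy example. The sketch below assumes small hand-drawn character bitmaps (`LETTER_B`, `LETTER_D`) and a hypothetical per-class allow-list (`LABEL_SAFE`) naming the transformations considered safe for each label. None of these names come from a particular library. They only illustrate one possible way to guard against augmentations that alter the class.

```python
def flip_horizontal(glyph):
    """Mirror a bitmap (list of strings) left to right."""
    return [row[::-1] for row in glyph]

def rotate_180(glyph):
    """Rotate a bitmap by 180 degrees."""
    return [row[::-1] for row in glyph[::-1]]

# Toy 5 x 4 bitmaps; '#' is ink and '.' is background.
LETTER_B = [
    "#...",
    "#...",
    "###.",
    "#..#",
    "###.",
]
LETTER_D = [
    "...#",
    "...#",
    ".###",
    "#..#",
    ".###",
]

TRANSFORMS = {
    "flip_horizontal": flip_horizontal,
    "rotate_180": rotate_180,
}

# Hypothetical allow-list: which transformations keep the label valid.
LABEL_SAFE = {
    "butterfly": {"flip_horizontal", "rotate_180"},
    "b": set(),  # a horizontal flip would produce 'd'
    "6": set(),  # a 180-degree rotation would produce '9'
}

def safe_augment(sample, label):
    """Apply only the transformations that are allowed for this label."""
    allowed = LABEL_SAFE.get(label, set())
    return [(TRANSFORMS[name](sample), label) for name in sorted(allowed)]

# Flipping 'b' yields exactly the bitmap of 'd', so the label 'b' would be wrong.
flipped_b = flip_horizontal(LETTER_B)
label_changed = flipped_b == LETTER_D
```

With this allow-list, `safe_augment(LETTER_B, "b")` returns no augmented samples, whereas a sample labeled "butterfly" receives both transformations. Which transformations are safe depends on the dataset, so the allow-list has to be chosen per task.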

Figure 1: An example of different data augmentation techniques applied to a butterfly image, all of which preserve the semantics of the original image.