The proposed method is evaluated using two main scenarios for creating the training set: 1) without augmentation, and 2) with various augmentation settings. In our experiments, we use the Columbia Dogs Dataset, in which the images are pre-processed and cropped to faces. In total, 8111 images are selected and split into training and testing sets. The training set contains 6781 images, and the testing set consists of 10 images per breed, for a total of 1330 images. Each setting is evaluated using three models pre-trained on the ImageNet dataset: MobileNetV2, InceptionV3, and NASNet. We retrain the networks using the TensorFlow library.
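The split described above is internally consistent, which a few lines of arithmetic confirm (the constants below are the numbers reported in this section; the dataset has 133 breeds):

```python
# Sanity check of the reported train/test split for the Columbia Dogs Dataset.
NUM_BREEDS = 133
TEST_IMAGES_PER_BREED = 10
TOTAL_IMAGES = 8111

test_size = NUM_BREEDS * TEST_IMAGES_PER_BREED   # 10 held-out images per breed
train_size = TOTAL_IMAGES - test_size            # remainder used for training

print(test_size, train_size)  # 1330 6781
```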
Since our training set is small relative to the number of classes, we augment it to increase the number of images and improve performance. We apply data-warping augmentation to the training set and compare performance across several settings, including rotation, translation, and added noise. The transformation parameters are chosen based on the transformations that could plausibly occur in real images: for example, the rotation would not exceed 45 degrees, and the translation would not need to exceed half of the image, as shown in Fig. 5. We then randomly select 200 images per breed as our training set.
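The three augmentation operations can be sketched with plain NumPy. This is an illustrative sketch only, not the paper's implementation (which trains with TensorFlow); the function names, fill values, and noise level below are our own assumptions:

```python
import numpy as np

def rotate(img, deg):
    """Nearest-neighbour rotation about the image centre (angle in degrees)."""
    h, w = img.shape[:2]
    theta = np.deg2rad(deg)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, look up the source location.
    sx = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    sy = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    sx = np.clip(np.round(sx).astype(int), 0, w - 1)
    sy = np.clip(np.round(sy).astype(int), 0, h - 1)
    return img[sy, sx]

def translate(img, dy, dx):
    """Shift the image by (dy, dx) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(img)
    h, w = img.shape[:2]
    out[max(dy, 0):h + min(dy, 0), max(dx, 0):w + min(dx, 0)] = \
        img[max(-dy, 0):h + min(-dy, 0), max(-dx, 0):w + min(-dx, 0)]
    return out

def add_noise(img, sigma=10.0, rng=None):
    """Add Gaussian noise and clip back to the valid 8-bit range."""
    rng = np.random.default_rng(rng)
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(img.dtype)
```

In practice, constraining `deg` to at most 45 and `dy`/`dx` to at most half the image size mirrors the limits described above.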
As shown in Table 1, training sets containing rotated and translated images achieve higher dog breed classification performance than the baseline without augmentation. The results show that the NASNet model achieves the highest overall performance regardless of the training set used. We achieve an accuracy of 89.92% using the training set containing rotated images. Fig. 7 shows the confusion matrix of the model.
| Augmentation technique | MobileNetV2 (%) | InceptionV3 (%) | NASNet (%) |
|------------------------|-----------------|-----------------|------------|
| Without                | 80.82           | 87.50           | 89.10      |
| Rotation               | 81.65           | 88.42           | 89.92      |
| Translation            | 81.65           | 89.02           | 88.87      |
| Noise                  | 80.30           | 85.94           | 88.80      |
Table 1. Accuracy of dog breed classification from different CNN models
Figure 7. Confusion matrix from the highest-accuracy model (89.92%), using NASNet and the training set containing rotated images. The breed names, listed from bottom to top on the y-axis and from left to right on the x-axis, are in alphabetical order of the breeds' names in the Columbia Dogs Dataset.
In addition, we evaluate our best setting using 10-fold cross-validation, as reported in Table 2. We achieve an average classification rate of 89.74% with a standard deviation of 1.07. Figs. 8 and 9 show the average classification accuracy and standard deviation for each breed. Our results show that the models can recognize most breeds with an overall accuracy above 80%.
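For reference, a k-fold split of the kind used here can be written in a few lines of plain Python. In practice a library routine such as scikit-learn's `KFold` is typically used; this minimal version is our own illustration and does not shuffle the data:

```python
def k_fold_indices(n_samples, k=10):
    """Split sample indices into k contiguous, near-equal folds.

    Yields (train_indices, test_indices) pairs; each sample appears in
    exactly one test fold across the k iterations.
    """
    indices = list(range(n_samples))
    # The first (n_samples % k) folds absorb one extra sample each.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```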
| K fold | Accuracy (%) | SD    |
|--------|--------------|-------|
| 1      | 90.52        | 18.68 |
| 2      | 88.66        | 21.85 |
| 3      | 91.25        | 17.93 |
| 4      | 87.67        | 20.57 |
| 5      | 90.01        | 14.46 |
| 6      | 88.90        | 18.35 |
| 7      | 90.14        | 18.66 |
| 8      | 89.52        | 17.94 |
| 9      | 90.63        | 19.51 |
| 10     | 90.14        | 17.05 |
| Avg±SD | 89.74±1.07   |       |
Table 2. Accuracy of 10-fold cross-validation using the NASNet model on the training set with rotation images
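The Avg±SD row in Table 2 can be reproduced directly from the per-fold accuracies; the reported 1.07 corresponds to the sample standard deviation (n-1 denominator):

```python
import math

# Per-fold accuracies from Table 2.
fold_acc = [90.52, 88.66, 91.25, 87.67, 90.01,
            88.90, 90.14, 89.52, 90.63, 90.14]

mean = sum(fold_acc) / len(fold_acc)
# Sample standard deviation (n - 1 denominator) matches the reported value.
sd = math.sqrt(sum((a - mean) ** 2 for a in fold_acc) / (len(fold_acc) - 1))

print(f"{mean:.2f} +/- {sd:.2f}")  # 89.74 +/- 1.07
```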
Figure 8. Average accuracy of 10-fold cross-validation. The breed names, listed from bottom to top on the y-axis and from left to right on the x-axis, are in alphabetical order of the breeds' names in the Columbia Dogs Dataset.
In previous studies on image classification, NASNet achieved the highest accuracy on the ImageNet dataset. Our results in Table 1 follow the same order (NASNet, InceptionV3, then MobileNetV2) regardless of the training data, which confirms that the NASNet architecture is currently the best fit for our task. Although augmentation yields some improvements, we observe that adding noise reduces performance: noise degrades image quality and can confuse the model. In contrast, we find that rotating images improves classification performance, since testing images can be captured from various angles.
As shown in Table 3, we compare our proposed method with previous studies on the same dataset. Our method achieves the highest accuracy, using NASNet with the augmented training set, whereas the previous methods used traditional feature-selection techniques that struggle to discriminate between a large number of breeds.
| Model                      | Accuracy (%) |
|----------------------------|--------------|
| Liu et al.                 | 67.00        |
| BreedNet                   | 86.63        |
| NASNet with augmented data | 89.92        |
Table 3. Accuracy of dog breed classification compared with previous studies
During training, the CNN layers are updated by backpropagation driven by the optimizer and its loss function, and the network's features emerge over these iterations. To understand how the model distinguishes between dog breeds, we visualize a heatmap from the last feature-extraction layer, showing which parts of an image receive the highest weights for classification, as shown in Fig. 10. The heatmap illustrates that the discriminative areas used in classification are located in the center of the image, covering the alignment between the eyes and nose together with their patterns and textures, while the rest is disregarded. Consequently, there is some confusion between breeds that have similar appearances and similar alignments of these facial components, e.g., Lowchen/Havanese and Cardigan Welsh Corgi/Pembroke Welsh Corgi (Fig. 11). Since these breeds are similar or share a common origin, using only their faces might not be enough to distinguish them.
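A heatmap of this kind can be computed as a class-weighted sum of the final feature maps (a CAM-style visualization). The sketch below is our own simplified illustration, not the paper's exact procedure; the shapes and the ReLU-plus-max normalization are assumptions:

```python
import numpy as np

def class_activation_map(features, class_weights):
    """CAM-style heatmap: weighted sum of final-layer feature maps.

    features:      (H, W, C) activations from the last feature layer.
    class_weights: (C,) weights linking each channel to the predicted class.
    Returns an (H, W) map scaled to [0, 1]; higher = more discriminative.
    """
    cam = np.tensordot(features, class_weights, axes=([2], [0]))  # (H, W)
    cam = np.maximum(cam, 0)        # keep only positive class evidence
    if cam.max() > 0:
        cam = cam / cam.max()       # scale so the strongest spot is 1.0
    return cam
```

Upsampled to the input resolution and overlaid on the original image, such a map produces the yellow high-weight spots visible in Fig. 10.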
Figure 10. Original image (left) and its heatmap (right) generated from the final feature layer of the model. Yellow spots indicate higher weights.
This study has demonstrated that deep learning can identify a dog's breed from an image of its face. These results represent an initial step toward animal identification, but improving the classification remains a challenge. In future work, we will focus on cross-breed dogs by incorporating other features, such as the body's shape, color, and texture, into training the breed classification model.
Knowing Your Dog Breed: Identifying a Dog Breed with Deep Learning
- Received: 2020-06-04
- Accepted: 2020-09-30
- Published Online: 2020-11-13
- Computer vision
- deep learning
- dog breed classification
- transfer learning
- image augmentation
Abstract: Dog breed identification is essential for many reasons, particularly for understanding individual breeds' conditions, health concerns, interaction behavior, and natural instinct. This paper presents a solution for identifying dog breeds using images of their faces. The proposed method applies a deep learning based approach to recognize breeds. It begins with transfer learning, retraining existing pre-trained convolutional neural networks (CNNs) on a public dog breed dataset. Then, image augmentation with various settings is applied to the training dataset to improve classification performance. The proposed method is evaluated using three different CNNs with various augmentation settings and comprehensive experimental comparisons. The proposed model achieves a promising accuracy of 89.92% on the published dataset with 133 dog breeds.
Citation: Punyanuch Borwarnginn, Worapan Kusakunniran, Sarattha Karnjanapreechakorn, Kittikhun Thongkanchorn. Knowing Your Dog Breed: Identifying a Dog Breed with Deep Learning. International Journal of Automation and Computing. doi: 10.1007/s11633-020-1261-0