CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise

Kuang-Huei Lee1, Xiaodong He2*, Lei Zhang1, Linjun Yang3*
1 Microsoft AI and Research, 2 JD AI Research, 3 Facebook
(* Work performed while working at Microsoft)

[ Paper ] [ BibTex ] [ Dataset ] [ Code ] [ MSR Blog ]

This is the CleanNet project page. CleanNet, from Microsoft AI & Research, is a neural architecture for learning image classification in the presence of label noise, and for detecting label noise, using minimal human supervision.

The CleanNet paper will appear in CVPR 2018.

We also wrote an article on Microsoft Research Blog to introduce this work.

Abstract

In this paper, we study the problem of learning image classification models with label noise. Existing approaches that depend on human supervision are generally not scalable, as manually identifying correct or incorrect labels is time-consuming, whereas approaches that do not rely on human supervision are scalable but less effective. To reduce the amount of human supervision for label noise cleaning, we introduce CleanNet, a joint neural embedding network, which requires only a fraction of the classes to be manually verified to provide knowledge of label noise that can be transferred to other classes. We further integrate CleanNet and a conventional convolutional neural network classifier into one framework for image classification learning. We demonstrate the effectiveness of the proposed algorithm on both the label noise detection task and the task of image classification on noisy data, using several large-scale datasets. Experimental results show that CleanNet can reduce the label noise detection error rate on held-out classes, where no human supervision is available, by 41.5% compared to current weakly supervised methods. It also achieves 47% of the performance gain of verifying all images with only 3.2% of images verified on an image classification task.
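
As a rough illustration of how an embedding-based approach can flag label noise, the sketch below compares each image's embedding against an embedding for its labeled class and flags the image when the two are dissimilar. This is only a minimal sketch under stated assumptions: the use of cosine similarity, the threshold value of 0.5, and the names `cosine_similarity` and `flag_noisy_labels` are hypothetical choices made for illustration and are not taken from this page or from the released code. In a real system the embeddings would come from trained networks rather than hand-written vectors.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def flag_noisy_labels(query_embeddings, class_embedding, threshold=0.5):
    """Return one boolean per image: True if its label looks wrong.

    query_embeddings: list of image embedding vectors (hypothetical inputs).
    class_embedding:  embedding vector for the labeled class (hypothetical input).
    threshold:        assumed similarity cutoff; a real value would be tuned
                      on classes that have human-verified labels.
    """
    return [
        cosine_similarity(q, class_embedding) < threshold
        for q in query_embeddings
    ]


if __name__ == "__main__":
    class_vec = [1.0, 0.0]
    images = [[0.9, 0.1], [0.0, 1.0]]
    # The first image is close to the class embedding; the second is orthogonal to it.
    print(flag_noisy_labels(images, class_vec))
```

Because the decision depends only on a similarity score and a cutoff, a cutoff chosen on manually verified classes could in principle be reused on classes without verification, which is the kind of transfer the abstract describes.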

Food-101N Dataset


The Food-101N dataset, introduced in this paper, is designed for learning to address label noise with minimal human supervision.

[ Link to the dataset website ]

Code

[ TensorFlow code is available here ]

Citation

@inproceedings{lee2017cleannet,
  title={CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise},
  author={Lee, Kuang-Huei and He, Xiaodong and Zhang, Lei and Yang, Linjun},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2018}
}

Acknowledgments

We would like to thank Xi Chen, Bosco Chiu, Yandong Guo and Po-Sen Huang for their thoughtful feedback and discussions. Thanks also to Kelly Huang and Arun Sacheti for helping to develop the Food-101N dataset.