Food-101N: A Dataset for Learning to Address Label Noise

Images of "waffles" crawled from the web. Such crawling usually results in label noise. Images incorrectly labeled (label noise) are outlined in red.

A Dataset for Learning to Address Label Noise

The Food-101N dataset was introduced in the CVPR 2018 paper "CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise" from Microsoft. The dataset is designed for learning to address label noise with minimal human supervision.

Food-101N is an image dataset containing 310,009 images of food recipes classified into 101 classes (categories). Food-101N and the Food-101 dataset share the same 101 classes, but Food-101N has many more images and is noisier.

In this dataset, we define two types of labels for images:

Class labels: a class label describes the class of an image. Class labels are noisy, which means they could be incorrect. Each image in this dataset has a class label. The estimated noise rate is ~20%.

Verification labels: a verification label marks whether the class label is correct for an image. We manually assigned verification labels to 52,868 images (~523 images per class) for training and 4,741 images (~47 images per class) for validation.
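The two label types can be pictured as a record with a mandatory class label and an optional verification label. The Python sketch below is illustrative only: the `LabeledImage` type, its field names, and both helper functions are hypothetical and do not reflect the dataset's actual file format. Estimating a noise rate from the verified subset is likewise an assumption, not necessarily how the ~20% figure was obtained.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class LabeledImage:
    """Hypothetical record for one image; not the dataset's real schema."""
    path: str                        # illustrative image path
    class_label: str                 # noisy class label, present for every image
    verified: Optional[bool] = None  # verification label; None if never checked


def estimated_noise_rate(images: Iterable[LabeledImage]) -> float:
    """Fraction of manually verified images whose class label was marked wrong."""
    checked = [im for im in images if im.verified is not None]
    if not checked:
        raise ValueError("no verification labels available")
    wrong = sum(1 for im in checked if not im.verified)
    return wrong / len(checked)


def mean_per_class(num_images: int, num_classes: int = 101) -> float:
    """Average number of images per class."""
    return num_images / num_classes
```

With 101 classes, `mean_per_class(52868)` and `mean_per_class(4741)` round to the 523 and 47 images per class quoted for the training and validation verification labels.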

In our paper "CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise", we explored using transfer learning to address label noise. Specifically, in some experiments we kept verification labels for only a subset of the classes, so that the model learns from human supervision for those classes and transfers that knowledge to address label noise in the others. So that future research can follow the experiments in the paper, the dataset also includes the held-out class lists.
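The held-out setup described here amounts to hiding verification labels for a chosen set of classes while leaving every class label in place. A minimal sketch follows, assuming records are `(image_key, class_label, verification_label)` tuples; the tuple layout, the function name, and the example class names are illustrative assumptions, not the dataset's file format or the paper's code.

```python
from typing import Iterable, List, Optional, Tuple

# Hypothetical record layout: (image_key, class_label, verification_label)
Record = Tuple[str, str, Optional[bool]]


def hide_held_out_verification(
    records: Iterable[Record], held_out_classes: Iterable[str]
) -> List[Record]:
    """Drop verification labels for held-out classes; class labels are kept."""
    held_out = set(held_out_classes)
    return [
        (key, label, None if label in held_out else verified)
        for key, label, verified in records
    ]


records = [
    ("img_001", "waffles", True),
    ("img_002", "waffles", False),
    ("img_003", "sushi", True),
]
# Treat "sushi" as held out: its verification label is hidden.
supervised = hide_held_out_verification(records, ["sushi"])
```

A noise-handling model would then learn from the remaining verification labels and be evaluated on how well that knowledge transfers to the held-out classes.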

Tasks and Evaluation

Food-101N is designed for the following two tasks:

Learning image classification with label noise
Food-101N directly adopts the testing set of Food-101 to evaluate image classification.

Here we provide an image classification baseline using ResNet-50 without any human verification:

Dataset	Top-1 Accuracy
Food-101N	81.44%
Food-101	81.67%

The above results also show that a classifier trained on Food-101N performs comparably to one trained on Food-101. Please refer to the paper for more results.
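For reference, top-1 accuracy is the fraction of test images whose highest-scoring predicted class matches the ground-truth class. The sketch below is a generic implementation over integer class indices; the function name and input format are assumptions and not the evaluation code used for the baseline.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[int], target: Sequence[int]) -> float:
    """Fraction of samples whose predicted class index equals the target index."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same length")
    if len(target) == 0:
        raise ValueError("empty evaluation set")
    correct = sum(1 for p, t in zip(predicted, target) if p == t)
    return correct / len(target)
```

For example, `top1_accuracy([0, 1, 2, 2], [0, 1, 1, 2])` counts three matches out of four samples.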

Label noise detection
Food-101N provides 4,741 images with verification labels for validation.
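This page does not state which metric to report for label noise detection. One common choice, assumed here, is precision, recall, and F1 for the "noisy" class, computed against the verification labels. The function name and input format below are hypothetical.

```python
from typing import Sequence, Tuple


def noise_detection_prf(
    predicted_noisy: Sequence[bool], verified_correct: Sequence[bool]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for detecting incorrectly labeled images.

    predicted_noisy[i] is True when the detector flags image i as mislabeled.
    verified_correct[i] is the verification label (True = class label correct).
    """
    if len(predicted_noisy) != len(verified_correct):
        raise ValueError("inputs must have the same length")
    # An image is truly noisy when its verification label marks it incorrect.
    truly_noisy = [not v for v in verified_correct]
    pairs = list(zip(predicted_noisy, truly_noisy))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Returning 0.0 when a denominator is zero is a convention chosen for this sketch; other conventions exist.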

Citations

If you use the dataset in your paper, then please cite our paper "CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise".

@inproceedings{lee2017cleannet,
  title={CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise},
  author={Lee, Kuang-Huei and He, Xiaodong and Zhang, Lei and Yang, Linjun},
  booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition ({CVPR})},
  year={2018}
}

Download the Dataset

Please review the following terms before downloading

The Food-101N dataset is intended for non-commercial research purposes only to promote advancement in the field of artificial intelligence and related areas, and is made available free of charge without extending any license or other intellectual property rights. The dataset is provided “as is” without warranty, and usage of the data has risks since we may not own the underlying rights in the images. We will not be liable for any damages related to use of the dataset. Feedback is voluntarily given and can be used as we see fit. Upon violation of any of these terms, your rights to use the dataset will end automatically.

Please contact Kuang-Huei Lee at kuanghul@alumni.cmu.edu if you own any of the documents made available but do not want them in this dataset. We will remove the data accordingly. If you have questions about use of the dataset or any research outputs in your product or services, we encourage you to undertake your own independent legal review. For other questions, please feel free to contact us.

I agree to these terms. Proceed to download.

Food-101N Dataset

Kuang-Huei Lee Xiaodong He Lei Zhang Linjun Yang