Indian food is one of the most popular cuisines in the world and comprises uniquely flavoured preparations made with traditional recipes and spices. Across India's 29 states and billion-plus population, thousands of years of culinary heritage have produced a large number of sub-cuisines and a vast array of unique dishes (Springer, 2020).
To aid Indian food enthusiasts, such as myself, in their quest for wellness and fitness, I have conceptualised CurryAI — a computer vision aided Indian food nutrition calculator. This calculator would be able to estimate the nutritional content of an Indian dish by means of analysing an image of the dish.
The artificial intelligence within the calculator logic could be embedded as part of a mobile app. This would easily enable users to get a dish’s nutritional content after snapping a picture from their phone.
Figure 1: Deciphering nutritional content from mobile picture
In researching the literature for a similar capability, I found prior research from Google titled “Im2Calories” (Google Research, 2015). That paper approaches the problem of estimating calories from images, not complete nutritional information. It also uses information from the menus of popular US restaurants, reducing the problem to classifying a dish within the context of a known restaurant's menu; the scope is thus constrained to a smaller set of known dishes.
My research focuses on the problem of generic meal detection as that is more widely applicable.
I could not find any dataset, paper or commercially available apps that focused on identifying Indian food. I have attempted to address this gap by means of my research.
To facilitate this, I proposed to provide a nutritional breakdown of a dish by using information from the FoodData Central US government website (USDA FoodData Central).
The FoodData Central database integrates five distinct data types. The USDA has been the authoritative source of food composition data for more than a century and remains the primary source of publicly available data on the nutrients and other components found in foods.
Figure 2: USDA Nutritional Database
(ILSI Newsletter June 2019, 2019)
Each Indian food dish contains a varying set and quantity of ingredients. By breaking a dish down into its ingredients, I created a mapping from Indian food dishes to their nutritional content, covering ingredients, calories, and macronutrients (protein, carbohydrate, fats, fibre, etc.), using the FoodData Central database.
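To illustrate this mapping, here is a minimal sketch of a per-ingredient nutrient lookup. The ingredient database and nutrient values below are hypothetical placeholders, not figures from FoodData Central.

```python
# Minimal sketch of the dish-to-nutrition mapping. FOOD_DB is a
# hypothetical stand-in; real values would come from the USDA
# FoodData Central database.
FOOD_DB = {  # nutrients per 100 g of each ingredient (illustrative)
    "rice": {"calories": 130.0, "protein_g": 2.7},
    "urad dal": {"calories": 105.0, "protein_g": 7.5},
}

def total_nutrients(ingredient_grams, food_db=FOOD_DB):
    """Sum nutrient values over a dish's ingredients (amounts in grams)."""
    totals = {}
    for name, grams in ingredient_grams.items():
        for nutrient, per_100g in food_db[name].items():
            totals[nutrient] = totals.get(nutrient, 0.0) + per_100g * grams / 100
    return totals
```

For example, `total_nutrients({"rice": 200, "urad dal": 100})` sums calories and protein across both ingredients, scaled by their quantities.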
CurryAI focuses on the important steps in providing complete nutritional information for Indian dishes. Specifically, this paper focuses on the first step, i.e. recognising an Indian food dish from its image.
There are many other pieces to this problem, such as plate segmentation and portion size detection, which can be addressed through future work. This work can also be extended beyond Indian food to global food recognition.
Figure 3: CurryAI – Steps within scope
Indian Food Dataset collection
Computer vision has become quite advanced with the arrival of artificial neural networks, and in particular deep neural networks, which stack several layers of computation and are therefore capable of learning the complex patterns needed for tasks such as image identification.
Convolutional Neural Networks (CNNs) (Cornelisse, 2018) function in a way loosely inspired by the human brain and are well suited to tasks such as image recognition. They identify shapes and features and learn to associate them with specific image categories or types.
Similar to most neural networks, CNNs also require a large amount of data to be trained well for a particular task. A rule of thumb is to have 1000 images per class (Mitsa, 2019). In this case, the algorithm needs to be able to recognise a wide variety of foods. Hence it needs to understand a large number of categories or classes, which is called a large-scale multi-class classification problem.
For detecting approximately 100 classes, the CNN would need to be trained on around 100,000 images and would take many days to train. Fortunately, there are ways to reduce this learning time. One such way is transfer learning, which starts from a pre-trained network. The pre-trained network can already identify generic image features and can then be trained further to identify specific ones. This is described in detail in the Transfer Learning section.
Figure 4: Convolutional Neural Network Design
(Convolutional Neural Network: A Step By Step Guide, 2019)
Thus, my approach is to start with a multi-class classifier using a pre-trained deep learning CNN network for computer vision that can already detect a wide variety of images. I will then train this further with a large variety of Indian dish images to become good at recognising Indian meals.
As I could not find any dataset for Indian foods, my first task was to create a representative dataset containing most of the common Indian food images and their names.
There are many Indian meal images publicly available and searchable via Google. I decided to use that to curate my Indian Food dataset.
To get this started, I needed a reliable and comprehensive list of Indian food names. I was able to find a comprehensive list of 301 Indian Foods on Wikipedia, referred to as IndianFoodList301 (Wikipedia).
I found that the Wikipedia list had several overlapping food names, e.g. ‘Dal’ and ‘Dal fry with tadka’. This would confuse the algorithm, as nearly identical images would be present under two different classes, and would make the problem unnecessarily complex.
To address this issue, I manually created superset classes such as ‘Dal’, combining similar classes and ensuring that all the classes were relatively distinct. I call this list of superset classes — IndianFoodList85.
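As a sketch, the merge of overlapping entries into superset classes can be expressed as a simple lookup. The mapping below contains only the example pair from the text and is far smaller than the real, hand-curated IndianFoodList85.

```python
# Hypothetical excerpt of the superset mapping used to merge
# overlapping Wikipedia entries into distinct classes.
SUPERSET_CLASSES = {
    "Dal": ["Dal", "Dal fry with tadka"],
}

def to_superset(dish_name, mapping=SUPERSET_CLASSES):
    """Return the superset class for a raw dish name, else the name itself."""
    for superset, members in mapping.items():
        if dish_name.lower() in (m.lower() for m in members):
            return superset
    return dish_name
```

Applying `to_superset` to every name in the raw list collapses near-duplicate entries into one class while leaving distinct dishes untouched.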
Using this list, I wrote a web crawler to download 100 images of each dish from the internet, creating an Indian food dataset. As the image-recognition algorithm works better when images are of similar size and shape, I customised the crawler to download medium-sized images with a square aspect ratio. I then manually cleaned the dataset to remove blank images as well as any irrelevant images that were downloaded.
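The size and aspect-ratio filter applied during crawling might look like the following. The numeric thresholds are illustrative assumptions, as the actual crawler settings are not given in the text.

```python
def keep_image(width, height, min_side=300, max_side=800, tolerance=0.1):
    """Decide whether a crawled image is medium-sized and roughly square.

    min_side/max_side bound the pixel dimensions; tolerance is the
    maximum allowed relative difference between width and height.
    All thresholds are illustrative, not the crawler's real settings.
    """
    if min(width, height) < min_side or max(width, height) > max_side:
        return False  # too small or too large
    return abs(width - height) / max(width, height) <= tolerance
```

A crawler would call this on each candidate's dimensions and skip downloads that fail the check.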
I call this final curated dataset — IndianFood85.
While it is possible to arbitrarily increase the number of dishes and the number of images per dish, IndianFood85 is a good starting point to cover a sufficiently large number of Indian cuisine dishes across states to test the applicability of the algorithm in a real-life setting.
As image classification into a large number of classes is complicated and time-consuming, I also created a smaller subset of this dataset, called IndianFood31, to use as the first training set for my multi-class classification. Every experiment run to fine-tune the network consumes a large amount of computing power and time, so it is more efficient to develop the algorithm on a smaller number of classes in the initial phase.
IndianFood31 contains a subset of dishes mostly from the northern states. As the dataset grows in both the number of dishes and the number of images per dish, the performance of CurryAI should improve considerably. I plan to continue enriching the dataset over time.
Meal Detection & Classification
Meal detection and classification is the initial step of the CurryAI algorithm. To solve the problem of meal detection, I needed to train a deep learning algorithm to understand an image containing a single dish and classify it into one of the IndianFood31 classes.
a) Transfer Learning
The initial step in the understanding of an image is the extraction of main features such as edges. This understanding is common across all images, and hence it is useful to start with a general pre-trained network.
A pre-trained network has been previously trained on a large dataset. You can then customise it for a given task. In this way, you can benefit from the knowledge already built into the network by others without starting from the beginning.
The intuition behind transfer learning for image classification is that if a model is trained on a large and general enough dataset, it will effectively serve as a generic model of the visual world. You can then take advantage of these learned feature maps without having to train a large model on a large dataset from scratch. (TensorFlow, 2021)
In this case, I started with the MobileNetV2 pre-trained deep neural network, which has been trained for image classification on over a million images spanning 1,000 categories (Sandler, Howard, Zhu, Zhmoginov, & Chen, 2018).
I then proceeded to further train it for IndianFood31 detection.
b) Binary classification
I first attempted a simple binary classification, training for a single class, idli, alongside an equal number of non-idli images. To enable transfer learning, I set all layers of MobileNetV2 except the last to trainable = False. This means training does not disturb the weights learned in the earlier layers, retaining their prior knowledge of generic image features; only the additional layers added at the end are learned from the custom dataset.
Using an architecture of two dense layers of 64 neurons each (ref: glossary), followed by a final softmax-activated (ref: glossary) layer of 2 neurons, I trained a binary classifier that almost perfectly segregated the images into idli and non-idli classes.
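Based on this description, the transfer-learning setup can be sketched in Keras as below. This is a plausible reconstruction, not the exact code used: the pooling step, activation choices for the hidden layers, and input size are assumptions.

```python
import tensorflow as tf

def build_classifier(num_classes, weights="imagenet"):
    """Frozen MobileNetV2 base plus the small trainable head described
    in the text (two 64-neuron dense layers and a softmax output).
    Pooling and hidden activations are assumptions, not confirmed details."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=(224, 224, 3), include_top=False, weights=weights)
    base.trainable = False  # keep the pre-trained feature extractor fixed
    return tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```

Passing `num_classes=2` gives the binary idli classifier; `num_classes=31` gives the IndianFood31 model described in the next section.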
c) Multi-class classification on IndianFood31:
I then expanded my model to differentiate between multiple classes. To do this, I had to put the images into separate sub-directories, where the name of the sub-directory was the name of the dish. Then I had to change my algorithm to automatically understand the class name from the sub-directory name.
To do this, I used ImageDataGenerator from the TensorFlow Keras library. ImageDataGenerator allows you to specify a directory name and then infers the various classes and associated image subsets from the sub-directories under it. You can also configure it to augment the data by flipping, rotating, etc., since the more training data you have, the better the model. In this case, I decided not to augment the data initially, as flipping and rotation distorted and reduced the quality of the original images.
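The directory-based loading described above might be set up as follows. The rescaling factor, batch size, and 25% validation split are illustrative choices, not confirmed settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def make_generators(data_dir, image_size=(224, 224), batch_size=32):
    """Build training and validation generators from a directory whose
    sub-directories are named after the dish classes. Rescaling, batch
    size, and the validation split here are illustrative assumptions."""
    gen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.25)
    train = gen.flow_from_directory(
        data_dir, target_size=image_size, batch_size=batch_size,
        subset="training")
    val = gen.flow_from_directory(
        data_dir, target_size=image_size, batch_size=batch_size,
        subset="validation")
    return train, val
```

The class names are inferred automatically from the sub-directory names, so adding a new dish only requires adding a new folder of images.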
I separated out 25% of the images for a test set and kept it separate from the training set, so I could verify the performance of my algorithm on this unseen data.
From the remaining 75% of the data, I again kept aside 25% as validation data to guide the fine-tuning of the model. As each iteration (epoch) runs and the network learns, you compute the training accuracy, which indicates how accurately the algorithm identifies each class in the training data.
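In terms of counts, the two-stage split works out as follows; the rounding behaviour here is an assumption for illustration.

```python
def split_counts(n_images, test_frac=0.25, val_frac=0.25):
    """Sizes for the two-stage split in the text: hold out 25% as a
    test set, then take 25% of the remainder as validation data.
    Returns (n_train, n_val, n_test); rounding is an assumption."""
    n_test = round(n_images * test_frac)
    remaining = n_images - n_test
    n_val = round(remaining * val_frac)
    return remaining - n_val, n_val, n_test
```

For 100 images per class this gives roughly 56 training, 19 validation, and 25 test images.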
While learning, a neural network sometimes also picks up the dataset's ‘noise’, i.e. irrelevant features. In that case it shows high accuracy on the training data, but because it has also memorised noise, it performs poorly when run on different data. This situation is called ‘overfitting’. For example, if the training images only contain cakes with sprinkles, the model may assume that all cakes have sprinkles; shown a different type of cake, it will be unable to identify it correctly.
To avoid this problem, we keep some data aside as validation data during the training phase. We constantly check using the validation data if the model is overfitting and try to prevent this.
Overall this results in our trained model becoming more robust.
So far, I have been able to get a training accuracy of 0.88 and a validation accuracy of 0.59 for the IndianFood31 dataset. I was also able to get a Mean Average Precision of 0.56 and a Mean Average Recall of 0.48 on the test set.
There is scope to further improve my network performance by running more experiments by adjusting the different layers and different parameters, and I shall continue to work on it.
Nutrition Database Mapping
Once an Indian food dish has been recognised from its image, it needs to be broken down into its ingredients and the nutritional components of the dish calculated. This can be done via food mapping against the FoodData Central database, and will be addressed in a future paper.
Mobile application development
A mobile application will be created to embed the algorithm and will allow the image taken with the phone camera to be used as input to the algorithm. This will also be developed and described in future work.
| Sr. No. | Indian Dish Name    |
|---------|---------------------|
| 5       | Bhindi Sabzi Dry    |
| 23      | Gobhi Sabzi Dry     |
| 48      | Other non-veg curry |
| 50      | Other atta roti     |
Google Research. (2015). Im2Calories: towards an automated mobile vision food diary. Retrieved from Google Research Publications: https://research.google/pubs/pub44321/
Springer, K. (2020, September). Indian food: The best dishes in each region. Retrieved from CNN Travel: https://edition.cnn.com/travel/article/indian-food-dishes/index.html
USDA FoodData Central. (n.d.). FoodData Central. Retrieved from USDA – US Department of Agriculture: https://fdc.nal.usda.gov/
Wikipedia. (n.d.). List of Indian dishes. Retrieved from Wikipedia: https://en.wikipedia.org/wiki/List_of_Indian_dishes
Cornelisse, D. (2018, April 24). An intuitive guide to Convolutional Neural Networks. Retrieved from freeCodeCamp: https://www.freecodecamp.org/news/an-intuitive-guide-to-convolutional-neural-networks-260c2de0a050/
Mitsa, T. (2019, April 23). How Do You Know You Have Enough Training Data? Retrieved from Towards Data Science: https://towardsdatascience.com/how-do-you-know-you-have-enough-training-data-ad9b1fd679ee
TensorFlow. (2021, April). Transfer learning and fine-tuning. Retrieved from TensorFlow: https://www.tensorflow.org/tutorials/images/transfer_learning
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., & Chen, L.-C. (2018, January 13). MobileNetV2: Inverted Residuals and Linear Bottlenecks. Retrieved from arXiv.org: https://arxiv.org/abs/1801.04381v4
ILSI Newsletter June 2019. (2019, June). Retrieved from ILSI: https://ilsi.org/ilsi-newsletter-june-2019
Convolutional Neural Network: A Step By Step Guide. (2019, March 17). Retrieved from Towards Data Science: https://towardsdatascience.com/convolutional-neural-network-a-step-by-step-guide-a8b4c88d6943
Gudikandula, P. (2019, March 22). Deep view on transfer learning with image classification PyTorch. Retrieved from Purnasai Gudikandula GitHub: https://purnasai.github.io/Deep-view-on-Transfer-learning-with-Iamge-classification-Pytorch/
AsianDelight. (2021). AsianDelight. Retrieved from Shutterstock: https://www.shutterstock.com/image-photo/calories-counting-food-control-concept-woman-1488795380
Shubh Samtani is a 10th-grade student at The International School Bangalore (TISB). He has won the MARRS International Spelling Bee Contest, which had participation from over 250,000 children. Shubh is among the top programmers in his age group on HackerRank, which has over 11 million active programmers. He is one of the youngest globally on the platform to achieve a ranking of <10,000 at the age of 14 years. Shubh volunteers as a teacher at 0Gravity, a global movement to create awareness, structured training programs and communities for computer education for kids 10-15 years old. He recently got accepted to the prestigious LaunchX summer program on entrepreneurship for Summer 2021. He has conducted groundbreaking research on the impact of the Right To Education for 400 million children in India during the COVID-19 crisis and outlined recommendations for changes to education techniques, policies and infrastructure. He has also played for the school soccer team for many years.