
Ankit Das

A data analyst with expertise in statistical analysis and data visualization, ready to serve the industry using various analytical platforms. I am keen to deepen my knowledge of machine learning and data science. Outside work, you can find me as a fun-loving person with hobbies such as sports and music.

Hands-On Guide To Web Scraping Using Python and Scrapy

Web scraping is a procedure for extracting information from websites. It is done with the help of software known as web scrapers, which automatically load pages and extract data based on user requirements. Scrapy is an open-source web crawling framework written in Python. Originally designed for web scraping, it can also be used to extract data via APIs or as a general-purpose web crawler.
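The core idea of scraping, which Scrapy spiders wrap in a full crawling framework, can be sketched with only the standard library: parse an HTML document and pull out the pieces you care about. The HTML string below is a made-up example for illustration.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collects the href attribute of every anchor tag it encounters."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

html = '<html><body><a href="/page1">One</a> <a href="/page2">Two</a></body></html>'
parser = LinkExtractor()
parser.feed(html)
print(parser.links)  # ['/page1', '/page2']
```

In Scrapy, the same extraction would live inside a spider's `parse` method, with the framework handling the downloading, scheduling and following of links.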


Most Popular Datasets for Question Classification

Question classification plays a significant part in question answering systems; one of the most important steps in improving the classification process is identifying question types. The main aim of question classification is to predict the entity type of the answer to a natural language question. Question classification is typically done using machine learning techniques.
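As a minimal sketch of the machine learning approach the paragraph describes, a text classifier can be trained to map questions to answer types. The tiny labelled corpus below is invented for illustration; real work would use a benchmark dataset such as TREC.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training set: each question labelled with its expected answer type.
questions = [
    "Who wrote Hamlet?", "Who discovered penicillin?",
    "Where is the Eiffel Tower?", "Where was Mozart born?",
    "When did World War II end?", "When was the telephone invented?",
]
labels = ["HUMAN", "HUMAN", "LOCATION", "LOCATION", "DATE", "DATE"]

# TF-IDF features fed into a linear classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(questions, labels)
print(clf.predict(["Who painted the Mona Lisa?"]))
```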


Deep Dive in Datasets for Machine translation in NLP Using TensorFlow and PyTorch

With the advancement of machine translation, there has been a recent move towards large-scale empirical techniques that have led to very significant improvements in translation quality. Machine translation is the task of automatically converting text in one natural language into another while preserving the meaning of the input text.


Datasets for Language Modelling in NLP using TensorFlow and PyTorch

In recent times, language modelling has gained momentum in the field of natural language processing, so it is essential to think of new models and strategies for faster and better training of language models. Nonetheless, because of the complexity of language, we have to deal with some problems in the dataset: as the size of the dataset increases, the average number of times a word appears in that dataset increases as well.

Guide to IMDb Movie Dataset With Python Implementation

The Internet Movie Database (IMDb) is an online database dedicated to all kinds of information about a wide range of film content, for example movies, TV shows and web streaming series. The IMDb reviews dataset contains 50,000 reviews, with at most 30 reviews per film.


Moment in Time: The Biggest Short Video Dataset For Data Scientists

Moment in Time is one of the biggest human-annotated video datasets, capturing visual and audible short events produced by people, animals, objects and nature. It was developed in 2018 by the researchers Mathew Monfort, Alex Andonian, Bolei Zhou and Kandan Ramakrishnan. The dataset comprises more than one million 3-second videos corresponding to 339 distinct action verbs.


Have You Heard About the Video Dataset of Day-to-Day Human Activities?

ActivityNet is an enormous dataset covering activities that are most relevant to how people spend their time in everyday life. It was developed in 2015 by the researchers Fabian Caba Heilbron, Victor Escorcia, Bernard Ghanem and Juan Carlos Niebles. ActivityNet provides samples from 203 activity classes, with an average of 137 untrimmed videos per class and 1.41 activity instances per video, for a total of 849 video hours.


How To Use UCF101, The Largest Dataset Of Human Actions

The UCF-101 dataset has 101 action classes and 13,320 clips of human actions collected from YouTube. It was first introduced in 2012 by the researchers Khurram Soomro, Amir Roshan Zamir and Mubarak Shah of the Center for Research in Computer Vision, Orlando, FL 32816, USA. The clips in each action class are divided into 25 groups, each containing 4-7 clips. Clips in a group share some common features, such as background or actor.


Loss Functions in Deep Learning: An Overview

Neural networks use optimisation strategies such as stochastic gradient descent to minimise the error of the algorithm. The way we actually compute this error is with a loss function, which quantifies how well or badly the model is performing. Loss functions are divided into two categories: regression loss and classification loss.
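The two categories can be illustrated with one representative loss from each: mean squared error for regression and binary cross-entropy for classification. This is a minimal NumPy sketch with made-up example values.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: a typical regression loss."""
    return np.mean((y_true - y_pred) ** 2)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Binary cross-entropy: a typical classification loss.
    Probabilities are clipped to avoid log(0)."""
    p = np.clip(p_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

# Regression example: errors of 0.5, 0.5 and 0.
y_true = np.array([3.0, -0.5, 2.0])
y_pred = np.array([2.5, 0.0, 2.0])
print(mse(y_true, y_pred))  # 0.1666...

# Classification example: true labels and predicted probabilities.
labels = np.array([1, 0, 1])
probs = np.array([0.9, 0.1, 0.8])
print(binary_cross_entropy(labels, probs))
```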


Gaussian Mixture Model Clustering Vs K-Means: Which One To Choose

In recent times, there has been a lot of emphasis on unsupervised learning. Tasks like customer segmentation and pattern recognition are widespread examples of what, in simple terms, we can refer to as clustering. We used to solve such problems with basic algorithms like K-means or hierarchical clustering. With the introduction of Gaussian mixture model clustering, grouping data points has become simpler, as it can handle even oblong clusters. It works on the same principle as K-means but has some advantages over it.
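The two approaches share the same scikit-learn workflow, which makes the comparison easy to sketch. The elongated synthetic clusters below are an assumption chosen to show the kind of data where a Gaussian mixture's per-component covariances can help; this is an illustration, not a benchmark.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Two elongated (oblong) clusters, stretched along the x-axis.
a = rng.normal([0, 0], [4.0, 0.5], size=(200, 2))
b = rng.normal([0, 5], [4.0, 0.5], size=(200, 2))
X = np.vstack([a, b])

# Same fit/predict pattern for both models.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
gm = GaussianMixture(n_components=2, random_state=0).fit(X)

print(km.labels_[:5])     # K-means cluster assignments
print(gm.predict(X)[:5])  # GMM cluster assignments
```

K-means assumes roughly spherical clusters of similar size, while the Gaussian mixture fits a full covariance matrix per component, which is why it copes better with stretched clusters.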


Complete Guide on Language Modelling: Unigram Using Python

Language modelling is the art of determining the probability of a sequence of words. Language models are useful in many different natural language processing applications such as machine translation, speech recognition, optical character recognition and more. In recent times language models have relied on neural networks, which accurately predict a word in a sentence based on the surrounding words. However, in this project we will discuss the most classic of language models: the n-gram model.
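A unigram model, the simplest n-gram model, can be built in a few lines: count word frequencies and treat each word as independent, so a sequence's probability is the product of its word probabilities. The tiny corpus below is invented for illustration.

```python
from collections import Counter

corpus = "the cat sat on the mat the cat ran".split()
counts = Counter(corpus)
total = sum(counts.values())  # 9 tokens

def unigram_prob(word):
    """P(word) = count(word) / total tokens in the corpus."""
    return counts[word] / total

def sentence_prob(sentence):
    """Unigram models assume words are independent, so the sequence
    probability is the product of individual word probabilities."""
    p = 1.0
    for w in sentence.split():
        p *= unigram_prob(w)
    return p

print(unigram_prob("the"))       # 3/9
print(sentence_prob("the cat"))  # (3/9) * (2/9)
```

A real model would add smoothing so that unseen words do not receive zero probability.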


Complete Guide to Implement Knowledge Graph Using Python

Information extraction is the process of extracting information in a more structured way, i.e., in a form that is machine-understandable. It consists of subproblems that cannot be solved easily. One approach to storing data in a structured manner is the knowledge graph, a set of three-item sets called triples, where each triple combines a subject, a predicate and an object.
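The triple structure described above can be sketched directly in Python: a knowledge graph is just a collection of (subject, predicate, object) tuples that can be queried by pattern. The facts below are a small hand-written example.

```python
# A knowledge graph as a set of (subject, predicate, object) triples.
triples = {
    ("Paris", "capital_of", "France"),
    ("France", "located_in", "Europe"),
    ("Eiffel Tower", "located_in", "Paris"),
}

def query(subject=None, predicate=None, obj=None):
    """Return every triple matching the given (possibly partial) pattern;
    None acts as a wildcard for that position."""
    return sorted(
        (s, p, o) for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    )

print(query(predicate="located_in"))
print(query(subject="Paris"))  # [('Paris', 'capital_of', 'France')]
```

In practice, the triples themselves would come from an information extraction pipeline (entity and relation extraction over raw text) rather than being written by hand.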


Principal Component Analysis On Matrix Using Python

Machine learning algorithms may take a lot of time when working with large datasets. To overcome this, dimensionality reduction techniques were introduced. If the input dimensionality is high, Principal Component Analysis can be used to speed up our machine learning pipelines.
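PCA boils down to projecting centred data onto the top eigenvectors of its covariance matrix. A minimal NumPy sketch, using a small made-up matrix:

```python
import numpy as np

def pca(X, n_components):
    """PCA via eigendecomposition of the covariance matrix:
    centre the data, then project onto the top eigenvectors."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    # eigh returns eigenvalues in ascending order; take the largest.
    order = np.argsort(eigvals)[::-1][:n_components]
    return X_centered @ eigvecs[:, order]

X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])
reduced = pca(X, n_components=1)
print(reduced.shape)  # (6, 1): 2-D points reduced to 1-D
```

In practice one would use `sklearn.decomposition.PCA`, which additionally handles scaling, SVD-based computation and explained-variance reporting.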


How To Create A Vocabulary Builder For NLP Tasks?

A vocabulary helps in the pre-processing of corpus text, acting both as a classification and as a storage location for the processed corpus text. Once a text has been processed, any relevant metadata can be collected and stored. In this article, we will discuss the implementation of a vocabulary builder in Python for storing processed text data that can be used later for NLP tasks.
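A minimal vocabulary builder only needs two pieces of state: a token-to-id mapping and token counts. The class below is a sketch of that idea (the class and method names are my own, not from the article), reserving id 0 for unknown words.

```python
from collections import Counter

class Vocabulary:
    """Maps tokens to integer ids and keeps a count of each token."""
    def __init__(self):
        self.token_to_id = {"<unk>": 0}  # id 0 reserved for unknown words
        self.counts = Counter()

    def add_text(self, text):
        """Register every token in the text, assigning new ids as needed."""
        for token in text.lower().split():
            self.counts[token] += 1
            if token not in self.token_to_id:
                self.token_to_id[token] = len(self.token_to_id)

    def encode(self, text):
        """Convert text to a list of ids; unseen tokens map to <unk>."""
        return [self.token_to_id.get(t, 0) for t in text.lower().split()]

vocab = Vocabulary()
vocab.add_text("the quick brown fox")
vocab.add_text("the lazy dog")
print(vocab.encode("the fox jumps"))  # [1, 4, 0]
```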


Optimization In Data Science Using Multiprocessing and Multithreading

In the real world, datasets are huge, which poses a challenge for every data science programmer. Working with them takes a lot of time, so we need techniques that can increase an algorithm's speed. Most of us are familiar with parallelisation, which allows work to be distributed across all available CPU cores. Python offers two built-in libraries for this: multiprocessing and multithreading.
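A thread-pool sketch of the idea, using the standard library's `concurrent.futures`. The `sleep` call stands in for I/O-bound work such as downloading data; for CPU-bound work you would swap `ThreadPoolExecutor` for `ProcessPoolExecutor` to sidestep the GIL, with the same `map` API.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fetch(task_id):
    """Stand-in for I/O-bound work such as a network call."""
    time.sleep(0.1)
    return task_id * task_id

start = time.perf_counter()
# Four workers run the eight 0.1s tasks concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fetch, range(8)))
elapsed = time.perf_counter() - start

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49]
print(elapsed)  # roughly 0.2s rather than the 0.8s a serial loop takes
```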


Complete Guide To Model Deployment Using Flask in Google Cloud Platform

In the real world, training and prediction form only one phase of the machine learning life cycle. A model is of little use to anyone other than its developer, as no one else will be able to run it. So we need to create a front-end tool that users can access from their own machines. The easiest way to do this is by deploying the model using Flask.

In this article, we will discuss how to use Flask to develop our web application, and then deploy the model on the Google Cloud Platform environment.
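The Flask part can be sketched in a few lines: wrap a prediction function in an HTTP endpoint that accepts JSON. Here a trivial rule stands in for a real trained model (which would normally be unpickled at startup); the route name and payload shape are assumptions for illustration.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(features):
    """Placeholder for a real model's predict(); here a trivial rule."""
    return "positive" if sum(features) > 0 else "negative"

@app.route("/predict", methods=["POST"])
def predict_route():
    # Expects a JSON body like {"features": [1.0, 2.0, -1.0]}.
    data = request.get_json()
    return jsonify({"prediction": predict(data["features"])})

# To serve locally: app.run(host="0.0.0.0", port=8080)
```

On Google Cloud, the same app would typically be served by a production WSGI server such as gunicorn rather than Flask's built-in development server.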


Hands-On Guide To Detecting SMS Spam Using Natural Language Processing

In this era, the Short Message Service (SMS) is considered one of the most powerful means of communication. As dependence on mobile devices has drastically increased over time, it has led to an increased number of attacks in the form of SMS spam. The main aim of this article is to understand how to build an SMS spam detection model: a binary classification model that detects whether a text message is spam or not.
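A minimal sketch of such a binary classifier: bag-of-words counts fed to a Naive Bayes model. The handful of messages below is made up for illustration; a real model would train on a corpus such as the UCI SMS Spam Collection.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled messages standing in for a real training corpus.
messages = [
    "WINNER! Claim your free prize now",
    "Free entry to win cash",
    "Urgent! Your account has been selected for a reward",
    "Are we still meeting for lunch today?",
    "Can you send me the report by five?",
    "Happy birthday! See you tonight",
]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

# Bag-of-words features plus a Naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(messages, labels)
print(model.predict(["Claim your free cash prize"]))  # ['spam']
```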
