Hackathon alert! MachineHack launches ‘Data Engineering Championship’ as part of DES2022 Summit


Data Engineering Summit 2022, presented by Google Cloud and organised by Analytics India Magazine, is India’s first conference dedicated to the high-demand and impactful field of data engineering. This virtual conference, to be held on April 30, 2022, will focus on data engineering innovation and give attendees direct access to top engineers and innovators working in leading tech companies. 

This will be a golden opportunity for attendees to learn, from the very best in the field, about the software deployment architecture of machine learning systems and how to build the latest data frameworks and solutions for business use cases.

Data Engineering Championship by MachineHack

MachineHack is organising a data engineering hackathon in which data scientists and data engineers can participate and win a chance to present at DES 2022.


Data engineering consists of collecting, provisioning and maintaining high-quality data to derive insights. To do that, a data engineer needs to design and develop a scalable data architecture, set up processes that pool data from multiple sources, check the data quality, and eliminate corrupt data. In addition, exploratory data analysis (EDA) and extract, transform, and load (ETL) techniques are required to make the data accessible and usable downstream to solve business problems.

START DATE: 13th April 2022, 6:00 PM

END DATE: 30th May 2022, 6:00 PM


All you need to know about the ‘Data Engineering Championship’

Using the dataset provided, the participants need to analyse the data and create the features described below.

  • ‘DATE’: the date, created from the year, month and day of the week
  • ‘LOW’: lower value of DEP_TIME_BLK
  • ‘HIGH’: higher value of DEP_TIME_BLK
  • ‘TIMESTAMP’: a timestamp created from the date and the lower value of DEP_TIME_BLK
  • ‘WIND_CHILL’: the perceived temperature due to the cooling effect of the wind
  • ‘PRCP_SNOW_RATIO’: ratio of precipitation to snow
  • ‘PLANE_AGE_AIRLINE_AIRPORT_FLIGHTS_MONTH_RATIO’: ratio of plane age to airline and airport flights per month
  • ‘SEAT_DISTRIBUTION’: ratio of seats to concurrent flights (CONCURRENT_FLIGHTS)
  • ‘SEAT_DISTRIBUTION_NORMALISED’: normalised values of the ratio of seats to concurrent flights
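
Several of the features listed above could be derived with pandas roughly as follows. This is only a sketch under stated assumptions, not the organisers' reference solution: it assumes DEP_TIME_BLK holds strings such as "0800-0859", that TMAX is in degrees Fahrenheit and AWND in miles per hour (so the US National Weather Service wind chill formula applies, which is nominally valid for temperatures at or below 50°F and wind above 3 mph), that zero denominators should yield missing values, and that "normalised" means min-max scaling. The function name add_features is illustrative.

```python
import numpy as np
import pandas as pd


def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with some of the requested features added."""
    out = df.copy()

    # Assumption: DEP_TIME_BLK looks like "0800-0859" (HHMM-HHMM).
    bounds = out["DEP_TIME_BLK"].str.split("-", expand=True)
    out["LOW"] = bounds[0].astype(int)
    out["HIGH"] = bounds[1].astype(int)

    # NWS wind chill formula; assumes TMAX in deg F and AWND in mph.
    v16 = out["AWND"].clip(lower=0) ** 0.16
    out["WIND_CHILL"] = (
        35.74 + 0.6215 * out["TMAX"] - 35.75 * v16 + 0.4275 * out["TMAX"] * v16
    )

    def ratio(num: pd.Series, den: pd.Series) -> pd.Series:
        # Assumption: a zero denominator gives NaN rather than infinity.
        return num / den.replace(0, np.nan)

    out["PRCP_SNOW_RATIO"] = ratio(out["PRCP"], out["SNOW"])
    out["PLANE_AGE_AIRLINE_AIRPORT_FLIGHTS_MONTH_RATIO"] = ratio(
        out["PLANE_AGE"], out["AIRLINE_AIRPORT_FLIGHTS_MONTH"]
    )
    out["SEAT_DISTRIBUTION"] = ratio(out["NUMBER_OF_SEATS"], out["CONCURRENT_FLIGHTS"])

    # Assumption: "normalised" means min-max scaling to the range [0, 1].
    s = out["SEAT_DISTRIBUTION"]
    out["SEAT_DISTRIBUTION_NORMALISED"] = (s - s.min()) / (s.max() - s.min())
    return out
```

‘DATE’ and ‘TIMESTAMP’ are left out of the sketch because the exact way the year, month and day of the week combine into a date is not spelled out here.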


In order to determine the winners of the hackathon, the submissions will be evaluated using the mean absolute error, which can be calculated with sklearn.metrics.mean_absolute_error(y_true, y_pred). Note that mean_squared_error(y_true, y_pred, squared=False) computes the root mean squared error instead.
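
To make the difference between the two calls mentioned above concrete, here is a minimal NumPy sketch of the mean absolute error and the root mean squared error. The function names mae and rmse are illustrative and not part of the hackathon's tooling; which metric the leaderboard actually applies is for the organisers to confirm.

```python
import numpy as np


def mae(y_true, y_pred) -> float:
    """Mean absolute error: the average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred) -> float:
    """Root mean squared error: the square root of the mean squared difference."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```

Because it squares the differences, the root mean squared error penalises large individual errors more heavily than the mean absolute error, so the two metrics can rank the same submissions differently.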

This hackathon will support private and public leaderboards.

  • The public leaderboard is evaluated on 30% of the dataset
  • The private leaderboard, which is evaluated on 100% of the dataset, will be made available at the end of the hackathon
  • The final score represents the score achieved based on the Best Score on the public leaderboard

How to generate a valid submission file?

In order to submit your file, the following steps have to be kept in mind.

  • Sklearn models should support the predict() method to generate the predicted values.
  • The participant should submit a .csv file with exactly 2,00,00 rows and exactly 9 columns. The submission will return an Invalid Score if the file has extra rows or columns.

Points to note:

  • One should not shuffle the sequence of the test series
  • If you are using pandas, use the following submission code:

submission_df.to_csv('my_submission_file.csv', index=False)
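
Since a file with the wrong number of rows or columns returns an Invalid Score, it may be worth checking the shape of the DataFrame before writing it. The helper below is a hypothetical sketch, not part of MachineHack's tooling; check_submission and its parameters are illustrative names, and the expected row count must be supplied by the participant from the test set.

```python
import pandas as pd


def check_submission(df: pd.DataFrame, expected_rows: int, expected_cols: int = 9) -> None:
    """Raise ValueError if the frame does not have the expected shape."""
    rows, cols = df.shape
    if rows != expected_rows:
        raise ValueError(f"expected {expected_rows} rows, found {rows}")
    if cols != expected_cols:
        raise ValueError(f"expected {expected_cols} columns, found {cols}")
```

Calling check_submission(submission_df, expected_rows=len(test_df)) just before to_csv, where test_df stands for the participant's own test DataFrame, would surface a shape problem locally rather than on the leaderboard.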

Dataset: 200000 rows x 26 columns

  • MONTH: Month
  • DAY_OF_WEEK: Day of Week
  • DEP_DEL15: TARGET Binary of a departure delay over 15 minutes (1 is yes)
  • DISTANCE_GROUP: Distance group to be flown by departing aircraft
  • DEP_BLOCK: Departure block
  • SEGMENT_NUMBER: The segment that this tail number is on for the day
  • CONCURRENT_FLIGHTS: Concurrent flights leaving from the airport in the same departure block
  • NUMBER_OF_SEATS: Number of seats on the aircraft
  • CARRIER_NAME: Carrier
  • AIRPORT_FLIGHTS_MONTH: Avg Airport Flights per Month
  • AIRLINE_FLIGHTS_MONTH: Avg Airline Flights per Month
  • AIRLINE_AIRPORT_FLIGHTS_MONTH: Avg Flights per month for Airline AND Airport
  • AVG_MONTHLY_PASS_AIRPORT: Avg Passengers for the departing airport for the month
  • AVG_MONTHLY_PASS_AIRLINE: Avg Passengers for the airline for the month
  • FLT_ATTENDANTS_PER_PASS: Flight attendants per passenger for airline
  • GROUND_SERV_PER_PASS: Ground service employees (service desk) per passenger for airline
  • PLANE_AGE: Age of departing aircraft
  • DEPARTING_AIRPORT: Departing Airport
  • LATITUDE: Latitude of departing airport
  • LONGITUDE: Longitude of departing airport
  • PREVIOUS_AIRPORT: Previous airport that aircraft departed from
  • PRCP: Inches of precipitation for the day
  • SNOW: Inches of snowfall for the day
  • SNWD: Inches of snow on the ground for the day
  • TMAX: Max temperature for the day
  • AWND: Max wind speed for the day




The three winners will get a chance to present their solution approaches at the Data Engineering Summit (DES 2022).

Submission deadline

If you want to be a part of this exciting hackathon, make sure to submit your entries by May 30, 2022, at 06:00 PM IST, as the private leaderboard will be frozen at that time.


  • If any of the details entered are found to be incorrect, Analytics India Magazine reserves the right to disqualify the participant.
  • The use of any external dataset is strictly prohibited. Participants found using an external dataset will be disqualified.

So what are you waiting for? Register now to participate in this hackathon.

Sreejani Bhattacharyya
I am a technology journalist at AIM. What gets me excited is deep-diving into new-age technologies and analysing how they impact us for the greater good. Reach me at sreejani.bhattacharyya@analyticsindiamag.com
