Swiss Re, the world’s leading reinsurance organisation, in collaboration with MachineHack, is set to launch a Machine Learning Hackathon from March 11th to 28th to predict accident risk scores for unique postcodes. The top three winners stand a chance to win prizes worth INR 1.5 lakh.
With a presence across 25 countries, Swiss Re’s tech strategy harnesses data and technology to develop smarter, innovative solutions for clients’ value chains.
Swiss Re applies fresh perspectives, knowledge and capital to anticipate and manage risk to create smarter solutions. Swiss Re’s Global Business Solutions Center (BSC) in Bangalore has more than 1,300 professionals leveraging experience, expertise and out-of-the-box thinking to create new business opportunities.
The hackathon starts on March 11, 2022, at 6:00 PM
Problem statement & description
Swiss Re is inviting data scientists, machine learning practitioners and analytics professionals to build a machine learning model to improve auto insurance pricing.
According to IBEF, “Domestic automobiles production increased at 2.36% CAGR between FY16-20 with 26.36 million vehicles being manufactured in the country in FY20. Overall, domestic automobiles sales increased at 1.29% CAGR between FY16-FY20 with 21.55 million vehicles being sold in FY20”. The rise in vehicles on the road brings its own challenges, and roads become more prone to accidents. Higher accident rates in turn mean more insurance claims and larger payouts for insurance companies.
To plan pre-emptively for these losses, insurance firms use accident data to understand risk across geographical units such as postal codes or districts.
In this challenge, we are providing you with the dataset to predict the “Accident_risk_index” against the postcodes.
Accident_risk_index (mean casualties at a postcode) = sum(Number_of_casualities) / count(Accident_ID).
|Train Data (given)|
|Modelling Train Data (Rolled up at Postcode level)|
- The participants are required to predict the ‘Accident_risk_index’ for each postcode in test.csv.
- Then submit your ‘my_submission_file.csv’ on the submission tab of the hackathon page.
Pro-tip: Participants need to do some feature engineering: first roll up the train data to postcode level, create a column named “Accident_risk_index”, and then optimize the model at postcode level.
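The roll-up described in the pro-tip could look roughly like the pandas sketch below. It is an illustration on toy data, not the organisers' reference code: `Accident_ID` and `Number_of_casualities` come from the formula in the problem statement, while the key column name `postcode` and the helper column names are assumptions.

```python
import pandas as pd

# Toy accident-level rows standing in for Train.csv.
# "postcode" is an assumed column name for the grouping key.
train = pd.DataFrame({
    "Accident_ID": [1, 2, 3, 4, 5],
    "postcode": ["AA1", "AA1", "BB2", "BB2", "BB2"],
    "Number_of_casualities": [1, 3, 2, 2, 5],
})

# Roll up to one row per postcode: total casualties and number of accidents.
rolled = (
    train.groupby("postcode")
    .agg(
        total_casualties=("Number_of_casualities", "sum"),
        accident_count=("Accident_ID", "count"),
    )
    .reset_index()
)

# Accident_risk_index = sum(Number_of_casualities) / count(Accident_ID)
rolled["Accident_risk_index"] = rolled["total_casualties"] / rolled["accident_count"]
```

The resulting `rolled` frame has one row per postcode and can serve as the modelling table, with further postcode-level features joined onto it.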
A few hypotheses to help you think: “More accidents happen in the latter part of the day as those are office hours causing congestion”
“Postal codes with more single carriage roads have more accidents”
(The hypotheses above suggest features such as office_hours_flag and the number of single carriage roads.)
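As a rough sketch of how such features might be built, the snippet below derives an office-hours flag from the `Time` column and counts single carriageway accidents per postcode. The `postcode` and `Road_Type` column names, the `"Single carriageway"` label, the `HH:MM` time format and the 09:00 to 17:59 office-hours window are all assumptions made for illustration; check them against the actual data.

```python
import pandas as pd

# Toy accident-level rows; "postcode" and "Road_Type" are assumed column names.
accidents = pd.DataFrame({
    "postcode": ["AA1", "AA1", "BB2", "BB2"],
    "Time": ["08:30", "23:10", "17:45", "12:00"],
    "Road_Type": [
        "Single carriageway",
        "Dual carriageway",
        "Single carriageway",
        "Single carriageway",
    ],
})

# Assumed definition of office hours: accident hour between 9 and 17 inclusive.
hour = pd.to_datetime(accidents["Time"], format="%H:%M").dt.hour
accidents["office_hours_flag"] = hour.between(9, 17).astype(int)

# Flag accidents on single carriageway roads (label is an assumption).
accidents["single_carriageway_flag"] = (
    accidents["Road_Type"] == "Single carriageway"
).astype(int)

# Aggregate the accident-level flags to postcode-level counts.
features = (
    accidents.groupby("postcode")
    .agg(
        office_hours_accidents=("office_hours_flag", "sum"),
        single_carriageway_accidents=("single_carriageway_flag", "sum"),
    )
    .reset_index()
)
```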
Additionally, we are providing you with road network data (contains info on the nearest road to a postcode and its characteristics) and population data (contains info about the population at area level). This info is for augmentation of features, but is not mandatory to use.
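One possible way to bring in the optional road network data is a left join onto the postcode-level table, sketched below on toy data. The join key `postcode` is an assumption, as are the sample values; only the column name ‘distance to the nearest point on rd’ is taken from the dataset description.

```python
import pandas as pd

# Postcode-level modelling table (toy values).
postcode_level = pd.DataFrame({
    "postcode": ["AA1", "BB2", "CC3"],
    "Accident_risk_index": [2.0, 3.0, 1.0],
})

# Toy stand-in for the road network file; the "postcode" key is assumed.
roads = pd.DataFrame({
    "postcode": ["AA1", "BB2"],
    "distance to the nearest point on rd": [12.5, 40.0],
})

# A left join keeps every postcode, even those with no road record;
# validate= raises if either side has duplicate keys.
augmented = postcode_level.merge(
    roads, on="postcode", how="left", validate="one_to_one"
)
```

Postcodes without a matching road record end up with missing values in the joined columns, which then need to be imputed or handled by the model.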
- The submission will be evaluated using the Root Mean Square Error. One can use sklearn.metrics.mean_squared_error to calculate the same.
mean_squared_error(y_true, y_pred, squared=False)
- This hackathon supports private and public leaderboards.
- The public leaderboard is evaluated on 30% of Test data.
- The private leaderboard will be made available at the end of the hackathon, which will be evaluated on 100% of Test data.
- The Final Score represents the score achieved based on the Best Score on the public leaderboard.
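For a quick local check of the Root Mean Square Error metric, it can also be computed directly with NumPy, which is handy because the `squared` argument may not be available in every scikit-learn version. The helper name `rmse` and the sample values are illustrative only.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root Mean Square Error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Toy example: one prediction is off by 2, the other two are exact.
score = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```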
- First Prize: INR 75,000
- Second Prize: INR 50,000
- Third Prize: INR 25,000
The hackathon will end on March 28, 2022, at 6:00 PM.
- Train.csv – 4,78,741 rows x 27 columns
- Test.csv – 1,21,259 rows x 27 columns
- Sample Submission.csv — Please check the ‘Evaluation’ section on MachineHack Page for more details on generating a valid submission.
train.csv & test.csv:
- ‘Day_of_Week’, ‘Time’,
- ‘Local_Authority_(District)’, ‘Local_Authority_(Highway)’,
# Population: 8,035 rows x 10 columns
- ‘Rural Urban’,
- ‘Variable: All usual residents; measures: Value’,
- ‘Variable: Males; measures: Value’,
- ‘Variable: Females; measures: Value’,
- ‘Variable: Lives in a household; measures: Value’,
- ‘Variable: Lives in a communal establishment; measures: Value’,
- ‘Variable: Schoolchild or full-time student aged 4 and over at their non term-time address; measures: Value’,
- ‘Variable: Area (Hectares); measures: Value’,
- ‘Variable: Density (number of persons per hectare); measures: Value’
# Road Network: 91,566 rows x 8 columns
- ‘distance to the nearest point on rd’,
Evaluation criteria: Root Mean Square Error
Note: The target variables are all encoded in the training dataset for convenience. Please submit the test results in a similar encoded fashion for us to evaluate your results.
- If any of the details entered are found incorrect, Analytics India Magazine and Swiss Re reserve the right to disqualify any participant.
- Any external dataset usage is strictly prohibited. The participants will be disqualified if found using any external dataset.
- Optimising root mean square error
- Risk prediction
- Feature engineering