MachineHack, in association with Embold, has launched a new hackathon, the GitHub Bugs Prediction Challenge, in which participants need to predict bugs from GitHub titles and body text. Registration is now open, and the hackathon closes on 18th October 2020.
Embold.io is a software quality platform that helps teams produce quality code in a short time. It combines machine learning, rigorous statistical algorithms, and powerful programming techniques to develop cutting-edge products for the industry.
In this hackathon, data scientists need to come up with an algorithm that can predict bugs, features, and questions based on GitHub text data. Participants will also need to write quality code to win the prizes, because the evaluation includes a code quality score from the Embold Code Analysis platform. Further, Embold is providing a free quick tour of how to use its code analysis platform.
Read more about the hackathon here.
The hackathon comes with a two-stage evaluation. In the first stage, model evaluation, participants will be ranked by their standing on the private leaderboard, which uses 30% of the provided test.json dataset. The final standings will be reflected on 18th October, after 7:00 AM IST. In the second stage, the MachineHack team will select the top 20 participants from the private leaderboard, who will be notified to share their Embold Scorecard. The final winners will then be selected based on the aggregate of their private leaderboard ranking and their Embold score.
With this hackathon, data scientists will have the opportunity to get hands-on experience deploying state-of-the-art language models and to gain exposure to solving use cases at the organisational level. They will also have a chance to win bounties worth ₹25,000 by competing against top MachineHackers.
To develop an algorithm that can predict bugs, features, and questions based on GitHub titles and body text, participants will be provided with a training set of 150,000 rows x 3 columns (including the label column as the target variable) and a test set of 30,000 rows x 2 columns. The attributes are the title of the GitHub bug, feature, or question; the body of the GitHub bug, feature, or question; and the label, which represents the class.
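As a rough illustration of the dataset layout described above, here is a minimal Python sketch that reads records and joins the title and body into a single text field for a classifier. The field names (`title`, `body`, `label`), the integer label codes, and the JSON-array layout are all assumptions made for illustration; the actual hackathon files may use different keys or a different format. The sample records are made up.

```python
import json
from collections import Counter

# Hypothetical field names; the real keys in the hackathon files may differ.
TEXT_FIELDS = ("title", "body")
LABEL_FIELD = "label"


def load_records(json_text):
    """Parse a JSON array of records into a list of dicts."""
    records = json.loads(json_text)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def to_example(record, with_label=True):
    """Join title and body into one string; return (text, label or None)."""
    text = " ".join(str(record.get(field, "")).strip() for field in TEXT_FIELDS)
    label = record[LABEL_FIELD] if with_label else None
    return text.strip(), label


def label_counts(records):
    """Count how often each label appears in the training records."""
    return Counter(record[LABEL_FIELD] for record in records)


# Made-up sample standing in for the real training file.
# The label codes 0, 1, 2 are placeholders, not the official encoding.
sample_train = json.dumps([
    {"title": "App crashes on start", "body": "Stack trace attached.", "label": 0},
    {"title": "Add dark mode", "body": "Would be nice to have.", "label": 1},
    {"title": "How do I configure the proxy?", "body": "", "label": 2},
])

train = load_records(sample_train)
examples = [to_example(record) for record in train]
```

Test rows have no label column, so they would go through `to_example(record, with_label=False)`. Checking `label_counts(train)` before modelling is a quick way to see whether the classes are balanced.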