Yandex Upgrades Machine Learning Library CatBoost

The new version goes far beyond just an upgrade and is the culmination of four years of work by the Yandex Team.  

Yandex, one of the leading technology companies that build intelligent products and services powered by machine learning, recently announced CatBoost 1.0.0, a new major version of its open-source machine learning library.

Since its initial launch in 2017, CatBoost has been used by tech innovators for everything from streaming-service user recommendations to particle classification to destination prediction for Careem, a ride-hailing service.

The library is based on gradient boosting, a machine learning technique that combines many simple models, typically shallow decision trees, into a single predictor: each new tree is trained to correct the errors of the ensemble built so far, progressively improving the accuracy of predictions.
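To make the idea concrete, here is a minimal sketch of gradient boosting for regression with squared-error loss, written in plain Python. It assumes a single numeric feature and uses one-split trees ("stumps") as the simple models; the function names are illustrative only. This is not CatBoost's implementation, which uses symmetric (oblivious) trees, ordered boosting and many other refinements.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression tree (a stump) to the residuals of one feature."""
    best = None
    for threshold in sorted(set(xs)):
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        if not left or not right:
            continue  # a split must put at least one point on each side
        lv, rv = mean(left), mean(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, lv, rv)
    if best is None:  # all xs identical: fall back to a constant leaf
        c = mean(residuals)
        return lambda x: c
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Sequentially add stumps, each fitted to the errors of the ensemble so far."""
    base = mean(ys)  # start from a constant prediction
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        # For squared-error loss, the negative gradient is simply the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Shrink each tree's contribution by the learning rate.
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, 1, 5, 5, 5]
model = gradient_boost(xs, ys)
print(round(model(1), 2), round(model(6), 2))  # prints 1.01 4.99
```

On this toy data each round removes 10% of the remaining error, so the prediction converges toward the targets as rounds are added; a smaller `learning_rate` needs more rounds but makes each correction more cautious.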

“Far beyond a run-of-the-mill system update, CatBoost 1.0.0 is the transformation from an open-source machine learning library to a stand-alone, ready-to-use product. We are proud of the incredible, diligent work our team has put into reaching this milestone and are committed to continuing to improve and innovate CatBoost,” said Stanislav Kirillov, head of Yandex ML Systems Group.

This major version of CatBoost fixes bugs found in earlier releases and brings a number of improvements:

  • Spark support for distributed learning.
  • New, improved and convenient documentation with open code.
  • Faster model training on both CPU and GPU (binary classification on CPU is 15-35% faster).
  • Multi-label classification.

The most significant change in this release is that the Apache Spark package is now truly distributed. In previous versions, CatBoost stored test datasets in the controller process's memory; now test datasets are split evenly across workers.
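The benefit of splitting test data across workers can be illustrated with a small simulation. The sketch below is a hypothetical, single-process model of the idea, not the CatBoost for Apache Spark API, and the article does not say exactly how CatBoost partitions the data: `split_evenly`, `worker_partial_loss` and `distributed_mse` are illustrative names. Each simulated worker holds only its own shard and reports a partial sum, so the controller combines small summaries instead of holding the whole test set in memory.

```python
def split_evenly(rows, n_workers):
    """Partition rows into n_workers contiguous shards whose sizes differ by at most one."""
    if n_workers < 1:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(len(rows), n_workers)
    shards, start = [], 0
    for i in range(n_workers):
        size = base + (1 if i < extra else 0)  # the first `extra` shards get one more row
        shards.append(rows[start:start + size])
        start += size
    return shards


def worker_partial_loss(shard, predict):
    """A worker evaluates only its own shard and returns (sum of squared errors, row count)."""
    sse = sum((y - predict(x)) ** 2 for x, y in shard)
    return sse, len(shard)


def distributed_mse(test_rows, predict, n_workers):
    """The controller combines per-worker partial sums into one mean squared error."""
    partials = [worker_partial_loss(s, predict) for s in split_evenly(test_rows, n_workers)]
    total_sse = sum(sse for sse, _ in partials)
    total_n = sum(n for _, n in partials)
    return total_sse / total_n


rows = [(i, 2 * i + (1 if i % 2 else -1)) for i in range(10)]
print([len(s) for s in split_evenly(rows, 3)])  # prints [4, 3, 3]
print(distributed_mse(rows, lambda x: 2 * x, 3))  # prints 1.0
```

Because the metric is built from sums that can be added in any order, the result is the same regardless of how many workers share the data.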


Victor Dey
Victor is an aspiring data scientist with a Master of Science in Data Science & Big Data Analytics. He is a researcher, a data science influencer and a former university football player. A keen learner of new developments in data science and artificial intelligence, he is committed to growing the data science community.
