XGBoost 2.0 is Here


XGBoost, one of the best-performing tools for making sense of tabular data, has just been upgraded. XGBoost 2.0 brings a host of new features and enhancements to the machine learning library.

The full release notes are available on the XGBoost GitHub repository.

XGBoost 2.0 introduces a novel feature under development, focusing on vector-leaf tree models for multi-target regression, multi-label classification, and multi-class classification. Unlike the previous approach of building a separate model for each target, this feature allows XGBoost to construct a single tree for all targets, offering several advantages, including reduced overfitting, smaller model sizes, and the ability to exploit correlations between targets.

Users can combine vector leaf and scalar leaf trees during training through a callback. It’s important to note that this feature is a work in progress, and some aspects are still under development.



New Device Parameter 

A significant change is the introduction of a new 'device' parameter, replacing existing parameters like 'gpu_id', 'gpu_hist', 'gpu_predictor', 'cpu_predictor', 'gpu_coord_descent', and the PySpark-specific 'use_gpu'. Users can now use the 'device' parameter to select their preferred device for computation, simplifying the configuration process.

Default Tree Method

Starting from XGBoost 2.0, the ‘hist’ tree method is set as the default. In previous versions, XGBoost would automatically choose between ‘approx’ and ‘exact’ based on input data and the training environment. The new default method aims to improve model training efficiency and consistency.

GPU-Based Approximate Tree Method 

XGBoost 2.0 offers initial support for the 'approx' tree method on GPU. While performance optimisation is ongoing, the implementation is considered feature-complete, except in the JVM packages.

Users can access this capability by specifying 'device="cuda"' and 'tree_method="approx"'. It's important to note that the Scala-based Spark interface is not yet supported.
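The combination is just the two parameters set together, as in the config sketch below. It is shown as a parameter dict only, since actually training with it requires a CUDA-capable device; on such a machine the dict would be passed to xgb.train() as usual.

```python
# Parameter combination for the GPU-based "approx" method (xgboost >= 2.0).
# Requires a CUDA device to train with; shown here as configuration only.
params = {
    "device": "cuda",         # run on the GPU
    "tree_method": "approx",  # GPU support for "approx" is new in 2.0
    "objective": "reg:squarederror",
}
```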

Memory Footprint Optimization 

This release also introduces a new parameter, ‘max_cached_hist_node,’ allowing users to limit CPU cache size for histograms. This helps prevent aggressive caching of histograms, especially in deep trees. Additionally, memory usage for ‘hist’ and ‘approx’ tree methods on distributed systems is reduced by half.

Improved External Memory Support 

External memory support receives a significant boost in XGBoost 2.0. The default ‘hist’ tree method now utilises memory mapping, enhancing performance and reducing CPU memory usage. Users are encouraged to try this feature, particularly when memory savings are required.

Learning-to-Rank Enhancements

XGBoost 2.0 introduces a new implementation for learning-to-rank tasks, offering a range of new features and parameters to improve ranking performance.

Notable additions include parameters for pair construction strategy, control over the number of samples per group, experimental unbiased learning-to-rank support, and custom gain functions with NDCG.

Column-Based Split and Federated Learning

Significant progress has been made on column-based data split for federated learning, with support for various tree methods and vertical federated learning. GPU support for this feature is still in development.

PySpark Enhancements 

The PySpark interface in XGBoost 2.0 has received numerous new features and optimisations, including GPU-based prediction, data initialisation improvements, support for predicting feature contributions, Python typing support, and improved logs for training.

Mohit Pandey
