H2O’s new major release, 3.36 (Zorn), is packed with new features and fixes. Notable additions include Distributed Uplift Random Forest and Infogram. The release also improves H2O’s existing algorithms, MOJO import, and Java version support.
Distributed Uplift Random Forest
Distributed Uplift Random Forest (Uplift DRF) is a classification tool for modeling uplift, suited to fields such as marketing and medicine.
Uplift DRF is a tree-based algorithm. In every tree, it feeds information about treatment/control group assignment and about the response directly into the decision on how to split a node. The uplift score is the criterion for that decision, much as Gini impurity is in a standard decision tree.
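To make the split criterion concrete, here is a minimal Python sketch of how an uplift-based gain could be scored for one candidate split. It is an illustration under stated assumptions, not H2O's implementation: the squared Euclidean divergence, the function names, and the toy rows are all chosen for this example.

```python
# Each row is (treatment, response): treatment is 1 for the treated group
# and 0 for control; response is 1 for a positive outcome, else 0.

def response_rate(rows):
    """Share of positive responses in a list of (treatment, response) rows."""
    return sum(y for _, y in rows) / len(rows) if rows else 0.0

def divergence(rows):
    """Squared Euclidean distance between treated and control response
    distributions (an assumed metric for this sketch)."""
    treated = [r for r in rows if r[0] == 1]
    control = [r for r in rows if r[0] == 0]
    p, q = response_rate(treated), response_rate(control)
    return (p - q) ** 2 + ((1 - p) - (1 - q)) ** 2

def uplift_gain(parent, left, right):
    """Size-weighted divergence of the children minus that of the parent."""
    n = len(parent)
    after = len(left) / n * divergence(left) + len(right) / n * divergence(right)
    return after - divergence(parent)

# Toy data: treatment helps everyone in `left` and nobody in `right`.
left = [(1, 1), (1, 1), (0, 0), (0, 0)]
right = [(1, 0), (1, 1), (0, 0), (0, 1)]
parent = left + right

gain = uplift_gain(parent, left, right)
print(gain)
```

A split that separates rows where treatment changes the outcome from rows where it does not yields a positive gain, which is the kind of split an uplift tree would prefer.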
Infogram & Admissible Machine Learning
H2O has built new tools to help design admissible learning algorithms that are efficient, fair, and interpretable. The infogram is a graphical, information-theoretic interpretability tool that lets the user quickly spot the core decision-making variables that uniquely and safely drive the response in supervised classification problems. By identifying only the most valuable, admissible features, the infogram can significantly cut down the number of predictors needed to build a model.
RuleFit improvements
RuleFit has improved since the last major release, with tuning of both the multinomial classification algorithm and the model output. H2O has now extended the output with rule support, which is the proportion of training observations to which a rule applies. This gives another measure of rule importance alongside the LASSO coefficient.
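Rule support as defined above is simple to compute by hand. The following Python sketch shows the calculation on a toy dataset. The `rule_support` helper, the example rule, and the rows are hypothetical and are not part of H2O's API.

```python
def rule_support(rows, rule):
    """Fraction of training rows for which the rule predicate is true."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if rule(row)) / len(rows)

# Hypothetical training observations.
training = [
    {"age": 25, "income": 30000},
    {"age": 42, "income": 85000},
    {"age": 37, "income": 52000},
    {"age": 58, "income": 91000},
]

# Hypothetical rule: a conjunction of two threshold conditions.
def rule(row):
    return row["age"] > 35 and row["income"] > 50000

support = rule_support(training, rule)
print(support)  # 3 of the 4 rows satisfy the rule
```

A rule with a large coefficient but very low support affects only a few observations, which is why the two numbers are useful to read together.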
Model selection
The ModelSelection toolbox aims to help users select the best predictor subsets to use when building a GLM regression model.
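As a rough illustration of what selecting the best predictor subset means, the Python sketch below tries every subset of a given size and keeps the one with the highest R² under ordinary least squares. This is an assumed, simplified procedure written from scratch for this example. It does not reproduce H2O's ModelSelection implementation, and all names and data are hypothetical.

```python
from itertools import combinations

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: move the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(X, y, cols):
    """Fit least squares with an intercept on the chosen columns; return R^2."""
    design = [[1.0] + [row[c] for c in cols] for row in X]
    k = len(cols) + 1
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, d)) for d in design]
    mean = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot

def best_subset(X, y, size):
    """Return the column-index subset of the given size with the highest R^2."""
    return max(combinations(range(len(X[0])), size),
               key=lambda cols: r_squared(X, y, cols))

# Toy data: y depends only on columns 0 and 2 (y = 2*x0 + 3*x2).
X = [[1, 5, 2], [2, 3, 1], [3, 8, 4], [4, 1, 3], [5, 7, 6], [6, 2, 5]]
y = [2 * row[0] + 3 * row[2] for row in X]

print(best_subset(X, y, 2))
```

Exhaustive search grows combinatorially with the number of predictors, so it is only practical for small problems like this one.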
Support of Java 17
This release of H2O-3 adds official support for Java 17, the latest LTS version of Java. Production-ready binaries of Java 17 have been available since September 2021 and will be actively supported until 2026 or later.
MOJO Import
Importing (older) MOJO models into newer versions of H2O lets you compare the performance of your older models to the new ones you trained on new data.
CDP
The company has implemented support for S3A delegation token refresh, which is used to access S3 buckets in deployments on CDP (Cloudera Data Platform) with IDBroker security.
AutoML
AutoML has a major change to its validation and stacking strategy for resource-constrained environments, where the default 5-fold cross-validation strategy might be too slow or too memory- or compute-intensive.