Last updated December 17, 2020

Guide To Qlib: Microsoft’s AI Investment Platform

Microsoft Qlib is an AI-oriented quantitative investment platform. It contains the full ML pipeline of data processing, model training, and back-testing, and covers the entire automated workflow of quant investment.


Published on December 17, 2020

by Jayita Bhattacharyya

Microsoft Research has introduced Qlib, a library for AI-based quantitative investment, presented by Xiao Yang, Weiqing Liu, Dong Zhou, Jiang Bian, and Tie-Yan Liu in their paper Qlib: An AI-oriented Quantitative Investment Platform. Qlib lets users quickly try out their ideas for building better quant (quantitative analyst) investment strategies.

AI in finance has advanced rapidly, with applications in stock price forecasting and in the analysis of trends, seasonality, and irregularities. State-of-the-art machine learning and deep learning models can analyse the futures trading market and make predictions based on the present scenario and historical data.

Quantitative investing relies on algorithmic trading and statistical modelling, backed by research into market behaviour. With the advent of modern AI, quantitative investment strategies make use of increasingly complex tools. Quant models are usually backtested, but even then their real-world performance remains exposed to constant market risk.

Qlib contains the full ML pipeline of data processing, model training, and back-testing, and covers the entire automated workflow of quant investment. Other features include risk modelling, portfolio optimization, alpha seeking, and order execution. According to its authors, it is the first open-source platform that covers the workflow of a modern quantitative researcher in the age of AI. It aims to give quantitative researchers access to the full potential of machine learning in quantitative investment.

Modules & Workflow Framework:

The workflow starts with the Data Server module, which provides a data engine to query and process raw data. With the retrieved data, you build your dataset in the Data Enhancement module. The Model Creator module learns models from those datasets. The Model Manager module, together with the Model Ensemble module, helps quantitative researchers handle the models they produce. The Portfolio Generator module generates a portfolio from the trading signals that the models output. The Orders Executor module then examines the performance of a strategy, and the Analyser modules automatically analyse the trading signals, the portfolio, and the execution results. Quantitative investment data is time-series data that is updated over time, so Dynamic Modeling provides interfaces for handling such updates.
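To make the hand-offs between modules concrete, here is a toy sketch in plain Python. It does not use Qlib: every function name (`load_prices`, `build_features`, `score`, `generate_portfolio`) and all prices are made up for illustration, and each function only stands in for the role of the corresponding module.

```python
# Toy stand-ins for the module chain; none of these names are Qlib APIs.

def load_prices():
    """Data Server role: return raw closing prices per instrument (made-up data)."""
    return {
        "AAA": [10.0, 10.5, 11.0],
        "BBB": [20.0, 19.0, 19.5],
        "CCC": [5.0, 5.0, 5.5],
    }

def build_features(prices):
    """Data Enhancement role: derive one feature, the most recent one-day return."""
    return {name: series[-1] / series[-2] - 1.0 for name, series in prices.items()}

def score(features):
    """Model role: a trivial 'model' that uses the feature itself as the signal."""
    return dict(features)

def generate_portfolio(signals, top_k):
    """Portfolio Generator role: equal-weight the top_k instruments by signal."""
    ranked = sorted(signals, key=signals.get, reverse=True)
    chosen = ranked[:top_k]
    return {name: 1.0 / len(chosen) for name in chosen}

portfolio = generate_portfolio(score(build_features(load_prices())), top_k=2)
print(portfolio)
```

Each stage consumes only the output of the previous one, which is what lets a platform swap in a different model or strategy without touching the data layer.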

Demo: https://terminalizer.com/view/3f24561a4470

Installation:

Install the released package with pip:

pip install pyqlib

Or install from source:

git clone https://github.com/microsoft/qlib.git && cd qlib
python setup.py install

Load and Prepare Data (run from the cloned qlib directory):

python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn

Workflow code from the Qlib GitHub repo: Qlib provides a tool, ‘qrun’, that automates the whole workflow (data loading, training, backtesting, and analysis of the results with graphs). The steps below walk through the same workflow in Python code.

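# importing libraries and initialising Qlib

import pandas as pd
import qlib
from qlib.config import REG_CN
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
from qlib.contrib.evaluate import backtest as normal_backtest, risk_analysis
from qlib.contrib.report import analysis_model, analysis_position
from qlib.utils import exists_qlib_data, flatten_dict, init_instance_by_config
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord

# point Qlib at the data downloaded in the previous step
qlib.init(provider_uri="~/.qlib/qlib_data/cn_data", region=REG_CN)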

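# train model – The model is LightGBM, with hyperparameters tuned using Qlib’s Hyperparameter Tuning Engine (HTE). The dataset is Alpha158 (Qlib also provides another dataset, Alpha360).

market = "csi300"       # stock universe (CSI 300), as in Qlib's example workflow
benchmark = "SH000300"  # CSI 300 index, used later as the backtest benchmark

data_handler_config = {
    "start_time": "2008-01-01", "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01", "fit_end_time": "2014-12-31",
    "instruments": market,
}
task = {
    "model": {
        "class": "LGBModel", "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {
            "loss": "mse", "colsample_bytree": 0.8879,
            "learning_rate": 0.0421, "subsample": 0.8789,
            "lambda_l1": 205.6999, "lambda_l2": 580.9768, "max_depth": 8,
            "num_leaves": 210, "num_threads": 20,
        },
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}

# build the model and dataset from the config, then train and save the model
model = init_instance_by_config(task["model"])
dataset = init_instance_by_config(task["dataset"])
with R.start(experiment_name="train_model"):
    R.log_params(**flatten_dict(task))
    model.fit(dataset)
    R.save_objects(trained_model=model)
    rid = R.get_recorder().id  # recorder id, used to reload the model later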
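The nested `class` / `module_path` / `kwargs` dictionaries in the configuration describe objects declaratively, and `init_instance_by_config` is the Qlib utility that turns such a description into an instance. The sketch below shows the general idea using only the standard library. `build_from_config` is a hypothetical helper, not Qlib's implementation, and it is demonstrated on `fractions.Fraction` so that it runs without Qlib installed.

```python
import importlib

def build_from_config(config):
    """Instantiate config["class"] from config["module_path"] with config["kwargs"].

    Hypothetical helper illustrating config-driven construction. Nested
    configs inside kwargs are passed through as plain dicts, not resolved.
    """
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))

example_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}
value = build_from_config(example_config)
print(value)  # prints 3/4
```

Keeping the class name and its arguments in data rather than in code means an experiment can be changed by editing a config file only.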

# training log

 Training until validation scores aren’t improving.
 [20] train's l2: 0.990559 valid's l2: 0.994332
 [40] train's l2: 0.98687 valid's l2: 0.993702
 [60] train's l2: 0.984317 valid's l2: 0.993545
 [80] train's l2: 0.982236 valid's l2: 0.99341
 [100] train's l2: 0.980412 valid's l2: 0.993336
 [120] train's l2: 0.978542 valid's l2: 0.993242
 [140] train's l2: 0.9768 valid's l2: 0.993249
 [160] train's l2: 0.975069 valid's l2: 0.993324
 Early stopping, best iteration is:
 [119] train's l2: 0.978632 valid's l2: 0.993239
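The log ends once the validation loss has stopped improving, and it reports the best iteration rather than the last one. A minimal sketch of such a patience-based rule follows. The function `best_iteration` and the loss values are hypothetical, and LightGBM's own early stopping has more options than this.

```python
def best_iteration(valid_losses, patience):
    """Return (best_index, stopped_index) under a patience-based stopping rule.

    Training stops at the first iteration that is `patience` or more steps
    past the best validation loss seen so far. Indices are 0-based. If the
    rule never triggers, stopped_index is the last index.
    """
    best_index = 0
    best_loss = valid_losses[0]
    for i, loss in enumerate(valid_losses):
        if loss < best_loss:
            best_index, best_loss = i, loss
        elif i - best_index >= patience:
            return best_index, i
    return best_index, len(valid_losses) - 1

# Made-up validation losses: improvement stalls after index 3.
losses = [0.9950, 0.9940, 0.9935, 0.9932, 0.9933, 0.9934, 0.9936, 0.9931]
print(best_iteration(losses, patience=3))  # prints (3, 6)
```

In this example the slightly better loss at index 7 is never seen, which is the usual trade-off of early stopping: less overfitting and compute in exchange for possibly missing a late improvement.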

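# prediction, backtest & analysis

port_analysis_config = {
    "strategy": {
        "class": "TopkDropoutStrategy",
        "module_path": "qlib.contrib.strategy.strategy",
        "kwargs": {"topk": 50, "n_drop": 5},  # hold 50 stocks, replace up to 5 per trading day
    },
    "backtest": {
        "verbose": False,
        "limit_threshold": 0.095,  # daily limit-up / limit-down threshold
        "account": 100000000,      # initial capital
        "benchmark": benchmark,
        "deal_price": "close",     # trade at the closing price
        "open_cost": 0.0005,       # cost rate when opening a position
        "close_cost": 0.0015,      # cost rate when closing a position
        "min_cost": 5,             # minimum cost per transaction
    },
}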

# backtest and analysis

with R.start(experiment_name="backtest_analysis"):
    # load the model saved by the "train_model" experiment
    recorder = R.get_recorder(rid, experiment_name="train_model")
    model = recorder.load_object("trained_model")

    # prediction: generate and record the model's signals
    recorder = R.get_recorder()
    ba_rid = recorder.id
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()

    # backtest & analysis: run the strategy and record the portfolio metrics
    par = PortAnaRecord(recorder, port_analysis_config)
    par.generate()
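The `topk` / `n_drop` settings and the cost parameters used in this backtest can be read as follows: hold a fixed number of stocks, swap out a few of the weakest each day, and pay a proportional fee with a floor on every trade. The sketch below is a simplified reading of that idea, not Qlib's `TopkDropoutStrategy` code. The helpers `topk_dropout_step` and `trade_cost`, the tickers, and the scores are all hypothetical.

```python
def topk_dropout_step(holdings, scores, topk, n_drop):
    """One simplified rebalancing step; returns (new_holdings, sells, buys)."""
    held_set = set(holdings)
    held = sorted(holdings, key=scores.get, reverse=True)  # best first
    pool = sorted((n for n in scores if n not in held_set),
                  key=scores.get, reverse=True)            # best first
    sells, buys = [], []
    # Fill any empty slots first.
    free = max(topk - len(held), 0)
    buys.extend(pool[:free])
    pool = pool[free:]
    # Then swap up to n_drop of the weakest holdings for stronger candidates.
    for weak, strong in zip(reversed(held), pool[:n_drop]):
        if scores[strong] > scores[weak]:
            sells.append(weak)
            buys.append(strong)
    new_holdings = [n for n in held if n not in sells] + buys
    return new_holdings, sells, buys

def trade_cost(trade_value, cost_rate, min_cost):
    """Proportional cost with a floor: assumed reading of open/close/min cost."""
    return max(trade_value * cost_rate, min_cost)

scores = {"A": 0.9, "B": 0.1, "C": 0.5, "D": 0.7, "E": 0.2}
new_holdings, sells, buys = topk_dropout_step(["A", "B", "C"], scores, topk=3, n_drop=1)
print(new_holdings, sells, buys)
print(trade_cost(1_000_000, 0.0005, 5), trade_cost(1_000, 0.0015, 5))
```

Limiting replacements to `n_drop` per day keeps turnover, and therefore transaction cost, under control, which is why the with-cost results in the output are lower than the without-cost ones.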

 'Prediction results of the LGBModel model.'
                          score
datetime   instrument          
2017-01-03 SH600000   -0.045209
           SH600008    0.005298
           SH600009    0.025725
           SH600010   -0.004527
           SH600015   -0.127682
 'Analysis results of the excess return without cost.'
                        risk
 mean               0.000708
 std                0.005626
 annualized_return  0.178316
 information_ratio  1.996555
 max_drawdown      -0.081806
 'Analysis results of the excess return with the cost.'
                        risk
 mean               0.000512
 std                0.005626
 annualized_return  0.128982
 information_ratio  1.444287
 max_drawdown      -0.091078
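The five rows in each table can be derived from a series of daily excess returns. The sketch below shows one common way to compute them, assuming 252 trading days per year and a drawdown measured on the running sum of returns. That assumption is roughly consistent with the printed figures (0.000708 × 252 ≈ 0.178), but the code is not taken from Qlib's source, and `risk_summary` and the sample returns are hypothetical.

```python
import math
import statistics

def risk_summary(daily_returns, periods_per_year=252):
    """Summarise a series of daily (excess) returns.

    Assumes simple annualisation by periods_per_year and a drawdown computed
    on the running sum of returns (no compounding).
    """
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)  # sample standard deviation
    running, peak, max_drawdown = 0.0, -math.inf, 0.0
    for r in daily_returns:
        running += r
        peak = max(peak, running)
        max_drawdown = min(max_drawdown, running - peak)
    return {
        "mean": mean,
        "std": std,
        "annualized_return": mean * periods_per_year,
        "information_ratio": mean / std * math.sqrt(periods_per_year),
        "max_drawdown": max_drawdown,
    }

summary = risk_summary([0.01, -0.02, 0.005, 0.015])
for name, value in summary.items():
    print(f"{name:18s} {value: .6f}")
```

Under these definitions the information ratio is the annualized return divided by the annualized standard deviation, so a value near 2 before costs means the average excess return is large relative to its day-to-day variation.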

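# Report – Portfolio Analysis – Backtest Return

# load the saved results; artifact names follow Qlib's example notebook and may differ between versions
recorder = R.get_recorder(ba_rid, experiment_name="backtest_analysis")
pred_df = recorder.load_object("pred.pkl")
report_normal_df = recorder.load_object("portfolio_analysis/report_normal.pkl")
analysis_df = recorder.load_object("portfolio_analysis/port_analysis.pkl")

analysis_position.report_graph(report_normal_df)

# risk analysis

analysis_position.risk_analysis_graph(analysis_df, report_normal_df)

# score IC

# realised labels for the test segment, aligned with the predictions
label_df = dataset.prepare("test", col_set="label")
label_df.columns = ["label"]
pred_label = pd.concat([label_df, pred_df], axis=1, sort=True).reindex(label_df.index)
analysis_position.score_ic_graph(pred_label)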
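The IC in `score_ic_graph` is the information coefficient: the correlation between predicted scores and realised labels across stocks on a given day. As a rough illustration, the sketch below computes a Pearson IC and a rank IC for one day of made-up data. The helper names are hypothetical and ties in the ranking are not handled.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def rank_ic(scores, labels):
    """Spearman-style IC: Pearson correlation of the ranks."""
    return pearson(ranks(scores), ranks(labels))

# One day of made-up predictions and realised returns for five stocks.
scores = [-0.05, 0.01, 0.03, -0.01, -0.13]
labels = [-0.02, 0.00, 0.04, 0.01, -0.03]
print(pearson(scores, labels), rank_ic(scores, labels))
```

The rank IC ignores the magnitude of the scores and only checks whether the ordering of stocks is right, which is what matters for a strategy that simply picks the top-ranked names.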

# model performance – forecasting signal analysis

analysis_model.model_performance_graph(pred_label)

[Figure: Cumulative return of groups]

[Figure: Return distribution]

To view the complete source code and graphs visit this Colab Notebook.

End Notes

Qlib is well documented. Apart from the automated quant workflow, Qlib also supports custom workflows that quant researchers can adapt to their own needs.

Qlib provides a time-series flat-file database. It is dedicated to scientific computing on financial data and, according to the Qlib paper, performs better on such workloads than typical general-purpose databases and time-series databases.
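One way to see why a flat file suits this kind of data: if each series is stored as a contiguous run of fixed-width numbers in calendar order, a date range maps directly to a byte range. The sketch below assumes such a layout (4-byte little-endian floats, one file per series). It illustrates the idea only and is not a description of Qlib's on-disk format. `write_series` and `read_slice` are hypothetical helpers.

```python
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per little-endian float32 value

def write_series(path, values):
    """Write the values back to back as float32, with no header or index."""
    with open(path, "wb") as f:
        f.write(struct.pack(f"<{len(values)}f", *values))

def read_slice(path, start, stop):
    """Read values[start:stop] by seeking, without loading the whole file."""
    with open(path, "rb") as f:
        f.seek(start * FLOAT_SIZE)
        raw = f.read((stop - start) * FLOAT_SIZE)
    return list(struct.unpack(f"<{len(raw) // FLOAT_SIZE}f", raw))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "close.bin")
    write_series(path, [10.0, 10.5, 11.0, 10.75, 11.25])
    window = read_slice(path, 1, 4)  # positions 1..3 in calendar order
print(window)  # prints [10.5, 11.0, 10.75]
```

Because the position of every value is known from its index alone, reading a window needs one seek and one read, with no query parsing or row decoding in between.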
