Microsoft Research has introduced Qlib, a library for AI-oriented quantitative investment, presented by Xiao Yang, Weiqing Liu, Dong Zhou, Jiang Bian, and Tie-Yan Liu in their paper Qlib: An AI-oriented Quantitative Investment Platform. Qlib lets users quickly try out ideas for building better quant (quantitative trading) investment strategies.
AI in finance has advanced rapidly in stock price forecasting and in the analysis of trends, seasonality, and irregularities. State-of-the-art machine learning and deep learning models can analyse the futures trading market and make predictions about it from the present scenario and historical data.
Quantitative investing relies on algorithmic trading and statistical modelling, backed by research into market behaviour. With the advent of modern AI, quantitative investment strategies increasingly make use of complex tools. Quant models are usually backtested, but even then their real-world performance remains exposed to constant market risk.
Qlib contains the full ML pipeline of data processing, model training, and back-testing, and covers the entire automated workflow of quant investment. Other features include risk modelling, portfolio optimization, alpha seeking, and order execution. Its authors describe it as the first open-source platform that covers the workflow of a modern quantitative researcher in the age of AI. It aims to let quantitative researchers realise the potential of machine learning in quantitative investment.
Modules & Workflow Framework:
The workflow starts with the Data Server module, which provides a data engine to query and process raw data. With the retrieved data, you build your dataset in the Data Enhancement module. The Model Creator module learns models from those datasets. The Model Manager module, together with the Model Ensemble module, helps researchers manage the models they produce. The Portfolio Generator module generates a portfolio from the trading signals that the models output. The Orders Executor module then executes the resulting orders so that a strategy's performance can be examined, and the Analyser modules automatically analyse the trading signals, the portfolio, and the execution results. Quantitative investment data is time-series data that is updated continually, so Dynamic Modeling provides interfaces to handle such updates.
Demo: https://terminalizer.com/view/3f24561a4470
Installation:
pip install pyqlib
Or
git clone https://github.com/microsoft/qlib.git && cd qlib
python setup.py install
Load and Prepare Data:
python scripts/get_data.py qlib_data --target_dir ~/.qlib/qlib_data/cn_data --region cn
Workflow Code from Qlib GitHub repo: Qlib uses its ‘qrun’ tool to automate the workflow (data loading, training, backtesting, and drawing inferences using graphs).
# importing libraries
import pandas as pd
from qlib.contrib.model.gbdt import LGBModel
from qlib.contrib.data.handler import Alpha158
from qlib.contrib.strategy.strategy import TopkDropoutStrategy
from qlib.contrib.evaluate import (
    backtest as normal_backtest,
    risk_analysis,
)
from qlib.contrib.report import analysis_position
from qlib.utils import exists_qlib_data, init_instance_by_config, flatten_dict
from qlib.workflow import R
from qlib.workflow.record_temp import SignalRecord, PortAnaRecord
# train model – the model is LightGBM, with hyperparameters fine-tuned using Qlib’s hyperparameter tuning engine (HTE). The dataset is Alpha158 (Qlib also provides another dataset, Alpha360).
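# `market` names the instrument universe and must be defined beforehand
# (Qlib's example notebook uses market = "csi300").
data_handler_config = {
    "start_time": "2008-01-01",
    "end_time": "2020-08-01",
    "fit_start_time": "2008-01-01",
    "fit_end_time": "2014-12-31",
    "instruments": market,
}

task = {
    "model": {
        "class": "LGBModel",
        "module_path": "qlib.contrib.model.gbdt",
        "kwargs": {
            "loss": "mse",
            "colsample_bytree": 0.8879,
            "learning_rate": 0.0421,
            "subsample": 0.8789,
            "lambda_l1": 205.6999,
            "lambda_l2": 580.9768,
            "max_depth": 8,
            "num_leaves": 210,
            "num_threads": 20,
        },
    },
    "dataset": {
        "class": "DatasetH",
        "module_path": "qlib.data.dataset",
        "kwargs": {
            "handler": {
                "class": "Alpha158",
                "module_path": "qlib.contrib.data.handler",
                "kwargs": data_handler_config,
            },
            "segments": {
                "train": ("2008-01-01", "2014-12-31"),
                "valid": ("2015-01-01", "2016-12-31"),
                "test": ("2017-01-01", "2020-08-01"),
            },
        },
    },
}

# initialise the model and dataset from the config, as in Qlib's workflow-by-code example
model = init_instance_by_config(task["model"])
dataset = init_instance_by_config(task["dataset"])

# train inside an experiment; `rid` identifies the recorder for the backtest step
with R.start(experiment_name="train_model"):
    R.log_params(**flatten_dict(task))
    model.fit(dataset)
    R.save_objects(trained_model=model)
    rid = R.get_recorder().id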
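The `task` dictionary describes each component as a class name, a module path, and keyword arguments, which `init_instance_by_config` turns into objects. The sketch below shows the general idea with the standard library only. The helper `init_from_config` and the `demo_config` values are hypothetical stand-ins for illustration, not Qlib's actual implementation.

```python
import importlib


def init_from_config(config: dict):
    """Instantiate the class named by a {"class", "module_path", "kwargs"} dict.

    Hypothetical helper: a simplified stand-in for the idea behind
    Qlib's init_instance_by_config, not a copy of it.
    """
    module = importlib.import_module(config["module_path"])
    cls = getattr(module, config["class"])
    return cls(**config.get("kwargs", {}))


# Stand-in config that points at a standard-library class instead of a Qlib model.
demo_config = {
    "class": "Fraction",
    "module_path": "fractions",
    "kwargs": {"numerator": 3, "denominator": 4},
}

obj = init_from_config(demo_config)
print(type(obj).__name__, obj)  # prints "Fraction 3/4"
```

Keeping the class name and arguments in plain data like this is what allows the same experiment to be described in a config file and run without touching the code.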
Training until validation scores aren’t improving.
[20]   train's l2: 0.990559   valid's l2: 0.994332
[40]   train's l2: 0.98687    valid's l2: 0.993702
[60]   train's l2: 0.984317   valid's l2: 0.993545
[80]   train's l2: 0.982236   valid's l2: 0.99341
[100]  train's l2: 0.980412   valid's l2: 0.993336
[120]  train's l2: 0.978542   valid's l2: 0.993242
[140]  train's l2: 0.9768     valid's l2: 0.993249
[160]  train's l2: 0.975069   valid's l2: 0.993324
Early stopping, best iteration is:
[119]  train's l2: 0.978632   valid's l2: 0.993239
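The log shows the training loss still falling while the validation loss bottoms out at iteration 119, which is the pattern early stopping is meant to catch. Here is a minimal sketch of patience-based early stopping in plain Python. The function name, the toy loss values, and the patience of 3 are assumptions for illustration, not the settings used in the run above.

```python
def best_iteration(valid_losses, patience):
    """Return (best_iter, stop_iter), both 1-based.

    Training stops once `patience` iterations have passed without the
    validation loss improving on its best value so far.
    """
    best_loss = float("inf")
    best_iter = 0
    for i, loss in enumerate(valid_losses, start=1):
        if loss < best_loss:
            best_loss, best_iter = loss, i
        elif i - best_iter >= patience:
            return best_iter, i
    return best_iter, len(valid_losses)


# Toy validation losses: improve for three iterations, then drift upwards.
toy_losses = [1.00, 0.98, 0.97, 0.975, 0.972, 0.98, 0.99]
print(best_iteration(toy_losses, patience=3))  # prints (3, 6)
```

The model kept is the one from the best iteration, not the last one trained, which is why the log reports iteration 119 even though training ran past 160.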
# prediction, backtest & analysis
# `benchmark` must be defined beforehand
# (Qlib's example notebook uses benchmark = "SH000300").
port_analysis_config = {
    "strategy": {
        "class": "TopkDropoutStrategy",
        "module_path": "qlib.contrib.strategy.strategy",
        "kwargs": {
            "topk": 50,
            "n_drop": 5,
        },
    },
    "backtest": {
        "verbose": False,
        "limit_threshold": 0.095,
        "account": 100000000,
        "benchmark": benchmark,
        "deal_price": "close",
        "open_cost": 0.0005,
        "close_cost": 0.0015,
        "min_cost": 5,
    },
}
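The strategy is configured with `topk` (how many instruments to hold) and `n_drop` (how many to rotate out at each step). The sketch below illustrates one rebalancing step of a top-k dropout rule in plain Python. It is a simplified reading of the idea, with a hypothetical function name and toy scores, and it ignores tradability, price limits, and cash constraints that a real backtest handles.

```python
def topk_dropout_step(scores, holdings, topk, n_drop):
    """One rebalancing step of a simplified top-k dropout rule.

    scores   : dict mapping instrument -> today's predicted score
    holdings : set of instruments currently held (all assumed to be scored)
    Returns (sell, buy, new_holdings).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    held = [s for s in ranked if s in holdings]
    not_held = [s for s in ranked if s not in holdings]

    # Best-ranked new names: enough to replace n_drop sales and fill empty slots.
    n_candidates = max(n_drop + topk - len(held), 0)
    candidates = not_held[:n_candidates]

    # Rank held names and candidates together; held names that land in the
    # bottom n_drop of this combined list are sold.
    combined = sorted(held + candidates, key=scores.get, reverse=True)
    bottom = combined[-n_drop:] if n_drop > 0 else []
    sell = [s for s in bottom if s in holdings]

    # Buy one candidate per sale, plus any slots that were already empty.
    n_buy = max(len(sell) + topk - len(held), 0)
    buy = candidates[:n_buy]

    new_holdings = (set(holdings) - set(sell)) | set(buy)
    return sell, buy, new_holdings


# Toy scores for six hypothetical instruments.
toy_scores = {"A": 0.9, "B": 0.8, "C": 0.1, "D": 0.7, "E": -0.2, "F": 0.5}
sell, buy, new_holdings = topk_dropout_step(toy_scores, {"B", "C", "E"}, topk=3, n_drop=1)
print(sell, buy, sorted(new_holdings))  # prints ['E'] ['A'] ['A', 'B', 'C']
```

Capping the rotation at `n_drop` names per step keeps turnover, and therefore transaction cost, under control: with `topk` 50 and `n_drop` 5, at most a tenth of the portfolio changes on any day.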
# backtest and analysis
with R.start(experiment_name="backtest_analysis"):
    recorder = R.get_recorder(rid, experiment_name="train_model")
    model = recorder.load_object("trained_model")

    # prediction
    recorder = R.get_recorder()
    ba_rid = recorder.id
    sr = SignalRecord(model, dataset, recorder)
    sr.generate()

    # backtest & analysis
    par = PortAnaRecord(recorder, port_analysis_config)
    par.generate()
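The backtest configuration sets `open_cost`, `close_cost`, and `min_cost`, which determine how much each simulated trade costs. The sketch below shows one plausible reading of these parameters: a proportional fee with a fixed floor. The function is hypothetical and the formula is an assumption about how such settings are usually applied, so check Qlib's exchange code for the exact rule.

```python
def trade_cost(trade_value, side, open_cost=0.0005, close_cost=0.0015, min_cost=5):
    """Cost of one trade: a proportional fee, but never less than min_cost.

    Assumption: buys are charged open_cost and sells are charged close_cost.
    """
    if side not in ("buy", "sell"):
        raise ValueError("side must be 'buy' or 'sell'")
    rate = open_cost if side == "buy" else close_cost
    return max(trade_value * rate, min_cost)


print(trade_cost(1_000_000, "buy"))   # about 500: the proportional fee applies
print(trade_cost(1_000_000, "sell"))  # about 1500: selling is charged at the higher rate
print(trade_cost(2_000, "buy"))       # 5: the fee floor applies to small trades
```

Under this reading, a round trip on a large position costs about 0.2% of its value. That is why the excess return reported with cost is lower than the one reported without cost.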
'Prediction results of the LGBModel model.'
                          score
datetime   instrument
2017-01-03 SH600000   -0.045209
           SH600008    0.005298
           SH600009    0.025725
           SH600010   -0.004527
           SH600015   -0.127682

'Analysis results of the excess return without cost.'
                       risk
mean               0.000708
std                0.005626
annualized_return  0.178316
information_ratio  1.996555
max_drawdown      -0.081806

'Analysis results of the excess return with the cost.'
                       risk
mean               0.000512
std                0.005626
annualized_return  0.128982
information_ratio  1.444287
max_drawdown      -0.091078
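The risk figures above are related by simple arithmetic. The reported values are consistent with annualized_return ≈ mean × 252 (0.000708 × 252 ≈ 0.178) and information_ratio ≈ mean / std × √252 (≈ 2.0 without cost). The sketch below computes the same five quantities from a list of daily excess returns. The 252-day year, the sample standard deviation, and the drawdown on summed (not compounded) returns are assumptions, and the toy returns are made up.

```python
import math
import statistics


def risk_summary(daily_returns, periods_per_year=252):
    """Summarise daily excess returns under the assumptions stated above."""
    mean = statistics.fmean(daily_returns)
    std = statistics.stdev(daily_returns)  # sample standard deviation

    # Max drawdown: largest drop of the cumulative sum below its running peak.
    cumulative = 0.0
    running_max = float("-inf")
    max_drawdown = 0.0
    for r in daily_returns:
        cumulative += r
        running_max = max(running_max, cumulative)
        max_drawdown = min(max_drawdown, cumulative - running_max)

    return {
        "mean": mean,
        "std": std,
        "annualized_return": mean * periods_per_year,
        "information_ratio": mean / std * math.sqrt(periods_per_year),
        "max_drawdown": max_drawdown,
    }


# Toy daily excess returns, for illustration only.
toy_returns = [0.01, -0.02, 0.03, 0.005, -0.01]
summary = risk_summary(toy_returns)
for name, value in summary.items():
    print(f"{name:<18} {value: .6f}")
```

Because the information ratio divides by volatility, transaction costs lower it in the same proportion as they lower the mean return: the standard deviation is 0.005626 both with and without cost.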
# Report – Portfolio Analysis – Backtest Return
analysis_position.report_graph(report_normal_df)
# risk analysis
analysis_position.risk_analysis_graph(analysis_df, report_normal_df)
# score IC
pred_label = pd.concat([label_df, pred_df], axis=1, sort=True).reindex(label_df.index)
analysis_position.score_ic_graph(pred_label)
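The score IC graph is built from the information coefficient (IC), which measures how well the predicted scores line up with the realised labels on each day. As a rough sketch, the IC can be computed per day as the Pearson correlation between label and score, and the rank IC as the same correlation on their ranks. The helper names and the toy (label, score) pairs below are hypothetical, and this plain-Python version only illustrates the calculation, not Qlib's own code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def daily_ic(rows_by_day):
    """Map each day to its IC and rank IC from (label, score) pairs."""
    result = {}
    for day, rows in rows_by_day.items():
        labels = [label for label, _ in rows]
        scores = [score for _, score in rows]
        result[day] = {
            "ic": pearson(labels, scores),
            "rank_ic": pearson(average_ranks(labels), average_ranks(scores)),
        }
    return result


# Toy (label, score) pairs for two days, for illustration only.
toy = {
    "2017-01-03": [(0.01, 0.1), (0.02, 0.3), (0.03, 0.2), (0.04, 0.4)],
    "2017-01-04": [(-0.02, -0.5), (0.00, 0.1), (0.01, 0.2), (0.05, 0.9)],
}
ic_by_day = daily_ic(toy)
for day, values in ic_by_day.items():
    print(day, values)
```

The rank IC depends only on the ordering of the scores, which suits a strategy such as top-k selection that also uses only the ordering.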
# model performance – forecasting signal analysis
Cumulative Return of groups
Return Distribution:
To view the complete source code and graphs, visit this Colab Notebook.
End Notes
Qlib has excellent documentation. Apart from the automated quant workflow, Qlib also supports custom workflows that quant researchers can adapt to their own needs.
Qlib provides a time-series flat-file database. It is dedicated to scientific computing on financial data and is reported to perform better on such workloads than typical general-purpose databases and time-series databases.