Stacking & Meta labeling

  • Credit card fraud detection
  • Quantitative fundamental trading

We first introduce the concepts of stacking and meta labeling using credit card fraud detection, then carry them over to financial data.


import numpy as np
import pandas as pd
'''Data Viz'''
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('seaborn')  # on matplotlib >= 3.6 use 'seaborn-v0_8'
plt.rcParams['figure.figsize'] = [16, 9]
plt.rcParams['figure.dpi'] = 300
plt.rcParams['font.size'] = 20
plt.rcParams['axes.labelsize'] = 16
plt.rcParams['axes.titlesize'] = 18
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['font.family'] = 'serif'
%matplotlib inline
'''Data Prep'''
from sklearn import preprocessing as pp
from scipy.stats import pearsonr
from sklearn.model_selection import train_test_split
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import log_loss, accuracy_score, f1_score
from sklearn.metrics import precision_recall_curve, average_precision_score
from sklearn.metrics import roc_curve, auc, roc_auc_score
from sklearn.metrics import confusion_matrix, classification_report
from sklearn.linear_model import LogisticRegression
import lightgbm as lgb
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense,Dropout
from tensorflow.keras.callbacks import ReduceLROnPlateau,EarlyStopping
from sklearn.model_selection import GridSearchCV

import warnings
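def view(data, num=5):
    """Print the shape and return the first and last `num` rows."""
    print('The shape is', data.shape)
    # DataFrame.append was removed in pandas 2.0; concatenate instead
    return pd.concat([data.head(num), data.tail(num)])
data_original = pd.read_csv('creditcard.csv')
view(data_original)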
The shape is (284807, 31)
Time V1 V2 V3 V4 V5 V6 V7 V8 V9 ... V21 V22 V23 V24 V25 V26 V27 V28 Amount Class
0 0.0 -1.359807 -0.072781 2.536347 1.378155 -0.338321 0.462388 0.239599 0.098698 0.363787 ... -0.018307 0.277838 -0.110474 0.066928 0.128539 -0.189115 0.133558 -0.021053 149.62 0
1 0.0 1.191857 0.266151 0.166480 0.448154 0.060018 -0.082361 -0.078803 0.085102 -0.255425 ... -0.225775 -0.638672 0.101288 -0.339846 0.167170 0.125895 -0.008983 0.014724 2.69 0
2 1.0 -1.358354 -1.340163 1.773209 0.379780 -0.503198 1.800499 0.791461 0.247676 -1.514654 ... 0.247998 0.771679 0.909412 -0.689281 -0.327642 -0.139097 -0.055353 -0.059752 378.66 0
3 1.0 -0.966272 -0.185226 1.792993 -0.863291 -0.010309 1.247203 0.237609 0.377436 -1.387024 ... -0.108300 0.005274 -0.190321 -1.175575 0.647376 -0.221929 0.062723 0.061458 123.50 0
4 2.0 -1.158233 0.877737 1.548718 0.403034 -0.407193 0.095921 0.592941 -0.270533 0.817739 ... -0.009431 0.798278 -0.137458 0.141267 -0.206010 0.502292 0.219422 0.215153 69.99 0
284802 172786.0 -11.881118 10.071785 -9.834783 -2.066656 -5.364473 -2.606837 -4.918215 7.305334 1.914428 ... 0.213454 0.111864 1.014480 -0.509348 1.436807 0.250034 0.943651 0.823731 0.77 0
284803 172787.0 -0.732789 -0.055080 2.035030 -0.738589 0.868229 1.058415 0.024330 0.294869 0.584800 ... 0.214205 0.924384 0.012463 -1.016226 -0.606624 -0.395255 0.068472 -0.053527 24.79 0
284804 172788.0 1.919565 -0.301254 -3.249640 -0.557828 2.630515 3.031260 -0.296827 0.708417 0.432454 ... 0.232045 0.578229 -0.037501 0.640134 0.265745 -0.087371 0.004455 -0.026561 67.88 0
284805 172788.0 -0.240440 0.530483 0.702510 0.689799 -0.377961 0.623708 -0.686180 0.679145 0.392087 ... 0.265245 0.800049 -0.163298 0.123205 -0.569159 0.546668 0.108821 0.104533 10.00 0
284806 172792.0 -0.533413 -0.189733 0.703337 -0.506271 -0.012546 -0.649617 1.577006 -0.414650 0.486180 ... 0.261057 0.643078 0.376777 0.008797 -0.473649 -0.818267 -0.002415 0.013649 217.00 0

10 rows × 31 columns

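The dataset contains 28 anonymized features (V1 to V28), one amount feature (Amount), one time feature (Time) and the target variable (Class). It covers transactions made over two days, and 492 of the 284,807 transactions are frauds. The features are anonymized to protect customers' privacy and are the output of a PCA transformation; the only features not transformed by PCA are Amount and Time.

data_original.info()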
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 284807 entries, 0 to 284806
Data columns (total 31 columns):
 #   Column  Non-Null Count   Dtype  
---  ------  --------------   -----  
 0   Time    284807 non-null  float64
 1   V1      284807 non-null  float64
 2   V2      284807 non-null  float64
 3   V3      284807 non-null  float64
 4   V4      284807 non-null  float64
 5   V5      284807 non-null  float64
 6   V6      284807 non-null  float64
 7   V7      284807 non-null  float64
 8   V8      284807 non-null  float64
 9   V9      284807 non-null  float64
 10  V10     284807 non-null  float64
 11  V11     284807 non-null  float64
 12  V12     284807 non-null  float64
 13  V13     284807 non-null  float64
 14  V14     284807 non-null  float64
 15  V15     284807 non-null  float64
 16  V16     284807 non-null  float64
 17  V17     284807 non-null  float64
 18  V18     284807 non-null  float64
 19  V19     284807 non-null  float64
 20  V20     284807 non-null  float64
 21  V21     284807 non-null  float64
 22  V22     284807 non-null  float64
 23  V23     284807 non-null  float64
 24  V24     284807 non-null  float64
 25  V25     284807 non-null  float64
 26  V26     284807 non-null  float64
 27  V27     284807 non-null  float64
 28  V28     284807 non-null  float64
 29  Amount  284807 non-null  float64
 30  Class   284807 non-null  int64  
dtypes: float64(30), int64(1)
memory usage: 67.4 MB
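# see the correlation of each feature with the class, and the clustered feature correlations
def plot_corr(data=data_original, target=data_original.Class):
    # bar chart of each column's correlation with the target
    ax1 = data.corrwith(target).plot.bar(figsize=(20, 10),
                                         title="Correlation with class",
                                         fontsize=18, color='r',
                                         rot=45,
                                         grid=True)
    cmap = sns.diverging_palette(220, 20, as_cmap=True)
    corr = data.corr()
    # clustered heatmap of the pairwise correlations (reconstructed plotting call)
    sns.clustermap(corr, cmap=cmap, center=0, figsize=(12, 12))
    return ax1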

Meta labeling calls for an imbalanced dataset like this one, where precision and recall cannot easily both be high at the same time.

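# class frequencies as a percentage of all transactions
val_counts = data_original['Class'].value_counts(normalize=True) * 100
ax = sns.barplot(x=val_counts.index, y=val_counts.values)
ax.set(title='Frequency Percentage by Class',
       xlabel='Class',
       ylabel='Frequency Percentage');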


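This is why an ordinary model can easily fool us: its accuracy is almost always very high, simply because about 99.8% of the transactions are non-frauds. Our goal, however, is to find the frauds, so accuracy on its own is misleading and we need better performance measures: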

  • accuracy
  • average_precision
  • areaUnderROC
  • precision
  • recall
  • f1-score
  • confusion_matrix
  • Precision-Recall Curve
  • Area Under the Curve (AUC)
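To make the accuracy trap concrete, here is a small sketch with synthetic labels that only reproduce the class counts quoted above (492 frauds among 284,807 transactions). The "model" is a hypothetical one that always answers non-fraud; it is not trained on the real data.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# Synthetic labels with the same class counts as the credit card dataset
n_total, n_fraud = 284_807, 492
y_true = np.zeros(n_total, dtype=int)
y_true[:n_fraud] = 1

# A trivial "model" that labels every transaction as non-fraud
y_pred = np.zeros(n_total, dtype=int)

acc = accuracy_score(y_true, y_pred)
rec = recall_score(y_true, y_pred)   # 0 of the 492 frauds are caught
print(f"accuracy = {acc:.4f}, recall = {rec:.1f}")
```

Accuracy comes out at about 0.998 even though not a single fraud is caught, which is why the metrics listed above are needed.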

Let's briefly introduce these metrics.

1. Recall, Precision and ROC AUC (Area Under the Curve)


Recall & Precision
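Precision is TP / (TP + FP), the share of flagged transactions that really are fraud, and recall is TP / (TP + FN), the share of real frauds that get flagged. As a quick arithmetic check, the sketch below plugs in the counts from the logistic-regression confusion matrix reported further down in this notebook ([[71055, 24], [20, 103]]).

```python
# Counts from the logistic-regression confusion matrix in this notebook:
# [[TN, FP], [FN, TP]] = [[71055, 24], [20, 103]]
tp, fp, fn = 103, 24, 20

precision = tp / (tp + fp)  # share of flagged transactions that are fraud
recall = tp / (tp + fn)     # share of actual frauds that were flagged

print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

These match the 0.81 precision and 0.84 recall in that model's classification report.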


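The ROC curve plots the TPR (y-axis) against the FPR (x-axis) as the decision threshold varies, showing how sensitive the TPR is to the FPR. AUC is the area under the ROC curve; it summarizes the trade-off between TPR and FPR in one number, and larger is better.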
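A minimal sketch with sklearn's roc_curve and auc on four toy scores (the data are made up for illustration): each distinct threshold yields one (FPR, TPR) point, and auc integrates the resulting curve with the trapezoidal rule. roc_auc_score gives the same number in one call.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc, roc_auc_score

# Toy labels and classifier scores (made up for illustration)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Each distinct threshold on y_score gives one (FPR, TPR) point
fpr, tpr, thresholds = roc_curve(y_true, y_score)
area = auc(fpr, tpr)  # trapezoidal area under the ROC curve

print(fpr, tpr)
print(area, roc_auc_score(y_true, y_score))  # both 0.75
```

One caveat that matters for this notebook: if hard 0/1 predictions are passed instead of scores, the ROC curve has a single interior point and the area reduces to (TPR + TNR) / 2.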

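Unlike the precision-recall curve, the ROC curve is most informative on balanced data. On a heavily imbalanced dataset such as this one, the huge number of true negatives keeps the FPR low, so the ROC curve can look good even when many of the flagged cases are false alarms.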
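A small synthetic sketch of that effect (the scores are invented for illustration): 10 positives all score above 950 of 1,000 negatives, yet 50 negatives still outrank them. ROC AUC rewards the ranking against the many easy negatives, while average precision exposes that only 1 in 6 flagged cases is positive.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Synthetic, heavily imbalanced scores (illustrative only):
# 1000 negatives spread evenly over [0, 1], 10 positives all scored 0.95
neg_scores = np.linspace(0, 1, 1000)
pos_scores = np.full(10, 0.95)
y_score = np.concatenate([neg_scores, pos_scores])
y_true = np.concatenate([np.zeros(1000, dtype=int), np.ones(10, dtype=int)])

roc = roc_auc_score(y_true, y_score)           # 950 of 1000 negatives rank below the positives
ap = average_precision_score(y_true, y_score)  # but 50 negatives rank above them
print(f"ROC AUC = {roc:.3f}, average precision = {ap:.3f}")
```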

Average precision

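First, the precision-recall curve. When evaluating a classifier, we usually look at how precision and recall change as the decision threshold varies. Raising recall usually costs precision; a good classifier keeps precision high as recall increases.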


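Average precision (AP) condenses the precision-recall curve into a single number describing the classifier's performance: it is the area under the curve, obtained by summing precision over all recall levels: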



$$\text{AP} = \sum_{k=1}^{N} p(k)\,\Delta r(k), \qquad \text{where } \Delta r(k) = r(k) - r(k-1) \text{ is the change in recall}$$
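The sum can be checked by hand. The sketch below (toy data, made up for illustration) computes the sum of p(k) Δr(k) from sklearn's precision_recall_curve output and compares it with average_precision_score. precision_recall_curve returns recall in decreasing order, hence the minus sign.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Toy labels and scores (made up for illustration)
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

precision, recall, _ = precision_recall_curve(y_true, y_score)
# recall is decreasing, so -diff(recall) gives the non-negative steps Δr(k)
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap_sklearn = average_precision_score(y_true, y_score)
print(ap_manual, ap_sklearn)  # both 0.8333...
```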

F1 score



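The general F-score is $F_\beta = (1+\beta^2)\dfrac{p \cdot r}{\beta^2 p + r}$, where $p$ is precision and $r$ is recall; setting $\beta=1$ gives the F1 score, the harmonic mean of precision and recall. The ideal F-score is 1, which is reached only when both precision and recall are high: if both equal 1, then ${\displaystyle 2\cdot {\frac {1 \cdot 1}{1+1}}=1}$, i.e. F-score = 1 (100%), meaning the classifier is perfect.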
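A short sketch of the F-beta formula. f_beta is a hypothetical helper written only for this illustration; its results are checked against sklearn's f1_score and fbeta_score on made-up labels. With β = 2, recall weighs more than precision, which may suit a fraud setting where missed frauds are costly.

```python
from sklearn.metrics import f1_score, fbeta_score, precision_score, recall_score

def f_beta(p, r, beta=1.0):
    """F-beta from precision p and recall r (hypothetical helper); beta > 1 favours recall."""
    b2 = beta ** 2
    return (1 + b2) * p * r / (b2 * p + r)

# Made-up labels: TP=3, FP=0, FN=1
y_true = [0, 1, 1, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 1]
p, r = precision_score(y_true, y_pred), recall_score(y_true, y_pred)  # 1.0, 0.75

print(f_beta(p, r), f1_score(y_true, y_pred))                     # both 6/7
print(f_beta(p, r, beta=2), fbeta_score(y_true, y_pred, beta=2))  # both 15/19
```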

Confusion matrix
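sklearn's confusion_matrix puts true classes in rows and predicted classes in columns, with labels in sorted order, so for a binary 0/1 problem it reads [[TN, FP], [FN, TP]]. A toy sketch with made-up labels:

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1]

cm = confusion_matrix(y_true, y_pred)
# Rows = true class, columns = predicted class:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = cm.ravel()
print(cm)
print(tn, fp, fn, tp)  # 1 1 1 2
```

Read this way, the logistic-regression matrix [[71055, 24], [20, 103]] later in the notebook means 24 false alarms and 20 missed frauds.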


def metrics_summary(true_label, prediction_prob, Threshold=0.5):
    """Print and plot the evaluation metrics.

    prediction_prob should hold predicted probabilities. If hard 0/1
    predictions are passed (as in the calls below), average precision and
    ROC AUC are computed from a single operating point.
    """
    from IPython.display import display
    # basically, sklearn provides all the functions for metrics
    average_precision = average_precision_score(true_label, prediction_prob)
    fpr, tpr, thresholds = roc_curve(true_label, prediction_prob)
    areaUnderROC = auc(fpr, tpr)
    prediction_int = prediction_prob > Threshold
    accuracy = accuracy_score(true_label, prediction_int)
    print(f'accuracy: {accuracy}')
    print(f"average_precision: {average_precision}")
    print(f'areaUnderROC--AUC: {areaUnderROC} \n')
    print(' '*20, 'classification_report')
    print('*'*60, "\n")
    print(classification_report(true_label, prediction_int))
    print(' '*20, 'confusion_matrix \n')
    print('*'*60, "\n")
    display(confusion_matrix(true_label, prediction_int))
    # precision-recall curve (left) and ROC curve (right)
    precision, recall, _ = precision_recall_curve(true_label, prediction_prob)
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(16, 9))
    ax1.step(recall, precision, color='k', alpha=0.7, where='post')
    ax1.fill_between(recall, precision, step='post', alpha=0.3, color='k')
    ax1.set_xlabel('Recall', fontname="Arial", fontsize=24)
    ax1.set_ylabel('Precision', fontname="Arial", fontsize=24)
    ax1.set_title(f'Precision-Recall curve: Average Precision = {average_precision:0.2f}',
                  fontsize=24, fontname="Arial")
    ax2.plot(fpr, tpr, color='r', lw=2, label='ROC curve')
    ax2.plot([0, 1], [0, 1], color='k', lw=2, linestyle='--')
    ax2.set_xlabel('False Positive Rate', fontname="Arial", fontsize=24)
    ax2.set_ylabel('True Positive Rate', fontname="Arial", fontsize=24)
    ax2.set_title(f'areaUnderROC = {areaUnderROC:0.2f}', fontsize=24, fontname="Arial")
    ax2.legend(loc="lower right", fontsize=24, fancybox=True)
    plt.show()



  1. Logistic regression
  2. lightGBM
  3. DNN




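# Normalize training and testing data
def scale_data(x_train, x_test=None):
    features_to_scale = x_train.columns
    scaler = pp.StandardScaler()
    # fit the scaler on the training set only
    x_train = pd.DataFrame(scaler.fit_transform(x_train[features_to_scale]),
                           index=x_train.index, columns=features_to_scale)
    # normalize the test set with the mean and std of the training set
    if x_test is not None:
        x_test = pd.DataFrame(scaler.transform(x_test[features_to_scale]),
                              index=x_test.index, columns=features_to_scale)
    return x_train, x_test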
# separate inputs and labels
def get_x_y(data=data_original):
    data_x = data.copy().drop(['Class', 'Time'], axis=1)
    data_y = data['Class'].copy()
    return data_x, data_y
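# split the train and test data (75/25, matching the shapes below)
def data_split(data_x, data_y):
    # stratify keeps the fraud ratio identical in both splits, which matters for imbalanced data
    x_train, x_test, y_train, y_test = \
    train_test_split(data_x, data_y, test_size=0.25, stratify=data_y)
    return x_train, x_test, y_train, y_test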
#put all together
def data_process(data=data_original):
    data_x, data_y = get_x_y(data)
    x_train, x_test, y_train, y_test \
    = data_split(data_x, data_y)
    #do not touch the test data by any means!!!!
    x_train, x_test = scale_data(x_train, x_test)
    return  x_train, x_test, y_train, y_test
x_train, x_test_original, y_train, y_test_original \
= data_process(data_original)
x_train.shape, x_test_original.shape, \
y_train.shape, y_test_original.shape
StandardScaler(copy=True, with_mean=True, with_std=True)
((213605, 29), (71202, 29), (213605,), (71202,))
print(f'No. of fraud in test dataset: {int(y_test_original.sum()):>8}')
No. of fraud in test dataset:      123
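The stratify argument in data_split is what keeps the rare class represented in every split. A toy sketch with synthetic labels (1,000 samples, 20 positives, arbitrary seed): a stratified 75/25 split puts exactly a quarter of the positives in the test set. This is consistent with the 123 test frauds above, a quarter of the 492 in the full dataset.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 1000 samples, 20 positives (2%)
y = np.zeros(1000, dtype=int)
y[:20] = 1
X = np.arange(1000).reshape(-1, 1)

# Stratified split: each class is divided 75/25 separately
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)
print(y_tr.sum(), y_te.sum())  # 15 5, both splits keep the 2% positive rate
```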


def build_model_1(x_train, y_train):
    # parameters to be tuned
    logitreg_parameters = {'C': np.power(10.0, np.arange(-9, 1)),
                           'solver': ('lbfgs', 'liblinear')}
    model_1 = LogisticRegression(
        # uses the values of y to automatically adjust weights
        class_weight='balanced',
        # reuse the solution of the previous call to fit as initialization
        warm_start=True,
        # maximum number of iterations taken for the solvers to converge
        max_iter=300,
        # so results can be reproduced
        random_state=2020)
    # 5-fold grid search, keeping the parameters with the best F1 score
    logitreg_grid = GridSearchCV(model_1, param_grid=logitreg_parameters,
                                 scoring='f1', n_jobs=1, cv=5)
    logitreg_grid.fit(x_train, y_train)
    return logitreg_grid
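class_weight='balanced' (the setting shown in the fitted estimator below) reweights each class by n_samples / (n_classes * count of that class). A sketch with synthetic labels that only mimic the training-set counts: 213,605 rows, of which 492 - 123 = 369 are fraud.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Synthetic labels mimicking the training-set counts: 369 frauds in 213,605 rows
y = np.zeros(213_605, dtype=int)
y[:369] = 1

weights = compute_class_weight(class_weight='balanced', classes=np.array([0, 1]), y=y)
# 'balanced' weight of each class = n_samples / (n_classes * count of that class)
manual = len(y) / (2 * np.bincount(y))

print(weights)                       # roughly [0.5, 289.4]
print(np.allclose(weights, manual))  # True
```

Each fraud therefore counts roughly 578 times as much as a normal transaction in the loss, which is how the logistic regression copes with the imbalance without resampling.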

model_1 = build_model_1(x_train, y_train)
LogisticRegression(C=1e-06, class_weight='balanced', dual=False,
                   fit_intercept=True, intercept_scaling=1, l1_ratio=None,
                   max_iter=300, multi_class='auto', n_jobs=None, penalty='l2',
                   random_state=2020, solver='lbfgs', tol=0.0001, verbose=0,
# column 1 of predict_proba holds the probability of class 1 (fraud)
y_pred_prob_test_1 = model_1.predict_proba(x_test_original)[:,1]
# number of fraud is 123 in test dataset
Threshold = 0.5
y_pred_int_test_1 = y_pred_prob_test_1 > Threshold
False    71075
True       127
dtype: int64
metrics_summary(y_test_original, y_pred_int_test_1)
accuracy: 0.9993820398303418
average_precision: 0.6794307533509736
areaUnderROC--AUC: 0.9185303607562729 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.81      0.84      0.82       123

    accuracy                           1.00     71202
   macro avg       0.91      0.92      0.91     71202
weighted avg       1.00      1.00      1.00     71202



array([[71055,    24],
       [   20,   103]], dtype=int64)


# prepare data: hold out 25% of the training set for validation
x_train_, x_cv, y_train_, y_cv = \
train_test_split(x_train, y_train, test_size=0.25, stratify=y_train)

def build_model_2(x_train, y_train, x_cv, y_cv ):
    # most of the parameters are default
    params_lightGB = {
    'task': 'train',
    'boosting': 'gbdt',
    'objective': 'binary',
    'metric': 'binary_logloss',
    'num_leaves': 10,
    'learning_rate': 0.01,
    'feature_fraction': 1.0,
    'bagging_fraction': 1.0,
    'bagging_freq': 0,
    'bagging_seed': 2018,
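    'verbose': -1,
    }
    lgb_train = lgb.Dataset(x_train, y_train)
    lgb_eval = lgb.Dataset(x_cv, y_cv, reference=lgb_train)
    # num_boost_round and the early-stopping patience are assumed values;
    # early stopping sets model_2.best_iteration, which is used later
    model_2 = lgb.train(params_lightGB, lgb_train,
                        num_boost_round=2000,
                        valid_sets=[lgb_eval],
                        callbacks=[lgb.early_stopping(200)])
    return model_2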
x_train_.shape, y_train_.shape, x_cv.shape, y_cv.shape
((160203, 29), (160203,), (53402, 29), (53402,))
model_2 = build_model_2(x_train_, y_train_, x_cv, y_cv)
y_pred_prob_test_2 = model_2.predict(x_test_original)
y_pred_int_test_2 = y_pred_prob_test_2 > Threshold
[LightGBM] [Warning] objective is set=binary, application=binary will be ignored. Current value: objective=binary
False    71099
True       103
dtype: int64
metrics_summary(y_test_original, y_pred_int_test_2)
accuracy: 0.999522485323446
average_precision: 0.7278241471837373
areaUnderROC--AUC: 0.890194661453642 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.93      0.78      0.85       123

    accuracy                           1.00     71202
   macro avg       0.97      0.89      0.92     71202
weighted avg       1.00      1.00      1.00     71202



array([[71072,     7],
       [   27,    96]], dtype=int64)

As expected, accuracy is very high, but recall is only 0.78 (96 of the 123 frauds), which is mediocre.


callbacks = [EarlyStopping(monitor='loss', patience=3), \
                 ReduceLROnPlateau(monitor='val_loss', factor=0.2, \
                                   patience=3, min_lr=0.001)]
def build_model_3(x_train, y_train, x_cv, y_cv, input_dim=29):
    model_3 = Sequential([
                # keras.Input replaces the deprecated input_dim argument
                keras.Input(shape=(input_dim,)),
                Dense(units=32, activation='relu'),
                Dense(units=16, activation='relu'),
                Dense(units=8, activation='relu'),
                # Dense(units=4, activation='relu'),
                Dense(units=1, activation='sigmoid'),])
    model_3.compile(optimizer='adam',
                    loss='binary_crossentropy',
                    metrics=['accuracy'])
    model_3.fit(x_train, y_train,
                validation_data=(x_cv, y_cv),
                batch_size=64,
                epochs=50,
                callbacks=callbacks)
    return model_3
model_3 = build_model_3(x_train_, y_train_, \
x_cv, y_cv, input_dim=29)
y_pred_prob_test_3 = model_3.predict(x_test_original)
y_pred_int_test_3 = y_pred_prob_test_3 > Threshold
Epoch 1/50
2504/2504 [==============================] - 4s 1ms/step - loss: 0.0589 - accuracy: 0.9918 - val_loss: 0.0042 - val_accuracy: 0.9993
Epoch 2/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0052 - accuracy: 0.9990 - val_loss: 0.0038 - val_accuracy: 0.9994
Epoch 3/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0046 - accuracy: 0.9989 - val_loss: 0.0033 - val_accuracy: 0.9993
Epoch 4/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0034 - accuracy: 0.9992 - val_loss: 0.0030 - val_accuracy: 0.9994
Epoch 5/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0034 - accuracy: 0.9992 - val_loss: 0.0029 - val_accuracy: 0.9994
Epoch 6/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0031 - accuracy: 0.9992 - val_loss: 0.0032 - val_accuracy: 0.9993
Epoch 7/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0030 - accuracy: 0.9992 - val_loss: 0.0030 - val_accuracy: 0.9994
Epoch 8/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0024 - accuracy: 0.9994 - val_loss: 0.0030 - val_accuracy: 0.9994
Epoch 9/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0026 - accuracy: 0.9993 - val_loss: 0.0032 - val_accuracy: 0.9994
Epoch 10/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0026 - accuracy: 0.9992 - val_loss: 0.0030 - val_accuracy: 0.9995
Epoch 11/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0022 - accuracy: 0.9993 - val_loss: 0.0039 - val_accuracy: 0.9993
Epoch 12/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0028 - accuracy: 0.9992 - val_loss: 0.0038 - val_accuracy: 0.9995
Epoch 13/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0021 - accuracy: 0.9993 - val_loss: 0.0035 - val_accuracy: 0.9994
Epoch 14/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0021 - accuracy: 0.9993 - val_loss: 0.0033 - val_accuracy: 0.9994
Epoch 15/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0022 - accuracy: 0.9993 - val_loss: 0.0033 - val_accuracy: 0.9993
Epoch 16/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0023 - accuracy: 0.9992 - val_loss: 0.0033 - val_accuracy: 0.9993
Epoch 17/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0022 - accuracy: 0.9993 - val_loss: 0.0036 - val_accuracy: 0.9993
Epoch 18/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0020 - accuracy: 0.9994 - val_loss: 0.0034 - val_accuracy: 0.9994
Epoch 19/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0019 - accuracy: 0.9993 - val_loss: 0.0034 - val_accuracy: 0.9994
Epoch 20/50
2504/2504 [==============================] - 4s 2ms/step - loss: 0.0020 - accuracy: 0.9993 - val_loss: 0.0036 - val_accuracy: 0.9993
Epoch 21/50
2504/2504 [==============================] - 4s 1ms/step - loss: 0.0020 - accuracy: 0.9993 - val_loss: 0.0039 - val_accuracy: 0.9994
Epoch 22/50
2504/2504 [==============================] - 4s 1ms/step - loss: 0.0017 - accuracy: 0.9995 - val_loss: 0.0040 - val_accuracy: 0.9994
Epoch 23/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0017 - accuracy: 0.9993 - val_loss: 0.0038 - val_accuracy: 0.9994
Epoch 24/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0018 - accuracy: 0.9995 - val_loss: 0.0042 - val_accuracy: 0.9994
Epoch 25/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0015 - accuracy: 0.9994 - val_loss: 0.0038 - val_accuracy: 0.9994
Epoch 26/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0018 - accuracy: 0.9994 - val_loss: 0.0042 - val_accuracy: 0.9993
Epoch 27/50
2504/2504 [==============================] - 4s 2ms/step - loss: 0.0015 - accuracy: 0.9994 - val_loss: 0.0047 - val_accuracy: 0.9994
False    71091
True       111
dtype: int64
metrics_summary(y_test_original, y_pred_int_test_3)
accuracy: 0.9994943962248252
average_precision: 0.7182012748525336
areaUnderROC--AUC: 0.9023546112724454 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.89      0.80      0.85       123

    accuracy                           1.00     71202
   macro avg       0.95      0.90      0.92     71202
weighted avg       1.00      1.00      1.00     71202



array([[71067,    12],
       [   24,    99]], dtype=int64)



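Stacking essentially adds new features to the original data, where the new features are the predictions of the first-level models. So we start with this feature engineering step and stack all the predictions onto the data.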

def data_stack(x, y, m_1=model_1, m_2=model_2, m_3=model_3):
    """Append the predictions of the three first-level models to x.

    x: features
    y: labels (only the index is used)
    m_1, m_2, m_3: the 3 first-level models
    """
    # All required parameters must be placed before any default arguments.
    pred_1 = m_1.predict_proba(x)[:, 1]
    pred_2 = m_2.predict(x, num_iteration=m_2.best_iteration)
    pred_3 = m_3.predict(x).reshape(x.shape[0])  # to 1D shape
    # build a container holding all the predictions from the 3 models
    pred_all = pd.DataFrame({'pred_1': pred_1,
                             'pred_2': pred_2,
                             'pred_3': pred_3},
                            index=y.index).astype(float)
    # final data is the merge of the features and all the predictions
    x_pred = x.merge(pred_all, left_index=True, right_index=True)
    return x_pred
x_train_stack = data_stack(x_train, y_train)
(213605, 32)
x_test_stack = data_stack(x_test_original, y_test_original)
(71202, 32)

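The predictions of the first-level models turn out to be highly correlated with one another, which is not surprising. Information also leaks from the first-level models into the second-level model, because both share the same training data. As long as the test set stays untouched, we accept this and would rather let more information flow into the second-level model in the hope of better predictions.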
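The leakage described above arises because each first-level model predicts on rows it was trained on, so its training-set predictions tend to look more confident than its test-set predictions. A common alternative, which this notebook does not use, is to build the stacked feature from out-of-fold predictions. The sketch below does this with sklearn's cross_val_predict on synthetic data; the data, the single logistic-regression base model and the seeds are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict, train_test_split

# Synthetic imbalanced data standing in for the fraud features (illustrative only)
X, y = make_classification(n_samples=2000, n_features=10, weights=[0.95, 0.05],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y,
                                          random_state=0)

base = LogisticRegression(max_iter=1000)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# Out-of-fold probabilities: every training row is scored by a model that never saw it
oof = cross_val_predict(base, X_tr, y_tr, cv=cv, method='predict_proba')[:, 1]
X_tr_stack = np.column_stack([X_tr, oof])

# For the test set, refit the base model on all training rows and append its scores
base.fit(X_tr, y_tr)
X_te_stack = np.column_stack([X_te, base.predict_proba(X_te)[:, 1]])

# Second-level model trained on the stacked features
meta = LogisticRegression(max_iter=1000).fit(X_tr_stack, y_tr)
print(X_tr_stack.shape, X_te_stack.shape)  # (1500, 11) (500, 11)
```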


#normalize training and testing data
x_train_stack, x_test_stack = scale_data(x_train_stack,  x_test_stack)
#split the traning data to train and validation
x_train_stack_, x_cv_stack, y_train_, y_cv_ = \
train_test_split(x_train_stack, y_train, test_size=0.25, stratify=y_train)
# stratify samples each class in the same proportion as in the full data
x_train_stack_.shape, x_cv_stack.shape, y_train_.shape, y_cv_.shape
StandardScaler(copy=True, with_mean=True, with_std=True)
((160203, 32), (53402, 32), (160203,), (53402,))



Model 2 (LightGBM) as the secondary model

model_2_stack = build_model_2(x_train_stack_, y_train_, x_cv_stack, y_cv_)
y_pred_prob_test_2_stack = model_2_stack.predict(x_test_stack)
y_pred_int_test_2_stack = y_pred_prob_test_2_stack > Threshold
[LightGBM] [Warning] objective is set=binary, application=binary will be ignored. Current value: objective=binary
False    71080
True       122
dtype: int64
metrics_summary(y_test_original, y_pred_int_test_2_stack)
accuracy: 0.9994522625768939
average_precision: 0.7072647641036279
areaUnderROC--AUC: 0.918565532888689 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.84      0.84      0.84       123

    accuracy                           1.00     71202
   macro avg       0.92      0.92      0.92     71202
weighted avg       1.00      1.00      1.00     71202



array([[71060,    19],
       [   20,   103]], dtype=int64)

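Compared with the first-level LightGBM, recall rises from 0.78 to 0.84 (96 to 103 frauds caught), but precision falls from 0.93 to 0.84, so F1 is essentially unchanged (0.85 before, 0.84 after) and average precision dips slightly (0.728 before, 0.707 after).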

Model 3 DNN as the secondary model

model_3_stack = build_model_3(x_train_stack_, y_train_, \
                        x_cv_stack, y_cv_, input_dim=32)
y_pred_prob_test_3_stack = model_3_stack.predict(x_test_stack)
y_pred_int_test_3_stack = y_pred_prob_test_3_stack > Threshold
Epoch 1/50
2504/2504 [==============================] - 4s 2ms/step - loss: 0.0854 - accuracy: 0.9905 - val_loss: 0.0062 - val_accuracy: 0.9994
Epoch 2/50
2504/2504 [==============================] - 4s 2ms/step - loss: 0.0013 - accuracy: 0.9996 - val_loss: 0.0075 - val_accuracy: 0.9994
Epoch 3/50
2504/2504 [==============================] - 4s 1ms/step - loss: 8.7023e-04 - accuracy: 0.9996 - val_loss: 0.0108 - val_accuracy: 0.9994
Epoch 4/50
2504/2504 [==============================] - 4s 1ms/step - loss: 7.4676e-04 - accuracy: 0.9996 - val_loss: 0.0134 - val_accuracy: 0.9994
Epoch 5/50
2504/2504 [==============================] - 3s 1ms/step - loss: 6.2790e-04 - accuracy: 0.9998 - val_loss: 0.0173 - val_accuracy: 0.9994
Epoch 6/50
2504/2504 [==============================] - 4s 1ms/step - loss: 7.0444e-04 - accuracy: 0.9997 - val_loss: 0.0176 - val_accuracy: 0.9995
Epoch 7/50
2504/2504 [==============================] - 4s 1ms/step - loss: 5.0248e-04 - accuracy: 0.9998 - val_loss: 0.0173 - val_accuracy: 0.9994
Epoch 8/50
2504/2504 [==============================] - 3s 1ms/step - loss: 5.9450e-04 - accuracy: 0.9997 - val_loss: 0.0203 - val_accuracy: 0.9994
Epoch 9/50
2504/2504 [==============================] - 4s 1ms/step - loss: 6.6983e-04 - accuracy: 0.9997 - val_loss: 0.0220 - val_accuracy: 0.9994
Epoch 10/50
2504/2504 [==============================] - 3s 1ms/step - loss: 6.0335e-04 - accuracy: 0.9996 - val_loss: 0.0248 - val_accuracy: 0.9994
Epoch 11/50
2504/2504 [==============================] - 3s 1ms/step - loss: 4.5892e-04 - accuracy: 0.9998 - val_loss: 0.0227 - val_accuracy: 0.9994
Epoch 12/50
2504/2504 [==============================] - 4s 2ms/step - loss: 5.6076e-04 - accuracy: 0.9996 - val_loss: 0.0222 - val_accuracy: 0.9994
Epoch 13/50
2504/2504 [==============================] - 4s 1ms/step - loss: 4.2247e-04 - accuracy: 0.9998 - val_loss: 0.0263 - val_accuracy: 0.9994
Epoch 14/50
2504/2504 [==============================] - 4s 1ms/step - loss: 4.1276e-04 - accuracy: 0.9998 - val_loss: 0.0238 - val_accuracy: 0.9994
False    71091
True       111
dtype: int64
metrics_summary(y_test_original, y_pred_int_test_3_stack)
accuracy: 0.999522485323446
average_precision: 0.7327627814641402
areaUnderROC--AUC: 0.9064266863493351 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.90      0.81      0.85       123

    accuracy                           1.00     71202
   macro avg       0.95      0.91      0.93     71202
weighted avg       1.00      1.00      1.00     71202



array([[71068,    11],
       [   23,   100]], dtype=int64)

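Compared with the stacked LightGBM, precision is higher (0.90 vs 0.84) but recall is lower (0.81 vs 0.84). Relative to the first-level DNN, both improve slightly (precision 0.89 to 0.90, recall 0.80 to 0.81).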

Model 1 logistic regression as the secondary model

model_1_stack = build_model_1(x_train_stack, y_train)


y_pred_prob_test_1_stack = model_1_stack.predict_proba(x_test_stack)[:,1]  # probability of class 1 (fraud)
y_pred_int_test_1_stack = y_pred_prob_test_1_stack > Threshold
False    71087
True       115
dtype: int64
metrics_summary(y_test_original, y_pred_int_test_1_stack)
accuracy: 0.9994943962248252
average_precision: 0.7214825396465117
areaUnderROC--AUC: 0.9104706237202921 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.88      0.82      0.85       123

    accuracy                           1.00     71202
   macro avg       0.94      0.91      0.92     71202
weighted avg       1.00      1.00      1.00     71202



array([[71065,    14],
       [   22,   101]], dtype=int64)

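Stacking gives at least some improvement over the first-level logistic regression (F1 from 0.82 to 0.85, average precision from 0.68 to 0.72), and the stacked logistic regression is no worse than the stacked LightGBM and DNN.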


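Meta labeling changes both the inputs and the labels: the primary model's prediction becomes a new input feature, and the label becomes whether that prediction was a correct fraud call. The earlier code therefore needs some adjustments.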

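def data_meta(id, x, y, model):
    """Build meta-labeling inputs and labels from a primary model."""
    # get the prediction from the primary (first-level) model
    Threshold = 0.5
    pred_prob_meta = model.predict_proba(x)[:, 1]
    pred_prob_meta = pd.Series(pred_prob_meta, index=x.index,
                               name=f'pred_meta_{id}')
    pred_int_meta = pred_prob_meta > Threshold
    # meta label: 1 only where the primary model flagged fraud and was right
    y_meta = pd.Series(y.astype(bool) & pred_int_meta,
                       name=f'y_train_meta_{id}').astype(int)
    # the primary model's call becomes an extra input feature
    x_meta = x.join(pred_int_meta)
    return x_meta, y_meta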
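A toy sketch of the label construction in data_meta (labels and predictions are made up): the meta label is 1 only where the primary model said fraud and the transaction really was fraud, so the secondary model learns whether to trust each positive call. The fraud at index 2, which the primary model missed, gets meta label 0, so meta labeling can only filter the primary model's positives, not add new ones.

```python
import pandas as pd

# Made-up true labels and primary-model calls (True = flagged as fraud)
y = pd.Series([1, 0, 1, 0, 1, 0])
primary = pd.Series([True, True, False, False, True, True])

# Meta label: 1 only where the primary model flagged fraud AND it was fraud
y_meta = (y.astype(bool) & primary).astype(int)
print(y_meta.tolist())  # [1, 0, 0, 0, 1, 0]
```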

1st model: logreg and 2nd model: LightGBM

x_train_meta_1, y_train_meta_1 = \
data_meta(1, x_train, y_train, model_1)
x_train_meta_1.shape, y_train_meta_1.shape
((213605, 30), (213605,))
plot_corr(x_train_meta_1, y_train_meta_1);

As the plot shows, the prediction from the first-level model is strongly correlated with the meta label.

x_test_meta_1, y_test_meta_1 = \
data_meta(1, x_test_original, y_test_original, model_1)
x_test_meta_1.shape, y_test_meta_1.shape
((71202, 30), (71202,))
x_train_meta_1, x_test_meta_1 = scale_data( \
                                x_train_meta_1, x_test_meta_1)
StandardScaler(copy=True, with_mean=True, with_std=True)
x_train_meta_1_, x_cv_meta_1, y_train_meta_1_, y_cv_meta_1 = \
train_test_split(x_train_meta_1, y_train_meta_1, test_size=0.25,
                 stratify=y_train_meta_1)
# stratify samples each class in the same proportion as in the full data
x_train_meta_1_.shape, x_cv_meta_1.shape, y_train_meta_1_.shape,  y_cv_meta_1.shape
((160203, 30), (53402, 30), (160203,), (53402,))
model_2_meta_1 = build_model_2( \
    x_train_meta_1_, y_train_meta_1_, x_cv_meta_1, y_cv_meta_1)
y_pred_prob_test_2_meta_1 = model_2_meta_1.predict(x_test_meta_1)
y_pred_int_test_2_meta_1 = y_pred_prob_test_2_meta_1 > Threshold
[LightGBM] [Warning] objective is set=binary, application=binary will be ignored. Current value: objective=binary
False    71097
True       105
dtype: int64

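After getting the predictions from the meta model, we combine them with the first-level model's predictions: a transaction is flagged as fraud only if both models flag it.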
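Because the final call is the logical AND of the two predictions, the final positives are a subset of the primary model's positives: recall can only stay the same or fall, and any gain has to come from the meta model removing false positives. A toy sketch with made-up predictions:

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

# Made-up labels and predictions for illustration
y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])
primary = np.array([1, 1, 0, 1, 1, 0, 0, 1], dtype=bool)    # 3 TP, 2 FP, 1 FN
secondary = np.array([1, 1, 1, 0, 1, 0, 1, 0], dtype=bool)  # meta model: trust this call?

# Final call: fraud only if both the primary and the meta model say so
final = primary & secondary

for name, pred in [('primary', primary), ('final', final)]:
    p = precision_score(y_true, pred)
    r = recall_score(y_true, pred)
    print(f'{name}: precision={p:.2f}, recall={r:.2f}')
```

Here precision rises from 0.60 to 0.67 while recall falls from 0.75 to 0.50, because the meta model vetoed one false positive but also one true positive.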

final_pred_2_meta_1 = y_pred_int_test_2_meta_1 &  y_pred_int_test_1
False    71097
True       105
dtype: int64
metrics_summary(y_test_original, final_pred_2_meta_1)
accuracy: 0.9996348417179293
average_precision: 0.7901657357952433
areaUnderROC--AUC: 0.9105409679851241 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.96      0.82      0.89       123

    accuracy                           1.00     71202
   macro avg       0.98      0.91      0.94     71202
weighted avg       1.00      1.00      1.00     71202



array([[71075,     4],
       [   22,   101]], dtype=int64)

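Accuracy barely moves, but broken down by metric the gain is clear: compared with the logistic regression alone, precision rises from 0.81 to 0.96 while recall only slips from 0.84 to 0.82, so F1 improves from 0.82 to 0.89 and average precision from 0.68 to 0.79.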

1st model: logreg and 2nd model: DNN

#if you receive an error message, try to run the data process again.
model_3_meta_1 = build_model_3( \
    x_train_meta_1_, y_train_meta_1_, \
    x_cv_meta_1, y_cv_meta_1, input_dim=30)
y_pred_prob_test_3_meta_1 = model_3_meta_1.predict(x_test_meta_1)
y_pred_int_test_3_meta_1 = y_pred_prob_test_3_meta_1 > Threshold
Epoch 1/50
2504/2504 [==============================] - 4s 1ms/step - loss: 0.1526 - accuracy: 0.9269 - val_loss: 6.9066e-04 - val_accuracy: 0.9999
Epoch 2/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0016 - accuracy: 0.9996 - val_loss: 7.1099e-04 - val_accuracy: 0.9999
Epoch 3/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0015 - accuracy: 0.9995 - val_loss: 6.3856e-04 - val_accuracy: 0.9999
Epoch 4/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0011 - accuracy: 0.9996 - val_loss: 9.6105e-04 - val_accuracy: 0.9995
Epoch 5/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0011 - accuracy: 0.9996 - val_loss: 7.7047e-04 - val_accuracy: 0.9998
Epoch 6/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0014 - accuracy: 0.9995 - val_loss: 6.4772e-04 - val_accuracy: 0.9999
Epoch 7/50
2504/2504 [==============================] - 4s 1ms/step - loss: 0.0011 - accuracy: 0.9995 - val_loss: 9.2610e-04 - val_accuracy: 0.9995
Epoch 8/50
2504/2504 [==============================] - 4s 2ms/step - loss: 0.0013 - accuracy: 0.9996 - val_loss: 8.2152e-04 - val_accuracy: 0.9997
Epoch 9/50
2504/2504 [==============================] - 3s 1ms/step - loss: 8.9805e-04 - accuracy: 0.9997 - val_loss: 5.9031e-04 - val_accuracy: 0.9998
Epoch 10/50
2504/2504 [==============================] - 3s 1ms/step - loss: 7.6331e-04 - accuracy: 0.9997 - val_loss: 6.7479e-04 - val_accuracy: 0.9999
Epoch 11/50
2504/2504 [==============================] - 3s 1ms/step - loss: 7.1836e-04 - accuracy: 0.9997 - val_loss: 7.4169e-04 - val_accuracy: 0.9999
Epoch 12/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0011 - accuracy: 0.9996 - val_loss: 7.8806e-04 - val_accuracy: 0.9998
Epoch 13/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0011 - accuracy: 0.9997 - val_loss: 4.9516e-04 - val_accuracy: 0.9999
Epoch 14/50
2504/2504 [==============================] - 3s 1ms/step - loss: 9.1622e-04 - accuracy: 0.9997 - val_loss: 4.2813e-04 - val_accuracy: 0.9999
Epoch 15/50
2504/2504 [==============================] - 3s 1ms/step - loss: 8.0975e-04 - accuracy: 0.9997 - val_loss: 3.3976e-04 - val_accuracy: 0.9999
False    71092
True       110
dtype: int64
# combine the  meta prediction with primary prediction
final_pred_3_meta_1 = y_pred_int_test_3_meta_1.flatten() & y_pred_int_test_1
metrics_summary(y_test_original, final_pred_3_meta_1)
accuracy: 0.9995084407741356
average_precision: 0.724727313085889
areaUnderROC--AUC: 0.9023616456989286 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.90      0.80      0.85       123

    accuracy                           1.00     71202
   macro avg       0.95      0.90      0.92     71202
weighted avg       1.00      1.00      1.00     71202



array([[71068,    11],
       [   24,    99]], dtype=int64)

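Compared with the logistic regression alone, precision is higher (0.90 vs 0.81), but at the cost of somewhat lower recall (0.80 vs 0.84); F1 rises from 0.82 to 0.85, less than with LightGBM as the meta model (0.89).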

3. 1st model: logreg + LightGBM and 2nd model: DNN

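def data_meta_2(id, x, y, m_1, m_2):
    '''
    id: the id of new columns
    x: input features
    y: labels
    m_1: model 1, here logreg
    m_2: model 2
    '''
    pred_prob_meta_1 = m_1.predict_proba(x)[:,1]
    # keep x's index so the join below aligns row by row
    # (column names are illustrative; the original arguments were cut off)
    pred_prob_meta_1 = pd.Series(pred_prob_meta_1, index=x.index,
                                 name=f'pred_int_{id}_meta_1')
    pred_int_meta_1 = pred_prob_meta_1 > Threshold
    pred_prob_meta_2 = m_2.predict(x)
    # a DNN returns a 2D (n, 1) array, which must be flattened to 1D
    # before it can be combined with the other predictions
    pred_prob_meta_2 = pd.Series(pred_prob_meta_2.flatten(), index=x.index,
                                 name=f'pred_int_{id}_meta_2')
    pred_int_meta_2 = pred_prob_meta_2 > Threshold
    # meta label: positive only where the true label AND both primary models are positive
    y_meta = pd.Series(y & pred_int_meta_1 & pred_int_meta_2,
                       name=f'y_meta_{id}')
    x_meta = x.join(pred_int_meta_1).join(pred_int_meta_2)
    return x_meta, y_meta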
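The logic of data_meta_2 can be checked on a small synthetic dataset. This is a sketch under assumptions: make_meta_data, the column names pred_m1/pred_m2, the 0.5 threshold and the two scikit-learn models are illustrative stand-ins for the notebook's Threshold, model_1 and model_2, and both models use predict_proba here instead of the DNN's predict:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

THRESHOLD = 0.5  # assumption: stands in for the notebook's global `Threshold`

def make_meta_data(x, y, m_1, m_2, threshold=THRESHOLD):
    """Hypothetical helper: append both primary models' 0/1 predictions as new
    columns, and build the meta label (1 only if y AND both primaries are positive)."""
    pred_1 = pd.Series(m_1.predict_proba(x)[:, 1] > threshold, index=x.index, name="pred_m1")
    pred_2 = pd.Series(m_2.predict_proba(x)[:, 1] > threshold, index=x.index, name="pred_m2")
    y_meta = (y.astype(bool) & pred_1 & pred_2).astype(int).rename("y_meta")
    x_meta = x.join(pred_1).join(pred_2)
    return x_meta, y_meta

# Imbalanced synthetic data (~10% positives)
X, y = make_classification(n_samples=500, n_features=5, weights=[0.9], random_state=0)
x = pd.DataFrame(X, columns=[f"V{i}" for i in range(1, 6)])
y = pd.Series(y, name="Class")

m_1 = LogisticRegression().fit(x, y)
m_2 = DecisionTreeClassifier(max_depth=3, random_state=0).fit(x, y)

x_meta, y_meta = make_meta_data(x, y, m_1, m_2)
print(x_meta.shape)           # 5 original features + 2 prediction columns
print(y_meta.value_counts())  # meta positives are a subset of the true positives
```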

# meta_1_2: meta data from model 1 and model 2
x_train_meta_1_2, y_train_meta_1_2 = \
data_meta_2(1, x_train, y_train, model_1, model_2)
x_test_meta_1_2, y_test_meta_1_2 = \
data_meta_2(1, x_test_original, y_test_original, model_1, model_2)

x_train_meta_1_2, x_test_meta_1_2 = \
scale_data(x_train_meta_1_2, x_test_meta_1_2)

# stratify: split so that each class keeps the same proportion as in the full data
x_train_meta_1_2_, x_cv_meta_1_2, y_train_meta_1_2_, y_cv_meta_1_2 = \
train_test_split(x_train_meta_1_2, y_train_meta_1_2,
                 test_size=0.25, stratify=y_train_meta_1_2)
x_train_meta_1_2_.shape, x_cv_meta_1_2.shape, \
y_train_meta_1_2_.shape,  y_cv_meta_1_2.shape
((160203, 31), (53402, 31), (160203,), (53402,))
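Stratified splitting matters here because the meta labels are even rarer than the original fraud labels. A minimal sketch on synthetic data (the 0.2% positive rate is an assumption chosen to resemble the fraud data) showing that stratify= keeps the class ratio the same in both parts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic, highly imbalanced labels (~0.2% positives)
y = (rng.random(100_000) < 0.002).astype(int)
X = rng.normal(size=(y.size, 3))

# stratify=y makes both parts keep (almost exactly) the overall positive rate
X_tr, X_cv, y_tr, y_cv = train_test_split(X, y, test_size=0.25, stratify=y, random_state=0)

print(f"all: {y.mean():.5f}  train: {y_tr.mean():.5f}  cv: {y_cv.mean():.5f}")
```

Without stratify, a random 25% slice of a few hundred positives can easily end up with noticeably more or fewer of them, which makes validation scores on rare classes noisy.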
model_3_meta_1_2 = build_model_3( \
    x_train_meta_1_2_, y_train_meta_1_2_, \
    x_cv_meta_1_2, y_cv_meta_1_2, input_dim=31)
y_pred_prob_test_3_meta_1_2 = model_3_meta_1_2.predict(x_test_meta_1_2)
y_pred_int_test_3_meta_1_2 = y_pred_prob_test_3_meta_1_2 > Threshold
Epoch 1/50
2504/2504 [==============================] - 4s 1ms/step - loss: 0.0786 - accuracy: 0.9808 - val_loss: 9.4593e-04 - val_accuracy: 0.9998
Epoch 2/50
2504/2504 [==============================] - 3s 1ms/step - loss: 0.0013 - accuracy: 0.9993 - val_loss: 3.9711e-04 - val_accuracy: 0.9999
Epoch 3/50
2504/2504 [==============================] - 3s 1ms/step - loss: 9.4632e-04 - accuracy: 0.9995 - val_loss: 3.2985e-04 - val_accuracy: 0.9999
Epoch 4/50
2504/2504 [==============================] - 3s 1ms/step - loss: 8.1926e-04 - accuracy: 0.9996 - val_loss: 3.5339e-04 - val_accuracy: 0.9999
Epoch 5/50
2504/2504 [==============================] - 3s 1ms/step - loss: 6.5769e-04 - accuracy: 0.9996 - val_loss: 3.6725e-04 - val_accuracy: 0.9999
Epoch 6/50
2504/2504 [==============================] - 3s 1ms/step - loss: 4.6654e-04 - accuracy: 0.9997 - val_loss: 2.6634e-04 - val_accuracy: 0.9999
Epoch 7/50
2504/2504 [==============================] - 3s 1ms/step - loss: 6.2621e-04 - accuracy: 0.9997 - val_loss: 3.1632e-04 - val_accuracy: 0.9999
Epoch 8/50
2504/2504 [==============================] - 3s 1ms/step - loss: 6.7498e-04 - accuracy: 0.9997 - val_loss: 2.9157e-04 - val_accuracy: 0.9999
Epoch 9/50
2504/2504 [==============================] - 3s 1ms/step - loss: 6.6406e-04 - accuracy: 0.9997 - val_loss: 2.7021e-04 - val_accuracy: 0.9999
False    71102
True       100
dtype: int64
# combine the meta prediction with the primary predictions
final_pred_3_meta_1_2 = \
y_pred_int_test_3_meta_1_2.flatten() & \
y_pred_int_test_1 & y_pred_int_test_2
False    71102
True       100
dtype: int64


metrics_summary(y_test_original, final_pred_3_meta_1_2)
accuracy: 0.9995365298727564
average_precision: 0.7341330847790655
areaUnderROC--AUC: 0.8861436896562018 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.95      0.77      0.85       123

    accuracy                           1.00     71202
   macro avg       0.97      0.89      0.93     71202
weighted avg       1.00      1.00      1.00     71202



array([[71074,     5],
       [   28,    95]], dtype=int64)

Overall the result is quite good, but recall is still slightly low (0.77).

4. 1st model: logreg + DNN and 2nd model: LightGBM

Since LightGBM seems to work somewhat better as the 2nd model, let's try that combination.

# meta_1_3: meta data from model 1 and model 3
# process the train dataset
x_train_meta_1_3, y_train_meta_1_3 = \
data_meta_2(1, x_train, y_train, model_1, model_3)
#meta_1_3: meta data from 1st model and 3rd model 
#process the test dataset
x_test_meta_1_3, y_test_meta_1_3 = \
data_meta_2(1, x_test_original, y_test_original, model_1, model_3)
#normalize the dataset
x_train_meta_1_3, x_test_meta_1_3 = \
scale_data(x_train_meta_1_3, x_test_meta_1_3)
#do a train, validation split
x_train_meta_1_3_, x_cv_meta_1_3, y_train_meta_1_3_, y_cv_meta_1_3 = \
train_test_split(x_train_meta_1_3, y_train_meta_1_3,
                 test_size=0.25, stratify=y_train_meta_1_3)
model_2_meta_1_3 = build_model_2( \
    x_train_meta_1_3_, y_train_meta_1_3_, \
    x_cv_meta_1_3, y_cv_meta_1_3)
[LightGBM] [Warning] objective is set=binary, application=binary will be ignored. Current value: objective=binary
y_pred_prob_test_2_meta_1_3 = model_2_meta_1_3.predict(x_test_meta_1_3)
y_pred_int_test_2_meta_1_3 = y_pred_prob_test_2_meta_1_3 > Threshold
# combine the meta prediction with the primary predictions
final_pred_2_meta_1_3 = \
y_pred_int_test_2_meta_1_3 & \
y_pred_int_test_1 & y_pred_int_test_3.flatten()
False    71100
True       102
dtype: int64
metrics_summary(y_test_original, final_pred_2_meta_1_3)
accuracy: 0.9995646189713772
average_precision: 0.7503253049423608
areaUnderROC--AUC: 0.8942737709570148 


              precision    recall  f1-score   support

           0       1.00      1.00      1.00     71079
           1       0.95      0.79      0.86       123

    accuracy                           1.00     71202
   macro avg       0.98      0.89      0.93     71202
weighted avg       1.00      1.00      1.00     71202



array([[71074,     5],
       [   26,    97]], dtype=int64)

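Precision reaches 95% and recall 79% (97 of 123 frauds caught, with only 5 false alarms). This gives the highest F1 score (0.86) and average precision (0.750) of the combinations above, although recall is still a little below the 0.80 obtained when logreg alone was the primary model.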
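The comparison can be checked directly from the three test-set confusion matrices printed above. A short sketch (the dictionary keys are descriptive names only) that recomputes precision, recall and F1 = 2TP / (2TP + FP + FN):

```python
# (TN, FP, FN, TP) read off the three test-set confusion matrices above
confusions = {
    "logreg | DNN meta": (71068, 11, 24, 99),
    "logreg + LightGBM | DNN meta": (71074, 5, 28, 95),
    "logreg + DNN | LightGBM meta": (71074, 5, 26, 97),
}

def scores(tn, fp, fn, tp):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

for name, cm in confusions.items():
    p, r, f1 = scores(*cm)
    print(f"{name:30s} precision={p:.3f} recall={r:.3f} f1={f1:.3f}")

best = max(confusions, key=lambda k: scores(*confusions[k])[2])
print("best F1:", best)
```

The LightGBM meta model has the best F1, while the logreg-only primary keeps the highest recall.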


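Stacking and meta labeling work somewhat like feature engineering: they add new features to the existing training set. But are the new features really more important than the original ones? This is worth analyzing, and LightGBM's feature_importance() method shows how important each feature is.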

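def plot_feature_importance(model, X, importance_type='split'):
    # 'split': total number of times the feature is used to split, across all trees
    # 'gain': total gain contributed by splits on the feature, across all trees
    # assumes X is a DataFrame whose columns match the model's training features
    feature_imp = pd.DataFrame({'Value': model.feature_importance(importance_type=importance_type),
                                'Feature': X.columns})
    f, ax = plt.subplots(figsize=(40, 30))
    ax.set_title(f'LightGBM Features Importance by {importance_type}', fontsize=75, fontname="Arial")
    # importance values go on the x-axis, feature names on the y-axis
    ax.set_xlabel('Importance', fontname="Arial", fontsize=70)
    ax.set_ylabel('Features', fontname="Arial", fontsize=70)
    sns.barplot(x="Value", y="Feature",
                data=feature_imp.sort_values(by="Value", ascending=False), ax=ax)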
plot_feature_importance(model_2_meta_1_3, x_train_meta_1_3_)
plot_feature_importance(model_2_meta_1_3, x_train_meta_1_3_, 'gain')

Both measures show that the meta features are far more important than the original features.
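Why do meta features dominate? A primary model's prediction already compresses several raw features into one highly informative column, so a booster splits on it first and most often. A sketch on synthetic data; note the swap: scikit-learn's GradientBoostingClassifier stands in for LightGBM, and its impurity-based feature_importances_ is only roughly comparable to LightGBM's 'gain'. The data-generating process is an assumption chosen to make the effect visible:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = pd.DataFrame(rng.normal(size=(2000, 6)), columns=[f"V{i}" for i in range(1, 7)])
# The target depends on the SUM of four features, so no single raw feature is strong alone
signal = x[["V1", "V2", "V3", "V4"]].sum(axis=1)
y = (signal + rng.normal(scale=0.5, size=len(x)) > 0).astype(int)

# Append a primary model's 0/1 prediction as an extra "meta" feature
primary = LogisticRegression().fit(x, y)
x["pred_primary"] = primary.predict(x)

# Impurity-based importances of the secondary (boosted) model
gbm = GradientBoostingClassifier(random_state=0).fit(x, y)
importance = pd.Series(gbm.feature_importances_, index=x.columns).sort_values(ascending=False)
print(importance.round(3))
```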

plot_feature_importance(model_2_meta_1, x_train_meta_1_)
plot_feature_importance(model_2_meta_1, x_train_meta_1_, 'gain')



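If information leaks from the test set into the training set, this effect (the dominance of the meta features) is amplified, and a DNN is good at exploiting such a flaw to achieve a high score.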

Index(['Time', 'V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7', 'V8', 'V9', 'V10',
       'V11', 'V12', 'V13', 'V14', 'V15', 'V16', 'V17', 'V18', 'V19', 'V20',
       'V21', 'V22', 'V23', 'V24', 'V25', 'V26', 'V27', 'V28', 'Amount'],
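# normalize all the data in one go; note that this also fits the scaler on rows that later end up in the test set (information leakage)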

features_to_scale = data_original.columns[1:-1]
scaler = pp.StandardScaler()
data_original.loc[:, features_to_scale] = scaler.fit_transform(data_original[features_to_scale])
#split training and testing dataset afterwards.
x_train_cv, x_test, y_train_cv, y_test \
= train_test_split(data_original.loc[:, features_to_scale], data_original.Class, test_size=0.25)
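Fitting the scaler on the whole dataset before splitting, as in the cell above, lets test-set statistics leak into training. For contrast, a sketch of the leakage-free order on synthetic data (shapes and values are made up): split first, fit the scaler on the training part only, then reuse it on the test part.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=2.0, size=(1000, 3))
y = (rng.random(1000) < 0.05).astype(int)

# 1) split FIRST
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# 2) fit the scaler on the training rows only
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)

# 3) the test rows are transformed with TRAINING statistics, never their own
X_test_s = scaler.transform(X_test)
```

Wrapping the scaler and the model in a scikit-learn Pipeline enforces the same order automatically inside cross-validation.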



Determine the base label y_base, using the method introduced in the earlier post 〖Triple-Barrier Method〗:

  1. y_base = 1 when the profit-taking barrier is touched first
  2. y_base = -1 when the stop-loss barrier is touched first
  3. y_base = 0 when the vertical barrier is touched first
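The base label can be sketched as a walk along the price path until one barrier is hit. This is a simplified sketch, not the earlier post's method verbatim: triple_barrier_label is a hypothetical helper, and it assumes fixed percentage barriers around the entry price and a vertical barrier horizon steps after entry.

```python
def triple_barrier_label(prices, profit_take, stop_loss, horizon):
    """Return (y_base, realized_return) for a long position opened at prices[0].

    Assumptions: upper barrier = entry * (1 + profit_take),
    lower barrier = entry * (1 - stop_loss), vertical barrier = `horizon` steps later.
    """
    entry = prices[0]
    upper = entry * (1 + profit_take)
    lower = entry * (1 - stop_loss)
    end = min(horizon, len(prices) - 1)
    for t in range(1, end + 1):
        if prices[t] >= upper:
            return 1, prices[t] / entry - 1   # profit-taking barrier touched first
        if prices[t] <= lower:
            return -1, prices[t] / entry - 1  # stop-loss barrier touched first
    return 0, prices[end] / entry - 1         # vertical barrier reached first

print(triple_barrier_label([100, 101, 103, 99], 0.02, 0.02, horizon=5))
```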

Determine the meta label y_meta, i.e. whether to trade in the direction of the position:

  1. When y_base = 1 and the realized return r_true > c triggers profit-taking, set y_meta = 1; when y_base = -1 and r_true < -c triggers the stop-loss, set y_meta = 1.
  2. In all other cases, set y_meta = 0.
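The two rules translate directly into a vectorized function. A minimal sketch; meta_label is a hypothetical helper, and c is the return threshold from the rules:

```python
import numpy as np

def meta_label(y_base, r_true, c):
    """y_meta = 1 if (y_base == 1 and r_true > c) or (y_base == -1 and r_true < -c), else 0."""
    y_base = np.asarray(y_base)
    r_true = np.asarray(r_true, dtype=float)
    hit = ((y_base == 1) & (r_true > c)) | ((y_base == -1) & (r_true < -c))
    return hit.astype(int)

print(meta_label([1, 1, -1, -1, 0], [0.03, 0.005, -0.04, 0.01, 0.02], c=0.01))
# -> [1 0 1 0 0]
```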





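The primary model, which decides the side of the position, can be based on any of the following:

  • Machine learning models

  • Econometric formulas

  • Fundamental analysis

  • Technical analysis

  • Human subjective views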


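Meta labeling has several advantages:

  1. Better interpretability. A simple model (e.g. fundamental analysis or human judgment) first determines the side of the position, and a complex model (e.g. a machine learning model) then determines its size.
  2. Less overfitting. After meta labeling, the complex model decides only the size of the position, not its direction.
  3. Decoupling side from size lets us start simple and add complexity later. For example, we can train separate complex models specialized for long and for short positions to determine position size.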
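The side/size split can be sketched in a few lines. combine_side_and_size is a hypothetical helper, and sizing the bet by the meta probability above a 0.5 cut-off is an illustrative assumption; the point is that the secondary model can shrink a bet to zero but never flip its direction:

```python
import numpy as np

def combine_side_and_size(side, p_meta, threshold=0.5):
    """Final position = side (primary model) x size (secondary model).

    side:   -1 (short), 0 (flat) or 1 (long), e.g. from fundamentals or a human view
    p_meta: secondary model's probability that the primary call is correct
    size:   p_meta when it exceeds `threshold`, else 0 (illustrative assumption)
    """
    side = np.asarray(side, dtype=float)
    p_meta = np.asarray(p_meta, dtype=float)
    size = np.where(p_meta > threshold, p_meta, 0.0)
    return side * size

print(combine_side_and_size([1, -1, 1, 0], [0.8, 0.7, 0.3, 0.9]))
```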


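In summary, the best result comes from using LightGBM with meta labeling as the secondary model, with logistic regression and a DNN as the primary models.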


The predicted probability of that class is a by-product of most machine learning classifiers; in scikit-learn, predict_proba() returns these predicted probabilities.

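Suppose a random forest predicts a probability p. In live trading, one possible decision rule is:

  • p < 55%: do not go long
  • p ∈ [55%, 60%]: go long with 50% of capital
  • p > 60%: go long with 100% of capital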
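A sketch of this rule on top of scikit-learn's predict_proba. The synthetic data is made up, treating class 1 as "the long bet pays off" is an assumption, and bet_size is a hypothetical helper implementing the three thresholds above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

def bet_size(p):
    """Fraction of capital to put into a long position, per the rule above."""
    if p < 0.55:
        return 0.0   # do not go long
    if p <= 0.60:
        return 0.5   # 50% of capital
    return 1.0       # 100% of capital

X, y = make_classification(n_samples=1000, n_features=8, random_state=0)
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X[:800], y[:800])

# predict_proba returns one column per class; column 1 is P(class 1)
p = rf.predict_proba(X[800:])[:, 1]
sizes = np.array([bet_size(pi) for pi in p])
print(np.unique(sizes, return_counts=True))
```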

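When the primary model uses subjective views and the secondary model uses objective data, the investment approach is called quantitative fundamental investing (Quantitative Fundamental, or Quantamental).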

Quantamental refers to an investment strategy that combines quantitative approaches using computers, mathematical models, and big data with fundamental methods that analyze individual company cash flows, growth, and risk to generate better risk-adjusted returns.

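Quantamental investing is essentially a fusion of fundamental and quantitative investing: a new style of investing that combines computer algorithms with human analysis so that the whole is greater than the sum of its parts (1+1>2).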

