VBA and SAS Can Now Use Machine Learning Models Trained in Python

alitrack 2022-01-27

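Preface

In the earlier post "I trained a model in Python, so how do I hand it to Java?", I also introduced m2cgen. Today I came across an approach online that takes the VBA model generated by m2cgen and converts it to SAS code, and I am sharing it here.

m2cgen is a very friendly package that can convert many kinds of trained models into any of its supported languages^[1], such as R and VBA. However, m2cgen does not yet support SAS. This article is for readers who need to deploy a trained model in a SAS environment. The approach is to convert the model to VBA code first, then rewrite the VBA code as a SAS script.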

Example

Convert an XGBoost model to VBA, then to a SAS script (with missing-value handling)

Data

The Iris dataset, loaded from sklearn
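
The generated code refers to features by position (inputVector(0) through inputVector(3)), so the conversion needs one SAS variable name per Iris column. A minimal sketch of deriving those names instead of typing them by hand; the helper `to_sas_name` is hypothetical, and the feature names are hard-coded here to match what sklearn's `load_iris().feature_names` reports:

```python
import re

# Feature names as reported by sklearn's load_iris().feature_names
feature_names = ['sepal length (cm)', 'sepal width (cm)',
                 'petal length (cm)', 'petal width (cm)']

def to_sas_name(name):
    """Drop the unit suffix and join the remaining words with underscores."""
    name = re.sub(r'\s*\(.*?\)', '', name)      # remove the "(cm)" part
    return re.sub(r'\W+', '_', name.strip())    # spaces -> underscores

# Map each positional reference to a SAS-friendly variable name.
mapping = {f'inputVector({i})': to_sas_name(n)
           for i, n in enumerate(feature_names)}
print(mapping)
```

The resulting dictionary has the same shape as the hand-written one used in the conversion step.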

Train the model and convert it to VBA

# import packages
import pandas as pd
import numpy as np
import os
import re

from sklearn import datasets
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import m2cgen as m2c

# import data
iris = datasets.load_iris()
X = iris.data
Y = iris.target

# split data into train and test sets
seed = 2020
test_size = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)

# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)

# export the fitted model as VBA source; the generated function is named 'pred'
code = m2c.export_to_visual_basic(model, function_name='pred')
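
The conversion that follows rewrites this string with regular expressions, so it only works if the export starts with the exact header those expressions expect. A minimal sketch of making that assumption explicit with `re.subn`, which also returns the number of replacements. The helper `replace_header` is hypothetical, the trailing newline after the SET statement is a choice made for this sketch, and `sample` is a hand-written stand-in rather than real m2cgen output:

```python
import re

# Header pattern the conversion step expects at the top of the exported VBA.
HEADER = (r'Module Model\n'
          r'Function pred\(ByRef inputVector\(\) As Double\) As Double\(\)\n')

def replace_header(code, dataset_name='dataset_name'):
    """Swap the VBA header for a SAS DATA step header; fail loudly on no match."""
    new_code, n = re.subn(HEADER,
                          f'DATA pred_result;\nSET {dataset_name};\n',
                          code)
    if n != 1:
        raise ValueError('VBA header not found; inspect the first lines of the export')
    return new_code

# Hand-written stand-in for the exported string, not real m2cgen output.
sample = ('Module Model\n'
          'Function pred(ByRef inputVector() As Double) As Double()\n'
          '    Dim var0 As Double\n')
print(replace_header(sample))
```

If the header text differs from this pattern, the function raises instead of silently leaving the VBA header in the SAS script.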

VBA to SAS

import re

# remove unnecessary things
code = re.sub(r'Dim var.* As Double', '', code)
code = re.sub(r'End If', '', code)

# change the script to sas scripts
# change the beginning
code = re.sub(r'Module Model\nFunction pred\(ByRef inputVector\(\) As Double\) As Double\(\)\n',
              'DATA pred_result;\nSET dataset_name;', code)
# change the ending
code = re.sub(r'End Function\nEnd Module\n', 'RUN;', code)

# insert ';' after every line that ends in a digit or ')'
code = re.sub(r'([0-9])\n', r'\1;\n', code)
code = re.sub(r'\)\n', ');\n', code)

# handle missing values: add "and not missing <operand>" to each If condition
for original_str in re.findall(r'If.*Then', code):
    tokens = original_str.split()
    new_str = ' '.join(tokens[:-1] + ['and not missing', tokens[1], 'Then'])
    code = code.replace(original_str, new_str)

# replace the 'inputVector' with var name
dictionary = {'inputVector(0)': 'sepal_length',
              'inputVector(1)': 'sepal_width',
              'inputVector(2)': 'petal_length',
              'inputVector(3)': 'petal_width'}
for key, value in dictionary.items():
    code = code.replace(key, value)

# change the prediction labels
code = re.sub(r'Math\.Exp', 'Exp', code)
code = re.sub(r'pred = .*\n', '', code)
# the n-th match is renamed after the n-th class in iris.target_names
temp_var_list = re.findall(r'var[0-9]+\(\d\)', code)
for var_idx in range(len(temp_var_list)):
    code = re.sub(re.escape(temp_var_list[var_idx]), iris.target_names[var_idx] + '_prob', code)

# save output
with open('vb1.sas', 'w') as vb:
    vb.write(code)
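
To see what the semicolon and missing-value steps do, it helps to run them on a tiny input. The sketch below applies the same two ideas to a hand-written, simplified branch; `add_semicolons`, `guard_missing`, and `sample` are illustrative names and data, not real m2cgen output:

```python
import re

def add_semicolons(code):
    """End every line that finishes with a digit or ')' with ';'."""
    return re.sub(r'([0-9)])\n', r'\1;\n', code)

def guard_missing(code):
    """Append 'and not missing <operand>' to every 'If ... Then' condition."""
    def repl(match):
        tokens = match.group(0).split()
        # tokens[1] is the left operand of the comparison, e.g. '(petal_length)'
        return ' '.join(tokens[:-1] + ['and not missing', tokens[1], 'Then'])
    return re.sub(r'If.*Then', repl, code)

# Hand-written, simplified stand-in for one tree branch.
sample = ('If (petal_length) >= (2.45) Then\n'
          '    var0 = -0.2\n'
          'Else\n'
          '    var0 = 0.4\n')

result = guard_missing(add_semicolons(sample))
print(result)
```

With the extra condition, a row whose petal_length is missing fails the test and falls through to the Else branch.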

The generated result is written to vb1.sas.

For the full code, see the original article.

References

[1] Supported languages: https://github.com/BayesWitnesses/m2cgen#supported-languages

