Preface
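An earlier post, "I trained a model in Python; how do I hand it over to Java?", also introduced m2cgen. Today I came across an approach online that takes the VBA code m2cgen generates and converts it into SAS code, and I am sharing it here.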
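m2cgen is a very friendly package that converts many kinds of trained models into code in its supported languages [1], such as R and VBA. However, m2cgen does not yet support SAS. This article is for anyone who needs to deploy a trained model in a SAS environment. The route described here is to convert the model to VBA code first, and then rewrite that VBA code as a SAS script.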
Example
Convert an XGBoost model to VBA, then to a SAS script (with missing values)
Data
The Iris dataset, loaded from sklearn
Build the model and convert it to VBA
# import packages
import pandas as pd
import numpy as np
import os
import re

from sklearn import datasets
from xgboost import XGBClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
import m2cgen as m2c

# import data
iris = datasets.load_iris()
X = iris.data
Y = iris.target
# split data into train and test sets
seed = 2020
test_size = 0.3
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=test_size, random_state=seed)
# fit model on training data
model = XGBClassifier()
model.fit(X_train, y_train)
# export the fitted model as Visual Basic source text; the entry function is named 'pred'
code = m2c.export_to_visual_basic(model, function_name='pred')
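The substitutions in the conversion step match the exported text literally, so they only work if the header, footer, and If ... Then lines look exactly as expected. The following is a minimal sketch of a pre-check, assuming the layout implied by those patterns. The helper name missing_markers and the hand-written sample string are hypothetical and are not real m2cgen output.

```python
import re

# Patterns the conversion step relies on. They mirror the regular expressions
# used in the article; the dictionary and helper names are hypothetical.
EXPECTED_PATTERNS = {
    "header": r"Module Model\nFunction pred\(ByRef inputVector\(\) As Double\) As Double\(\)\n",
    "footer": r"End Function\nEnd Module\n",
    "condition": r"If.*Then",
}


def missing_markers(vba_code):
    """Return the names of the expected patterns that do not occur in vba_code."""
    return [name for name, pattern in EXPECTED_PATTERNS.items()
            if re.search(pattern, vba_code) is None]


# Hand-written sample in the assumed layout (NOT real m2cgen output).
sample = (
    "Module Model\n"
    "Function pred(ByRef inputVector() As Double) As Double()\n"
    "    Dim var0 As Double\n"
    "    If (inputVector(2)) >= (2.45) Then\n"
    "        var0 = 0.5\n"
    "    Else\n"
    "        var0 = -0.5\n"
    "    End If\n"
    "End Function\n"
    "End Module\n"
)

print(missing_markers(sample))             # []
print(missing_markers("Function pred()"))  # ['header', 'footer', 'condition']
```

Running missing_markers(code) on the real export before converting would show whether any pattern needs adjusting; a non-empty list suggests the installed m2cgen version formats its output differently from what the patterns assume.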
Converting VBA to SAS
import re
# remove unnecessary things
code = re.sub(r'Dim var.* As Double', '', code)
code = re.sub(r'End If', '', code)
# change the script to sas scripts
# change the beginning
code = re.sub(r'Module Model\nFunction pred\(ByRef inputVector\(\) As Double\) As Double\(\)\n',
              'DATA pred_result;\nSET dataset_name;', code)
# change the ending
code = re.sub(r'End Function\nEnd Module\n', 'RUN;', code)
# insert ';' after lines that end in a number
all_match_list = re.findall(r'[0-9]+\n', code)
for idx in range(len(all_match_list)):
    original_str = all_match_list[idx]
    new_str = all_match_list[idx][:-1]+';\n'
    code = code.replace(original_str, new_str)
# insert ';' after lines that end in ')'
all_match_list = re.findall(r'\)\n', code)
for idx in range(len(all_match_list)):
    original_str = all_match_list[idx]
    new_str = all_match_list[idx][:-1]+';\n'
    code = code.replace(original_str, new_str)
# handle missing values: each condition must also require the tested variable to be non-missing
all_match_list = re.findall(r'If.*Then', code)
for idx in range(len(all_match_list)):
    original_str = all_match_list[idx]
    new_str = ' '.join(original_str.split()[:-1] + ['and not missing', original_str.split()[1], ' Then'])
    code = code.replace(original_str, new_str)
# replace the 'inputVector' with var name
dictionary = {'inputVector(0)':'sepal_length',
              'inputVector(1)':'sepal_width',
              'inputVector(2)':'petal_length',
              'inputVector(3)':'petal_width'}
for key in dictionary.keys():
    code = code.replace(key, dictionary[key])
# change the prediction labels
code = re.sub(r'Math\.Exp', 'Exp', code)
code = re.sub(r'pred = .*\n', '', code)
temp_var_list = re.findall(r"var[0-9]+\(\d\)", code)
for var_idx in range(len(temp_var_list)):
    # re.escape makes the parentheses in the matched name literal in the pattern
    code = re.sub(re.escape(temp_var_list[var_idx]), iris.target_names[var_idx]+'_prob', code)
# save output
with open('vb1.sas', 'w') as vb:
    vb.write(code)
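To make the individual rewrites easier to follow, here is a small sketch that applies the semicolon, missing-value, renaming, and Exp steps to a four-line fragment. The fragment is hand-written for illustration and is not real m2cgen output. The function names are hypothetical, and add_semicolons is a compact single-regex stand-in for the two semicolon loops above, not the article's own code.

```python
import re


def add_missing_guard(condition):
    """Rewrite 'If <var> <op> <value> Then' so it also requires <var> to be non-missing.

    Same token manipulation as the article's loop: drop the trailing 'Then',
    append 'and not missing <var>', then put ' Then' back.
    """
    tokens = condition.split()
    return ' '.join(tokens[:-1] + ['and not missing', tokens[1], ' Then'])


def add_semicolons(code):
    """Append ';' to every line that ends in a digit or in ')'."""
    return re.sub(r'([0-9)])\n', r'\1;\n', code)


def convert_fragment(vba, names):
    """Apply the semicolon, missing-value, renaming and Exp steps to a VBA fragment."""
    sas = add_semicolons(vba)
    for condition in set(re.findall(r'If.*Then', sas)):
        sas = sas.replace(condition, add_missing_guard(condition))
    for placeholder, name in names.items():
        sas = sas.replace(placeholder, name)
    return sas.replace('Math.Exp', 'Exp')


# Hand-written fragment (NOT real m2cgen output).
fragment = (
    "If (inputVector(2)) >= (2.45) Then\n"
    "var0 = 0.5\n"
    "Else\n"
    "var0 = Math.Exp(1.0)\n"
)

result = convert_fragment(fragment, {'inputVector(2)': 'petal_length'})
print(result)
```

The first line of the result reads `If (petal_length) >= (2.45) and not missing (petal_length)  Then`. The two spaces before Then come from the literal ' Then' token and make no difference to SAS.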
The generated SAS script is written to vb1.sas.
References
[1] Supported languages: https://github.com/BayesWitnesses/m2cgen#supported-languages
This article is reposted from alitrack.




