出差回来啦!
接着上次利用scikit-learn预测“绝地求生”非法外挂,
本周我给大家介绍如何利用Oracle Machine Learning,利用SQL进行机器学习...

需要数据以及完整SQL代码的童鞋,可以联系我~
什么是Oracle Machine Learning?
Oracle的机器学习服务众多,今天说到的Oracle Machine Learning(OML)是指集成在Autonomous Data Warehouse(ADW),即自治数据仓库内的机器学习服务。由于整个模型训练过程是在数据仓库中进行,因此不需要移动数据,数据权限由数据仓库统一管控,并且模型训练过程支持SQL, R, Python等多种语言。

下面是Oracle Machine Leanring的几个主要特点。
不需要移动数据。整个训练过程都在数据仓库进行,保证数据时效性及数据安全
自动化数据处理功能。不需要额外进行对数据进行数组或矩阵转换,空值处理,特征转换等过程可自动化进行,简化数据前处理工作
支持多种开发语言。支持SQL, R, Python等多种语言,精通SQL的DBA或者分析师也可以轻松进行机器学习
强悍的性能。基于Exadata提供的云服务。
AutoML。特征自动选择、算法自动选择、参数自动调优等,一系列自动化功能,提高了模型开发及调优效率
开始建模
好啦,老铁!咱们正式开始!!!

1. 特征重要性分析
1.1 数据确认
%sqlSelect * from JETTLIU.PUBGDATA;

1.2 创建训练数据
%script/* Create PUGB_TRAIN Table on Jett user. drop exist table */BEGINEXECUTE IMMEDIATE 'DROP Table JETTLIU.PUBG_TRAIN';EXCEPTIONWHEN OTHERS THEN NULL;END;//* Create PUBG_TRAIN on Jett user. */CREATE Table jettliu.PUBG_TRAINAS SELECT ID, GROUPID, MATCHID, ASSISTS, BOOSTS, DAMAGEDEALT, DBNOS, HEADSHOTKILLS, HEALS, KILLPLACE, KILLPOINTS, KILLS, KILLSTREAKS, LONGESTKILL, MATCHDURATION, MATCHTYPE, MAXPLACE, NUMGROUPS, RANKPOINTS, REVIVES, RIDEDISTANCE, ROADKILLS, SWIMDISTANCE, TEAMKILLS, VEHICLEDESTROYS, WALKDISTANCE, WEAPONSACQUIRED, WINPOINTS, WINPLACEPERC, CHEATERFROM JETTLIU.PUBGDATA

1.3 特征重要性分析
从训练数据中,找出影响“外挂”结果的特征的重要性
%script/* Find the importance of attributes that independently impact the target attribute: Cheater */BEGINEXECUTE IMMEDIATE 'DROP TABLE ai_explain_output';EXCEPTIONWHEN OTHERS THEN NULL;END;/BEGINDBMS_PREDICTIVE_ANALYTICS.EXPLAIN(data_table_name => 'PUBG_TRAIN',explain_column_name => 'CHEATER',result_table_name => 'ai_explain_output');END;

1.4 特征重要性排名显示
%sqlSELECT attribute_name, explanatory_value, rankFROM ai_explain_outputORDER BY rank, attribute_name;

2. 模型训练
2.1 重新生成训练数据
基于1.4的结果,确认影响外挂的9大特征为:WALKDISTANCE”, “KILLPLACE”, “MATCHTYPE”, “NUMGROUPS”, “MATCHDURATION”, “MAXPLACE”, “RIDEDISTANCE”, “WEAPONSACQUIRED”, “HEALS”。
提取上述特征并重新生成训练数据
%scriptBEGINEXECUTE IMMEDIATE 'DROP Table OML_PUBG_TRAIN';EXCEPTIONWHEN OTHERS THEN NULL;END;//* Create OML_PUBG_TRAIN on Jett user. */CREATE Table OML_PUBG_TRAINAS (SELECT WALKDISTANCE, KILLPLACE, MATCHTYPE, NUMGROUPS, MATCHDURATION, MAXPLACE, RIDEDISTANCE, WEAPONSACQUIRED, HEALS, CHEATERFROM PUBG_TRAIN);
2.2 确认新生成的训练数据
%sqlSelect * from OML_PUBG_TRAIN;

2.3 训练分类模型
其中激活"PREP_AUTO"功能,对训练数据中的空值及Categorical Data做自动转换处理
%script/* Build a classification model and then generate a lift test result and an apply result. */DECLAREv_sql varchar2(100);Begin/* drop build settings */BEGINv_sql := 'DROP TABLE empattr_build_settings PURGE';EXECUTE IMMEDIATE v_sql;DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');EXCEPTIONWHEN OTHERS THENDBMS_OUTPUT.PUT_LINE (v_sql ||': drop unneccessary - no table exists');END;/* drop model */BEGINv_sql := 'CALL DBMS_DATA_MINING.DROP_MODEL(''EMPATTR_CLASS_MODEL'')';EXECUTE IMMEDIATE v_sql;DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');EXCEPTIONWHEN OTHERS THENDBMS_OUTPUT.PUT_LINE (v_sql ||': drop unneccessary - no model exists');END;/* drop apply result */BEGINv_sql := 'DROP TABLE EMPATTR_APPLY_RESULT PURGE';EXECUTE IMMEDIATE v_sql;DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');EXCEPTIONWHEN OTHERS THENDBMS_OUTPUT.PUT_LINE (v_sql ||': drop unneccessary - no table exists');END;/* drop lift result */BEGINv_sql := 'DROP TABLE EMPATTR_LIFT_TABLE PURGE';EXECUTE IMMEDIATE v_sql;DBMS_OUTPUT.PUT_LINE (v_sql ||': succeeded');EXCEPTIONWHEN OTHERS THENDBMS_OUTPUT.PUT_LINE (v_sql ||': drop unneccessary - no table exists');END;/* Split the Data into N1_TRAIN_DATA and N1_TEST_DATA */EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW EMPATTR_TRAIN_DATA AS SELECT * FROM PUBG_TRAIN SAMPLE (60) SEED (1)';DBMS_OUTPUT.PUT_LINE ('Created EMPATTR_TRAIN_DATA');EXECUTE IMMEDIATE 'CREATE OR REPLACE VIEW EMPATTR_TEST_DATA AS SELECT * FROM PUBG_TRAIN MINUS SELECT * FROM EMPATTR_TRAIN_DATA';DBMS_OUTPUT.PUT_LINE ('Created EMPATTR_TEST_DATA');/* Create a Build Setting (DT) for Model Build */EXECUTE IMMEDIATE 'CREATE TABLE EMPATTR_build_settings (setting_name VARCHAR2(30),setting_value VARCHAR2(4000))';EXECUTE IMMEDIATE 'INSERT INTO EMPATTR_build_settings (setting_name, setting_value) VALUES (''ALGO_NAME'', ''ALGO_DECISION_TREE'')';EXECUTE IMMEDIATE 'INSERT INTO EMPATTR_build_settings (setting_name, setting_value) VALUES (''PREP_AUTO'', ''ON'')';DBMS_OUTPUT.PUT_LINE ('Created model build settings table: EMPATTR_build_settings ');/* Build a Classification Model */EXECUTE IMMEDIATE 'CALL DBMS_DATA_MINING.CREATE_MODEL(''EMPATTR_CLASS_MODEL'', ''CLASSIFICATION'', ''EMPATTR_TRAIN_DATA'', ''ID'',''CHEATER'', ''EMPATTR_build_settings'')';DBMS_OUTPUT.PUT_LINE ('Created model: EMPATTR_CLASS_MODEL ');/* Test the Model by generating a apply result and then create a lift result */EXECUTE IMMEDIATE 'CALL DBMS_DATA_MINING.APPLY(''EMPATTR_CLASS_MODEL'',''EMPATTR_TEST_DATA'',''ID'',''EMPATTR_APPLY_RESULT'')';DBMS_OUTPUT.PUT_LINE ('Created apply result: EMPATTR_APPLY_RESULT ');EXECUTE IMMEDIATE 'CALL DBMS_DATA_MINING.COMPUTE_LIFT(''EMPATTR_APPLY_RESULT'',''EMPATTR_TEST_DATA'',''ID'',''CHEATER'',''EMPATTR_LIFT_TABLE'',''1'',''PREDICTION'',''PROBABILITY'',100)';DBMS_OUTPUT.PUT_LINE ('Created lift result: EMPATTR_LIFT_TABLE ');END;

3. 模型评估
3.1 确认所有数据的预测结果
%sqlSELECT * FROM EMPATTR_APPLY_RESULT;

3.2 确认模型准确度
%sqlSELECT ROUND(SUM(correct)/COUNT(*),4) * 100 AS accuracy FROM (SELECT DECODE(CHEATER, PREDICTION(EMPATTR_CLASS_MODEL USING *), 1, 0) AS correct FROM EMPATTR_TEST_DATA);

3.3 确认混淆矩阵

用最简单的方式随便做了个模型,结果还算不错!

然后就没然后了...
今天就到这里,如果对Oracle Machine Learning感兴趣的话,可以申请Oracle永久免费云账号,就可以永久免费试用Oracle Machine Learning!具体申请方法可参照我之前发的文章!




