Oracle 21C是融合数据库,可以支持任何数据、任何工作负载。不仅有区块链表、原生json数据类型、持久内存支持等技术创新,也有对传统老用户极度友好的自带机器学习算法:

不用额外加载、学习python,只用我们最熟悉的sql语句调用dbms包就能进行数据分析,这太爽了。我们这里以MSET-SPRT 算法为例,切身体会一下Oracle 21c自带机器学习算法的友好便捷。
MSET-SPRT(多元状态估计技术-序列概率比检验)算法是一种用于监视关键过程的非线性、非参数异常检测技术。MSET-SPRT算法可检测到细微的异常,同时产生最少的错误警报。算法根据监视信号正常操作序列的可用历史数据校准预期行为,将学习到的系统行为整合到持久的MSET-SPRT模型中。这样,我们就可以将模型应用于新记录以检测异常行为。
一、数据准备
我们用DBA们比较熟悉的SH用户下的sales表来做实验,我们选取1998年到1999年的数据生成mset_build_sh_data视图来训练数据。
SQL> CREATE OR replace VIEW mset_build_sh_data2 AS SELECT time_id, sum(quantity_sold) quantity,3 sum(amount_sold) amount from (SELECT * FROM sh.sales WHERE4 time_id <= '30-DEC-99') GROUP BY time_id ORDER BY time_id;View created.
我们选取sales表2000年以后的数据生成mset_test_sh_data视图来测试数据。
SQL> CREATE OR replace VIEW mset_test_sh_data2 AS SELECT time_id, sum(quantity_sold) quantity, sum(amount_sold)3 amount FROM (SELECT * FROM sh.sales WHERE time_id > '30-DEC-99')4 GROUP BY time_id ORDER BY time_id;View created.
我们创建mset_sh_settings表来存放算法执行需要的配置数据。
SQL> CREATE TABLE mset_sh_settings(setting_name VARCHAR2(30),2 setting_value VARCHAR2(128));Table created.
我们把MSET-SPRT设置为模型算法,并配置算法的几个详细参数。
SQL> BEGIN2 -- Select MSET-SPRT as the algorithm3 INSERT INTO mset_sh_settings4 VALUES(dbms_data_mining.algo_name,5 dbms_data_mining.algo_mset_sprt);6 -- Turn on automatic data preparation7 INSERT INTO mset_sh_settings8 VALUES(dbms_data_mining.prep_auto,9 dbms_data_mining.prep_auto_on);10 -- Set memory vector11 INSERT INTO mset_sh_settings12 VALUES(dbms_data_mining.mset_memory_vectors, 100);13 -- Set alpha14 INSERT INTO mset_sh_settings15 VALUES(dbms_data_mining.MSET_ALPHA_PROB, 0.1);16 -- Set alert count17 INSERT INTO mset_sh_settings18 VALUES(dbms_data_mining.MSET_ALERT_COUNT, 3);19 -- Set alert window20 INSERT INTO mset_sh_settings21 VALUES(dbms_data_mining.MSET_ALERT_WINDOW, 5);22 -- Examples of other possible settings are:23 -- (dbms_data_mining.mset_beta_prob, 0.1)24 -- (dbms_data_mining.mset_adb_height, 0.01)25 -- (dbms_data_mining.mset_std_tolerance, 3)26 -- (dbms_data_mining.mset_heldaside, 500)27 commit;28 END;29PL/SQL procedure successfully completed.
二、建立数据模型
我们通过Oracle 21c自带的dbms_data_mining包调用mset_test_sh_data视图来建立MSET模型
SQL> BEGIN2 dbms_data_mining.create_model(model_name => 'MSET_SH_MODEL',3 mining_function => 'CLASSIFICATION',4 data_table_name => 'mset_build_sh_data',5 case_id_column_name => 'time_id',6 target_column_name => '',7 settings_table_name => 'MSET_SH_SETTINGS');8 END;9PL/SQL procedure successfully completed.
我们查询模型设置,看看算法参数是否生效
SQL> column setting_name format a30SQL> column setting_value format a30SQL> SELECT setting_name, setting_value2 FROM user_mining_model_settings3 WHERE model_name = 'MSET_SH_MODEL'4 ORDER BY setting_name;SETTING_NAME SETTING_VALUE------------------------------ ------------------------------ALGO_NAME ALGO_MSET_SPRTMSET_ADB_HEIGHT .05MSET_ALERT_COUNT 3MSET_ALERT_WINDOW 5MSET_ALPHA_PROB .1MSET_BETA_PROB .1MSET_HELDASIDE 10000MSET_MEMORY_VECTORS 100MSET_STD_TOLERANCE 3ODMS_DETAILS ODMS_ENABLEODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTOODMS_RANDOM_SEED 0ODMS_SAMPLING ODMS_SAMPLING_DISABLEPREP_AUTO ON14 rows selected.
我们查询模型属性信息
SQL> column attribute_name format a40SQL> column attribute_type format a20SQL> SELECT attribute_name, attribute_type2 FROM user_mining_model_attributes3 WHERE model_name = 'MSET_SH_MODEL'4 ORDER BY attribute_name;ATTRIBUTE_NAME ATTRIBUTE_TYPE---------------------------------------- --------------------AMOUNT NUMERICALQUANTITY NUMERICAL2 rows selected.
MSET_SH_MODEL模型会有一些辅助视图来记录相关信息。
SQL> col view_name format a30SQL> col view_type format a50SQL> SELECT view_name, view_type FROM user_mining_model_views2 WHERE model_name='MSET_SH_MODEL'3 ORDER BY view_name;VIEW_NAME------------------------------VIEW_TYPE--------------------------------------------------DM$VCMSET_SH_MODELScoring Cost MatrixDM$VGMSET_SH_MODELGlobal Name-Value PairsDM$VNMSET_SH_MODELNormalization and Missing Value HandlingDM$VSMSET_SH_MODELComputed SettingsDM$VTMSET_SH_MODELClassification TargetsDM$VWMSET_SH_MODELModel Build Alerts6 rows selected.
以DM$VGMSET_SH_MODEL视图为例,我们可以通过DBA们熟悉的sql语句来查看NUM_ROWS信息
SQL> column name format a20SQL> column numeric_value format a20SQL> column string_value format a15SQL> SELECT name,2 to_char(numeric_value, '99999') numeric_value,3 string_value FROM DM$VGMSET_SH_MODEL4 ORDER BY name;NAME NUMERIC_VALUE STRING_VALUE-------------------- -------------------- ---------------NUM_ROWS 7291 row selected.
三、测试数据模型
我们用1998年到1999年的数据来建立模型,用2000年后的数据来测试模型。我们选择01年12月16日到01年12月25日的数据来显示数据样本的预测和概率
SQL> col prob format 0.999SQL> col pred format 9999SQL> col time_id format a20SQL> SELECT time_id, prediction(mset_sh_model using *) over2 (ORDER BY time_id) pred, prediction_probability (mset_sh_model using *)3 over (ORDER BY time_id) prob4 FROM (SELECT * FROM mset_test_sh_data WHERE time_id > '15-DEC-01' AND5 time_id <= '25-DEC-01' ) ORDER BY time_id;TIME_ID PRED PROB-------------------- ----- ------16-DEC-01 1 1.00017-DEC-01 1 0.83318-DEC-01 1 0.66719-DEC-01 0 0.50020-DEC-01 0 0.50021-DEC-01 0 0.50022-DEC-01 0 0.50023-DEC-01 1 0.66724-DEC-01 1 0.83325-DEC-01 1 0.83310 rows selected.
还可以更进一步,用sql查看 2000年至2001年出现异常的所有日期
SQL> SELECT time_id, pred FROM (SELECT time_id, prediction(mset_sh_model using *)2 over (ORDER BY time_id) pred FROM mset_test_sh_data) WHERE pred = 0;TIME_ID PRED-------------------- -----21-SEP-00 022-SEP-00 019-DEC-01 020-DEC-01 021-DEC-01 022-DEC-01 06 rows selected.
显示2000年至2001年的异常总数
SQL> col min(prob) format 0.999SQL> col max(prob) format 0.999SQL> SELECT pred, count(pred), min(prob), max(prob) FROM (2 SELECT prediction(mset_sh_model using *) over (ORDER BY time_id) pred,3 prediction_probability(mset_sh_model using *) over (ORDER BY time_id)4 prob FROM mset_test_sh_data ) GROUP BY pred ORDER BY pred;PRED COUNT(PRED) MIN(PROB) MAX(PROB)----- ----------- --------- ---------0 6 0.500 0.5001 725 0.667 1.0002 rows selected.
显示异常率:异常数/数据总数
SQL>SQL> col anomalyrate format 9.999SQL> SELECT 1-sum(correct)/count(*) AS anomalyrate2 FROM (SELECT decode(prediction(mset_sh_model using *) over3 (ORDER BY time_id), 1, 1) AS correct FROM mset_test_sh_data );ANOMALYRATE-----------.0081 row selected.
显示预测详细信息
SQL> SET long 1000;SQL> col anomalydetails format a80SQL> SELECT time_id, prediction(mset_sh_model using *) over (ORDER BY time_id)2 pred, prediction_details(mset_sh_model using *) over (ORDER BY time_id)3 anomalyDetails FROM mset_test_sh_data WHERE time_id > '15-DEC-01' AND4 time_id <= '25-DEC-01' ORDER BY time_id;TIME_ID PRED-------------------- -----ANOMALYDETAILS--------------------------------------------------------------------------------16-DEC-01 1<Details algorithm="MSET-SPRT" class="1"><Attribute name="QUANTITY" actualValue="558" weight="1" rank="1"/><Attribute name="AMOUNT" actualValue="117095.21" weight="1" rank="2"/></Details>17-DEC-01 1<Details algorithm="MSET-SPRT" class="1"><Attribute name="AMOUNT" actualValue="281264.3" weight="1" rank="1"/><Attribute name="QUANTITY" actualValue="1507" weight=".8" rank="2"/></Details>18-DEC-01 1<Details algorithm="MSET-SPRT" class="1"><Attribute name="AMOUNT" actualValue="114550.12" weight="1" rank="1"/><Attribute name="QUANTITY" actualValue="2447" weight=".6" rank="2"/></Details>19-DEC-01 0<Details algorithm="MSET-SPRT" class="0"><Attribute name="QUANTITY" actualValue="367" weight=".6" rank="1"/></Details>20-DEC-01 0<Details algorithm="MSET-SPRT" class="0"><Attribute name="QUANTITY" actualValue="1060" weight=".6" rank="1"/></Details>21-DEC-01 0<Details algorithm="MSET-SPRT" class="0"><Attribute name="QUANTITY" actualValue="473" weight=".6" rank="1"/></Details>22-DEC-01 0<Details algorithm="MSET-SPRT" class="0"><Attribute name="QUANTITY" actualValue="374" weight=".6" rank="1"/></Details>23-DEC-01 1<Details algorithm="MSET-SPRT" class="1"><Attribute name="AMOUNT" actualValue="101917.27" weight="1" rank="1"/><Attribute name="QUANTITY" actualValue="1034" weight=".6" rank="2"/></Details>24-DEC-01 1<Details algorithm="MSET-SPRT" class="1"><Attribute name="AMOUNT" actualValue="201537.77" weight="1" rank="1"/><Attribute name="QUANTITY" actualValue="1662" weight=".8" rank="2"/></Details>25-DEC-01 1<Details algorithm="MSET-SPRT" class="1"><Attribute name="AMOUNT" actualValue="88312.19" weight="1" rank="1"/><Attribute name="QUANTITY" actualValue="470" weight=".8" rank="2"/></Details>10 rows selected.
这一系列的操作都可以通过sql语句调用我们无比熟悉的DBMS包、视图、表来直接实现,简单便捷,减少了额外的学习成本、数据复制成本,是融合数据库便捷高效的又一例证。
编辑:殷海英





