暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

内秀于心--Oracle 21c 自带机器学习算法MSET-SPRT

299

    Oracle 21C是融合数据库,可以支持任何数据、任何工作负载。不仅有区块链表、原生json数据类型、持久内存支持等技术创新,也有对传统老用户极度友好的自带机器学习算法:

    不用额外加载、学习python,只用我们最熟悉的sql语句调用dbms包就能进行数据分析,这太爽了。我们这里以MSET-SPRT 算法为例,切身体会一下Oracle 21c自带机器学习算法的友好便捷。

    MSET-SPRT(多元状态估计技术-序列概率比检验)算法是一种用于监视关键过程的非线性、非参数异常检测技术。MSET-SPRT算法可检测到细微的异常,同时产生最少的错误警报。算法根据监视信号正常操作序列的可用历史数据校准预期行为,将学习到的系统行为整合到持久的MSET-SPRT模型中。这样,我们就可以将模型应用于新记录以检测异常行为。

一、数据准备

我们用DBA们比较熟悉的SH用户下的sales表来做实验,我们选取1998年到1999年的数据生成mset_build_sh_data视图来训练数据。

    SQL> CREATE OR replace VIEW mset_build_sh_data
    2 AS SELECT time_id, sum(quantity_sold) quantity,
    3 sum(amount_sold) amount from (SELECT * FROM sh.sales WHERE
    4 time_id <= '30-DEC-99') GROUP BY time_id ORDER BY time_id;


    View created.

    我们选取sales表2000年以后的数据生成mset_test_sh_data视图来测试数据。

      SQL> CREATE OR replace VIEW mset_test_sh_data
      2 AS SELECT time_id, sum(quantity_sold) quantity, sum(amount_sold)
      3 amount FROM (SELECT * FROM sh.sales WHERE time_id > '30-DEC-99')
      4 GROUP BY time_id ORDER BY time_id;


      View created.

      我们创建mset_sh_settings表来存放算法执行需要的配置数据。

        SQL> CREATE TABLE mset_sh_settings(setting_name VARCHAR2(30),
        2 setting_value VARCHAR2(128));


        Table created.

        我们把MSET-SPRT设置为模型算法,并配置算法的几个详细参数。

          SQL> BEGIN
          2 -- Select MSET-SPRT as the algorithm
          3 INSERT INTO mset_sh_settings
          4 VALUES(dbms_data_mining.algo_name,
          5 dbms_data_mining.algo_mset_sprt);
          6 -- Turn on automatic data preparation
          7 INSERT INTO mset_sh_settings
          8 VALUES(dbms_data_mining.prep_auto,
          9 dbms_data_mining.prep_auto_on);
          10 -- Set memory vector
          11 INSERT INTO mset_sh_settings
          12 VALUES(dbms_data_mining.mset_memory_vectors, 100);
          13 -- Set alpha
          14 INSERT INTO mset_sh_settings
          15 VALUES(dbms_data_mining.MSET_ALPHA_PROB, 0.1);
          16 -- Set alert count
          17 INSERT INTO mset_sh_settings
          18 VALUES(dbms_data_mining.MSET_ALERT_COUNT, 3);
          19 -- Set alert window
          20 INSERT INTO mset_sh_settings
          21 VALUES(dbms_data_mining.MSET_ALERT_WINDOW, 5);
          22 -- Examples of other possible settings are:
          23 -- (dbms_data_mining.mset_beta_prob, 0.1)
          24 -- (dbms_data_mining.mset_adb_height, 0.01)
          25 -- (dbms_data_mining.mset_std_tolerance, 3)
          26 -- (dbms_data_mining.mset_heldaside, 500)
          27 commit;
          28 END;
          29


          PL/SQL procedure successfully completed.

          二、建立数据模型

          我们通过Oracle 21c自带的dbms_data_mining包调用mset_test_sh_data视图来建立MSET模型

            SQL> BEGIN
            2 dbms_data_mining.create_model(model_name => 'MSET_SH_MODEL',
            3 mining_function => 'CLASSIFICATION',
            4 data_table_name => 'mset_build_sh_data',
            5 case_id_column_name => 'time_id',
            6 target_column_name => '',
            7 settings_table_name => 'MSET_SH_SETTINGS');
            8 END;
            9


            PL/SQL procedure successfully completed.

            我们查询模型设置,看看算法参数是否生效

              SQL> column setting_name format a30
              SQL> column setting_value format a30
              SQL> SELECT setting_name, setting_value
              2 FROM user_mining_model_settings
              3 WHERE model_name = 'MSET_SH_MODEL'
              4 ORDER BY setting_name;


              SETTING_NAME SETTING_VALUE
              ------------------------------ ------------------------------
              ALGO_NAME ALGO_MSET_SPRT
              MSET_ADB_HEIGHT .05
              MSET_ALERT_COUNT 3
              MSET_ALERT_WINDOW 5
              MSET_ALPHA_PROB .1
              MSET_BETA_PROB .1
              MSET_HELDASIDE 10000
              MSET_MEMORY_VECTORS 100
              MSET_STD_TOLERANCE 3
              ODMS_DETAILS ODMS_ENABLE
              ODMS_MISSING_VALUE_TREATMENT ODMS_MISSING_VALUE_AUTO
              ODMS_RANDOM_SEED 0
              ODMS_SAMPLING ODMS_SAMPLING_DISABLE
              PREP_AUTO ON


              14 rows selected.

              我们查询模型属性信息

                SQL> column attribute_name format a40
                SQL> column attribute_type format a20
                SQL> SELECT attribute_name, attribute_type
                2 FROM user_mining_model_attributes
                3 WHERE model_name = 'MSET_SH_MODEL'
                4 ORDER BY attribute_name;


                ATTRIBUTE_NAME ATTRIBUTE_TYPE
                ---------------------------------------- --------------------
                AMOUNT NUMERICAL
                QUANTITY NUMERICAL


                2 rows selected.

                MSET_SH_MODEL模型会有一些辅助视图来记录相关信息。

                  SQL> col view_name format a30
                  SQL> col view_type format a50
                  SQL> SELECT view_name, view_type FROM user_mining_model_views
                  2 WHERE model_name='MSET_SH_MODEL'
                  3 ORDER BY view_name;


                  VIEW_NAME
                  ------------------------------
                  VIEW_TYPE
                  --------------------------------------------------
                  DM$VCMSET_SH_MODEL
                  Scoring Cost Matrix


                  DM$VGMSET_SH_MODEL
                  Global Name-Value Pairs


                  DM$VNMSET_SH_MODEL
                  Normalization and Missing Value Handling


                  DM$VSMSET_SH_MODEL
                  Computed Settings


                  DM$VTMSET_SH_MODEL
                  Classification Targets


                  DM$VWMSET_SH_MODEL
                  Model Build Alerts




                  6 rows selected.

                  以DM$VGMSET_SH_MODEL视图为例,我们可以通过DBA们熟悉的sql语句来查看NUM_ROWS信息

                    SQL> column name format a20
                    SQL> column numeric_value format a20
                    SQL> column string_value format a15
                    SQL> SELECT name,
                    2 to_char(numeric_value, '99999') numeric_value,
                    3 string_value FROM DM$VGMSET_SH_MODEL
                    4 ORDER BY name;


                    NAME NUMERIC_VALUE STRING_VALUE
                    -------------------- -------------------- ---------------
                    NUM_ROWS 729


                    1 row selected.


                    三、测试数据模型

                    我们用1998年到1999年的数据来建立模型,用2000年后的数据来测试模型。我们选择01年12月16日到01年12月25日的数据来显示数据样本的预测和概率

                      SQL> col prob format 0.999
                      SQL> col pred format 9999
                      SQL> col time_id format a20
                      SQL> SELECT time_id, prediction(mset_sh_model using *) over
                      2 (ORDER BY time_id) pred, prediction_probability (mset_sh_model using *)
                      3 over (ORDER BY time_id) prob
                      4 FROM (SELECT * FROM mset_test_sh_data WHERE time_id > '15-DEC-01' AND
                      5 time_id <= '25-DEC-01' ) ORDER BY time_id;


                      TIME_ID PRED PROB
                      -------------------- ----- ------
                      16-DEC-01 1 1.000
                      17-DEC-01 1 0.833
                      18-DEC-01 1 0.667
                      19-DEC-01 0 0.500
                      20-DEC-01 0 0.500
                      21-DEC-01 0 0.500
                      22-DEC-01 0 0.500
                      23-DEC-01 1 0.667
                      24-DEC-01 1 0.833
                      25-DEC-01 1 0.833


                      10 rows selected.

                      还可以更进一步,用sql查看 2000年至2001年出现异常的所有日期

                        SQL> SELECT time_id, pred FROM (SELECT time_id, prediction(mset_sh_model using *)
                        2 over (ORDER BY time_id) pred FROM mset_test_sh_data) WHERE pred = 0;


                        TIME_ID PRED
                        -------------------- -----
                        21-SEP-00 0
                        22-SEP-00 0
                        19-DEC-01 0
                        20-DEC-01 0
                        21-DEC-01 0
                        22-DEC-01 0


                        6 rows selected.

                        显示2000年至2001年的异常总数

                          SQL> col min(prob) format 0.999
                          SQL> col max(prob) format 0.999
                          SQL> SELECT pred, count(pred), min(prob), max(prob) FROM (
                          2 SELECT prediction(mset_sh_model using *) over (ORDER BY time_id) pred,
                          3 prediction_probability(mset_sh_model using *) over (ORDER BY time_id)
                          4 prob FROM mset_test_sh_data ) GROUP BY pred ORDER BY pred;


                          PRED COUNT(PRED) MIN(PROB) MAX(PROB)
                          ----- ----------- --------- ---------
                          0 6 0.500 0.500
                          1 725 0.667 1.000


                          2 rows selected.

                          显示异常率:异常数/数据总数

                            SQL> 
                            SQL> col anomalyrate format 9.999
                            SQL> SELECT 1-sum(correct)/count(*) AS anomalyrate
                            2 FROM (SELECT decode(prediction(mset_sh_model using *) over
                            3 (ORDER BY time_id), 1, 1) AS correct FROM mset_test_sh_data );


                            ANOMALYRATE
                            -----------
                            .008


                            1 row selected.

                            显示预测详细信息

                              SQL> SET long 1000;
                              SQL> col anomalydetails format a80
                              SQL> SELECT time_id, prediction(mset_sh_model using *) over (ORDER BY time_id)
                              2 pred, prediction_details(mset_sh_model using *) over (ORDER BY time_id)
                              3 anomalyDetails FROM mset_test_sh_data WHERE time_id > '15-DEC-01' AND
                              4 time_id <= '25-DEC-01' ORDER BY time_id;


                              TIME_ID PRED
                              -------------------- -----
                              ANOMALYDETAILS
                              --------------------------------------------------------------------------------
                              16-DEC-01 1
                              <Details algorithm="MSET-SPRT" class="1">
                              <Attribute name="QUANTITY" actualValue="558" weight="1" rank="1"/>
                              <Attribute name="AMOUNT" actualValue="117095.21" weight="1" rank="2"/>
                              </Details>


                              17-DEC-01 1
                              <Details algorithm="MSET-SPRT" class="1">
                              <Attribute name="AMOUNT" actualValue="281264.3" weight="1" rank="1"/>
                              <Attribute name="QUANTITY" actualValue="1507" weight=".8" rank="2"/>
                              </Details>


                              18-DEC-01 1
                              <Details algorithm="MSET-SPRT" class="1">
                              <Attribute name="AMOUNT" actualValue="114550.12" weight="1" rank="1"/>
                              <Attribute name="QUANTITY" actualValue="2447" weight=".6" rank="2"/>
                              </Details>


                              19-DEC-01 0
                              <Details algorithm="MSET-SPRT" class="0">
                              <Attribute name="QUANTITY" actualValue="367" weight=".6" rank="1"/>
                              </Details>


                              20-DEC-01 0
                              <Details algorithm="MSET-SPRT" class="0">
                              <Attribute name="QUANTITY" actualValue="1060" weight=".6" rank="1"/>
                              </Details>


                              21-DEC-01 0
                              <Details algorithm="MSET-SPRT" class="0">
                              <Attribute name="QUANTITY" actualValue="473" weight=".6" rank="1"/>
                              </Details>


                              22-DEC-01 0
                              <Details algorithm="MSET-SPRT" class="0">
                              <Attribute name="QUANTITY" actualValue="374" weight=".6" rank="1"/>
                              </Details>


                              23-DEC-01 1
                              <Details algorithm="MSET-SPRT" class="1">
                              <Attribute name="AMOUNT" actualValue="101917.27" weight="1" rank="1"/>
                              <Attribute name="QUANTITY" actualValue="1034" weight=".6" rank="2"/>
                              </Details>


                              24-DEC-01 1
                              <Details algorithm="MSET-SPRT" class="1">
                              <Attribute name="AMOUNT" actualValue="201537.77" weight="1" rank="1"/>
                              <Attribute name="QUANTITY" actualValue="1662" weight=".8" rank="2"/>
                              </Details>


                              25-DEC-01 1
                              <Details algorithm="MSET-SPRT" class="1">
                              <Attribute name="AMOUNT" actualValue="88312.19" weight="1" rank="1"/>
                              <Attribute name="QUANTITY" actualValue="470" weight=".8" rank="2"/>
                              </Details>




                              10 rows selected.


                              这一系列的操作都可以通过sql语句调用我们无比熟悉的DBMS包、视图、表来直接实现,简单便捷,减少了额外的学习成本、数据复制成本,是融合数据库便捷高效的又一例证。

                              编辑:殷海英

                              文章转载自杨建荣的学习笔记,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                              评论