暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle 数据库 23ai 中的 AI 向量搜索

136
Oracle AI Vector Search 是 Oracle 推出的一个全新功能,旨在帮助用户安全地结合文档、图像以及其他非结构化数据的搜索与企业中结构化数据的查询,无需将数据从数据库中移动或复制。这种方式减少了数据暴露和重复存储的风险,提升了效率和安全性。

核心功能和优势:

  1. 结合搜索:

    • 不同于传统数据库只能处理结构化数据,AI Vector Search 能够理解和处理非结构化数据(如图片、PDF、音频等)。
    • 利用 AI 矢量化技术,将文本、图像等数据转换为数学向量,通过计算相似度快速找到相关内容。
  2. 实时运行:

    • 依托 Oracle Database 23ai,AI 算法直接与数据库集成,无需将数据导出到外部 AI 平台。
    • 这意味着用户可以直接在数据库中实时运行 AI 模型,节省数据移动所需的时间和资源。
  3. 高效开发:

    • Oracle 提供了统一的开发工具和关键任务支持,开发者可以用一种更高效的方式构建智能应用程序。
    • 数据专业人员无需掌握复杂的 AI 技术,也能轻松利用 AI Vector Search 提高业务搜索效率。
  4. 安全可靠:

    • 由于所有操作都在 Oracle 数据库内部完成,数据始终受到高标准的安全保护。
    • 结合企业的关键任务功能,可确保搜索过程的稳定性和高性能。

适用场景:

  • 知识管理:
    在企业内部快速搜索相关文档、合同、图片等,提升员工的生产力。
  • 客户服务:
    通过矢量搜索匹配客户提问与企业知识库中的答案,构建智能客服系统。
  • 推荐系统:
    根据用户行为和兴趣向量推荐产品或服务。
  • 医疗健康:
    在医学影像、病例中快速找到相关数据,用于诊断或研究。

通过 Oracle AI Vector Search,开发人员和数据专业人员能够更轻松地将 AI 与数据库融合,开发智能应用程序,并处理关键任务工作负载,提升企业效率与竞争力。

使用示例:
1、选择模型

很多人跟我一样对自己制作模型一无所知,所以这里我可以使用 Oracle 为我们提供的一个名为 all-MiniLM-L12-v2 的模型。

2、下载并解压模型

我们创建一个目录来保存模型,并将模型解压缩到该目录中。

    mkdir -p u01/models
    cd u01/models
    wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip
    unzip -oq all_MiniLM_L12_v2_augmented.zip

    3、下载模型(第1步已含)

      wget https://adwc4pm.objectstorage.us-ashburn-1.oci.customer-oci.com/p/VBRD9P8ZFWkKvnfhrWxkpPe8K03-JIoM5h_8EJyJcpE80c108fuUjg7R5L5O7mMZ/n/adwc4pm/b/OML-Resources/o/all_MiniLM_L12_v2_augmented.zip

      4、解压缩模型包(第1步已含)

        unzip -oq all_MiniLM_L12_v2_augmented.zip

        5、连接数据库

        我们连接到数据库,创建测试用户,创建指向物理目录的目录对象,并为测试用户授予访问权限。

          conn sys/SysPassword1@//localhost:1521/freepdb1 as sysdba


          6、创建测试用户和模型目录并授权

            -- 创建用户并授权
            create user if not exists testuser1 identified by testuser1 quota unlimited on users;
            grant create session, db_developer_role, create mining model to testuser1;


            -- 创建目录并授权
            create or replace directory model_dir as '/u01/models';
            grant read, write on directory model_dir to testuser1;

            7、将模型加载到数据库

            我们现在可以使用DBMS_VECTOR包将模型加载到数据库中。

              begin
              dbms_vector.drop_onnx_model (
              model_name => 'ALL_MINILM_L12_V2',
              force => true);
              dbms_vector.load_onnx_model (
              directory => 'model_dir',
              file_name => 'all_MiniLM_L12_v2.onnx',
              model_name => 'ALL_MINILM_L12_V2');
              end;
              /

              8、查看模型信息

              我们在USER_MINING_MODELS视图中看到了模型信息。

                column model_name format a30
                column algorithm format a10
                column mining_function format a15
                select model_name, algorithm, mining_function
                from user_mining_models
                where model_name = 'ALL_MINILM_L12_V2';
                MODEL_NAME ALGORITHM MINING_FUNCTION
                ------------------------------ ---------- ---------------
                ALL_MINILM_L12_V2 ONNX EMBEDDING
                SQL>

                9、生成向量(VECTOR数据类型)

                  et long 1000000
                  select vector_embedding(all_minilm_l12_v2 using 'Quick test' as data) AS my_vector;
                  MY_VECTOR
                  --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                  [-3.8644433E-002,7.27762729E-002,-6.99380785E-003,-7.29616638E-003,8.81515723E-003,-6.36086538E-002,4.39667143E-003,-4.20215279E-002,-1.32307231E-001,-5.83763141E-003,-1.32369539E-002,-1.62914675E-002
                  ,6.54895976E-003,-4.98352237E-002,-1.98450759E-002,-4.69920859E-002,1.03937693E-001,-8.96753371E-002,-2.77858647E-003,4.13947217E-002,-6.51627034E-002,-1.0990192E-001,-8.73052608E-003,2.533352E-002,-1
                  .42030744E-002,-2.42071413E-002,1.91592015E-002,4.93750861E-003,6.30867062E-003,-1.24127828E-001,-7.17296917E-003,3.73173878E-002,4.9763605E-002,4.52162512E-002,1.49683282E-002,-2.21796334E-002,-3.679
                  36343E-002,-6.20212057E-004,7.16803446E-002,5.33913262E-003,1.92087796E-002,-9.91346464E-002,3.90679725E-002,2.22725477E-002,5.04363291E-002,1.81943234E-002,5.34031764E-002,1.44161787E-002,-1.99908093
                  E-002,-1.20323906E-002,-2.63888612E-002,-4.14666645E-002,6.24738075E-002,-4.68838811E-002,1.16748791E-002,-2.43180897E-002,-3.11982501E-002,-7.57505326E-003,2.25466546E-002,-4.17359509E-002,1.23237111
                  E-002,4.31706719E-002,-7.83751085E-002,1.24918474E-002,5.42060696E-002,4.33742851E-002,2.52278009E-003,-1.15482165E-002,-9.98713658E-004,-2.12613232E-002,1.00960573E-002,3.17986757E-002,-1.13147125E-0
                  02,-1.26893111E-002,2.66182888E-002,-7.5068539E-003,-3.70341614E-002,1.9485198E-002,-2.921376E-002,-2.61210185E-002,2.86212768E-002,-9.15901735E-002,1.50552308E-002,-4.9816858E-002,2.29324102E-002,7.8
                  2513991E-003,4.22973074E-002,3.37974578E-002,-4.23457436E-002,-6.32970929E-002,3.84950414E-002,-1.93851739E-002,1.96237396E-003,-3.91559181E-004,7.80333811E-003,5.63595779E-002,4.45815139E-002,-4.9701
                  6683E-002,1.36383837E-002,2.76547611E-001,6.35802001E-002,-1.69337578E-002,-3.25948372E-002,2.74621435E-002,-1.84808951E-002,-3.58916447E-002,3.18281911E-003,-3.92074399E-002,-5.03929285E-003,-3.95198
                  613E-002,2.64223926E-002,5.44404797E-002,-2.97637819E-003,1.0656476E-002,4.55005579E-002,-9.6166715E-002,4.53018732E-002,3.02239601E-002,-1.11025617E-001,6.18583001E-002,8.55141282E-002,-1.51456818E-0
                  02,-5.6408301E-002,-5.93017961E-004,1.07500203E-001,-6.81523681E-002,1.85917076E-002,3.75313126E-002,-3.27163152E-002,-4.72421981E-002,5.59753105E-002,2.20437739E-002,2.74991728E-002,2.6306238E-002,-5
                  MY_VECTOR
                  --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                  .36230095E-002,6.86871773E-003,5.06029651E-003,8.86866376E-002,3.97678688E-002,-1.4952058E-003,-1.11991599E-001,-1.48920929E-002,-1.42183332E-002,5.44683374E-002,-5.46902604E-002,-3.37714627E-002,-3.9
                  3099897E-002,-8.8875033E-003,-2.50034444E-002,-3.8671799E-002,7.15422854E-002,-1.72948167E-002,5.7217218E-002,1.23445876E-002,-6.25340268E-002,-1.97963398E-002,4.08164188E-002,9.2235636E-003,2.3645639
                  4E-002,-4.27598739E-003,-1.24366455E-001,8.28649029E-002,-5.27118742E-002,-1.11121098E-002,4.35752161E-002,1.35777248E-002,-2.25060564E-002,4.55260463E-002,3.89738753E-002,-8.9330703E-002,1.17293425E-
                  001,5.51190637E-002,-2.56631747E-002,-5.30632809E-002,-8.39535818E-002,4.83218301E-003,6.57674894E-002,8.87271166E-002,-1.52742919E-002,1.05254455E-002,-1.58144645E-002,-3.10783144E-002,-2.9069094E-00
                  2,7.04223011E-003,-3.09849512E-002,-4.46299138E-003,-7.20088407E-002,-7.05658719E-002,4.65546325E-002,1.10276632E-001,3.60871851E-002,1.86070222E-002,-6.10643029E-002,3.21829244E-002,-1.43657476E-002,
                  -6.75653145E-002,8.07486102E-002,1.68783069E-002,-1.0059043E-001,-7.55800903E-002,-1.69591829E-002,-4.45710421E-002,-8.60542338E-003,4.33787964E-002,4.29520719E-002,3.94066162E-002,8.67492054E-003,-8.
                  52123275E-002,1.20206453E-001,-1.14268251E-001,-1.70285124E-002,8.87670461E-003,-4.69080843E-002,-3.02138515E-002,4.61057499E-002,-4.92520146E-002,1.56181203E-002,-9.27053615E-002,-6.08294345E-002,1.4
                  6451779E-002,-1.8469112E-002,-1.40407547E-001,5.35490811E-002,5.85880674E-033,7.62652978E-002,-3.0770693E-002,-6.7476593E-003,1.03074148E-001,7.20860213E-002,-9.75818858E-002,1.51840553E-001,7.4332207
                  4E-002,-2.99238227E-002,9.39518213E-002,1.50299622E-002,4.35530245E-002,-7.58084841E-003,-7.49263093E-002,-5.07647246E-002,4.0109925E-002,-7.43360296E-002,4.62087467E-002,9.6142469E-003,3.1514265E-004
                  ,6.2026035E-002,1.57011151E-002,3.29307579E-002,5.6974791E-002,-7.89973959E-002,9.78369173E-003,1.16775157E-002,-3.65987495E-002,-5.30386977E-002,-1.22491308E-002,5.65312728E-002,3.41438502E-002,-4.26
                  849015E-002,9.84478667E-002,1.52464816E-003,-6.92429468E-002,9.64930728E-002,-1.85021404E-002,4.28027436E-002,-4.41830456E-002,-2.54553054E-002,5.20384647E-002,-1.3808256E-002,-1.59469489E-002,2.10003
                  MY_VECTOR
                  --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                  518E-002,-1.85695086E-002,2.29395907E-002,1.91418286E-002,4.09490913E-002,2.35137846E-002,-3.91655453E-002,3.57466601E-002,4.80409786E-002,-1.02699241E-002,1.45040052E-002,-4.27465513E-002,-5.0925903E
                  -002,-7.12790117E-002,-9.191221E-002,1.79740768E-002,-3.53490785E-002,-2.26370115E-002,1.64974518E-002,1.05952621E-001,-3.52565721E-002,-3.41151282E-002,-5.72828352E-002,-3.10265515E-002,6.9757998E-00
                  2,-2.50361152E-002,-3.91423702E-002,1.37532372E-002,6.76134787E-003,-5.15896529E-002,-3.5578981E-002,6.91288859E-002,-3.47342566E-002,-1.09835267E-002,-2.45214235E-002,-3.24611999E-002,1.08489944E-002
                  ,8.2179103E-003,-2.97738314E-002,4.81431969E-002,-5.78631386E-002,2.85624769E-002,4.0292047E-002,2.82907616E-002,-3.79493125E-002,8.59350059E-003,1.02058621E-002,2.48055886E-002,7.16195907E-003,-6.245
                  35345E-002,-3.2572642E-002,4.26035373E-033,-7.57267745E-003,-4.15649898E-002,-4.98135239E-002,1.02479551E-002,3.28872576E-002,1.50394207E-003,-6.39199391E-002,-7.5357914E-002,-2.46184096E-002,-3.06450
                  184E-002,4.16100062E-002,7.04020485E-002,-8.15085992E-002,2.55300701E-002,1.8981915E-002,4.26408909E-002,-2.19986513E-002,7.1362392E-003,-3.42554934E-002,3.70068592E-003,-3.15256999E-003,1.41581176E-0
                  02,5.00134267E-002,7.54985586E-002,6.42605498E-002,7.55612329E-002,1.52721843E-002,1.15661159E-001,-2.45987438E-002,1.08358078E-002,5.02406172E-002,6.28810897E-002,-5.52952439E-002,-5.51969372E-002,-4
                  .60022828E-003,-1.46539817E-002,6.40283972E-002,5.18338121E-002,2.51765884E-002,6.45218417E-002,-8.35603774E-002,3.57579961E-002,6.28177961E-003,3.15946154E-003,2.32084282E-002,4.76812199E-002,-5.4778
                  5498E-003,-1.06323607E-001,-1.45862792E-002,-5.92180602E-002,-1.59236379E-002,-1.90922264E-002,4.61262129E-002,2.41158772E-002,-7.9021994E-003,1.11448221E-001,1.11205485E-002,-2.0573834E-002,-4.086579
                  75E-002,5.54621816E-002,5.37177958E-002,4.76263165E-002,-3.29907499E-002,4.3731384E-002]
                  SQL>

                  10、下载测试用的CSV文本数据

                  我们可以在DML中使用VECTOR_EMBEDDING函数,这是我们将在下一个示例中执行的操作。

                  从《拥抱的脸》中摘录了一些电影台词。

                    cd u01/model
                    wget https://huggingface.co/datasets/ygorgeurts/movie-quotes/resolve/main/movie_quotes.csv?download=true -O movie_quotes.csv

                    11、根据csv内容创建一个新的测试表

                      conn testuser1/testuser1@//localhost:1521/freepdb1


                      drop table if exists movie_quotes purge;


                      create table movie_quotes as
                      select movie_quote, movie, movie_type, movie_year
                      from external (
                      (
                      movie_quote varchar2(400),
                      movie varchar2(200),
                      movie_type varchar2(50),
                      movie_year number(4)
                      )
                      type oracle_loader
                      default directory model_dir
                      access parameters (
                      records delimited by newline
                      skip 1
                      badfile model_dir
                      logfile model_dir:'moview_quotes_ext_tab_%a_%p.log'
                      discardfile model_dir
                      fields csv with embedded terminated by ',' optionally enclosed by '"'
                      missing field values are null
                      (
                      movie_quote char(400),
                      movie,
                      movie_type,
                      movie_year
                      )
                      )
                      location ('movie_quotes.csv')
                      reject limit unlimited
                      );

                      desc movie_quotes
                      Name Null? Type
                      ----------------------------------------- -------- ----------------------------
                      MOVIE_QUOTE VARCHAR2(400)
                      MOVIE VARCHAR2(200)
                      MOVIE_TYPE VARCHAR2(50)
                      MOVIE_YEAR NUMBER(4)
                      SQL>

                      12、添加向量数据列

                      添加字段 movie_quote_vector 数据类型为 vector,用来保存向量数据。

                        alter table movie_quotes add (
                        movie_quote_vector vector
                        );


                        desc movie_quotes
                        Name Null? Type
                        ----------------------------------------- -------- ----------------------------
                        MOVIE_QUOTE VARCHAR2(400)
                        MOVIE VARCHAR2(200)
                        MOVIE_TYPE VARCHAR2(50)
                        MOVIE_YEAR NUMBER(4)
                        MOVIE_QUOTE_VECTOR VECTOR(*, *)
                        SQL>

                        13、生成向量填充新列

                          update movie_quotes
                          set movie_quote_vector = vector_embedding(all_minilm_l12_v2 using movie_quote as data);
                          commit;

                          14、使用 Vector_DISTANCE 进行向量搜索

                          我们使用VECTOR_DISTANCE函数进行搜索。这接受两个向量,顾名思义,返回它们之间的距离。由于我们使用的模型为文本数据生成向量,因此我们预计两个相似的向量的向量距离会更小。在以下示例中,我们根据搜索文本创建一个向量,并根据搜索文本和引用文本之间的向量距离对查询的输出进行排序。

                          首先,我们要求“有励志演讲的电影”。

                            variable search_text varchar2(100);
                            exec :search_text := 'Films with motivational speaking in them';
                            set linesize 200
                            column movie format a50
                            column movie_quote format a100
                            SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
                            movie,
                            movie_quote
                            FROM movie_quotes
                            order by 1
                            fetch approximate first 5 rows only;


                            DISTANCE MOVIE MOVIE_QUOTE
                            ---------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------
                            6.786E-001 Once Upon a Time in Hollywood That was the best acting i've ever seen in my whole life.
                            6.979E-001 Dead Poets Society You must strive to find your own voice because the longer you wait to begin, the less likely you are
                            going to find it at all.
                            7.169E-001 The Pursuit of Happyness Walk that walk and go forward all the time. Don't just talk that talk, walk it and go forward. Also,
                            the walk didn't have to be long strides; baby steps counted too. Go forward.
                            7.186E-001 Joker My mother always tells me to smile and put on a happy face. She told me I had a purpose to bring lau
                            ghter and joy to the world.
                            7.234E-001 Blazing Saddles Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me
                            n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati
                            on for Best Supporting Actor.
                            SQL>

                            接下来,我们搜索“关于战争的电影”。

                              variable search_text varchar2(100);
                              exec :search_text := 'Films about war';
                              set linesize 200
                              column movie format a50
                              column movie_quote format a100
                              SELECT vector_distance(movie_quote_vector, (vector_embedding(all_minilm_l12_v2 using :search_text as data))) as distance,
                              movie,
                              movie_quote
                              FROM movie_quotes
                              order by 1
                              fetch approximate first 5 rows only;


                              DISTANCE MOVIE MOVIE_QUOTE
                              ---------- -------------------------------------------------- ----------------------------------------------------------------------------------------------------
                              5.682E-001 Dr. Strangelove Gentlemen, you can't fight in here! This is the War Room!
                              6.346E-001 Blazing Saddles Men, you are about to embark on a great crusade to stamp out runaway decency in the west. Now you me
                              n will only be risking your lives, whilst I will be risking an almost certain Academy Award nominati
                              on for Best Supporting Actor.
                              6.587E-001 Fury Ideals are peaceful; history is violent.
                              7.243E-001 The Kill Team You give me your loyalty, and I?ll guarantee that each and every one of you will have a chance to be
                              a warrior, to actually be a part of history.
                              7.253E-001 Dr. No Bond. James Bond
                              SQL>
                              近期文章:


                              文章转载自山东Oracle用户组,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                              评论