暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Oracle 域索引中点 ('.') 的问题

ASKTOM 2021-01-06
514

问题描述

亲爱的团队,

我希望你们都做得很好!

我想要求一些提示来解决以下问题:
我们有一个巨大的数据集 (>60 mio) 的项目。每个项目都有一个项目编号。项目编号具有数据类型varchar2(50)。
物料编号的格式因供应商而异 [请参阅下面示例中的第03-07行]。
对于正在搜索项目的用户来说,要记住正确的符号是相当困难的。
为了支持搜索项目编号,我们希望跳过所有这些字符 '。/,=?{} \()[]-;~ | $!>* % _ &' [请参考下面示例中的第16行]。

这是我们的first solution可悲的是not working不出所料:
--------------------------------------------------
/*line: 01*/ /***PREPARATION***/
/*line: 02*/ create table test_item_number (item_number varchar2(50));
/*line: 03*/ insert into test_item_number values ('1.234。5');
/*line: 04*/ insert into test_item_number values ('1-25466');
/*line: 05*/ insert into test_item_number values ('001/007');
/*line: 06*/ insert into test_item_number values ('AA.123');
/*line: 07*/ insert into test_item_number values ('ABC.PD');
/*line: 08*/ commit;
/*line: 09*/ 
/*line: 10*/ /***DOMAIN PREFERENCES***/
/*line: 12*/ --grant execute on ctx_ddl to hr (if necessary);
/*line: 13*/ BEGIN
/*line: 14*/     --LEXER
/*line: 15*/  ctx_ddl.create_preference('MyLexer', 'BASIC_LEXER');
/*line: 16*/  ctx_ddl.set_attribute('MyLexer' , 'SKIPJOINS','./,=?{}\()[]-;~|$!>*%_&');
/*line: 17*/  
/*line: 18*/  --WORDLIST
/*line: 19*/  ctx_ddl.create_preference('MyWordList', 'BASIC_WORDLIST');
/*line: 20*/  ctx_ddl.set_attribute('MyWordList','PREFIX_INDEX','TRUE');    
/*line: 21*/  ctx_ddl.set_attribute('MyWordList','PREFIX_MIN_LENGTH', '2');
/*line: 22*/  ctx_ddl.set_attribute('MyWordList','PREFIX_MAX_LENGTH','50');
/*line: 23*/  ctx_ddl.set_attribute('MyWordList','SUBSTRING_INDEX','YES');
/*line: 24*/    ctx_ddl.set_attribute('MyWordList','REVERSE_INDEX','TRUE'); 
/*line: 25*/ END;
/*line: 26*/ /***DOMAIN INDEX***/
/*line: 27*/ create index IDX_ITEM_NUMBER 
/*line: 28*/           on test_item_number(item_number)
/*line: 29*/        indextype is ctxsys.context
/*line: 30*/   parameters ('LEXER MyLexer WORDLIST  MyWordList SYNC (ON COMMIT)');

--------------------------------------------------
这将导致以下搜索问题:
从test_item_number中选择 * 其中包含 (item_number,'aa1% ') >0; -0行
从test_item_number中选择 * 其中包含 (item_number,'1234%') >0; -0行

创建域索引仍然包括点 ('.')。这里是创建的索引表 '$ i' (包括我的想法,这可能是什么原因):
-显示索引条目
从d r $ idx_item_number $ i中选择token_text;
结果:
00
001
0010
00100
001007
001007
1. -- 因为numjoin?
1.2
1.23
1.234
1.234。
1.234。5
1.234。5
12
123
123
125
1254
12546
125466
125466
因为标点符号?
AA
AB
ABC
ABCP
ABCPD
ABCPD

到aviod点 ('.') 在索引中,我正在尝试此解决方法:
--------------------------------------------------
/***WORKAROUND 1 (NOT WORKING)***/
create index IDX_ITEM_NUMBER_REPLACE 
          on test_item_number(replace(item_number,'.',''))
       indextype is ctxsys.context
  parameters ('LEXER     MyLexer 
      WORDLIST  MyWordList
      SYNC      (ON COMMIT)');
--error: DRG-11304: function-based indexes are not supported by this indextype.

/***WORKAROUND 2 (NOT WORKING)***/
BEGIN
  --LEXER
  ctx_ddl.set_attribute('MyLexer' , 'NUMJOIN',' ');
  ctx_ddl.set_attribute('MyLexer' , 'PUNCTUATIONS',' ');
END;
drop index IDX_ITEM_NUMBER;
create index IDX_ITEM_NUMBER 
          on test_item_number(item_number)
       indextype is ctxsys.context
  parameters ('LEXER MyLexer WORDLIST  MyWordList SYNC (ON COMMIT)');
--the index entries (dr$idx_item_number$i) are the same like above 

--------------------------------------------------

关于这个问题我有以下questions:
(1) 我的想法是否正确,为什么域索引包含点 ('.') -由于numjoin/标点符号?
(2) 在域索引首选项中,我还能做些什么来避免索引任何点 ('.')?
(3) 如果问题2没有解决方案,是否还有另一种简单的解决方法 (也许没有不可见或虚拟列)?
(4) 我也可以使用普通索引,但是左截断的通配符查询将非常慢-或者我也可以在普通索引中进行改进吗?

THANK you very much for you thoughts!

祝你有美好的一天,
沃尔特

专家解答

我联系了下午的罗杰·福特 (Roger Ford),以获取有关此的Oracle文本。他这样说:

Workaround 2 works - setting NUMJOIN to an empty string; shouldn't be necessary to change PUNCTUATIONS in this example.

It's best if you only look at TOKEN_TYPE=0 when listing the contents of $I - saves confusion with the prefix index stuff which is TOKEN_TYPE=6.

The first workaround won't work as-is because you can't create a domain index directly on a function. However, you can use a function as part (or all) of the COLUMNS list in a MULTI_COLUMN_DATASTORE.


并提供了这些测试用例:

案例1

create table test_item_number (item_number varchar2(50));
insert into test_item_number values ('1.234.5');
insert into test_item_number values ('1-25466');
insert into test_item_number values ('001/007');
insert into test_item_number values ('AA.123');
insert into test_item_number values ('ABC.PD');
commit;

/*** DOMAIN PREFERENCES ***/
--grant execute on ctx_ddl to hr (if necessary);
BEGIN
    --LEXER
 ctx_ddl.create_preference('MyLexer', 'BASIC_LEXER'); 
 ctx_ddl.set_attribute('MyLexer' , 'SKIPJOINS','./,=?{}\()[]-;~|$!>*%_&');
 ctx_ddl.set_attribute('MyLexer' , 'NUMJOIN',' ');

 --WORDLIST
 ctx_ddl.create_preference('MyWordList', 'BASIC_WORDLIST');
 ctx_ddl.set_attribute('MyWordList','PREFIX_INDEX','TRUE');    
 ctx_ddl.set_attribute('MyWordList','PREFIX_MIN_LENGTH', '2');
 ctx_ddl.set_attribute('MyWordList','PREFIX_MAX_LENGTH','50');
 ctx_ddl.set_attribute('MyWordList','SUBSTRING_INDEX','YES');
--   ctx_ddl.set_attribute('MyWordList','REVERSE_INDEX','TRUE'); 
END;
/

/*** DOMAIN INDEX ***/
create index IDX_ITEM_NUMBER 
          on test_item_number(item_number)
       indextype is ctxsys.context
  parameters ('LEXER MyLexer WORDLIST  MyWordList SYNC (ON COMMIT)');

--show index entries
select token_text from dr$idx_item_number$i where token_type = 0;

TOKEN_TEXT   
001007        
12345         
125466        
AA123         
ABCPD

select * from test_item_number
where  contains ( item_number, '1234%' ) > 0;

ITEM_NUMBER   
1.234.5   

select * 
from   test_item_number 
where  contains ( item_number, 'AA1%' ) > 0; 

ITEM_NUMBER   
AA.123     


案例2

/*** PREPARATION ***/
create table test_item_number (item_number varchar2(50));
insert into test_item_number values ('1.234.5');
insert into test_item_number values ('1-25466');
insert into test_item_number values ('001/007');
insert into test_item_number values ('AA.123');
insert into test_item_number values ('ABC.PD');
commit;

/*** DOMAIN PREFERENCES ***/
--grant execute on ctx_ddl to hr (if necessary);
BEGIN
    --LEXER
 ctx_ddl.create_preference('MyLexer', 'BASIC_LEXER'); 
 ctx_ddl.set_attribute('MyLexer' , 'SKIPJOINS','./,=?{}\()[]-;~|$!>*%_&');

 --WORDLIST
 ctx_ddl.create_preference('MyWordList', 'BASIC_WORDLIST');
 ctx_ddl.set_attribute('MyWordList','PREFIX_INDEX','TRUE');    
 ctx_ddl.set_attribute('MyWordList','PREFIX_MIN_LENGTH', '2');
 ctx_ddl.set_attribute('MyWordList','PREFIX_MAX_LENGTH','50');
 ctx_ddl.set_attribute('MyWordList','SUBSTRING_INDEX','YES');
-- only in recent versions:   ctx_ddl.set_attribute('MyWordList','REVERSE_INDEX','TRUE'); 
 ctx_ddl.create_preference('MyDatastore', 'MULTI_COLUMN_DATASTORE');
 ctx_ddl.set_attribute('MyDatastore', 'COLUMNS', 'replace(ITEM_NUMBER,''.'','''')');
 ctx_ddl.set_attribute('MyDatastore', 'DELIMITER', 'newline'); /* avoid adding a tag */
END;
/

/*** DOMAIN INDEX ***/
create index IDX_ITEM_NUMBER 
          on test_item_number(item_number)
       indextype is ctxsys.context
  parameters ('LEXER MyLexer WORDLIST  MyWordList DATASTORE MyDatastore SYNC (ON COMMIT)');

--show index entries
select token_text from dr$idx_item_number$i where token_type = 0;

TOKEN_TEXT   
001007        
12345         
125466        
AA123         
ABCPD 

select * from test_item_number
where  contains ( item_number, '1234%' ) > 0;

ITEM_NUMBER   
1.234.5 

select * 
from   test_item_number 
where  contains ( item_number, 'AA1%' ) > 0; 

ITEM_NUMBER   
AA.123    

文章转载自ASKTOM,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论