Oracle REGEXP_REPLACE匹配参数m和n

askTom 2017-04-25

509

问题描述

我一直在尝试编写一个查询，该查询将从另一个查询中删除注释。这是带有注释的小示例查询，我想从中删除注释:

WITH sample_SQL ( line#, txt ) AS (
SELECT line#, txt
  FROM dual
 MODEL DIMENSION BY (1 as line#)
       MEASURES (cast(null as varchar2(50)) as txt)
RULES
( txt[1] = 'SELECT count(*) cnt -- comment type#1'
, txt[2] = ' FROM dual /* comment type #2 */ table_alias'
, txt[3] = ' WHERE 1 = 1 /* start multiline comment'
, txt[4] = ' ... other comments here ...'
, txt[5] = ' ... and more here ...'
, txt[6] = ' ... end of comment */ AND 2 = 2'
, txt[7] = ';'
) ORDER BY line#
) select * from sample_SQL;

LINE# TXT
----- --------------------------------------------
    1 SELECT count(*) cnt -- comment type#1
    2  FROM dual /* comment type #2 */ table_alias
    3  WHERE 1 = 1 /* start multiline comment
    4  ... other comments here ...
    5  ... and more here ...
    6  ... end of comment */ AND 2 = 2
    7 ;

7 rows selected.

我想出了如何使用分析函数删除注释; 这是相当直接的。但是，如何使用正则表达式并不明显。我以为我可以使用匹配参数m和n; 文档指出:

m -- expression假定有多行，其中 ^ 是一行的开始，$ 是一行的结束，而不管这些字符在expression中的位置。默认情况下，表达式假定为单行。
n -- 允许句点字符 (.) 与换行符匹配。默认情况下，句点是通配符。

但是，如何获取匹配参数对我来说并不明显。我找不到使用参数的任何示例。

我可以炮制的最接近的正则表达式查询使用LISTAGG，因此它受到varchar2的字符大小限制。(它不需要匹配参数; 另请参阅liveSQL)。

WITH sample_SQL ( line#, txt ) AS (
SELECT line#, txt
  FROM dual
 MODEL DIMENSION BY (1 as line#)
       MEASURES (cast(null as varchar2(50)) as txt)
RULES
( txt[1] = 'SELECT count(*) cnt -- comment type#1'
, txt[2] = ' FROM dual /* comment type #2 */ table_alias'
, txt[3] = ' WHERE 1 = 1 /* start multiline comment'
, txt[4] = ' ... other comments here ...'
, txt[5] = ' ... and more here ...'
, txt[6] = ' ... end of comment */ AND 2 = 2'
, txt[7] = ';'
) ORDER BY line#
),
concat_SQL as (
SELECT listagg(txt,chr(10))within group(order by line#) txt
  FROM sample_SQL
),
parms AS (
SELECT r#, pt, rr, dx
  FROM dual
 MODEL DIMENSION BY (1 as r#)
       MEASURES(cast(null as varchar2(50)) as pt
               ,cast(null as varchar2(50)) as rr
               ,cast(null as varchar2(50)) as dx)
RULES
( pt[1] = '--.+'||chr(10)            , rr[1]=chr(10) , dx[1]='double-dash'
, pt[2] = '/\*(.|'||chr(10)||')*?\*/', rr[2]=''      , dx[2]='slash/asterisk'
)),
rCTE (r#, txt, dx ) AS (
SELECT 0, txt, 'original'
  FROM concat_SQL
UNION ALL
SELECT parms.r#
     , REGEXP_REPLACE(rCTE.txt,parms.pt,parms.rr)
     , parms.dx
  FROM rCTE
  JOIN parms
    ON rCTE.r# + 1 = parms.r#
)
SELECT *
  FROM rCTE
 ORDER BY r#;

 R# TXT                                                DX
--- -------------------------------------------------- -------------------
  0 SELECT count(*) cnt -- comment type#1              original
     FROM dual /* comment type #2 */ table_alias
     WHERE 1 = 1 /* start multiline comment
     ... other comments here ...
     ... and more here ...
     ... end of comment */ AND 2 = 2
    ;

  1 SELECT count(*) cnt                                double-dash
     FROM dual /* comment type #2 */ table_alias
     WHERE 1 = 1 /* start multiline comment
     ... other comments here ...
     ... and more here ...
     ... end of comment */ AND 2 = 2
    ;

  2 SELECT count(*) cnt                                slash/asterisk
     FROM dual  table_alias
     WHERE 1 = 1  AND 2 = 2
    ;

我可以在不连接所有行的情况下使用SQL正则表达式解决这个问题吗？m和n何时可用作匹配参数？谢谢!请随意添加您的评论... :)

专家解答

当然，这里有一个使用 'n' 的简单示例，所以周期也匹配换行符:

with rws as (
  select 'this string
has a newline' str from dual
)
select str two_lines, regexp_replace(str, '.has', ' doesn''t have', 1, 1, 'n') one_line
from   rws;

TWO_LINES                  ONE_LINE                            
this string
has a newline  this string doesn't have a newline

你的例子的问题是线条是分开的rows。传递给regexp_replace的参数并不重要。每一行都被单独考虑。你必须以某种方式将它们粘在一起，使一行这样regexp_replace可以在许多行上匹配一个字符串。

oracle asktom

「喜欢这篇文章，您的关注和赞赏是给作者最好的鼓励」

关注作者