暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL 14 preview - 一些SQL标准函数调用parser 增强

digoal 2020-11-07
256

作者

digoal

日期

2020-11-07

标签

PostgreSQL , SQL


背景

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=40c24bfef92530bd846e111c1742c2a54441c62c

```
Improve our ability to regurgitate SQL-syntax function calls.

The SQL spec calls out nonstandard syntax for certain function calls,
for example substring() with numeric position info is supposed to be
spelled "SUBSTRING(string FROM start FOR count)". We accept many
of these things, but up to now would not print them in the same format,
instead simplifying down to "substring"(string, start, count).
That's long annoyed me because it creates an interoperability
problem: we're gratuitously injecting Postgres-specific syntax into
what might otherwise be a perfectly spec-compliant view definition.
However, the real reason for addressing it right now is to support
a planned change in the semantics of EXTRACT() a/k/a date_part().
When we switch that to returning numeric, we'll have the parser
translate EXTRACT() to some new function name (might as well be
"extract" if you ask me) and then teach ruleutils.c to reverse-list
that per SQL spec. In this way existing calls to date_part() will
continue to have the old semantics.

To implement this, invent a new CoercionForm value COERCE_SQL_SYNTAX,
and make the parser insert that rather than COERCE_EXPLICIT_CALL when
the input has SQL-spec decoration. (But if the input has the form of
a plain function call, continue to mark it COERCE_EXPLICIT_CALL, even
if it's calling one of these functions.) Then ruleutils.c recognizes
COERCE_SQL_SYNTAX as a cue to emit SQL call syntax. It can know
which decoration to emit using hard-wired knowledge about the
functions that could be called this way. (While this solution isn't
extensible without manual additions, neither is the grammar, so this
doesn't seem unmaintainable.) Notice that this solution will
reverse-list a function call with SQL decoration only if it was
entered that way; so dump-and-reload will not by itself produce any
changes in the appearance of views.

This requires adding a CoercionForm field to struct FuncCall.
(I couldn't resist the temptation to rearrange that struct's
field order a tad while I was at it.) FuncCall doesn't appear
in stored rules, so that change isn't a reason for a catversion
bump, but I did one anyway because the new enum value for
CoercionForm fields could confuse old backend code.

Possible future work:

  • Perhaps CoercionForm should now be renamed to DisplayForm,
    or something like that, to reflect its more general meaning.
    This'd require touching a couple hundred places, so it's not
    clear it's worth the code churn.

  • The SQLValueFunction node type, which was invented partly for
    the same goal of improving SQL-compatibility of view output,
    could perhaps be replaced with regular function calls marked
    with COERCE_SQL_SYNTAX. It's unclear if this would be a net
    code savings, however.

Discussion: https://postgr.es/m/42b73d2d-da12-ba9f-570a-420e0cce19d9@phystech.edu
```

扩展的SQL语法如下

```
+-- reverse-listing of various special function syntaxes required by SQL
+create view tt201v as
+select
+ extract(day from now()) as extr,
+ (now(), '1 day'::interval) overlaps
+ (current_timestamp(2), '1 day'::interval) as o,
+ 'foo' is normalized isn,
+ 'foo' is nfkc normalized isnn,
+ normalize('foo') as n,
+ normalize('foo', nfkd) as nfkd,
+ overlay('foo' placing 'bar' from 2) as ovl,
+ overlay('foo' placing 'bar' from 2 for 3) as ovl2,
+ position('foo' in 'foobar') as p,
+ substring('foo' from 2 for 3) as s,
+ substring('foo' similar 'f' escape '#') as ss,
+ substring('foo' from 'oo') as ssf, -- historically-permitted abuse
+ trim(' ' from ' foo ') as bt,
+ trim(leading ' ' from ' foo ') as lt,
+ trim(trailing ' foo ') as rt;

+select pg_get_viewdef('tt201v', true);
+ pg_get_viewdef
+-----------------------------------------------------------------------------------------------
+ SELECT date_part('day'::text, now()) AS extr, +
+ ((now(), '@ 1 day'::interval) OVERLAPS (CURRENT_TIMESTAMP(2), '@ 1 day'::interval)) AS o,+
+ (('foo'::text) IS NORMALIZED) AS isn, +
+ (('foo'::text) IS NFKC NORMALIZED) AS isnn, +
+ NORMALIZE('foo'::text) AS n, +
+ NORMALIZE('foo'::text, NFKD) AS nfkd, +
+ OVERLAY('foo'::text PLACING 'bar'::text FROM 2) AS ovl, +
+ OVERLAY('foo'::text PLACING 'bar'::text FROM 2 FOR 3) AS ovl2, +
+ POSITION(('foo'::text) IN ('foobar'::text)) AS p, +
+ SUBSTRING('foo'::text FROM 2 FOR 3) AS s, +
+ SUBSTRING('foo'::text SIMILAR 'f'::text ESCAPE '#'::text) AS ss, +
+ "substring"('foo'::text, 'oo'::text) AS ssf, +
+ TRIM(BOTH ' '::text FROM ' foo '::text) AS bt, +
+ TRIM(LEADING ' '::text FROM ' foo '::text) AS lt, +
+ TRIM(TRAILING FROM ' foo '::text) AS rt;
+(1 row)
+
```

执行一下以上view, 把alias column name去掉, 就可以看到对应的function call.

``` postgres=> select TRIM(TRAILING FROM ' foo '::text); rtrim


foo (1 row)

postgres=> select POSITION(('foo'::text) IN ('foobar'::text)) ; position


    1

(1 row) ```

at time zone用法

```
explain (costs off)
select * from tmptz where f1 at time zone 'utc' = '2017-01-18 00:00';
- QUERY PLAN


  • QUERY PLAN
    +-----------------------------------------------------------------------------------------------------
    Seq Scan on tmptz
  • Filter: (timezone('utc'::text, f1) = 'Wed Jan 18 00:00:00 2017'::timestamp without time zone)
  • Filter: ((f1 AT TIME ZONE 'utc'::text) = 'Wed Jan 18 00:00:00 2017'::timestamp without time zone)
    (2 rows)
    ```

https://www.postgresql.org/docs/devel/functions-datetime.html

In addition to these functions, the SQL OVERLAPS operator is supported:

(start1, end1) OVERLAPS (start2, end2) (start1, length1) OVERLAPS (start2, length2)

This expression yields true when two time periods (defined by their endpoints) overlap, false when they do not overlap. The endpoints can be specified as pairs of dates, times, or time stamps; or as a date, time, or time stamp followed by an interval. When a pair of values is provided, either the start or the end can be written first; OVERLAPS automatically takes the earlier value of the pair as the start. Each time period is considered to represent the half-open interval start <= time < end, unless start and end are equal in which case it represents that single time instant. This means for instance that two time periods with only an endpoint in common do not overlap.

SELECT (DATE '2001-02-16', DATE '2001-12-21') OVERLAPS (DATE '2001-10-30', DATE '2002-10-30'); Result: true SELECT (DATE '2001-02-16', INTERVAL '100 days') OVERLAPS (DATE '2001-10-30', DATE '2002-10-30'); Result: false SELECT (DATE '2001-10-29', DATE '2001-10-30') OVERLAPS (DATE '2001-10-30', DATE '2001-10-31'); Result: false SELECT (DATE '2001-10-30', DATE '2001-10-30') OVERLAPS (DATE '2001-10-30', DATE '2001-10-31'); Result: true

https://www.postgresql.org/docs/devel/functions-string.html

text IS [NOT] [form] NORMALIZED → boolean

Checks whether the string is in the specified Unicode normalization form. The optional form key word specifies the form: NFC (the default), NFD, NFKC, or NFKD. This expression can only be used when the server encoding is UTF8. Note that checking for normalization using this expression is often faster than normalizing possibly already normalized strings.

U&'\0061\0308bc' IS NFD NORMALIZED → t

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论