暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL Oracle/sql server/SYBASE 兼容性之 - nchar , nvarchar , ntext (unicode character)

digoal 2018-08-18
436

作者

digoal

日期

2018-08-18

标签

PostgreSQL , Oracle , nchar , nvarchar , ntext , ms sql , sybase , unicode character , utf8


背景

在Oracle, MS SQL, SYBASE数据库引擎中,有一些这样的字符串类型nchar , nvarchar , ntext。

这些类型是什么含义?在PostgreSQL中应该使用什么类型与之对应?

以MS SQL为例

https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2016

Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.

nchar [ ( n ) ]

Fixed-length Unicode string data. n defines the string length and must be a value from 1 through 4,000.

实际上,就两层含义

1、存储的是unicode编码的字符串,使用UNICODE UCS-2字符集。

2、长度指的是字符个数,而非字节数。

nchar , nvarchar , ntext 在 PostgreSQL的对应关系

介于以上介绍的两点,只要满足以下条件,char,varchar,text即对应到了nchar , nvarchar , ntext类型。

1、在PostgreSQL中使用UTF8字符集时,实际上就是unicode(别名)。

https://www.postgresql.org/docs/devel/static/multibyte.html

Name |Description |Language |Server? |ICU? |Bytes/Char |Aliases
---|---|---|---|---|---|---
UTF8 |Unicode, 8-bit |all |Yes |Yes |1-4 |Unicode

满足以上条件即可用char,varchar,text直接替代nchar,nvarchar,ntext,因为在PostgreSQL中char(n),varchar(n),任何时候就是指的字符长度(而不是字节长度)。

如果PG 目标DB不是utf8字符集,又当如何

可以有两种方法:

1、实际上依旧可以使用char, varchar, text类型存储(长度限制与上游保持一致即可),只是建议业务方做一下字符集转换后存入PG。

源端存储的 unicode 字符串转换为 PG目标库的目标字符集字符串。

```
postgres=# \df convert*
List of functions
Schema | Name | Result data type | Argument data types | Type
------------+--------------+------------------+---------------------+------
pg_catalog | convert | bytea | bytea, name, name | func
pg_catalog | convert_from | text | bytea, name | func
pg_catalog | convert_to | bytea | text, name | func
(3 rows)

dbtest1=# select convert_to(N'你好中国','sqlascii');
convert_to


\xc4e3bac3d6d0b9fa
(1 row)

dbtest1=# select convert_to(N'你好中国','utf8');
convert_to


\xe4bda0e5a5bde4b8ade59bbd
(1 row)
```

2、或者你在PG中可以使用字节流存储 来自源库unicode 字符串的字节流,读取时再转换为当前字符集。

```
dbtest1=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access privileges
-----------+----------+----------+------------+------------+-----------------------
dbtest1 | postgres | EUC_CN | C | C |

create table test (id int, content bytea);

insert into test values (1, convert_to(N'你好中国','utf8'));

dbtest1=# select * from test;
id | content
----+----------------------------
1 | \xe4bda0e5a5bde4b8ade59bbd
(1 row)

dbtest1=# select convert_from(content,'utf8') from test;
convert_from


你好中国
(1 row)
```

PostgreSQL nchar , nvarchar , ntext 的输入语法

N quote

```
dbtest1=# select N'abc你好中国';
bpchar


abc你好中国
(1 row)
```

参考

1、src/backend/parser/scan.l

```
/
* To ensure that {quotecontinue} can be scanned without having to back up
* if the full pattern isn't matched, we include trailing whitespace in
* {quotestop}. This matches all cases where {quotecontinue} fails to match,
* except for {quote} followed by whitespace and just one "-" (not two,
* which would start a {comment}). To cover that we have {quotefail}.
* The actions for {quotestop} and {quotefail} must throw back characters
* beyond the quote proper.
/
quote '
quotestop {quote}{whitespace}
quotecontinue {quote}{whitespace_with_newline}{quote}
quotefail {quote}{whitespace}
"-"

/ Bit string
* It is tempting to scan the string for only those characters
* which are allowed. However, this leads to silently swallowed
* characters if illegal characters are included in the string.
* For example, if xbinside is [01] then B'ABCD' is interpreted
* as a zero-length string, and the ABCD' is lost!
* Better to pass the string forward and let the input routines
* validate the contents.
/
xbstart [bB]{quote}
xbinside [^']*

/ Hexadecimal number /
xhstart [xX]{quote}
xhinside [^']*
```

输入nchar,nvarchar方法:

```
/ National character /
xnstart [nN]{quote}

/ Quoted string that allows backslash escapes /
xestart [eE]{quote}
xeinside [^\']+
xeescape [\][^0-7]
xeoctesc [\][0-7]{1,3}
xehexesc [\]x[0-9A-Fa-f]{1,2}
xeunicode \
xeunicodefail \

/ Extended quote
* xqdouble implements embedded quote, ''''
/
xqstart {quote}
xqdouble {quote}{quote}
xqinside [^']+

/ $foo$ style quotes ("dollar quoting")
* The quoted string starts with $foo$ where "foo" is an optional string
* in the form of an identifier, except that it may not contain "$",
* and extends to the first occurrence of an identical string.
* There is
no processing of the quoted text.

* {dolqfailed} is an error rule to avoid scanner backup when {dolqdelim}
* fails to match its trailing "$".
/
dolq_start [A-Za-z\200-\377_]
dolq_cont [A-Za-z\200-\377_0-9]
dolqdelim \$({dolq_start}{dolq_cont}
)?\$
dolqfailed \${dolq_start}{dolq_cont}*
dolqinside [^$]+

/ Double quote
* Allows embedded spaces and other special characters into identifiers.
/
dquote \"
xdstart {dquote}
xdstop {dquote}
xddouble {dquote}{dquote}
xdinside [^"]+

/ Unicode escapes /
uescape [uU][eE][sS][cC][aA][pP][eE]{whitespace}{quote}[^']{quote}
/
error rule to avoid backup /
uescapefail [uU][eE][sS][cC][aA][pP][eE]{whitespace}
"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]

/ Quoted identifier with Unicode escapes /
xuistart [uU]&{dquote}

/ Quoted string with Unicode escapes /
xusstart [uU]&{quote}

/ Optional UESCAPE after a quoted string or identifier with Unicode escapes. /
xustop1 {uescapefail}?
xustop2 {uescape}

/ error rule to avoid backup /
xufailed [uU]&
```

2、https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2016

3、https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-2016

4、https://www.postgresql.org/docs/devel/static/multibyte.html

5、https://stackoverflow.com/questions/1245217/what-is-the-postgresql-equivalent-to-sql-server-nvarchar

It's varchar and text, assuming your database is in UNICODE encoding.

If your database is in a non-UNICODE encoding, there is no special datatype that will give you a unicode string - you can store it as a bytea stream, but that will not be a string.

6、各种数据库引擎(postgresql, mysql, oracle, ms sql, sql lite等)的WEB版SQL在线测试运行环境

http://www.sqlfiddle.com/

7、《PostgreSQL 转义、UNICODE、与SQL注入》

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论