PostgreSQL Oracle/sql server/SYBASE 兼容性之 - nchar , nvarchar , ntext (unicode character)

digoal 2018-08-18

436

作者

digoal

日期

2018-08-18

背景

在Oracle, MS SQL, SYBASE数据库引擎中，有一些这样的字符串类型nchar , nvarchar , ntext。

这些类型是什么含义？在PostgreSQL中应该使用什么类型与之对应？

以MS SQL为例

https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2016

Character data types that are either fixed-length, nchar, or variable-length, nvarchar, Unicode data and use the UNICODE UCS-2 character set.

nchar [ ( n ) ]

Fixed-length Unicode string data. n defines the string length and must be a value from 1 through 4,000.

实际上，就两层含义

1、存储的是unicode编码的字符串，使用UNICODE UCS-2字符集。

2、长度指的是字符个数，而非字节数。

nchar , nvarchar , ntext 在 PostgreSQL的对应关系

介于以上介绍的两点，只要满足以下条件，char,varchar,text即对应到了nchar , nvarchar , ntext类型。

1、在PostgreSQL中使用UTF8字符集时，实际上就是unicode（别名）。

https://www.postgresql.org/docs/devel/static/multibyte.html

Name |Description |Language |Server? |ICU? |Bytes/Char |Aliases
---|---|---|---|---|---|---
UTF8 |Unicode, 8-bit |all |Yes |Yes |1-4 |Unicode

满足以上条件即可用char,varchar,text直接替代nchar,nvarchar,ntext，因为在PostgreSQL中char(n),varchar(n)，任何时候就是指的字符长度（而不是字节长度）。

如果PG 目标DB不是utf8字符集，又当如何

可以有两种方法：

1、实际上依旧可以使用char, varchar, text类型存储(长度限制与上游保持一致即可)，只是建议业务方做一下字符集转换后存入PG。

源端存储的 unicode 字符串转换为 PG目标库的目标字符集字符串。

dbtest1=# select convert_to(N'你好中国','sqlascii');
convert_to

\xc4e3bac3d6d0b9fa
(1 row)

dbtest1=# select convert_to(N'你好中国','utf8');
convert_to

\xe4bda0e5a5bde4b8ade59bbd
(1 row)
```

2、或者你在PG中可以使用字节流存储来自源库unicode 字符串的字节流，读取时再转换为当前字符集。

create table test (id int, content bytea);

insert into test values (1, convert_to(N'你好中国','utf8'));

dbtest1=# select * from test;
id | content
----+----------------------------
1 | \xe4bda0e5a5bde4b8ade59bbd
(1 row)

dbtest1=# select convert_from(content,'utf8') from test;
convert_from

你好中国
(1 row)
```

PostgreSQL nchar , nvarchar , ntext 的输入语法

N quote

```
dbtest1=# select N'abc你好中国';
bpchar

abc你好中国
(1 row)
```

参考

1、src/backend/parser/scan.l

```
/
* To ensure that {quotecontinue} can be scanned without having to back up
* if the full pattern isn't matched, we include trailing whitespace in
* {quotestop}. This matches all cases where {quotecontinue} fails to match,
* except for {quote} followed by whitespace and just one "-" (not two,
* which would start a {comment}). To cover that we have {quotefail}.
* The actions for {quotestop} and {quotefail} must throw back characters
* beyond the quote proper.
/
quote '
quotestop {quote}{whitespace}
quotecontinue {quote}{whitespace_with_newline}{quote}
quotefail {quote}{whitespace}"-"

/ Bit string
* It is tempting to scan the string for only those characters
* which are allowed. However, this leads to silently swallowed
* characters if illegal characters are included in the string.
* For example, if xbinside is [01] then B'ABCD' is interpreted
* as a zero-length string, and the ABCD' is lost!
* Better to pass the string forward and let the input routines
* validate the contents.
/
xbstart [bB]{quote}
xbinside [^']*

/ Hexadecimal number /
xhstart [xX]{quote}
xhinside [^']*
```

输入nchar,nvarchar方法:

```
/ National character /
xnstart [nN]{quote}

/ Quoted string that allows backslash escapes /
xestart [eE]{quote}
xeinside [^\']+
xeescape [\][^0-7]
xeoctesc [\][0-7]{1,3}
xehexesc [\]x[0-9A-Fa-f]{1,2}
xeunicode \
xeunicodefail \

/ Extended quote
* xqdouble implements embedded quote, ''''
/
xqstart {quote}
xqdouble {quote}{quote}
xqinside [^']+

/ $foo$ style quotes ("dollar quoting")
* The quoted string starts with $foo$ where "foo" is an optional string
* in the form of an identifier, except that it may not contain "$",
* and extends to the first occurrence of an identical string.
* There is no processing of the quoted text.

* {dolqfailed} is an error rule to avoid scanner backup when {dolqdelim}
* fails to match its trailing "$".
/
dolq_start [A-Za-z\200-\377_]
dolq_cont [A-Za-z\200-\377_0-9]
dolqdelim \$({dolq_start}{dolq_cont})?\$
dolqfailed \${dolq_start}{dolq_cont}*
dolqinside [^$]+

/ Double quote
* Allows embedded spaces and other special characters into identifiers.
/
dquote \"
xdstart {dquote}
xdstop {dquote}
xddouble {dquote}{dquote}
xdinside [^"]+

/ Unicode escapes /
uescape [uU][eE][sS][cC][aA][pP][eE]{whitespace}{quote}[^']{quote}
/ error rule to avoid backup /
uescapefail [uU][eE][sS][cC][aA][pP][eE]{whitespace}"-"|[uU][eE][sS][cC][aA][pP][eE]{whitespace}{quote}[^']|[uU][eE][sS][cC][aA][pP][eE]{whitespace}{quote}|[uU][eE][sS][cC][aA][pP][eE]{whitespace}*|[uU][eE][sS][cC][aA][pP]|[uU][eE][sS][cC][aA]|[uU][eE][sS][cC]|[uU][eE][sS]|[uU][eE]|[uU]

/ Quoted identifier with Unicode escapes /
xuistart [uU]&{dquote}

/ Quoted string with Unicode escapes /
xusstart [uU]&{quote}

/ Optional UESCAPE after a quoted string or identifier with Unicode escapes. /
xustop1 {uescapefail}?
xustop2 {uescape}

/ error rule to avoid backup /
xufailed [uU]&
```

2、https://docs.microsoft.com/en-us/sql/t-sql/data-types/nchar-and-nvarchar-transact-sql?view=sql-server-2016

3、https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-types-transact-sql?view=sql-server-2016

4、https://www.postgresql.org/docs/devel/static/multibyte.html

5、https://stackoverflow.com/questions/1245217/what-is-the-postgresql-equivalent-to-sql-server-nvarchar

It's varchar and text, assuming your database is in UNICODE encoding.

If your database is in a non-UNICODE encoding, there is no special datatype that will give you a unicode string - you can store it as a bytea stream, but that will not be a string.

6、各种数据库引擎（postgresql, mysql, oracle, ms sql, sql lite等）的WEB版SQL在线测试运行环境

http://www.sqlfiddle.com/

7、《PostgreSQL 转义、UNICODE、与SQL注入》

文章转载自digoal，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL Oracle/sql server/SYBASE 兼容性之 - nchar , nvarchar , ntext (unicode character)

作者

日期

标签

背景

nchar , nvarchar , ntext 在 PostgreSQL的对应关系

如果PG 目标DB不是utf8字符集，又当如何

PostgreSQL nchar , nvarchar , ntext 的输入语法

参考

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论