Python String Prefixes

生有可恋 2022-07-03

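The commonly used Python string prefixes are u, r, b, and f. This article uses Python 3 throughout and does not go into how each prefix differs between Python 2 and Python 3.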

u'中文' Unicode string
r'C:\Python\Python310' raw string
b'\xe4\xb8\xad\xe6\x96\x87' bytes literal
f"He said his name is {name!r}." formatted string (f-string)

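In Python 3 the u prefix no longer has any effect: with or without it, the literal is a str holding Unicode text. Put another way, a Python 3 str is a sequence of Unicode code points, so it can represent text in any language. Unicode text can be serialized with several encodings (UTF-8, UTF-16, and others); Python 3 reads source files as UTF-8 by default, and encode() and decode() also default to UTF-8.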

>>> a = u'中文'
>>> a
'中文'
>>> type(a)
<class 'str'>
>>> a = '中文'
>>> a
'中文'
>>> type(a)
<class 'str'>

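A str exists only in memory; before it can be saved to a file it has to be encoded into bytes. Taking the example above, how many bytes does '中文' occupy once it is saved to a file? Encoding it to a bytes object gives the answer: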

>>> a = '中文'
>>> a.encode()
b'\xe4\xb8\xad\xe6\x96\x87'
>>> b = b'\xe4\xb8\xad\xe6\x96\x87'
>>> b.decode()
'中文'
>>> type(b)
<class 'bytes'>
>>> type(a)
<class 'str'>
>>> len(b)
6
>>> len(a)
2

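A literal with the b prefix is no longer a str; it is a bytes object. When encode() and decode() are called without arguments, they use UTF-8. open() behaves differently: in text mode without an encoding argument it typically uses the locale's preferred encoding (locale.getpreferredencoding(False)), which is UTF-8 on most Linux and macOS systems but often not on Windows. Pass encoding='utf-8' explicitly when a file must be read or written as UTF-8.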
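
The defaults described above can be checked directly. This is a minimal sketch; the variable name text is only for illustration, and the value printed for the locale encoding depends on the machine it runs on.

```python
import locale
import sys

text = '中文'

# str.encode() and bytes.decode() use UTF-8 when no encoding is given.
encoded_default = text.encode()
encoded_utf8 = text.encode('utf-8')
print(encoded_default == encoded_utf8)      # True
print(encoded_default.decode() == text)     # True

# The default encoding for str/bytes conversion in Python 3.
print(sys.getdefaultencoding())             # utf-8

# The encoding open() usually falls back to in text mode.
# This one is platform and locale dependent, so no output is shown here.
print(locale.getpreferredencoding(False))
```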

An example of reading a file:

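try:
    # Name the encoding explicitly instead of relying on the platform default.
    with open('/tmp/input.txt', 'r', encoding='utf-8') as f:
        lines = f.readlines()
except OSError as e:
    # OSError covers a missing file as well as permission and other I/O errors.
    print(f"Could not read file: {e}")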

An example of writing a file:

rec = "/tmp/records.log"
with open(rec, "w", encoding="utf-8") as f:
    f.write("中文\n")
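
To see the relation between string length and file size, the write and read steps can be combined into one round trip. This is a sketch under a few assumptions: it writes to a temporary directory rather than /tmp, the file name is hypothetical, and the text is written without a trailing newline so the byte count is the same on every platform.

```python
import os
import tempfile

text = "中文"

# A temporary directory keeps the sketch self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "records.log")  # hypothetical file name

    # Encode to UTF-8 on the way out ...
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    size_on_disk = os.path.getsize(path)

    # ... and decode from UTF-8 on the way back in.
    with open(path, "r", encoding="utf-8") as f:
        read_back = f.read()

print(len(text))          # 2 characters in memory
print(size_on_disk)       # 6 bytes on disk
print(read_back == text)  # True
```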

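The string '中文' has length 2, but its two characters take 6 bytes when stored in a file: in this example UTF-8 encodes each Chinese character as 3 bytes. UTF-8 is a variable-length encoding, so ASCII letters take 1 byte each while Chinese characters like these take 3.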
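
The variable-length behaviour is easy to measure. The sketch below picks four sample characters (an assumption for illustration, written as escapes so the source itself is unambiguous) and also compares UTF-8 with two other encodings for the same string.

```python
# One sample character per UTF-8 length: a, e-acute, 中, and an emoji.
samples = ['a', '\u00e9', '\u4e2d', '\U0001F600']

utf8_lengths = [len(ch.encode('utf-8')) for ch in samples]
print(utf8_lengths)  # [1, 2, 3, 4]

# The same two-character string takes a different number of bytes
# depending on the encoding chosen.
text = '中文'
size_utf8 = len(text.encode('utf-8'))
size_gbk = len(text.encode('gbk'))
size_utf16 = len(text.encode('utf-16-le'))  # little-endian, no byte order mark
print(size_utf8, size_gbk, size_utf16)  # 6 4 4
```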

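Bytes literals take the prefix 'b' or 'B'. They produce an instance of bytes, not of str. They may only contain ASCII characters; any byte with a numeric value of 128 or greater must be written as an escape.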
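
These rules can be observed in a few lines. The sketch below assumes nothing beyond the standard library; compile() is used only so that the invalid literal can be shown without breaking the example itself.

```python
data = b'\xe4\xb8\xad\xe6\x96\x87'

# A bytes object is a sequence of integers in the range 0..255.
first_byte = data[0]
print(first_byte)            # 228
print(list(data[:3]))        # [228, 184, 173]

# ASCII characters can be written directly in a bytes literal.
print(b'abc' == bytes([97, 98, 99]))  # True

# Non-ASCII characters are rejected when the literal is compiled.
try:
    compile("b'中文'", "<example>", "eval")
except SyntaxError:
    literal_rejected = True
else:
    literal_rejected = False
print(literal_rejected)      # True
```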

A str literal can be written in three ways:

Single quotes: 'allows embedded "double" quotes'
Double quotes: "allows embedded 'single' quotes"
Triple quoted: '''Three single quotes''', """Three double quotes"""

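Single-quoted strings may contain double quotes, double-quoted strings may contain single quotes, and triple-quoted strings may span multiple lines.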

>>> a = '''aaaaaaa
... aaaaaaaaaaaaaaaa
... aaaaaaaaaaaaaaaa
... '''
>>> a
'aaaaaaa\naaaaaaaaaaaaaaaa\naaaaaaaaaaaaaaaa\n'

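In an ordinary string, sequences such as \n (newline) and \t (tab) are interpreted as escapes, so every literal backslash has to be written as \\. When a string contains many literal backslashes, a raw string saves typing. Both string and bytes literals may take the prefix 'r' or 'R'; these are called raw strings, and they treat a backslash as a literal character without processing escapes.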

Raw strings are often used for Windows paths:

>>> a = r'C:\Python\Python310'
>>> a
'C:\\Python\\Python310'
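
The difference between raw and ordinary literals shows up in the length of the resulting string. The following sketch also uses a raw string for a regular expression, a second common use that is added here as an illustration; the sample text 'Python 3.10' is an assumption.

```python
import re

# '\n' is one newline character; r'\n' is a backslash followed by the letter n.
print(len('\n'), len(r'\n'))  # 1 2

# A raw literal and a fully escaped ordinary literal produce the same str.
same_path = r'C:\Python\Python310' == 'C:\\Python\\Python310'
print(same_path)  # True

# The r prefix combines with b for raw bytes literals.
print(rb'\x00' == b'\\x00')  # True

# Regular expressions are usually written as raw strings so that
# backslashes reach the re module unchanged.
match = re.search(r'\d+\.\d+', 'Python 3.10')
version = match.group()
print(version)  # 3.10
```

One limitation to keep in mind: a raw string cannot end with a single backslash, so r'C:\' is a syntax error.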

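Python 3.6 introduced formatted string literals, which use the prefix f. A formatted string literal, or f-string, is a string literal prefixed with 'f' or 'F'. It may contain replacement fields, which are expressions delimited by curly braces {}. Other string literals always have a constant value, whereas a formatted string literal is an expression evaluated at run time.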

Formatted string examples:

>>> import decimal
>>> from datetime import datetime
>>> name = "Fred"
>>> f"He said his name is {name!r}."
"He said his name is 'Fred'."
>>> f"He said his name is {repr(name)}."  # repr() is equivalent to !r
"He said his name is 'Fred'."
>>> width = 10
>>> precision = 4
>>> value = decimal.Decimal("12.34567")
>>> f"result: {value:{width}.{precision}}"  # nested fields
'result:      12.35'
>>> today = datetime(year=2017, month=1, day=27)
>>> f"{today:%B %d, %Y}"  # using date format specifier
'January 27, 2017'
>>> f"{today=:%B %d, %Y}" # using date format specifier and debugging
'today=January 27, 2017'
>>> number = 1024
>>> f"{number:#0x}"  # using integer format specifier
'0x400'
>>> foo = "bar"
>>> f"{ foo = }" # preserves whitespace
" foo = 'bar'"
>>> line = "The mill's closed"
>>> f"{line = }"
'line = "The mill\'s closed"'
>>> f"{line = :20}"
"line = The mill's closed   "
>>> f"{line = !r:20}"
'line = "The mill\'s closed" '
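
Because an f-string is evaluated at run time, the same literal can produce different results on each evaluation. A minimal sketch, using a hypothetical describe() helper that is not part of the original examples:

```python
def describe(item, price, quantity):
    # The expressions inside {} are evaluated each time the function runs.
    return f"{quantity} x {item} = {price * quantity:.2f}"

first = describe("tea", 1.5, 3)
second = describe("cake", 2.25, 2)
print(first)   # 3 x tea = 4.50
print(second)  # 2 x cake = 4.50

# Doubled braces produce literal braces instead of a replacement field.
braces = f"{{not a field}} {1 + 2}"
print(braces)  # {not a field} 3
```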

F-strings use the same format specification mini-language as str.format(*args, **kwargs), and their replacement-field syntax is very similar.

>>> '{0}, {1}, {2}'.format('a', 'b', 'c')
'a, b, c'
>>> '{}, {}, {}'.format('a', 'b', 'c')  # 3.1+ only
'a, b, c'
>>> '{2}, {1}, {0}'.format('a', 'b', 'c')
'c, b, a'
>>> '{2}, {1}, {0}'.format(*'abc')      # unpacking argument sequence
'c, b, a'
>>> '{0}{1}{0}'.format('abra', 'cad')   # arguments' indices can be repeated
'abracadabra'
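
To make the correspondence with f-strings concrete, the sketch below formats the same values three ways and compares the results. The names name and score are sample data chosen for illustration.

```python
name, score = "Fred", 93.456

# Positional fields, keyword fields, and an f-string with the same format spec.
via_positional = "{0}: {1:>8.2f}".format(name, score)
via_keyword = "{name}: {score:>8.2f}".format(name=name, score=score)
via_fstring = f"{name}: {score:>8.2f}"

print(via_fstring)                                   # Fred:    93.46
print(via_positional == via_keyword == via_fstring)  # True
```

The main practical difference is where the values come from: str.format() receives them as arguments, while an f-string reads names from the enclosing scope.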

Three conversion flags are commonly used: '!s' calls str() on the value, '!r' calls repr(), and '!a' calls ascii().
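
The three flags are easiest to tell apart on a non-ASCII value, where str(), repr() and ascii() all give different results. A small sketch, with the sample value and the "hand grenade" argument chosen for illustration:

```python
word = '中文'

as_str = f"{word!s}"
as_repr = f"{word!r}"
as_ascii = f"{word!a}"

print(as_str)    # 中文
print(as_repr)   # '中文'
print(as_ascii)  # '\u4e2d\u6587'

# The same flags work in str.format() templates.
sentence = "Bring out the holy {name!r}".format(name="hand grenade")
print(sentence)  # Bring out the holy 'hand grenade'
```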

"Harold's a clever {0!s}"        # Calls str() on the argument first
"Bring out the holy {name!r}"    # Calls repr() on the argument first
"More {!a}"                      # Calls ascii() on the argument first

References:

https://docs.python.org/zh-cn/3/library/string.html#formatexamples
https://docs.python.org/zh-cn/3/library/string.html#formatstrings
https://docs.python.org/zh-cn/3/reference/lexical_analysis.html#strings
https://docs.python.org/3/library/stdtypes.html#text-sequence-type-str

If you repost this article, please add a note at the end: "Reposted from the WeChat official account 生有可恋".
