暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

MySQL/PG 数据脱敏

00后DBA实录 2024-10-30
73

前言


今天聊聊数据脱敏吧。

单纯是想玩一下 percona 的 data_masking 插件和 PG 的postgresql_anonymizer 扩展,顺便记录一下哈哈。

个人公众号就是口气大,没办法个人的就是自由啦。


什么是数据脱敏


percona 的说法是:

数据脱敏通过阻止未经授权的用户访问真实数据来保护敏感信息。此过程会创建更改后的数据版本,用于特定用途,例如演示文稿、销售演示或软件测试。脱敏后的数据保留与原始数据相同的格式,但包含无法逆转以显示真实信息的更改值。通过使数据对外部人员毫无价值,脱敏可帮助组织降低数据泄露或滥用的风险。公司可以安全地在各种场景中使用脱敏后的数据,而不会向未经授权的各方泄露机密详细信息。


简单说就是,以各种方式保护原表数据不被泄露,即使泄露也是无关紧要的部分内容,使用比如:数据加密、加盐、替换等方法是遮蔽这些内部数据。


PS:(悄悄话)这篇不讲加密 AES_DECRYPT() 之类的加密算法,太麻烦了,又要证书又要密钥,我懒得搞哈哈。什么时候想写了再写!!一切均是兴趣使然,强迫不得也。


MySQL 数据脱敏


mysql 的脱敏插件是企业版才有,爱白嫖的我们当然就选择,大慈善家 percona 了。

percona-mysql 有一个 data_masking 插件,我们去顺过来装在社区版玩,因为是同源,所以基本上就是兼容的。


我的mysql是 8.0.32 glibc 2.17

     root@localhost [(none)] 22:56:35 > select version();
    +-----------+
    | version() |
    +-----------+
    | 8.0.32 |
    +-----------+
    1 row in set (0.02 sec)

    [root@localhost ~]# ldd --version
    ldd (GNU libc) 2.17
    Copyright (C) 2012 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions. There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    由 Roland McGrath 和 Ulrich Drepper 编写。

    所以去拿 percona 对应版本的:https://www.percona.com/downloads#percona-server-mysql

    我选的是这个:Percona-Server-8.0.32-24-Linux.x86_64.glibc2.17-minimal.tar.gz

    安装 data_masking 插件


    解压一下,在这个位置下:Percona-Server-8.0.32-24-Linux.x86_64.glibc2.17-minimal/lib/plugin

    可以找到 data_masking 插件

       [root@localhost plugin]# ll data*
      -rw-r--r--. 1 root root 1374 3月 10 2023 data_masking.ini
      -rwxr-xr-x. 1 root root 179880 3月 10 2023 data_masking.so

      有一个  .ini  一个 data_masking.so,data_masking.so 就是插件。

      我们要把它复制到我们的 plugin_dir 下

        root@localhost [(none)] 22:53:58 > show variables like 'plugin%';
        +---------------+---------------------------------------+
        | Variable_name | Value |
        +---------------+---------------------------------------+
        | plugin_dir | application/mysql_8_3306/lib/plugin/ |
        +---------------+---------------------------------------+
        1 row in set (0.00 sec)

        复制过去然后授权

          cp data_masking.so application/mysql_8_3306/lib/plugin/
          chown mysql.mysql application/mysql_8_3306/lib/plugin/data_masking.so

          检查一下

             [root@localhost plugin]# ll application/mysql_8_3306/lib/plugin/data_masking.so
            -rwxr-xr-x. 1 mysql mysql 179880 1029 22:54 application/mysql_8_3306/lib/plugin/data_masking.so

            好了,现在启动!!!


            安装 data_masking

               root@localhost [(none)] 22:56:32 > INSTALL PLUGIN data_masking SONAME 'data_masking.so';
              Query OK, 0 rows affected (0.24 sec)

              OK 顺手牵羊成功


              使用 data_masking 插件


              看看 percona 官方写了有哪些函数:https://docs.percona.com/percona-server/8.0/data-masking-plugin-functions.html#data-masking-plugin-functions

              函数名描述
              mask_inner(string, margin1, margin2 [, character])返回仅对字符串内部部分进行脱敏的结果。可以指定不同的掩码字符。
              mask_outer(string, margin1, margin2 [, character])掩码字符串的外部部分。内部部分未被掩码。可以指定不同的掩码字符。
              mask_pan(string)通过用“X”替换字符串(最后四个字符除外)来掩码主账户号码(PAN)。PAN 字符串必须是 15 或 16 个字符。
              mask_pan_relaxed(string)返回前六个数字和最后四个数字。其余部分用“X”替换。
              mask_ssn(string)返回一个字符串,仅显示最后四个数字。其余部分用“X”替换。
              gen_range(lower, upper)根据选定范围生成随机数,并支持负数。
              gen_rnd_email()生成一个随机电子邮件地址。域名为 example.com。
              gen_rnd_pan([size in integer])生成一个随机主账户号码。此函数仅应用于测试目的。

              具体参考:https://docs.percona.com/percona-server/8.0/data-masking-plugin-functions.html#data-masking-plugin-functions

              翻译了一些我觉得挺有意思的函数描述。

              让我们试一下这个。


              测试数据

                create table mask(id int primary key,name varchar(64),phone varchar(11),idcard varchar(64));
                insert into mask values(1,'红太狼','12345678911','1234567891012134');
                insert into mask values(2,'灰太狼','12345678911','1234567891012134');

                红太狼和灰太狼用一个手机号码和银行卡没问题吧!!没毛病吧哈哈

                看一下小夫妻的个人信息。

                   root@localhost [ddl] 23:16:57 > select * from mask;
                  +----+-----------+-------------+------------------+
                  | id | name | phone | idcard |
                  +----+-----------+-------------+------------------+
                  | 1 | 红太狼 | 12345678911 | 1234567891012134 |
                  | 2 | 灰太狼 | 12345678911 | 1234567891012134 |
                  +----+-----------+-------------+------------------+
                  2 rows in set (0.00 sec)

                  试一下 mask_inner(),string 是指定要脱敏的字符串,margin1/margin2 是左右分别不掩码的字符串个数,character 掩码的内容。

                     root@localhost [ddl] 23:22:37 > select mask_inner('abcdef',1,1,'*'),mask_inner('abcdef',2,1,'='),mask_inner('abcdef',1,2,'-');
                    +------------------------------+------------------------------+------------------------------+
                    | mask_inner('abcdef',1,1,'*') | mask_inner('abcdef',2,1,'=') | mask_inner('abcdef',1,2,'-') |
                    +------------------------------+------------------------------+------------------------------+
                    | a****f | ab===f | a---ef |
                    +------------------------------+------------------------------+------------------------------+
                    1 row in set (0.01 sec)

                    还挺牛逼的哈。

                    放到表里面看看

                       root@localhost [ddl] 23:24:52 > select id,mask_inner(name,1,0,'*'),mask_inner(phone,3,4,'#'),mask_inner(idcard,4,4,'#') from mask;
                      +----+--------------------------+---------------------------+----------------------------+
                      | id | mask_inner(name,1,0,'*') | mask_inner(phone,3,4,'#') | mask_inner(idcard,4,4,'#') |
                      +----+--------------------------+---------------------------+----------------------------+
                      | 1 | ?** | 123####8911 | 1234########2134 |
                      | 2 | ?** | 123####8911 | 1234########2134 |
                      +----+--------------------------+---------------------------+----------------------------+
                      2 rows in set (0.00 sec)

                      好像没法处理中文诶。

                      那就换一种叭,用 concat 来实现。好了完美。

                         root@localhost [ddl] 23:35:57 > select id,concat(left(name,1),repeat('*',char_length(name)-1)),mask_inner(phone,3,4,'#'),mask_inner(idcard,4,
                        4,'#') from mask;
                        +----+------------------------------------------------------+---------------------------+----------------------------+
                        | id | concat(left(name,1),repeat('*',char_length(name)-1)) | mask_inner(phone,3,4,'#') | mask_inner(idcard,4,4,'#') |
                        +----+------------------------------------------------------+---------------------------+----------------------------+
                        | 1 | 红** | 123####8911 | 1234########2134 |
                        | 2 | 灰** | 123####8911 | 1234########2134 |
                        +----+------------------------------------------------------+---------------------------+----------------------------+
                        2 rows in set (0.00 sec)

                        你猜这样会走索引吗?

                           create index mask_id_name_idcard on mask(id,name,idcard);

                          肯定会走啦。这里数据太少了,就懒得测啦。

                          其他的函数其实都差不多,我举例一下,就不放表里面了

                          mask_outer 是对两边掩码

                            root@localhost [ddl] 23:46:17 > select mask_outer('abcde',1,1,'*');
                            +-----------------------------+
                            | mask_outer('abcde',1,1,'*') |
                            +-----------------------------+
                            | *bcd* |
                            +-----------------------------+
                            1 row in set (0.00 sec)

                            mask_pan 是对 15~16 位的字符掩码,超过或者少了都不行

                               root@localhost [ddl] 23:46:45 > select mask_pan('12345678978997555'),mask_pan('123456789789975'),mask_pan('adwewsdawertyui');
                              +-------------------------------+-----------------------------+-----------------------------+
                              | mask_pan('12345678978997555') | mask_pan('123456789789975') | mask_pan('adwewsdawertyui') |
                              +-------------------------------+-----------------------------+-----------------------------+
                              | 12345678978997555 | XXXXXXXXXXX9975 | XXXXXXXXXXXtyui |
                              +-------------------------------+-----------------------------+-----------------------------+
                              1 row in set (0.00 sec)

                              mask_pan_relaxed 是对 15~16 位的字符掩码,超过或者少了都不行,然后留下前 6 个和后 4 个

                                 root@localhost [ddl] 23:50:57 > select mask_pan_relaxed('awdesdesxdersat'),mask_pan_relaxed('4664646478674669');
                                +-------------------------------------+--------------------------------------+
                                | mask_pan_relaxed('awdesdesxdersat') | mask_pan_relaxed('4664646478674669') |
                                +-------------------------------------+--------------------------------------+
                                | awdesdXXXXXrsat | 466464XXXXXX4669 |
                                +-------------------------------------+--------------------------------------+
                                1 row in set (0.00 sec)

                                mask_ssn 是 9 位这种格式:654-66-2262,然后留下后 4 位

                                   root@localhost [ddl] 23:53:50 > select mask_ssn('654-66-2262');
                                  +-------------------------+
                                  | mask_ssn('654-66-2262') |
                                  +-------------------------+
                                  | XXX-XX-2262 |
                                  +-------------------------+
                                  1 row in set (0.00 sec)

                                  还有 3 个生成的一起看看吧,

                                  gen_range 生成范围内的随机数

                                  gen_rnd_email 生成一个 @example.com 结尾的 email

                                  gen_rnd_pan 生成 15~16 位

                                    root@localhost [ddl] 23:57:47 > select gen_range(-10,-1),gen_rnd_email(),gen_rnd_pan();
                                    +-------------------+----------------------+-----------------+
                                    | gen_range(-10,-1) | gen_rnd_email() | gen_rnd_pan() |
                                    +-------------------+----------------------+-----------------+
                                    | -4 | Xr603HSB@example.com | 341391117060328 |
                                    +-------------------+----------------------+-----------------+
                                    1 row in set (0.00 sec)

                                    还有一个数据字典的,要用到 secure_file_priv 变量定义数据字典的物理文件。

                                    我这里没有定义,它是只读变量

                                       root@localhost [ddl] 00:01:42 > show variables like 'secure_file_priv';
                                      +------------------+-------+
                                      | Variable_name | Value |
                                      +------------------+-------+
                                      | secure_file_priv | NULL |
                                      +------------------+-------+
                                      1 row in set (0.00 sec)

                                      root@localhost [ddl] 00:01:48 > set global secure_file_priv='/log';
                                      ERROR 1238 (HY000): Variable 'secure_file_priv' is a read only variable

                                      写配置文件

                                         [mysqld]
                                        secure_file_priv='/log'

                                        然后重启

                                           root@localhost [(none)] 00:04:47 > show variables like 'secure_file_priv';
                                          +------------------+-------+
                                          | Variable_name | Value |
                                          +------------------+-------+
                                          | secure_file_priv | log/ |
                                          +------------------+-------+
                                          1 row in set (0.02 sec)

                                          定义好了,我们在这里面写给物理文件

                                            echo -e 'xiyy\nmeiyy\nfeiyy' > log/dictionary.conf
                                            chown mysql.mysql log/dictionary.conf

                                            echo -e 'lanyy\nliyy\nmanyy' > log/dictionary2.conf
                                            chown mysql.mysql log/dictionary2.conf

                                            然后加载一下

                                               root@localhost [(none)] 00:04:50 > select gen_dictionary_load('/log/dictionary.conf','d1');
                                              +--------------------------------------------------+
                                              | gen_dictionary_load('/log/dictionary.conf','d1') |
                                              +--------------------------------------------------+
                                              | Dictionary load success |
                                              +--------------------------------------------------+
                                              1 row in set (0.00 sec)

                                              root@localhost [(none)] 00:09:23 > select gen_dictionary_load('/log/dictionary2.conf','d2');
                                              +---------------------------------------------------+
                                              | gen_dictionary_load('/log/dictionary2.conf','d2') |
                                              +---------------------------------------------------+
                                              | Dictionary load success |
                                              +---------------------------------------------------+
                                              1 row in set (0.00 sec)

                                              OK,成功了

                                              然后就可以随机查数据字典了。

                                                 root@localhost [(none)] 00:11:09 > select gen_dictionary('d1'),gen_dictionary('d2');
                                                +----------------------+----------------------+
                                                | gen_dictionary('d1') | gen_dictionary('d2') |
                                                +----------------------+----------------------+
                                                | feiyy | liyy |
                                                +----------------------+----------------------+
                                                1 row in set (0.00 sec)

                                                root@localhost [(none)] 00:11:56 > select gen_dictionary('d1'),gen_dictionary('d2');
                                                +----------------------+----------------------+
                                                | gen_dictionary('d1') | gen_dictionary('d2') |
                                                +----------------------+----------------------+
                                                | feiyy | lanyy |
                                                +----------------------+----------------------+
                                                1 row in set (0.00 sec)

                                                gen_blacklist 这个是对等替换,如果 第一个参数是 d1 的就替换 d2 的同一个位置的值。

                                                  root@localhost [(none)] 00:14:28 > select gen_blacklist('xiyy','d1','d2');
                                                  +---------------------------------+
                                                  | gen_blacklist('xiyy','d1','d2') |
                                                  +---------------------------------+
                                                  | lanyy |
                                                  +---------------------------------+
                                                  1 row in set (0.00 sec)

                                                  root@localhost [(none)] 00:14:37 > select gen_blacklist('meiyy','d1','d2');
                                                  +----------------------------------+
                                                  | gen_blacklist('meiyy','d1','d2') |
                                                  +----------------------------------+
                                                  | liyy |
                                                  +----------------------------------+
                                                  1 row in set (0.00 sec)


                                                     [root@localhost ~]# cat log/dictionary.conf xiyy
                                                    meiyy
                                                    feiyy
                                                    [root@localhost ~]# cat log/dictionary2.conf
                                                    lanyy
                                                    liyy
                                                    manyy

                                                    如果是 xiyy 就替换了第一个位置的 lanyy。

                                                    如果是 meiyy 就替换了 第二个位置的 liyy。

                                                    好了,data_masking 插件讲完了。


                                                    实践脱敏


                                                    真正的脱敏应该是用视图做,然后取消别的用户对表的查看权限。

                                                      create view v_mack as select id,concat(left(name,1),repeat('*',char_length(name)-1)),mask_inner(phone,3,4,'#'),mask_inner(idcard,4,4,'#'from mask;

                                                      看一下这个视图:

                                                        root@localhost [ddl] 00:25:00 > select * from v_mack;
                                                        +----+------------------------------------------------------+---------------------------+----------------------------+
                                                        | id | concat(left(name,1),repeat('*',char_length(name)-1)) | mask_inner(phone,3,4,'#') | mask_inner(idcard,4,4,'#') |
                                                        +----+------------------------------------------------------+---------------------------+----------------------------+
                                                        | 1 | 红** | 123####8911 | 1234########2134 |
                                                        | 2 | 灰** | 123####8911 | 1234########2134 |
                                                        +----+------------------------------------------------------+---------------------------+----------------------------+
                                                        2 rows in set (0.01 sec)

                                                        因为用了函数,所以这个视图是没法更新的奥。

                                                           root@localhost [ddl] 00:25:08 > insert into v_mack values(3,'小灰灰','12345678911','1234567891012134');
                                                          ERROR 1348 (HY000): Column 'concat(left(name,1),repeat('*',char_length(name)-1))' is not updatable

                                                          这时候就只需要给视图的 select 权限就可以了,我们试一下

                                                             create user mask identified by '123456';
                                                            grant select on ddl.v_mack to mask;

                                                            查询一下就可以看到,已经脱敏了。

                                                               mask@192.168.58.60 [(none)] 00:28:25 > select * from ddl.mask;
                                                              ERROR 1142 (42000): SELECT command denied to user 'mask'@'192.168.58.135' for table 'mask'
                                                              mask@192.168.58.60 [(none)] 00:28:36 > select * from ddl.v_mack;
                                                              +----+------------------------------------------------------+---------------------------+----------------------------+
                                                              | id | concat(left(name,1),repeat('*',char_length(name)-1)) | mask_inner(phone,3,4,'#') | mask_inner(idcard,4,4,'#') |
                                                              +----+------------------------------------------------------+---------------------------+----------------------------+
                                                              | 1 | 红** | 123####8911 | 1234########2134 |
                                                              | 2 | 灰** | 123####8911 | 1234########2134 |
                                                              +----+------------------------------------------------------+---------------------------+----------------------------+
                                                              2 rows in set (0.00 sec)

                                                              mysql 是没有动态脱敏的,等会玩 PG 的动态脱敏。


                                                              PG 数据脱敏


                                                              PG 脱敏的话,这篇玩的是 postgresql_anonymizer:https://gitlab.com/dalibo/postgresql_anonymizer

                                                              postgresql_anonymizer 支持动态脱敏和静态脱敏,相当于 mysql 的脱敏方式更加灵活。

                                                              这个脱敏有 2 种:

                                                              1. 动态的用户级脱敏

                                                              2. 静态的直接修改值(这个不太建议)

                                                              我就演示 动态的用户级脱敏。


                                                              编译 postgresql_anonymizer


                                                              安装依赖

                                                                yum install zlib-devel readline-devel libxml2-devel libxslt-devel openssl-devel perl-devel perl-ExtUtils-Embed python-devel gcc-c++ libicu-devel python3 python3-devel bison flex

                                                                拉取包

                                                                   git clone https://gitlab.com/dalibo/postgresql_anonymizer.git
                                                                  or
                                                                  wget https://gitlab.com/dalibo/postgresql_anonymizer/-/archive/2.0.0-beta.3/postgresql_anonymizer-2.0.0-beta.3.tar.gz
                                                                  tar zxvf postgresql_anonymizer-2.0.0-beta.3.tar.gz

                                                                  postgresql_anonymizer 依赖于 pgrx ,参考 pgrx 项目:pgcentralfoundation/pgrx

                                                                  安装Rust

                                                                    curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
                                                                    source "$HOME/.cargo/env"

                                                                    安装 pgrx

                                                                      cargo install cargo-pgrx --version 0.12.6 --locked
                                                                      cargo pgrx init

                                                                      注意版本号要与 Cargo.toml 中的版本对应。

                                                                        [package]
                                                                        name = "anon"
                                                                        version = "2.0.0-beta.3"
                                                                        [package]
                                                                        name = "anon"
                                                                        version = "2.0.0-beta.3"
                                                                        edition = "2021"

                                                                        [lib]
                                                                        crate-type = ["cdylib", "lib"]

                                                                        [features]
                                                                        default = ["pg13"]
                                                                        #pg11 = ["pgrx/pg11", "pgrx-tests/pg11" ]
                                                                        pg12 = ["pgrx/pg12", "pgrx-tests/pg12" ]
                                                                        pg13 = ["pgrx/pg13", "pgrx-tests/pg13" ]
                                                                        pg14 = ["pgrx/pg14", "pgrx-tests/pg14" ]
                                                                        pg15 = ["pgrx/pg15", "pgrx-tests/pg15" ]
                                                                        pg16 = ["pgrx/pg16", "pgrx-tests/pg16" ]
                                                                        pg17 = ["pgrx/pg17", "pgrx-tests/pg17" ]
                                                                        pg_test = []

                                                                        [dependencies]
                                                                        c_str_macro = "1.0.3"
                                                                        chrono = "0.4.37"
                                                                        fake = { version = "2.10.0", features = ["bigdecimal", "chrono", "http", "rust_decimal", "uuid", "time","random_color"] }
                                                                        md-5 = "0.10.6"
                                                                        paste = "1.0"
                                                                        pgrx = "0.12.6"
                                                                        rand = "0.8.5"
                                                                        regex = "1.10.2"

                                                                        [dev-dependencies]
                                                                        pgrx-tests = "0.12.6"

                                                                        [profile.dev]
                                                                        panic = "unwind"

                                                                        [profile.release]
                                                                        panic = "unwind"

                                                                        装依赖

                                                                           [postgres@postgresql1 postgresql_anonymizer]$ sudo yum install clang
                                                                          [postgres@postgresql1 postgresql_anonymizer]$ locate libclang.so
                                                                          usr/lib64/clang-private/libclang.so
                                                                          /usr/lib64/clang-private/libclang.so.6
                                                                          /usr/lib64/clang-private/libclang.so.6.0
                                                                          [postgres@postgresql1 postgresql_anonymizer]$ export LIBCLANG_PATH=/usr/lib64/clang-private/libclang.so.6.0
                                                                          [postgres@postgresql1 postgresql_anonymizer]$ export LD_LIBRARY_PATH=/usr/lib64/clang-private:$LD_LIBRARY_PATH

                                                                          编译

                                                                            make extension PG_CONFIG=/pgsql/bin/pg_config
                                                                            make install PG_CONFIG=/pgsql/bin/pg_config


                                                                            安装


                                                                            创建一个测试库 encryption

                                                                               testdb=# create database encryption;
                                                                              CREATE DATABASE

                                                                              然后把 anon 添加到库级的 session_preload_libraries

                                                                                testdb=# \c encryption -
                                                                                You are now connected to database "encryption" as user "postgres".
                                                                                encryption=# \dx
                                                                                List of installed extensions
                                                                                Name | Version | Schema | Description
                                                                                ---------+---------+------------+------------------------------
                                                                                plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
                                                                                (1 row)

                                                                                encryption=# show session_preload_libraries;
                                                                                session_preload_libraries
                                                                                ---------------------------

                                                                                (1 row)

                                                                                encryption=# alter database encryption set session_preload_libraries = 'anon';
                                                                                ALTER DATABASE
                                                                                encryption=# \q

                                                                                创建扩展 EXTENSION

                                                                                  [postgres@postgresql1 ~]$ psql
                                                                                  postgres_dba 6.0 installed. Use ":dba" to see menu
                                                                                  psql (16.3)
                                                                                  Type "help" for help.

                                                                                  testdb=# \c encryption -
                                                                                  You are now connected to database "encryption" as user "postgres".
                                                                                  encryption=# create extension pgcrypto;
                                                                                  CREATE EXTENSION
                                                                                  encryption=# CREATE EXTENSION anon CASCADE;
                                                                                  CREATE EXTENSION

                                                                                  anon 依赖于 pgcrypto,所以要先创建 pgcrypto

                                                                                     encryption=# \dx
                                                                                    List of installed extensions
                                                                                    Name | Version | Schema | Description
                                                                                    ----------+--------------+------------+---------------------------------------------
                                                                                    anon | 2.0.0-beta.4 | public | Anonymization & Data Masking for PostgreSQL
                                                                                    pgcrypto | 1.3 | public | cryptographic functions
                                                                                    plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
                                                                                    (3 rows)

                                                                                    可以看到装上了


                                                                                    简要说说 anon


                                                                                    它支持这八种脱敏策略,功能还是很丰富的,这里我就不详细说

                                                                                    • Destruction(重构)

                                                                                    • Adding Noise(噪音化)

                                                                                    • Randomization(随机化)

                                                                                    • Faking(伪造)

                                                                                    • Pseudonymization(笔名化)

                                                                                    • Generic Hashing(哈希)

                                                                                    • Partial scrambling(部分遗漏)

                                                                                    • Generalization((泛化))

                                                                                    大家可以看官方文档:https://postgresql-anonymizer.readthedocs.io/en/stable/masking_functions/

                                                                                    主要参数介绍一下:

                                                                                    • anon.algorithm:脱敏函数使用的 hash 算法,默认为 sha256,支持 md5、sha256、sha384 和sha512 等,需要超级用户权限才可设置。

                                                                                    • anon.salt:pseudonymizing(假名化) functions 使用的盐,一般建议每个数据库设置不一样的值,需要超级用户权限才可设置。

                                                                                    • anon.maskschema:用于存储动态脱敏视图的模式,默认名称为 mask。

                                                                                    • anon.sourceshema:动态脱敏引擎脱敏表所在的 schema(即 'namespace')。 默认是 public 模式。

                                                                                    • anon.restrict_to_trusted_schemas:通过启用此参数,必须使用位于有限命名空间列表中的函数来定义屏蔽规则。默认情况下,只有 anon
                                                                                      schema 是受信任的。默认值:off


                                                                                    动态脱敏


                                                                                    进行如下设置:

                                                                                       alter database encryption set anon.restrict_to_trusted_schemas = on;
                                                                                      alter database encryption set anon.salt = 'db_encryption_Secret_Salt';

                                                                                      重新连接数据库后执行如下语句,启动动态脱敏:

                                                                                        select anon.start_dynamic_masking();

                                                                                        要用到 Faking(伪造) 时需要初始化anon插件,我这里等会要用,就初始化一下。

                                                                                           encryption=# SELECT anon.init();
                                                                                          NOTICE: The anon extension is already initialized.
                                                                                          init
                                                                                          ------
                                                                                          t
                                                                                          (1 row)

                                                                                          开启动态脱敏会创建一个叫 mask 的 schema

                                                                                             encryption=# \dn
                                                                                            List of schemas
                                                                                            Name | Owner
                                                                                            --------+-------------------
                                                                                            anon | postgres
                                                                                            public | pg_database_owner
                                                                                            (2 rows)

                                                                                            encryption=# select anon.start_dynamic_masking();
                                                                                            start_dynamic_masking
                                                                                            -----------------------
                                                                                            t
                                                                                            (1 row)

                                                                                            encryption=# \dn
                                                                                            List of schemas
                                                                                            Name | Owner
                                                                                            --------+-------------------
                                                                                            anon | postgres
                                                                                            mask | postgres
                                                                                            public | pg_database_owner
                                                                                            (3 rows)


                                                                                            创建测试表

                                                                                               create table mask(id int primary key,name varchar(64),phone varchar(11),idcard varchar(64));
                                                                                              insert into mask values(1,'红太狼','12345678911','1234567891012134');
                                                                                              insert into mask values(2,'灰太狼','12345678911','1234567891012134');

                                                                                              声明脱敏规则

                                                                                                 encryption=# SECURITY LABEL FOR anon ON COLUMN mask.id IS 'MASKED WITH VALUE NULL';
                                                                                                SECURITY LABEL
                                                                                                encryption=# SECURITY LABEL FOR anon ON COLUMN mask.name IS 'MASKED WITH FUNCTION anon.fake_first_name()';
                                                                                                SECURITY LABEL

                                                                                                定义一个脱敏用户,然后给 mask 用户授权 select

                                                                                                   encryption=# CREATE user mask password 'postgres123';
                                                                                                  CREATE ROLE
                                                                                                  encryption=# SECURITY LABEL FOR anon ON ROLE mask IS 'MASKED';
                                                                                                  SECURITY LABEL
                                                                                                  encryption=# grant select on mask to mask;
                                                                                                  GRANT
                                                                                                  encryption=# SELECT anon.start_dynamic_masking();
                                                                                                  NOTICE: schema "mask" already exists, skipping
                                                                                                  start_dynamic_masking
                                                                                                  -----------------------
                                                                                                  t
                                                                                                  (1 row)

                                                                                                  我们切到 mask 用户去看下这个表

                                                                                                     encryption=> select * from mask;
                                                                                                    id | name | phone | idcard
                                                                                                    ----+-------+-------------+------------------
                                                                                                    | Jason | 12345678911 | 1234567891012134
                                                                                                    | Aimee | 12345678911 | 1234567891012134
                                                                                                    (2 rows)

                                                                                                    encryption=> select * from mask;
                                                                                                    id | name | phone | idcard
                                                                                                    ----+--------+-------------+------------------
                                                                                                    | Joanne | 12345678911 | 1234567891012134
                                                                                                    | Dawn | 12345678911 | 1234567891012134
                                                                                                    (2 rows)

                                                                                                    可以看到已经遮掩了,还挺牛逼。


                                                                                                    它还有几个视图,文档没有描述,我们大概能猜到是干嘛的

                                                                                                        view anon.pg_identifiers
                                                                                                      view anon.pg_masked_roles
                                                                                                      view anon.pg_masking_rules
                                                                                                      view anon.pg_masks
                                                                                                      view anon.pg_trusted_functions

                                                                                                      anon.pg_identifiers 应该是搜索标识符:https://postgresql-anonymizer.readthedocs.io/en/stable/detection/

                                                                                                         encryption=# select * from anon.pg_identifiers;
                                                                                                        attrelid | attnum | relname | attname | format_type | col_description | indirect_identifier | priority
                                                                                                        ----------+--------+---------+---------+-------------+-----------------+---------------------+----------
                                                                                                        (0 rows)

                                                                                                        anon.pg_masked_roles 长得和 pg_authid 差不多。关于用户定义的动态脱敏的,hasmask 可以看到是否开启动态脱敏。

                                                                                                           encryption=# select * from anon.pg_masked_roles where hasmask='t'\gx
                                                                                                          -[ RECORD 1 ]--+-----------------------------
                                                                                                          rolname | mask
                                                                                                          rolsuper | f
                                                                                                          rolinherit | t
                                                                                                          rolcreaterole | f
                                                                                                          rolcreatedb | f
                                                                                                          rolcanlogin | t
                                                                                                          rolreplication | f
                                                                                                          rolconnlimit | -1
                                                                                                          rolpassword | ********
                                                                                                          rolvaliduntil |
                                                                                                          rolbypassrls | f
                                                                                                          rolconfig | {"search_path=mask, public"}
                                                                                                          oid | 60570
                                                                                                          hasmask | t

                                                                                                          anon.pg_masking_rules 定义的是脱敏规则

                                                                                                             encryption=# select * from anon.pg_masking_rules\gx
                                                                                                            -[ RECORD 1 ]----+--------------------------------------------
                                                                                                            attrelid | 60561
                                                                                                            attnum | 1
                                                                                                            relnamespace | public
                                                                                                            relname | mask
                                                                                                            attname | id
                                                                                                            format_type | integer
                                                                                                            col_description | MASKED WITH VALUE NULL
                                                                                                            masking_function |
                                                                                                            masking_value | NULL
                                                                                                            priority | 100
                                                                                                            masking_filter | NULL
                                                                                                            trusted_schema | f
                                                                                                            -[ RECORD 2 ]----+--------------------------------------------
                                                                                                            attrelid | 60561
                                                                                                            attnum | 2
                                                                                                            relnamespace | public
                                                                                                            relname | mask
                                                                                                            attname | name
                                                                                                            format_type | character varying(64)
                                                                                                            col_description | MASKED WITH FUNCTION anon.fake_first_name()
                                                                                                            masking_function | anon.fake_first_name()
                                                                                                            masking_value |
                                                                                                            priority | 100
                                                                                                            masking_filter | anon.fake_first_name()
                                                                                                            trusted_schema | t

                                                                                                            anon.pg_masks 和 anon.pg_masking_rules 差不多,也是脱敏规则

                                                                                                               encryption=# select * from anon.pg_masks\gx
                                                                                                              -[ RECORD 1 ]----+--------------------------------------------
                                                                                                              attrelid | 60561
                                                                                                              attnum | 1
                                                                                                              relnamespace | public
                                                                                                              relname | mask
                                                                                                              attname | id
                                                                                                              format_type | integer
                                                                                                              col_description | MASKED WITH VALUE NULL
                                                                                                              masking_function |
                                                                                                              masking_value | NULL
                                                                                                              priority | 100
                                                                                                              masking_filter | NULL
                                                                                                              trusted_schema | f
                                                                                                              -[ RECORD 2 ]----+--------------------------------------------
                                                                                                              attrelid | 60561
                                                                                                              attnum | 2
                                                                                                              relnamespace | public
                                                                                                              relname | mask
                                                                                                              attname | name
                                                                                                              format_type | character varying(64)
                                                                                                              col_description | MASKED WITH FUNCTION anon.fake_first_name()
                                                                                                              masking_function | anon.fake_first_name()
                                                                                                              masking_value |
                                                                                                              priority | 100
                                                                                                              masking_filter | anon.fake_first_name()
                                                                                                              trusted_schema | t

                                                                                                              anon.pg_trusted_functions; 应该是允许原生PG用来脱敏的函数

                                                                                                               
                                                                                                                encryption=# select * from anon.pg_trusted_functions;
                                                                                                                schema | function
                                                                                                                ------------+---------------------------------------------------------------
                                                                                                                pg_catalog | "json_object"(text[])
                                                                                                                pg_catalog | "left"(text, integer)
                                                                                                                pg_catalog | "right"(text, integer)
                                                                                                                pg_catalog | age(timestamp without time zone, timestamp without time zone)
                                                                                                                pg_catalog | array_to_json(anyarray)
                                                                                                                pg_catalog | btrim(text)
                                                                                                                pg_catalog | concat(VARIADIC "any")
                                                                                                                pg_catalog | date_part(text, timestamp without time zone)
                                                                                                                pg_catalog | date_trunc(text, timestamp without time zone)
                                                                                                                pg_catalog | json_build_array()
                                                                                                                pg_catalog | json_build_object()
                                                                                                                pg_catalog | jsonb_build_array()
                                                                                                                pg_catalog | jsonb_build_object()
                                                                                                                pg_catalog | jsonb_object(text[])
                                                                                                                pg_catalog | length(text)
                                                                                                                pg_catalog | lower(text)
                                                                                                                pg_catalog | ltrim(text)
                                                                                                                pg_catalog | make_date(year integer, month integer, day integer)
                                                                                                                pg_catalog | make_time(hour integer, min integer, sec double precision)
                                                                                                                pg_catalog | md5(text)
                                                                                                                pg_catalog | now()
                                                                                                                pg_catalog | random()
                                                                                                                pg_catalog | regexp_replace(text, text, text)
                                                                                                                pg_catalog | regexp_replace(text, text, text, text)
                                                                                                                pg_catalog | repeat(text, integer)
                                                                                                                pg_catalog | replace(text, text, text)
                                                                                                                pg_catalog | row_to_json(record)
                                                                                                                pg_catalog | rtrim(text)
                                                                                                                pg_catalog | substr(text, integer)
                                                                                                                pg_catalog | to_char(timestamp without time zone, text)
                                                                                                                pg_catalog | to_date(text, text)
                                                                                                                pg_catalog | to_json(anyelement)
                                                                                                                pg_catalog | to_jsonb(anyelement)
                                                                                                                pg_catalog | to_number(text, text)
                                                                                                                pg_catalog | to_timestamp(text, text)
                                                                                                                pg_catalog | upper(text)
                                                                                                                (36 rows)


                                                                                                                结束语


                                                                                                                这篇文章光是安装 postgresql_anonymizer 就编译了很久。

                                                                                                                吐槽:喵的,pgrx 真有你的,cargo pgrx init 初始化要下载安装编译 PG 12~17,PG17 少了几个依赖包重新安装编译轮回好几把!!!!!!

                                                                                                                好了就写到这啦,下次想玩别的脱敏函数再写叭


                                                                                                                参考


                                                                                                                https://docs.percona.com/percona-server/8.0/data-masking-overview.html#data-masking-overview

                                                                                                                https://docs.percona.com/percona-server/8.0/data-masking-plugin-functions.html#data-masking-plugin-functions

                                                                                                                https://postgresql-anonymizer.readthedocs.io/en/stable/INSTALL/


                                                                                                                文章转载自00后DBA实录,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                                                                                                                评论