暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

PostgreSQL plpgsql_check - 函数、存储过程内容runtime前校验,性能profiler,性能问题检测,sql注入检测

digoal 2019-08-31
760

作者

digoal

日期

2019-08-31

标签

PostgreSQL , profiler , plpgsql_check , performance tips , sql injection detection


背景

原文

http://okbob.blogspot.com/2019/08/why-you-need-plpgsqlcheck-if-you-write.html

plpgsql是解释型存储过程语言,内部执行的sql只有在调用并运行时才会被parser,所以写好plpgsql存储过程后,如何快速分析plpgsql的问题?

plpgsql_check插件,用于帮助用户解决只有runtime时才能检测出plpgsql内部问题的问题,例如定义对象关联关系和发现、检测类型一致性,未使用的变量,多余的类型转换,sql注入风险,性能问题,性能profile等。

  • check fields of referenced database objects and types inside embedded SQL
  • using correct types of function parameters
  • unused variables and function argumens, unmodified OUT argumens
  • partially detection of dead code (due RETURN command)
  • detection of missing RETURN command in function
  • try to identify unwanted hidden casts, that can be performance issue like unused indexes
  • possibility to collect relations and functions used by function
  • possibility to check EXECUTE stmt agaist SQL injection vulnerability

PLpgSQL is simple (but relatively powerful) specialized language. It is specialized for usage inside PostgreSQL as glue of SQL statements. It is great language for this purpose (and can be bad if it is used differently - for high cycles numeric calculation for example).

Originally a implementation of PLpgSQL language was really simple - it was very simple language with possibility to execute embedded SQL. But there was not possibility to check syntax of this SQL in validation time.

Note: PLpgSQL is a interpret of AST (Abstract Syntax Tree) nodes. There is a validation stage (when code is parsed into AST), and evaluation stage (runtime), when AST (parsed again when code is first executed) is interpreted.

Old SPI (Stored Procedure Interface) had not any API for validation of SQL without execution. Almost all checks in this time was at runtime. It was not too practical - so some checks (check of SQL syntax) are at validation stage now. Still PLpgSQL validator doesn't check a validity of SQL identifiers (tables, columns, functions, ... names). Now, there are two reason why the validator doesn't do it: a) we have not any other tools how to solve cyclic dependencies, b) Postgres's local temp tables - PostgreSQL temporary tables are created at run-time and doesn't exists at validation time - so PLpgSQL validator should not to check validity of SQL identifiers - they should not to exists in this time.

SQL identifiers validity checks

价差sql对象的合法性,例如是否定义,定义结构是否正确等。

But often we should to check all identifiers inside code (SQL identifiers too). This is time for plpgsql_check:

```
CREATE TABLE bigtable(id int, v int);
INSERT INTO bigtable
SELECT random() * 1000000, random()*10000
FROM generate_series(1,1000000);
CREATE INDEX ON bigtable(id);
VACUUM ANALYZE bigtable;

CREATE OR REPLACE FUNCTION example01(_id numeric)
RETURNS numeric AS $$
DECLARE
r record;
s numeric DEFAULT 0;
BEGIN
FOR r IN SELECT * FROM bigtable WHERE id = _id
LOOP
s := s + r.k;
END LOOP;
END;
$$ LANGUAGE plpgsql;
```

This code has lot of issues, but no one is a problem for buildin plpgsql validator. So this function can be created and executed. Sure, there will be run-time error:

I executed this function 2x and I got two different errors:

调用时报错:

```
postgres=# select example01(1000);
ERROR: control reached end of function without RETURN
CONTEXT: PL/pgSQL function example01(numeric)

postgres=# select example01(1001);
ERROR: record "r" has no field "k"
CONTEXT: SQL statement "SELECT s + r.k"
PL/pgSQL function example01(numeric) line 8 at assignment
```

There are possible two branches of execution (with/without execution of body of cycle). There is wrong reference to SQL identifiers r.k, and a statement RETURN is missing.

It isn't too hard to see these issues in this simple code. But it is harder to find these bugs inside function with more hundreds of lines. plpgsql_check can helps:

postgres=# SELECT * FROM plpgsql_check_function('example01', fatal_errors=>false); +------------------------------------------------------------+ | plpgsql_check_function | +------------------------------------------------------------+ | error:42703:8:assignment:record "r" has no field "k" | | Context: SQL statement "SELECT s + r.k" | | error:2F005:control reached end of function without RETURN | | warning extra:00000:3:DECLARE:never read variable "r" | | warning extra:00000:4:DECLARE:never read variable "s" | +------------------------------------------------------------+ (5 rows)

After fix the plpgsql_check is a happy:

```
CREATE OR REPLACE FUNCTION public.example01(_id numeric)
RETURNS numeric
LANGUAGE plpgsql
AS $function$
DECLARE
r record;
s numeric DEFAULT 0;
BEGIN
FOR r IN SELECT * FROM bigtable WHERE id = _id
LOOP
s := s + r.v;
END LOOP;
RETURN s;
END;
$function$

postgres=# SELECT * FROM plpgsql_check_function('example01', fatal_errors=>false);
+------------------------+
| plpgsql_check_function |
+------------------------+
+------------------------+
(0 rows)

postgres=# select example01(1001);
+-----------+
| example01 |
+-----------+
| 3164 |
+-----------+
(1 row)
```

But performance is not too good - it should to read only few (or zero) rows from indexed columns, and execution time is 275 ms. This is time for integrated profiler:

plpgsql_check profiler

统计plpgsql代码中每条调用的平均响应时间,最大响应时间,处理的行数等。

plpgsql_check profiler can works in two modes. a) with persistent profiles in shared memory (requires loading at start of Postgres). b) with session only profiles. For my purpose session only profiles are ok, so I don't need special configuration. Just I need to start profiler:

```
-- ensure active profiler
LOAD 'plpgsql'; LOAD 'plpgsql_check'; SET plpgsql_check.profiler TO on;

-- run example01 more times
SELECT example01(1001);
SELECT example01(1001);
SELECT example01(1001);
```

Now I can see a profile of execute01 function

postgres=# SELECT * FROM plpgsql_profiler_function_tb('example01'); +--------+-------------+-------------+------------+------------+----------+-----------+----------------+--------------------------------------------------+ | lineno | stmt_lineno | cmds_on_row | exec_stmts | total_time | avg_time | max_time | processed_rows | source | +--------+-------------+-------------+------------+------------+----------+-----------+----------------+--------------------------------------------------+ | 1 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | | | 2 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | DECLARE | | 3 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | r record; | | 4 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | s numeric DEFAULT 0; | | 5 | 5 | 1 | 3 | 0.07 | 0.024 | {0.029} | {0} | BEGIN | | 6 | 6 | 1 | 3 | 851.895 | 283.965 | {299.195} | {0} | FOR r IN SELECT * FROM bigtable WHERE id = _id | | 7 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | LOOP | | 8 | 8 | 1 | 6 | 0.063 | 0.011 | {0.023} | {0} | s := s + r.v; | | 9 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | END LOOP; | | 10 | 10 | 1 | 3 | 0 | 0 | {0} | {0} | RETURN s; | | 11 | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | ∅ | END; | +--------+-------------+-------------+------------+------------+----------+-----------+----------------+--------------------------------------------------+ (11 rows)

You can see - almost all time is in query execution. Fast recheck of query doesn't show any special:

postgres=# EXPLAIN ANALYZE SELECT * FROM bigtable WHERE id = 1001; +---------------------------------------------------------------------------------------------------------------------------+ | QUERY PLAN | +---------------------------------------------------------------------------------------------------------------------------+ | Index Scan using bigtable_id_idx on bigtable (cost=0.42..12.46 rows=2 width=8) (actual time=0.224..0.233 rows=2 loops=1) | | Index Cond: (id = 1001) | | Planning Time: 0.198 ms | | Execution Time: 0.289 ms | +---------------------------------------------------------------------------------------------------------------------------+ (4 rows)

Why the direct execution of query needs 0.3ms and execution inside function 283ms? plpgsql_check can helps again.

Performance tips

检测性能相关问题,例如类型使用不当。

We should to use a option performance_warnings

postgres=# SELECT * FROM plpgsql_check_function('example01', performance_warnings => true); +-------------------------------------------------------------------------------------------------------------------------------+ | plpgsql_check_function | +-------------------------------------------------------------------------------------------------------------------------------+ | performance:42804:5:statement block:target type is different type than source type | | Detail: cast "integer" value to "numeric" type | | Hint: Hidden casting can be a performance issue. | | Context: during statement block local variable "s" initialization on line 4 | | performance:42804:6:FOR over SELECT rows:implicit cast of attribute caused by different PLpgSQL variable type in WHERE clause | | Query: SELECT * FROM bigtable WHERE id = _id | | -- ^ | | Detail: An index of some attribute cannot be used, when variable, used in predicate, has not right type like a attribute | | Hint: Check a variable type - int versus numeric | | performance:00000:routine is marked as VOLATILE, should be STABLE | | Hint: When you fix this issue, please, recheck other functions that uses this function. | +-------------------------------------------------------------------------------------------------------------------------------+ (11 rows)

There is a bad type of variable - numeric. Postgres cannot to use index when column is a integer, and searched value is a numeric:

postgres=# EXPLAIN ANALYZE SELECT * FROM bigtable WHERE id = 1001::numeric; +---------------------------------------------------------------------------------------------------------------------------+ | QUERY PLAN | +---------------------------------------------------------------------------------------------------------------------------+ | Gather (cost=1000.00..12175.00 rows=5000 width=8) (actual time=21.000..115.142 rows=2 loops=1) | | Workers Planned: 2 | | Workers Launched: 2 | | -> Parallel Seq Scan on bigtable (cost=0.00..10675.00 rows=2083 width=8) (actual time=56.270..105.250 rows=1 loops=3) | | Filter: ((id)::numeric = '1001'::numeric) | | Rows Removed by Filter: 333333 | | Planning Time: 0.180 ms | | Execution Time: 115.190 ms | +---------------------------------------------------------------------------------------------------------------------------+ (8 rows)

From times you can see, so parallel execution is not available inside plpgsql too.

So after fix:

DROP FUNCTION example01 (numeric) ; CREATE OR REPLACE FUNCTION public.example01(_id int) RETURNS int LANGUAGE plpgsql AS $function$ DECLARE r record; s int DEFAULT 0; BEGIN FOR r IN SELECT * FROM bigtable WHERE id = _id LOOP s := s + r.v; END LOOP; RETURN s; END; $function$

the execution time is about 2ms. But plpgsql_check is not happy still:

函数内没有修改数据的操作,可以改成stable或根据情况改成immutable的函数。优化器针对标记为stable、immutable函数,可以进行索引过滤。而volatile函数不支持索引,因为它在不停的变化。。

postgres=# SELECT * FROM plpgsql_check_function('example01', performance_warnings => true); +-----------------------------------------------------------------------------------------+ | plpgsql_check_function | +-----------------------------------------------------------------------------------------+ | performance:00000:routine is marked as VOLATILE, should be STABLE | | Hint: When you fix this issue, please, recheck other functions that uses this function. | +-----------------------------------------------------------------------------------------+ (2 rows)

When the function does only read operations from database, then should be marked as STABLE. So final code should to look like:

CREATE OR REPLACE FUNCTION public.example01(_id int) RETURNS int LANGUAGE plpgsql AS $function$ DECLARE r record; s int DEFAULT 0; BEGIN FOR r IN SELECT * FROM bigtable WHERE id = _id LOOP s := s + r.v; END LOOP; RETURN s; END; $function$ STABLE

Now execution is little bit faster (about 0.5 ms)

Next example is very untypical case for PLpgSQL (this language is not designed for numeric calculations), but it is good example of some developer's mistakes.

CREATE OR REPLACE FUNCTION public.example02(integer) RETURNS bigint LANGUAGE plpgsql AS $function$ DECLARE result bigint DEFAULT 0; BEGIN FOR i IN 1..$1 LOOP result := result + 1.0; END LOOP; RETURN result; END; $function$

Again, there is more than one issue. But we can check a performance - 1M iterations needs about 922ms. It is relatively slow - but this language is designed for different purpose - it is glue of SQL. plpgsql_check shows some warnings:

```
postgres=# SELECT * FROM plpgsql_check_function('example02', performance_warnings => true);
+-----------------------------------------------------------------------------------------+
| plpgsql_check_function |
+-----------------------------------------------------------------------------------------+
| performance:42804:3:statement block:target type is different type than source type |
| Detail: cast "integer" value to "bigint" type |
| Hint: Hidden casting can be a performance issue. |
| Context: during statement block local variable "result" initialization on line 2 |
| performance:42804:6:assignment:target type is different type than source type |
| Detail: cast "numeric" value to "bigint" type |
| Hint: Hidden casting can be a performance issue. |
| Context: at assignment to variable "result" declared on line 2 |
| warning extra:00000:4:DECLARE:never read variable "i" |
| performance:00000:routine is marked as VOLATILE, should be IMMUTABLE |
| Hint: When you fix this issue, please, recheck other functions that uses this function. |
+-----------------------------------------------------------------------------------------+
(11 rows)

Time: 2,080 ms
```

The most important issues are: 1. lazy casting from numeric inside cycle's body, 2. the function is marked as VOLATILE by default, although it should be marked IMMUTABLE (because doesn't do any operation with tables). The overhead of lazy cast from numeric type is about 470ms. The overhead of default VOLATILE is 200ms. The function

CREATE OR REPLACE FUNCTION public.example02(integer) RETURNS bigint LANGUAGE plpgsql AS $function$ DECLARE result bigint DEFAULT 0; BEGIN FOR i IN 1..$1 LOOP result := result + 1; END LOOP; RETURN result; END; $function$ IMMUTABLE

needs about 250ms for 1M iterations (good reduction from original 922ms). Probably it should be faster - I have developer build of PostgreSQL with active assertions.

SQL injection detection

动态SQL没有使用identified或format封锁,所以可以被入侵,例如输入变量时采用or条件加注释接口。。

plpgsql_check can help with most dangerous security issue of SQL world - SQL injection. Here is again buggy code:

```
CREATE OR REPLACE FUNCTION count_rows(tablename text)
RETURNS text AS $$
DECLARE result int;
BEGIN
EXECUTE 'SELECT count(*) FROM ' || tablename INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql STABLE;

CREATE FUNCTION

postgres=# SELECT count_rows('bigtable');
+------------+
| count_rows |
+------------+
| 1000000 |
+------------+
(1 row)
```

This code is terrible bad, but plpgsql_check can to show the problem:

postgres=# SELECT * FROM plpgsql_check_function('count_rows', security_warnings => true); +-----------------------------------------------------------------------------+ | plpgsql_check_function | +-----------------------------------------------------------------------------+ | security:00000:3:EXECUTE:text type variable is not sanitized | | Query: SELECT 'SELECT count(*) FROM ' || tablename | | -- ^ | | Detail: The EXECUTE expression is SQL injection vulnerable. | | Hint: Use quote_ident, quote_literal or format function to secure variable. | +-----------------------------------------------------------------------------+ (5 rows)

Fixing is not hard:

```
CREATE OR REPLACE FUNCTION count_rows(tablename text)
RETURNS text AS $$DECLARE result int;
BEGIN
EXECUTE 'SELECT count(*) FROM ' || quote_ident(tablename) INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql STABLE;
CREATE FUNCTION
Time: 51,511 ms

postgres=# SELECT * FROM plpgsql_check_function('count_rows', security_warnings => true);
+------------------------+
| plpgsql_check_function |
+------------------------+
+------------------------+
(0 rows)

CREATE OR REPLACE FUNCTION count_rows(tablename text)
RETURNS text AS $$DECLARE result int;
BEGIN
EXECUTE format('SELECT count(*) FROM %I', tablename) INTO result;
RETURN result;
END
$$ LANGUAGE plpgsql STABLE;
CREATE FUNCTION
Time: 55,931 ms

postgres=# SELECT * FROM plpgsql_check_function('count_rows', security_warnings => true);
+------------------------+
| plpgsql_check_function |
+------------------------+
+------------------------+
(0 rows)
```

The SQL injection detection is only basic, but almost all bugs of this type are very simple.

小结

使用plpgsql_check可以帮助用户发现存储过程的问题(不需要run,即可发现问题),例如

1、客户函数中引用了不存在的表或列。触发器函数引用了不存在的中间表、对象等。

```
postgres=# select * from plpgsql_check_function('foo_trg()', 'bar');
plpgsql_check_function


error:42703:3:assignment:record "new" has no field "c"
(1 row)
For triggers with transitive tables you can set a oldtable or newtable parameters:

create or replace function footab_trig_func()
returns trigger as $$
declare x int;
begin
if false then
-- should be ok;
select count(*) from newtab into x;

-- should fail;  
select count(*) from newtab where d = 10 into x;

end if;
return null;
end;
$$ language plpgsql;

select * from plpgsql_check_function('footab_trig_func','footab', newtable := 'newtab');
```

2、帮助用户发现SQL注入的风险。

3、performance tips:帮助用户发现性能问题,例如没有创建索引,导致性能变差。

4、plpgsql profiler:帮助用户分析plpgsql函数或存储过程内部执行SQL的平均响应速度,最大响应速度,从而了解plpgsql代码瓶颈。

如果需要大批量了解plpgsql存储过程是否存在问题,可以使用系统表发觉。

```
-- check all nontrigger plpgsql functions
SELECT p.oid, p.proname, plpgsql_check_function(p.oid)
FROM pg_catalog.pg_namespace n
JOIN pg_catalog.pg_proc p ON pronamespace = n.oid
JOIN pg_catalog.pg_language l ON p.prolang = l.oid
WHERE l.lanname = 'plpgsql' AND p.prorettype <> 2279;
or

SELECT p.proname, tgrelid::regclass, cf.*
FROM pg_proc p
JOIN pg_trigger t ON t.tgfoid = p.oid
JOIN pg_language l ON p.prolang = l.oid
JOIN pg_namespace n ON p.pronamespace = n.oid,
LATERAL plpgsql_check_function(p.oid, t.tgrelid) cf
WHERE n.nspname = 'public' and l.lanname = 'plpgsql'
or

-- check all plpgsql functions (functions or trigger functions with defined triggers)
SELECT
(pcf).functionid::regprocedure, (pcf).lineno, (pcf).statement,
(pcf).sqlstate, (pcf).message, (pcf).detail, (pcf).hint, (pcf).level,
(pcf)."position", (pcf).query, (pcf).context
FROM
(
SELECT
plpgsql_check_function_tb(pg_proc.oid, COALESCE(pg_trigger.tgrelid, 0)) AS pcf
FROM pg_proc
LEFT JOIN pg_trigger
ON (pg_trigger.tgfoid = pg_proc.oid)
WHERE
prolang = (SELECT lang.oid FROM pg_language lang WHERE lang.lanname = 'plpgsql') AND
pronamespace <> (SELECT nsp.oid FROM pg_namespace nsp WHERE nsp.nspname = 'pg_catalog') AND
-- ignore unused triggers
(pg_proc.prorettype <> (SELECT typ.oid FROM pg_type typ WHERE typ.typname = 'trigger') OR
pg_trigger.tgfoid IS NOT NULL)
OFFSET 0
) ss
ORDER BY (pcf).functionid::regprocedure::text, (pcf).lineno
```

5、限制和参数配置,被动、主动模式配置请参考:

https://github.com/okbob/plpgsql_check

参考

http://okbob.blogspot.com/2019/08/why-you-need-plpgsqlcheck-if-you-write.html

https://github.com/okbob/plpgsql_check

https://groups.google.com/forum/#!forum/postgresql-extensions-hacking

PostgreSQL 许愿链接

您的愿望将传达给PG kernel hacker、数据库厂商等, 帮助提高数据库产品质量和功能, 说不定下一个PG版本就有您提出的功能点. 针对非常好的提议,奖励限量版PG文化衫、纪念品、贴纸、PG热门书籍等,奖品丰富,快来许愿。开不开森.

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

digoal's wechat

文章转载自digoal,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论