PostgreSQL 14 preview - pgbench 压测工具新增随机函数 permute(i, size, [seed]) 返回 i 经过重新(随机)映射后在 [0,size) 范围内的一个值

digoal 2021-01-04

363

作者

digoal

日期

2021-04-07

背景

pgbench 压测工具新增随机函数 permute(i, size, [seed]) 返回 i 经过重新(随机)映射后在 [0,size) 范围内的一个值 .

类似的功能 abs(hash(i) % size) , 但是hash有冲突的问题, permute没有冲突问题.

主要解决的是随机问题, 避免某些场景中ctid(物理存储)与实际写入value之间存在的线性相关(例如自动产生的序列. 或者多个随机值对应多个列这些列之间可能存在着某些相关性), 与实际用户行为不符导致的测试数据偏差.

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6b258e3d688db14aadb58dde2a72939362310684

```
pgbench: Function to generate random permutations.
author Dean Rasheed dean.a.rasheed@gmail.com
Tue, 6 Apr 2021 10:50:42 +0000 (11:50 +0100)
committer Dean Rasheed dean.a.rasheed@gmail.com
Tue, 6 Apr 2021 10:50:42 +0000 (11:50 +0100)
commit 6b258e3d688db14aadb58dde2a72939362310684
tree b1f740242a8998a1992065f603953e848680d89b tree
parent a8af856d3257138590788e40eb84049def147acf commit | diff
pgbench: Function to generate random permutations.

This adds a new function, permute(), that generates pseudorandom
permutations of arbitrary sizes. This can be used to randomly shuffle
a set of values to remove unwanted correlations. For example,
permuting the output from a non-uniform random distribution so that
all the most common values aren't collocated, allowing more realistic
tests to be performed.

Formerly, hash() was recommended for this purpose, but that suffers
from collisions that might alter the distribution, so recommend
permute() for this purpose instead.

Fabien Coelho and Hironobu Suzuki, with additional hacking be me.
Reviewed by Thomas Munro, Alvaro Herrera and Muhammad Usama.

Discussion: https://postgr.es/m/alpine.DEB.2.21.1807280944370.5142@lancre
```

```
+/
+ * Pseudorandom permutation function
+
+ * For small sizes, this generates each of the (size!) possible permutations
+ * of integers in the range [0, size) with roughly equal probability. Once
+ * the size is larger than 20, the number of possible permutations exceeds the
+ * number of distinct states of the internal pseudorandom number generators,
+ * and so not all possible permutations can be generated, but the permutations
+ * chosen should continue to give the appearance of being random.
+
+ * THIS FUNCTION IS NOT CRYPTOGRAPHICALLY SECURE.
+ * DO NOT USE FOR SUCH PURPOSE.
+ /
+static int64
+permute(const int64 val, const int64 isize, const int64 seed)
+{
+ RandomState random_state1;
+ RandomState random_state2;
+ uint64 size;
+ uint64 v;
+ int masklen;
+ uint64 mask;
+ int i;
+
+ if (isize < 2)
+ return 0; / nothing to permute /
+
+ / Initialize a pair of random states using the seed /
+ random_state1.xseed[0] = seed & 0xFFFF;
+ random_state1.xseed[1] = (seed >> 16) & 0xFFFF;
+ random_state1.xseed[2] = (seed >> 32) & 0xFFFF;
+
+ random_state2.xseed[0] = (((uint64) seed) >> 48) & 0xFFFF;
+ random_state2.xseed[1] = seed & 0xFFFF;
+ random_state2.xseed[2] = (seed >> 16) & 0xFFFF;
+
+ / Computations are performed on unsigned values /
+ size = (uint64) isize;
+ v = (uint64) val % size;
+
+ / Mask to work modulo largest power of 2 less than or equal to size /
+ masklen = pg_leftmost_one_pos64(size);
+ mask = (((uint64) 1) << masklen) - 1;
+
+ /
+ * Permute the input value by applying several rounds of pseudorandom
+ * bijective transformations. The intention here is to distribute each
+ * input uniformly randomly across the range, and separate adjacent inputs
+ * approximately uniformly randomly from each other, leading to a fairly
+ * random overall choice of permutation.
+
+ * To separate adjacent inputs, we multiply by a random number modulo
+ * (mask + 1), which is a power of 2. For this to be a bijection, the
+ * multiplier must be odd. Since this is known to lead to less randomness
+ * in the lower bits, we also apply a rotation that shifts the topmost bit
+ * into the least significant bit. In the special cases where size <= 3,
+ * mask = 1 and each of these operations is actually a no-op, so we also
+ * XOR the value with a different random number to inject additional
+ * randomness. Since the size is generally not a power of 2, we apply
+ * this bijection on overlapping upper and lower halves of the input.
+
+ * To distribute the inputs uniformly across the range, we then also apply
+ * a random offset modulo the full range.
+
+ * Taken together, these operations resemble a modified linear
+ * congruential generator, as is commonly used in pseudorandom number
+ * generators. The number of rounds is fairly arbitrary, but six has been
+ * found empirically to give a fairly good tradeoff between performance
+ * and uniform randomness. For small sizes it selects each of the (size!)
+ * possible permutations with roughly equal probability. For larger
+ * sizes, not all permutations can be generated, but the intended random
+ * spread is still produced.
+ /
+ for (i = 0; i < 6; i++)
+ {
+ uint64 m,
+ r,
+ t;
+
+ / Random multiply (by an odd number), XOR and rotate of lower half /
+ m = (uint64) getrand(&random_state1, 0, mask) | 1;
+ r = (uint64) getrand(&random_state2, 0, mask);
+ if (v <= mask)
+ {
+ v = ((v * m) ^ r) & mask;
+ v = ((v << 1) & mask) | (v >> (masklen - 1));
+ }
+
+ / Random multiply (by an odd number), XOR and rotate of upper half /
+ m = (uint64) getrand(&random_state1, 0, mask) | 1;
+ r = (uint64) getrand(&random_state2, 0, mask);
+ t = size - 1 - v;
+ if (t <= mask)
+ {
+ t = ((t * m) ^ r) & mask;
+ t = ((t << 1) & mask) | (t >> (masklen - 1));
+ v = size - 1 - t;
+ }
+
+ / Random offset */
+ r = (uint64) getrand(&random_state2, 0, size - 1);
+ v = (v + r) % size;
+ }
+
+ return (int64) v;
+}
+

```

https://www.postgresql.org/docs/devel/pgbench.html

permute ( i, size [, seed ] ) → integer

Permuted value of i, in the range [0, size). This is the new position of i (modulo size) in a pseudorandom permutation of the integers 0...size-1, parameterized by seed, see below.

permute(0, 4) → an integer between 0 and 3

Note
When designing a benchmark which selects rows non-uniformly, be aware that the rows chosen may be correlated with other data such as IDs from a sequence or the physical row ordering, which may skew performance measurements.

To avoid this, you may wish to use the permute function, or some other additional step with similar effect, to shuffle the selected rows and remove such correlations.

permute accepts an input value, a size, and an optional seed parameter. It generates a pseudorandom permutation of integers in the range [0, size), and returns the index of the input value in the permuted values. The permutation chosen is parameterized by the seed, which defaults to :default_seed, if not specified. Unlike the hash functions, permute ensures that there are no collisions or holes in the output values. Input values outside the interval are interpreted modulo the size. The function raises an error if the size is not positive. permute can be used to scatter the distribution of non-uniform random functions such as random_zipfian or random_exponential so that values drawn more often are not trivially correlated. For instance, the following pgbench script simulates a possible real world workload typical for social media and blogging platforms where a few accounts generate excessive load:

\set size 1000000 \set r random_zipfian(1, :size, 1.07) \set k 1 + permute(:r, :size)

In some cases several distinct distributions are needed which don't correlate with each other and this is when the optional seed parameter comes in handy:

\set k1 1 + permute(:r, :size, :default_seed + 123) \set k2 1 + permute(:r, :size, :default_seed + 321)

A similar behavior can also be approximated with hash:

\set size 1000000 \set r random_zipfian(1, 100 * :size, 1.07) \set k 1 + abs(hash(:r)) % :size

文章转载自digoal，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

PostgreSQL 14 preview - pgbench 压测工具新增随机函数 permute(i, size, [seed]) 返回 i 经过重新(随机)映射后在 [0,size) 范围内的一个值

作者

日期

标签

背景

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论

PostgreSQL 14 preview - pgbench 压测工具新增 随机函数 permute(i, size, [seed]) 返回 i 经过重新(随机)映射后 在 [0,size) 范围内的一个值

作者

日期

标签

背景

PostgreSQL 许愿链接

9.9元购买3个月阿里云RDS PostgreSQL实例

PostgreSQL 解决方案集合

德哥 / digoal's github - 公益是一辈子的事.

评论

PostgreSQL 14 preview - pgbench 压测工具新增随机函数 permute(i, size, [seed]) 返回 i 经过重新(随机)映射后在 [0,size) 范围内的一个值