Redis - 基础概念, 数据结构

logerJava 2021-08-17

290

前言

从今天开始我们来整理 Redis 相关的知识体系, 因为 MySQL 的调优涉及到很多地方, 所以 loger 正在构思怎么去写, 当然中间不能闲着, 所以就拿 Redis 来填补一下空缺 😀

为什么要使用 Redis ?

相信点开文章的大家一定对 Redis 已经有了一定程度的了解, 如果不了解也没关系, 可以看一下它的官网 :

https://redis.io/topics/introduction

第一句话就是 Redis 的简介 : Redis 是一种开源（BSD 许可）、内存中数据结构存储，用作数据库、缓存和消息代理

剩余部分就不进行赘述了, loger 这里直接提取出重点 :

Redis 基于内存, 常用作缓存技术
为 key - value 方式存储
AOF, RDB 磁盘持久化
Lua 脚本保证原子性

如果大家用过 GitHub 或者网上下载的 Guava 或 hutool 工具类库, 就会发现在类库中存在一个用 Map 做的缓存工具类, 那么我们为什么要用 Redis 而不用 Map 在程序中做缓存呢 ?

首先无论是自己用 Map 实现的, 还是类库中提供的使用的应该是本地缓存, 这种缓存是不具有分布式下一致性的, 并且此种缓存最大的优点在于轻量, 快速, 生命周期是跟随 JVM 的销毁而结束的
Redis 中的缓存为分布式缓存, 就算是多台机器, 应该是每个实例共享一份缓存, 具有一致性
Redis 和 Map 实现的缓存存储量不同, 使用 Map 时, JVM 如果内存占用过大容易挂掉, 而缓存数据也会随之消失; Redis 的缓存容量相较来说大了很多, 并且提供了持久化机制, 确保在宕机, 掉电等情况下数据的恢复问题
使用 Map 操作无法使用 Redis 中各种方便的机制, 如各种数据结构, 过期机制, 淘汰策略等

关于 Redis 的应用场景可以参考 :

http://www.scienjus.com/redis-use-case/

为什么在程序中要使用缓存 ?

在关系型数据库中, 读写操作一般都是要经过磁盘的, 而磁盘的读写速度相对于内存的读写速度来说很慢, 而遇到一些业务场景时, 关系型数据库已经不能满足业务场景了, 比如秒杀场景, 访问流量高峰等这种情况下如果直接去操作数据库, 很容易就会打穿数据库, 崩溃, 宕机也就接踵而至, 这些情况下就时缓存出场的时候了

让 CPU 告诉你硬盘和网络到底有多慢

https://cizixs.com/2017/01/03/how-slow-is-disk-and-network/

Redis 与 Memcached

其实缓存有多种实现方式, 而最常见用来与 Redis 进行比较的就是 Memcached, 这两者都是非关系型内存键值数据库, 我们整理一下不同之处 :

数据类型不同

Memcached 仅支持字符串类型
Redis 支持 5 种不同数据类型

持久化不同

Memcached 不支持持久化
Redis 存在 AOF 与 RDB

分布式不同

Memcached 不支持分布式, 可以在客户端通过一致性哈希的方式实现分布式
Redis Cluster 支持分布式

内存管理机制不同

Memcached 将内存分割成特定长度的块来存储数据, 以完全解决内存碎片的问题, 但这种方式会使内存的利用率不高, 例如块的大小为 128 bytes, 只存储 100 bytes 的数据, 那么剩下的 28 bytes 就浪费掉了
Redis 中, 并不是所有数据都一直存储在内存中, 可以将一些很久没用的 value 交换到磁盘, 而 Memcached 的数据则会一直在内存中

数据类型与常见操作

Redis 中存在 5 种数据类型 : STRING, LIST, HASH, SET, ZSET (sorted set), 这个在官网中有介绍

数据类型	可存储	操作
STRING	字符串, 整数, 浮点数	对整个字符串或者字符串的其中一部分执行操作, 对整数和浮点数执行自增或者自减操作
LIST	列表	从两端压入或者弹出元素, 对单个或者多个元素进行修剪, 只保留一个范围内的元素
SET	无序集合	添加、获取、移除单个元素, 检查一个元素是否存在于集合中, 计算交集、并集、差集
HASH	包含键值对的无序散列表	添加、获取、移除单个键值对, 获取所有键值对, 检查某个键是否存在
ZSET	有序集合	添加、获取、删除元素, 根据分值范围或者成员来获取元素, 计算一个键的排名

What Redis data structures look like

https://redis.com/ebook/part-1-getting-started/chapter-1-getting-to-know-redis/1-2-what-redis-data-structures-look-like/

关于常见操作, 十分简单这里就不水文章了, 如果手痒痒的话, 可以用下面两个网站对照着试一下, 不用安装 Redis 即可操作 :

Redis 的命令参考

http://doc.redisfans.com/

在线尝试 Redis 命令

https://try.redis.io/

数据结构

上面忘了讲, Redis 是用 C 语言进行编写的, 实不相瞒 loger 压根就不会 C 语言, 但是这并不耽误咱们去看它的实现逻辑对吧 😀

字典

dictht 是一个散列表结构，使用拉链法解决哈希冲突

/* This is our hash table structure. Every dictionary has two of this as we
* implement incremental rehashing, for the old to the new table. */
typedef struct dictht {
 
   dictEntry **table; // 哈希表数组
 
   unsigned long size; // 哈希表大小
 
   unsigned long sizemask; // 哈希表大小掩码，计算索引值
 
   unsigned long used; // 哈希表已有节点数量
 
} dictht;

typedef struct dictEntry {
 
   void *key; // 键
 
   union { // 值
 
       void *val;
 
       uint64_t u64;
 
       int64_t s64;
 
       double d;
 
  } v;
 
   struct dictEntry *next; // 指向下个哈希节点，组成链表
 
} dictEntry;

Redis 的字典 dict 中包含两个哈希表 dictht, 方便进行 rehash 操作, 在扩容时, 将其中一个 dictht 上的键值对 rehash 到另一个 dictht 上面, 完成之后释放空间并交换两个 dictht 的角色

typedef struct dict {
 
   dictType *type; // 类型特定函数
 
   void *privdata; // 私有数据
 
   dictht ht[2]; // 哈希表
 
   long rehashidx; // rehash索引, 当rehash不进行时, 值为-1
 
   unsigned long iterators; // 当前运行的迭代器数量
 
} dict;

typedef struct dictType{
   unsigned int (*hashFunction)(const void * key); // 计算哈希值的函数
   void *(*keyDup)(void *private, const void *key); // 复制键的函数
   void *(*valDup)(void *private, const void *obj);   // 复制值的函数
   int (*keyCompare)(void *privdata , const void *key1, const void *key2); // 对比键的函数
   void (*keyDestructor)(void *private, void *key); // 销毁键的函数
   void (*valDestructor)(void *private, void *obj); // 销毁值的函数
 
}dictType

从代码实现上我们可以发现，Redis 中有实际存在两个哈希表 :

ht[0] ：用于存放真实的 key - vlaue 数据
ht[1] ：用于扩容 (rehash)

Redis 中哈希算法和哈希冲突与 Java 实现的差不多, 他们的不同在于 :

Redis 出现哈希冲突会将新节点放在链表头部
Java 在 JDK 1.8 之后出现哈希冲突, 会放在链表尾部

下面我们来看一下 rehash, rehash 操作不是一次性完成, 而是采用渐进方式, 这是为了避免一次性执行过多的 rehash 操作给服务器带来过大的负担

渐进式 rehash 通过记录 dict 的 rehashidx 完成, 它从 0 开始, 然后每执行一次 rehash 都会递增, 例如在一次 rehash 中, 要把 dict[0] rehash 到 dict[1], 这一次会把 dict[0] 上 table[rehashidx] 的键值对 rehash 到 dict[1] 上, dict[0] 的 table[rehashidx] 指向 null, 并令 rehashidx++

在 rehash 期间, 每次对字典执行添加、删除、查找或者更新操作时, 都会执行一次渐进式 rehash

采用渐进式 rehash 会导致字典中的数据分散在两个 dictht 上, 因此对字典的查找操作也需要到对应的 dictht 去执行

/* Performs N steps of incremental rehashing. Returns 1 if there are still
* keys to move from the old to the new hash table, otherwise 0 is returned.
*
* Note that a rehashing step consists in moving a bucket (that may have more
* than one key as we use chaining) from the old to the new hash table, however
* since part of the hash table may be composed of empty spaces, it is not
* guaranteed that this function will rehash even a single bucket, since it
* will visit at max N*10 empty buckets in total, otherwise the amount of
* work it does would be unbound and the function may block for a long time. */
int dictRehash(dict *d, int n) {
   int empty_visits = n * 10; /* Max number of empty buckets to visit. */
   if (!dictIsRehashing(d)) return 0;
   while (n-- && d->ht[0].used != 0) {
       dictEntry *de, *nextde;
       /* Note that rehashidx can't overflow as we are sure there are more
        * elements because ht[0].used != 0 */
       assert(d->ht[0].size > (unsigned long) d->rehashidx);
       while (d->ht[0].table[d->rehashidx] == NULL) {
           d->rehashidx++;
           if (--empty_visits == 0) return 1;
      }
       de = d->ht[0].table[d->rehashidx];
       /* Move all the keys in this bucket from the old to the new hash HT */
       while (de) {
           uint64_t h;
           nextde = de->next;
           /* Get the index in the new hash table */
           h = dictHashKey(d, de->key) & d->ht[1].sizemask;
           de->next = d->ht[1].table[h];
           d->ht[1].table[h] = de;
           d->ht[0].used--;
           d->ht[1].used++;
           de = nextde;
      }
       d->ht[0].table[d->rehashidx] = NULL;
       d->rehashidx++;
  }
   /* Check if we already rehashed the whole table... */
   if (d->ht[0].used == 0) {
       zfree(d->ht[0].table);
       d->ht[0] = d->ht[1];
       _dictReset(&d->ht[1]);
       d->rehashidx = -1;
       return 0;
  }
   /* More to rehash... */
   return 1;
}