guava cache 源码分析(四)

dragon元 2019-05-09

284

好的，我又回来了，今天带来的是 guava Cache 源码分析第四篇，这篇是分析 guava Cache 的主动淘汰和扩容。

void runLockedCleanup(long now) {
    if (this.tryLock()) {
        try {
            this.drainReferenceQueues();
            this.expireEntries(now);
            this.readCount.set(0);
        } finally {
            this.unlock();
        }
    }


}

在之前章节，把 this.drainReferenceQueues(); 方法都分析了一遍，现在分析 this.expireEntries(now); 这个方法

@GuardedBy("this")
void expireEntries(long now) {
    this.drainRecencyQueue();


    ReferenceEntry e;
    while((e = (ReferenceEntry)this.writeQueue.peek()) != null && this.map.isExpired(e, now)) {
        if (!this.removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
            throw new AssertionError();
        }
    }


    while((e = (ReferenceEntry)this.accessQueue.peek()) != null && this.map.isExpired(e, now)) {
        if (!this.removeEntry(e, e.getHash(), RemovalCause.EXPIRED)) {
            throw new AssertionError();
        }
    }
}

该方法是 guava Cache 的主动清除缓存。

从方法内的第一段代码开始

@GuardedBy("this")
void drainRecencyQueue() {
    ReferenceEntry e;
    while((e = (ReferenceEntry)this.recencyQueue.poll()) != null) {
        if (this.accessQueue.contains(e)) {
            this.accessQueue.add(e);
        }
    }
}

查询缓存的请求成功时会加入 recencyQueue 队列，而此方法目的是将所有查询成功的缓存都转移到 accessQueue 队列中。

recencyQueue，accessQueue 这两个队列，在我们不指定过期时间或不指定最大重量时是给一个默认的队列，。writeQueue 在不指定写入过期时间时会给一个默认队列。

//三个队列的赋值
this.recencyQueue = (Queue)(map.usesAccessQueue() ? new ConcurrentLinkedQueue() : LocalCache.DISCARDING_QUEUE);
this.writeQueue = (Queue)(map.usesWriteQueue() ? new LocalCache.WriteQueue() : LocalCache.DISCARDING_QUEUE);
this.accessQueue = (Queue)(map.usesAccessQueue() ? new LocalCache.AccessQueue() : LocalCache.DISCARDING_QUEUE);


//默认队列
static final Queue<?> DISCARDING_QUEUE = new AbstractQueue<Object>() {
    public boolean offer(Object o) {
        return true;
    }


    public Object peek() {
        return null;
    }


    public Object poll() {
        return null;
    }


    public int size() {
        return 0;
    }


    public Iterator<Object> iterator() {
        return ImmutableSet.of().iterator();
    }
};

可以看出默认队列是不会做任何事情的。

在执行完 drainRecencyQueue 方法之后，会从 writeQueue 和 accessQueue 队列中获取头元素，首先说明一点 peek 方法为返回头元素，而 poll 方法为弹出头元素，peek 不对队列做操作，而 poll 会改变头元素。而当取出得元素不为 null 时，会执行 this.map.isExpired(e, now) 该方法。

boolean isExpired(ReferenceEntry<K, V> entry, long now) {
    Preconditions.checkNotNull(entry);
    if (this.expiresAfterAccess() && now - entry.getAccessTime() >= this.expireAfterAccessNanos) {
        return true;
    } else {
        return this.expiresAfterWrite() && now - entry.getWriteTime() >= this.expireAfterWriteNanos;
    }
}

expiresAfterAccess 方法为判断是否设置过期时间(在给定时间内写入或者获取)，entry.getAccessTime() 最后一次操作得时间，默认时间为 Long 得最大值 9223372036854775807L。now 值在之前也说明过，此值可以手动的设置，不设置则默认获取系统得当前纳秒时间，正常的效果就是当前时间 - 最后一次访问时间 >= 过期时间，那么会返回 true，else 代码块的逻辑也是如此

那么如果此缓存已到过期时间，removeEntry 方法又做了什么事情呢？

@VisibleForTesting
@GuardedBy("this")
boolean removeEntry(ReferenceEntry<K, V> entry, int hash, RemovalCause cause) {
    int newCount = this.count - 1;
    AtomicReferenceArray<ReferenceEntry<K, V>> table = this.table;
    int index = hash & table.length() - 1;
    ReferenceEntry<K, V> first = (ReferenceEntry)table.get(index);


    for(ReferenceEntry e = first; e != null; e = e.getNext()) {
        if (e == entry) {
            ++this.modCount;
            ReferenceEntry<K, V> newFirst = this.removeValueFromChain(first, e, e.getKey(), hash, e.getValueReference().get(), e.getValueReference(), cause);
            newCount = this.count - 1;
            table.set(index, newFirst);
            this.count = newCount;
            return true;
        }
    }


    return false;
}

代码和之前得 reclaimKey 差不多，就不在做说明了。唯一不同得可能是 RemovalCause 值了。reclaimKey 是 RemovalCause.COLLECTED ，removeEntry 是 EXPIRED。该值是缓存被回收得原因。

这一部分为主动回收机制，那么现在只剩下最后一部分了， guava Cache 的扩容。

int newCount = this.count + 1;
if (newCount > this.threshold) {
    this.expand();
    newCount = this.count + 1;
}

通过判断原缓存数 + 1 是否大于 threshold 值。大于则做扩容操作，首先看它是在何时被初始化得。

//segment 构造中的初始化代码
this.initTable(this.newEntryArray(initialCapacity));


//创建一个原子数组
AtomicReferenceArray<ReferenceEntry<K, V>> newEntryArray(int size) {
    return new AtomicReferenceArray(size);
}


//将创建的数组执行成员变量 table。并设置 threshold 的值
void initTable(AtomicReferenceArray<ReferenceEntry<K, V>> newTable) {
    this.threshold = newTable.length() * 3 / 4;
    if (!this.map.customWeigher() && (long)this.threshold == this.maxSegmentWeight) {
        ++this.threshold;
    }


    this.table = newTable;
}

threshold 的值为数组长度得 3/4。其实这个和 HashMap 的扩容机制差不多，只不过 Guava Cache 是在 put 数据前，而 HashMap 是在 put 数据后，那 Guava Cache 扩容方法都做什么操作呢？

@GuardedBy("this")
void expand() {
    //获取数组对象
    AtomicReferenceArray<ReferenceEntry<K, V>> oldTable = this.table;
    //获取数组长度
    int oldCapacity = oldTable.length();
    if (oldCapacity < 1073741824) {
        //获取缓存数量
        int newCount = this.count;
        //创建一个新数组，大小设置为之前的2倍
        AtomicReferenceArray<ReferenceEntry<K, V>> newTable = this.newEntryArray(oldCapacity << 1);
        //设置新的临界值
        this.threshold = newTable.length() * 3 / 4;
        int newMask = newTable.length() - 1;


        //循环老数组
        for(int oldIndex = 0; oldIndex < oldCapacity; ++oldIndex) {
            //获取数组指定下标的头节点
            ReferenceEntry<K, V> head = (ReferenceEntry)oldTable.get(oldIndex);
            if (head != null) {
                //获取头节点的 next 节点
                ReferenceEntry<K, V> next = head.getNext();
                //计算头节点的新下标
                int headIndex = head.getHash() & newMask;
                //头节点不存在 next 节点，则直接放置在指定下标上
                if (next == null) {
                    newTable.set(headIndex, head);
                } else {//存在 next 节点
                    ReferenceEntry<K, V> tail = head;
                    int tailIndex = headIndex;


                    ReferenceEntry e;
                    int newIndex;
                    for(e = next; e != null; e = e.getNext()) {
                        newIndex = e.getHash() & newMask;
                        if (newIndex != tailIndex) {
                            tailIndex = newIndex;
                            tail = e;
                        }
                    }


                    newTable.set(tailIndex, tail);


                    for(e = head; e != tail; e = e.getNext()) {
                        newIndex = e.getHash() & newMask;
                        ReferenceEntry<K, V> newNext = (ReferenceEntry)newTable.get(newIndex);
                        ReferenceEntry<K, V> newFirst = this.copyEntry(e, newNext);
                        if (newFirst != null) {
                            newTable.set(newIndex, newFirst);
                        } else {
                            this.removeCollectedEntry(e);
                            --newCount;
                        }
                    }
                }
            }
        }


        //设置新数组
        this.table = newTable;
        //设置缓存数量
        this.count = newCount;
    }
}

如果看过 HashMap 源码就会发现 guava Cache 扩容其实和 hashMap 得扩容差不多。guava Cache 扩容后得数量是原先得 2 倍。扩容其实挺简单得。没啥好多说得，不过有一段我认为还是有一点难以理解的，就是转移元素的那一段。

ReferenceEntry<K, V> tail = head;
int tailIndex = headIndex;


ReferenceEntry e;
int newIndex;
for(e = next; e != null; e = e.getNext()) {
    newIndex = e.getHash() & newMask;
    if (newIndex != tailIndex) {
        tailIndex = newIndex;
        tail = e;
    }
}


newTable.set(tailIndex, tail);

上面这一段代码我起初看得时候也并不理解，不理解为什么会有这样一段操作，因为这段操作在我看来是可以忽略的。明明一次循环就可以做完得操作，为什么要单独做这样得操作呢。

tail 记录节点，tailIndex 记录下标值，newIndex 记录新得下标。循环该链表的所有节点，依次计算每个节点的新下标值，由于数组每次都是以原先的2倍进行扩容，而在创建数组的时候初始大小经过限制只会是2的n次幂，所以，每个节点的要么在原来的下标上，要么在原来的下标的值上加上原数组长度。因为所有节点都按照这个规律的话，是不存在原先不同下标的节点分到同一个下标的。所以可以直接执行 newTable.set(tailIndex, tail); 这段代码，而不用判断该下标是否存在节点。

for 循环效果是为了找到从末尾节点开始连续的、下标值也相同的节点，并将这段节点存放在新数组中。所以如果运气不好，末尾节点与它的前置节点计算的新下标不在同一个位置，那么这一段代码就只对末尾节点做了操作。如果运气好，循环下来所有节点的下标都是相同的。那么直接将头节点放入新数组中就好了。这样就节省了很多时间，因为在下一个循环中，除了已经被存放在新数组中的节点，其他的节点都会重新生成，并且顺序也将倒置，可以看下下面这段代码的操作。

for(e = head; e != tail; e = e.getNext()) {
    newIndex = e.getHash() & newMask;
    ReferenceEntry<K, V> newNext = (ReferenceEntry)newTable.get(newIndex);
    ReferenceEntry<K, V> newFirst = this.copyEntry(e, newNext);
    if (newFirst != null) {
        newTable.set(newIndex, newFirst);
    } else {
        this.removeCollectedEntry(e);
        --newCount;
    }
}

在 for 循环中，判断节点不等于 tail 才进入循环体，tail 在之前也说了，tail 节点和 tail 之后的节点都会先加入到新数组中，所以如果 tail == head 的话，这一段是不会执行的。

循环体里的代码就是获取指定下标的头节点，并复制旧的缓存数据生成新的缓存，由于头节点会作为新生成的节点的尾节点，所以可以知道 head 节点到 tail 节点之间的节点它们的顺序会倒置。值得注意的是这一段代码也做了清除失效缓存的操作。

到这里 put 的第一部分已经结束，其中有些地方我也看了挺久了，也有一开始疑惑的地方，但是慢慢看下去也会有种豁然开朗的感觉，第一部分代码量相对多一些，到第二三部分代码就会少一些，也有一些复用的方法，所以不用重复分析。

其实第一部分总结下来就几句话

1. 判断 key 或 value 是否不为强引用

2. 如果有 key 或 value 被 GC 回收了，则清除对应得缓存，并清除失效缓存

3. 执行主动淘汰策略，判断缓存是否过期，过期则清除缓存，并清除失效缓存

4. 判断缓存数量是否达到临界值，达到则扩容，并清除失效缓存

数据库

文章转载自dragon元，如果涉嫌侵权，请发送邮件至：contact@modb.pro进行举报，并提供相关证据，一经查实，墨天轮将立刻删除相关内容。

guava cache 源码分析(四)

评论