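This article was written by a senior database architect at the SequoiaDB North America Lab and describes the concurrent malloc implementation and architecture design of SequoiaDB. It was originally written in English.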
SequoiaDB Concurrent malloc Implementation
Introduction
In a C/C++ application, the dynamic memory allocation function malloc(3) can have a significant impact on performance. For multi-threaded applications such as a database engine, a sub-optimal memory allocator can also limit scalability. In this article, we discuss several popular dynamic memory allocators and how SequoiaDB addresses the dynamic memory allocation problem in its database engine.
dlmalloc/ptmalloc
The GNU C library (glibc) uses ptmalloc, an allocator forked from dlmalloc with thread-related improvements. Memory is allocated as chunks: aligned data structures (at least 8-byte aligned, 16-byte on typical 64-bit platforms) that contain a header followed by the usable memory. This means each chunk carries at least 8 or 16 bytes of management overhead. Free chunks are grouped by similar sizes into bins, each maintained as a doubly linked list of chunks.
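To make the chunk-and-bin layout concrete, here is a minimal C++ sketch of a boundary-tag chunk header and exact-size free lists. It assumes a 64-bit platform (8-byte size word, 16-byte alignment), and the names `Chunk`, `Bins`, and `request_to_chunk_size` are hypothetical. It is not glibc's code and omits fastbins, large-bin ordering, and coalescing of adjacent free chunks.

```cpp
#include <cassert>
#include <cstddef>

// Assumed 64-bit layout: an 8-byte size word and 16-byte alignment.
constexpr std::size_t kSizeWord = 8;
constexpr std::size_t kAlign    = 2 * kSizeWord;
constexpr std::size_t kMinChunk = 4 * kSizeWord;   // header plus two list links
constexpr std::size_t kNumBins  = 64;              // exact-size bins for small chunks only

// Boundary-tag chunk. While the chunk is in use, only `size` is overhead;
// `fd`/`bk` reuse the user area once the chunk is free.
struct Chunk {
    std::size_t prev_size;  // size of the previous chunk, valid only if it is free
    std::size_t size;       // size of this chunk, header included
    Chunk* fd;              // next free chunk in the same bin
    Chunk* bk;              // previous free chunk in the same bin
};

// Round a user request up to an aligned chunk size that includes the header word.
std::size_t request_to_chunk_size(std::size_t req) {
    std::size_t padded = (req + kSizeWord + kAlign - 1) & ~(kAlign - 1);
    return padded < kMinChunk ? kMinChunk : padded;
}

// Free chunks of equal size share a circular doubly linked list with a sentinel head.
class Bins {
public:
    Bins() {
        for (Chunk& h : heads_) { h.prev_size = 0; h.size = 0; h.fd = &h; h.bk = &h; }
    }
    void push(Chunk* c) {
        Chunk* head = &heads_[index(c->size)];
        c->fd = head->fd;
        c->bk = head;
        head->fd->bk = c;
        head->fd = c;
    }
    // O(1) removal of the most recently freed chunk of exactly this size.
    Chunk* pop(std::size_t chunk_size) {
        Chunk* head = &heads_[index(chunk_size)];
        if (head->fd == head) return nullptr;   // bin is empty
        Chunk* c = head->fd;
        head->fd = c->fd;
        c->fd->bk = head;
        return c;
    }

private:
    static std::size_t index(std::size_t chunk_size) {
        assert(chunk_size % kAlign == 0 && chunk_size / kAlign < kNumBins);
        return chunk_size / kAlign;
    }
    Chunk heads_[kNumBins];
};
```

Because each bin is doubly linked, any free chunk can be unlinked in constant time, which is what makes merging a chunk with an adjacent free neighbour cheap in the real allocator.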
jemalloc
Originally developed by Jason Evans in 2005, jemalloc has since been adopted by FreeBSD, Facebook, Mozilla Firefox, MariaDB, Android, and others. jemalloc is a general-purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support. To avoid lock contention, jemalloc uses multiple independent memory pools called “arenas”, whose number scales with the number of CPUs, and assigns each thread to an arena that handles its malloc requests.
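The arena idea can be sketched as a fixed set of arenas, each guarded by its own lock, with every thread bound to one arena the first time it allocates. This is a simplified model, not jemalloc's implementation: `Arena`, `ArenaSet`, and `default_arena_count` are hypothetical, `std::malloc` stands in for carving memory out of an arena, and threads are assigned round-robin (jemalloc's own arena selection is more elaborate). The four-arenas-per-CPU default mirrors jemalloc's documented `opt.narenas` default.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical arena: a private lock plus a counter standing in for real bookkeeping.
struct Arena {
    std::mutex lock;
    std::size_t allocations = 0;

    void* allocate(std::size_t n) {
        std::lock_guard<std::mutex> guard(lock);   // only threads sharing this arena contend
        ++allocations;
        return std::malloc(n);                     // stand-in for carving memory from the arena
    }
};

// Four arenas per CPU, or one on a single-CPU machine.
std::size_t default_arena_count() {
    unsigned cpus = std::max(1u, std::thread::hardware_concurrency());
    return cpus == 1 ? 1 : 4 * static_cast<std::size_t>(cpus);
}

class ArenaSet {
public:
    explicit ArenaSet(std::size_t count) : arenas_(count) { assert(count > 0); }

    // Bind the calling thread to an arena on first use (round-robin is an assumption).
    // The thread_local binding assumes a single process-wide ArenaSet.
    Arena& for_this_thread() {
        thread_local Arena* mine = nullptr;
        if (mine == nullptr) {
            std::size_t i = next_.fetch_add(1, std::memory_order_relaxed) % arenas_.size();
            mine = &arenas_[i];
        }
        return *mine;
    }

    Arena& at(std::size_t i) { return arenas_[i]; }

private:
    std::vector<Arena> arenas_;          // Arena holds a mutex, so it is built in place
    std::atomic<std::size_t> next_{0};   // next arena index to hand out
};
```

Threads mapped to different arenas never contend on the same lock, so contention depends on how many threads share an arena rather than on the total thread count.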
tcmalloc

SequoiaDB Implementation
SequoiaDB (taking version 3.4 as the example) implements its own proprietary memory allocator, which is highly efficient and tailored to how memory is used within the SequoiaDB database engine. While jemalloc and tcmalloc are both excellent general-purpose memory allocators, they cannot address all the challenges encountered within SequoiaDB. For example, the ability to trace memory requests is an important requirement in the SequoiaDB engine, and existing third-party memory allocators lack this feature. Figure 2 shows the architecture of the SequoiaDB memory model, which has three layers: thread, pool, and OSS (Operating System Services).
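As an illustration of the tracing requirement, the sketch below records the source location of every live allocation so memory usage can be attributed to call sites. `AllocTracer` and `TRACED_ALLOC` are hypothetical names, and capturing `__FILE__`/`__LINE__` is an assumption; the article does not describe what SequoiaDB actually records or how.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <mutex>
#include <string>

// Where a live allocation came from and how large it is.
struct CallSite {
    const char* file;
    int line;
    std::size_t bytes;
};

class AllocTracer {
public:
    void* allocate(std::size_t n, const char* file, int line) {
        void* p = std::malloc(n);
        if (p != nullptr) {
            std::lock_guard<std::mutex> g(mu_);
            live_[p] = CallSite{file, line, n};
        }
        return p;
    }

    void release(void* p) {
        if (p == nullptr) return;
        {
            std::lock_guard<std::mutex> g(mu_);
            std::size_t erased = live_.erase(p);
            assert(erased == 1 && "freeing a pointer this tracer never handed out");
            (void)erased;
        }
        std::free(p);
    }

    // Total live bytes whose allocation happened in the given source file.
    std::size_t live_bytes_from(const std::string& file) const {
        std::lock_guard<std::mutex> g(mu_);
        std::size_t total = 0;
        for (const auto& entry : live_)
            if (file == entry.second.file) total += entry.second.bytes;
        return total;
    }

    std::size_t live_count() const {
        std::lock_guard<std::mutex> g(mu_);
        return live_.size();
    }

private:
    mutable std::mutex mu_;
    std::map<void*, CallSite> live_;   // pointer -> call site of its allocation
};

// Captures the caller's file and line at every allocation site.
#define TRACED_ALLOC(tracer, n) (tracer).allocate((n), __FILE__, __LINE__)
```

Because the macro expands `__FILE__` and `__LINE__` at each call site, a report grouped by file or line shows which code paths hold the most memory.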

OSS Layer
The OSS layer provides a malloc API that requests memory from the underlying operating system. It is also where the Pool Layer obtains its memory.
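A minimal sketch of an OSS-style entry point, assuming a POSIX system: each request is served directly by the kernel through `mmap`, and a small header in front of the user memory remembers the mapping length for `munmap`. `oss_alloc` and `oss_free` are hypothetical names rather than SequoiaDB's API, and a portable OSS layer would also need a non-POSIX path.

```cpp
#include <sys/mman.h>   // POSIX mmap/munmap (assumes Linux or another POSIX system)
#include <cassert>
#include <cstddef>
#include <cstdint>

namespace oss {

// Stored in front of each mapping so oss_free() knows how many bytes to unmap.
struct alignas(16) Header {
    std::size_t mapped_bytes;
};

// Hypothetical OSS entry point: every request is served directly by the kernel.
void* oss_alloc(std::size_t n) {
    if (n > SIZE_MAX - sizeof(Header)) return nullptr;   // size overflow
    std::size_t total = sizeof(Header) + n;
    void* base = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return nullptr;
    auto* h = static_cast<Header*>(base);
    h->mapped_bytes = total;
    void* user = h + 1;   // page-aligned base + 16-byte header keeps 16-byte alignment
    assert(reinterpret_cast<std::uintptr_t>(user) % alignof(Header) == 0);
    return user;
}

void oss_free(void* p) {
    if (p == nullptr) return;
    Header* h = static_cast<Header*>(p) - 1;
    int rc = munmap(h, h->mapped_bytes);
    assert(rc == 0);
    (void)rc;   // keeps release builds warning-free when assert is compiled out
}

}  // namespace oss
```

Since each call here costs a system call and at least one page, a layer like this is only suitable as the slow path beneath the pool and thread layers.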
Pool Layer
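The Pool Layer is a global memory pool containing segments of different sizes. A segment is a contiguous block of memory allocated from the OSS layer, and each segment is divided into fixed-size chunks. By default, the chunk sizes are 32, 64, 128, … 8092 bytes. Requests that exceed the maximum chunk size threshold of 8092 bytes are handled by the OSS layer.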
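The chunking scheme above can be sketched as a global, mutex-protected pool with one free list per size class. The sketch assumes that the classes double from 32 bytes and therefore end at 8192 bytes (the article states the top threshold as 8092 bytes), assumes a fixed 64 KiB segment size, and uses `std::malloc` as a stand-in for the OSS layer; `GlobalPool`, `pool_alloc`, and `pool_free` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <new>
#include <vector>

// Assumed classes: 32, 64, ..., 8192 bytes (pure doubling).
constexpr std::size_t kMinChunk     = 32;
constexpr std::size_t kNumClasses   = 9;                               // 32 << 8 == 8192
constexpr std::size_t kMaxChunk     = kMinChunk << (kNumClasses - 1);
constexpr std::size_t kSegmentBytes = 64 * 1024;                       // hypothetical segment size

// Smallest class whose chunk size is >= n; valid only for n <= kMaxChunk.
std::size_t size_class(std::size_t n) {
    assert(n <= kMaxChunk);
    std::size_t cls = 0;
    while ((kMinChunk << cls) < n) ++cls;
    return cls;
}

class GlobalPool {
public:
    ~GlobalPool() { for (void* s : segments_) std::free(s); }

    // Hands out a chunk of class `cls`, carving a new segment when the list is empty.
    void* take(std::size_t cls) {
        std::lock_guard<std::mutex> g(mu_);
        if (lists_[cls].empty()) carve_segment(cls);
        void* chunk = lists_[cls].back();
        lists_[cls].pop_back();
        return chunk;
    }
    // Accepts a chunk back so that any thread can reuse it.
    void give_back(void* chunk, std::size_t cls) {
        std::lock_guard<std::mutex> g(mu_);
        lists_[cls].push_back(chunk);
    }
    std::size_t free_chunks(std::size_t cls) {
        std::lock_guard<std::mutex> g(mu_);
        return lists_[cls].size();
    }

private:
    // Allocates one contiguous segment and splits it into equal chunks of this class.
    void carve_segment(std::size_t cls) {
        std::size_t chunk = kMinChunk << cls;
        char* seg = static_cast<char*>(std::malloc(kSegmentBytes));
        if (seg == nullptr) throw std::bad_alloc();
        segments_.push_back(seg);
        for (std::size_t off = 0; off + chunk <= kSegmentBytes; off += chunk)
            lists_[cls].push_back(seg + off);
    }

    std::mutex mu_;
    std::array<std::vector<void*>, kNumClasses> lists_;   // free chunks per class
    std::vector<void*> segments_;                          // owned segments
};

// Small requests are served from chunks; larger ones go straight to the OSS stand-in.
void* pool_alloc(GlobalPool& pool, std::size_t n) {
    return n <= kMaxChunk ? pool.take(size_class(n)) : std::malloc(n);
}

void pool_free(GlobalPool& pool, void* p, std::size_t n) {
    if (n <= kMaxChunk) pool.give_back(p, size_class(n));
    else std::free(p);
}
```

Every call takes the pool lock, which is why a lock-free per-thread layer in front of it pays off.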
Thread Layer
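The Thread Layer is a thread-local cache: each thread has its own private cache, so memory can be allocated without locks. Cached chunks are grouped by chunk size, with each group implemented as a linked list. Chunks are requested from the Pool Layer and cached up to a configured threshold; chunks beyond this threshold are released back to the Pool Layer, where other threads can reuse them.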
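The thread-local cache can be sketched as one intrusive singly linked free list per size class, consulted without locking and backed by a shared pool. `SharedPool`, `ThreadCache`, `g_pool`, and the per-class limit `kCacheLimit` of four chunks are hypothetical; the article only says that the threshold is configurable.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>
#include <vector>

constexpr std::size_t kNumClasses = 9;   // assumed classes 32..8192 bytes
constexpr std::size_t kCacheLimit = 4;   // hypothetical per-class threshold

inline std::size_t class_bytes(std::size_t cls) { return std::size_t{32} << cls; }

// Stand-in for the Pool Layer: a locked free list per class.
struct SharedPool {
    std::mutex mu;
    std::array<std::vector<void*>, kNumClasses> lists;
    ~SharedPool() { for (auto& v : lists) for (void* p : v) std::free(p); }

    void* take(std::size_t cls) {
        std::lock_guard<std::mutex> g(mu);
        if (lists[cls].empty()) return std::malloc(class_bytes(cls));
        void* p = lists[cls].back();
        lists[cls].pop_back();
        return p;
    }
    void give_back(void* p, std::size_t cls) {
        std::lock_guard<std::mutex> g(mu);
        lists[cls].push_back(p);
    }
    std::size_t available(std::size_t cls) {
        std::lock_guard<std::mutex> g(mu);
        return lists[cls].size();
    }
};

// Per-thread cache: no lock on the fast path.
class ThreadCache {
public:
    explicit ThreadCache(SharedPool& pool) : pool_(pool) {}
    ~ThreadCache() {   // hand everything back so other threads can reuse it
        for (std::size_t c = 0; c < kNumClasses; ++c)
            while (Node* n = pop(c)) pool_.give_back(n, c);
    }

    void* allocate(std::size_t cls) {
        assert(cls < kNumClasses);
        if (Node* n = pop(cls)) return n;   // hit: served without locking
        return pool_.take(cls);             // miss: go to the Pool Layer
    }
    void deallocate(void* p, std::size_t cls) {
        if (count_[cls] >= kCacheLimit) {   // over threshold: release to the pool
            pool_.give_back(p, cls);
            return;
        }
        Node* n = static_cast<Node*>(p);    // reuse the chunk itself as a list node
        n->next = head_[cls];
        head_[cls] = n;
        ++count_[cls];
    }
    std::size_t cached(std::size_t cls) const { return count_[cls]; }

private:
    struct Node { Node* next; };
    Node* pop(std::size_t cls) {
        Node* n = head_[cls];
        if (n != nullptr) { head_[cls] = n->next; --count_[cls]; }
        return n;
    }

    SharedPool& pool_;
    std::array<Node*, kNumClasses> head_{};
    std::array<std::size_t, kNumClasses> count_{};
};

// One cache per thread, created lazily (assumes a single process-wide pool).
SharedPool g_pool;
ThreadCache& this_thread_cache() {
    thread_local ThreadCache cache(g_pool);
    return cache;
}
```

Only a cache miss or an overflow past the threshold touches the shared pool's lock; when a thread exits, its cached chunks go back to the pool.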
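Caching only up to a threshold helps limit the overall memory footprint. In addition, each thread has an elastic large block for serving requests that exceed the maximum chunk size threshold. As a result, most requests can be satisfied in the Thread Layer, which is both efficient and fast.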
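The per-thread elastic block can be read in several ways; the sketch below assumes it is a single cached buffer that grows to fit the largest oversized request it has served and is dropped once it exceeds a retention limit. `ElasticBlock`, `this_thread_block`, `kRetainLimit` (1 MiB), and the 8192-byte `kMaxChunk` are assumptions, and `std::malloc` stands in for the OSS layer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kMaxChunk    = 8192;          // assumed largest chunk class
constexpr std::size_t kRetainLimit = 1024 * 1024;   // hypothetical cap on the cached block

class ElasticBlock {
public:
    ~ElasticBlock() { std::free(block_); }

    // Serves an oversized request from the cached block, growing it if needed.
    void* acquire(std::size_t n) {
        assert(n > kMaxChunk);
        if (in_use_) return std::malloc(n);   // block busy: fall back to the OSS stand-in
        if (capacity_ < n) {                  // grow to fit this request
            std::free(block_);
            block_ = std::malloc(n);
            capacity_ = block_ != nullptr ? n : 0;
            if (block_ == nullptr) return nullptr;
        }
        in_use_ = true;
        return block_;
    }

    void release(void* p) {
        if (in_use_ && p == block_) {
            in_use_ = false;
            if (capacity_ > kRetainLimit) {   // shrink: do not hoard very large blocks
                std::free(block_);
                block_ = nullptr;
                capacity_ = 0;
            }
            return;                           // otherwise keep it for the next request
        }
        std::free(p);                         // memory from the fallback path
    }

    std::size_t capacity() const { return capacity_; }

private:
    void* block_ = nullptr;
    std::size_t capacity_ = 0;
    bool in_use_ = false;
};

ElasticBlock& this_thread_block() {
    thread_local ElasticBlock block;
    return block;
}
```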
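In addition, the SequoiaDB memory model has built-in memory debugging that can detect memory corruption, as well as a tracing facility that records where memory was requested. Most importantly, it is fully configurable, allowing deployments to be customized for each customer's workload and environment.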
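Corruption detection is commonly implemented with guard words (canaries) around each allocation that are verified when the memory is released. The sketch below shows that technique; it is an assumption about how such a feature could work, not SequoiaDB's documented mechanism, and `debug_alloc`, `debug_check`, `debug_free`, and the `kGuard` pattern are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

constexpr std::uint64_t kGuard = 0xDEADBEEFCAFEF00DULL;   // hypothetical canary pattern

struct GuardHeader {
    std::size_t size;      // user size, needed to locate the trailing guard
    std::uint64_t guard;   // leading canary
};

// Layout: [GuardHeader][user bytes][trailing guard]
void* debug_alloc(std::size_t n) {
    if (n > SIZE_MAX - sizeof(GuardHeader) - sizeof(kGuard)) return nullptr;
    auto* raw = static_cast<unsigned char*>(
        std::malloc(sizeof(GuardHeader) + n + sizeof(kGuard)));
    if (raw == nullptr) return nullptr;
    auto* h = reinterpret_cast<GuardHeader*>(raw);
    h->size = n;
    h->guard = kGuard;
    // The trailing canary may be unaligned, so copy it bytewise.
    std::memcpy(raw + sizeof(GuardHeader) + n, &kGuard, sizeof(kGuard));
    return raw + sizeof(GuardHeader);
}

// Returns false if either canary was overwritten (underrun or overrun).
bool debug_check(const void* p) {
    auto* user = static_cast<const unsigned char*>(p);
    auto* h = reinterpret_cast<const GuardHeader*>(user - sizeof(GuardHeader));
    std::uint64_t tail;
    std::memcpy(&tail, user + h->size, sizeof(tail));
    return h->guard == kGuard && tail == kGuard;
}

// Verifies the canaries, releases the memory, and reports whether it was intact.
bool debug_free(void* p) {
    if (p == nullptr) return true;
    bool intact = debug_check(p);
    std::free(static_cast<unsigned char*>(p) - sizeof(GuardHeader));
    return intact;
}
```

A one-byte overrun past the end of the buffer changes the trailing guard, so the damage is reported at free time instead of surfacing later as an unrelated crash.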