暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Ceph性能瓶颈分析与优化(rbd篇)

Ceph开源社区 2020-11-11
1753

本文经作者授权,转自知乎,原文链接:https://zhuanlan.zhihu.com/p/277448500

背景

希望通过对ceph的性能瓶颈点进行分析,从而找到io路径主要性能瓶颈点,进而对ceph的性能做相应的优化。

ceph的性能瓶颈分析与优化一直是一个难点,因为优化性能这个事情本身是没有一成不变的可供参考的例子,比如一些朋友提供了ceph配置优化的建议,但是这种优化是基于特定硬件以及特定环境的,是不可复制的。我个人理解的是如果期望对性能瓶颈进行分析并进行优化,最好的办法便是通过ceph的io日志或者是各个阶段的延迟的监控指标,进而找到耗时比较长的点,然后针对此做出优化(当然,做这个事情的前提便是需要对ceph的io路径比较熟悉,同时由于ceph的日志或者相关指标项可能不是很全,必要时需要自己添加)。

io耗时初分析

这里是利用fio来测试,fio测试的话主要是有几个阶段的耗时:

  • fio工具发起请求到librbd接收到请求的耗时

  • librbd️接收到请求到向osd发送出请求的耗时

  • osd处理io的耗时

  • osd发送reply给client,client接收到的耗时

  • client给应用(fio)回复的耗时

所以这里我们期望通过ceph提供的延时阶段来大概看到哪个阶段是主要耗时点:

// fio命令-- fio整个耗时是15ms左右
username@hostname:/# fio -filename=/dev/nbd0 -direct=1 -iodepth=128 -rw=randwrite -ioengine=libaio -bs=4k -time_based -numjobs=1 -name="test" -runtime=3
test: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=libaio, iodepth=128
fio-2.16
Starting 1 process
Jobs: 1 (f=1): [w(1)] [-.-% done] [0KB/34116KB/0KB s] [0/8529/0 iops] [eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=1173687: Thu Sep 17 14:10:08 2020
write: io=102280KB, bw=33935KB/s, iops=8483, runt= 3014msec
slat (usec): min=1, max=2238, avg= 9.73, stdev=19.80
clat (msec): min=1, max=47, avg=15.07, stdev= 2.25
lat (msec): min=1, max=47, avg=15.08, stdev= 2.25


// wr_latency是librbd收到fio的请求到发送给osd,再到收到回复并准备返回给应用io完成的整个耗时
// 可以看到fio的延时与如下wr_latency延时基本相当,都是十几毫秒。这也就证明了fio应用与rbd交互本身耗时很短
username@hostname:/# sudo ceph daemon var/run/ceph/ceph-client.828073.asok perf dump
"librbd-16bdd26b8b4567-rbd-volume-nbd": {
"wr": 25570,
"wr_bytes": 104734720,
"wr_latency": {
"avgcount": 25570,
"sum": 379.966093228,
"avgtime": 0.014859839
},

// 所以一个想法便是既然写延时这么高,那么这个延时是发生在osd侧么?
// 所以测试前清空osd的perf,然后fio完查看

username@hostname# for i in {0,3,4};do sudo ceph daemon osd.${i} perf reset all ;done
root@pubt2-ceph0:/home/hzwuhongsong# for i in {0,3,4};do sudo ceph daemon osd.${i} perf dump ;done > io_perf_osd ; cat io_perf_osd | grep -A 3 op_latency ;

// op_latency即是osd收到io到整个io完成的整个时间段
"op_latency": {
"avgcount": 1834,
"sum": 2.175986444,
"avgtime": 0.001186470
--
// subop_latency是本osd处理副本op的耗时
"subop_latency": {
"avgcount": 3005,
"sum": 1.671269352,
"avgtime": 0.000556162
--
"op_latency": {
"avgcount": 1395,
"sum": 1.661397001,
"avgtime": 0.001190965
--
"subop_latency": {
"avgcount": 3323,
"sum": 1.771620247,
"avgtime": 0.000533138
--
"op_latency": {
"avgcount": 1485,
"sum": 1.795529030,
"avgtime": 0.001209110
--
"subop_latency": {
"avgcount": 3255,
"sum": 1.724762648,
"avgtime": 0.000529880


从上面数据可知,op_latency的时延在一两毫秒,所以这也就表明了延时不在osd侧

继续分析

由于耗时主要不是在osd端,这里首先想到的便是网络繁忙等导致的网络延时或者是客户端回调(finish线程是单线程的)会耗时比较高。所以想到的是获取到op开始准备好发送给osd(_session_op_assign函数)到收到返回并处理完成的时间(_session_op_remove),那么通过打印出这两个函数的日志,然后看他们的间隔就知道这个阶段的耗时了,如下:

// 如果要查看以下日志,需要把debug_objecter设置为15
username@hostname:/# cat ceph-client.828073.log | egrep "_session_op_assign|_session_op_remove" |sort -k 8

2020-09-17 10:38:50.549806 7f639d009040 15 client.1500594.objecter _session_op_assign 8 1(每个io有一个唯一的tid)
2020-09-17 10:38:50.550629 7f638de72700 15 client.1500594.objecter _session_op_remove 8 1
......
......
2020-09-17 10:39:27.235067 7f63777fe700 15 client.1500594.objecter _session_op_assign 7 9998(每个io有一个唯一的tid)
2020-09-17 10:39:27.237152 7f638de72700 15 client.1500594.objecter _session_op_remove 7 9998
2020-09-17 10:39:27.235110 7f63777fe700 15 client.1500594.objecter _session_op_assign 1 9999
2020-09-17 10:39:27.237156 7f638ce70700 15 client.1500594.objecter _session_op_remove 1 9999


从上面日志片段可以看出,这两个函数相隔的时间也都不算多,跟后端osd处理时间差不太多,所以肯定是rbd从收到op到发送给osd之前这个阶段耗时比较长。

其实如果单纯想看是不是finish回调比较慢,可以通过perf dump里面提供的finish的信息便可以知道,但是finishi队列排队个数以及平均完成时间均很小,所以显然finish回调耗时不长

    "finisher-radosclient": {
"queue_len": 2, // finishi队列排队个数
"complete_latency": {
"avgcount": 9942995,
"sum": 80.688346845,
"avgtime": 0.000008115 // 平均完成时间
}
},


进一步分析

如果期望做进一步分析,需要对io流程有一个初步的梳理,如下:

Image::write
->ImageRequestWQ<I>::write
-> ImageRequestWQ<I>::aio_write
-> ImageRequest<I>::aio_write
-> queue() // 把io丢到队列io_work_queue

// 工作线程从io_work_queue队列取出io后调用ImageRequest<I>::send()
工作线程(io_work_queue = new io::ImageRequestWQ<>):ImageRequestWQ<I>::process

ImageRequest<I>::send()
->AbstractImageWriteRequest<I>::send_request()
->AbstractImageWriteRequest<I>::send_object_requests
--> journal开启
--> journal未开启
AbstractObjectWriteRequest<I>::send()
--> AbstractObjectWriteRequest<I>::pre_write_object_map_update()
--> AbstractObjectWriteRequest<I>::write_object
--> image_ctx->data_ctx.aio_operate
--> librados::IoCtx::aio_operate
--> librados::IoCtxImpl::aio_operate
--> Objecter::op_submit // 向主osd发送io

// 收到主osd的io回复
Objecter::ms_dispatch(Message *m)
-> Objecter::handle_osd_op_reply
-> onfinish->complete(rc) // 丢finish队列,finish处理回调


从上面的io路径发现一个关键点,也就是io会先丢到队列io_work_queue的,然后工作线程再从队列里面取出io进行处理。这里因为经过队列以及线程的转换,往往可能是瓶颈所在,所以期望查看io丢队列前到工作线程拿到io准备处理这个阶段的耗时。正好他们分别对应着函数ImageRequestWQ::aio_write以及ImageRequest::send(),所以希望从日志里面找到这两个时间点的间隔

// 这个对应着ImageRequestWQ<I>::aio_write的调用次数
username@hostname:/# sudo cat var/log/ceph/ceph-client.828073.log |egrep 'aio_write' |grep -v send | wc -l
25644

// 这个对应着ImageRequest<I>::send()次数
username@hostname:/# sudo cat var/log/ceph/ceph-client.828073.log |egrep 'send: aio_write' | wc -l
25644

ImageRequestWQ::aio_write函数会调用ImageRequest::send()函数,所以他们的调用次数是一样的。这没问题,但是细细分析我们发现,日志里很多时候多个aio_write函数后才会有send()函数,本来一个aio_write函数后会跟着一个send, 这说明了io在aio_write到send之间发生了等待以及阻塞

void ImageRequestWQ<I>::aio_write(AioCompletion *c, uint64_t off, uint64_t len,bufferlist &&bl, int op_flags,bool native_async) 
{

c->init_time(util::get_image_ctx(&m_image_ctx), AIO_TYPE_WRITE);

// 关键日志的打印
ldout(cct, 20) << "ictx=" << &m_image_ctx << ", "
<< "completion=" << c << ", off=" << off << ", "
<< "len=" << len << ", flags=" << op_flags << dendl;

RWLock::RLocker owner_locker(m_image_ctx.owner_lock);

if (m_image_ctx.non_blocking_aio || writes_blocked()) {
queue(ImageRequest<I>::create_write_request(
m_image_ctx, c, {{off, len}}, std::move(bl), op_flags, trace));
} else
{
c->start_op();
ImageRequest<I>::aio_write(&m_image_ctx, c, {{off, len}},std::move(bl), op_flags, trace);
finish_in_flight_io();
}
}

// queue丢队列,然后在工作线程里面调用下面的send函数

void ImageRequest<I>::send()
{
I &image_ctx = this->m_image_ctx;

AioCompletion *aio_comp = this->m_aio_comp;

// 关键日志的打印
ldout(cct, 20) << get_request_type() << ": ictx=" << &image_ctx << ", "<< "completion=" << aio_comp << dendl;

if (m_bypass_image_cache || m_image_ctx.image_cache == nullptr)
{
send_request();
}
else
{
send_image_cache_request();
}
}

继续细看代码的日志输出,发现对于同一个请求,这两个函数打印出来的completion是一样的,于是便可以通过这两条日志查看这两个函数的运行时间,如下:

username@hostname:/#  sudo cat  var/log/ceph/ceph-client.828073.log  |egrep aio_write |grep send  |awk '{print $10}' |sort |uniq -c    |less

3 completion=0x7f63600008c0
3 completion=0x7f6360000b20
7 completion=0x7f6360000c60
3 completion=0x7f6360000d60
3 completion=0x7f6360000d80
3 completion=0x7f6360000da0
20 completion=0x7f6360000e30
2 completion=0x7f636011c4f0
...... 还有很多

username@hostname:/# sudo cat var/log/ceph/ceph-client.828073.log | grep "completion=0x7f63600008c0"

// 可以看到这两个函数对应的日志输出间隔时间有十几毫秒,大头基本就在这里了
2020-09-17 10:39:26.153953 7f63757fa700 20 librbd::io::ImageRequestWQ: 0x55983bb2d980 aio_write: ictx=0x55983bb2e7a0, completion=0x7f63600008c0, off=17863884800, len=4096, flags=0
2020-09-17 10:39:26.164749 7f63777fe700 20 librbd::io::ImageRequest: 0x7f6360035810 send: aio_write: ictx=0x55983bb2e7a0, completion=0x7f63600008c0

2020-09-17 10:39:26.167203 7f63757fa700 20 librbd::io::ImageRequestWQ: 0x55983bb2d980 aio_write: ictx=0x55983bb2e7a0, completion=0x7f63600008c0, off=81776902144, len=4096, flags=0
2020-09-17 10:39:26.180059 7f63777fe700 20 librbd::io::ImageRequest: 0x7f63600b8fd0 send: aio_write: ictx=0x55983bb2e7a0, completion=0x7f63600008c0

2020-09-17 10:39:26.183942 7f63757fa700 20 librbd::io::ImageRequestWQ: 0x55983bb2d980 aio_write: ictx=0x55983bb2e7a0, completion=0x7f63600008c0, off=18960003072, len=4096, flags=0
2020-09-17 10:39:26.194718 7f63777fe700 20 librbd::io::ImageRequest: 0x7f6360054710 send: aio_write: ictx=0x55983bb2e7a0, completion=0x7f63600008c0

// 选择另外一个查看也是这样
username@hostname:/# sudo cat var/log/ceph/ceph-client.828073.log | grep "completion=0x7f636011c4f0" 2020-09-17 10:39:26.112072 7f63757fa700 20 librbd::io::ImageRequestWQ: 0x55983bb2d980 aio_write: ictx=0x55983bb2e7a0, completion=0x7f636011c4f0, off=8828133376, len=4096, flags=0
2020-09-17 10:39:26.124578 7f63777fe700 20 librbd::io::ImageRequest: 0x7f63600215c0 send: aio_write: ictx=0x55983bb2e7a0, completion=0x7f636011c4f0
2020-09-17 10:39:26.127175 7f63757fa700 20 librbd::io::ImageRequestWQ: 0x55983bb2d980 aio_write: ictx=0x55983bb2e7a0, completion=0x7f636011c4f0, off=70778023936, len=4096, flags=0
2020-09-17 10:39:26.138301 7f63777fe700 20 librbd::io::ImageRequest: 0x7f636003bb10 send: aio_write: ictx=0x55983bb2e7a0, completion=0x7f636011c4f0

到这里基本就怀疑是丢队列后线程来不及处理引起的。因为往队列里丢和取了以后进行处理都是一个线程,但是取的线程因为还要调用send函数所以相对会慢很多,进而导致队列堆积

代码修改

如上,可以看到可能是工作线城处理不过来,通过查看发现工作线程是写死的单线程的,所以试图把工作线程变为可配置的。另外,为了实时查看到工作队列里面积压的请求个数,可以加个perf dump信息,具体代码修改如下:

diff --git a/src/librbd/ImageCtx.cc b/src/librbd/ImageCtx.cc
index 8f52c9609d..284993eb04 100644
--- a/src/librbd/ImageCtx.cc
+++ b/src/librbd/ImageCtx.cc
@@ -58,7 +58,7 @@ public:
ContextWQ *op_work_queue;

explicit ThreadPoolSingleton(CephContext *cct)
- : ThreadPool(cct, "librbd::thread_pool", "tp_librbd", 1,
+ : ThreadPool(cct, "librbd::thread_pool", "tp_librbd", cct->_conf->get_val<int64_t>("rbd_op_threads"),
"rbd_op_threads"),
op_work_queue(new ContextWQ("librbd::op_work_queue",
cct->_conf->get_val<int64_t>("rbd_op_thread_timeout"),
@@ -275,6 +275,7 @@ struct C_InvalidateCache : public Context {
trace_endpoint.copy_name(pname);
perf_start(pname);

if (cache) {
Mutex::Locker l(cache_lock);
ldout(cct, 20) << "enabling caching..." << dendl;
@@ -381,6 +382,14 @@ struct C_InvalidateCache : public Context {
"wb", perf_prio, unit_t(BYTES));
plb.add_time_avg(l_librbd_wr_latency, "wr_latency", "Write latency",
"wl", perf_prio);
+ plb.add_time_avg(l_librbd_wr_before_queue_op_lat, "wr_before_queue_latency", "before queue latency",
+ "wbql", perf_prio);
+ plb.add_time_avg(l_librbd_wr_after_dequeue_op_lat, "wr_after_dequeue_latency", "after dequeue latency",
+ "wadl", perf_prio);
+ plb.add_time_avg(l_librbd_wr_send_lat, "wr_send_latency", "send latency",
+ "send", perf_prio);
+ //plb.add_u64_counter(l_librbd_wr_queue, "queue", "queue");
+ plb.add_u64(l_librbd_wr_queue, "queue", "q", "q", PerfCountersBuilder::PRIO_USEFUL);
plb.add_u64_counter(l_librbd_discard, "discard", "Discards");

diff --git a/src/librbd/Types.h b/src/librbd/Types.h
index 007e0f3da0..027b26c433 100644
--- a/src/librbd/Types.h
+++ b/src/librbd/Types.h
@@ -20,6 +20,10 @@ enum {
l_librbd_wr,
l_librbd_wr_bytes,
l_librbd_wr_latency,
+ l_librbd_wr_before_queue_op_lat,
+ l_librbd_wr_after_dequeue_op_lat,
+ l_librbd_wr_send_lat,
+ l_librbd_wr_queue,

diff --git a/src/librbd/io/ImageRequest.h b/src/librbd/io/ImageRequest.h
index 24239d3e5b..fb5f18962e 100644
--- a/src/librbd/io/ImageRequest.h
+++ b/src/librbd/io/ImageRequest.h
@@ -95,6 +95,10 @@ public:
m_bypass_image_cache = true;
}

+ AioCompletion* get_completion() {
+ return m_aio_comp;
+ }
+

diff --git a/src/librbd/io/ImageRequestWQ.cc b/src/librbd/io/ImageRequestWQ.cc
index fe2a39bd7f..db144897e8 100644
--- a/src/librbd/io/ImageRequestWQ.cc
+++ b/src/librbd/io/ImageRequestWQ.cc
@@ -262,7 +262,10 @@ void ImageRequestWQ<I>::aio_write(AioCompletion *c, uint64_t off, uint64_t len,

RWLock::RLocker owner_locker(m_image_ctx.owner_lock);
if (m_image_ctx.non_blocking_aio || writes_blocked()) {
- queue(ImageRequest<I>::create_write_request(
+ utime_t elapsed;
+ elapsed = ceph_clock_now() - c->start_time;
+ m_image_ctx.perfcounter->tinc(l_librbd_wr_before_queue_op_lat, elapsed);
+ queue(ImageRequest<I>::create_write_request(
m_image_ctx, c, {{off, len}}, std::move(bl), op_flags, trace));
} else {
c->start_op();
@@ -609,8 +612,17 @@ void ImageRequestWQ<I>::process(ImageRequest<I> *req) {
ldout(cct, 20) << "ictx=" << &m_image_ctx << ", "
<< "req=" << req << dendl;

+
+ utime_t elapsed;
+ elapsed = ceph_clock_now() - req->get_completion()->start_time;
+ m_image_ctx.perfcounter->tinc(l_librbd_wr_after_dequeue_op_lat, elapsed);
+
+ utime_t before_send = ceph_clock_now();
req->send();

+ utime_t send_time;
+ send_time = ceph_clock_now() - before_send;
+ m_image_ctx.perfcounter->tinc(l_librbd_wr_send_lat, send_time);
finish_queued_io(req);
if (req->is_write_op()) {
finish_in_flight_write();

@@ -705,15 +719,19 @@ void ImageRequestWQ<I>::queue(ImageRequest<I> *req) {
assert(m_image_ctx.owner_lock.is_locked());

CephContext *cct = m_image_ctx.cct;

if (req->is_write_op()) {
m_queued_writes++;
} else {
m_queued_reads++;
}
+ m_image_ctx.perfcounter->set(l_librbd_wr_queue, m_queued_writes);

diff --git a/src/test/librbd/io/test_mock_ImageRequestWQ.cc b/src/test/librbd/io/test_mock_ImageRequestWQ.cc
index 7a731db23a..1119abb437 100644
--- a/src/test/librbd/io/test_mock_ImageRequestWQ.cc
+++ b/src/test/librbd/io/test_mock_ImageRequestWQ.cc
@@ -40,6 +40,10 @@ struct ImageRequest<librbd::MockTestImageCtx> {
}

+ AioCompletion* get_completion() {
+ return aio_comp;
+ }
+

代码修改后验证

  • rbd_op_thread =1

[global]
ioengine=rbd
pool=rbd
rbdname=volume-nbd02
invalidate=0
rw=randwrite
bs=4k
runtime=300

[rbd_iodepth32]
iodepth=1024

当rbd_op_thread为1时,此时fio只有不到两万八九,此时查看后端,queue中均有上千个排队。

username@hostname:/#  i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump  |grep queue   |grep -v queue_len |grep -v laten
"queue": 951,
username@hostname:/#
username@hostname:/#
username@hostname:/# i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump |grep queue |grep -v queue_len |grep -v laten
"queue": 941,
username@hostname:/# i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump |grep queue |grep -v queue_len |grep -v laten
"queue": 927,

查看平均延时,如下,可以看出wr_latency延时基本耗在wr_after_dequeue_latency,也即wr_before_queue_latency到wr_after_dequeue_latency之间,即也是在队列里面的时间

username@hostname:/#  i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump > txt ;cat txt | grep -A3  wr_lat ; cat txt | grep -A3 wr_before_queue_latency ; cat txt  |grep -A3 wr_after_dequeue_latency ;
"wr_latency": {
"avgcount": 61510,
"sum": 2191.520516783,
"avgtime": 0.035628686
"wr_before_queue_latency": {
"avgcount": 62534,
"sum": 0.040011280,
"avgtime": 0.000000639
"wr_after_dequeue_latency": {
"avgcount": 61606,
"sum": 2033.996583350,
"avgtime": 0.033016209
username@hostname:/# i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump > txt ;cat txt | grep -A3 wr_lat ; cat txt | grep -A3 wr_before_queue_latency ; cat txt |grep -A3 wr_after_dequeue_latency ;
"wr_latency": {
"avgcount": 116908,
"sum": 4189.363698823,
"avgtime": 0.035834705
"wr_before_queue_latency": {
"avgcount": 117932,
"sum": 0.076435187,
"avgtime": 0.000000648
"wr_after_dequeue_latency": {
"avgcount": 116989,
"sum": 3877.598762186,
"avgtime": 0.033144985

  • rbd_op_thread = 4

换成4后,iops可以到6万左右,这时候发现队列中排队的仅为个位数、

username@hostname:/#  i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump  |grep queue   |grep -v queue_len |grep -v laten
"queue": 3,
username@hostname:/# i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump |grep queue |grep -v queue_len |grep -v laten
"queue": 2,

username@hostname:/#  i=`sudo ls var/run/ceph | grep client` ; sudo ceph daemon var/run/ceph/${i} perf dump > txt ;cat txt | grep -A3  wr_lat ; cat txt | grep -A3 wr_before_queue_latency ; cat txt  |grep -A3 wr_after_dequeue_latency ;
"wr_latency": {
"avgcount": 2270672,
"sum": 37963.386444893,
"avgtime": 0.016719009 这个延时基本就是后端延时
"wr_before_queue_latency": {
"avgcount": 2271695,
"sum": 1.605575268,
"avgtime": 0.000000706
"wr_after_dequeue_latency": {
"avgcount": 2271687,
"sum": 60.918221533,
"avgtime": 0.000026816
username@hostname:/# i=`sudo ls /var/run/ceph | grep client` ; sudo ceph daemon /var/run/ceph/${i} perf dump > txt ;cat txt | grep -A3 wr_lat ; cat txt | grep -A3 wr_before_queue_latency ; cat txt |grep -A3 wr_after_dequeue_latency ;
"wr_latency": {
"avgcount": 2439428,
"sum": 40801.854420279,
"avgtime": 0.016725992
"wr_before_queue_latency": {
"avgcount": 2440451,
"sum": 1.724664580,
"avgtime": 0.000000706
"wr_after_dequeue_latency": {
"avgcount": 2440451,
"sum": 65.108590681,
"avgtime": 0.000026678


// 后端延时
username@hostname:/# for i in {0,3,4,5,6};do sudo ceph daemon osd.${i} perf dump ;done > io_perf_osd ; cat io_perf_osd | grep -A 3 op_w_process "op_w_process_latency": {
"avgcount": 25845,
"sum": 382.087422960,
"avgtime": 0.014783804
--
"op_w_process_latency": {
"avgcount": 46384,
"sum": 695.850786001,
"avgtime": 0.015001957

查看社区代码修改记录及讨论,可以看到在比较老的版本(比如h版本)时,rbd_op_thread是可以配置的,但是在我测试的L版本中,rbd_op_thread是写死为1的(高版本也依然是这样),配置写死的原因貌似是高版本引入了rbd cache以及mirror,如果多线程,在极端情况可能导致不一致,所以如果不使用这两项特性,应该是可以把rbd_op_thread的个数做到可配置的。

结论

做了如上代码修改后,并把rbd_op_thread配置项设为4,fio的随机4k iops可以从三万左右提升到6万以上(三节点,每节点19个sata ssd盘且未做其他任何优化)。

限于本人知识水平以及文笔有限,文中可能有疏漏甚至是错误的地方,希望大家批评指证,谢谢。



Ceph中国社区

是国内唯一官方正式授权的社区,

为广大Ceph爱好者提供交流平台!

↓↓↓

开源-创新-自强

官方网站:www.ceph.org.cn

合作邮箱:devin@ceph.org.cn

投稿地址:tougao@ceph.org.cn

长期招募热爱翻译人员,

参与社区翻译外文资料工作。

文章转载自Ceph开源社区,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论