Preface
Earlier posts on this blog walked through OSD log analysis. This post continues the series by reading Ceph RGW logs, taking the logs of an object upload as the example.
Starting with an upload
The object storage gateway (RGW) is the entry point for uploading objects into Ceph; its functionality and performance face the user directly. Understanding RGW in depth and being able to read its logs helps us troubleshoot and locate problems, tune performance, and build on it for possible secondary development. As before, this analysis again uses logs captured at debug level 20/20 and covers the flow for a single-file upload and a multipart upload. The version is mimic 13.2.5.
Configuration
The upload client is s3cmd, whose default multipart chunk size is 15 MB. On the RGW side, rgw max chunk size = 8388608 (8 MB).
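For reference, a minimal ceph.conf sketch of the settings relevant to this walkthrough. The option names are the standard ones; the [client.rgw.server227] section name is an assumption based on the host seen in the shell prompt below.

[client.rgw.server227]
    rgw frontends = civetweb port=8000    # civetweb on port 8000, as seen in the logs
    rgw max chunk size = 8388608          # 8 MB reads from the frontend buffer
    debug rgw = 20/20                     # verbose logging used throughout this post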
The whole op handling flow can be roughly divided into three stages, which are examined one by one below.
First, pick any small file and upload it to a bucket in the cluster:
[store@server227 build]$ s3cmd put radosgw.8000.log s3://cpp-bucket1/test.file
upload: 'radosgw.8000.log' -> 's3://cpp-bucket1/test.file'  [1 of 1]
 82695 of 82695   100% in 0s   2.63 MB/s  done
Then read through the resulting log.
Stage 1
2019-05-31 16:36:27.857 7fdc91735700 20 CONTENT_LENGTH=82695
2019-05-31 16:36:27.857 7fdc91735700 20 CONTENT_TYPE=text/plain
2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_ACCEPT_ENCODING=identity
2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_AUTHORIZATION=AWS 0555b35654ad1656d804:9ISu+ty4SOQZ+9xcSkpSe5nRpdM=
2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_HOST=127.0.0.1:8000
2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_VERSION=1.1
2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_X_AMZ_DATE=Fri, 31 May 2019 08:36:27 +0000
2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_X_AMZ_META_S3CMD_ATTRS=atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD
2019-05-31 16:36:27.857 7fdc91735700 20 REMOTE_ADDR=127.0.0.1
2019-05-31 16:36:27.857 7fdc91735700 20 REQUEST_METHOD=PUT
2019-05-31 16:36:27.857 7fdc91735700 20 REQUEST_URI=/cpp-bucket1/test.file
2019-05-31 16:36:27.857 7fdc91735700 20 SCRIPT_URI=/cpp-bucket1/test.file
2019-05-31 16:36:27.857 7fdc91735700 20 SERVER_PORT=8000
This is the first processing step after an HTTP request arrives: the HTTP header fields are parsed. Three of them are worth noting (a rough reconstruction of the raw request follows the list):
First, CONTENT_LENGTH, whose value is exactly the size of the file we uploaded.
Second, REQUEST_METHOD, which is PUT here, meaning data is being uploaded.
Third, REQUEST_URI, which shows the bucket and key of the request, i.e. the resource being requested.
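Putting these fields back together, the raw request that s3cmd sent looks roughly like this (reconstructed from the values logged above; the body is omitted):

PUT /cpp-bucket1/test.file HTTP/1.1
Host: 127.0.0.1:8000
Accept-Encoding: identity
Content-Length: 82695
Content-Type: text/plain
Authorization: AWS 0555b35654ad1656d804:9ISu+ty4SOQZ+9xcSkpSe5nRpdM=
x-amz-date: Fri, 31 May 2019 08:36:27 +0000
x-amz-meta-s3cmd-attrs: atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
x-amz-storage-class: STANDARD

<82695 bytes of file data>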
2019-05-31 16:36:27.857 7fdc91735700 1 ====== starting new request req=0x7fdc9172ec80 =====
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s::PUT cpp-bucket1/test.file::initializing for trans_id = tx000000000000000000007-005cf0e78b-103b-default
2019-05-31 16:36:27.858 7fdc91735700 10 rgw api priority: s3=5 s3website=4
2019-05-31 16:36:27.858 7fdc91735700 10 host=127.0.0.1
2019-05-31 16:36:27.858 7fdc91735700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
2019-05-31 16:36:27.858 7fdc91735700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/cpp-bucket1/test.file
2019-05-31 16:36:27.858 7fdc91735700 10 meta>> HTTP_X_AMZ_DATE
2019-05-31 16:36:27.858 7fdc91735700 10 meta>> HTTP_X_AMZ_META_S3CMD_ATTRS
2019-05-31 16:36:27.858 7fdc91735700 10 meta>> HTTP_X_AMZ_STORAGE_CLASS
2019-05-31 16:36:27.858 7fdc91735700 10 x>> x-amz-date:Fri, 31 May 2019 08:36:27 +0000
2019-05-31 16:36:27.858 7fdc91735700 10 x>> x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
2019-05-31 16:36:27.858 7fdc91735700 10 x>> x-amz-storage-class:STANDARD
2019-05-31 16:36:27.858 7fdc91735700 20 get_handler handler=22RGWHandler_REST_Obj_S3
2019-05-31 16:36:27.858 7fdc91735700 10 handler=22RGWHandler_REST_Obj_S3
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file::getting op 1
2019-05-31 16:36:27.858 7fdc91735700 10 op=21RGWPutObj_ObjStore_S3
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying requester
2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy
2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::S3AnonymousEngine denied with reason=-1
2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::LocalEngine
2019-05-31 16:36:27.858 7fdc91735700 10 get_canon_resource(): dest=/cpp-bucket1/test.file
2019-05-31 16:36:27.858 7fdc91735700 10 string_to_sign:
PUT

text/plain

x-amz-date:Fri, 31 May 2019 08:36:27 +0000
x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
x-amz-storage-class:STANDARD
/cpp-bucket1/test.file
2019-05-31 16:36:27.858 7fdc91735700 15 string_to_sign=PUT

text/plain

x-amz-date:Fri, 31 May 2019 08:36:27 +0000
x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
x-amz-storage-class:STANDARD
/cpp-bucket1/test.file
This stretch shows the HTTP request being formally processed: the request is normalized, its individual fields are checked, and basic field-level authentication is carried out. One line deserves particular attention:
op=21RGWPutObj_ObjStore_S3. This field tells us the op type of the request: an object upload. (The leading 21 is simply the length prefix of the C++ mangled type name RGWPutObj_ObjStore_S3, just as the 22 in 22RGWHandler_REST_Obj_S3 is.) A simplified sketch of how the handler picks this op from the HTTP verb follows.
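The sketch below only illustrates the dispatch pattern behind the "getting op 1" and "op=21RGWPutObj_ObjStore_S3" lines; the real code lives in src/rgw/rgw_rest.cc and src/rgw/rgw_rest_s3.cc and is considerably more involved. The class names are borrowed from the log, the member function and structure are simplified assumptions.

// Heavily simplified sketch of the verb-to-op dispatch (illustrative only).
#include <iostream>
#include <memory>
#include <string>
#include <typeinfo>

struct RGWOp { virtual ~RGWOp() = default; };
struct RGWPutObj_ObjStore_S3 : RGWOp {};
struct RGWGetObj_ObjStore_S3 : RGWOp {};

struct RGWHandler_REST_Obj_S3 {
    // each HTTP verb maps to a concrete op object
    std::unique_ptr<RGWOp> get_op(const std::string& method) {
        if (method == "PUT") return std::make_unique<RGWPutObj_ObjStore_S3>();
        if (method == "GET") return std::make_unique<RGWGetObj_ObjStore_S3>();
        return nullptr;
    }
};

int main() {
    RGWHandler_REST_Obj_S3 handler;
    auto op = handler.get_op("PUT");
    // typeid().name() is where the "21" length prefix in the log comes from (GCC name mangling)
    std::cout << "op=" << typeid(*op).name() << std::endl;  // prints 21RGWPutObj_ObjStore_S3 with GCC
}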
Up to this point, stage one, the initial parsing of the S3 request, can be considered complete. Notice that the whole stage took only about 1 ms.
Stage 2
Continuing down the log:
2019-05-31 16:36:27.858 7fdc91735700 15 server signature=9ISu+ty4SOQZ+9xcSkpSe5nRpdM=
2019-05-31 16:36:27.858 7fdc91735700 15 client signature=9ISu+ty4SOQZ+9xcSkpSe5nRpdM=
2019-05-31 16:36:27.858 7fdc91735700 15 compare=0
2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::LocalEngine granted access
2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::AWSAuthStrategy granted access
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:normalizing buckets and tenants
2019-05-31 16:36:27.858 7fdc91735700 10 s->object=test.file s->bucket=cpp-bucket1
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:init permissions
2019-05-31 16:36:27.858 7fdc91735700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>testid</ID><DisplayName>M. Tester</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>testid</ID><DisplayName>M. Tester</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:recalculating target
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:reading permissions
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:init op
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying op mask
2019-05-31 16:36:27.858 7fdc91735700 20 required_mask= 2 user.op_mask=7
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying op permissions
2019-05-31 16:36:27.858 7fdc91735700 20 -- Getting permissions begin with perm_mask=50
2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=testid, acct_name=M. Tester, subuser=, perm_mask=15, is_admin=0) mask=50
2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for uid=testid
2019-05-31 16:36:27.858 7fdc91735700 5 Found permission: 15
2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for group=1 mask=50
2019-05-31 16:36:27.858 7fdc91735700 5 Permissions for group not found
2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for group=2 mask=50
2019-05-31 16:36:27.858 7fdc91735700 5 Permissions for group not found
2019-05-31 16:36:27.858 7fdc91735700 5 -- Getting permissions done for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=testid, acct_name=M. Tester, subuser=, perm_mask=15, is_admin=0), owner=testid, perm=2
2019-05-31 16:36:27.858 7fdc91735700 10 identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=testid, acct_name=M. Tester, subuser=, perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying op params
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:pre-executing
What follows can be called the second stage, which is mainly about authenticating and authorizing the operation, namely:
comparing the signature supplied with the op against the one computed by the server, to verify that the request is legitimate (see the signature sketch after this list);
checking whether the user has permission to operate on the requested resource;
confirming the permission mask and the related permission parameters.
When an S3 write request comes back with a 403 or 400, it is most often because one of these checks failed; the log will tell you exactly which. This stage also finishes within about 1 ms.
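To make the first check concrete, here is a minimal sketch, not RGW's own code, of the AWS signature v2 computation that LocalEngine performs: HMAC-SHA1 over the string_to_sign from the log, base64-encoded and compared against the value in the Authorization header. The secret key is a placeholder, and the string_to_sign is trimmed (the x-amz-meta-s3cmd-attrs header is omitted). Build with -lcrypto.

// Minimal AWS signature v2 sketch (algorithm only, not the RGW implementation).
#include <openssl/evp.h>
#include <openssl/hmac.h>
#include <iostream>
#include <string>

std::string sign_v2(const std::string& secret_key, const std::string& string_to_sign) {
  unsigned char digest[EVP_MAX_MD_SIZE];
  unsigned int digest_len = 0;
  // HMAC-SHA1 over the string_to_sign, keyed with the user's secret key
  HMAC(EVP_sha1(),
       secret_key.data(), static_cast<int>(secret_key.size()),
       reinterpret_cast<const unsigned char*>(string_to_sign.data()),
       string_to_sign.size(), digest, &digest_len);

  // base64-encode the raw digest; this is what follows "AWS <access_key>:" in the header
  unsigned char encoded[EVP_MAX_MD_SIZE * 2] = {0};
  int encoded_len = EVP_EncodeBlock(encoded, digest, static_cast<int>(digest_len));
  return std::string(reinterpret_cast<char*>(encoded), encoded_len);
}

int main() {
  // string_to_sign copied (trimmed) from the log above; the secret key is a placeholder
  std::string string_to_sign =
      "PUT\n"
      "\n"            // empty Content-MD5
      "text/plain\n"
      "\n"            // empty Date (x-amz-date is used instead)
      "x-amz-date:Fri, 31 May 2019 08:36:27 +0000\n"
      "x-amz-storage-class:STANDARD\n"
      "/cpp-bucket1/test.file";
  std::cout << "signature=" << sign_v2("<secret key>", string_to_sign) << std::endl;
  return 0;
}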
Continuing:
2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT /cpp-bucket1/test.file:put_obj:executing
2019-05-31 16:36:27.859 7fdc91735700 5 NOTICE: call to do_aws4_auth_completion
2019-05-31 16:36:27.863 7fdc91735700 5 NOTICE: call to do_aws4_auth_completion
2019-05-31 16:36:27.867 7fdc91735700 10 x>> x-amz-date:Fri, 31 May 2019 08:36:27 +0000
2019-05-31 16:36:27.868 7fdc91735700 10 x>> x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
2019-05-31 16:36:27.868 7fdc91735700 10 x>> x-amz-storage-class:STANDARD
The work in this stretch is done mainly by RGWPutObj::execute in src/rgw/rgw_op.cc; little of it shows up in the log, so let's look at the code.
After the 'executing' line is logged, the handler enters RGWPutObj::execute, sets up and checks a few things such as the MD5 computation, the ETag, and the chunking configuration, and then moves on to reading the data:
// Loop reading data from the frontend buffer; each piece read is written to the OSDs right away
tracepoint(rgw_op, before_data_transfer, s->req_id.c_str());
do {
  bufferlist data;
  if (fst > lst)
    break;
  if (copy_source.empty()) {
    // read the next piece of data from the buffer
    len = get_data(data);
  } else {
    /*
     * Reads are driven by rgw_max_chunk_size and the amount of data actually left.
     * rgw_max_chunk_size is how many bytes are pulled from the buffer in one go;
     * after each read the data is written down to the OSDs in 4 MB pieces.
     */
    uint64_t cur_lst = min(fst + s->cct->_conf->rgw_max_chunk_size - 1, lst);
    op_ret = get_data(fst, cur_lst, data);
    if (op_ret < 0)
      goto done;
    len = data.length();
    s->content_length += len;
    fst += len;
  }
  if (len < 0) {
    op_ret = len;
    // when RGW returns a 500, it is sometimes here, caused by this data read failing
    ldout(s->cct, 20) << "get_data() returned ret=" << op_ret << dendl;
    goto done;
  }
  ......
  ofs += len;
  // if the previous pass returned data, there may be more left, so keep reading
} while (len > 0);
......
op_ret = do_aws4_auth_completion();
As the code shows, once the log prints NOTICE: call to do_aws4_auth_completion, the data read has completed.
The parameter used here, rgw_max_chunk_size, is read from the configuration and defaults to 4 MB. It controls how much data is pulled from the frontend buffer per read; at the beginning we set it to 8 MB. Note, however, that this is not the size of the writes sent down to the OSDs: after a piece is read, anything larger than 4 MB is split into 4 MB blocks before being written to the OSDs. The object uploaded here is well under 4 MB, so it is read and sent down in one piece. The data read this way also feeds the MD5 computation to guarantee integrity. A small standalone sketch of this read/split behaviour follows.
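The sketch below is a rough illustration of that read/split behaviour, with made-up variable names rather than the RGW processor code, and assuming a 4 MB rados write unit:

// Standalone illustration of frontend reads vs. rados writes (not RGW code).
#include <algorithm>
#include <cstdint>
#include <iostream>

int main() {
  const uint64_t rgw_max_chunk_size = 8 * 1024 * 1024;  // 8388608, as configured above
  const uint64_t rados_write_size   = 4 * 1024 * 1024;  // assumed 4 MB write unit toward the OSDs

  // first value is the CONTENT_LENGTH from the log, second is a hypothetical 10 MB object
  for (uint64_t content_length : {82695ULL, 10ULL * 1024 * 1024}) {
    std::cout << "object of " << content_length << " bytes:\n";
    uint64_t ofs = 0;
    while (ofs < content_length) {
      // one get_data() call pulls at most rgw_max_chunk_size bytes from the frontend buffer
      uint64_t len = std::min(rgw_max_chunk_size, content_length - ofs);
      // ...which is then written down to rados in pieces of at most rados_write_size
      for (uint64_t off = 0; off < len; off += rados_write_size) {
        uint64_t wlen = std::min(rados_write_size, len - off);
        std::cout << "  rados write: ofs=" << ofs + off << " len=" << wlen << "\n";
      }
      ofs += len;
    }
  }
  return 0;
}

For the 82695-byte object above this degenerates to a single write; for the hypothetical 10 MB object it produces two 4 MB writes from the first read and a 2 MB write from the second.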
At this point the data portion of the object upload is done; it took about 10 ms. This also suggests that for workloads dominated by large files, especially writes of 100 MB and above, increasing this value can help performance.
2019-05-31 16:36:27.868 7fdc91735700 20 get_obj_state: rctx=0x7fdc9172e440 obj=cpp-bucket1:test.file state=0x5619dcf915a0 s->prefetch_data=0
2019-05-31 16:36:27.868 7fdc91735700 20 get_obj_state: rctx=0x7fdc9172e440 obj=cpp-bucket1:test.file state=0x5619dcf915a0 s->prefetch_data=0
2019-05-31 16:36:27.868 7fdc91735700 10 setting object write_tag=ba85d906-5451-49e4-b0e1-f90b7db13259.4155.7
2019-05-31 16:36:27.868 7fdc91735700 20 reading from default.rgw.meta:root:.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1
2019-05-31 16:36:27.868 7fdc91735700 20 get_system_obj_state: rctx=0x7fdc9172d1a0 obj=default.rgw.meta:root:.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 state=0x5619ddc981e0 s->prefetch_data=0
2019-05-31 16:36:27.868 7fdc91735700 10 cache get: name=default.rgw.meta+root+.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 : hit (requested=0x16, cached=0x17)
2019-05-31 16:36:27.868 7fdc91735700 20 get_system_obj_state: s->obj_tag was set empty
2019-05-31 16:36:27.868 7fdc91735700 10 cache get: name=default.rgw.meta+root+.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 : hit (requested=0x11, cached=0x17)
2019-05-31 16:36:27.868 7fdc91735700 20 bucket index object: .dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1
2019-05-31 16:36:27.868 7fdc91735700 1 -- 192.168.115.227:0/3536016146 --> 192.168.115.227:6805/1505287 -- osd_op(unknown.0.0:4137 5.1 5:85f057f7:::.dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e30) v8 -- 0x5619dd0bb340 con 0
2019-05-31 16:36:27.873 7fdcaef70700 1 -- 192.168.115.227:0/3536016146 <== osd.1 192.168.115.227:6805/1505287 1775 ==== osd_op_reply(4137 .dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 [call,call] v30'14 uv14 ondisk = 0) v8 ==== 234+0+0 (3061579556 0 0) 0x5619dd0bb340 con 0x5619dcf9be00
2019-05-31 16:36:27.873 7fdc91735700 1 -- 192.168.115.227:0/3536016146 --> 192.168.115.227:6805/1505287 -- osd_op(unknown.0.0:4138 7.7 7:fe73a341:::ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1_test.file:head [create,setxattr user.rgw.idtag (44),setxattr user.rgw.tail_tag (44),writefull 0~82695,setxattr user.rgw.manifest (313),setxattr user.rgw.acl (147),setxattr user.rgw.content_type (11),setxattr user.rgw.etag (32),setxattr user.rgw.x-amz-date (32),setxattr user.rgw.x-amz-meta-s3cmd-attrs (143),setxattr user.rgw.x-amz-storage-class (9),call rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[] ondisk+write+known_if_redirected e30) v8 -- 0x5619dd0bb8c0 con 0
2019-05-31 16:36:27.879 7fdcaef70700 1 -- 192.168.115.227:0/3536016146 <== osd.1 192.168.115.227:6805/1505287 1776 ==== osd_op_reply(4138 ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1_test.file [create,setxattr (44),setxattr (44),writefull 0~82695,setxattr (313),setxattr (147),setxattr (11),setxattr (32),setxattr (32),setxattr (143),setxattr (9),call,setxattr (4)] v30'2 uv2 ondisk = 0) v8 ==== 701+0+0 (4094818144 0 0) 0x5619dd0bb8c0 con 0x5619dcf9be00
2019-05-31 16:36:27.879 7fdc91735700 1 -- 192.168.115.227:0/3536016146 --> 192.168.115.227:6805/1505287 -- osd_op(unknown.0.0:4139 5.1 5:85f057f7:::.dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e30) v8 -- 0x5619dd0ab080 con 0
With the data read finished, the metadata is updated next. First the object state is checked via get_obj_state, then the information under default.rgw.meta is read and the relevant content updated, and finally the bucket index is updated. That last step is the critical one: when storing a massive number of objects, updating the bucket index becomes very expensive. The osd_op lines above are essentially a compound rados write plus a set of xattrs; a hedged librados sketch follows.
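The sketch below shows roughly what osd_op 4138 amounts to at the rados level: one compound operation that creates the head object, writes the data with writefull, and sets the user.rgw.* xattrs. The object name and etag value are taken from the log; the pool name is assumed to be the default zone's standard data pool, and this is an illustration of the librados API rather than the actual RGW code path. Build with -lrados.

// Hedged librados illustration of the compound head-object write seen above.
#include <rados/librados.hpp>
#include <iostream>

int main() {
    librados::Rados cluster;
    cluster.init("admin");                 // assumes client.admin and a default ceph.conf
    cluster.conf_read_file(nullptr);
    if (cluster.connect() < 0) return 1;

    librados::IoCtx io;
    cluster.ioctx_create("default.rgw.buckets.data", io);   // assumed default-zone data pool

    librados::bufferlist data, etag;
    data.append("object payload ...");                        // the 82695 bytes read in execute()
    etag.append("de138b6f69f2db4f158a782e840618c0");          // md5/etag computed during the read

    librados::ObjectWriteOperation op;
    op.create(false);                                         // matches the leading "create" in the osd_op
    op.write_full(data);                                      // "writefull 0~82695"
    op.setxattr("user.rgw.etag", etag);                       // one of the setxattr calls in the osd_op
    // ... user.rgw.manifest, user.rgw.acl, etc. follow the same pattern

    int r = io.operate("ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1_test.file", &op);
    std::cout << "operate ret=" << r << std::endl;
    return 0;
}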
Everything above belongs to the second stage, which reads the data and updates the metadata, taking 21 ms in total.
Stage 3
2019-05-31 16:36:27.879 7fdc91735700 2 req 7:0.0219994s:s3:PUT /cpp-bucket1/test.file:put_obj:completing
2019-05-31 16:36:27.879 7fdc91735700 2 req 7:0.0219994s:s3:PUT /cpp-bucket1/test.file:put_obj:op status=0
2019-05-31 16:36:27.879 7fdc91735700 2 req 7:0.0219994s:s3:PUT /cpp-bucket1/test.file:put_obj:http status=200
2019-05-31 16:36:27.879 7fdc91735700 1 ====== req done req=0x7fdc9172ec80 op status=0 http_status=200 ======
2019-05-31 16:36:27.879 7fdc91735700 1 civetweb: 0x5619dd18c000: 127.0.0.1 - - [31/May/2019:16:36:27 +0800] "PUT /cpp-bucket1/test.file HTTP/1.1" 200 206 - -
2019-05-31 16:36:27.883 7fdcaef70700 1 -- 192.168.115.227:0/3536016146 <== osd.1 192.168.115.227:6805/1505287 1777 ==== osd_op_reply(4139 .dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 [call,call] v30'15 uv15 ondisk = 0) v8 ==== 234+0+0 (2261156683 0 0) 0x5619dd0ab080 con 0x5619dcf9be00
Finally, the handler wraps the result into an HTTP response and returns it to the client; this is the last stage of every op.
The op took 26 ms in total, of which stage two, the data upload and metadata update, accounted for 21 ms, roughly 80%.
Summary
As we have seen, RGW handles an ordinary upload op in three stages: first the op is authenticated and authorized, then the data is received and distributed and the metadata updated, and finally the HTTP response is returned. Stage two dominates the elapsed time, which points to where performance optimization can focus: reducing network latency, using jumbo frames to move more data per transfer, increasing the amount read from the buffer per pass, or even the rados block size. The next post will analyze multipart upload. Stay tuned.
This article is reposted from the 奋斗的松鼠 WeChat official account.