暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

看懂Ceph日志系列之RGW op处理流程-上

Ceph开源社区 2019-07-30
2179

写在前面

此前博客有针对osd日志解读的文章,本篇继续延续该系列,解读Ceph rgw的日志,选取上传对象的日志进行分析


从上传说起

对象存储网关rgw是上传对象到Ceph的入口,其功能、性能直接面向用户,对rgw深入理解,读懂日志,有助于我们排查、定位问题及调优,并在此基础上做可能的二次开发,本次对rgw日志的解读仍然采用20/20的日志,上传单个文件和分片文件进行流程解读,版本为mimic 13.2.5


配置情况

    上传客户端使用s3cmd,其默认分片大小为15M
    rgw的rgw max chunk size = 8388608

    整个op处理流程大致可分为3个阶段,下面分阶段进行


    首先,随便选择一个较小的文件,上传到集群的一个bucket中

      [store@server227 build]$ s3cmd put radosgw.8000.log s3://cpp-bucket1/test.file
      upload: 'radosgw.8000.log' -> 's3://cpp-bucket1/test.file' [1 of 1]
       82695 of 82695   100% in    0s     2.63 MB/s  done

      然后,将得到的日志进行解读


      第一阶段

        2019-05-31 16:36:27.857 7fdc91735700 20 CONTENT_LENGTH=82695
        2019-05-31 16:36:27.857 7fdc91735700 20 CONTENT_TYPE=text/plain
        2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_ACCEPT_ENCODING=identity
        2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_AUTHORIZATION=AWS 0555b35654ad1656d804:9ISu+ty4SOQZ+9xcSkpSe5nRpdM=
        2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_HOST=127.0.0.1:8000
        2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_VERSION=1.1
        2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_X_AMZ_DATE=Fri, 31 May 2019 08:36:27 +0000
        2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_X_AMZ_META_S3CMD_ATTRS=atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
        2019-05-31 16:36:27.857 7fdc91735700 20 HTTP_X_AMZ_STORAGE_CLASS=STANDARD
        2019-05-31 16:36:27.857 7fdc91735700 20 REMOTE_ADDR=127.0.0.1
        2019-05-31 16:36:27.857 7fdc91735700 20 REQUEST_METHOD=PUT
        2019-05-31 16:36:27.857 7fdc91735700 20 REQUEST_URI=/cpp-bucket1/test.file
        2019-05-31 16:36:27.857 7fdc91735700 20 SCRIPT_URI=/cpp-bucket1/test.file
        2019-05-31 16:36:27.857 7fdc91735700 20 SERVER_PORT=8000


        这是一个http请求进来后的第一个处理流程,主要进行解析这个http头的信息,主要关注的地方有三个


        第一个是CONTENT_LENGTH,可以看到这个值就是我们上传的文件的大小


        第二个是REQUEST_METHOD,这个请求是PUT,表示上传数据


        第三个是REQUEST_URI ,显示这个请求的bucket以及key,也就是请求的资源

          2019-05-31 16:36:27.857 7fdc91735700  1 ====== starting new request req=0x7fdc9172ec80 =====
          2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s::PUT cpp-bucket1/test.file::initializing for trans_id = tx000000000000000000007-005cf0e78b-103b-default
          2019-05-31 16:36:27.858 7fdc91735700 10 rgw api priority: s3=5 s3website=4
          2019-05-31 16:36:27.858 7fdc91735700 10 host=127.0.0.1
          2019-05-31 16:36:27.858 7fdc91735700 20 subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0
          2019-05-31 16:36:27.858 7fdc91735700 20 final domain/bucket subdomain= domain= in_hosted_domain=0 in_hosted_domain_s3website=0 s->info.domain= s->info.request_uri=/cpp-bucket1/test.file
          2019-05-31 16:36:27.858 7fdc91735700 10 meta>> HTTP_X_AMZ_DATE
          2019-05-31 16:36:27.858 7fdc91735700 10 meta>> HTTP_X_AMZ_META_S3CMD_ATTRS
          2019-05-31 16:36:27.858 7fdc91735700 10 meta>> HTTP_X_AMZ_STORAGE_CLASS
          2019-05-31 16:36:27.858 7fdc91735700 10 x>> x-amz-date:Fri, 31 May 2019 08:36:27 +0000
          2019-05-31 16:36:27.858 7fdc91735700 10 x>> x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
          2019-05-31 16:36:27.858 7fdc91735700 10 x>> x-amz-storage-class:STANDARD
          2019-05-31 16:36:27.858 7fdc91735700 20 get_handler handler=22RGWHandler_REST_Obj_S3
          2019-05-31 16:36:27.858 7fdc91735700 10 handler=22RGWHandler_REST_Obj_S3
          2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file::getting op 1
          2019-05-31 16:36:27.858 7fdc91735700 10 op=21RGWPutObj_ObjStore_S3
          2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying requester
          2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::StrategyRegistry::s3_main_strategy_t: trying rgw::auth::s3::AWSAuthStrategy
          2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::S3AnonymousEngine
          2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::S3AnonymousEngine denied with reason=-1
          2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::AWSAuthStrategy: trying rgw::auth::s3::LocalEngine
          2019-05-31 16:36:27.858 7fdc91735700 10 get_canon_resource(): dest=/cpp-bucket1/test.file
          2019-05-31 16:36:27.858 7fdc91735700 10 string_to_sign:
          PUT


          text/plain


          x-amz-date:Fri, 31 May 2019 08:36:27 +0000
          x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
          x-amz-storage-class:STANDARD
          /cpp-bucket1/test.file
          2019-05-31 16:36:27.858 7fdc91735700 15 string_to_sign=PUT


          text/plain


          x-amz-date:Fri, 31 May 2019 08:36:27 +0000
          x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
          x-amz-storage-class:STANDARD
          /cpp-bucket1/test.file

          这一段显示这个http请求被正式处理,主要操作是进行http格式化,检查各个字段,进行基本的字段认证,这里主要关注的地方有一个


          就是op=21RGWPutObj_ObjStore_S3,这个字段表明了这个请求的op类型,就是上传对象


          到这里为止,可以认为第一阶段,s3请求的初步解析完成了,注意到,整个过程只用了1ms


          第二阶段

          继续往下

            2019-05-31 16:36:27.858 7fdc91735700 15 server signature=9ISu+ty4SOQZ+9xcSkpSe5nRpdM=
            2019-05-31 16:36:27.858 7fdc91735700 15 client signature=9ISu+ty4SOQZ+9xcSkpSe5nRpdM=
            2019-05-31 16:36:27.858 7fdc91735700 15 compare=0
            2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::LocalEngine granted access
            2019-05-31 16:36:27.858 7fdc91735700 20 rgw::auth::s3::AWSAuthStrategy granted access
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:normalizing buckets and tenants
            2019-05-31 16:36:27.858 7fdc91735700 10 s->object=test.file s->bucket=cpp-bucket1
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:init permissions
            2019-05-31 16:36:27.858 7fdc91735700 15 decode_policy Read AccessControlPolicy<AccessControlPolicy xmlns="http://s3.amazonaws.com/doc/2006-03-01/"><Owner><ID>testid</ID><DisplayName>M. Tester</DisplayName></Owner><AccessControlList><Grant><Grantee xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="CanonicalUser"><ID>testid</ID><DisplayName>M. Tester</DisplayName></Grantee><Permission>FULL_CONTROL</Permission></Grant></AccessControlList></AccessControlPolicy>
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:recalculating target
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:reading permissions
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:init op
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying op mask
            2019-05-31 16:36:27.858 7fdc91735700 20 required_mask= 2 user.op_mask=7
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying op permissions
            2019-05-31 16:36:27.858 7fdc91735700 20 -- Getting permissions begin with perm_mask=50
            2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=testid, acct_name=M. Tester, subuser=, perm_mask=15, is_admin=0) mask=50
            2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for uid=testid
            2019-05-31 16:36:27.858 7fdc91735700 5 Found permission: 15
            2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for group=1 mask=50
            2019-05-31 16:36:27.858 7fdc91735700 5 Permissions for group not found
            2019-05-31 16:36:27.858 7fdc91735700 5 Searching permissions for group=2 mask=50
            2019-05-31 16:36:27.858 7fdc91735700 5 Permissions for group not found
            2019-05-31 16:36:27.858 7fdc91735700 5 -- Getting permissions done for identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=testid, acct_name=M. Tester, subuser=, perm_mask=15, is_admin=0), owner=testid, perm=2
            2019-05-31 16:36:27.858 7fdc91735700 10 identity=rgw::auth::SysReqApplier -> rgw::auth::LocalApplier(acct_user=testid, acct_name=M. Tester, subuser=, perm_mask=15, is_admin=0) requested perm (type)=2, policy perm=2, user_perm_mask=2, acl perm=2
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:verifying op params
            2019-05-31 16:36:27.858 7fdc91735700 2 req 7:0.000999971s:s3:PUT cpp-bucket1/test.file:put_obj:pre-executing


            到这里,可以称为第二阶段,主要进行行为的鉴权,主要是


            对比op与server的signature以验证op的合法性

            判断用户是否有权限操作对应的资源

            确认权限掩码及权限参数


            当一个s3写入请求被返回403或者400时,多半是这个地方校验失败导致,具体要看日志的说法,这个阶段处理时间也是在1ms


            继续往下

              2019-05-31 16:36:27.858 7fdc91735700  2 req 7:0.000999971s:s3:PUT /cpp-bucket1/test.file:put_obj:executing
              2019-05-31 16:36:27.859 7fdc91735700 5 NOTICE: call to do_aws4_auth_completion
              2019-05-31 16:36:27.863 7fdc91735700 5 NOTICE: call to do_aws4_auth_completion
              2019-05-31 16:36:27.867 7fdc91735700 10 x>> x-amz-date:Fri, 31 May 2019 08:36:27 +0000
              2019-05-31 16:36:27.868 7fdc91735700 10 x>> x-amz-meta-s3cmd-attrs:atime:1559272575/ctime:1559272550/gid:10001/gname:store/md5:de138b6f69f2db4f158a782e840618c0/mode:33188/mtime:1559272550/uid:10001/uname:store
              2019-05-31 16:36:27.868 7fdc91735700 10 x>> x-amz-storage-class:STANDARD

              上述过程主要执行的函数在src/rgw/rgw_op.ccRGWPutObj::execute函数中,进行的动作在日志中没有太多的体现,看一下它的代码


              executing日志打印之后,handler进入RGWPutObj::execute,进行一些配置的设置和检查,例如md5计算配置、etage、chunk配置等,然后进入数据的读取

                //开始循环从缓冲区读取数据,读到数据后会立即写入osd
                tracepoint(rgw_op, before_data_transfer, s->req_id.c_str());
                do {
                bufferlist data;
                if (fst > lst)
                break;
                if (copy_source.empty()) {
                //从缓冲区读取data数据
                len = get_data(data);
                } else {
                /*
                * 数据读取的时候,要根据rgw_max_chunk_size与数据大小的实际操作来读取
                * rgw_max_chunk_size指的是一次性从缓冲区读出多少字节数据
                * 在每次读出数据后,每次写4M数据到osd中
                */
                uint64_t cur_lst = min(fst + s->cct->_conf->rgw_max_chunk_size - 1, lst);
                op_ret = get_data(fst, cur_lst, data);
                if (op_ret < 0)
                goto done;
                len = data.length();
                s->content_length += len;
                fst += len;
                }
                if (len < 0) {
                op_ret = len;
                //有时rgw返回500错误,就是在这里,data读取失败导致
                ldout(s->cct, 20) << "get_data() returned ret=" << op_ret << dendl;
                goto done;
                }
                ......
                ofs += len;
                //如果上一次有读出数据,则数据可能没有读完,于是继续读取
                } while (len > 0);
                ......
                op_ret = do_aws4_auth_completion();

                可以看到,当日志中打印NOTICE: call to do_aws4_auth_completion即表示数据读取完成


                这里面用到参数rgw_max_chunk_size,它是从配置中读取的,默认是4M,它表示每次从缓冲区读取多少数据,开头我们将其设置为8M,但是注意,这个数据大小不是直接下发到osd的数据大小,实际上读出这块数据之后,如果大于4M,则会以4M为一块,下发到osd中,这里上传的数据不足4M,所以直接就读出来下发到osd了,另外,这些读出的数据,会进行md5的计算,确保完整性


                至此,上传对象的数据部分就上传完成了,这个过程耗时10ms,从这里也可以看出,针对大文件,尤其是百M以上文件写入为主的场景,调大这个值有助于提高性能

                  2019-05-31 16:36:27.868 7fdc91735700 20 get_obj_state: rctx=0x7fdc9172e440 obj=cpp-bucket1:test.file state=0x5619dcf915a0 s->prefetch_data=0
                  2019-05-31 16:36:27.868 7fdc91735700 20 get_obj_state: rctx=0x7fdc9172e440 obj=cpp-bucket1:test.file state=0x5619dcf915a0 s->prefetch_data=0
                  2019-05-31 16:36:27.868 7fdc91735700 10 setting object write_tag=ba85d906-5451-49e4-b0e1-f90b7db13259.4155.7
                  2019-05-31 16:36:27.868 7fdc91735700 20 reading from default.rgw.meta:root:.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1
                  2019-05-31 16:36:27.868 7fdc91735700 20 get_system_obj_state: rctx=0x7fdc9172d1a0 obj=default.rgw.meta:root:.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 state=0x5619ddc981e0 s->prefetch_data=0
                  2019-05-31 16:36:27.868 7fdc91735700 10 cache get: name=default.rgw.meta+root+.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 : hit (requested=0x16, cached=0x17)
                  2019-05-31 16:36:27.868 7fdc91735700 20 get_system_obj_state: s->obj_tag was set empty
                  2019-05-31 16:36:27.868 7fdc91735700 10 cache get: name=default.rgw.meta+root+.bucket.meta.cpp-bucket1:ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 : hit (requested=0x11, cached=0x17)
                  2019-05-31 16:36:27.868 7fdc91735700 20 bucket index object: .dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1
                  2019-05-31 16:36:27.868 7fdc91735700 1 -- 192.168.115.227:0/3536016146 --> 192.168.115.227:6805/1505287 -- osd_op(unknown.0.0:4137 5.1 5:85f057f7:::.dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1:head [call rgw.guard_bucket_resharding,call rgw.bucket_prepare_op] snapc 0=[] ondisk+write+known_if_redirected e30) v8 -- 0x5619dd0bb340 con 0
                  2019-05-31 16:36:27.873 7fdcaef70700 1 -- 192.168.115.227:0/3536016146 <== osd.1 192.168.115.227:6805/1505287 1775 ==== osd_op_reply(4137 .dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 [call,call] v30'14 uv14 ondisk = 0) v8 ==== 234+0+0 (3061579556 0 0) 0x5619dd0bb340 con 0x5619dcf9be00
                  2019-05-31 16:36:27.873 7fdc91735700 1 -- 192.168.115.227:0/3536016146 --> 192.168.115.227:6805/1505287 -- osd_op(unknown.0.0:4138 7.7 7:fe73a341:::ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1_test.file:head [create,setxattr user.rgw.idtag (44),setxattr user.rgw.tail_tag (44),writefull 0~82695,setxattr user.rgw.manifest (313),setxattr user.rgw.acl (147),setxattr user.rgw.content_type (11),setxattr user.rgw.etag (32),setxattr user.rgw.x-amz-date (32),setxattr user.rgw.x-amz-meta-s3cmd-attrs (143),setxattr user.rgw.x-amz-storage-class (9),call rgw.obj_store_pg_ver,setxattr user.rgw.source_zone (4)] snapc 0=[] ondisk+write+known_if_redirected e30) v8 -- 0x5619dd0bb8c0 con 0
                  2019-05-31 16:36:27.879 7fdcaef70700 1 -- 192.168.115.227:0/3536016146 <== osd.1 192.168.115.227:6805/1505287 1776 ==== osd_op_reply(4138 ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1_test.file [create,setxattr (44),setxattr (44),writefull 0~82695,setxattr (313),setxattr (147),setxattr (11),setxattr (32),setxattr (32),setxattr (143),setxattr (9),call,setxattr (4)] v30'2 uv2 ondisk = 0) v8 ==== 701+0+0 (4094818144 0 0) 0x5619dd0bb8c0 con 0x5619dcf9be00
                  2019-05-31 16:36:27.879 7fdc91735700 1 -- 192.168.115.227:0/3536016146 --> 192.168.115.227:6805/1505287 -- osd_op(unknown.0.0:4139 5.1 5:85f057f7:::.dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1:head [call rgw.guard_bucket_resharding,call rgw.bucket_complete_op] snapc 0=[] ondisk+write+known_if_redirected e30) v8 -- 0x5619dd0ab080 con 0

                  上一段看到,数据部分读取完成,于是接下来进行了元数据的更新,首先是确认对象的状态,get_obj_state ,然后操作default.rgw.meta中的信息,更新相关内容,最后我们看到会更新index,这个地方比较关键,如果是海量文件存储,更新index中的内容将非常耗时


                  上述属于第二阶段,完成数据的读取和元数据信息的更新,总共耗时21ms


                   第三阶段

                    2019-05-31 16:36:27.879 7fdc91735700  2 req 7:0.0219994s:s3:PUT /cpp-bucket1/test.file:put_obj:completing
                    2019-05-31 16:36:27.879 7fdc91735700 2 req 7:0.0219994s:s3:PUT /cpp-bucket1/test.file:put_obj:op status=0
                    2019-05-31 16:36:27.879 7fdc91735700 2 req 7:0.0219994s:s3:PUT /cpp-bucket1/test.file:put_obj:http status=200
                    2019-05-31 16:36:27.879 7fdc91735700 1 ====== req done req=0x7fdc9172ec80 op status=0 http_status=200 ======
                    2019-05-31 16:36:27.879 7fdc91735700 1 civetweb: 0x5619dd18c000: 127.0.0.1 - - [31/May/2019:16:36:27 +0800] "PUT /cpp-bucket1/test.file HTTP/1.1" 200 206 - -
                    2019-05-31 16:36:27.883 7fdcaef70700  1 -- 192.168.115.227:0/3536016146 <== osd.1 192.168.115.227:6805/1505287 1777 ==== osd_op_reply(4139 .dir.ba85d906-5451-49e4-b0e1-f90b7db13259.4155.1 [call,call] v30'15 uv15 ondisk = 0) v8 ==== 234+0+0 (2261156683 0 0) 0x5619dd0ab080 con 0x5619dcf9be00


                    最后,handler将处理结果封装成http响应,返回给客户端,是每个op的最后一个阶段


                    这个op总耗时是26ms,其中第二阶段数据上传和元数据更新花费时间为21ms,占比80%


                    总结

                    上述可以看到,rgw处理普通的上传op过程中,会进行三个阶段的处理,首先是进行op的鉴权,然后是数据的接收和分发,更新元数据,最后是返回HTTP请求,从中我们也可以看到第二阶段的耗时最大,从中可以定位到性能优化点,例如减小网络延时、使用巨帧一次性传输更多的数据、加大一次从缓冲区读取的参数值,甚至是rados的块大小。在下篇,我们会针对分块上传进行分析,敬请期待


                    本文转自奋斗的松鼠公众号,感兴趣的可以在下方关注。


                    Ceph中国社区

                    是国内唯一官方正式授权的社区,

                    为广大Ceph爱好者提供交流平台!

                    ↓↓↓

                    开源-创新-自强

                    官方网站:www.ceph.org.cn

                    合作邮箱:devin@ceph.org.cn

                    投稿地址:tougao@ceph.org.cn

                    长期招募热爱翻译人员,

                    参与社区翻译外文资料工作。

                    文章转载自Ceph开源社区,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

                    评论