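Today I ran into a problem: a lot of the nginx logs in ES had a broken format. After a long and very winding troubleshooting session with a more experienced colleague, we finally concluded that
the json filter in logstash could not parse many of the strings in these logs.
The culprit is escaping. With its default escaping ("values less than 32 are escaped"), nginx writes the characters ", \ and anything with a value less than 32 or above 126 as \xXX, and \xXX is not a valid JSON escape. Our nginx log contained exactly such a nasty \x22, the escaped form of a double quote (sensitive values masked with ?):
"request_uri":"/???/????=xxx&method=xxx&params={\x22???\x22:???,\x22???\x22:14404187131,\x22???\x22:9,\x22???\x22:???,\x22???\x22:???}"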
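\x22 is nginx's escape for the byte 0x22, a double quote. RFC 8259 only defines \", \\, \/, \b, \f, \n, \r, \t and \uXXXX as escapes inside JSON strings, so a strict parser has no way to read \x22. A quick way to spot the problem in a suspect line is to list the escapes JSON does not allow. The sketch below uses a hypothetical helper (invalid_json_escapes is not part of Logstash or any library) that scans escapes left to right, so an escaped backslash followed by "x" is not misreported.

```ruby
# Hypothetical helper, not part of Logstash: returns the backslash escapes in
# a JSON text that RFC 8259 does not define.
VALID_JSON_ESCAPES = ['"', '\\', '/', 'b', 'f', 'n', 'r', 't'].freeze

def invalid_json_escapes(text)
  # Scan left to right so an escaped backslash is consumed as one pair and
  # the character after it is not mistaken for the start of a new escape.
  text.scan(/\\(u\h{4}|.)/m).flatten
      .reject { |esc| esc.length == 5 || VALID_JSON_ESCAPES.include?(esc) } # 5 == "u" + 4 hex digits
      .map { |esc| '\\' + esc }
end

line = '"request_uri":"/x?params={\x22k\x22:1}"'
p invalid_json_escapes(line) # prints ["\\x", "\\x"]
```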
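So if you run into the same situation, there are two ways to fix it. If these are nginx logs and your nginx is version 1.11.8 or later, add the escape=json parameter to the nginx log_format: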
log_format name escape=json
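For comparison, the rules the nginx log_format documentation gives for escape=json are: " and \ are written as \" and \\, and characters with values less than 32 as \n, \r, \t, \b, \f or \u00XX. A minimal Ruby sketch of those rules follows. nginx_json_escape is a hypothetical helper that only mimics nginx (it works per character rather than per byte and is not nginx's code), and the json standard library is used only to show that the result parses.

```ruby
require 'json'

# Hypothetical helper: mimics the escape=json rules from the nginx
# log_format documentation. Not nginx's own code; works per character.
def nginx_json_escape(value)
  value.each_char.map { |c|
    case c
    when '"'  then '\\"'   # " becomes \"
    when '\\' then '\\\\'  # \ becomes \\
    when "\n" then '\n'
    when "\r" then '\r'
    when "\t" then '\t'
    when "\b" then '\b'
    when "\f" then '\f'
    else
      # any other character with a value below 32 becomes \u00XX
      c.ord < 32 ? format('\u%04x', c.ord) : c
    end
  }.join
end

value = 'params={"k":1}'
line = %({"request_uri":"#{nginx_json_escape(value)}"})
puts line                            # prints {"request_uri":"params={\"k\":1}"}
puts JSON.parse(line)['request_uri'] # prints params={"k":1}
```

With escape=json, the quotes inside request_uri arrive as \" rather than \x22, so the json filter can read the line without any fix-up.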
If your nginx is older than 1.11.8, or the logs do not come from nginx, use a ruby filter to fix up the message before the json filter parses it. Logstash configuration:
filter {
  ruby {
    path => "/etc/logstash/conf.d/httpd_access_json_fixup_encoding.rb"
  }
  json {
    source => "message"
  }
}
vim /etc/logstash/conf.d/httpd_access_json_fixup_encoding.rb
-------------------------------------------------------------------------
# Done as a script, not inline, because escaping issues are not a fun use of my time
#
# https://www.elastic.co/guide/en/logstash/current/plugins-filters-ruby.html
#
def register(params)
  # we could take arguments, but we don't really need to
  # @my_param = params["my_param"]
end

# The filter method receives an event and must return a list of events.
# Dropping an event means not including it in the return array,
# while creating new ones only requires you to add a new instance of
# LogStash::Event to the returned array.
#
# Normative(ish) references:
# - https://tools.ietf.org/id/draft-ietf-json-rfc4627bis-09.html#rfc.section.7
# - https://github.com/omnigroup/Apache/blob/master/httpd/server/util.c#L1847
#
def filter(event)
  message = event.get('message')
  # '\\x' is the two characters backslash and "x"; skip events without a string message
  if message.is_a?(String) && message.include?('\\x')
    event.set('message', reencode_apache_httpd_access_log_to_utf8(message))
    event.tag('_httpd_access_json_reencoded')
  end
  [event]
end
ST_NORMAL = 0
ST_ESCAPE = 1
ST_PASS = 2
ST_FAIL = 3

# Given an ASCII character (integer value) of a single hex
# digit, return the integer value it represents.
#
def h(ascii_ord)
  case ascii_ord
  when 0x41..0x46 # A-F
    ascii_ord - 0x41 + 10
  when 0x61..0x66 # a-f
    ascii_ord - 0x61 + 10
  when 0x30..0x39 # 0-9
    ascii_ord - 0x30
  else
    raise "ASCII character (ord #{ascii_ord}) is not a hexadecimal digit"
  end
end

# Given two integers representing the ASCII character values
# of two hex digits (eg. the AB in \xAB), return an integer
# that is 0xAB (in this example)
#
def hh(ascii_ord1, ascii_ord2)
  (h(ascii_ord1) << 4) + h(ascii_ord2) # beware precedence of << and +
end
def reencode_apache_httpd_access_log_to_utf8(input)
  ibs = input.bytes
  ibsl = ibs.length
  obs = []
  i = 0
  state = ST_NORMAL
  while true
    if i >= ibsl
      if state == ST_NORMAL
        state = ST_PASS
      elsif state == ST_ESCAPE
        state = ST_FAIL # input ended part-way through an escape sequence
      end
    end
    case state
    when ST_NORMAL
      if ibs[i] == 0x5C # backslash
        state = ST_ESCAPE
      else
        obs << ibs[i]
      end
      i += 1
    when ST_ESCAPE
      #
      # Escape codes serve a few purposes:
      # 1. To encode characters that would be control characters
      # 2. To encode characters that would be non-printable ASCII characters
      # 3. To retain correct message serialisation (eg. "...\"...\"...")
      #
      # Reencoding to UTF-8 only requires us to operate on cases 1 and 2.
      # Case #3 goes beyond character encoding into message serialisation,
      # and we must keep that unmodified, hence backslash dquote ('\"')
      # and backslash backslash ('\\'), which serve as message serialisation
      # escapes, must retain their escape character.
      #
      # Remembering that our goal is to fix-up what httpd has done so that
      # we have something that should now be valid JSON, we should leave
      # all control characters escaped too. So really the only thing we
      # should action on is point 2.
      #
      case ibs[i]
      when 0x78 # x
        i += 1
        byte = nil
        if i + 2 <= ibsl
          begin
            byte = hh(ibs[i], ibs[i + 1])
          rescue RuntimeError
            byte = nil # the two characters after \x are not hex digits
          end
        end
        if byte.nil?
          state = ST_FAIL
        else
          case byte
          when 0x22, 0x5C
            # nginx writes '"' and '\' as \x22 and \x5C; decoding them to raw
            # bytes would break the JSON string, so keep them escaped
            obs << 0x5C << byte
          when 0x00..0x1F
            # control characters must stay escaped; JSON spells them \u00XX
            obs.concat(format('\u%04x', byte).bytes)
          else
            obs << byte # printable ASCII, or one byte of a UTF-8 sequence
          end
          i += 2
          state = ST_NORMAL
        end
      when 0x5C, 0x22 # backslash, double-quote
        # maintain escape as this is for message serialisation
        obs << 0x5C << ibs[i]
        state = ST_NORMAL
        i += 1
      when 0x62, 0x74, 0x6E, 0x66, 0x72 # b, t, n, f, r
        # maintain escape as this is a control character; keep the escape
        # letter itself, because a backslash followed by a raw control byte
        # is not valid JSON
        obs << 0x5C << ibs[i]
        state = ST_NORMAL
        i += 1
      when 0x76 # v
        # JSON has no \v escape, so write the vertical tab (0x0B) as \u000b
        obs.concat('\u000b'.bytes)
        state = ST_NORMAL
        i += 1
      else
        # Invalid or unsupported escape sequence
        state = ST_FAIL
      end
    when ST_FAIL
      # we tried; revert to an identity transform
      puts "Failed at offset #{i} in #{ibs}"
      return input
    when ST_PASS
      return obs.pack('C*').force_encoding('utf-8')
    else
      # BUG
      state = ST_FAIL
    end
  end
end
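Because nginx's default escaping only ever produces \xXX sequences, the same fix-up for nginx logs can also be sketched with a single regular-expression substitution instead of a byte-level state machine. This is an alternative technique, not the script above, and nginx_unescape_to_json is a hypothetical name. It decodes each \xXX but writes " and \ back as \" and \\, and control characters as \u00XX, because putting those back as raw bytes would break the JSON string they sit in. It assumes every backslash in the line starts an nginx \xXX escape and that the decoded bytes are UTF-8.

```ruby
require 'json'

# Hypothetical helper (alternative to the byte-level state machine):
# rewrites nginx's default \xXX escapes into text a JSON parser accepts.
# Assumes every backslash in +line+ starts an nginx \xXX escape.
def nginx_unescape_to_json(line)
  fixed = line.b.gsub(/\\x(\h\h)/) do
    byte = Regexp.last_match(1).hex
    if byte == 0x22 || byte == 0x5C
      '\\' + byte.chr        # '"' and '\' must stay escaped inside a JSON string
    elsif byte < 0x20
      format('\u%04x', byte) # control characters become \u00XX
    else
      byte.chr               # printable ASCII, or one raw byte of a UTF-8 sequence
    end
  end
  # Assumes the decoded bytes are UTF-8; invalid sequences will still fail later.
  fixed.force_encoding(Encoding::UTF_8)
end

sample = '{"request_uri":"/x?params={\x22k\x22:1}"}'
puts JSON.parse(nginx_unescape_to_json(sample))['request_uri'] # prints /x?params={"k":1}
```

To use it in the ruby filter, filter(event) would call it where the script above calls reencode_apache_httpd_access_log_to_utf8.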
Reposted from 耶喝运维. If you believe this content infringes your rights, email contact@modb.pro with supporting evidence; once verified, 墨天轮 will remove it promptly.




