实例讲解strace的用法

使用说明

strace是一个非常简单的工具,它可以跟踪系统调用的执行。最简单的方式,它可以从头到尾跟踪binary的执行,然后以一行文本输出系统调用的名字,参数和返回值。

strace的最简单的用法就是执行一个指定的命令,如:$ strace cat /dev/null 在命令结束之后它也就退出了。在命令执行的过程中,strace会记录和解析命令进程的所有系统调用以及这个进程所接收到的所有的信号值。

对于网络慢的问题,可以使用tcpdump抓包分析慢的问题;而如果就是本机一个程序慢呢?就需要使用strace来分析了

以下是详细参数:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
-c 统计每一系统调用的所执行的时间,次数和出错的次数等.
-d 输出strace关于标准错误的调试信息.
-f 跟踪由fork调用所产生的子进程.
-ff 如果提供-o filename,则所有进程的跟踪结果输出到相应的filename.pid中,pid是各进程的进程号.
-F 尝试跟踪vfork调用.在-f时,vfork不被跟踪.
-h 输出简要的帮助信息.
-i 输出系统调用的入口指针.
-q 禁止输出关于脱离的消息.
-r 打印出相对时间关于,,每一个系统调用.
-t 在输出中的每一行前加上时间信息.
-tt 在输出中的每一行前加上时间信息,微秒级.
-ttt 微秒级输出,以秒了表示时间.
-T 显示每一调用所耗的时间.
-v 输出所有的系统调用.一些调用关于环境变量,状态,输入输出等调用由于使用频繁,默认不输出.
-V 输出strace的版本信息.
-x 以十六进制形式输出非标准字符串
-xx 所有字符串以十六进制形式输出.
-a column 设置返回值的输出位置.默认 为40.
-e expr 指定一个表达式,用来控制如何跟踪.格式如下:
[qualifier=][!]value1[,value2]...
qualifier只能是 trace,abbrev,verbose,raw,signal,read,write其中之一.value是用来限定的符号或数字.默认的 qualifier是 trace.感叹号是否定符号.例如:
-eopen等价于 -e trace=open,表示只跟踪open调用.而-etrace!=open表示跟踪除了open以外的其他调用.有两个特殊的符号 all 和 none.
注意有些shell使用!来执行历史记录里的命令,所以要使用\\.
-e trace=set 只跟踪指定的系统 调用.例如:-e trace=open,close,rean,write表示只跟踪这四个系统调用.默认的为set=all.
-e trace=file 只跟踪有关文件操作的系统调用.
-e trace=process只跟踪有关进程控制的系统调用.
-e trace=network跟踪与网络有关的所有系统调用.
-e strace=signal跟踪所有与系统信号有关的 系统调用
-e trace=ipc 跟踪所有与进程通讯有关的系统调用
-e abbrev=set 设定 strace输出的系统调用的结果集.-v 等与 abbrev=none.默认为abbrev=all.
-e raw=set 将指 定的系统调用的参数以十六进制显示.
-e signal=set 指定跟踪的系统信号.默认为all.如 signal=!SIGIO(或者signal=!io),表示不跟踪SIGIO信号.
-e read=set 输出从指定文件中读出 的数据.例如:
-e read=3,5
-e write=set 输出写入到指定文件中的数据.
-o filename 将strace的输出写入文件filename
-p pid 跟踪指定的进程pid.
-s strsize 指定输出的字符串的最大长度.默认为32.文件名一直全部输出.
-u username 以username 的UID和GID执行被跟踪的命令

比较常用的参数有:

1
2
3
4
5
6
7
8
9
-p 跟踪指定的进程
-o filename 默认strace将结果输出到stdout。通过-o可以将输出写入到filename文件中
-ff 常与-o选项一起使用,不同进程(子进程)产生的系统调用输出到filename.PID文件
-r 打印每一个系统调用的相对时间
-t 在输出中的每一行前加上时间信息。
-tt 时间确定到微秒级。还可以使用-ttt打印相对时间
-s 指定每一行输出字符串的长度,默认是32。
-c 统计每种系统调用所执行的时间,调用次数,出错次数。
-e expr 输出过滤器,通过表达式,可以过滤出掉你不想要输出

统计每项调用的时间占比

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
[root@localhost ~]# strace -cp 2920
Process 2920 attached
^CProcess 2920 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
69.44 0.002775 6 442 epoll_wait
5.86 0.000234 4 63 63 connect
5.16 0.000206 4 46 write
3.53 0.000141 2 73 close
3.40 0.000136 2 63 socket
2.53 0.000101 1 70 23 sendto
2.05 0.000082 1 139 1 setsockopt
1.83 0.000073 1 73 fcntl
1.70 0.000068 1 61 epoll_ctl
1.45 0.000058 1 86 38 recvfrom
1.43 0.000057 1 103 10 getsockopt
0.60 0.000024 1 40 semop
0.55 0.000022 1 20 10 accept
0.40 0.000016 1 20 ioctl
0.08 0.000003 0 10 getsockname
------ ----------- ----------- --------- --------- ----------------
100.00 0.003996 1309 145 total

跟踪28979进程的所有系统调用(-e trace=all),并统计系统调用的花费时间,以及开始时间(并以可视化的时分秒格式显示),最后将记录结果存在output.txt文件里面

1
strace -o output.txt -T -tt -e trace=all -p 28979

使用-e跟踪指定的系统调用

1
2
3
4
5
[root@localhost ~]# strace -e open cat a.sh 
open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3
open("a.sh", O_RDONLY) = 3

实例一:Ping延时5秒

故障现象

线上有一台机器反馈,跟xxx.com交互很慢,在使用ping命令去测试时,发现经常会出现卡住的情况,ping一个包就花费的时间用了5s,但实际上时延很小,如下所示:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
[root@12002838 ~]# time ping -c 1 xxx.com
PING xxx.com.cdngtm.com (1.2.3.4) 56(84) bytes of data.
64 bytes from 1.2.3.4: icmp_seq=1 ttl=45 time=35.8 ms

--- xxx.com.cdngtm.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 35.852/35.852/35.852/0.000 ms

real 0m0.224s
user 0m0.000s
sys 0m0.002s
[root@12002838 ~]#
[root@12002838 ~]# time ping -c 1 xxx.com
PING xxx.com.cdngtm.com (1.2.3.4) 56(84) bytes of data.
64 bytes from 1.2.3.4: icmp_seq=1 ttl=45 time=35.7 ms

--- xxx.com.cdngtm.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 35.783/35.783/35.783/0.000 ms

real 0m5.603s
user 0m0.000s
sys 0m0.002s

再接着测试,会发现使用ping -c 1 xxx.com -n或者直接ping 1.2.3.4时,就很正常了。

分析过程

从上面的测试中可以看出,从网络连通性来说,是没有问题的,因为时延很低。而ping -n的这个参数的意思是指对解析出来的IP不进行反向解析,所以我们有理由怀疑是反向解析出来了问题。

但是使用tcpdump -i any -nns0 host 1.2.3.4抓包发现,未没有发现异常情况。

所以怀疑是系统上面有什么异常导致的,使用strace命令来跟踪下,-tt显示时间,-T显示调用时间,输出如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
[op@12002824 ~]$ strace -tt -T ping -c 1 xxx.com
13:42:35.810276 execve("/usr/bin/ping", ["ping", "-c", "1", "xxx.com"], [/* 29 vars */]) = 0 <0.000541>
13:42:35.811240 brk(0) = 0x7ff9d7b47000 <0.000187>
13:42:35.811671 fcntl(0, F_GETFD) = 0 <0.000140>
13:42:35.812035 fcntl(1, F_GETFD) = 0 <0.000107>
13:42:35.812336 fcntl(2, F_GETFD) = 0 <0.000047>
13:42:35.812542 access("/etc/suid-debug", F_OK) = -1 ENOENT (No such file or directory) <0.000056>
13:42:35.812853 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6427000 <0.000097>
13:42:35.813064 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) <0.000052>
13:42:35.813223 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3 <0.000025>
13:42:35.813307 fstat(3, {st_mode=S_IFREG|0644, st_size=59413, ...}) = 0 <0.000012>
13:42:35.813356 mmap(NULL, 59413, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff9d6418000 <0.000015>
13:42:35.813400 close(3) = 0 <0.000008>
13:42:35.813442 open("/usr/lib64/libcap.so.2", O_RDONLY|O_CLOEXEC) = 3 <0.000012>
13:42:35.813481 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \26\0\0\0\0\0\0"..., 832) = 832 <0.000009>
13:42:35.813519 fstat(3, {st_mode=S_IFREG|0755, st_size=20024, ...}) = 0 <0.000008>
13:42:35.813561 mmap(NULL, 2114112, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff9d6002000 <0.000011>
13:42:35.813599 mprotect(0x7ff9d6006000, 2093056, PROT_NONE) = 0 <0.000011>
13:42:35.813638 mmap(0x7ff9d6205000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7ff9d6205000 <0.000012>
13:42:35.813684 close(3) = 0 <0.000008>
13:42:35.813722 open("/usr/lib64/libidn.so.11", O_RDONLY|O_CLOEXEC) = 3 <0.000012>
13:42:35.813763 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\0000\0\0\0\0\0\0"..., 832) = 832 <0.000009>
13:42:35.813801 fstat(3, {st_mode=S_IFREG|0755, st_size=208928, ...}) = 0 <0.000007>
13:42:35.813839 mmap(NULL, 2302416, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff9d5dcf000 <0.000009>
13:42:35.813875 mprotect(0x7ff9d5e01000, 2093056, PROT_NONE) = 0 <0.000009>
13:42:35.813910 mmap(0x7ff9d6000000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x31000) = 0x7ff9d6000000 <0.000010>
13:42:35.813954 close(3) = 0 <0.000008>
13:42:35.813991 open("/usr/lib64/libc.so.6", O_RDONLY|O_CLOEXEC) = 3 <0.000010>
13:42:35.814030 read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0P%\2\0\0\0\0\0"..., 832) = 832 <0.000008>
13:42:35.814067 fstat(3, {st_mode=S_IFREG|0755, st_size=2173512, ...}) = 0 <0.000007>
13:42:35.814104 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6417000 <0.000008>
13:42:35.814143 mmap(NULL, 3981792, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff9d5a02000 <0.000009>
13:42:35.814179 mprotect(0x7ff9d5bc5000, 2093056, PROT_NONE) = 0 <0.000012>
13:42:35.814217 mmap(0x7ff9d5dc4000, 24576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x1c2000) = 0x7ff9d5dc4000 <0.000011>
13:42:35.814257 mmap(0x7ff9d5dca000, 16864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff9d5dca000 <0.000010>
13:42:35.814298 close(3) = 0 <0.000007>
13:42:35.814336 open("/usr/lib64/libattr.so.1", O_RDONLY|O_CLOEXEC) = 3 <0.000011>
13:42:35.814375 read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320\23\0\0\0\0\0\0"..., 832) = 832 <0.000009>
13:42:35.814413 fstat(3, {st_mode=S_IFREG|0755, st_size=19888, ...}) = 0 <0.000008>
13:42:35.814450 mmap(NULL, 2113904, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7ff9d57fd000 <0.000010>
13:42:35.814488 mprotect(0x7ff9d5801000, 2093056, PROT_NONE) = 0 <0.000010>
13:42:35.814524 mmap(0x7ff9d5a00000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x3000) = 0x7ff9d5a00000 <0.000011>
13:42:35.814569 close(3) = 0 <0.000008>
13:42:35.814613 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6416000 <0.000009>
13:42:35.814651 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6414000 <0.000008>
13:42:35.814691 arch_prctl(ARCH_SET_FS, 0x7ff9d6414740) = 0 <0.000008>
13:42:35.814775 mprotect(0x7ff9d5dc4000, 16384, PROT_READ) = 0 <0.000011>
13:42:35.814817 mprotect(0x7ff9d5a00000, 4096, PROT_READ) = 0 <0.000009>
13:42:35.814862 mprotect(0x7ff9d6000000, 4096, PROT_READ) = 0 <0.000009>
13:42:35.814900 mprotect(0x7ff9d6205000, 4096, PROT_READ) = 0 <0.000009>
13:42:35.814957 mprotect(0x7ff9d6634000, 4096, PROT_READ) = 0 <0.000009>
13:42:35.814994 mprotect(0x7ff9d6428000, 4096, PROT_READ) = 0 <0.000009>
13:42:35.815029 munmap(0x7ff9d6418000, 59413) = 0 <0.000013>
13:42:35.815117 brk(0) = 0x7ff9d7b47000 <0.000008>
13:42:35.815152 brk(0x7ff9d7b68000) = 0x7ff9d7b68000 <0.000009>
13:42:35.815186 brk(0) = 0x7ff9d7b68000 <0.000007>
13:42:35.815224 capget({_LINUX_CAPABILITY_VERSION_3, 0}, NULL) = 0 <0.000009>
13:42:35.815264 capget({_LINUX_CAPABILITY_VERSION_3, 0}, {CAP_NET_ADMIN|CAP_NET_RAW, CAP_NET_ADMIN|CAP_NET_RAW, 0}) = 0 <0.000008>
13:42:35.815303 capget({_LINUX_CAPABILITY_VERSION_3, 0}, NULL) = 0 <0.000007>
13:42:35.815340 capset({_LINUX_CAPABILITY_VERSION_3, 0}, {0, CAP_NET_ADMIN|CAP_NET_RAW, 0}) = 0 <0.000008>
13:42:35.815378 prctl(PR_SET_KEEPCAPS, 1) = 0 <0.000008>
13:42:35.815414 getuid() = 1000 <0.000008>
13:42:35.815448 setuid(1000) = 0 <0.000009>
13:42:35.815481 prctl(PR_SET_KEEPCAPS, 0) = 0 <0.000008>
13:42:35.815515 getuid() = 1000 <0.000007>
13:42:35.815554 geteuid() = 1000 <0.000008>
13:42:35.815601 open("/usr/lib/locale/locale-archive", O_RDONLY|O_CLOEXEC) = 3 <0.000012>
13:42:35.815642 fstat(3, {st_mode=S_IFREG|0644, st_size=106070960, ...}) = 0 <0.000007>
13:42:35.815680 mmap(NULL, 106070960, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7ff9cf2d4000 <0.000011>
13:42:35.815720 close(3) = 0 <0.000008>
13:42:35.815774 capget({_LINUX_CAPABILITY_VERSION_3, 0}, NULL) = 0 <0.000008>
13:42:35.815810 capget({_LINUX_CAPABILITY_VERSION_3, 0}, {0, CAP_NET_ADMIN|CAP_NET_RAW, 0}) = 0 <0.000008>
13:42:35.815849 capset({_LINUX_CAPABILITY_VERSION_3, 0}, {CAP_NET_RAW, CAP_NET_ADMIN|CAP_NET_RAW, 0}) = 0 <0.000008>
13:42:35.815889 socket(PF_INET, SOCK_RAW, IPPROTO_ICMP) = 3 <0.000013>
13:42:35.815935 capget({_LINUX_CAPABILITY_VERSION_3, 0}, NULL) = 0 <0.000007>
13:42:35.815970 capget({_LINUX_CAPABILITY_VERSION_3, 0}, {CAP_NET_RAW, CAP_NET_ADMIN|CAP_NET_RAW, 0}) = 0 <0.000008>
13:42:35.816007 capset({_LINUX_CAPABILITY_VERSION_3, 0}, {0, CAP_NET_ADMIN|CAP_NET_RAW, 0}) = 0 <0.000009>
13:42:35.816075 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=72, ...}) = 0 <0.000011>
13:42:35.816125 open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 4 <0.000011>
13:42:35.816168 fstat(4, {st_mode=S_IFREG|0644, st_size=9, ...}) = 0 <0.000008>
13:42:35.816205 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6426000 <0.000009>
13:42:35.816242 read(4, "multi on\n", 4096) = 9 <0.000011>
13:42:35.816287 read(4, "", 4096) = 0 <0.000008>
13:42:35.816323 close(4) = 0 <0.000009>
13:42:35.816357 munmap(0x7ff9d6426000, 4096) = 0 <0.000012>
13:42:35.816396 open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 4 <0.000010>
13:42:35.816436 fstat(4, {st_mode=S_IFREG|0644, st_size=72, ...}) = 0 <0.000008>
13:42:35.816473 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6426000 <0.000008>
13:42:35.816510 read(4, "nameserver 119.29.29.29\nnameserv"..., 4096) = 72 <0.000010>
13:42:35.816551 read(4, "", 4096) = 0 <0.000008>
13:42:35.816585 close(4) = 0 <0.000009>
13:42:35.816620 munmap(0x7ff9d6426000, 4096) = 0 <0.000011>
13:42:35.816657 uname({sys="Linux", node="12002824", ...}) = 0 <0.000008>
13:42:35.816707 socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 <0.000012>
13:42:35.816746 connect(4, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) <0.000018>
13:42:35.816853 close(4) = 0 <0.000010>
13:42:35.816893 socket(PF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 4 <0.000009>
13:42:35.816929 connect(4, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory) <0.000011>
13:42:35.816970 close(4) = 0 <0.000009>
13:42:35.817009 open("/etc/nsswitch.conf", O_RDONLY|O_CLOEXEC) = 4 <0.000010>
13:42:35.817047 fstat(4, {st_mode=S_IFREG|0644, st_size=1717, ...}) = 0 <0.000008>
13:42:35.817085 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6426000 <0.000009>
13:42:35.817122 read(4, "#\n# /etc/nsswitch.conf\n#\n# An ex"..., 4096) = 1717 <0.000010>
13:42:35.817170 read(4, "", 4096) = 0 <0.000008>
13:42:35.817205 close(4) = 0 <0.000008>
13:42:35.817239 munmap(0x7ff9d6426000, 4096) = 0 <0.000010>
13:42:35.817284 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4 <0.000010>
13:42:35.817322 fstat(4, {st_mode=S_IFREG|0644, st_size=59413, ...}) = 0 <0.000007>
13:42:35.817358 mmap(NULL, 59413, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7ff9d6418000 <0.000009>
13:42:35.817394 close(4) = 0 <0.000008>
13:42:35.817433 open("/usr/lib64/libnss_files.so.2", O_RDONLY|O_CLOEXEC) = 4 <0.000011>
13:42:35.817473 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\320!\0\0\0\0\0\0"..., 832) = 832 <0.000009>
13:42:35.817510 fstat(4, {st_mode=S_IFREG|0755, st_size=62184, ...}) = 0 <0.000008>
13:42:35.817549 mmap(NULL, 2173048, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7ff9cf0c1000 <0.000012>
13:42:35.817587 mprotect(0x7ff9cf0cd000, 2093056, PROT_NONE) = 0 <0.000012>
13:42:35.817626 mmap(0x7ff9cf2cc000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0xb000) = 0x7ff9cf2cc000 <0.000012>
13:42:35.817669 mmap(0x7ff9cf2ce000, 22648, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff9cf2ce000 <0.000009>
13:42:35.817711 close(4) = 0 <0.000008>
13:42:35.817755 mprotect(0x7ff9cf2cc000, 4096, PROT_READ) = 0 <0.000011>
13:42:35.817799 munmap(0x7ff9d6418000, 59413) = 0 <0.000011>
13:42:35.817844 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 4 <0.000011>
13:42:35.817885 fstat(4, {st_mode=S_IFREG|0644, st_size=271, ...}) = 0 <0.000008>
13:42:35.817922 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6426000 <0.000009>
13:42:35.817959 read(4, "127.0.0.1 localhost localhost."..., 4096) = 271 <0.000010>
13:42:35.818006 read(4, "", 4096) = 0 <0.000008>
13:42:35.818042 close(4) = 0 <0.000009>
13:42:35.818076 munmap(0x7ff9d6426000, 4096) = 0 <0.000010>
13:42:35.818115 open("/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 4 <0.000010>
13:42:35.818153 fstat(4, {st_mode=S_IFREG|0644, st_size=59413, ...}) = 0 <0.000008>
13:42:35.818190 mmap(NULL, 59413, PROT_READ, MAP_PRIVATE, 4, 0) = 0x7ff9d6418000 <0.000009>
13:42:35.818226 close(4) = 0 <0.000007>
13:42:35.818264 open("/usr/lib64/libnss_dns.so.2", O_RDONLY|O_CLOEXEC) = 4 <0.000011>
13:42:35.818303 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\260\20\0\0\0\0\0\0"..., 832) = 832 <0.000008>
13:42:35.818340 fstat(4, {st_mode=S_IFREG|0755, st_size=31824, ...}) = 0 <0.000008>
13:42:35.818378 mmap(NULL, 2121984, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7ff9ceeba000 <0.000010>
13:42:35.818416 mprotect(0x7ff9ceebf000, 2097152, PROT_NONE) = 0 <0.000012>
13:42:35.818454 mmap(0x7ff9cf0bf000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x5000) = 0x7ff9cf0bf000 <0.000010>
13:42:35.818499 close(4) = 0 <0.000008>
13:42:35.818538 open("/usr/lib64/libresolv.so.2", O_RDONLY|O_CLOEXEC) = 4 <0.000011>
13:42:35.818577 read(4, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0@9\0\0\0\0\0\0"..., 832) = 832 <0.000009>
13:42:35.818614 fstat(4, {st_mode=S_IFREG|0755, st_size=106848, ...}) = 0 <0.000008>
13:42:35.818652 mmap(NULL, 2198016, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 4, 0) = 0x7ff9ceca1000 <0.000010>
13:42:35.818690 mprotect(0x7ff9cecb7000, 2093056, PROT_NONE) = 0 <0.000011>
13:42:35.818727 mmap(0x7ff9ceeb6000, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 4, 0x15000) = 0x7ff9ceeb6000 <0.000011>
13:42:35.818773 mmap(0x7ff9ceeb8000, 6656, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x7ff9ceeb8000 <0.000010>
13:42:35.818812 close(4) = 0 <0.000008>
13:42:35.818862 mprotect(0x7ff9ceeb6000, 4096, PROT_READ) = 0 <0.000011>
13:42:35.818953 mprotect(0x7ff9cf0bf000, 4096, PROT_READ) = 0 <0.000010>
13:42:35.818991 munmap(0x7ff9d6418000, 59413) = 0 <0.000013>
13:42:35.819050 socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000011>
13:42:35.819088 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("119.29.29.29")}, 16) = 0 <0.000014>
13:42:35.819142 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) <0.000009>
13:42:35.819184 sendto(4, "\357\235\1\0\0\1\0\0\0\0\0\0\6omsapi\7snstack\2cn\0\0"..., 35, MSG_NOSIGNAL, NULL, 0) = 35 <0.000024>
13:42:35.819238 poll([{fd=4, events=POLLIN}], 1, 5000) = 1 ([{fd=4, revents=POLLIN}]) <0.019979>
13:42:35.839285 ioctl(4, FIONREAD, [108]) = 0 <0.000055>
13:42:35.839388 recvfrom(4, "\357\235\201\200\0\1\0\3\0\0\0\0\6omsapi\7snstack\2cn\0\0"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("119.29.29.29")}, [16]) = 108 <0.000014>
13:42:35.839450 close(4) = 0 <0.000016>
13:42:35.839515 socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 4 <0.000012>
13:42:35.839551 connect(4, {sa_family=AF_INET, sin_port=htons(1025), sin_addr=inet_addr("163.177.221.143")}, 16) = 0 <0.000012>
13:42:35.839629 getsockname(4, {sa_family=AF_INET, sin_port=htons(41990), sin_addr=inet_addr("218.107.205.242")}, [16]) = 0 <0.000008>
13:42:35.839667 close(4) = 0 <0.000011>
13:42:35.839707 setsockopt(3, SOL_RAW, ICMP_FILTER, ~(ICMP_ECHOREPLY|ICMP_DEST_UNREACH|ICMP_SOURCE_QUENCH|ICMP_REDIRECT|ICMP_TIME_EXCEEDED|ICMP_PARAMETERPROB), 4) = 0 <0.000008>
13:42:35.839745 setsockopt(3, SOL_IP, IP_RECVERR, [1], 4) = 0 <0.000009>
13:42:35.839786 setsockopt(3, SOL_SOCKET, SO_SNDBUF, [324], 4) = 0 <0.000008>
13:42:35.839825 setsockopt(3, SOL_SOCKET, SO_RCVBUF, [65536], 4) = 0 <0.000008>
13:42:35.839861 getsockopt(3, SOL_SOCKET, SO_RCVBUF, [131072], [4]) = 0 <0.000008>
13:42:35.839906 fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 4), ...}) = 0 <0.000008>
13:42:35.839944 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6426000 <0.000009>
13:42:35.839985 write(1, "PING xxx.com.cdngtm.com ("..., 71PING xxx.com.cdngtm.com (163.177.221.143) 56(84) bytes of data.
) = 71 <0.000016>
13:42:35.840030 setsockopt(3, SOL_SOCKET, SO_TIMESTAMP, [1], 4) = 0 <0.000008>
13:42:35.840066 setsockopt(3, SOL_SOCKET, SO_SNDTIMEO, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0 <0.000007>
13:42:35.840102 setsockopt(3, SOL_SOCKET, SO_RCVTIMEO, "\1\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0", 16) = 0 <0.000007>
13:42:35.840137 getpid() = 20102 <0.000007>
13:42:35.840170 rt_sigaction(SIGINT, {0x7ff9d642fbd0, [], SA_RESTORER|SA_INTERRUPT, 0x7ff9d5a382f0}, NULL, 8) = 0 <0.000007>
13:42:35.840214 rt_sigaction(SIGALRM, {0x7ff9d642fbd0, [], SA_RESTORER|SA_INTERRUPT, 0x7ff9d5a382f0}, NULL, 8) = 0 <0.000007>
13:42:35.840252 rt_sigaction(SIGQUIT, {0x7ff9d642fbc0, [], SA_RESTORER|SA_INTERRUPT, 0x7ff9d5a382f0}, NULL, 8) = 0 <0.000008>
13:42:35.840288 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000007>
13:42:35.840326 ioctl(1, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, {B38400 opost isig icanon echo ...}) = 0 <0.000008>
13:42:35.840364 ioctl(1, TIOCGWINSZ, {ws_row=42, ws_col=210, ws_xpixel=0, ws_ypixel=0}) = 0 <0.000007>
13:42:35.840401 sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("163.177.221.143")}, msg_iov(1)=[{"\10\0\5\273N\206\0\1K\272\277]\0\0\0\0\315\322\f\0\0\0\0\0\20\21\22\23\24\25\26\27"..., 64}], msg_controllen=0, msg_flags=0}, 0) = 64 <0.000023>
13:42:35.840460 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000009>
13:42:35.840500 recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(0), sin_addr=inet_addr("163.177.221.143")}, msg_iov(1)=[{"E\0\0T\315\374\0\0005\1\216\r\243\261\335\217\332k\315\362\0\0\r\273N\206\0\1K\272\277]"..., 192}], msg_controllen=32, {cmsg_len=32, cmsg_level=SOL_SOCKET, cmsg_type=0x1d /* SCM_??? */, ...}, msg_flags=0}, 0) = 84 <0.020365>
13:42:35.860923 stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=72, ...}) = 0 <0.000016>
13:42:35.861009 open("/etc/hosts", O_RDONLY|O_CLOEXEC) = 4 <0.000015>
13:42:35.861060 fstat(4, {st_mode=S_IFREG|0644, st_size=271, ...}) = 0 <0.000011>
13:42:35.861103 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff9d6425000 <0.000011>
13:42:35.861143 read(4, "127.0.0.1 localhost localhost."..., 4096) = 271 <0.000015>
13:42:35.861198 read(4, "", 4096) = 0 <0.000009>
13:42:35.861235 close(4) = 0 <0.000046>
13:42:35.861323 munmap(0x7ff9d6425000, 4096) = 0 <0.000022>
13:42:35.861408 socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 4 <0.000015>
13:42:35.861466 connect(4, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("119.29.29.29")}, 16) = 0 <0.000019>
13:42:35.861528 poll([{fd=4, events=POLLOUT}], 1, 0) = 1 ([{fd=4, revents=POLLOUT}]) <0.000010>
13:42:35.861583 sendto(4, "\34\312\1\0\0\1\0\0\0\0\0\0\003143\003221\003177\003163\7in-"..., 46, MSG_NOSIGNAL, NULL, 0) = 46 <0.000021>
13:42:35.861636 poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005176>

13:42:40.866995 socket(PF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 5 <0.000133>
13:42:40.867341 connect(5, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("223.5.5.5")}, 16) = 0 <0.000134>
13:42:40.867662 poll([{fd=5, events=POLLOUT}], 1, 0) = 1 ([{fd=5, revents=POLLOUT}]) <0.000198>
13:42:40.868077 sendto(5, "\34\312\1\0\0\1\0\0\0\0\0\0\003143\003221\003177\003163\7in-"..., 46, MSG_NOSIGNAL, NULL, 0) = 46 <0.000100>
13:42:40.868271 poll([{fd=5, events=POLLIN}], 1, 3000) = 1 ([{fd=5, revents=POLLIN}]) <0.018325>
13:42:40.886880 ioctl(5, FIONREAD, [121]) = 0 <0.000097>
13:42:40.887278 recvfrom(5, "\34\312\201\203\0\1\0\0\0\1\0\0\003143\003221\003177\003163\7in-"..., 1024, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("223.5.5.5")}, [16]) = 121 <0.000200>
13:42:40.887805 close(4) = 0 <0.000149>
13:42:40.888173 close(5) = 0 <0.000047>
13:42:40.888424 write(1, "64 bytes from 163.177.221.143: i"..., 6264 bytes from 163.177.221.143: icmp_seq=1 ttl=53 time=20.4 ms
) = 62 <0.000121>
13:42:40.888708 write(1, "\n", 1
) = 1 <0.000045>
13:42:40.888867 write(1, "--- xxx.com.cdngtm.com pi"..., 50--- xxx.com.cdngtm.com ping statistics ---
) = 50 <0.000036>
13:42:40.889016 write(1, "1 packets transmitted, 1 receive"..., 601 packets transmitted, 1 received, 0% packet loss, time 0ms
) = 60 <0.000011>
13:42:40.889073 write(1, "rtt min/avg/max/mdev = 20.426/20"..., 53rtt min/avg/max/mdev = 20.426/20.426/20.426/0.000 ms
) = 53 <0.000008>
13:42:40.889139 exit_group(0) = ?
13:42:40.889364 +++ exited with 0 +++

根据strace的输出,可以看到,13:42:35.861636 poll([{fd=4, events=POLLIN}], 1, 5000) = 0 (Timeout) <5.005176>这里卡死了5s,结合上下文对比一下,

使用dig来查看一下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
[root@12002838 ~]# dig -x 1.2.3.4 @119.29.29.29

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7 <<>> -x 1.2.3.4 @119.29.29.29
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 64491
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;111.242.61.183.in-addr.arpa. IN PTR

;; AUTHORITY SECTION:
61.183.in-addr.arpa. 600 IN SOA soa. dns.guangzhou.gd.cn. 2016012139 10800 3600 604800 86400

;; Query time: 42 msec
;; SERVER: 119.29.29.29#53(119.29.29.29)
;; WHEN: Mon Nov 04 17:01:13 CST 2019
;; MSG SIZE rcvd: 114

[root@12002838 ~]# dig -x 1.2.3.4 @119.29.29.29

; <<>> DiG 9.9.4-RedHat-9.9.4-18.el7 <<>> -x 1.2.3.4 @119.29.29.29
;; global options: +cmd
;; connection timed out; no servers could be reached

结合抓包来分析的话,可以看到反向解析异常了。

1
2
3
4
5
6
7
[op@12002838 ~]$ tcpdump -r ptr-119.29.29.29.pcap
reading from file ptr-119.29.29.29.pcap, link-type LINUX_SLL (Linux cooked)
17:01:13.110935 IP 1.146.21.211.58085 > pdns.dnspod.cn.domain: 64491+ [1au] PTR? 111.242.61.183.in-addr.arpa. (56)
17:01:13.153168 IP pdns.dnspod.cn.domain > 1.146.21.211.58085: 64491 NXDomain 0/1/1 (114)
17:01:15.718078 IP 1.146.21.211.10569 > pdns.dnspod.cn.domain: 10577+ [1au] PTR? 111.242.61.183.in-addr.arpa. (56)
17:01:20.718195 IP 1.146.21.211.10569 > pdns.dnspod.cn.domain: 10577+ [1au] PTR? 111.242.61.183.in-addr.arpa. (56)
17:01:25.718336 IP 1.146.21.211.10569 > pdns.dnspod.cn.domain: 10577+ [1au] PTR? 111.242.61.183.in-addr.arpa. (56)

实例二:zk客户端连接慢

最近在部署zookeeper的时候,在本机使用/cache1/zookeeper/bin/zkCli.sh -server 10.8.50.189连接ZK时非常慢,需要等待10几秒才能由CONNECTING变成CONNECTED,如下图:

刚开始怀疑是网络问题,使用tcpdump抓包之后,发现zookeeper也是隔了15S左右才发现的建链请求。

使用strace -f -TT -t -o 189.txt /cache1/zookeeper/bin/zkCli.sh -server 10.8.50.189来跟踪,这时会将zkCli.sh所加载的子进程一起打印到189.txt的结果里面。我们取最后面一列,时间大于1秒的结果来分析

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
[root@misa ~]# cat 189.txt |egrep -v 'si_signo|unfinished|exited with|exit_group' |awk 'BEGIN{FS="[<>]+"}{if($(NF-1)>1)print $0}'
13695 17:55:28.198956 <... futex resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <1.241380>
13705 17:55:29.181802 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000035>
13705 17:55:30.182504 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000074>
13705 17:55:31.182737 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000081>
13705 17:55:32.286019 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000100>
13726 17:55:32.995436 <... poll resumed> ) = 0 (Timeout) <5.004650>
13705 17:55:33.286386 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000060>
13711 17:55:33.294683 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000069>
13710 17:55:33.294901 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000050>
13709 17:55:33.295178 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000063>
13712 17:55:33.295427 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000062>
13705 17:55:34.286639 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000061>
13705 17:55:35.286925 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000109>
13705 17:55:36.287219 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000071>
13705 17:55:37.287529 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000072>
13726 17:55:38.000944 <... poll resumed> ) = 0 (Timeout) <5.005081>
13705 17:55:38.287743 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000067>
13711 17:55:38.294922 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000071>
13710 17:55:38.295131 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000059>
13709 17:55:38.295420 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000060>
13712 17:55:38.295642 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000073>
13705 17:55:39.288386 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000053>
13705 17:55:40.288629 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000065>
13705 17:55:41.288916 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000013>
13705 17:55:42.289220 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000056>
13726 17:55:43.008426 <... poll resumed> ) = 0 (Timeout) <5.005070>
13705 17:55:43.289487 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000055>
13711 17:55:43.295181 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000061>
13710 17:55:43.295363 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000048>
13709 17:55:43.295633 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000005>
13712 17:55:43.295859 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <5.000047>
13705 17:55:44.289740 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000077>
13705 17:55:45.290048 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000095>
13705 17:55:46.290380 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000088>
13705 17:55:47.290651 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000055>
13726 17:55:48.012491 <... poll resumed> ) = 0 (Timeout) <5.003801>
13711 17:55:48.045227 <... futex resumed> ) = 0 <4.749875>
13710 17:55:48.045447 <... futex resumed> ) = 0 <4.749921>
13709 17:55:48.045658 <... futex resumed> ) = 0 <4.749881>
13712 17:55:48.045848 <... futex resumed> ) = 0 <4.749845>
13727 17:55:48.062549 <... futex resumed> ) = 0 <20.085666>
13705 17:55:48.290963 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000139>
13705 17:55:49.291661 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000052>
13705 17:55:50.291893 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000062>
13705 17:55:51.292141 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000072>
13696 17:55:51.310297 <... read resumed> "\r", 1) = 1 <23.003900>
13709 17:55:51.311488 <... futex resumed> ) = 0 <3.247963>
13711 17:55:51.311853 <... futex resumed> ) = 0 <3.247834>
13712 17:55:51.312082 <... futex resumed> ) = 0 <3.247748>
13710 17:55:51.312409 <... futex resumed> ) = 0 <3.247753>
13729 17:55:51.320436 <... futex resumed> ) = 0 <23.014312>
13705 17:55:52.344542 <... futex resumed> ) = -1 ETIMEDOUT (Connection timed out) <1.000074>
13695 17:55:52.806638 <... futex resumed> ) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <24.550556>
13687 17:55:52.806675 <... wait4 resumed> 0x7fff0dcc7610, 0, NULL) = ? ERESTARTSYS (To be restarted if SA_RESTART is set) <25.861684>
13708 17:55:52.806868 <... futex resumed> ) = 0 <25.566062>
13729 17:55:52.821612 <... futex resumed> ) = 0 <1.476969>
13918 17:55:52.828485 _exit(0) = ?
13714 17:55:52.830662 _exit(0) = ?

然后就对上述日志进行排查,发现了以下的异常:

1
2
3
4
5
6
7
13726 17:55:27.990116 uname({sys="Linux", node="misa", ...}) = 0 <0.000029>
13726 17:55:27.990215 socket(PF_INET, SOCK_DGRAM|SOCK_NONBLOCK, IPPROTO_IP) = 44 <0.000050>
13726 17:55:27.990327 connect(44, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("114.114.114.114")}, 16) = 0 <0.000050>
13726 17:55:27.990439 poll([{fd=44, events=POLLOUT}], 1, 0) = 1 ([{fd=44, revents=POLLOUT}]) <0.000033>
13726 17:55:27.990529 sendto(44, "\247\214\1\0\0\1\0\0\0\0\0\0\003189\00250\0018\00210\7in-addr"..., 42, MSG_NOSIGNAL, NULL, 0) = 42 <0.000075>
13726 17:55:27.990754 poll([{fd=44, events=POLLIN}], 1, 5000 <unfinished ...>
13726 17:55:32.995436 <... poll resumed> ) = 0 (Timeout) <5.004650>

线路13726有发起对114 DNS的请求,但是我们不清楚是什么域名,那就再次动用tcdump来分析问题:

1
2
3
4
5
6
7
[root@misa ~]# tcpdump -i any -nns0 port 53 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 65535 bytes
21:36:42.254545 IP 10.8.50.190.42709 > 114.114.114.114.53: 51804+ PTR? 189.50.8.10.in-addr.arpa. (42)
21:36:47.258444 IP 10.8.50.190.42709 > 114.114.114.114.53: 51804+ PTR? 189.50.8.10.in-addr.arpa. (42)
21:36:52.264034 IP 10.8.50.190.59759 > 114.114.114.114.53: 28250+ PTR? 189.50.8.10.in-addr.arpa. (42)
21:36:57.269104 IP 10.8.50.190.59759 > 114.114.114.114.53: 28250+ PTR? 189.50.8.10.in-addr.arpa. (42)

那就问题很明显了,还是遇到了ptr反向解析的问题了。我们用host命令来测试下是不是很慢:

1
2
3
4
5
6
[root@misa ~]# time host 10.8.50.191
;; connection timed out; no servers could be reached

real 0m10.007s
user 0m0.004s
sys 0m0.003s

也可以用 dig -x 10.8.50.189 来测试下,证明都是很慢。所以极大的可能是反向解析的问题,为了验证确认是这个问题导致的,我把nameserver都注释掉,发现一切正常了,秒级连上zk。

但问题是,为什么zk会去反向解析这个IP呢?换句话说,怎么样解决这个问题呢?

第一个想法是通过zk的参数去关闭反向解析,但翻查了官方文档 https://zookeeper.apache.org/doc/r3.6.3/zookeeperAdmin.html 之后,发现没有找到有这个选项。此方法不通。

第二个方法是想通过强制解析下ptr就可以了,但是/etc/hosts是不支持的,难道为了这个问题,还得去搭建一下DNS服务器吗?不值得。

思考了半天,我在/etc/hosts加上主机名的解析,即10.8.50.189 misa,这样是否可以呢?

神奇的发现,加上之后问题解决了,并且抓包也没有反向解析了,这是为什么呢?求大神指导。

  • 本文作者: wumingx
  • 本文链接: https://www.wumingx.com/performance/strace.html
  • 本文主题: 实例讲解strace的用法
  • 版权声明: 本站所有文章除特别声明外,转载请注明出处!如有侵权,请联系我删除。
0%