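When log collection with Logtail fails in an Alibaba Cloud Kubernetes cluster or a self-managed Kubernetes cluster, you can use the Logtail container self-service diagnostic tool to check whether the client has a problem, then follow the tool's prompts to locate and resolve the issue. The tool reads only the information of the resources it needs, and the diagnosis is non-intrusive to your Kubernetes environment. This topic describes how to use the Logtail container diagnostic tool.
Step 1: Download and run the diagnostic tool
Step 2: Start the diagnosis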
[QUESTION]: 2022/01/14 16:30:04 please choose which item you want to check : Please input [0,1] to choose which item you want to check:
0: MachineGroup heartbeat fail.
1: MachineGroup heartbeat is ok, but log files have not been collected.
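Heartbeat check
Logtail defines its basic configuration in the alibaba-log-configuration/kube-system resource. The heartbeat check only performs a lightweight read of this resource and does not affect your Kubernetes cluster.
Answer the prompts as shown in the following example.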
[QUESTION]: 2022/01/14 16:30:23 Does your aliuid exist in xxxxxxx Please input Y/N.:
y
[OK]: 2022/01/14 16:30:30 aliuid correct
[QUESTION]: 2022/01/14 16:30:30 Does your region[cn-beijing] and endpoint[cn-beijing-intranet.log.aliyuncs.com] correct? Please input Y/N.:
y
[OK]: 2022/01/14 16:30:33 region and endpoint correct
[QUESTION]: 2022/01/14 16:30:33 Does your machine group with the user deined id k8s-group-xxxxx? Please input Y/N.:
y
[OK]: 2022/01/14 16:30:35 user defined id correct
[WARNING]: 2022/01/14 16:30:35 If the problem persists,please contact with the sls developer and provide the ilogtail-trace.tar file.
[OK]: 2022/01/14 16:30:35 All tests are passed
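If you save the tool's output to a file, the messages can be grouped by level for a quick overview. The following is a minimal sketch, assuming every message has the shape `[LEVEL]: YYYY/MM/DD HH:MM:SS message` seen in the sample output above. The names `LINE_RE` and `group_by_level` are illustrative and are not part of the diagnostic tool.

```python
import re
from collections import defaultdict

# Assumed line shape, inferred from the sample output only:
# "[LEVEL]: YYYY/MM/DD HH:MM:SS message"
LINE_RE = re.compile(
    r"^\[(?P<level>[A-Z]+)\]: "
    r"(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<msg>.*)$"
)


def group_by_level(output: str) -> dict:
    """Group diagnostic messages by level tag; non-matching lines are skipped."""
    groups = defaultdict(list)
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            groups[match.group("level")].append(match.group("msg"))
    return dict(groups)


# Lines copied from the heartbeat check output shown above.
SAMPLE = """\
[OK]: 2022/01/14 16:30:30 aliuid correct
[OK]: 2022/01/14 16:30:33 region and endpoint correct
[OK]: 2022/01/14 16:30:35 user defined id correct
[WARNING]: 2022/01/14 16:30:35 If the problem persists,please contact with the sls developer and provide the ilogtail-trace.tar file.
[OK]: 2022/01/14 16:30:35 All tests are passed
"""

if __name__ == "__main__":
    for level, messages in group_by_level(SAMPLE).items():
        print(level, len(messages))
```

Answers typed at a prompt, such as `y`, do not match the pattern and are ignored.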
Collection check
When prompted, enter the location of the container that produces the logs to be collected. The format is {podName}/{namespace}, for example nginx-log-sidecar-demo-zlx5x/default.
[QUESTION]: 2022/01/14 16:31:49 please input the position of your pod having collection problem with {podName}/{namespace} pattern
nginx-log-sidecar-demo-zlx5x/default
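- If the Pod contains multiple containers, the tool lists all containers in the Pod. Select the container that produces the logs.
[QUESTION]: 2022/01/14 16:32:40 Please input [0,1] to choose which item you want to check:
0: nginx-log-demo container
1: logtail container
0
- If the Pod contains only one container, this prompt is skipped automatically.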
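Note that the location puts the Pod name before the namespace. If you script the input, a small validator can catch a swapped or malformed value before it reaches the tool. This is a minimal sketch; `parse_pod_position` is a hypothetical helper, not something the diagnostic tool provides.

```python
def parse_pod_position(text: str) -> tuple:
    """Split a '{podName}/{namespace}' string into (pod_name, namespace)."""
    parts = text.strip().split("/")
    # Exactly two non-empty parts are expected: the Pod name, then the namespace.
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected {{podName}}/{{namespace}}, got {text!r}")
    pod_name, namespace = parts
    return pod_name, namespace


if __name__ == "__main__":
    print(parse_pod_position("nginx-log-sidecar-demo-zlx5x/default"))
```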
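After you enter the container location, the tool first checks whether the container matches a Logtail collection configuration by container label, container environment variable, or basic Kubernetes information. If the container matches, the tool then probes the container path to check that it is reachable. Logtail obtains container paths through the standard Docker or CRI interface, but some self-managed clusters and other cloud platforms use custom paths, which can make a path unreachable. For how to configure container labels and container environment variables for Logtail, see "Collect container text logs in DaemonSet mode (console)" and "Collect container stdout in DaemonSet mode (console)".
- If the following message is returned, the container failed to match. Check your configuration first.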
[ERROR]: 2022/01/14 16:33:34 centos/centos/default container label isn't mapping to config ##1.0##qs-demos$file-collect: .......
- If the following messages are returned, the container matched and the tool starts checking for log collection problems.
[INFO]: 2022/01/14 16:33:47 centos/centos/default pod mapping config ##1.0##qs-demos$file-collect: true
[OK]: 2022/01/14 16:33:47 Pod is mapping to your LogConfig
If problems are found, the tool returns messages such as the following:
[ERROR]: 2022/01/14 16:33:47 config: ##1.0##qs-demos$test find unaviable path: /logtail_host/var/lib/docker/overlay2/eb23efaa3e4afdfce0638da6e11f0ee2a020e7fa891f7a0cfa2c761102bf4ceb/diff/trace-log-demos
[SUGGESTION]: 2022/01/14 16:33:47 Other config: ##1.0##qs-demos$test is also sniffing the container, maybe conflicts.
[ERROR]: 2022/01/14 16:33:47 config: ##1.0##qs-demos$sls-mall find unaviable path: /logtail_host/var/lib/docker/overlay2/eb23efaa3e4afdfce0638da6e11f0ee2a020e7fa891f7a0cfa2c761102bf4ceb/diff/sls-mall
[SUGGESTION]: 2022/01/14 16:33:47 Other config: ##1.0##qs-demos$sls-mall is also sniffing the container, maybe conflicts.
[INFO]: 2022/01/14 16:33:48 find files in centos container path:
total 1868
drwxr-xr-x 4 root root 4096 Nov 30 04:05 .
drwxr-xr-x 9 root root 4096 Nov 30 03:58 ..
dr-xr-xr-x 2 root root 4096 Nov 9 09:00 bin
dr-xr-xr-x 3 root root 4096 Nov 9 09:00 lib
-rw------- 1 root root 1880970 Dec 3 06:09 nohup.out
-rw-r--r-- 1 root root 172 Nov 30 04:05 run.sh
[ERROR]: 2022/01/14 16:33:47 Failed to get time zone of pod: nginx-log-sidecar-4hwbk under namespace default
[WARNING]: 2022/01/14 16:33:47 Work pod timezone: is not same as the logtail pod:+0000
[WARNING]: 2022/01/14 16:33:47 If the problem persists,please contact with the sls developer and provide the ilogtail-trace.tar file.
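When the output is long, it can help to pull out just the unreachable paths and the configurations flagged as possibly conflicting. The sketch below assumes the message wording stays exactly as in the sample output, including the tool's own spelling "unaviable". The names `PATH_RE`, `CONFLICT_RE`, and `summarize_findings` are illustrative only.

```python
import re

# Both patterns copy the wording of the sample output verbatim,
# including the tool's spelling "unaviable". Other tool versions may differ.
PATH_RE = re.compile(r"config: (\S+) find unaviable path: (\S+)")
CONFLICT_RE = re.compile(r"Other config: (\S+) is also sniffing the container")


def summarize_findings(output: str) -> dict:
    """Collect unreachable paths per config and configs flagged as conflicting."""
    unreachable = {}
    conflicts = []
    for line in output.splitlines():
        path_match = PATH_RE.search(line)
        if path_match:
            unreachable[path_match.group(1)] = path_match.group(2)
            continue
        conflict_match = CONFLICT_RE.search(line)
        if conflict_match and conflict_match.group(1) not in conflicts:
            conflicts.append(conflict_match.group(1))
    return {"unreachable_paths": unreachable, "conflicting_configs": conflicts}


# Two lines copied from the sample output.
SAMPLE = (
    "[ERROR]: 2022/01/14 16:33:47 config: ##1.0##qs-demos$test find unaviable path: "
    "/logtail_host/var/lib/docker/overlay2/"
    "eb23efaa3e4afdfce0638da6e11f0ee2a020e7fa891f7a0cfa2c761102bf4ceb/diff/trace-log-demos\n"
    "[SUGGESTION]: 2022/01/14 16:33:47 Other config: ##1.0##qs-demos$test "
    "is also sniffing the container, maybe conflicts.\n"
)

if __name__ == "__main__":
    print(summarize_findings(SAMPLE))
```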
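- Check whether the file collection path inside the container exists
When collecting text logs, Logtail stores the source collection information in the /usr/local/ilogtail/docker_path_config.json file of the Logtail Pod. The tool runs this check against that file, so the check adds no meaningful overhead.
- Check for duplicate collection
A misconfiguration can cause the same log to be collected by Logtail more than once. The tool identifies the conflicting Logtail collection configurations, and you can fix them based on its suggestions.
- Check whether the container time is correct
Time zone problems during collection are usually caused by a mismatch between the time zone of the container being collected and that of the Logtail container, which uses UTC by default. In this case, you can mount /etc/localtime from the host.
# In the container spec
volumeMounts:
  - name: timezone
    mountPath: /etc/localtime
# In the Pod spec
volumes:
  - name: timezone
    hostPath:
      path: /etc/localtime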
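To see why the mismatch matters, consider a timestamp that carries no zone information. The sketch below is a worked example under stated assumptions: the application container is hypothetically at UTC+8, Logtail is at its default UTC, and the timestamp format matches the sample output. The helper `interpret` is illustrative and says nothing about how Logtail parses time internally.

```python
from datetime import datetime, timedelta, timezone


def interpret(timestamp: str, utc_offset_hours: int) -> datetime:
    """Parse a zone-less timestamp as if it were written at the given UTC offset."""
    naive = datetime.strptime(timestamp, "%Y/%m/%d %H:%M:%S")
    return naive.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))


written = "2022/01/14 16:33:47"     # timestamp text with no zone information
as_app = interpret(written, 8)      # hypothetical application container at UTC+8
as_logtail = interpret(written, 0)  # Logtail container at its default, UTC

# The same text maps to two different instants; the gap is the zone offset.
skew = as_logtail - as_app

if __name__ == "__main__":
    print(skew)
```

Under these assumptions the two readings are eight hours apart, which is the kind of shift the /etc/localtime mount is meant to remove.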