Tracing Analysis

This topic describes Tracing Analysis in the Java 11 runtime environment.

Background information

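Alibaba Cloud Tracing Analysis is based on the OpenTracing standard and is compatible with open source communities. It provides developers of distributed applications with complete distributed trace query and diagnosis, dynamic discovery of distributed topologies, and real-time summaries of application performance.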

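After Function Compute is integrated with Tracing Analysis, you can use the Jaeger SDK or OpenTelemetry to upload trace data. This lets you trace function executions, quickly analyze and diagnose performance bottlenecks in a Serverless architecture, and debug more efficiently in Serverless scenarios.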

Feature overview

You can configure Tracing Analysis in the Function Compute console. For more information, see Configure Tracing Analysis.

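After you enable Tracing Analysis for a service, Function Compute automatically records the time that a request spends on the system side, including the cold start time, the execution time of the Initializer function, and the execution time of the function itself. For a description of the system spans in the following figure, see Span name description.

[Figure: Tracing Analysis]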

To also view the time spent on the business side within a function, for example, the time spent accessing services such as RDS or NAS, create custom spans.

Sample code

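Tracing in Function Compute is based on the Jaeger implementation of the OpenTracing protocol. The Java runtime provides the following two ways to create custom spans.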

Use OpenTelemetry (recommended)

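In Java code, you can use the OpenTelemetry SDK to instrument your code manually and report data to the Tracing Analysis server. For the complete sample code, see java-tracing-openTelemetry.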

The sample code is explained as follows.

  • Add dependencies to the pom.xml file.

      <dependencies>
        <dependency>
          <groupId>junit</groupId>
          <artifactId>junit</artifactId>
          <version>3.8.1</version>
          <scope>test</scope>
        </dependency>
        <dependency>
          <groupId>com.aliyun.fc.runtime</groupId>
          <artifactId>fc-java-core</artifactId>
          <version>1.4.1</version>
        </dependency>
        <dependency>
          <groupId>io.opentelemetry</groupId>
          <artifactId>opentelemetry-api</artifactId>
          <version>1.19.0</version>
        </dependency>
        <dependency>
          <groupId>io.opentelemetry</groupId>
          <artifactId>opentelemetry-sdk</artifactId>
          <version>1.19.0</version>
        </dependency>
        <dependency>
          <groupId>io.opentelemetry</groupId>
          <artifactId>opentelemetry-semconv</artifactId>
          <version>1.19.0-alpha</version>
        </dependency>
        <dependency>
          <groupId>io.opentelemetry</groupId>
          <artifactId>opentelemetry-exporter-jaeger-thrift</artifactId>
          <version>1.19.0</version>
        </dependency>
        <dependency>
          <groupId>io.jaegertracing</groupId>
          <artifactId>jaeger-thrift</artifactId>
          <version>1.8.1</version>
        </dependency>
        <dependency>
          <groupId>org.slf4j</groupId>
          <artifactId>slf4j-simple</artifactId>
          <version>1.6.6</version>
        </dependency>
      </dependencies>
  • Report data to the Tracing Analysis server.

    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
        String endpoint = context.getTracing().getJaegerEndpoint();
        try {
            ExampleConfiguration.initOpenTelemetry(endpoint);
        } catch (TTransportException e) {
            throw new RuntimeException(e);
        }
    
        SpanContext spanContext = contextFromString(context.getTracing().getSpanContext());
    
        startMySpan(io.opentelemetry.context.Context.current().with(Span.wrap(spanContext)));
    }
  • Create a global OpenTelemetry object that provides access to tracers.

    static OpenTelemetry initOpenTelemetry(String jaegerEndpoint) throws TTransportException {
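        // Export traces to Jaeger. Builder is assumed to be HttpSender.Builder
        // from the jaeger-thrift dependency (its import is not shown here).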
        JaegerThriftSpanExporter jaegerExporter =
            JaegerThriftSpanExporter.builder()
            .setThriftSender(new Builder(jaegerEndpoint).build())
            .setEndpoint(jaegerEndpoint)
            .build();
    
        Resource serviceNameResource =
            Resource.create(Attributes.of(ResourceAttributes.SERVICE_NAME, "otel-jaeger-example"));
    
        // Have the Jaeger exporter process spans
        SdkTracerProvider tracerProvider =
            SdkTracerProvider.builder()
            .addSpanProcessor(SimpleSpanProcessor.create(jaegerExporter))
            .setResource(Resource.getDefault().merge(serviceNameResource))
            .build();
        OpenTelemetrySdk openTelemetry =
            OpenTelemetrySdk.builder().setTracerProvider(tracerProvider).buildAndRegisterGlobal();
    
        // Close the SDK when the JVM exits
        Runtime.getRuntime().addShutdownHook(new Thread(tracerProvider::close));
    
        return openTelemetry;
    }
  • Obtain the tracing information from the context and convert it to a SpanContext.

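    SpanContext contextFromString(String value) {
        // Expected format: traceId:spanId:parentSpanId:flags (hexadecimal fields)
        if (value == null || value.isEmpty()) {
            throw new RuntimeException("Span context string is empty");
        }
        String[] parts = value.split(":");
        if (parts.length != 4) {
            throw new RuntimeException("Malformed span context string: " + value);
        }
        String traceId = parts[0];
        String spanId = parts[1];
        if (traceId.length() < 1 || traceId.length() > 32) {
            throw new RuntimeException("Trace id [" + traceId + "] length is not within 1 and 32");
        }
        // OpenTelemetry expects a 32-character trace ID and a 16-character span ID,
        // so left-pad shorter hexadecimal values with zeros.
        String paddedTraceId = "0".repeat(32 - traceId.length()) + traceId;
        String paddedSpanId = "0".repeat(Math.max(0, 16 - spanId.length())) + spanId;
        return SpanContext.createFromRemoteParent(paddedTraceId, paddedSpanId, TraceFlags.getSampled(), TraceState.getDefault());
    }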
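  • Create a tracer and use the converted context to create a child span. Each span represents a named, timed, continuous segment of execution in the trace. You can also create further child spans based on this span.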

    void startMySpan(io.opentelemetry.context.Context ctx){
        Tracer tracer = GlobalOpenTelemetry.getTracer("fc-Trace");
        // Start a span whose parent is the span context passed in by Function Compute
        Span parentSpan = tracer.spanBuilder("fc-operation").setParent(ctx).startSpan();
        parentSpan.setAttribute("version","fc-v1");
        try {
            // Simulated business logic
            TimeUnit.MILLISECONDS.sleep(150);
            child(parentSpan.storeInContext(ctx));
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            // End the span even if the business logic fails
            parentSpan.end();
        }
    }
    
    void child(io.opentelemetry.context.Context ctx){
        Tracer tracer = GlobalOpenTelemetry.getTracer("fc-Trace");
        // The parent span is carried in ctx, so this span becomes its child
        Span childSpan = tracer.spanBuilder("fc-operation-child").setParent(ctx).startSpan();
        childSpan.addEvent("timeout");
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            childSpan.end();
        }
    }
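
The handler shown above calls initOpenTelemetry on every invocation. If a function instance is reused across invocations, registering the global OpenTelemetry object a second time in the same JVM can fail, so the initialization is usually guarded so that it runs only once. The following is a minimal sketch of such a guard that uses only the JDK. OnceInitializer and OnceInitializerDemo are hypothetical names that are not part of the sample project, and the string-returning factory merely stands in for a call such as ExampleConfiguration.initOpenTelemetry(endpoint).

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical helper: runs a register-once initializer at most one time per JVM. */
final class OnceInitializer<T> {
    // Stand-in for the real initializer, for example a wrapper around initOpenTelemetry
    private final Function<String, T> factory;
    private volatile T instance;

    OnceInitializer(Function<String, T> factory) {
        this.factory = factory;
    }

    /** Returns the cached instance and creates it on the first call only. */
    T get(String endpoint) {
        T local = instance;
        if (local == null) {
            synchronized (this) {
                local = instance;
                if (local == null) {
                    local = factory.apply(endpoint);
                    instance = local;
                }
            }
        }
        return local;
    }
}

final class OnceInitializerDemo {
    static final AtomicInteger INIT_CALLS = new AtomicInteger();

    public static void main(String[] args) {
        OnceInitializer<String> tracing = new OnceInitializer<>(endpoint -> {
            INIT_CALLS.incrementAndGet();
            return "sdk-for-" + endpoint;
        });
        // Simulate three invocations served by the same warm instance
        for (int i = 0; i < 3; i++) {
            tracing.get("http://example.invalid/api/traces");
        }
        System.out.println("initializer ran " + INIT_CALLS.get() + " time(s)"); // initializer ran 1 time(s)
    }
}
```

In a handler, a guard like this would typically be held in a static field, and a checked exception such as TTransportException would need to be wrapped inside the factory.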

Use the Jaeger SDK

You can use the Jaeger SDK to instrument your code and report data to the Tracing Analysis server. For the complete sample code, see java-tracing.

The sample code is explained as follows.

  • Add dependencies to the pom.xml file.

    <dependencies>
      <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>3.8.1</version>
        <scope>test</scope>
      </dependency>
      <dependency>
        <groupId>com.aliyun.fc.runtime</groupId>
        <artifactId>fc-java-core</artifactId>
        <version>1.4.1</version>
      </dependency>
      <dependency>
        <groupId>io.jaegertracing</groupId>
        <artifactId>jaeger-client</artifactId>
        <version>1.8.1</version>
      </dependency>
      <dependency>
        <groupId>org.slf4j</groupId>
        <artifactId>slf4j-simple</artifactId>
        <version>1.6.6</version>
      </dependency>
    </dependencies>
  • Report data to the Tracing Analysis server.

    public void handleRequest(InputStream inputStream, OutputStream outputStream, Context context) throws IOException {
    
        registerTracer(context);
    
        JaegerSpanContext spanContext = contextFromString(context.getTracing().getSpanContext());
    
        startMySpan(spanContext);
    }
  • Create a tracer based on the tracing information in the context.

    void registerTracer(Context context){
        io.jaegertracing.Configuration config = new io.jaegertracing.Configuration("FCTracer");
        // Send spans to the Jaeger endpoint that Function Compute provides
        io.jaegertracing.Configuration.SenderConfiguration sender = new io.jaegertracing.Configuration.SenderConfiguration();
        sender.withEndpoint(context.getTracing().getJaegerEndpoint());
        // A "const" sampler with parameter 1 samples every trace
        config.withSampler(new io.jaegertracing.Configuration.SamplerConfiguration().withType("const").withParam(1));
        config.withReporter(new io.jaegertracing.Configuration.ReporterConfiguration().withSender(sender).withMaxQueueSize(10000));
        GlobalTracer.register(config.getTracer());
    }
  • Convert the span context and create a custom span. You can also create further child spans based on this span.

    static JaegerSpanContext contextFromString(String value) throws MalformedTracerStateStringException, EmptyTracerStateStringException {
        // Expected format: traceId:spanId:parentSpanId:flags (hexadecimal fields)
        if (value == null || value.isEmpty()) {
            throw new EmptyTracerStateStringException();
        }
        String[] parts = value.split(":");
        if (parts.length != 4) {
            throw new MalformedTracerStateStringException(value);
        }
        String traceId = parts[0];
        if (traceId.length() < 1 || traceId.length() > 32) {
            throw new TraceIdOutOfBoundException("Trace id [" + traceId + "] length is not within 1 and 32");
        }
        // Split the trace ID into its high and low 64 bits so that 128-bit IDs are not truncated
        BigInteger traceIdValue = new BigInteger(traceId, 16);
        return new JaegerSpanContext(
            traceIdValue.shiftRight(64).longValue(),
            traceIdValue.longValue(),
            new BigInteger(parts[1], 16).longValue(),
            new BigInteger(parts[2], 16).longValue(),
            new BigInteger(parts[3], 16).byteValue());
    }
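
The span context string handled above consists of four colon-separated hexadecimal fields. The following sketch, which uses only the JDK, shows how such a string breaks down into numeric values. It assumes the field order is trace ID, span ID, parent span ID, and flags, and that the lowest bit of the flags field marks a sampled trace, as in the Jaeger propagation format. SpanContextFields is a hypothetical class used for illustration only, and the sample string is made up.

```java
import java.math.BigInteger;

/** Hypothetical holder for the fields of a "traceId:spanId:parentSpanId:flags" string. */
final class SpanContextFields {
    final long traceIdHigh;
    final long traceIdLow;
    final long spanId;
    final long parentId;
    final boolean sampled;

    private SpanContextFields(long traceIdHigh, long traceIdLow, long spanId, long parentId, boolean sampled) {
        this.traceIdHigh = traceIdHigh;
        this.traceIdLow = traceIdLow;
        this.spanId = spanId;
        this.parentId = parentId;
        this.sampled = sampled;
    }

    static SpanContextFields parse(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("Span context string is empty");
        }
        String[] parts = value.split(":");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Malformed span context string: " + value);
        }
        String traceId = parts[0];
        if (traceId.length() < 1 || traceId.length() > 32) {
            throw new IllegalArgumentException("Trace id length is not within 1 and 32: " + traceId);
        }
        // A trace ID of up to 128 bits is split into two 64-bit halves
        BigInteger id = new BigInteger(traceId, 16);
        long high = id.shiftRight(64).longValue();
        long low = id.longValue(); // keeps only the low-order 64 bits
        long span = new BigInteger(parts[1], 16).longValue();
        long parent = new BigInteger(parts[2], 16).longValue();
        // Assumption: bit 0 of the flags field is the "sampled" bit
        boolean sampled = (Integer.parseInt(parts[3], 16) & 1) == 1;
        return new SpanContextFields(high, low, span, parent, sampled);
    }

    public static void main(String[] args) {
        SpanContextFields f = parse("0000000000000001abcdef0123456789:00000000000000ff:0:1");
        System.out.println(Long.toHexString(f.traceIdHigh)); // 1
        System.out.println(Long.toHexString(f.traceIdLow));  // abcdef0123456789
        System.out.println(f.spanId);                        // 255
        System.out.println(f.sampled);                       // true
    }
}
```

Keeping the high 64 bits separate matters only when the trace ID is longer than 16 hexadecimal characters. For shorter IDs the high half is simply zero.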