Skip to content

[Feature] Ensuring the safety of stopSpan #11214

Description

@xiaqi1210

Search before asking

  • I had searched in the issues and found no similar feature requirement.

Description

In the agent, createSpan and stopSpan should appear in pairs. In fact, in many plugins, such as HttpClient, dubbo, etc., createSpan is in beforeMethod, while stopSpan is in afterMethod. There is a problem here. If createSpan fails, or an exception occurs in some code before createSpan, causing createSpan not to be executed, then stopping Span in afterMethod will be a dangerous action. This will cause the parentSpan to be stopped prematurely. This will lead to TraceContext being destroyed prematurely, causing chain tracking interruption, and will also cause NPE when the upper plugin calls stopSpan.

Therefore, a mechanism is needed to ensure that in this scenario, stopSpan can only stop the span created in the current Interceptor. Since beforeMethod and afterMethod must execute in the same Interceptor instance, a instance variable can be used to solve this problem. My idea is to create a base class and provide createSpan and stopSpan methods in the base class. Interceptors that need to ensure stopping the corresponding span can inherit this base class, and then call the base class methods to replace ContextManager's createSpan and stopSpan methods.

Use case

  1. Skywalking Agent 8.14.0, HttpClient-4.x-plugin
  2. SpringBoot Application, Controller requestMapping is "/test_http_client", server port=8080
  3. The logic in the Controller uses HttpClient to request another URL : "http://localhost:8080:8080/test" (This URL is invalid and contains 2 port)
  4. start application, request URL: "http://localhost:8080/test_http_client"

Based on the current situation:
in beforeMethod, new URL() (this operation is before createSpan) will throw an error because the URL is invalid. This causes the span to not be created, but afterMethod's stopSpan will still be executed. At this time, the span stopped is the one created by Tomcat. And because there is only one span left, after this span is stopped, the TracingContext object in ThreadLocal will also be deleted. Therefore, calling stopSpan in Tomcat plugin's afterMethod will produce an NPE. If there are other RPC calls after the HttpClient call, those RPC call chains can no longer be linked with this chain either.

Based on my base class design:
When calling stopSpan, it will check if the span variable in the current Interceptor instance has a value. If not, it will be ignored. If there is a value, the stop operation will be performed.

Related issues

No response

Are you willing to submit a pull request to fix this on your own?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

No one assigned

    Labels

    agentLanguage agent related.featureNew featurejavaJava agent relatedrejectedThe issue or PR can't be accepted by upstream.

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions