OpenJDK CRaC support #10363

Open
FilipPetrovic0 opened this issue Jan 31, 2024 · 6 comments
Labels
enhancement New feature or request needs triage New issue that requires triage

Comments

@FilipPetrovic0

Is your feature request related to a problem? Please describe.

I'm using a Spring Boot app and was hoping to use the CRaC project. My app uses this javaagent to send observability data to a server.

For CRaC to work, a so-called checkpoint must be captured and later restored. Before the checkpoint is captured, all open connections, sockets, and files must be closed. When the checkpoint is restored, every such connection, socket, and file must be reopened so the app behaves the same way it did before the checkpoint was taken.

The problem is that the OpenTelemetry Java agent holds an open connection to the server it sends data to, and from application code we have no way to close or reopen that connection, so capturing the checkpoint is impossible.

Describe the solution you'd like

The solution would be to make the OpenTelemetry javaagent CRaC compatible, meaning the agent automatically responds to checkpoint capture and restore by closing and reopening the connections, sockets, and files it manages.

That way capturing the checkpoint would be possible.
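
A minimal sketch of the requested behavior, not the agent's actual code. CRaC exposes these hooks through the `org.crac.Resource` callbacks `beforeCheckpoint` and `afterRestore`. To stay runnable without the CRaC library, the sketch uses a hypothetical stand-in interface `CheckpointAware`, a hypothetical `ExporterConnection` and `Registry`, an assumed endpoint string, and a boolean flag in place of a real socket.

```java
import java.util.ArrayList;
import java.util.List;

public class CracLifecycleSketch {

    // Hypothetical stand-in for org.crac.Resource, simplified:
    // the real callbacks also take a Context argument and may throw.
    interface CheckpointAware {
        void beforeCheckpoint();
        void afterRestore();
    }

    // Hypothetical exporter connection; a boolean flag simulates the socket state.
    static final class ExporterConnection implements CheckpointAware {
        private final String endpoint;
        private boolean open;

        ExporterConnection(String endpoint) {
            this.endpoint = endpoint;
            this.open = true; // simulate connecting at startup
        }

        String endpoint() {
            return endpoint;
        }

        boolean isOpen() {
            return open;
        }

        @Override
        public void beforeCheckpoint() {
            open = false; // close the connection so the checkpoint can proceed
        }

        @Override
        public void afterRestore() {
            open = true; // reconnect once the process has been restored
        }
    }

    // Hypothetical registry that plays the role of the runtime notifying resources.
    static final class Registry {
        private final List<CheckpointAware> resources = new ArrayList<>();

        void register(CheckpointAware resource) {
            resources.add(resource);
        }

        void checkpoint() {
            for (CheckpointAware resource : resources) {
                resource.beforeCheckpoint();
            }
        }

        void restore() {
            for (CheckpointAware resource : resources) {
                resource.afterRestore();
            }
        }
    }

    public static void main(String[] args) {
        Registry registry = new Registry();
        // The endpoint is an assumed example value.
        ExporterConnection connection = new ExporterConnection("http://localhost:4317");
        registry.register(connection);

        System.out.println(connection.endpoint() + " open at startup: " + connection.isOpen());
        registry.checkpoint();
        System.out.println("open at checkpoint: " + connection.isOpen());
        registry.restore();
        System.out.println("open after restore: " + connection.isOpen());
    }
}
```

With the real library, the resource would implement `org.crac.Resource` and be registered through `Core.getGlobalContext().register(...)`. The point of the sketch is only that the component owning the connection has to be the one that closes and reopens it.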

Describe alternatives you've considered

Here is the list of everything I tried. Each attempt either failed or was abandoned, for different reasons:

  • Considered using the OpenTelemetry SDK
  • Tried the dynamic attach mechanism to attach the javaagent at runtime, after the checkpoint is restored
  • Force-killing all connections from the app to try to capture the checkpoint
  • Starting the agent without a way for it to connect to the server it uploads data to

Additional context

Other javaagents have the same issue with CRaC and are actively working on it:

elastic/apm-agent-java#2982
newrelic/newrelic-java-agent#1697

@FilipPetrovic0 FilipPetrovic0 added enhancement New feature or request needs triage New issue that requires triage labels Jan 31, 2024
@laurit
Contributor

laurit commented Jan 31, 2024

@jack-berg could you have a look, should this issue be transferred to the sdk repo?

@jack-berg
Member

It might make sense to keep an umbrella issue open for CRaC compatibility with the otel java agent, but it does sound like the most obvious issue is specifically with the exporters from opentelemetry-java.

I could transfer the issue over there, or we could keep it open here and open a new issue in the core repository.

@trask
Member

trask commented Jan 31, 2024

hi @FilipPetrovic0! have you tried CRaC with the OpenTelemetry Java SDK? if that doesn't work, then I'd recommend creating an issue in https://github.com/open-telemetry/opentelemetry-java first, then if/when the SDK supports it, we can revisit agent support here

@FilipPetrovic0
Author

Hi @trask, I haven't tried to implement the agent functionality using the SDK. The problem with the SDK is the additional development and testing time required to make everything work as it does now (we are currently using the java agent).

I was looking into the SDK anyway and didn't see support for CRaC.

I have an additional issue that is more agent specific, and maybe you could help me out. Since capturing the checkpoint with CRaC is impossible because of the open connection to the OpenTelemetry collector, I tried making the collector unreachable for the agent. When I did that and tried to capture the checkpoint, the problem was no longer the open connection but the open jar file (opentelemetry-javaagent.jar). I guess the resources from the javaagent were not loaded correctly since the server was unavailable.

This is the output when trying to checkpoint an app that has the javaagent up and running but without a connection to the server it sends data to:

root@EU:~# jcmd app-1.0.jar JDK.checkpoint
10884:
An exception during a checkpoint operation:
jdk.internal.crac.mirror.CheckpointException
	Suppressed: jdk.internal.crac.mirror.impl.CheckpointOpenFileException: FD fd=6 type=regular path=/root/opentelemetry-javaagent.jar
		at java.base/jdk.internal.crac.mirror.Core.translateJVMExceptions(Core.java:114)
		at java.base/jdk.internal.crac.mirror.Core.checkpointRestore1(Core.java:188)
		at java.base/jdk.internal.crac.mirror.Core.checkpointRestore(Core.java:286)
		at java.base/jdk.internal.crac.mirror.Core.checkpointRestoreInternal(Core.java:299)

Do you have any idea why the jar file would remain open after the agent is loaded?

@laurit
Contributor

laurit commented Feb 1, 2024

@FilipPetrovic0 Having open jar files is perfectly normal. A class loader generally doesn't open the jar file each time it needs to load a class; it opens the jar only once and keeps a reference to the open jar. My guess would be

though it could easily be something completely different.
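
To illustrate the "open once, keep a reference" pattern described above, here is a small sketch using `java.util.jar.JarFile` directly. It is an illustration under assumptions, not the agent's or the JVM's class-loading code: the class name `OpenJarSketch`, the temporary jar, and its entry names are all made up for the example.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;

public class OpenJarSketch {

    // Writes a small jar with two text entries to a temporary file.
    static Path createSampleJar() throws IOException {
        Path jar = Files.createTempFile("sample", ".jar");
        try (OutputStream out = Files.newOutputStream(jar);
             JarOutputStream jos = new JarOutputStream(out)) {
            jos.putNextEntry(new JarEntry("a.txt"));
            jos.write("first".getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
            jos.putNextEntry(new JarEntry("b.txt"));
            jos.write("second".getBytes(StandardCharsets.UTF_8));
            jos.closeEntry();
        }
        return jar;
    }

    // Reads one entry through an already-open jar handle.
    static String read(JarFile jar, String name) throws IOException {
        try (InputStream in = jar.getInputStream(jar.getJarEntry(name))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Opens the jar once and serves both lookups from the same handle.
    static String[] readBothEntries() {
        try {
            Path path = createSampleJar();
            JarFile jar = new JarFile(path.toFile()); // the file stays open from here on
            try {
                return new String[] { read(jar, "a.txt"), read(jar, "b.txt") };
            } finally {
                jar.close(); // only now is the underlying file released
                Files.deleteIfExists(path);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String content : readBothEntries()) {
            System.out.println(content);
        }
    }
}
```

A loader that never calls `close()` on such a handle keeps the jar open for the life of the process, which matches the kind of open regular file the checkpoint output reports.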

@jeanbisutti
Member

The problem with SDK is the additional development and testing time that is required to make everything work as it works now (we are currently using the java agent).

@FilipPetrovic0 You could try CRaC with the OpenTelemetry Spring Boot starter.


5 participants