
Direct Memory grows from 5MB-1GB leading to application crash #3558

Open
abhishek-sharma-20 opened this issue Dec 22, 2024 · 9 comments
Labels
for/user-attention This issue needs user attention (feedback, rework, etc...) status/need-feedback type/bug A general bug

Comments

@abhishek-sharma-20

We have a reactive routing application that takes a request from a client and routes it to a backend. We originally had a non-reactive solution and are now moving to a reactive one. In one flow the backend response is large (1 MB to 5 MB). When running that flow at 2-3 TPS for 30 minutes to 1 hour, we see the application restart automatically. On investigation we found that it restarts because it consumes more memory than expected and hits the memory limit of the pod. Looking at the memory usage pattern, direct memory keeps growing from 5 MB to 1024 MB, which then leads to the restart.

Expected Behavior

The application should be able to handle large responses without consuming more memory than expected.

Actual Behavior

The application consumes more memory than expected, which leads to a restart.

Steps to Reproduce

### HttpConfig.java

@Bean
    public ConnectionProvider connectionProvider() {
        return ConnectionProvider.builder("httpConn")
                .maxConnections(20)
                .metrics(true)
                .build();
    }

    @Bean
    LoopResources loopResources() {
        return LoopResources.create("loop", 100, true);
    }

    @Bean
    public PooledByteBufAllocator byteBufAllocator() {
        return PooledByteBufAllocator.DEFAULT;
    }

    public SslContext armSSLContext() {
        SslContext sslContext = null;
        String keyStoreFile = System.getProperty("javax.net.ssl.keyStore");
        char[] keyStorePassword = System.getProperty("javax.net.ssl.keyStorePassword").toCharArray();
        try (InputStream keyStoreStream = Files.newInputStream(Paths.get(keyStoreFile))) {
            KeyStore keyStore = KeyStore.getInstance(KeyStore.getDefaultType());
            keyStore.load(keyStoreStream, keyStorePassword);
            KeyManagerFactory keyManagerFactory = KeyManagerFactory.getInstance("SunX509");
            keyManagerFactory.init(keyStore, keyStorePassword);

            KeyStore trustStore = KeyStore.getInstance(KeyStore.getDefaultType());
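            // Close the stream once the trust store is loaded.
            try (InputStream trustStoreStream = Files.newInputStream(Paths.get(keyStoreFile))) {
                trustStore.load(trustStoreStream, keyStorePassword);
            }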

            TrustManagerFactory trustManagerFactory = TrustManagerFactory.getInstance("SunX509");
            trustManagerFactory.init(trustStore);

            sslContext = SslContextBuilder.forClient().keyManager(keyManagerFactory)
                    .trustManager(trustManagerFactory)
                    .build();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return sslContext;
    }

### HttpWebClient.java

 public WebClient reactiveWebClientBuilder() {
        HttpClient httpClient = HttpClient.create(connectionProvider)
                .resolver(DefaultAddressResolverGroup.INSTANCE)
                .secure(spec -> spec.sslContext(armSSLContext))
                .compress(true)
                .metrics(true, s->s)
                .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 0)
                .option(ChannelOption.TCP_NODELAY, true)
                .doOnConnected(conn -> conn.addHandlerLast(new ReadTimeoutHandler(
                        (int) TimeUnit.MILLISECONDS.toSeconds(connectionTimeout.intValue())
                )))
                .runOn(loopResources);
        return WebClient.builder()
            .clientConnector(new ReactorClientHttpConnector(httpClient))
            .codecs(configurer -> configurer.defaultCodecs().maxInMemorySize(armProperties.getHttp().getByteBufferSize() *1024 * 1024)) // 6MB
            .build();
    }

### ResponseController

 @GetMapping("/response")
    public Mono<ResponseEntity<String>> getResponse(@RequestParam int size) {
        return reactiveWebClientBuilder.method(HttpMethod.GET)
                .uri(uriBuilder -> uriBuilder.scheme("https").host(armProp.getHttp().getHostname())
            .port(armProp.getHttp().getPort()).path("/path").build())
                .contentType(MediaType.APPLICATION_JSON)
                .headers(this::addHttpHeaders).retrieve()
                .onStatus(httpStatusCode -> HttpStatus.valueOf(httpStatusCode.value())
                        .isError(), t -> Mono.empty()).toEntity(String.class);
    }

Above are the classes used for this app, and below are the JVM arguments used for this application:
-Xmx3072M -Xss256K -Xms3072M -XX:+HeapDumpOnOutOfMemoryError -verbose:gc -XX:+ParallelRefProcEnabled -XX:ParallelGCThreads=4 -XX:ConcGCThreads=2 -XX:MaxGCPauseMillis=500 -XX:MetaspaceSize=64m -XX:MaxMetaspaceSize=512m -XX:+DisableExplicitGC -XX:MaxJavaStackTraceDepth=15 -Dspring.config.location=optional:classpath:/,optional:classpath:/config/ -Dorg.springframework.boot.logging.LoggingSystem=none

Possible Solution

Your Environment

  • reactor-netty-core: 1.2.0
  • netty: 4.1.111.Final
  • spring-framework: 6.1.13
  • spring-boot: 3.2.9

Below are system details

  • System Linux (5.4.0-200-generic)
  • OpenJDK 17.0.13
  • Kubernetes pod with 3 core CPU and 4 GB RAM
@abhishek-sharma-20 abhishek-sharma-20 added status/need-triage A new issue that still need to be evaluated as a whole type/bug A general bug labels Dec 22, 2024
@violetagg violetagg self-assigned this Dec 30, 2024
@violetagg violetagg added for/user-attention This issue needs user attention (feedback, rework, etc...) and removed status/need-triage A new issue that still need to be evaluated as a whole labels Dec 30, 2024
@violetagg
Member

@abhishek-sharma-20 Is it possible that you are hitting this issue: spring-projects/spring-framework#29772?

@abhishek-sharma-20
Author

Hi @violetagg, we did more analysis and we see the same issue on VMs as well. Because the VMs have shared memory we don't see any crashes there, but we observed the reactor_netty_bytebuf_allocator_used_direct_memory gauge going up to 1.3 GB.

@violetagg
Member

violetagg commented Jan 2, 2025

@abhishek-sharma-20 Can you specify your limit configuration for direct memory?
The issue shared above links to an explanation of how the direct memory limit is calculated: #2590 (comment)
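
For readers who want to check their own limit, here is a minimal sketch of how the effective direct memory limit can be estimated from the JVM arguments. It assumes the common HotSpot behaviour that, when `-XX:MaxDirectMemorySize` is not set, the limit defaults to the maximum heap size. The class and method names are hypothetical, and the sketch does not reproduce the explanation in the linked comment.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Hypothetical helper, for illustration only.
final class DirectMemoryLimit {
    static final String FLAG = "-XX:MaxDirectMemorySize=";

    /** Parses a JVM size value such as "512m", "1g", "65536k" or "1048576" into bytes. */
    public static long parseSize(String value) {
        String v = value.trim().toLowerCase();
        long multiplier = 1L;
        char unit = v.charAt(v.length() - 1);
        if (unit == 'k') {
            multiplier = 1024L;
        } else if (unit == 'm') {
            multiplier = 1024L * 1024;
        } else if (unit == 'g') {
            multiplier = 1024L * 1024 * 1024;
        }
        if (multiplier != 1L) {
            v = v.substring(0, v.length() - 1); // strip the unit suffix
        }
        return Long.parseLong(v) * multiplier;
    }

    /**
     * Returns the explicit limit when the flag is present, otherwise the
     * fallback (assumed here to be the maximum heap size).
     */
    public static long effectiveLimit(List<String> jvmArgs, long maxHeapBytes) {
        long limit = maxHeapBytes;
        for (String arg : jvmArgs) {
            if (arg.startsWith(FLAG)) {
                limit = parseSize(arg.substring(FLAG.length()));
            }
        }
        return limit;
    }

    public static void main(String[] args) {
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        long limit = effectiveLimit(jvmArgs, Runtime.getRuntime().maxMemory());
        System.out.println("Estimated direct memory limit: " + limit / (1024 * 1024) + " MB");
    }
}
```

Under that assumption, the JVM arguments listed in the report (`-Xmx3072M` and no `-XX:MaxDirectMemorySize`) would allow direct memory to grow to roughly the heap size, which is more than the pod has left over.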

@abhishek-sharma-20
Author

abhishek-sharma-20 commented Jan 2, 2025

Hi @violetagg, we don't have any configuration for direct memory. We have a memory limit of 4 GB for each pod. Our application starts with a 3 GB heap, and under load direct memory reaches 1 GB, which breaks the pod memory limit (3 GB heap + 1 GB direct memory) and restarts the pod.

Additionally, we have loop-resource=300 and each response is 5 MB, which equates to 300 * 5 MB = 1.5 GB at peak, and that is what we see as direct memory used. However, if direct memory is pooled, it should not reach 1.5 GB as per our understanding (please correct us if this is wrong).
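
The arithmetic in this comment can be written out as a small sketch. It takes the figures from the thread at face value (4 GB pod, `-Xmx3072M`, 300 concurrent responses of 5 MB each); the class and method names are hypothetical, and treating loop-resource=300 as 300 fully buffered responses is the reporter's worst-case assumption, not a measured value.

```java
// Hypothetical helper that restates the numbers quoted in the thread.
final class PodMemoryBudget {

    /** Memory left in the pod for everything outside the Java heap, in MB. */
    public static long nonHeapHeadroomMb(long podLimitMb, long heapMb) {
        return podLimitMb - heapMb;
    }

    /** Worst case: every concurrent response is fully buffered at the same time. */
    public static long peakBufferedMb(int concurrentResponses, long responseMb) {
        return concurrentResponses * responseMb;
    }

    public static void main(String[] args) {
        long headroom = nonHeapHeadroomMb(4096, 3072); // 4 GB pod, -Xmx3072M
        long peak = peakBufferedMb(300, 5);            // 300 responses of 5 MB each
        System.out.println("Non-heap headroom: " + headroom + " MB");
        System.out.println("Worst-case buffered responses: " + peak + " MB");
        System.out.println("Fits in pod: " + (peak <= headroom));
    }
}
```

With these inputs the headroom is 1024 MB and the worst-case estimate is 1500 MB, so the estimate does not fit. The headroom also has to cover metaspace (capped at 512 MB by the listed JVM arguments) and thread stacks, so the space actually available to direct memory is smaller than 1024 MB.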

@violetagg
Member

@abhishek-sharma-20 Then isn't this expected?

@violetagg
Member

violetagg commented Jan 2, 2025

> Additionally, we have loop-resource=300 and each response is 5 MB, which equates to 300 * 5 MB = 1.5 GB at peak, and that is what we see as direct memory used. However, if direct memory is pooled, it should not reach 1.5 GB as per our understanding (please correct us if this is wrong).

You do not have sequential requests/responses (I assume)? Direct memory is used not only for the data that comes with the request/response but also internally (TLS handshake, HTTP parsing, etc.). Spring Framework also uses it for decoding/encoding.

@abhishek-sharma-20
Author

Yes, that is correct. We wanted to understand whether direct memory will be released once the load (number of requests or size of responses) is reduced. In our case direct memory does not go down even after the load is reduced.

We are also running a performance test with fewer loop resources and will share more insights once the test is done.

@violetagg
Member

> Yes, that is correct. We wanted to understand whether direct memory will be released once the load (number of requests or size of responses) is reduced. In our case direct memory does not go down even after the load is reduced.

Check this: https://projectreactor.io/docs/netty/release/reference/http-client.html#metrics

reactor.netty.bytebuf.allocator.used.direct.memory will not be reduced, but reactor.netty.bytebuf.allocator.active.direct.memory will be.
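
To illustrate the difference between the two gauges, here is a toy model of a pooled allocator. It is a deliberately simplified sketch and not Netty's implementation: `ToyPool` and its methods are hypothetical names, and a real pooled allocator has arenas, thread-local caches and its own rules for when memory is returned. The only point is that a pool can keep memory reserved ("used") after the buffers carved from it ("active") have been released.

```java
// Toy model of a pooled allocator; NOT how Netty is implemented.
final class ToyPool {
    private final long chunkSize;
    private int freeChunks;     // chunks held by the pool and ready for reuse
    private long reservedBytes; // "used": memory the pool has obtained and still holds
    private long activeBytes;   // "active": memory inside buffers currently handed out

    public ToyPool(long chunkSize) {
        this.chunkSize = chunkSize;
    }

    /** Hands out one chunk, reusing a cached chunk when one is available. */
    public void allocate() {
        if (freeChunks > 0) {
            freeChunks--;               // reuse: reserved memory does not grow
        } else {
            reservedBytes += chunkSize; // nothing to reuse: the pool grows
        }
        activeBytes += chunkSize;
    }

    /** Returns one chunk to the pool; the memory stays reserved for later reuse. */
    public void release() {
        if (activeBytes == 0) {
            throw new IllegalStateException("nothing to release");
        }
        activeBytes -= chunkSize;
        freeChunks++;
    }

    public long used() {
        return reservedBytes;
    }

    public long active() {
        return activeBytes;
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(4L * 1024 * 1024);
        for (int i = 0; i < 3; i++) {
            pool.allocate();            // load: three buffers in flight
        }
        System.out.println("under load: used=" + pool.used() + " active=" + pool.active());
        for (int i = 0; i < 3; i++) {
            pool.release();             // load drops: all buffers returned
        }
        System.out.println("after load: used=" + pool.used() + " active=" + pool.active());
    }
}
```

In this model `used()` stays at its peak after the load drops while `active()` returns to zero, which matches the behaviour described for the two metrics above.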


If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 7 days this issue will be closed.

2 participants