Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

UndispatchedCoroutine leaks CoroutineContext via thread local when using custom ContinuationInterceptor #4296

Closed
pyricau opened this issue Dec 12, 2024 · 5 comments
Assignees
Labels

Comments

@pyricau
Copy link

pyricau commented Dec 12, 2024

Bug description

To fix #2930 (thread locals using ThreadContextElement not cleaned up), #3252 introduce a little hack (source) in UndispatchedCoroutine which immediately saves the previous thread context to UndispatchedCoroutine.threadStateToRecover within the UndispatchedCoroutine constructor, only when the ContinuationInterceptor is not a CoroutineDispatcher:

internal actual class UndispatchedCoroutine<in T>actual constructor () : ScopeCoroutine<T>() {
    init {
        if (uCont.context[ContinuationInterceptor] !is CoroutineDispatcher) {
            val values = updateThreadContext(context, null)
            restoreThreadContext(context, values)
            saveThreadContext(context, values)
        }
    }

If the continuation doesn't suspend, then UndispatchedCoroutine.afterResume() isn't invoked and UndispatchedCoroutine.threadStateToRecover.remove() is not called.

One thing of note, the javadoc for UndispatchedCoroutine.threadStateToRecover claims this:

     * Each access to Java's [ThreadLocal] leaves a footprint in the corresponding Thread's `ThreadLocalMap`
     * that is cleared automatically as soon as the associated thread-local (-> UndispatchedCoroutine) is garbage collected.

Unfortunately, that's not exactly correct. When the ThreadLocal is garbage collected, its value actually stays in the ThreadLocalMap, until either the thread itself gets garbage collected, or the ThreadLocal map cleans up stale entries, which only happens when using thread locals some more.

So when the right conditions apply, the CoroutineContext instance ends up being held in memory long after it's no longer needed.

To reproduce this, you need:

  • A CoroutineContext that contains a ContinuationInterceptor that is not a CoroutineDispatcher (to trigger the UndispatchedCoroutine constructor hack)
  • Call withContext(), changing the CoroutineContext but without changing the ContinuationInterceptor (to trigger the withContext() fast path 2 that starts the coroutine on the same ContinuationInterceptor, wrapping it with UndispatchedCoroutine.

Here's a repro: #4295

package kotlinx.coroutines

import kotlinx.coroutines.testing.*
import org.junit.Test
import java.lang.ref.*
import kotlin.coroutines.*
import kotlin.test.*

class CustomContinuationInterceptorTest : TestBase() {

    fun `CoroutineDispatcher not suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(Dispatchers.Default, suspend = false)

    @Test
    fun `CoroutineDispatcher suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(Dispatchers.Default, suspend = true)


    @Test
    fun `CustomContinuationInterceptor suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomContinuationInterceptor(Dispatchers.Default),
            suspend = true
        )

    // This is the one test that fails
    @Test
    fun `CustomContinuationInterceptor not suspending leaks CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomContinuationInterceptor(Dispatchers.Default),
            suspend = false
        )

    @Test
    fun `CustomNeverEqualContinuationInterceptor suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomNeverEqualContinuationInterceptor(Dispatchers.Default),
            suspend = true
        )

    @Test
    fun `CustomNeverEqualContinuationInterceptor not suspending does not leak CoroutineContext`() =
        ensureCoroutineContextGCed(
            CustomNeverEqualContinuationInterceptor(Dispatchers.Default),
            suspend = false
        )


    private fun ensureCoroutineContextGCed(coroutineContext: CoroutineContext, suspend: Boolean) {
        runTest {
            lateinit var ref: WeakReference<CoroutineName>
            val job = GlobalScope.launch(coroutineContext) {
                val coroutineName = CoroutineName("Yo")
                ref = WeakReference(coroutineName)
                withContext(coroutineName) {
                    if (suspend) {
                        delay(1)
                    }
                }
            }
            job.join()

            System.gc()
            assertNull(ref.get())
        }
    }

}

class CustomContinuationInterceptor(private val delegate: ContinuationInterceptor) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> {
        return delegate.interceptContinuation(continuation)
    }
}

class CustomNeverEqualContinuationInterceptor(private val delegate: ContinuationInterceptor) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> {
        return delegate.interceptContinuation(continuation)
    }

    override fun equals(other: Any?) = false
}

Fix

I see a few options:

Workaround

Override equals in any custom ContinuationInterceptor implementation and have it return always false, to skip the fast way that's creating the UndispatchedCoroutine creation.

How we found this

I ran into some Compose related memory leaks in Square's UI tests back in February, asked the Compose team if they knew anything about that. They didn't. Then yesterday folks from the Compose team reached out because they're enabling leak detection in their test suite and were running into the exact same leak. We worked together on analyzing a heap dump and debugging and eventually root cause it to this bug.

The Compose team is looking into whether they can start using CoroutineDispatcher implementations directly in Compose tests (for other reasons as well, e.g. see this issue)

Here's the leaktrace as reported by leakcanary:

====================================
HEAP ANALYSIS RESULT
====================================
1 APPLICATION LEAKS

References underlined with "~~~" are likely causes.
Learn more at https://squ.re/leaks.

44298 bytes retained by leaking objects
Signature: c4797f2a268bbe8b50072affe05b2f82ce29f5ed
┬───
│ GC Root: Thread object
│
├─ java.lang.Thread instance
│    Leaking: NO (the main thread always runs)
│    Thread name: 'main'
│    ↓ Thread.threadLocals
│             ~~~~~~~~~~~~
├─ java.lang.ThreadLocal$ThreadLocalMap instance
│    Leaking: UNKNOWN
│    Retaining 446.6 kB in 9221 objects
│    ↓ ThreadLocal$ThreadLocalMap.table
│                                 ~~~~~
├─ java.lang.ThreadLocal$ThreadLocalMap$Entry[] array
│    Leaking: UNKNOWN
│    Retaining 446.6 kB in 9220 objects
│    ↓ ThreadLocal$ThreadLocalMap$Entry[2]
│                                      ~~~
├─ java.lang.ThreadLocal$ThreadLocalMap$Entry instance
│    Leaking: UNKNOWN
│    Retaining 444.0 kB in 9136 objects
│    ↓ ThreadLocal$ThreadLocalMap$Entry.value
│                                       ~~~~~
├─ kotlin.Pair instance
│    Leaking: UNKNOWN
│    Retaining 444.0 kB in 9135 objects
│    ↓ Pair.first
│           ~~~~~
├─ kotlin.coroutines.CombinedContext instance
│    Leaking: UNKNOWN
│    Retaining 444.0 kB in 9134 objects
│    ↓ CombinedContext.left
│                      ~~~~
├─ kotlin.coroutines.CombinedContext instance
│    Leaking: UNKNOWN
│    Retaining 442.9 kB in 9112 objects
│    ↓ CombinedContext.left
│                      ~~~~
├─ kotlin.coroutines.CombinedContext instance
│    Leaking: UNKNOWN
│    Retaining 442.9 kB in 9111 objects
│    ↓ CombinedContext.element
│                      ~~~~~~~
├─ kotlinx.coroutines.internal.ScopeCoroutine instance
│    Leaking: UNKNOWN
│    Retaining 441.0 kB in 9040 objects
│    ↓ ScopeCoroutine.uCont
│                     ~~~~~
├─ androidx.compose.foundation.gestures.DefaultScrollableState$scroll$2 instance
│    Leaking: UNKNOWN
│    Retaining 441.0 kB in 9038 objects
│    Anonymous subclass of kotlin.coroutines.jvm.internal.SuspendLambda
│    ↓ DefaultScrollableState$scroll$2.$block
│                                      ~~~~~~
├─ androidx.compose.foundation.gestures.ScrollingLogic$scroll$2 instance
│    Leaking: UNKNOWN
│    Retaining 104 B in 2 objects
│    Anonymous subclass of kotlin.coroutines.jvm.internal.SuspendLambda
│    ↓ ScrollingLogic$scroll$2.this$0
│                              ~~~~~~
├─ androidx.compose.foundation.gestures.ScrollingLogic instance
│    Leaking: UNKNOWN
│    Retaining 796 B in 33 objects
│    ↓ ScrollingLogic.overscrollEffect
│                     ~~~~~~~~~~~~~~~~
├─ androidx.compose.foundation.AndroidEdgeEffectOverscrollEffect instance
│    Leaking: UNKNOWN
│    Retaining 484 B in 12 objects
│    ↓ AndroidEdgeEffectOverscrollEffect.edgeEffectWrapper
│                                        ~~~~~~~~~~~~~~~~~
├─ androidx.compose.foundation.EdgeEffectWrapper instance
│    Leaking: UNKNOWN
│    Retaining 56 B in 1 objects
│    context instance of androidx.activity.ComponentActivity with mDestroyed = true
│    ↓ EdgeEffectWrapper.context
│                        ~~~~~~~
╰→ androidx.activity.ComponentActivity instance
     Leaking: YES (ObjectWatcher was watching this because androidx.activity.ComponentActivity received Activity#onDestroy() callback and Activity#mDestroyed is true)
     Retaining 44.3 kB in 797 objects
     key = 6bdc725d-e124-437c-895d-437bf03213af
     watchDurationMillis = 5182
     retainedDurationMillis = 174
     mApplication instance of android.app.Application
     mBase instance of android.app.ContextImpl

For anyone with access to the ASG slack, here's the investigation thread: https://androidstudygroup.slack.com/archives/CJH03QASH/p1733875296139909?thread_ts=1708204699.328019&cid=CJH03QASH

@qwwdfsad
Copy link
Contributor

Thanks for a really comprehensive and easy-to-follow bug report!

The issue is clear, and the potential solution is easy to implement (though hard to accept :)) and is hidden in the problem statement: If the continuation doesn't suspend, then UndispatchedCoroutine.afterResume() isn't invoked.

The undispatched handling path should just be aware of that -- either by having an explicit cleanup function in ScopedCoroutine` or by if'ing the actual implementation

@qwwdfsad
Copy link
Contributor

I have a prototype that passes your tests (again, thanks!), I'll polish it up tomorrow morning and open a PR. It should make it to 1.10 coroutines with Kotlin 2.1

qwwdfsad added a commit that referenced this issue Dec 18, 2024
…d continuations

There was a one codepath not covered by undispatched thread local cleanup procedure: when a custom ContinuationInterceptor is used and the scoped coroutine (i.e. withContext) is completed in-place without suspensions.

Fixed with the introduction of the corresponding machinery for ScopeCoroutine

Fixes #4296
@pyricau
Copy link
Author

pyricau commented Dec 21, 2024

Thank you for fixing!!

@liutikas
Copy link

liutikas commented Dec 21, 2024

Could I request a backport for 1.8.x, we'd like to pick up this fix in compose without forcing our clients to kotlin 2.0?

@dkhalanskyjb
Copy link
Collaborator

@liutikas, typically, we don't publish any backports for kotlinx.coroutines. From the original submission, it looks like you do have a workaround, so this is not critical, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants