Using separate CUDA streams for one session #23319

cozeybozey · 2025-01-10T15:34:50Z

Describe the issue

I have multiple threads calling session.run on a single session. I recently switched to pinned host memory and asynchronous memory copies, which works great. However, the asynchronous copies are issued on separate CUDA streams, and I noticed that session.run does not work with these streams. I can bind one CUDA stream to a session through the session options, but I want to use multiple CUDA streams, with a different one for every Run call. How can I achieve this? Or should I just use multiple sessions instead? That would mean keeping multiple instances of the same model in memory, which doesn't seem great.
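One way to keep the "multiple sessions" option bounded is to create a fixed number N of sessions, each bound to its own compute stream (for example through the `has_user_compute_stream` and `user_compute_stream` fields of `OrtCUDAProviderOptions`), and have each thread borrow one session/stream pair for the duration of its copy and Run call. Device memory then grows with N rather than with the thread count. The sketch below shows only the borrowing logic in standard C++. `StreamSlot`, `StreamSlotPool`, and `simulate_runs` are hypothetical names, and the ONNX Runtime and CUDA calls appear only as comments; this is an assumption-laden illustration, not a confirmed recipe for this issue.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for one (session, stream) pair. Assumption: in real code
// the session is created with OrtCUDAProviderOptions where
// has_user_compute_stream = 1 and user_compute_stream = stream.
struct StreamSlot {
    std::size_t id;  // slot index, used here only for bookkeeping
    void* stream;    // would hold a cudaStream_t in real code
};

// Fixed-size pool: a thread borrows a slot for one copy + Run, so at most
// N runs are in flight and no two concurrent runs share a stream.
class StreamSlotPool {
public:
    explicit StreamSlotPool(std::vector<StreamSlot> slots)
        : free_(std::move(slots)), capacity_(free_.size()) {}

    // RAII lease: returns its slot to the pool when destroyed.
    class Lease {
    public:
        Lease(StreamSlotPool& pool, StreamSlot slot) : pool_(&pool), slot_(slot) {}
        Lease(Lease&& other) noexcept : pool_(other.pool_), slot_(other.slot_) {
            other.pool_ = nullptr;  // moved-from lease no longer owns the slot
        }
        ~Lease() {
            if (pool_ != nullptr) pool_->release(slot_);
        }
        const StreamSlot& slot() const { return slot_; }

    private:
        StreamSlotPool* pool_;
        StreamSlot slot_;
    };

    // Blocks until a slot is free, then hands it out.
    Lease acquire() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return !free_.empty(); });
        StreamSlot s = free_.back();
        free_.pop_back();
        return Lease(*this, s);
    }

    std::size_t available() {
        std::lock_guard<std::mutex> lock(mu_);
        return free_.size();
    }

private:
    void release(const StreamSlot& s) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            assert(free_.size() < capacity_);  // fails if a slot is returned twice
            free_.push_back(s);
        }
        cv_.notify_one();
    }

    std::mutex mu_;
    std::condition_variable cv_;
    std::vector<StreamSlot> free_;
    std::size_t capacity_;
};

// Example driver: `threads` workers each perform one simulated copy + Run.
// Returns the peak number of runs observed in flight at the same time.
std::size_t simulate_runs(StreamSlotPool& pool, int threads) {
    std::atomic<std::size_t> in_flight{0};
    std::atomic<std::size_t> peak{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < threads; ++i) {
        workers.emplace_back([&pool, &in_flight, &peak] {
            StreamSlotPool::Lease lease = pool.acquire();
            std::size_t now = ++in_flight;
            std::size_t prev = peak.load();
            while (prev < now && !peak.compare_exchange_weak(prev, now)) {
            }
            // Real code (assumption): cudaMemcpyAsync from pinned memory on
            // lease.slot().stream, then Run on the session bound to that stream.
            std::this_thread::sleep_for(std::chrono::milliseconds(5));
            --in_flight;
        });
    }
    for (std::thread& w : workers) w.join();
    return peak.load();
}
```

With two slots and eight threads, at most two runs are in flight at once and each uses a different stream; raising N trades GPU memory for concurrency, which is the same trade-off raised in the question.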

To reproduce

--

Urgency

No response

Platform

Windows

OS Version

10

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

16.2

ONNX Runtime API

C++

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

No response

github-actions bot added the ep:CUDA label (issues related to the CUDA execution provider) on Jan 10, 2025
