The Bug / Problem:
When distributed tracing is enabled in Ray, any REST call to a Serve deployment, or to a generic endpoint such as /-/routes, hangs and never returns a response. Details below.
Expected Behavior
Calls to Serve REST endpoints and other generic endpoints should not hang; they should return the expected response.
Useful Information / Details
When distributed tracing is enabled, part of the tracing functionality works: spans are reported to our Tempo instances. Only the REST calls fail. After debugging, we found the cause: the actor method being called has no _ray_trace_ctx parameter, so invoking it with that keyword argument raises a TypeError. The detailed stack trace is below.
I had to make the changes shown in this PR to get it working. Could you confirm whether these are the right changes to make, and if not, suggest or guide us toward the right ones?
Exception in callback <function LongPollClient._process_update.<locals>.chained at 0x12e36caf0>
handle: <Handle LongPollClient._process_update.<locals>.chained>
Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 61, in uvloop.loop.Handle._run
  File "/lib/python3.10/site-packages/ray/serve/_private/long_poll.py", line 172, in chained
    callback(arg)
  File "/lib/python3.10/site-packages/ray/serve/_private/router.py", line 420, in update_running_replicas
    self._replica_scheduler.update_running_replicas(running_replicas)
  File "/lib/python3.10/site-packages/ray/serve/_private/replica_scheduler/replica_scheduler.py", line 33, in update_running_replicas
    return self.update_replicas(
  File "/lib/python3.10/site-packages/ray/serve/_private/replica_scheduler/pow_2_scheduler.py", line 294, in update_replicas
    r.push_proxy_handle(self._self_actor_handle)
  File "/lib/python3.10/site-packages/ray/serve/_private/replica_scheduler/replica_wrapper.py", line 64, in push_proxy_handle
    self._actor_handle.push_proxy_handle.remote(handle)
  File "/lib/python3.10/site-packages/ray/actor.py", line 202, in remote
    return self._remote(args, kwargs)
  File "/lib/python3.10/site-packages/ray/_private/auto_init_hook.py", line 21, in auto_init_wrapper
    return fn(*args, **kwargs)
  File "/lib/python3.10/site-packages/ray/util/tracing/tracing_helper.py", line 445, in _start_span
    return method(self, args, kwargs, *_args, **_kwargs)
  File "/lib/python3.10/site-packages/ray/actor.py", line 345, in _remote
    return invocation(args, kwargs)
  File "/lib/python3.10/site-packages/ray/actor.py", line 326, in invocation
    return actor._actor_method_call(
  File "/lib/python3.10/site-packages/ray/actor.py", line 1452, in _actor_method_call
    list_args = signature.flatten_args(function_signature, args, kwargs)
  File "/lib/python3.10/site-packages/ray/_private/signature.py", line 110, in flatten_args
    raise TypeError(str(exc)) from None
TypeError: got an unexpected keyword argument '_ray_trace_ctx'
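To make the failure mode concrete, here is a minimal, self-contained sketch that uses only Python's inspect module. It assumes, based on the flatten_args frame and the error text, that Ray validates actor-call arguments by binding them to the method's recorded signature, and that the tracing wrapper (the _start_span frame) adds a _ray_trace_ctx keyword to each call. The helpers method_signature, validate_call, with_trace_ctx, traced_kwargs and try_call are hypothetical names for illustration, not Ray APIs, and push_proxy_handle here is only a stand-in for the replica method in the trace.

```python
import inspect


def push_proxy_handle(self, handle):
    """Hypothetical stand-in for the replica actor method named in the trace."""


def method_signature(func):
    # Drop `self`: callers of an actor method never pass it explicitly.
    params = list(inspect.signature(func).parameters.values())[1:]
    return inspect.Signature(parameters=params)


def validate_call(signature, args, kwargs):
    # Simplified model of argument validation: bind the call arguments to
    # the signature and re-raise any binding error as a plain TypeError.
    try:
        signature.bind(*args, **kwargs)
    except TypeError as exc:
        raise TypeError(str(exc)) from None


def with_trace_ctx(signature):
    # Hypothetical fix model: append an optional keyword-only
    # `_ray_trace_ctx` parameter so the injected keyword can bind.
    extra = inspect.Parameter(
        "_ray_trace_ctx", inspect.Parameter.KEYWORD_ONLY, default=None
    )
    return signature.replace(parameters=[*signature.parameters.values(), extra])


def traced_kwargs(kwargs):
    # Model of the tracing wrapper: inject a (here empty) trace context dict.
    return {**kwargs, "_ray_trace_ctx": {}}


def try_call(signature):
    # Simulate push_proxy_handle.remote(handle) with tracing enabled.
    try:
        validate_call(signature, ("proxy-handle",), traced_kwargs({}))
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    plain = method_signature(push_proxy_handle)
    print(try_call(plain))                  # fails like the trace above
    print(try_call(with_trace_ctx(plain)))  # binds once the parameter exists
```

Under this model, binding fails with the same message as in the trace when the caller-side signature lacks _ray_trace_ctx, and succeeds once an optional _ray_trace_ctx parameter is present.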
Versions / Dependencies
uname: Linux x86_64
ray --version: ray, version 2.40.0
Reproduction script
Enable distributed tracing on a Ray cluster that runs Ray Serve deployments, then call any inference endpoint (or a generic endpoint such as /-/routes); the request hangs.
Issue Severity
Medium: It is a significant difficulty but I can work around it.
venkatkalluru added the bug and triage labels on Jan 8, 2025.