Proposed Best Practice: always including trip_id in TripDescriptor for SCHEDULED trips #465

Open
isabelle-dr opened this issue May 24, 2024 · 8 comments
Labels
Change: Best Practice Changes focusing on recommendations for optimal use of the specification. GTFS Realtime Issues and Pull Requests that focus on GTFS Realtime Status: Discussion Issues and Pull Requests that are currently being discussed and reviewed by the community.

Comments

@isabelle-dr
Collaborator

isabelle-dr commented May 24, 2024

Context

This issue is part of an effort to bring over the outstanding issues we've identified in the Best Practices repos.

Issue

In the Realtime spec, producers can identify trips either 1) by providing a trip_id that corresponds to the trip_id in GTFS Schedule, or 2) by including all of route_id, direction_id, start_date, and start_time instead.

However, based on conversation in #gtfs-realtime (you can join the Slack here), option 1 (using trip_id) is easiest and most commonly used by consumers. Option 2 causes headaches and sometimes isn't supported by consumers at all. As a result, it would make sense to recommend that producers use trip_id in all cases.
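To make the difference between the two options concrete, here is a minimal Python sketch of what a consumer has to do in each case. The `TripDescriptor` dataclass, the `STATIC_TRIPS` dict, and `resolve_trip` are hypothetical stand-ins invented for illustration (not the official protobuf bindings), and the sketch ignores service calendars entirely.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TripDescriptor:
    """Hypothetical stand-in for the GTFS Realtime TripDescriptor message."""
    trip_id: Optional[str] = None
    route_id: Optional[str] = None
    direction_id: Optional[int] = None
    start_date: Optional[str] = None  # YYYYMMDD
    start_time: Optional[str] = None  # HH:MM:SS


# Made-up static data: trip_id -> (route_id, direction_id, first departure time)
STATIC_TRIPS = {
    "t1": ("10", 0, "08:00:00"),
    "t2": ("10", 0, "08:30:00"),
    "t3": ("10", 1, "08:00:00"),
}


def resolve_trip(td: TripDescriptor) -> Optional[str]:
    """Return the matching static trip_id, or None if no unique match exists."""
    # Option 1: a single dictionary lookup.
    if td.trip_id is not None:
        return td.trip_id if td.trip_id in STATIC_TRIPS else None

    # Option 2: all four fields must be present before matching is attempted.
    if None in (td.route_id, td.direction_id, td.start_date, td.start_time):
        return None

    # Assumption: a real consumer would also check that the trip's service
    # runs on start_date; this sketch skips that step.
    wanted = (td.route_id, td.direction_id, td.start_time)
    matches = [tid for tid, key in STATIC_TRIPS.items() if key == wanted]
    return matches[0] if len(matches) == 1 else None
```

Even in this simplified form, option 2 needs a scan (or an extra index) plus a uniqueness check, whereas option 1 is a direct lookup.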

Proposed solution

Add a mention in the TripDescriptor trip_id description that for SCHEDULED trips that are not frequency-based, the identification of the trip should be done via trip_id.
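As a rough illustration of what the proposed wording would mean for a feed check, the sketch below flags descriptors that would fall short of the recommendation. The function name and its parameters are assumptions made for this example; in particular, `is_frequency_based` is assumed to be determined by the caller (for instance from frequencies.txt), and `schedule_relationship` is passed as a plain string.

```python
from typing import Optional


def violates_trip_id_practice(
    schedule_relationship: str,
    trip_id: Optional[str],
    is_frequency_based: bool,
) -> bool:
    """Return True when the proposal would expect a trip_id but none is given."""
    # The proposal only covers SCHEDULED trips that are not frequency-based.
    if schedule_relationship != "SCHEDULED" or is_frequency_based:
        return False
    # A missing or empty trip_id counts as a violation.
    return not trip_id
```

For example, `violates_trip_id_practice("SCHEDULED", None, False)` returns `True`, while the same call with `"ADDED"` returns `False`.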

Tagging folks involved on Slack: @leonardehrenfried @e-lo @lauriemerrell @gcamp @doconnoronca @willcanderson

@isabelle-dr isabelle-dr added GTFS Realtime Issues and Pull Requests that focus on GTFS Realtime Change: Best Practice Changes focusing on recommendations for optimal use of the specification. labels May 24, 2024
@doconnoronca
Contributor

TransSee requires trip_id, but I haven't seen a feed without it, even for unscheduled trips. TransSee also benefits significantly from having route_id included.

@leonardehrenfried
Contributor

@doconnoronca Doesn't the static GTFS contain the relationship from trip to route?

@doconnoronca
Contributor

> @doconnoronca Doesn't the static GTFS contain the relationship from trip to route?

Yes, but it is an extra query to look it up. It is also needed for added trips.
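For reference, the extra lookup being discussed could look something like the sketch below, which builds a trip-to-route index from trips.txt. The sample rows and the `build_route_index` helper are made up for illustration; a real consumer would read the file from the GTFS Schedule zip.

```python
import csv
import io

# Made-up excerpt of trips.txt for illustration only.
TRIPS_TXT = """route_id,service_id,trip_id
10,WKDY,t1
20,WKDY,t2
"""


def build_route_index(trips_txt: str) -> dict:
    """Map each trip_id in trips.txt to its route_id."""
    reader = csv.DictReader(io.StringIO(trips_txt))
    return {row["trip_id"]: row["route_id"] for row in reader}


route_by_trip = build_route_index(TRIPS_TXT)

# A scheduled trip resolves through the index...
scheduled_route = route_by_trip.get("t1")
# ...but an added trip has no row in trips.txt, so the index cannot help.
added_route = route_by_trip.get("added-123")
```

Here `scheduled_route` is `"10"` and `added_route` is `None`, which is the case where route_id in the realtime feed is the only source of the route.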

@leonardehrenfried
Contributor

What if the route_id in a SCHEDULED trip update doesn't match what's in the GTFS?

@doconnoronca
Contributor

> What if the route_id in a SCHEDULED trip update doesn't match what's in the GTFS?

If that happens it's probably a symptom of bigger problems. The increased performance is worth the risk.
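A consumer that wants to notice such a mismatch without giving up the fast path could run a cheap check along these lines. This is only a sketch; `route_id_conflicts` and its arguments are hypothetical names, and `static_route_by_trip` is assumed to be a dict built from trips.txt.

```python
from typing import Mapping, Optional


def route_id_conflicts(
    static_route_by_trip: Mapping[str, str],
    trip_id: str,
    realtime_route_id: Optional[str],
) -> bool:
    """Return True only when both sources name a route and they disagree."""
    static_route_id = static_route_by_trip.get(trip_id)
    # If either side is unknown (e.g. an added trip), there is nothing to compare.
    if static_route_id is None or realtime_route_id is None:
        return False
    return static_route_id != realtime_route_id
```

A `True` result could simply be logged as a data-quality warning rather than causing the update to be rejected.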

@willcanderson

willcanderson commented May 24, 2024

I think this makes sense as a best practice. If most consumers only support matching on trip_id, or greatly prefer that, it is important for producers to know that.

I will describe below why this proposed best practice has been difficult for my agency, and what we are doing about it. I don't think the difficulty means we shouldn't make this a best practice; I'm just noting some implications.

Why this proposed best practice can be difficult

The proposed best practice of including trip_id in TripDescriptor for SCHEDULED trips has a tricky interaction with the GTFS Schedule best practice of including both the current and upcoming schedule in a single GTFS Schedule file.

Each trip_id value in trips.txt must be unique. At my agency it is not feasible to generate new trip_id values for minor schedule revisions. This means we must modify the trip_id values in GTFS Schedule in order to make them unique when we merge the current and upcoming schedules. Modifying the values makes them fall out of sync with the trip_id values in our realtime data sources.
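To illustrate how the values drift apart, here is a small sketch of merging two schedules while keeping trip_id unique. The suffix-based renaming scheme and the `merge_trips` helper are assumptions made for this example, not a description of any agency's actual process.

```python
def merge_trips(current, upcoming, suffix="_next"):
    """Merge two lists of trips.txt rows (dicts), renaming colliding trip_ids.

    Returns the merged rows and a mapping of original -> renamed trip_id
    for the upcoming schedule.
    """
    merged = [dict(row) for row in current]
    used = {row["trip_id"] for row in current}
    renamed = {}
    for row in upcoming:
        new_row = dict(row)
        trip_id = row["trip_id"]
        if trip_id in used:
            # Hypothetical scheme: append a suffix until the id is unique.
            new_id = trip_id + suffix
            while new_id in used:
                new_id += suffix
            new_row["trip_id"] = new_id
            renamed[trip_id] = new_id
        used.add(new_row["trip_id"])
        merged.append(new_row)
    return merged, renamed


current = [{"trip_id": "100", "service_id": "SPRING"}]
upcoming = [
    {"trip_id": "100", "service_id": "SUMMER"},
    {"trip_id": "200", "service_id": "SUMMER"},
]
merged, renamed = merge_trips(current, upcoming)
```

After the merge, the upcoming trip "100" is published as "100_next", while the realtime source still knows it as "100".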

Options for data producers

At my agency, we are currently writing some code that ingests the data coming out of our trackers and rewrites trip_id values to match the ones in GTFS Schedule. Other agencies have described doing something similar.
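A rewriting step of this kind might be sketched as follows. Everything here is an assumption for illustration: the tracker is assumed to emit the original, unmodified trip_id, the upcoming schedule is assumed to take effect on a single date, and `renamed` is assumed to be a mapping from original to published trip_id values for that upcoming schedule.

```python
def rewrite_trip_id(tracker_trip_id, service_date, upcoming_start_date, renamed):
    """Translate a tracker trip_id into the value published in GTFS Schedule.

    service_date and upcoming_start_date are YYYYMMDD strings, which compare
    correctly as plain strings.
    """
    if service_date >= upcoming_start_date:
        # The upcoming schedule is in effect: use the renamed id if one exists.
        return renamed.get(tracker_trip_id, tracker_trip_id)
    # The current schedule is in effect: ids were published unchanged.
    return tracker_trip_id


renamed = {"100": "100_next"}
before_change = rewrite_trip_id("100", "20240601", "20240615", renamed)
after_change = rewrite_trip_id("100", "20240615", "20240615", renamed)
```

With these inputs, `before_change` is `"100"` and `after_change` is `"100_next"`.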

But if the industry is working toward a future where having matching trip_id values across GTFS Schedule and Realtime is easy for producers, even if they are merging two schedules for GTFS Schedule and even if their scheduling and tracking tools are from different vendors, it may be worth revisiting the conversation about whether to make the primary key for trips.txt (service_id, trip_id) instead of (trip_id).
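As a sketch of what a composite key would allow, the example below indexes trips by (service_id, trip_id), so the same trip_id can appear once per service without renaming. This is hypothetical: it is not valid under the current specification, where trip_id alone must be unique, and the rows and helper name are made up.

```python
rows = [
    {"service_id": "SPRING", "trip_id": "100", "route_id": "10"},
    {"service_id": "SUMMER", "trip_id": "100", "route_id": "10"},
]


def index_by_composite_key(rows):
    """Index trips by (service_id, trip_id), rejecting true duplicates."""
    index = {}
    for row in rows:
        key = (row["service_id"], row["trip_id"])
        if key in index:
            raise ValueError(f"duplicate key: {key}")
        index[key] = row
    return index


index = index_by_composite_key(rows)
```

Both rows are accepted here, but a realtime consumer would then need some way to determine the service_id (for example from the service date) in order to resolve a trip_id.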

Alternatively, I wonder if getting realtime tracker systems to use the Operational Data Standard for their schedule information would help. Presumably, if ODS is a superset of GTFS, trip_id values must match. But I haven't wrapped my head around the question of whether ODS would include both current and upcoming schedule info in a way that would correspond to the public GTFS Schedule file.

@isabelle-dr isabelle-dr added the Status: Discussion Issues and Pull Requests that are currently being discussed and reviewed by the community. label Jul 18, 2024
@isabelle-dr
Collaborator Author

This issue could be of use for this discussion: #462

@skinkie
Contributor

skinkie commented Aug 1, 2024

In addition to what @willcanderson wrote, I think what is fundamentally missing is the relationship between the GTFS Static version and GTFS-RT. This is #434.

This is an "everything is connected" situation where a broader vision is important.


5 participants