GTFSManager
GTFS feed manager for validation, cleaning, analysis, modification, and visualization.
This class loads a General Transit Feed Specification (GTFS) dataset from a directory containing standard GTFS text files and provides features to:
- Check consistency between GTFS tables
- Clean unused or inconsistent entities
- Filter services or agencies and add idle times at stops or terminals
- Trim trip shapes to match terminal locations
- Export cleaned GTFS data
- Compute transit indicators and summary statistics
- Generate interactive maps and textual summary reports
Notes
- The calendar_dates.txt file is not supported. Service exceptions are therefore ignored.
Attributes:
| Name | Type | Description |
|---|---|---|
| gtfs_datafolder | Path | Absolute path to the GTFS data folder. |
| agency | DataFrame | Agency table. |
| routes | DataFrame | Routes table. |
| trips | DataFrame | Trips table. |
| stop_times | DataFrame | Stop times table. |
| stops | GeoDataFrame | Stops with point geometries. |
| shapes | GeoDataFrame | Shapes with LineString geometries. |
| calendar | DataFrame | Service calendar. |
| frequencies | DataFrame | Trip frequencies. |
Examples:
Attributes
agency (property)
GTFS agency table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Agency data. |
calendar (property)
GTFS calendar table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Calendar data. |
frequencies (property)
GTFS frequencies table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Frequencies data. |
gtfs_datafolder (property, writable)
Absolute path to the GTFS data folder.
Returns:
| Name | Type | Description |
|---|---|---|
| str | str | Absolute path to the GTFS data folder. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the GTFS data folder has not been initialized. |
routes (property)
GTFS routes table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Routes data. |
shapes (property)
GTFS shapes table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Shapes data. |
stop_times (property)
GTFS stop_times table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Stop times data. |
stops (property)
GTFS stops table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Stops data. |
trips (property)
GTFS trips table.
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Trips data. |
Functions
__init__(gtfs_datafolder)
Initialize a GTFSManager instance by loading all required GTFS tables.
This constructor validates the GTFS directory structure and loads all mandatory GTFS files into memory as pandas or GeoPandas DataFrames.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| gtfs_datafolder | str | Path to the directory containing the GTFS text files. | required |
Raises:
| Type | Description |
|---|---|
| FileNotFoundError | If the provided directory does not exist. |
| ValueError | If required GTFS files are missing or invalid. |
add_idle_time_stops(mean_idle_time_s, std_idle_time_s)
Add stochastic idle time at intermediate stops of each trip.
For each intermediate stop (excluding the first and last stop), a random idle time is drawn from a normal distribution defined by mean_idle_time_s and std_idle_time_s. The idle time is added to the departure time, increasing dwell time, and all subsequent stop times are shifted accordingly.
Idle times are clipped to non-negative values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mean_idle_time_s | float | Mean idle time at stops, in seconds. | required |
| std_idle_time_s | float | Standard deviation of idle time, in seconds. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Notes
- Arrival and departure times are converted to pandas datetime if needed.
- Trips with fewer than three stops are ignored.
- This operation modifies self.stop_times in place.
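The shifting logic described above can be sketched in plain Python for a single trip. This is only an illustration under stated assumptions, not the library's implementation: the function name `shift_with_stop_idle` is hypothetical, times are represented as seconds rather than pandas datetimes, and the function returns a new list instead of modifying data in place.

```python
import random


def shift_with_stop_idle(stop_times, mean_idle_time_s, std_idle_time_s, rng=None):
    """Hypothetical helper: add idle time at the intermediate stops of one trip.

    stop_times is a list of (arrival_s, departure_s) tuples ordered by stop
    sequence, with times in seconds (an assumption of this sketch).
    """
    rng = rng or random.Random()
    if len(stop_times) < 3:
        # No intermediate stops: the trip is left unchanged.
        return list(stop_times)

    shifted = []
    shift = 0.0  # idle time accumulated at earlier stops
    last = len(stop_times) - 1
    for i, (arrival, departure) in enumerate(stop_times):
        arrival += shift
        departure += shift
        if 0 < i < last:
            # Draw from a normal distribution and clip negative draws to zero.
            idle = max(0.0, rng.gauss(mean_idle_time_s, std_idle_time_s))
            departure += idle  # longer dwell at this stop
            shift += idle      # every later stop moves by the same amount
        shifted.append((arrival, departure))
    return shifted
```

Passing a seeded `random.Random` as `rng` makes the draws reproducible, which is convenient when comparing scenarios.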
add_idle_time_terminals(mean_idle_time_s, std_idle_time_s)
Add stochastic idle time at the terminals of each trip.
For each trip, a random idle time is drawn from a normal distribution defined by mean_idle_time_s and std_idle_time_s. The sampled idle time is split evenly between the first and last stops of the trip. All subsequent stop times are shifted accordingly to preserve temporal consistency.
Idle times are clipped to non-negative values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| mean_idle_time_s | float | Mean idle time at terminals, in seconds. | required |
| std_idle_time_s | float | Standard deviation of idle time, in seconds. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
Notes
- Arrival and departure times are converted to pandas datetime if needed.
- This operation modifies self.stop_times in place.
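One possible reading of the even split is sketched below for a single trip. The details are assumptions rather than documented behaviour: half of the sampled idle time is added to the departure of the first stop (shifting all later times), and the other half extends the departure of the last stop. The name `shift_with_terminal_idle` is hypothetical and times are plain seconds.

```python
import random


def shift_with_terminal_idle(stop_times, mean_idle_time_s, std_idle_time_s, rng=None):
    """Hypothetical helper: split one idle-time draw between both terminals.

    stop_times is a list of (arrival_s, departure_s) tuples ordered by stop
    sequence, with times in seconds (an assumption of this sketch).
    """
    rng = rng or random.Random()
    if not stop_times:
        return []

    # One draw per trip, clipped to be non-negative, then halved.
    idle = max(0.0, rng.gauss(mean_idle_time_s, std_idle_time_s))
    half = idle / 2.0

    shifted = []
    last = len(stop_times) - 1
    for i, (arrival, departure) in enumerate(stop_times):
        if i == 0:
            # First terminal: the vehicle waits half the idle time before leaving.
            shifted.append((arrival, departure + half))
        elif i < last:
            # Intermediate stops are only shifted, their dwell is unchanged.
            shifted.append((arrival + half, departure + half))
        else:
            # Last terminal: shifted arrival, plus the second half as extra dwell.
            shifted.append((arrival + half, departure + idle))
    return shifted
```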
ave_distance_between_stops(trip_id, correct_stop_loc=True)
Calculate the average distance between consecutive stops along a trip.
Distances are computed along the trip shape. Optionally, stop locations can be projected onto the closest point along the shape to improve alignment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trip_id | str | Identifier of the trip. | required |
| correct_stop_loc | bool | Whether to project stop locations onto the closest point along the trip shape. Defaults to True. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Average distance between stops in kilometers. |
Examples:
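A minimal sketch of the projection idea, assuming planar coordinates already expressed in kilometers. The helper names `project_along` and `average_stop_spacing` are hypothetical, and the sketch uses plain tuples instead of the class's geometries, so it illustrates the technique rather than the method itself.

```python
import math


def project_along(polyline, point):
    """Distance along `polyline` of the point on it closest to `point`."""
    px, py = point
    best_d2 = math.inf
    best_position = 0.0
    walked = 0.0  # length of the segments already traversed
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_len2 = dx * dx + dy * dy
        if seg_len2 == 0:
            continue  # skip duplicated vertices
        # Parameter of the orthogonal projection, clamped to the segment.
        t = ((px - x1) * dx + (py - y1) * dy) / seg_len2
        t = min(1.0, max(0.0, t))
        cx, cy = x1 + t * dx, y1 + t * dy
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        seg_len = math.sqrt(seg_len2)
        if d2 < best_d2:
            best_d2 = d2
            best_position = walked + t * seg_len
        walked += seg_len
    return best_position


def average_stop_spacing(polyline, stops):
    """Mean along-shape distance between consecutive stops."""
    if len(stops) < 2:
        raise ValueError("at least two stops are needed")
    positions = [project_along(polyline, stop) for stop in stops]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


# Three stops slightly off a straight 10 km shape.
spacing = average_stop_spacing([(0, 0), (10, 0)], [(0, 1), (4, -1), (10, 0.5)])
```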
ave_distance_between_stops_all(correct_stop_loc=True)
Compute the average distance between stops for all trips.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| correct_stop_loc | bool | Whether to project stop locations onto trip shapes. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
| DataFrame | pd.DataFrame: Trips table with average inter-stop distances and number of stops. |
Examples:
bounding_box()
check_agency()
check_all()
check_calendar()
check_frequencies()
check_routes()
check_shapes()
check_stop_times()
check_stops()
check_trips()
Check that each trip is associated with all required GTFS entities.
Each trip must reference:
- A valid route
- A valid service
- Stop times
- A shape
- A frequency entry
Returns:
| Name | Type | Description |
|---|---|---|
| bool | bool | True if a data consistency problem is detected, False otherwise. |
Notes
Trips without frequency definitions may cause issues for headway-based services.
Examples:
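A minimal sketch of the referential check described above, using plain Python sets rather than the class's DataFrames. All function and argument names are hypothetical. As in the documented return value, True signals that a problem was found.

```python
def find_trip_problems(trips, route_ids, service_ids, trips_with_stop_times,
                       shape_ids, trips_with_frequencies):
    """Map each problematic trip_id to the list of entities it is missing."""
    problems = {}
    for trip in trips:
        missing = []
        if trip["route_id"] not in route_ids:
            missing.append("route")
        if trip["service_id"] not in service_ids:
            missing.append("service")
        if trip["trip_id"] not in trips_with_stop_times:
            missing.append("stop_times")
        if trip.get("shape_id") not in shape_ids:
            missing.append("shape")
        if trip["trip_id"] not in trips_with_frequencies:
            missing.append("frequency")
        if missing:
            problems[trip["trip_id"]] = missing
    return problems


def has_trip_problems(*args):
    """True if at least one trip lacks a required reference."""
    return bool(find_trip_problems(*args))


trips = [
    {"trip_id": "T1", "route_id": "R1", "service_id": "WK", "shape_id": "S1"},
    {"trip_id": "T2", "route_id": "R9", "service_id": "WK", "shape_id": "S1"},
]
report = find_trip_problems(trips, {"R1"}, {"WK"}, {"T1", "T2"}, {"S1"}, {"T1"})
```

Returning the per-trip detail alongside the boolean makes it easier to see which reference is missing before deciding whether to clean or repair the feed.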
clean_all()
Execute all GTFS cleaning routines until the dataset becomes consistent.
This method repeatedly applies internal cleaning functions to remove unused or inconsistent entities (agencies, routes, trips, stops, stop times, frequencies, shapes, and services) until all data consistency checks pass.
The cleaning process is silent during iterations and stops as soon as the GTFS feed satisfies all validation rules.
Returns:
| Type | Description |
|---|---|
| None | None |
Examples:
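The "repeat until consistent" loop can be sketched on a toy feed with only three tables. This is an illustration under assumptions: the feed is a dict of lists of dicts, the helper names are hypothetical, and the real method covers more tables. The example shows why a single pass may not be enough, since removing an unused route can leave its agency unused.

```python
def is_consistent(feed):
    """Every agency has a route, every route has a trip, and all references resolve."""
    agency_ids = {a["agency_id"] for a in feed["agency"]}
    route_ids = {r["route_id"] for r in feed["routes"]}
    agencies_of_routes = {r["agency_id"] for r in feed["routes"]}
    routes_of_trips = {t["route_id"] for t in feed["trips"]}
    return agencies_of_routes == agency_ids and routes_of_trips == route_ids


def clean_once(feed):
    """One cleaning pass: drop unused agencies, orphan routes, orphan trips."""
    used_agencies = {r["agency_id"] for r in feed["routes"]}
    feed["agency"] = [a for a in feed["agency"] if a["agency_id"] in used_agencies]
    agency_ids = {a["agency_id"] for a in feed["agency"]}
    used_routes = {t["route_id"] for t in feed["trips"]}
    feed["routes"] = [r for r in feed["routes"]
                      if r["route_id"] in used_routes and r["agency_id"] in agency_ids]
    route_ids = {r["route_id"] for r in feed["routes"]}
    feed["trips"] = [t for t in feed["trips"] if t["route_id"] in route_ids]


def clean_until_consistent(feed, max_passes=100):
    """Apply clean_once until the checks pass; return the number of passes."""
    passes = 0
    while not is_consistent(feed):
        if passes >= max_passes:
            raise RuntimeError("feed did not converge")
        clean_once(feed)
        passes += 1
    return passes


feed = {
    "agency": [{"agency_id": "A1"}, {"agency_id": "A2"}],
    "routes": [{"route_id": "R1", "agency_id": "A1"},
               {"route_id": "R2", "agency_id": "A2"}],  # R2 has no trips
    "trips": [{"trip_id": "T1", "route_id": "R1"}],
}
n_passes = clean_until_consistent(feed)
```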
export_to_csv(output_folder)
Export the current GTFS dataset to standard GTFS CSV files.
This method writes all GTFS tables (agency, routes, trips, stops, stop_times, calendar, frequencies, and shapes) to disk using the official GTFS text file format. Time columns are converted back to HH:MM:SS strings, calendar booleans are exported as 0/1, and geometries are properly serialized.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| output_folder | str | Path to the directory where GTFS CSV files will be written. The directory is created if it does not already exist. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
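The two value conversions mentioned above can be sketched with the standard library. This assumes times are held as seconds since the start of the service day (the class itself works with pandas datetimes), and the helper names are hypothetical. Hours are allowed to exceed 23 because GTFS times may run past midnight.

```python
import csv
import io


def seconds_to_gtfs_time(total_seconds):
    """Format seconds since the start of the service day as HH:MM:SS."""
    total = int(round(total_seconds))
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def rows_to_gtfs_csv(rows):
    """Serialize a list of dicts as CSV text, writing booleans as 0/1."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({key: int(value) if isinstance(value, bool) else value
                         for key, value in row.items()})
    return buffer.getvalue()


calendar_csv = rows_to_gtfs_csv(
    [{"service_id": "WK", "monday": True, "sunday": False}]
)
```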
filter_agency(agency_id, clean_all=True)
Remove all GTFS data associated with a specific agency.
This method drops all routes belonging to the given agency_id. Optionally, it then cleans the entire dataset to remove any orphaned or unreferenced entities created by this filtering operation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| agency_id | str | Identifier of the agency to remove. | required |
| clean_all | bool | Whether to clean the dataset after filtering to restore consistency. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
| None | None |
Examples:
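A minimal sketch of the first step (dropping the agency's routes) and of why a follow-up cleaning pass is useful. It works on lists of dicts rather than the class's DataFrames, and `drop_agency_routes` is a hypothetical name.

```python
def drop_agency_routes(routes, trips, agency_id):
    """Return the routes kept after removing one agency, plus orphaned trip ids."""
    kept_routes = [r for r in routes if r["agency_id"] != agency_id]
    kept_route_ids = {r["route_id"] for r in kept_routes}
    # Trips that still point to a removed route are now orphaned.
    orphaned_trips = [t["trip_id"] for t in trips
                      if t["route_id"] not in kept_route_ids]
    return kept_routes, orphaned_trips


routes = [{"route_id": "R1", "agency_id": "A1"},
          {"route_id": "R2", "agency_id": "A2"}]
trips = [{"trip_id": "T1", "route_id": "R1"},
         {"trip_id": "T2", "route_id": "R2"}]
kept_routes, orphaned_trips = drop_agency_routes(routes, trips, "A2")
```

The orphaned trips (and, in turn, their stop times and frequencies) are the entities that the optional cleaning step is meant to remove.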
filter_services(service_id, clean_all=True)
Remove all GTFS data associated with a specific service.
This method drops all trips belonging to the given service_id. Optionally, it then cleans the entire dataset to remove any orphaned or unreferenced entities created by this filtering operation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| service_id | str | Identifier of the service to remove. | required |
| clean_all | bool | Whether to clean the dataset after filtering to restore consistency. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
| None | None |
Examples:
generate_network_map(filepath)
Generate an interactive HTML map visualizing the full GTFS network.
The map displays all route shapes as polylines and all stops as clustered markers. Each shape includes popup information such as associated trips, trip length, and number of stops.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path where the generated HTML map will be saved. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
generate_single_trip_map(trip_id, filepath='trip_map.html', projected=True)
Generate an interactive HTML map for a single trip.
The map displays the trip shape and all associated stops. Stops can optionally be projected onto the trip shape to visually align them with the route geometry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trip_id | str | Identifier of the trip to visualize. | required |
| filepath | str | Path where the HTML map will be saved. Defaults to "trip_map.html". | 'trip_map.html' |
| projected | bool | If True, stop locations are projected onto the trip shape geometry. Defaults to True. | True |
Returns:
| Type | Description |
|---|---|
| None | None |
Raises:
| Type | Description |
|---|---|
| IndexError | If the provided trip_id does not exist. |
generate_summary_report(filepath)
Generate a text-based summary report of key GTFS statistics.
This report includes general feed information, spatial extent, temporal frequency consistency, trip statistics, and stop statistics. The output is written to a plain text file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| filepath | str | Path where the summary report will be saved. | required |
Returns:
| Type | Description |
|---|---|
| None | None |
get_shape(trip_id)
Return the geometric shape associated with a given trip.
This method retrieves the LineString geometry corresponding to the shape used by the specified trip.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trip_id | str | Identifier of the trip. | required |
Returns:
| Type | Description |
|---|---|
| LineString | shapely.geometry.LineString: Geometry of the trip shape. |
Raises:
| Type | Description |
|---|---|
| IndexError | If the provided |
get_stop_locations(trip_id)
Return the stop locations associated with a given trip.
This method retrieves the geometries of all stops served by the specified trip, ordered according to the stop sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trip_id | str | Identifier of the trip. | required |
Returns:
| Type | Description |
|---|---|
| list | list[shapely.geometry.Point]: List of stop geometries corresponding to the trip. |
Raises:
| Type | Description |
|---|---|
| IndexError | If the provided |
n_stops(trip_id)
show_general_info()
Display a high-level summary of the GTFS feed.
This method prints aggregated information about the GTFS dataset, including the number of trips, routes, services, stops, stop times, frequencies, agencies, and shapes. It also provides basic consistency hints and temporal coverage information.
Returns:
| Type | Description |
|---|---|
| None | None |
Examples:
simulation_area_km2()
stop_frequencies()
stop_statistics()
trim_tripshapes_to_terminal_locations()
Trim trip shapes so that they start and end at the points closest to the terminal stops.
This method modifies shape geometries by projecting the first and last stops of each trip onto the corresponding route shape and trimming the shape accordingly. Stop locations themselves are not altered.
The trimming is performed once per shape, assuming that all trips sharing the same shape follow the same terminal alignment.
Returns:
| Type | Description |
|---|---|
| None | None |
Examples:
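A minimal sketch of the trimming step only, assuming planar coordinates and assuming the along-shape positions of the projected first and last stops are already known (computing them is not shown here). `cut_polyline` is a hypothetical helper operating on coordinate tuples, not the method's actual implementation.

```python
import math


def cut_polyline(polyline, start, end):
    """Return the part of `polyline` between along-line distances start and end."""
    trimmed = []
    walked = 0.0  # distance covered by the previous segments
    for (x1, y1), (x2, y2) in zip(polyline, polyline[1:]):
        seg_len = math.hypot(x2 - x1, y2 - y1)
        if seg_len == 0:
            continue  # skip duplicated vertices
        seg_start, seg_end = walked, walked + seg_len
        # Portion of this segment that falls inside [start, end].
        lo, hi = max(start, seg_start), min(end, seg_end)
        if lo <= hi:
            for distance in (lo, hi):
                t = (distance - seg_start) / seg_len
                point = (x1 + t * (x2 - x1), y1 + t * (y2 - y1))
                if not trimmed or trimmed[-1] != point:
                    trimmed.append(point)
        walked = seg_end
    return trimmed


# An L-shaped line of length 20, trimmed to the stretch between 2 and 15.
trimmed_shape = cut_polyline([(0, 0), (10, 0), (10, 10)], 2, 15)
```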
trip_duration_sec(trip_id)
Calculate the duration of a trip in seconds.
The duration is computed as the last arrival time minus the first departure time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trip_id | str | Identifier of the trip. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Trip duration in seconds. |
Examples:
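A minimal sketch of the calculation on raw GTFS records, assuming times are HH:MM:SS strings as in stop_times.txt. The helper names are hypothetical and the input is a list of dicts rather than the class's DataFrame. Hours above 23 are handled because GTFS allows trips to run past midnight.

```python
def gtfs_time_to_seconds(value):
    """Convert an HH:MM:SS string (hours may exceed 23) to seconds."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def duration_seconds(stop_times):
    """Last arrival minus first departure, ordered by stop_sequence."""
    ordered = sorted(stop_times, key=lambda record: record["stop_sequence"])
    first_departure = gtfs_time_to_seconds(ordered[0]["departure_time"])
    last_arrival = gtfs_time_to_seconds(ordered[-1]["arrival_time"])
    return float(last_arrival - first_departure)


records = [
    {"stop_sequence": 2, "arrival_time": "24:20:00", "departure_time": "24:20:00"},
    {"stop_sequence": 1, "arrival_time": "23:50:00", "departure_time": "23:50:00"},
]
duration = duration_seconds(records)
```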
trip_length_km(trip_id, geodesic=True)
Calculate the length of a trip in kilometers.
The trip length is computed using the geometry of the shape associated with the given trip.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| trip_id | str | Identifier of the trip. | required |
| geodesic | bool | Whether to use geodesic distance. Currently unused. Defaults to True. | True |
Returns:
| Name | Type | Description |
|---|---|---|
| float | float | Length of the trip in kilometers. |
Examples:
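One way to measure a shape given as latitude/longitude points is to sum great-circle (haversine) distances between consecutive vertices. This is a sketch of that technique on a spherical Earth, not necessarily what the method does internally, and the function names and the mean Earth radius constant are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def shape_length_km(points):
    """Sum of great-circle distances along a list of (lat, lon) vertices."""
    return sum(haversine_km(*p, *q) for p, q in zip(points, points[1:]))


# One degree of longitude along the equator.
length = shape_length_km([(0.0, 0.0), (0.0, 1.0)])
```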