
GTFSManager

GTFS feed manager for validation, cleaning, analysis, modification, and visualization.

This class loads a General Transit Feed Specification (GTFS) dataset from a directory containing standard GTFS text files and provides features to:

  • Check consistency between GTFS tables
  • Clean unused or inconsistent entities
  • Filter services or agencies and add idle times at stops or terminals
  • Trim trip shapes to match terminal locations
  • Export cleaned GTFS data
  • Compute transit indicators and summary statistics
  • Generate interactive maps and textual summary reports
Notes
  • The calendar_dates.txt file is not supported. Service exceptions are therefore ignored.

Attributes:

Name Type Description
gtfs_datafolder Path

Absolute path to the GTFS data folder.

agency DataFrame

Agency table.

routes DataFrame

Routes table.

trips DataFrame

Trips table.

stop_times DataFrame

Stop times table.

stops GeoDataFrame

Stops with point geometries.

shapes GeoDataFrame

Shapes with LineString geometries.

calendar DataFrame

Service calendar.

frequencies DataFrame

Trip frequencies.

Examples:

>>> manager = GTFSManager("data/gtfs")
>>> manager.check_all()
True
>>> manager.show_general_info()

Attributes

agency property

GTFS agency table.

Returns:

Type Description
DataFrame

pd.DataFrame: Agency data as defined in agency.txt.

calendar property

GTFS calendar table.

Returns:

Type Description
DataFrame

pd.DataFrame: Service calendar data from calendar.txt.

frequencies property

GTFS frequencies table.

Returns:

Type Description
DataFrame

pd.DataFrame: Frequencies data from frequencies.txt.

gtfs_datafolder property writable

Absolute path to the GTFS data folder.

Returns:

Name Type Description
str str

Absolute path to the GTFS data folder.

Raises:

Type Description
ValueError

If the GTFS data folder has not been initialized.

routes property

GTFS routes table.

Returns:

Type Description
DataFrame

pd.DataFrame: Routes data from routes.txt.

shapes property

GTFS shapes table.

Returns:

Type Description
GeoDataFrame

gpd.GeoDataFrame: Shapes data from shapes.txt, with LineString geometries.

stop_times property

GTFS stop_times table.

Returns:

Type Description
DataFrame

pd.DataFrame: Stop times data from stop_times.txt.

stops property

GTFS stops table.

Returns:

Type Description
GeoDataFrame

gpd.GeoDataFrame: Stops data from stops.txt, with point geometries.

trips property

GTFS trips table.

Returns:

Type Description
DataFrame

pd.DataFrame: Trips data from trips.txt.

Functions

__init__(gtfs_datafolder)

Initialize a GTFSManager instance by loading all required GTFS tables.

This constructor validates the GTFS directory structure and loads all mandatory GTFS files into memory as pandas or GeoPandas DataFrames.

Parameters:

Name Type Description Default
gtfs_datafolder str

Path to the directory containing the GTFS text files.

required

Raises:

Type Description
FileNotFoundError

If the provided directory does not exist.

ValueError

If required GTFS files are missing or invalid.

add_idle_time_stops(mean_idle_time_s, std_idle_time_s)

Add stochastic idle time at intermediate stops of each trip.

For each intermediate stop (excluding the first and last stop), a random idle time is drawn from a normal distribution defined by mean_idle_time_s and std_idle_time_s. The idle time is added to the departure time, increasing dwell time, and all subsequent stop times are shifted accordingly.

Idle times are clipped to non-negative values.

Parameters:

Name Type Description Default
mean_idle_time_s float

Mean idle time at stops, in seconds.

required
std_idle_time_s float

Standard deviation of idle time, in seconds.

required

Returns:

Type Description
None

None

Notes
  • Arrival and departure times are converted to pandas datetime if needed.
  • Trips with fewer than three stops are ignored.
  • This operation modifies self.stop_times in place.
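
The shifting logic described above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: it assumes one trip's arrival and departure times are given as lists of seconds, and the `rng` parameter is a hypothetical addition that makes the sampling reproducible.

```python
import random


def add_idle_time_stops(arrivals, departures, mean_idle_time_s, std_idle_time_s, rng=None):
    """Return new (arrivals, departures) lists with idle time at intermediate stops.

    Hypothetical helper: times are plain numbers of seconds for a single trip.
    """
    rng = rng or random.Random()
    arrivals, departures = list(arrivals), list(departures)
    if len(arrivals) < 3:
        # No intermediate stop exists, so the trip is left untouched.
        return arrivals, departures

    shift = 0.0  # delay accumulated at earlier stops
    last = len(arrivals) - 1
    for i in range(len(arrivals)):
        arrivals[i] += shift
        if 0 < i < last:
            # Draw from a normal distribution and clip negative draws to zero.
            idle = max(0.0, rng.gauss(mean_idle_time_s, std_idle_time_s))
            shift += idle
        # The departure carries the new idle time, which lengthens the dwell.
        departures[i] += shift
    return arrivals, departures
```

With a standard deviation of zero the result is deterministic: every intermediate stop gains exactly `mean_idle_time_s` of dwell, and each later stop is pushed back by the sum of the idle times before it.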

add_idle_time_terminals(mean_idle_time_s, std_idle_time_s)

Add stochastic idle time at the terminals of each trip.

For each trip, a random idle time is drawn from a normal distribution defined by mean_idle_time_s and std_idle_time_s. The sampled idle time is split evenly between the first and last stops of the trip. All subsequent stop times are shifted accordingly to preserve temporal consistency.

Idle times are clipped to non-negative values.

Parameters:

Name Type Description Default
mean_idle_time_s float

Mean idle time at terminals, in seconds.

required
std_idle_time_s float

Standard deviation of idle time, in seconds.

required

Returns:

Type Description
None

None

Notes
  • Arrival and departure times are converted to pandas datetime if needed.
  • This operation modifies self.stop_times in place.
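
A minimal sketch of the even split, under stated assumptions: times are plain seconds for a single trip, the first half of the sampled idle time lengthens the dwell at the first stop, and the second half lengthens the dwell at the last stop. The `rng` parameter is hypothetical, and the actual method may apply the split differently.

```python
import random


def add_idle_time_terminals(arrivals, departures, mean_idle_time_s, std_idle_time_s, rng=None):
    """Return new (arrivals, departures) lists with idle time at both terminals."""
    if not arrivals:
        return [], []
    rng = rng or random.Random()
    # One draw per trip, clipped to be non-negative, then split in two.
    idle = max(0.0, rng.gauss(mean_idle_time_s, std_idle_time_s))
    half = idle / 2.0

    # First terminal: arrival is unchanged, every later time moves by `half`.
    new_arrivals = [arrivals[0]] + [t + half for t in arrivals[1:]]
    new_departures = [t + half for t in departures]
    # Last terminal: its departure receives the second half as extra dwell.
    new_departures[-1] += half
    return new_arrivals, new_departures
```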

ave_distance_between_stops(trip_id, correct_stop_loc=True)

Calculate the average distance between consecutive stops along a trip.

Distances are computed along the trip shape. Optionally, stop locations can be projected onto the closest point along the shape to improve alignment.

Parameters:

Name Type Description Default
trip_id str

Identifier of the trip.

required
correct_stop_loc bool

Whether to project stop locations onto the closest point along the trip shape. Defaults to True.

True

Returns:

Name Type Description
float float

Average distance between stops in kilometers.

Examples:

>>> manager.ave_distance_between_stops("TRIP_001")
0.45
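
To illustrate what measuring "along the trip shape" means, the sketch below projects each stop onto a polyline and averages the gaps between consecutive projections. It is an assumption-laden illustration rather than the library's code: coordinates are planar (for example, kilometers in a projected CRS), and both function names are hypothetical.

```python
import math


def project_onto_polyline(point, line):
    """Distance along `line` of the point on it that is closest to `point`.

    `line` is a list of (x, y) vertices in planar coordinates.
    """
    best_sq, best_along, walked = math.inf, 0.0, 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_sq = dx * dx + dy * dy
        seg_len = math.sqrt(seg_sq)
        if seg_sq == 0:
            t = 0.0
        else:
            # Fraction along the segment of the closest point, clamped to [0, 1].
            t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / seg_sq
            t = max(0.0, min(1.0, t))
        cx, cy = x1 + t * dx, y1 + t * dy
        d_sq = (point[0] - cx) ** 2 + (point[1] - cy) ** 2
        if d_sq < best_sq:
            best_sq, best_along = d_sq, walked + t * seg_len
        walked += seg_len
    return best_along


def average_gap_along_shape(stops, line):
    """Mean along-shape distance between consecutive stops (needs two or more stops)."""
    positions = [project_onto_polyline(stop, line) for stop in stops]
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)
```

Measuring along the shape rather than in a straight line between stops matters on curved routes, where the straight-line distance underestimates what the vehicle actually travels.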

ave_distance_between_stops_all(correct_stop_loc=True)

Compute the average distance between stops for all trips.

Parameters:

Name Type Description Default
correct_stop_loc bool

Whether to project stop locations onto trip shapes. Defaults to True.

True

Returns:

Type Description
DataFrame

pd.DataFrame: Trips table with average inter-stop distances and number of stops.

Examples:

>>> manager.ave_distance_between_stops_all().head()

bounding_box()

Compute the geographic bounding box of the GTFS feed.

The bounding box is derived from all route shapes.

Returns:

Type Description
Polygon

shapely.geometry.Polygon: Rectangular polygon (as produced by shapely.geometry.box) enclosing all shapes.

Examples:

>>> manager.bounding_box()

check_agency()

Check that each agency is associated with at least one route.

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Examples:

>>> manager = GTFSManager("data/gtfs")
>>> manager.check_agency()
False

check_all()

Run all GTFS data consistency checks.

This method sequentially calls all individual validation methods and reports whether the GTFS feed is globally consistent.

Returns:

Name Type Description
bool bool

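True if no problems are found, False otherwise. Note that this is the opposite convention to the individual check_* methods, which return True when a problem is detected.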

Examples:

>>> manager.check_all()
True

check_calendar()

Check that each service is associated with at least one trip.

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Examples:

>>> manager.check_calendar()
False

check_frequencies()

Check that each frequency entry is associated with an existing trip.

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Examples:

>>> manager.check_frequencies()
False

check_routes()

Check that each route is associated with an agency and at least one trip.

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Examples:

>>> manager.check_routes()
False

check_shapes()

Check that each shape is associated with at least one trip.

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Examples:

>>> manager.check_shapes()
False

check_stop_times()

Check that each stop time references valid stops and trips.

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Examples:

>>> manager.check_stop_times()
False

check_stops()

Check that each stop is associated with at least one stop time.

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Examples:

>>> manager.check_stops()
False

check_trips()

Check that each trip is associated with all required GTFS entities.

Each trip must reference:

  • A valid route
  • A valid service
  • Stop times
  • A shape
  • A frequency entry

Returns:

Name Type Description
bool bool

True if a data consistency problem is detected, False otherwise.

Notes

Trips without frequency definitions may cause issues for headway-based services.

Examples:

>>> manager.check_trips()
False
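
The five references listed above amount to a set-membership test per trip. The sketch below shows that idea on plain dictionaries and sets; the function names and the input layout are hypothetical, and only the return convention (True means a problem was detected) is taken from this page.

```python
def find_trip_problems(trips, route_ids, service_ids, stop_time_trip_ids,
                       shape_ids, frequency_trip_ids):
    """List (trip_id, missing_entity) pairs for every broken reference."""
    problems = []
    for trip in trips:
        trip_id = trip["trip_id"]
        if trip["route_id"] not in route_ids:
            problems.append((trip_id, "route"))
        if trip["service_id"] not in service_ids:
            problems.append((trip_id, "service"))
        if trip_id not in stop_time_trip_ids:
            problems.append((trip_id, "stop_times"))
        if trip["shape_id"] not in shape_ids:
            problems.append((trip_id, "shape"))
        if trip_id not in frequency_trip_ids:
            problems.append((trip_id, "frequency"))
    return problems


def check_trips(*args):
    """True if at least one trip has a broken reference, False otherwise."""
    return bool(find_trip_problems(*args))
```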

clean_all()

Execute all GTFS cleaning routines until the dataset becomes consistent.

This method repeatedly applies internal cleaning functions to remove unused or inconsistent entities (agencies, routes, trips, stops, stop times, frequencies, shapes, and services) until all data consistency checks pass.

The cleaning process is silent during iterations and stops as soon as the GTFS feed satisfies all validation rules.

Returns:

Type Description
None

None

Examples:

>>> manager = GTFSManager("data/gtfs")
>>> manager.clean_all()
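
Repeated cleaning is needed because removing one entity can orphan another. The following sketch shows the fixed-point loop on a reduced feed with only agencies, routes, and trips. The data layout and function names are assumptions for illustration, not the library's internals.

```python
def clean_once(agencies, routes, trips):
    """Run one cleaning pass in place; return True if anything was removed.

    agencies: set of agency ids
    routes:   dict route_id -> agency_id
    trips:    dict trip_id -> route_id
    """
    size_before = len(agencies) + len(routes) + len(trips)

    # Agencies that no route refers to.
    used_agencies = set(routes.values())
    agencies.difference_update({a for a in agencies if a not in used_agencies})

    # Routes with an unknown agency or without any trip.
    used_routes = set(trips.values())
    for route_id in [r for r, a in routes.items()
                     if a not in agencies or r not in used_routes]:
        del routes[route_id]

    # Trips whose route no longer exists.
    for trip_id in [t for t, r in trips.items() if r not in routes]:
        del trips[trip_id]

    return len(agencies) + len(routes) + len(trips) < size_before


def clean_all(agencies, routes, trips):
    """Repeat cleaning passes until a pass removes nothing; return the pass count."""
    passes = 0
    while clean_once(agencies, routes, trips):
        passes += 1
    return passes
```

In this sketch, an agency whose only route has no trips survives the first pass (the route still exists when agencies are examined) and is removed in the second, which is why a single pass is not always enough.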

export_to_csv(output_folder)

Export the current GTFS dataset to standard GTFS CSV files.

This method writes all GTFS tables (agency, routes, trips, stops, stop_times, calendar, frequencies, and shapes) to disk using the official GTFS text file format. Time columns are converted back to HH:MM:SS strings, calendar booleans are exported as 0/1, and geometries are properly serialized.

Parameters:

Name Type Description Default
output_folder str

Path to the directory where GTFS CSV files will be written. The directory is created if it does not already exist.

required

Returns:

Type Description
None

None
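
Two of the conversions mentioned above can be sketched with the standard library alone. The helpers are hypothetical and assume times are held as seconds since the start of the service day. GTFS allows hour values of 24 and above for trips that run past midnight, so the hour field is not wrapped. The calendar sketch omits the start_date and end_date columns for brevity.

```python
import csv
import io

DAY_COLUMNS = ["monday", "tuesday", "wednesday", "thursday",
               "friday", "saturday", "sunday"]


def format_gtfs_time(seconds):
    """Format seconds since the start of the service day as HH:MM:SS."""
    hours, rest = divmod(int(round(seconds)), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def calendar_to_csv(rows):
    """Serialize calendar rows to CSV text, writing day flags as 0/1."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["service_id", *DAY_COLUMNS],
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        out = {"service_id": row["service_id"]}
        out.update({day: int(bool(row[day])) for day in DAY_COLUMNS})
        writer.writerow(out)
    return buffer.getvalue()
```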

filter_agency(agency_id, clean_all=True)

Remove all GTFS data associated with a specific agency.

This method drops all routes belonging to the given agency_id. Optionally, it then cleans the entire dataset to remove any orphaned or unreferenced entities created by this filtering operation.

Parameters:

Name Type Description Default
agency_id str

Identifier of the agency to remove (as defined in agency.txt).

required
clean_all bool

Whether to clean the dataset after filtering to restore consistency. Defaults to True.

True

Returns:

Type Description
None

None

Examples:

>>> manager = GTFSManager("data/gtfs")
>>> manager.filter_agency("AGENCY_PARATRANSIT")

filter_services(service_id, clean_all=True)

Remove all GTFS data associated with a specific service.

This method drops all trips belonging to the given service_id. Optionally, it then cleans the entire dataset to remove any orphaned or unreferenced entities created by this filtering operation.

Parameters:

Name Type Description Default
service_id str

Identifier of the service to remove (as defined in calendar.txt).

required
clean_all bool

Whether to clean the dataset after filtering to restore consistency. Defaults to True.

True

Returns:

Type Description
None

None

Examples:

>>> manager = GTFSManager("data/gtfs")
>>> manager.filter_services("WEEKEND")

generate_network_map(filepath)

Generate an interactive HTML map visualizing the full GTFS network.

The map displays all route shapes as polylines and all stops as clustered markers. Each shape includes popup information such as associated trips, trip length, and number of stops.

Parameters:

Name Type Description Default
filepath str

Path where the generated HTML map will be saved.

required

Returns:

Type Description
None

None

generate_single_trip_map(trip_id, filepath='trip_map.html', projected=True)

Generate an interactive HTML map for a single trip.

The map displays the trip shape and all associated stops. Stops can optionally be projected onto the trip shape to visually align them with the route geometry.

Parameters:

Name Type Description Default
trip_id str

Identifier of the trip to visualize.

required
filepath str

Path where the HTML map will be saved. Defaults to "trip_map.html".

'trip_map.html'
projected bool

If True, stop locations are projected onto the trip shape geometry. Defaults to True.

True

Returns:

Type Description
None

None

Raises:

Type Description
IndexError

If the provided trip_id does not exist.

generate_summary_report(filepath)

Generate a text-based summary report of key GTFS statistics.

This report includes general feed information, spatial extent, temporal frequency consistency, trip statistics, and stop statistics. The output is written to a plain text file.

Parameters:

Name Type Description Default
filepath str

Path where the summary report will be saved.

required

Returns:

Type Description
None

None

get_shape(trip_id)

Return the geometric shape associated with a given trip.

This method retrieves the LineString geometry corresponding to the shape used by the specified trip.

Parameters:

Name Type Description Default
trip_id str

Identifier of the trip.

required

Returns:

Type Description
LineString

shapely.geometry.LineString: Geometry of the trip shape.

Raises:

Type Description
IndexError

If the provided trip_id does not exist or has no associated shape.

get_stop_locations(trip_id)

Return the stop locations associated with a given trip.

This method retrieves the geometries of all stops served by the specified trip, ordered according to the stop sequence.

Parameters:

Name Type Description Default
trip_id str

Identifier of the trip.

required

Returns:

Type Description
list

list[shapely.geometry.Point]: List of stop geometries corresponding to the trip.

Raises:

Type Description
IndexError

If the provided trip_id does not exist.

n_stops(trip_id)

Return the number of stops served by a trip.

Parameters:

Name Type Description Default
trip_id str

Identifier of the trip.

required

Returns:

Name Type Description
int int

Number of stops for the given trip.

Examples:

>>> manager.n_stops("TRIP_001")
15

show_general_info()

Display a high-level summary of the GTFS feed.

This method prints aggregated information about the GTFS dataset, including the number of trips, routes, services, stops, stop times, frequencies, agencies, and shapes. It also provides basic consistency hints and temporal coverage information.

Returns:

Type Description
None

None

Examples:

>>> manager = GTFSManager("data/gtfs")
>>> manager.show_general_info()

simulation_area_km2()

Calculate the simulation area covered by the GTFS feed.

The area is computed from the bounding box of the shapes and returned in square kilometers.

Returns:

Name Type Description
float float

Simulation area in square kilometers.

Examples:

>>> manager.simulation_area_km2()
125.6
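
For intuition, the area of a latitude/longitude bounding box can be computed on a sphere as R² · Δλ · (sin φ₂ − sin φ₁). The sketch below uses that formula. It is an assumption made for illustration: the method itself may instead project the box to a metric coordinate system, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def bounding_box(points):
    """Return (min_lon, min_lat, max_lon, max_lat) for (lon, lat) pairs."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return min(lons), min(lats), max(lons), max(lats)


def bbox_area_km2(min_lon, min_lat, max_lon, max_lat):
    """Area of a latitude/longitude rectangle on a sphere, in km²."""
    d_lon = math.radians(max_lon - min_lon)
    band = math.sin(math.radians(max_lat)) - math.sin(math.radians(min_lat))
    return EARTH_RADIUS_KM ** 2 * d_lon * band
```

Because the area is taken from the bounding box, it will generally overestimate the territory actually served by the network.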

stop_frequencies()

Compute how often each stop is used across all trips.

Returns:

Type Description
DataFrame

pd.DataFrame: Table containing stop identifiers, geometries, and the number of times each stop appears in stop times.

Examples:

>>> manager.stop_frequencies().head()
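
Counting stop usage reduces to tallying stop_id values in the stop times table. A minimal standard-library sketch, assuming stop times are available as a list of dictionaries (the function name and input layout are hypothetical):

```python
from collections import Counter


def count_stop_usage(stop_times):
    """Return (stop_id, count) pairs, most frequently used stop first."""
    counts = Counter(row["stop_id"] for row in stop_times)
    return counts.most_common()
```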

stop_statistics()

Compute aggregated statistics related to stops.

Returns:

Name Type Description
dict dict

Dictionary containing statistics about the number of stops per trip and per route.

Examples:

>>> manager.stop_statistics()

trim_tripshapes_to_terminal_locations()

Trim trip shapes so that they start and end at the points closest to each trip's terminal stops.

This method modifies shape geometries by projecting the first and last stops of each trip onto the corresponding route shape and trimming the shape accordingly. Stop locations themselves are not altered.

The trimming is performed once per shape, assuming that all trips sharing the same shape follow the same terminal alignment.

Returns:

Type Description
None

None

Examples:

>>> manager = GTFSManager("data/gtfs")
>>> manager.trim_tripshapes_to_terminal_locations()
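
The trimming step can be sketched on a planar polyline: locate the point on the shape closest to each terminal stop, then keep only the part of the shape between those two points. This is a simplified illustration with hypothetical function names. It assumes planar coordinates and that the first stop projects earlier along the shape than the last stop.

```python
import math


def locate(point, line):
    """Return (distance_along, closest_point) for the spot on `line` nearest `point`."""
    best_gap, best_along, best_point = math.inf, 0.0, line[0]
    walked = 0.0
    for (x1, y1), (x2, y2) in zip(line, line[1:]):
        dx, dy = x2 - x1, y2 - y1
        seg_sq = dx * dx + dy * dy
        seg_len = math.sqrt(seg_sq)
        if seg_sq == 0:
            t = 0.0
        else:
            # Fraction along the segment of the closest point, clamped to [0, 1].
            t = ((point[0] - x1) * dx + (point[1] - y1) * dy) / seg_sq
            t = max(0.0, min(1.0, t))
        closest = (x1 + t * dx, y1 + t * dy)
        gap = math.dist(point, closest)
        if gap < best_gap:
            best_gap, best_along, best_point = gap, walked + t * seg_len, closest
        walked += seg_len
    return best_along, best_point


def trim_shape(line, first_stop, last_stop):
    """Return the part of `line` between the projections of the two terminals."""
    start_d, start_point = locate(first_stop, line)
    end_d, end_point = locate(last_stop, line)
    kept, walked = [start_point], 0.0
    for a, b in zip(line, line[1:]):
        walked += math.dist(a, b)  # cumulative distance at vertex b
        if start_d < walked < end_d:
            kept.append(b)  # vertex lies strictly between the two cut points
    kept.append(end_point)
    return kept
```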

trip_duration_sec(trip_id)

Calculate the duration of a trip in seconds.

The duration is computed as the difference between the first departure time and the last arrival time.

Parameters:

Name Type Description Default
trip_id str

Identifier of the trip.

required

Returns:

Name Type Description
float float

Trip duration in seconds.

Examples:

>>> manager.trip_duration_sec("TRIP_001")
1800.0
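
The computation can be sketched directly on GTFS time strings. The helpers below are hypothetical and assume stop times are given as dictionaries with the standard stop_sequence, arrival_time, and departure_time fields. Hours of 24 or more, which GTFS uses for trips running past midnight, are handled by parsing the fields as plain integers.

```python
def parse_gtfs_time(text):
    """Convert an HH:MM:SS string to seconds since the start of the service day."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def duration_sec(stop_times):
    """Last arrival minus first departure, in seconds, for one trip."""
    ordered = sorted(stop_times, key=lambda row: row["stop_sequence"])
    start = parse_gtfs_time(ordered[0]["departure_time"])
    end = parse_gtfs_time(ordered[-1]["arrival_time"])
    return float(end - start)
```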

trip_length_km(trip_id, geodesic=True)

Calculate the length of a trip in kilometers.

The trip length is computed using the geometry of the shape associated with the given trip.

Parameters:

Name Type Description Default
trip_id str

Identifier of the trip.

required
geodesic bool

Whether to use geodesic distance. Currently unused. Defaults to True.

True

Returns:

Name Type Description
float float

Length of the trip in kilometers.

Examples:

>>> manager.trip_length_km("TRIP_001")
12.4
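
One common way to measure a shape given in longitude/latitude is to sum great-circle (haversine) distances between consecutive vertices. The sketch below shows that approach as an illustration only. Since the geodesic parameter is documented as unused, the method's actual computation may differ, and the function names here are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two lon/lat points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def polyline_length_km(coords):
    """Sum of great-circle distances between consecutive (lon, lat) vertices."""
    return sum(haversine_km(*a, *b) for a, b in zip(coords, coords[1:]))
```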

trip_length_km_all()

Compute the length of all trips.

Returns:

Type Description
DataFrame

pd.DataFrame: Trips table with an additional column containing trip lengths in kilometers.

Examples:

>>> manager.trip_length_km_all().head()

trip_statistics()

Compute aggregated statistics related to trips.

Returns:

Name Type Description
dict dict

Dictionary containing total, average, minimum, and maximum trip lengths, as well as trip-to-route ratio.

Examples:

>>> manager.trip_statistics()
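
The kind of aggregation described above can be sketched as follows. The dictionary keys and the input layout are hypothetical and may not match the keys the method actually returns.

```python
def summarize_trips(trip_lengths_km, trip_routes):
    """Aggregate trip lengths and compute the trip-to-route ratio.

    trip_lengths_km: dict trip_id -> length in kilometers
    trip_routes:     dict trip_id -> route_id
    """
    lengths = list(trip_lengths_km.values())
    n_routes = len(set(trip_routes.values()))
    return {
        "total_km": sum(lengths),
        "average_km": sum(lengths) / len(lengths),
        "min_km": min(lengths),
        "max_km": max(lengths),
        "trips_per_route": len(trip_routes) / n_routes,
    }
```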