Module constants

Constants used throughout the package.

make_gtfs.constants.BUFFER = 10

Meters to buffer trip paths to find stops

make_gtfs.constants.SEP = '-'

Character to separate different chunks within an ID

make_gtfs.constants.STOP_OFFSET = 5

Meters to offset stops from route shapes

Module protofeed

class make_gtfs.protofeed.ProtoFeed(meta: DataFrame, service_windows: DataFrame, shapes: GeoDataFrame, frequencies: DataFrame, stops: DataFrame | None = None, speed_zones: GeoDataFrame | None = None)

Bases: object

A ProtoFeed instance holds the source data from which to build a GTFS feed. The most common way to build one is from files via the function read_protofeed().

static clean_speed_zones(speed_zones: GeoDataFrame, service_area: GeoDataFrame, default_speed_zone_id: str = 'default', default_speed: float = inf) GeoDataFrame

Clip the speed zones to the service area. The zone ID of the service area outside of the speed zones will be set to default_speed_zone_id and the speed there will be set to default_speed. Return the resulting service area of (Multi)Polygons, now partitioned into speed zones. The result is a GeoDataFrame with the columns ‘speed_zone_id’, ‘speed’, ‘geometry’.

copy() ProtoFeed

Return a copy of this ProtoFeed, that is, a feed with all the same attributes.

frequencies: DataFrame
meta: DataFrame
route_types() list[int]
service_windows: DataFrame
shapes: GeoDataFrame
speed_zones: GeoDataFrame | None = None
stops: DataFrame | None = None
make_gtfs.protofeed.SPEED_BY_RTYPE = {0: 11, 1: 30, 2: 45, 3: 22, 4: 22, 5: 13, 6: 20, 7: 18, 11: 22, 12: 65}

Default average speeds by route type in kilometers per hour
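The lookup that read_protofeed() performs on missing speeds can be sketched in plain Python. This is a minimal illustration, assuming frequency rows are dicts rather than the DataFrame the library actually uses; fill_missing_speeds is a hypothetical name, not part of make_gtfs.

```python
import math

# Documented default average speeds (km/h) by GTFS route type
SPEED_BY_RTYPE = {0: 11, 1: 30, 2: 45, 3: 22, 4: 22, 5: 13, 6: 20, 7: 18, 11: 22, 12: 65}


def fill_missing_speeds(frequencies: list[dict]) -> list[dict]:
    """Hypothetical helper: return copies of rows with missing speeds filled by route type."""
    filled = []
    for row in frequencies:
        row = dict(row)  # copy so the caller's rows are not mutated
        speed = row.get("speed")
        # Treat an absent key, None, or NaN (as pandas would produce) as missing
        if speed is None or (isinstance(speed, float) and math.isnan(speed)):
            row["speed"] = SPEED_BY_RTYPE[row["route_type"]]
        filled.append(row)
    return filled


rows = [
    {"route_short_name": "51X", "route_type": 3},
    {"route_short_name": "T1", "route_type": 0, "speed": 15.0},
    {"route_short_name": "F2", "route_type": 4, "speed": float("nan")},
]
print([r["speed"] for r in fill_missing_speeds(rows)])  # [22, 15.0, 22]
```

A route type absent from the dictionary raises KeyError here; the sketch does not guess a fallback speed.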

make_gtfs.protofeed.read_protofeed(path: str | Path) ProtoFeed

Read the data files at the given directory path (string or Path object) and build a ProtoFeed from them. Validate the resulting ProtoFeed. If invalid, raise a ValueError specifying the errors. Otherwise, return the resulting ProtoFeed.

The data files needed to build a ProtoFeed are

  • meta.csv (required). A CSV file containing network metadata. The CSV file contains the columns

    • agency_name (required): string; the name of the transport agency

    • agency_url (required): string; a fully qualified URL for the transport agency

    • agency_timezone (required): string; timezone where the transit agency is located; timezone names never contain the space character but may contain an underscore; refer to the IANA Time Zone Database (tz database) for a list of valid values

    • start_date, end_date (required): strings; the start and end dates for which all this network information is valid, formatted as YYYYMMDD strings

  • service_windows.csv (required). A CSV file containing service window information. A service window is a time interval and a set of days of the week during which all routes have constant service frequency, e.g. Saturday and Sunday 07:00 to 09:00. The CSV file contains the columns

    • service_window_id (required): string; a unique identifier for a service window

    • start_time, end_time (required): strings; the start and end times of the service window in HH:MM:SS format where the hour is less than 24

    • monday, tuesday, wednesday, thursday, friday, saturday, sunday (required): integer 0 or 1; indicates whether the service is active on the given day (1) or not (0)

  • shapes.geojson (required). A GeoJSON file representing shapes for all (route, direction 0 or 1, service window) combinations. The file comprises one feature collection of LineString features (in WGS84 coordinates), where each feature has the property

    • shape_id (required): a unique identifier of the shape

    Each LineString should represent the run of one representative trip of a route. In particular, the LineString should not traverse the same section of road more than once, unless you want a trip to actually do that.

  • frequencies.csv (required). A CSV file containing route frequency information. The CSV file contains the columns

    • route_short_name (required): string; a unique short name for the route, e.g. ‘51X’

    • route_long_name (required): string; full name of the route that is more descriptive than route_short_name

    • route_type (required): integer; the GTFS route type of the route

    • service_window_id (required): string; a service window ID for the route taken from the file service_windows.csv

    • direction (required): integer 0, 1, or 2; indicates whether the route travels in the direction of its shape (1), in the reverse direction of its shape (0), or in both directions (2); in the last case, trips will be created that travel in both directions along the route’s shape, each direction operating at the given frequency; otherwise, trips will be created that travel in only the given direction

    • frequency (required): integer; the frequency of the route during the service window in vehicles per hour.

    • shape_id (required): string; a shape ID that is listed in shapes.geojson and corresponds to the linestring of the (route, direction 0 or 1, service window) tuple.

    • speed (optional): float; the average speed of the route in kilometers per hour

    Missing speed values will be filled with values from the dictionary SPEED_BY_RTYPE.

  • stops.csv (optional). A CSV file in the format of stops.txt as specified in the GTFS, containing all of its required fields and any of its optional fields

  • speed_zones.geojson (optional). A GeoJSON file of Polygons representing speed zones for routes. The file consists of one feature collection of Polygon features (in WGS84 coordinates), each with the properties

    • speed_zone_id (required): string; a unique identifier of the zone polygon; can be re-used if the polygon is re-used

    • route_type (required): integer; a GTFS route type to which the zone applies

    • speed (required): positive float; the average speed in kilometers per hour of routes of that route type that travel within the zone; overrides route speeds in frequencies.csv within the zone.
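The constraints listed above for service_windows.csv can be checked row by row before building a ProtoFeed. The following is a minimal stdlib sketch, assuming rows come from csv.DictReader (so values are strings); service_window_errors is a hypothetical helper and does not reproduce the Pandera schemas that make_gtfs itself uses.

```python
import csv
import io
import re

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]
TIME_RE = re.compile(r"^(\d{2}):([0-5]\d):([0-5]\d)$")  # HH:MM:SS


def service_window_errors(row: dict) -> list[str]:
    """Hypothetical check: return problems found in one service_windows.csv row."""
    errors = []
    for col in ("start_time", "end_time"):
        match = TIME_RE.match(row.get(col) or "")
        if match is None:
            errors.append(f"{col} is not in HH:MM:SS format")
        elif int(match.group(1)) >= 24:
            errors.append(f"{col} hour must be less than 24")
    for day in DAYS:
        # csv.DictReader yields strings, so compare string forms
        if str(row.get(day)) not in {"0", "1"}:
            errors.append(f"{day} must be 0 or 1")
    return errors


sample = io.StringIO(
    "service_window_id,start_time,end_time,"
    "monday,tuesday,wednesday,thursday,friday,saturday,sunday\n"
    "weekend_am,07:00:00,09:00:00,0,0,0,0,0,1,1\n"
    "bad,25:00:00,9:00,1,1,1,1,1,2,0\n"
)
for row in csv.DictReader(sample):
    print(row["service_window_id"], service_window_errors(row))
```

The first row passes; the second reports the out-of-range hour, the malformed end time, and the invalid Saturday flag.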

Module validators

ProtoFeed validators.

make_gtfs.validators.check_frequencies(pfeed: ProtoFeed) DataFrame

Return pfeed.frequencies if it is valid. Otherwise, raise a Pandera SchemaError.

make_gtfs.validators.check_meta(pfeed: ProtoFeed) DataFrame

Return pfeed.meta if it is valid. Otherwise, raise a ValueError or a Pandera SchemaError.

make_gtfs.validators.check_service_windows(pfeed: ProtoFeed) DataFrame

Return pfeed.service_windows if it is valid. Otherwise, raise a Pandera SchemaError.

make_gtfs.validators.check_shapes(pfeed: ProtoFeed) DataFrame

Return pfeed.shapes if it is valid. Otherwise, raise a ValueError or a Pandera SchemaError.

make_gtfs.validators.check_speed_zones(pfeed: ProtoFeed) DataFrame

Return pfeed.speed_zones if it is valid. Otherwise, raise a ValueError or a Pandera SchemaError.

make_gtfs.validators.check_stops(pfeed: ProtoFeed) DataFrame

Return pfeed.stops if it is valid. Otherwise, raise a Pandera SchemaError.

make_gtfs.validators.crosscheck_ids(id_col: str, src_table: DataFrame, src_table_name: str, tgt_table: DataFrame, tgt_table_name: str) None

Check that the set of id_col values in the given source table is a subset of those in the target table. Raise a ValueError if not; otherwise do nothing.
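The subset check that crosscheck_ids() describes comes down to a set difference. This sketch mirrors the documented parameters but takes lists of dicts instead of DataFrames; crosscheck_ids_rows is a hypothetical name, and the error message wording is an assumption.

```python
def crosscheck_ids_rows(id_col, src_rows, src_table_name, tgt_rows, tgt_table_name):
    """Raise ValueError unless every id_col value in src_rows also occurs in tgt_rows."""
    src_ids = {row[id_col] for row in src_rows}
    tgt_ids = {row[id_col] for row in tgt_rows}
    missing = src_ids - tgt_ids
    if missing:
        raise ValueError(
            f"{id_col} values {sorted(missing)} in {src_table_name} "
            f"are missing from {tgt_table_name}"
        )


shapes = [{"shape_id": "s1"}, {"shape_id": "s2"}]
frequencies = [{"shape_id": "s1"}, {"shape_id": "s3"}]
try:
    crosscheck_ids_rows("shape_id", frequencies, "frequencies", shapes, "shapes")
except ValueError as err:
    print(err)  # shape_id values ['s3'] in frequencies are missing from shapes
```

Reporting every missing ID at once, rather than only the first, makes a broken input file quicker to fix.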


Return the given ProtoFeed if it is valid. Otherwise, raise a ValueError after encountering the first error.

Module main

This module contains the main logic.

make_gtfs.main.buffer_side(linestring: LineString, side: str, buffer: float) Polygon

Given a Shapely LineString, a side of the LineString (string; ‘left’ = left hand side of LineString, ‘right’ = right hand side of LineString, or ‘both’ = both sides), and a buffer size in the distance units of the LineString, buffer the LineString on the given side by the buffer size and return the resulting Shapely polygon.

make_gtfs.main.build_agency(pfeed: ProtoFeed) DataFrame

Given a ProtoFeed, return a DataFrame representing agency.txt.

make_gtfs.main.build_calendar_etc(pfeed: ProtoFeed) DataFrame

Given a ProtoFeed, return a pair consisting of a DataFrame representing calendar.txt and a dictionary of the form <service window ID> -> <service ID>.
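One plausible way to produce this pair is to give every distinct weekday pattern its own service ID, so that service windows active on the same days share a calendar row. The sketch below assumes that scheme and an "srv" + weekday-bits naming convention; both, along with build_calendar_sketch, are hypothetical and not confirmed as make_gtfs internals.

```python
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def build_calendar_sketch(service_windows, start_date, end_date):
    """Return (calendar rows, {service_window_id: service_id})."""
    calendar = {}
    service_by_window = {}
    for window in service_windows:
        bits = tuple(int(window[day]) for day in WEEKDAYS)
        # Hypothetical ID scheme: one service ID per distinct weekday pattern
        service_id = "srv" + "".join(str(b) for b in bits)
        service_by_window[window["service_window_id"]] = service_id
        calendar[service_id] = {
            "service_id": service_id,
            **dict(zip(WEEKDAYS, bits)),
            "start_date": start_date,
            "end_date": end_date,
        }
    return list(calendar.values()), service_by_window


weekday = {**dict.fromkeys(WEEKDAYS[:5], 1), **dict.fromkeys(WEEKDAYS[5:], 0)}
weekend = {**dict.fromkeys(WEEKDAYS[:5], 0), **dict.fromkeys(WEEKDAYS[5:], 1)}
windows = [
    {"service_window_id": "wd_am", **weekday},
    {"service_window_id": "wd_pm", **weekday},
    {"service_window_id": "we", **weekend},
]
calendar, service_by_window = build_calendar_sketch(windows, "20240101", "20241231")
print(service_by_window)
# {'wd_am': 'srv1111100', 'wd_pm': 'srv1111100', 'we': 'srv0000011'}
```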

make_gtfs.main.build_feed(pfeed: ProtoFeed, buffer: float = 10, stop_offset: float = 5, num_stops_per_shape: int = 2, stop_spacing: float | None = None) Feed

Convert the given ProtoFeed to a GTFS Feed with meter distance units. Look for stops within buffer meters of the route shapes. If pfeed.stops is None, then build stops for each shape, offset by stop_offset meters on the traffic side of the shape: by default, make num_stops_per_shape equally spaced stops along each shape and then offset them; but if stop_spacing is given, then instead space the stops every stop_spacing meters along each shape and then offset them. If a shape has an antiparallel clone, then build stops only on the shape, not its clone, thereby avoiding unnecessary stops.

make_gtfs.main.build_routes(pfeed: ProtoFeed) DataFrame

Given a ProtoFeed, return a DataFrame representing routes.txt.

make_gtfs.main.build_shapes(pfeed: ProtoFeed) DataFrame

Given a ProtoFeed, return a DataFrame representing shapes.txt. Only use shape IDs that occur in both pfeed.shapes and pfeed.frequencies. Create reversed shapes where routes traverse shapes in both directions.
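Reversing a shape amounts to reversing its coordinate sequence and assigning it a new shape ID. The sketch below uses the direction codes from frequencies.csv, and assumes a '<shape_id>-<direction>' naming convention built with the documented SEP character. That naming, together with shape_variants and to_shapes_txt_rows, is hypothetical.

```python
SEP = "-"  # documented default ID separator


def shape_variants(shape_id, coords, direction):
    """Map new shape IDs to coordinate lists for the direction(s) a route uses.

    direction follows frequencies.csv: 1 = along the shape, 0 = reversed, 2 = both.
    The '<shape_id>-<direction>' naming is a hypothetical convention.
    """
    directions = [0, 1] if direction == 2 else [direction]
    return {
        f"{shape_id}{SEP}{d}": list(coords) if d == 1 else list(reversed(coords))
        for d in directions
    }


def to_shapes_txt_rows(variants):
    """Flatten {shape_id: [(lon, lat), ...]} into shapes.txt-style rows."""
    return [
        {"shape_id": sid, "shape_pt_sequence": seq, "shape_pt_lon": lon, "shape_pt_lat": lat}
        for sid, pts in variants.items()
        for seq, (lon, lat) in enumerate(pts)
    ]


variants = shape_variants("s1", [(174.76, -36.85), (174.77, -36.86)], 2)
print(list(variants))  # ['s1-0', 's1-1']
```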

make_gtfs.main.build_stop_times(pfeed: ProtoFeed, routes: DataFrame, shapes: DataFrame, stops: DataFrame, trips: DataFrame, buffer: float = 10) DataFrame

Given a ProtoFeed and its corresponding routes, shapes, stops, and trips DataFrames, return a DataFrame representing stop_times.txt. Includes the optional shape_dist_traveled column rounded to the nearest meter. Does not make stop times for trips with no stops within the buffer.

make_gtfs.main.build_stop_times_for_trip(trip_id: str, stops_g_nearby: GeoDataFrame, shape_id: str, linestring: LineString, speed_zones: GeoDataFrame, route_type: int, shape_point_speeds: GeoDataFrame, default_speed: float, start_time: int) DataFrame

Build stop times for the given trip ID.

Assume all coordinates are in meters, distances are in meters, and speeds are in kilometers per hour.
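Under those unit assumptions, the time to reach a stop is its distance along the shape divided by the speed converted to meters per second (km/h × 1000 / 3600). The sketch below simplifies by holding one constant speed per trip and ignoring speed zones, which build_stop_times_for_trip() does take into account. stop_times_sketch and seconds_to_timestr are hypothetical helpers.

```python
def seconds_to_timestr(seconds: int) -> str:
    """Format seconds after midnight as HH:MM:SS (hours may exceed 23, as GTFS allows)."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def stop_times_sketch(trip_id, stop_dists, speed_kmh, start_time):
    """Build stop_times rows for one trip at a constant speed.

    stop_dists: [(stop_id, meters along the shape), ...] in travel order.
    start_time: departure from the first stop, in seconds after midnight.
    """
    meters_per_second = speed_kmh * 1000 / 3600
    rows = []
    for seq, (stop_id, dist) in enumerate(stop_dists):
        t = start_time + round(dist / meters_per_second)
        timestr = seconds_to_timestr(t)
        rows.append({
            "trip_id": trip_id,
            "stop_id": stop_id,
            "stop_sequence": seq,
            "arrival_time": timestr,
            "departure_time": timestr,
            "shape_dist_traveled": round(dist),  # nearest meter
        })
    return rows


rows = stop_times_sketch("t1", [("a", 0), ("b", 1500), ("c", 3000)], 36, 7 * 3600)
print([r["arrival_time"] for r in rows])  # ['07:00:00', '07:02:30', '07:05:00']
```

At 36 km/h (10 m/s), the stop 1500 m along the shape is reached 150 s after departure.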

make_gtfs.main.build_stops(pfeed: ProtoFeed, shapes: DataFrame | None = None, offset: float = 5, n: int = 2, spacing: float | None = None) DataFrame

Given a ProtoFeed, return a DataFrame representing stops.txt. If pfeed.stops is not None, then return that. Otherwise, require the shapes DataFrame output by build_shapes(). In that case, for each shape, build n equally spaced stops offset by offset meters on the traffic side of the shape. If spacing is not None, then ignore n and for each shape, create offset stops spaced spacing meters apart (when projected onto the shape), except allow the spacing of the last two stops to be < 2 * spacing.

When building stops, drop stops with duplicate geometries within a shape to gracefully handle loop shapes. Also, if a shape has an antiparallel clone, then only build stops for the shape, not its clone.

make_gtfs.main.build_trips(pfeed: ProtoFeed, routes: DataFrame, service_by_window: dict) DataFrame

Given a ProtoFeed and its corresponding routes and service-by-window, return a DataFrame representing trips.txt. Trip IDs encode route, direction, and service window information to make it easy to compute stop times later.
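To illustrate how a trip ID can encode this information, here is a sketch that joins the chunks with the documented SEP character and splits them apart again. The chunk order and the "t" prefix are assumptions, not make_gtfs's actual format, and make_trip_id and parse_trip_id are hypothetical names. The sketch only round-trips safely if no chunk contains SEP, so it refuses such IDs.

```python
SEP = "-"  # documented default ID separator


def make_trip_id(route_id: str, window_id: str, direction: int, k: int) -> str:
    """Encode route, service window, direction, and trip index into one ID."""
    for chunk in (route_id, window_id):
        # Splitting would be ambiguous if a chunk contained the separator
        if SEP in chunk:
            raise ValueError(f"{chunk!r} contains the separator {SEP!r}")
    return SEP.join(["t", route_id, window_id, str(direction), str(k)])


def parse_trip_id(trip_id: str) -> dict:
    """Invert make_trip_id; raises ValueError if the ID has the wrong number of chunks."""
    _, route_id, window_id, direction, k = trip_id.split(SEP)
    return {
        "route_id": route_id,
        "service_window_id": window_id,
        "direction": int(direction),
        "index": int(k),
    }


tid = make_trip_id("r51X", "wdpeak", 1, 3)
print(tid)  # t-r51X-wdpeak-1-3
```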

make_gtfs.main.compute_shape_point_speeds(shapes: DataFrame, speed_zones: GeoDataFrame, route_type: int, *, use_utm: bool = False) GeoDataFrame

Intersect the given GTFS shapes table with the given speed zones subset to the given route type to assign speeds to each shape point. Also add points and speeds where the speed zones intersect the linestrings corresponding to the shapes (the boundary points). Return a GeoDataFrame with the columns

  • shape_id

  • shape_dist_traveled: in meters

  • shape_pt_sequence: -1 if a boundary point

  • geometry: Point object representing shape point

  • route_type: the given route type

  • speed: in kilometers per hour

  • speed_zone_id: speed zone ID

Use UTM coordinates if specified. Return an empty GeoDataFrame if there are no speed zones for the given route type.

make_gtfs.main.get_duration(timestr1: str, timestr2: str, units='s') float

Return the duration of the time period between the first and second time string in the given units. Allowable units are ‘s’ (seconds), ‘min’ (minutes), ‘h’ (hours). Assume timestr1 < timestr2.
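The documented behavior amounts to converting both HH:MM:SS strings to seconds, subtracting, and dividing by the unit size. This is a stdlib reimplementation for illustration, not the library's source. The name get_duration_sketch is hypothetical, and raising ValueError for unknown units is an assumption.

```python
def get_duration_sketch(timestr1: str, timestr2: str, units: str = "s") -> float:
    """Duration from timestr1 to timestr2 (HH:MM:SS strings) in 's', 'min', or 'h'."""
    def to_seconds(timestr: str) -> int:
        hours, minutes, seconds = (int(x) for x in timestr.split(":"))
        return 3600 * hours + 60 * minutes + seconds

    divisors = {"s": 1, "min": 60, "h": 3600}
    if units not in divisors:
        raise ValueError(f"units must be one of {list(divisors)}")
    # Like the documented function, assume timestr1 < timestr2
    return (to_seconds(timestr2) - to_seconds(timestr1)) / divisors[units]


print(get_duration_sketch("07:00:00", "09:30:00", "h"))  # 2.5
```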

make_gtfs.main.get_stops_nearby(geo_stops: GeoDataFrame, linestring: LineString, side: str, buffer: float = 10) GeoDataFrame

Given a GeoDataFrame of stops, a Shapely LineString in the same coordinate system, a side of the LineString (string; ‘left’ = left hand side of LineString, ‘right’ = right hand side of LineString, or ‘both’ = both sides), and a buffer in the distance units of that coordinate system, return a GeoDataFrame of all the stops that lie within buffer distance units of the LineString on the given side.
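The side test can be approximated without Shapely. For each stop, find the nearest segment, compare the distance to buffer, and read the side from the sign of a 2D cross product, which is positive to the left of the segment's direction. This is a pure-Python stand-in for the one-sided buffer that buffer_side() builds, and it assumes planar coordinates with x pointing east and y pointing north. Near sharp corners its results can differ from a true buffer. stops_nearby_sketch is a hypothetical name.

```python
import math


def _distance_and_side(p, a, b):
    """Distance from p to segment ab, and the cross product whose sign gives the side."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    # Parameter of the closest point on the segment, clamped to [0, 1]
    t = 0.0 if seg_len2 == 0 else max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    dist = math.hypot(px - (ax + t * dx), py - (ay + t * dy))
    cross = dx * (py - ay) - dy * (px - ax)  # > 0: left of ab, < 0: right
    return dist, cross


def stops_nearby_sketch(stops, line, side, buffer=10.0):
    """Return IDs of stops within buffer of line on the given side ('left', 'right', 'both')."""
    nearby = []
    for stop_id, point in stops.items():
        # Judge distance and side against the closest segment of the line
        dist, cross = min(
            (_distance_and_side(point, a, b) for a, b in zip(line, line[1:])),
            key=lambda pair: pair[0],
        )
        if dist > buffer:
            continue
        if side == "both" or (side == "left" and cross > 0) or (side == "right" and cross < 0):
            nearby.append(stop_id)
    return nearby


line = [(0.0, 0.0), (100.0, 0.0)]
stops = {"A": (50.0, 5.0), "B": (50.0, -5.0), "C": (50.0, 20.0)}
print(stops_nearby_sketch(stops, line, "right"))  # ['B']
```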

make_gtfs.main.make_stop_points(lines: GeoDataFrame, id_col: str, offset: float, side: str, n: int = 2, spacing: float | None = None) GeoDataFrame

Given a GeoDataFrame of lines with at least the columns

  • id_col: a unique identifier of the line

  • 'geometry': a LineString in a meters-based CRS

return a GeoDataFrame containing n equally spaced points for each line, offset by offset meters on the given side (‘left’ or ‘right’) of the line. Set offset = 0 to place the points on each line. The lines represent route shapes and the points represent stops.

If spacing is not None, then ignore n and for each line, sample points along the line spaced spacing meters apart from start to end, except allow the spacing of the last two points to be < 2 * spacing. Then offset these points according to offset and side.

Drop duplicate point geometries, which can occur in loops.

The resulting GeoDataFrame has the columns

  • 'point_id': a unique identifier of the point

  • id_col: ID of the line the point corresponds to

  • 'shape_dist_traveled': how far along the line the point lies; in meters

  • 'geometry': a Point in the same CRS as the lines GeoDataFrame

Helper function for generating stops along trip shapes.
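The equally spaced case can be sketched with plain coordinate arithmetic. Walk along the line to distance i * length / (n - 1), then shift the point along the segment's unit normal, which points left for positive offsets and right for negative ones. This minimal sketch assumes planar meter coordinates given as {line_id: [(x, y), ...]}, returns dicts in place of a GeoDataFrame, and omits the spacing option. It also assumes that duplicates are detected on the un-offset points, so that a loop's start and end collapse into one. make_stop_points_sketch and its point_id format are hypothetical.

```python
import math


def _point_and_left_normal(line, d):
    """Point at distance d along line, plus the unit normal pointing to its left."""
    segments = [(a, b) for a, b in zip(line, line[1:]) if a != b]
    if not segments:
        raise ValueError("line has no segment of positive length")
    for i, ((ax, ay), (bx, by)) in enumerate(segments):
        seg_len = math.hypot(bx - ax, by - ay)
        if d <= seg_len or i == len(segments) - 1:
            t = min(d / seg_len, 1.0)
            ux, uy = (bx - ax) / seg_len, (by - ay) / seg_len
            return (ax + t * (bx - ax), ay + t * (by - ay)), (-uy, ux)
        d -= seg_len


def make_stop_points_sketch(lines, offset, side, n=2):
    """lines: {line_id: [(x, y), ...]} in meters. Return up to n offset points per line."""
    sign = {"left": 1.0, "right": -1.0}[side]
    points = []
    for line_id, coords in lines.items():
        length = sum(math.hypot(bx - ax, by - ay) for (ax, ay), (bx, by) in zip(coords, coords[1:]))
        seen = set()
        for i in range(n):
            d = length * i / (n - 1) if n > 1 else 0.0
            (x, y), (nx, ny) = _point_and_left_normal(coords, d)
            key = (round(x, 6), round(y, 6))
            if key in seen:  # e.g. start and end of a loop coincide
                continue
            seen.add(key)
            points.append({
                "point_id": f"{line_id}-{i}",  # hypothetical ID scheme
                "line_id": line_id,
                "shape_dist_traveled": d,
                "geometry": (x + sign * offset * nx, y + sign * offset * ny),
            })
    return points


pts = make_stop_points_sketch({"s1": [(0, 0), (100, 0), (100, 100)]}, offset=5, side="right", n=3)
print([p["geometry"] for p in pts])  # [(0.0, -5.0), (100.0, -5.0), (105.0, 100.0)]
```

At a vertex the sketch uses the normal of the incoming segment, so the middle point above is offset perpendicular to the first leg.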

Module cli

The command-line-interface module.

