Make GTFS 4.0.7 documentation¶
Module constants¶
Constants used throughout the package.
- make_gtfs.constants.BUFFER = 10¶
Meters to buffer trip paths to find stops
- make_gtfs.constants.SEP = '-'¶
Character to separate different chunks within an ID
- make_gtfs.constants.STOP_OFFSET = 5¶
Meters to offset stops from route shapes
Module protofeed¶
- class make_gtfs.protofeed.ProtoFeed(meta: DataFrame, service_windows: DataFrame, shapes: GeoDataFrame, frequencies: DataFrame, stops: DataFrame | None = None, speed_zones: GeoDataFrame | None = None)¶
Bases: object
A ProtoFeed instance holds the source data from which to build a GTFS feed. The most common way to build one is from files via the function read_protofeed().
- static clean_speed_zones(speed_zones: GeoDataFrame, service_area: GeoDataFrame, default_speed_zone_id: str = 'default', default_speed: float = inf) GeoDataFrame ¶
Clip the speed zones to the service area. The zone ID of the part of the service area outside of the speed zones will be set to default_speed_zone_id and the speed there will be set to default_speed. Return the resulting service area of (Multi)Polygons, now partitioned into speed zones. The result is a GeoDataFrame with the columns ‘speed_zone_id’, ‘speed’, ‘geometry’.
- frequencies: DataFrame¶
- meta: DataFrame¶
- route_types() list[int] ¶
- service_windows: DataFrame¶
- shapes: GeoDataFrame¶
- speed_zones: GeoDataFrame | None = None¶
- stops: DataFrame | None = None¶
- make_gtfs.protofeed.SPEED_BY_RTYPE = {0: 11, 1: 30, 2: 45, 3: 22, 4: 22, 5: 13, 6: 20, 7: 18, 11: 22, 12: 65}¶
Default average speeds by route type in kilometers per hour
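As a minimal sketch of how such a lookup table can fill in missing speeds: the dictionary below is copied from the documented value, while the helper fill_speed is hypothetical and not part of make_gtfs.

```python
import math

# Copied from the documented value of make_gtfs.protofeed.SPEED_BY_RTYPE
# (route type -> default average speed in kilometers per hour).
SPEED_BY_RTYPE = {0: 11, 1: 30, 2: 45, 3: 22, 4: 22, 5: 13, 6: 20, 7: 18, 11: 22, 12: 65}


def fill_speed(route_type, speed=None):
    """Hypothetical helper: return the given speed in km/h, or the
    default for the route type when the speed is missing (None or NaN)."""
    if speed is None or (isinstance(speed, float) and math.isnan(speed)):
        return float(SPEED_BY_RTYPE[route_type])
    return float(speed)
```

For a bus route (route type 3) with no speed given, fill_speed(3) falls back to the default of 22 km/h, whereas an explicit speed is returned unchanged.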
- make_gtfs.protofeed.read_protofeed(path: str | Path) ProtoFeed ¶
Read the data files at the given directory path (string or Path object) and build a ProtoFeed from them. Validate the resulting ProtoFeed. If invalid, raise a ValueError specifying the errors. Otherwise, return the resulting ProtoFeed.
The data files needed to build a ProtoFeed are as follows.
meta.csv (required). A CSV file containing network metadata. The CSV file contains the columns
  - agency_name (required): string; the name of the transport agency
  - agency_url (required): string; a fully qualified URL for the transport agency
  - agency_timezone (required): string; timezone where the transit agency is located; timezone names never contain the space character but may contain an underscore; refer to http://en.wikipedia.org/wiki/List_of_tz_zones for a list of valid values
  - start_date, end_date (required): strings; the start and end dates for which all this network information is valid, formatted as YYYYMMDD strings
service_windows.csv (required). A CSV file containing service window information. A service window is a time interval and a set of days of the week during which all routes have constant service frequency, e.g. Saturday and Sunday 07:00 to 09:00. The CSV file contains the columns
  - service_window_id (required): string; a unique identifier for a service window
  - start_time, end_time (required): strings; the start and end times of the service window in HH:MM:SS format where the hour is less than 24
  - monday, tuesday, wednesday, thursday, friday, saturday, sunday (required): integer 0 or 1; indicates whether the service is active on the given day (1) or not (0)
shapes.geojson (required). A GeoJSON file representing shapes for all (route, direction 0 or 1, service window) combinations. The file comprises one feature collection of LineString features (in WGS84 coordinates), where each feature has the property
  - shape_id (required): a unique identifier of the shape
Each LineString should represent the run of one representative trip of a route. In particular, the LineString should not traverse the same section of road many times, unless you want a trip to actually do that.
frequencies.csv (required). A CSV file containing route frequency information. The CSV file contains the columns
  - route_short_name (required): string; a unique short name for the route, e.g. ‘51X’
  - route_long_name (required): string; full name of the route that is more descriptive than route_short_name
  - route_type (required): integer; the GTFS route type; see https://developers.google.com/transit/gtfs/reference/#routestxt
  - service_window_id (required): string; a service window ID for the route taken from the file service_windows.csv
  - direction (required): integer 0, 1, or 2; indicates whether the route travels in the direction of its shape (1), or in the reverse direction of its shape (0), or in both directions (2); in the latter case, trips will be created that travel in both directions along the route’s shape, each direction operating at the given frequency; otherwise, trips will be created that travel in only the given direction
  - frequency (required): integer; the frequency of the route during the service window in vehicles per hour
  - shape_id (required): string; a shape ID that is listed in shapes.geojson and corresponds to the linestring of the (route, direction 0 or 1, service window) tuple
  - speed (optional): float; the average speed of the route in kilometers per hour
Missing speed values will be filled with values from the dictionary SPEED_BY_RTYPE.
stops.csv (optional). A CSV file containing all the required and optional fields of stops.txt in the GTFS.
speed_zones.geojson (optional). A GeoJSON file of Polygons representing speed zones for routes. The file consists of one feature collection of Polygon features (in WGS84 coordinates), each with the properties
  - speed_zone_id (required): string; a unique identifier of the zone polygon; can be re-used if the polygon is re-used
  - route_type (required): integer; a GTFS route type to which the zone applies
  - speed (required): positive float; the average speed in kilometers per hour of routes of that route type that travel within the zone; overrides route speeds in frequencies.csv within the zone
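The column rules for service_windows.csv can be illustrated with a few lines of plain Python. This is only a sketch under stated assumptions: the function check_service_window_row and its error messages are hypothetical, rows are assumed to arrive as dictionaries of strings (as csv.DictReader would produce), and make_gtfs performs its own validation through the validators module.

```python
import re

DAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]

# HH:MM:SS with the hour restricted to 00-23, per the column description.
TIME_RE = re.compile(r"^([01]\d|2[0-3]):[0-5]\d:[0-5]\d$")


def check_service_window_row(row: dict) -> list:
    """Hypothetical checker: return a list of problems found in one
    service_windows.csv row; an empty list means the row looks fine."""
    errors = []
    if not row.get("service_window_id"):
        errors.append("service_window_id is missing")
    for col in ("start_time", "end_time"):
        if not TIME_RE.match(str(row.get(col, ""))):
            errors.append(f"{col} must be HH:MM:SS with hour < 24")
    for day in DAYS:
        if str(row.get(day)) not in ("0", "1"):
            errors.append(f"{day} must be 0 or 1")
    return errors


# Example row for the weekend window mentioned above (07:00 to 09:00).
weekend = {"service_window_id": "weekend_am", "start_time": "07:00:00", "end_time": "09:00:00"}
weekend.update({day: "0" for day in DAYS})
weekend.update({"saturday": "1", "sunday": "1"})
```

Calling check_service_window_row(weekend) returns an empty list, while a row whose start_time is 24:00:00 is reported as invalid.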
Module validators¶
ProtoFeed validators.
- make_gtfs.validators.check_frequencies(pfeed: ProtoFeed) DataFrame ¶
Return pfeed.frequencies if it is valid. Otherwise, raise a Pandera SchemaError.
- make_gtfs.validators.check_meta(pfeed: ProtoFeed) DataFrame ¶
Return pfeed.meta if it is valid. Otherwise, raise a ValueError or a Pandera SchemaError.
- make_gtfs.validators.check_service_windows(pfeed: ProtoFeed) DataFrame ¶
Return pfeed.service_windows if it is valid. Otherwise, raise a Pandera SchemaError.
- make_gtfs.validators.check_shapes(pfeed: ProtoFeed) DataFrame ¶
Return pfeed.shapes if it is valid. Otherwise, raise a ValueError or a Pandera SchemaError.
- make_gtfs.validators.check_speed_zones(pfeed: ProtoFeed) DataFrame ¶
Return pfeed.speed_zones if it is valid. Otherwise, raise a ValueError or a Pandera SchemaError.
- make_gtfs.validators.check_stops(pfeed: ProtoFeed) DataFrame ¶
Return pfeed.stops if it is valid. Otherwise, raise a Pandera SchemaError.
- make_gtfs.validators.crosscheck_ids(id_col: str, src_table: DataFrame, src_table_name: str, tgt_table: DataFrame, tgt_table_name: str) None ¶
Check that the set of id_col values in the given source table are a subset of those in the target table. Raise a ValueError if not; otherwise do nothing.
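The subset check described here can be sketched without pandas. The function below is a hypothetical stand-in, not the library's implementation: it takes tables as lists of dictionaries rather than DataFrames, and its error message is made up for illustration.

```python
def crosscheck_ids_sketch(id_col, src_rows, src_table_name, tgt_rows, tgt_table_name):
    """Hypothetical stand-in for crosscheck_ids that works on lists of dicts.

    Raise a ValueError if some id_col value in the source rows does not
    appear in the target rows; otherwise return None.
    """
    src_ids = {row[id_col] for row in src_rows}
    tgt_ids = {row[id_col] for row in tgt_rows}
    missing = src_ids - tgt_ids
    if missing:
        raise ValueError(
            f"{id_col} values {sorted(missing)} in {src_table_name} "
            f"are missing from {tgt_table_name}"
        )


# Every service window used in frequencies must exist in service_windows.
frequencies = [{"service_window_id": "weekday_peak"}, {"service_window_id": "weekend_am"}]
service_windows = [{"service_window_id": "weekday_peak"}, {"service_window_id": "weekend_am"}]
crosscheck_ids_sketch(
    "service_window_id", frequencies, "frequencies", service_windows, "service_windows"
)
```

The call above passes silently; removing weekend_am from service_windows would make it raise a ValueError naming the missing ID.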
- make_gtfs.validators.validate(pfeed)¶
Return the given ProtoFeed if it is valid. Otherwise, raise a ValueError after encountering the first error.
Module main¶
This module contains the main logic.
- make_gtfs.main.buffer_side(linestring: LineString, side: str, buffer: float) Polygon ¶
Given a Shapely LineString, a side of the LineString (string; ‘left’ = left hand side of LineString, ‘right’ = right hand side of LineString, or ‘both’ = both sides), and a buffer size in the distance units of the LineString, buffer the LineString on the given side by the buffer size and return the resulting Shapely polygon.
- make_gtfs.main.build_agency(pfeed: ProtoFeed) DataFrame ¶
Given a ProtoFeed, return a DataFrame representing agency.txt.
- make_gtfs.main.build_calendar_etc(pfeed: ProtoFeed) DataFrame ¶
Given a ProtoFeed, return a DataFrame representing calendar.txt and a dictionary of the form <service window ID> -> <service ID>, respectively.
- make_gtfs.main.build_feed(pfeed: ProtoFeed, buffer: float = 10, stop_offset: float = 5, num_stops_per_shape: int = 2, stop_spacing: float | None = None) Feed ¶
Convert the given ProtoFeed to a GTFS Feed with meter distance units. Look at a distance of buffer meters from route shapes to find stops. If no stops are given, then for each shape build stops offset by stop_offset meters on the traffic side of each built shape. Make num_stops_per_shape equally spaced stops for each shape, then offset them. But if stop_spacing is given, then instead space the stops every stop_spacing meters along each shape, then offset them. If a shape has an antiparallel clone, then only build stops on the shape, not its clone, thereby avoiding unnecessary stops. Output distance units will be in meters.
- make_gtfs.main.build_routes(pfeed: ProtoFeed) DataFrame ¶
Given a ProtoFeed, return a DataFrame representing routes.txt.
- make_gtfs.main.build_shapes(pfeed: ProtoFeed) DataFrame ¶
Given a ProtoFeed, return a DataFrame representing shapes.txt. Only use shape IDs that occur in both pfeed.shapes and pfeed.frequencies. Create reversed shapes where routes traverse shapes in both directions.
- make_gtfs.main.build_stop_times(pfeed: ProtoFeed, routes: DataFrame, shapes: DataFrame, stops: DataFrame, trips: DataFrame, buffer: float = 10) DataFrame ¶
Given a ProtoFeed and its corresponding routes, shapes, stops, and trips DataFrames, return a DataFrame representing stop_times.txt. Includes the optional shape_dist_traveled column rounded to the nearest meter. Does not make stop times for trips with no stops within the buffer.
- make_gtfs.main.build_stop_times_for_trip(trip_id: str, stops_g_nearby: GeoDataFrame, shape_id: str, linestring: LineString, speed_zones: GeoDataFrame, route_type: int, shape_point_speeds: GeoDataFrame, default_speed: float, start_time: int) DataFrame ¶
Build stop times for the given trip ID.
Assume all coordinates are in meters, distances are in meters, and speeds are in kilometers per hour.
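The unit convention (meters for distance, kilometers per hour for speed) implies a conversion whenever a travel time is computed. The sketch below shows that arithmetic only; both helpers are hypothetical, and reading start_time as seconds past midnight is an assumption, not something stated above.

```python
def travel_time_seconds(distance_m: float, speed_kmh: float) -> float:
    """Hypothetical helper: seconds needed to cover distance_m meters
    at speed_kmh kilometers per hour."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    speed_ms = speed_kmh * 1000 / 3600  # km/h -> m/s
    return distance_m / speed_ms


def format_gtfs_time(seconds: float) -> str:
    """Hypothetical helper: format seconds past midnight as HH:MM:SS."""
    total = int(round(seconds))
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# A trip departing at 07:00:00 (assumed to be given as seconds past midnight)
# that reaches a stop 1100 m along the shape at 22 km/h.
start_time = 7 * 3600
arrival = start_time + travel_time_seconds(1100, 22)
arrival_str = format_gtfs_time(arrival)
```

Here 22 km/h is about 6.11 m/s, so the 1100 m leg takes 180 seconds and arrival_str is 07:03:00.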
- make_gtfs.main.build_stops(pfeed: ProtoFeed, shapes: DataFrame | None = None, offset: float = 5, n: int = 2, spacing: float | None = None) DataFrame ¶
Given a ProtoFeed, return a DataFrame representing stops.txt. If pfeed.stops is not None, then return that. Otherwise, require built shapes output by build_shapes(). In that case, for each shape, build n equally spaced stops offset by offset meters on the traffic side of the shape. If spacing is not None, then ignore n and for each shape, create offset stops spaced spacing meters apart (when projected onto the shape), except allow the spacing of the last two stops to be < 2 * spacing.
When building stops, drop stops with duplicate geometries within a shape to gracefully handle loop shapes. Also, if a shape has an antiparallel clone, then only build stops for the shape, not its clone.
- make_gtfs.main.build_trips(pfeed: ProtoFeed, routes: DataFrame, service_by_window: dict) DataFrame ¶
Given a ProtoFeed and its corresponding routes and service-by-window, return a DataFrame representing trips.txt. Trip IDs encode route, direction, and service window information to make it easy to compute stop times later.
- make_gtfs.main.compute_shape_point_speeds(shapes: DataFrame, speed_zones: GeoDataFrame, route_type: int, *, use_utm: bool = False) GeoDataFrame ¶
Intersect the given GTFS shapes table with the given speed zones, subset to the given route type, to assign speeds to each shape point. Also add points and speeds where the speed zones intersect the linestrings corresponding to the shapes (the boundary points). Return a GeoDataFrame with the columns
  - shape_id
  - shape_dist_traveled: in meters
  - shape_pt_sequence: -1 if a boundary point
  - geometry: Point object representing shape point
  - route_type: the given route type
  - speed: in kilometers per hour
  - speed_zone_id: speed zone ID
Use UTM coordinates if specified. Return an empty GeoDataFrame if there are no speed zones for the given route type.
- make_gtfs.main.get_duration(timestr1: str, timestr2: str, units='s') float ¶
Return the duration of the time period between the first and second time string in the given units. Allowable units are ‘s’ (seconds), ‘min’ (minutes), ‘h’ (hours). Assume timestr1 < timestr2.
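The behavior described for get_duration can be sketched in plain Python. The function duration_between below is a hypothetical re-implementation for illustration, assuming HH:MM:SS time strings as used in service_windows.csv; it is not the library's code.

```python
def duration_between(timestr1: str, timestr2: str, units: str = "s") -> float:
    """Hypothetical sketch: duration from timestr1 to timestr2 (both
    assumed to be HH:MM:SS strings) in seconds, minutes, or hours."""

    def to_seconds(timestr: str) -> int:
        hours, minutes, seconds = (int(part) for part in timestr.split(":"))
        return 3600 * hours + 60 * minutes + seconds

    divisors = {"s": 1, "min": 60, "h": 3600}
    if units not in divisors:
        raise ValueError(f"units must be one of {sorted(divisors)}")
    return (to_seconds(timestr2) - to_seconds(timestr1)) / divisors[units]
```

For the service window 07:00:00 to 09:00:00, this gives 7200 seconds, 120 minutes, or 2 hours depending on units.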
- make_gtfs.main.get_stops_nearby(geo_stops: GeoDataFrame, linestring: LineString, side: str, buffer: float = 10) GeoDataFrame ¶
Given a GeoDataFrame of stops, a Shapely LineString in the same coordinate system, a side of the LineString (string; ‘left’ = left hand side of LineString, ‘right’ = right hand side of LineString, or ‘both’ = both sides), and a buffer in the distance units of that coordinate system, return a GeoDataFrame of all the stops that lie within buffer distance units on the given side of the LineString.
- make_gtfs.main.make_stop_points(lines: GeoDataFrame, id_col: str, offset: float, side: str, n: int = 2, spacing: float | None = None) GeoDataFrame ¶
Given a GeoDataFrame of lines with at least the columns
  - id_col: a unique identifier of the line
  - 'geometry': a LineString in a meters-based CRS
return a GeoDataFrame containing n equally spaced points for each line, offset by offset meters on the given side (‘left’ or ‘right’) of the line. Set offset = 0 to make points on each line. The lines represent route shapes and the points represent stops.
If spacing is not None, then ignore n and for each line, sample points along the line spaced spacing meters apart from start to end, except allow the spacing of the last two points to be < 2 * spacing. Then offset these points according to offset and side.
Drop duplicate point geometries, which can occur in loops.
The resulting GeoDataFrame has the columns
  - 'point_id': a unique identifier of the point
  - id_col: ID of the line the point corresponds to
  - 'shape_dist_traveled': how far along the line the point lies; in meters
  - 'geometry': a Point in the same CRS as the lines GeoDataFrame
Helper function for generating stops along trip shapes.
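The two sampling modes can be illustrated by computing only the distances along a line at which points would be placed. This is a sketch of one possible reading of the rules above, not the library's implementation: the function sample_distances is hypothetical, it assumes that equally spaced points include both endpoints, and it assumes that the final point always sits at the end of the line so that the last gap lies between spacing and 2 * spacing.

```python
def sample_distances(length: float, n: int = 2, spacing=None) -> list:
    """Hypothetical helper: distances (in meters) along a line of the given
    length at which to place points, before any sideways offset."""
    if spacing is None:
        # Assumption: n equally spaced points include the start and the end.
        if n < 2:
            return [0.0]
        step = length / (n - 1)
        return [i * step for i in range(n)]
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    k = int(length // spacing)
    if k == 0:
        # Line shorter than one spacing: keep just the start and the end.
        return [0.0, float(length)]
    # Points at 0, spacing, ..., (k - 1) * spacing, then one at the end,
    # so the last gap is at least spacing and less than 2 * spacing.
    dists = [float(i * spacing) for i in range(k)]
    dists.append(float(length))
    return dists
```

Under these assumptions, a 1000 m line with n = 3 yields points at 0, 500, and 1000 m, while spacing = 300 yields points at 0, 300, 600, and 1000 m, with a final gap of 400 m.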
Module cli¶
The command-line-interface module.