Welcome to itinerum-tripkit’s documentation!¶
The itinerum-tripkit is a processing library for handling exports from the Itinerum data collection platform. The tripkit provides methods for handling and cleaning GPS .csv data, processing collected points to trip features, and writing GIS-compatible files for visualizing results.
The Quick Start shows the most common usage patterns of the tripkit library while the API section will detail how to interact with the library’s core methods directly.
Installation¶
The itinerum-tripkit can be installed as a library using pip can be included in your own project by cloning: http://github.com/TRIP-Lab/itinerum-tripkit and
copying the tripkit
directory as a submodule.
Virtual Environments¶
It is recommended to use venv to keep the tripkit dependency versions isolated from system packages.
Linux & MacOS¶
$ python3 -m venv tripkit-venv
$ source ./tripkit-ven/bin/activate
Windows¶
PS C:\Code\itinerum-tripkit> python -m venv tripkit-venv
PS C:\Code\itinerum-tripkit> .\tripkit-venv\Scripts\Activate.ps1
PowerShell Note:
With PowerShell, Set-ExecutionPolicy Unrestricted -Force
may be required to allow the Activate.ps1
script to run. The PowerShell prompt must then be restarted for these permissions to take effect.
Dependencies¶
Windows¶
Windows does not provide a build environment by default with libraries relied upon by the GDAL package. Instead, the Visual C++ Redistributable for Visual Studio 2015 (13.4MB) can be installed to provide the necessary system libraries.
If an existing GDAL installation is available to Python, it may be possible to skip this step.
Compiled Python Packages¶
On Windows without a C/C++ build environment, some packages requiring will fail to install using pip. Instead, compiled wheel versions can be downloaded from various mirrors and installed with pip directly from file.
(tripkit-venv) PS C:\Code\itinerum-tripkit> pip install .\Fiona-1.8.6-cp37-cp37m-win_amd64.whl
Compiled packages to install:
- GDAL: https://www.lfd.uci.edu/~gohlke/pythonlibs/#gdal
- Fiona: https://www.lfd.uci.edu/~gohlke/pythonlibs/#fiona
GDAL (Alternative)¶
If the above is not successful, another method is using the gisinternals.com pre-compiled binaries. For your system version (likely MSVC 2017 / x64), click “Downloads”. From the downloads page, the core GDAL library is all that is needed (gdal-204-1911-64-core.msi).
Install this file and set two Windows environment variables:
- Append to PATH:
C:\Program Files\GDAL
- Create GDAL_DATA:
C:\Program Files\GDAL\gdal-data
After setting these variables, close and re-open the command prompt (re-activate the virtual environment if using) and the Python dependencies can be installed.
First the GDAL library must be installed for geospatial operations and outputs.
Optional Components¶
Scikit-learn¶
Scikit-learn can optionally be installed for optimized clustering such as by swapping out the included K-Means implementation:
- Scikit-learn: https://www.lfd.uci.edu/~gohlke/pythonlibs/#scikit-learn
OSRM¶
The itinerum-tripkit provides interfaces for submitting map matching queries to an OSRM API and writing results to file.
The instructions that follow use the Multi-Level Djikstra processing pipelines recommended by Project OSRM.
- Download an OSM extract for your region (ex. Québec)
$ mkdir osrm && cd osrm
$ wget http://download.geofabrik.de/north-america/canada/quebec-latest.osm.pbf
- Process the OSM data using the default network profiles included with OSRM:
# car
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-extract -p /opt/car.lua /data/quebec-latest.osm.pbf
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-partition /data/quebec-latest
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-customize /data/quebec-latest
$ mkdir car
$ mv quebec-latest.orsm* car
# bike
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-extract -p /opt/bicycle.lua /data/quebec-latest.osm.pbf
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-partition /data/quebec-latest
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-customize /data/quebec-latest
$ mkdir bicycle
$ mv quebec-latest.orsm* bicycle
# walking
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-extract -p /opt/foot.lua /data/quebec-latest.osm.pbf
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-partition /data/quebec-latest
$ docker run -t -v $(pwd):/data osrm/osrm-backend osrm-customize /data/quebec-latest
$ mkdir foot
$ mv quebec-latest.orsm* foot
- Run the Docker OSRM API containers on ports
5000-5002
to reverse proxy for public access
$ docker run -d --restart always -p 5000:5000 -v $(pwd)/car:/data osrm/osrm-backend osrm-routed --algorithm MLD --max-matching-size=5000 /data/quebec-latest.osrm
$ docker run -d --restart always -p 5001:5000 -v $(pwd)/bicycle:/data osrm/osrm-backend osrm-routed --algorithm MLD --max-matching-size=5000 /data/quebec-latest.osrm
$ docker run -d --restart always -p 5002:5000 -v $(pwd)/foot:/data osrm/osrm-backend osrm-routed --algorithm MLD --max-matching-size=5000 /data/quebec-latest.osrm
Configuration¶
The itinerum-tripkit is configured by a global configuration object that is passed to
the class at initialization. This can be created either as a Python file of
global variables that is imported or defined as a bare class Config object and named
tripkit_config
.
Generating the Configuration¶
The following parameters are accepted by itinerum-tripkit:
DATABASE_FN |
The filename to be used for the cache SQLite database. |
INPUT_DATA_DIR |
Directory of the unpacked TripKit
export .csv files. Usually a subdirectory
of the ./input directory. |
INPUT_DATA_TYPE |
Data source: itinerum or qstarz |
OUTPUT_DATA_DIR |
Output directory to save processed export data. |
SUBWAY_STATIONS_FP |
Relative filepath of subway .csv data for connecting gaps during trip detection algorithms. |
TRIP_DETECTION_BREAK_INTERVAL_SECONDS |
Minimum stop time for breaking GPS coordinates into trips. |
TRIP_DETECTION_SUBWAY_BUFFER_METERS |
Buffer in meters for associating a trip end with a subway station entrance. |
TRIP_DETECTION_COLD_START_DISTANCE_METERS |
Leeway distance in meters for allowing a device acquire a GPS fix before inferring that an intemediary trip with missing data has occured. |
TRIP_DETECTION_ACCURACY_CUTOFF_METERS |
Minimum horizontal accuracy in meters for including GSP points within trip detection algorithms. Greater values indicate worse accuracy; generally 30-50 is deemed an acceptable range. |
Process parameters
These parameters are only needed if their related processes will be run.
TIMEZONE |
The timezone name as described within the tzdata database for complete days detection (e.g., America/Montreal) |
SEMANTIC_LOCATIONS |
Mapping of semantic locations to latitude longitude columns within survey responses (see below for example). |
SEMANTIC_LOCATION_PROXIMITY_METERS |
Buffer distance in meters to consider a GPS point to be at a semantic location. |
Extra parameters
These parameters are to configure plug-in processes (OSRM map matching API below). Check the plug-in source code to see what is expected in these cases.
MAP_MATCHING_BIKING_API_URL |
Endpoint for OSRM bicycle network map maptching. |
MAP_MATCHING_DRIVING_API_URL |
Endpoint for OSRM car network map maptching. |
MAP_MATCHING_WALKING_API_URL |
Endpoint for OSRM foot network map maptching. |
Quick Start¶
The itinerum-tripkit library provides simple interfaces to load .csv data into a local SQLite database
and providing Python class
objects (such as tripkit.models.User
and tripkit.models.Trip
)
to represent this data for processing and inferring metadata.
Load Data¶
To initilize the libary, a configuration object is expected (see Generating the Configuration).
First, input .csv data be loaded to the itinerum-tripkit cache database:
from tripkit import TripKit
import tripkit_config
tripkit = TripKit(config=tripkit_config)
tripkit.setup()
After data has been loaded to the database, survey participants or users can be loaded as a list of tripkit.models.User
objects:
users = tripkit.load_users()
for user in users:
print(len(user.coordinates))
Note: On first run, .csv data will be imported if the table user_survey_responses
does not exist in the cache database.
To delete the cache, the temporary files can be deleted between runs.
Run Trip Detection on a User¶
Instead of running trip detection on the whole survey, it is possible to focus on a single user in detail. For writing new processing libraries, this is often an essential first step.
user = tripkit.load_users(uuid='00000000-0000-0000-0000-000000000000')
params = {
'load_subway_entrances': tripkit.database.load_subway_entrances(),
'break_interval_seconds': tripkit_config.TRIP_DETECTION_BREAK_INTERVAL_SECONDS,
'subway_buffer_meters': tripkit_config.TRIP_DETECTION_SUBWAY_BUFFER_METERS,
'cold_start_distance': tripkit_config.TRIP_DETECTION_COLD_START_DISTANCE_METERS,
'accuracy_cutoff_meters': tripkit_config.TRIP_DETECTION_ACCURACY_CUTOFF_METERS
}
trips = tripkit.process.trip_detection.triplab.algorithm.run(user.coordinates,
parameters=params)
Run Complete Days Summaries on a User¶
When trips have been detected for a user, the complete days summary process can be run on individual users. This will check to see if a day is “complete” (contains no missing trips), “incomplete”, or “inactive”. There are additional rules to consider days as “complete” if there is an inactive day between two complete days; it is recommended to review the process source code.
user = tripkit.load_users(uuid='00000000-0000-0000-0000-000000000000')
trip_day_summaries = tripkit.process.complete_days.triplab.counter.run(user.trips,
tripkit_config.TIMEZONE)
tripkit.database.save_trip_day_summaries(user, trip_day_summaries, tripkit_config.TIMEZONE)
Run Semantic Location Activity Detection on a User¶
If common user locations are available within survey responses or supplied separately (such as from the outputs of a
clustering process), dwell times from nearby GPS points can be tallied. Note: the Coordinate
model is currently
created on-the-fly as demonstrated, but this should soon be available as an included library class-object.
Coordinate = namedtuple('Coordinate', ['latitude', 'longitude'])
user = tripkit.load_users(uuid='00000000-0000-0000-0000-000000000000')
locations = {
'home': Coordinate(latitude=45.5, longitude=-73.5)
}
tripkit.io.write_activity_locations_geojson(tripkit_config, fn_base=user.uuid, locations=locations)
summary = tripkit.process.activities.triplab.detect.run(
user, locations, proximity_m=tripkit_config.ACTIVITY_LOCATION_PROXIMITY_METERS, timezone=tripkit_config.TIMEZONE)
dwell_time_summaries = [summary] # usually, multiple users would be summarized for output
tripkit.io.write_user_summaries_csv(tripkit_config, dwell_time_summaries)
Run OSRM Map Matching on a Trip¶
If an OSRM server is available, map matching queries can be passed to the API and the response saved to a GIS-friendly format (.geojson or .gpkg). The API query is limited by URL length, so map matching should be done for a single trip and especially long trips may have to be supplied in chunks.
user = tripkit.load_users(uuid='00807c5b-7542-4868-8462-14b79a9fcc9f',
start=datetime(2017, 11, 29),
end=datetime(2017, 11, 30))
map_matcher = tripkit.process.map_match.osrm(tripkit_config)
mapmatched_results = map_matcher.match(coordinates=user.coordinates, matcher='DRIVING')
tripkit.io.write_mapmatched_geojson(cfg=tripkit_config, fn_base=user.uuid, results=mapmatched_results)
API¶
-
class
tripkit.
TripKit
(config)¶ The base TripKit object provides an interface for working with .csv data exported from the Itinerum platform or QStarz GPS data loggers. The object is passed the config at initialization which should be an imported Python file of global variables or class with the same attributes as expected here.
This TripKit object is the entry API for loading .csv data to an SQLite database (used as cache), running algorithms on the GPS data, and visualizing or exporting the results to GIS-friendly formats.
The TripKit instance is usually created in your main module like this:
from tripkit import TripKit import tripkit_config tripkit = TripKit(config=tripkit_config) tripkit.setup()
Parameters: config – An imported Python file of global variables or a bare config class with the same attributes, see Generating the Configuration for more information. -
check_setup
()¶ Raises exception if UserSurveyResponse table is not found in database.
-
csv
¶ Provides access to the CSV parser objects.
-
database
¶ Provides access to the cache database object.
-
io
¶ Provides access to the file reading and writing functions.
-
load_user_by_orig_id
(orig_id, load_trips=True, start=None, end=None)¶ Returns all available users as
tripkit.models.User
objects from the databaseParameters: - orig_id – An individual user’s original ID from a non-Itinerum dataset to load
- load_trips (boolean, optional) – Supply False to disable automatic loading of trips to py:class:tripkit.models.User objects on initialization
- start (datetime, optional) – Miminum timestamp bounds (inclusive) for loading user coordinate and prompts data
- end (datetime, optional) – Maximum timestamp bounds (inclusive) for loading user coordinate and prompts data
Return type:
-
load_users
(uuid=None, load_trips=True, load_locations=True, limit=None, start=None, end=None)¶ Returns all available users as
tripkit.models.User
objects from the databaseParameters: - uuid (string, optional) – Supply an individual user’s UUID to load
- load_trips (boolean, optional) – Supply False to disable automatic loading of trips to
tripkit.models.User
objects on initialization - load_locations (boolean, optional) – Supple False to disable automatic loading of activity locations to
tripkit.models.User
objects on initialization - limit (integer, optional) – Maximum number of users to load
- start (datetime, optional) – Mininum timestamp bounds (inclusive) for loading user coordinate and prompts data
- end (datetime, optional) – Maximum timestamp bounds (inclusive) for loading user coordinate and prompts data
Return type: list of
tripkit.models.User
-
process
¶ Provides access to the GPS point and trip processing algorithm submodules.
-
setup
(force=False, generate_null_survey=False)¶ Create the cache database tables if the
UserSurveyResponse
table does not exist.Parameters: - force (boolean, optional) – Supply True to force creation of a new cache database
- generate_null_survey (boolean, optional) – Supply True to generate an empty survey responses table for coordinates-only data
-
CSV Parser¶
-
class
tripkit.csvparser.
ItinerumCSVParser
(database)¶ Parses Itinerum platform csv files and loads to them to a cache database.
Parameters: database – Open Peewee connection the cache database -
generate_null_survey
(input_dir)¶ Wrapper function to generate null survey responses for each user in coordinates.
Parameters: input_dir – Directory containing input .csv data
-
load_export_cancelled_prompt_responses
(input_dir)¶ Loads Itinerum cancelled prompt responses data to the cache database. For each .csv row, the data is fetched by column name if it exists and cast to appropriate types as set in the database.
Parameters: input_dir – The directory containing the self.cancelled_prompt_responses.csv data file.
-
load_export_coordinates
(input_dir)¶ Loads Itinerum coordinates data to the cache database.
Parameters: input_dir – The directory containing the self.coordinates_csv data file.
-
load_export_prompt_responses
(input_dir)¶ Loads Itinerum prompt responses data to the cache database. For each .csv row, the data is fetched by column name if it exists and cast to appropriate types as set in the database.
Parameters: input_dir – The directory containing the self.prompt_responses.csv data file.
-
load_export_survey_responses
(input_dir)¶ Loads Itinerum survey responses data to the cache database.
Parameters: input_dir – The directory containing the self.survey_responses_csv data file.
-
load_trips
(trips_csv_fp)¶ Loads trips processed by the web platform itself. This is mostly useful for comparing current algorithm results against the deployed platform’s version.
Parameters: trips_csv_fp – The full filepath of the downloaded trips .csv file for a survey.
-
-
class
tripkit.csvparser.
QstarzCSVParser
(config, database)¶ Parses Qstarz csv files and loads to them to a cache database.
Parameters: - config –
- database – Open Peewee connection the cache database
- csv_input_dir – Path to the directory containing the input coordinates .csv data
-
generate_null_survey
(input_dir)¶ Wrapper function to generate null survey responses for each user in coordinates.
Parameters: input_dir – Directory containing input .csv data
-
load_export_coordinates
(input_dir)¶ Loads QStarz coordinates data to the cache database.
Parameters: input_dir – The directory containing the self.coordinates_csv data file.
-
load_user_locations
(input_dir)¶ Loads QStarz user locations data to the cache database.
Parameters: input_dir – The directory containing the self.locations_csv data file.
I/O¶
-
class
tripkit.io.
IO
(cfg)¶
-
class
tripkit.io.
CSVIO
(cfg)¶ -
write_activities_daily
(daily_summaries, extra_cols=None, append=False)¶ Write the user activity summaries by date with a record for each day that a user participated in a survey.
Parameters: - daily_summaries (list of dict) – Iterable of user summaries for row records.
- append (boolean, optional) – Toggles whether summaries should be appended to an existing output file.
-
write_activity_summaries
(summaries, append=False)¶ Write the activity summary data consisting of complete days and trips tallies with a record per each user for a survey.
Parameters: - summaries (list of dict) – Iterable of user summaries for row records
- append (boolean, optional) – Toggles whether summaries should be appended to an existing output file.
-
write_complete_days
(trip_day_summaries, append=False)¶ Write complete day summaries to .csv with a record per day per user over the duration of their participation in a survey.
Parameters: - trip_day_summaries (list of dict) – Iterable of complete day summaries for each user enumerated by uuid and date.
- append (boolean, optional) – Toggles whether summaries should be appended to an existing output file.
-
write_condensed_activity_locations
(user, append=True)¶ Write or append the provided user’s activity locations to file.
Parameters: - locations (list of dict) – Iterable of user summaries for row records.
- append (boolean, optional) – Toggles whether summaries should be appended to an existing output file.
-
write_condensed_trip_summaries
(user, trip_summaries, complete_day_summaries, append=False)¶ Write the trip summaries with added columns for labeled trip origins/destinations and whether a trip occured on a complete trip day.
Parameters: - daily_summaries (list of dict) – Iterable of user summaries for row records.
- append (boolean, optional) – Toggles whether summaries should be appended to an existing output file.
-
write_trip_summaries
(fn_base, summaries, extra_fields=None, append=False)¶ Write detected trip summary data to csv consisting of a single record for each trip.
Parameters: - fn_base (str) – The base filename to prepend to the output csv file.
- summaries (list of dict) – Iterable of trip summaries for row records.
- extra_fields (list, optional) – Additional columns to append to csv (must have matching key in summaries object).
- append (boolean, optional) – Append data to an existing .csv file.
-
write_trips
(fn_base, trips, extra_fields=None, append=False)¶ Write detected trips data to a csv file.
Parameters: - fn_base (str) – The base filename to prepend to the output csv file
- trips – Iterable of database trips to write to csv file
- trips – list of
tripkit.models.Trip
-
-
class
tripkit.io.
GeoJSONIO
(cfg)¶ -
write_activity_locations
(fn_base, locations)¶ Write activity locations (from config or detected) to a geojson file.
Parameters: - fn_base (str) – The base filename to prepend to each output geojson file.
- locations (dict) – A dictionary object of a user’s survey responses containing columns with activity location latitude and longitudes.
-
write_inputs
(fn_base, coordinates, prompts, cancelled_prompts)¶ Writes input coordinates, prompts and cancelled prompts data selected from cache to individual geojson files.
Parameters: - fn_base (str) – The base filename to prepend to each output geojson file.
- coordinates (list of
tripkit.database.Coordinate
) – Iterable of database coordinates to write to geojson file. - prompts (list of
tripkit.database.PromptResponse
) – Iterable of database prompts to write to geojson file. - cancelled_prompts (list of
tripkit.database.CancelledPromptResponse
) – Iterable of database cancelled prompts to write to geojson file.
-
write_mapmatch
(fn_base, results)¶ Writes map matching results from API query to geojson file.
Parameters: - fn_base (str) – The base filename to prepend to the output geojson file
- results – JSON results from map matching API query
-
write_trips
(fn_base, trips)¶ Writes detected trips data selected from cache to geojson file.
Parameters: - fn_base (str) – The base filename to prepend to the output geojson file
- trips (list of
tripkit.models.Trip
) – Iterable of database trips to write to geojson file
-
-
class
tripkit.io.
GeopackageIO
(cfg)¶ -
write_activity_locations
(fn_base, locations)¶ Write activity locations (from config or detected) to a geopackage file.
Parameters: - fn_base (str) – The base filename to prepend to each output geopackage file.
- locations (dict) – A dictionary object of a user’s survey responses containing columns with activity location latitude and longitudes.
-
write_inputs
(fn_base, coordinates, prompts, cancelled_prompts)¶ Writes input coordinates, prompts and cancelled prompts data selected from cache to individual geopackage files.
Parameters: - fn_base (str) – The base filename to prepend to each output geopackage file.
- coordinates (list of
tripkit.database.Coordinate
) – Iterable of database coordinates to write to geopackage file. - prompts (list of
tripkit.database.PromptResponse
) – Iterable of database prompts to write to geopackage file. - cancelled_prompts (list of
tripkit.database.CancelledPromptResponse
) – Iterable of database cancelled prompts to write to geopackage file.
-
write_trips
(fn_base, trips)¶ Writes detected trips data to a geopackage file.
Parameters: - fn_base – The base filename to prepend to the output geopackage file
- trips – Iterable of database trips to write to geopackage file
- fn_base – str
- trips – list of
tripkit.models.Trip
-
Database¶
-
class
tripkit.database.
Database
(config)¶ Handles itinerum-tripkit interactions with the cached database using peewee.
-
bulk_insert
(Model, rows, chunk_size=50000)¶ Bulk insert an iterable of dictionaries into a supplied Peewee model by
chunk_size
.Parameters: - Model – Peewee database model of target table for inserts.
- rows (list) – Iterable of dictionaries matching table model for bulk insert.
- chunk_size (int, optional) – Number of rows to insert per transaction.
-
clear_trips
(user=None)¶ Clears the detected trip points table or for an individual user.
Parameters: user – (Optional) Delete trips for particular user only.
-
count_users
()¶ Returns a count of all survey responses in cache database.
-
create
()¶ Creates all the tables necessary for the itinerum-tripkit cache database.
-
delete_user_from_table
(Model, user)¶ Deletes a given user’s records from a table in preparation for overwriting.
-
drop
()¶ Drops all cache database tables.
-
get_uuid
(orig_id)¶ Returns the database uuid for a user’s original id from a non-Itinerum dataset.
Parameters: orig_id – The original dataset’s user id for an individual user.
-
load_activity_locations
(user)¶ Queries cache database for activity locations known for a user.
Parameters: user – A database user response record
-
load_subway_entrances
()¶ Queries cache database for all available subway entrances.
-
load_trip_day_summaries
(user)¶ Load the daily trip summaries for a given user as dict.
Parameters: user – A database user response record with a populated detected_trip_day_summaries relation.
-
load_trips
(user, start=None, end=None)¶ Load the sorted trips for a given user as list.
Parameters: user – A database user response record with a populated detected_trip_coordinates relation.
-
load_user
(uuid, start=None, end=None)¶ Loads user by
uuid
to an itinerum-tripkitUser
object.Parameters: - uuid – A individual user’s UUID from within an Itinerum survey.
- start – Optional. Naive datetime object (set within UTC) for selecting a user’s coordinates start period.
- end – Optional. Naive datetime object (set within UTC) for selecting a user’s coordinates end period.
-
save_trip_day_summaries
(user, trip_day_summaries, timezone, overwrite=True)¶ Saves the daily summaries for detected trip days to cache database. This table with be recreated on each save by default.
Parameters: - user (
tripkit.models.User
) – A database user response record associated with the trip day summaries. - trip_day_summaries (list of
tripkit.models.DaySummary
) – List of daily summaries from a daily trip counts algorithm. - timezone (str) – The tz database timezone name for the location that was used to localize the complete days detection.
- overwrite (boolean, optional) – Provide False to keep existing summaries for user in database.
- user (
-
save_trips
(user, trips, overwrite=True)¶ Saves detected trips from processing algorithms to cache database. This table will be recreated on each save by default.
Parameters: - user (
tripkit.models.User
) – A database user response record associated with the trip records. - trips (list of
tripkit.models.Trip
) – Iterable of detected trips from a trip processing algorithm.
- user (
-
Library Objects¶
User¶
The tripkit.models.User
object provides the representation of a user in a survey with their
associated coordinates, prompt, cancelled prompts and trips (if available in cache).
-
class
tripkit.models.
User
(db_user)¶ Parameters: db_user – The
peewee
survey response for a given user.Variables: - activity_locations (list) – A user’s saved or detected activity locations. This is loaded automatically
when a
User
is initialized bytripkit.database.Database.load_user()
. - trips (list) – A user’s detected trips. This is loaded automatically when a
User
is initialized bytripkit.database.Database.load_user()
.
- activity_locations (list) – A user’s saved or detected activity locations. This is loaded automatically
when a
Trip¶
The tripkit.models.Trip
object provides the representation of a user’s trip with some inferred
properties about it.
-
class
tripkit.models.
Trip
(num, trip_code)¶ Parameters: - num (int) – The integer index (starting at 1) of all the detected trip events (completer or incomplete) as user has made over the duration of their survey participation.
- trip_code (int) – The integer code representing the detected trip type.
Variables: points (list of
tripkit.models.TripPoint
) – The timestamp-ordered points that comprise thisTrip
.
Trip Point¶
The tripkit.models.TripPoint
object provides the respresentation of a user’s GPS coordinates
after having been processed by a trip detection algorithm.
-
class
tripkit.models.
TripPoint
(database_id, latitude, longitude, h_accuracy, distance_before, trip_distance, period_before, timestamp_UTC)¶ Parameters: - database_id (integer) – The GPS point’s original database coordinates record id.
- latitude (float) – The GPS point’s latitude.
- longitude (float) – The GPS point’s longitude.
- h_accuracy (float) – The reported horizontal accuracy of a GPS point.
- distance_before (float) – The distance between this point and point immediately prior in meters.
- trip_distance (float) – The cumulative distance of the current trip so far in meters.
- period_before (integer) – The number of seconds passed since the last recorded point.
- timestamp_UTC (datetime) – The point’s naive datetime localized to UTC.
Variables: timestamp_epoch (int) – The point’s datetime within the UNIX epoch format.
Day Summary¶
The tripkit.models.DaySummary
object provides the representation of complete trip days after
processing a user’s trips by a complete days detection algorithm.
-
class
tripkit.models.
DaySummary
(timezone, date, has_trips, is_complete, start_point, end_point, consecutive_inactive_days, inactivity_streak)¶ Parameters: - timezone (str) – String representation of the timezone as listed in the IANA tz database (e.g., America/Montreal). See: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones
- date (datetime.Date) – The naive date used for determining a complete trip day.
- has_trips (bool) – Boolean value to indicate whether a date contains any trips.
- is_complete (bool) – Boolean value to indicate whether a date with trips is considered complete or not.
- start_point (object) – The database point for the first available GPS coordinate on a localized date.
- end_point (object) – The database point for the last available GPS coordinate on a localized date.
- consecutive_inactive_days (int) – The total number of days in the latest inactivity streak (reset on any complete day).
- inactivity_streak (int) – The longest streak of consecutively inactive days for a user.
Outputs¶
Trips¶
Complete Days¶
The Complete Days/Trip Lab process iterates over the trips by 24-hour cycle (00:00:00-23:59:59) in the supplied timezone. For each day that a user participates in a study, the process checks if any trips are labeled as missing and if none exist, the day is labelled as “is_complete”. If no trips exist, the day is not labeled as complete.
Special cases¶
- It is common that a user may legitimately make no trips on a participation day. In the case that there is a day without any trips, but the day prior and after are labeled complete and the distance between the last trip end and the next trip start is <= 750 meters, this day will be labeled “is_complete”. Cases with more than 1 day without trips are not considered by this rule.
Notes¶
- The output results are grouped by
uuid
and ordered bydate
. - Boolean values are represented as
1
(True) or0
(False).
uuid |
The unique user ID in the survey. |
date |
The date (localized by timezone) for complete day record. |
has_trips |
Boolean to indicate whether date contains any trips. |
is_complete |
Boolean to indicate if date contains trips and no missing trips. |
start_latitude |
The latitude of the date’s first trip. |
start_longitude |
The longitude of the date’s first trip. |
end_latitude |
The latitude of the date’s last trip. |
end_longitude |
The longitude of the date’s last trip. |
consecutive_inactive_days |
The current tally of inactive days in a row (reset each complete day). |
inactivity_streak |
The maximum tally of incomplete days in a row for a user. |
FAQ¶
Q: Why are there additional points with duplicate timestamps in my detected trips?
A: Trip detection creates a labeled, uninterrupted trajectory from a user’s coordinates. Occasionally gaps occur between trips, either because of device handling of geofences, signal loss due to interference or battery drain, or a user temporarily pausing data collection. When this occurs, a missing trip will be generated.
If a missing trip is sufficiently small and immediately preceding a detected complete trip, it will be prepended to the complete trip. When this occurs, a new point is generated with the end location of the previous trip and the same timestamp as the start (now the second point) of the next complete trip.
Contributing¶
Style Guide¶
Notes¶
The itinerum-tripkit is a modular library that is built to be extended. The most common place to contribute will be adding a new process (itinerum-tripkit/tripkit/process) for modifying or building new GPS detection algorithms. The process filepath format is: tripkit/process/<descriptive-name>/<organization>/library_files.py. Try to group additions as best as possible within existing directories if a one exists, but otherwise it is encouraged to be descriptive. If a suitable name doesn’t exist, it’s better to create a new directory altogether.
Database¶
Library readability is prioritized over absolute execution performance, although both are desired. To keep the Python code base neater, the Peewee ORM is used for database operations. Peewee is used instead of SQLAlchemy to help keep the library compact while providing comparable functionality. The database functions are intended to be SQL database agnostic, however, bulk inserts for initial data loading is tightly coupled to SQLite. Additional database engines will require specially handling data import from .csv.
Algorithms¶
- Included algorithms follow the format of:
- Input user, user.coordinates, user.trips, etc. objects from the database
- Perform processing with a siblings
models.py
tightly coupled to the processing script. The methods should be primarily getters/setters
- and helper methods, the core processing should try to occur entirely in the algorithm scripts to help keep the intention of each step obvious.
- Algorithm models should be “wrapped for datakit” which is mapping the algorith’s generally more complex model to the library’s base
models for general usage. The base model should be as simple as possible to limit what must be stored in the database; if an attribute like duration can be inferred from a start and end timestamp, this is preferred over storing the value.
Pull Commits¶
Python black is used to format all library code into a consistent format. Note that two options are used:
- line limit of 120 characters (-l 120)
- skip string normalization (-S, explained below)
$ black -l 120 -S -t py37 tripkit
The string formatting exception to Python blacks default rules is to better convey meaning about intention within the library. Internal strings such as dictionary keys and filenames use single-quote (’) endings whereas strings that will print to the user such as log messages use double-quote (”) endings.