This page outlines the sequence and most important steps of this exercise. The complete set of steps are in the code files.
- The SQL file contains all the statements for this exercise
In this exercise, we will look at the public transportation network of Adelaide, Australia, and learn how to run routing/pathfinding queries on this spatio-temporal network. For that, we create the networks edges and vertices from the GTFS data. Then we create a Graph Workspace in SAP HANA and run some "GraphScript" functions, which will answer "shortest path one-to-one" and "top k nearest neighbors" queries.
So the GTFS data describes a network in which vehicles can take you from a STOP
to another at certain times. To query the network for paths, we will use HANA's graph engine. To do that, we will transform the data into a set of EDGES
(transport connections) and VERTICES
(stops).
First, we will create EDGES
from the STOPTIMES
by creating SOURCE
/TARGET
pairs from subsequent stops of a trip.
CREATE COLUMN TABLE "TECHED_USER_000"."EDGES" AS (
SELECT "trip_id" , "SOURCE", "dep_time", "TARGET", "arr_time", 'transport' AS "transfer_type" FROM (
SELECT "trip_id", "stop_sequence", "dep_time", "stop_id" AS "SOURCE",
LEAD("arr_time") over (partition by "trip_id" order by "stop_sequence" ASC) as "arr_time",
LEAD("stop_id") over (partition by "trip_id" order by "stop_sequence" ASC) as "TARGET"
FROM "TECHED_USER_000"."GTFS_STOPTIMES"
)
WHERE "TARGET" IS NOT NULL AND "dep_time" IS NOT NULL AND "arr_time" IS NOT NULL
);
The EDGES
table contains data about which TRIP
takes you from a SOURCE
(i.e. a stop) to a TARGET
(i.e. the next stop). We have also added some additional columns, e.g. diff_sec
which indicates how many seconds it takes to "traverse" this EDGE
.
At this point, our network is just connected via STOP
s, so we can change vehicles only if the share a STOP
. But we can also walk, right? Let's add some data to the EDGES
table by connecting STOP
s which are closer than 500m. For simplicity reasons we just calculate the airline distance. In a real scenario we would calculate the walking distance.
DO() BEGIN
T_AllPairs = SELECT STO1."stop_id" AS "SOURCE", STO2."stop_id" AS "TARGET", STO1."SHAPE_28354".ST_DISTANCE(STO2."SHAPE_28354") AS "distance"
FROM "TECHED_USER_000"."GTFS_STOPS" AS STO1
INNER JOIN "TECHED_USER_000"."GTFS_STOPS" AS STO2 ON STO1."SHAPE_28354".ST_WITHINDISTANCE(STO2."SHAPE_28354", 500, 'meter') = 1;
INSERT INTO "TECHED_USER_000"."EDGES"("SOURCE", "TARGET", "transfer_type", "diff_sec", "distance")
SELECT "SOURCE", "TARGET", 'walk' AS "transfer_type", "distance"/1.2 AS "diff_sec", "distance"
FROM :T_AllPairs WHERE "SOURCE" != "TARGET";
END;
For the EDGES
with transfer_type
= 'walk' we don't have a trip_id
, and no dparture and arrival time (dep_time
, arr_time
). But we can derive diff_sec
from the distance
. The assumption is that we can walk about 1.2m per second.
Finally, we will also connect the POIs to our network. The query below is similar to the last one and adds an EDGE
from a STOP
to a POI
.
DO() BEGIN
T_AllPairs = SELECT STO."stop_id" AS "SOURCE", POI."id" AS "TARGET", STO."SHAPE_28354".ST_DISTANCE(POI."SHAPE_28354") AS "distance"
FROM "TECHED_USER_000"."GTFS_STOPS" AS STO
INNER JOIN "TECHED_USER_000"."POIS" AS POI ON STO."SHAPE_28354".ST_WITHINDISTANCE(POI."SHAPE_28354", 500, 'meter') = 1;
INSERT INTO "TECHED_USER_000"."EDGES"("SOURCE", "TARGET", "transfer_type", "diff_sec", "distance")
SELECT "SOURCE", "TARGET", 'walk' AS "transfer_type", "distance"/1.2 AS "diff_sec", "distance"
FROM :T_AllPairs WHERE "SOURCE" != "TARGET";
END;
Our final EDGES
table contains over 700k 'transport' connections and over 200k 'walk' connections.
Now let's create the graph's VERTICES
. We'll use a view to union the STOPS
and the POIS
.
CREATE OR REPLACE VIEW "TECHED_USER_000"."V_VERTICES" AS (
SELECT "stop_id" AS "ID", 'stop' AS "TYPE", 'stop' AS "POI_TYPE", "stop_name" AS "NAME", "SHAPE_28354" FROM "TECHED_USER_000".GTFS_STOPS
UNION
SELECT "id" AS "ID", 'poi' AS "TYPE", "tags.amenity" AS "POI_TYPE", "tags.name" AS "NAME", "SHAPE_28354" FROM "TECHED_USER_000".POIS
);
The VERTCIES
data looks like this. We got 'poi' and 'stop' TYPE
of vertices, a NAME
column and a geolocation.
Looking at the POI_TYPE
we see the following distribution in the VERTICES
table.
Last task is to create a GRAPH WORKSPACE
to expose the network to the graph engine.
CREATE OR REPLACE GRAPH WORKSPACE "TECHED_USER_000"."GRAPH_GTFS_POIS"
EDGE TABLE "TECHED_USER_000"."EDGES"
SOURCE COLUMN "SOURCE"
TARGET COLUMN "TARGET"
KEY COLUMN "ID"
VERTEX TABLE "TECHED_USER_000"."V_VERTICES"
KEY COLUMN "ID";
The below GraphScript function will find shortest paths in the transportation network. It takes a start vertex, end vertex, and a parameter that indicates the time of our departure (in seconds), and returns a table with the path's edges. Basically, the program just calls the built-in SHORTEST_PATH
algorithm, but the magic is in the edge weight expression which we need to pass to the SHORTEST_PATH algorithm
- we can only take 'transport' connections that are in the future
- if we need to wait for a bus, the wait time needs to be added to our overall travel time
- we can take a 'walk' at anytime
CREATE OR REPLACE FUNCTION "TECHED_USER_000"."F_SPOO_GTFS_POIS"(
IN i_startVertex BIGINT, -- the ID of the start vertex, this is a STOP or a POI
IN i_endVertex BIGINT, -- the ID of the end vertex
IN i_startSec INT -- the time at which the TRIP should start, in seconds from midnight. So 9*3600 is 9am
)
RETURNS TABLE ("ID" BIGINT, "EDGE_ORDER" BIGINT, "transfer_type" NVARCHAR(9), "trip_id" INT, "dep_sec" INT, "SOURCE" BIGINT, "arr_sec" INT, "TARGET" BIGINT, "diff_sec" INT, "distance" DOUBLE)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
GRAPH g = Graph("TECHED_USER_000", "GRAPH_GTFS_POIS");
VERTEX v_start = Vertex(:g, :i_startVertex);
VERTEX v_end = Vertex(:g, :i_endVertex);
WeightedPath<INT> p = SHORTEST_PATH(:g, :v_start, :v_end,
(EDGE e, INT current_path_sec)=> INT{
IF(:e."transfer_type" == 'transport' AND (:i_startSec + :current_path_sec) <= :e."dep_sec") { -- the edge is a transport and the depature is in the future
RETURN (:e."dep_sec" - (:i_startSec + :current_path_sec)) + :e."diff_sec"; -- return potential wait time plus traverse time
}
ELSE { -- the edge is a walk between stops
IF(:e."transfer_type" == 'walk') { RETURN :e."diff_sec"; } -- its a transfer, i.e. you can walk any time
ELSE { END TRAVERSE; } --not a valid edge
}
}, 'OUTGOING');
RETURN SELECT :e."ID", :EDGE_ORDER, :e."transfer_type", :e."trip_id", :e."dep_sec", :e."SOURCE", :e."arr_sec", :e."TARGET", :e."diff_sec", :e."distance"
FOREACH e IN Edges(:p) WITH ORDINALITY AS EDGE_ORDER;
END;
If we execute this function with specific input, the result looks like this. First we need to walk a little bit, before we can hop on a bus at 20:12:37. The complete path is made of 79 edges and depicted on the map.
Next we want to calculate the travel time from one start vertex to all others. The SHORTEST_PATHS_ONE_TO_ALL
built-in algorithm helps us to identify isochrones - areas which can be reached with the same time budget. The magic edge function in the middle is the same as above.
CREATE OR REPLACE FUNCTION "TECHED_USER_000"."F_SPOA_GTFS_POIS"(
IN i_startVertex BIGINT, -- the ID of the start vertex
IN i_startSec INT
)
RETURNS TABLE ("stop_id" BIGINT, "SHAPE_28354" ST_GEOMETRY(28354), "TRAVEL_TIME" INT)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
GRAPH g = Graph("TECHED_USER_000", "GRAPH_GTFS_POIS");
VERTEX v_start = Vertex(:g, :i_startVertex);
GRAPH g_spoa = SHORTEST_PATHS_ONE_TO_ALL(:g, :v_start, "TRAVEL_TIME",
(EDGE e, INT current_path_sec)=> INT{
IF(:e."transfer_type" == 'transport' AND (:i_startSec + :current_path_sec) <= :e."dep_sec") {
RETURN (:e."dep_sec" - (:i_startSec + :current_path_sec)) + :e."diff_sec"; -- potenital wait time plus traverse time
}
ELSE {
IF(:e."transfer_type" == 'walk') { RETURN :e."diff_sec"; } -- its a transfer, i.e. you can walk any time
ELSE { END TRAVERSE; } --not a valid edge
}
}, 'OUTGOING');
RETURN SELECT :v."ID", :v."SHAPE_28354", :v."TRAVEL_TIME" FOREACH v IN Vertices(:g_spoa);
END;
Below are the 7am and 8pm isochrones for our start vertex (large green dot). The yellow area is reachable in within about 45 minutes. To reach the red areas take 2 hours or longer.
At the end of this exercise we want to find a pub. Remember that we connected the POIS
data to the transportation network? Let's explore which POIs are nearby our start location. The function below uses the TRAVERSE DIJKSTRA
operator. This allows us to traverse the network in a "shortest path" manner, inspecting each vertex as we traverse. With this approach we can look for the top 5 pubs which we can reach with public transport at 8pm. Again, the edge weight function is the same as above, but we now check each visited vertex if it is a specific POI_TYPE
('pub'). If so, we collect the pubs until we found enough (5) and stop execution.
CREATE OR REPLACE PROCEDURE "TECHED_USER_000"."F_TOP_K_NEAREST_POIS"(
IN i_startVertex BIGINT,
IN i_startSec INT,
IN i_poiType NVARCHAR(5000), -- NULL will find any amenity/POI; possible values are 'pub' or 'clinic'
IN i_k BIGINT, -- indicates how many POIs ARE returned
OUT o_vertices TABLE ("ID" BIGINT, "TYPE" NVARCHAR(5000), "POI_TYPE" NVARCHAR(5000), "NAME" NVARCHAR(5000), "TRAVEL_TIME" INT, "SHAPE_28354" ST_GEOMETRY(28354)),
OUT o_paths TABLE ("DISCOVERED_POI" BIGINT, "transfer_type" NVARCHAR(5000), "SOURCE" BIGINT, "TARGET" BIGINT, "diff_sec" INT, "distance" DOUBLE)
)
LANGUAGE GRAPH READS SQL DATA AS
BEGIN
GRAPH g = Graph("TECHED_USER_000", "GRAPH_GTFS_POIS");
VERTEX v_start = Vertex(:g, :i_startVertex);
MAP<VERTEX, INT> m_travelTimeMap = MAP<VERTEX, INT>(:g, :i_k);
TRAVERSERESULT<INT> shortest_path_tree = TRAVERSERESULT<INT>(:g);
TRAVERSE DIJKSTRA :g FROM :v_start WITH RESULT :shortest_path_tree WITH WEIGHT (EDGE e, INT current_path_sec)=> INT{
IF(:e."transfer_type" == 'transport' AND (:i_startSec + :current_path_sec) <= :e."dep_sec") { -- the edge is a transport and the depature is in the future
RETURN (:e."dep_sec" - (:i_startSec + :current_path_sec)) + :e."diff_sec"; -- return potential wait time plus traverse time
}
ELSE { -- the edge is a walk between stops
IF(:e."transfer_type" == 'walk') { RETURN :e."diff_sec"; } -- its a transfer, i.e. you can walk any time
ELSE { END TRAVERSE; } --not a valid edge
}
}
ON VISIT VERTEX (Vertex v_visited, INT travel_time) {
IF ( :v_visited."TYPE" == 'poi' AND (:v_visited."POI_TYPE" == :i_poiType OR :i_poiType IS NULL) ) { -- check if POI
m_travelTimeMap[:v_visited] = :travel_time; -- store vertex AND distance IN the map
IF (COUNT(:m_travelTimeMap) >= :i_k) { END TRAVERSE ALL; } -- if we found what we need, let's stop the traversal
}
};
o_vertices = SELECT :v_discoveredVertex."ID", :v_discoveredVertex."TYPE", :v_discoveredVertex."POI_TYPE", :v_discoveredVertex."NAME", :travel_time, :v_discoveredVertex."SHAPE_28354"
FOREACH (v_discoveredVertex, travel_time) IN :m_travelTimeMap;
BIGINT idx = 1L;
FOREACH (v_discoveredVertex, travel_time) IN :m_travelTimeMap {
WEIGHTEDPATH<INT> p = GET_PATH(:shortest_path_tree, :v_discoveredVertex);
FOREACH e IN Edges(:p) {
o_paths."DISCOVERED_POI"[:idx] = :v_discoveredVertex."ID";
o_paths."transfer_type"[:idx] = :e."transfer_type";
o_paths."SOURCE"[:idx] = :e."SOURCE";
o_paths."TARGET"[:idx] = :e."TARGET";
o_paths."diff_sec"[:idx] = :e."diff_sec";
o_paths."distance"[:idx] = :e."distance";
idx = :idx + 1L;
}
}
END;
Here are the top 5 pubs which we can reached fastest at 8pm. Little Bang Brewery can be reach within 24 minutes.
The function also returns the paths to each target.
Interestingly, the top 5 clinics are much faster to reach.
You've now created a network from the GTFS and POI data, and used GraphScript to find paths and neighbors in the network.
Continue to - Exercise 3 - Import and export spatial vector and raster data, spatial clustering