Technical Information for Mobility Dataset

This section contains detailed technical information shared by the datasets in the Mobility domain. The purpose of this section is manifold:

The JSON Response Schema

We recall that every query to the mobility datasets will return a JSON-structured file with a number of information about one station (or more) and values it collected over time, both real-time and historical data.

The overall structure of the JSON is the following:

"offset": 0,
"data": [],
"limit": 200

Here, offset and limit are used for limiting the displayed results. The three keys have the following meaning:

  • limit gives the maximum number of results that are included in the response. It defaults to 200.

    Hint

    By setting the value to -1, limit will be disabled and all results will be shown.

  • offset allows to skip elements from the result set. The default is 0, i.e., the results start from the first one.

  • data is the actual payload of the response, that is, the data answering the query; since it changes depending on which API call/method is used, it will be described in the next section.

Hint

It is possible to simulate pagination when there are many results: for example, if there are 1000 values, by adding to successive queries the offsets 0, 200, 400, 600, and 800, the response of the query is split on 5 pages of 200 results each.

Structure of the API calls and Payload

In the Mobility domain, there are different general methods that can be used to extract data from the Open Data Hub’s datasets and allow to incrementally refine the data retrieved. They are:

  1. /v2/ gives the list of the Open Data Hub’s entry points, that is, the possible representations of the data contained in the datasets. to be used in the next methods. See the details below.

  2. /v2/{representation}/ shows all the StationTypes available, that is, all the sources that provided data to the Open Data Hub.

  3. /v2/{representation}/{stationTypes} returns data about the stations themselves, including metadata associated with each, and data about its parent stations, if any.

  4. /v2/{representation}/{edgeTypes} returns data related to the edges and their parts, and is very similar to the previous call.

  5. /v2/{representation}/{stationTypes}/{dataTypes}. In addition to the data of the previous call, it contains the data types defined in the dataset.

  6. /v2/{representation}/{stationTypes}/{dataTypes}/latest. In addition to all the data retrieved by the previous call, this call retrieves also the most recent measurement. This method is especially suited for real time retrieval of data.

  7. /v2/{representation}/{stationTypes}/{dataTypes}/{from}/{to}. All the data retrieved by method #3, but limited to a given historical interval (fromto).

    Note

    The interval is half-open, i.e., [from, to), meaning that the from date is included in the result set, while the to date is excluded.

Representation types

The first method described in the previous list introduces the available entry points to the API v2: the types of representation that can be used to browse or access the data provided by the Open Data Hub Team

The representation consists now of a pair of comma-separated keywords composed of:

  1. the already existent flat or tree AND
  2. either node and edge

In both the flat and tree representations, all the metadata and available data are shown and browsable, the difference being that in flat, while tree keeps the hierarchical structure of the metadata.

The node and edge describe a StationType and the connection between two StationTypes, respectively.

Flat

In the flat representation, all metadata and available data can be accessed and browsed. However, no hierarchy appears and data and metadata are shown at the same level.

Tree

In the tree representation, all metadata and available data can be accessed and browsed as in flat, but in this case, any hierarchy of data or metadata is preserved and shown.

Node

A node is a measurement station and contains all metadata associated to it. The node representation corresponds to the old (pre-2020.10) output of the API calls, therefore it can safely be omitted for backward compatibility. As an example, valid for all methods listed in the previous section, these API calls are equivalent.

/v2/tree,node/{stationTypes}

/v2/flat,node/{stationTypes}

/v2/tree/{stationTypes}

/v2/flat/{stationTypes}

Note

While only available nodes are exposed by the Open Data Hub, the resulting JSON response might still include the savailable field, short for station available.

Edge

An Edge is a connection between two stations, improved with additional information, including some descriptive field and geometries that describe the connection on a map. Internally, an edge is composed of three parts (all calles stations): a start station (beginning of the edge), an end station and a station describing the edge. Whenever retrieving an Edge, all metadata referring directly to it begin with e, like for example eactive, eavailable, and so on.

Note

While only available edges are exposed by the Open Data Hub, the resulting JSON response might still include the sbavailable, seavailable and eavailable fields, referring to start station, end station, and edge description, respectively.

Moreover, there are neither measurements nor types associated with edges.

Valid combinations are therefore: flat,node; tree,node; flat,edge; tree,edge; if neither node or edge are provided, the default node will be used.

An additional representation is apispec, which allows to see and reuse the API specification in an OpenAPI v3 YAML format, suitable for swagger-like access to the data.

In the reminder of this section we show examples of some of the above mentioned API methods and describe the outcome, including the various keys and types of data returns by the call.

/v2/{representation}/{stationTypes}

To describe the outcome of this method in details, we will use the following snippet.

Listing 1 An excerpt of information about a charging station.
 1    {
 2   "pactive": false,
 3   "pavailable": true,
 4   "pcode": "AER_00000005",
 5   "pcoordinate": {
 6     "x": 11.349217,
 7     "y": 46.499702,
 8     "srid": 4326
 9   },
10   "pmetadata": {
11     "city": "BOLZANO - BOZEN",
12     "state": "ACTIVE",
13     "address": "Via Cassa di Risparmio  - Sparkassenstraße 14",
14     "capacity": 2,
15     "provider": "Alperia Smart Mobility",
16     "accessType": "PUBLIC",
17     "paymentInfo": "https://www.alperiaenergy.eu/smart-mobility/punti-di-ricarica.html",
18     "municipality": "Bolzano - Bozen"
19   },
20   "pname": "BZ_CASSARISP_01",
21   "porigin": "ALPERIA",
22   "ptype": "EChargingStation",
23   "sactive": false,
24   "savailable": true,
25   "scode": "AER_00000005-1",
26   "scoordinate": {
27     "x": 11.349217,
28     "y": 46.499702,
29     "srid": 4326
30   },
31   "smetadata": {
32     "outlets": [
33       {
34         "id": "1",
35         "maxPower": 22,
36         "maxCurrent": 31,
37         "minCurrent": 0,
38         "hasFixedCable": false,
39         "outletTypeCode": "Type2Mennekes"
40       }
41     ],
42     "maxPower": 7015,
43     "maxCurrent": 31,
44     "minCurrent": 6,
45     "municipality": "Bolzano - Bozen",
46     "outletTypeCode": "IEC 62196-2 type 2 outlets (all amperage and phase)"
47   },
48   "sname": "BZ_CASSARISP_01-253",
49   "sorigin": "ALPERIA",
50   "stype": "EChargingPlug"
51 }

You immediately notice that all the keys in the first level start either with a p (pactive, pcoordinate, and so on) or an s (sactive, scoordinate, and so on): the former, p, refers to data about the parent stations, s to data of the station itself. Besides the initial p or s, the meaning of the key is the same. In the snippet above, you see that all the data about a station are grouped together and come after the data of its parent (see lines.

The meaning of the keys are:

  • active: the station is actively sending data to the Open Data Hub. A station is automatically marked as not active (i.e., pactive = false) when it does not send data for a given amount of time (24 hours).

  • available: data from this station is available in the Open Data Hub.

    Note

    active and available might seem duplicates, but a station can be available but not active or vice-versa: In the former case, it means that its historical data have been recorded and can be accessed, although it currently does not send any data (for example, due to a network error or because it is not working or because it has been decommissioned); in the latter case, the station has started to send its data but they are not yet accessible (for example, because the are still being pre-processed by the Open Data Hub).

  • code: a unique IDentifier

  • coordinate: the station’s geographical coordinates

  • metadata: it may contain any kind of information about the station and mostly depends on the type of the station and the data it sends. In the snippets above, lines 10-16 contain information about the location of a charging station, while lines 28-38 technically describe the type of plugs available to recharge a car.

    Hint

    The metadata has only one limitation: it must be either a JSON object or NULL.

  • name: a (human readable) name of the station

  • origin: the source of the station, which can be anything, like for example the name of the Data Providers, the spreadsheet or database that contained the data, a street address, and so on.

  • type: the type of the station, which can be a MeteoStation, TrafficStation, EChargingPlug, Bicycle, and so on.

    Note

    The name of the StationType is Case Sensitive! You can retrieve all the station types with the following API call.

    ~$ curl -X GET "https://mobility.api.opendatahub.com/v2/tree" -H "accept: application/json"
    

/v2/{representation}/{stationTypes}/{dataTypes}/latest

This API call introduces two new prefixes to the keys, as shown in Listing 2.

Listing 2 An excerpt of information about a charging station.
 1{
 2   "tdescription": "",
 3   "tmetadata": {},
 4   "tname": "number-available",
 5   "ttype": "Instantaneous",
 6   "tunit": "number of available vehicles / charging points",
 7
 8   "mperiod": 300,
 9   "mtransactiontime": "2018-10-24 01:05:00.614+0000",
10   "mvalidtime": "2020-05-01 07:30:00.335+0000",
11   "mvalue": 1,
12}

The new prefixes are t and m. The t prefix refers to Data Types, i.e., how the values collected by the sensors are measured. See below for a more detailed description of data types and some tip about them. The m prefix refers to a measurement, that is, how often the data are collected, timestamp of the measure, when it is transmitted to be stored, and other information.

Alongside all keys present in Listing 1 (see previous section), Listing 2 contains the additional key:

  • ttype: the type of the data, which can be expressed as either a custom string, like in the example above, or as a DB function like COUNT, SUM, AVERAGE, or similar
  • tunit the unit of measure
  • mperiod: the time in seconds between two consecutive measures
  • mtransactiontime: timestamp of the transmission of the data to the database
  • mvalidtime: timestamp of the measurement. It is either the moment in time when the measurement took place or the time in the future in which the next measure will be collected.
  • mvalue: the absolute value of the measure, represented in either double precision or string format. It must be paired with the t keys to understand its meaning.

Listing 2 represents an EChargingStation with one available charging point; the last measure was taken on 2020-05-01 07:30:00.335+0000 and will be repeated every 5 minutes (300 seconds). Moreover, the station appears to not transmit its data anymore, so historical data might not be available.

Data types in the datasets.

Data types are not normalised; that is, there is no standard or common unit across the datasets. Indeed, each data collector defines its own data types and they may vary quite a lot from one dataset to another. There is also neither a common representation format for data types, therefore a same unit can appear quite different in different datasets. For example, to express microseconds, one dataset can use

"tdescription": "Time interval measured in microseconds",
"tmetadata": {},
"tname": "Time interval",
"ttype": "Instantaneous",
"tunit": "ms",

While another:

"tdescription": "Microseconds between two consecutive measures",
"tmetadata": {},
"tname": "Time interval",
"ttype": "COUNT",
"tunit": "milliseconds",

We can see that, although we might understand that the measures from the two datasets are indeed expressed in milliseconds, this is not true for machine-processed data

/v2/{representation}/{stationTypes}/{dataTypes}/{from}/{to}

This method does not add any other keys to the JSON response; all the keys described in the previous two section are valid and can be used.

Advanced Data Processing

Before introducing advanced data processing techniques, we recall that queries against the Open Data Hub’s datasets always return a JSON output.

Advanced processing allows to build SQL-style queries using the SELECT and WHERE keywords to operate on the JSON fields returned by the calls described in the previous section. SELECT and WHERE have the usual meaning, with the former retrieving data from a JSON field, in the form of SELECT=target[,target,...], and the latter retrieving records from the JSON output, using the WHERE=filter[,filter,...] form, with an implicit and among the filters, therefore evaluation of the filters takes place only if all filters would individually evaluate to true.

The SELECT Clause

In order to build select clauses, it is necessary to know the structure of the JSON output to a query, therefore we illustrate this with an example with the following excerpt from the Parking dataset that represents all data about one parking station:

{
  "sactive": false,
  "savailable": true,
  "scode": "102",
  "scoordinate": {
    "x": 11.356305,
    "y": 46.496449,
    "srid": 4326
  },
  "smetadata": {
    "state": 1,
    "capacity": 233,
    "mainaddress": "Via Dr. Julius Perathoner",
    "phonenumber": "0471 970289",
    "municipality": "Bolzano - Bozen",
    "disabledtoiletavailable": true
  },
  "sname": "P02 - City parking",
  "sorigin": "FAMAS",
  "stype": "ParkingStation"
}

You see that there are two hierarchies with two levels in the snippet: scoordinate and smetadata; to retrieve only data from them we will use the select clause with the /v2/{representation}/{stationTypes} call; you can therefore:

  • retrieve only the metadata associated with all the stations; the select clause would be: select=smetadata
  • retrieve all the cities in which there are ParkingStations with select=smetadata.municipality
  • retrieve all cities and addresses of all ParkingStations: select=smetadata.municipality,smetadata.mainaddress

The latter two examples show that to go down one more step into the hierarchy, you simply add a dot (”.”) before the attribute in the next level of the hierarchy. Moreover, you can extract multiple values from a JSON output, provided you separate them with a comma (”,”) and use no empty spaces in the clause. In the above examples, each of the element within parentheses–smetadata, smetadata.municipality, and smetadata.mainaddress-- is called target.

Within a SELECT clause, SQL functions are allowed and can be mixed with targets, allowing to further process the output, with the following limitations:

  • Only numeric functions are allowed, like e.g., min, max, avg, and count
  • No string selection or manipulation is allowed, but left as a post-processing task
  • When a function is used together with other targets, these are used for grouping purposes. For example: select=sname,max(smetadata.capacity),min(smetadata.capacity) will return the parking lots with the highest and lowest number of available parking spaces.

The WHERE Clause

The WHERE clause can be used to define conditions to filter out unwanted results and can be built with the use of the following operators:

  • eq: equal
  • neq: not equal
  • lt: less than
  • gt: greater than
  • lteq: less than or equal
  • gteq: greater than or equal
  • re: regular expression
  • ire: case insensitive regular expression
  • nre: negated regular expression
  • nire: negated case insensitive regular expression
  • bbi: bounding box intersecting objects (ex., a street that is only partially covered by the box)
  • bbc: bounding box containing objects (ex., a station or street, that is completely covered by the box)
  • in: true if the value of the target can be found within the given list. Example: name.in.(Patrick,Rudi,Peter)
  • nin: False if the value of the target can be found within the given list. Example: name.nin.(Patrick,Rudi,Peter)
  • and(filter,filter,…): Conjunction of filters (can be nested)
  • or(filter,filter,…): Disjunction of filters (can be nested)

As an argument to the filter, it is possible to add either a single value or a list of values; in both cases, operators are used to determine a condition and only items matching all of the filters will be included in the answer to the query (implicit AND). Like in the case of SELECT clauses, multiple comma-separated conditions may be provided. As an example, the following queries use a value and a list of values, respectively:

  • where=smetadata.capacity.gt.100 returns only parking lots with more than 100 parking spaces
  • where=smetadata.capacity.gt.100,smetadata.municipality.eq."Bolzano - Bozen" same as previous query, but only parking lots in Bolzano are shown.

In these two examples we use a number in the filter (i.e., gt.100), which is by default automatically recognised as a number and the required math is calculated out of the box. In case there is a query in which you use a number, but need to consider it as a string, you need to use double quotes, like gt.“100”.

Logical Operators

Besides the operators described in section The WHERE Clause, Open Data Hub supports the use of logical operators and and or in the WHERE clause, like these examples show.

1and(x.eq.3,y.eq.5)
2x.eq.3,y.eq.5
3
4or(x.eq.3,y.eq.5)
5or(x.eq.3,and(y.gt.5,y.lt.10))

Logical operators are followed by a comma-separated list of targets, which can be filters (see previous section for some example), or other logical operators. In complex logical expression, parentheses are employed to assign precedence. Lines 1 and 2 above are equivalent, because the default logical operator is and.

The above example will be translated into Postgres as follows:

1(x = 3 AND y = 5)
2(x = 3 AND y = 5)
3
4(x = 3 OR y = 5)
5(x = 3 OR (y > 5 AND y < 10))

Additional Parameters

There are a couple of other parameter that can be given to the API calls and are described in this section.

shownull

In order to show null values in the output of a query, add shownull=true to the end of your query.

distinct

Results in query responses contain unique results, that is, if for some reason one element is retrieved multiple times while the query is executed, it will be nonetheless shown only once, for performance reasons. It is however possible to retrieve each single result and have it appear in the response by adding distinct=true to the API call.

Warning

Keeping track of all distinct values might be a resource-intensive process that significantly rises the response time, therefore use it with care.

timezone

By default, the timestamp of the Open Data Hub responses is given in UTC time zone. The use of the timezone parameter allows to modify the timestamp whenever desirable. To use it, simply append the parameter to your API call.

/flat/ParkingStation/occupied/latest?timezone=UTC-2

/flat/ParkingStation/occupied/latest?timezone=Europe/Rome

Note

As argument to the timezone parameter, you can use any allowed value in Java’s Time zone implementation.