How to Access Mobility Data With API v2

The new API v2 (see the description) for the Mobility domain has simplified the access to data; among its features, we recall that there is now one single endpoint from which to retrieve data from all datasets.

The starting point for all actions to be carried out on the datasets made available by the Open Data Hub team is the following:

https://mobility.api.opendatahub.bz.it/v2/swagger-ui.html


Figure 12 The swagger interface of the Mobility API v2.

From this site, links provide access to documentation about data licencing and use of the API; it is also possible to contact the Open Data Hub team by sending an email to the issue tracker, to ask questions, provide feedback, or to report issues.

Getting Started

In the API v2, the central concept is the Station: all data come from a given StationType. The complete list of station types can be retrieved by opening the first method of the data-controller, /api, then clicking on Try it out and then on Execute.

Station types in the resulting list can be used in the other methods to retrieve additional data about each of them. To check which station belongs to which datasets, you can check the list of Datasets in the Mobility Domain.

The JSON Response Schema

We recall that every query to the mobility datasets returns a JSON-structured file with information about one or more stations and the values they collected over time, both real-time and historical.

The overall structure of the JSON is the following:

{
  "offset": 0,
  "data": [],
  "limit": 200
}

Here, offset and limit are used to limit the displayed results: limit gives the maximum number of results (defaults to 200), while offset allows skipping elements from the start of the result set (defaults to 0, i.e., the results start from the first one). It is therefore possible to simulate pagination when there are many results: for example, if there are 1000 values, adding the offsets 0, 200, 400, 600, and 800 to successive queries splits the response into 5 pages of 200 results each.
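The pagination arithmetic can be sketched in a few lines of Python. The helper name page_offsets is hypothetical, and the printed query fragments assume that limit and offset are passed as query parameters with those names; no request is sent.

```python
def page_offsets(total, limit=200):
    """Return the offset of every page needed to cover `total` results."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    return list(range(0, total, limit))


# 1000 results with the default limit of 200 are split over five pages.
for offset in page_offsets(1000):
    print(f"limit=200&offset={offset}")
```

In practice the total number of results is often not known in advance; a common alternative is to keep increasing the offset until a response contains fewer than limit items.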

Data is the actual payload of the response, that is, the data answering the query; since it changes depending on which API call/method is used, it will be described in the next section.

Structure of the API calls and Payload

In the Mobility domain, there are three general methods that can be used to extract data from the Open Data Hub’s datasets; each one refines the data retrieved by the previous one. They are:

  1. /api/{representation}/{stationTypes} returns data about the stations themselves, including the metadata associated with them and data about their parent stations.

  2. /api/{representation}/{stationTypes}/{dataTypes}. In addition to the data of the previous call, it contains the data types defined in the dataset and the most recent measurement. This method is especially suited for real time retrieval of data.

  3. /api/{representation}/{stationTypes}/{dataTypes}/{from}/{to}. All the data retrieved by the previous method, but limited to the historical interval between from and to.

These methods introduce another facility made available by the API v2, the type of representation: each result set can be returned either as a single, flat JSON file or as an indented, tree-like JSON file. The former is more suitable for machine consumption, the latter more convenient for human reading.
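As a minimal sketch, the three methods differ only in how many path segments are supplied. The function name build_path is hypothetical; using the tname value number-available as the {dataTypes} segment and the date format shown for {from} and {to} are assumptions, since the text does not specify them.

```python
def build_path(representation, station_types, data_types=None, time_range=None):
    """Return the path of one of the three methods, built from its segments."""
    segments = ["api", representation, station_types]
    if data_types is not None:
        segments.append(data_types)
    if time_range is not None:
        if data_types is None:
            raise ValueError("a time range requires data types")
        date_from, date_to = time_range
        segments += [date_from, date_to]
    return "/" + "/".join(segments)


# Method 1: stations only.
print(build_path("flat", "EChargingPlug"))
# Method 2: stations, data types and most recent measurement.
print(build_path("flat", "EChargingPlug", "number-available"))
# Method 3: as above, limited to an interval (date format is an assumption).
print(build_path("flat", "EChargingPlug", "number-available",
                 ("2020-05-01", "2020-05-02")))
```

The returned path is meant to be appended to the base address of the API.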

/api/{representation}/{stationTypes}

To describe the outcome of this method in detail, we will use the following snippet.

Listing 1 An excerpt of information about a charging station.
    {
   "pactive": false,
   "pavailable": true,
   "pcode": "AER_00000005",
   "pcoordinate": {
     "x": 11.349217,
     "y": 46.499702,
     "srid": 4326
   },
   "pmetadata": {
     "city": "BOLZANO - BOZEN",
     "state": "ACTIVE",
     "address": "Via Cassa di Risparmio  - Sparkassenstraße 14",
     "capacity": 2,
     "provider": "Alperia Smart Mobility",
     "accessType": "PUBLIC",
     "paymentInfo": "https://www.alperiaenergy.eu/smart-mobility/punti-di-ricarica.html",
     "municipality": "Bolzano - Bozen"
   },
   "pname": "BZ_CASSARISP_01",
   "porigin": "ALPERIA",
   "ptype": "EChargingStation",
   "sactive": false,
   "savailable": true,
   "scode": "AER_00000005-1",
   "scoordinate": {
     "x": 11.349217,
     "y": 46.499702,
     "srid": 4326
   },
   "smetadata": {
     "outlets": [
       {
         "id": "1",
         "maxPower": 22,
         "maxCurrent": 31,
         "minCurrent": 0,
         "hasFixedCable": false,
         "outletTypeCode": "Type2Mennekes"
       }
     ],
     "maxPower": 7015,
     "maxCurrent": 31,
     "minCurrent": 6,
     "municipality": "Bolzano - Bozen",
     "outletTypeCode": "IEC 62196-2 type 2 outlets (all amperage and phase)"
   },
   "sname": "BZ_CASSARISP_01-253",
   "sorigin": "ALPERIA",
   "stype": "EChargingPlug"
 }

You immediately notice that all the keys in the first level start with either a p (pactive, pcoordinate, and so on) or an s (sactive, scoordinate, and so on): p refers to data about the parent station, s to data about the station itself. Apart from the initial p or s, the meaning of each key is the same. In the snippet above, all the data about a station are grouped together and come after the data of its parent.
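The prefix convention makes it easy to separate the two groups of keys on the client side. The following is only a sketch: the function name split_by_prefix is hypothetical, and it assumes that every first-level key starting with p or s follows the convention.

```python
def split_by_prefix(record):
    """Split a flat record into parent ('p') and station ('s') dictionaries."""
    groups = {"p": {}, "s": {}}
    for key, value in record.items():
        prefix, name = key[:1], key[1:]
        # Keys with any other prefix are ignored in this sketch.
        if prefix in groups and name:
            groups[prefix][name] = value
    return groups["p"], groups["s"]


# A few keys taken from the listing above.
record = {
    "pcode": "AER_00000005",
    "ptype": "EChargingStation",
    "scode": "AER_00000005-1",
    "stype": "EChargingPlug",
}
parent, station = split_by_prefix(record)
print(parent["type"], "is the parent of", station["type"])
```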

The meanings of the keys are:

  • active: the station is actively sending data to the Open Data Hub. A station is automatically marked as not active (i.e., pactive = false) when it does not send data for a given amount of time (24 hours).

  • available: data from this station is available in the Open Data Hub.

    Note

    active and available might seem duplicates, but a station can be available but not active, or vice versa. In the former case, its historical data have been recorded and can be accessed, although it currently does not send any data (for example, due to a network error, because it is not working, or because it has been decommissioned). In the latter case, the station has started to send its data, but they are not yet accessible (for example, because they are still being pre-processed by the Open Data Hub).

  • code: a unique identifier

  • coordinate: the station’s geographical coordinates

  • metadata: it may contain any kind of information about the station and mostly depends on the type of the station and the data it sends. In Listing 1, pmetadata contains information about the location of a charging station, while smetadata technically describes the type of plugs available to recharge a car.

    Hint

    The metadata has only one limitation: it must be either a JSON object or NULL.

  • name: a (human readable) name of the station

  • origin: the source of the station, which can be anything: for example, the name of the data provider, the spreadsheet or database that contained the data, a street address, and so on.

  • type: the type of the station, which can be a MeteoStation, TrafficStation, EChargingPlug, Bicycle, and so on.

    Note

    This key is Case Sensitive! You can retrieve all the station types with the following call:

    curl -X GET "https://mobility.api.opendatahub.bz.it/v2/" -H "accept: application/json"
    

/api/{representation}/{stationTypes}/{dataTypes}

This API call introduces two new prefixes to the keys, as shown in Listing 2.

Listing 2 An excerpt of information about a charging station.
{
   "tdescription": "",
   "tmetadata": {},
   "tname": "number-available",
   "ttype": "Instantaneous",
   "tunit": "number of available vehicles / charging points",

   "mperiod": 300,
   "mtransactiontime": "2018-10-24 01:05:00.614+0000",
   "mvalidtime": "2020-05-01 07:30:00.335+0000",
   "mvalue": 1
}

The new prefixes are t and m. The t prefix refers to Data Types, i.e., how the values collected by the sensors are measured. See below for a more detailed description of data types and some tips about them. The m prefix refers to a measurement: how often the data are collected, the timestamp of the measure, when it was transmitted to be stored, and other information.

Alongside all keys present in Listing 1 (see previous section), Listing 2 contains the following additional keys:

  • ttype: the type of the data, which can be expressed as either a custom string, like in the example above, or as a DB function like COUNT, SUM, AVERAGE, or similar

  • tunit: the unit of measure

  • mperiod: the time in seconds between two consecutive measures

  • mtransactiontime: timestamp of the transmission of the data to the database

  • mvalidtime: timestamp of the measurement. It is either the moment in time when the measurement took place or the time in the future in which the next measure will be collected.

  • mvalue: the absolute value of the measure, represented in either double precision or string format. It must be paired with the t keys to understand its meaning.

Listing 2 represents an EChargingStation with one available charging point; the last measure was taken on 2020-05-01 07:30:00.335+0000 and will be repeated every 5 minutes (300 seconds). Moreover, the station appears to not transmit its data anymore, so historical data might not be available.
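The m keys can be combined to estimate when the next measure is due. The sketch below assumes that timestamps always follow the layout shown in Listing 2; the names TIMESTAMP_FORMAT and parse_timestamp are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Layout of the timestamps in Listing 2 (assumed to hold for all responses).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f%z"


def parse_timestamp(text):
    """Parse a timestamp such as '2020-05-01 07:30:00.335+0000' into UTC."""
    return datetime.strptime(text, TIMESTAMP_FORMAT).astimezone(timezone.utc)


measurement = {
    "mperiod": 300,
    "mvalidtime": "2020-05-01 07:30:00.335+0000",
    "mvalue": 1,
}
valid_time = parse_timestamp(measurement["mvalidtime"])
# mperiod is the number of seconds between two consecutive measures.
next_expected = valid_time + timedelta(seconds=measurement["mperiod"])
print("last measure:", valid_time.isoformat())
print("next expected:", next_expected.isoformat())
```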

Data types in the datasets.

Data types are not normalised; that is, there is no standard or common unit across the datasets. Each data collector defines its own data types, which may vary considerably from one dataset to another. There is also no common representation format for data types, so the same unit can appear quite different in different datasets. For example, to express milliseconds, one dataset can use

"tdescription": "Time interval measured in milliseconds",
"tmetadata": {},
"tname": "Time interval",
"ttype": "Instantaneous",
"tunit": "ms",

While another:

"tdescription": "Milliseconds between two consecutive measures",
"tmetadata": {},
"tname": "Time interval",
"ttype": "COUNT",
"tunit": "milliseconds",

Although a human reader understands that the measures from the two datasets are both expressed in milliseconds, a program comparing the tunit values cannot tell that they are the same unit.
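One way to cope with this on the client side is a hand-maintained table that maps the spellings found in the datasets to a canonical name. The table UNIT_ALIASES and the function normalise_unit below are hypothetical and cover only the two spellings from the example; they are not part of the API.

```python
# Hypothetical alias table: extend it with the spellings found in your datasets.
UNIT_ALIASES = {
    "ms": "millisecond",
    "milliseconds": "millisecond",
}


def normalise_unit(tunit):
    """Map a tunit string to a canonical name; unknown units are only cleaned up."""
    key = tunit.strip().lower()
    return UNIT_ALIASES.get(key, key)


print(normalise_unit("ms") == normalise_unit("milliseconds"))
```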

/api/{representation}/{stationTypes}/{dataTypes}/{from}/{to}

This method does not add any other keys to the JSON response; all the keys described in the previous two sections are valid and can be used.

Advanced Data Processing

Before introducing advanced data processing techniques, we recall that queries against the Open Data Hub’s datasets always return a JSON output.

Advanced processing allows you to build SQL-style queries using the SELECT and WHERE keywords, which operate on the JSON fields returned by the calls described in the previous section. SELECT and WHERE have their usual meaning: the former retrieves data from a JSON field, in the form SELECT=alias[,alias,...], and the latter retrieves records from the JSON output, in the form WHERE=filter[,filter,...]. There is an implicit AND among the filters, so a record is returned only if every filter evaluates to true.

The SELECT Clause

To build select clauses, it is necessary to know the structure of the JSON output of a query. We illustrate it with the following excerpt from the it.bz.opendatahub.parking dataset, which represents all data about one parking station:

{
  "sactive": false,
  "savailable": true,
  "scode": "102",
  "scoordinate": {
    "x": 11.356305,
    "y": 46.496449,
    "srid": 4326
  },
  "smetadata": {
    "state": 1,
    "capacity": 233,
    "mainaddress": "Via Dr. Julius Perathoner",
    "phonenumber": "0471 970289",
    "municipality": "Bolzano - Bozen",
    "disabledtoiletavailable": true
  },
  "sname": "P02 - City parking",
  "sorigin": "FAMAS",
  "stype": "ParkingStation"
}

You see that there are two hierarchies with two levels in the snippet: scoordinate and smetadata. To retrieve only data from them, we use the select clause with the /api/{representation}/{stationTypes} call; you can therefore:

  • retrieve only the metadata associated with all the stations; the select clause would be: select=smetadata

  • retrieve all the cities in which there are ParkingStations with select=smetadata.municipality

  • retrieve all cities and addresses of all ParkingStations: select=smetadata.municipality,smetadata.mainaddress

The last two examples show that, to go one step further down the hierarchy, you add a dot (“.”) followed by the attribute of the next level. Moreover, you can extract multiple values from a JSON output, provided you separate them with a comma (“,”) and use no spaces in the clause. In the above examples, each of the elements smetadata, smetadata.municipality, and smetadata.mainaddress is called an alias.
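The dotted alias notation can be mimicked locally to check which value an alias points to. This is a sketch with hypothetical helper names (resolve_alias, build_select); it does not reproduce how the server evaluates the clause.

```python
def resolve_alias(record, alias):
    """Follow a dotted alias through nested objects; None if a level is missing."""
    value = record
    for part in alias.split("."):
        if not isinstance(value, dict) or part not in value:
            return None
        value = value[part]
    return value


def build_select(*aliases):
    """Join aliases with commas; spaces are rejected, as the clause allows none."""
    if any(" " in alias for alias in aliases):
        raise ValueError("aliases must not contain spaces")
    return "select=" + ",".join(aliases)


# Part of the parking station excerpt shown above.
station = {
    "sname": "P02 - City parking",
    "smetadata": {
        "capacity": 233,
        "mainaddress": "Via Dr. Julius Perathoner",
        "municipality": "Bolzano - Bozen",
    },
}
print(build_select("smetadata.municipality", "smetadata.mainaddress"))
print(resolve_alias(station, "smetadata.municipality"))
```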

Within a SELECT clause, SQL functions are allowed and can be mixed with aliases to further process the output, with the following limitations:

  • Only numeric functions are allowed, e.g., min, max, avg, and count

  • No string selection or manipulation is allowed; it is left as a post-processing task

  • Functions can be used only with the flat representation

  • When a function is used together with other aliases, these are used for grouping purposes. For example: select=sname,max(smetadata.capacity),min(smetadata.capacity) will return, for each parking station name, the highest and lowest capacity.
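The grouping behaviour of the last item is analogous to an SQL GROUP BY. The sketch below emulates it locally in Python to make the semantics concrete; the function name capacity_range_by_name and all sample records except the name and capacity of "P02 - City parking" are made up, and the shape of the actual server response is not shown here.

```python
from collections import defaultdict


def capacity_range_by_name(records):
    """Emulate select=sname,max(smetadata.capacity),min(smetadata.capacity)."""
    grouped = defaultdict(list)
    for record in records:
        # sname is the plain alias, so it acts as the grouping key.
        grouped[record["sname"]].append(record["smetadata"]["capacity"])
    return {name: (max(values), min(values)) for name, values in grouped.items()}


# Hypothetical records; only the first name and capacity come from the excerpt.
records = [
    {"sname": "P02 - City parking", "smetadata": {"capacity": 233}},
    {"sname": "Example parking", "smetadata": {"capacity": 50}},
    {"sname": "Example parking", "smetadata": {"capacity": 80}},
]
for name, (highest, lowest) in capacity_range_by_name(records).items():
    print(name, highest, lowest)
```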

The WHERE Clause

The WHERE clause can be used to define conditions to filter out unwanted results and can be built with the use of the following operators:

  • eq: equal

  • neq: not equal

  • lt: less than

  • gt: greater than

  • lteq: less than or equal

  • gteq: greater than or equal

  • re: regular expression

  • ire: case insensitive regular expression

  • nre: negated regular expression

  • nire: negated case insensitive regular expression

  • bbi: bounding box intersecting objects (e.g., a street that is only partially covered by the box)

  • bbc: bounding box containing objects (e.g., a station or street that is completely covered by the box)

  • in: true if the value of the alias can be found within the given list. Example: name.in.(Patrick,Rudi,Peter)

  • nin: false if the value of the alias can be found within the given list. Example: name.nin.(Patrick,Rudi,Peter)

  • and(filter,filter,…): Conjunction of filters (can be nested)

  • or(filter,filter,…): Disjunction of filters (can be nested)
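To make the meaning of the operators concrete, the sketch below evaluates a subset of them locally in Python. It is an illustration only, not the server's implementation: the function name matches is hypothetical, the bounding box operators are left out, and treating a regular expression as matching when it is found anywhere in the value is an assumption.

```python
import operator
import re

COMPARISONS = {
    "eq": operator.eq, "neq": operator.ne,
    "lt": operator.lt, "gt": operator.gt,
    "lteq": operator.le, "gteq": operator.ge,
}


def matches(value, op, argument):
    """Evaluate one filter operator against a value (bbi and bbc not covered)."""
    if op in COMPARISONS:
        return COMPARISONS[op](value, argument)
    if op == "in":
        return value in argument
    if op == "nin":
        return value not in argument
    if op in ("re", "ire", "nre", "nire"):
        flags = re.IGNORECASE if op in ("ire", "nire") else 0
        # Assumption: the pattern may match anywhere in the value.
        found = re.search(argument, str(value), flags) is not None
        return not found if op in ("nre", "nire") else found
    raise ValueError(f"unsupported operator: {op}")


# and(...) corresponds to all(), or(...) to any().
print(all([matches(233, "gt", 100), matches("Bolzano - Bozen", "ire", "^bolzano")]))
print(any([matches(233, "lt", 100), matches("Rudi", "in", ["Patrick", "Rudi", "Peter"])]))
```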

As an argument to the filter, it is possible to add either a single value or a list of values; in both cases, operators are used to determine a condition and only items matching all of the filters will be included in the answer to the query (implicit AND). Like in the case of SELECT clauses, multiple comma-separated conditions may be provided. As an example, the following queries use a value and a list of values, respectively:

  • where=smetadata.capacity.gt.100 returns only parking lots with more than 100 parking spaces

  • where=smetadata.capacity.gt.100,smetadata.municipality.eq."Bolzano - Bozen" same as the previous query, but only parking lots in Bolzano are shown.
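A where clause such as the ones above can be assembled from (alias, operator, value) triples and then URL-encoded before being appended to a request. The helper build_where is hypothetical; values containing spaces are wrapped in double quotes by the caller, as in the example above.

```python
from urllib.parse import urlencode


def build_where(*filters):
    """Join (alias, operator, value) triples into a comma-separated where clause."""
    return ",".join(f"{alias}.{op}.{value}" for alias, op, value in filters)


where = build_where(
    ("smetadata.capacity", "gt", 100),
    ("smetadata.municipality", "eq", '"Bolzano - Bozen"'),
)
print(where)
# URL-encode the clause so that quotes, commas and spaces survive in a query string.
query = urlencode({"where": where})
print(query)
```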