This is the website of the Open Data Hub documentation, a collection of technical resources about the Open Data Hub project. The website serves as the main resource portal for everyone interested in accessing the data or deploying apps based on the datasets and APIs provided by the Open Data Hub team.
The technical content comprises:
- Catalogue of available datasets.
- How-tos, FAQs, and various tips and tricks for users.
- Links to the full API documentation.
- Resources for developers.
For non-technical information about the Open Data Hub project, please point your browser to https://opendatahub.bz.it/.
The Open Data Hub project envisions the development and set-up of a portal whose primary purpose is to offer a single access point to all (Open) Data from the region of South Tyrol, Italy, that are relevant to the economy sector and its actors. This will also allow everybody to use these data in all digital communication channels and to build applications on top of the data offered, be they proofs of concept (PoCs) exploring new uses or new fields of application for Open Data Hub data, or novel and innovative services and software products built on top of the data.
All the data within the Open Data Hub will be easily accessible, with a preference for open interfaces and APIs built on existing standards such as the Open Travel Alliance (OTA), the General Transit Feed Specification (GTFS), and AlpineBits.
Depending on the development of the project and the interest of users, more standards and data formats might be supported in the future.
Open Data Hub Architecture¶
The architecture of the Open Data Hub is depicted in Figure 2, which shows its constituent elements together with its main goal: to gather data from Data Sources and make them available to Data Consumers, which are usually third-party applications that use those data in any way they deem useful, including (but not limited to) studying the evolution of historical data or carrying out data analysis to produce statistical graphics.
At the core of the Open Data Hub lies the Big Data Platform, a Java application that contains all the business logic and handles all the connections with the underlying database using the Data Abstraction Layer (DAL). The Big Data Platform is composed of different modules; among them is the Writer, which receives data from the Data Sources and stores them in the database using the DAL.
Communication with the Data Sources is guaranteed by the Data Collectors: Java applications built on top of the dc-interface that use a Data Transfer Object (DTO) for each source to correctly import the data. Dual to the dc-interface, the ws-interface allows the export of DTOs to web services, which expose them to Data Consumers.
Records in the Data Sources can be stored in any format; they are converted into JSON DTOs and transmitted to the Writer, which stores them in the database using SQL. To expose data, the Reader queries the database using SQL, transforms the results into JSON DTOs, and passes them to the Web Services, which serve the JSON to the Data Consumers.
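The round trip described above can be sketched in Python; the CSV layout and field names below are invented for illustration and are not taken from an actual Open Data Hub source:

```python
import csv
import io
import json

# A hypothetical Data Source record: sensors often deliver plain CSV.
raw_csv = "station_id,timestamp,value\nBZ01,2019-01-01T10:00:00,42.5\n"

# Data Collector side: parse the CSV and build a JSON-serialisable DTO.
reader = csv.DictReader(io.StringIO(raw_csv))
dto = [
    {
        "station": row["station_id"],
        "timestamp": row["timestamp"],
        "value": float(row["value"]),
    }
    for row in reader
]

# The DTO travels to the Writer serialised as JSON ...
payload = json.dumps(dto)

# ... and the Reader later hands the same JSON structure to the Web Services.
records = json.loads(payload)
print(records[0]["value"])  # 42.5
```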
The Elements of the Big Data Platform in Detail¶
As Figure 2 shows, the Big Data Platform is composed of a number of elements, described in the remainder of this section in the same order as they appear in the picture.
- Data Source
- A Data Source is the origin of one or more datasets, which usually belong to a single domain. Data are usually picked up automatically by sensors and stored in some format, such as CSV.
- Dataset
- A dataset is a collection of records originating from the same Data Source. Within the Open Data Hub, the same Data Source may provide multiple datasets containing slightly different data, but there is at least one dataset per domain. The underlying data format of a dataset never changes.
- Data Collectors
- Data collectors are a library of Java classes used to transform data coming from Data Sources into a format that can be understood, used, and stored by the Big Data Platform. As a rule of thumb, each Data Collector serves one Data Source or dataset and uses DTOs to transfer the data to the Big Data Platform. Data Collectors are usually created by extending the dc-interface in the bdp-core repository.
- Data Transfer Objects (DTOs)
- Data Transfer Objects are used to translate the data from the various formats used by the Data Sources into a common format that can be read by the Writer and exposed by the Reader (see below). DTOs are written in JSON and are composed of three entities: Station, Data Type, and Record.
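A sketch of what such a DTO might look like as JSON; the concrete field names below are illustrative assumptions, not the actual bdp-core schema:

```python
import json

# Illustrative DTO with the three entities named above; all field names
# are assumptions for illustration, not the actual bdp-core schema.
dto = {
    "station": {"id": "BZ01", "name": "Example Station",
                "latitude": 46.49, "longitude": 11.35},
    "dataType": {"name": "temperature", "unit": "C"},
    "record": {"timestamp": "2019-01-01T10:00:00", "value": 21.3},
}

# The DTO travels between the components serialised as JSON.
serialised = json.dumps(dto)
print(sorted(json.loads(serialised)))  # ['dataType', 'record', 'station']
```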
- Writer
- With the Writer, we enter the core of the Big Data Platform. Its purpose is to receive DTOs from the Data Collectors and store them in the database; it therefore implements all the methods needed to read the DTOs' JSON format and to write to the database using SQL.
- Data Abstraction Layer (DAL)
- The Data Abstraction Layer is used by both the Writer and the Reader to access the database and exchange DTOs, and relies on Java Hibernate. It contains classes that map the content of a DTO to the corresponding database tables.
- Database (DB)
- The database represents the persistence layer and contains all the data sent by the Writer. Its configuration requires that two users be defined: one with full permissions granted, used by the Writer, and one with read-only permissions, used by the Reader.
- Reader
- The Reader is the last component of the core. It uses the DAL to retrieve DTOs from the database and to transmit them to the web services.
- Web Services
- The Web Services, which extend the ws-interface in the bdp-core repository, receive data from the Reader and make them available to Data Consumers by exposing APIs and REST endpoints. They transform the DTOs they receive into JSON.
- Data Consumers
- Data consumers are (web) applications that use the JSON produced by the web services and manipulate it to produce useful output for the final user.
Also part of the architecture, but not pictured in the diagram, is the
persistence.xml file, which contains the credentials and
PostgreSQL configuration used by both the Reader and the Writer.
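A minimal sketch of what such a file might look like, following the standard JPA/Hibernate persistence.xml format; the unit names, database name, and credentials below are placeholders, not the actual bdp-core configuration:

```xml
<persistence xmlns="http://xmlns.jcp.org/xml/ns/persistence" version="2.1">
  <!-- Placeholder unit for the Writer (user with full permissions) -->
  <persistence-unit name="writer">
    <properties>
      <property name="hibernate.connection.url"
                value="jdbc:postgresql://localhost:5432/bdp"/>
      <property name="hibernate.connection.username" value="bdp_writer"/>
      <property name="hibernate.connection.password" value="changeme"/>
    </properties>
  </persistence-unit>
  <!-- Placeholder unit for the Reader (read-only user) -->
  <persistence-unit name="reader">
    <properties>
      <property name="hibernate.connection.url"
                value="jdbc:postgresql://localhost:5432/bdp"/>
      <property name="hibernate.connection.username" value="bdp_reader"/>
      <property name="hibernate.connection.password" value="changeme"/>
    </properties>
  </persistence-unit>
</persistence>
```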
Available Domains and APIs¶
The domains intended as sources for data served by the Open Data Hub are depicted in Figure 1.
The API of a software product contains the definition of the methods, and of their signatures, that can be invoked to retrieve data from the web services provided by the software itself. The signature of each method defines how to invoke the method (i.e., its name), which parameters should be supplied (i.e., their names and types, and whether they are mandatory), and what the method returns (i.e., the type and format of the output it produces). By using an API, it is possible to receive data from the web service and process them.
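The three parts of a signature map directly onto an HTTP call: the method name becomes the URL path, the parameters become the query string, and the return value is the JSON body. The endpoint and parameter names below are placeholders invented for illustration, not real Open Data Hub API calls:

```python
import urllib.parse

# Placeholder endpoint, not a real Open Data Hub URL.
base_url = "https://api.example.com/stations"

# Parameter names and types are dictated by the method's signature;
# these two are invented for the example.
params = {"type": "e-charging", "limit": 10}

url = base_url + "?" + urllib.parse.urlencode(params)
print(url)  # https://api.example.com/stations?type=e-charging&limit=10

# A Data Consumer would then fetch and parse the JSON response, e.g.:
# import json, urllib.request
# with urllib.request.urlopen(url) as response:
#     data = json.load(response)
```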
Currently, the following APIs are available from the Open Data Hub:
- Mobility APIs
- SASAbus APIs
- Tourism APIs
The first and second APIs provide datasets that belong to the Mobility Domain, while the third provides datasets in the Tourism Domain.
The Mobility APIs allow access to real-time data of the datasets concerning e-mobility, including data about e-charging stations, the availability of plugs to recharge e-cars, and so on.
The SASAbus APIs are part of the Mobility Domain and allow access to various types of data about buses and bus stations.
The Tourism APIs allow access to locations (of hotels, museums, events, and so on), points of interest, and a number of other pieces of information about tourism in South Tyrol.
The authentication layer is currently intended for internal use only.
Authentication in the Open Data Hub is mainly used in the part of the Big Data Platform that exposes data to consumers, i.e., by the Reader and by every web service accessing the Reader, to restrict access to the closed data in each dataset to authorised users only.
There are currently two different authentication methods available:
- Token-based Authentication, defined in RFC 6750, requires that anyone who wants to access resources supply a valid username and password and receive a Bearer Token in return, which must then be used to access the data. After the token expires, a new one must be obtained. This type of authentication is used for the datasets in the Tourism Domain.
- OAuth2 Authentication follows RFC 6749 and is used for all the datasets in the Mobility Domain.
For those not familiar with the OAuth2 mechanism, here is a quick description of the client-server interaction:
The client requests permission from the authorisation server to access restricted resources.
The authorisation server replies with a refresh token and an access token. The access token contains an expiry date.
The access token can now be used to access protected resources on the resource server. To be able to use the access token, add it as a Bearer token in the Authorization header of the HTTP call. Bearer is a means to use tokens in HTTP transactions. The complete specification can be found in RFC 6750.
If the access token has expired, you'll get an HTTP 401 Unauthorized response. In this case you need to request a new access token, passing your refresh token in the Authorization header as a Bearer token. As an example, Bearer tokens can be inserted in a curl call to Open Data Hub datasets as follows:
curl -X GET "$HTTP_URL_WITH_GET_PARAMETERS" -H "accept: */*" -H "Authorization: Bearer $TOKEN"
Here, $HTTP_URL_WITH_GET_PARAMETERS is the URL containing the API call and $TOKEN is the token string.
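The same call-then-refresh flow can be sketched in Python with the standard library. The token endpoint, resource URL, and response fields below are placeholders under the standard OAuth2 refresh-token grant (RFC 6749, section 6); the real values depend on the Open Data Hub authorisation server:

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Placeholder URLs, not real Open Data Hub endpoints.
TOKEN_URL = "https://auth.example.com/oauth/token"
API_URL = "https://api.example.com/data"

def bearer_headers(token):
    """Build the same headers as the curl call above."""
    return {"accept": "*/*", "Authorization": "Bearer " + token}

def fetch(access_token, refresh_token):
    """GET a protected resource, refreshing the access token on a 401."""
    request = urllib.request.Request(API_URL, headers=bearer_headers(access_token))
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)
    except urllib.error.HTTPError as error:
        if error.code != 401:
            raise
        # Access token expired: trade the refresh token for a new access
        # token via the standard OAuth2 refresh_token grant.
        body = urllib.parse.urlencode(
            {"grant_type": "refresh_token", "refresh_token": refresh_token}
        ).encode()
        with urllib.request.urlopen(TOKEN_URL, data=body) as response:
            new_access_token = json.load(response)["access_token"]
        return fetch(new_access_token, refresh_token)
```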