New release of DataGraft!

We are delighted to announce the second beta release of the DataGraft platform!

What is DataGraft?

DataGraft serves as the core of the proDataMarket producer portal. DataGraft is an online platform that provides data transformation, publishing and hosting capabilities that aim to simplify the data publishing lifecycle for data workers (i.e., Open Data publishers, Linked Data developers, data scientists).

The DataGraft platform consists of three main components: the DataGraft portal, Grafterizer, and a cloud-enabled semantic graph database-as-a-service (as shown in the picture below), which is based on a dedicated instance of the Ontotext GraphDB Cloud platform mentioned in a previous blog post.

Main components of DataGraft platform

What’s new?

DataGraft has undergone major changes since the previous version:

  • New asset types for the catalogue and better sharing between users of the platform
    • SPARQL endpoints
    • queries
    • file pages
  • Improved Grafterizer capabilities
    • conditional RDF mappings
    • support for various types and formats of tabular inputs
  • Versioning of assets
    • browsing
    • recording of provenance when copying assets
  • Visual browsing of SPARQL endpoints (using RDF Surveyor)
  • New Dashboard
    • more control of user assets
    • instant search and various filters
  • Improved security (authentication) using OAuth2
  • REST API improvements using Swagger
  • Updated version of the semantic graph database, which now supports geospatial queries and serialisation to GeoJSON
  • Various bug fixes and performance improvements
  • Updated user documentation
  • Quota management console allowing users to track their use of resources on the platform

DataGraft beta 2 is available for testing on and more details can be found in the platform documentation here. All platform code except the GraphDB Cloud component (used as a service) is open-source and is available on GitHub.


Understanding territorial distribution of Properties of Managers and Shareholders: a Data-driven Approach

Thanks to the collaboration between Cerved, SINTEF and “Territorio Italia”, it was possible to publish a paper that presents a new score developed by Cerved. “Territorio Italia” is an open-access, peer-reviewed scientific magazine focused on territorial and geographic topics; it is edited by Agenzia delle Entrate, the Italian revenue agency.

The paper was announced in the previous blog post. In this post we highlight its main results: the Manager and Shareholders Concentration score and its application to the cities of Turin, Milan and Rome.

Manager and Shareholders Concentration (MSHC) score

The paper introduces the “Manager and Shareholders Concentration (MSHC) score”, an index created to identify the wealthiest areas within a municipality. This is of particular interest for the real estate market, especially when there are several wealthy areas within the same city. The paper introduces the index and demonstrates how it can correctly identify the areas with high real estate values within a city, even when they are located far from the city centre.

The approach proposed in the paper aims to directly observe the distribution of the properties of the wealthiest citizens, who usually choose to move to and live in the most prestigious areas. While this phenomenon can be observed in many cities around the world, in Italy it is particularly evident in Turin: although the city has a fascinating centre, many of its most important buildings are located on the hills far from the centre.

The crucial question is how to determine which sample of citizens to select and qualify as managers or, more generally, wealthy people. To do this, we used Cerved’s proprietary database, which contains public data on all Italian companies, to extract information about individuals recognized as shareholders and/or managers. In this work, a shareholder is anyone who owns more than 25% of a company’s share capital, while a manager is anyone who holds a key position within a company, performs management duties, and is legally liable for the company’s debts.

In calculating the MSHC score, the basic idea is to count the properties of managers and shareholders in each geographic area and compare that count with the total number of residents in the same area. The result can be visualized immediately using thematic maps; for example, plotting the score on a map of Turin shows that the two most relevant areas are the centre and the hill on the eastern side of the city.
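The basic idea behind the score can be illustrated with a short Python sketch. The post does not give the exact formula, so this assumes the simplest reading: the number of properties owned by selected people in an area divided by the number of residents of that area. The data layout and the names `is_selected` and `mshc_scores` are hypothetical; only the 25% shareholder threshold comes from the text.

```python
from collections import Counter

# Ownership share above which a person counts as a shareholder (from the text).
SHAREHOLDER_THRESHOLD = 0.25


def is_selected(person):
    """Return True if the person qualifies as a shareholder or a manager."""
    return person["share"] > SHAREHOLDER_THRESHOLD or person["is_manager"]


def mshc_scores(people, properties, residents):
    """Assumed score: selected owners' properties per resident, by area.

    people:     dict person_id -> {"share": float, "is_manager": bool}
    properties: list of (owner_id, area_id) pairs, one per property
    residents:  dict area_id -> number of residents
    """
    counts = Counter(
        area
        for owner, area in properties
        if owner in people and is_selected(people[owner])
    )
    # Areas without residents are skipped to avoid dividing by zero.
    return {area: counts[area] / n for area, n in residents.items() if n > 0}


# Hypothetical toy data, for illustration only.
people = {
    "p1": {"share": 0.40, "is_manager": False},  # shareholder
    "p2": {"share": 0.10, "is_manager": True},   # manager
    "p3": {"share": 0.05, "is_manager": False},  # neither
}
properties = [("p1", "centre"), ("p1", "hill"), ("p2", "hill"), ("p3", "north")]
residents = {"centre": 100, "hill": 50, "north": 200}

scores = mshc_scores(people, properties, residents)
```

In this toy example the "hill" area gets the highest value because two qualifying properties are set against only 50 residents. The score in the paper may apply further normalisation that is not described in this post.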


The territorial distribution of the MSHC score can be easily observed through a heat map. On the maps, darker colours correspond to higher scores, while lighter colours correspond to lower scores. Heat maps also make it easy to compare the score with the territorial distribution of real estate values, in order to verify whether prices and scores are correlated. For Turin, the MSHC score was compared with the asking prices for real estate provided by Osservatorio Immobiliare della Città di Torino (OICT, the Turin Real Estate Market Observatory), and with their territorial distribution. For Rome and Milan, the comparison used the values published by Osservatorio del Mercato Immobiliare (OMI) of Agenzia delle Entrate, an important national reference for the real estate market.
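One simple way to check whether scores and prices move together is a correlation coefficient computed over the areas of a city. The sketch below uses the Pearson coefficient as an assumption; the post does not say which measure the paper uses, and the per-area values are invented for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical per-area values: MSHC score and asking price in EUR per square metre.
area_scores = [0.010, 0.040, 0.020, 0.005]
area_prices = [2500.0, 4200.0, 3000.0, 1800.0]

r = pearson(area_scores, area_prices)
```

A value of `r` close to 1 would indicate that areas with higher scores also tend to have higher prices; a value near 0 would indicate no linear relationship.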


In Turin, the score shows high values in the city centre, on the hill, and in the micro-areas on the western side of the city, while it correctly identifies the southern and northern areas of the city as less prestigious. This result confirms that the score can also be considered a valuable tool for predicting values on the real estate market.

Figure 1 Territorial distribution of the MSHC score in the city of Turin. The MSHC score is displayed on the map by associating a darker colour with higher scores, and brighter colours with lower scores


The second city chosen to analyse the MSHC score is Rome, a very complex city because of the vastness of its municipal area, which is not comparable to that of any other Italian metropolis, and because of the particular character of some specific areas: the proximity to the Vatican city-state, the large number of historical and cultural points of interest, and access to the sea.

The size of the Italian capital does not allow the distribution to be observed in detail, but several high-value areas can be noted: some correspond to established high-value neighbourhoods, while others can be described as emerging neighbourhoods thanks to the presence of underground lines and public transit.

Figure 2 Territorial distribution of the MSHC score in the city of Rome. The MSHC score is visualised on the map by associating a darker colour with higher scores, and brighter colours with lower scores


The third city used to analyse the MSHC score is Milan, a city that has experienced major changes in recent years. Milan has seen the development of new neighbourhoods and skyscrapers, a universal exposition (EXPO), and a new underground line (with another under development) after years of inactivity. The highest MSHC score is found in the centre of the city, while in the suburbs not many neighbourhoods are identified as particularly wealthy.

Figure 3 Territorial distribution of the MSHC score in the city of Milan. The MSHC score is visualised on the map by associating a darker colour with higher scores, and brighter colours with lower scores


The MSHC score illustrated in the paper provides an interesting index that may be used to better understand where the richest segments of the population live, and consequently to identify the areas of a city with the highest real estate values. Although this score alone is not enough to support the valuation of real estate, together with other indicators under development at Cerved (for real estate valuation) it represents an excellent starting point. For a more in-depth analysis, and to see how strongly the score correlates with housing prices, please have a look at the entire paper and the complete results [1].


[1] Stefano Pozzati, Diego Sanvito, Claudio Castelli, Dumitru Roman. Understanding territorial distribution of Properties of Managers and Shareholders: a Data-driven Approach. Territorio Italia 2 (2016), DOI: 10.14609/Ti_2_16_2e

URL to access the article in Italian.

URL to access the article in English.


Integrating multisectoral datasets: from satellites to real estate scoring model

During a project meeting in Sofia on September 21, 2016, Cerved teamed up with TRAGSA to brainstorm ideas for re-using TRAGSA’s methods for processing satellite imagery to analyse green areas in urbanized cities.

Fundamentals of TRAGSA Processing

A common feature of vegetation spectra is the high contrast between the red band and the near-infrared (NIR) region. The optical instrument carried by the Sentinel 2 satellites samples 13 spectral bands, including high-resolution bands in the red and red-edge region (bands 4, 5 and 6) as well as bands in the NIR (8 and 8A). Refer to this blog post for more details about processing Sentinel 2 data.
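The red/NIR contrast is what standard vegetation indices exploit. As an illustration, the sketch below computes the widely used Normalized Difference Vegetation Index, NDVI = (NIR - Red) / (NIR + Red), and thresholds it into a 1/0 vegetation mask. This is a generic technique and not necessarily the TRAGSA methodology; the reflectance values and the threshold of 0.4 are assumptions chosen for the example.

```python
def ndvi(red, nir):
    """Normalized Difference Vegetation Index for one pixel.

    red and nir are surface reflectances (for Sentinel 2, e.g. band 4 and band 8).
    Returns 0.0 when both reflectances are zero to avoid dividing by zero.
    """
    total = nir + red
    return 0.0 if total == 0 else (nir - red) / total


def vegetation_mask(red_band, nir_band, threshold=0.4):
    """Return a 1/0 grid: 1 where NDVI exceeds the (assumed) threshold."""
    return [
        [1 if ndvi(red, nir) > threshold else 0 for red, nir in zip(red_row, nir_row)]
        for red_row, nir_row in zip(red_band, nir_band)
    ]


# Hypothetical 2x2 reflectance grids: left column vegetated, right column built-up.
red_band = [[0.05, 0.30], [0.08, 0.25]]
nir_band = [[0.50, 0.32], [0.45, 0.20]]

mask = vegetation_mask(red_band, nir_band)
```

Healthy vegetation reflects strongly in the NIR and absorbs red light, so its NDVI is high, whereas roofs and asphalt have similar reflectance in both bands and an NDVI near zero.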

Using the TRAGSA methodology, it is possible to isolate and enhance the vegetation in order to locate green areas in urban areas. Green areas are an important input to Cerved’s innovative real estate evaluation model (which is being developed within one of Cerved’s business cases in the project, as introduced in this blog post). Cerved uses open data to generate the green-area indicators defined for the model: green area coverage and distance to the wood. The operations that Cerved performs to compute these indicators are similar to those that TRAGSA performs on satellite data, such as clustering green areas into large areas and isolating trees and groups of trees. This motivated us to experiment with satellite data and TRAGSA’s methodology, to see whether we could use a more complete, structured and up-to-date source of green-area information as input to our real estate evaluation model.
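To give an idea of what clustering green areas and isolating small groups of trees can look like on a 1/0 pixel grid, here is a minimal sketch based on connected-component labelling with a breadth-first search. It is not the procedure used by Cerved or TRAGSA, which the post does not describe; the function names and the `min_pixels` size threshold are hypothetical.

```python
from collections import deque


def label_green_clusters(mask):
    """Group 4-connected green pixels (value 1) into clusters.

    Returns a list of clusters, each a list of (row, col) positions.
    """
    rows = len(mask)
    cols = len(mask[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    clusters = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first search from this unvisited green pixel.
            queue = deque([(r, c)])
            seen[r][c] = True
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] == 1 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cluster)
    return clusters


def split_by_size(clusters, min_pixels=4):
    """Separate large green areas from small ones (isolated trees or small groups)."""
    large = [c for c in clusters if len(c) >= min_pixels]
    small = [c for c in clusters if len(c) < min_pixels]
    return large, small


# Hypothetical mask: one 2x2 park and two isolated green pixels.
green_mask = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
]
large_areas, small_areas = split_by_size(label_green_clusters(green_mask))
```

An indicator such as distance to the wood could then be derived by measuring, for each location, the distance to the nearest large cluster.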


For the experiment we chose Turin, a highly urbanized Italian city that nonetheless pays particular attention to green areas.

The steps that we followed:

  • extraction of the city boundaries of Turin in GeoJSON format, by SPAZIODATI
  • selection of good-quality imagery for Turin from the Sentinel data repository, by TRAGSA
  • processing of the S2 imagery to obtain a vector layer that indicates the presence or absence of a green area in each pixel (1/0), by TRAGSA
  • display of the green areas of the tiles (see the screenshot below) in the prototype Amerigo visualisation service, under development by SPAZIODATI
  • data processing and aggregation of the tiles into census cell areas, in order to develop green-area indicators for each census cell, by CERVED
  • integration and testing of the score dedicated to green areas within the business model CCRS (Cerved Cadastral Report Service), by CERVED
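The aggregation step in the list above can be sketched as follows. The example assumes that each pixel has already been assigned to a census cell (a hypothetical `cell_ids` grid with the same shape as the 1/0 green mask, with `None` for pixels outside the city boundary) and computes a simple coverage indicator: the fraction of green pixels in each cell. The real pipeline works with vector geometries, and its indicators may be defined differently.

```python
from collections import defaultdict


def green_coverage(mask, cell_ids):
    """Fraction of green pixels (value 1) per census cell.

    mask:     grid of 1/0 values (1 = green pixel)
    cell_ids: grid of the same shape holding the census cell of each pixel,
              or None for pixels outside the city boundary
    """
    totals = defaultdict(int)
    greens = defaultdict(int)
    for mask_row, id_row in zip(mask, cell_ids):
        for is_green, cell in zip(mask_row, id_row):
            if cell is None:
                continue  # pixel outside the area of interest
            totals[cell] += 1
            greens[cell] += is_green
    return {cell: greens[cell] / totals[cell] for cell in totals}


# Hypothetical 2x4 tile covering two census cells, "A" and "B".
tile_mask = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
tile_cells = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", None],
]

coverage = green_coverage(tile_mask, tile_cells)
```

Here cell "A" is three-quarters green and cell "B" has no green pixels; values like these could then be fed into a scoring model as one feature per census cell.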


The result of this experiment was extremely surprising: the detail and accuracy of this new score in identifying green areas (not only public green areas) are far greater than those of the other scores, which were developed on public and open datasets of green areas.