CAPAS Business Case: results & outlook

Tragsa has developed the CAPAS service, which integrates multi-sectorial data for better Common Agriculture Policy (CAP) funds assignments to farmers and land owners. Several external datasets – such as LiDAR, Copernicus Sentinel-2 and Protected Sites from the Spanish Environment and Agriculture Ministry, among others – have been used to improve the Spanish Land Parcel Identification System.

Products from LiDAR data

LiDAR files are a collection of points stored as tuples which represent longitude, latitude, and elevation. This data, provided by the Spanish National Geographic Institute (IGN), was processed using automatic algorithms to detect landscape elements (copses and isolated trees) within agricultural parcels.

Protected sites and ecological value report

On one hand, the density of isolated trees and the presence of copses were evaluated with the Landscape Elements Value. On the other hand, the presence or absence of protected areas that intersect subplots was evaluated with a score named the Protected Sites Value. The sum of the Protected Sites Value and the Landscape Elements Value is the Ecological Value.

The full description of these products, how they were generated and how they were validated is available here.

Products from Sentinel2 data

The Sentinels are a fleet of satellites for land monitoring, part of the European Copernicus programme. The products generated from satellite data were explained in a previous blog post.

Every week, images with a low cloud-cover percentage were downloaded and processed to generate three simple products (true colour image, false colour image and NDVI). For the pilot area, the Castile and Madrid regions, a total of 168 tiles were processed during 2017 (up to 31 August). The irrigation maps were generated in two pilot areas; they were evaluated and proved helpful for identifying crops in control tasks.

Other products generated by CAPAS have been used to update the LPIS database. For example, the grassland layer displays actual grassland areas. The change detection layer highlights changes that have happened since the last update of the LPIS, focusing on changes between agricultural land, forests and grassland areas.

Change detection layer in TAEJ (legend: changes)

Grassland layer in LUPI (legend: grassland)

Conclusion

Many innovative products were generated by the CAPAS business case, leveraging previously under-used data. The different methodologies and derived products showed a high success rate after several tests, and all the resulting data can be obtained and visualised on the ProDataMarket platform.

Understanding territorial distribution of Properties of Managers and Shareholders: a Data-driven Approach

Thanks to the collaboration between Cerved, SINTEF and “Territorio Italia”, it was possible to publish a paper presenting a new score developed by Cerved. “Territorio Italia” is an open-access, peer-reviewed scientific journal focused on territorial and geographic topics; it is published by Agenzia delle Entrate, the Italian revenue authority.

The paper was announced in a previous blog post. In this post we highlight the main results: the Manager and Shareholders Concentration score and its application to the cities of Turin, Milan and Rome.

Manager and Shareholders Concentration (MSHC) score

The paper introduces the “Manager and Shareholders Concentration (MSHC) score” – an index created with the aim of identifying the wealthiest areas within a given municipality. This is of particular interest for the real estate market, especially when there are several wealthy areas within the same city. The paper thus introduces the index and demonstrates how it can correctly identify the areas with high real estate values within a city, even when they are located far from the city centre.

The approach proposed in the paper aims to directly observe the distribution of the properties of the wealthiest citizens, who usually choose to move to and live in the most prestigious areas. While this phenomenon can be observed in many cities around the world, in Italy it is particularly evident in the city of Turin: although the city is endowed with a fascinating centre, many of the buildings of greatest importance are located on the hills far from the centre.

The crucial question becomes how to correctly determine which sample of citizens to select and qualify as managers or, more generally, wealthy people. To do this, we used Cerved’s proprietary database – a database containing public data on all Italian companies – to extract information about individuals recognized as shareholders and/or managers. In the context of this work, a shareholder is anyone who owns shares above the threshold of 25% of the company’s share capital, while a manager is anyone who holds a key position within a company, carries out management duties, and is legally liable for the company’s debts.

In calculating the MSHC score, the basic idea is to observe the total number of properties of managers and shareholders per geographic area and to compare this figure with the total number of residents in the same area. This approach provides a result that can be immediately visualised using thematic maps; for example, by plotting the score on a map of the city of Turin, it can be noted that the two most relevant areas are, respectively, the centre and the hill on the eastern side of the city.
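As an illustration only (not the paper's actual implementation; the tables and column names below are assumptions), the score computation can be sketched as:

```python
import pandas as pd

# Hypothetical inputs: one row per property owned by a person flagged as a
# manager/shareholder in Cerved's database, plus a residents table per area.
properties = pd.DataFrame({
    "census_area": ["TO-01", "TO-01", "TO-02", "TO-03"],
    "owner_id":    ["p1", "p2", "p3", "p4"],
})
residents = pd.DataFrame({
    "census_area": ["TO-01", "TO-02", "TO-03"],
    "residents":   [1200, 800, 1500],
})

# Count the properties of managers/shareholders in each geographic area...
counts = (properties.groupby("census_area").size()
          .rename("msh_properties").reset_index())

# ...and compare the count with the resident population of the same area.
mshc = residents.merge(counts, on="census_area", how="left").fillna(0)
mshc["mshc_score"] = mshc["msh_properties"] / mshc["residents"]
print(mshc.sort_values("mshc_score", ascending=False))
```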

HEATMAP

The territorial distribution of the MSHC score can be easily observed through a heat map. On the maps, darker colours correspond to higher scores, while lighter colours are associated with lower scores. Heat maps also allow the territorial distribution of real estate values to be easily compared, in order to verify whether there is a correlation between prices and scores. For the city of Turin, it was possible to analyse the correlation between the MSHC score and the asking prices for real estate provided by Osservatorio Immobiliare della Città di Torino – OICT (Turin Real Estate Market Observatory), in comparison with their territorial distribution. For the cities of Rome and Milan, the comparison between the MSHC score and real estate values was made using the values published by the Osservatorio del Mercato Immobiliare (OMI) of Agenzia delle Entrate, an important reference for the real estate market at the national level.
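A heat map of this kind could be rendered, for example, with geopandas, assuming census-area polygons with a precomputed score column (the file and column names are assumptions):

```python
import geopandas as gpd
import matplotlib.pyplot as plt

# Hypothetical file: census-area polygons for Turin with a precomputed
# "mshc_score" column (e.g. produced by the sketch above).
areas = gpd.read_file("turin_census_areas.geojson")

# Darker colours for higher scores, matching the convention used in the figures.
ax = areas.plot(column="mshc_score", cmap="Greys", legend=True)
ax.set_axis_off()
ax.set_title("MSHC score, city of Turin (illustrative)")
plt.savefig("mshc_turin_heatmap.png", dpi=150)
```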

TURIN

The score shows high values in the city centre, the hill, and the micro-areas on the western side of the city, while it correctly identifies the south and north areas of the city as less prestigious. This result confirms that the score can also be considered a valuable tool for predicting values on the real estate market.

Figure 1 Territorial distribution of the MSHC score in the city of Turin. The MSHC score is displayed on the map, associating a darker colour with higher scores and brighter colours with lower scores

ROME

The second city chosen to analyse the MSHC score is Rome, a very complex city due to the vastness of its municipal area, which is not comparable to any other Italian metropolis, as well as to some of its specific features, namely the proximity to the Vatican City State, the large number of historical and cultural points of interest, and access to the sea.

The size of the Italian capital does not allow the distribution to be observed in detail, but it can be noted that there are several high-score areas: some correspond to established high-value neighbourhoods, while others can be defined as emerging neighbourhoods thanks to the presence of underground lines and public transit.

Figure 2 Territorial distribution of the MSHC score in the city of Rome. The MSHC score is visualised on the map by associating a darker colour with higher scores, and brighter colours with lower scores

MILAN

The third city used to analyse the MSHC score was Milan – a city that has experienced major changes in recent years. Milan has seen the development of new neighbourhoods and skyscrapers, a universal exposition (EXPO), and, after years of inactivity, a new underground line (with another under development). The highest MSHC score is found in the centre of the city, while in the suburbs few neighbourhoods are identified as particularly wealthy.

Figure 3 Territorial distribution of the MSHC score in the city of Milan. The MSHC score is visualised on the map by associating a darker colour with higher scores, and brighter colours with lower scores

CONCLUSION

The MSHC score illustrated in the paper provides an interesting index that may be used to better understand where the wealthiest segments of the population live, and consequently to identify the areas of a city with the highest real estate values. Although this score alone is not enough to support the valuation of real estate properties, together with other indicators under development at Cerved for real estate valuation it represents an excellent starting point. For a more in-depth analysis, and to see how strongly the score correlates with housing prices, please have a look at the full paper and the complete results [1].

References

[1] Stefano Pozzati, Diego Sanvito, Claudio Castelli, Dumitru Roman. Understanding territorial distribution of Properties of Managers and Shareholders: a Data-driven Approach. Territorio Italia 2 (2016), DOI: 10.14609/Ti_2_16_2e

URL to access the article in Italian.

URL to access the article in English.

 

The proDataMarket Ontology: Enabling Semantic Interoperability of Real Property Data

Real property data (often referred to as real estate, realty, or immovable property data) represent a valuable asset that has the potential to enable innovative services when integrated with related contextual data (e.g., business data). Such services can range from providing evaluation of real estate to reporting on up-to-date information about state-owned properties. Real property data integration is a difficult task primarily due to the heterogeneity and complexity of the real property data, and the lack of generally agreed upon semantic descriptions of the concepts in this domain. The proDataMarket ontology is developed in the project as a key enabler for integration of real property data.

The proDataMarket ontology design and development process followed techniques and design choices supported by existing methodologies, mainly the one proposed by Noy [1]. Requirements are extracted from a set of relevant business cases, and competency questions [2] are defined for each business case, as are core concepts and relationships. A conceptual model is then developed based on these requirements and on international standards, including ISO 19152:2012 and the European Union’s INSPIRE data specifications. For example, the LADM conceptual model from ISO 19152:2012 is used as the reference model for the proDataMarket cadastral domain conceptual model. The conceptual model was then implemented using the RDFS/OWL linked data standards. RDFS is used to model concepts, properties and simple relationships such as rdfs:subClassOf. OWL builds upon RDFS and provides a richer language for web ontology modelling; it is used to model constraints and other advanced relationships, such as the cardinality constraint needed to express the relationship between properties and buildings.
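As an illustration of this modelling approach, the following minimal rdflib sketch shows how a class, an rdfs:subClassOf relationship and an OWL cardinality restriction might be expressed. The namespace URI matches the Cadastre module listed in the table below, but the specific class and property names are illustrative assumptions, not the ontology's actual terms.

```python
from rdflib import Graph, Namespace, BNode, Literal
from rdflib.namespace import RDF, RDFS, OWL, XSD

# Namespace of the Cadastre module (see the table below); the class and
# property names used here are illustrative, not the ontology's actual terms.
PRODM_CAD = Namespace("http://vocabs.datagraft.net/proDataMarket/0.1/Cadastre#")

g = Graph()
g.bind("prodm-cad", PRODM_CAD)

# Classes and a simple RDFS relationship.
g.add((PRODM_CAD.SpatialUnit, RDF.type, OWL.Class))
g.add((PRODM_CAD.Property, RDF.type, OWL.Class))
g.add((PRODM_CAD.Building, RDF.type, OWL.Class))
g.add((PRODM_CAD.Property, RDFS.subClassOf, PRODM_CAD.SpatialUnit))

# An OWL cardinality constraint: every Building is located on at least one Property.
restriction = BNode()
g.add((restriction, RDF.type, OWL.Restriction))
g.add((restriction, OWL.onProperty, PRODM_CAD.locatedOn))
g.add((restriction, OWL.minCardinality, Literal(1, datatype=XSD.nonNegativeInteger)))
g.add((PRODM_CAD.Building, RDFS.subClassOf, restriction))

print(g.serialize(format="turtle"))
```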

The proDataMarket ontology can be accessed at http://vocabs.datagraft.net/proDataMarket/. The ontology has been divided into several sub-ontologies (see Table below), reflecting the cross-domain nature of the requirements. This modular approach also helped to handle the complexity of the model and made it easier to maintain. In the current version, there are 11 sub-ontologies with 43 native classes and 43 native properties.

Table: Composition of the proDataMarket ontology

Domain/module | Namespace prefix | URL | Classes | Properties | Business cases
Common | prodm-com | http://vocabs.datagraft.net/proDataMarket/0.1/Common# | 4 | 4 | ALL
Cadastre | prodm-cad | http://vocabs.datagraft.net/proDataMarket/0.1/Cadastre# | 6 | 16 | SoE, RVAS, NNAS, SIM
State of Estate Report | prodm-soe | http://vocabs.datagraft.net/proDataMarket/0.1/SoE# | 4 | 2 | SoE, RVAS
Business Entity | – | reuses existing vocabularies (no new classes or properties) | 0 | 0 | SoE, RVAS
Building Accessibility | – | reuses existing vocabularies (no new classes or properties) | 0 | 0 | SoE
Natural Hazard | prodm-nh | http://vocabs.datagraft.net/proDataMarket/0.1/NaturalHazard# | 1 | 0 | RVAS
Land Parcel Identification System (LPIS) | prodm-lpis | http://vocabs.datagraft.net/proDataMarket/0.1/LPIS# | 1 | 7 | CAPAS
Sentinel data | prodm-sen | http://vocabs.datagraft.net/proDataMarket/0.1/Sentinel# | 1 | 1 | CAPAS
Landscape Elements (LiDAR data) | prodm-lid | http://vocabs.datagraft.net/proDataMarket/0.1/Lidar# | 3 | 0 | CAPAS
Assessment | prodm-asm | http://vocabs.datagraft.net/proDataMarket/0.1/Assessment# | 3 | 3 | CAPAS
CensusTract | prodm-ct | http://vocabs.datagraft.net/proDataMarket/0.1/CensusTract# | 1 | 0 | CST, CCRS
Urban Infrastructure | prodm-ui | http://vocabs.datagraft.net/proDataMarket/0.1/UrbanInfrastructure# | 17 | 10 | SIM
Protected Sites | prodm-ps | http://vocabs.datagraft.net/proDataMarket/0.1/ProtectedSite# | 2 | 0 | CAPAS
Total | | | 43 | 43 |

More than 30 datasets have been published through the DataGraft platform [3] [4] using the proDataMarket ontology as a central reference model. All seven business cases use the proDataMarket ontology in data publishing.

More details on the proDataMarket vocabulary can be found in the paper “The proDataMarket Ontology for Publishing and Integrating Cross-domain Real Property Data”, accepted for publication in the scientific journal Territorio Italia Land Administration, Cadastre and Real Estate [5].

References

  • [1] Noy, Natalya F., and Deborah L. McGuinness. “Ontology Development 101: A Guide to Creating Your First Ontology.” (2001).
  • [2] Grüninger, Michael, and Mark S. Fox. “Methodology for the Design and Evaluation of Ontologies.” (1995).
  • [3] Roman, D., et al. “DataGraft: One-Stop-Shop for Open Data Management.” Semantic Web (2017), pp. 1-19. DOI: 10.3233/SW-170263.
  • [4] Roman, D., et al. “DataGraft: Simplifying Open Data Publishing.” ESWC (Satellite Events) 2016: 101-106.
  • [5] Shi, L., Nikolov, N., Sukhobok, D., Tarasova, T., and Roman, D. “The proDataMarket Ontology for Publishing and Integrating Cross-domain Real Property Data.” To appear in Territorio Italia Land Administration, Cadastre and Real Estate, n. 2/2017.

Integrating multisectoral datasets: from satellites to real estate scoring model

During a project meeting in Sofia on September 21, 2016, Cerved teamed up with TRAGSA to brainstorm ideas of re-using the TRAGSA methods for processing satellite imagery to analyse green areas in urbanized cities.

Fundamentals of Tragsa Processing

A common feature of vegetation spectra is the high contrast observed between the red band and the near-infrared (NIR) region. The optical instrument carried by the Sentinel-2 satellites samples 13 spectral bands, including bands in the red and red edge (bands 4, 5 and 6) as well as bands in the NIR (8 and 8A). Refer to this blog post for more details about processing Sentinel-2 data.

Using the TRAGSA methodology it is possible to isolate and enhance the vegetation, in order to locate green areas within urban areas. Green areas are an important input to Cerved’s innovative real estate evaluation model (which is being developed within one of Cerved’s business cases in the project, as introduced in this blog post). Cerved uses open data to generate the green-area indicators defined for the model: green area coverage and distance to the nearest wood. The operations that Cerved performs to compute these indicators are similar to those that TRAGSA performs on satellite data, such as clustering green areas into larger areas and isolating trees and groups of trees. This motivated us to experiment with satellite data and TRAGSA’s methodology, to see whether we could use a more complete, structured and up-to-date source of green-area information as input to our real estate evaluation model.

Experiment

We identified a highly urbanized Italian city that nevertheless pays particular attention to green areas: the city of Turin.

The steps that we followed:

  • extraction of the city boundaries of Turin in GeoJSON format, by SPAZIODATI
  • selection of good-quality imagery for Turin from the Sentinel data repository, by TRAGSA
  • processing of the S2 imagery to obtain a layer indicating the presence or absence of green area in each pixel (1/0), by TRAGSA
  • display of the green areas of the tiles (see the screenshot below) in the prototype Amerigo visualisation service, under development by SPAZIODATI
  • data processing and aggregation of the tiles into census cell areas, in order to compute green-area indicators for each census cell (a minimal sketch of this step follows the list), by CERVED
  • integration and testing of the score dedicated to green areas within the CCRS business model (Cerved Cadastral Report Service), by CERVED
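A minimal sketch of the aggregation step, assuming a binary green/not-green raster produced by the processing above and census-cell polygons (the file and column names are placeholders):

```python
import geopandas as gpd
from rasterstats import zonal_stats

# Hypothetical inputs: census-cell polygons for Turin and a binary raster
# where 1 = green area detected from Sentinel-2, 0 = not green.
cells = gpd.read_file("turin_census_cells.geojson")
stats = zonal_stats(cells, "turin_green_mask.tif", stats=["mean"])

# The mean of a 0/1 raster inside each polygon is the green-coverage fraction.
cells["green_coverage"] = [s["mean"] for s in stats]
cells.to_file("turin_green_indicators.geojson", driver="GeoJSON")
```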


The result of this experiment was extremely surprising: the detail and accuracy of this new score in identifying green areas (not only public green areas) is far greater than the accuracy of the other scores, which were developed on open datasets of public green areas.

Data Workflow in CAPAS

Description of the data workflow processes

TRAGSA, as a business case provider in the project, is developing the CAPAS service, which aims at publishing and integrating multi-sectorial data from several sources into an existing data-intensive service, targeting better Common Agriculture Policy (CAP) funds assignments to farmers and land owners. The goal is to leverage the data integration facilities offered by proDataMarket to better define the features that determine funds assignments for parcels and subplots.

CAPAS improves the efficiency and competitiveness of the existing Spanish CAP (Common Agriculture Policy) service by integrating additional datasets that were underused at the beginning of the proDataMarket project. To turn them into a powerful tool, new data processing algorithms had to be created. CAPAS is therefore not only an end-user application: it also involves data collection, data modelling and data processing techniques.

The CAPAS business case is oriented towards replacing human-generated (subjective) data with more objective data that can be collected and integrated from different cross-sectorial sources in an automated way.

At least two external datasets (LIDAR and Copernicus SENTINEL2) are being used to improve the Spanish agricultural cadastre database. The economic value generated by this process and its relation to CAP funds assignment will be evaluated during the next year, in the final phase of the project.

Managing LIDAR data

LIDAR files are a collection of points stored as x, y, z which represent longitude, latitude, and elevation, respectively. This data is hard to process for non-specialists. To use them as a powerful tool to define objectively the parameters of agricultural use of parcels and the presence of landscape elements, a new data processing and treatment algorithm has been created.

This algorithm classifies and groups the cloud of points in order to simplify the huge amount of data. The point clouds are topologically processed to obtain connected areas as polygons, or kept as single points. In this way, LIDAR datasets are transformed into new raster and vector files – more common data types that are easier to work with. The overlaps and intersections of the newly produced datasets (such as landscape elements) define the CAP parameters for a specific subplot or parcel.

Managing Satellite data

The Sentinels are a fleet of satellites designed specifically to deliver the wealth of data and imagery that are fundamental to the European Commission’s Copernicus program. The use of satellite images in CAPAS has already been explained in this blog entry.

Description of the source datasets and result dataset

The main source datasets of the CAPAS business case and the main processes used to obtain the output datasets are explained below.

LIDAR files

LIDAR files are available in two different formats: .las and .laz. LAS is a public file format commonly used to exchange 3-dimensional point cloud data between users (the name is simply an abbreviation of LASER). LAZ is a compressed version of the LAS format, introduced because of the large size of LAS files.

Although developed primarily for the exchange of LIDAR point cloud data, the LAS format supports the exchange of any 3-dimensional x, y, z tuples. The format maintains information specific to the LIDAR nature of the data while not being overly complex.

Technical description of LIDAR format

In the context of the ProDataMarket Project, LAS files used in the CAPAS business case will just be a collection of points (latitude, longitude, elevation).

Spanish LIDAR information is freely and openly available at http://centrodedescargas.cnig.es/CentroDescargas/buscadorCatalogo.do?codFamilia=LIDAR

SENTINEL files

The information used in the CAPAS business case is the image data (JPEG2000) provided by Copernicus at the Sentinels Scientific Data Hub (https://scihub.copernicus.eu/). A description of the JPEG2000 [1] format is beyond the scope of this blog entry, but some general ideas are noted in the footnote below.

Sentinel data are freely and openly available at:

https://sentinel.esa.int/web/sentinel/sentinel-data-access/access-to-sentinel-data

More information and general factsheet at: https://earth.esa.int/documents/247904/1848117/Sentinel-2_Data_Products_and_Access.

SIGPAC Database

The SigPAC database is a complex information system that covers the whole Spanish territory and all agricultural activities, as well as other activities related to biodiversity and nature conservation.

With regard to the SigPAC database, the main datasets produced or modified by CAPAS are:

  • Landscape Elements
  • Parcels and Subplots

The level of accessibility of the SigPAC database varies among the Autonomous Communities. For example, it is open and freely available in Castile at http://www.datosabiertos.jcyl.es/web/jcyl/set/es/cartografia/SIGPAC/1284225645888

Data workflow process for CAPAS

The following data workflow, as shown in the diagram below, illustrates the evolution of the different datasets, their transformations and their integration to generate the final result datasets.

CAPAS Workflow


LIDAR processing

The Grouping process gathers the LIDAR points using the following rules (a minimal sketch of this filtering appears after the list):

  • Errors, noise and overlaps are not taken into account (Classifications 1, 4, 7 and 12). As a consequence, more than 50% of points are removed from the process.
  • Soil, water and buildings have their own groups
  • Classification 19 is considered as short trees
  • Classification 20 is considered as medium trees
  • Classifications 21 and 22 are grouped as tall trees
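Such a regrouping could be sketched with laspy and NumPy as follows. The vegetation codes 19-22 are those listed above; the codes assumed for soil, water and buildings are the standard ASPRS ones, and the file name is a placeholder:

```python
import laspy
import numpy as np

las = laspy.read("pilot_tile.laz")      # laspy reads .las and .laz (with a LAZ backend installed)
cls = np.asarray(las.classification)

keep = ~np.isin(cls, [1, 4, 7, 12])     # drop errors, noise and overlaps

# Regroup the remaining classes as described above; 2/9/6 are the standard
# ASPRS codes assumed here for ground, water and buildings.
group = np.full(cls.shape, "other", dtype=object)
group[cls == 2] = "soil"
group[cls == 9] = "water"
group[cls == 6] = "building"
group[cls == 19] = "short_trees"
group[cls == 20] = "medium_trees"
group[np.isin(cls, [21, 22])] = "tall_trees"

las.points = las.points[keep]           # keep only the useful points
print({g: int((group[keep] == g).sum()) for g in np.unique(group[keep])})
```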

The result of this process is still a LAS file. The following image shows how the LIDAR points have been processed and classified (green points as trees, red points as soil, orange and yellow as bushes).

lidar-1

The next steps, rasterization and vectorization, apply topological rules to group the points into cells (raster) that are then processed to obtain the final vector shapefile.
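As an illustration of the rasterization step only (not the actual topological algorithm), points can be binned onto a regular grid and each cell marked as occupied:

```python
import numpy as np

def rasterize_presence(x, y, cell_size=2.0):
    """Bin points onto a regular grid; a cell is marked 1 if it contains any point."""
    x, y = np.asarray(x), np.asarray(y)
    cols = ((x - x.min()) // cell_size).astype(int)
    rows = ((y.max() - y) // cell_size).astype(int)
    grid = np.zeros((rows.max() + 1, cols.max() + 1), dtype=np.uint8)
    grid[rows, cols] = 1
    return grid

# Example with the points classified as trees in the previous sketch:
# tree_mask = np.isin(group[keep], ["short_trees", "medium_trees", "tall_trees"])
# tree_grid = rasterize_presence(np.asarray(las.x)[tree_mask], np.asarray(las.y)[tree_mask])
```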

The following image shows how LIDAR points have been grouped to create topologically connected surfaces. In the image below, yellow areas are Soil, orange are Bushes, green are Trees. Grey areas and blue surfaces (not present in this image) are Buildings and Water, respectively.

lidar-2

Once the trees class is defined in raster format from the LiDAR data, it is refined using Sentinel data, which is more up to date. The NDVI product identifies which pixels have an NDVI value over 0.5, and the RGB product can be used to visually check which pixels represent vegetation areas.
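A minimal sketch of this refinement, assuming two co-registered single-band rasters with hypothetical file names:

```python
import rasterio

# Hypothetical, co-registered single-band rasters for the same tile:
# tree presence (0/1) derived from LiDAR and NDVI derived from Sentinel-2.
with rasterio.open("trees_lidar.tif") as src:
    trees, profile = src.read(1), src.profile
with rasterio.open("ndvi_s2.tif") as src:
    ndvi = src.read(1)

# Keep a tree pixel only where the more recent Sentinel-2 NDVI confirms vegetation.
refined = ((trees == 1) & (ndvi > 0.5)).astype("uint8")

profile.update(dtype="uint8", count=1)
with rasterio.open("trees_refined.tif", "w", **profile) as dst:
    dst.write(refined, 1)
```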

Finally, the auxiliary trees layer refined with Sentinel data is processed to obtain two configurations:

  • Isolated trees
  • Copses

The final result of the process is a set of ESRI shapefiles, where the copses layer is a polygon feature type and the isolated trees layer is a point feature type. Both have a direct correspondence with the landscape elements.

The overlap between the detected landscape elements, the currently protected sites of the Natura 2000 network and the Land Parcel Identification System allows an accurate ecological value report to be produced for Spanish crop areas.

The LiDAR algorithm provides more detailed information because the landscape value helps to identify which subplots within a parcel have more value, with the following benefits:

  • farmers will get an economic benefit through fund assignments to maintain these tree formations, and
  • the ecosystem and its species will be preserved.

ecological-value

This Ecological Value report is based on the following queries (a minimal sketch of these computations follows the list):

  • Query 1: Surface of Sites of Community Importance (LIC) / subplot area. Score between 0 and 1.
  • Query 2: Surface of Special Protection Areas for Birds (ZEPA) / subplot area. Score between 0 and 1.
  • Query 3: Protected Sites Value = Query 1 + Query 2. Score between 0 and 2.
  • Query 4: Number of isolated trees / subplot area. Score between 0 and 1.
  • Query 5: Surface of copses / subplot area. Score between 0 and 1.
  • Query 6: Landscape Elements Value = Query 4 + Query 5. Score between 0 and 2.
  • Query 7: Ecological Value = Query 3 + Query 6.
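A minimal sketch of how these queries could be computed with geopandas overlays, assuming projected layers and hypothetical file and column names (an illustration, not the CAPAS implementation):

```python
import geopandas as gpd

# Hypothetical layers, all in the same projected CRS (areas in square metres).
subplots = gpd.read_file("subplots.shp")        # LPIS subplots, with a "subplot_id" column
lic      = gpd.read_file("lic_sites.shp")       # Sites of Community Importance
zepa     = gpd.read_file("zepa_sites.shp")      # Special Protection Areas for Birds
copses   = gpd.read_file("copses.shp")          # polygons from the LiDAR processing
trees    = gpd.read_file("isolated_trees.shp")  # points from the LiDAR processing

subplot_area = subplots.set_index("subplot_id").area

def area_ratio(layer):
    """Queries 1, 2 and 5: intersected surface / subplot area, clipped to [0, 1]."""
    inter = gpd.overlay(subplots[["subplot_id", "geometry"]], layer, how="intersection")
    ratio = inter.area.groupby(inter["subplot_id"]).sum() / subplot_area
    return ratio.clip(upper=1).fillna(0)

q1 = area_ratio(lic)
q2 = area_ratio(zepa)
q3 = q1 + q2                                                   # Protected Sites Value (0-2)
q4 = (gpd.sjoin(trees, subplots).groupby("subplot_id").size()
      / subplot_area).clip(upper=1).fillna(0)                  # isolated trees / subplot area
q5 = area_ratio(copses)
q6 = q4 + q5                                                   # Landscape Elements Value (0-2)
q7 = q3 + q6                                                   # Ecological Value
```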

Sentinel Products generation

First, Sentinel-2 (S2) imagery has to be downloaded from the ESA server. The automatic download process incorporates selection parameters so that only imagery satisfying our quality criteria is downloaded. Two kinds of products are generated from the S2 imagery.

  • Simple products: those generated from single-date imagery. Through an automatic process, TRAGSA generates RGB products to support photo interpretation. Another simple product is the Normalized Difference Vegetation Index (NDVI), which is widely used for vegetation monitoring.
  • Complex products: those generated from imagery acquired on different dates. The following four thematic layers will be created.
    • Permanent grassland: this layer will be useful for distinguishing photosynthetically active vegetation from non-active (unproductive or bare soil) areas. It will therefore help to monitor the maintenance of existing permanent grassland, an agricultural practice beneficial for the climate and the environment (Regulation (EU) No 1307/2013).
    • Herbaceous and woody crops: by using decision algorithms, different crops can be identified. The results will be displayed in two different layers, one for herbaceous crops and one for woody crops.
    • Change detection layer: this layer will highlight areas where changes have happened. It will focus on forests and grassland areas in order to detect dramatic changes, such as those caused by logging or forest fires, as well as more subtle changes associated with AIS (Alien Invasive Species), diseases and reforestation (a simple illustration of this kind of change detection is sketched after the list).
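In its simplest form, a change-detection layer could be based on the difference between NDVI rasters from two dates. The sketch below illustrates that idea only; the actual CAPAS algorithm is not described here, and the file names and thresholds are assumptions:

```python
import numpy as np
import rasterio

# Hypothetical co-registered NDVI rasters for the same tile on two dates.
with rasterio.open("ndvi_2016.tif") as src:
    ndvi_old, profile = src.read(1), src.profile
with rasterio.open("ndvi_2017.tif") as src:
    ndvi_new = src.read(1)

diff = ndvi_new.astype("float32") - ndvi_old.astype("float32")

# Flag strong vegetation losses (e.g. logging, fire) and gains (e.g. reforestation);
# the 0.3 thresholds are arbitrary placeholders.
change = np.zeros(diff.shape, dtype="uint8")
change[diff < -0.3] = 1
change[diff > 0.3] = 2

profile.update(dtype="uint8", count=1)
with rasterio.open("change_detection.tif", "w", **profile) as dst:
    dst.write(change, 1)
```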

So far, only one of the twin S2 satellites (Sentinel-2A) has been launched. When the second satellite (Sentinel-2B) is in orbit, the revisit time at the equator will be 5 days, which results in 2-3 days at mid-latitudes. This high revisit frequency will allow quicker updating of the SigPAC database compared with current updates, which are based on lower-resolution data (LANDSAT and SPOT5 satellites) or orthophoto flights commissioned by each Autonomous Community.

Final Result

As stated previously, the Common Agriculture Policy funds Assignments Service (CAPAS) is a set of tools that improves the existing Common Agriculture Policy (CAP) service, in order to innovatively manage and upgrade the CAP database provided by the Spanish administration to farmers and land owners. It is important to note that this CAP database is one of the main pillars of the CAP funds calculation systems. As mentioned earlier, the improvement process is based on leveraging new cross-sectorial data sources from different fields and geographical areas, and the resulting datasets will also be available at the proDataMarket marketplace.

To use these new datasets as a powerful tool to objectively define the parameters of agricultural use of parcels, the presence of landscape elements and the temporal evolution of crops, the data processing and treatment algorithms described above have, at the moment, been partially developed.

In summary, the use of LIDAR files modifies some parcel and subplot features, and SENTINEL images will improve the definition of parcel and subplot land use and its temporal evolution.

The new datasets produced by CAPAS from these external sources will be RDFized and incorporated into the proDataMarket platform. Spanish rural property data, improved using new and under-exploited datasets, will therefore be accessible through the proDataMarket platform, providing users with advanced visualisation and querying features.

[1] JPEG 2000 (JP2) is an image compression standard and coding system. It was created by the Joint Photographic Experts Group committee in 2000.

Data Workflow in SoE

The datasets and challenges in integration

The State of Estate (SoE) business case focuses on generating an up-to-date, dynamic and high-quality report on state-owned properties and buildings in Norway. It collects and integrates several datasets, described below. The datasets originate from heterogeneous sources and are of varying quality. Here are some scenarios that cause challenges in the integration process.

Matrikkel data

Although Matrikkel data from the Norwegian mapping authority is one of the most authoritative sources of property data, not all the information is up to date. This is sometimes caused by delays in administrative procedures in the municipalities, sometimes by owners not reporting changes to the municipalities because of the high cost of reporting, and sometimes by typos and other manual updating errors. In addition, buildings smaller than 15 square metres are not required to be registered in the Matrikkel.

Statsbygg’s property data

Statsbygg’s property data has been updated since the last report. However, the Matrikkel building number is not correctly registered for all buildings, and the address information is not necessarily up to date either. There may also be typos and other manual updating errors in the dataset.

Business Entity register

The Business Entity Register dataset comes from another authoritative national source and contains information on ministries and their subordinate organizations. However, not all subordinate organizations of the ministries are registered as sub-organizations in the Business Entity Register, so the missing organizations need to be added manually to the dataset as extra business entities.

State-owned properties Report 2013-2014 (SoEReport2013)

The SoEReport2013 is a report from 2013, and it includes some properties or buildings that may have been sold, rebuilt or demolished in the last few years. The old report also includes some non-reported ownership of government properties and buildings that we need to take care of in the new report. For example, several properties were registered as owned by Statsbygg in the old report; however, they are registered as owned by the King in the Matrikkel database, which means that Statsbygg has managed the King’s property without reporting the change of ownership to the municipalities.

ByggForAlle

The Matrikkel building number has not been registered for all buildings in the ByggForAlle dataset, and some of the key information may include typos or manual updating errors, or be out of date.

The data workflow

To meet the challenges in data integration, we have developed a data workflow, shown in the diagram below. It illustrates the process of importing the datasets, quality control and integration of the datasets, and finally generation of the result dataset. The roles involved and their activities are modelled as swim lanes. The original and generated datasets are modelled as data objects in the diagram, such as SoEReport2013, BusinessEntityRegister, NewOrgList_Comfirmed, etc. The quality control process can be both machine-automated and manual work based on human tasks, and it takes care of the integration exceptions.

dataworkflowsoefigure1

There are 3 roles involved in this process.

  • The SystemAdmin is a technical role and its main tasks are dataset import and integration.
  • The SystemManager is a functional role whose main tasks are quality control and generating the SoE report, including organizing and communicating with the other involved organizations.
  • The PropertyResponsible is a role for each involved organization, and its main tasks are to prepare data, perform quality control and submit its own property list and building list.

The activity boxes are explained below:

  • ImportOldReportWithOrgList: The SystemAdmin starts by checking whether the SoE report from 2013 has been imported. If not, the SystemAdmin imports the report, which also includes the old organization list.
  • ImportMinistrySub_Brreg: The SystemAdmin then imports the organization list of the ministries and their subordinate organizations from the Business Entity Register.
  • MergeOrgListBrreg_SoEReport2013: The two organization lists are merged.
  • EditComfirmOrgList: The SystemManager gets a signal to start editing and updating the list; the result is the confirmed OrgList.
  • ImportOwnedPropertyBuildingFromMatrikkelBasedOnOrglist_Comfirmed: Based on the confirmed OrgList, the owned properties and buildings from the Cadastre database (Matrikkel) are imported by the SystemAdmin.
  • PrepareExportForOwned: The PropertyResponsible prepares a property list in the agreed format.
  • ImportOwnedFromOrg: If some of the organizations, such as Statsbygg, have their own database or list of owned properties and buildings, these lists are imported as necessary.
  • ImportByggForAlleData: The ByggForAlle data is then imported.
  • MergeAllDatasets: Afterwards, the data from the Matrikkel and the Business Entity Register (OrgList_comfirmed), the SoE report 2013, property data from organizations such as Statsbygg, and ByggForAlle are merged by the SystemAdmin (a minimal sketch of this comparison follows the list).
  • QualityControlMergedList: The SystemManager then starts the quality control cycle on the merged list.
  • EditAndConfirmOwnedList: The property responsible in each organization gets the task to edit and confirm their property and building list.
  • ApproveAndFinalizeNewSoEReport: The SystemManager does the final quality control before approving and finalizing the new SoE Report.
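A minimal sketch of what the MergeAllDatasets comparison could look like, assuming each source has been reduced to a flat table with a shared cadastral key (the file and column names are assumptions, not the actual system's schema):

```python
import pandas as pd

# Hypothetical flat extracts sharing a cadastral/building key.
matrikkel_brreg = pd.read_csv("matrikkel_brreg_owned.csv")  # Matrikkel + Business Entity Register
soe_2013 = pd.read_csv("soe_report_2013.csv")               # old State of Estate report

merged = matrikkel_brreg.merge(
    soe_2013, on="cadastral_key", how="outer",
    suffixes=("_matrikkel", "_soe2013"), indicator=True,
)

# Mismatches to be handled in the quality-control steps.
only_in_matrikkel = merged[merged["_merge"] == "left_only"]    # missing from the old report
only_in_old_report = merged[merged["_merge"] == "right_only"]  # possibly sold, demolished or re-registered
in_both = merged[merged["_merge"] == "both"]

print(len(only_in_matrikkel), "properties found via Matrikkel/BRREG but not in the 2013 report")
print(len(only_in_old_report), "properties in the 2013 report not found via Matrikkel/BRREG")
```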

 

Expected results and an example

Below is one of the expected results from data quality control and integration in the “MergeAllDatasets” step. The map below shows both examples of properties that are in the SoEReport2013 but not in the list based on the Matrikkel_Brreg integration, and properties that are in the Matrikkel_Brreg integration but not in the SoEReport2013. After identifying the mismatches in this way, users can work further to clean the datasets and correct the wrong registrations in the source systems.

Symbol | BRREG_Matrikkel integrated dataset | Old SoE Report | Example
Simple hatch | No | Yes | “, NORSK INST.FOR SKOG OG LANDSKAP, NORSK INSTITUTT FOR SKOG OG LANDSKAP”; “,BIOFORSK, TOLLEFSRUD MARI METTE”
Cross hatch | Yes | No | “STATENS VEGVESEN, ,STATENS VEGVESEN”
Land parcels filled with solid colour | Yes | Yes | “MATTILSYNET,MATTILSYNET,MATTILSYNET”

 

The figure below shows the area inside Campus Ås. Some land parcels owned or leased by NMBU and Statens vegvesen according to the Matrikkel are not included in the old SoE report; those land parcels are marked with a cross-hatch pattern. On the other side, some land parcels from the old SoE report are not included in the list based on BRREG and the Matrikkel, such as the hatched land parcels labelled “, NORSK INST.FOR SKOG OG LANDSKAP, NORSK INSTITUTT FOR SKOG OG LANDSKAP” or “,BIOFORSK, TOLLEFSRUD MARI METTE”. Both the simple-hatch and cross-hatch properties in the map need to be quality checked and confirmed in the “QualityControlMergedList” step and thereafter in “EditAndConfirmOwnedList”.

dataworkflowsoefigure2

Proof of Concept with Augmented Reality

 

The potential of the proDataMarket platform is huge, and by letting third-party actors use and contribute to the “big data” platform, the potential could be even greater. To show how proDataMarket can be utilized, EVRY is developing two mobile applications that rely on the proDataMarket service. The applications combine data from proDataMarket with augmented reality technology to give the user a visual representation of the data. In this way, EVRY will help contractors, construction companies and municipalities visualize future building projects. This is done with two iPad applications: the first shows underground infrastructure such as pipes and cables, while the other augments a 3D model into a real-world scene.

Augmented reality (AR) is a live direct or indirect view of a physical, real-world environment whose elements are augmented (or supplemented) by computer-generated sensory input such as sound, video, graphics or GPS data [1]. The applications EVRY develops use augmented reality technology to present cadastral data distributed by proDataMarket. In this way, the applications can show underground infrastructure on the screen (through the device camera), as well as 3D models of future building projects in a real-world scene with information about the surroundings. This is done by having a 3D model with correct measurement data (relative to its real-world size): by knowing the distance between the desired location and the user, the model can be scaled to the correct size according to that distance. Of course, if the user decides to manipulate the model (e.g. scaling it up), the size/distance relationship will no longer hold. The 3D model augmentation can ease both private and commercial building projects by giving a visual presentation of how a building may look in a landscape.
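The scaling geometry can be illustrated with a small sketch (not the Wikitude implementation; the focal length and coordinates below are arbitrary assumptions): a great-circle distance from the user's GPS position to the model's anchor point, followed by a pinhole-camera approximation of the apparent size.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between the user and the model's anchor point."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def on_screen_height_px(real_height_m, distance_m, focal_px=1500.0):
    """Pinhole-camera approximation: apparent size shrinks linearly with distance."""
    return focal_px * real_height_m / distance_m

# Example: a 30 m tall building model anchored a few hundred metres from the user.
d = haversine_m(59.9139, 10.7522, 59.9160, 10.7540)
print(f"distance: {d:.0f} m, apparent height: {on_screen_height_px(30.0, d):.0f} px")
```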

The development process has been one of trial and error, and different augmented reality SDKs have been examined. In the end, the development team chose the Wikitude SDK [2] to handle the augmentation processing. Augmenting a custom 3D model at a desired location is a suitable task for the Wikitude SDK: by setting the model as a “Point of Interest” (POI) and using “GeoLocation”, the user can place the model at a desired location on a 2D map (Google Maps).

1

The model will be scaled to the correct size relative to the distance from the user. When a model is placed, Wikitude augments it and the user can view and manipulate it with on-screen controls.

2

The manipulation controls are necessary because the iPad compass and location service are not accurate enough to get a satisfying result. If a user needs to place a model at a very exact location, there must be some way to tweak and calibrate the model. All in all, there are still some bugs left to fix in the applications, but the main functionality is in place and we are looking forward to showing demos of what we have made.

[1] https://en.wikipedia.org/wiki/Augmented_reality

[2] http://www.wikitude.com/

Cerved and SpazioDati at Data Driven Innovation 2016

Cerved and SpazioDati participated in the first edition of Data Driven Innovation 2016 with a presentation and a stand about preliminary results of their collaborative work in the ProDataMarket project.

Cerved & SpazioDati present the first prototype for proDataMarket @DataDrivenInnovation 2016

 

Data Driven Innovation is an open summit about big data hosted by Roma Tre University and organized by Codemotion. During the two days of the summit, many people had the chance to see the first results of Cerved and SpazioDati’s work in the proDataMarket project: the Cerved Scouting Terrain Service (CST), an interactive map showing territory and socio-demographic scores for Bologna, such as the social distress index, the economic distress index, the socio-demographic score and many other territory scores.

CST, second business case of Cerved: employees of the working population in Bologna

 

CST is the second business case that Cerved is developing within the proDataMarket project: the goal of this service is to provide target users with a tool to search for and view property and territory information on a map. To achieve this, Cerved is developing value-added geo-marketing indicators, analyses and visualisations.

Authors: Claudio Castelli & Diego Sanvito

Satellite images applied to property data

The Sentinels are a fleet of satellites designed specifically to deliver the wealth of data and imagery that are central to the European Commission’s Copernicus programme. This unique environmental monitoring programme is making a step change in the way we manage our environment, understand and tackle the effects of climate change, and safeguard everyday lives. Sentinel-2 carries an innovative wide-swath, high-resolution multispectral imager with 13 spectral bands for a new perspective of our land and vegetation. The combination of high resolution, novel spectral capabilities, a swath width of 290 km and frequent revisit times is generating unprecedented views of Earth. Sentinel-2 provides information for agricultural and forestry practices and helps manage food security. Satellite images will be used to determine various crop and plant indexes; some examples of these parameters (sketched in code after the list) could be:

  • Normalised Difference Vegetation Index (NDVI)
  • Normalised Difference Snow and Ice Index (NDSI)
  • Enhanced vegetation index (EVI)
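As an illustration, the sketch below computes these indexes with their standard formulas from NumPy arrays standing in for Sentinel-2 reflectance bands (the band mapping noted in the comments is the usual Sentinel-2 convention; the dummy arrays are placeholders, not real data):

```python
import numpy as np

def ndvi(nir, red):
    return (nir - red) / (nir + red + 1e-10)

def ndsi(green, swir):
    return (green - swir) / (green + swir + 1e-10)

def evi(nir, red, blue, g=2.5, c1=6.0, c2=7.5, l=1.0):
    return g * (nir - red) / (nir + c1 * red - c2 * blue + l)

# Usual Sentinel-2 band mapping: B2 = blue, B3 = green, B4 = red, B8 = NIR,
# B11 = SWIR. Dummy reflectance arrays stand in for bands read with e.g. rasterio.
blue, green, red, nir, swir = (np.random.rand(4, 4) for _ in range(5))
print(ndvi(nir, red).round(2))
```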

This is particularly important for effective crop production prediction and for applications related to Earth’s vegetation.

Sentinel use example

Sentinel-2 is the first optical Earth observation mission of its kind to include three bands in the ‘red edge’, which provide key information on the state of vegetation. In the previous image from 6 July 2015 acquired near Toulouse, France, the satellite’s multispectral instrument was able to discriminate between two types of crops: sunflower (in orange) and maize (in yellow).
These new and advanced datasets will be used within the CAPAS business case to improve and enrich the information already obtained using LIDAR datasets (What is LIDAR?). Indeed, using LIDAR it is possible to obtain accurate surface maps; however, the data update frequency is not very high. On the other hand, the Sentinel constellation has a very high revisit frequency (five days) and offers information about the kind of crops and their evolution. In conclusion, the use and merging of these different datasets answers several questions regarding CAP parameters:

  • Is a specific parcel cultivated?
  • What kind of crop is growing in a plot?
  • Has the number of trees of a copse changed? When?
  • What is the ratio between Ecological Focus Areas (EFAs) and productive areas in a given place?

Processing this kind of information can be very complex and laborious: it depends on the selected indexes, the chosen bands and the geographical area, and it is further complicated by the high volumes of data. However, the final results offer a very detailed and accurate overview of land cover changes, environmental monitoring, crop monitoring, food security, and detailed vegetation and forest monitoring parameters such as leaf area index, chlorophyll concentration or carbon mass estimations. All this information has a direct relation to Common Agricultural Policy principles and the new European “greening” policies.

Note: Some details about the characteristics and features of these instruments are available here.

proDataMarket business cases at RuleML2015 Industry Track

The proDataMarket SoE and CAPAS business cases have been published/presented at the RuleML2015 Industry Track:

Norwegian State of Estate: A Reporting Service for the State-Owned Properties in Norway by Ling Shi, Bjørg E. Pettersen, Ivar Østhassel, Nikolay Nikolov, Arash Khorramhonarnama, Arne J. Berre, and Dumitru Roman

  • Abstract: Statsbygg is the public sector administration company responsible for reporting the state-owned property data in Norway. Traditionally the reporting process has been resource-demanding and error-prone. The State of Estate (SoE) business case presented in this paper is creating a new reporting service by sharing, integrating and utilizing cross-sectorial property data, aiming to increase the transparency and accessibility of property data from public sectors enabling downstream innovation. This paper explains the ambitions of the SoE business case, highlights the technical challenges related to data integration and data quality, data sharing and analysis, discusses the current solution and potential use of rules technologies.
  • Paper

 

CAPAS: A Service for Improving the Assignments of Common Agriculture Policy Funds to Farmers and Land Owners by Mariano Navarro, Ramón Baiget, Jesús Estrada and Dumitru Roman

  • Abstract: The Tragsa Group is part of the group of companies administered by the Spanish state-owned holding company Sociedad Estatal de Participaciones Industriales (SEPI). Its 37 years of experience have placed this business group at the forefront of different sectors ranging from agricultural, forestry, livestock, and rural development services, to conservation and protection of the environment in Spain. Tragsa is currently developing a business case around the implementation of a Common Agriculture Policy Assignment Service (CAPAS) – an extension of a currently active and widely used service (more than 20 million visits per year). The extension of the service in this business case is based on leveraging new cross-sectorial data sources, and targets a substantial reduction of incorrect agricultural funds assignments to farmers and land owners. This paper provides an overview of the business case, technical challenges related to the implementation of CAPAS (in areas such as data integration), discusses the current solution and potential use of rule technologies.
  • Paper