Scientific production of proDataMarket project: Publications and Papers

proDataMarket has paid special attention to one of the general objectives of all H2020 projects: showing how European collaboration achieves more than would otherwise have been possible, notably in achieving scientific excellence, contributing to research production and solving technical challenges. Through its research production, proDataMarket has made a considerable effort to ensure that its scientific results are taken up by the scientific community to guarantee follow-up, by decision-makers to influence policy-making and by industry to improve their businesses. This scientific production, in the form of publications and papers, is summarized in the following lists:

First Period (M1-M18)

  1. D. Roman, N. Nikolov, A. Pultier, D. Sukhobok, B. Elvesæter, A. Berre, X. Ye, M. Dimitrov, A. Simov, M. Zarev, R. Moynihan, B. Roberts, I. Berlocher, S. Kim, T. Lee, A. Smith, and T. Heath. DataGraft: One-Stop-Shop for Open Data Management. In the Semantic Web journal, 2016.
  2. D. Roman and S. Gatti. Towards a Reference Architecture for Trusted Data Marketplaces. In the Proceedings of the 2nd International Conference on Open and Big Data (OBD). IEEE. Vienna, Austria, 22-24 August 2016.
  3. D. Roman, M. Dimitrov, N. Nikolov, A. Pultier, B. Elvesæter, A. Simov, Y. Petkov. DataGraft: A Platform for Open Data Publishing. In the Joint Proceedings of the 4th International Workshop on Linked Media and the 3rd Developers Hackshop. (LIME/SemDev@ESWC 2016).
  4. D. Roman, M. Dimitrov, N. Nikolov, A. Pultier, D. Sukhobok, B. Elvesæter, A. J. Berre, X. Ye, A. Simov and Y. Petkov. DataGraft: Simplifying Open Data Publishing. ESWC Demo paper. 2016.
  5. D. Sukhobok, N. Nikolov, A. Pultier, X. Ye, A. J. Berre, R. Moynihan, B. Roberts, B. Elvesæter, N. Mahasivam and D. Roman. Tabular Data Cleaning and Linked Data Generation with Grafterizer. ESWC Demo paper. 2016.
  6. L. Shi, B. E. Pettersen, I. Østhassel, N. Nikolov, A. Khorramhonarnama, A. J. Berre and D. Roman. Norwegian State of Estate: A Reporting Service for the State-owned Properties in Norway. Industry Track paper in Proceedings of the 9th International Web Rule Symposium, August 2-5, 2015. Berlin, Germany. Vol. 9202, pp 456-464. Springer, 2015.
  7. Navarro, R. Baiget, J. Estrada and D. Roman: CAPAS: A Service for Improving the Assignments of Common Agriculture Policy Funds to Farmers and Land Owners. Industry Track paper at the 9th International Web Rule Symposium, August 2-5, 2015. Berlin, Germany, Challenge+DC@RuleML 2015.
  8. The project is given as example on the Norwegian Government website in the report “Identification and assessment of Big Data in the public sector” (in Norwegian) https://www.regjeringen.no/no/dokumenter/kartlegging-og-vurdering-av-stordata-i-offentlig-sektor/id2478539/

Second Period (M19 – M30)

  1. Pozzati, D. Sanvito, C. Castelli, D. Roman. “Understanding territorial distribution of Properties of Managers and Shareholders: a Data-driven Approach”. Territorio Italia 2 (2016), DOI: 10.14609/Ti_2_16_2e, Pages 27-40, ISSN 2499-2674.
  2. J. Estrada, H. Sánchez, L. Hernanz, M. J. Checa, D. Roman. 2017. “Enabling the Use of Sentinel-2 and LiDAR Data for Common Agriculture Policy Funds Assignment.” ISPRS Int. J. Geo-Inf. 6, no. 8: 255.
  3. D. Sukhobok, N. Nikolov, and D. Roman. “Tabular Data Anomaly Patterns”. In the proceedings of The 3rd International Conference on Big Data Innovations and Applications (Innovate-Data 2017), 21-23 August 2017, Prague, Czech Republic, IEEE.
  4. N. Mahasivam, N. Nikolov, D. Sukhobok and D. Roman. “Data preparation as a service based on Apache Spark”. To appear in the proceedings of The European Conference on Service-Oriented and Cloud Computing (ESOCC), Springer, Sept 28-29, 2017, Oslo, Norway.
  5. L. Shi and D. Roman. “Using rules for assessing and improving data quality: A case study for the Norwegian State of Estate report”. In the Proceedings of the Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR 2017, hosted by International Joint Conference on Rules and Reasoning 2017 (RuleML+RR 2017), London, UK, July 11-15, 2017.
  6. L. Shi and D. Roman. “From Standards and Regulations to Executable Rules: A Case Study in the Building Accessibility Domain”. In the Proceedings of the Doctoral Consortium, Challenge, Industry Track, Tutorials and Posters @ RuleML+RR 2017, hosted by International Joint Conference on Rules and Reasoning 2017 (RuleML+RR 2017), London, UK, July 11-15, 2017.
  7. L. Shi, N. Nikolov, D. Sukhobok, T. Tarasova and D. Roman. “The proDataMarket Ontology for Publishing and Integrating Cross-domain Real Property Data”. To appear in the journal “Territorio Italia. Land Administration, Cadastre and Real Estate”. n.2/2017.
  8. Costantini, E Franconi, W. Van Woensel, R. Kontchakov, F. Sadri, D. Roman: “Rules and Reasoning. International Joint Conference, RuleML+RR 2017, London, UK, July 12–15, 2017, Proceedings.” Lecture Notes Computer Science 10364, Springer 2017, ISBN 978-3-319-61251-5.
  9. Y.C. Gan and D. Roman. “Mobile Big Data: The Silver Bullet for Telcos? A Case Study in the Norwegian Telcos Market”. In proceedings of the International Conference on Big Data Analytics, Data Mining and Computational Intelligence, 21 – 23 July 2017, Lisbon, Portugal
  10. D. Roman, M. Kobernus, R. Ødegård, N. Nikolov, D. Sukhobok, B. M. von Zernichow, T. C. Lech. “ALaDIn: Shining a Light on Air Quality through Data Integration and Machine Learning”. To appear in the proceedings of the Environmental Informatics – From Science to Society: The Bridge provided by Environmental Informatics (EnviroInfo 2017), Luxembourg, 13th – 15th September 2017.
  11. L. Shi, D. Sukhobok, N. Nikolov and D. Roman. “Norwegian State of Estate Report as Linked Open Data”. To appear in the proceedings of ODBASE 2017 – The 16th International Conference on Ontologies, DataBases, and Applications of Semantics, 24-25 October 2017, Rhodes, Greece.
  12. B. M. von Zernichow and D. Roman. “Usability of Visual Data Profiling in Data Cleaning and Transformation”. To appear in the proceedings of ODBASE 2017 – The 16th International Conference on Ontologies, DataBases, and Applications of Semantics, 24-25 October 2017, Rhodes, Greece.
  13. D. Roman, D. Sukhobok, N. Nikolov, B. Elvesæter and A. Pultier. “The InfraRisk Ontology: Enabling Semantic Interoperability for Critical Infrastructures at Risk from Natural Hazards”. To appear in the proceedings of ODBASE 2017 – The 16th International Conference on Ontologies, DataBases, and Applications of Semantics, 24-25 October 2017, Rhodes, Greece.
  14. D. Sukhobok, H. Sanchez, J. Estrada, D. Roman. “Linked Data for Common Agriculture Policy: Enabling Semantic Querying over Sentinel-2 and LiDAR Data.” ISWC Demo paper. 2017.
  15. D. Sukhobok, N. Nikolov, T. C. Lech, A.-H. Moberg, R. Frantsvag, H. R. Bergaas, D. Roman. “Interacting with Subterranean Infrastructure Linked Data using Augmented Reality”. ISWC Demo paper. 2017.
  16. D. Sukhobok, D. Djordjevic, D. Sanvito and D. Roman. “Publishing Socio-Economic Territory Indices as Linked Data and their Visualization for Real Estate Valuation”. ISWC Demo paper. 2017.
  17. L. Shi, B. E. Pettersen, D. Sukhobok, N. Nikolov and D. Roman. “Linked Data for the Norwegian State of Estate Reporting Service”. ISWC Demo paper. 2017.
  18. D. Roman, J. Paniagua, T. Tarasova, G. Georgiev, D. Sukhobok, N. Nikolov, and T. C. Lech. “proDataMarket: A Data Marketplace for Monetizing Linked Data”. ISWC Demo paper. 2017.
  19. N. Nikolov, D. Sukhobok, S. Dragnev, S. Dalgard, B. Elvesæter, B. M. von Zernichow and D. Roman. “DataGraft beta v2: New Features and Capabilities”. ISWC Demo paper. 2017.
  20. B. M. von Zernichow and D. Roman. “A Visual Data Profiling Tool for Data Preparation.” To appear in the proceedings of the Sixth International Conference on Data Analytics DATA ANALYTICS 2017, November 12 – 16, 2017 – Barcelona, Spain.

The proDataMarket project represents a successful collaboration between the public and private sectors that has proved the potential of the Open and Linked Data approach applied to property data in a series of business cases using several technological tools. The proDataMarket Communication Plan aimed to disseminate the efforts of the project partners and their scientific production. In this regard, one well-defined objective was the dissemination of research results to the broader scientific community.

In conclusion, proDataMarket has carried out numerous scientific activities whose results have been published, reviewed by various research communities and presented at multiple relevant venues. Consequently, the proDataMarket project's research activities can be considered fully successful.

Tags: dissemination, scientific production, research, papers, publications

CAPAS Business Case: results & outlook

Tragsa has developed the CAPAS service, which integrates multi-sectoral data for better Common Agriculture Policy (CAP) funds assignments to farmers and land owners. Several external datasets, such as LiDAR, Copernicus Sentinel2 and Protected Sites from the Spanish Environment and Agriculture Ministry, among others, have been used to improve the Spanish Land Parcel Identification System (LPIS).

Products from LiDAR data

LiDAR files are collections of points stored as tuples representing longitude, latitude and elevation. The data, provided by the Spanish National Geographic Institute (IGN), were processed with automatic algorithms to detect landscape elements (copses and isolated trees) within agricultural parcels.

Protected sites and ecological value report

On the one hand, the density of isolated trees and the presence of copses were evaluated with the Landscape Elements Value. On the other hand, the presence or absence of protected areas intersecting subplots was evaluated with a score named Protected Sites Value. The sum of the Protected Sites Value and the Landscape Elements Value is the Ecological Value.
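As a minimal sketch of how the two scores combine, the snippet below adds them up per subplot. Only the sum itself comes from the description above; the class name, the field names and the numeric scales (including the score of 1.0 for an intersecting protected area) are assumptions made for illustration.

```python
from dataclasses import dataclass


def protected_sites_value(intersects_protected_area: bool,
                          score_if_present: float = 1.0) -> float:
    """Presence/absence score; the value 1.0 is a hypothetical scale."""
    return score_if_present if intersects_protected_area else 0.0


@dataclass(frozen=True)
class SubplotScores:
    """Scores for one subplot (field names are illustrative)."""
    landscape_elements_value: float
    protected_sites_value: float

    @property
    def ecological_value(self) -> float:
        # Ecological value = Protected Sites Value + Landscape Elements Value
        return self.protected_sites_value + self.landscape_elements_value


subplot = SubplotScores(
    landscape_elements_value=2.0,  # hypothetical score for trees and copses
    protected_sites_value=protected_sites_value(True),
)
print(subplot.ecological_value)
```

How the Landscape Elements Value is derived from tree density and copses is not covered here, so it is passed in as a plain number.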

The full description of these products, how they were generated and how they were validated is given here.

Products from Sentinel2 data

The Sentinels are a fleet of satellites for land monitoring which is part of the European Copernicus program. The products generated from satellite data were explained in a previous blogpost.

Every week, the images with a low cloud cover percentage were downloaded and processed to generate three products (true colour image, false colour image and NDVI). For the pilot area, the Castile and Madrid regions, a total of 168 tiles were processed during 2017 (until the 31st of August). Irrigation maps were generated in two pilot areas. They were evaluated and proved helpful for identifying crops in control tasks.
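For readers unfamiliar with NDVI (Normalized Difference Vegetation Index), here is a small sketch of the standard formula, NDVI = (NIR - Red) / (NIR + Red). For Sentinel-2 the red and near-infrared bands are usually B04 and B08. The function names and the plain-list grid representation are ours, not part of the CAPAS processing chain, and the zero-division handling is an assumption.

```python
from typing import List

Grid = List[List[float]]


def ndvi(nir: float, red: float) -> float:
    """Standard NDVI for one pixel; returns 0.0 when both bands are zero."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention to avoid division by zero
    return (nir - red) / total


def ndvi_grid(nir_band: Grid, red_band: Grid) -> Grid:
    """Apply NDVI pixel by pixel to two equally sized reflectance grids."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


# Tiny made-up 2x2 reflectance grids (values between 0 and 1).
nir = [[0.5, 0.3], [0.0, 0.4]]
red = [[0.1, 0.3], [0.0, 0.2]]
print(ndvi_grid(nir, red))
```

Values close to 1 indicate dense green vegetation, while values near 0 or below indicate bare soil, water or built-up areas.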

Other products generated by CAPAS have been used to update the LPIS database. For example, the grassland layer displays actual grassland areas. The change detection layer highlights the changes that have occurred since the last update of LPIS, focusing on changes between agricultural land, forests and grassland areas.

Change detection layer in TAEJ (with legend)

Grassland layer in LUPI (with legend)

Conclusion

Many innovative products were generated by the CAPAS business case, leveraging previously under-used data. The different methodologies and derived products showed a high success rate in several tests, and all the resulting data can be obtained and visualized on the proDataMarket platform.

New release of the proDataMarket marketplace!

We are glad to announce the second release of the proDataMarket marketplace!

What is proDataMarket Marketplace?

The proDataMarket marketplace is a virtual space that connects providers of open and proprietary real-estate and related contextual data with consumers of this data. On the one hand, the marketplace aims to make it easier for data providers to publish and distribute their data and eventually reach potential consumers. On the other hand, it helps data consumers discover and easily access data published on the marketplace.

The marketplace can be accessed through its landing page at http://prodatamarket.eu.

Landing page of the proDataMarket marketplace

Conceptually, the marketplace has two areas: the Consumer site (dedicated to data consumers) and the Producer site (dedicated to data producers).

Consumer site

Services available to data consumers have been deployed at the Consumer Portal: https://store.prodatamarket.eu.

Landing page of the proDataMarket Consumer Marketplace Portal

New look’n’feel

The data consumer services have seen further development since their initial release in the first period of the project. The Portal has been redesigned following feedback from the business case providers. The new design includes a landing page (see the screenshot above) with easy access to, and search over, the whole catalogue of data published in the marketplace.

Interactive geospatial data exploration

The Portal’s geospatial data analytics, based on the Amerigo Data Visualisation Service, have been improved. Since the first release, map widgets have been moved from Leaflet to CartoDB to support fast map rendering and provide a UI for map configuration. New data exploration capabilities were added to the maps through data filtering widgets developed on top of CartoDB. To accommodate the different types of data of the business case providers, two types of filters have been implemented: discrete and continuous.
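To make the difference between the two filter types concrete, here is a rough sketch in Python. It is not the Portal's widget code (which runs on top of CartoDB); the class names, the record layout and the sample values are all invented for illustration. A discrete filter keeps records whose value belongs to a chosen set of categories, while a continuous filter keeps records whose numeric value falls within a range.

```python
from dataclasses import dataclass
from typing import Any, Dict, FrozenSet, List, Sequence

Record = Dict[str, Any]


@dataclass(frozen=True)
class DiscreteFilter:
    """Keeps records whose column value is one of the allowed categories."""
    column: str
    allowed: FrozenSet[Any]

    def matches(self, record: Record) -> bool:
        return record.get(self.column) in self.allowed


@dataclass(frozen=True)
class ContinuousFilter:
    """Keeps records whose numeric column value lies in [minimum, maximum]."""
    column: str
    minimum: float
    maximum: float

    def matches(self, record: Record) -> bool:
        value = record.get(self.column)
        return value is not None and self.minimum <= value <= self.maximum


def apply_filters(records: Sequence[Record], filters: Sequence[Any]) -> List[Record]:
    """Return the records that satisfy every filter."""
    return [r for r in records if all(f.matches(r) for f in filters)]


# Made-up sample records.
records = [
    {"municipality": "Oslo", "building_type": "office", "area_m2": 1200.0},
    {"municipality": "Bergen", "building_type": "school", "area_m2": 800.0},
    {"municipality": "Oslo", "building_type": "school", "area_m2": 300.0},
]
filters = [
    DiscreteFilter("municipality", frozenset({"Oslo"})),
    ContinuousFilter("area_m2", 500.0, 2000.0),
]
print(apply_filters(records, filters))
```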

proDataMarket marketplace data visualisation

Access to open and proprietary data

User profile management has been added to the Portal. Unauthorised users can browse all open data available in the marketplace, as well as samples of proprietary datasets if their owners have made them publicly available. To get access to proprietary data, users have to sign up.

Purchase of proprietary data

Finally, authorised users can now buy proprietary data in the marketplace through the payment component, a new feature of the Portal.

The purchase itself is implemented in the Purchase Management component, which takes as input instructions about which data or data subsets are sold at which price. These instructions are passed to the component via a subscription configuration file. At the moment this file is prepared by the technical partners (SpazioDati, SINTEF and Ontotext) based on configuration options received from data producers (see data publisher instructions). In the future, data producers will be able to generate the configuration file automatically using the Data Pricing Setup component.
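The format of the subscription configuration file is not described in this post. Purely as an illustration of the idea (which data subsets are sold at which price), the sketch below uses a made-up JSON layout and looks up the price of a subset. Every key, the dataset identifier, the subset names and the prices are hypothetical and do not reflect the real file used by the Purchase Management component.

```python
import json
from typing import Optional

# Hypothetical configuration; the real file format may differ entirely.
SUBSCRIPTION_CONFIG = """
{
  "dataset": "example-dataset",
  "currency": "EUR",
  "offers": [
    {"subset": "sample", "price": 0.0},
    {"subset": "full", "price": 100.0}
  ]
}
"""


def price_for(config: dict, subset: str) -> Optional[float]:
    """Return the price of a subset, or None if the subset is not for sale."""
    for offer in config["offers"]:
        if offer["subset"] == subset:
            return float(offer["price"])
    return None


config = json.loads(SUBSCRIPTION_CONFIG)
print(price_for(config, "full"))
print(price_for(config, "unknown"))
```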

Please note that this feature is available only for proprietary paid datasets, such as “Social Network Thermometer by Municipality”, as shown in the screenshot below. Open data (e.g., “State-owned buildings by municipality”) is public and free, so no subscription options are shown.

Proprietary dataset with subscription options

Producer site

Services available to data producers have been deployed at http://publish.prodatamarket.eu, the DataGraft portal. The latest release of the DataGraft portal was announced in a recent blog post.

The current release of the marketplace includes a tutorial for data producers that describes the process of data publication, from setting up a database, through cleaning data and populating the database, to cataloguing data and configuring its visualisation at the Consumer Portal. The tutorial is available at https://store.prodatamarket.eu/publisher_help/.

proDataMarket marketplace help for data producers

Marketplace Platform overview

The technical platform of the marketplace is composed of the tools, services and infrastructure developed to support two types of users: producers and consumers. The diagram below gives an overview of the marketplace and the services it provides for data producers and data consumers.

Overview of the marketplace platform

New Demo Papers at ISWC 2017

Sukhobok, D., Sanchez, H., Estrada, J., and Roman, D. Linked Data for Common Agriculture Policy: Enabling Semantic Querying over Sentinel-2 and LiDAR Data. International Semantic Web Conference. Demo paper. 2017. To appear.

  • Abstract: The amount of open and free satellite earth observation data combined with available data from other sectors (e.g. biodiversity, landscape elements, cadaster data) has the potential to enhance decision-making processes in various domains. An example of such a domain is agriculture, where the ability to objectively and automatically identify different types of agricultural features (e.g., irrigation patterns and landscape elements) can lead to more effective agriculture management. In this paper we show the possibility to publish and integrate multi-sectoral data from several sources into an existing data-intensive service targeting better and fairer Common Agriculture Policy (CAP) funds assignments to farmers and land owners. We show an end-to-end approach for integrating multi-sectoral data and publishing the result as Linked Data with the help of the DataGraft platform. To demonstrate the use of the resulted dataset, we developed a visualization system prototype showing various information about agricultural parcel features.
  • Download paper

Sukhobok, D., Nikolov, N., Lech, T. C., Moberg, A.-H., Frantsvag, R., Bergaas, H. R., and Roman, D. Interacting with Subterranean Infrastructure Linked Data using Augmented Reality. International Semantic Web Conference. Demo paper. 2017. To appear.

  • Abstract: Subterranean infrastructure damages caused by excavation works of all kinds are costly and potentially dangerous for workers. Such damages are often caused by poor subterranean data or inappropriate use of the existing data. We aim to provide solutions and services that will hinder obstacles related to the use of subterranean infrastructure data to ensure less damage and less time spent on finding and integrating data about subterranean infrastructure. The result of the work reported in this paper is an augmented reality application that can provide users the ability to see what subterranean infrastructure is located at a given physical location. In this paper we demonstrate a method to create such an application using Linked Data technologies.
  • Download paper

Sukhobok, D., Djordjevic, D., Sanvito, D., and Roman, D. Publishing Socio-Economic Territory Indices as Linked Data and their Visualization for Real Estate Valuation. International Semantic Web Conference. Demo paper. 2017. To appear.

  • Abstract: The correct estimation of the real estate value facilitates decision making in various sectors, such as public administration or the real estate market. In this paper we demonstrate a method to manage territory scores and property valuation estimations as Linked Data with the help of the proDataMarket technical framework. The demo illustrates how the proDataMarket technical framework can be used to generate, maintain and serve territory and property valuation estimation data with the help of semantic technologies.
  • Download paper

Shi, L., Pettersen, B. E., Sukhobok, D., Nikolov, N., and Roman, D. Linked Data for the Norwegian State of Estate Reporting Service. International Semantic Web Conference. Demo paper. 2017. To appear.

  • Abstract: The Norwegian State of Estate (SoE) report includes information about all Norwegian state-owned properties and buildings in the public sector and aims to assist government decision makers to allocate resources more effectively. A Linked Data based approach is presented here to increase the transparency in the government administration, improve the report generating process and also the report quality. Cross-domain government data originated from the business entity register, the cadastral system, the building accessibility register and the old SoE report are acquired, prepared, cleaned, transformed to Linked Data format and published. The source datasets are then integrated, augmented and interlinked before the results are published as a SPARQL endpoint, used for data visualization and report generation.
  • Download paper

Roman, D., Paniagua, J., Tarasova, T., Georgiev, G., Sukhobok, D., Nikolov, N., and Lech, T. C. proDataMarket: A Data Marketplace for Monetizing Linked Data. International Semantic Web Conference. Demo paper. 2017. To appear.

  • Abstract: Linked data has emerged as an interesting technology for publishing structured data on the Web but also as a powerful mechanism for integrating disparate data sources. Various tools and approaches have been developed in the semantic Web community to produce and consume linked data, however little attention has been paid to monetization of linked data. In this paper we introduce a data marketplace – proDataMarket – that enables data providers to generate, advertise, and sell linked data, and data consumers to purchase linked data on the marketplace. The marketplace was originally designed with a focus on geospatial linked data (targeting property-related data providers and consumers) but its capabilities are generic and can be used for data in various domains. This demo will highlight the capabilities offered to the providers and consumers of the data made available on the marketplace.
  • Download paper

Nikolov, N., Sukhobok, D., Dragnev, S., Dalgard, S., Elvesæter, B., von Zernichow, B. M., and Roman, D. DataGraft beta v2: New Features and Capabilities. International Semantic Web Conference. Demo paper. 2017. To appear.

  • Abstract: In this demonstrator, we will introduce the latest features and capabilities added to DataGraft – a Data-as-a-Service platform for data preparation and knowledge graph generation. DataGraft provides data transformation, publishing and hosting capabilities that aim to simplify the data publishing lifecycle for data workers (i.e., Open Data publishers, Linked Data developers, data scientists). This demonstrator highlights the recent features added to DataGraft by exemplifying data publication of statistical data – going from the raw data published at a public portal to published and accessible Linked Data with the help of the tools and features of the platform.
  • Download paper

New Papers at ODBASE 2017

L. Shi, D. Sukhobok, N. Nikolov and D. Roman. Norwegian State of Estate Report as Linked Open Data. To appear in the proceedings of ODBASE 2017 – The 16th International Conference on Ontologies, DataBases, and Applications of Semantics, Springer, 24-25 October 2017, Rhodes, Greece.

  • Abstract: This paper presents the Norwegian State of Estate (SoE) dataset containing data about real estates owned by the central government in Norway. The dataset is produced by integrating cross-domain government datasets including data from sources such as the Norwegian business entity register, cadastral system, building accessibility register and the previous SoE report. The dataset is made available as Linked Data. The Linked Data generation process includes data acquisition, cleaning, transformation, annotation, publishing, augmentation and interlinking the annotated data as well as quality assessment of the interlinked datasets. The dataset is published under the Norwegian License for Open Government Data (NLOD) and serves as a reference point for applications using data on central government real estates, such as generation of the SoE report, searching properties suitable for asylum reception centres, risk assessment for state-owned buildings or a public building application for visitors.
  • Download paper

B. M. von Zernichow and D. Roman. Usability of Visual Data Profiling in Data Cleaning and Transformation. To appear in the proceedings of ODBASE 2017 – The 16th International Conference on Ontologies, DataBases, and Applications of Semantics, Springer, 24-25 October 2017, Rhodes, Greece.

  • Download paper

D. Roman, D. Sukhobok, N. Nikolov, B. Elvesæter and A. Pultier. The InfraRisk Ontology: Enabling Semantic Interoperability for Critical Infrastructures at Risk from Natural Hazards. To appear in the proceedings of ODBASE 2017 – The 16th International Conference on Ontologies, DataBases, and Applications of Semantics, Springer, 24-25 October 2017, Rhodes, Greece.

  • Abstract: Earthquakes, landslides, and other natural hazard events have severe negative socio-economic impacts. Among other consequences, those events can cause damage to infrastructure networks such as roads and railways. Novel methodologies and tools are needed to analyse the potential impacts of extreme natural hazard events and aid in the decision-making process regarding the protection of existing critical road and rail infrastructure as well as the development of new infrastructure. Enabling uniform, integrated, and reliable access to data on historical failures of critical transport infrastructure can help infrastructure managers and scientist from various related areas to better understand, prevent, and mitigate the impact of natural hazards on critical infrastructures. This paper describes the construction of the InfraRisk ontology for representing relevant information about natural hazard events and their impact on infrastructure components. Furthermore, we present a software prototype that visualizes data published using the proposed ontology.
  • Download paper

New Paper: Data Preparation as a Service Based on Apache Spark

Mahasivam N., Nikolov N., Sukhobok D., Roman D. (2017) Data Preparation as a Service Based on Apache Spark. In: De Paoli F., Schulte S., Broch Johnsen E. (eds) Service-Oriented and Cloud Computing. ESOCC 2017. Lecture Notes in Computer Science, vol 10465. Springer, Cham

  • Abstract: Data preparation is the process of collecting, cleaning and consolidating raw datasets into cleaned data of certain quality. It is an important aspect in almost every data analysis process, and yet it remains tedious and time-consuming. The complexity of the process is further increased by the recent tendency to derive knowledge from very large datasets. Existing data preparation tools provide limited capabilities to effectively process such large volumes of data. On the other hand, frameworks and software libraries that do address the requirements of big data, require expert knowledge in various technical areas. In this paper, we propose a dynamic, service-based, scalable data preparation approach that aims to solve the challenges in data preparation on a large scale, while retaining the accessibility and flexibility provided by data preparation tools. Furthermore, we describe its implementation and integration with an existing framework for data preparation – Grafterizer. Our solution is based on Apache Spark, and exposes application programming interfaces (APIs) to integrate with external tools. Finally, we present experimental results that demonstrate the improvements to the scalability of Grafterizer.
  • Download paper

New paper: Enabling the Use of Sentinel-2 and LiDAR Data for Common Agriculture Policy Funds Assignment

Estrada J, Sánchez H, Hernanz L, Checa MJ, Roman D. Enabling the Use of Sentinel-2 and LiDAR Data for Common Agriculture Policy Funds Assignment. ISPRS International Journal of Geo-Information. 2017; 6(8):255.

  • Abstract: A comprehensive strategy combining remote sensing and field data can be helpful for more effective agriculture management. Satellite data are suitable for monitoring large areas over time, while LiDAR provides specific and accurate data on height and relief. Both types of data can be used for calibration and validation purposes, avoiding field visits and saving useful resources. In this paper, we propose a process for objective and automated identification of agricultural parcel features based on processing and combining Sentinel-2 data (to sense different types of irrigation patterns) and LiDAR data (to detect landscape elements). The proposed process was validated in several use cases in Spain, yielding high accuracy rates in the identification of irrigated areas and landscape elements. An important application example of the work reported in this paper is the European Union (EU) Common Agriculture Policy (CAP) funds assignment service, which would significantly benefit from a more objective and automated process for the identification of irrigated areas and landscape elements, thereby enabling the possibility for the EU to save significant amounts of money yearly.
  • Download paper

New release of DataGraft!

We are delighted to announce the second beta release of the DataGraft platform!

What is DataGraft?

DataGraft serves as the core of the proDataMarket producer portal. DataGraft is an online platform that provides data transformation, publishing and hosting capabilities that aim to simplify the data publishing lifecycle for data workers (i.e., Open Data publishers, Linked Data developers, data scientists).

The DataGraft platform mainly consists of three components (shown in the picture below): the DataGraft portal, Grafterizer and a cloud-enabled semantic graph database-as-a-service, which is based on a dedicated instance of the Ontotext GraphDB Cloud platform mentioned in a previous blog post.

Main components of DataGraft platform

What’s new?

DataGraft has undergone major changes since the previous version:

  • New asset types for the catalogue and better sharing between users of the platform
    • SPARQL endpoints
    • queries
    • file pages
  • Improved Grafterizer capabilities
    • conditional RDF mappings
    • support for various types and formats of tabular inputs
  • Versioning of assets
    • browsing
    • recording of provenance when copying assets
  • Visual browsing of SPARQL endpoints (using RDF Surveyor)
  • New Dashboard
    • more control of user assets
    • instant search and various filters
  • Improved security (authentication) using OAuth2
  • REST API improvements using Swagger
  • Updated version of the semantic graph database, which now supports geospatial queries and serialisation to GeoJSON
  • Various bug fixes and performance improvements
  • Updated user documentation
  • Quota management console allowing users to track their use of resources on the platform
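
One item from the list above, conditional RDF mappings, may be easier to picture with a small example. The sketch below is not Grafterizer's API or its mapping language. It is a plain-Python illustration of the general idea, in which a triple is emitted for a table cell only when a condition on the row holds (here, that the cell is not empty). The base URI, the property names and the sample rows are invented.

```python
from typing import Dict, Iterator, Tuple

Triple = Tuple[str, str, str]

BASE = "http://example.org/"  # hypothetical namespace
XSD_DECIMAL = "http://www.w3.org/2001/XMLSchema#decimal"


def row_to_triples(row: Dict[str, str]) -> Iterator[Triple]:
    """Map one tabular row to N-Triples-style terms (literal escaping omitted)."""
    subject = f"<{BASE}building/{row['id']}>"
    name = row["name"]
    # Unconditional mapping: every row gets a name triple.
    yield (subject, f"<{BASE}name>", f'"{name}"')
    # Conditional mapping: only emit the area triple when the cell has a value.
    area = row.get("area_m2", "").strip()
    if area:
        yield (subject, f"<{BASE}areaM2>", f'"{area}"^^<{XSD_DECIMAL}>')


rows = [
    {"id": "1", "name": "Town hall", "area_m2": "120.5"},
    {"id": "2", "name": "Archive", "area_m2": ""},
]
for row in rows:
    for triple in row_to_triples(row):
        print(" ".join(triple) + " .")
```

Without the condition, the second row would produce a triple with an empty literal, which is exactly the kind of noise a conditional mapping helps to avoid.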

DataGraft beta 2 is available for testing on http://datagraft.io and more details can be found in the platform documentation here. All platform code except the GraphDB Cloud component (used as a service) is open-source and is available on GitHub.

 

New paper: Tabular Data Anomaly Patterns

D. Sukhobok, N. Nikolov, and D. Roman. Tabular Data Anomaly Patterns. To appear in the proceedings of The 3rd International Conference on Big Data Innovations and Applications (Innovate-Data 2017), 21-23 August 2017, Prague, Czech Republic, IEEE.

  • Abstract: One essential and challenging task in data science is data cleaning — the process of identifying and eliminating data anomalies. Different data types, data domains, data acquisition methods, and final purposes of data cleaning have resulted in different approaches in defining data anomalies in the literature. This paper proposes and describes a set of basic data anomalies in the form of anomaly patterns commonly encountered in tabular data, independently of the data domain, data acquisition technique, or the purpose of data cleaning. This set of anomalies can serve as a valuable basis for developing and enhancing software products that provide general-purpose data cleaning facilities and can provide a basis for comparing different tools aimed to support tabular data cleaning capabilities. Furthermore, this paper introduces a set of corresponding data operations suitable for addressing the identified anomaly patterns and introduces Grafterizer — a software framework that implements those data operations.
  • Download paper
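
To give a flavour of what detecting anomalies in tabular data can look like, here is a minimal sketch that flags two common problems: rows with missing values and exact duplicate rows. These two checks are our own choice of examples and are not taken from the set of patterns defined in the paper, nor from Grafterizer's implementation. The function name and the sample table are invented.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Row = Tuple[Optional[str], ...]


def find_anomalies(rows: Sequence[Row]) -> Dict[str, List[int]]:
    """Return the row indices affected by each of two illustrative anomalies."""
    missing: List[int] = []
    duplicates: List[int] = []
    seen = set()
    for index, row in enumerate(rows):
        # A cell counts as missing when it is None or only whitespace.
        if any(cell is None or cell.strip() == "" for cell in row):
            missing.append(index)
        # A row is a duplicate when an identical row appeared earlier.
        if row in seen:
            duplicates.append(index)
        else:
            seen.add(row)
    return {"missing_value": missing, "duplicate_row": duplicates}


table = [
    ("Oslo", "1200"),
    ("Bergen", ""),
    ("Oslo", "1200"),
    ("Tromsø", None),
]
print(find_anomalies(table))
```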

GraphDB Cloud: an on-demand enterprise ready RDF database

We at Ontotext are excited to announce GraphDB Cloud, an easy way to get started with a semantic database like our signature GraphDB product. The automated tasks in GraphDB Cloud save organizations the time and effort of installing and managing hardware and software, as well as the cost of buying them. Compared to a do-it-yourself database, database-as-a-service (DBaaS) lets developers cut down the time they spend working on their databases and devote it to creating and innovating instead of administrating.

GraphDB Cloud is one part of the Cognitive Cloud solutions for low-cost and on-demand smart data management.

Typical users fit one of the following profiles:

  • Small cognitive-technology-oriented teams in big organizations that need low upfront and ongoing costs for a database.
  • Start-up companies without a database infrastructure that require a reliable technology that scales up along with their business.
  • Corporate solution architects working to solve the challenges their enterprise faces when handling huge amounts of data and information.

As a next step, we invite you to watch our webinar “GraphDB Cloud – Enterprise Ready RDF Database on Demand”, where we introduce the GraphDB Cloud console and explain how you can create custom solutions to address your company’s specific data and information needs.