Sukhobok, N. Nikolov, and D. Roman. Tabular Data Anomaly Patterns. To appear in the proceedings of The 3rd International Conference on Big Data Innovations and Applications (Innovate-Data 2017), 21-23 August 2017, Prague, Czech Republic, IEEE.
- Abstract: One essential and challenging task in data science is data cleaning — the process of identifying and eliminating data anomalies. Different data types, data domains, data acquisition methods, and final purposes of data cleaning have resulted in different approaches in defining data anomalies in the literature. This paper proposes and describes a set of basic data anomalies in the form of anomaly patterns commonly encountered in tabular data, independently of the data domain, data acquisition technique, or the purpose of data cleaning. This set of anomalies can serve as a valuable basis for developing and enhancing software products that provide general-purpose data cleaning facilities and can provide a basis for comparing different tools aimed to support tabular data cleaning capabilities. Furthermore, this paper introduces a set of corresponding data operations suitable for addressing the identified anomaly patterns and introduces Grafterizer — a software framework that implements those data operations.
We at Ontotext are excited to announce GraphDB Cloud, an easy way to get acquainted with a semantic database like our signature GraphDB product. The automated tasks in GraphDB Cloud save organizations the time and effort of installing and managing hardware and software, as well as the cost of buying it. Compared to a do-it-yourself database, DBaaS lets developers cut down the time they spend working with their databases, so they can devote it to creating and innovating instead of administering.
GraphDB Cloud is one part of the Cognitive Cloud solutions for low-cost and on-demand smart data management.
Typical users fit one of the following profiles:
- Small, cognitive-technology-oriented teams in big organizations that need a database with low upfront and ongoing costs.
- Start-up companies without a database infrastructure that require a reliable technology that scales up along with their business.
- Corporate solution architects working to solve the challenges their enterprise faces when handling huge amounts of data and information.
As a next step, we invite you to watch our webinar “GraphDB Cloud – Enterprise Ready RDF Database on Demand”, where we introduce the GraphDB Cloud console and advise you on how to create custom solutions that address your company’s specific data and information needs.