Towards a context-dependent numerical data quality evaluation framework

Research output: Working paper

Abstract

This paper focuses on numeric data, with emphasis on distinct characteristics like varying significance, unstructured format, mass volume and real-time processing. We propose a novel, context-dependent evaluation framework specifically devised to assess quality in numeric datasets. Our framework uses eight relevant data quality dimensions and provides a simple metric to evaluate dataset quality along each dimension. We argue that the proposed set of dimensions and corresponding metrics adequately captures the unique quality antipatterns that are typically associated with numerical data. The introduction of our framework is part of a wider research effort that aims at developing an articulated numerical data quality improvement approach for Oil and Gas exploration and production workflows that is based on artificial intelligence techniques.
Original language: English
Publisher: ArXiv
Number of pages: 12
Publication status: Submitted - 22 Oct 2018

Keywords

  • cs.DB
  • data quality
  • numerical data
  • evaluation framework

Cite this

@techreport{0a3158b261ed4414977bb92ad668fdf8,
title = "Towards a context-dependent numerical data quality evaluation framework",
abstract = "This paper focuses on numeric data, with emphasis on distinct characteristics like varying significance, unstructured format, mass volume and real-time processing. We propose a novel, context-dependent evaluation framework specifically devised to assess quality in numeric datasets. Our framework uses eight relevant data quality dimensions and provides a simple metric to evaluate dataset quality along each dimension. We argue that the proposed set of dimensions and corresponding metrics adequately captures the unique quality antipatterns that are typically associated with numerical data. The introduction of our framework is part of a wider research effort that aims at developing an articulated numerical data quality improvement approach for Oil and Gas exploration and production workflows that is based on artificial intelligence techniques.",
keywords = "cs.DB, data quality, numerical data, evaluation framework",
author = "Marev, {Milen S.} and Ernesto Compatangelo and Wamberto Vasconcelos",
year = "2018",
month = "10",
day = "22",
language = "English",
publisher = "ArXiv",
type = "WorkingPaper",
institution = "ArXiv",

}

TY - UNPB

T1 - Towards a context-dependent numerical data quality evaluation framework

AU - Marev, Milen S.

AU - Compatangelo, Ernesto

AU - Vasconcelos, Wamberto

PY - 2018/10/22

Y1 - 2018/10/22

N2 - This paper focuses on numeric data, with emphasis on distinct characteristics like varying significance, unstructured format, mass volume and real-time processing. We propose a novel, context-dependent evaluation framework specifically devised to assess quality in numeric datasets. Our framework uses eight relevant data quality dimensions and provides a simple metric to evaluate dataset quality along each dimension. We argue that the proposed set of dimensions and corresponding metrics adequately captures the unique quality antipatterns that are typically associated with numerical data. The introduction of our framework is part of a wider research effort that aims at developing an articulated numerical data quality improvement approach for Oil and Gas exploration and production workflows that is based on artificial intelligence techniques.

AB - This paper focuses on numeric data, with emphasis on distinct characteristics like varying significance, unstructured format, mass volume and real-time processing. We propose a novel, context-dependent evaluation framework specifically devised to assess quality in numeric datasets. Our framework uses eight relevant data quality dimensions and provides a simple metric to evaluate dataset quality along each dimension. We argue that the proposed set of dimensions and corresponding metrics adequately captures the unique quality antipatterns that are typically associated with numerical data. The introduction of our framework is part of a wider research effort that aims at developing an articulated numerical data quality improvement approach for Oil and Gas exploration and production workflows that is based on artificial intelligence techniques.

KW - cs.DB

KW - data quality

KW - numerical data

KW - evaluation framework

M3 - Working paper

BT - Towards a context-dependent numerical data quality evaluation framework

PB - ArXiv

ER -