Research Output

A framework for data cleaning in data warehouses.

  Achieving high data quality in data warehouses is a persistent challenge, and data cleaning is a crucial task in meeting it. A range of methods and tools has been developed for this purpose. However, at least two questions remain to be answered: how can the efficiency of data cleaning be improved, and how can its degree of automation be increased? This paper addresses these two questions by presenting a novel framework that manages data cleaning in data warehouses by focusing on the use of data quality dimensions and by decoupling the cleaning process into several sub-processes. An initial test run of the processes in the framework demonstrates that the presented approach is efficient and scalable for data cleaning in data warehouses.

  • Type:

    Article

  • Date:

    30 November 2007

  • Publication Status:

    Published

  • Publisher:

    Springer Verlag

  • Library of Congress:

    QA75 Electronic computers. Computer science

Citation

Peng, T. (2007). A framework for data cleaning in data warehouses.

Keywords

data cleaning; data warehouse; performance efficiency; automation; data quality; decoupling; scalable
