Towards a Synthetic Data Generator for Matching Decision Trees

Research Output

Real-world data is widely used to evaluate and teach data mining techniques. However, using real-world data for these purposes has disadvantages. First, real-world data in most domains is difficult to obtain, for budgetary, technical or ethical reasons. Second, the use of much real-world data is restricted, and in the case of data mining such data sets either do not contain specific patterns that are easy to mine for teaching purposes, or the data needs special preparation and the algorithm needs very specific settings in order to find patterns in it. One solution could be the generation of synthetic “meaningful data” (data with intrinsic patterns). This paper presents a framework for such a data generator, which is able to generate datasets with intrinsic patterns, such as decision trees. A preliminary run of the prototype shows that the generation of such “meaningful data” is possible. The proposed approach could also be extended to generate synthetic data with other intrinsic patterns.
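
As a rough illustration of what "data with an intrinsic decision-tree pattern" can mean, the sketch below samples random attribute values and labels each record by walking a predefined tree, so that a decision tree learner run on the output could in principle recover a similar tree. This is not the framework presented in the paper: the tree, the attribute names and ranges, and the functions `classify` and `generate` are all hypothetical and chosen only for illustration.

```python
import random

# Hypothetical target tree (an assumption, not taken from the paper).
# Internal nodes test "feature <= threshold"; leaves carry a class label.
TREE = {
    "feature": "age", "threshold": 30,
    "left": {"label": "low"},
    "right": {
        "feature": "income", "threshold": 50000,
        "left": {"label": "medium"},
        "right": {"label": "high"},
    },
}

# Hypothetical value ranges for each numeric attribute.
RANGES = {"age": (18, 80), "income": (10000, 100000)}


def classify(tree, record):
    """Walk the tree from the root and return the label of the leaf reached."""
    node = tree
    while "label" not in node:
        goes_left = record[node["feature"]] <= node["threshold"]
        node = node["left"] if goes_left else node["right"]
    return node["label"]


def generate(tree, ranges, n, seed=0):
    """Sample n records uniformly within the ranges and label them with the tree."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    rows = []
    for _ in range(n):
        record = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        record["label"] = classify(tree, record)
        rows.append(record)
    return rows


if __name__ == "__main__":
    for row in generate(TREE, RANGES, 5):
        print(row)
```

Because every label is derived from the tree itself, the generated data set is consistent with that tree by construction. A more realistic generator would likely also need categorical attributes, noise, and control over class balance, none of which this sketch attempts.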

Date:

25 April 2016
Publication Status:

Published
Publisher

SCITEPRESS - Science and Technology Publications
DOI:

10.5220/0005829001350141
Library of Congress:

QA75 Electronic computers. Computer science
Dewey Decimal Classification:

004 Data processing & computer science
Funders:

Edinburgh Napier Funded

http://researchrepository.napier.ac.uk/output/947202

Citation

Peng, T., & Hanke, F. (2016). Towards a Synthetic Data Generator for Matching Decision Trees. In Proceedings of the 18th International Conference on Enterprise Information Systems (pp. 135-141). https://doi.org/10.5220/0005829001350141

Authors

Dr Taoxin Peng

Lecturer
School of Computing Engineering and the Built Environment

0131 455 2748

T.Peng@napier.ac.uk

Keywords

Synthetic, Data Generator, Data Mining, Decision Trees, Classification, Pattern

Available Documents

pdf

Towards a Synthetic Data Generator for Matching Decision Trees

464KB