Research Output
Fraud prevention in the B2C e-Commerce mail order business: a framework for an economic perspective on data mining
  A remarkable gap exists between the financial impact of fraud in the B2C e-commerce mail order business and the amount of research on fraud prevention in this area, whether qualitative or quantitative. Projecting published fraud rates of only approximately one percent onto e-commerce sales data yields an affected sales volume of $651 million in the German market and $5.22 billion in the North American market; empirical data, however, indicate even higher fraud rates. Low profit margins amplify the financial damage caused by fraudulent activities. Hence, companies are increasingly concerned about the rising number of internet fraud cases. The problem motivates companies to invest in data analytics and, as a more sophisticated approach, in automated machine learning systems in order to inspect and evaluate the high volume of transactions in which potential fraud cases can be buried. In other areas that face fraud (e.g. automobile insurance), machine learning has been applied successfully. However, there is little evidence yet about which variables may act as fraud risk indicators and how to design such systems in the e-commerce mail order business. In this research, mixed methods are applied in order to investigate the question of how computer-aided systems can help detect and prevent fraudulent transactions. In the qualitative part, experts from fraud prevention companies are interviewed in order to understand how fraud prevention has conventionally been conducted in the e-commerce mail order business. The quantitative part, for which a dataset containing transactions from one of the largest e-commerce firms in Europe has been analyzed, consists of three analytical components. First, feature importance is evaluated by computing information gain and training a decision tree in order to find out which features are relevant fraud indicators. Second, a prediction model is built using logistic regression and gradient boosted trees. The prediction model makes it possible to estimate the fraud risk of future transactions. Third, because risk estimation alone does not equal profit maximization, utility theory is woven into the prioritization of transactions such that the model optimizes the financial value of fraud prevention activities. Results indicate that the interviewed companies want to use intelligent computer-aided systems that support manual inspection activities through the use of data mining techniques. Feature analysis reveals that some features, such as whether a shipment has been sent to a parcel shop, can help separate fraudulent from legitimate orders better than others. The predictive model yields promising results, as it is able to correctly identify approximately 86% of the 2% most suspicious transactions as fraud. When the model is used to optimize the financial outcome instead of pure classification quality, results suggest that the company providing the dataset could achieve substantial additional savings of up to 87% by introducing expected utility as a ranking measure when constrained by limited inspection resources.
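The idea of ranking transactions by expected utility rather than by fraud risk alone can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Order` fields, the flat `INSPECTION_COST`, the utility formula (fraud probability times order value, minus inspection cost), and the sample orders are all assumptions made for illustration. The sketch also assumes that an inspection always catches the fraud.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Order:
    order_id: str
    fraud_probability: float  # hypothetical model score in [0, 1]
    order_value: float        # hypothetical amount at risk, in currency units


# Assumed flat cost of one manual inspection (not a figure from the thesis).
INSPECTION_COST = 2.0


def expected_utility(order: Order) -> float:
    """Expected saving from inspecting an order: avoided loss minus inspection cost."""
    return order.fraud_probability * order.order_value - INSPECTION_COST


def select_for_inspection(orders: List[Order], capacity: int,
                          score: Callable[[Order], float]) -> List[Order]:
    """Return the `capacity` highest-scoring orders under the given ranking measure."""
    return sorted(orders, key=score, reverse=True)[:capacity]


# Made-up orders: high-risk cheap orders versus lower-risk expensive ones.
orders = [
    Order("A", 0.9, 20.0),
    Order("B", 0.5, 400.0),
    Order("C", 0.3, 1000.0),
    Order("D", 0.8, 50.0),
]

# Only two inspections are affordable (limited inspection resources).
by_risk = select_for_inspection(orders, 2, lambda o: o.fraud_probability)
by_utility = select_for_inspection(orders, 2, expected_utility)

print("ranked by risk:   ", [o.order_id for o in by_risk])
print("ranked by utility:", [o.order_id for o in by_utility])
print("expected saving, risk ranking:   ", sum(expected_utility(o) for o in by_risk))
print("expected saving, utility ranking:", sum(expected_utility(o) for o in by_utility))
```

With these made-up numbers, the risk ranking selects orders A and D, while the utility ranking selects C and B, whose larger order values give a higher expected saving for the same two inspections.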

  • Type:


  • Date:

    29 June 2018

  • Publication Status:


  • Library of Congress:

    QA75 Electronic computers. Computer science

  • Dewey Decimal Classification:

    006.3 Artificial intelligence

  • Funders:

    Edinburgh Napier Funded


Knuth, T. Fraud prevention in the B2C e-Commerce mail order business: a framework for an economic perspective on data mining. (Thesis). Edinburgh Napier University. Retrieved from



Data Mining, E-Commerce, Fraud Prevention
