Researchers Find New Ways To Investigate Big Data Infrastructures With Minimal Disruption

Date posted

5 August 2016

In traditional digital forensics, analysts will examine static traces from hard disks, but increasingly the evidence is found within Cloud-based systems, where the trails of evidence are found on Cloud-based disk systems. Within a Hadoop Cluster we have a number of computers, and where we can run tasks which can take hours, days or even months to run.

Overall the task is broken up into threads, which are then run across the cluster, and where there is redundancy built-in, so that data and processes can be replicated across the infrastructure. If one computer or disk fails, then the data/process can be recovered. In this way we create a robust environment for Big Data analysis.

Another major change in digital investigations has been the move from analysing static data on disks, towards investigating live data - known as live forensics. This is where the RAM of the computer is investigated, and where the device is left powered-on. This changes investigations, as there can be a great deal of information that can be gained if the device is still powered on, such as usernames and passwords, where, if it is powered off, a great deal of user information might be lost.

A new RAM-based method of analysis
In the paper, published in Digital Investigation (, the researchers outline the methods by which traces of evidence can be found within the RAM of a Hadoop Cluster - which is one of the most typical Big Data infrastructures used. The usage of in-memory analysis causes the least amount of disruption to the business process of the cluster, as most companies would not be able to shut-down the Hadoop cluster when it is in operation.