Robert-Jan Mora and Bas Kloet have released an interesting paper, DigitalForensicSampling.pdf, on applying statistical sampling to digital forensics. Digital forensic practitioners are frequently faced with extremely large amounts of data to analyze, a situation that will only get worse as storage capacities continue to increase. Mora and Kloet propose random sampling, for certain types of cases, as a way of alleviating this problem.
Here's a quote from the paper's introduction:
In this paper we would like to address a few problems that we encounter in the digital forensic field, in general, which probably will get worse if our methods do not get smarter soon. A few problems that the digital forensic community has to deal with are:
- The amount of data that needs to be investigated in cases increases every year;
- Forensic software is unstable when processing large quantities of data;
- Law Enforcement has a huge backlog in processing cases in time;
- More and more pressure is placed on digital forensic investigators to produce reliable results in a small amount of time.
So what can we do to be more effective and investigate the right data at the right time? In this paper we would like to propose a solution based on the technique of random sampling, which can be applied to the working field of digital forensics. The goal of this paper is to:
- explain when and why random sampling might be useful in a digital forensic investigation;
- present the reader with background information on relatively straightforward random sampling techniques;
- describe a number of cases where random sampling might be used to drastically reduce the amount of work required in a digital forensic investigation, without a significant (negative) impact on the reliability of the investigation.
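To give a flavor of the general idea (this sketch is mine, not the authors' method), the snippet below randomly samples sectors from a disk image and estimates the proportion of sectors matching some predicate, with the sample size chosen from a standard proportion-estimation formula. The image path, sector size, and the "blank sector" predicate are all assumptions made for illustration.

```python
"""Illustrative sketch: estimate what fraction of a disk image's sectors
match a predicate by examining only a random sample of sectors."""
import math
import os
import random

SECTOR_SIZE = 512  # assumed sector size for this example


def required_sample_size(confidence: float, margin_of_error: float) -> int:
    """Sample size for estimating a proportion at the given confidence level
    and margin of error, using the conservative worst case p = 0.5."""
    z_scores = {0.90: 1.645, 0.95: 1.960, 0.99: 2.576}
    z = z_scores[confidence]
    return math.ceil((z ** 2) * 0.25 / (margin_of_error ** 2))


def is_blank(sector: bytes) -> bool:
    """Example predicate: an all-zero sector (e.g. wiped or never written)."""
    return sector.count(0) == len(sector)


def sample_image(path: str, confidence: float = 0.95, margin: float = 0.02) -> float:
    """Randomly sample sectors from the image at `path` and return the
    estimated proportion of sectors satisfying the predicate."""
    total_sectors = os.path.getsize(path) // SECTOR_SIZE
    n = min(required_sample_size(confidence, margin), total_sectors)
    chosen = random.sample(range(total_sectors), n)

    hits = 0
    with open(path, "rb") as image:
        for sector_index in chosen:
            image.seek(sector_index * SECTOR_SIZE)
            if is_blank(image.read(SECTOR_SIZE)):
                hits += 1
    return hits / n


if __name__ == "__main__":
    # "suspect_drive.dd" is a placeholder path for this sketch.
    estimate = sample_image("suspect_drive.dd")
    print(f"Estimated proportion of blank sectors: {estimate:.1%}")
```

At 95% confidence and a 2% margin of error, the formula above calls for roughly 2,400 sectors, regardless of whether the drive holds gigabytes or terabytes, which is exactly why sampling can cut examination time so drastically. For the authors' actual techniques and case examples, see the paper itself.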
The full paper is available at the link below:
DigitalForensicSampling.pdf