EDD Reviewer Power Tools Volume Deux
Written by Ryan Meeks   
Tuesday, 09 December 2008 10:56

Disclaimer: We are in no way affiliated with the owners or creators of the tool(s) described below; it's just something we've used regularly that's saved us from many hours of lost time and frustration.

Most e-discovery projects involve a lot of files: thousands of native files, or the hundreds of thousands of TIFF and TXT files in a load file production. Moving or copying files in such quantities can be very time-consuming, delaying a review and tying up resources. Beyond speed, traditional copy methods have several shortcomings, including a lack of verification and an inability to handle errors. To address this, we ran several tests to find a better way to move our files from disk to disk.

Before any files are moved, we need to look at how they are stored on a hard drive. When a file is saved, it is stored in sections of the drive called clusters. Each cluster has a fixed size (which varies with how the drive is formatted), and a cluster can hold data from only one file. If the file is smaller than the cluster, the leftover space stays empty. If the file is bigger than the cluster, it continues into additional clusters. Because of this, one hundred files that are each smaller than the cluster size will occupy one hundred clusters, while one large file equal in size to all one hundred of the smaller files combined may occupy far fewer clusters, possibly as few as one.

Now that we know how these files are stored, we can look at how the computer is going to move them. To move the large file, the computer only has to find one cluster, or a handful of clusters that are usually stored together, to obtain the information to be copied. To obtain the smaller files, the computer has to find one hundred different clusters, which in most instances are not near each other on the drive. From our perspective, it is like moving one file folder from one file cabinet to another as opposed to moving one hundred folders that aren't in the same drawer. Based on this concept, we ran a test, and it showed that one large file copied faster than many smaller files of the same total size.
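To make the cluster arithmetic concrete, here is a minimal sketch in Python. The 4,096-byte cluster size and the file sizes are assumptions chosen for illustration, not figures from our tests, and the function names are hypothetical. It also simplifies: some file systems store very small files without using a full cluster.

```python
def clusters_needed(file_size: int, cluster_size: int = 4096) -> int:
    """Clusters occupied by one file, assuming a cluster is never shared.

    Simplification: even an empty file is counted as one cluster.
    """
    # Integer ceiling division: round up to the next whole cluster.
    return max(1, (file_size + cluster_size - 1) // cluster_size)


def allocation_summary(file_sizes, cluster_size: int = 4096) -> dict:
    """Total clusters, allocated bytes, and unused (slack) bytes for a set of files."""
    clusters = sum(clusters_needed(size, cluster_size) for size in file_sizes)
    allocated = clusters * cluster_size
    return {
        "clusters": clusters,
        "allocated_bytes": allocated,
        "slack_bytes": allocated - sum(file_sizes),
    }


# One hundred 1,000-byte files versus one 100,000-byte file (assumed sizes).
small = allocation_summary([1000] * 100)
large = allocation_summary([100_000])

print(small["clusters"])  # 100 clusters, one per small file
print(large["clusters"])  # 25 clusters for the same amount of data
```

Under these assumed numbers the same 100,000 bytes take four times as many clusters when split into small files, and most of each small file's cluster sits empty.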

Having determined that a single large file transferred faster, we next tested zipping the files. Zip programs allow us to combine multiple files into a single container file. We tested WinRAR and 7-Zip (two popular zip programs); both were able to compress all the files into a single archive and transfer it to the destination drive in a fraction of the time that the traditional copy took. Next we had to unzip (uncompress) the files to make them ready for use. While zipping looked promising, it proved ineffective during this uncompress phase, which ended up taking just as much time as the traditional copy.
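The zip, transfer, and unzip sequence can be sketched with Python's standard library. This is only an illustration of the three phases, not the WinRAR or 7-Zip workflow we tested; the function name `transfer_as_archive`, the archive name, and the use of a temporary staging folder are all assumptions. Timing each phase separately makes it easy to see where the time goes.

```python
import shutil
import tempfile
import time
import zipfile
from pathlib import Path


def transfer_as_archive(src_dir, dest_dir, archive_name="bundle.zip"):
    """Zip src_dir, copy the archive to dest_dir, unzip it there.

    Returns the seconds spent in each phase. Empty folders are not preserved.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    timings = {}

    # Assumption: the archive is staged in the system temp folder.
    with tempfile.TemporaryDirectory() as staging:
        local_archive = Path(staging) / archive_name

        # Phase 1: combine every file into one compressed container.
        start = time.perf_counter()
        with zipfile.ZipFile(local_archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in sorted(src.rglob("*")):
                if path.is_file():
                    zf.write(path, path.relative_to(src).as_posix())
        timings["zip"] = time.perf_counter() - start

        # Phase 2: move the single large file to the destination drive.
        start = time.perf_counter()
        remote_archive = Path(shutil.copy2(local_archive, dest / archive_name))
        timings["copy"] = time.perf_counter() - start

    # Phase 3: write the individual files back out at the destination.
    start = time.perf_counter()
    with zipfile.ZipFile(remote_archive) as zf:
        zf.extractall(dest)
    timings["unzip"] = time.perf_counter() - start

    remote_archive.unlink()  # leave only the extracted files behind
    return timings
```

The unzip phase still has to create every small file on the destination drive, which is consistent with what we saw: the time saved in transit came back during extraction.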

From here we looked at software programs that specialize in copying files. After testing several free tools with little to no success, we came across TeraCopy. Our tests showed that TeraCopy greatly improved transfer time. Not only was it faster, it also included several additional features that make it a well-rounded program. Unlike traditional copy methods, TeraCopy easily deals with errors during the copying process. Normally, the entire copy process would be canceled if a file encountered an error, and you would be left not knowing which files were not copied. If an error occurs using TeraCopy, the program continues with the rest of the workload and creates a log of the files that were troublesome. This allows for an easy review of the files that failed to copy so that they can be handled appropriately. TeraCopy also analyzes each original file's "digital fingerprint" and compares it to the copy's to ensure an accurate copy. A stop and start feature is also included in case the process needs to be paused for any reason. TeraCopy can also be installed to take the place of the traditional Windows copy so that drag-and-drop copying is seamless. When tested on smaller data sets, TeraCopy was not much faster than the traditional Windows copy, but the added features still made it our method of choice for moving files.
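For readers curious about what "continue on error, log the failures, and verify the fingerprint" looks like in practice, here is a minimal Python sketch of that general technique. It is not TeraCopy's implementation; the function names are hypothetical, and SHA-256 is our assumption for the fingerprint, since we do not describe here which algorithm TeraCopy uses.

```python
import hashlib
import shutil
from pathlib import Path


def file_digest(path, chunk_size=1024 * 1024):
    """SHA-256 'digital fingerprint' of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_tree_verified(src_dir, dest_dir):
    """Copy every file under src_dir to dest_dir, verifying each copy.

    A failure on one file is recorded and the run continues with the rest.
    Returns (copied, failed), where failed is a list of (path, reason) pairs.
    """
    src, dest = Path(src_dir), Path(dest_dir)
    copied, failed = [], []
    for path in sorted(src.rglob("*")):
        if not path.is_file():
            continue
        target = dest / path.relative_to(src)
        try:
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)
            # Compare the fingerprint of the original with that of the copy.
            if file_digest(path) != file_digest(target):
                raise OSError("fingerprint mismatch after copy")
            copied.append(path)
        except OSError as exc:
            # Log the troublesome file and move on instead of aborting.
            failed.append((path, str(exc)))
    return copied, failed
```

The `failed` list plays the role of the error log: after the run finishes, it can be reviewed and the listed files handled individually.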

A project should not have to be put on hold because a file transfer takes forever or fails in the middle. Although TeraCopy cannot eliminate all of your wait time, it can shorten it and help ensure that the copy completes accurately.