New Storage Technology to Enhance Digital Asset Management Systems

8GB of storage, then and now. (Photo by Andrew Forber)

As a reader of blogs like this one, you’re probably already familiar with Moore’s Law. It states, more or less, that the power of new computers increases exponentially over time even as costs come down. Strictly speaking it describes transistor counts, but the practical effect is the same: look at the history of microprocessors going back to the 1970s and you’ll see that their capability has doubled on average roughly every two years.

For those of us who have been doing software development for thirty years, this has seemed like a surprisingly steady progression because, I suppose, we’ve been immersed in the technology. When a new, faster iteration of a key technology emerges, it’s often a few months before prices come down and business conditions give us a reason to adopt it, so it already seems old before we get to play. It’s actually easy to become jaded about how fast things are getting faster – until, that is, a jump in the state of the technology is larger and more significant than most. Then we take notice.

The most exciting thing happening in IT these days is the introduction of low-cost solid-state disk drives (SSDs). They represent a major advancement with the potential to revolutionize database systems, and particularly the search engines used in digital asset management systems and electronic document discovery.

The reason SSDs are particularly good for full-text search engines like MerlinOne’s Dox is that a large part of the work involved in creating, maintaining, and searching an index is spent moving the head of a conventional disk from one track to another. The data used by a text search engine is typically stored in what’s called an “inverted word index”: for each word, the index records which documents contain it. Indexing a large document containing a few hundred distinct words therefore means storing data in a few hundred distinct locations in the index files. (The problem becomes even worse for phrase-based indexing.) With mechanical disk drives, even with advanced caching and clever programming, a lot of seeks are needed to get the read/write heads to where the data resides. Google’s web search, for example, sidesteps this problem by keeping its indexes in RAM, spread across huge arrays of machines with large amounts of memory installed. That’s not practical for smaller-scale systems.
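To make the structure concrete, here is a minimal sketch of an inverted word index in Python. It illustrates the general technique only; the function names and the dictionary-of-sets layout are illustrative, not how Dox actually stores its index on disk.

```python
from collections import defaultdict
import re

def build_inverted_index(documents):
    """Map each word to the set of document IDs that contain it.

    `documents` is a dict of {doc_id: text}. Every distinct word in a
    document adds an entry somewhere in the index, which is why indexing
    one large document touches hundreds of separate locations.
    """
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents containing every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

docs = {
    1: "Digital asset management systems store photos and documents",
    2: "Solid state drives eliminate seek time",
    3: "Document discovery searches large document collections",
}
index = build_inverted_index(docs)
print(search(index, "document collections"))   # {3}
```

Notice that adding a single document touches the entry for every distinct word it contains; on a mechanical disk, each of those scattered updates can mean another seek.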

In fact, it has been true for some time that on a typical server the CPU is no longer the gating factor in the speed of the machine. Millisecond seek times are more critical to performance than gigahertz of processor speed. Solid-state disks have the potential to speed up computers largely because they don’t need extra time for disk head movement when data is scattered across the media. Because of that, we should expect to see the cost of high-performance enterprise search systems come down, and their performance go up, over the next few years as the technology improves.
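To put rough numbers on that, here is a back-of-the-envelope comparison. The seek count and latency figures below are assumptions chosen for illustration (order-of-magnitude values for a typical mechanical drive versus a flash SSD), not measurements of any particular product:

```python
# Back-of-the-envelope look at why seek time, not CPU speed, dominates a
# query that must read postings from many scattered spots in an index.
# The figures below are assumed, order-of-magnitude values, not benchmarks.

SEEKS_PER_QUERY = 300           # assume one seek per distinct word's postings list
HDD_SEEK_SECONDS = 0.008        # ~8 ms average seek plus rotational latency
SSD_ACCESS_SECONDS = 0.0001     # ~0.1 ms random-read latency for flash

hdd_ms = SEEKS_PER_QUERY * HDD_SEEK_SECONDS * 1000
ssd_ms = SEEKS_PER_QUERY * SSD_ACCESS_SECONDS * 1000

print(f"Mechanical disk: about {hdd_ms:.0f} ms spent waiting on the heads")
print(f"Solid-state disk: about {ssd_ms:.0f} ms of access latency")
# Roughly 2400 ms versus 30 ms: for most of that time the CPU is simply idle.
```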

Is it perfect? Not yet. Yes, the cost of drives using the current generation of flash technology will come down, and that will improve performance for managing large, relatively static document collections. But flash SSDs still suffer from a limitation: a given block can only be erased and rewritten a limited number of times (figures from roughly 100,000 to a million program/erase cycles are quoted, depending on the flash type) before that part of the drive won’t take data any more. When that limitation is eliminated by the next technological leap, we can start using the drives to store actual relational database data as well, and then we’ll have a revolution in IT in general.
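For a sense of what that write limit means in practice, here is a rough lifetime estimate. The drive size, endurance rating, and daily write volume are all assumed figures for illustration, and real drives stretch their life further with wear-leveling and spare blocks:

```python
# Rough estimate of how long a flash SSD lasts under a steady write load.
# All of the figures below are assumptions for illustration; real drive
# controllers add wear-leveling and spare blocks, which spread writes out.

DRIVE_BYTES = 64 * 10**9             # assumed 64 GB drive
ERASE_CYCLES_PER_BLOCK = 100_000     # assumed per-block endurance rating
WRITES_PER_DAY_BYTES = 50 * 10**9    # assumed 50 GB/day of writes (heavy load)

# With ideal wear-leveling, every block absorbs its share of the writes.
total_write_capacity = DRIVE_BYTES * ERASE_CYCLES_PER_BLOCK
lifetime_days = total_write_capacity / WRITES_PER_DAY_BYTES
print(f"Estimated lifetime with ideal wear-leveling: {lifetime_days / 365:.0f} years")

# Worst case: one hot block rewritten over and over with no wear-leveling,
# the pattern a busy relational database table can produce.
REWRITES_PER_SECOND = 10             # assumed update rate for a hot database page
hot_block_seconds = ERASE_CYCLES_PER_BLOCK / REWRITES_PER_SECOND
print(f"A single hot block rewritten {REWRITES_PER_SECOND}x/sec fails in "
      f"{hot_block_seconds / 3600:.1f} hours")
```

The contrast between the two numbers is the whole story: largely static document collections are a comfortable fit for today’s flash, while constantly rewritten relational data is not, yet.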

And it’s coming none too soon. As the e-Discovery business continues to accelerate, and as more and more data becomes subject to discovery in civil litigation, the industry will need every new advancement it can find just to keep pace.

Posted by Andrew Forber