Paleontologists routinely resurrect and sequence DNA from woolly mammoths and other long-extinct species. Future paleontologists, or librarians, may do much the same to pull up Shakespeare’s sonnets, listen to Martin Luther King Jr.’s “I have a dream” speech, or view photos. Researchers in the United Kingdom report today that they’ve encoded these works and others in DNA and later sequenced the genetic material to reconstruct the written, audio, and visual information.
The new work isn’t the first example of large-scale storage of digital information in DNA. Last year, researchers led by bioengineers Sriram Kosuri and George Church of Harvard Medical School reported that they stored a copy of one of Church’s books in DNA, among other things, at a density of about 700 terabits per gram, more than six orders of magnitude more dense than conventional data storage on a computer hard disk. Now, researchers led by molecular biologists Nick Goldman and Ewan Birney of the European Bioinformatics Institute (EBI) in Hinxton, U.K., report online today in Nature that they’ve improved the DNA encoding scheme to raise that storage density to a staggering 2.2 petabytes per gram, three times the previous effort.
To do so, the team first translated written words or other data into a standard binary code of 0s and 1s, and then converted this to a trinary (base-3) code of 0s, 1s, and 2s, a step needed to help prevent the introduction of errors. The researchers then rewrote that data as strings of DNA’s chemical bases: As, Gs, Cs, and Ts. At the storage density achieved, a single gram of DNA would hold 2.2 million gigabytes of information, or about what you can store in 468,000 DVDs. What’s more, the researchers added an error correction scheme, encoding the information multiple times, among other tricks, to ensure that it could be read back with 100% accuracy.
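The binary-to-trinary-to-DNA pipeline described above can be sketched in a few lines of Python. This is only an illustration, not the EBI team's actual code: the function names are hypothetical, the fixed six-trits-per-byte conversion is a simplification, and the mapping rule (each trinary digit picks one of the three bases that differ from the previous base) is an assumption about why a three-symbol code helps. The idea is that a strand written this way never repeats the same base twice in a row, and such repeats are a known source of sequencing errors. The sketch leaves out the redundancy and error correction entirely.

```python
BASES = "ACGT"

def bytes_to_trits(data: bytes) -> list[int]:
    """Write each byte as six base-3 digits (3**6 = 729, enough for 0..255)."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(6):
            byte, remainder = divmod(byte, 3)
            digits.append(remainder)
        trits.extend(reversed(digits))  # most significant digit first
    return trits

def trits_to_dna(trits: list[int], prev: str = "A") -> str:
    """Map each trit to one of the three bases that differ from the last one.

    `prev` is an imaginary base before the strand starts (an assumption
    made here so that the first trit also has exactly three choices).
    """
    strand = []
    for trit in trits:
        choices = [base for base in BASES if base != prev]
        prev = choices[trit]
        strand.append(prev)
    return "".join(strand)

def dna_to_trits(dna: str, prev: str = "A") -> list[int]:
    """Invert trits_to_dna; a repeated base raises ValueError (a read error)."""
    trits = []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    return trits

def trits_to_bytes(trits: list[int]) -> bytes:
    """Regroup trits six at a time and rebuild the original bytes."""
    out = bytearray()
    for i in range(0, len(trits), 6):
        value = 0
        for trit in trits[i:i + 6]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)

def encode(data: bytes) -> str:
    return trits_to_dna(bytes_to_trits(data))

def decode(dna: str) -> bytes:
    return trits_to_bytes(dna_to_trits(dna))
```

Calling `encode(b"Shall I compare thee")` returns a string of As, Cs, Gs, and Ts six times as long as the input, and `decode` turns it back into the original bytes. Because no base ever follows itself in a valid strand, a doubled base in a sequencing read is an immediate sign that something went wrong.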
Beyond demonstrating DNA’s superlative information storage abilities, Goldman, Birney, and their colleagues also asked when such a technology might be worth implementing. Institutions such as the Large Hadron Collider, a particle accelerator in Geneva, Switzerland, produce on the order of 15 petabytes of data each year, so the need for vast archival storage is growing rapidly. Today, such institutions commonly archive data by storing it on magnetic tape. Keeping that data safe over many decades requires rewriting it at regular intervals, adding to the cost of preservation. DNA, on the other hand, can be stable for thousands of years if kept in a cool, dry place. Goldman also notes that the costs of synthesizing DNA, which corresponds to writing the code, as well as sequencing, or reading out the code, are dropping fast. According to the EBI researchers, at current rates, DNA data storage is cost-effective only for data that need to be archived for 600 years or more. But if the costs of DNA synthesis, currently the most expensive part of the enterprise, drop 100-fold, that break-even number would drop to about 50 years.
Harvard’s Kosuri calls the latest study “good work.” But he says that cost won’t be the only hitch. For starters, he notes, once you write a batch of data in DNA, you can’t change it or overwrite it, as is often done with other data storage technologies. And you can’t access any particular piece of information directly, but rather must sequence large swaths of DNA to find what you’ve archived.
So even though DNA’s data storage densities are off the charts, it may still be worth putting those family photos on a DVD for now.