Researchers store operating system and video in DNA
Researchers at the New York Genome Center have proposed an efficient and robust method for storing data in DNA. As a proof of concept, they stored an operating system, a video and a gift card in DNA and retrieved them without errors.
In recent years, great strides have been made in storing data in DNA, but according to the researchers of Team Erlich, affiliated with the New York Genome Center, there is room for improvement. They point out that existing methods do not always scale and usually show shortcomings when the information is retrieved.
They therefore propose a new method called DNA Fountain. This strategy is said to approach the Shannon capacity, the theoretical maximum amount of information that can be stored per nucleotide. DNA storage can be compared to a communication channel with noise. With DNA Fountain, the researchers split a file into a series of non-overlapping segments of a specified length. They then encode these segments into short data packets, called droplets, that can be sent over a noisy channel. Each 38-byte droplet contains a 32-byte payload, a 4-byte seed, and an additional 2 bytes of error-correcting code. The seed corresponds to the state of the random number generator at the time the droplet was created, which lets a decoding algorithm derive the identities of the segments in the droplet.
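The droplet layout described above can be sketched in Python. This is a minimal illustration rather than the researchers' implementation, and it rests on several assumptions that the article does not confirm: segments are combined with XOR (as in Luby transform fountain codes), the number of segments per droplet is drawn uniformly from 1 to 3, the byte order is seed, then payload, then check bytes, and a CRC-16 checksum stands in for the 2-byte error-correcting code (it can only detect errors, not correct them). All function names are hypothetical.

```python
import binascii
import random
import struct

SEGMENT_SIZE = 32  # payload bytes per droplet, as stated in the article


def split_segments(data: bytes, size: int = SEGMENT_SIZE) -> list[bytes]:
    """Cut data into non-overlapping segments, zero-padding the last one."""
    padded = data + b"\x00" * (-len(data) % size)
    return [padded[i:i + size] for i in range(0, len(padded), size)]


def segments_for_seed(seed: int, n_segments: int) -> list[int]:
    """Re-derive from the seed alone which segments a droplet combines.

    Assumption: a uniform degree of 1 to 3 segments per droplet.
    """
    rng = random.Random(seed)
    degree = rng.randint(1, min(3, n_segments))
    return rng.sample(range(n_segments), degree)


def make_droplet(seed: int, segments: list[bytes]) -> bytes:
    """Build one 38-byte droplet: 4-byte seed, 32-byte payload, 2 check bytes."""
    payload = bytes(SEGMENT_SIZE)
    for idx in segments_for_seed(seed, len(segments)):
        # Assumption: segments are combined with bitwise XOR.
        payload = bytes(a ^ b for a, b in zip(payload, segments[idx]))
    body = struct.pack(">I", seed) + payload
    # Assumption: CRC-16 as a detection-only stand-in for the real code.
    check = struct.pack(">H", binascii.crc_hqx(body, 0))
    return body + check


def parse_droplet(droplet: bytes, n_segments: int) -> tuple[list[int], bytes]:
    """Verify a droplet and return (segment indices, payload)."""
    body, check = droplet[:-2], droplet[-2:]
    if struct.pack(">H", binascii.crc_hqx(body, 0)) != check:
        raise ValueError("corrupted droplet")
    (seed,) = struct.unpack(">I", body[:4])
    return segments_for_seed(seed, n_segments), body[4:]


if __name__ == "__main__":
    segments = split_segments(b"example file contents " * 10)
    droplet = make_droplet(seed=7, segments=segments)
    indices, payload = parse_droplet(droplet, len(segments))
    print(len(droplet), indices)
```

Because the decoder seeds the same generator with the value stored in the droplet, it arrives at the same list of segment indices as the encoder without that list ever being stored.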
According to the researchers, data stored with their method can in theory be copied without limit while its integrity is preserved. In a test, they encoded a compressed file of 2,146,816 bytes into DNA. The tarball contained the complete KolibriOS operating system, a $50 Amazon gift card, a short video, and an image of the Pioneer plaque. They recovered the file completely and without errors; decoding took nine minutes with a Python script on a standard laptop.
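The figures above allow a quick back-of-the-envelope check. The sketch below uses only the sizes given in the article, plus one assumption: each nucleotide carries at most 2 bits, the upper bound for an alphabet of four bases. The actual encoding may use fewer bits per nucleotide.

```python
FILE_BYTES = 2_146_816   # size of the compressed test file
PAYLOAD_BYTES = 32       # payload per droplet
DROPLET_BYTES = 38       # payload + 4-byte seed + 2 bytes of error correction
BITS_PER_NUCLEOTIDE = 2  # assumption: upper bound for four bases (A, C, G, T)

# Number of 32-byte segments the file splits into, and any leftover bytes.
segments, remainder = divmod(FILE_BYTES, PAYLOAD_BYTES)

# Length of one droplet once written as nucleotides.
nucleotides_per_droplet = DROPLET_BYTES * 8 // BITS_PER_NUCLEOTIDE

# Share of each droplet spent on the seed and error correction.
overhead = (DROPLET_BYTES - PAYLOAD_BYTES) / DROPLET_BYTES

print(segments, remainder)       # 67088 0
print(nucleotides_per_droplet)   # 152
print(f"{overhead:.1%}")         # 15.8%
```

The file divides evenly into 67,088 segments, so at least that many droplets are needed to recover it.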
The scientists published their research as a preprint titled Capacity-approaching DNA storage. The paper has not yet been peer-reviewed.