Friday, September 18, 2009

An improved Huffman coding method for archiving text, images, and music characters in DNA

Menachem Ailenberg, Ori D. Rotstein

Departments of Surgery, University of Toronto, and St. Michael's Hospital, Li Ka Shing Knowledge Institute, Keenan Research Centre, Toronto, Ontario, Canada

BioTechniques, Vol. 47, No. 3, September 2009, pp. 747–754


An improved Huffman coding method for information storage in DNA is described. The method uses a modified, unambiguous base assignment that enables efficient coding of characters. A plasmid-based library is described that allows efficient and reliable information retrieval and assembly with uniquely designed primers. We illustrate our approach by synthesizing DNA that encodes text, images, and music, which could easily be retrieved by DNA sequencing using the specific primers. The method is simple and lends itself to automated information retrieval.


The increasing use of digital technology presents a challenge for existing storage capabilities. The need for a reliable, long-term solution for information storage is further heightened by the prediction that current magnetic and optical storage media will become unrecoverable within a century or less (1). DNA is a compact, long-term, and proven medium for information storage. Indeed, over the last few decades, a good case has been made for storing crucial information in DNA (2). Desirable properties of DNA include its capacity for long-term information storage and recovery, which are mostly independent of technological changes; the ability to conceal data in a miniaturized fashion; and its ability to be transferred, when required, via self-propagation (1,2,3,4,5,6).

Various approaches for information coding in DNA have been reported, including the Huffman code, the comma code, and the alternating code (4); a straight coding based on 3 bases per letter (1,2,6); and sequential conversion of text to keyboard scan codes, followed by conversion to hexadecimal code and then to binary code with a designed nucleotide encryption key (5). Each approach offers advantages and inherent difficulties, and the approaches differ in how economically they use nucleotides. We sought to develop an alternate approach for information archiving in DNA. We used the principles of the Huffman code (4,7) to define DNA codons for the entire keyboard, allowing unambiguous information coding. The approach described in this manuscript is based on the construction of a plasmid library for information archiving, with specially designed primers embedded in the message segment with an exon/intron structure for rapid, reliable, and efficient information retrieval.
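To make the Huffman principle concrete, the following is a minimal sketch of a base-4 Huffman code in which every branch of the tree is labeled with one nucleotide, so that frequent characters receive shorter codons and no codon is a prefix of another. It is not the codon table or the modified base assignment used in the paper, which this excerpt does not reproduce; the function names (dna_huffman, encode, decode), the A/C/G/T branch order, and the sample message are all assumptions made for illustration.

```python
import heapq
from collections import Counter
from itertools import count

BASES = "ACGT"  # assumption: one tree branch per nucleotide


def dna_huffman(freqs):
    """Return a prefix-free {symbol: codon} map built by base-4 Huffman coding."""
    if len(freqs) == 1:
        # A single symbol still needs a non-empty codon.
        return {next(iter(freqs)): BASES[0]}
    tiebreak = count()  # unique counter so the heap never compares dicts
    heap = [(w, next(tiebreak), {s: ""}) for s, w in freqs.items()]
    # Pad with zero-weight dummy leaves so every merge takes exactly 4 nodes.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tiebreak), {}))
    heapq.heapify(heap)
    while len(heap) > 1:
        merged, total = {}, 0
        for base in BASES:
            weight, _, codes = heapq.heappop(heap)
            total += weight
            for symbol, suffix in codes.items():
                merged[symbol] = base + suffix  # prepend this branch's base
        heapq.heappush(heap, (total, next(tiebreak), merged))
    return heap[0][2]


def encode(text, code):
    """Concatenate the codon of each character."""
    return "".join(code[ch] for ch in text)


def decode(dna, code):
    """Read bases left to right; a prefix-free code needs no separators."""
    inverse = {codon: symbol for symbol, codon in code.items()}
    out, buf = [], ""
    for base in dna:
        buf += base
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)


message = "information storage in dna"  # illustrative sample text
code = dna_huffman(Counter(message))
dna = encode(message, code)
print(dna, decode(dna, code) == message)
```

Because the code is prefix-free, the decoder can split the sequence into codons without delimiters, which is the sense in which such a coding is unambiguous. A real design would also have to account for constraints this sketch ignores, such as primer sites and sequences that are hard to synthesize or sequence.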

Materials and methods

The DNA coding was based on a modification of the Huffman code (2,4,7,8). We also adopted the nomenclature suggested by Cox (2), in which the DNA segment representing a single character is termed a ‘codon’. DNA (844 bp; Figure 1A) was synthesized and inserted as a SacI/KpnI fragment into a pBluescript-based plasmid (Mr. Gene GmbH, Regensburg, Germany). Sequence confirmation of the supplied plasmid was provided by the manufacturer using a plasmid universal primer. For information retrieval, plasmid (300 ng/7 µL) was mixed with sequencing primer (5 pmol/0.7 µL; Sigma, Oakville, Ontario) (Figure 1B) and subjected to sequencing (service was performed by The Centre for Applied Genomics, The Hospital for Sick Children, Toronto, ON, Canada). The chromatogram was created using the FinchTV 1.4 application (Geospiza Inc., Seattle, WA, USA). Sequences of designed and sequenced DNA were aligned using bl2seq (NCBI, Bethesda, MD, USA).

PCR amplification was performed in an iQ5 cycler (Bio-Rad Laboratories, Mississauga, ON, Canada). The reaction mixture contained 2 units Taq polymerase with 1× reaction buffer (New England BioLabs, Pickering, ON, Canada), 0.2 mM each dNTP (Fermentas, Burlington, ON, Canada), 0.3 mM each primer, 200 ng plasmid DNA, and UltraPure distilled water (Invitrogen, Burlington, ON, Canada) to a volume of 20 µL. PCR conditions were 94°C for 3 min; 30 cycles of 94°C for 30 s, 55°C for 30 s, and 72°C for 60 s; a final extension at 72°C for 7 min; and a hold at 4°C. Ten microliters of the PCR reaction was mixed with 2 µL 6× loading buffer (Fermentas). DNA fragment size was determined by loading 5 µL of 100-bp DNA ladder (Fermentas) in parallel, and samples were resolved on a 1% agarose gel (Bioshop, Burlington, ON, Canada). The gel was visualized with UV transillumination, and the image was captured with a Biospectrum AC Imaging System (UVP, Upland, CA, USA).
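As a quick arithmetic check on the protocol above, the sketch below totals the programmed hold times of the PCR cycling conditions and computes the final strength of the loading buffer after mixing. The step labels and the helper names (programmed_minutes, final_buffer_strength) are assumptions added for illustration; the temperatures, times, cycle count, and volumes are taken from the text. Ramp times between temperatures and the 4°C hold are not included.

```python
# (step label, temperature in °C, seconds per repeat, repeats), as given in the text
PROGRAM = [
    ("initial denaturation", 94, 180, 1),
    ("denaturation", 94, 30, 30),
    ("annealing", 55, 30, 30),
    ("extension", 72, 60, 30),
    ("final extension", 72, 420, 1),
]


def programmed_minutes(program):
    """Sum of programmed hold times in minutes (ramping excluded)."""
    return sum(seconds * repeats for _, _, seconds, repeats in program) / 60


def final_buffer_strength(sample_ul, buffer_ul, stock_x):
    """Strength of a loading buffer after it is diluted by the sample."""
    return stock_x * buffer_ul / (sample_ul + buffer_ul)


print(programmed_minutes(PROGRAM))       # total programmed time in minutes
print(final_buffer_strength(10, 2, 6))   # 10 µL sample + 2 µL of 6× buffer
```

The programmed steps add up to 70 minutes, and mixing 10 µL of reaction with 2 µL of 6× loading buffer gives a 1× final buffer strength in 12 µL.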
