DNA storage, a cure to save the human data crisis?
Stopping the Earth's rotation and fleeing the solar system, as Da Liu (the science fiction writer Liu Cixin) imagined, would probably come too late. And if, as with Noah's Ark, we tried to load human beings, plants and animals, and human knowledge onto a spaceship all at once, existing rocket capacity could not carry even one billionth of that material.
If we want to preserve as much of Earth's biology as possible for as long as possible, all we need to do is collect and package the DNA sequence information of all species, which can be preserved for hundreds of thousands of years in the low-temperature environment of a spaceship. But what about the information of human civilization? The most efficient form of that information is data, and data today is stored on hard disks and optical discs.
The weight and data density of these storage devices are discouraging. Worse, the hard disks or discs may fail, and the data be lost, before the ship even leaves the solar system.
So can DNA be used as a hard disk to store data information? The answer is that it can.
DNA is the oldest information storage medium for life on the planet. It can also serve as a storage medium for digital data, with a storage density and service life expected to far exceed existing disk-based solutions. As a result, DNA storage is increasingly seen as the future of data storage and a leading candidate for solving humanity's data storage crisis.
How does DNA storage work? Where is it now? What are the obstacles to commercialization? We'll need to answer each of these questions.
Before we get into how DNA storage works, let's take a quick look at the principles of the two existing solutions, magnetic storage and optical storage.
The principle of magnetic storage is that a metallic substrate is coated with a magnetic medium which, when energized, is magnetized so that it can store and represent binary 0s and 1s. The advantage of a magnetic hard disk is its fast write and read speed; the disadvantage is low data density relative to its volume and weight. After 60 years of development, a 3.5-inch hard disk drive stores roughly 3TB of data.
The principle of optical storage is to burn digitally encoded video and audio data into grooves on the surface of an optical disc, and then read the data back from these grooves with a laser for copying or playback. Optical storage is also running into capacity limits: to store more data, the grooves must be smaller and more densely packed, which requires greater laser precision. Currently, a single-layer Blu-ray disc holds about 25GB of information, and discs read with an ultraviolet laser, if developed successfully, could reach a capacity of 500GB.
What are the advantages of DNA storage over magnetic and optical storage?
First, it saves space. Magnetic and optical media store data in a single flat layer, which takes up orders of magnitude more volume than DNA, whose double-helix, three-dimensional structure and very small physical size give it a very high data density per unit of space. For example, 1 gram of DNA is smaller than a dewdrop on your fingertip, yet can store 700TB of data, equivalent to 14,000 Blu-ray discs of 50GB each, or 233 hard disks of 3TB each (weighing almost 151 kg in total).
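The equivalences quoted above can be checked with a few lines of arithmetic. This is only a sanity check, and it assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes), which the article does not state; the variable names are illustrative.

```python
# Sanity check of the density comparison.
# ASSUMPTION: decimal units (1 GB = 10**9 bytes, 1 TB = 10**12 bytes).
GB = 10**9
TB = 10**12

dna_bytes_per_gram = 700 * TB   # capacity quoted for 1 gram of DNA
blu_ray_bytes = 50 * GB         # one 50GB Blu-ray disc
hard_disk_bytes = 3 * TB        # one 3TB hard disk
hard_disks_total_kg = 151       # total weight quoted for the hard disks

# Whole devices needed to match the capacity of 1 gram of DNA.
blu_ray_discs = dna_bytes_per_gram // blu_ray_bytes
hard_disks = dna_bytes_per_gram // hard_disk_bytes

# Implied weight of a single hard disk.
kg_per_hard_disk = hard_disks_total_kg / hard_disks

print(blu_ray_discs)               # 14000
print(hard_disks)                  # 233
print(round(kg_per_hard_disk, 2))  # 0.65
```

The implied weight of about 0.65 kg per drive is in the usual range for a 3.5-inch hard disk, so the quoted figures are at least internally consistent.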
It is also very energy efficient. Existing storage facilities such as data centers consume a lot of monocrystalline silicon and a lot of electricity. In contrast, DNA only needs to be kept in a cool, dry place, with essentially no additional manual maintenance. Even if the DNA needs to be frozen, the resources and energy consumed are almost negligible.
Most important of all, DNA lasts a very long time. Today's high-density storage media decay over time; the longest-lived is magnetic tape, with a lifespan of just 50 years, and other media last even less. DNA, on the other hand, has a shelf life of a hundred years, and if frozen it can be kept for thousands or even tens of thousands of years.
It seems that there is a rescue plan for human civilization, but how exactly does DNA storage work?
As we all know, DNA consists of four nitrogenous bases, A, T, C, and G, which pair complementarily. Scientists assign binary values to adenine (A), guanine (G), cytosine (C), and thymine (T) (A and C = 0, G and T = 1), and the resulting sequences are then synthesized on microfluidic chips, with each sequence's position recorded so that it can be matched to the right part of the dataset. Encoding the bases as 1s and 0s in this way makes it possible to express binary data as DNA sequence information.
Once the binary data has been written into DNA sequences, the "DNA hard disk" can be stored in a cryogenic environment. To read the data, all you have to do is sequence the target DNA, convert the bases back to binary code, and then decode that into ordinary data.
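The write and read steps can be sketched in software. The following is a minimal illustration of the mapping described above (A and C = 0, G and T = 1), not any laboratory's actual pipeline. The function names are hypothetical, and the rule for choosing between the two bases available for each bit (pick the one that differs from the previous base, so the same base never repeats) is an assumption added here to make the example deterministic.

```python
ZERO_BASES = ("A", "C")  # either base stands for bit 0
ONE_BASES = ("G", "T")   # either base stands for bit 1


def bytes_to_bits(data: bytes) -> str:
    """Turn bytes into a string of '0' and '1' characters."""
    return "".join(f"{byte:08b}" for byte in data)


def bits_to_bytes(bits: str) -> bytes:
    """Turn a bit string (length a multiple of 8) back into bytes."""
    if len(bits) % 8 != 0:
        raise ValueError("bit string length must be a multiple of 8")
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def encode(data: bytes) -> str:
    """Write data as a base sequence, one base per bit."""
    strand = []
    for bit in bytes_to_bits(data):
        first, second = ZERO_BASES if bit == "0" else ONE_BASES
        previous = strand[-1] if strand else None
        # ASSUMED tie-break: avoid repeating the previous base.
        strand.append(second if previous == first else first)
    return "".join(strand)


def decode(strand: str) -> bytes:
    """Read a base sequence back into the original bytes."""
    bits = []
    for base in strand:
        if base in ZERO_BASES:
            bits.append("0")
        elif base in ONE_BASES:
            bits.append("1")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return bits_to_bytes("".join(bits))


message = "DNA storage".encode("utf-8")
strand = encode(message)
print(strand)
print(decode(strand).decode("utf-8"))
```

Because two bases are available for every bit, this mapping stores one bit per base, and the spare choice can be spent on avoiding sequences that are awkward to synthesize or sequence.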
The principle is very simple, but how have scientists put it into practice? A brief review of the history of DNA storage technology answers that.
The first person to come up with the idea was an artist, Joe Davis. In 1988, in collaboration with Harvard researchers, he took a 7×5 pixel image called Microvenus, converted it into a 35-base DNA sequence, and inserted it into E. coli bacteria, for the first time writing information into DNA that was not a product of natural evolution.
(Microvenus stands for woman and Earth)
In 2010, the American synthetic biologist Craig Venter led a team of researchers that chemically synthesized an entire Mycoplasma genome, nicknamed "Synthia". In a "self-indulgent" touch, the names of the researchers, the institute's website address, and a quotation from the Irish writer James Joyce were encoded into the newly synthesized DNA.
In 2011, a team led by synthetic biologists George Church of Harvard and Sriram Kosuri of the University of California, together with Johns Hopkins University genomics expert Yuan Gao, conducted the first proof-of-concept experiments. The team used short DNA fragments to encode a book by Church, 659KB of data in all.
In 2013, Nick Goldman of the European Bioinformatics Institute (EBI), who had been working on the project for over a decade, published a paper on the subject. Goldman and his research team wrote five files into DNA fragments, including Shakespeare's sonnets, Martin Luther King Jr.'s "I Have a Dream" speech, and a copy of Watson and Crick's DNA double-helix paper. At 739 KB, this was the largest body of data stored in DNA at the time.
In 2016, Microsoft and the University of Washington used DNA storage technology to complete the storage of about 200MB of data, which became a leap forward in DNA information storage technology.
In July 2017, the journal Nature published a paper by Seth Shipman of Harvard Medical School, who collaborated with George Church on a study of DNA storage in living cells. They stored a 130-year-old black-and-white movie, "Horses on the Run", in the DNA of E. coli. The E. coli carrying this "foreign DNA" not only survive but also pass it on, so each round of reproduction makes another copy of the data. The movie stored in the genome remained intact in each generation of E. coli.
However, cell replication, division, and death all carry a risk of information errors. For the sake of data security, research has shifted from living-cell storage to synthetic DNA storage, and most of the time the DNA that stores the information is kept as a dry powder.
In the same year, researchers at Columbia University and the New York Genome Center published an efficient DNA storage strategy called "DNA Fountain" in the journal Science. The technique pushes DNA's storage potential close to its maximum by packing more information into the four bases, encoding 1.6 bits of data per nucleotide, which is 60% more than before and approaches the theoretical limit (1.8 bits). The method can store 215 petabytes of data in one gram of DNA, the equivalent of 220 million movies.
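For scale, four bases can carry at most log2(4) = 2 bits per nucleotide when no constraints are placed on the sequence. The short calculation below relates the quoted figures to that ceiling. It assumes, and this is not stated in the article, that "before" refers to a mapping that stores 1 bit per base, such as A and C = 0, G and T = 1.

```python
import math

raw_ceiling = math.log2(4)   # bits per nucleotide with no sequence constraints
quoted_limit = 1.8           # theoretical limit quoted in the text
dna_fountain = 1.6           # density quoted for DNA Fountain
one_bit_scheme = 1.0         # ASSUMED baseline: one bit per base

gain_over_baseline = dna_fountain / one_bit_scheme - 1
fraction_of_limit = dna_fountain / quoted_limit

print(raw_ceiling)                   # 2.0
print(f"{gain_over_baseline:.0%}")   # 60%
print(f"{fraction_of_limit:.0%}")    # 89%
```

Under that assumption the "60% more" figure follows directly, and the quoted 1.8-bit limit sits a little below the unconstrained 2 bits per nucleotide.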
In 2018, researchers at Waterford Institute of Technology (WIT) in Ireland developed a novel DNA storage method that can store 1ZB of data in 1 gram of E. coli DNA.
In 2019, Church's team published the results of another experiment in the journal Science. They encoded a copy of Church's roughly 53,400-word book, Regenesis: How Synthetic Biology Will Reinvent Nature and Ourselves, along with 11 images and a JavaScript program, into less than a billionth of a gram of microchip-synthesized DNA, and then successfully read the book back using DNA sequencing.
These rapid research advances show that DNA synthesis (writing data) and DNA sequencing (reading data) are maturing. At the same time, DNA encoding still has problems with write and read speed and with cost, so DNA storage is still on the road to commercialization.
In the lab, DNA storage does not look complicated, but commercialization still faces several obstacles.
First, both writing and reading are slow. Unlike disk storage, which relies on electromagnetic signals, DNA synthesis depends on a series of chemical reactions. Writing 200MB of data to a disk takes about 1 second; synthesizing it as DNA takes almost 3 weeks.
Second, the DNA medium cannot be overwritten or rewritten. In DNA, once information is stored in it, it generally cannot be modified. To read the document, you need to completely sequence all the information and then transcode it.
Third, the accuracy of data storage needs to be improved. Currently, repeated reads during DNA sequencing result in a higher probability of read errors.
Fourth, it is difficult to read and write randomly. The current DNA synthesis technology is unable to produce longer DNA molecules at once, and can only synthesize numerous short fragments. This makes it difficult to quickly retrieve specific data from a mixture of small DNA fragments.
Finally, and most importantly, DNA storage is too expensive. For example, storing 200MB of data in DNA currently costs $800,000, whereas with electronic devices the cost is less than a dollar.
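The speed and cost gaps in this list can be put into numbers. The rough calculation below assumes decimal megabytes and treats "almost 3 weeks" as a full 21 days, so the results are order-of-magnitude estimates only.

```python
MB = 10**6                   # ASSUMPTION: decimal megabytes
payload_mb = 200
payload_bytes = payload_mb * MB

disk_seconds = 1                       # quoted time to write 200MB to a disk
dna_seconds = 3 * 7 * 24 * 3600        # "almost 3 weeks", taken as 21 days

dna_bytes_per_second = payload_bytes / dna_seconds
slowdown = dna_seconds / disk_seconds  # how many times slower DNA writing is

dna_cost_usd = 800_000                 # quoted cost of storing 200MB in DNA
dna_cost_per_mb = dna_cost_usd / payload_mb

print(dna_seconds)                   # 1814400
print(round(dna_bytes_per_second))   # 110
print(dna_cost_per_mb)               # 4000.0
```

In other words, synthesis at this pace writes on the order of a hundred bytes per second, about 1.8 million times slower than the disk, at roughly $4,000 per megabyte.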
But as noted above, on a longer time scale and under growing pressure on data storage space, DNA's unique advantages come to the fore: high storage density, energy savings and environmental friendliness, and ultra-long stability. As writing and reading technology develops, DNA synthesis and sequencing become more efficient, and costs fall significantly, DNA storage will not be far from commercial application.
So what progress is being made towards commercialization now?
In 2015, Microsoft and the University of Washington jointly published a method for targeted readout, which adds a number of tracking markers to the stored DNA. These markers work like an index: the right marker can be selected to read just the wanted data, without having to sequence all of the DNA each time.
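The indexing idea can be illustrated with a toy software model. This is not the published method, whose details the article does not give; it is a sketch under several assumptions. Each fragment is assumed to carry a fixed-length marker identifying its file, followed by a fragment number written in bases, followed by the payload. The marker sequences, the fragment size, and all function names are hypothetical.

```python
import random

BASES = "ACGT"
FRAGMENT_PAYLOAD = 20   # payload bases per fragment (hypothetical size)
INDEX_WIDTH = 4         # bases used for the fragment number (up to 4**4 fragments)


def index_to_bases(index: int) -> str:
    """Write a fragment number as INDEX_WIDTH bases (base-4 digits)."""
    if not 0 <= index < 4 ** INDEX_WIDTH:
        raise ValueError("fragment number does not fit in the index field")
    digits = []
    for _ in range(INDEX_WIDTH):
        digits.append(BASES[index % 4])
        index //= 4
    return "".join(reversed(digits))


def bases_to_index(bases: str) -> int:
    """Read a fragment number back from its base-4 digits."""
    value = 0
    for base in bases:
        value = value * 4 + BASES.index(base)
    return value


def make_fragments(marker: str, payload: str) -> list[str]:
    """Split a payload into short fragments, each tagged with marker + number."""
    fragments = []
    for number, start in enumerate(range(0, len(payload), FRAGMENT_PAYLOAD)):
        chunk = payload[start:start + FRAGMENT_PAYLOAD]
        fragments.append(marker + index_to_bases(number) + chunk)
    return fragments


def retrieve(pool: list[str], marker: str) -> str:
    """Select only fragments carrying the marker, reorder them, and rejoin."""
    header = len(marker) + INDEX_WIDTH
    selected = [f for f in pool if f.startswith(marker)]
    selected.sort(key=lambda f: bases_to_index(f[len(marker):header]))
    return "".join(f[header:] for f in selected)


# Two "files" mixed in one unordered pool; markers must have equal length.
file_a = "ACGT" * 30
file_b = "TTGACA" * 15
pool = make_fragments("ACGTACGT", file_a) + make_fragments("TGCATGCA", file_b)
random.shuffle(pool)

print(retrieve(pool, "ACGTACGT") == file_a)
print(retrieve(pool, "TGCATGCA") == file_b)
```

The point of the model is that retrieval touches only the fragments whose marker matches, which is what lets a reader skip the rest of the pool.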
In 2018, readout technology saw another breakthrough when Microsoft applied "nanopore" readout, in which DNA strands are threaded through a very small nanopore so that each base can be read in turn. This greatly reduces the size of the reading equipment, to the point where a palm-sized USB device can do the reading, but at a few kilobytes per second it is still quite slow.
In March 2019, a Microsoft team published a further advance in the Nature journal Scientific Reports: the world's first automated DNA storage system. Encoding and decoding DNA automatically, rather than carrying out synthesis and sequencing by hand, is a necessary step toward commercialization.
Catalog, a US startup founded in 2016, is trying to address the remaining problems of time and cost in DNA writing and reading.
Last year, Catalog stored 16GB of English Wikipedia text in DNA molecules. They used a DNA writer device to record this data in the DNA at 4Mbps. The quoted capacity of 125GB in a single day, roughly what a high-end cell phone can store, corresponds to about three times that rate, since 4Mbps sustained for 24 hours comes to about 43GB. This speed is already three times the storage speed of previous research.
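The daily totals follow from simple arithmetic. The check below assumes "Mbps" means megabits per second and uses decimal gigabytes. Under those assumptions, 4Mbps sustained for 24 hours gives about 43GB, while 125GB per day needs roughly 11.6Mbps, about three times as fast.

```python
SECONDS_PER_DAY = 24 * 3600
GB = 10**9                          # ASSUMPTION: decimal gigabytes

write_rate_bps = 4 * 10**6          # 4Mbps, ASSUMED to mean megabits per second
gb_per_day = write_rate_bps / 8 * SECONDS_PER_DAY / GB

target_gb_per_day = 125
required_mbps = target_gb_per_day * GB * 8 / SECONDS_PER_DAY / 10**6
speedup_needed = required_mbps * 10**6 / write_rate_bps

print(round(gb_per_day, 1))         # 43.2
print(round(required_mbps, 2))      # 11.57
print(round(speedup_needed, 2))     # 2.89
```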
Currently, Catalog uses prefabricated synthetic DNA strands 20 to 30 base pairs long, which enzymes join together to store more data. The segments are combined much as English combines its 26 letters, which could in theory create countless combinations. Catalog estimates that storing 1MB of data in DNA will eventually cost less than 0.001 cents.
Of course, if the startup does manage to bring the cost down significantly in the future, it does have the potential to pave the way for the commercialization of DNA data storage.
In 2019, DNA data storage was named one of the world's top 10 emerging technologies of the year, in the list published by Scientific American in conjunction with the World Economic Forum.
It can be expected that magnetic and optical storage methods will continue to dominate data storage methods for some time to come. However, even if we don't see the end of the world, because of the surge in data in recent years, humans are facing a serious problem of insufficient data storage space. At the same time, the surge in demand for data storage has brought about a surge in the use of silicon wafers, and the resulting problems of environmental pollution, water and energy consumption.
The realization of DNA storage technology will, to a certain extent, alleviate the capacity problem of traditional storage and significantly reduce the consumption of electronic components and energy.