Construction of the Second Generation Sequencing Library: Overview and Challenges (1)
Over the past five years, next-generation sequencing (NGS) has been widely adopted by life science researchers. As sequencing technology has advanced, new methods for nucleic acid extraction and library preparation have emerged alongside it; for example, RNA and DNA from a single cell have been used successfully for library preparation. NGS library preparation converts the target nucleic acid, RNA or DNA, into a form that the sequencer can read (Figure 1). Here we compare several library preparation strategies and NGS applications, focusing on libraries compatible with Illumina sequencing. Almost all the principles discussed here also apply, with slight modifications, to other NGS platforms such as those from Life Technologies, Roche, and Pacific Biosciences.
Generally speaking, the core steps of library preparation are: 1) fragmenting the nucleic acid and/or selecting fragments of a specific length, 2) converting them to double-stranded form, 3) ligating oligonucleotide adapters to the fragment ends, and 4) quantifying the final library.
The size of the target DNA fragments is a key parameter in NGS library construction. Nucleic acids can be fragmented by physical, enzymatic, or chemical methods. Physical methods include acoustic shearing (for example, Covaris) and sonication (for example, Bioruptor). In our laboratory, we mainly use the Covaris instrument (Covaris, Woburn, MA) to obtain DNA fragments in the 100-5000 bp range, and Covaris g-TUBEs to obtain the 6-20 kb fragments required for mate-pair libraries. Enzymatic methods include digestion with non-specific endonucleases such as DNase I or Fragmentase, a two-enzyme mix (New England Biolabs, Ipswich, MA), and transposase-based fragmentation. Both physical and enzymatic methods are effective, but Fragmentase produces more artifactual indels than physical methods. Another enzymatic option is Illumina's Nextera, which uses a transposase to fragment double-stranded DNA at random and insert adapter sequences at the same time. This method has several advantages, including shorter sample handling and preparation time.
The size of a library is determined by the size of its inserts (the part of each library molecule between the adapter sequences), because the adapter length is constant. The optimal insert length, in turn, depends on the NGS instrument and the specific sequencing application. On Illumina instruments, for example, the optimal fragment size is influenced by cluster generation, in which the library is denatured, diluted, and distributed over the flow cell surface for amplification. Short fragments amplify more efficiently, whereas long-fragment libraries produce larger, more diffuse clusters. The largest library we have sequenced on Illumina was 1500 bp.
The optimal library size also depends on the sequencing application. For exome sequencing, more than 80% of human exons are shorter than 200 bp. We sequence with paired-end 100 bp (PE100) reads and build exome libraries with inserts of about 250 bp, which matches the size of most exons and yields no overlapping read pairs. The size of an RNA-seq library is likewise determined by the application. For gene expression analysis we use single-end 100 bp (SE100) sequencing, but we choose PE100 to identify alternative splicing events or transcription start and stop sites. In most applications, RNA is fragmented before it is reverse transcribed into cDNA, generally by controlled heat digestion in the presence of divalent metal cations (magnesium or zinc). The size of the library fragments can be controlled reproducibly by adjusting the digestion time.
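The relationship between insert size and read length described above can be checked with a little arithmetic. The following is a minimal sketch, not part of the original protocol; the function names are hypothetical and the model ignores adapter read-through and insert size variation.

```python
def read_pair_overlap(insert_bp, read_len):
    """Number of insert bases covered by BOTH mates of a paired-end read.

    Assumes each mate starts at one end of the insert and reads inward.
    If the insert is shorter than a read, the whole insert is covered twice.
    """
    overlap = 2 * read_len - insert_bp
    return max(0, min(insert_bp, overlap))


def unsequenced_gap(insert_bp, read_len):
    """Number of insert bases in the middle that neither mate reaches."""
    return max(0, insert_bp - 2 * read_len)


if __name__ == "__main__":
    # PE100 on a ~250 bp insert, the exome example from the text
    print(read_pair_overlap(250, 100))  # mates do not overlap
    print(unsequenced_gap(250, 100))    # bases left unread between mates
```

With a 250 bp insert and PE100 reads the two mates do not meet, which is consistent with the absence of overlapping read pairs noted in the text.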
In a recent comparison of seven RNA-seq library preparation methods, most fragmented the RNA first and then added adapters. The two exceptions worked from unfragmented RNA, using either random primers or the SMARTer Ultra Low RNA kit, which synthesizes full-length cDNA with fixed 3' and 5' sequences. The full-length cDNA library (2 kb on average) can be amplified by long-distance PCR (LD-PCR). The amplified double-stranded cDNA is then sheared to an appropriate length by acoustic shearing and used for standard Illumina library preparation (end repair, A-tailing, and adapter ligation, followed by PCR amplification).
A further step that shapes library size after construction is size selection, which removes adapter dimers and other by-products of library preparation. Adapter dimers result from adapters ligating to each other. They cluster very efficiently and consume valuable flow cell space without producing any useful data. We therefore usually clean up libraries with magnetic beads or by gel excision. Magnetic beads work well when starting material is plentiful. When sample input is limited, more adapter dimers are produced; in our experience, bead-based cleanup alone is not adequate in that case, and beads must be combined with gel excision and recovery.
In microRNA (miRNA) library preparation, the target product is usually only 20-30 bp longer than the 120 bp adapter dimer. Gel excision and recovery are therefore needed to capture as much of the target product as possible; magnetic beads cannot achieve this separation resolution. In addition, for de novo assembly of bacterial genomes we often need to build large-insert (1 kb) libraries, combined with the longer PE300 read length and a PCR-free workflow. To obtain as much usable data as possible for assembly, careful gel excision and recovery are needed to obtain inserts of uniform size.
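The size arithmetic behind the miRNA gel cut can be sketched as follows. This is an illustrative sketch only: the 120 bp dimer and 20-30 bp insert come from the text, while the function names and the 3 bp tolerance are assumptions made for the example.

```python
ADAPTER_DIMER_BP = 120         # adapter dimer size given in the text
MIRNA_INSERT_RANGE = (20, 30)  # miRNA insert range given in the text


def expected_product_range(dimer_bp=ADAPTER_DIMER_BP,
                           insert_range=MIRNA_INSERT_RANGE):
    """Expected size range (bp) of adapter-ligated miRNA library molecules."""
    low, high = insert_range
    return (dimer_bp + low, dimer_bp + high)


def classify_band(size_bp, tolerance_bp=3):
    """Label a gel band by its apparent size.

    tolerance_bp is a hypothetical allowance for sizing error on the gel.
    """
    low, high = expected_product_range()
    if abs(size_bp - ADAPTER_DIMER_BP) <= tolerance_bp:
        return "adapter dimer"
    if low - tolerance_bp <= size_bp <= high + tolerance_bp:
        return "miRNA library"
    return "other"


if __name__ == "__main__":
    for band in (120, 145, 200):
        print(band, classify_band(band))
```

The target band sits only 20-30 bp above the dimer band, which is why the text recommends gel excision rather than bead-based cleanup for this separation.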
Several factors must be considered when constructing a library from DNA samples, including the amount of starting material and whether the library is intended for resequencing (a reference sequence is available for alignment) or de novo sequencing (a new reference sequence must be assembled from the sequencing data). Library preparation is prone to bias in genomic regions with high or low GC content. Methods to address this have been developed, including careful choice of the polymerase, cycle number, thermocycling conditions, and buffer used for amplification.
Library preparation from DNA samples usually follows the same process, whether the library is used for WGS, WES, ChIP-seq, or PCR amplicon sequencing. For any application, the goal is to make the library as complex as possible, that is, to retain as many distinct original fragments as possible.
Several brands of DNA library preparation kits are currently available, and competition has driven prices down rapidly while improving quality. These kits can handle DNA input ranging from micrograms down to picograms. Keep in mind, however, that a larger input reduces the number of amplification cycles needed and therefore yields a more complex library. Except for Nextera, library preparation usually consists of: 1) fragmentation, 2) end repair, 3) 5' phosphorylation, 4) addition of an A to the 3' end (A-tailing), 5) adapter ligation, and 6) several cycles of PCR to enrich adapter-ligated products. The main difference in the Ion Torrent workflow is that different adapter sequences are attached by blunt-end ligation.
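The trade-off between input amount and amplification cycles can be made concrete with a rough estimate. This is a minimal sketch under idealized assumptions: the function name is hypothetical, the per-cycle efficiency is assumed constant, and the template mass refers to adapter-ligated molecules only, which in practice is lower than the DNA input.

```python
import math


def pcr_cycles_needed(template_ng, target_ng, efficiency=1.0):
    """Estimate the PCR cycles needed to reach target_ng from template_ng.

    efficiency is the fraction of molecules copied per cycle
    (1.0 means perfect doubling); real reactions fall below this.
    """
    if template_ng <= 0 or target_ng <= 0:
        raise ValueError("masses must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    if target_ng <= template_ng:
        return 0
    fold = target_ng / template_ng
    return math.ceil(math.log(fold) / math.log(1 + efficiency))


if __name__ == "__main__":
    # More amplifiable template means fewer cycles for the same yield
    for template in (100, 1, 0.001):
        print(template, "ng ->", pcr_cycles_needed(template, 1000), "cycles")
```

Each additional cycle is another opportunity for duplicates and GC bias to accumulate, which is why larger inputs tend to give more complex libraries.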
After the initial DNA fragmentation, a mix of three enzymes (T4 polynucleotide kinase, T4 DNA polymerase, and Klenow large fragment) is used for end repair and 5' phosphorylation. An A-tail is then added to the 3' end by Taq polymerase or Klenow fragment (exo-). Taq is more efficient at A-tailing, but Klenow can be used when heating is not an option, as in mate-pair libraries. During adapter ligation, the optimal adapter:fragment molar ratio is about 10:1. Too much adapter leads to dimers that are difficult to separate and that dominate the subsequent amplification. After end repair and A-tailing, either bead-based or gel-based cleanup is suitable, but after ligation we find that magnetic beads remove adapter dimers more effectively.
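Because the 10:1 adapter:fragment ratio is molar, the fragment mass first has to be converted to moles. The sketch below shows one way to do this, assuming an average mass of about 660 g/mol per base pair of double-stranded DNA (a common approximation); the function names are hypothetical.

```python
AVG_BP_MASS = 660.0  # approximate g/mol per base pair of dsDNA (assumption)


def dsdna_pmol(mass_ng, length_bp):
    """Convert a mass of double-stranded DNA (ng) to picomoles of molecules."""
    if mass_ng < 0 or length_bp <= 0:
        raise ValueError("mass must be >= 0 and length must be > 0")
    return mass_ng * 1000.0 / (length_bp * AVG_BP_MASS)


def adapter_pmol_needed(fragment_ng, fragment_bp, molar_ratio=10.0):
    """Picomoles of adapter for a given adapter:fragment molar ratio."""
    return molar_ratio * dsdna_pmol(fragment_ng, fragment_bp)


if __name__ == "__main__":
    # Example: 1 ug of fragments averaging 300 bp
    fragments = dsdna_pmol(1000, 300)
    adapters = adapter_pmol_needed(1000, 300)
    print(f"{fragments:.2f} pmol fragments, {adapters:.1f} pmol adapter")
```

Shorter fragments contain more molecules per nanogram, so the same mass of a shorter library calls for proportionally more adapter.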
To allow multiple samples to be pooled, each sample can be given a different barcode. Barcodes can be built into the adapters or added during PCR amplification through barcoded primers. High-quality barcoded adapters and PCR primers can be purchased from multiple suppliers. All components of DNA library construction, from adapters to enzymes, are now well documented, so they can be assembled into a homemade library preparation kit.
An alternative is the Nextera method, which uses a transposase to fragment DNA at random and tag it in a single tube (a process called tagmentation). This engineered enzyme has two functions: it fragments the DNA and adds specific adapters to both ends of the fragments. These adapter sequences are then used to amplify the inserts in the subsequent PCR, during which the barcodes are added. Compared with the traditional workflow, the advantage of this method is that it combines fragmentation, end repair, and adapter ligation into a single step. However, it is more sensitive to the amount of input DNA than mechanical fragmentation. The ratio of transposase to sample is critical for obtaining fragments of suitable size. Because fragment size depends on reaction efficiency, all reaction parameters, such as temperature and reaction time, are critical and must be tightly controlled.
Several research groups have published results from single-cell genome sequencing. The current strategy uses multiple displacement amplification (MDA) to amplify the whole genome. MDA relies on random primers and phi29, a highly processive strand-displacing polymerase. Although this technique generates enough DNA to construct sequencing libraries, one of its problems is the substantial bias introduced by nonlinear amplification. Recent work suggests that this bias can be reduced by adding a quasi-linear pre-amplification step. Fluidigm offers a system based on single-cell isolation and microfluidics for preparing single-cell libraries, handling up to 96 single cells per run.
For RNA libraries, the construction method must be chosen according to the purpose of the sequencing. If the goal is to discover complex, global transcriptional events, the library needs to capture the whole transcriptome as completely as possible, including coding, non-coding, antisense, and intergenic RNA. In many cases, however, the goal is only to study the mRNA transcripts that are translated into protein. Other cases involve only small RNAs, mostly miRNAs but also snoRNA, piRNA, snRNA, and tRNA. Although we describe the principles of RNA sequencing library construction here, we cannot cover every method individually; interested readers can consult the literature.
The first successful application of NGS to RNA-seq was miRNA sequencing. Preparing a miRNA sequencing library is simple and is usually done as a one-pot reaction. miRNAs carry a natural 5' phosphate, which allows the ligase to target them selectively.
In the first step of the Illumina protocol, a DNA adapter that is blocked at its 3' end and adenylated at its 5' end is ligated to the RNA sample by truncated T4 RNA ligase 2. This enzyme has been modified so that it requires a pre-adenylated adapter substrate. As a result, other RNA fragments are not ligated to each other in this reaction; only the adenylated oligonucleotide can be ligated to the free 3' ends of the RNA. Because the 3' end of the adapter is blocked, it cannot self-ligate. Next, a 5' RNA adapter is added in the presence of ATP and RNA ligase 1. Only RNA molecules with a 5' phosphate are effective substrates in this ligation. After the second ligation, a reverse transcription primer is hybridized to the 3' adapter, and RT-PCR amplification follows (usually 12 cycles).
Because the products are small and of predictable size (120 bp of adapter sequence plus a 20-30 bp miRNA insert), barcoded libraries from multiple samples are usually pooled and gel-purified together. Gel excision and recovery are essential because of adapter dimers and ligation products from non-miRNA species (tRNA and snoRNA). This preparation method yields a directional library that is always sequenced from the 5' end to the 3' end of the original RNA. Ion Torrent's miRNA sequencing follows a similar principle: two different adapters are ligated to the 3' and 5' ends of the miRNA, followed by RT-PCR. In general, this library construction approach can turn any RNA material into a directional RNA-seq library.
One of the limitations of miRNA library is the low initial amount of RNA (
For mRNA sequencing libraries, the main approaches are to synthesize cDNA with random primers or oligo-dT primers, or to add adapters to mRNA fragments followed by some form of amplification. First-strand cDNA synthesis from mRNA can be primed with random primers or oligo-dT. If random primers are used, rRNA must first be removed or depleted. rRNA can be removed with reagents based on oligonucleotide probes, such as Ribo-Zero and RiboMinus. Alternatively, polyA RNA can be positively selected with oligo-dT magnetic beads.
It is generally desirable for the library to retain the strand orientation of the original target RNA. For example, antisense RNA transcribed from the opposite strand plays a role in regulating gene expression, and lncRNA analysis relies on directional RNA sequencing. There are several methods for preparing directional RNA-seq libraries. One approach selectively removes one of the two cDNA strands: dUTP is incorporated during second-strand cDNA synthesis, and the uracil-containing strand is then either digested enzymatically (for example with uracil-DNA glycosylase) or left unamplified by using a polymerase that cannot read uracil-containing templates. In addition, adding actinomycin D reduces spurious antisense synthesis during first-strand cDNA synthesis.
Another, hybrid method uses random or anchored oligo-dT primers that carry an adapter sequence to prime first-strand cDNA synthesis, so that a unique sequence tag is introduced at the 5' end during first-strand synthesis. Next, in a template-switching step, a 3' adapter sequence is added to the cDNA molecule. The clear advantage of this method is that the first-strand cDNA can be amplified directly by PCR using its unique sequence tags, without second-strand synthesis.
Primer design for cDNA synthesis is very important for RNA-seq libraries. For example, rRNA sequences can be excluded from further amplification by designing primers that avoid rRNA. The NuGEN Ovation RNA-seq system combines SPIA (single primer isothermal amplification) with selective first-strand cDNA synthesis primers to suppress the amplification of rRNA. In another method, all 4096 possible hexamers were screened against rRNA sequences and those with perfect matches were eliminated, leaving 749 hexamers to prime first-strand cDNA synthesis. As a result, the fraction of rRNA reads dropped from 78% to 13%. Another method, DP-seq, uses 44 heptamer primers to amplify most mouse transcripts. This primer design selectively suppresses the amplification of highly expressed transcripts (including rRNA) and allows low-abundance transcripts to be estimated in embryonic development models.
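The hexamer screening idea can be illustrated with a short script. This is a simplified sketch of the general technique, not the published procedure: the function names and the toy rRNA sequence are hypothetical, and a primer is treated as matching rRNA when its reverse complement occurs anywhere in the rRNA sequence.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def all_kmers(k=6):
    """Every possible DNA k-mer (4**k sequences; 4096 for hexamers)."""
    return {"".join(bases) for bases in product("ACGT", repeat=k)}


def kmers_in(seq, k=6):
    """All k-mers that occur in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def primers_avoiding_rrna(rrna_seqs, k=6):
    """Keep only k-mer primers that cannot anneal perfectly to the rRNA."""
    blocked = set()
    for seq in rrna_seqs:
        dna = seq.upper().replace("U", "T")  # accept RNA or DNA alphabet
        # A primer anneals where its reverse complement occurs in the template
        blocked |= {reverse_complement(kmer) for kmer in kmers_in(dna, k)}
    return all_kmers(k) - blocked


if __name__ == "__main__":
    toy_rrna = ["AAAAAAC"]  # made-up sequence for illustration only
    kept = primers_avoiding_rrna(toy_rrna)
    print(len(all_kmers()), "hexamers in total,", len(kept), "kept")
```

With real rRNA sequences, which are thousands of nucleotides long, most hexamers are eliminated, consistent with only 749 of 4096 being retained in the study described above.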
Several methods for preparing single-cell RNA libraries have been published recently. One method uses polynucleotide tailing of the first-strand cDNA combined with a template-switching reaction, so that the first-strand cDNA product can be amplified with universal PCR primers. This approach is shown in Figure 4B and has been incorporated into a commercial kit. Another method, CEL-Seq, introduces a T7 promoter sequence at the 5' end of the cDNA and then amplifies linearly by in vitro transcription.
A single cell typically contains about 10 pg of total RNA, of which polyA RNA accounts for only about 0.1 pg. These methods therefore require some degree of whole-transcriptome amplification to generate enough material for library construction. The drawback of such extensive amplification is that it introduces a great deal of technical noise, a problem that has not yet been solved.
Finally, ribosome profiling provides a snapshot of the pool of cellular mRNA transcripts being translated at a given moment. The method treats cell lysates with ribonuclease, leaving only the roughly 30-nucleotide regions of mRNA that are protected by ribosomes. The ribosomes are purified by sucrose density gradient centrifugation, and the protected mRNA fragments are then extracted from them. Another new application of RNA sequencing is SHAPE-Seq, which uses acylating reagents that preferentially modify unpaired bases to probe RNA secondary structure. Modified RNA and an unmodified control are reverse transcribed, the resulting cDNA fragments are sequenced, and comparing the two reveals base-pairing information at nucleotide resolution.