Key Points
-
Most of the human genome consists of DNA that does not code for proteins.
-
Annotating functional regions in the non-coding genome involves two complementary analysis techniques: comparative analysis, which involves examining DNA sequences, and functional analysis, which involves examining the output of functional genomics experiments.
-
With the exponential increase in DNA sequence data, it is now possible to compare sequences within a single human haplotype, between cell types in a single person, across the human population and between species. Integrating the analysis across all these scales is useful.
-
There are two main methods of sequence comparison: scanning for regions of high sequence similarity above some operational threshold, and building statistical models of sequence families. Model-based sequence analysis can incorporate more biological knowledge than sequence similarity scans and provide more refined results.
-
The output of most high-throughput functional genomics experiments can be treated as a continuous signal mapped onto the genome and analysed with a standardized signal processing approach.
-
Signal processing involves smoothing the raw signal, then thresholding and segmenting the signal into discrete annotated blocks.
-
Integration of multiple types of signals generates a progression of more and more complex annotations; these smaller annotations are clustered into groups and then into functional networks that begin to represent the state of biological knowledge about the genome.
-
A chronic problem with annotation based on functional genomics data is the lack of sufficient validation by more low-throughput methods.
-
Techniques such as paired-end sequencing and chromosome conformation capture (and its descendants) enable annotation of connectivity between elements and necessitate a move beyond the one-dimensional signal approach to annotation.
Abstract
Most of the human genome consists of non-protein-coding DNA. Recently, progress has been made in annotating these non-coding regions through the interpretation of functional genomics experiments and comparative sequence analysis. One can conceptualize functional genomics analysis as involving a sequence of steps: turning the output of an experiment into a 'signal' at each base pair of the genome; smoothing this signal and segmenting it into small blocks of initial annotation; and then clustering these small blocks into larger derived annotations and networks. Finally, one can relate functional genomics annotations to conserved units and measures of conservation derived from comparative sequence analysis.
This is a preview of subscription content, access via your institution
Access options
Subscribe to this journal
Receive 12 print issues and online access
$189.00 per year
only $15.75 per issue
Buy this article
- Purchase on Springer Link
- Instant access to full article PDF
Prices may be subject to local taxes which are calculated during checkout
Similar content being viewed by others
References
Britten, R. J. & Kohne, D. E. Repeated sequences in DNA. Science 161, 529–540 (1968).
Ohno, S. So much 'junk' DNA in our genome. Brookhaven Symp. Biol. 23, 366–370 (1972).
Lewin, R. Proposal to sequence the human genome stirs debate. Science 232, 1598–1600 (1986).
Robertson, M. The proper study of mankind. Nature 322, 11 (1986).
Choi, M. et al. Genetic diagnosis by whole exome capture and massively parallel DNA sequencing. Proc. Natl Acad. Sci. USA 106, 19096–19101 (2009).
Gnirke, A. et al. Solution hybrid selection with ultra-long oligonucleotides for massively parallel targeted sequencing. Nature Biotech. 27, 182–189 (2009).
Ng, S. B. et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272–276 (2009).
International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Venter, J. C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
Ghildiyal, M. & Zamore, P. D. Small silencing RNAs: an expanding universe. Nature Rev. Genet. 10, 94–108 (2009).
Bejerano, G. et al. Ultraconserved elements in the human genome. Science 304, 1321–1325 (2004).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Pennacchio, L. A. et al. In vivo enhancer analysis of human conserved non-coding sequences. Nature 444, 499–502 (2006).
Kleinjan, D. A. & van Heyningen, V. Long-range control of gene expression: emerging mechanisms and disruption in disease. Am. J. Hum. Genet. 76, 8–32 (2005).
Yeager, M. et al. Comprehensive resequence analysis of a 136 kb region of human chromosome 8q24 associated with prostate and colon cancers. Hum. Genet. 124, 161–170 (2008).
Visel, A., Rubin, E. M. & Pennacchio, L. A. Genomic views of distant-acting enhancers. Nature 461, 199–205 (2009).
Lupski, J. R. Genomic disorders: structural features of the genome can lead to DNA rearrangements and human disease traits. Trends Genet. 14, 417–422 (1998). A prescient exposition of the important link between disease and structural variation in the human genome.
Kidd, J. M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008). The first high-resolution sequence map of human structural variation.
Lupski, J. R. & Stankiewicz, P. Genomic disorders: molecular mechanisms for rearrangements and conveyed phenotypes. PLoS Genet. 1, e49 (2005).
The ENCODE Project Consortium. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447, 799–816 (2007). A comprehensive overview of what was learned during the ENCODE pilot project.
Celniker, S. E. et al. Unlocking the secrets of the genome. Nature 459, 927–930 (2009).
Searls, D. B. The language of genes. Nature 420, 211–217 (2002).
Whitfield, J. Across the curious parallel of language and species evolution. PLoS Biol. 6, e186 (2008).
Pagel, M. Human language as a culturally transmitted replicator. Nature Rev. Genet. 10, 405–415 (2009).
Saha, S., Bridges, S., Magbanua, Z. V. & Peterson, D. G. Empirical comparison of ab initio repeat finding programs. Nucleic Acids Res. 36, 2284–2294 (2008).
Washietl, S. et al. Structured RNAs in the ENCODE selected regions of the human genome. Genome Res. 17, 852–864 (2007).
Harrow, J. et al. GENCODE: producing a reference annotation for ENCODE. Genome Biol. 7, S4 (2006).
Zhang, Z. L. et al. PseudoPipe: an automated pseudogene identification pipeline. Bioinformatics 22, 1437–1439 (2006).
Karro, J. E. et al. Pseudogene.org: a comprehensive database and comparison platform for pseudogene annotation. Nucleic Acids Res. 35, D55–D60 (2007).
Durbin, R., Eddy, S., Krogh, A. & Mitchison, G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge Univ. Press, 1998).
Miller, W., Makova, K. D., Nekrutenko, A. & Hardison, R. C. Comparative genomics. Annu. Rev. Genomics Hum. Genet. 5, 15–56 (2004).
Margulies, E. H. & Birney, E. Approaches to comparative sequence analysis: towards a functional view of vertebrate genomes. Nature Rev. Genet. 9, 303–313 (2008).
Ren, B. et al. Genome-wide location and function of DNA binding proteins. Science 290, 2306–2309 (2000).
Iyer, V. R. et al. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature 409, 533–538 (2001).
Lee, T. I., Johnstone, S. E. & Young, R. A. Chromatin immunoprecipitation and microarray-based analysis of protein location. Nature Protoc. 1, 729–748 (2006).
Johnson, D. S., Mortazavi, A., Myers, R. M. & Wold, B. Genome-wide mapping of in vivo protein–DNA interactions. Science 316, 1497–1502 (2007).
Robertson, G. et al. Genome-wide profiles of STAT1 DNA association using chromatin immunoprecipitation and massively parallel sequencing. Nature Methods 4, 651–657 (2007).
Park, P. J. ChIP–seq: advantages and challenges of a maturing technology. Nature Rev. Genet. 10, 669–680 (2009).
Bertone, P. et al. Global identification of human transcribed sequences with genome tiling arrays. Science 306, 2242–2246 (2004).
Cheng, J. et al. Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science 308, 1149–1154 (2005).
Mortazavi, A., Williams, B. A., McCue, K., Schaeffer, L. & Wold, B. Mapping and quantifying mammalian transcriptomes by RNA–seq. Nature Methods 5, 621–628 (2008).
Nagalakshmi, U. et al. The transcriptional landscape of the yeast genome defined by RNA sequencing. Science 320, 1344–1349 (2008).
Sultan, M. et al. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321, 956–960 (2008).
Wang, Z., Gerstein, M. & Snyder, M. RNA–seq: a revolutionary tool for transcriptomics. Nature Rev. Genet. 10, 57–63 (2009).
Karolchik, D. et al. The UCSC Genome Browser Database. Nucleic Acids Res. 31, 51–54 (2003).
Lister, R. et al. Human DNA methylomes at base resolution show widespread epigenomic differences. Nature 462, 315–322 (2009).
Bernstein, B. E. et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell 125, 315–326 (2006).
Barski, A. et al. High-resolution profiling of histone methylations in the human genome. Cell 129, 823–837 (2007).
Mikkelsen, T. S. et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature 448, 553–560 (2007).
Royce, T. E., Rozowsky, J. S. & Gerstein, M. B. Assessing the need for sequence-based normalization in tiling microarray experiments. Bioinformatics 23, 988–997 (2007).
Li, H., Ruan, J. & Durbin, R. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18, 1851–1858 (2008).
Li, R. Q., Li, Y. R., Kristiansen, K. & Wang, J. SOAP: short oligonucleotide alignment program. Bioinformatics 24, 713–714 (2008).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Zhang, Z. D., Rozowsky, J., Snyder, M., Chang, J. & Gerstein, M. Modeling ChIP sequencing in silico with applications. PLoS Comput. Biol. 4, e1000158 (2008).
Rozowsky, J. et al. PeakSeq enables systematic scoring of ChIP–seq experiments relative to controls. Nature Biotech. 27, 66–75 (2009).
Auerbach, R. K. et al. Mapping accessible chromatin regions using Sono-Seq. Proc. Natl Acad. Sci. USA 106, 14926–14931 (2009).
Kapranov, P. et al. Large-scale transcriptional activity in chromosomes 21 and 22. Science 296, 916–919 (2002).
Rinn, J. L. et al. The transcriptional activity of human Chromosome 22. Genes Dev. 17, 529–540 (2003).
Kapranov, P. et al. RNA maps reveal new RNA classes and a possible function for pervasive transcription. Science 316, 1484–1488 (2007).
Ponjavic, J., Ponting, C. P. & Lunter, G. Functionality or transcriptional noise? Evidence for selection within long noncoding RNAs. Genome Res. 17, 556–565 (2007).
Struhl, K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nature Struct. Mol. Biol. 14, 103–105 (2007).
van Bakel, H., Nislow, C., Blencowe, B. J. & Hughes, T. R. Most dark matter transcripts are associated with known genes. PLoS Biol. 8, e1000371 (2010). A recent reappraisal, based on RNA–seq and tiling-array data, of the degree of pervasive transcription in the human genome.
Farnham, P. J. Insights from genomic profiling of transcription factors. Nature Rev. Genet. 10, 605–616 (2009).
Pinkel, D. et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nature Genetics 20, 207–211 (1998).
Gokcumen, O. & Lee, C. Copy number variants (CNVs) in primate species using array-based comparative genomic hybridization. Methods 49, 18–25 (2009).
Stathopoulos, A., Van Drenth, M., Erives, A., Markstein, M. & Levine, M. Whole-genome analysis of dorsal-ventral patterning in the Drosophila embryo. Cell 111, 687–701 (2002). An elegant study of the effect of transcription factor concentration on the arrangement of cis -regulatory elements at target genes.
Tantin, D., Gemberling, M., Callister, C. & Fairbrother, W. High-throughput biochemical analysis of in vivo location data reveals novel distinct classes of POU5F1(Oct4)/DNA complexes. Genome Res. 18, 631–639 (2008).
Zhang, Z. D. D. et al. Statistical analysis of the genomic distribution and correlation of regulatory elements in the ENCODE regions. Genome Res. 17, 787–797 (2007).
Rozowsky, J. S. et al. The DART classification of unannotated transcription within the ENCODE regions: associating transcription with known and novel loci. Genome Res. 17, 732–745 (2007).
Bailey, J. A. & Eichler, E. E. Primate segmental duplications: crucibles of evolution, diversity and disease. Nature Rev. Genet. 7, 552–564 (2006).
Kim, P. M. et al. Analysis of copy number variants and segmental duplications in the human genome: evidence for a change in the process of formation in recent evolutionary history. Genome Res. 18, 1865–1874 (2008).
Zheng, D. et al. Pseudogenes in the ENCODE regions: consensus annotation, analysis of transcription, and evolution. Genome Res. 17, 839–851 (2007).
Tam, O. H. et al. Pseudogene-derived small interfering RNAs regulate gene expression in mouse oocytes. Nature 453, 534–538 (2008).
Watanabe, T. et al. Endogenous siRNAs from naturally formed dsRNAs regulate transcripts in mouse oocytes. Nature 453, 539–543 (2008).
Sasidharan, R. & Gerstein, M. Protein fossils live on as RNA. Nature 453, 729–731 (2008).
Ahituv, N. et al. Deletion of ultraconserved elements yields viable mice. PLoS Biol. 5, e234 (2007).
Monroe, D. Genomic clues to DNA treasure sometimes lead nowhere. Science 325, 142–143 (2009).
Lareau, L. F., Inada, M., Green, R. E., Wengrod, J. C. & Brenner, S. E. Unproductive splicing of SR genes associated with highly conserved and ultraconserved DNA elements. Nature 446, 926–929 (2007).
Baer, C. F., Miyamoto, M. M. & Denver, D. R. Mutation rate variation in multicellular eukaryotes: causes and consequences. Nature Rev. Genet. 8, 619–631 (2007).
Guttman, M. et al. Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458, 223–227 (2009). A good example of the benefits of integrating comparative and functional analysis, which in this case led to the discovery of a new class of functional NCEs.
Khalil, A. M. et al. Many human large intergenic noncoding RNAs associate with chromatin-modifying complexes and affect gene expression. Proc. Natl Acad. Sci. USA 106, 11667–11672 (2009).
Clarke, J. et al. Continuous base identification for single-molecule nanopore DNA sequencing. Nature Nanotechnol. 4, 265–270 (2009).
Eid, J. et al. Real-time DNA sequencing from single polymerase molecules. Science 323, 133–138 (2009).
Du, J. et al. A supervised hidden markov model framework for efficiently segmenting tiling array data in transcriptional and ChIP–chip experiments: systematically incorporating validated biological knowledge. Bioinformatics 22, 3016–3024 (2006).
Geiss, G. K. et al. Direct multiplexed measurement of gene expression with color-coded probe pairs. Nature Biotech. 26, 317–325 (2008).
Dekker, J., Rippe, K., Dekker, M. & Kleckner, N. Capturing chromosome conformation. Science 295, 1306–1311 (2002).
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Fullwood, M. J. et al. An oestrogen-receptor-a-bound human chromatin interactome. Nature 462, 58–64 (2009).
Dostie, J. et al. Chromosome Conformation Capture Carbon Copy (5C): a massively parallel solution for mapping interactions between genomic elements. Genome Res. 16, 1299–1309 (2006).
Lieberman-Aiden, E. et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. Science 326, 289–293 (2009).
Duan, Z. et al. A three-dimensional model of the yeast genome. Nature 465, 363–367 (2010). References 91 and 92 are two examples of the power of using long-distance connectivity data in the genome to map genome structure.
Clamp, M. et al. Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl Acad. Sci. USA 104, 19428–19433 (2007).
King, M. C. & Wilson, A. C. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975).
Gregory, T. R. Synergy between sequence and size in large-scale genomics. Nature Rev. Genet. 6, 699–708 (2005).
Galgoczy, D. J. et al. Genomic dissection of the cell-type-specification circuit in Saccharomyces cerevisiae. Proc. Natl Acad. Sci. USA 101, 18069–18074 (2004).
Sulston, J. E., Schierenberg, E., White, J. G. & Thomson, J. N. The embryonic-cell lineage of the nematode Caenorhabditis elegans. Dev. Biol. 100, 64–119 (1983).
Vickaryous, M. K. & Hall, B. K. Human cell type diversity, evolution, development, and classification with special reference to cells derived from the neural crest. Biol. Rev. Camb. Philos. Soc. 81, 425–455 (2006).
Arendt, D. The evolution of cell types in animals: emerging principles from molecular studies. Nature Rev. Genet. 9, 868–882 (2008).
Schlotterer, C. & Tautz, D. Slippage synthesis of simple sequence DNA. Nucleic Acids Res. 20, 211–215 (1992).
Amor, D. J. & Choo, K. H. A. Neocentromeres: role in human disease, evolution, and centromere study. Am. J. Hum. Genet. 71, 695–714 (2002).
Vinces, M. D., Legendre, M., Caldara, M., Hagihara, M. & Verstrepen, K. J. Unstable tandem repeats in promoters confer transcriptional evolvability. Science 324, 1213–1216 (2009).
Mills, R. E., Bennett, E. A., Iskow, R. C. & Devine, S. E. Which transposable elements are active in the human genome? Trends Genet. 23, 183–191 (2007).
Zhang, Z., Frankish, A., Hunt, T., Harrow, J. & Gerstein, M. Identification and analysis of unitary pseudogenes: historic and contemporary gene losses in humans and other primates. Genome Biol. 11, R26 (2010).
Lagos-Quintana, M., Rauhut, R., Lendeckel, W. & Tuschl, T. Identification of novel genes coding for small expressed RNAs. Science 294, 853–858 (2001).
Lau, N. C., Lim, L. P., Weinstein, E. G. & Bartel, D. P. An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans. Science 294, 858–862 (2001).
Lee, R. C. & Ambros, V. An extensive class of small RNAs in Caenorhabditis elegans. Science 294, 862–864 (2001).
Brennecke, J. et al. Discrete small RNA-generating loci as master regulators of transposon activity in Drosophila. Cell 128, 1089–1103 (2007).
Carmell, M. A. et al. MIWI2 is essential for spermatogenesis and repression of transposons in the mouse male germline. Dev. Cell 12, 503–514 (2007).
Vaquerizas, J. M., Kummerfeld, S. K., Teichmann, S. A. & Luscombe, N. M. A census of human transcription factors: function, expression and evolution. Nature Rev. Genet. 10, 252–263 (2009). A useful synthesis of the current state of knowledge about human transcription factors.
Maston, G. A., Evans, S. K. & Green, M. R. Transcriptional regulatory elements in the human genome. Annu. Rev. Genomics Hum. Genet. 7, 29–59 (2006).
Bovee, D. et al. Closing gaps in the human genome with fosmid resources generated from multiple individuals. Nature Genet. 40, 96–101 (2008).
Kaiser, J. A plan to capture human diversity in 1000 genomes. Science 319, 395–395 (2008).
Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, 2113–2144 (2007).
Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nature Methods 6, 677–681 (2009).
Hormozdiari, F., Alkan, C., Eichler, E. E. & Sahinalp, S. C. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res. 19, 1270–1278 (2009).
Lee, S., Hormozdiari, F., Alkan, C. & Brudno, M. MoDIL: detecting small indels from clone-end sequencing with mixtures of distributions. Nature Methods 6, 473–474 (2009).
Kidd, J. M. et al. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nature Methods 7, 365–371 (2010). The authors report the characterization of new insertion sequences relative to the human reference genome; this study is a useful addition to the field as it moves towards a series of reference genomes for sub-populations.
Lam, H. Y. K. et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nature Biotech. 28, 47–55 (2010).
Li, R. Q. et al. Building the sequence map of the human pan-genome. Nature Biotech. 28, 57–63 (2010).
Griffiths-Jones, S., Saini, H. K., van Dongen, S. & Enright, A. J. miRBase: tools for microRNA genomics. Nucleic Acids Res. 36, D154–D158 (2008).
Iafrate, A. J. et al. Detection of large-scale variation in the human genome. Nature Genet. 36, 949–951 (2004).
Acknowledgements
The authors thank members of the Gerstein laboratory for helpful discussions and careful reading of the manuscript. We acknowledge support from the US NIH and from the Albert L. Williams Professorship funds.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Related links
Related links
FURTHER INFORMATION
Berkeley Drosophila Genome Project
Glossary
- Targeted exome sequencing
-
A technique that involves filtering genomic DNA by capturing regions of interest (often protein-coding exons) on a microarray, then sequencing the captured DNA using next-generation techniques.
- Structural variants
-
Chromosomal rearrangements (deletions, duplications, novel sequence insertions or inversions) that are inherited and polymorphic across the human population. Structural variants are by definition longer than SNPs and can be hundreds of thousands of base pairs long.
- Copy-number variants
-
Structural variants that arise from deletion or duplication and thus lead to a change in copy number of the underlying region of the genome.
- Segmental duplication
-
The operational definition of a segmental duplication rests on finding two regions in the same genome ranging in length from a thousand to several million nucleotides with at least 90% sequence identity. Segmental duplications are inherited but not necessarily polymorphic across the human population.
- Pseudogenes
-
Copies of protein-coding genes with mutations that disrupt their coding sequence and demolish their original protein-coding function.
- Syntenic blocks
-
Segments that align between genome sequences from two species and that are believed to define an orthologous relationship.
- DNA-based transposons
-
Transposable DNA elements that rely on a transposase enzyme to excise themselves from one region of the genome and insert themselves into a different region, without increasing in copy number.
- RNA-based retrotransposons
-
Transposable elements generated when reverse transcriptase enzymes copy RNA elements into DNA and insert the DNA copies back into the genome.
- Duplicated pseudogenes
-
Pseudogenes that result from whole-genome or segmental duplications, in which one copy maintains its ancestral function and the other copy degrades into a pseudogene.
- Processed pseudogenes
-
Pseudogenes that arise when the mRNA of a parent gene is retrotranscribed back into DNA and inserted into the genome.
- Unitary pseudogenes
-
A rare class of pseudogene in which a single-copy parent gene becomes non-functional.
- Chromatin immunoprecipitation
-
(ChIP.) A technique for identifying potential regulatory sequences that are bound by the protein of interest. Soluble DNA–chromatin extracts (complexes of DNA and protein) are isolated by using antibodies that recognize specific DNA-binding proteins. In ChIP–chip, the ChIP step is followed by microarray analysis, whereas in ChIP–seq, it is followed by sequencing.
- Tiling arrays
-
A class of microarray in which probes of a specific length and spacing provide uniform coverage of an entire genome or portion of a genome to a desired resolution.
- RNA sequencing
-
The use of high-throughput sequencing of RNA that has been reverse-transcribed into DNA to characterize the set of RNA transcripts produced by a cell.
- Smoothing
-
The process of filtering noise from a signal by removing fine-scale variation.
- Thresholding
-
The process of discretizing a continuous signal by choosing a signal value above which the signal is considered 'on' or 'active' and below which the signal is considered 'off' or 'inactive'.
- Segmenting
-
The result of thresholding in signal processing — that is, segments are those regions defined as 'on' or 'active' after discretization of the signal.
- Heterochromatin
-
Highly compact and therefore inactive regions of the genome. Largely composed of repetitive DNA, heterochromatin forms dark bands after Giemsa staining.
- Euchromatin
-
The lightly staining regions of the genome that are generally decondensed during interphase and contain transcriptionally active regions.
- Fosmid
-
A low-copy vector for the construction of stable genomic libraries that uses the Escherichia coli F-factor origin of replication. Each fosmid clone can store ∼40 kb of library DNA. Cloned sequences are more stable in fosmids than in high-copy vectors.
- Specificity
-
A measure of the proportion of true negatives correctly identified as such (for example, the percentage of healthy people who are identified as not having a disease).
- Regulatory forests
-
Regions of the genome that are enriched with binding sites for regulatory factors, such as transcription factors.
- Principal components analysis
-
A statistical method used to simplify data sets by transforming a series of correlated variables into a smaller number of uncorrelated factors.
- Non-allelic homologous recombination
-
Recombination between segmental duplications that leads to local duplication, deletion or inversion of genome sequence.
- Ultraconserved elements
-
Operationally defined as non-coding elements that are hundreds of base pairs long and 100% identical across human, mouse and rat genomes.
- Sensitivity
-
A measure of the proportion of true positives that are correctly identified as such (for example, the percentage of sick people who are identified as having a disease).
- Paired-end sequencing
-
Determination of the sequence at both ends of a fragment of DNA of known size.
- Chromosome conformation capture
-
A technique used to study the long-distance interactions between genomic regions, which in turn can be used to study the three-dimensional architecture of chromosomes within a cell nucleus.
Rights and permissions
About this article
Cite this article
Alexander, R., Fang, G., Rozowsky, J. et al. Annotating non-coding regions of the genome. Nat Rev Genet 11, 559–571 (2010). https://doi.org/10.1038/nrg2814
Published:
Issue Date:
DOI: https://doi.org/10.1038/nrg2814
This article is cited by
-
Disulfidptosis-associated long non-coding RNA signature predicts the prognosis, tumor microenvironment, and immunotherapy and chemotherapy options in colon adenocarcinoma
Cancer Cell International (2023)
-
Prediction of lncRNA functions using deep neural networks based on multiple networks
BMC Genomics (2023)
-
Repetitive DNA sequence detection and its role in the human genome
Communications Biology (2023)
-
Non-coding RNA’s prevalence as biomarkers for prognostic, diagnostic, and clinical utility in breast cancer
Functional & Integrative Genomics (2023)
-
Accurate prediction of functional states of cis-regulatory modules reveals common epigenetic rules in humans and mice
BMC Biology (2022)