Last time, we gave a brief historical account of research on the Y chromosome. Initially, up to 17 traits were attributed to the chromosome, but Stern and later Ohno argued that it was instead largely devoid of gene content. They were mostly right, and we will see why, but that view is now a bit simplistic, and correcting it is what this series aims to do. This post provides a brief comparison of the human X and Y chromosomes and lays out some of the Y’s basic architecture.
As you can see in Figure 1, the Y chromosome (on the right) is puny. It really is kind of pathetic once you look at it. The numbers reflect the physical discrepancy between the two chromosomes as well (Table 1): the Y chromosome is roughly 29% the size of the X in total base length, an even smaller 15% in euchromatin length, and it has less than 10% of the X’s gene count! Keep in mind that the Y used to be the same size as the X, just like the members of any other homologous pair of chromosomes!
So let’s discuss what little there is.
Much of the Y chromosome was ignored during the Human Genome Project because it is chock full of repetitive sequences, transposons, and other elements that required a more detailed analysis than the one carried out in 2000. In 2003, Skaletsky et al. (from the Page lab at the Whitehead Institute) published their sequence of the Y’s euchromatin along with an initial analysis of what they found.
The major discovery was that the Y is composed of a “mosaic of discrete sequence classes.” Rather than being one uniform stretch of DNA, the chromosome has been shaped differently by evolution in different places, leaving distinct classes, each with its own defining characteristics!
Pseudoautosomal regions. The pseudoautosomal regions (PARs) are a historically recognized class located at either end of the Y, and they are the only parts of the Y that regularly recombine with the X chromosome. Over 25 genes have been identified in these regions (Ross et al., 2005); they are not included in the count of 78 genes above. Apart from a possible later post, this series will set the PARs aside.
Because the rest of the Y chromosome is assumed not to recombine with the X, Skaletsky et al. (2003) call this majority of the chromosome the male-specific region of the Y, or MSY. We will see later that the story of the Y chromosome and recombination is actually much more complicated!
X-transposed class. Sometime after the human-chimpanzee divergence, within the last six million years, a 3.4 Mb X-to-Y transposition occurred in the human lineage. Since then, an inversion has split the region in two. The region is functionally inert aside from housing the two genes that traveled over from the X chromosome. Although it still shares high sequence similarity with its origin, Xq21, the region does not participate in recombination (Skaletsky et al., 2003).
X-degenerate class. At 8.6 Mb, over twice the size of the X-transposed region, the X-degenerate region is split into eight distinct blocks across both arms of the Y. It contains only 13 single-copy genes and 14 single-copy pseudogenes, all homologous to the X, with nucleotide similarities ranging from 60% to 96% (Skaletsky et al., 2003). Furthermore, all of the Y’s “ubiquitously expressed” (non-testis-specific) genes are housed within this region. In other words, the region is like a decaying X (hence the name) and reflects the X and Y’s shared autosomal ancestry. Interestingly, although the X-degenerate region holds all of the ubiquitously expressed genes, it also contains SRY, the sex-determining gene.
Ampliconic class. The final sequence class, the ampliconic class, is more complex than the previous two: it contains more genes and has a stranger architecture. This 10.2 Mb class is broken into seven segments and has the highest gene density in the MSY. “Amplicon” is a generic term grouping together the highly repetitive, MSY-specific units. To identify them, Skaletsky et al. slid a 50 kb window along the euchromatic sequence in 1 kb steps and compared each window to the rest of the euchromatic sequence; any window showing over 50% similarity to another sequence was deemed ampliconic (blue regions in Figure 3). Although this threshold seems arbitrary, 60% of the region shows over 99.9% similarity (Skaletsky et al., 2003).
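The window scan can be illustrated with a toy sketch. This is not Skaletsky et al.’s pipeline: the function names and parameters below are hypothetical, the 50 kb window and 1 kb step are shrunk to a few bases so the example runs on a tiny made-up sequence, and “similarity” is assumed to mean the fraction of identical bases in an ungapped, same-orientation comparison. A real analysis would use proper alignments and would also check reverse-complement matches, since palindrome arms are inverted copies of each other.

```python
def identity(a: str, b: str) -> float:
    """Fraction of positions at which two equal-length sequences match (no gaps)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def flag_ampliconic_windows(seq: str, window: int, step: int, threshold: float = 0.5):
    """Toy version of the sliding-window amplicon scan (hypothetical helper).

    Windows start every `step` bases. Each one is compared against every other
    window-sized stretch of `seq` that does not overlap it; the window is flagged
    if any comparison exceeds `threshold` identity.
    """
    last_start = len(seq) - window
    flagged = []
    for i in range(0, last_start + 1, step):
        query = seq[i:i + window]
        for j in range(last_start + 1):
            if abs(i - j) < window:  # skip the window itself and overlapping stretches
                continue
            if identity(query, seq[j:j + window]) > threshold:
                flagged.append(i)
                break
    return flagged


# Made-up example: a 10-base repeat, a 10-base unique spacer, then the repeat again.
repeat = "ACGGTCATGC"
spacer = "TTTTTTTTTT"
toy_seq = repeat + spacer + repeat
hits = flag_ampliconic_windows(toy_seq, window=10, step=5)
```

Here the windows starting at 0 and 20 (the two copies of the repeat) are flagged, while the window at 10 (the spacer) is not. Comparing every window against every position is quadratic, which is fine for a toy but not for a whole chromosome arm.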
The reason for these high sequence similarities is that the ampliconic region is mostly composed of eight large palindromes spanning 5.7 Mb (roughly 25% of the MSY) across the long arm. The two arms of each palindrome show arm-to-arm sequence similarities between 99.94% and 99.997%. The largest palindrome, P1, is a staggering 2.9 Mb long and contains two 24 kb palindromes within itself. Each palindrome has a central spacer ranging from 2 to 170 kb, and the palindromes form hairpins during gene conversion (which we will discuss in a later post; it’s really cool!). Six of these palindromes contain protein-coding genes, with at least two copies per gene (one on each arm). Of the nine ampliconic gene families, six are found exclusively in the palindromes.
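To make the arm-to-arm figures concrete, here is a minimal sketch of how such a comparison works. In DNA, a palindrome is an inverted repeat: one arm matches the reverse complement of the other, so the right arm must be reverse-complemented before it is compared with the left. The helper names are hypothetical, and the sketch assumes an idealized layout (left arm, spacer, right arm) with equal-length arms and no insertions or deletions, which real palindromes do not guarantee.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G) and read the result backwards."""
    return seq.translate(COMPLEMENT)[::-1]


def arm_identity(palindrome: str, arm_len: int) -> float:
    """Arm-to-arm identity for a sequence laid out as left arm + spacer + right arm.

    The right arm is an inverted copy of the left, so it is reverse-complemented
    and then compared base by base (ungapped) with the left arm.
    """
    left = palindrome[:arm_len]
    right = palindrome[len(palindrome) - arm_len:]
    matches = sum(a == b for a, b in zip(left, reverse_complement(right)))
    return matches / arm_len


# Made-up example: a 12-base arm, a 4-base spacer, and a perfect inverted copy.
arm = "ACGGTCATGCAT"
perfect = arm + "GGGG" + reverse_complement(arm)

# Introduce a single mismatch at the first base of the right arm.
right = reverse_complement(arm)
mutated = arm + "GGGG" + ("C" if right[0] != "C" else "A") + right[1:]
```

The perfect palindrome scores 1.0 and the mutated one 11/12. For scale, 99.94% identity corresponds to roughly one mismatch per 1,700 bases, and 99.997% to roughly one per 33,000 bases.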
As for gene content, 135 of the MSY’s 156 transcription units are found here. Unlike the genes of the X-degenerate region, these genes are testis-specific and present in multiple copies (grouped into gene families). The 60 protein-coding genes fall into nine families, with copy numbers mostly ranging from two to six (TSPY, with 35 copies, is the exception). Skaletsky et al. note, however, that because this region is so repetitive, copy numbers may vary from individual to individual.
Instead of palindromes, the short arm of the Y chromosome contains what Skaletsky et al. call “transcriptionally active tandem arrays”: copies of a transcription unit repeated back to back. TSPY (testis-specific protein, Y-linked) is carried on a 20.4 kb repeat unit found in 35 consecutive copies, which makes the array about 700 kb long. Interestingly, while one strand codes for TSPY, the other strand codes for a previously unidentified transcription unit called CYorf16, whose function is still unknown. In other words, one sequence codes for a protein, and its reverse complement codes for a separate transcription unit. This 35-unit cluster is the largest protein-coding tandem array identified so far in the human genome (Skaletsky et al., 2003). Additionally, another tandem array of non-coding transcription units, called TTTYn, is approximately 622 kb long.
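The array-length figure is simple arithmetic, and a tandem array itself is easy to picture as one unit repeated back to back. The sketch below counts exact consecutive copies of a unit; the function name is hypothetical, and since real arrays contain slightly divergent copies, an exact-match counter like this one would undercount them.

```python
def count_tandem_copies(seq: str, unit: str, start: int = 0) -> int:
    """Count back-to-back exact copies of `unit` in `seq`, beginning at `start`."""
    if not unit:
        raise ValueError("unit must be non-empty")
    copies = 0
    pos = start
    while seq.startswith(unit, pos):
        copies += 1
        pos += len(unit)
    return copies


# Array length from the figures quoted above, in whole base pairs.
TSPY_UNIT_BP = 20_400   # 20.4 kb repeat unit
TSPY_COPIES = 35
tspy_array_bp = TSPY_UNIT_BP * TSPY_COPIES  # 714,000 bp, i.e. about 714 kb

# Made-up toy array: three copies of a 4-base unit flanked by other bases.
toy_array = "TT" + "GATC" * 3 + "GA"
```

The 714 kb result is consistent with the “about 700 kb” quoted above.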
Interspersed elements. The MSY is also full of interspersed repeat elements, which make up approximately 47% of its sequence, about 3 percentage points above the genomic average (Table 3). However, the density of repeat elements in the ampliconic region is roughly 9% lower than in the rest of the genome. The X-transposed region itself is 60% interspersed repeats. We will perhaps discuss this in a later post; my understanding of these elements is rather limited, and I will need to read up some more.
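A figure like “47% interspersed repeats” is a coverage fraction: the share of bases that fall inside annotated repeat elements. A minimal sketch of that calculation, assuming repeat annotations arrive as half-open (start, end) intervals (the function name and the intervals below are hypothetical, not real MSY annotations), merges overlapping intervals so that no base is counted twice:

```python
def covered_fraction(intervals, region_len: int) -> float:
    """Fraction of [0, region_len) covered by half-open (start, end) intervals.

    Intervals are sorted and merged, so bases covered by overlapping
    annotations are counted only once; intervals are clipped to the region.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        start, end = max(start, 0), min(end, region_len)
        if start >= end:  # empty after clipping
            continue
        if cur_end is None or start > cur_end:
            if cur_end is not None:  # close the previous merged block
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:  # overlaps or touches the current block
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / region_len


# Hypothetical repeat annotations on a made-up 100-base region.
repeats = [(0, 10), (5, 20), (50, 60)]
frac = covered_fraction(repeats, 100)
```

Here the overlapping intervals merge into 0 to 20, plus 50 to 60, so 30 of 100 bases (0.3) are covered.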
The work by Skaletsky et al. (2003) shows that the Y chromosome is much more complicated than previously thought. This post was a bit longer and more technical than I really wanted, but laying down this background is important for the rest of the series. Just remember these key facts:
1) There are several sequence classes found on the Y chromosome.
2) The X-transposed region is a relatively inert section that was transposed from the X to the Y.
3) The X-degenerate region is what remains of a decaying X chromosome. It contains single-copy genes that are expressed throughout the body (as expected of X chromosome genes). The region also contains SRY.
4) The ampliconic region shows high sequence similarity because most of it is composed of massive palindromes. The genes within this region are found in multiple copies and are testis-specific.
The next post will highlight the X-degenerate region, which is much more interesting than this post conveyed. After that, we will focus on the ampliconic region, its huge palindromes, and how they challenge the idea of a non-recombining Y. Don’t worry; the best stuff is yet to come!
Ross MT, Grafham DV, Coffey AJ, Scherer S, Beck S, Rogers J, & Bentley DR (2005). The DNA sequence of the human X chromosome. Nature, 434, 325-337. DOI: 10.1038/nature03440
Skaletsky H, Kuroda-Kawaguchi T, Minx PJ, Cordum HS, Hillier L, Brown LG, Repping S, Pyntikova T, Ali J, Bieri T, Chinwalla A, Delehaunty A, Delehaunty K, Du H, Fewell G, Fulton L, Fulton R, Graves T, Hou SF, Latrielle P, Leonard S, Mardis E, Maupin R, McPherson J, Miner T, Nash W, Nguyen C, Ozersky P, Pepin K, Rock S, Rohlfing T, Scott K, Schultz B, Strong C, Tin-Wollam A, Yang SP, Waterston RH, Wilson RK, Rozen S, & Page DC (2003). The male-specific region of the human Y chromosome is a mosaic of discrete sequence classes. Nature, 423 (6942), 825-837. PMID: 12815422