Species-specific expansions of gene duplicates foster adaptation, genetic innovation, and phenotypic diversification. While it is recognized that events during their early evolutionary history are important for determining whether a gene duplicate is retained, is lost, or becomes nonfunctional, models of gene family evolution have mostly been built from studies of relatively ancient gene families. Therefore, how genes overcome the immediate consequences of duplication, i.e., dosage increase, and accumulate the molecular diversity required for novel functions, while also being impacted by molecular mechanisms such as gene conversion and by evolutionary forces such as genetic drift and selection along the path to fixation, remains largely uncharacterized.
My goal was to characterize the functional evolution of a young tandem gene expansion found only in Drosophila melanogaster: Sperm-specific dynein intermediate chain (Sdic). I aimed to accurately reconstruct the Sdic region at the structural and sequence levels while obtaining accurate information about sequence diversity among the Sdic paralogs in strains from different geographical origins (Chapters 1 & 2); investigate the extent of Sdic copy number variation (CNV) (Chapter 2) while examining the relationship between Sdic copy number and total Sdic expression (Chapters 2 & 3); and probe the divergence of different expression attributes among Sdic paralogs within and between strains while gauging the impact of cis and trans regulatory variation (Chapters 1 & 3).
Through my research, I established the correct structure of the Sdic region in the D. melanogaster reference genome using raw long-read sequences and showed, using qRT-PCR and RNA-seq, that the Sdic paralogs vary in both expression abundance and breadth. I generated a precise portrait of Sdic copy number variation using reference-quality genome annotations, qPCR, and read-depth methods. Only one Sdic paralog is fixed across populations, and there is no evidence of pseudogenization among paralogs. While artificially doubling copy number within the same genomic background increased expression in males more than two-fold, I observed no correlation between copy number and total Sdic expression across natural populations, suggesting that differential regulatory modifiers likely play key roles in shaping Sdic expression. Further, I used RNA-seq to quantify Sdic expression in testes from populations with Sdic CNV, as well as in testes, heads, and accessory glands from males with identical genomes except for different Y chromosomes. In testes, I found clear evidence of variable expression among Sdic paralogs and a positive correlation between Sdic CNV and expression. The Y chromosome seems to impact total expression of Sdic in accessory glands but not in testes or heads.
My dissertation represents a rare interpopulation characterization of a species-specific multigene family at the sequence, structural, and functional levels. Sdic epitomizes how quickly a tandem multigene family can functionally diversify at both the coding and regulatory levels, even in the face of gene conversion. Beyond maintaining a minimally optimal expression level, the presence of Sdic duplicates appears to act as a catalyst for generating protein and regulatory diversity, showcasing a possible evolutionary path that novel gene functions can follow toward long-term consolidation within eukaryotic genomes.