Characterization of a haplotype-reference panel for genotyping by low-pass sequencing in Swiss Large White pigs


- Using a key ancestor approach, we selected 70 key ancestors from two lines of the Swiss Large White breed that have been selected divergently for fertility and fattening traits, and sequenced their genomes with short paired-end reads.
- Results: Using pedigree records, we estimated the effective population size of the dam and sire line to be 72 and 44, respectively.
- The key ancestor boars explained 87.95% and 95.35% of the genetic diversity of the breeding populations of the dam and sire line, respectively.
- Genomic inbreeding quantified using runs of homozygosity was higher in the sire than in the dam line (0.28 vs 0.26).
- We used the sequenced haplotypes of the 70 key ancestors as a reference panel to call genotypes with the GLIMPSE software in 175 pigs that had been sequenced at very low coverage (1.11-fold).
- Conclusions: We assessed genetic diversity within and between two lines of the Swiss Large White pig breed.
- The sequenced haplotypes of the key ancestor animals enabled us to implement genotyping by low-pass sequencing, a cost-effective approach that increases the variant density over current array-based genotyping by more than 350-fold.
- Animal Genomics, ETH Zürich, Eschikon 27, 8315 Lindau, Switzerland. Full list of author information is available at the end of the article.
- Swiss pig production relies on maternal and paternal Swiss Large White (SLW) lines at the top level of the breeding pyramid.
- Approximately 32.5 and 30% of the genes of 2.5 million fattening pigs slaughtered in 2020 in Switzerland originate from the dam and sire line, respectively [1].
- The SLW breeding boars are selected based on genome-based breeding values that are predicted using genotypes obtained with a customized version of the Illumina PorcineSNP60 BeadChip.
- Apart from a small number of putatively causal variants included in the custom part, the content of the currently used microarray was designed to be informative for mainstream breeds [2].
- However, the genetic constitution of the SLW breed beyond the microarray-derived SNP remains largely unknown.
- The genomes of key ancestor individuals maximally represent the genetic diversity of the target population [3, 4].
- Genotyping by low-pass sequencing utilises a sequenced haplotype reference panel that represents the diversity of the target population.
- Sequence variant genotypes of animals sequenced at very shallow coverage are then inferred conditional on the observed haplotypes of the reference panel.
- Using the haplotypes of the key ancestor animals as a reference panel, we accurately genotyped more than 22 million variants in animals that had been sequenced at low coverage.
- Based on pedigree records, the average inbreeding coefficients of the active breeding animals of the sire and dam line were … and …, respectively.
- Based on these values and the inbreeding coefficients of the parents, we estimated the effective population size of the sire and dam line of the Swiss Large White (SLW) breed to be 44 and 72, respectively.
- Of the 70 boars, 38 and 32 represent the sire and dam line, respectively, explaining 95.35% and 87.95% of the genetic diversity of the active breeding populations.
- Following quality control (removal of adapter sequences and of reads and bases of low sequencing quality), between 81.15 and 377.01 million read pairs (2 × 150 bp) per sample (mean … million read pairs) were aligned to the SSC11.1 assembly of the porcine genome.
- … 10 and SAM bitwise flag 1796 were not considered), the average sequencing coverage of the 70 boars was … fold across all autosomes.
- Raw sequence read data of the 70 pigs have been deposited at the European Nucleotide Archive (ENA) of the EMBL under BioProjects PRJEB38156 and PRJEB39374.
- Of the variants, … and 1,594,775 were fixed for the alternate allele in the dam and sire line, respectively.
- Of the 54,600 SNP, 6376 and 1029 were fixed for the reference and alternate allele, respectively, and 47,195 were polymorphic in the array-called genotypes of the 68 pigs.
- Of the 48,224 SNP that were either polymorphic or fixed for the alternate allele in the array-called genotypes, … and … were also present in the raw and filtered sequence variants, respectively.
- 1232 SNP of the Illumina PorcineSNP60 BeadChip were missing from the sequenced set because GATK called them as INDELs or multiallelic sites; they were therefore excluded from the comparison owing to incompatible alleles.
- 983 and 1041 SNP were not among the raw and filtered sequence variants, respectively, although the minor allele frequency in the array-called genotypes was >5% for most (>…) of them.
- Beagle phasing and imputation further increased the concordance and non-reference sensitivity and decreased the non-reference discrepancy of the filtered sequence variant genotypes.
- The first principal component of the genomic relationship matrix explained 8.61% of the variation and separated the animals by line (Fig. …).
- The second principal component explaining 2.68% of the variation revealed variability within the sire line.
- Panel b: plot of the first two principal components showing the separation of animals by breed and the relationship between both lines.
- The genomic inbreeding (F_ROH, i.e., the fraction of the autosomal genome covered by ROH) was … and … for the dam and sire line, respectively.
- However, F_ROH was higher for long ROH in the sire line.
- … variants (…, 4,038,170 INDEL), including 2,567,754 variants that were not detected in the sire line.
- In 38 boars of the sire line, we annotated … variants (…, 4,009,043 INDEL), including … that were not detected in the dam line.
- In total, 2.96% (dam line) and 2.94% (sire line) of the variants were in exons.
- A duplication of the KIT gene and a splice site variant in intron 17 of the KIT gene are associated with the dominant white phenotype [23, 24].
- Fig. 2 Genomic inbreeding in the two lines.
- The splice variant segregated at a frequency of 0.49 and 0.42 in the sire and dam line, respectively.
- The number and length of candidate selection regions were higher in the dam than in the sire line (14 vs. …).
- We detected 14 and 16 candidate regions of selection in the dam and sire line, respectively, encompassing 28.5 Mb and 32.5 Mb.
- Considering both statistics, we detected more signatures of selection in the dam than in the sire line (28 vs. …).
- On average, 54% of the reference nucleotides were covered with at least one read.
- Following [8], we utilized the haplotypes of the 70 sequenced key ancestor animals as a reference panel to call genotypes at polymorphic sites in the 175 low-pass sequenced samples.
- Of the 54,600 SNP, 6176 and 965 were fixed for the reference and alternate allele, respectively, in the 175 pigs according to the array-called genotypes.
- When the sequence variant calling of the 175 samples was performed together with the 70 key ancestor animals using the multi-sample approach implemented in GATK, all concordance metrics were considerably worse.
- Fig. 3 Signatures of selection detected in the sire and dam line of the SLW breed using CLR (a) and iHS (b). Dotted lines indicate the empirical 0.5% (CLR) and 0.1% (iHS) thresholds. Blue, orange and grey vertical bars highlight signatures of selection detected in the sire line, the dam line and both lines, respectively.
- We constructed genomic relationship matrices (GRM) from the microarray-derived and GLIMPSE-imputed genotypes of the 175 sequenced pigs based on a subset of 44,268 SNP that had a minor allele frequency greater than 0.01 in both datasets.
- Both the off-diagonal and the diagonal elements of the GRM constructed from array-derived genotypes had greater variance (σ²_diag …, σ²_off …) than the corresponding elements of the GRM constructed from GLIMPSE-imputed genotypes.
- While the correlation between both GRMs was high for the off-diagonal (r = 0.99) and diagonal (r = 0.96) elements, the diagonal elements were higher for all samples using the GLIMPSE-imputed than the microarray-derived genotypes (Fig. …).
- The average value of the diagonal elements of the GRM was … and … for the microarray- and low-pass sequencing-derived genotypes, respectively.
- On average, the 175 boars were homozygous for … and … of the 44,268 SNP when the genotypes were called from the microarray and low-pass sequencing data, respectively.
- We applied a key ancestor animal approach to prioritize 38 and 32 boars that accounted for 95.35% and 87.95% of the genetic diversity of the SLW sire and dam line, respectively.
- The contributions of the SLW key ancestor animals to the current populations are considerably higher than reported for other populations.
- For instance, 43 key ancestor animals explained 69% of the genetic di- versity of the Fleckvieh cattle population [5].
- Earlier studies [30] selected 41 and 55 key contributors that explained 78% and 75% of the genetic relationship structure of the Swiss Franches-Montagnes horse and the Australian Holstein-Friesian cattle population, respectively.
- The effective population size of the SLW sire and dam line is 44 and 72, respectively, which is less than half the effective population size of the Fleckvieh cattle and Swiss Franches-Montagnes horse population [31, 32].
- Thus, a few animals that are selected based on their marginal genetic contribution to the active breeding population account for a large fraction of the population’s haplotype diversity.
- In spite of the low effective population size, the nucleotide diversity (π) was high in both lines (π_dam …).
- Although our sequencing cohort contained more animals from the sire line, we detected somewhat more autosomal variants in the dam line (N_sire …).
- While the average number of heterozygous variants detected per animal was higher in the dam line (N_sire …, N_dam …), the number of variants homozygous for the alternate allele was higher in the sire line (N_sire …).
- … 2 Mb) suggests that recent inbreeding is higher in the sire than in the dam line.
- For instance, a recessive sperm defect has recently been discovered in the sire line [39].
- The principal components of a genomic relationship matrix constructed from whole-genome sequence variants revealed a separation of the animals by line.
- However, the diagonal elements of the genomic relationship matrix were higher and had less variance using genotypes from low-pass sequencing than from microarray genotyping, likely because the sequenced key ancestor animals do not represent the full haplotype diversity of the SLW populations; this precludes the imputation of rarer variants, which predominantly occur in the heterozygous state.
- While a subset of the 22.62 million variants obtained is sufficient to accurately predict genomic breeding values, the full variant catalogue, once available for a large mapping cohort, will facilitate powerful genome-wide association studies at nucleotide resolution.
- The effective population size of the sire and dam line was estimated based on the difference in pedigree-derived inbreeding coefficients between active breeding animals and their parents, following eq. (…).
- Alignment quality, read mapping and depth of coverage: We used the fastp software [61] to remove adapter sequences and reads that had Phred-scaled quality less than 15 for more than 15% of the bases.
- Subsequently, the filtered reads were aligned to the SSC11.1 assembly of the porcine genome [62] using the mem-algorithm of the BWA software [63].
- We used the BaseRecalibrator module of the Genome Analysis Toolkit (GATK, version 4.1.0 [19]) …
- Subsequently, we applied the VariantFiltration module of the GATK according to best-practice recommendations for site-level hard filtration to retain high-quality variants.
- We converted the TOP/BOT alleles of the microarray-derived genotypes to REF/ALT allele coding to make them compatible with the sequence-derived genotypes.
- Functional consequences of the variants (including SIFT scores [21] for missense variants) were predicted with the Ensembl Variant Effect Predictor (VEP, version 91.3 [20]).
- Population structure and genetic diversity analysis: The structure of the two lines was investigated using ADMIXTURE (v1.3.0 [69]).
- The principal components of the genomic relationship matrix were calculated using GCTA (version 1.92.1 [71]).
- Empirical significance thresholds were chosen after visual inspection of the distribution of the test statistics (0.1% in iHS and 0.5% in CLR).
- Genes overlapping with candidate signatures of selection were determined based on the Ensembl (release 98) annotation of the porcine genome.
- The average copy number was 2.07 and 2.19 in the dam and sire line, respectively.
- DUP2 is a 4.3-kb duplication upstream of the KIT gene, while DUP3 and DUP4 are 23-kb and 4.3-kb duplications downstream of it.
- Plot of the first two principal components showing the relationship of 96 dam and 96 sire animals sequenced at low (<…) coverage.
- All authors read and approved the final version of the manuscript.
- The funding body was not involved in the design of the study, in the collection, analysis and interpretation of data, or in writing the manuscript.
- Raw sequencing read data of all key ancestor animals are available at the European Nucleotide Archive (ENA) (http://www.ebi.ac.uk/ena) of the EMBL under BioProjects PRJEB38156 and PRJEB39374.
- HP is a member of the editorial board of BMC Genomics.
- Assessment of the genomic variation in a cattle population by re-sequencing of key animals at low to medium coverage.
- Evaluation of the accuracy of imputed sequence variant genotypes and their utility for causal variant detection in cattle.
- Molecular basis for the dominant white phenotype in the domestic pig.
- Strong signatures of selection in the domestic pig genome.
- …fucosyltransferase activity of the pig FUT1 enzyme determines susceptibility of small intestinal epithelium to Escherichia coli F18 adhesion.
- Analysis of pedigree and conformation data to explain genetic variability of the horse breed Franches-Montagnes.
- Imputation of high-density genotypes in the Fleckvieh cattle population.
- A high density recombination map of the pig reveals a correlation between sex-specific recombination and GC content.
- A map of recent positive selection in the human genome.
- rehh 2.0: a reimplementation of the R package rehh to detect positive selection from haplotype structure
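In genotyping by low-pass sequencing, the genotypes of shallowly sequenced animals are inferred conditional on the haplotypes of the reference panel. GLIMPSE does this with a haplotype-based hidden Markov model. The sketch below is a deliberately simplified per-site stand-in, not GLIMPSE's algorithm, which shows why a panel helps at about 1-fold coverage: the panel supplies a Hardy-Weinberg prior from the ALT allele frequency among the 140 key ancestor haplotypes (70 diploid boars), and the few reads covering a site update that prior. The 1% sequencing error rate and all function names are assumptions.

```python
def allele_frequency(panel_haplotypes: list[int]) -> float:
    """ALT allele frequency at one site among reference-panel haplotypes (0 = REF, 1 = ALT)."""
    return sum(panel_haplotypes) / len(panel_haplotypes)


def genotype_posterior(n_ref: int, n_alt: int, alt_freq: float, error: float = 0.01) -> list[float]:
    """Posterior probabilities of 0, 1 or 2 ALT alleles given read counts and a panel-based prior.

    Simplified per-site model: Hardy-Weinberg prior from the panel frequency and an
    assumed symmetric sequencing error rate. GLIMPSE itself models haplotypes jointly.
    """
    prior = [(1 - alt_freq) ** 2, 2 * alt_freq * (1 - alt_freq), alt_freq ** 2]
    weights = []
    for g in range(3):
        # Probability that a single read shows the ALT allele for genotype g.
        p_alt_read = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihood = p_alt_read ** n_alt * (1 - p_alt_read) ** n_ref
        weights.append(prior[g] * likelihood)
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical site: 140 panel haplotypes, half of which carry the ALT allele.
    panel = [0, 1] * 70
    freq = allele_frequency(panel)
    print(genotype_posterior(n_ref=0, n_alt=0, alt_freq=freq))  # prior only: [0.25, 0.5, 0.25]
    print(genotype_posterior(n_ref=0, n_alt=1, alt_freq=freq))  # after one ALT read
```

With no reads the posterior equals the panel prior; a single ALT read at a site with panel frequency 0.5 shifts it to roughly [0.005, 0.5, 0.495]. Exploiting linkage disequilibrium with neighbouring sites, as a haplotype model does, sharpens such calls considerably.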
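The effective population sizes of 44 (sire line) and 72 (dam line) are derived from the difference between the pedigree-derived inbreeding coefficients of the active breeding animals and their parents. The equation the article refers to is not shown in this excerpt, so the sketch below assumes the standard rate-of-inbreeding relationship ΔF = (F_t - F_(t-1)) / (1 - F_(t-1)) and N_e = 1 / (2ΔF). Function names and input values are hypothetical.

```python
def rate_of_inbreeding(f_offspring: float, f_parents: float) -> float:
    """Increase in inbreeding relative to the parents, scaled by the parents' non-inbred fraction."""
    if not 0.0 <= f_parents < 1.0:
        raise ValueError("parental inbreeding must lie in [0, 1)")
    return (f_offspring - f_parents) / (1.0 - f_parents)


def effective_population_size(f_offspring: float, f_parents: float) -> float:
    """N_e = 1 / (2 * delta_F); undefined unless inbreeding increases."""
    delta_f = rate_of_inbreeding(f_offspring, f_parents)
    if delta_f <= 0.0:
        raise ValueError("N_e is undefined when inbreeding does not increase")
    return 1.0 / (2.0 * delta_f)


if __name__ == "__main__":
    # Hypothetical mean inbreeding coefficients, not values from the article.
    print(round(effective_population_size(f_offspring=0.11, f_parents=0.10), 1))  # 45.0
```

With a parental mean F of 0.10 and an offspring mean of 0.11, ΔF = 1/90 and N_e is about 45. Small differences in mean inbreeding therefore translate into large differences in N_e, which is why the pedigree-based estimate is sensitive to pedigree completeness.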
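The average sequencing coverage of the 70 boars was computed while ignoring reads carrying SAM bitwise flag 1796. Decoding that mask shows which reads it excludes. The sketch below uses the FLAG bit definitions of the SAM specification; the mapping-quality cutoff of 10 is an assumption, because the condition preceding "10" is truncated in this excerpt, and the helper names are hypothetical.

```python
# Bit meanings of the SAM FLAG field (SAM format specification).
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "read1",
    0x80: "read2",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}

EXCLUDE_MASK = 1796  # 0x704 = unmapped | secondary | qc_fail | duplicate


def decode_flag(flag: int) -> list[str]:
    """Names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def counts_toward_depth(flag: int, mapq: int, min_mapq: int = 10) -> bool:
    """Keep a read only if no excluded bit is set and MAPQ reaches the (assumed) cutoff."""
    return (flag & EXCLUDE_MASK) == 0 and mapq >= min_mapq


if __name__ == "__main__":
    print(decode_flag(EXCLUDE_MASK))                # ['unmapped', 'secondary', 'qc_fail', 'duplicate']
    print(counts_toward_depth(flag=99, mapq=60))    # True: properly paired first mate
    print(counts_toward_depth(flag=1123, mapq=60))  # False: same read marked as duplicate
```

The same mask can be passed to `samtools view -F 1796` to drop these reads before computing depth.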
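The comparison between array-called and sequence-derived genotypes reports concordance, non-reference sensitivity and non-reference discrepancy. The excerpt does not define these metrics, so the sketch below assumes the commonly used definitions: concordance over all sites called in both data sets, non-reference sensitivity over sites that are heterozygous or homozygous for the alternate allele on the array, and non-reference discrepancy over all sites except those homozygous for the reference allele in both. Genotypes are ALT allele counts, with `None` marking a missing call; the function name is hypothetical.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # ALT allele count 0/1/2, or None when missing


def concordance_metrics(truth: Sequence[Genotype], test: Sequence[Genotype]) -> dict[str, float]:
    """Compare sequence-derived genotypes (test) with array genotypes (truth) at shared sites."""
    pairs = [(t, s) for t, s in zip(truth, test) if t is not None and s is not None]
    if not pairs:
        raise ValueError("no sites called in both data sets")
    matches = sum(t == s for t, s in pairs)
    # Non-reference sensitivity: array non-reference genotypes also called non-reference.
    truth_nonref = [(t, s) for t, s in pairs if t > 0]
    nrs = sum(s > 0 for _, s in truth_nonref) / len(truth_nonref) if truth_nonref else float("nan")
    # Non-reference discrepancy: mismatches among sites not homozygous-reference in both sets.
    informative = [(t, s) for t, s in pairs if not (t == 0 and s == 0)]
    nrd = sum(t != s for t, s in informative) / len(informative) if informative else float("nan")
    return {"concordance": matches / len(pairs), "nrs": nrs, "nrd": nrd}


if __name__ == "__main__":
    array_calls = [0, 0, 1, 2, 1, 2, None]
    seq_calls = [0, 1, 1, 2, 0, 1, 2]
    print(concordance_metrics(array_calls, seq_calls))  # {'concordance': 0.5, 'nrs': 0.75, 'nrd': 0.6}
```

Because non-reference discrepancy ignores homozygous-reference matches, it is not inflated by the many such sites that dominate plain concordance.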
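The first two principal components of the genomic relationship matrix explained 8.61% and 2.68% of the variation. The sketch below shows an eigendecomposition of a GRM and assumes the common convention of reporting each eigenvalue's share of the sum of all eigenvalues (the trace of the GRM) as "variation explained"; GCTA writes eigenvalues and eigenvectors to files, and the article's exact computation is not shown. It uses NumPy's `numpy.linalg.eigh`, which returns eigenvalues of a symmetric matrix in ascending order; the toy matrix is hypothetical.

```python
import numpy as np


def grm_pca(grm: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """Leading eigenvectors (PC scores per animal) and each PC's share of the eigenvalue sum."""
    eigenvalues, eigenvectors = np.linalg.eigh(grm)  # symmetric input, ascending eigenvalues
    order = np.argsort(eigenvalues)[::-1]            # largest eigenvalue first
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    explained = eigenvalues / eigenvalues.sum()
    return eigenvectors[:, :n_components], explained[:n_components]


if __name__ == "__main__":
    # Toy GRM: animals 0-1 and 2-3 form two related groups (hypothetical values).
    toy = np.array([
        [1.0, 0.6, -0.2, -0.2],
        [0.6, 1.0, -0.2, -0.2],
        [-0.2, -0.2, 1.0, 0.6],
        [-0.2, -0.2, 0.6, 1.0],
    ])
    scores, share = grm_pca(toy)
    print(np.round(scores[:, 0], 3), np.round(share, 3))
```

In the toy matrix, PC1 gives opposite signs to the two groups, which is the pattern the text describes for the separation of the sire and dam line.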
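F_ROH is defined in the text as the fraction of the autosomal genome covered by ROH, and the comparison between lines singles out long ROH, which point to recent inbreeding. The sketch below computes F_ROH per animal and averages it per line. The ROH coordinates, the autosome length and the long-ROH cutoff are hypothetical (the cutoff is truncated in this excerpt), and ROH calling itself is not shown.

```python
def f_roh(roh_segments: list[tuple[int, int]], autosome_length: int, min_length: int = 0) -> float:
    """Fraction of the autosomal genome covered by ROH of at least `min_length` bp.

    `roh_segments` holds non-overlapping (start, end) coordinates for one animal.
    """
    covered = sum(end - start for start, end in roh_segments if end - start >= min_length)
    return covered / autosome_length


def mean_f_roh(animals: dict[str, list[tuple[int, int]]], autosome_length: int, min_length: int = 0) -> float:
    """Average F_ROH over the animals of one line."""
    values = [f_roh(segments, autosome_length, min_length) for segments in animals.values()]
    return sum(values) / len(values)


if __name__ == "__main__":
    genome = 100_000_000  # hypothetical autosome length, not that of the pig assembly
    boar = [(0, 1_000_000), (5_000_000, 8_000_000), (20_000_000, 26_000_000)]
    print(f_roh(boar, genome))                        # 0.1: all ROH
    print(f_roh(boar, genome, min_length=2_000_000))  # 0.09: long ROH only
```

Restricting the sum to long segments isolates the recent component of inbreeding, because long ROH have had fewer generations of recombination to break them up.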
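The share of reference nucleotides covered by at least one read (54% on average for the low-pass samples) is the breadth of coverage. The sketch below computes it from per-position read depths and, for comparison, gives the idealized Poisson (Lander-Waterman) expectation 1 - e^(-c) for a mean depth c. The depth values are hypothetical input, for example parsed from a per-base depth report.

```python
import math


def breadth_of_coverage(depths: list[int], min_depth: int = 1) -> float:
    """Fraction of reference positions with at least `min_depth` reads."""
    return sum(d >= min_depth for d in depths) / len(depths)


def poisson_breadth(mean_depth: float) -> float:
    """Expected breadth at >= 1 read if reads were placed uniformly at random (Poisson model)."""
    return 1.0 - math.exp(-mean_depth)


if __name__ == "__main__":
    print(breadth_of_coverage([0, 1, 2, 0, 3]))  # 0.6
    print(round(poisson_breadth(1.11), 3))       # 0.67
```

Under the idealized model, a mean depth of 1.11-fold would cover about 67% of positions. Because 1 - e^(-c) is concave, uneven coverage lowers the expected breadth for a given mean depth.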
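The two genomic relationship matrices (array-derived and GLIMPSE-imputed) are built on the 44,268 SNP with minor allele frequency above 0.01 in both data sets and then compared element-wise. The excerpt does not state which GRM estimator was used; the sketch below assumes the Yang et al. (2011) estimator implemented in GCTA, in which each diagonal element estimates 1 + F. Function names are hypothetical, and genotypes are ALT allele counts.

```python
import math


def grm(genotypes: list[list[int]]) -> list[list[float]]:
    """Genomic relationship matrix with the Yang et al. (2011) estimator used by GCTA.

    genotypes[j][i] is the ALT allele count (0, 1, 2) of animal j at SNP i.
    Monomorphic SNPs are skipped because their variance 2p(1 - p) is zero.
    """
    n, m = len(genotypes), len(genotypes[0])
    freqs = [sum(row[i] for row in genotypes) / (2 * n) for i in range(m)]
    snps = [i for i, p in enumerate(freqs) if 0.0 < p < 1.0]
    if not snps:
        raise ValueError("no polymorphic SNPs")
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(j, n):
            total = 0.0
            for i in snps:
                p = freqs[i]
                var = 2 * p * (1 - p)
                xj, xk = genotypes[j][i], genotypes[k][i]
                if j == k:
                    # Diagonal term of the Yang et al. (2011) estimator (element = 1 + F).
                    total += (xj * xj - (1 + 2 * p) * xj + 2 * p * p) / var
                else:
                    total += (xj - 2 * p) * (xk - 2 * p) / var
            a[j][k] = a[k][j] = total / len(snps) + (1.0 if j == k else 0.0)
    return a


def common_snps(maf_a: list[float], maf_b: list[float], threshold: float = 0.01) -> list[int]:
    """Indices of SNPs whose minor allele frequency exceeds `threshold` in both data sets."""
    return [i for i, (fa, fb) in enumerate(zip(maf_a, maf_b)) if fa > threshold and fb > threshold]


def split_elements(a: list[list[float]]) -> tuple[list[float], list[float]]:
    """Diagonal and upper-triangle off-diagonal elements of a square matrix."""
    n = len(a)
    return [a[j][j] for j in range(n)], [a[j][k] for j in range(n) for k in range(j + 1, n)]


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)
```

Building one GRM per genotype source on the `common_snps` subset, splitting each with `split_elements` and passing matching element lists to `pearson` gives the kind of diagonal and off-diagonal correlations reported in the text; the sketch does not reproduce the article's values.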
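Nucleotide diversity (π) was high in both lines despite their small effective population sizes. To make the statistic concrete, the sketch below computes the average number of pairwise differences per site from ALT allele counts, with the usual n/(n - 1) sample-size correction. The numbers of chromosomes follow from the sample sizes in the text (32 dam-line and 38 sire-line boars, i.e. 64 and 76 chromosomes); dividing by all assessed positions, including invariant ones, is an assumption about how π is normalized, and the input values are hypothetical.

```python
def nucleotide_diversity(alt_counts: list[int], n_chromosomes: int, n_sites: int) -> float:
    """Average pairwise nucleotide diversity per site.

    alt_counts: ALT allele count at each segregating site among n_chromosomes.
    n_sites: all positions assessed, including invariant ones (assumed denominator).
    """
    if n_chromosomes < 2:
        raise ValueError("need at least two chromosomes")
    correction = n_chromosomes / (n_chromosomes - 1)  # unbiased sample heterozygosity
    total = 0.0
    for count in alt_counts:
        p = count / n_chromosomes
        total += correction * 2 * p * (1 - p)
    return total / n_sites


if __name__ == "__main__":
    # 32 dam-line boars -> 64 chromosomes; toy 10-kb window with three segregating sites.
    print(nucleotide_diversity([32, 10, 1], n_chromosomes=64, n_sites=10_000))
```

Sites fixed for either allele contribute nothing to π, so a line can carry many alternate-allele homozygous variants yet show lower diversity if those variants are fixed.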
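The read filter described in the methods (discard reads in which more than 15% of the bases have Phred quality below 15) can be expressed directly. The sketch below is an illustrative reimplementation of that rule, not fastp's code; it assumes Phred+33 quality encoding, and keeping or dropping mates together is a simplification chosen here. Function names are hypothetical.

```python
def unqualified_fraction(quality: str, min_phred: int = 15, offset: int = 33) -> float:
    """Fraction of bases whose Phred score (offset-33 encoding assumed) is below `min_phred`."""
    if not quality:
        return 0.0
    low = sum(1 for ch in quality if ord(ch) - offset < min_phred)
    return low / len(quality)


def passes_quality_filter(quality: str, min_phred: int = 15, max_unqualified: float = 0.15) -> bool:
    """Keep a read unless more than `max_unqualified` of its bases fall below `min_phred`."""
    return unqualified_fraction(quality, min_phred) <= max_unqualified


def keep_pair(qual_r1: str, qual_r2: str) -> bool:
    """Keep a read pair only if both mates pass (mates are handled together in this sketch)."""
    return passes_quality_filter(qual_r1) and passes_quality_filter(qual_r2)


if __name__ == "__main__":
    good = "I" * 9 + "#"  # nine Phred-40 bases and one Phred-2 base: 10% unqualified
    bad = "I" * 8 + "##"  # 20% unqualified
    print(passes_quality_filter(good), passes_quality_filter(bad))  # True False
```

In fastp the two thresholds correspond to the `--qualified_quality_phred` (`-q`) and `--unqualified_percent_limit` (`-u`) options; the exact command line used in the article is not shown in this excerpt.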
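Converting Illumina TOP/BOT allele calls to REF/ALT coding requires knowing, for every SNP, whether the array's allele designation matches the forward strand of the reference assembly. Allele matching alone cannot resolve A/T and C/G SNPs, whose complements form the same allele pair, so the sketch below assumes a per-SNP strand flag (hypothetical; in practice derived from the array manifest or from aligning probe sequences to the assembly). Genotypes that match neither REF nor ALT return `None`, mirroring the exclusion of sites with incompatible alleles described in the text.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_forward(alleles: str, strand: str) -> str:
    """Express array alleles on the forward strand of the reference assembly.

    `strand` is a hypothetical per-SNP flag: "+" if the array alleles already match the
    forward strand, "-" if they must be complemented.
    """
    if strand == "+":
        return alleles
    if strand == "-":
        return alleles.translate(COMPLEMENT)
    raise ValueError(f"unknown strand flag {strand!r}")


def alt_dosage(genotype: str, strand: str, ref: str, alt: str) -> Optional[int]:
    """Number of ALT alleles (0, 1, 2) in a two-letter array genotype, or None if incompatible."""
    forward = to_forward(genotype, strand)
    if any(base not in (ref, alt) for base in forward):
        return None  # e.g. the site was called as an INDEL or multiallelic in the sequence data
    return sum(base == alt for base in forward)


if __name__ == "__main__":
    # Hypothetical SNP: array alleles A/G that map to the reverse strand, REF = C, ALT = T.
    print(alt_dosage("AG", "-", ref="C", alt="T"))  # 1
    print(alt_dosage("GG", "-", ref="C", alt="T"))  # 0
```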
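Candidate selection regions were called from empirical thresholds on the genome-wide distribution of the test statistics (top 0.1% for iHS, top 0.5% for CLR). The sketch below assumes iHS was standardized within allele-frequency bins, as in the Voight et al. method cited in the reference list and as rehh does in its `ihh2ihs()` step. Binning on the ALT allele frequency rather than the derived-allele frequency is an assumption for cases where ancestral states are unknown; CLR itself is not computed here, and all function names are hypothetical.

```python
import math
import statistics


def standardize_ihs(unstandardized: list[float], allele_freqs: list[float], n_bins: int = 20) -> list[float]:
    """Standardize unstandardized iHS values to mean 0 and SD 1 within allele-frequency bins.

    Sites in bins with fewer than two sites or zero spread are returned as NaN.
    """
    bins: dict[int, list[int]] = {}
    for idx, freq in enumerate(allele_freqs):
        bins.setdefault(min(int(freq * n_bins), n_bins - 1), []).append(idx)
    standardized = [math.nan] * len(unstandardized)
    for members in bins.values():
        values = [unstandardized[i] for i in members]
        if len(values) < 2:
            continue
        mean, sd = statistics.mean(values), statistics.pstdev(values)
        if sd == 0:
            continue
        for i in members:
            standardized[i] = (unstandardized[i] - mean) / sd
    return standardized


def empirical_threshold(scores: list[float], top_fraction: float) -> float:
    """Lowest score still inside the top `top_fraction` of sites (e.g. 0.001 = top 0.1%); NaNs ignored."""
    ranked = sorted((s for s in scores if not math.isnan(s)), reverse=True)
    n_top = max(1, math.ceil(len(ranked) * top_fraction))
    return ranked[n_top - 1]


if __name__ == "__main__":
    scores = [float(i) for i in range(1, 1001)]  # toy genome-wide statistics
    print(empirical_threshold(scores, 0.005))    # 996.0: the top 0.5% is 5 sites
```

For iHS, extreme values in either direction are of interest, so the threshold would be taken on absolute values, e.g. `empirical_threshold([abs(s) for s in ihs], 0.001)`.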
