CanVAS.
15,451 dogs. 15 studies. 9.7 million variants.
One canine genome, made comparable.
Why CanVAS exists
Canine genomics has a sharing problem. Not the kind people think. The data is shared. The papers are published. The supplementary files are on bioRxiv. The problem is that none of it is comparable.
Fifteen different research groups, a patchwork of genotyping arrays and chip versions, more than one reference genome, and no shared probe naming convention. A study from 2016 used CanFam3.1. A study from 2022 used CanFam4. One paper's "BICF2P1234567" is another paper's "BICF2P1234567_dup1" because the strand orientation flipped between chip versions. The Golden Retriever Lifetime Study cohort cannot be compared to the Hayward 2016 cohort without weeks of bioinformatics surgery, even though both are publicly available and both address related questions about the same species.
The result is that canine genetics, as a field, has been studying dogs in fifteen rooms with the doors locked. Every group analyzes its own cohort. Cross-cohort meta-analyses are rare because the lift is enormous. Rare-variant discovery, which requires sample sizes that no single cohort delivers, is largely impossible.
CanVAS is the work of unlocking those doors.
What harmonization actually means
Probe IDs were unified.
Every variant in every contributing cohort was matched to a canonical identifier so that the same locus appears under the same name across all fifteen datasets. This sounds mechanical until you realize that some cohorts used vendor probe IDs, some used custom annotations, and some used positional coordinates that drifted across reference assembly versions.
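One way to picture probe-ID unification is a strand-invariant key built from position and alleles, so that two differently named probes at the same locus collapse to one name. This is a hypothetical illustration, not CanVAS's actual procedure: the `Probe` record, the `canonical_id` key format, and the example coordinates are invented, and the sketch assumes positions are already on a common assembly.

```python
from dataclasses import dataclass

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

@dataclass(frozen=True)
class Probe:
    cohort: str      # contributing study
    probe_id: str    # name as reported by that study (vendor, custom, or positional)
    chrom: str       # chromosome, assumed already on a common assembly
    pos: int         # 1-based position, assumed already on a common assembly
    allele1: str
    allele2: str

def canonical_id(p: Probe) -> str:
    """Hypothetical strand-invariant key: chrom:pos:alleles, using whichever
    strand's allele pair sorts first alphabetically."""
    fwd = "".join(sorted((p.allele1, p.allele2)))
    rev = "".join(sorted((COMPLEMENT[p.allele1], COMPLEMENT[p.allele2])))
    return f"{p.chrom}:{p.pos}:{min(fwd, rev)}"

def unify(probes):
    """Map each canonical ID to the set of (cohort, probe_id) names it replaces."""
    table = {}
    for p in probes:
        table.setdefault(canonical_id(p), set()).add((p.cohort, p.probe_id))
    return table

# Two names for one locus, reported on opposite strands (A/G versus C/T).
probes = [
    Probe("study_a", "BICF2P1234567", "chr1", 1000, "A", "G"),
    Probe("study_b", "BICF2P1234567_dup1", "chr1", 1000, "C", "T"),
]
print(unify(probes))
```

Keying on position plus a strand-neutral allele pair means the name each lab chose never enters the key, which is the point of a canonical identifier.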
Strands were aligned.
DNA is double-stranded, and different chips report the same variant on opposite strands, so an A/G SNP on one array can appear as T/C on another. CanVAS resolved every variant to a single strand convention so that an A/G call from one cohort means the same thing as an A/G call from another. Without this step, ancestry inference and population-structure analyses collapse into noise.
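A minimal sketch of strand alignment, assuming each variant is checked against the reference assembly's REF and ALT alleles: if the reported alleles match, keep them; if their complements match, the probe was read from the opposite strand. Palindromic SNPs (A/T and C/G) look the same on both strands and cannot be resolved this way; pipelines commonly drop them or resolve them using allele frequencies. The function names are hypothetical and this is not CanVAS's code.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def is_palindromic(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs read identically on both strands."""
    return COMPLEMENT[a1] == a2

def align_to_reference(a1, a2, ref, alt):
    """Return the alleles re-expressed on the reference strand, or None if the
    strand cannot be determined or the alleles do not match the reference."""
    if is_palindromic(a1, a2):
        return None  # strand-ambiguous from alleles alone
    target = {ref, alt}
    if {a1, a2} == target:
        return (a1, a2)  # already on the reference strand
    flipped = (COMPLEMENT[a1], COMPLEMENT[a2])
    if set(flipped) == target:
        return flipped   # reported on the opposite strand
    return None          # alleles incompatible with the reference site

print(align_to_reference("C", "T", "A", "G"))  # a T/C probe against an A/G site
```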
Everything was lifted to CanFam4.
The reference assembly for the dog genome has evolved. CanFam3.1 was the standard for a decade. CanFam4, released as UU_Cfam_GSD_1.0, is the current high-quality reference. CanVAS lifted every variant from every cohort onto CanFam4 coordinates so that any future study built on the new assembly can be compared with the atlas directly.
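Liftover maps each coordinate through aligned blocks shared by the old and new assemblies; positions that fall in gaps between blocks fail to lift. Real pipelines usually read a UCSC-format chain file with a tool such as liftOver, CrossMap, or Picard LiftoverVcf, and which tool CanVAS used is not stated here. The sketch below uses a hypothetical, hand-written block table for one chromosome, forward strand only.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Tuple

class Block(NamedTuple):
    src_start: int   # 0-based start on the source assembly (e.g. CanFam3.1)
    tgt_chrom: str   # chromosome on the target assembly (e.g. CanFam4)
    tgt_start: int   # 0-based start on the target assembly
    length: int      # ungapped aligned length

def build_index(blocks):
    """Sort blocks by source start so positions can be found by bisection."""
    blocks = sorted(blocks, key=lambda b: b.src_start)
    return blocks, [b.src_start for b in blocks]

def lift(pos: int, index) -> Optional[Tuple[str, int]]:
    """Map a 0-based source position to (target_chrom, target_pos), or None
    if it falls outside every aligned block (a failed liftover)."""
    blocks, starts = index
    i = bisect_right(starts, pos) - 1
    if i < 0:
        return None
    b = blocks[i]
    if pos >= b.src_start + b.length:
        return None
    return (b.tgt_chrom, b.tgt_start + (pos - b.src_start))

# Hypothetical blocks: source 0-999 -> target 500-1499, source 1200-3199 -> 1800-3799.
index = build_index([Block(0, "chr1", 500, 1000), Block(1200, "chr1", 1800, 2000)])
print(lift(10, index), lift(1100, index), lift(1500, index))
```

Variants that return None are the ones a liftover step would drop or flag for review.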
Then everything was imputed.
After harmonization, the typed backbone contained 77,215 SNPs present in all fifteen cohorts. That is a usable number, but it is a thin sample of the canine genome. CanVAS imputed this backbone with Beagle 5.4 against the Dog10K whole-genome sequencing reference panel of 1,929 fully sequenced dogs, then kept variants with an imputation quality (DR2) of at least 0.3 and a minor allele frequency of at least 0.01. The result is 9.67 million variants per dog, including roughly 3 million rare variants, with minor allele frequency between 1% and 5%, that the typed arrays would never have caught.
Imputation is the multiplier. It is the move that turns 77,215 directly measured SNPs into 9.7 million inferred variants, roughly a 125-fold expansion, imputed against the same panel with the same method in every cohort. It is also the move that lets a sparse, decade-old genotyping chip stand alongside a modern whole-genome study in the same analysis.
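Beagle 5 writes the estimated ALT allele frequency (AF) and dosage R-squared (DR2) into the INFO column of its output VCF, so the DR2 and MAF thresholds quoted above can be applied record by record. A minimal sketch, assuming biallelic sites and that every record carries both fields; the function names and the sites-only example records are hypothetical, and in practice a tool such as bcftools would usually do this filtering.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column such as 'AF=0.02;DR2=0.91;IMP' into a dict.
    Flag fields (no '=') map to True."""
    out = {}
    for field in info.split(";"):
        key, sep, value = field.partition("=")
        out[key] = value if sep else True
    return out

def minor_allele_frequency(vcf_line: str) -> float:
    """MAF from the AF field of a biallelic record (INFO is column 8)."""
    info = parse_info(vcf_line.rstrip("\n").split("\t")[7])
    alt_af = float(info["AF"])
    return min(alt_af, 1.0 - alt_af)

def passes_qc(vcf_line: str, min_dr2: float = 0.3, min_maf: float = 0.01) -> bool:
    """Keep a site if imputation DR2 and MAF meet the quoted thresholds."""
    info = parse_info(vcf_line.rstrip("\n").split("\t")[7])
    return float(info["DR2"]) >= min_dr2 and minor_allele_frequency(vcf_line) >= min_maf

records = [
    "chr1\t1000\t.\tA\tG\t.\tPASS\tAF=0.02;DR2=0.91;IMP",   # kept, and rare
    "chr1\t2000\t.\tC\tT\t.\tPASS\tAF=0.40;DR2=0.25;IMP",   # dropped: DR2 too low
    "chr1\t3000\t.\tG\tA\t.\tPASS\tAF=0.995;DR2=0.90;IMP",  # dropped: MAF 0.005
]
kept = [r for r in records if passes_qc(r)]
rare = [r for r in kept if minor_allele_frequency(r) < 0.05]
print(len(kept), len(rare))
```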
What 15,451 dogs at this density unlocks
Sample size and variant density compound. CanVAS gives the field both at the same time, for the first time.
The range of canids CanVAS covers is wider than that of any prior single resource. 375 breeds. Village dog populations from multiple continents. Dingoes. Wolves. Coyotes. The aggregate is a population-scale view of the canid genome that resolves both within-breed substructure and the relationships among breeds, breed groups, and the wild canids that share their ancestry.
Rare variants change the questions the field can ask. Most genome-wide association studies in dogs have been powered for common variants because that is what the chips capture. The 3 million rare variants CanVAS recovers through imputation move parts of canine GWAS into the same statistical regime as modern human genomics, where rare variants with strong effects are the frontier of disease genetics.
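Sample size matters so much for rare variants because of simple arithmetic: the expected number of minor-allele copies among N diploid dogs is 2Np. At p = 0.01, a hypothetical 1,000-dog cohort carries about 20 copies, while 15,451 dogs carry about 309. The sketch below computes both copies and expected carriers; the carrier formula assumes Hardy-Weinberg proportions, which pooled, breed-structured dog data will not strictly follow.

```python
def expected_minor_copies(n_dogs: int, maf: float) -> float:
    """Expected minor-allele copies among n diploid dogs: 2 * n * maf."""
    return 2 * n_dogs * maf

def expected_carriers(n_dogs: int, maf: float) -> float:
    """Expected dogs with at least one minor allele, assuming Hardy-Weinberg."""
    return n_dogs * (1 - (1 - maf) ** 2)

for n in (1_000, 15_451):  # 1,000 is a hypothetical single-cohort size
    print(n, round(expected_minor_copies(n, 0.01), 1), round(expected_carriers(n, 0.01), 1))
```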
The Brundage team validated CanVAS by recovering known population structure and genome-wide runs of homozygosity, both of which behave as expected if the harmonization and imputation are correct. The atlas reproduces breed-level inbreeding patterns that match the existing literature. The infrastructure works.
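F_ROH, the fraction of the genome lying in runs of homozygosity, is a standard inbreeding estimate. The sketch below is a deliberately simplified caller, not the method the Brundage team used: it takes maximal runs of consecutive homozygous genotypes above hypothetical SNP-count and length thresholds, with no heterozygote tolerance. Tools such as PLINK's `--homozyg` or `bcftools roh` handle genotyping error, missing data, and marker density far more carefully.

```python
def call_roh(positions, genotypes, min_snps=3, min_length=1_000):
    """Return (start, end) positions of maximal runs of consecutive homozygous
    calls (ALT count 0 or 2) spanning at least min_snps markers and min_length
    bp. Heterozygous or missing calls end a run."""
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(list(genotypes) + [1]):  # sentinel het closes the last run
        if g in (0, 2):
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            if end - start + 1 >= min_snps and positions[end] - positions[start] >= min_length:
                runs.append((positions[start], positions[end]))
            start = None
    return runs

def f_roh(runs, genome_length):
    """Fraction of the genome covered by ROH, used as an inbreeding estimate."""
    return sum(end - start for start, end in runs) / genome_length

positions = [100, 600, 1200, 1800, 2500, 3000, 3600, 4200]
genotypes = [0, 2, 2, 0, 1, 2, 2, 0]
runs = call_roh(positions, genotypes)
print(runs, f_roh(runs, 10_000))
```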
What CanVAS contributes to the Sniff Atlas
Every dog in the Sniff Atlas is a CanVAS dog. Without CanVAS there is no Sniff Atlas.
The 14,478 dogs visible in the Sniff Atlas today are the subset of CanVAS that passes our additional quality control. From those 14,478 dogs across 342 breeds, the entire layout of the atlas is computed. The PCA-256 embedding that defines genetic similarity. The 3D UMAP projection that gives every dog a spatial address. The per-breed centroids that anchor the breed-isolation view. The neighborhood placements that tell a newly uploaded dog which research dogs they sit closest to. All of it derives from CanVAS genotypes.
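The neighborhood placement and breed centroids described above amount to a nearest-neighbor search and per-breed means in the embedding. A minimal sketch, assuming Euclidean distance (the atlas's actual metric is not stated) and using hypothetical 2-D vectors, dog IDs, and breed labels in place of the 256 PCA dimensions. It also assumes the new dog has already been harmonized and projected into the same space.

```python
import math
from collections import defaultdict

def breed_centroids(embeddings, breeds):
    """Mean embedding vector for each breed."""
    sums, counts = {}, defaultdict(int)
    for vec, breed in zip(embeddings, breeds):
        if breed not in sums:
            sums[breed] = [0.0] * len(vec)
        sums[breed] = [s + v for s, v in zip(sums[breed], vec)]
        counts[breed] += 1
    return {b: [s / counts[b] for s in sums[b]] for b in sums}

def nearest_dogs(query, embeddings, dog_ids, k=3):
    """IDs of the k reference dogs closest to the query (Euclidean distance)."""
    ranked = sorted(zip(dog_ids, embeddings), key=lambda pair: math.dist(query, pair[1]))
    return [dog_id for dog_id, _ in ranked[:k]]

def nearest_breed(query, centroids):
    """Breed whose centroid lies closest to the query."""
    return min(centroids, key=lambda b: math.dist(query, centroids[b]))

embeddings = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
dog_ids = ["dog1", "dog2", "dog3", "dog4"]
breeds = ["Beagle", "Beagle", "Whippet", "Whippet"]
new_dog = [0.1, 0.0]
centroids = breed_centroids(embeddings, breeds)
print(nearest_dogs(new_dog, embeddings, dog_ids, k=2), nearest_breed(new_dog, centroids))
```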
Any future study or cohort will enter the atlas either through an updated CanVAS release or by being harmonized to the same reference. The Sniff Atlas is downstream of CanVAS. That is by design. The science has to live on a single shared substrate or the atlas stops being trustworthy.
CanVAS was assembled by David M. Brundage and colleagues. The work draws on fifteen publicly available canine genotyping studies whose investigators built the cohorts that made the integration possible. The Dog10K whole-genome sequencing consortium contributed the 1,929-dog reference panel that powered the imputation. The UU_Cfam_GSD_1.0 assembly team produced the CanFam4 reference that the atlas is anchored to. Beagle 5.4 by Browning and Browning did the imputation.
Sniff is not affiliated with the CanVAS authors, the contributing studies, the Dog10K consortium, or the CanFam4 team. We are grateful for all of it. The science belongs to everyone. We are doing our best to make sure it is seen.
Read the CanVAS preprint on bioRxiv
CanVAS code and data on GitHub
Citation: Brundage DM et al. (2026). CanVAS: A Harmonized and Imputed Canine Variant Atlas. doi:10.64898/2026.04.13.718238