To better understand the process of transformation and its consequences, we created a genome-wide map of transformation frequencies by sequencing the pooled genomes of 10,000 Haemophilus influenzae transformants. The donor strain 86-028NP differs from the recipient strain Rd at ~40,000 SNPs and ~900 indels and rearrangements, so sequencing the pool at ~20,000-fold coverage allowed us to identify the rare donor-derived reads at each of these positions.
The resulting transformation profiles show dramatic variation in transformation frequency, ranging from 10% down to the error baseline. Apparent hotspots and coldspots occur at identical positions in two independent datasets (one selected for NovR and one for NalR; overall concordance R² ≈ 0.8), indicating that the assay is reproducible. Abrupt (5-fold) changes in transformation frequency are seen over distances as short as 2 or 3 kb.
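As a rough illustration of how such profiles and their concordance could be computed, the sketch below takes the per-SNP transformation frequency to be the fraction of reads carrying the donor allele, and the concordance to be the squared Pearson correlation between two profiles. Both choices are assumptions made for illustration, as are the function names and the read counts, which are invented and are not data from this study.

```python
def donor_frequency(donor_reads, total_reads):
    """Fraction of reads at one SNP position that carry the donor allele."""
    if total_reads <= 0:
        raise ValueError("position has no read coverage")
    return donor_reads / total_reads


def r_squared(xs, ys):
    """Squared Pearson correlation between two equal-length profiles."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical (donor_reads, total_reads) counts at the same five SNPs
# in two independently selected pools; the numbers are made up.
novr_counts = [(200, 20000), (40, 20000), (1000, 20000), (20, 20000), (400, 20000)]
nalr_counts = [(180, 19000), (60, 21000), (900, 20000), (30, 20500), (350, 19500)]

novr_profile = [donor_frequency(d, t) for d, t in novr_counts]
nalr_profile = [donor_frequency(d, t) for d, t in nalr_counts]

print("concordance R^2:", r_squared(novr_profile, nalr_profile))
```

A real analysis would also need to subtract or model the sequencing error baseline before comparing positions, which this sketch omits.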
We next examined whether parameters known to affect transformation (USS and sequence divergence) could account for this variation, using linear modeling in R with three predictors: the distance to the nearest USS, the number of flanking SNPs within ±500 bp, and the distance to the nearest indel/rearrangement (Fig. 8). All three had significant but small correlations with transformation frequency, and together they explained only about 11% of the variation. Interaction effects were not significant, nor were the effects of GC composition or sequencing depth. Other sequence-dependent factors must therefore be responsible for most of the variation in transformation.
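The kind of fit described above can be sketched as an ordinary least-squares regression of transformation frequency on the three predictors, with R² giving the fraction of variation explained. This is a minimal illustration only: the original analysis was done in R, whereas this sketch uses plain Python, and the function names, units, and example values are all hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; system is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(predictors, response):
    """Fit response ~ intercept + predictors; return (coefficients, R^2)."""
    rows = [[1.0] + list(p) for p in predictors]  # prepend intercept column
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, response)) for i in range(k)]
    beta = solve_linear_system(xtx, xty)  # normal equations
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    mean_y = sum(response) / len(response)
    ss_res = sum((y - f) ** 2 for y, f in zip(response, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in response)
    if ss_tot == 0:
        raise ValueError("response is constant; R^2 is undefined")
    return beta, 1.0 - ss_res / ss_tot


# Hypothetical per-SNP rows, not data from the study:
# (distance to nearest USS in kb, flanking SNPs within +/-500 bp,
#  distance to nearest indel/rearrangement in kb)
example_predictors = [
    (0.12, 3, 5.0), (0.80, 12, 0.3), (0.05, 1, 9.0),
    (0.40, 7, 1.5), (1.50, 20, 0.1), (0.25, 5, 4.0),
]
example_frequency = [0.012, 0.003, 0.020, 0.008, 0.001, 0.010]

coefficients, r2 = fit_ols(example_predictors, example_frequency)
print("coefficients (intercept first):", coefficients)
print("fraction of variation explained:", r2)
```

With only six invented rows the fitted R² is not meaningful; the point is the structure of the calculation. Testing interaction terms would amount to adding product columns to each predictor row before fitting.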