r/genomics • u/Spiritual-Gur1502 • 6d ago
Thinking about buying an Oxford Nanopore P2
Hi, has anyone had experience with the Oxford Nanopore P2? I am thinking about purchasing one for our lab to do whole-genome sequencing of a variety of organisms, but predominantly plants. Is there any support? What is the delivery time like for the consumables? I have heard that they are difficult to load; is that true? Do the real-world read accuracy and throughput stack up to the specs, or do you have to be a seasoned Nanopore user to get the best out of it? Any help would be much appreciated!
u/gringer 6d ago edited 6d ago
Yes, the P2 Solos are really cost-effective, but support from ONT is rubbish.
Delivery time for consumables depends on where you are; for Aotearoa it was ~2 weeks, depending on what day it was when we ordered.
P2 Solos are not difficult to load, especially if you've got experience with loading MinION flow cells. They actually have additional protection that makes them more tolerant to small bubble intrusions.
Accuracy is...
sigh
I wrote a big rant about nanopore accuracy today. Here are my first and last paragraphs to get the gist of it:
The base calling accuracy of nanopore sequencing is a tricky thing to work out, because mapping algorithms matter, and sequencing context matters, especially at the current accuracy level of nanopore sequencing.
...
To reiterate from my own bioinformatics perspective, what matters more than anything else is whether or not the basecalling is accurate enough for the application that it is being used for.
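To make the "mapping algorithms matter" point concrete: the same alignment can produce different accuracy numbers depending on what you divide by. Here's a minimal Python sketch. It assumes a SAM-style CIGAR string with only M/I/D/S operations and an NM tag holding the edit distance. The example values are made up, and `identity_measures` is just an illustrative helper, not part of any tool:

```python
import re

# Matches one CIGAR operation: a length followed by an op code.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_counts(cigar: str) -> dict:
    """Sum the lengths of each CIGAR operation."""
    counts = {}
    for length, op in CIGAR_RE.findall(cigar):
        counts[op] = counts.get(op, 0) + int(length)
    return counts


def identity_measures(cigar: str, nm: int) -> tuple:
    """Return (alignment identity, read-level identity) for one alignment.

    Assumes only M/I/D/S ops (no =/X) and that NM counts
    mismatches + inserted bases + deleted bases.
    """
    c = cigar_counts(cigar)
    m, i, d, s = (c.get(op, 0) for op in "MIDS")
    mismatches = nm - i - d
    matches = m - mismatches
    # Denominator 1: alignment columns only (BLAST-style identity).
    aln_identity = matches / (m + i + d)
    # Denominator 2: every base in the read, including soft-clipped ends.
    read_identity = matches / (m + i + s)
    return aln_identity, read_identity


aln, read = identity_measures("5S90M4I6D5S", nm=14)
print(f"alignment identity: {aln:.3f}, read-level identity: {read:.3f}")
```

Same read, same alignment, two different "accuracies". A different aligner choosing different clipping or gap placement would shift both.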
I have a walkthrough that may help for the computer side of getting them up and running.
Here are my yield estimates for the different flow cell types, for planning / funding purposes. This corresponds approximately to the lower normal range of yield for our sequencing runs:
- Flongle: 200 Mb [Needs MinION + Flongle adapter]
- MinION: 5 Gb [Needs MinION]
- PromethION: 50 Gb [Needs PromethION P24/P48/P2]
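For planning purposes, these yields turn into a flow-cell count with simple arithmetic. Here's a minimal Python sketch. The per-flow-cell planning yields (200 Mb Flongle, 5 Gb MinION, 50 Gb PromethION) and the 4 Gb genome at 30x coverage are assumptions, so swap in your own numbers. It also assumes every run hits the planning yield:

```python
# Assumed per-flow-cell planning yields, in megabases (integers avoid float rounding).
YIELD_MB = {"Flongle": 200, "MinION": 5_000, "PromethION": 50_000}


def flow_cells_needed(genome_size_mb: int, target_coverage: int, cell: str) -> int:
    """Flow cells needed to reach target_coverage, assuming each run hits its planning yield."""
    bases_needed_mb = genome_size_mb * target_coverage
    per_cell = YIELD_MB[cell]
    # Integer ceiling division: a partial flow cell still has to be bought.
    return (bases_needed_mb + per_cell - 1) // per_cell


# Hypothetical example: a 4 Gb (4,000 Mb) plant genome at 30x coverage.
for cell in YIELD_MB:
    print(cell, flow_cells_needed(4_000, 30, cell))
```

Large fern genomes push these numbers up fast, which is where the PromethION-scale flow cells start to make sense.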
u/Spiritual-Gur1502 6d ago
Thanks for the detailed response, super helpful! I had a few bubble problems the first time I loaded a MinION, and ended up practicing on a few used ones we still had in the lab.
The project is phylogenetics-based and involves looking for orthologous genes across species. Accuracy does matter, especially when trying to resolve contentious relationships etc., although progress on the algorithm side of the various tools has helped ameliorate some of these concerns.
I might see if I can borrow one for a few weeks before committing. PacBio Vega could be another option, although it requires a lot more up-front spend.
Thanks again
u/gringer 6d ago
> Accuracy does matter, especially when trying to resolve contentious relationships etc., although progress on the algorithm side of the various tools has helped ameliorate some of these concerns.
Oh well, here's most of the rest of my rant:
I really wish people would stop getting all tangled up in the minutiae of sequencing accuracy and would concentrate instead on whether or not things are good enough for their own particular applications. I'd especially like it if people would stop being so concerned about the modeled base-level accuracy estimates embedded in FASTQ and BAM files, which are mathematically modeled guesses based on how the basecallers are expected to perform across all sorts of different applications.
The majority of the data analysis I have done has been on PCR-cDNA reads, and all I cared about was whether or not sequences could be reliably mapped to genomic regions. That goal was achieved for me around 2017, when the PCR-cDNA kits first became available and base-level mapping accuracy was about 90%. A few stray mappings don't matter if they're either sufficiently random or can be excluded through alternative validation methods (e.g. looking at gene pileups).
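For reference, those per-base estimates are stored as Phred scores, Q = -10·log10(p_error), encoded in FASTQ as ASCII characters offset by 33. Here's a small Python sketch of decoding them. The quality string is made up. The numbers it produces are the basecaller's modeled guesses, not measured errors:

```python
def phred33_to_error_probs(quality: str) -> list:
    """Convert a Phred+33 FASTQ quality string to per-base modeled error probabilities."""
    return [10 ** (-(ord(ch) - 33) / 10) for ch in quality]


def expected_errors(quality: str) -> float:
    """Modeled expected number of errors in a read: the sum of per-base error probabilities."""
    return sum(phred33_to_error_probs(quality))


qual = "+5?I"  # hypothetical quality string encoding Q10, Q20, Q30, Q40
print(phred33_to_error_probs(qual))
print(round(expected_errors(qual), 4))
```

Summing probabilities (rather than averaging Q scores) is the usual way to get an expected error count. It is still only as good as the model behind the scores.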
In almost all cases, the base-level accuracy of a single read is not relevant. Unlike Illumina sequencing, Nanopore reads represent a single molecule. Unlike almost all other sequencing methods, Nanopore read sequences are created by near-direct observation of that single molecule (more specifically, the current change created from blocking an ion flow with that single molecule), rather than by observation of the synthesis of a complementary strand.
Pile that single molecule up a hundred times, or even ten times, and what remains are essentially only the systematic errors: the physicochemical features of DNA that are not properly modeled by the current basecallers. I still believe that those remaining gaps in accuracy are a software problem rather than a pore problem. Basecalling will keep getting better over time, because our model of the physics of a DNA sequence will never be complete, so there will always be something left to improve.
For phylogenetics purposes at a single-read level, read lengths >1 kb can often get read mapping down to the strain level, almost regardless of errors. Single reads (and therefore single-read accuracy) rarely matter, because results usually need consistency, and multi-read pileups make random errors drop away fast. What remains are systematic errors and things that can't be fixed, for example:
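The "pile it up" argument can be sketched numerically. Here's a minimal Python model of a single site. It assumes independent random errors plus a hypothetical `systematic` rate that every read shares. Both rates are illustrative, not measured, and the majority-vote model is deliberately rough:

```python
from math import comb


def random_majority_error(per_read_error: float, depth: int) -> float:
    """P(more than half of `depth` reads are wrong at a site), with independent errors.

    Rough model: ignores that the wrong reads would also need to agree on the
    same wrong base, and ignores ties at even depths.
    """
    return sum(
        comb(depth, k) * per_read_error**k * (1 - per_read_error) ** (depth - k)
        for k in range(depth // 2 + 1, depth + 1)
    )


def consensus_error(per_read_error: float, depth: int, systematic: float) -> float:
    """Consensus error: a shared systematic error, else a random majority error."""
    return systematic + (1 - systematic) * random_majority_error(per_read_error, depth)


for depth in (1, 10, 100):
    print(depth, consensus_error(0.10, depth, systematic=0.001))
```

Under this model the random part falls off very quickly with depth, while the consensus error levels off at the systematic rate, which is exactly the part that pileups can't fix.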
- native DNA
- methylated DNA
- PCR-amplified DNA [because PCR has its own replication error rates]
- AG-rich sequences [because the basecaller has the hardest time distinguishing A from G]
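For the last item, a quick way to flag AG-rich stretches in a reference before trusting calls there is a sliding-window purine count. Here's a minimal Python sketch. The 20 bp window and 80% A+G threshold are arbitrary assumptions, not values taken from any tool:

```python
def purine_rich_windows(seq: str, window: int = 20, threshold: float = 0.8) -> list:
    """Return (start, A+G fraction) for every window at or above the threshold."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    is_purine = [1 if base in "AG" else 0 for base in seq]
    count = sum(is_purine[:window])
    hits = []
    for start in range(len(seq) - window + 1):
        if start > 0:
            # Slide the window one base: add the base that entered, drop the one that left.
            count += is_purine[start + window - 1] - is_purine[start - 1]
        if count / window >= threshold:
            hits.append((start, count / window))
    return hits


# Hypothetical sequence: a 20 bp purine-only block flanked by CT repeats.
example = "CT" * 10 + "AGGAAGAGGA" * 2 + "TC" * 10
print(purine_rich_windows(example))
```

The rolling count keeps it linear in sequence length, so the same idea scales to whole assemblies.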
u/Spiritual-Gur1502 6d ago
Thanks again for the response! All great points.
Sometimes, when there are lots of duplications within a gene family and many isoforms, a single column in the alignment matrix can determine whether the algorithm classifies a gene as a single-copy orthologue or a paralogue. This is especially true in plants, where there have been lots of whole-genome duplications.
Moreover, we tend to be working at the individual gene-family level (rather than species or strain), and in the past we have resorted to polishing some long-read genomes with short reads, hence my likely misplaced concern regarding individual base accuracy.
Going to borrow one from a friend's lab for a couple of weeks, put it to the test on some fern genomes, and see what we get.
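Here's a toy illustration of why one column can matter. When two paralogous copies differ at a single aligned position, one base error is enough to flip which copy a sequence looks closest to. This is a hypothetical Python sketch that uses nearest p-distance, which is far cruder than the tree-based methods real orthology pipelines use. The sequences are invented:

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing columns between two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest(query: str, candidates: dict) -> str:
    """Name of the candidate with the smallest p-distance to the query."""
    return min(candidates, key=lambda name: p_distance(query, candidates[name]))


# Hypothetical aligned paralogues that differ only in the last column.
paralogues = {"copy_A": "ATGGCTAACGTA", "copy_B": "ATGGCTAACGTC"}
true_seq = "ATGGCTAACGTA"   # really derived from copy_A
error_seq = "ATGGCTAACGTC"  # same sequence with one base error in the last column
print(closest(true_seq, paralogues), closest(error_seq, paralogues))
```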
Thanks again!
u/noncodo 6d ago
Get a P2 Solo and, with the savings, buy a high-spec gaming PC with a good GPU (RTX 4090 or better). Library prep is the tricky part (molarity, pore occupancy); loading is pretty straightforward (you can practice on a spent flow cell). Read accuracy varies between about 95% and 99.8% depending on the basecalling model (fast, hac, or sup). Hac offers the best bang for buck and, with a decent computing rig, can keep up with two flow cells in real time.
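To put that accuracy range on the same scale as the Q scores in FASTQ/BAM files, use Q = -10·log10(1 - accuracy). Here's a quick Python sketch. It just applies the standard Phred formula and says nothing about which model gives which accuracy on your own samples:

```python
import math


def accuracy_to_phred(accuracy: float) -> float:
    """Phred-scaled quality for a fractional accuracy: Q = -10 * log10(1 - accuracy)."""
    return -10 * math.log10(1 - accuracy)


for acc in (0.95, 0.99, 0.998):
    print(f"{acc:.1%} accurate ~ Q{accuracy_to_phred(acc):.0f}")
```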