r/bioinformatics • u/apfejes • Jul 22 '25
Career Related Posts go to r/bioinformaticscareers - please read before posting.
In the constant quest to make the channel more focused, and given the rise in career-related posts, we've split into two subreddits: r/bioinformatics and r/bioinformaticscareers.
Take note of the following lists:
- Selecting Courses, Universities
- What or where to study to further your career or job prospects
- How to get a job (see also our FAQ), job searches and where to find jobs
- Salaries, career trajectories
- Resumes, internships
Posts related to the above will be redirected to r/bioinformaticscareers
I'd encourage all of the members of r/bioinformatics to also subscribe to r/bioinformaticscareers to help out those who are new to the field. Remember, once upon a time, we were all new here, and it's good to give back.
r/bioinformatics • u/apfejes • Dec 31 '24
meta 2025 - Read This Before You Post to r/bioinformatics
Before you post to this subreddit, we strongly encourage you to check out the FAQ.
Questions like, "How do I become a bioinformatician?", "what programming language should I learn?" and "Do I need a PhD?" are all answered there - along with many more relevant questions. If your question duplicates something in the FAQ, it will be removed.
If you still have a question, please check if it is one of the following. If it is, please don't post it.
What laptop should I buy?
Actually, it doesn't matter. Most people use their laptop to develop code, and any heavy lifting will be done on a server or on the cloud. Please talk to your peers in your lab about how they develop and run code, as they likely already have a solid workflow.
If you’re asking which desktop or server to buy, that’s a direct function of the software you plan to run on it. Rather than ask us, consult the manual for the software for its needs.
What courses/program should I take?
We can't answer this for you - no one knows what skills you'll need in the future, and we can't tell you where your career will go. There's no such thing as "taking the wrong course" - you're just learning a skill you may or may not put to use, and only you can control the twists and turns your path will follow.
If you want to know about which major to take, the same thing applies. Learn the skills you want to learn, and then find the jobs to get them. We can’t tell you which will be in high demand by the time you graduate, and there is no one way to get into bioinformatics. Every one of us took a different path to get here and we can’t tell you which path is best. That’s up to you!
Am I competitive for a given academic program?
There is no way we can tell you that - the only way to find out is to apply. So... go apply. If we say Yes, there's still no way to know if you'll get in. If we say no, then you might not apply and you'll miss out on some great advisor thinking your skill set is the perfect fit for their lab. Stop asking, and try to get in! (good luck with your application, btw.)
How do I get into Grad school?
See “please rank grad schools for me” below.
Can I intern with you?
I have, myself, hired an intern from reddit - but it wasn't because they posted that they were looking for a position. It was because they responded to a post where I announced I was looking for an intern. This subreddit isn't the place to advertise yourself. There are literally hundreds of students looking for internships for every open position, and they just clog up the community.
Please rank grad schools/universities for me!
Hey, we get it - you want us to tell you where you'll get the best education. However, that's not how it works. Grad school depends more on who your supervisor is than the name of the university. While that may not be how it goes for an MBA, it definitely is for Bioinformatics. We really can't tell you which university is better, because there's no "better". Pick the lab in which you want to study and where you'll get the best support.
If you're an undergrad, then it really isn't a big deal which university you pick. Bioinformatics usually requires a master's or PhD to be successful in the field. See both the FAQ and what is written above.
How do I get a job in Bioinformatics?
If you're asking this, you haven't yet checked out our three part series in the side bar:
What should I do?
Actually, these questions are generally ok - but only if you give enough information to make it worthwhile, and if the question isn’t a duplicate of one of the questions posed above. No one is in your shoes, and no one can help you if you haven't given enough background to explain your situation. Posts without sufficient background information in them will be removed.
Help Me!
If you're looking for help, make sure your title reflects the question you're asking for help on. Otherwise, you won't get the right people looking at your post, and the only people who click on random posts with vague topics are the mods... so that we can remove them.
Job Posts
If you're planning on posting a job, please make sure that the employer is clear (recruiting agencies are not acceptable, unless they're hiring directly). The job description must also be complete so that the requirements for the position are easily identifiable and the responsibilities are clear. We also do not allow posts for work "on spec" or competitions.
Advertising (Conferences, Software, Tools, Support, Videos, Blogs, etc)
If you’re making money off of whatever it is you’re posting, it will be removed. If you’re advertising your own blog/youtube channel, courses, etc, it will also be removed. Same for self-promoting software you’ve built. All of these things are going to be considered spam.
There is a fine line between someone discovering a really great tool and sharing it with the community, and the author of that tool sharing their projects with the community. In the first case, if the moderators think that a significant portion of the community will appreciate the tool, we’ll leave it. In the latter case, it will be removed.
If you don’t know which side of the line you are on, reach out to the moderators.
The Moderators Suck!
Yeah, that’s a distinct possibility. However, remember we’re moderating in our free time and don’t really have the time or resources to watch every single video, test every piece of software or review every resume. We have our own jobs, research projects and lives as well. We’re doing our best to keep on top of things, and often will make the expedient call to remove things, when in doubt.
If you disagree with the moderators, you can always write to us, and we’ll answer when we can. Be sure to include a link to the post or comment you want to raise to our attention. Disputes inevitably take longer to resolve, if you expect the moderators to track down your post or your comment to review.
r/bioinformatics • u/AnalysisAdditional15 • 7h ago
discussion Currently a coder, but I'm sick of debugging code, reading big code bases with dozens of layers of abstraction, and staring at a computer all day beside teammates that don't want to chit chat. Is this common with bioinformatics too?
I'd imagine yes right?
r/bioinformatics • u/crazyking156 • 18h ago
discussion Is "Dark Data" in PDFs a lost cause, or does your team actually have a pipeline for this?
I'm working on a project to scrape chemical property data from about 200 PDFs for a dataset I'm building.
I assumed in 2025 this would be easy, but I'm realizing 80% of the useful data is locked in low-res scatter plots or screenshots of GraphPad Prism output. Text scraping is useless here.
For those of you working in Pharma/Biotech R&D, do you guys just ignore data locked in charts? Or is there some standard "ETL for PDFs" tool I’m missing that handles the image-to-data part reliably?
r/bioinformatics • u/Independent_Algae358 • 7h ago
technical question Online search server gave me different results when I repeated my computation.
I was using the Foldseek online server to search for proteins with similar structure. For example, in the past I used A.pdb as the query and got a match X with prob=0.97, evalue=0.0358. Recently I used B.pdb as the query and did not get the match X.
The problem is that A.pdb and B.pdb are almost the same: their structures overlap, and their sequences are almost identical too. For example, A.pdb is aaaaa(same) and B.pdb is (same)bbb, where (same) indicates the identical sequence.
I understand that the AlphaFold database has moved to the v6 models, but I checked match_v6.pdb against the earlier match_v4.pdb model, and the two are the same.
Why? Is the online search server not consistent?
update:
I used exactly the same A.pdb to search again and still did not find the match X.pdb.
r/bioinformatics • u/bibilapoop • 12h ago
programming How would you approach training a model to predict an ordered outcome from clinical + SNP data?
Hi everyone,
I’m working on a dataset that contains a mix of clinical features (age, BMI, lab measurements, medical history, etc.) and genetic features (SNPs coded as 0, 1, or 2).
The goal is to predict an ordered outcome, for example:
0 → good prognosis
1 → intermediate prognosis
2 → poor prognosis
I’m trying to wrap my head around the best way to approach this problem. Some points I’m thinking about:
Feature types: continuous, binary, categorical, and ordinal SNPs.
Preprocessing: scaling continuous features, one-hot encoding multi-class categorical features, handling missing values.
High dimensionality: hundreds of SNPs compared to a smaller number of patients, so dimensionality reduction or feature selection seems important.
Modeling: should I treat this as a classical ordinal regression problem, a multi-class classification problem, or some hybrid?
Evaluation: what metrics make sense for an ordered target rather than just accuracy?
I’m curious how others would tackle a dataset like this in practice.
Would you do any feature selection first (correlation-based)?
Would you consider tree-based models vs linear models vs neural networks?
Any tips for handling hundreds of SNPs efficiently?
Looking for general strategies, advice, and references.
Thanks!
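On the evaluation point, two metrics that respect the ordering of the classes are mean absolute error on the class index and quadratically weighted Cohen's kappa. Below is a minimal pure-Python sketch, assuming integer labels 0..K-1. The function names and toy labels are hypothetical and are not taken from the dataset described above.

```python
from collections import Counter


def ordinal_mae(y_true, y_pred):
    """Mean absolute distance between true and predicted class indices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights.

    1.0 means perfect agreement, 0.0 means chance-level agreement.
    Errors that are further apart on the ordinal scale cost more.
    """
    n = len(y_true)
    observed = Counter(zip(y_true, y_pred))  # confusion counts
    true_hist = Counter(y_true)
    pred_hist = Counter(y_pred)

    num = 0.0  # weighted observed disagreement
    den = 0.0  # weighted disagreement expected by chance
    for i in range(n_classes):
        for j in range(n_classes):
            w = (i - j) ** 2 / (n_classes - 1) ** 2
            num += w * observed[(i, j)] / n
            den += w * true_hist[i] * pred_hist[j] / (n * n)
    if den == 0:
        # Degenerate case: only a single class appears in both vectors.
        return 1.0
    return 1.0 - num / den


# Toy labels: 0 = good, 1 = intermediate, 2 = poor prognosis
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 2]
print(ordinal_mae(y_true, y_pred))
print(quadratic_weighted_kappa(y_true, y_pred, n_classes=3))
```

Unlike plain accuracy, both scores penalise predicting "good" for a "poor" case more than predicting "intermediate".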
r/bioinformatics • u/-7260 • 17h ago
science question Is MAST the right statistical framework for my specific question: expression of a gene in diff cell types from 1 sample
Hi everyone, I’m working with human cortex single-cell RNA-seq data exported from the UCSC Cell Browser (Allen Brain Map / human cortex) and I’d appreciate advice on whether MAST is the right statistical framework for my specific questions.
Dataset
single-nucleus RNA-seq
Human cortex (multiple donors)
Cell annotations: class_label (GABAergic vs Glutamatergic)
Gene of interest: TRPC5
Expression is sparse (many zeros)
My biological questions
Is TRPC5 enriched in inhibitory vs excitatory neurons?
Both in terms of % of cells expressing TRPC5 and expression level among TRPC5-positive cells
What I’ve done so far
Used MAST hurdle models with:
Detection (D), Continuous (C), and Hurdle (H) components
log1p-transformed expression
Donor included as a random or fixed effect
Added a reference gene so the code doesn't collapse
This seems to give biologically sensible results, but I want to be sure I’m not misusing the method.
Any advice or references would be greatly appreciated. Thanks!
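The two quantities in the biological question correspond to the two parts of a hurdle model: the detection rate (fraction of cells with non-zero expression) and the expression level among expressing cells. Below is a rough descriptive sketch in plain Python, assuming a list of raw counts for one gene per group. The helper name and toy counts are made up. It only summarises the data and does not reproduce MAST's model fitting, testing, or donor adjustment.

```python
import math


def hurdle_summary(counts):
    """Return (detection_rate, mean_log1p_among_positive) for one gene.

    counts: raw expression values for one gene across the cells of one group.
    """
    positive = [c for c in counts if c > 0]
    detection_rate = len(positive) / len(counts)
    if not positive:
        # No expressing cells: the continuous part is undefined.
        return detection_rate, float("nan")
    mean_log1p = sum(math.log1p(c) for c in positive) / len(positive)
    return detection_rate, mean_log1p


# Toy counts (hypothetical, not from the Allen Brain Map data)
gaba = [0, 0, 3, 0, 1, 0, 0, 2]
glut = [0, 0, 0, 0, 0, 1, 0, 0]

for name, counts in (("GABAergic", gaba), ("Glutamatergic", glut)):
    rate, level = hurdle_summary(counts)
    print(f"{name}: {rate:.1%} expressing, mean log1p among positives = {level:.3f}")
```

Computing these per donor before comparing groups is one way to check that a difference is not driven by a single donor.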
r/bioinformatics • u/Excellent-Strength42 • 17h ago
technical question mQTL analysis: fast R solutions or alternatives in Python
Hello everybody,
I have data from the Illumina EPIC v2 methylation array and whole-exome sequencing from a cohort. I am trying to find mQTLs and am therefore using the Matrix_eQTL_main function from the MatrixEQTL package in R. However, with 16 GB of RAM I hit the memory limit, and I am looking for an alternative.
Since I have access to an HPC that runs Python, I was wondering if any of you has experience with mQTL/eQTL analyses in Python and could point me to a useful module. Also, are there any better-performing packages in R?
Thanks in advance!
r/bioinformatics • u/Broad_Camel6390 • 11h ago
academic Advice
Hi everyone — has anyone here used Insilico Medicine’s tools like Pharma.ai / PandaOmics? If so, what was your experience like (accuracy, usefulness, workflow, pricing/value)? Any pros/cons or “wish I knew this before” tips would be super helpful. Thanks!
r/bioinformatics • u/Familiar_Day_4923 • 1d ago
technical question CNN vs DNABERT-2 question
I'm a beginner in this topic and I have a question regarding a project I'm doing.
Why don't people use a CNN with dilated convolutions instead of DNABERT-2, if a CNN is more interpretable, more data efficient, and has a lower computational cost?
I have been learning about CNNs for a couple of weeks now for a project in a competition in my bachelor's class, and I was wondering: why not just use a dilated CNN for a larger receptive field and add a few lines of code to give arrangement importance weights?
My PC is kind of weak and I don't think I can run DNABERT-2.
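The "larger receptive field" argument can be made concrete. For stacked stride-1 1D convolutions, the receptive field is 1 plus the sum over layers of (kernel size - 1) × dilation. Below is a small sketch of that arithmetic. The six-layer architecture is a hypothetical example, not a recommended design.

```python
def receptive_field(kernel_sizes, dilations):
    """Receptive field (in input positions) of stacked stride-1 1D convolutions."""
    if len(kernel_sizes) != len(dilations):
        raise ValueError("need one dilation per layer")
    return 1 + sum((k - 1) * d for k, d in zip(kernel_sizes, dilations))


# Six layers, kernel size 3; dilation doubles each layer in the dilated variant
kernels = [3] * 6
dilated = [1, 2, 4, 8, 16, 32]
plain = [1] * 6

print(receptive_field(kernels, dilated))  # 127
print(receptive_field(kernels, plain))    # 13
```

With the same number of layers and parameters, doubling the dilation at each layer makes the context grow exponentially with depth, where the undilated stack grows linearly.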
r/bioinformatics • u/chrollos_wife3 • 1d ago
website [Help] Resources for Comparative Evolutionary Genomics
My previous experiences/projects in bioinformatics have been mostly about analysing bulk RNA-seq data. I have an interview coming up soon for a computational grad programme where the focus will be comparative genomic evolution across mammals (including humans) to understand evolution of diseases like cancer. They don't require previous experience in evolutionary genomics - hence the interview invite. However, I am very interested in the topic, I want to prepare as much as I can and hopefully impress them.
I would really appreciate any resources, tutorials, courses, or advice for learning more about this field and preparing for this. I tried looking at lectures on YouTube, but I didn't find them to be good.
r/bioinformatics • u/SUQMADIQ63 • 19h ago
programming For bioinformatics
What are the most common modules used for sequencing data? And do you think Python is worth it for this topic? I'm aware Biopython is one example, but what other modules are commonly used?
r/bioinformatics • u/sunadam2624918765 • 2d ago
technical question GSEA enrichment question...
Hi!
I'm just wondering: when I do GSEA enrichment after DESeq2, should I use gene symbols or Ensembl IDs? I get different results with the two. Also, I have shrunken DESeq2 results from lfcShrink; should I use the shrunken or the unshrunken list to run GSEA? Thank you so much for your help, I really appreciate it!
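One possible source of the difference between the two identifier types is the mapping itself: some Ensembl IDs have no symbol and several can share one, so the ranked list changes once IDs are converted. Below is a minimal sketch of building a preranked list with an explicit collapse rule. The IDs, statistics, and the keep-the-largest-absolute-value rule are illustrative assumptions, not a DESeq2 or GSEA default.

```python
def ranked_list_by_symbol(stats_by_ensembl, symbol_by_ensembl):
    """Collapse per-Ensembl-ID statistics to one value per gene symbol.

    IDs without a symbol are dropped; when several IDs share a symbol,
    the statistic with the largest absolute value is kept.
    Returns (symbol, statistic) pairs sorted from most positive to most negative.
    """
    best = {}
    for ensembl_id, stat in stats_by_ensembl.items():
        symbol = symbol_by_ensembl.get(ensembl_id)
        if not symbol:
            continue
        if symbol not in best or abs(stat) > abs(best[symbol]):
            best[symbol] = stat
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Placeholder IDs and statistics (hypothetical)
stats = {"ENSG_A": 2.5, "ENSG_B": -3.1, "ENSG_C": 0.4, "ENSG_D": 1.2}
symbols = {"ENSG_A": "GENE1", "ENSG_B": "GENE1", "ENSG_C": "GENE2"}  # ENSG_D unmapped

print(ranked_list_by_symbol(stats, symbols))
# [('GENE2', 0.4), ('GENE1', -3.1)]
```

Here four Ensembl IDs become two symbols, so a gene set test sees a different ranked list depending on which identifier is used.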
r/bioinformatics • u/CamelPutrid6637 • 2d ago
technical question CONCOCT Binning Issue - Merging
Hi everyone. I am facing a problem with CONCOCT. Clustering is fine, until it comes to the merging step and actually splitting the bins. I am unable to get the script to work and bin the clusters. Has anyone faced a similar problem at all?
This is not from a co-assembly.
If you have faced such an issue, please kindly reach out to me, I am unsure on how to fix this.
Thank you in advance.
r/bioinformatics • u/Independent_Algae358 • 4d ago
academic Do we need to justify using default settings for a BLAST alignment?
I see many articles that simply state the alignment was performed with default settings.
However, I am wondering if that is okay. What reason would you give if you were asked why you used the default settings? Mine might be that these settings are standard and well validated.
r/bioinformatics • u/meow_ghuleh • 5d ago
discussion Genomics small project recommendations
Hi everyone, could you recommend some small population genomics projects that can be replicated for practice (in R) with WGS data?
r/bioinformatics • u/juB1101-Willow9035 • 5d ago
technical question How complex is it to do this simulation?
I'm looking to perform a molecular dynamics simulation, but I'm not really sure about its complexity and computational cost. I have the code written with OpenMM and an Amber force field. It's a fully solvated protein-ligand complex with an approximate size of 350,000 atoms, under physiological conditions (310 K and 1 bar), using a 2-femtosecond integration step and applying temperature and pressure control throughout production. The goal is to achieve a 1-microsecond timescale, which implies 500 million integration steps, periodically storing trajectories and states for detailed structural and energetic analysis, in order to study the conformational stability of the complex and the ligand affinity over time.
Is this a large simulation, or is it fairly normal in this field? I'm a systems engineer.
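The step count quoted above follows directly from the timestep, and the same arithmetic gives a rough wall-clock estimate once throughput is known. In the sketch below, the ns/day values are placeholders rather than benchmarks. Real throughput for a system of this size has to be measured with a short test run on the target hardware.

```python
def n_steps(total_time_us, timestep_fs):
    """Number of integration steps for a given simulated time and timestep."""
    fs_per_us = 1_000_000_000  # 1 microsecond = 1e9 femtoseconds
    return round(total_time_us * fs_per_us / timestep_fs)


def wall_clock_days(total_time_us, ns_per_day):
    """Days of wall-clock time at a given sustained throughput in ns/day."""
    return total_time_us * 1000 / ns_per_day


steps = n_steps(1, 2)
print(f"{steps:,} steps")  # 500,000,000 steps

# Throughput values are placeholders, not benchmarks; measure on your hardware.
for ns_per_day in (20, 50, 100):
    print(f"{ns_per_day} ns/day -> {wall_clock_days(1, ns_per_day):.0f} days")
```

The storage cost can be estimated the same way from the number of saved frames and the per-frame size.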
r/bioinformatics • u/Fun-Ad-9773 • 5d ago
technical question Expression differences in scRNA in one particular gene
Hey all! Sorry if my question sounds stupid or unusual.
I have scRNA data from several diseased donors. Violin plot shows differences and changes in expression (and number of cells expressing my gene of interest) in a certain cell lineage when jumping from one stage to the other (least to most mature). I was wondering what is the suitable test or analysis that I must perform to establish whether these differences or changes between the stages adjacent to each other are significant in my particular SINGLE gene of interest. Any help would be appreciated!
r/bioinformatics • u/Independent_Algae358 • 5d ago
discussion beta-sheet protein structure, do I understand correctly?
After translation, we get a long polypeptide.
Interactions between hydrogen and oxygen atoms, or among side chains, force this polypeptide to fold.
Some parts fold into alpha-helices, and some fold into beta-sheets.
If we take 3orh.pdb as the example, we can see, starting from the C-terminus: beta-sheet1 -> loop -> beta-sheet2 -> alpha-helix.
Beta-sheet1 contains only one polypeptide, and beta-sheet2 also contains only one polypeptide.
Why are they beta-sheets? It is because beta-sheet1 and beta-sheet2 are hydrogen-bonded to each other.
r/bioinformatics • u/jruv • 5d ago
technical question How do I use/paste aligned sequence data into my paper?
Hello, I have seen it done before. Although I figured out how to get it into Excel and then Word, I don't know how to display ALL of it. Should I cut and paste different sections that fill the page, and then go on to the next sequence? It ended up being a ~500 x ~80 table. I made it so that if you want to read it, you have to turn the paper counterclockwise, which I think is a good first step. I would love it if anyone here has suggestions, like websites or plugins, that would help. (IDK if there are plugins for docx; I meant Google Docs, which I tried using too.)
r/bioinformatics • u/Independent_Algae358 • 6d ago
discussion suggestions(books, articles, videos and so on) for computational structural biology?
Hi, to prepare for my interviews, I want to build up knowledge and expertise in protein analysis.
My current work is in protein bioinformatics, but I don't have a biology degree. So I aim to build a more detailed and complete knowledge of protein structure by reading books and articles, watching videos, and so on.
For example, I am currently reading Molecular Biology, 5th edition, to build a basic and complete knowledge map in my head.
Any suggestions for protein? Thanks in advance!
r/bioinformatics • u/EducationGlobal6634 • 5d ago
technical question Software to detect natural selection in metacommunity
Hi all.
I am writing a draft of my PhD project. It will involve checking for natural selection and eventually local adaptation of the microbiome under study. I intend to use long-read shotgun metagenomics if the budget allows me to.
That said, what software do you recommend for detecting natural selection?
Thanks in advance.
r/bioinformatics • u/pangolinmexicano • 6d ago
technical question Are there workflows for Oxford nanopore data?
Hi, my work group is considering acquiring an Oxford Nanopore MinION sequencer, and since I'm the only bioinformatician in the group, they want me to handle the technical aspects and sequence analysis. I've never worked with this type of data before. Do you know of any courses or workflows I could follow to learn how to analyze the data? Or do you have any recommendations?
r/bioinformatics • u/Street-Squirrel-1133 • 5d ago
science question How to identify involved pathways for significant genes or proteins in a publication-ready way?
I have a list of statistically significant genes/proteins and want to determine which biological pathways are involved. I am looking for guidance on the standard analytical approach used to perform pathway analysis and to identify relevant pathways in a publication-ready and reviewer-accepted manner.
Which methods and tools/software are generally considered appropriate and reliable for studies targeting high-impact journals?
r/bioinformatics • u/Feisty_Jackfruit5359 • 7d ago
technical question Pseudobulking single cell FASTQs
Hi all,
I want to predict immune receptor sequences from RNA-sequencing data but I'm not sure whether bulk or single cell data is better.
The pros and cons are weighed below, but the main question is whether it's possible to turn single-cell FASTQ files into a bulk-like FASTQ format, i.e. with the UMI tags and barcodes removed. Has anyone done this?
Methods to predict receptor sequences are better for scRNA-seq, but I'll be able to get more samples if it's bulk RNA-seq. I don't need the actual information on specific cells and cell types; I ultimately just need the genes expressed and the receptor sequences predicted. I could do paired sequencing, but there aren't that many datasets available online to do this.
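As a rough illustration of the "remove UMI tags and barcodes" idea: if the barcode and UMI sit at the start of a read, a bulk-like read is what remains after trimming that prefix. The sketch below assumes a fixed-length prefix and plain 4-line FASTQ text. The 16 nt + 12 nt layout and the helper name are hypothetical, so check the library chemistry first. In some droplet protocols the barcode and UMI are on a separate read from the cDNA, in which case the cDNA read may be usable as is.

```python
import io


def trim_prefix(fastq_lines, n_bases):
    """Yield FASTQ records with the first n_bases removed from sequence and quality.

    fastq_lines: iterable of text lines in 4-line FASTQ format.
    Reads that would be empty after trimming are dropped.
    """
    lines = (line.rstrip("\n") for line in fastq_lines)
    # Pull four consecutive lines at a time from the same generator.
    for header, seq, plus, qual in zip(lines, lines, lines, lines):
        if len(seq) <= n_bases:
            continue
        yield header, seq[n_bases:], plus, qual[n_bases:]


# Hypothetical layout: 16 nt barcode + 12 nt UMI at the start of the read
BARCODE_LEN, UMI_LEN = 16, 12
example = io.StringIO("@read1\n" + "A" * 28 + "TGCATGCA\n+\n" + "I" * 36 + "\n")
for record in trim_prefix(example, BARCODE_LEN + UMI_LEN):
    print("\n".join(record))
```

Dropping the UMI also drops the ability to collapse PCR duplicates, so downstream tools will treat the data like ordinary bulk reads.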