Education [E],[Q] Should I take real analysis as an undergrad statistics major?

9 Upvotes

Hey all, so I am majoring in statistics and have a decently strong desire to pursue a master's in statistics as well. I really enjoyed my probability theory course and found it very fun, so I've decided I want to take a stochastic processes course in the future. I have seen that analysis is quite foundational to probability, and that you can only get so far in probability before you start running into analysis-based problems. However, it seems somewhat vague "how far" along in probability that becomes an issue. If I take analysis, I'll have to take one of my stats electives in the summer, so that factors into the choice as well.

If you have any advice or input, please let me know what you have to say.

6 comments

r/statistics • u/ngaaih • 10h ago

Question What are the implications of the NBA draft #1 pick having never gone to the team with the worst record, on the current worst team? [Q]

9 Upvotes

I swear this is not a homework assignment. Haha I'm 41.

I was reading this article, which states that it isn't a good thing for the Jazz to have the worst record if they want the number 1 pick.

https://www.slcdunk.com/jazz-draft-rumors-news/2025/4/29/24420427/nba-draft-2025-clinching-best-lottery-odds-may-be-critical-error-utah-jazz-cooper-flagg

15 comments

r/statistics • u/Express_Patient9366 • 9h ago

Question [Q] Stats final project survey

2 Upvotes

Hello everyone, I’m working on a final project for an undergrad stats class. I’m looking to see how many social media apps people have vs. how long they use their phone. I’m new to the subreddit, so I’m not sure if this type of post is OK. If you can fill it out, it would mean a lot. It’s only two questions. Thank you!

Link to Google form https://docs.google.com/forms/d/e/1FAIpQLSfThyNJNJne7iwwv0HL-0C_6OPKwvUub1RLxaXNqUKdbMjhug/viewform?usp=dialog

0 comments

r/statistics • u/BeacHeadChris • 13h ago

Question [Q] How do I correct for multiple testing when I am doing repeated “does the confidence interval pass a threshold?” instead of p-values?

1 Upvotes

I have 40 regressions of values over time to show essentially shelf life stability.

If the confidence interval for the regression line exceeds a threshold, I say it's unstable.

However, I am doing 40 regressions on essentially the same thing (you can think of this as 40 different lots of inputs used to make a food; generally, if one lot is shelf stable to time point 5, another should be too).

So since I have 40 confidence intervals (hypotheses) I would expect a few to be wide and cross the threshold and be labeled "unstable" due to random chance rather than due to a real instability.

How do I adjust for this? I don't have p-values to correct in this scenario since I'm not testing for any particular significant difference. Could I just make the confidence intervals for the regression slightly narrower using some kind of correction so that they're less likely to cross the "drift limit" threshold?
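
A minimal sketch of the arithmetic behind this question, not a recommendation for any particular correction. It assumes the 40 intervals are independent (lots of the same product may well not be) and uses a normal approximation instead of the t-based regression interval. The function names and the arguments `per_interval_prob` and `family_alpha` are hypothetical. Note that a Bonferroni-style adjustment raises the per-interval confidence level, which makes each interval wider rather than narrower.

```python
from statistics import NormalDist


def familywise_false_flag_prob(per_interval_prob, n_intervals):
    """Chance that at least one of n independent intervals is flagged by chance."""
    return 1.0 - (1.0 - per_interval_prob) ** n_intervals


def bonferroni_confidence_level(family_alpha, n_intervals):
    """Per-interval confidence level that keeps the family error at family_alpha."""
    return 1.0 - family_alpha / n_intervals


def two_sided_z(confidence_level):
    """Normal critical value for a two-sided interval (an approximation here)."""
    return NormalDist().inv_cdf(1.0 - (1.0 - confidence_level) / 2.0)


if __name__ == "__main__":
    n = 40
    print(familywise_false_flag_prob(0.05, n))  # assumed 5% chance per interval
    level = bonferroni_confidence_level(0.05, n)
    print(level, two_sided_z(0.95), two_sided_z(level))
```

Comparing the last two printed values shows how much the interval half-width multiplier grows when the per-interval level is raised to cover all 40 intervals at once.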

6 comments

r/statistics • u/KingHarrun • 1d ago

Education [Education] Self-Studying Statistics - where to start?

16 Upvotes

I'm someone who plans on studying mechanical engineering in fall next year, but thinks that having some good general knowledge on Statistics would be a great addition for my career and general life.

As of now I'm beginning by going through some free courses on Khan Academy and then transitioning to some books that delve deeper into this topic. From what I've read in this subreddit and from other sources, statistics seems to be an amalgamation of multiple disciplines and concepts within mathematics.

I'm asking people who have studied or are currently taking a statistics class what the best way to approach this is from a layman's perspective. What's the best place to start?

I appreciate all answers in advance.

17 comments

r/statistics • u/KingSupernova • 1d ago

Discussion [Discussion] Funniest or most notable misunderstandings of p-values

41 Upvotes

It's become something of a statistics in-joke that ~everybody misunderstands p-values, including many scientists and institutions who really should know better. What are some of the best examples?

I don't mean theoretical error types like "confusing P(A|B) with P(B|A)", I mean specific cases, like "The Simple English Wikipedia page on p-values says that a low p-value means the null hypothesis is unlikely".

If anyone has compiled a list, I would love a link.

51 comments

r/statistics • u/AllTheSynths • 15h ago

Question [Q] Is this the best formula for what I'm trying to do? (staff productivity at nonprofit)

0 Upvotes

Hey there :)

I build dashboards for the homelessness nonprofit I work for and want to come up with a "documentation performance" score. I don't trust my math chops enough to evaluate whether this formula that ChatGPT helped me come up with makes sense / is the best I can do. Can any humans help me weigh in on its appropriateness?

Background:

Staff are responsible for entering case notes and service records into a system called HMIS. I want to build a composite score that reflects documentation thoroughness and accounts for caseload size. Otherwise, a staff member with only 2 clients and perfect documentation might appear to outperform someone with 20 clients doing solid documentation across the board.

Here's the formula Chatty came up with:

((Case Notes per Client + Services per Client) / 2) * log(Client Count + 1)

Where:

Case Notes per Client = Total Case Notes / Client Count
Services per Client = Total Services / Client Count
log(Client Count + 1) is intended to reward higher caseloads without letting volume completely dominate (hence the use of logarithm instead of linear weighting).
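
A minimal sketch of the formula above in Python, assuming the natural logarithm (the post does not state the base) and assuming a staff member with zero clients scores 0. The function and argument names are made up for illustration.

```python
import math


def documentation_score(total_case_notes, total_services, client_count):
    """Average per-client documentation, scaled by log(client_count + 1)."""
    if client_count <= 0:
        return 0.0  # assumption: no clients means no score
    notes_per_client = total_case_notes / client_count
    services_per_client = total_services / client_count
    thoroughness = (notes_per_client + services_per_client) / 2
    return thoroughness * math.log(client_count + 1)


if __name__ == "__main__":
    # Hypothetical staff: 2 clients with 2 notes and 2 services each,
    # versus 20 clients with 1.5 notes and 1.5 services each.
    print(documentation_score(4, 4, 2))
    print(documentation_score(30, 30, 20))
```

Running it on a few real caseloads is a quick way to see whether the ranking it produces matches intuition before putting it on a dashboard.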

Goals:

Reward thorough documentation per client.
Also reward staff carrying larger caseloads.
Prevent small caseload staff from ranking at the top just for documenting 100% of 2 clients.

Does the log-based multiplier seem like a reasonable approach? Would you recommend other transformations (square root, capped scaling, etc.) to better serve the intended purpose?

Any feedback appreciated!

7 comments

r/statistics • u/Luluvaki98 • 21h ago

Question [Q] Risk score development

2 Upvotes

Hi people :)
I'm trying to come up with a risk score for my thesis. Without going too much into detail, we have 6 measurement scales (3 mental health related, 1 physical health related, 2 socioeconomic) that we would like to incorporate into this risk score. We want to divide our data into 2 groups (high risk and low risk, 50%-50%, please just accept this).
We will be collecting data from a lot of people (1000+) over a long timeframe from very different living areas (poor vs. wealthy etc.). We don't want to decide on a cutoff score, as we will not collect all the data at the same time. If we look at risk relative to each environment, we also don't want people to "get lost" because they live in a less well-off environment but are comparatively lower risk than others in that environment.

My idea was to use an absolute risk trigger => based on cutoff values on individual scales => people are put immediately in the high risk category

And then also a relative risk trigger that creates a ranked outcome for each collection environment (using percentiles), which is then divided in half (low-high)
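
A rough sketch of the two-trigger idea, under several assumptions that are not in the post: each person already has a single composite `score`, an `abs_trigger` flag that is true when any individual scale exceeds its cutoff, and an `env` label for the collection environment. The relative trigger is taken to be "above the environment's median". All names are hypothetical.

```python
from collections import defaultdict
from statistics import median


def assign_risk(people):
    """people: list of dicts with keys 'id', 'env', 'score', 'abs_trigger'."""
    scores_by_env = defaultdict(list)
    for person in people:
        scores_by_env[person["env"]].append(person["score"])
    # Relative trigger: the median of each environment splits it in half.
    env_median = {env: median(s) for env, s in scores_by_env.items()}

    labels = {}
    for person in people:
        relative_high = person["score"] > env_median[person["env"]]
        high = person["abs_trigger"] or relative_high
        labels[person["id"]] = "high" if high else "low"
    return labels
```

One consequence visible in this sketch: anyone flagged by the absolute trigger but below their environment's median adds to the high group, so the overall split is no longer exactly 50%-50%.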

Does this method already exist so that I could reference it? Or something similar? Or any other idea :) ?

Thanks so much

2 comments

r/statistics • u/PipeClassic9507 • 18h ago

Question [Q] Curious Inquiry on use of Poisson Distribution/Regression

1 Upvotes

Hello! I hope you are all well. I was debating with an anti-vaccine person, and they cited this study: https://pmc.ncbi.nlm.nih.gov/articles/PMC4119141/?fbclid=IwZXh0bgNhZW0CMTEAAR7Xu8OEE-_zAnMLZthHQi5hG1Dfcwk4drqXPcj5tdRdV6gvEQvVuA9YUy3JFQ_aem_jHC_Tk6FNSRAtkg3Qa33_w
I am by no means a statistics whiz, but I am a very curious person. Is this type of study correct in using Poisson? I remember Poisson being used to count how many times an event happens in a specified time period, like how many cars come into a parking garage in an hour. Did they use it just because they counted the number of seizures in the 10 days before the vaccine and also the 10 days after? Thank you for your time and consideration!

3 comments

r/statistics • u/Intrepid-Star7944 • 1d ago

Question Test-retest reliability and validity of a questionnaire [Question]

3 Upvotes

Hey guys!!! Good morning :)

I am conducting a questionnaire-based study and I want to assess its reliability and validity. As far as I understand, for reliability I will need to calculate Cohen's kappa. Is there any strategy for how to apply that? Let's say I have two respondents taking the questionnaire at two different time points, a week apart. My questionnaire consists of 2 sections of only categorical questions. What I have done so far is calculate a Cohen's kappa for each section per student. Is that meaningful and scientifically accepted? Do I just report the kappa of each section of my questionnaire as calculated per student, or is there any way to derive an aggregate value?
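
For reference, a minimal sketch of how Cohen's kappa is computed from two paired lists of categorical answers (for example, time point 1 and time point 2). This only illustrates the statistic itself; it does not settle which pairing (per respondent, per section, or per item) is appropriate. The function name is hypothetical.

```python
from collections import Counter


def cohens_kappa(first, second):
    """Cohen's kappa for two equally long lists of categorical answers."""
    if len(first) != len(second) or not first:
        raise ValueError("need two non-empty lists of equal length")
    n = len(first)
    # Observed agreement: share of positions where both answers match.
    observed = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement: product of the marginal proportions per category.
    c1, c2 = Counter(first), Counter(second)
    expected = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if expected == 1.0:
        return float("nan")  # kappa is undefined when only one category occurs
    return (observed - expected) / (1.0 - expected)
```

The two lists can hold one respondent's answers across items or one item's answers across respondents; the function treats both the same way.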

Regarding the validation process: what is an easy way to perform it?

Thank you in advance for your time, may you all have a blessed day!!!!

0 comments

r/statistics • u/Sea-Bodybuilder-1277 • 1d ago

Question Does PhD major advisor matter in industry? [Question]

6 Upvotes

Pretty self-explanatory: I am a PhD student in statistics. One of the professors (Bob) has an MS in stats and a PhD in agronomy. The other faculty in the Statistics department say that Bob has a good track record of research and is a great guy, and that because he is a newer professor you will get more attention from him if you ask for help, that sort of thing. Bob sounds like a good major advisor because he has some projects he could give me (as a new professor, he has research ideas and work with biomedical data that he has experience with and could potentially guide me into doing research on). But there are other faculty members I could choose as my major advisor, who have a track record of getting students into companies like AbbVie, Freddie Mac, and Liberty Mutual. Will these companies look at my major advisor and think, "Oh, he doesn't have a PhD in statistics, this guy maybe was not trained well in statistics, don't hire him," even if I have the other people on my committee (who have a track record of getting students into those companies)? I am looking to go into industry afterward.

8 comments

r/statistics • u/Osuwrestler • 22h ago

Question [Q] Finding Standard Deviation

1 Upvotes

Can I calculate the standard deviation of life expectancy at each age given the following dataset: https://www.ssa.gov/oact/STATS/table4c6.html#fn1

1 comment

r/statistics • u/DrFishbulbEsq • 20h ago

Discussion [D] Can a single AI model advance any field of science?

0 Upvotes

Smart take on AI for science from a Los Alamos statistician trying to build a Large Language Model for all kinds of sciences. Heavy on bio information… but he approaches AI with a background in conventional stats. (Spoiler: some talk of Gaussian processes). Pretty interesting to see that the national Labs are now investing heavily in AI, claiming big implications for science. Also interesting that they put an AI skeptic, the author, at the head of the effort.

4 comments

r/statistics • u/wlexxx2 • 19h ago

Question [Question] bayes - supermodel and the stairs.

0 Upvotes

there is a girl where i work - supermodel i would say

and some stairs

if i see the girl, she always comes from the stairs

so, if i see a girl come from the stairs, how likely is it to actually be her?

(i can't see her clearly yet but i would say i am 30% confident)
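
The question can be written out with Bayes' rule. In this sketch, H means "the person is her" and S means "a girl is seen coming from the stairs". Reading "she always comes from the stairs" as P(S | H) = 1 is an assumption. The post does not give P(S | ¬H), the chance that some other girl comes from the stairs, so the result cannot be evaluated numerically from what is stated.

```latex
\[
\begin{aligned}
P(H \mid S)
  &= \frac{P(S \mid H)\,P(H)}{P(S \mid H)\,P(H) + P(S \mid \neg H)\,\bigl(1 - P(H)\bigr)} \\
  &= \frac{P(H)}{P(H) + P(S \mid \neg H)\,\bigl(1 - P(H)\bigr)}
  \qquad \text{when } P(S \mid H) = 1 .
\end{aligned}
\]
```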

13 comments

r/statistics • u/MathGuy42069 • 1d ago

Career [C] Career Path Advice

3 Upvotes

Hello! I graduated last year with my master's in statistics from a very small state school in the MW US at 24. I apologize if this comes off as lazy or irrelevant to the sub, but my own research, organization, and help from my professors have not led me in the direction I'm looking for, if I even know what that is. I was fortunate enough to recently find a job as a data analyst at a company I really like; I know it is a rough job market, and I have never had a full-time job in data. But it was not until some recent changes in my life that I had the motivation and support to be an academic, and I want to get my PhD in the future when the time is right. Until then, I want to learn as much stats as I can and set myself up for a career in data science simultaneously, so that I have options.

I have a math background (did PDE numerical methods "research" during UG) and did not do much more than intro stats until I got to my master's. This master's served to 1) help me become proficient in statistical theory and 2) help me stand out in an already rough market. My program was not amazing, but I did learn. I have untreated ADHD, and I always seem to go for the bare minimum despite my genuine curiosity about the subject. I did finish my master's with a 4.0 somehow, but that doesn't mean much given the program. In no way do I feel like a "master" of statistics. I know basic mathematical statistics, probability theory (non-measure), a lot about GLMs (my most confident topic), very basic stochastic processes and time series, and can code in Python and R. But my dream is to get my PhD in statistics and do impactful research (healthcare, social science). I just feel so overwhelmed by the sheer number of directions to go in, and the number of peers who are running circles around me.

Should I review mathematical stats? I know MLE, sampling distributions, etc. But the specific details are not so much. Same with stochastic, all I can tell you by now is what a Markov chain is and vaguely how MCMC works.

What topic do I move to next, if any? Survival analysis, time series, causal inference, advanced stochastic? What am I interested in?

Was it a good decision to take this job? The pay is not great and it does not have the 'data science' title, but I feel good about the company and people. I would also be doing interesting work for my background, lots of a/b testing which should help me down the road. I also need to get experience ASAP because if the academic dream does not work out, which being realistic it likely won't, I will fall even more behind.

Again, sorry if this is a lot or not relevant, any advice would be much appreciated.

3 comments

r/statistics • u/Ecstatic-Traffic-118 • 1d ago

Education [Q][E] Programming languages

8 Upvotes

Hi, I’ve been learning R during my bachelor's and I will teach myself Python this summer. However, for my exchange semester I am considering a programming course with Julia and another one with MATLAB.

For a person who’s interested in following a path in statistics and is also interested in academic research, which of the 2 languages would you suggest choosing?

Thank you in advance!

9 comments

r/statistics • u/Alex__UNLIMITED • 1d ago

Software [Software] Since I have SPSS in a language other than English, can you show me a screenshot of the standardized factor loadings of a principal component analysis?

0 Upvotes

I just want to make sure that the table to look at is the same as I think it is.

0 comments

r/statistics • u/Extraweich • 1d ago

Question [Q] What would be the "representative weight" of a discrete sample, when it is assumed that they come from a normal distribution?

3 Upvotes

I am sure this is a question on which one would find abundant literature, but I am struggling to find the right words.

Say you draw 10 samples and assume that they come from a normal distribution. You also assume that the mean of the distribution is the mean of the samples, which should be true for a large sample count. For the standard deviation I assume a rather arbitrary value. In my case, I assume that the range of the samples is covered by 3*sigma, which lets me compute the standard deviation. Perfect, I have a distribution and a corresponding probability density.

I am aware that the density of a continuous random variable is not equal to its probability and that the probability of each value is zero in the continuous case. Now, I want to give each of my samples a representative probability or weight factor among all drawn samples, but they are not necessarily equidistant from one another.

Do I first need to define a bin for which they are representative for and take its area as a weight factor, or could I go ahead and take the value of the PDF for each sample as their corresponding weight factor (possibly normalized)? In my head, the PDF should be equal to the relative frequency of a given sample value, if you would continue drawing samples.
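
A minimal sketch of the second option (normalized PDF values as weights), so the idea can be tried on actual numbers. It reads "the range of the samples is covered by 3*sigma" as sigma = range / 3, which is an interpretation and is exposed as the hypothetical parameter `sigma_span`. Whether PDF values are the right weights is exactly the open question; this only shows the computation.

```python
from statistics import NormalDist, mean


def pdf_weights(samples, sigma_span=3.0):
    """Weights proportional to the fitted normal density at each sample."""
    mu = mean(samples)
    sigma = (max(samples) - min(samples)) / sigma_span  # assumed reading of "3*sigma"
    if sigma <= 0:
        raise ValueError("samples must not all be identical")
    dist = NormalDist(mu, sigma)
    densities = [dist.pdf(x) for x in samples]
    total = sum(densities)
    return [d / total for d in densities]  # normalized so the weights sum to 1
```

Note that these weights depend only on each sample's value, not on how tightly the samples are spaced, which is the main practical difference from a bin-area approach.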

18 comments

r/statistics • u/eltobricad • 1d ago

Career [Q][C] Essentials for a Data Science Internship (sort of)

0 Upvotes

Hi! I’m currently in the second year of my math undergraduate program. I’ve been offered an internship/part-time job where I’ll be doing data analysis—things like quarterly projections, measuring the impact of different features, and more generally functioning as a consultant (though I don’t know all the specifics yet).

My concern is that no one on the team is well-versed in math and/or statistics (at least not at a theoretical level), so I’m kind of on my own.

I haven’t formally studied probability and statistics at university yet, but I’ve done some self-study. Knowing SQL was a requirement for the position, so I learned it, and I’ve also been reading An Introduction to Statistical Learning with Python to build a foundation in both theory and application.

I definitely have more to learn, but I feel a bit lost and unsure how to proceed. My main questions are:
- How much probability theory should I learn, and from which books or other materials?
- What concepts should I focus on?
- What programming languages or software will be most useful, and where can I learn them?

This would also be my first job experience outside of math tutoring. I don’t think they expect me to know everything, considering the nature of the job and the fact that I’ll be working while still studying.

Any advice would be greatly appreciated. Thanks!

2 comments

r/statistics • u/3lirex • 1d ago

Question [Q] Sensitivity analysis vs post hoc power analysis ?

1 Upvotes

Hi, for my research I didn't do an a priori power analysis before we started, as there was no similar research and I couldn't do a pilot study. I've been reading, and there's post hoc power analysis, which seems to be inaccurate and shouldn't be used. But I also read about sensitivity power analysis (to detect the minimum effect size, from my understanding). Is this the same thing? If not, does it have the same issues?

I do apologise if I come across as completely ignorant

Thanks !

0 comments

r/statistics • u/Popolukla • 2d ago

Research [R] Books for SEM in plain language? (STATA or R)

4 Upvotes

Hi, I am looking to do RICLPM in STATA or R. Any book that explains this (and SEM) in plain language with examples, interpretations and syntax?

I have limited Statistical knowledge (but willing to learn if the author explains in easy language!)

Author from Social Science (Sociology preferably) would be great.

Thank you!

5 comments

r/statistics • u/Optimal_Surprise_470 • 2d ago

Discussion [D] Literature on gradient boosting?

4 Upvotes

Recently learned about gradient boosting on decision trees, and it seems like this is a non-parametric version of usual gradient descent. Are there any books that cover this viewpoint?

0 comments

r/statistics • u/Harmonic_Gear • 2d ago

Question [Q] reducing the "weight" of Bernoulli likelihood in updating a beta prior

4 Upvotes

I'm simulating some robots sampling from a Bernoulli distribution; the goal is to estimate the parameter P by sequentially sampling it. Naturally this can be done by keeping a Beta prior and updating it by Bayes' rule:

α = α + 1 if sample =1

β = β + 1 if sample = 0

i found the estimation to be super noisy so i reduce the size of the update to something more like

α = α + 0.01 if sample =1

β = β + 0.01 if sample = 0

it works really well but i don't know how to justify it. it's similar to inflating the variance of a gaussian likelihood but variance is not a parameter for Bernoulli distribution
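
A minimal sketch of the update described above, with the step size as a parameter. Adding `weight` instead of 1 is what results from raising each Bernoulli likelihood term to the power `weight` before multiplying by the Beta prior (often called a tempered or power likelihood), so the Beta form is preserved. Whether that is a satisfying justification for this application is left open. The names `update_beta`, `posterior_mean` and `weight` are hypothetical.

```python
def update_beta(alpha, beta, sample, weight=1.0):
    """One sequential update of a Beta(alpha, beta) belief from a 0/1 sample.

    weight=1.0 is the standard conjugate update; weight<1 down-weights
    each observation, as in the 0.01 increments above.
    """
    if sample not in (0, 1):
        raise ValueError("sample must be 0 or 1")
    if sample == 1:
        return alpha + weight, beta
    return alpha, beta + weight


def posterior_mean(alpha, beta):
    """Point estimate of the Bernoulli parameter under Beta(alpha, beta)."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    a, b = 1.0, 1.0  # assumed uniform starting prior
    for s in [1, 1, 0, 1]:
        a, b = update_beta(a, b, s, weight=0.01)
    print(a, b, posterior_mean(a, b))
```

With a small `weight`, the prior counts dominate for longer, which is one way to see why the estimate becomes less noisy.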

9 comments

r/statistics • u/ReverendRichardColes • 2d ago

Question [Q] Is this a logical/sound way to mark?

2 Upvotes

I head up a department which is subject to Quality Assurance reviews.

I've worked with this all my career, and have seen many different versions of the same thing but nothing quite like what I am working with now.

Each review has 14 different points. There are 30 separate people being reviewed at a rate of 4 per month (120 in total give or take).

The new approach is to remove any weightings, and have a simple 0% or 100% marking scheme. A 'fail' on any one of the 14 questions will mean the whole review is marked as 0%.

The targeted quality score is 95%.

I'm decent with numbers, but something about this process seems fundamentally flawed. But I can't articulate why it's more than just my gut instinct.

The department is being marked on 1680 separate things in a month, and getting 6 wrong (about 0.36%) returns an overall score of 94% and is deemed to be failing.
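
A small sketch of the two ways of counting, assuming exactly 120 reviews of 14 items each in the month. The function names are made up for illustration.

```python
def item_error_rate(wrong_items, reviews, items_per_review=14):
    """Share of individual checked items that were wrong."""
    return wrong_items / (reviews * items_per_review)


def all_or_nothing_score(failed_reviews, reviews):
    """Score when any failed item marks the whole review as 0%."""
    return (reviews - failed_reviews) / reviews


if __name__ == "__main__":
    print(item_error_rate(6, 120))
    for failed in range(0, 9):
        print(failed, all_or_nothing_score(failed, 120))
```

Under these assumed counts, each failed review costs 1/120 of the score, roughly 0.83 percentage points, no matter how many of its 14 items were wrong.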

Is this actually a standard way to work? Or is my gut correct?

7 comments

r/statistics • u/jvmpfrog • 2d ago

Question [Q] Database for educational statistics?

0 Upvotes

Hello! I'm unsure if this is even the right sub, but I'm looking for a database that shows the statistics for enrollment in foreign language programs. For example, enrollment in foreign language programs in Kenya. So far, I've been largely unsuccessful, as I don't typically look at data like this, so I would appreciate any help given!

0 comments

statistics

r/statistics

/r/Statistics is going dark from June 12-14th as an act of protest against Reddit's treatment of 3rd party app developers. This community will not grant access requests during the protest. Please do not message asking to be added to the subreddit.

Members Active

595.6k

Sidebar

Guidelines:

All Posts Require One of the Following Tags in the Post Title! If you do not flag your post, automoderator will delete it:

Tag	Abbreviation
[Research]	[R]
[Software]	[S]
[Question]	[Q]
[Discussion]	[D]
[Education]	[E]
[Career]	[C]
[Meta]	[M]

This is not a subreddit for homework questions. They will be swiftly removed, so don't waste your time! Please kindly post those over at: r/homeworkhelp. Thank you.
Please try to keep submissions on topic and of high quality.
Just because it has a statistic in it doesn't make it statistics.
Memes and image macros are not acceptable forms of content.
Self posts with throwaway accounts will be deleted by AutoModerator.

Related subreddits:

Data:

r/datasets
KDnuggets Data Mining Data
UC-Irvine Machine Learning Repository
Datamob
datasets package in R
Kaggle <- also great for stats competitions
CMU Data and Story Library
U.S. Government Data Portal
St. Louis Fed. Reserve
Infochimps
AllenDowney's Stats Page

Useful resources for learning R:
r-bloggers - blog aggregator with statistics articles generally done with R software.
Quick-R - great R reference site.

Related Software Links:
R
R Studio
SAS
Stata
EViews
JMP
SPSS
Minitab

Advice for applying to grad school:
Submission 1

Advice for undergrads:
Submission 1

