Friday, December 27, 2024

Do you really understand p-values? by 🎯 Ming "Tommy" Tang

 https://www.linkedin.com/posts/%F0%9F%8E%AF-ming-tommy-tang-40650014_do-you-really-understand-p-values-the-ugcPost-7277341013398978560-IoWG?utm_source=share&utm_medium=member_desktop


🎯 Do you really understand p-values?
The p-value histogram can reveal a LOT about your data. Let's break it down using real examples.👇

1/ First, a quick fact: under the null hypothesis, p-values follow a uniform distribution on [0, 1] (assuming a continuous test statistic and a correctly specified test).
What does that mean? 🤔
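If there's truly no difference between groups, every p-value between 0 and 1 is equally likely, much like every face of a fair die is equally likely:
• P(p < 0.01) = 0.01
• P(p < 0.02) = 0.02
• In general, P(p < α) = α for any α between 0 and 1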
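A quick simulation makes the uniformity claim concrete. This is a minimal Python sketch that assumes a two-sided z-test with a known standard deviation. That is a simplification of the t-test chosen to keep the code dependency-free. `z_test_pvalue` and `simulate_null_pvalues` are illustrative helpers and do not come from any library. Both groups are drawn from the same distribution, so every test is a true null.

```python
import math
import random

def z_test_pvalue(a, b, sigma=1.0):
    """Two-sided z-test p-value for a difference in means.
    Assumes a known standard deviation sigma in both groups (a simplified t-test)."""
    n_a, n_b = len(a), len(b)
    diff = sum(a) / n_a - sum(b) / n_b
    se = sigma * math.sqrt(1 / n_a + 1 / n_b)
    z = diff / se
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))

def simulate_null_pvalues(n_tests=20000, n_per_group=5, seed=1):
    """Both groups come from the same N(0, 1) distribution, so the null is true."""
    rng = random.Random(seed)
    pvals = []
    for _ in range(n_tests):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        pvals.append(z_test_pvalue(a, b))
    return pvals

pvals = simulate_null_pvalues()
for alpha in (0.01, 0.02, 0.05, 0.5):
    frac = sum(p < alpha for p in pvals) / len(pvals)
    print(f"P(p < {alpha}) ~= {frac:.3f}")
```

With 20,000 null tests, each printed fraction should land close to its threshold α.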

2/ Here's the catch: with 100 t-tests where the null hypothesis is true, we expect ~1 test with p < 0.01, ~2 with p < 0.02, and ~5 with p < 0.05, purely by chance.
This randomness is why p-values are so tricky, and why multiple testing correction is critical.
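The arithmetic behind point 2 can be written out for m tests that are all true nulls. The first line relies only on linearity of expectation. The second line additionally assumes the tests are independent, which real genes rarely are exactly:

```latex
\begin{aligned}
\mathbb{E}\bigl[\#\{i : p_i < \alpha\}\bigr] &= \sum_{i=1}^{m} P(p_i < \alpha) = m\alpha
  &&\Rightarrow\; m = 100,\ \alpha = 0.01:\ 1 \text{ expected false positive} \\
P(\text{at least one } p_i < \alpha) &= 1 - (1 - \alpha)^{m}
  &&\Rightarrow\; m = 100,\ \alpha = 0.05:\ 1 - 0.95^{100} \approx 0.994
\end{aligned}
```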

3/ Why should bioinformaticians care? 🧬
When you analyze RNA-seq, ChIP-seq, or other large datasets, you’re running thousands of tests.

Plotting the p-value histogram can tell you:

✔️ Whether the null hypothesis holds for most of your tests

✔️ Whether your experiment reveals meaningful signals
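In R, `hist(pvals, breaks = 20)` draws this plot. As a dependency-free alternative, here is a minimal Python sketch that bins p-values into equal-width bins over [0, 1] and prints a text histogram. The function names and the 20-bin default are illustrative choices.

```python
def pvalue_histogram(pvals, n_bins=20):
    """Count p-values in n_bins equal-width bins covering [0, 1]."""
    counts = [0] * n_bins
    for p in pvals:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a p-value: {p}")
        # p == 1.0 would map to index n_bins, so clamp it into the last bin
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts

def print_histogram(counts, width=40):
    """Print one text bar per bin, scaled so the tallest bin is `width` characters."""
    n_bins = len(counts)
    peak = max(counts) or 1  # avoid dividing by zero when every bin is empty
    for i, c in enumerate(counts):
        lo, hi = i / n_bins, (i + 1) / n_bins
        bar = "#" * round(width * c / peak)
        print(f"[{lo:.2f}, {hi:.2f}) {c:6d} {bar}")
```

Usage: `print_histogram(pvalue_histogram(pvals))`, where `pvals` is the list of raw (unadjusted) p-values from your tests.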

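4/ You should see a peak near p = 0 if you have real differences. Then apply a multiple testing correction. Bonferroni controls the family-wise error rate (FWER), while Benjamini & Hochberg (the BH method) controls the false discovery rate (FDR). In R, use p.adjust(pvals, method = "BH") or p.adjust(pvals, method = "bonferroni"); p.adjust(pvals) on its own defaults to the Holm method. Dive deeper in my blog post here https://lnkd.in/ex3S3V5g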
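For readers outside R, here is a minimal Python sketch of the two adjustments. They mirror p.adjust(..., method = "bonferroni") and p.adjust(..., method = "BH"). This is an illustration, not a replacement for a tested library. BH works on the sorted p-values: the k-th smallest of m is scaled by m/k. A running minimum, taken from the largest p-value down, then keeps the adjusted values monotone.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1 (controls FWER)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def benjamini_hochberg(pvals):
    """BH-adjusted p-values (controls FDR for independent or positively dependent tests)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down: adj_(k) = min over j >= k of p_(j) * m / j
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

pvals = [0.01, 0.04, 0.03, 0.20]
print(bonferroni(pvals))
print(benjamini_hochberg(pvals))
```

The adjusted values come back in the original input order, just like p.adjust, so they can be attached directly to your results table.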

5/ Example: a perfect null distribution (no effect) looks flat:
📊 every bin (0 to 0.05, 0.05 to 0.10, ...) holds roughly the same number of p-values, evenly spread across the range 0 to 1.
If you see an excess of p-values near zero on top of that flat floor, it suggests true effects in your data: signals worth exploring!
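To see the "flat floor plus spike near 0" shape, here is a minimal Python sketch that mixes 9,000 null tests with 1,000 tests carrying a real shift. The effect size (2 standard deviations), the group size (5), and the known-sigma z-test are all assumptions chosen for illustration.

```python
import math
import random

def z_pvalue(a, b, sigma=1.0):
    # Two-sided z-test with known sigma (a simplified stand-in for a t-test)
    se = sigma * math.sqrt(1 / len(a) + 1 / len(b))
    z = (sum(a) / len(a) - sum(b) / len(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def simulate(n_null=9000, n_signal=1000, effect=2.0, n=5, seed=7):
    rng = random.Random(seed)
    pvals = []
    for i in range(n_null + n_signal):
        shift = effect if i < n_signal else 0.0  # the first n_signal "genes" truly differ
        a = [rng.gauss(shift, 1) for _ in range(n)]
        b = [rng.gauss(0, 1) for _ in range(n)]
        pvals.append(z_pvalue(a, b))
    return pvals

pvals = simulate()
counts = [0] * 20
for p in pvals:
    counts[min(int(p * 20), 19)] += 1
print(counts)  # the first bin (p < 0.05) should stand well above the roughly flat rest
```

The null tests supply the flat floor of about 450 per bin, and most of the shifted tests land in the first bin.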

6/ 🚩 Red flag: a "U-shaped" histogram, with many p-values near both 0 and 1.
This often indicates technical artifacts in your data (batch effects, low-quality samples).
Always check your p-value histogram before jumping to conclusions. Reference: https://lnkd.in/eVctWN_j

7/ Want to explore this further? Try it with your RNA-seq results!
• Run your differential expression analysis.
• Plot the histogram of your p-values.
• Interpret: do you see a flat histogram (null), enrichment near 0 (signals), or a U-shape (artifacts)?
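The interpretation step can be roughed out in code. This is a hypothetical heuristic, not an established method. A bin counts as a peak if it holds more than 1.5 times the median bin height. That threshold is arbitrary and should be checked by eye against the actual plot. Peaks that match none of the three expected shapes are reported as "irregular".

```python
def describe_pvalue_histogram(pvals, n_bins=20, ratio=1.5):
    """Classify a p-value histogram as flat, enriched near 0, U-shaped, or irregular.

    Hypothetical heuristic: a bin is a 'peak' if it holds more than `ratio` times
    the median bin height. Only meaningful with thousands of p-values.
    """
    counts = [0] * n_bins
    for p in pvals:
        counts[min(int(p * n_bins), n_bins - 1)] += 1
    baseline = max(sorted(counts)[n_bins // 2], 1)  # median bin height, at least 1
    peak_near_0 = counts[0] > ratio * baseline
    other_peaks = [i for i in range(1, n_bins) if counts[i] > ratio * baseline]
    if not peak_near_0 and not other_peaks:
        return "flat: consistent with the null"
    if peak_near_0 and not other_peaks:
        return "enriched near 0: likely true signals"
    if peak_near_0 and other_peaks == [n_bins - 1]:
        return "U-shaped: check for artifacts"
    edges = [f"{i / n_bins:.2f}" for i in other_peaks]
    return "irregular: unexpected peaks in bins starting at p = " + ", ".join(edges)
```

A spike in the middle of the range, like the one around p = 0.8 in point 8, falls into the "irregular" branch.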

8/ I saw a strange histogram in one of my own RNA-seq analyses. What's going on with these p-values? There is a spike around p = 0.8.

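9/ After removing the genes with low counts, I got the expected, well-behaved histogram. Low-count genes have little power and tend to produce only a few discrete, mostly large p-values, which can pile up as spikes. Read more details in my blog post https://lnkd.in/ezsMVE3s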
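Why do low-count genes misbehave? A simplified model shows the mechanism. Suppose a gene's n reads split between two conditions with equal library sizes, so under the null each read lands in condition A with probability 0.5. Then test that split with an exact two-sided binomial test. This is an assumption for illustration, not the negative-binomial model that DESeq2 or edgeR fit. With so few possible outcomes, only a handful of p-values are attainable, and small ones are impossible. The `filter_low_counts` helper and its threshold of 10 total reads are hypothetical.

```python
from math import comb

def binom_two_sided_p(k, n, prob=0.5):
    """Exact two-sided binomial test: sum the probabilities of all outcomes
    that are no more likely than the observed one."""
    pmf = [comb(n, j) * prob**j * (1 - prob) ** (n - j) for j in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance so ties are not lost to floating-point error
    p = sum(q for q in pmf if q <= observed * (1 + 1e-7))
    return min(p, 1.0)

def attainable_pvalues(n):
    """Every p-value this test can produce for a gene with n total reads."""
    return sorted({round(binom_two_sided_p(k, n), 4) for k in range(n + 1)})

def filter_low_counts(counts_by_gene, min_total=10):
    """Keep genes whose total read count across samples is at least min_total
    (hypothetical helper; 10 is an arbitrary illustrative threshold)."""
    return {g: c for g, c in counts_by_gene.items() if sum(c) >= min_total}

for n in (2, 5, 30):
    values = attainable_pvalues(n)
    print(f"{n:>2} reads: smallest attainable p = {values[0]:.4g}, {len(values)} distinct values")
```

A gene with 2 reads can only ever give p = 0.5 or p = 1.0, and a gene with 5 reads never gets below p = 0.0625. Thousands of such genes therefore pile up at a few large values instead of spreading evenly.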

10/ TL;DR:

✔️ Under the null, p-values are uniform (flat distribution).

✔️ A "spike near 0" suggests true effects.

✔️ A "U-shape" signals artifacts—time to troubleshoot.

Plot your p-values. They’ll tell you more than you think!
Have questions? Let’s chat 👇

I hope you've found this post helpful.

Follow me for more.

Subscribe to my FREE newsletter https://lnkd.in/erw83Svn
