T Score Calculator
A pharmaceutical company tests whether a new drug lowers blood pressure more than a placebo. A teacher compares two classes on exam performance. An engineer checks if a production line meets specifications. All three rely on the same statistical tool: the t-test. The t score calculator below handles one-sample, two-sample, and paired t-tests, returning the t-value, degrees of freedom, and p-value.
What Is a T-Score?
A t-score (or t-statistic) measures how far a sample mean deviates from a population mean, expressed in units of standard error. The larger the absolute value, the stronger the evidence against the null hypothesis.
William Sealy Gosset developed the t-distribution in 1908 while working at the Guinness Brewery in Dublin. Publishing under the pen name “Student,” he needed a reliable way to draw conclusions from small sample sizes – a common constraint in industrial quality control and biological research.
The t-distribution accounts for the extra uncertainty introduced when the population standard deviation is unknown and must be estimated from the sample. It has heavier tails than the standard normal distribution, which means extreme values are more likely with small samples.
Types of T-Tests and Their Formulas
One-Sample T-Test
Compares a single sample mean against a known or hypothesized population mean.
Formula:
$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Where:
- x̄ – sample mean
- μ – population mean (null hypothesis value)
- s – sample standard deviation
- n – sample size
Degrees of freedom: df = n − 1
Example: A factory claims its bolts have a mean length of 50 mm. You measure 25 bolts and find x̄ = 49.2 mm with s = 2.1 mm.
t = (49.2 − 50) / (2.1 / √25) = −0.8 / 0.42 = −1.905
With df = 24 and α = 0.05 (two-tailed), the critical value is ±2.064. Since |−1.905| < 2.064, you fail to reject the null hypothesis.
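The bolt example can be reproduced in a few lines of Python using only the standard library (a statistics library such as SciPy's `scipy.stats.ttest_1samp` would additionally return the exact p-value):

```python
from math import sqrt

def one_sample_t(x_bar, mu, s, n):
    """t = (x̄ − μ) / (s / √n) for a one-sample t-test."""
    return (x_bar - mu) / (s / sqrt(n))

# Bolt example from the text: claimed mean 50 mm, 25 measured bolts
t = one_sample_t(x_bar=49.2, mu=50.0, s=2.1, n=25)
df = 25 - 1  # degrees of freedom
print(round(t, 3), df)  # → -1.905 24
```

Since |−1.905| is below the critical value 2.064 for df = 24, the sample does not contradict the factory's claim.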
Two-Sample (Independent) T-Test
Compares the means of two independent groups to determine whether they are significantly different.
Formula (Welch’s t-test, unequal variances):
$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

Degrees of freedom (Welch–Satterthwaite approximation):
$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{(s_1^2/n_1)^2}{n_1 - 1} + \frac{(s_2^2/n_2)^2}{n_2 - 1}}$$

When the two groups can be assumed to have equal population variances, the simpler pooled variance formula applies:
$$t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

Where the pooled standard deviation is:
$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

Degrees of freedom (pooled): df = n₁ + n₂ − 2
Example: Group A (n = 30, x̄ = 78, s = 8) vs Group B (n = 35, x̄ = 72, s = 10).
Using Welch’s formula: t ≈ 2.69, df ≈ 62.7. At α = 0.05 (two-tailed), p ≈ 0.009 – statistically significant.
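The Welch statistic and the Satterthwaite df can be computed directly from the two formulas above; a minimal stdlib-only sketch for the Group A vs Group B example:

```python
from math import sqrt

def welch_t(x1, s1, n1, x2, s2, n2):
    """Welch's t statistic and Welch–Satterthwaite degrees of freedom."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # per-group variance of the mean
    t = (x1 - x2) / sqrt(v1 + v2)
    df = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

# Group A (n=30, x̄=78, s=8) vs Group B (n=35, x̄=72, s=10)
t, df = welch_t(78, 8, 30, 72, 10, 35)
print(round(t, 2), round(df, 1))  # → 2.69 62.7
```

Note that the Satterthwaite df (62.7 here) is generally non-integer and never exceeds n₁ + n₂ − 2.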
Paired (Dependent) T-Test
Compares two related measurements – typically before-and-after observations on the same subjects.
Formula:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$

Where:
- d̄ – mean of the differences between paired values
- s_d – standard deviation of the differences
- n – number of pairs
Degrees of freedom: df = n − 1
Example: 15 patients have blood pressure measured before and after treatment. The mean difference is d̄ = −6.4 mmHg with s_d = 8.2 mmHg.
t = −6.4 / (8.2 / √15) = −6.4 / 2.117 = −3.023
With df = 14, the two-tailed p-value is approximately 0.009 – significant at α = 0.05.
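The paired calculation has the same shape as the one-sample case, applied to the differences. A short sketch for the blood pressure example (stdlib only; a library would add the p-value):

```python
from math import sqrt

def paired_t(d_bar, s_d, n):
    """t = d̄ / (s_d / √n) for a paired t-test on n differences."""
    return d_bar / (s_d / sqrt(n))

# Blood pressure example: mean difference −6.4 mmHg, s_d = 8.2, 15 patients
t = paired_t(d_bar=-6.4, s_d=8.2, n=15)
df = 15 - 1
print(round(t, 3), df)  # → -3.023 14
```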
How to Interpret T-Score Results
The T-Value Itself
The t-value is a ratio of signal to noise:
- |t| close to 0 – the observed difference is small relative to the variability in the data
- |t| large – the observed difference is large relative to the variability, suggesting a real effect
P-Value
The p-value is the probability of observing a t-score as extreme as (or more extreme than) the calculated value, assuming the null hypothesis is true.
| P-value range | Interpretation |
|---|---|
| p < 0.001 | Very strong evidence against H₀ |
| 0.001 ≤ p < 0.01 | Strong evidence against H₀ |
| 0.01 ≤ p < 0.05 | Moderate evidence against H₀ |
| p ≥ 0.05 | Insufficient evidence against H₀ |
Confidence Intervals
A 95% confidence interval for the difference between the sample mean and the hypothesized value complements the t-test:

$$CI = (\bar{x} - \mu) \pm t_{\alpha/2,\, df} \times \frac{s}{\sqrt{n}}$$

If the interval contains 0, the result is not significant at that confidence level.
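Applying this interval to the earlier bolt example (reusing the critical value 2.064 for df = 24) shows the equivalence between the interval and the test decision:

```python
from math import sqrt

def mean_diff_ci(x_bar, mu, s, n, t_crit):
    """CI for (x̄ − μ): point estimate ± t_crit · s/√n."""
    half_width = t_crit * s / sqrt(n)
    return (x_bar - mu) - half_width, (x_bar - mu) + half_width

# Bolt example: t_crit = 2.064 for df = 24, α = 0.05 two-tailed
lo, hi = mean_diff_ci(49.2, 50.0, 2.1, 25, t_crit=2.064)
print(round(lo, 3), round(hi, 3))  # → -1.667 0.067
```

The interval (−1.667, 0.067) contains 0, matching the earlier failure to reject the null hypothesis.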
T-Score vs Z-Score
| Feature | T-Score | Z-Score |
|---|---|---|
| Population SD | Unknown, estimated from sample | Known |
| Distribution | t-distribution (heavier tails) | Standard normal distribution |
| Sample size | Any size; essential when n < 30 | Large samples (n ≥ 30) |
| Degrees of freedom | df = n − 1 (affects shape) | Not applicable |
| Convergence | Approaches z as n → ∞ | Fixed distribution |
With samples of 30 or more, the t-distribution closely approximates the normal distribution. For smaller samples, the heavier tails of the t-distribution produce wider confidence intervals and higher p-values – a built-in correction for the added uncertainty.
Assumptions Behind a Valid T-Test
Violating these assumptions can produce misleading results:
- Normality – the underlying population should be approximately normally distributed. Test with a Shapiro-Wilk test or Q-Q plot. The t-test is reasonably robust to moderate non-normality with n ≥ 30.
- Independence – observations must not influence each other. Random sampling or random assignment usually satisfies this.
- Scale of measurement – data must be continuous (interval or ratio scale), not categorical or ordinal.
- Equal variances (two-sample pooled only) – use Levene’s test to check. If variances differ significantly, Welch’s t-test is the safer choice.
When Should You Use a T-Test?
A t-test is appropriate when you need to:
- Compare a sample mean to a known benchmark (one-sample)
- Evaluate the difference between two independent groups (two-sample)
- Measure change within the same subjects over time or conditions (paired)
A t-test is not appropriate when you have:
- More than two groups → use ANOVA
- Repeated measures across more than two conditions → use repeated-measures ANOVA
- Categorical outcome data → use chi-square test
- Three or more predictors → use multiple regression
Common Mistakes to Avoid
- Multiple comparisons without correction. Running several t-tests on the same dataset inflates the Type I error rate. Use ANOVA or apply Bonferroni correction.
- Ignoring paired structure. Analyzing paired data with an independent two-sample test discards information and reduces statistical power.
- One-tailed tests after peeking at data. Choosing a one-tailed test because the difference already appears in one direction doubles the false-positive risk. Pre-register your hypothesis.
- Confusing statistical significance with practical significance. A large sample can make a trivially small difference “significant.” Report effect sizes (Cohen’s d) alongside p-values.
Cohen’s d for a one-sample test:
$$d = \frac{\bar{x} - \mu}{s}$$

| Cohen’s d | Effect size |
|---|---|
| 0.2 | Small |
| 0.5 | Medium |
| 0.8 | Large |
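Revisiting the bolt example illustrates why effect size belongs next to the p-value – the observed difference is also modest in standardized terms:

```python
def cohens_d(x_bar, mu, s):
    """Cohen's d for a one-sample test: (x̄ − μ) / s."""
    return (x_bar - mu) / s

# Bolt example: −0.8 mm difference relative to s = 2.1 mm
d = cohens_d(49.2, 50.0, 2.1)
print(round(d, 2))  # → -0.38
```

An |d| of about 0.38 sits between the small and medium benchmarks, independent of whether the t-test reached significance.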
This calculator provides educational results. For clinical trials, regulatory submissions, or published research, verify all calculations with dedicated statistical software such as R, SPSS, or SAS.