Grouped data is a statistical term used in data analysis. Raw data can be organized by grouping similar measurements together in a table; such a frequency table is also called grouped data.[1]
Example
Suppose someone gave a group of students a simple math question and timed how long each student took to answer it. The times, in seconds, are shown below:
| 20 | 25 | 24 | 33 | 13 |
| 26 | 8  | 19 | 31 | 11 |
| 16 | 21 | 17 | 11 | 34 |
| 14 | 15 | 21 | 18 | 17 |

Table 1: Time taken (in seconds) to answer a simple math question
The smallest time was 8 seconds and the largest was 34 seconds. One way to analyze the times is to group close values together. To keep the analysis fair, each group should cover the same number of seconds; we can then count how many students fall into each group. For example, organizing the times into 5-second ranges gives:
| Time taken | Frequency |
|---|---|
| 5 to 9 seconds | 1 student |
| 10 to 14 seconds | 4 students |
| 15 to 19 seconds | 6 students |
| 20 to 24 seconds | 4 students |
| 25 to 29 seconds | 2 students |
| 30 to 34 seconds | 3 students |

Table 2: Frequency distribution of the time taken (in seconds) to answer a simple math question
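The counting step can also be sketched in a few lines of code. The following is only an illustration of the binning described above, with variable names of our own choosing; it reproduces Table 2 from the raw times in Table 1.

```python
# Bin the raw times from Table 1 into 5-second classes and count
# how many students fall into each class (reproducing Table 2).
times = [20, 25, 24, 33, 13, 26, 8, 19, 31, 11,
         16, 21, 17, 11, 34, 14, 15, 21, 18, 17]

class_width = 5   # every class covers the same number of seconds
start = 5         # the first class begins at 5 seconds

frequencies = {}
for lower in range(start, max(times) + 1, class_width):
    upper = lower + class_width - 1
    frequencies[f"{lower} to {upper} seconds"] = sum(1 for t in times if lower <= t <= upper)

for interval, count in frequencies.items():
    print(interval, count)
# 5 to 9 seconds 1
# 10 to 14 seconds 4
# 15 to 19 seconds 6
# 20 to 24 seconds 4
# 25 to 29 seconds 2
# 30 to 34 seconds 3
```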
Another way to group the data is to organize the times into categories based on performance. Suppose there are three types of students:
- Smart (5 to 14 seconds)
- Normal (15 to 24 seconds)
- Below average (25 or more seconds)
then the grouped data looks like the following:
| Type of student | Frequency |
|---|---|
| Smart | 5 |
| Normal | 10 |
| Below average | 5 |

Table 3: Frequency distribution of the three types of students
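The same kind of sketch works for this second grouping. The helper function `category` below is our own illustration, not part of the article; it simply applies the three ranges listed above to the raw times and counts each label, reproducing Table 3.

```python
# Sort the raw times from Table 1 into the three labelled categories of Table 3.
from collections import Counter

times = [20, 25, 24, 33, 13, 26, 8, 19, 31, 11,
         16, 21, 17, 11, 34, 14, 15, 21, 18, 17]

def category(seconds):
    if seconds <= 14:
        return "Smart"           # 5 to 14 seconds
    if seconds <= 24:
        return "Normal"          # 15 to 24 seconds
    return "Below average"       # 25 or more seconds

counts = Counter(category(t) for t in times)
for name in ("Smart", "Normal", "Below average"):
    print(name, counts[name])
# Smart 5
# Normal 10
# Below average 5
```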
Mean of grouped data
An estimate, $\bar{x}$, of the mean can be calculated from grouped data:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

where
- $x$ refers to the mid-point of each class interval
- $f$ is the class frequency.
Note that this estimated mean may be different from the sample mean of the ungrouped data. The mean of the grouped data in the above example can be calculated as follows:
| Class interval | Frequency (f) | Midpoint (x) | f × x |
|---|---|---|---|
| 5 to 9 seconds | 1 | 7.5 | 7.5 |
| 10 to 14 seconds | 4 | 12.5 | 50 |
| 15 to 19 seconds | 6 | 17.5 | 105 |
| 20 to 24 seconds | 4 | 22.5 | 90 |
| 25 to 29 seconds | 2 | 27.5 | 55 |
| 30 to 34 seconds | 3 | 32.5 | 97.5 |
| TOTAL | 20 | | 405 |
Therefore, the mean of the grouped data is $\bar{x} = \frac{\sum fx}{\sum f} = \frac{405}{20} = 20.25$ seconds.
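The same estimate can be checked with a short Python sketch; the variable names are ours, and the comparison with the ungrouped mean is computed directly from the raw times in Table 1.

```python
# Estimate the mean from the grouped data: sum of f*x divided by sum of f.
midpoints   = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5]   # x, one per class interval
frequencies = [1, 4, 6, 4, 2, 3]                     # f, from the table above

grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / sum(frequencies)
print(grouped_mean)   # 20.25

# For comparison, the exact mean of the 20 ungrouped times in Table 1 is 394 / 20 = 19.7,
# so the grouped estimate is close to, but not the same as, the true sample mean.
```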
Notes
- [1] Newbold et al., 2009, pages 14 to 17
References
- Newbold, P., W. Carlson and B. Thorne (2009) Statistics for Business and Economics, Seventh edition, Pearson Education. ISBN 9780135072486.