Grouped data is a statistical term used in data analysis. Raw data can be organized by grouping similar measurements together in a table; such a frequency table is also called grouped data.[1]
Example
Suppose someone gave a group of students a simple math question and timed how long each student took to answer it. The times, in seconds, are shown below:
| 20 | 25 | 24 | 33 | 13 |
| 26 | 8  | 19 | 31 | 11 |
| 16 | 21 | 17 | 11 | 34 |
| 14 | 15 | 21 | 18 | 17 |

Table 1: Time taken (in seconds) to answer a simple math question
The smallest time was 8 seconds and the largest was 34 seconds. One way to analyze the times is to group close values together. To keep the analysis fair, each group should cover the same number of seconds; we can then count how many students fall into each group. For example, organizing the times into 5-second ranges gives:
| Time taken | Frequency |
|---|---|
| 5 to 9 seconds | 1 student |
| 10 to 14 seconds | 4 students |
| 15 to 19 seconds | 6 students |
| 20 to 24 seconds | 4 students |
| 25 to 29 seconds | 2 students |
| 30 to 34 seconds | 3 students |

Table 2: Frequency distribution of the time taken (in seconds) to answer a simple math question
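The counting step can also be sketched in a few lines of code. The following is only an illustration of the binning described above, with variable names of our own choosing; it reproduces Table 2 from the raw times in Table 1.

```python
# Bin the raw times from Table 1 into 5-second classes and count
# how many students fall into each class (reproducing Table 2).
times = [20, 25, 24, 33, 13, 26, 8, 19, 31, 11,
         16, 21, 17, 11, 34, 14, 15, 21, 18, 17]

class_width = 5   # every class covers the same number of seconds
start = 5         # the first class begins at 5 seconds

frequencies = {}
for lower in range(start, max(times) + 1, class_width):
    upper = lower + class_width - 1
    frequencies[f"{lower} to {upper} seconds"] = sum(1 for t in times if lower <= t <= upper)

for interval, count in frequencies.items():
    print(interval, count)
# 5 to 9 seconds 1
# 10 to 14 seconds 4
# 15 to 19 seconds 6
# 20 to 24 seconds 4
# 25 to 29 seconds 2
# 30 to 34 seconds 3
```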
Another way to group the data is to organize the times into categories based on performance. Suppose there are three types of students:
- Smart (5 to 14 seconds)
- Normal (15 to 24 seconds)
- Below average (25 or more seconds)
then the grouped data looks like the following:
| Type of student | Frequency |
|---|---|
| Smart | 5 |
| Normal | 10 |
| Below average | 5 |

Table 3: Frequency distribution of the three types of students
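The same kind of sketch works for this second grouping. The helper function `category` below is our own illustration, not part of the article; it simply applies the three ranges listed above to the raw times and counts each label, reproducing Table 3.

```python
# Sort the raw times from Table 1 into the three labelled categories of Table 3.
from collections import Counter

times = [20, 25, 24, 33, 13, 26, 8, 19, 31, 11,
         16, 21, 17, 11, 34, 14, 15, 21, 18, 17]

def category(seconds):
    if seconds <= 14:
        return "Smart"           # 5 to 14 seconds
    if seconds <= 24:
        return "Normal"          # 15 to 24 seconds
    return "Below average"       # 25 or more seconds

counts = Counter(category(t) for t in times)
for name in ("Smart", "Normal", "Below average"):
    print(name, counts[name])
# Smart 5
# Normal 10
# Below average 5
```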
Mean of grouped data
An estimate, $\bar{x}$, of the mean can be calculated from grouped data:

$$\bar{x} = \frac{\sum fx}{\sum f}$$

where
- $x$ refers to the mid-point of each class interval
- $f$ is the class frequency.
Note that this estimated mean may be different from the sample mean of the ungrouped data. The mean of the grouped data in the above example can be calculated as follows:
| Class interval | Frequency (f) | Midpoint (x) | f × x |
|---|---|---|---|
| 5 to 9 seconds | 1 | 7.5 | 7.5 |
| 10 to 14 seconds | 4 | 12.5 | 50 |
| 15 to 19 seconds | 6 | 17.5 | 105 |
| 20 to 24 seconds | 4 | 22.5 | 90 |
| 25 to 29 seconds | 2 | 27.5 | 55 |
| 30 to 34 seconds | 3 | 32.5 | 97.5 |
| TOTAL | 20 | | 405 |
Therefore, the mean of the grouped data is $\bar{x} = \frac{\sum fx}{\sum f} = \frac{405}{20} = 20.25$ seconds.
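The same estimate can be checked with a short Python sketch; the variable names are ours, and the comparison with the ungrouped mean is computed directly from the raw times in Table 1.

```python
# Estimate the mean from the grouped data: sum of f*x divided by sum of f.
midpoints   = [7.5, 12.5, 17.5, 22.5, 27.5, 32.5]   # x, one per class interval
frequencies = [1, 4, 6, 4, 2, 3]                     # f, from the table above

grouped_mean = sum(f * x for f, x in zip(frequencies, midpoints)) / sum(frequencies)
print(grouped_mean)   # 20.25

# For comparison, the exact mean of the 20 ungrouped times in Table 1 is 394 / 20 = 19.7,
# so the grouped estimate is close to, but not the same as, the true sample mean.
```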
Notes
- [1] Newbold et al., 2009, pages 14 to 17
References
- Newbold, P., W. Carlson and B. Thorne (2009) Statistics for Business and Economics, Seventh edition, Pearson Education. ISBN 9780135072486.