7.3 The Chi-Square Statistic
- Key Idea 1: Another way of describing such results is to calculate a value called the chi-square statistic...
Discussion The chi-square statistic is written as χ². The symbol χ is the Greek letter chi, pronounced to rhyme with sky. The chi-square statistic is very commonly used in practice; it is very important in the life sciences, for example.
- Key Idea 2: Let’s see how the chi-square statistic is computed...
Discussion Table 7.5, called a chi-square frequency table, shows both the D statistic (the total of the |O - E| column) and the chi-square statistic (the total of the (O - E)²/E column) for the same data, 60 rolls of a six-sided die. (In practice, a chi-square table does not show the value of the statistic.)
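- The value of χ² is given by

$$\chi^2 = \sum \frac{(O - E)^2}{E},$$

where the sum runs over all rows of the table (one row per face of the die).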
- Therefore, for the data in Table 7.5, χ² = 12.4.
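The two column totals can be sketched in a few lines of Python. The counts below are hypothetical (they are not the data of Table 7.5), and the helper names `frequency_table`, `d_statistic` and `chi_square` are illustrative only. Each row mirrors a row of a chi-square frequency table; D is the total of the |O - E| column and χ² is the total of the (O - E)²/E column.

```python
def frequency_table(observed, expected):
    """One row (O, E, |O - E|, (O - E)^2 / E) per die face."""
    return [(o, e, abs(o - e), (o - e) ** 2 / e) for o, e in zip(observed, expected)]


def d_statistic(table):
    """D is the total of the |O - E| column."""
    return sum(row[2] for row in table)


def chi_square(table):
    """Chi-square is the total of the (O - E)^2 / E column."""
    return sum(row[3] for row in table)


# Hypothetical counts for 60 rolls (illustrative only, NOT the data of Table 7.5).
observed = [4, 6, 17, 16, 8, 9]
expected = [60 / 6] * 6  # a fair die: E = 10 for each of the six faces

table = frequency_table(observed, expected)
for face, (o, e, abs_dev, contrib) in enumerate(table, start=1):
    print(f"face {face}: O={o:2d}  E={e:4.1f}  |O-E|={abs_dev:4.1f}  (O-E)^2/E={contrib:4.2f}")
print(f"D = {d_statistic(table):.1f}, chi-square = {chi_square(table):.2f}")
```

For these made-up counts the deviations are -6, -4, 7, 6, -2 and -1, so D = 26 and χ² = 142/10 = 14.2.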
- Key Idea 3: There are similarities, but also important differences, in how we calculate D and chi-square...
Discussion
- Instead of taking the absolute value |O - E| (that is, the magnitude of O - E), we square O - E, obtaining the square of its magnitude.
- We then divide each value of (O - E)² by its corresponding expected value E.

Why divide by E? This is a natural question to raise. It can be partially justified in this way: dividing each (O - E)² by E helps us judge whether an (O - E)² value is unusually large when the expected values differ from application to application. Just as we expect a tall person to weigh more than a short person, we expect an experiment involving more rolls of a die to produce larger values of (O - E)² than an experiment with fewer rolls. Hence, dividing by E (which is proportional to the number of rolls) is a way to rescale the values of (O - E)² appropriately, just as computing the ratio weight/height seems like a good way to adjust for the influence of height on a person's weight.
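The tall-person analogy can be checked numerically. The sketch below assumes a fair die and uses Python's standard `random` module; the helper names `roll_counts` and `average_statistics` are hypothetical. It simulates many experiments at two sample sizes and averages both the sum of (O - E)² and χ². For a fair die the average sum of (O - E)² grows in proportion to the number of rolls n (it is about 5n/6), while the average χ² stays near 5 whatever n is.

```python
import random


def roll_counts(n_rolls, rng):
    """Tally how often each face comes up in n_rolls rolls of a fair die."""
    counts = [0] * 6
    for _ in range(n_rolls):
        counts[rng.randrange(6)] += 1
    return counts


def average_statistics(n_rolls, n_trials=1000, seed=1):
    """Average sum of (O - E)^2 and average chi-square over simulated experiments."""
    rng = random.Random(seed)
    e = n_rolls / 6  # expected count per face for a fair die
    total_sq = total_chi = 0.0
    for _ in range(n_trials):
        counts = roll_counts(n_rolls, rng)
        sq = sum((o - e) ** 2 for o in counts)
        total_sq += sq
        total_chi += sq / e  # equals the sum of (O - E)^2 / E, since E is the same for every face
    return total_sq / n_trials, total_chi / n_trials


for n in (60, 600):
    avg_sq, avg_chi = average_statistics(n)
    print(f"{n:4d} rolls: mean sum of (O-E)^2 = {avg_sq:7.1f}, mean chi-square = {avg_chi:.2f}")
```

Ten times as many rolls makes the raw squared deviations roughly ten times larger, but dividing by E brings the two experiments back onto a common scale.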
- Key Idea 4: Let's take the simulation that we did with D, and do it with chi-square...
Discussion
- We repeat many trials (30 here) of the experiment of rolling a die 60 times, computing χ² for each set of rolls (that is, for each trial).
- We then prepare a frequency table of the χ² values, as shown in Table 7.6, in much the same way as we produced a frequency table of D values when estimating P(D ≥ 24) in the previous section.
- We find a stem-and-leaf plot handy for this task.
- From Table 7.6 we see that a χ² of 12.4 or larger was obtained in 1 of the 30 trials. Therefore, rounded to two decimal places, P(χ² ≥ 12.4) ≈ 1/30 ≈ 0.03.
- Since 0.03 is less than 0.05, we conclude that the die whose data are in Table 7.5 is not fair.
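The simulation steps above can be sketched in Python as follows. This is a minimal illustration, assuming a fair die and Python's standard `random` module; the function names, the seed and the choice of stems (whole part) and leaves (tenths digit) are hypothetical choices, not taken from Table 7.6. It runs 30 trials of 60 rolls, prints a stem-and-leaf plot of the χ² values, estimates P(χ² ≥ 12.4) as the fraction of trials at or above 12.4, and compares that estimate with 0.05.

```python
import random
from collections import defaultdict


def chi_square_fair_die(counts):
    """Chi-square of face counts against a fair die (E = n/6 for every face)."""
    e = sum(counts) / 6
    return sum((o - e) ** 2 / e for o in counts)


def simulate_chi_squares(n_trials=30, n_rolls=60, seed=None):
    """Run n_trials experiments of n_rolls fair-die rolls; return one chi-square per trial."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_trials):
        counts = [0] * 6
        for _ in range(n_rolls):
            counts[rng.randrange(6)] += 1
        values.append(chi_square_fair_die(counts))
    return values


def stem_and_leaf(values):
    """Print a stem-and-leaf plot (stem = whole part, leaf = tenths digit)."""
    plot = defaultdict(list)
    for v in sorted(values):
        tenths = round(v * 10)  # with 60 rolls every chi-square is a multiple of 0.1
        plot[tenths // 10].append(tenths % 10)
    for stem in range(min(plot), max(plot) + 1):
        leaves = "".join(str(leaf) for leaf in plot.get(stem, []))
        print(f"{stem:3d} | {leaves}")
    return plot


def estimate_p_value(values, observed_stat, tol=1e-9):
    """Fraction of simulated trials whose statistic is at least observed_stat.

    The small tolerance keeps a simulated value of exactly 12.4 from being
    missed because of floating-point rounding.
    """
    return sum(v >= observed_stat - tol for v in values) / len(values)


OBSERVED = 12.4  # chi-square for the data of Table 7.5
ALPHA = 0.05     # cut-off used in the text

sims = simulate_chi_squares(seed=2024)  # arbitrary seed; change it for a new run
stem_and_leaf(sims)
p_hat = estimate_p_value(sims, OBSERVED)
print(f"estimated P(chi-square >= {OBSERVED}) = {p_hat:.2f}")
print("unusual at the 0.05 level" if p_hat < ALPHA else "not unusual at the 0.05 level")
```

With the count reported in the text (1 of 30 trials at or above 12.4), `estimate_p_value` gives 1/30 ≈ 0.033, which rounds to 0.03. A different seed gives a different plot and estimate; with only 30 trials the estimate is coarse, and more trials give a steadier one.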
- Summary...
Discussion Note that we have really just used our six-step decision-making method to evaluate whether χ² ≥ 12.4 is unusual, as we did earlier for D ≥ 24.