7.5 Smooth Chi-Square Curves
- Key Idea 1: We know that the way to obtain good accuracy in our probability estimates is to have a large number of trials in the five- or six-step method...
Discussion As we conduct more and more trials, it is a very important fact that the relative frequency polygon of chi-square values will become closer and closer to a particular smooth curve.
Example
- Key Idea 2: This smooth curve is called a chi-square density...
Discussion
- To show this, we have added to Table 7.11 a further 1000 trials and a further 10,000 trials to produce Table 7.12.
- Figure 7.3 displays the 100-trial, 1000-trial, and 10,000-trial relative frequency polygons, together with the smooth curve that results as the number of trials becomes arbitrarily large.
- Note in particular how the 10,000-trial relative frequency polygon is much closer to the in-the-limit smooth curve than is the 100-trial relative frequency polygon.
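The convergence described above can be checked numerically. The following is a minimal sketch, not the textbook's own procedure: it assumes a fair six-sided die, 60 rolls per trial, bins of width 1.0, and a chi-square density with 5 degrees of freedom as the smooth curve. The function names and the seed are hypothetical choices made for illustration.

```python
import math
import random

def simulate_chi_squares(n_trials, n_rolls=60, n_sides=6, seed=0):
    """Simulate chi-square values for a fair die (assumed setup)."""
    rng = random.Random(seed)
    expected = n_rolls / n_sides  # same expected count for every face
    values = []
    for _ in range(n_trials):
        counts = [0] * n_sides
        for face in rng.choices(range(n_sides), k=n_rolls):
            counts[face] += 1
        # Sum of (observed - expected)^2 / expected over all faces.
        values.append(sum((c - expected) ** 2 for c in counts) / expected)
    return values

def chi_square_density(x, df):
    """Theoretical chi-square density with df degrees of freedom."""
    if x <= 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

def relative_frequencies(values, bin_width=1.0, n_bins=20):
    """Fraction of values falling in each bin [0, w), [w, 2w), ..."""
    counts = [0] * n_bins
    for v in values:
        index = int(v // bin_width)
        if index < n_bins:
            counts[index] += 1
    return [c / len(values) for c in counts]

def largest_gap(n_trials, df=5, bin_width=1.0, n_bins=20):
    """Largest difference between a bin's relative frequency and the
    curve-based area of that bin (midpoint height times bin width)."""
    freqs = relative_frequencies(simulate_chi_squares(n_trials), bin_width, n_bins)
    return max(
        abs(f - chi_square_density((i + 0.5) * bin_width, df) * bin_width)
        for i, f in enumerate(freqs)
    )

gaps = {n: largest_gap(n) for n in (100, 1000, 10000)}
for n, gap in gaps.items():
    print(f"{n:>6} trials: largest gap from the smooth curve = {gap:.4f}")
```

With these assumptions, the largest gap should generally shrink as the number of trials grows, which is the behavior Figure 7.3 illustrates.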
Simulation
- Key Idea 3: Bypassing the six-step method....
Discussion
Suppose a chi-square hypothesis-testing problem concerning 90 rolls of a possibly loaded six-sided die produces a chi-square value of 9.0. Then, as we understand from the previous sections, we would like to evaluate P(χ² ≥ 9.0) in order to make a decision.
Now we have just learned that P(χ² ≥ 9.0) is a certain area (recall Figure 7.2) of the relative frequency histogram. Moreover, we have also just learned (recall the 10,000-trial case) that this histogram and its associated relative frequency polygon are in fact well approximated by the chi-square density of Figure 7.3. Thus we have another way to approximately evaluate P(χ² ≥ 9.0), one that avoids simulation and the six-step method.
Namely, we simply seek the area to the right of 9.0 under the chi-square density, 0.0035 + 0.0237 + ... + 0.0001 from the last column of Table 7.12.
Figure 7.4 shows this estimated probability as the area under the chi-square density to the right of 9.0.
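The two routes to P(χ² ≥ 9.0) can be compared side by side. This is a hedged sketch rather than the book's calculation: it assumes a fair six-sided die under the null model, takes the smooth curve to be the chi-square density with 5 degrees of freedom, and approximates the area with a simple midpoint rule. The function names, the upper integration limit of 80, and the seed are illustrative assumptions.

```python
import math
import random

def chi_square_density(x, df):
    """Theoretical chi-square density with df degrees of freedom."""
    if x <= 0:
        return 0.0
    half = df / 2
    return x ** (half - 1) * math.exp(-x / 2) / (2 ** half * math.gamma(half))

def area_to_the_right(a, df, upper=80.0, steps=20000):
    """Midpoint-rule area under the density from a to upper.
    The area beyond upper=80 is negligible for df=5 (assumption)."""
    width = (upper - a) / steps
    heights = (chi_square_density(a + (i + 0.5) * width, df) for i in range(steps))
    return width * sum(heights)

def simulated_tail_probability(a, n_trials=10000, n_rolls=90, n_sides=6, seed=1):
    """Fraction of simulated trials whose chi-square value is at least a."""
    rng = random.Random(seed)
    expected = n_rolls / n_sides
    hits = 0
    for _ in range(n_trials):
        counts = [0] * n_sides
        for face in rng.choices(range(n_sides), k=n_rolls):
            counts[face] += 1
        chi_square = sum((c - expected) ** 2 for c in counts) / expected
        if chi_square >= a:
            hits += 1
    return hits / n_trials

density_area = area_to_the_right(9.0, df=5)
simulated = simulated_tail_probability(9.0)
print(f"area under the density to the right of 9.0: {density_area:.4f}")  # about 0.109
print(f"simulated estimate from 10,000 trials:       {simulated:.4f}")
```

The density route needs no random trials at all, which is the practical advantage of bypassing the simulation.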
- Key Idea 4: There are tables that give the theoretical probability for exceeding a given value of chi-square...
Discussion You will see in the next section that there are tables that give the theoretical probability for exceeding a given value of chi-square corresponding to rolls of a die with a given number of sides and, more generally, for other problems requiring chi-square probabilities.
- Key Idea 5: How good is this approximation of probability of chi-square provided by the chi-square density?...
Discussion
- Table 7.13 shows empirical chi-square frequencies for 100 trials of rolling a six-sided die, for 30, 60, and 90 rolls per trial. Figure 7.5 shows the three estimated smooth curves that result. These curves have been informally estimated by a method similar to the visual method of Chapter 3 for doing linear regression.
- The key point to note is that all three curves are close to each other and close to the theoretical chi-square density, which is also drawn in Figure 7.5 (the solid line curve).
- Hence, it is intuitively clear that whether we have 30, 60, or 90 rolls, we can use areas under the same chi-square density to solve for probabilities of the form P(χ² ≥ a).
- The distribution of the chi-square statistic is not affected much by the total number of rolls of the die involved in calculating the statistic.
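The claim that the number of rolls matters little can be explored with a short simulation. This sketch assumes a fair six-sided die and uses 2000 trials per setting instead of the 100 trials of Table 7.13, purely to reduce sampling noise. The function name and the seed are hypothetical.

```python
import random
import statistics

def chi_square_values(n_rolls, n_trials=2000, n_sides=6, seed=2):
    """Simulated chi-square values for a fair die with n_rolls per trial."""
    rng = random.Random(seed)
    expected = n_rolls / n_sides
    values = []
    for _ in range(n_trials):
        counts = [0] * n_sides
        for face in rng.choices(range(n_sides), k=n_rolls):
            counts[face] += 1
        values.append(sum((c - expected) ** 2 for c in counts) / expected)
    return values

summary = {}
for n_rolls in (30, 60, 90):
    values = chi_square_values(n_rolls)
    mean = statistics.mean(values)
    tail = sum(v >= 9.0 for v in values) / len(values)  # estimate of P(chi-square >= 9.0)
    summary[n_rolls] = (mean, tail)
    print(f"{n_rolls} rolls: mean = {mean:.2f}, P(chi-square >= 9.0) is about {tail:.3f}")
```

Under these assumptions the three means and the three tail estimates should come out close to one another, consistent with using a single density for 30, 60, or 90 rolls.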
- Key Idea 6: Why statisticians usually prefer to use chi-square rather than D statistic...
Discussion The D statistic does not have this crucial property of not being affected by the number of rolls of the die, nor does it have other important properties of chi-square.
- Key Idea 7: Convention for deciding whether one is allowed to use areas under the chi-square density instead of using the six-step method...
Discussion
- We use the fact that areas under a chi-square density (tabulated in Table 7.20 and Appendix C) can be accurately used to decide whether an observed chi-square value is strong evidence that a given model is unlikely to have produced the data, provided enough data (die throws in this example) are available that every expected frequency is 5 or greater (statisticians’ usual rule of thumb).
- For example, this criterion of a frequency of 5 or greater holds for Table 7.5.
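The rule of thumb above is easy to check mechanically. The sketch below is illustrative only: the function names are hypothetical, the die models are made up for the example and are not taken from Table 7.5, and exact fractions are used so that a borderline expected frequency of exactly 5 is not disturbed by rounding.

```python
from fractions import Fraction

def expected_frequencies(n_rolls, probabilities):
    """Expected count for each outcome under the null hypothesis model."""
    return [n_rolls * p for p in probabilities]

def density_approximation_ok(n_rolls, probabilities, minimum=5):
    """Rule of thumb: every expected frequency must be 5 or greater."""
    return all(e >= minimum for e in expected_frequencies(n_rolls, probabilities))

fair_six_sided = [Fraction(1, 6)] * 6
# Hypothetical loaded die: one face comes up half the time.
loaded = [Fraction(1, 2)] + [Fraction(1, 10)] * 5

print(density_approximation_ok(30, fair_six_sided))  # True: every expected count is 5
print(density_approximation_ok(24, fair_six_sided))  # False: every expected count is 4
print(density_approximation_ok(40, loaded))          # False: five expected counts are 4
```

When the check fails, the convention stated above points back to the six-step simulation method rather than the density.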
- Key Idea 8: Is it the same theoretical curve regardless of the number of sides of the die of the null hypothesis model?...
Discussion Because the distribution of the actual chi-square statistics in simulations is known to be close to the theoretical curve called a density (see Chapter 8), we can look at chi-square frequency tables for fair dice of different numbers of sides in an effort to find out whether we get the same chi-square density in the limit as we got in Figure 7.4 or whether we need a whole family of curves (or their tabulated areas).
- Key Idea 9: We therefore now explore the effect on the chi-square values of the number of sides of the fair die being used...
Discussion
We carry out 50 simulated trials of rolling a fair die. Each trial consists of 60 rolls of a die. For the first experiment we use a four-sided die (Table 7.14), for the second experiment we use a six-sided die (Table 7.15), and for the third experiment we use a 10-sided die (Table 7.16).
- Notice first that as the number of sides on the die increases, so does the mean chi-square value.
- Notice also that the standard deviation of chi-square increases as well.
- That is, as the number of sides of the die increases, the chi-square values also tend to become more spread out.
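An experiment of this kind can be reproduced in a few lines. This is a sketch under stated assumptions, not the simulation that produced Tables 7.14 through 7.16: the function name and seed are hypothetical, and with only 50 trials the printed values will vary noticeably from one seed to another.

```python
import random
import statistics

def chi_square_values(n_sides, n_trials=50, n_rolls=60, seed=3):
    """Simulated chi-square values for a fair die with n_sides faces."""
    rng = random.Random(seed)
    expected = n_rolls / n_sides
    values = []
    for _ in range(n_trials):
        counts = [0] * n_sides
        for face in rng.choices(range(n_sides), k=n_rolls):
            counts[face] += 1
        values.append(sum((c - expected) ** 2 for c in counts) / expected)
    return values

for n_sides in (4, 6, 10):
    values = chi_square_values(n_sides)
    print(
        f"{n_sides:>2}-sided die: mean = {statistics.mean(values):.2f}, "
        f"standard deviation = {statistics.stdev(values):.2f}"
    )
```

Raising `n_trials` (for example to 2000) makes the upward trend in both the mean and the standard deviation easier to see, at the cost of departing from the 50-trial design described above.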
Simulation Experiment
- Key Idea 10: The idea of Degrees of Freedom...
Discussion
- Tables 7.14 through 7.16 suggest that the number of sides of the fair die (that is, the number of possible outcomes of the die) has an effect on the size of the chi-square statistics produced.
- Hence if we are to be able to use areas under a smooth curve analogous to the curve in Figure 7.4, we will need a whole family of curves, one for each number of sides, or outcomes, of the die.
- This number of possible outcomes has to do with the idea of degrees of freedom (sometimes abbreviated df).
- We can look at the idea of degrees of freedom in terms of tables of outcomes for rolling a die.
- We therefore say that Table 7.17 has three degrees of freedom—one less than the number of sides of the die.
- We say that Table 7.18 has five degrees of freedom.
- Since the number of degrees of freedom (the number of sides on the die minus 1) affects the size of the chi-square statistics produced by tables of die-rolling outcomes, it is important to know how many degrees of freedom are associated with a given chi-square.
- Therefore, we indicate that a chi-square has, for example, three degrees of freedom by writing:
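One way to see why the degrees of freedom are one less than the number of sides is that, once the total number of rolls is fixed, the counts for all but one face determine the last count. The sketch below illustrates this with hypothetical counts for a four-sided die rolled 60 times. The numbers are invented for the example and are not taken from Table 7.17, and the function names are assumptions.

```python
def degrees_of_freedom(n_sides):
    """Degrees of freedom for a table of die-rolling outcomes: sides minus 1."""
    return n_sides - 1

def last_count(total_rolls, free_counts):
    """Count for the remaining face, forced by the total and the other counts."""
    remaining = total_rolls - sum(free_counts)
    if remaining < 0:
        raise ValueError("free counts exceed the total number of rolls")
    return remaining

# Hypothetical counts for faces 1 to 3 of a four-sided die rolled 60 times.
free_counts = [14, 17, 13]
print(degrees_of_freedom(4))        # 3
print(last_count(60, free_counts))  # 16: face 4 has no freedom left
```

Only three of the four counts can vary freely here, which matches the three degrees of freedom for a four-sided die.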
Example
- Summary...