6.2 Using Samples to Decide on Hypotheses About Populations
Key Idea 2: The next step is to examine data to assess this proposition...
Discussion
- The students of a sociology class decide to conduct a survey to statistically assess this claim.
- They ask 25 people chosen at random from the local telephone directory (i.e. a random sample of size 25) whether they favor raising the driving age or not. Of these, 18 favor raising the driving age and 7 oppose it.
Key Idea 3: We start our analysis by supposing that the community is not in favor of raising the driving age, but is instead evenly split. This is our null hypothesis...
Discussion
- This seems convoluted – why do we make this assumption, given that the sample seems to confirm the editorial? Yes, the sample is consistent with the editorial, but we know that small samples can be quite variable just by the luck of the draw. It is a good idea to find out if the sample is also consistent with the opposing point of view, and just happened to turn out the way it did by chance (in the same way that 10 tosses of a coin might produce 7 heads on one occasion, and 7 tails on another.)
- In other words, we want to protect ourselves against mistakenly confirming the editorial, when in fact the community does not favor raising the driving age. The greatest danger of making this mistake occurs when the community is just about evenly split – this gives the best chance of producing a sample that (just by chance) contains 18 or more people favoring raising the driving age.
- If we find that an evenly split community has a very small chance of producing 18 or more yeses in a sample of 25, then we can be pretty sure that we are not making a mistake in saying that the sample confirms the editorial.
- In this case, we would reject the null hypothesis that the community is evenly split.
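The claim that an evenly split community is the most dangerous case can be checked with a short calculation. The sketch below is only an illustration: it assumes draws with replacement (a binomial model, which is a close approximation when the community is much larger than the sample), and the function name `tail_prob` and the example shares 0.30, 0.40 and 0.50 are made up for this example.

```python
from math import comb

def tail_prob(n, k, p):
    """P(k or more yeses in n draws) when each draw is a yes with probability p.

    Assumes independent draws (sampling with replacement).
    """
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

# Hypothetical communities that do NOT favor raising the driving age,
# ranging from strongly opposed (30% yes) to evenly split (50% yes).
for p in (0.30, 0.40, 0.50):
    print(f"share in favor = {p:.2f}: "
          f"P(18 or more yeses in 25) = {tail_prob(25, 18, p):.5f}")
```

Among these communities, the chance of 18 or more yeses is largest for the evenly split one, which is why the null hypothesis is set there.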
Key Idea 4: Let’s now test the null hypothesis—that is, actually test whether the evenly split community could produce a sample with 18 or more yeses...
Discussion
- Using the 5-step procedure.
1 A box with 2500 tickets, 1250 marked “yes” (favor raising the age) and 1250 “no” (oppose raising the age)
2 Draw 25 tickets without replacement, count the number of yeses
3 The trial is a success if we get 18 or more yeses
4 Do, say, 1000 trials
5 Running the Box Sampler illustration, we estimate that P(18 or more in favor) = 27/1000 = 0.027. This is one answer given by the Box Sampler; another run will give a slightly different estimate. To try out this example yourself, run the Box Sampler illustration.
- Terminology: The probability resulting from a hypothesis test is called the p-value. It is the probability that, if the null hypothesis were true, we would get a sample at least as extreme as the observed sample.
- Interpretation: Since the null hypothesis (an evenly split community) is so unlikely to produce a sample with 18 or more yeses, we conclude that such a sample is inconsistent with the null hypothesis – we reject the null hypothesis and accept that the newspaper editorial is correct.
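For readers without access to the Box Sampler, the five steps for the 2500-ticket box can be sketched in Python. This is a minimal sketch, not the Box Sampler itself: the function name `simulate_p_value`, its parameters, and the seed are assumptions made for this example.

```python
import random

def simulate_p_value(n_yes=1250, n_no=1250, sample_size=25,
                     threshold=18, trials=1000, seed=None):
    """Estimate P(threshold or more yeses) by drawing without replacement."""
    rng = random.Random(seed)
    # Step 1: the box, with 1 = "yes" ticket and 0 = "no" ticket.
    box = [1] * n_yes + [0] * n_no
    successes = 0
    for _ in range(trials):                      # Step 4: repeat the trial
        draw = rng.sample(box, sample_size)      # Step 2: 25 tickets, no replacement
        if sum(draw) >= threshold:               # Step 3: success if 18+ yeses
            successes += 1
    return successes / trials                    # Step 5: estimated probability

# The estimate varies from run to run; fixing the seed makes a run repeatable.
print(simulate_p_value(seed=1))
```

As with the Box Sampler, each run of 1000 trials gives a slightly different estimate, so a result near but not equal to 0.027 is to be expected.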
Key Idea 5: What if the population is huge, or its size is unknown? We use a small simulation model and sample with replacement...
Discussion
- Recall from Section 5.7 that if the population size is large compared to the sample size, we may instead estimate a probability using sampling with replacement, with the knowledge that our estimate is virtually as good as if we sampled without replacement.
- Using the 5-step procedure.
1 A box with two tickets, one marked “yes” (favor raising the age) and one marked “no” (oppose raising the age). You could also use a coin with heads = yes, tails = no.
2 Draw a ticket 25 times with replacement, count the number of yeses. (Or, if you use a coin, flip a coin 25 times and count the heads.)
3 The trial is a success if we get 18 or more yeses
4 Do, say, 1000 trials
5 Running the Box Sampler illustration, we estimate that P(18 or more in favor) = 27/1000 = 0.027. This is one answer given by the Box Sampler; another run will give a slightly different estimate. To try out this example yourself, run the Box Sampler illustration.
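The two-ticket (or coin-flip) model can be sketched in Python, together with an exact check of the claim that sampling with and without replacement give nearly the same answer here. This is an illustrative sketch: all function names are invented for this example, and the 2500-person box in `exact_without_replacement` is simply the box used under Key Idea 4.

```python
import random
from math import comb

def simulate_with_replacement(sample_size=25, threshold=18, trials=1000, seed=None):
    """Estimate P(threshold or more yeses) by flipping a fair coin sample_size times."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # Each draw from the two-ticket box is a yes with probability 1/2.
        yeses = sum(rng.random() < 0.5 for _ in range(sample_size))
        if yeses >= threshold:
            successes += 1
    return successes / trials

def exact_with_replacement(n=25, k=18):
    """Exact P(k or more yeses in n fair draws), from counting outcomes."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2**n

def exact_without_replacement(n_yes=1250, n_no=1250, n=25, k=18):
    """Exact P(k or more yeses) when n tickets are drawn without replacement."""
    total = comb(n_yes + n_no, n)
    favorable = sum(comb(n_yes, j) * comb(n_no, n - j) for j in range(k, n + 1))
    return favorable / total

print("simulated, with replacement:", simulate_with_replacement(seed=1))
print("exact, with replacement:    ", exact_with_replacement())
print("exact, without replacement: ", exact_without_replacement())
```

The two exact values differ only slightly, which is the sense in which the small with-replacement model is "virtually as good" when the population is large compared to the sample.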
Key Idea 6: Hypothesis testing protects us against being fooled by a chance sample result, but not 100%...
Discussion
- In the above simulations, we concluded that the newspaper editorial was right. Still, even with an evenly split community, about 3% of the simulated samples appeared to confirm the newspaper editorial. We did not entirely eliminate the possibility that the data could deceive us by chance.
Key Idea 7: The Key Problem Revisited – how small must the p-value be to reject the null hypothesis?...
Discussion
- The fact that there were 8 cures out of 10 treatments does suggest the possibility that the experimental drug is better. The previous cure rate of 0.6 leads us to expect that on average there should be 6 cures out of 10 treatments. But it also seems quite likely that a drug with a true cure rate of 0.6 might easily produce 8 out of 10 on occasion.
- Let’s test just how likely it is that a true cure rate of 0.6 could produce 8/10 cures. (Note: We are thus testing the null hypothesis that the experimental drug is no improvement over the standard 60% cure rate.)
- Using the 5-step procedure.
1 We can use a box model of 10 tickets marked with the digits 0-9, where 1, 2, 3, 4, 5, 6 = cure and 7, 8, 9, 0 = no cure. (Or we could select random digits from a random number table, and use the same coding scheme.)
2 Draw a ticket 10 times with replacement (or, if you are using a random number table, select 10 consecutive digits) and count the number of “cures.”
3 A trial is successful if >= 8 “cures” occur
4 Do, say, 1000 trials
5 From running the Box Sampler illustration, we estimate that P(getting >= 8 cures in 10 attempts) = 177/1000 = 0.177. See the Box Sampler illustration.
- Interpretation: Since the old cure rate (0.6) has a 0.177 chance of producing a sample as good as ours, we do not have enough evidence to rule out the possibility that the old cure rate still prevails, and we got a good sample just by chance. Rather, we withhold judgment.
- What criterion should be used to decide what is too rare to be considered consistent with the null hypothesis? Statisticians settled on a convention long ago. Most commonly, 0.05 is used: the p-value must be less than 0.05 before we are willing to declare the observed data rare enough to reject the null hypothesis (here, that the new drug has a cure rate of 0.6). Occasionally 0.1 or 0.01 is used instead.
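The random-digit version of the drug test above can be sketched in Python. This is a minimal sketch under the coding scheme given in step 1 (digits 1-6 = cure, digits 7, 8, 9, 0 = no cure); the function name `simulate_cures` and its parameters are assumptions made for this example.

```python
import random

def simulate_cures(treatments=10, threshold=8, trials=1000, seed=None):
    """Estimate P(threshold or more cures) when the true cure rate is 0.6."""
    rng = random.Random(seed)
    cure_digits = {1, 2, 3, 4, 5, 6}             # 6 of the 10 digits mean "cure"
    successes = 0
    for _ in range(trials):
        # Draw 10 random digits 0-9 with replacement, one per treatment.
        digits = [rng.randint(0, 9) for _ in range(treatments)]
        cures = sum(d in cure_digits for d in digits)
        if cures >= threshold:                   # success if 8 or more cures
            successes += 1
    return successes / trials

# The estimate varies from run to run; fixing the seed makes a run repeatable.
print(simulate_cures(seed=1))
```

Estimates from this sketch should fall near the 0.177 reported above; the exact binomial value is about 0.167, well above the usual 0.05 cutoff.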
Key Idea 8: The level of significance....
Discussion
- The criterion described above is called the level of significance. P-values below the level of significance (usually set at 0.05) lead us to reject the null hypothesis; p-values above that level tell us that the evidence from the sample is not strong enough to reject the null hypothesis.
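The decision rule can be written as a small Python function and applied to the two p-values estimated in this section. The function name `decide` and the wording of its return values are made up for this illustration.

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the level of significance alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value < alpha:
        return "reject the null hypothesis"
    return "do not reject the null hypothesis"

print(decide(0.027))              # driving-age survey, level 0.05
print(decide(0.177))              # experimental drug, level 0.05
print(decide(0.027, alpha=0.01))  # same survey, stricter level
```

At the 0.05 level the survey result leads to rejection and the drug result does not; at the stricter 0.01 level, the survey result would no longer be enough to reject the null hypothesis.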