10.4 Inference
- Key Idea 1: A randomization test of significance...
Discussion
- Consider again the chemonucleolysis treatment for slipped discs:
              Treatment   Control   Total
Improve           30         28       58
No Improve        20         28       48
Total             50         56      106
- Is the difference between treatment and placebo large enough to be statistically significant?
- By "statistically significant" we typically mean that the observed difference is too extreme to be attributed to chance if there is no real difference between treatment and placebo.
- To answer the question we select, as we often do, the six-step approach.
- Key Idea 2: A six-step randomization test of significance...
Discussion
- Our hypothesis-testing strategy is to assume no gain for the treatment over the placebo. Under this null hypothesis, the outcomes (58 "improve" and 48 "don't improve") are assumed to be assigned to patients at random, since treatment versus placebo doesn't matter. We then draw simulated "treatment" groups and find out how often we get 30 or more improvements (30 being the observed number of "improves" in the treatment group).
- Here is the procedure.
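The strategy just described can be sketched in a few lines of Python. This is a minimal illustration, not the text's own Box Sampler: the function name `estimate_p_value`, the default of 10,000 trials, and the fixed seed are assumptions made for the example. The box holds 58 ones and 48 zeros, each trial draws 50 tickets without replacement, and a trial counts as a success when it contains 30 or more ones.

```python
import random

def estimate_p_value(ones, zeros, draws, observed, trials=10_000, seed=0):
    """Estimate the chance of `observed` or more 1's in `draws` tickets.

    Tickets are drawn without replacement from a box of `ones` 1's and
    `zeros` 0's. `trials` and `seed` are illustrative choices.
    """
    rng = random.Random(seed)
    box = [1] * ones + [0] * zeros
    successes = 0
    for _ in range(trials):
        drawn = rng.sample(box, draws)      # one simulated treatment group
        if sum(drawn) >= observed:          # as many improvements as observed, or more
            successes += 1
    return successes / trials

# First study: 58 improved, 48 not improved, 50 patients in the treatment group.
p = estimate_p_value(ones=58, zeros=48, draws=50, observed=30)
print(f"Estimated P(30 or more improvements) = {p:.3f}")
```

An estimate well above 0.05 means the observed 30 improvements could plausibly arise by chance alone.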
Key Idea 3: Repeating the analysis with more data...
Discussion
Two more randomized control studies showed positive results for the slipped-disc treatment, so we will redo the analysis, but using the three combined studies.
               Received Treatment   Control   Total
Improved               80              56      136
Not Improved           34              64       98
Total                 114             120      234
Step 1. Choice of a Model: First we must decide on a model to represent the chance setting of the problem. The actual experiment started with 114 + 120 = 234 patients, of whom 114 were randomly chosen to receive the treatment. In the observed data, 80 (treatment) + 56 (placebo) = 136 patients improved and 34 (treatment) + 64 (placebo) = 98 did not improve. So we can model this with a box model with the following composition:
1's (improved): 136
0's (not improved): 98
Step 2. Definition of a Trial: A trial consists of drawing 114 tickets from the box. The sampling method should be "Without Replacement".
Step 3. Definition of a Successful Trial: A success occurs whenever 80 or more of the 114 drawn tickets are "improved"; that is, a success is obtaining 80 or more 1's.
Step 4. Repetition of Trials: Steps 2 and 3 should be repeated many times. The larger the number of trials, the better (more reliable) your estimate of the desired expected value. For many problems, 100 trials provide a reasonably reliable estimate.
Step 5. Finding the Probability of Interest: The estimated (experimental) probability of interest is the chance of 80 or more improvements.
P(80 or more improvements) = (# successful trials) / (total # trials).
Step 6. Decision: In contrast to the first study, this combined study had a statistically significant number of improvements in the treatment group. The estimated probability P is much less than 0.05. That is, we are confident that the 80 improvements are not due to chance alone. It does appear that the treatment is more effective than the placebo. See the Box Sampler illustration.
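As a cross-check on the simulated estimate, the same tail probability can be computed exactly by counting draws (a hypergeometric calculation, which is a different technique from the simulation the six steps describe). The sketch below is illustrative only; the function name `upper_tail_probability` is an assumption made for the example.

```python
from math import comb

def upper_tail_probability(ones, zeros, draws, at_least):
    """Exact chance of `at_least` or more 1's when `draws` tickets are drawn
    without replacement from a box of `ones` 1's and `zeros` 0's."""
    total = comb(ones + zeros, draws)            # all equally likely draws
    lowest = max(at_least, draws - zeros, 0)     # fewest 1's that can count
    highest = min(draws, ones)                   # most 1's a draw can contain
    favorable = sum(
        comb(ones, k) * comb(zeros, draws - k)   # draws with exactly k 1's
        for k in range(lowest, highest + 1)
    )
    return favorable / total

# Combined studies: 136 improved, 98 not improved, 114 in the treatment group.
p_combined = upper_tail_probability(ones=136, zeros=98, draws=114, at_least=80)
print(f"Exact P(80 or more improvements) = {p_combined:.6f}")
```

Because the exact value is far below 0.05, a simulation with only a modest number of trials may record no successes at all; that outcome is still consistent with the decision in Step 6.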
- Conclusion: The first study did not detect a statistically significant effect for the treatment, while the combined studies did. Although it might seem so, there is no contradiction. As already stressed, for the first study the conclusion is not "the treatment is ineffective" (accepting that the null hypothesis is true) but rather "there is not sufficient evidence to conclude that the treatment is effective" (strong evidence that the null hypothesis is false is lacking). After gathering the evidence from two additional studies, there was enough evidence to reject the null hypothesis.
Key Idea 4: Summarizing the randomization test...
Discussion
This randomization test is validated by the fact that the actual experiment used randomization in the assignment of patients to treatment and control. To summarize the procedure we used above:
The number of tickets in the box is the total number of subjects in the experiment.
The tickets in the box are 1's (observed successes) and 0's (observed failures).
The number of draws (randomly and without replacement) equals the actual number in the treatment group.
The statistic of interest is the number of 1's in the drawn group.
The probability of interest is estimated by the proportion of simulated samples that have at least as many 1's as were observed in the treatment group.
If this proportion is very small (less than 0.05 - see Chapter 6), then we conclude that the treatment is more effective than the control.
Key Idea 5: The Accuracy of a Simple Random Sample...
Discussion
A Newsweek poll of 933 potential voters taken in August 1996 showed that Clinton was favored by 44%, Dole by 42%, a "statistical tie". How can 933 people reflect anything meaningful about a country with nearly 200 million eligible voters?
Just as with assessing the accuracy of measurements, we can measure sampling error by the standard error.
Here we will use the five-step procedure to answer this question: How far is the percentage support for Clinton in the sample likely to be from his percentage support in the population?
Analysis: A review of the histogram of simulation results shows that the difference between Clinton's 44% and Dole's 42% is not large compared to the variation in the sample results. We lack strong statistical evidence that Clinton is ahead of Dole in the population. We can draw the same conclusion from our knowledge of how far one sample proportion might range from another, in terms of the standard error (see Chapter 8).
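A simulation of this kind can be sketched as follows. It is a rough illustration under stated assumptions: the population support for Clinton is taken to be the sample value of 44%, each of the 933 respondents is simulated independently (reasonable when the population is vastly larger than the sample), and the function name, 2,000 trials, and seed are arbitrary choices for the example. The spread of the simulated sample proportions is compared with the usual formula sqrt(p(1 - p)/n).

```python
import math
import random

def simulate_sample_proportions(p, n, trials=2_000, seed=0):
    """Return `trials` simulated sample proportions, each from n respondents
    who independently favor the candidate with probability p."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(trials)]

props = simulate_sample_proportions(p=0.44, n=933)
mean = sum(props) / len(props)
simulated_sd = math.sqrt(sum((x - mean) ** 2 for x in props) / len(props))
formula_se = math.sqrt(0.44 * (1 - 0.44) / 933)

print(f"Simulated SD of sample proportion: {simulated_sd:.4f}")
print(f"Formula standard error:            {formula_se:.4f}")
```

Both figures should be close to 1.6 percentage points, so a 2-point gap between the candidates is about the size of the chance variation in a single sample proportion.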
Key Idea 6: Shortcut for the Standard Error of a Sample Proportion...
Key Idea 7: Sampling Several Populations...
Discussion
- You can see from the formula for the standard error that it depends only on the proportion of 1's and the sample size. The standard error does not depend on the population size.
- Review an illustration of this principle.
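The independence from population size can also be checked with a small simulation. The sketch below is only illustrative: the population sizes (100,000 and 10,000,000), the 44% proportion of 1's, the sample size of 933, and the function name are assumptions chosen for the example. Samples are drawn without replacement from each population, and the spread of the sample proportions is compared with sqrt(p(1 - p)/n).

```python
import math
import random

def sd_of_sample_proportion(population_size, p, n, trials=1_000, seed=0):
    """Spread (SD) of the sample proportion of 1's over repeated samples of
    size n drawn without replacement from a population of the given size."""
    rng = random.Random(seed)
    ones = round(population_size * p)   # tickets numbered below `ones` are 1's
    props = []
    for _ in range(trials):
        drawn = rng.sample(range(population_size), n)   # without replacement
        props.append(sum(1 for ticket in drawn if ticket < ones) / n)
    mean = sum(props) / trials
    return math.sqrt(sum((x - mean) ** 2 for x in props) / trials)

for size in (100_000, 10_000_000):
    sd = sd_of_sample_proportion(size, p=0.44, n=933)
    print(f"population {size:>10,}: SD of sample proportion = {sd:.4f}")
print(f"formula sqrt(p(1-p)/n)      = {math.sqrt(0.44 * 0.56 / 933):.4f}")
```

Apart from simulation noise, the two populations should give nearly the same spread even though one is a hundred times larger than the other. This holds as long as the sample is a small fraction of the population.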
Summary....