8.2 The Poisson Distribution: Events at Random
Key Idea 1: The Poisson distribution counts the number of occurrences (perhaps "successes") of something during a preset amount of time, or in a preset interval of space...
Discussion
- Contrast the Poisson distribution ("successes" in a given interval of time or space) with the binomial distribution ("successes" in a given number of trials).
- The number of cars that pass under an underpass in an hour can be modeled by the Poisson distribution.
- The number of radioactive particles emitted from some substance and registering on a Geiger counter can be modeled by the Poisson distribution.
Key Idea 2: The number of occurrences will have a Poisson distribution if the following conditions are met...
Discussion
- The probability of an occurrence within a small time or space interval of fixed length is the same, regardless of when the interval begins. Thus, the rate of occurrence is constant over time.
- Whether there is an occurrence within a given interval is independent of whether there is an occurrence in another nonoverlapping interval (even if intervals are close to each other).
- If we take a very small time interval, two or more occurrences in that interval are far less likely than a single occurrence (the intuitive idea is that simultaneous occurrences are impossible).
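The three conditions can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the Box Sampler mentioned later in this section: the rate of 2.0 occurrences per period, the 500 sub-intervals, and the function name `simulate_counts` are all hypothetical choices. Each period is cut into many tiny sub-intervals, each sub-interval independently has at most one occurrence, and the chance of an occurrence is the same in every sub-interval.

```python
import math
import random

def simulate_counts(rate, n_subintervals, n_periods, seed=0):
    """Simulate occurrence counts per period under the three conditions.

    Each period is split into n_subintervals tiny pieces. Every piece
    independently contains one occurrence with probability
    rate / n_subintervals (constant rate, independence, and no
    simultaneous occurrences within a piece).
    """
    rng = random.Random(seed)
    p = rate / n_subintervals
    counts = []
    for _ in range(n_periods):
        occurrences = sum(1 for _ in range(n_subintervals) if rng.random() < p)
        counts.append(occurrences)
    return counts

# Hypothetical settings chosen only for illustration.
counts = simulate_counts(rate=2.0, n_subintervals=500, n_periods=4000)
mean_count = sum(counts) / len(counts)
share_zero = counts.count(0) / len(counts)

print("simulated mean count:", round(mean_count, 3))
print("simulated share of periods with 0 occurrences:", round(share_zero, 3))
print("Poisson probability of 0 for mean 2.0:", round(math.exp(-2.0), 3))
```

If the conditions hold, the simulated mean should land near the chosen rate and the share of empty periods should land near the Poisson probability of zero occurrences.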
Key Idea 3: Observed probabilities in the Key Problem ...
Discussion
- The above conditions seem to be met in the Key Problem, so we suspect the data may be well modeled by the Poisson distribution.
- Here is the table for the Prussian soldier data.

| Number of soldiers kicked to death | Number of units | Experimental probability |
|---|---|---|
| 0 | 144 | 0.514 |
| 1 | 91 | 0.325 |
| 2 | 32 | 0.114 |
| 3 | 11 | 0.039 |
| 4 | 2 | 0.007 |
| 5 or more | 0 | 0 |
| Total | 280 | |

- Fourteen military units were observed for 20 years, and the number of deaths from horse kicks in each unit in each year was counted (14 × 20 = 280 total counts). Thus, we can get a pretty good empirical idea of what the probabilities are for 0 deaths per unit per year, 1 death per unit per year, etc. The average was 0.7 deaths per unit per year.
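The experimental probabilities and the average of 0.7 follow directly from the counts in the table. A minimal sketch of that arithmetic, where the variable names are illustrative only:

```python
# Counts from the Prussian soldier table: deaths per unit per year -> number of unit-years.
observed = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}

total = sum(observed.values())  # 14 units x 20 years

# Experimental probability = share of unit-years with that many deaths.
proportions = {deaths: n / total for deaths, n in observed.items()}

# Average deaths per unit per year = total deaths / total unit-years.
mean_deaths = sum(deaths * n for deaths, n in observed.items()) / total

for deaths, share in proportions.items():
    print(deaths, round(share, 3))
print("total counts:", total)
print("mean deaths per unit per year:", round(mean_deaths, 3))
```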
Key Idea 4: Theoretical probabilities in the Key Problem ...
Discussion
- The theoretical Poisson probabilities depend on the theoretical mean number of counts in the time interval (in this case, 0.7 deaths per unit per year). The mean number of counts per time unit is a parameter of the distribution, just as for the binomial the number of trials and the probability of a 1 on a trial are both parameters.
- Theoretical Poisson probabilities can be estimated by Box Sampler simulation, or calculated by formula (see Section 14.2).
- The table below gives theoretical probabilities for a few possible values of the mean, including the value of 0.7 that resulted from the soldier data.

Theoretical Poisson Probabilities

| Number of occurrences | Mean 0.1 | Mean 0.7 | Mean 1.0 | Mean 5.0 |
|---|---|---|---|---|
| 0 | 0.9048 | 0.4966 | 0.3679 | 0.0067 |
| 1 | 0.0905 | 0.3476 | 0.3679 | 0.0337 |
| 2 | 0.0045 | 0.1217 | 0.1839 | 0.0842 |
| 3 | 0.0002 | 0.0284 | 0.0613 | 0.1404 |
| 4 | 0.0000 | 0.0050 | 0.0153 | 0.1755 |
| 5 | 0.0000 | 0.0007 | 0.0031 | 0.1755 |
| 6 | 0.0000 | 0.0001 | 0.0005 | 0.1462 |
| 7 | 0.0000 | 0.0000 | 0.0001 | 0.1044 |
| 8 | 0.0000 | 0.0000 | 0.0000 | 0.0653 |
| 9 | 0.0000 | 0.0000 | 0.0000 | 0.0363 |
| 10 | 0.0000 | 0.0000 | 0.0000 | 0.0181 |
| 11 | 0.0000 | 0.0000 | 0.0000 | 0.0082 |
| 12 | 0.0000 | 0.0000 | 0.0000 | 0.0034 |
| 13 | 0.0000 | 0.0000 | 0.0000 | 0.0013 |
| 14 | 0.0000 | 0.0000 | 0.0000 | 0.0005 |
| 15 | 0.0000 | 0.0000 | 0.0000 | 0.0002 |
| 16 | 0.0000 | 0.0000 | 0.0000 | 0.0000 |
| ... | ... | ... | ... | ... |
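Entries like these can be computed by formula. The sketch below assumes the standard Poisson probability formula, P(k occurrences) = e^(-mean) × mean^k / k!, which is presumably the one referred to in Section 14.2; the function name `poisson_pmf` is a hypothetical helper, not something defined in this text.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k occurrences for a Poisson distribution with the given mean."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Print the first few rows for the same means used in the table.
means = (0.1, 0.7, 1.0, 5.0)
for k in range(8):
    row = [f"{poisson_pmf(k, m):.4f}" for m in means]
    print(k, *row)
```

Summing `poisson_pmf(k, mean)` over all k gives 1, which is a quick sanity check on any column.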
- Compare the theoretical probabilities for a mean of 0.7 to the observed proportions in the table below. They are not exactly the same, but they are fairly close.

| Number of soldiers kicked to death | Number of units | Experimental probability |
|---|---|---|
| 0 | 144 | 0.514 |
| 1 | 91 | 0.325 |
| 2 | 32 | 0.114 |
| 3 | 11 | 0.039 |
| 4 | 2 | 0.007 |
| 5 or more | 0 | 0 |
| Total | 280 | |
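The comparison can be laid out side by side. This is a minimal sketch that assumes the standard Poisson formula with mean 0.7; the names `observed`, `poisson_pmf`, and `expected` are illustrative only.

```python
import math

def poisson_pmf(k, mean):
    """Standard Poisson probability of exactly k occurrences (assumed formula)."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

# Observed counts from the soldier table: deaths -> number of unit-years.
observed = {0: 144, 1: 91, 2: 32, 3: 11, 4: 2}
total = sum(observed.values())
mean = 0.7

# Counts we would expect in 280 unit-years if the Poisson model held exactly.
expected = {k: total * poisson_pmf(k, mean) for k in observed}

# Largest gap between an observed proportion and its theoretical probability.
largest_gap = max(abs(observed[k] / total - poisson_pmf(k, mean)) for k in observed)

for k in observed:
    print(k, observed[k], round(expected[k], 1),
          round(observed[k] / total, 3), round(poisson_pmf(k, mean), 3))
print("largest gap in probability:", round(largest_gap, 3))
```

Printing expected counts next to observed counts makes "fairly close" concrete: each observed proportion is within about two or three percentage points of its Poisson probability.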
Key Idea 5: Comparing observed and theoretical Poisson probabilities in baseball...
Discussion
- Consider the number of runs the Oakland Athletics baseball team scored in each of the 144 games they played in 1995. Table 8.4 and Figure 8.3 let you compare the observed (experimental) distribution of runs per game with the theoretical Poisson distribution of runs per game.
- The last two columns in Table 8.4 show some fairly large discrepancies. For example, the Athletics were shut out (had 0 runs) eight times, or about 5.6% of the time. The Poisson distribution predicts a shutout only 0.6% of the time. They scored four runs 12.5% of the time and five runs 7.6% of the time, while the Poisson distribution predicts over 17% for four runs and also for five runs. What other discrepancies are there? You should try to explain how this situation might violate the three conditions for the Poisson distribution, and explain why the Poisson distribution therefore does not model these data well.