Best Subset (Multiple Linear Regression...)
This regression option performs best subset regression, which finds the subsets of predictors that best fit a multiple linear regression model. It is useful when the number of predictors is large, and particularly when the number of predictors exceeds the number of observations.
To learn more about the Best Subset option of regression, click on ....
Independent Variables
These are the variables selected as independent variables in the main Regression screen.
To be included in the best
Move one or more variables into this box from the Independent Variables box by clicking the corresponding selection button. These are the variables to be considered for inclusion in the best subsets. Check the checkbox next to a variable to force it into every best subset.
If the intercept term is omitted from the model in the main Regression screen, all models reported by the best subset procedure will also omit the intercept term.
Size of best subset
Use this number to specify the maximum size of the subsets, that is, the largest number of predictors allowed in a reported model.
Number of best subsets
For each fixed subset size, StatCalc can report more than one model, arranged in order of optimality. Select the number of models by clicking the spinner.
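To illustrate how the subset size, the number of best subsets, and forced-in variables interact, here is a minimal Python sketch of an exhaustive search that ranks subsets by residual sum of squares. It is not StatCalc's implementation: the function names (`solve`, `rss`, `best_subsets`), the ranking criterion, and the small data set are all assumptions made for the example.

```python
from itertools import combinations

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular (no exactly collinear predictors)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit of y on the
    given predictor columns plus an intercept (normal equations)."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def best_subsets(data, y, max_size, n_best, forced=()):
    """Return {size: [(rss, names), ...]} holding the n_best lowest-RSS
    subsets of each size; variables in `forced` appear in every subset."""
    free = [name for name in data if name not in forced]
    result = {}
    for size in range(max(len(forced), 1), max_size + 1):
        fits = []
        for extra in combinations(free, size - len(forced)):
            names = tuple(forced) + extra
            fits.append((rss([data[n] for n in names], y), names))
        result[size] = sorted(fits)[:n_best]
    return result

# Made-up data: y depends exactly on x1 and x2, while x3 is unrelated.
data = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [5, 3, 8, 1, 9, 2],
}
y = [9, 8, 19, 18, 29, 28]  # 1 + 2*x1 + 3*x2

for size, fits in best_subsets(data, y, max_size=3, n_best=2).items():
    for value, names in fits:
        print(size, names, round(value, 4))
```

Passing `forced=("x3",)` restricts the search to subsets that contain `x3`, which mirrors ticking the checkbox next to a variable. An exhaustive search fits every subset, so its cost grows quickly with the number of candidate variables.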
Selection Procedure
StatCalc supports the following procedures for best subset selection:
Backward elimination (default)
Forward selection
Sequential replacement
Stepwise selection
Exhaustive search
FIN
For stepwise selection, enter the value of the F-statistic required for a variable to enter the model. A variable is entered into the model only if its F-statistic is greater than this value. StatCalc uses a default value of 3.84 for this parameter.
FOUT
For stepwise selection, enter the value of the F-statistic below which a variable is removed from the model. A variable is removed from the model only if its F-statistic is less than this value. StatCalc uses a default value of 2.71 for this parameter. The value of this parameter should be less than the value of the FIN parameter.
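The roles of FIN and FOUT can be illustrated with a minimal Python sketch of a textbook stepwise procedure based on partial F-statistics. Whether StatCalc computes its F-statistics in exactly this way is an assumption; the function names and the data are hypothetical, and only the default thresholds 3.84 and 2.71 come from the text above.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Assumes the matrix is non-singular."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    rows = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(rows[0])
    xtx = [[sum(r[a] * r[b] for r in rows) for b in range(k)] for a in range(k)]
    xty = [sum(r[a] * yi for r, yi in zip(rows, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(bj * rj for bj, rj in zip(beta, r))) ** 2
               for r, yi in zip(rows, y))

def stepwise(data, y, fin=3.84, fout=2.71):
    """Stepwise selection sketch; returns predictor names in order of entry.
    Assumes more observations than parameters and no perfect fit."""
    n = len(y)
    selected = []
    while True:
        changed = False
        # Entry step: add the candidate with the largest F-to-enter,
        # provided that F exceeds FIN.
        rss_cur = rss([data[v] for v in selected], y)
        df_new = n - len(selected) - 2  # residual df after adding one predictor
        best_f, best_v = 0.0, None
        for v in data:
            if v in selected:
                continue
            rss_new = rss([data[u] for u in selected + [v]], y)
            f = (rss_cur - rss_new) / (rss_new / df_new)
            if f > best_f:
                best_f, best_v = f, v
        if best_v is not None and best_f > fin:
            selected.append(best_v)
            changed = True
        # Removal step: drop the variable with the smallest F-to-remove,
        # provided that F is below FOUT.
        if selected:
            rss_cur = rss([data[v] for v in selected], y)
            df_cur = n - len(selected) - 1
            f_out = {}
            for v in selected:
                rest = [data[u] for u in selected if u != v]
                f_out[v] = (rss(rest, y) - rss_cur) / (rss_cur / df_cur)
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < fout:
                selected.remove(worst)
                changed = True
        if not changed:
            return selected

# Made-up orthogonal design: y depends strongly on x1, weakly on x2,
# and not at all on x3.
data = {
    "x1": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x2": [-1, -1, 1, 1, -1, -1, 1, 1],
    "x3": [-1, 1, -1, 1, -1, 1, -1, 1],
}
y = [-11.5, -10.5, -8.5, -9.5, 9.5, 8.5, 10.5, 11.5]

print(stepwise(data, y))  # ['x1', 'x2']
```

In this formulation, the F-to-remove of a variable that has just entered equals its F-to-enter, so keeping FOUT below FIN prevents a variable from being removed immediately after it is added.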
The following output is produced when the options shown above in the Best Subset dialog box are selected. It is produced along with the Default output for the same input data.
Given the number of predictors in the model, this procedure provides the best subsets of predictors. In our example we have requested models with 8 or fewer predictors. The constant term is included in all these models because it was included in the main regression dialog box. The variables 'Age' and 'Ed' have also been included in all the models, as specified by ticking the checkboxes to the left of 'Age' and 'Ed' in the Best Subsets screen. We have used the stepwise selection procedure, and for each fixed model size we have requested the best and the next best models.
The C_{p} column in the table provides the Mallows C_{p} statistic for each model. Typically one prefers a model with a low Mallows C_{p}. According to this criterion, the model with 6 predictors and a Mallows C_{p} of 4.646 should be preferred among the models reported.
The "Prob" column of the table reports the probability that a particular subset is sufficient to explain the response. A probability greater than 0.05 indicates that the model is acceptable at the 5% level. In our example, both models of size 6 and one model of size 5 are acceptable at the 5% level.
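For reference, the Mallows C_{p} statistic is commonly defined as RSS_p / s^2 - n + 2p, where RSS_p is the residual sum of squares of the subset model, p is its number of parameters (including the intercept), n is the number of observations, and s^2 is the residual mean square of the full model. The short Python sketch below applies this textbook definition. Whether StatCalc counts p in the same way is an assumption, and the function name and numbers are hypothetical rather than taken from the output discussed above.

```python
def mallows_cp(rss_subset, p_subset, rss_full, p_full, n):
    """Mallows C_p under the common textbook definition.

    rss_subset, p_subset: residual sum of squares and number of
        parameters (including the intercept) of the subset model.
    rss_full, p_full: the same quantities for the full model.
    n: number of observations.
    """
    s2 = rss_full / (n - p_full)  # residual mean square of the full model
    return rss_subset / s2 - n + 2 * p_subset

# Hypothetical values: 8 observations, full model with 3 predictors.
n = 8
rss_full, p_full = 2.0, 4

print(mallows_cp(10.0, 2, rss_full, p_full, n))  # one predictor
print(mallows_cp(2.0, 3, rss_full, p_full, n))   # two predictors
print(mallows_cp(2.0, 4, rss_full, p_full, n))   # full model
```

Under this definition the full model always has C_{p} equal to its own number of parameters, so a subset model is attractive when its C_{p} is small and close to its p.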