Preface to the Second Edition xiii
Preface to the First Edition xvii
Part I The Multiple Linear Regression Model
1 Multiple Linear Regression 3
1.1 Introduction 3
1.2 Concepts and Background Material 4
1.2.1 The Linear Regression Model 4
1.2.2 Estimation Using Least Squares 5
1.2.3 Assumptions 8
1.3 Methodology 9
1.3.1 Interpreting Regression Coefficients 9
1.3.2 Measuring the Strength of the Regression Relationship 11
1.3.3 Hypothesis Tests and Confidence Intervals for β 12
1.3.4 Fitted Values and Predictions 14
1.3.5 Checking Assumptions Using Residual Plots 15
1.4 Example: Estimating Home Prices 16
1.5 Summary 19
2 Model Building 23
2.1 Introduction 23
2.2 Concepts and Background Material 24
2.2.1 Using Hypothesis Tests to Compare Models 24
2.2.2 Collinearity 26
2.3 Methodology 29
2.3.1 Model Selection 29
2.3.2 Example: Estimating Home Prices (continued) 31
2.4 Indicator Variables and Modeling Interactions 39
2.4.1 Example: Electronic Voting and the 2004 Presidential Election 41
2.5 Summary 46
Part II Addressing Violations of Assumptions
3 Diagnostics for Unusual Observations 53
3.1 Introduction 53
3.2 Concepts and Background Material 54
3.3 Methodology 56
3.3.1 Residuals and Outliers 56
3.3.2 Leverage Points 57
3.3.3 Influential Points and Cook's Distance 58
3.4 Example: Estimating Home Prices (continued) 60
3.5 Summary 64
4 Transformations and Linearizable Models 67
4.1 Introduction 67
4.2 Concepts and Background Material: The Log-Log Model 69
4.3 Concepts and Background Material: Semilog Models 69
4.3.1 Logged Response Variable 70
4.3.2 Logged Predictor Variable 70
4.4 Example: Predicting Movie Grosses After One Week 71
4.5 Summary 78
5 Time Series Data and Autocorrelation 81
5.1 Introduction 81
5.2 Concepts and Background Material 83
5.3 Methodology: Identifying Autocorrelation 85
5.3.1 The Durbin-Watson Statistic 86
5.3.2 The Autocorrelation Function (ACF) 87
5.3.3 Residual Plots and the Runs Test 87
5.4 Methodology: Addressing Autocorrelation 88
5.4.1 Detrending and Deseasonalizing 88
5.4.2 Example: e-Commerce Retail Sales 89
5.4.3 Lagging and Differencing 95
5.4.4 Example: Stock Indexes 96
5.4.5 Generalized Least Squares (GLS): The Cochrane-Orcutt Procedure 102
5.4.6 Example: Time Intervals Between Old Faithful Geyser Eruptions 104
5.5 Summary 107
Part III Categorical Predictors
6 Analysis of Variance 113
6.1 Introduction 113
6.2 Concepts and Background Material 114
6.2.1 One-Way ANOVA 114
6.2.2 Two-Way ANOVA 115
6.3 Methodology 117
6.3.1 Codings for Categorical Predictors 117
6.3.2 Multiple Comparisons 122
6.3.3 Levene's Test and Weighted Least Squares 124
6.3.4 Membership in Multiple Groups 127
6.4 Example: DVD Sales of Movies 129
6.5 Higher-Way ANOVA 134
6.6 Summary 136
7 Analysis of Covariance 139
7.1 Introduction 139
7.2 Methodology 139
7.2.1 Constant Shift Models 139
7.2.2 Varying Slope Models 141
7.3 Example: International Grosses of Movies 141
7.4 Summary 145
Part IV Non-Gaussian Regression Models
8 Logistic Regression 149
8.1 Introduction 149
8.2 Concepts and Background Material 151
8.2.1 The Logit Response Function 151
8.2.2 Bernoulli and Binomial Random Variables 152
8.2.3 Prospective and Retrospective Designs 153
8.3 Methodology 156
8.3.1 Maximum Likelihood Estimation 156
8.3.2 Inference, Model Comparison, and Model Selection 157
8.3.3 Goodness-of-Fit 159
8.3.4 Measures of Association and Classification Accuracy 161
8.3.5 Diagnostics 163
8.4 Example: Smoking and Mortality 163
8.5 Example: Modeling Bankruptcy 167
8.6 Summary 173
9 Multinomial Regression 177
9.1 Introduction 177
9.2 Concepts and Background Material 178
9.2.1 Nominal Response Variable 178
9.2.2 Ordinal Response Variable 180
9.3 Methodology 182
9.3.1 Estimation 182
9.3.2 Inference, Model Comparisons, and Strength of Fit 183
9.3.3 Lack of Fit and Violations of Assumptions 184
9.4 Example: City Bond Ratings 185
9.5 Summary 189
10 Count Regression 191
10.1 Introduction 191
10.2 Concepts and Background Material 192
10.2.1 The Poisson Random Variable 192
10.2.2 Generalized Linear Models 193
10.3 Methodology 194
10.3.1 Estimation and Inference 194
10.3.2 Offsets 195
10.4 Overdispersion and Negative Binomial Regression 196
10.4.1 Quasi-likelihood 197
10.4.2 Negative Binomial Regression 198
10.5 Example: Unprovoked Shark Attacks in Florida 198
10.6 Other Count Regression Models 205
10.7 Poisson Regression and Weighted Least Squares 209
10.7.1 Example: International Grosses of Movies (continued) 210
10.8 Summary 212
11 Models for Time-to-Event (Survival) Data 215
11.1 Introduction 216
11.2 Concepts and Background Material 217
11.2.1 The Nature of Survival Data 217
11.2.2 Accelerated Failure Time Models 218
11.2.3 The Proportional Hazards Model 219
11.3 Methodology 220
11.3.1 The Kaplan-Meier Estimator and the Log-Rank Test 220
11.3.2 Parametric (Likelihood) Estimation 225
11.3.3 Semiparametric (Partial Likelihood) Estimation 227
11.3.4 The Buckley-James Estimator 229
11.4 Example: The Survival of Broadway Shows (continued) 230
11.5 LTRC Data and Time-Varying Covariates 238
11.5.1 Left-Truncated/Right-Censored Data 238
11.5.2 Example: The Survival of Broadway Shows (continued) 239
11.5.3 Time-Varying Covariates 240
11.5.4 Example: Female Heads of Government 241
11.6 Summary 244
Part V Other Regression Models
12 Nonlinear Regression 249
12.1 Introduction 249
12.2 Concepts and Background Material 250
12.3 Methodology 252
12.3.1 Nonlinear Least Squares Estimation 252
12.3.2 Inference for Nonlinear Regression Models 253
12.4 Example: Michaelis-Menten Enzyme Kinetics 254
12.5 Summary 259
13 Models for Longitudinal and Nested Data 261
13.1 Introduction 261
13.2 Concepts and Background Material 263
13.2.1 Nested Data and ANOVA 263
13.2.2 Longitudinal Data and Time Series 264
13.2.3 Fixed Effects Versus Random Effects 265
13.3 Methodology 266
13.3.1 The Linear Mixed Effects Model 266
13.3.2 The Generalized Linear Mixed Effects Model 268
13.3.3 Generalized Estimating Equations 269
13.3.4 Nonlinear Mixed Effects Models 269
13.4 Example: Tumor Growth in a Cancer Study 270
13.5 Example: Unprovoked Shark Attacks in the United States 276
13.6 Summary 282
14 Regularization Methods and Sparse Models 285
14.1 Introduction 285
14.2 Concepts and Background Material 286
14.2.1 The Bias-Variance Tradeoff 286
14.2.2 Large Numbers of Predictors and Sparsity 287
14.3 Methodology 288
14.3.1 Forward Stepwise Regression 288
14.3.2 Ridge Regression 289
14.3.3 The Lasso 290
14.3.4 Other Regularization Methods 291
14.3.5 Choosing the Regularization Parameter(s) 292
14.3.6 More Structured Regression Problems 293
14.3.7 Cautions About Regularization Methods 294
14.4 Example: Human Development Index 295
14.5 Summary 298
Part VI Nonparametric and Semiparametric Models
15 Smoothing and Additive Models 303
15.1 Introduction 303
15.2 Concepts and Background Material 304
15.2.1 The Bias-Variance Tradeoff 304
15.2.2 Smoothing and Local Regression 305
15.3 Methodology 306
15.3.1 Local Polynomial Regression 306
15.3.2 Choosing the Bandwidth 307
15.3.3 Smoothing Splines 308
15.3.4 Multiple Predictors, the Curse of Dimensionality, and Additive Models 308
15.4 Example: Prices of German Used Automobiles 309
15.5 Local and Penalized Likelihood Regression 312
15.5.1 Example: The Bechdel Rule and Hollywood Movies 313
15.6 Using Smoothing to Identify Interactions 316
15.6.1 Example: Estimating Home Prices (continued) 318
15.7 Summary 318
16 Tree-Based Models 323
16.1 Introduction 324
16.2 Concepts and Background Material 324
16.2.1 Recursive Partitioning 324
16.2.2 Types of Trees 327
16.3 Methodology 328
16.3.1 CART 328
16.3.2 Conditional Inference Trees 329
16.3.3 Ensemble Methods 330
16.4 Examples 332
16.4.1 Estimating Home Prices (continued) 332
16.4.2 Example: Courtesy in Airplane Travel 332
16.5 Trees for Other Types of Data 337
16.5.1 Trees for Nested and Longitudinal Data 337
16.5.2 Survival Trees 338
16.6 Summary 343
Index 355