Preface
Background and motivation
Intended audience
Key features of this book
Changes in the second edition
Key role of technology
How to use this book
Acknowledgments
I Part I: Introduction to Data Science
1. Prologue: Why data science?
What is data science?
Case study: The evolution of sabermetrics
Datasets
Further resources
2. Data visualization
The federal election cycle
Composing data graphics
Importance of data graphics: Challenger
Creating effective presentations
The wider world of data visualization
Further resources
Exercises
Supplementary exercises
3. A grammar for graphics
A grammar for data graphics
Canonical data graphics in R
Extended example: Historical baby names
Further resources
Exercises
Supplementary exercises
4. Data wrangling on one table
A grammar for data wrangling
Extended example: Ben's time with the Mets
Further resources
Exercises
Supplementary exercises
5. Data wrangling on multiple tables
inner_join()
left_join()
Extended example: Manny Ramirez
Further resources
Exercises
Supplementary exercises
6. Tidy data
Tidy data
Reshaping data
Naming conventions
Data intake
Further resources
Exercises
Supplementary exercises
7. Iteration
Vectorized operations
Using across() with dplyr functions
The map() family of functions
Iterating over a one-dimensional vector
Iteration over subgroups
Simulation
Extended example: Factors associated with BMI
Further resources
Exercises
Supplementary exercises
8. Data science ethics
Introduction
Truthful falsehoods
Role of data science in society
Some settings for professional ethics
Some principles to guide ethical action
Algorithmic bias
Data and disclosure
Reproducibility
Ethics, collectively
Professional guidelines for ethical conduct
Further resources
Exercises
Supplementary exercises
II Part II: Statistics and Modeling
9. Statistical foundations
Samples and populations
Sample statistics
The bootstrap
Outliers
Statistical models: Explaining variation
Confounding and accounting for other factors
The perils of p-values
Further resources
Exercises
Supplementary exercises
10. Predictive modeling
Predictive modeling
Simple classification models
Evaluating models
Extended example: Who has diabetes?
Further resources
Exercises
Supplementary exercises
11. Supervised learning
Non-regression classifiers
Parameter tuning
Example: Evaluation of income models redux
Extended example: Who has diabetes this time?
Regularization
Further resources
Exercises
Supplementary exercises
12. Unsupervised learning
Clustering
Dimension reduction
Further resources
Exercises
Supplementary exercises
13. Simulation
Reasoning in reverse
Extended example: Grouping cancers
Randomizing functions
Simulating variability
Random networks
Key principles of simulation
Further resources
Exercises
Supplementary exercises
III Part III: Topics in Data Science
14. Dynamic and customized data graphics
Rich Web content using D3.js and htmlwidgets
Animation
Flexdashboard
Interactive Web apps with Shiny
Customization of ggplot2 graphics
Extended example: Hot dog eating
Further resources
Exercises
Supplementary exercises
15. Database querying using SQL
From dplyr to SQL
Flat-file databases
The SQL universe
The SQL data manipulation language
Extended example: FiveThirtyEight flights
SQL vs R
Further resources
Exercises
Supplementary exercises
16. Database administration
Constructing efficient SQL databases
Changing SQL data
Extended example: Building a database
Scalability
Further resources
Exercises
Supplementary exercises
17. Working with geospatial data
Motivation: What's so great about geospatial data?
Spatial data structures
Making maps
Extended example: Congressional districts
Effective maps: How (not) to lie
Projecting polygons
Playing well with others
Further resources
Exercises
Supplementary exercises
18. Geospatial computations
Geospatial operations
Geospatial aggregation
Geospatial joins
Extended example: Trail elevations at MacLeish
Further resources
Exercises
Supplementary exercises
19. Text as data
Regular expressions using Macbeth
Extended example: Analyzing textual data from arXiv.org
Ingesting text
Further resources
Exercises
Supplementary exercises
20. Network science
Introduction to network science
Extended example: Six degrees of Kristen Stewart
PageRank
Extended example: men's college basketball
Further resources
Exercises
Supplementary exercises
21. Epilogue: Towards "big data"
Notions of big data
Tools for bigger data
Alternatives to R
Closing thoughts
Further resources
IV Part IV: Appendices
A Packages used in this book
The mdsr package
Other packages
Further resources
B Introduction to R and RStudio
Installation
Learning R
Fundamental structures and objects
Add-ons: Packages
Further resources
Exercises
Supplementary exercises
C Algorithmic thinking
Introduction
Simple example
Extended example: Law of large numbers
Non-standard evaluation
Debugging and defensive coding
Further resources
Exercises
Supplementary exercises
D Reproducible analysis and workflow
Scriptable statistical computing
Reproducible analysis with R Markdown
Projects and version control
Further resources
Exercises
Supplementary exercises
E Regression modeling
Multiple regression
Inference for regression
Assumptions underlying regression
Logistic regression
Further resources
Exercises
Supplementary exercises
F Setting up a database server
SQLite
MySQL
PostgreSQL
Connecting to SQL