I Introduction 1
1 The Big Picture with Kondratiev and Kardashev 3
2 The Scientific Method and Data 7
3 Conventions 13
II Starting with R and Elements of Statistics 19
4 The Basics of R 21
4.1 Variables 27
4.2 Data Types 29
4.2.1 Elementary Data Types 29
4.2.2 Vectors 30
4.2.3 Lists 33
4.2.4 Matrices 39
4.2.5 Arrays 42
4.2.6 Factors 44
4.2.7 Data Frames 48
4.3 Operators 56
4.3.1 Arithmetic Operators 56
4.3.2 Relational Operators 57
4.3.3 Logical Operators 57
4.3.4 Assignment Operators 59
4.3.5 Other Operators 60
4.3.6 Loops 62
4.3.7 Functions 66
4.3.8 Packages 70
4.3.9 Strings 73
4.4 Selected Data Interfaces 76
4.4.1 CSV Files 76
4.4.2 Excel Files 80
4.4.3 Databases 80
4.5 Distributions 83
4.5.1 Normal Distribution 83
4.5.2 Binomial Distribution 85
5 Lexical Scoping and environments 91
5.1 Environments in R 92
5.2 Lexical Scoping in R 94
6 The Implementation of OO 99
6.1 Base Types 102
6.2 S3 Objects 104
6.2.1 Creating S3 objects 107
6.2.2 Creating generic methods 109
6.2.3 Method dispatch 110
6.2.4 Group generic functions 111
6.3 S4 Objects 114
6.3.1 Creating S4 Objects 114
6.3.2 Recognising objects, generic functions, and methods 122
6.3.3 Creating S4 Generics 124
6.3.4 Method dispatch 125
6.4 The reference class, refclass, RC or R5 model 127
6.4.1 Creating R5 objects 127
6.5 OO Conclusion 134
7 Tidy R with the Tidyverse 137
7.1 The Philosophy of the Tidyverse 138
7.2 Packages in the tidyverse 141
7.3 Working with the tidyverse 144
7.3.1 tibbles 144
7.3.2 Piping with R 150
7.3.3 Attention points when using the pipe command 151
7.3.3.1 Advanced piping 153
7.3.3.2 Conclusion 155
8 Elements of Descriptive Statistics 157
8.1 Measures of Central Tendency 158
8.1.1 Mean 158
8.1.2 The Median 161
8.1.3 The Mode 162
8.2 Measures of Variation or Spread 164
8.3 Measures of Covariation 166
8.4 Chi Square Tests 169
9 Further Reading 171
III Data Import 173
10 A short history of modern database systems 175
11 RDBMS 179
12 SQL 183
12.1 Designing the database 184
12.2 Building the database 187
12.3 Adding data to the database 196
12.4 Querying the database 200
12.5 Modifying an existing database 206
12.6 Advanced features of SQL 211
13 Connecting R to an SQL database 215
IV Data Wrangling 221
14 Anonymising Data 225
15 Data Wrangling in the tidyverse 229
15.1 Tidy data 230
15.2 Importing the data 232
15.2.1 Importing from an SQL RDBMS 232
15.2.2 Importing flat files in the tidyverse 234
15.2.2.1 CSV Files 236
15.2.2.2 Making sense of fixed width files 238
15.3 Tidying up data with tidyr 243
15.3.1 Splitting tables 244
15.3.2 headers to data 249
15.3.3 Spreading one column over many 250
15.3.4 separate 252
15.3.5 Unite 254
15.3.6 Wrong Data 255
15.4 Playing with tibbles: SQL-like functionality 256
15.4.1 Selecting 256
15.4.2 Filtering 256
15.4.3 Joining 258
15.4.4 Mutating 262
15.4.5 Set Operations 265
15.5 String Manipulation in the tidyverse 268
15.5.1 Basic string manipulation 269
15.5.2 Pattern matching with regular expressions 272
15.5.2.1 Regular Expressions 273
15.5.2.2 Functions using Regex 279
15.6 Dates with lubridate 287
15.6.0.1 ISO 8601 Format 288
15.6.0.2 Timezones 290
15.6.0.3 Extract and set date and time components 291
15.6.0.4 Calculating with date-times 293
15.7 Factors with forcats 298
16 Dealing with missing data 307
17 Data Binning 319
17.1 Tuning the binning procedure 323
17.2 More complex cases: matrix binning 329
17.3 Weight of evidence and information value 336
18 Factoring analysis and principal components 339
18.1 Principal components analysis 340
18.2 Factor Analysis 345
V Explore Data 349
19 Using Descriptive Statistics 353
20 Standard Charts & Graphs 357
20.1 Pie Charts 358
20.2 Bar Charts 359
20.3 Boxplots 361
20.4 Violin plots 363
20.5 Histograms 366
20.6 Scatterplots 368
20.7 Line Graphs 371
20.8 Plotting Functions 373
20.9 Maps and contour plots 374
21 Selected Visualization Methods 377
21.1 Heat-maps 377
21.2 Text Mining 379
21.2.1 Word Clouds 379
21.2.2 Word Associations 383
21.3 Colours in R 386
22 Time Series Analysis 393
22.1 Time Series in R 394
22.2 Forecasting 397
22.2.1 Moving Average 397
22.2.2 Seasonal Decomposition 403
VI Modelling 409
23 Regression Models 411
23.1 Linear Regression 411
23.2 Multiple Linear Regression 415
23.2.1 Poisson Regression 416
23.2.2 Non-Linear Regression 418
23.3 Performance of regression models 421
23.3.1 Mean Square Error (MSE) 421
23.3.2 R-Squared 421
23.3.3 Mean Average Deviation (MAD) 423
24 Classification Models 425
24.1 Logistic Regression 425
24.2 The performance of binary classification models 427
24.2.1 The Confusion Matrix and related measures 428
24.2.2 ROC 431
24.2.3 AUC 433
24.2.4 AUC Gini for logistic regression 435
24.2.5 Kolmogorov-Smirnov (KS) for logistic regression 436
24.2.6 Finding an Optimal Cut-off 439
25 Learning Machines 445
25.1 Decision Tree 447
25.1.1 Essential Background 447
25.1.2 Important considerations 452
25.1.3 Growing trees with R 455
25.1.4 Evaluating the performance of a decision tree 463
25.1.4.1 The performance of the regression tree 464
25.1.4.2 The performance of the classification tree 464
25.2 Random Forest 467
25.3 Artificial Neural Networks (ANN) 472
25.3.1 The basics of ANNs in R 472
25.3.2 An example of a work-flow to develop an ANN 475
25.4 Support Vector Machine 483
25.5 Unsupervised learning and clustering 487
25.5.1 k-means clustering 488
25.5.2 Fuzzy clustering 501
25.5.3 Hierarchical clustering 504
25.5.4 Other clustering methods 506
26 Towards a tidy modelling cycle with modelr 507
27 Model Validation 513
27.1 Model quality measures 515
27.2 Predictions and residuals 516
27.3 Bootstrapping 517
27.4 Cross-Validation 520
27.4.1 training and validating 521
27.5 Monte-Carlo Cross Validation 525
27.6 k-Fold Cross Validation 527
27.7 Comparison 529
27.8 Validation in a broader perspective 530
28 Labs 535
28.1 Financial Analysis with QuantMod 535
28.1.1 The quantmod data structure 539
28.1.2 Support functions supplied by quantmod 543
28.1.3 Financial modelling in quantmod 545
29 Multi Criteria Decision Analysis (MCDA) 553
29.1 What and Why 553
29.2 General Work-flow 555
29.3 Identify the issue at hand: step 1 and 2 559
29.4 STEP 3: the decision matrix 561
29.4.1 Construct a decision matrix 561
29.4.2 Normalize the decision matrix 563
29.5 STEP 4: leave out inefficient and unacceptable alternatives 565
29.5.1 Unacceptable Alternatives 565
29.5.2 Dominance - inefficient alternatives 565
29.6 Printing preference relationships 568
29.7 STEP 6: MCDA Methods 570
29.7.1 Examples of non-compensatory methods 570
29.7.2 The weighted sum method (WSM) 571
29.7.3 WPM 574
29.7.4 ELECTRE 575
29.7.4.1 ELECTRE I 576
29.7.4.2 ELECTRE II 582
29.7.5 PROMethEE 584
29.7.5.1 PROMethEE I 587
29.7.5.2 PROMethEE II 597
29.7.6 PCA (Gaia) 602
29.7.7 Outranking methods 607
29.7.8 Goal Programming 608
29.8 Summary MCDA 611
VII Introduction to Companies 613
30 Financial Accounting 617
30.1 The Statements of Accounts 618
30.1.1 Income Statement 618
30.1.2 Net Income: The P&L statement 618
30.1.3 Balance Sheet 619
30.2 The Value Chain 621
30.3 Further Terminology 623
30.4 Selected Financial Ratios 625
31 Management Accounting 627
31.1 Introduction 628
31.2 Selected Methods in MA 630
31.2.1 Cost Accounting 630
31.2.2 Selected Cost Types 632
31.3 Selected Use Cases of MA 635
31.3.1 Balanced Scorecard 635
31.3.2 Key Performance Indicators 636
31.3.2.1 Selection of KPIs 638
32 Asset Valuation Basics 641
32.1 Time Value of Money 642
32.2 Cash 645
32.3 Bonds 646
32.3.1 Valuation of Bonds 648
32.3.2 Duration 650
32.3.2.1 Macaulay Duration 651
32.3.2.2 Modified Duration 652
32.4 Equities 654
32.4.1 Valuation of Equities 655
32.4.1.1 CAPM 656
32.4.2 Absolute Value Models 660
32.4.2.1 Dividend Discount Model 660
32.4.2.2 Free Cash Flow (FCF) 664
32.4.2.3 Discounted Cash Flow Model 666
32.4.2.4 Discounted Abnormal Operating Earnings valuation model 668
32.4.2.5 Net Asset Value Method or Cost Method 668
32.4.2.6 Excess Earnings Method 670
32.4.3 Relative Value Models 670
32.4.3.1 The Idea behind Relative Value Models 670
32.4.3.2 Some Ratios that can be used in relative value models 671
32.4.3.3 Measures Related to Company Value for External Stakeholders 673
32.4.3.4 Relative Value Models in Practice 680
32.4.3.5 Conclusions and Use 680
32.4.4 Selection of Valuation Methods 681
32.4.5 Pitfalls and Matters Requiring Attention for all Methods 682
32.4.5.1 Results and Sensitivity 682
32.5 Forwards and Futures 690
32.6 Options 692
32.6.1 Definitions 692
32.6.2 Commercial Aspects 695
32.6.3 Historic observations 696
32.6.4 Valuation of Options at Maturity 697
32.6.5 The Put-Call Parity 700
32.6.6 The Black & Scholes Model 702
32.6.6.1 Apply the Black and Scholes formula 703
32.6.7 Dependencies 705
32.6.8 Sensitivities: 'the Greeks' 710
32.6.9 Delta Hedging 711
32.6.10 Linear Option Strategies 714
32.6.10.1 The Limits of the Black and Scholes Model 720
32.6.11 The Binomial Model 724
32.6.11.1 Risk Neutral Method 727
32.6.11.2 The Equivalent Portfolio Binomial Model 729
32.6.11.3 Summary Binomial Model 732
32.6.12 Exotic Options 732
32.6.13 Integrated Option Strategies 733
32.6.14 Capital Protected Structures 736
VIII Report 739
33 ggplot2 743
34 R-markdown 753
35 knitr and LaTeX 757
36 An automated development cycle 761
37 Writing and communication skills 763
38 Interactive apps 767
38.1 Shiny 769
38.2 Browser born data visualization 773
38.2.1 HTML-widgets 773
38.2.2 ggvis 775
38.2.3 googleVis 777
38.3 Dashboards 779
38.3.1 The business case: a diversity dashboard 780
38.3.2 A dashboard with flexdashboard 785
38.3.2.1 Interactive dashboards with flexdashboard 790
38.3.3 A dashboard with shinydashboard 791
IX Appendices 795
39 Other Resources 797
40 Levels of Measurement 799
40.1 Nominal Scale 800
40.2 Ordinal Scale 801
40.3 Interval Scale 802
40.4 Ratio Scale 803
41 Trademark Notices 805
42 Code snippets not shown in the body of the book 809
43 Answers to questions 815
Bibliography 829
Index 839
Nomenclature 851