
# 06 Decision Trees and Random Forests

In this block we cover:

- Decision Trees
  - The Classification and Regression Tree (CART) approach
  - Decision loss functions: ID3 vs Gini impurity
  - Pruning trees to reduce overfitting
  - Regression trees
- Random Forests
  - Ensembles of trees
  - Bagging features
  - Forests vs Boosted Decision Trees
  - Feature importance
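
As a small illustration of the two split criteria named above, the sketch below computes Shannon entropy (the impurity behind ID3's information gain) and Gini impurity (the usual CART choice) for a toy two-class split. It is not taken from the lectures or the workshop: the function names and the toy labels are made up for this example, and real implementations also search over features and thresholds.

```python
import math
from collections import Counter


def class_proportions(labels):
    """Return the fraction of samples in each class present in `labels`."""
    counts = Counter(labels)
    n = len(labels)
    return [c / n for c in counts.values()]


def entropy(labels):
    """Shannon entropy in bits; ID3 picks the split with the largest drop in this."""
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def gini(labels):
    """Gini impurity: chance that two samples drawn with replacement differ in class."""
    return 1.0 - sum(p * p for p in class_proportions(labels))


def split_gain(parent, left, right, impurity):
    """Impurity of the parent minus the size-weighted impurity of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * impurity(left) + (len(right) / n) * impurity(right)
    return impurity(parent) - weighted


# Toy data (hypothetical): a balanced parent node split into two mostly pure children.
parent = ["a"] * 5 + ["b"] * 5
left = ["a"] * 4 + ["b"] * 1
right = ["a"] * 1 + ["b"] * 4

info_gain = split_gain(parent, left, right, entropy)  # entropy-based (ID3-style) gain
gini_gain = split_gain(parent, left, right, gini)     # Gini-based (CART-style) gain
print(f"information gain: {info_gain:.4f}, Gini gain: {gini_gain:.4f}")
```

Both criteria reward the same thing, children that are purer than their parent, so they usually rank candidate splits similarly. They differ mainly in scale and in how strongly they penalise mixed nodes.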

## Lectures:

- Decision Trees and Random Forests:
  - 6.1.1 Decisions, Trees, Forests, (Part 1, Trees) (39:25)
  - 6.1.2 Decisions, Trees, Forests, (Part 2, Forests) (17:22)
  - Reference R code

## Worksheets:

## Workshop:

The workshop is split into two sections. The first is in R and **generates the data**, so you should run it first. The second is in Python and compares its results to the R content. Note that the generated content is exported to the DST GitHub and the code below grabs it from there, so it is possible to run the sections out of order.

- 6.2.1 Workshop on Random Forests (R content) (10:52)
- 6.2.2 Workshop on Random Forests (Python content) (32:58)

## References

- Tree methods:
  - Chapter 9.2 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani and Friedman).
  - Penn State U Applied Data Mining and Statistical Learning: How to prune trees
  - Decision Tree Algorithms: Deep Math ML

- Regression Trees:
  - Karalic A, “Employing Linear Regression in Regression Tree Leaves” (1992) ECAI-92

- Boosted Decision Trees:
  - J. Elith, J. Leathwick, and T. Hastie “A working guide to boosted regression trees” (2008). British Ecological Society.

- CART:
  - CART = Classification and Regression Trees. Breiman, Leo; Friedman, J. H.; Olshen, R. A.; Stone, C. J. (1984). Classification and Regression Trees.
  - Wei-Yin Loh’s 2011 Review is popular.
  - ID3: Quinlan, J. R. 1986. Induction of Decision Trees. Mach. Learn. 1, 1 (Mar. 1986), 81-106.

- Random Forests:
  - Chapter 15 of The Elements of Statistical Learning: Data Mining, Inference, and Prediction (Hastie, Tibshirani and Friedman).
  - Implement a Random Forest From Scratch in Python
  - A Gentle Introduction to Random Forests at CitizenNet
  - DataDive on Selecting good features
  - Cosma Shalizi on Regression Trees
  - Gilles Louppe PhD Thesis: Understanding Random Forests

- Kroese et al’s Data Science & Machine Learning free ebook looks pretty helpful.