description: Build a decision tree from the Bank Note dataset in Python
CART on the Bank Note dataset
```python
from random import seed
```
Load a CSV file
```python
def load_csv(filename):
```
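Only the signature of `load_csv` survives above. Here is a minimal sketch of a body consistent with it. It assumes the file is a headerless comma-separated file that is read with the standard-library `csv.reader`, and that blank lines (such as a trailing newline) should be skipped. It is an illustration, not necessarily the original body.

```python
from csv import reader


def load_csv(filename):
    """Read a CSV file into a list of rows, each row a list of strings."""
    dataset = list()
    with open(filename, 'r', newline='') as file:
        for row in reader(file):
            if not row:  # csv.reader yields [] for blank lines; skip them
                continue
            dataset.append(row)
    return dataset
```

Every value is still a string at this point. The numeric conversion happens in a separate step.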
Convert string column to float
```python
def str_column_to_float(dataset, column):
```
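A minimal sketch of `str_column_to_float`, assuming it converts one column of the list-of-lists dataset in place and tolerates stray whitespace. The usage lines convert every column, class label included, on a small made-up dataset.

```python
def str_column_to_float(dataset, column):
    """Convert one column of a list-of-lists dataset from str to float, in place."""
    for row in dataset:
        # strip() removes surrounding whitespace before parsing
        row[column] = float(row[column].strip())


# Usage on made-up rows: convert every column, including the label in the last one
rows = [['1.5', ' 2.0', '0'], ['3.25', '-1.0', '1']]
for i in range(len(rows[0])):
    str_column_to_float(rows, i)
print(rows)  # [[1.5, 2.0, 0.0], [3.25, -1.0, 1.0]]
```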
Split a dataset into k folds
```python
def cross_validation_split(dataset, n_folds):
```
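A minimal sketch of `cross_validation_split`, assuming it draws rows at random without replacement into `n_folds` folds of equal size using `random.randrange`. Calling `seed` first, as the test harness does, makes the folds reproducible.

```python
from random import randrange, seed


def cross_validation_split(dataset, n_folds):
    """Randomly partition rows into n_folds equal-sized folds, sampling without replacement."""
    dataset_split = list()
    dataset_copy = list(dataset)  # shallow copy, so the caller's list is not emptied
    fold_size = int(len(dataset) / n_folds)
    for _ in range(n_folds):
        fold = list()
        while len(fold) < fold_size:
            index = randrange(len(dataset_copy))
            fold.append(dataset_copy.pop(index))
        dataset_split.append(fold)
    return dataset_split


seed(1)
data = [[float(i), i % 2] for i in range(10)]
folds = cross_validation_split(data, 5)
print([len(f) for f in folds])  # [2, 2, 2, 2, 2]
```

Because `fold_size` is rounded down, any leftover rows (when the row count is not a multiple of `n_folds`) are left out of every fold in this sketch.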
Calculate accuracy percentage
```python
def accuracy_metric(actual, predicted):
```
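Classification accuracy is the share of positions where the predicted label equals the actual label, expressed as a percentage. Here is a minimal sketch matching the `accuracy_metric(actual, predicted)` signature, assuming both arguments are equal-length lists.

```python
def accuracy_metric(actual, predicted):
    """Percentage of positions where predicted[i] == actual[i]."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / float(len(actual)) * 100.0


print(accuracy_metric([0, 1, 1, 0], [0, 1, 0, 0]))  # 75.0
```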
Evaluate an algorithm using a cross validation split
```python
def evaluate_algorithm(dataset, algorithm, n_folds, *args):
```
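`evaluate_algorithm` performs k-fold cross-validation. Each fold is held out once as the test set, the remaining folds are combined into a training set, and `algorithm(train, test, *args)` is expected to return one prediction per test row. The sketch below assumes that contract. It hides each test row's label by copying the row and setting the label to `None`, and it repeats small versions of the split and accuracy helpers so it runs on its own. `zero_rule` is a hypothetical majority-class baseline used only to exercise the harness.

```python
from random import randrange, seed


def cross_validation_split(dataset, n_folds):
    dataset_copy = list(dataset)
    fold_size = len(dataset) // n_folds
    return [[dataset_copy.pop(randrange(len(dataset_copy))) for _ in range(fold_size)]
            for _ in range(n_folds)]


def accuracy_metric(actual, predicted):
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual) * 100.0


def evaluate_algorithm(dataset, algorithm, n_folds, *args):
    """k-fold cross-validation: train on k-1 folds, score on the held-out fold."""
    folds = cross_validation_split(dataset, n_folds)
    scores = []
    for i, fold in enumerate(folds):
        # Training set: all other folds, flattened into one list of rows
        train_set = [row for j, other in enumerate(folds) if j != i for row in other]
        # Test set: copies of the held-out rows with the label blanked out
        test_set = []
        for row in fold:
            row_copy = list(row)
            row_copy[-1] = None
            test_set.append(row_copy)
        predicted = algorithm(train_set, test_set, *args)
        actual = [row[-1] for row in fold]
        scores.append(accuracy_metric(actual, predicted))
    return scores


def zero_rule(train, test):
    """Hypothetical baseline: predict the most common training label for every row."""
    labels = [row[-1] for row in train]
    majority = max(set(labels), key=labels.count)
    return [majority for _ in test]


seed(1)
data = [[float(i), 0] for i in range(8)] + [[float(i), 1] for i in range(8, 10)]
print(evaluate_algorithm(data, zero_rule, 5))
```

Extra positional arguments are passed straight through `*args`. This is how tree hyperparameters can reach the algorithm without the harness knowing about them.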
Split a dataset based on an attribute and an attribute value
```python
def test_split(index, value, dataset):
```
Calculate the Gini index for a split dataset
```python
def gini_index(groups, classes):
```
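For a split into groups, the Gini index in this sketch sums the term (1 - sum over classes of p^2) over every non-empty group, weighted by that group's share of all rows. Here p is the proportion of the group's rows that carry a given class label. A value of 0.0 means every group is pure, and with two evenly mixed classes the worst case is 0.5. The sketch assumes the class label is the last element of each row.

```python
def gini_index(groups, classes):
    """Size-weighted Gini impurity of a candidate split (0.0 = perfectly pure groups)."""
    n_instances = float(sum(len(group) for group in groups))
    gini = 0.0
    for group in groups:
        size = float(len(group))
        if size == 0:  # an empty group contributes nothing and would divide by zero
            continue
        score = 0.0
        for class_val in classes:
            p = [row[-1] for row in group].count(class_val) / size
            score += p * p
        gini += (1.0 - score) * (size / n_instances)
    return gini


print(gini_index([[[1, 0], [1, 0]], [[1, 1], [1, 1]]], [0, 1]))  # 0.0
print(gini_index([[[1, 1], [1, 0]], [[1, 1], [1, 0]]], [0, 1]))  # 0.5
```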
Select the best split point for a dataset
```python
def get_split(dataset):
```
Create a terminal node value
```python
def to_terminal(group):
```
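A terminal (leaf) node stores a single class label. A minimal sketch, assuming the leaf predicts the most common label among the rows that reach it. It uses `collections.Counter.most_common`, which resolves ties in favour of the label encountered first.

```python
from collections import Counter


def to_terminal(group):
    """Leaf value: the most common class label (last column) in this group."""
    return Counter(row[-1] for row in group).most_common(1)[0][0]


print(to_terminal([[1.0, 0], [2.0, 1], [3.0, 1]]))  # 1
```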
Create child splits for a node or make terminal
```python
def split(node, max_depth, min_size, depth):
```
Build a decision tree
```python
def build_tree(train, max_depth, min_size):
```
Make a prediction with a decision tree
```python
def predict(node, row):
```
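Prediction walks the tree from the root. At each node the row goes left if its attribute at `node['index']` is below `node['value']` and right otherwise, until it reaches a leaf. This minimal sketch assumes internal nodes are dicts with `index`, `value`, `left` and `right` keys and that leaves are plain class labels. The trees in the usage lines are hand-built, hypothetical examples.

```python
def predict(node, row):
    """Walk the tree from `node` until a leaf (a non-dict class label) is reached."""
    branch = 'left' if row[node['index']] < node['value'] else 'right'
    child = node[branch]
    if isinstance(child, dict):
        return predict(child, row)
    return child


# Hypothetical stump: feature 0 < 3.0 -> class 0, otherwise class 1
stump = {'index': 0, 'value': 3.0, 'left': 0, 'right': 1}
print(predict(stump, [1.5, None]))  # 0
print(predict(stump, [3.7, None]))  # 1
```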
Classification and Regression Tree Algorithm
```python
def decision_tree(train, test, max_depth, min_size):
```
Test CART on the Bank Note dataset
```python
seed(1)
```
Load and prepare data
```python
filename = 'data_banknote_authentication.csv'
```
Convert string attributes to floats
```python
for i in range(len(dataset[0])):
    str_column_to_float(dataset, i)
```
Evaluate algorithm
```python
n_folds = 5
```
Output
```
Scores: [100.0, 100.0, 100.0, 100.0, 100.0]
```