CHAPTER 7

DATA CLASSIFICATION WITH DECISION TREES

action is a string that is either “train” or “test.” d is the data structure that defines the tree.

t holds the inputs for either training or testing. The outputs are the updated data structure d and r, which contains the results.

The function is first called with the training data and the action “train.” The main function is short:

switch lower(action)
  case 'train'
    d = Training( d, t );
    d.box(1)        % Display the root box
  case 'test'
    % Clear the data indices stored in each box before testing
    for k = 1:length(d.box)
      d.box(k).id = [];
    end
    [r, d] = Testing( d, t );
    % Display every box after testing
    for k = 1:length(d.box)
      d.box(k)
    end
  otherwise
    error('%s is not an available action',action);
end

We added the otherwise case, which throws an error for any unrecognized action, for completeness. Note that we use lower to eliminate case sensitivity. A decision tree is a set of boxes connected by lines. A decision box is a parent with two child boxes; a class box is a leaf and has no children. The subfunction Training creates the decision tree by adding boxes at each node.
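The box structure can be pictured with a minimal Python sketch of classifying one input by walking the tree from the root. It assumes, hypothetically, that each decision box compares one input component against a threshold; the fields feature and threshold are illustrative and do not come from the MATLAB code. A box with no class is treated as a decision box, mirroring the isempty(box(k).class) test in Training, and the root's children are the second and third boxes, as with box(1).child = [2 3] (indices 1 and 2 in Python's 0-based indexing).

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence


@dataclass
class Box:
    class_: Optional[int] = None       # set only for a class (leaf) box
    feature: Optional[int] = None      # hypothetical: index of the input to test
    threshold: Optional[float] = None  # hypothetical: split value
    child: List[int] = field(default_factory=list)  # the two children of a decision box


def classify(boxes: Sequence[Box], x: Sequence[float]) -> int:
    """Descend from the root until reaching a class box, then return its class."""
    k = 0  # root box, box(1) in MATLAB's 1-based indexing
    while boxes[k].class_ is None:  # decision box: pick a child and keep going
        b = boxes[k]
        k = b.child[0] if x[b.feature] < b.threshold else b.child[1]
    return boxes[k].class_


# Example: the root tests input 0; its right child tests input 1.
boxes = [
    Box(feature=0, threshold=0.5, child=[1, 2]),
    Box(class_=1),
    Box(feature=1, threshold=0.5, child=[3, 4]),
    Box(class_=2),
    Box(class_=3),
]
print(classify(boxes, [0.2, 0.9]))  # 1
print(classify(boxes, [0.8, 0.1]))  # 2
```

Storing the boxes in a flat list and linking them by child indices matches the way the MATLAB code addresses boxes as box(k), so no pointer-based tree is needed.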

%% DecisionTree>Training
function d = Training( d, t )

[n,m]        = size(t.x);
nClass       = max(t.m);
box(1)       = AddBox( 1, 1:n*m, [] );
box(1).child = [2 3];
[~, dH]      = HomogeneityMeasure( 'initialize', d, t.m );

class   = 0;
nRow    = 1;
kR0     = 0;
kNR0    = 1; % Next row
kInRow  = 1;
kInNRow = 1;
while( class < nClass )
  k   = kR0 + kInRow;
  idK = box(k).id; % Indices of the data in this box, used to compute the next action
  % Enter this branch only for a decision box (no class assigned yet)
  if( isempty(box(k).class) )
