Command line callClassifier gives different/incorrect results?


by dbaumgar

Hello,

I am running a large number of experiments with the CLI (I use this rather than the Experimenter so that I have more freedom and more result data to work with).  I write the classifier output to a file and then append the predictions to the end of it.  Since I am using 10-fold cross-validation, I have to use the callClassifier class to output these predictions from the command line.

The problem is that the accuracy I calculate by hand from the predictions does not match the "Correctly Classified Instances" figure in the classifier output.  I would not be concerned about a difference of less than 1%; however, at times the difference is 4%!  Should these two figures be the same or not?
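
For reference, the manual calculation described above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: the class name PredictionAccuracy and the line format "<instance-id> <actual> <predicted>" are hypothetical, since the exact layout of the callClassifier output is not shown in this thread, so the field indices would need adjusting to the real file. The sketch pools all predictions and divides correct by total. As far as I know, Weka's cross-validation summary is likewise accumulated over all folds rather than averaged per fold, so a pooled count is the figure to compare against.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class PredictionAccuracy {

    /**
     * Returns the fraction of prediction lines whose predicted label equals
     * the actual label, pooled over all lines (that is, over all folds).
     *
     * ASSUMPTION: each prediction line looks like
     *   "<instance-id> <actual> <predicted>"
     * separated by whitespace. This format is hypothetical; adapt the field
     * indices to the real callClassifier output.
     */
    public static double accuracy(List<String> lines) {
        int total = 0;
        int correct = 0;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] fields = trimmed.split("\\s+");
            if (fields.length < 3) {
                continue; // skip lines that do not match the assumed format
            }
            total++;
            if (fields[1].equals(fields[2])) {
                correct++;
            }
        }
        if (total == 0) {
            throw new IllegalArgumentException("no prediction lines found");
        }
        return (double) correct / total;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to a file containing only the prediction lines
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.printf("Accuracy: %.4f%%%n", 100.0 * accuracy(lines));
    }
}
```

Averaging ten per-fold accuracies instead of pooling can give a slightly different number when the folds are not all the same size, so it is worth checking which of the two the manual calculation actually does.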

The issue seems to occur only with more complex classifier setups (e.g. Bagging).  With plain J48 the two accuracy figures match.  With Bagging wrapped around J48, however, they do not.

To reproduce what I am seeing with the autos.arff dataset:

1) Obtain the standard classifier output
java weka.classifiers.meta.Bagging -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -t autos.arff -v -o -i -k -- -C 0.25 -M 2

2) Obtain the predictions
java callClassifier weka.classifiers.meta.Bagging -P 100 -S 1 -I 10 -W weka.classifiers.trees.J48 -t autos.arff  -- -C 0.25 -M 2

Re: Command line callClassifier gives different/incorrect results?

by dbaumgar

For ease of reproduction, you can use the iris.arff dataset that ships with Weka instead of autos.arff.

Thanks,
Dustin Baumgartner

Re: Command line callClassifier gives different/incorrect results?

by Peter Reutemann

> I am running a large amount of experiments with the CLI (I use this rather
> than the Experimenter so I have more freedom and result data to work with).
> I put the classifier output in a file and then append the predictions to the
> end of it.  Since I am using 10-fold cross-validation, I have to use the
> callClassifier class in order to output these predictions from the command
> line.
>
> The problem is that my manual calculations of the average from the
> predictions do not match what is reported in the "Correctly Classified
> Instances" of the classifier output.  I would not be concerned with a < 1%
> difference; however, at times there is a 4% difference!  Should these be the
> same, or shouldn't they?
>
> The issue only seems to occur with more complex classifier setups (e.g.
> Bagging).  Using J48 the two sources for the accuracy match.  However, using
> Bagging with J48 produces the problem that I am seeing.
>
> To reproduce what I am seeing with the autos.arff dataset:
>
> 1) Obtain the standard classifier output
> java weka.classifiers.meta.Bagging -P 100 -S 1 -I 10 -W
> weka.classifiers.trees.J48 -t autos.arff -v -o -i -k -- -C 0.25 -M 2
>
> 2) Obtain the predictions
> java callClassifier weka.classifiers.meta.Bagging -P 100 -S 1 -I 10 -W
> weka.classifiers.trees.J48 -t autos.arff  -- -C 0.25 -M 2


"callClassifier" is a third-party tool provided by Alex Seewald, the
author of the "Weka Primer"
(http://weka.sourceforge.net/wekadoc/index.php/en:Primer). Please
contact him and ask him about the differences in output.

Cheers, Peter
--
Peter Reutemann, Dept. of Computer Science, University of Waikato, NZ
http://www.cs.waikato.ac.nz/~fracpete/ Ph. +64 (7) 858-5174

_______________________________________________
Wekalist mailing list
Wekalist@...
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist