ROC Area

Hi,

I have one question about the ROC Area obtained when using the cross-validation option in the Explorer GUI. I have a dataset with 771 instances and 35 attributes plus the class. The classes are unbalanced: there are 14 classes, one with 390 instances, one with 21, and the others with 30 instances each. When I run KNN with K = 771, every instance is assigned to the majority class, which is expected. But the ROC Area reported for the other classes isn't 0. Instead it is some high number, different for each class (one class has almost the same area as the majority class).

Am I missing something, or is something wrong?

The output is below. The WEKA version is 3.5.7.

Regards,
Thiago

=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 771 -W 0 -I -A "weka.core.neighboursearch.LinearNNSearch -A \"weka.core.EuclideanDistance -R first-last\""
Relation:     DATA-weka.filters.unsupervised.attribute.Standardize
Instances:    771
Attributes:   36
              atr1
              atr2
              ...
              atr35
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 771 inverse-distance-weighted nearest neighbour(s) for classification

Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         390               50.5837 %
Incorrectly Classified Instances       381               49.4163 %
Kappa statistic                          0
Mean absolute error                      0.0923
Root mean squared error                  0.2154
Relative absolute error                 88.5604 %
Root relative squared error             94.6235 %
Total Number of Instances              771

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall   F-Measure   ROC Area   Class
  0         0         0           0        0           0.901    class1
  0         0         0           0        0           0.931    class2
  0         0         0           0        0           0.991    class3
  0         0         0           0        0           0.996    class4
  0         0         0           0        0           0.975    class5
  0         0         0           0        0           0.969    class6
  0         0         0           0        0           0.965    class7
  1         1         0.506       1        0.672       0.997    class8
  0         0         0           0        0           0.988    class9
  0         0         0           0        0           0.982    class10
  0         0         0           0        0           0.962    class11
  0         0         0           0        0           0.973    class12
  0         0         0           0        0           0.966    class13
  0         0         0           0        0           0.966    class14

=== Confusion Matrix ===

   a   b   c   d   e   f   g   h   i   j   k   l   m   n   <-- classified as
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   a = class1
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   b = class2
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   c = class3
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   d = class4
   0   0   0   0   0   0   0  21   0   0   0   0   0   0 |   e = class5
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   f = class6
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   g = class7
   0   0   0   0   0   0   0 390   0   0   0   0   0   0 |   h = class8
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   i = class9
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   j = class10
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   k = class11
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   l = class12
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   m = class13
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   n = class14

_______________________________________________
Wekalist mailing list
Wekalist@...
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
|
|
Re: ROC Area

Weka computes AUC for each class by treating each class in turn as the "positive" class and all the remaining classes together as the "negative" class. Since KNN in Weka produces probability distributions (i.e. the votes for the classes by the k neighbors are normalized), the predictions can be ranked for each class in turn and an ROC curve produced, even for a class that never receives the highest vote. In your case, most of the AUC scores are similar because most of these two-class problems are similar (i.e. each minority class vs. the big class plus the remaining minority classes).

I'm not sure that I've been all that clear, but hopefully that helps :-)

Cheers,
Mark.

On 30/06/2008, at 4:00 AM, Thiago Ferreira wrote:
> Hi,
>
> I have one question about the ROC Area obtained when using the
> cross-validation option on Explorer GUI. I have a database with 771
> instances and 35 attributes + class, the classes are unbalanced (14
> classes, one with 390 instances, one with 21 and the others with 30
> instances), when I run KNN with K = 771, everybody is classified to
> the majority class, that's expected. But the ROC Area obtained for the
> other classes isn't 0, but some high number (one class has almost the
> same area as the majority class) different for each class.
> Am I missing something or is something wrong?
> The output obtained is attached. The WEKA version is 3.5.7.
>
> Regards,
> Thiago

Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
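To make the explanation above concrete, here is a minimal sketch in plain Python of a one-vs-rest AUC computed from ranked class probabilities. It is not Weka's code. The class names and probability values are made up for illustration, and the pairwise (Mann-Whitney) formulation is just one common way to compute the area under the ROC curve. The point it shows is that a class can have a TP rate of 0, because it never wins the vote, while its AUC is still high, because instances of that class tend to receive a larger probability for it than other instances do.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for `positive` vs. all other classes.

    Computed as the fraction of (positive, negative) instance pairs in which
    the positive instance has the higher score; ties count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical normalized vote distributions, one per test instance,
# in the order (P(major), P(minor_a), P(minor_b)).
classes = ["major", "minor_a", "minor_b"]
dists = [
    (0.80, 0.10, 0.10),  # true class: major
    (0.74, 0.16, 0.10),  # true class: major
    (0.82, 0.08, 0.10),  # true class: major
    (0.70, 0.25, 0.05),  # true class: minor_a
    (0.72, 0.20, 0.08),  # true class: minor_a
    (0.75, 0.05, 0.20),  # true class: minor_b
]
labels = ["major", "major", "major", "minor_a", "minor_a", "minor_b"]

# Hard predictions: the class with the largest probability.
predicted = [classes[max(range(len(classes)), key=lambda i: d[i])] for d in dists]

# One-vs-rest AUC per class, using only that class's probability column.
auc = {
    c: auc_one_vs_rest([d[i] for d in dists], labels, c)
    for i, c in enumerate(classes)
}

print(predicted)
print(auc)
```

In this toy data every instance is predicted as "major", so both minority classes have a TP rate of 0, yet their AUC is not 0 because the ranking by class probability still separates them from the rest.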
|
|
Re: ROC Area

Mark,

Don't worry, you were clear :)

Thanks,
Thiago

On Tue, Jul 1, 2008 at 10:27 PM, Mark Hall <mhall@...> wrote:
> Weka computes AUC for each class by considering each in turn to be
> the "positive" class and all the remaining classes are the negative
> class. Since KNN in Weka produces probability distributions (i.e. the
> votes for the classes by the k neighbors are normalized), the
> predictions can be ranked for each class in turn, and an ROC curve
> produced. In your case, most of the AUC scores are kind of similar
> because most of these two-class situations are kind of similar (i.e.
> each minority classes vs the big class plus the remaining minority
> classes).