ROC Area

by Thiago Ferreira

Hi,

     I have a question about the ROC Area reported when using the
cross-validation option in the Explorer GUI. My dataset has 771
instances and 35 attributes plus the class. The classes are unbalanced
(14 classes: one with 390 instances, one with 21, and the other 12
with 30 instances each). When I run KNN with K = 771, every instance
is assigned to the majority class, as expected. But the ROC Area
reported for the other classes isn't 0; it is a high number, different
for each class (one class has almost the same area as the majority
class).
      Am I missing something, or is something wrong?
      The output is included below. The WEKA version is 3.5.7.

Regards,
Thiago

=== Run information ===

Scheme:       weka.classifiers.lazy.IBk -K 771 -W 0 -I -A
"weka.core.neighboursearch.LinearNNSearch -A
\"weka.core.EuclideanDistance -R first-last\""
Relation:     DATA-weka.filters.unsupervised.attribute.Standardize
Instances:    771
Attributes:   36
              atr1
              atr2
              ...
              atr35
              class
Test mode:    10-fold cross-validation

=== Classifier model (full training set) ===

IB1 instance-based classifier
using 771 inverse-distance-weighted nearest neighbour(s) for classification


Time taken to build model: 0 seconds

=== Stratified cross-validation ===
=== Summary ===

Correctly Classified Instances         390               50.5837 %
Incorrectly Classified Instances       381               49.4163 %
Kappa statistic                          0
Mean absolute error                      0.0923
Root mean squared error                  0.2154
Relative absolute error                 88.5604 %
Root relative squared error             94.6235 %
Total Number of Instances              771

=== Detailed Accuracy By Class ===

TP Rate   FP Rate   Precision   Recall  F-Measure   ROC Area  Class
  0         0          0         0         0          0.901    class1
  0         0          0         0         0          0.931    class2
  0         0          0         0         0          0.991    class3
  0         0          0         0         0          0.996    class4
  0         0          0         0         0          0.975    class5
  0         0          0         0         0          0.969    class6
  0         0          0         0         0          0.965    class7
  1         1          0.506     1         0.672      0.997    class8
  0         0          0         0         0          0.988    class9
  0         0          0         0         0          0.982    class10
  0         0          0         0         0          0.962    class11
  0         0          0         0         0          0.973    class12
  0         0          0         0         0          0.966    class13
  0         0          0         0         0          0.966    class14

=== Confusion Matrix ===

   a   b   c   d   e   f   g   h   i   j   k   l   m   n   <-- classified as
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   a = class1
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   b = class2
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   c = class3
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   d = class4
   0   0   0   0   0   0   0  21   0   0   0   0   0   0 |   e = class5
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   f = class6
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   g = class7
   0   0   0   0   0   0   0 390   0   0   0   0   0   0 |   h = class8
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   i = class9
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   j = class10
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   k = class11
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   l = class12
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   m = class13
   0   0   0   0   0   0   0  30   0   0   0   0   0   0 |   n = class14

_______________________________________________
Wekalist mailing list
Wekalist@...
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: ROC Area

by Mark Hall-9

Weka computes AUC for each class by treating each class in turn as
the "positive" class and all the remaining classes as the negative
class. Since KNN in Weka produces probability distributions (i.e. the
votes for the classes by the k neighbors are normalized), the
predictions can be ranked for each class in turn and an ROC curve
produced. The AUC therefore depends on how the instances are ranked by
predicted probability, not on which class is finally predicted. In
your case, most of the AUC scores are similar because most of these
two-class problems are similar (i.e. each minority class vs. the big
class plus the remaining minority classes).
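
To make the ranking idea concrete, here is a minimal Python sketch (not Weka's own code) that computes a one-vs-rest AUC with the Mann-Whitney rank statistic. The function name `auc_one_vs_rest`, the class names, and the toy probability distributions are all hypothetical and chosen only for illustration; the assumption is that ties between a positive and a negative score count as half.

```python
def auc_one_vs_rest(scores, labels, positive):
    """AUC for `positive` vs. all other classes (Mann-Whitney statistic).

    scores: predicted probability of `positive` for each instance.
    labels: true class of each instance.
    """
    pos = [s for s, y in zip(scores, labels) if y == positive]
    neg = [s for s, y in zip(scores, labels) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative instance")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: "big" always gets the largest probability, so the
# predicted label is always "big", as in the confusion matrix above.
classes = ["big", "small1", "small2"]
labels = ["big", "big", "big", "small1", "small1", "small2"]
dists = [
    [0.80, 0.10, 0.10],
    [0.70, 0.29, 0.01],
    [0.82, 0.08, 0.10],
    [0.60, 0.30, 0.10],  # true small1: small1 gets more weight than usual
    [0.62, 0.28, 0.10],  # true small1
    [0.65, 0.10, 0.25],  # true small2
]

# Predicted label = class with the highest probability.
predicted = [classes[max(range(len(d)), key=d.__getitem__)] for d in dists]

# One AUC per class, each computed from that class's probability column.
aucs = {
    c: auc_one_vs_rest([d[i] for d in dists], labels, c)
    for i, c in enumerate(classes)
}

print(predicted)
print(aucs)
```

Every instance is predicted as "big", yet the AUC for "small1" is well above 0 because its true members still receive a higher "small1" probability than most other instances do. Under the same tie convention, a classifier whose scores carry no ranking information at all would score 0.5 rather than 0.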

I'm not sure that I've been all that clear, but hopefully that helps :-)

Cheers,
Mark.

On 30/06/2008, at 4:00 AM, Thiago Ferreira wrote:

> [...]

--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr.,
Orlando, FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>





Re: ROC Area

by Thiago Ferreira

Mark,

     Don't worry, you were clear :)

Thanks,
Thiago
    

On Tue, Jul 1, 2008 at 10:27 PM, Mark Hall <mhall@...> wrote:
> [...]