On 2/07/2008, at 10:35 PM, Sebastian Briesemeister wrote:
> Hello,
>
>>> Some of you might already have had memory problems with Weka,
>>> especially when you start off with more than 10,000 features!
>> I have run datasets with 500,000 samples and 1.3 million features
>> without problems, even with non-incremental (batch) learning
>> algorithms. You might be using inappropriate learning algorithms;
>> could you give more details on your experiments?
>
> I simply do a backward attribute selection with CfsSubsetEval as the
> evaluator.

CfsSubsetEval discretizes all numeric attributes (if the class is
discrete), which creates another copy of your data. It also computes
a correlation matrix between the attributes. For 20,000 attributes,
that's 3.2 GB right there :-)

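For what it's worth, the 3.2 GB figure is consistent with a full
20,000 x 20,000 matrix of 8-byte values. That storage layout is an
assumption made here for illustration, not something confirmed from
the CfsSubsetEval source, and the function name below is hypothetical.
A minimal Python sketch of the arithmetic, which also shows how the
estimate would shrink if only one triangle of the matrix were stored
with 4-byte entries:

```python
def corr_matrix_bytes(num_attributes, bytes_per_entry=8, triangular=False):
    """Rough memory needed for an attribute-by-attribute correlation matrix.

    Assumptions (not taken from the Weka source): one entry per attribute
    pair, a fixed entry size, and no per-object or per-array overhead.
    """
    if triangular:
        # Store only one triangle, diagonal included: n * (n + 1) / 2 entries.
        entries = num_attributes * (num_attributes + 1) // 2
    else:
        # Store the full square matrix: n * n entries.
        entries = num_attributes * num_attributes
    return entries * bytes_per_entry


if __name__ == "__main__":
    for n in (1_000, 10_000, 20_000):
        full = corr_matrix_bytes(n)
        tri = corr_matrix_bytes(n, bytes_per_entry=4, triangular=True)
        print(f"{n:>6} attributes: {full / 1e9:.2f} GB full (8-byte), "
              f"{tri / 1e9:.2f} GB triangular (4-byte)")
```

The point is that the cost grows with the square of the number of
attributes, so doubling the attribute count roughly quadruples the
memory for the matrix, on top of the discretized copy of the data.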
Cheers,
Mark.
--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr.,
Orlando, FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>