weka's memory problem

> some of you might have had already memory problems with weka.
> Especially when you start off with more than 10000 features!

I have run datasets with 500,000 samples and 1.3 million features without problems, even with non-incremental (batch) learning algorithms. You might be using inappropriate learning algorithms; could you give more details on your experiments?

Regarding why WEKA performs deep copying: This is sound programming practice in Java - _not_ deep copying can lead to some very hard to find bugs in slightly buggy code. It essentially makes your life easier when you develop learning algorithms for WEKA, at a slight memory consumption penalty which is likely to be irrelevant in practice.

> Another side note: many RapidMiner processes can be directly applied
> on a database by setting the appropriate parameters and there is
> basically no memory restriction in these cases.

Database access is simple for WEKA as well, see e.g. http://weka.sourceforge.net/wiki/index.php/Databases - weka.core.converters.DatabaseLoader even allows incremental loading, in which case (combined with an incremental learner) no memory restrictions exist either.

> And a second note: RapidMiner is also available as a 64 version in
> cases where more than 4 Gb of memory are available on a 64 bit OS.
> We ourself work here on a 16 Gb machine and then the running time starts
> to be the limiting factor.

Any 64bit version of Java (e.g. I use Java HotSpot(TM) 64-Bit Server VM (build 1.5.0_04-b05), which was built sometime in 2005) can run WEKA with > 4G of main memory on a 64bit OS. So that again is not a limitation of WEKA, but of Java itself - a 32bit JVM is only able to address slightly less than 2GB of memory.

Best,
Alex
--
Dr. Alexander K. Seewald
Seewald Solutions
www.seewald.at
Tel. +43(664)1106886
Fax. +43(1)2533033/2764
_______________________________________________
Wekalist mailing list
Wekalist@...
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist
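To make the deep-copying argument concrete, here is a minimal sketch of the kind of aliasing bug a shallow copy can cause. TinyDataset is a hypothetical stand-in invented for this illustration; it is not WEKA's Instances class and says nothing about how WEKA copies data internally.

```java
import java.util.Arrays;

// Hypothetical stand-in for a dataset; NOT a WEKA class.
final class TinyDataset {
    final double[][] rows;

    TinyDataset(double[][] rows) {
        this.rows = rows;
    }

    // Shallow copy: a new outer array, but the row arrays are shared.
    TinyDataset shallowCopy() {
        return new TinyDataset(rows.clone());
    }

    // Deep copy: every row is duplicated, so the two datasets are independent.
    TinyDataset deepCopy() {
        double[][] copy = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            copy[i] = Arrays.copyOf(rows[i], rows[i].length);
        }
        return new TinyDataset(copy);
    }
}

final class CopyDemo {
    public static void main(String[] args) {
        TinyDataset original = new TinyDataset(new double[][] {{1.0, 2.0}, {3.0, 4.0}});
        TinyDataset shallow = original.shallowCopy();
        TinyDataset deep = original.deepCopy();

        // Imagine a learner that modifies "its own" copy in place.
        shallow.rows[0][0] = 99.0;
        deep.rows[1][1] = -1.0;

        System.out.println(original.rows[0][0]); // 99.0: changed through the shallow copy
        System.out.println(original.rows[1][1]); // 4.0: untouched by the deep copy
    }
}
```

The write through the shallow copy silently changes the original, which is the sort of bug that is hard to trace once several learners share one dataset. The deep copy avoids it at the cost of a second copy of every value.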
Re: weka's memory problem

Hello,

>> some of you might have had already memory problems with weka.
>> Especially when you start off with more than 10000 features!
> I have run datasets with 500,000 samples and 1.3 million features
> without problems, even with non-incremental (batch) learning
> algorithms. You might be using inappropriate learning algorithms;
> could you give more details on your experiments?

I simply do a backward attribute selection with CfsSubset as evaluator.

> Regarding why WEKA performs deep copying: This is sound programming
> practice in Java - _not_ deep copying can lead to some very hard to
> find bugs in slightly buggy code. It essentially makes your life
> easier when you develop learning algorithms for WEKA, at a slight
> memory consumption penalty which is likely to be irrelevant in
> practice.

I thought Java only performs a "real" deep copy in case you make changes to the object, and otherwise uses references to point to that object. Even so, for feature selection it is not necessary to copy the data; a list of indices should do.

>> Another side note: many RapidMiner processes can be directly applied
>> on a database by setting the appropriate parameters and there is
>> basically no memory restriction in these cases.
> Database access is simple for WEKA as well, see e.g.
> http://weka.sourceforge.net/wiki/index.php/Databases
> - weka.core.converters.DatabaseLoader even allows incremental
> loading, in which case (combined with an incremental learner)
> no memory restrictions exist either.
>
>> And a second note: RapidMiner is also available as a 64 version in
>> cases where more than 4 Gb of memory are available on a 64 bit OS.
>> We ourself work here on a 16 Gb machine and then the running time starts
>> to be the limiting factor.
> Any 64bit version of Java (e.g. I use Java HotSpot(TM) 64-Bit Server
> VM (build 1.5.0_04-b05), which was built sometime in 2005) can run
> WEKA with > 4G of main memory on a 64bit OS. So that again is not a
> limitation of WEKA, but of Java itself - a 32bit JVM is only able to
> address slightly less than 2GB of memory.

[...] really urgent problem is the extreme memory waste for simple operations such as attribute selection.

Cheers,
Sebastian
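The "list of indices" idea for attribute selection could look roughly like the sketch below. ColumnView and its methods are hypothetical names made up for this illustration; nothing here is taken from WEKA, and a real evaluator may still need its own working copies (for example after discretization).

```java
// Hypothetical read-only view over a subset of attributes; NOT a WEKA class.
final class ColumnView {
    private final double[][] data;  // shared with the caller, never copied
    private final int[] selected;   // indices of the attributes still in play

    ColumnView(double[][] data, int[] selected) {
        this.data = data;
        this.selected = selected;
    }

    int numAttributes() {
        return selected.length;
    }

    // Reads through the index list into the shared data.
    double value(int row, int attribute) {
        return data[row][selected[attribute]];
    }

    // One backward-elimination step: drop an attribute by building a new,
    // shorter index list. Only the index array is allocated, not the data.
    ColumnView without(int attribute) {
        int[] remaining = new int[selected.length - 1];
        for (int i = 0, j = 0; i < selected.length; i++) {
            if (i != attribute) {
                remaining[j++] = selected[i];
            }
        }
        return new ColumnView(data, remaining);
    }
}

final class ColumnViewDemo {
    public static void main(String[] args) {
        double[][] data = {{1.0, 2.0, 3.0}, {4.0, 5.0, 6.0}};
        ColumnView all = new ColumnView(data, new int[] {0, 1, 2});
        ColumnView reduced = all.without(1); // drop the middle attribute

        System.out.println(reduced.numAttributes()); // 2
        System.out.println(reduced.value(0, 1));     // 3.0
    }
}
```

Each elimination step costs one small int array instead of a second copy of the dataset. Note that plain Java has no copy-on-write for objects: assigning a reference never copies, and a copy only happens when code asks for one explicitly.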
Re: weka's memory problem

On 2/07/2008, at 10:35 PM, Sebastian Briesemeister wrote:

> Hello,
>
>>> some of you might have had already memory problems with weka.
>>> Especially when you start off with more than 10000 features!
>> I have run datasets with 500,000 samples and 1.3 million features
>> without problems, even with non-incremental (batch) learning
>> algorithms. You might be using inappropriate learning algorithms;
>> could you give more details on your experiments?
>
> I simply do a backward attribute selection with CfsSubset as
> evaluator.

[...] discrete), which creates another copy of your data. Also, a correlation matrix is computed. For 20,000 attributes, that's 3.2 GB right there :-)

Cheers,
Mark.
--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr., Orlando, FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>
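The 3.2 GB figure can be reproduced with simple arithmetic, assuming a dense n x n matrix of 8-byte doubles. That storage layout is an assumption chosen because it matches the number quoted above; it is not a statement about how CfsSubset actually stores its correlations. The class and method names below are made up for the illustration.

```java
// Back-of-the-envelope memory estimate; the storage layout is an assumption.
final class CorrelationMatrixMemory {

    // Bytes for a dense n x n matrix of doubles (array/object overhead ignored).
    static long denseBytes(long numAttributes) {
        return numAttributes * numAttributes * Double.BYTES;
    }

    // Bytes if only the lower triangle (including the diagonal) is stored,
    // which is enough for a symmetric matrix.
    static long triangularBytes(long numAttributes) {
        return numAttributes * (numAttributes + 1) / 2 * Double.BYTES;
    }

    public static void main(String[] args) {
        System.out.println(denseBytes(20_000));      // 3200000000
        System.out.println(triangularBytes(20_000)); // 1600080000
    }
}
```

The cost grows quadratically: 20,000 x 20,000 x 8 bytes = 3.2 x 10^9 bytes, so doubling the number of attributes quadruples the memory. Storing only one triangle roughly halves it but does not change the growth rate.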