weka's memory problem

weka's memory problem

by Alexander K. Seewald

> some of you might already have had memory problems with weka.
> Especially when you start off with more than 10000 features!
I have run datasets with 500,000 samples and 1.3 million features
without problems, even with non-incremental (batch) learning
algorithms. You might be using inappropriate learning algorithms.
Could you give more details on your experiments?

As for why WEKA performs deep copying: this is sound programming
practice in Java - _not_ deep copying can lead to some very
hard-to-find bugs in slightly buggy code. It essentially makes your
life easier when you develop learning algorithms for WEKA, at a
slight memory consumption penalty, which is likely to be irrelevant
in practice.

> Another side note: many RapidMiner processes can be directly applied
> on a database by setting the appropriate parameters and there is
> basically no memory restriction in these cases.
Database access is simple for WEKA as well, see e.g.
http://weka.sourceforge.net/wiki/index.php/Databases
- weka.core.converters.DatabaseLoader even allows incremental
  loading, in which case (combined with an incremental learner)
  there are no memory restrictions either.
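
A minimal sketch of why incremental loading plus an incremental
learner keeps memory flat: the model is updated from one row at a
time and the row is then discarded. The code does not call WEKA.
RunningMeans, GeneratedRows and IncrementalDemo are hypothetical
names that stand in for an updateable learner and a database cursor.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-in for an incremental learner: running per-attribute means. */
final class RunningMeans {
    private final double[] mean;
    private long count;

    RunningMeans(int numAttributes) {
        this.mean = new double[numAttributes];
    }

    /** Updates the model from one row; the row can be discarded afterwards. */
    void update(double[] row) {
        count++;
        for (int i = 0; i < mean.length; i++) {
            mean[i] += (row[i] - mean[i]) / count;
        }
    }

    double mean(int attribute) { return mean[attribute]; }

    long count() { return count; }
}

/** Hypothetical row source that produces rows on demand, like a database cursor. */
final class GeneratedRows implements Iterator<double[]> {
    private final long total;
    private long next;

    GeneratedRows(long total) { this.total = total; }

    @Override public boolean hasNext() { return next < total; }

    @Override public double[] next() {
        if (!hasNext()) throw new NoSuchElementException();
        double v = next++;
        return new double[] { v, 2 * v };
    }
}

final class IncrementalDemo {
    /** Trains on a stream of rows; only one row is referenced at any time. */
    static RunningMeans train(Iterator<double[]> rows, int numAttributes) {
        RunningMeans model = new RunningMeans(numAttributes);
        while (rows.hasNext()) {
            model.update(rows.next());
        }
        return model;
    }

    public static void main(String[] args) {
        RunningMeans m = train(new GeneratedRows(1_000_000), 2);
        System.out.println(m.count() + " rows, means " + m.mean(0) + ", " + m.mean(1));
    }
}
```

Memory use here depends on the number of attributes, not on the
number of rows, which is the property the combination described
above relies on.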

> And a second note: RapidMiner is also available as a 64-bit version
> for cases where more than 4 GB of memory are available on a 64-bit
> OS. We ourselves work here on a 16 GB machine, and then the running
> time starts to be the limiting factor.
Any 64-bit version of Java (e.g. I use Java HotSpot(TM) 64-Bit Server
VM (build 1.5.0_04-b05), which was built sometime in 2005) can run
WEKA with more than 4 GB of main memory on a 64-bit OS. So that again
is not a limitation of WEKA, but of Java itself - a 32-bit JVM is only
able to address slightly less than 2 GB of memory.
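
To check how much heap a given JVM will actually hand out, a small
program along these lines can help. It is only a sketch: HeapCheck
is a hypothetical class name, and the limit it reports depends on
the -Xmx option passed at startup (for example
java -Xmx8g HeapCheck) and on the operating system.

```java
final class HeapCheck {
    /** Converts a byte count to gibibytes. */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // Upper bound on the heap, as set by -Xmx or by the JVM default.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap: %.2f GiB on %s%n",
                toGiB(maxHeap), System.getProperty("os.arch"));
    }
}
```

On a 32-bit JVM a large -Xmx value is typically rejected at startup,
while a 64-bit JVM accepts it if the machine has the memory.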

Best,
  Alex
--
Dr. Alexander K. Seewald

Seewald Solutions
www.seewald.at
Tel. +43(664)1106886
Fax. +43(1)2533033/2764


_______________________________________________
Wekalist mailing list
Wekalist@...
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: weka's memory problem

by Sebastian Briesemeister

Hello,

>> some of you might already have had memory problems with weka.
>> Especially when you start off with more than 10000 features!
> I have run datasets with 500,000 samples and 1.3 million features
> without problems, even with non-incremental (batch) learning
> algorithms. You might be using inappropriate learning algorithms.
> Could you give more details on your experiments?

I simply run backward attribute selection with CfsSubsetEval as the
evaluator.

> As for why WEKA performs deep copying: this is sound programming
> practice in Java - _not_ deep copying can lead to some very
> hard-to-find bugs in slightly buggy code. It essentially makes your
> life easier when you develop learning algorithms for WEKA, at a
> slight memory consumption penalty, which is likely to be irrelevant
> in practice.

I thought Java only performs a "real" deep copy if you change the
object, and otherwise uses references to point to it. Even so,
feature selection does not need to copy the data; a list of indices
should do.
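
For reference, Java has no copy-on-write for objects: assignment
copies the reference, and a copy exists only where code makes one
explicitly. The sketch below shows both cases and the
"list of indices" idea as a view over shared data. AttributeView and
CopyDemo are hypothetical names used for illustration; this is not
how WEKA implements attribute selection.

```java
/** Hypothetical view that exposes a subset of attributes without copying the data. */
final class AttributeView {
    private final double[][] data;   // shared with the caller, never copied
    private final int[] selected;    // indices of the attributes that are kept

    AttributeView(double[][] data, int[] selected) {
        this.data = data;
        this.selected = selected.clone();
    }

    int numAttributes() { return selected.length; }

    double value(int row, int attribute) {
        return data[row][selected[attribute]];
    }

    /** Returns a new view without the attribute at the given position. */
    AttributeView without(int position) {
        int[] remaining = new int[selected.length - 1];
        for (int i = 0, j = 0; i < selected.length; i++) {
            if (i != position) remaining[j++] = selected[i];
        }
        return new AttributeView(data, remaining);
    }
}

final class CopyDemo {
    /** Explicit deep copy of a two-dimensional array. */
    static double[][] deepCopy(double[][] source) {
        double[][] copy = new double[source.length][];
        for (int r = 0; r < source.length; r++) copy[r] = source[r].clone();
        return copy;
    }

    public static void main(String[] args) {
        double[][] data = { {1, 2, 3}, {4, 5, 6} };

        double[][] alias = data;   // copies the reference only
        alias[0][0] = 99;          // the change is visible through 'data' too

        double[][] deep = deepCopy(data);
        deep[1][1] = -1;           // 'data' is unaffected

        // One backward-elimination step: drop attribute 1, keep 0 and 2.
        AttributeView view = new AttributeView(data, new int[] {0, 1, 2}).without(1);
        System.out.println(data[0][0] + " " + data[1][1] + " " + view.value(1, 1));
    }
}
```

Each elimination step in this sketch allocates only a small index
array, so the cost per candidate subset is proportional to the number
of attributes rather than to the size of the dataset.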

>> Another side note: many RapidMiner processes can be directly applied
>> on a database by setting the appropriate parameters and there is
>> basically no memory restriction in these cases.
> Database access is simple for WEKA as well, see e.g.
> http://weka.sourceforge.net/wiki/index.php/Databases
> - weka.core.converters.DatabaseLoader even allows incremental
>   loading, in which case (combined with an incremental learner)
>   there are no memory restrictions either.
>
>> And a second note: RapidMiner is also available as a 64-bit version
>> for cases where more than 4 GB of memory are available on a 64-bit
>> OS. We ourselves work here on a 16 GB machine, and then the running
>> time starts to be the limiting factor.
> Any 64-bit version of Java (e.g. I use Java HotSpot(TM) 64-Bit Server
> VM (build 1.5.0_04-b05), which was built sometime in 2005) can run
> WEKA with more than 4 GB of main memory on a 64-bit OS. So that again
> is not a limitation of WEKA, but of Java itself - a 32-bit JVM is only
> able to address slightly less than 2 GB of memory.
Right! I had no problem addressing more than 4 GB either. The only
really urgent problem is the extreme memory waste for simple
operations such as attribute selection.

Cheers,
Sebastian


Re: weka's memory problem

by Mark Hall-9


On 2/07/2008, at 10:35 PM, Sebastian Briesemeister wrote:

> Hello,
>
>>> some of you might already have had memory problems with weka.
>>> Especially when you start off with more than 10000 features!
>> I have run datasets with 500,000 samples and 1.3 million features
>> without problems, even with non-incremental (batch) learning
>> algorithms. You might be using inappropriate learning algorithms.
>> Could you give more details on your experiments?
>
> I simply run backward attribute selection with CfsSubsetEval as the
> evaluator.
CfsSubsetEval discretizes all numeric attributes (if the class is
discrete), which creates another copy of your data. It also computes
a correlation matrix. For 20,000 attributes, that's 3.2 GB right
there :-)
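
The 3.2 GB figure can be reproduced with a little arithmetic,
assuming a dense n x n matrix of 8-byte doubles. That layout is an
assumption made here to match the number; the message does not say
how the matrix is stored. CorrelationMatrixMemory is a hypothetical
helper class.

```java
final class CorrelationMatrixMemory {
    /** Bytes for a dense n x n matrix (assumed layout). */
    static long denseBytes(long n, int bytesPerEntry) {
        return n * n * bytesPerEntry;
    }

    /** Bytes for the lower triangle including the diagonal of a symmetric matrix. */
    static long triangularBytes(long n, int bytesPerEntry) {
        return n * (n + 1) / 2 * bytesPerEntry;
    }

    public static void main(String[] args) {
        long n = 20_000;
        // 20,000 * 20,000 * 8 bytes = 3,200,000,000 bytes
        System.out.println("dense doubles:      " + denseBytes(n, 8) + " bytes");
        // Storing only one triangle roughly halves that.
        System.out.println("triangular doubles: " + triangularBytes(n, 8) + " bytes");
    }
}
```

Because the size grows with the square of the number of attributes,
halving the attribute count cuts the matrix to a quarter.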

Cheers,
Mark.

--
Mark Hall
Senior Developer/Consultant, Pentaho Open Source Business Intelligence
Citadel International, Suite 340, 5950 Hazeltine National Dr.,
Orlando, FL 32822, USA
+64 7 847-3537 office, +64 21 399-132 mobile, +1 815 550-8637 fax,
Skype: mark.andrew.hall, Yahoo: mark_andrew_hall
Download the latest release today <http://www.sourceforge.net/projects/pentaho>



