weka's memory problem


weka's memory problem

by Sebastian Briesemeister

Dear all,

Some of you may already have run into memory problems with Weka, especially
when starting off with more than 10000 features.

I wonder whether this problem could be solved quite easily by triggering
the garbage collector manually in the Weka code?

As it stands, an attribute selection with BestFirst search, given an initial
set of 20000 features, uses up to 9 GB of RAM!

Does anyone have experience with the memory usage of RapidMiner?

Cheers and thanks,
Sebastian



_______________________________________________
Wekalist mailing list
Wekalist@...
https://list.scms.waikato.ac.nz/mailman/listinfo/wekalist

Re: weka's memory problem

by Arne Muller-3

Hello,

I can only offer "sympathy": I had memory problems too (12 GB) with a
large data set. Some of this might be related to (unnecessarily?)
deep-copying the instances structure.

By the way, I've tried different GC strategies (JVM switches) as well
as triggering GC in my code, without success :-( .
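
A minimal sketch of why a manual GC call is unlikely to help here, assuming the memory is held by deep copies that are still referenced. The class and method names (GcDemo, holdCopies, usedBytes) are made up for illustration and are not Weka code. System.gc() is only a hint to the JVM, and it can never reclaim objects that are still reachable:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Weka.
final class GcDemo {

    /** Heap bytes currently in use, as reported by the JVM. */
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Builds `copies` deep copies of a rows x cols matrix and keeps all of them reachable. */
    static List<double[][]> holdCopies(int copies, int rows, int cols) {
        double[][] original = new double[rows][cols];
        List<double[][]> held = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            double[][] copy = new double[rows][];
            for (int r = 0; r < rows; r++) {
                copy[r] = original[r].clone(); // each row is duplicated, not shared
            }
            held.add(copy);
        }
        return held;
    }

    public static void main(String[] args) {
        // 10 copies * 500 rows * 1000 columns * 8 bytes is roughly 40 MB.
        List<double[][]> held = holdCopies(10, 500, 1000);

        System.gc(); // a hint only; the copies are still referenced by `held`
        long whileHeld = usedBytes();

        held = null; // drop the last reference
        System.gc(); // now the collector is allowed to reclaim the copies
        long afterRelease = usedBytes();

        System.out.printf("in use while held:  %d MB%n", whileHeld / (1024 * 1024));
        System.out.printf("in use after release: %d MB%n", afterRelease / (1024 * 1024));
    }
}
```

If the copies are kept alive by the search itself, only avoiding the copies (or releasing them earlier) would lower the footprint; the exact numbers printed depend on the JVM and its settings.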

As a short-term solution you could buy additional memory ;-)

   regards,

    arne

On Mon, Jun 30, 2008 at 12:04 PM, Sebastian Briesemeister
<briese@...> wrote:

> Dear all,
>
> Some of you may already have run into memory problems with Weka, especially
> when starting off with more than 10000 features.
>
> I wonder whether this problem could be solved quite easily by triggering
> the garbage collector manually in the Weka code?
>
> As it stands, an attribute selection with BestFirst search, given an initial
> set of 20000 features, uses up to 9 GB of RAM!
>
> Does anyone have experience with the memory usage of RapidMiner?
>
> Cheers and thanks,
> Sebastian


Re: weka's memory problem

by Ingo Mierswa

Hello,

> Some of you may already have run into memory problems with Weka,
> especially when starting off with more than 10000 features.
> [...]
> Does anyone have experience with the memory usage of RapidMiner?

RapidMiner employs a completely different data storage mechanism from
Weka and performs hardly any deep copies of the data at all, so its
memory usage is often lower by default. Several users have reported that
even for the Weka learning schemes included in RapidMiner, memory usage
was (much) lower than in Weka itself, which is of course an interesting
consequence of these data structures. The same applies to many
preprocessing processes. The reason is that we do not build Weka
instances from our data (again: no data copy here) but hand Weka a new
instances object that directly accesses the data structures of
RapidMiner, without deep-copying the data even in Weka operations.

For certain data mining processes, however, things are exactly the other
way round: the learning algorithms of Weka are already very mature and
highly optimized, and several implementations of the corresponding
RapidMiner operators cannot deliver better results. So the best solution
for memory-intensive processes is often to combine the strong points of
both worlds: the mature analysis algorithms of Weka on top of the more
efficient data structures of RapidMiner.
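
To illustrate the "view instead of deep copy" idea in general terms, here is a small sketch. It is not RapidMiner's or Weka's actual code: Table, AttributeView and deepCopy are hypothetical names, and the column-wise layout is an assumption made only for this example. A view over k selected attributes stores k indices, whereas a deep copy stores k times the number of rows in values:

```java
import java.util.Arrays;

// Hypothetical illustration, not RapidMiner or Weka API.
final class ViewVsCopy {

    /** Shared storage: one double[] per attribute (column-wise layout, assumed here). */
    static final class Table {
        final double[][] columns;

        Table(double[][] columns) {
            this.columns = columns;
        }

        int numRows() {
            return columns.length == 0 ? 0 : columns[0].length;
        }
    }

    /** A subset of attributes that reads through to the shared table; only indices are stored. */
    static final class AttributeView {
        private final Table table;
        private final int[] selected;

        AttributeView(Table table, int[] selected) {
            this.table = table;
            this.selected = selected.clone(); // copies the index array only, never the data
        }

        int numAttributes() {
            return selected.length;
        }

        double value(int row, int attr) {
            return table.columns[selected[attr]][row];
        }
    }

    /** The deep-copy alternative: duplicates every value of every selected attribute. */
    static double[][] deepCopy(Table table, int[] selected) {
        double[][] out = new double[selected.length][];
        for (int i = 0; i < selected.length; i++) {
            out[i] = Arrays.copyOf(table.columns[selected[i]], table.numRows());
        }
        return out;
    }

    public static void main(String[] args) {
        Table table = new Table(new double[][] {{1, 2}, {3, 4}, {5, 6}});
        AttributeView view = new AttributeView(table, new int[] {2, 0});
        System.out.println(view.value(1, 0)); // row 1 of attribute 2, read from the shared table
    }
}
```

In a subset search that evaluates many candidate attribute sets, the difference between the two approaches adds up with every candidate that is materialized.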

Another side note: many RapidMiner processes can be applied directly to
a database by setting the appropriate parameters, and in these cases
there is basically no memory restriction. And a second note: RapidMiner is
also available as a 64-bit version for cases where more than 4 GB of
memory is available on a 64-bit OS. We ourselves work on a 16 GB machine,
and there the running time starts to become the limiting factor. In any
case, both notes might help where they are applicable.
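
Since both tools run on the JVM, the memory they can actually use is bounded by the maximum heap size (the standard -Xmx option), not by the RAM installed. A small sketch for checking what a given JVM reports; HeapInfo and maxHeapMiB are names invented for this example:

```java
// Hypothetical helper class for illustration only.
final class HeapInfo {

    /** Maximum heap the JVM will attempt to use, in MiB. */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // os.arch describes the architecture the JVM reports for the platform.
        System.out.println("os.arch  = " + System.getProperty("os.arch"));
        System.out.println("max heap = " + maxHeapMiB() + " MiB");
    }
}
```

Running it as, for example, "java -Xmx8g HeapInfo" (the 8g value is only an illustration) shows whether the JVM in use accepts a heap of that size.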

Hope that helps,
Ingo

--
Ingo Mierswa
Managing Director

Rapid-I GmbH
Stockumer Str. 475
44149 Dortmund, Germany

Phone: +49 (0)231 425 786 90

E-Mail:  mierswa@...

Sitz: Dortmund
HRB 20720, Amtsgericht Dortmund
Geschäftsführer: Ingo Mierswa, Ralf Klinkenberg

www: http://rapid-i.com/



