Reference counting instead of GC?

View: New views
8 Messages — Rating Filter:   Alert me  

Reference counting instead of GC?

by Alpár Jüttner-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi,

Did anyone consider replacing the garbage collection with reference
counting in the Erlang emulator? How difficult would it be to do?

Probably, it isn't worth changing in general, but reference counting has
some advantages which are important in some special use cases:

      * Per process memory pools is not necessary.
      * Message passing (of large data) will be more efficient as
        messages don't have to be physically copied.
      * There is no need for garbage collection.  GC is for example a
        major pitfall in hard real time systems (see e.g.
        http://www.erlang.se/workshop/2007/proceedings/05nicosi.pdf )

Here are the drawbacks I can see now:

      * Higher memory overhead.
              * However the memory is less fragmented, so we get back
                something in return.
      * It is said to be slower. I'm not absolutely convinced about it
        though.
      * In general, the highest problem with reference counting is that
        in case of cross/circular referencing, dead objects will be left
        in the memory. However, this cannot happen in Erlang due to the
        immutability of the data.

Regards,
Alpar


_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Reference counting instead of GC?

by Thomas Lindgren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message




--- On Thu, 7/3/08, Alpár Jüttner <alpar@...> wrote:
> Did anyone consider replacing the garbage collection with
> reference
> counting in the Erlang emulator? How difficult would it be
> to do?

Good luck :-) However, note that the hipe guys have over the years proposed garbage collectors with most of the good properties you mention. (Cf the "shared" and "hybrid" emulators. E.g., "erl -hybrid".)

Also note that simple reference counting does not guarantee real time collection or bounded or even short pauses. Consider the case when you drop a big term: a reference counting implementation must decrement the reference counts of all subterms (recursively), while a copying collector won't even visit the dead data. Because the dead term can have arbitrary size, the pause while it is traversed is not bounded. (But you may be thinking of more sophisticated variants than that.)

Best,
Thomas



     
_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Reference counting instead of GC?

by Richard Carlsson-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Alpár Jüttner wrote:
> Hi,
>
> Did anyone consider replacing the garbage collection with reference
> counting in the Erlang emulator? How difficult would it be to do?

Perhaps not "difficult", but lots of tedious and error-prone work.

> Probably, it isn't worth changing in general, but reference counting has
> some advantages which are important in some special use cases:
>
>       * Per process memory pools is not necessary.
>       * Message passing (of large data) will be more efficient as
>         messages don't have to be physically copied.
>       * There is no need for garbage collection.  GC is for example a
>         major pitfall in hard real time systems (see e.g.
>         http://www.erlang.se/workshop/2007/proceedings/05nicosi.pdf )
>
> Here are the drawbacks I can see now:
>
>       * Higher memory overhead.
>               * However the memory is less fragmented, so we get back
>                 something in return.
>       * It is said to be slower. I'm not absolutely convinced about it
>         though.
>       * In general, the highest problem with reference counting is that
>         in case of cross/circular referencing, dead objects will be left
>         in the memory. However, this cannot happen in Erlang due to the
>         immutability of the data.

Reference counting would most certainly both be slower and use more
memory, in the cases where it matters the most for Erlang: lists and
tuples, floats, small binaries, and large pids and refs.

On the other hand, large binaries are already handled by reference
counting in the BEAM, so you already have the advantages that you list,
for big objects.

     /Richard
_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Reference counting instead of GC?

by Hynek Vychodil :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Message don't copy for example Termite (http://toute.ca/) and this is of course faster for big messages. But there is many good reasons why Erlang is as is. Counting GC is in some cases slower than generating GC especially on multi core. For example, when you do walking tree, in current GC schema you can copy after 1000 reduces small resulting tree and during those reduction you don't GC anything. Contrary counting GC every change of tree and each GC step can take unpredictable different long time. Current GC occurs once after longer period and depend only on size off current data. In result it is faster on single core. There is more fun with multi core. When processes share data (don't copy message), counter for each structure can be accessed by many processes running possibly on many CPU cores. CPU cache don't work well in this case and operation is slow. So counting GC is slow in concurrent environment. Counting GC is not as good as looks in first sight. When there is possibility to pack big messages into binary which is not copied, you are able solve pitfalls of Erlang implementation. And share anything between processes also impact reliability, of course. I don't believe change GC is good idea except you want just try it.

On Thu, Jul 3, 2008 at 2:16 PM, Alpár Jüttner <alpar@...> wrote:
Hi,

Did anyone consider replacing the garbage collection with reference
counting in the Erlang emulator? How difficult would it be to do?

Probably, it isn't worth changing in general, but reference counting has
some advantages which are important in some special use cases:

     * Per process memory pools is not necessary.
     * Message passing (of large data) will be more efficient as
       messages don't have to be physically copied.
     * There is no need for garbage collection.  GC is for example a
       major pitfall in hard real time systems (see e.g.
       http://www.erlang.se/workshop/2007/proceedings/05nicosi.pdf )

Here are the drawbacks I can see now:

     * Higher memory overhead.
             * However the memory is less fragmented, so we get back
               something in return.
     * It is said to be slower. I'm not absolutely convinced about it
       though.
     * In general, the highest problem with reference counting is that
       in case of cross/circular referencing, dead objects will be left
       in the memory. However, this cannot happen in Erlang due to the
       immutability of the data.

Regards,
Alpar


_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions



--
--Hynek (Pichi) Vychodil
_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Reference counting instead of GC?

by Alpár Jüttner-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, 2008-07-03 at 07:41 -0700, Thomas Lindgren wrote:

> Good luck :-) However, note that the hipe guys have over the years
>  proposed garbage collectors with most of the good properties you
>  mention. (Cf the "shared" and "hybrid" emulators. E.g., "erl
>  -hybrid".)

Could you tell me where can I find some info about these modes? The erl
manual doesn't even list the -shared and the -hybrid switches.

> Also note that simple reference counting does not guarantee real time
>  collection or bounded or even short pauses. Consider the case when you
>  drop a big term: a reference counting implementation must decrement
>  the reference counts of all subterms (recursively), while a copying
>  collector won't even visit the dead data.
>  Because the dead term can
>  have arbitrary size, the pause while it is traversed is not bounded.

Yes but at least it is a deterministic and predictable time you can
calculate with.

>  (But you may be thinking of more sophisticated variants than that.)

Yes indeed, the process of dropping the unused terms can be safely
interrupted.

In fact, my idea was to have a special (low priority) process in the
emulator taking care of the unused terms. In this way the dropping of
big data could be automatically postponed if there are "more urgent"
things to do.

Best regards,
Alpar

_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Reference counting instead of GC?

by Alpár Jüttner-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Thu, 2008-07-03 at 17:47 +0200, Hynek Vychodil wrote:
> Message don't copy for example Termite (http://toute.ca/) and this is
> of course faster for big messages. But there is many good reasons why
> Erlang is as is.
>[...]
>I don't believe change GC is good idea except you want just try it.

I didn't want to suggest replacing GC in the mainstream erlang.
However I have a feeling that the reference counting approach has some
considerable advantages in some _special_ applications, for example
      * when an erlang(-like) system is operated on very low capacity
        hardware and/or
      * for hard real time applications.

Best regards,
Alpar


_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Reference counting instead of GC?

by Thomas Lindgren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message




--- On Thu, 7/3/08, Alpár Jüttner <alpar@...> wrote:
> Could you tell me where can I find some info about these
> modes? The erl
> manual doesn't even list the -shared and the -hybrid
> switches.

The authors are probably the ultimate sources. I don't know of any official documentation.

> In fact, my idea was to have a special (low priority)
> process in the
> emulator taking care of the unused terms. In this way the
> dropping of
> big data could be automatically postponed if there are
> "more urgent"
> things to do.

You will still get a form of priority inversion if the collector can't keep up with the mutator. But best of luck with your project, it sounds like an interesting topic to attack.

Best,
Thomas




     
_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions

Re: Reference counting instead of GC?

by Robert Virding-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Many years ago I did an Erlang implementation based on a reference
counting GC (it *is* gc even if not a mark-sweep or copying). Some
comments based on that work (no specific order):

- It worked very well and was reasonably efficient.
- One interesting property was that it reclaimed data *fast* so the
memory footprint was relatively small.
- Even though each object may be larger as there is only one copy then
memory usage is often less than for a copying collector.
- Freeing large unused structures is no problems, there well-known are
simple techniques for doing that in a non-blocking way.
- My work was done on a uni-processor so I have no experience with
doing it on a multi-core. I have seen that there are papers written on
the subject but I have not studied them.
- To get speed you have to be cunning at all levels so it is difficult
to just take the BEAM and hack in a different GC, especially one which
is so different.
- The biggest problem is probably the sheer work involved in doing
enough of an implementation to test it. As mentioned in previous
comment you have to redo basically all memory allocation everywhere.
- Erlang is suited for reference counting as it has no circular
structures, well it has a few circular references but they are only in
well-known places so no real problems.
- It was a fun memory system. Seeing all the reference counts drop to
0 at the end of a run was very satisfying. :-)
- Shared memory between processes at the *implementation level* is no
problem as the application never sees it, it can only make copies.

I really liked it, one of these days I will do another.

Robert


On 03/07/2008, Alpár Jüttner <alpar@...> wrote:

> On Thu, 2008-07-03 at 07:41 -0700, Thomas Lindgren wrote:
>
>> Good luck :-) However, note that the hipe guys have over the years
>>  proposed garbage collectors with most of the good properties you
>>  mention. (Cf the "shared" and "hybrid" emulators. E.g., "erl
>>  -hybrid".)
>
> Could you tell me where can I find some info about these modes? The erl
> manual doesn't even list the -shared and the -hybrid switches.
>
>> Also note that simple reference counting does not guarantee real time
>>  collection or bounded or even short pauses. Consider the case when you
>>  drop a big term: a reference counting implementation must decrement
>>  the reference counts of all subterms (recursively), while a copying
>>  collector won't even visit the dead data.
>>  Because the dead term can
>>  have arbitrary size, the pause while it is traversed is not bounded.
>
> Yes but at least it is a deterministic and predictable time you can
> calculate with.
>
>>  (But you may be thinking of more sophisticated variants than that.)
>
> Yes indeed, the process of dropping the unused terms can be safely
> interrupted.
>
> In fact, my idea was to have a special (low priority) process in the
> emulator taking care of the unused terms. In this way the dropping of
> big data could be automatically postponed if there are "more urgent"
> things to do.
>
> Best regards,
> Alpar
>
> _______________________________________________
> erlang-questions mailing list
> erlang-questions@...
> http://www.erlang.org/mailman/listinfo/erlang-questions
>
_______________________________________________
erlang-questions mailing list
erlang-questions@...
http://www.erlang.org/mailman/listinfo/erlang-questions
LightInTheBox - Buy quality products at wholesale price