Copy uint32 to Memory optimization?!


Copy uint32 to Memory optimization?!

by Moritz Struebe


Hi there,

I just came across the following code. I'm optimizing with -Os.

static INLINE void setBeef(HEAPNODE * node){
    *((u_long *) ((uptr_t) node + node->hn_size - sizeof(0xDEADBEEF))) = 0xDEADBEEF;
 3c8:    8f ee           ldi    r24, 0xEF    ; 239
 3ca:    9e eb           ldi    r25, 0xBE    ; 190
 3cc:    ad ea           ldi    r26, 0xAD    ; 173
 3ce:    be ed           ldi    r27, 0xDE    ; 222
 3d0:    b2 93           st    -Z, r27
 3d2:    a2 93           st    -Z, r26
 3d4:    92 93           st    -Z, r25
 3d6:    82 93           st    -Z, r24

The Z register is already filled, so that's fine. But why are 4
additional registers needed? Doesn't the following code do the same
with a single register, especially as registers are quite valuable?


ldi    r24, 0xDE    ; 222
st    -Z, r24
ldi    r24, 0xAD    ; 173
st    -Z, r24
ldi    r24, 0xBE    ; 190
st    -Z, r24
ldi    r24, 0xEF    ; 239
st    -Z, r24
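
To compare the two sequences, here is a rough C model of them. It is only a sketch: st_predec() is a made-up stand-in for "st -Z, Rr" (pre-decrement the pointer, then store one byte), and the other function names are invented for illustration as well.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of AVR "st -Z, Rr": pre-decrement Z, then store a byte. */
static void st_predec(uint8_t **z, uint8_t value)
{
    *z -= 1;
    **z = value;
}

/* Sequence emitted by the compiler: load four registers, then store them. */
static void store_four_regs(uint8_t *end)
{
    uint8_t r24 = 0xEF, r25 = 0xBE, r26 = 0xAD, r27 = 0xDE;
    uint8_t *z = end;

    st_predec(&z, r27);
    st_predec(&z, r26);
    st_predec(&z, r25);
    st_predec(&z, r24);
}

/* Proposed sequence: reload one register before each store. */
static void store_one_reg(uint8_t *end)
{
    uint8_t *z = end;
    uint8_t r24;

    r24 = 0xDE; st_predec(&z, r24);
    r24 = 0xAD; st_predec(&z, r24);
    r24 = 0xBE; st_predec(&z, r24);
    r24 = 0xEF; st_predec(&z, r24);
}

/* Returns 1 if both sequences leave the same four bytes in memory. */
int sequences_match(void)
{
    uint8_t a[4] = {0};
    uint8_t b[4] = {0};

    store_four_regs(a + 4);   /* Z starts one past the last byte */
    store_one_reg(b + 4);
    return memcmp(a, b, sizeof a) == 0;
}
```

In this model both versions write 0xDE to the highest address and 0xEF to the lowest, so the memory contents are identical; the only difference is how many registers are live at once.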

I'm not very familiar with either assembler or compilers. Therefore an
answer like: "Yes, you're right, but adding such an optimization is
difficult / nobody was motivated to implement it." will satisfy me. ;-)

Cheers
Morty




_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@...
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list

Re: Copy uint32 to Memory optimization?!

by Andy H-2


This happens because the AVR backend of gcc defines RTL for wide mode
instructions that the hardware does not have. So what you have here is a
literal translation from a single RTL instruction into 8 assembler
instructions. The compiler sees only 1 instruction.

And yes, you are correct: this produces sub-optimal code.

The solution, yet to be implemented, is to split wide mode instructions
into byte (or word) sized pieces. Then RTL will more closely match
assembler. This would remove the need for additional registers and also
allow further optimisation.
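
As an illustration only, here is the idea of splitting expressed in C rather than RTL. The function names are invented for this sketch and are not part of gcc; it just shows one 32-bit store turned into four independent byte-sized pieces.

```c
#include <stdint.h>

/* Hypothetical helper: break a 32-bit value into four byte-sized pieces,
   least significant byte first. */
void split_u32(uint32_t value, uint8_t piece[4])
{
    for (int i = 0; i < 4; i++)
        piece[i] = (uint8_t)(value >> (8 * i));
}

/* Store the pieces one at a time, highest address first, in the same
   order as the "st -Z" sequence. Each store needs only one byte at a time. */
void store_u32_bytewise(uint8_t *dst, uint32_t value)
{
    uint8_t piece[4];

    split_u32(value, piece);
    for (int i = 3; i >= 0; i--)
        dst[i] = piece[i];
}
```

Once the store is expressed as four separate byte operations, each one can be handled on its own instead of as one opaque 32-bit move.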

It is a little bit tricky, as not all instructions can be split. (Those
involving carry are a problem.) Splitting instructions too early will
create a mixture that can prevent certain optimisations.
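
To show why carry makes splitting harder, here is a small C sketch of a 32-bit addition done in byte-sized pieces. The function name is made up for illustration. Unlike a plain store, each piece depends on the carry out of the previous one, so the pieces cannot be treated independently or reordered.

```c
#include <stdint.h>

/* Hypothetical sketch: add two 32-bit values one byte at a time,
   passing the carry from each byte to the next (like add / adc). */
uint32_t add32_bytewise(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry = 0;

    for (int i = 0; i < 4; i++) {
        unsigned sum = ((a >> (8 * i)) & 0xFFu)
                     + ((b >> (8 * i)) & 0xFFu)
                     + carry;

        result |= (uint32_t)(sum & 0xFFu) << (8 * i);
        carry = sum >> 8;   /* needed by the next, more significant byte */
    }
    return result;
}
```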


Andy



