Copy uint32 to Memory optimization?!

Hi there,

I just came across the following code. I'm optimizing with -Os.

    static INLINE void setBeef(HEAPNODE * node){
        *((u_long *) ((uptr_t) node + node->hn_size - sizeof(0xDEADBEEF))) = 0xDEADBEEF;
     3c8:   8f ee   ldi r24, 0xEF   ; 239
     3ca:   9e eb   ldi r25, 0xBE   ; 190
     3cc:   ad ea   ldi r26, 0xAD   ; 173
     3ce:   be ed   ldi r27, 0xDE   ; 222
     3d0:   b2 93   st  -Z, r27
     3d2:   a2 93   st  -Z, r26
     3d4:   92 93   st  -Z, r25
     3d6:   82 93   st  -Z, r24

The Z register is already filled, so that's fine. But why are 4 additional registers needed? Wouldn't the following code do the same, especially as registers are quite valuable?

    ldi r24, 0xDE   ; 222
    st  -Z, r24
    ldi r24, 0xAD   ; 173
    st  -Z, r24
    ldi r24, 0xBE   ; 190
    st  -Z, r24
    ldi r24, 0xEF   ; 239
    st  -Z, r24

I'm not too familiar with either assembler or compilers. Therefore an answer like: "Yes, you're right, but adding such an optimization is difficult / nobody was motivated to implement it." will satisfy me. ;-)

Cheers
Morty

_______________________________________________
AVR-GCC-list mailing list
AVR-GCC-list@...
http://lists.nongnu.org/mailman/listinfo/avr-gcc-list

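To check the claim that both instruction sequences leave the same bytes in memory, here is a minimal C sketch that models `st -Z, Rr` as a pre-decrement byte store. The helper names (`st_predec`, `set_beef_four_regs`, `set_beef_one_reg`, `sequences_match`) are hypothetical, and the sketch assumes Z points one byte past the 4-byte slot before the first store. It only models the memory effect, not register allocation or timing.

```c
#include <stdint.h>
#include <string.h>

/* Model of the AVR instruction "st -Z, Rr": pre-decrement the pointer,
   then store one byte. Returns the updated pointer (assumed semantics
   for this sketch). */
static uint8_t *st_predec(uint8_t *z, uint8_t r)
{
    *--z = r;
    return z;
}

/* Sequence emitted by the compiler: four registers loaded up front,
   followed by four stores. */
static void set_beef_four_regs(uint8_t *end)
{
    uint8_t r24 = 0xEF, r25 = 0xBE, r26 = 0xAD, r27 = 0xDE;
    uint8_t *z = end;
    z = st_predec(z, r27);
    z = st_predec(z, r26);
    z = st_predec(z, r25);
    z = st_predec(z, r24);
    (void)z;
}

/* Sequence proposed in the post: one scratch register, reloaded
   before each store. */
static void set_beef_one_reg(uint8_t *end)
{
    uint8_t r24;
    uint8_t *z = end;
    r24 = 0xDE; z = st_predec(z, r24);
    r24 = 0xAD; z = st_predec(z, r24);
    r24 = 0xBE; z = st_predec(z, r24);
    r24 = 0xEF; z = st_predec(z, r24);
    (void)z;
}

/* Returns 1 if both sequences write identical bytes, in the
   little-endian layout of 0xDEADBEEF (EF BE AD DE). */
int sequences_match(void)
{
    uint8_t a[4] = {0}, b[4] = {0};
    set_beef_four_regs(a + 4);
    set_beef_one_reg(b + 4);
    return memcmp(a, b, sizeof a) == 0 && a[0] == 0xEF && a[3] == 0xDE;
}
```

Both variants use the same number of instructions; the proposed one differs only in needing a single scratch register instead of four.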

Re: Copy uint32 to Memory optimization?!

This happens because the AVR backend of gcc defines RTL for wide-mode instructions that the AVR itself does not have. So what you have here is a literal translation of a single RTL instruction into 8 assembler instructions. The compiler sees 1 instruction.

And yes, you are correct: this produces sub-optimal code.

The solution, yet to be implemented, is to split wide-mode instructions into byte (or word) sized pieces. Then the RTL will more closely match the assembler. This would remove the need for additional registers and also allow further optimisation.

It is a little bit tricky, as not all instructions can be split (those involving carry are a problem). Splitting instructions too early will create a mixture that can prevent certain optimisations.

Andy

Moritz Struebe wrote:
> Hi there,
>
> I just came across the following code. I'm optimizing with -Os.
>
>     static INLINE void setBeef(HEAPNODE * node){
>         *((u_long *) ((uptr_t) node + node->hn_size - sizeof(0xDEADBEEF))) = 0xDEADBEEF;
>      3c8:   8f ee   ldi r24, 0xEF   ; 239
>      3ca:   9e eb   ldi r25, 0xBE   ; 190
>      3cc:   ad ea   ldi r26, 0xAD   ; 173
>      3ce:   be ed   ldi r27, 0xDE   ; 222
>      3d0:   b2 93   st  -Z, r27
>      3d2:   a2 93   st  -Z, r26
>      3d4:   92 93   st  -Z, r25
>      3d6:   82 93   st  -Z, r24
>
> The Z register is already filled, so that's fine. But why are 4
> additional registers needed? Wouldn't the following code do the same,
> especially as registers are quite valuable?
>
>     ldi r24, 0xDE   ; 222
>     st  -Z, r24
>     ldi r24, 0xAD   ; 173
>     st  -Z, r24
>     ldi r24, 0xBE   ; 190
>     st  -Z, r24
>     ldi r24, 0xEF   ; 239
>     st  -Z, r24
>
> I'm not too familiar with either assembler or compilers. Therefore an
> answer like: "Yes, you're right, but adding such an optimization is
> difficult / nobody was motivated to implement it." will satisfy me. ;-)
>
> Cheers
> Morty
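Andy's remark that stores split easily while carry-based instructions do not can be illustrated in portable C. This is only a sketch of the idea, not how the gcc AVR backend represents or splits RTL; the function names `store32_split` and `add32_split` are hypothetical. A 32-bit store decomposes into four byte stores that are independent of one another, whereas a 32-bit addition decomposes into byte additions where each step consumes the carry of the previous one.

```c
#include <stdint.h>

/* A 32-bit store split into four byte stores (little-endian layout).
   No byte depends on any other, so the pieces can be scheduled and
   allocated to registers independently. */
void store32_split(uint8_t *dst, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (uint8_t)(v >> (8 * i));
}

/* A 32-bit addition split into four byte additions. Each step needs
   the carry out of the previous step (add, then add-with-carry three
   times on an 8-bit machine), so the pieces form a dependency chain. */
uint32_t add32_split(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    unsigned carry = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sum = (unsigned)((a >> (8 * i)) & 0xFFu)
                     + (unsigned)((b >> (8 * i)) & 0xFFu)
                     + carry;
        result |= (uint32_t)(sum & 0xFFu) << (8 * i);
        carry = sum >> 8;   /* carry into the next, more significant byte */
    }
    return result;
}
```

In the addition, reordering or separating the byte steps would lose the carry, which is one way to read why such instructions are harder to split than a plain store.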