I've spent the last three weekends making a serious attempt to get to
the bottom of the problem of quadratic performance in gexslt that has
been present ever since I first successfully ran an identity
transformation back in 2004.
I used the profiler in EiffelStudio 6.1 (a run that normally last 71
minutes when compiled with gec takes about 14 hours when compiled with
ISE 6.1 with the profiler on), and spent lots of time
following false leads (the report gives exact numbers for routine
calls, which is very useful, and nonsense for time spent e.g. 873.49%
for one routine, which is definitely not useful).
By this morning I had concluded there was nothing in my code that
could explain the results seen - or at least, if there was, I wasn't
going to be able to find it using the EiffelStudio profiler.
So I edited $GOBO/tool/gec/config/c/gcc.cfg to write out profiling
information for gprof (Eric, when can we have the setting of cflags
and lflags in system.xace?). The resulting runtime was 118 minutes - a
much more acceptable overhead. And the resulting report from gprof is
highly illuminating. here follows the first few lines:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls s/call s/call name
36.58 2943.00 2943.00 GC_mark_from
16.32 4256.06 1313.06 GC_header_cache_miss
13.18 5316.64 1060.58 GC_mark_local
10.57 6166.79 850.15 GC_steal_mark_stack
4.32 6514.36 347.57 GC_add_to_black_list_stack
1.55 6639.04 124.68 GC_push_marked
1.26 6740.27 101.23 GC_find_header
1.22 6838.10 97.83 GC_reclaim_clear
1.10 6926.51 88.41 GC_do_local_mark
0.82 6992.39 65.88 GC_generic_malloc_many
0.64 7044.16 51.77 GC_block_was_dirty
0.63 7095.09 50.93 GC_install_header
0.58 7141.48 46.39 GC_apply_to_all_blocks
0.47 7179.41 37.93 GC_build_fl
0.44 7214.44 35.03 530952040 0.00 0.00 T122f10
0.40 7246.41 31.97 GC_allochblk_nth
0.29 7269.71 23.30 GC_enclosing_mapping
0.26 7290.70 20.99 262713928 0.00 0.00 T506f27
0.25 7311.07 20.37 GC_malloc
0.25 7331.32 20.25 127354522 0.00 0.00 T162x16589
0.22 7348.82 17.50 GC_free_block_ending_at
0.22 7366.13 17.31 GC_malloc_atomic
0.20 7382.11 15.99 3529690218 0.00 0.00 T240f8
0.19 7397.55 15.44 GC_reclaim_block
0.18 7412.41 14.86 GC_reclaim_uninit
0.18 7427.16 14.75 529087403 0.00 0.00 T57f9
0.18 7441.84 14.68 529541505 0.00 0.00 T15f4
0.13 7452.31 10.47 2030496041 0.00 0.00 T240c7
0.11 7461.08 8.77 105048154 0.00 0.00 T27f8
0.10 7469.40 8.32 2049701317 0.00 0.00 GE_new239
0.10 7477.42 8.02 7679890135 0.00 0.00 GE_check_null
0.10 7485.32 7.90 172970992 0.00 0.00 T772f27p1
0.10 7493.20 7.88 2049636054 0.00 0.00 GE_new240
in short, almost all the run time is for memory management (this did
not surprise me as typically I see the program using 125-290% CPU - I
have a quad-core processor, so fully utilized = 400% - and I
configured the boehm-gc for parallel marking).
Now my problem is what to do about it (incidentally T122f10 is
{DS_ARRAYED_LIST}.item - nothing very surprising about that either, I
guess.). Does anyone have any suggested approaches?
--
Colin Adams
Preston Lancashire
-------------------------------------------------------------------------
This SF.net email is sponsored by: Microsoft
Defy all challenges. Microsoft(R) Visual Studio 2008.
http://clk.atdmt.com/MRT/go/vse0120000070mrt/direct/01/_______________________________________________
gobo-eiffel-develop mailing list
gobo-eiffel-develop@...
https://lists.sourceforge.net/lists/listinfo/gobo-eiffel-develop