Data needed on long-running suites


by kentb

All,

David and I are working on ways to shorten the validation phase of the inner
programming loop (feature -> test -> code -> validate). One of the ideas is
to find a general way to run suites faster by running tests in parallel. To
find effective parallelization strategies, we need data on test run times.
Rather than build big infrastructure to do this (which would undoubtedly be
cool), I'd like to start with the simplest thing that could possibly work.
So:

If you have a long-running test suite and
You run it using Ant and
You can use the XML formatter (<formatter type="xml"/>) and
You don't mind sharing your test names with me confidentially

Would you please zip your reports and email them to me? If they're too big
for email, please let me know and we'll figure out a backup plan. I'd
appreciate any context you can provide--how long the suite has been in
development, the experience level of the developers, whatever else you think
we might need to know.

The first data set I looked at was from DevCreek: more than 90M test runs from
production coding representing more than 50 person-years of development. To
my surprise, the test runs exhibit a power law distribution (way lots of
fast tests, a few very long running tests, plot a histogram log-log and you
get a straight line). I have no idea what this means, but it brings to mind
the Asimov quote, "The most exciting phrase to hear in science, the one that
heralds new discoveries, is not 'Eureka!' (I found it!) but 'That's funny
...'" I've found many power law distributions in the static and dynamic
structure of code, but the mechanisms influencing test run times seem to be
completely different than those influencing code structure.
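
For anyone who wants to try the log-log check on their own run times, here is a minimal sketch in Python. It is not the analysis used on the DevCreek data; the function names and the choice of logarithmic bins are assumptions made for illustration. It bins the times, divides each count by its bin width, and fits a least-squares line to the log-log points. A roughly straight line with a negative slope is suggestive of a power law, though it does not prove one.

```python
import math
from collections import Counter


def log_binned_histogram(times, bins_per_decade=2):
    """Count run times in logarithmically spaced bins.

    Returns (bin_center_seconds, density) pairs, where density is the
    count divided by the bin width so bins of different widths compare.
    """
    counts = Counter()
    for t in times:
        if t > 0:  # log is undefined for zero, so skip such times
            counts[math.floor(math.log10(t) * bins_per_decade)] += 1
    points = []
    for k in sorted(counts):
        lo = 10 ** (k / bins_per_decade)
        hi = 10 ** ((k + 1) / bins_per_decade)
        # Use the geometric mean of the bin edges as the bin center.
        points.append((math.sqrt(lo * hi), counts[k] / (hi - lo)))
    return points


def loglog_slope(points):
    """Least-squares slope of log10(density) against log10(center)."""
    xs = [math.log10(x) for x, _ in points]
    ys = [math.log10(y) for _, y in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den
```

Feeding the output of log_binned_histogram(times) to loglog_slope gives a single number to compare across suites; at least two non-empty bins are needed for the fit.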

Anyway, I'd love to validate those findings. The Ant XML format seems like a
good place to start. Alternatively, you could send me one or more files with
test run times one per line.
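
If the one-time-per-line route is easier, a sketch like the following could produce it from Ant's XML reports. It assumes the reports use the usual <testcase ... time="..."/> elements with times in seconds and the default TEST-*.xml file naming; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def times_from_root(root):
    """Return the time in seconds of every <testcase> under root."""
    times = []
    # iter() walks the whole tree, so nested suites are covered too.
    for case in root.iter("testcase"):
        raw = case.get("time")
        if raw is None:
            continue  # no timing recorded for this test
        times.append(float(raw))
    return times


def collect(report_dir):
    """Gather times from every TEST-*.xml file under report_dir."""
    times = []
    for path in sorted(Path(report_dir).rglob("TEST-*.xml")):
        times.extend(times_from_root(ET.parse(path).getroot()))
    return times


def write_times(report_dir, out_path):
    """Write one run time per line, with no test names, to out_path."""
    with open(out_path, "w", encoding="utf-8") as out:
        for t in collect(report_dir):
            out.write(f"{t}\n")
```

Calling write_times("build/reports", "times.txt") (both paths are placeholders) yields a file with no test names in it, which also sidesteps the confidentiality question.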

Questions and comments appreciated.

Yours in science,

Kent Beck
Three Rivers Institute


Re: Data needed on long-running suites

by David Saff

Kent,

Did you ever get any results from this?  I'm betting I'll have to
write an anonymization tool to bundle up whatever Google data we can
share--would it help if I offered it to the list?

   David
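
An anonymization tool of the kind mentioned above might look roughly like this. It is only a sketch, not the tool David had in mind: it assumes Ant-style reports, replaces identifying attributes with a keyed hash (HMAC-SHA256, so names cannot be recovered by hashing guesses without the secret), and empties elements that tend to leak names or environment details. The attribute and element lists are assumptions and may need extending for a given report.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

# Attributes assumed to carry identifying names in Ant-style reports.
NAME_ATTRS = ("name", "classname", "package", "hostname")
# Elements assumed to carry free text such as stack traces or properties.
FREE_TEXT_TAGS = ("properties", "system-out", "system-err", "failure", "error")


def anonymize_report(xml_text, secret):
    """Replace test and class names with keyed hashes; keep the timings."""
    root = ET.fromstring(xml_text)

    def mask(value):
        digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()[:16]

    # Snapshot the elements first because clear() changes the tree.
    for elem in list(root.iter()):
        if elem.tag in ("testsuite", "testcase"):
            for attr in NAME_ATTRS:
                if attr in elem.attrib:
                    elem.set(attr, mask(elem.get(attr)))
        elif elem.tag in FREE_TEXT_TAGS:
            # Drops attributes, text and children but keeps the tag,
            # so a failed test is still visible as failed.
            elem.clear()
    return ET.tostring(root, encoding="unicode")
```

Because the same name always maps to the same hash under one secret, repeated runs of a test can still be matched up across reports.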

On Mon, Sep 8, 2008 at 7:53 PM, kentb <kentb@...> wrote:


RE: Data needed on long-running suites

by kentb

I got one submission: data from 2000 Python tests. Interestingly, they
didn't show any clear trend in runtimes. I have also analyzed the data from
Gump, and found a clear power-law-ish distribution of runtimes. I'm still
open for more data, or suggestions of how to make the submission process
simpler.
 
Cheers,
 
Kent Beck
Three Rivers Institute

  _____  

From: junit@... [mailto:junit@...] On Behalf Of David Saff
Sent: Monday, September 15, 2008 9:50 AM
To: junit@...
Subject: Re: [junit] Data needed on long-running suites





 

