Spooling on = huge initial delay, Spooling off = invalid XML errors. Isn't there a better option for buffering the incoming http response?

by acgourley

Are there any other options built into SVNKit for dealing with large incoming HTTP responses? Users of our product sometimes have very large SVN repositories, so we had to enable spooling. The problem is that we would like to start processing SVN events as soon as possible, and not wait for the incoming response to finish.

Is there a better way to buffer the incoming response without bouncing the whole thing off the disk?

Re: Spooling on = huge initial delay, Spooling off = invalid XML errors. Isn't there a better option for buffering the incoming http response?

by Alexander Kitaev-3

Hello,

 > Is there a better way to buffer the incoming response without bouncing the whole thing off the disk?
Spooling of HTTP responses (i.e. saving the response to disk first and
only then processing it) has always been present in SVNKit and in the
native Subversion client. However, by default it is only enabled for the
"diff" operation (which is also used by "merge").
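To make the idea concrete, here is a minimal sketch in plain Java of what spooling means; it is not SVNKit's actual SpoolFile/HTTPConnection code. The whole response is copied to a temporary file first, and the parser only gets a stream over that file afterwards. The class `Spooler` and its `spool` method are hypothetical names used for illustration.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper illustrating spooling; not SVNKit's SpoolFile class.
final class Spooler {
    private Spooler() {}

    /**
     * Drains the whole response stream into a temporary file in spoolDir,
     * then returns a stream over that file. The caller must close the
     * returned stream; the temporary file is deleted when it is closed.
     */
    static InputStream spool(InputStream response, Path spoolDir) throws IOException {
        Path tmp = Files.createTempFile(spoolDir, "svn-spool", ".tmp");
        try (OutputStream dst = Files.newOutputStream(tmp)) {
            byte[] buf = new byte[8192];
            int n;
            // Read as fast as the network allows; no parsing happens here.
            while ((n = response.read(buf)) != -1) {
                dst.write(buf, 0, n);
            }
        } catch (IOException e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        // Only now does the (possibly slow) consumer see any data.
        return new FilterInputStream(Files.newInputStream(tmp)) {
            @Override
            public void close() throws IOException {
                try {
                    super.close();
                } finally {
                    Files.deleteIfExists(tmp);
                }
            }
        };
    }
}
```

This also shows where the initial delay discussed in this thread comes from: nothing reaches the consumer until the last byte of the response has been written to disk.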

Spooling was introduced because reading an HTTP response slowly may
sometimes result in reading "garbage" data, especially when there are
delays in processing the response. The "merge" operation, for instance,
is considered slow because it sends additional requests to the same
repository while it is still processing the response to the original request:

SENT: HTTP REPORT...
PROCESSING HTTP REPORT response
    SENT HTTP GET
    READ HTTP GET response
    PROCESSING HTTP GET
PROCESSING HTTP REPORT response continued
....
    SENT HTTP GET
    READ HTTP GET response
    PROCESSING HTTP GET
PROCESSING HTTP REPORT response continued
...
and so on.

Update and similar operations are considered "fast", so no spooling is
performed for them by default. However, SVNKit allows you to provide a
custom implementation of the ISVNEditor interface to process the update
response, and if that implementation introduces delays, we recommend
enabling spooling to avoid XML parsing errors (caused by garbage data
being read).

The workaround could be to change the client-side code (the ISVNEditor
implementation) so that it works relatively fast, and to perform custom
processing of the received data in a separate parallel thread or after
the update operation has completed.
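A minimal sketch of that workaround, assuming the slow part of the editor's work can be reduced to "handle this event later": the callback only enqueues the event and returns immediately, while a worker thread does the slow processing. `OffloadingSink` is a hypothetical helper, not part of SVNKit; in a real ISVNEditor implementation you would call `onEvent` from the editor methods and `close()` after the update operation completes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Consumer;

/**
 * Hypothetical sketch (not an SVNKit class): callbacks invoked while the
 * HTTP response is being parsed only enqueue an event, and a separate
 * worker thread performs the slow processing.
 */
final class OffloadingSink<E> implements AutoCloseable {
    private static final Object POISON = new Object();
    // Unbounded, so the parsing thread never waits for the worker.
    private final BlockingQueue<Object> queue = new LinkedBlockingQueue<>();
    private final Thread worker;

    OffloadingSink(Consumer<E> slowProcessor) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    Object item = queue.take();
                    if (item == POISON) {
                        return; // end of response reached, all queued events handled
                    }
                    @SuppressWarnings("unchecked")
                    E event = (E) item;
                    slowProcessor.accept(event);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "svn-event-worker");
        worker.start();
    }

    /** Called from the parsing thread; returns immediately. */
    void onEvent(E event) {
        queue.add(event);
    }

    /** Signals the end of the response and waits for queued work to finish. */
    @Override
    public void close() throws InterruptedException {
        queue.add(POISON);
        worker.join();
    }
}
```

The trade-off of the unbounded queue is memory: if the worker falls far behind on a very large checkout, queued events accumulate in the heap instead of on disk.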

Alexander Kitaev,
TMate Software,
http://svnkit.com/ - Java [Sub]Versioning Library!

---------------------------------------------------------------------
To unsubscribe, e-mail: svnkit-users-unsubscribe@...
For additional commands, e-mail: svnkit-users-help@...


Re: Spooling on = huge initial delay, Spooling off = invalid XML errors. Isn't there a better option for buffering the incoming http response?

by acgourley

Thanks for the reply, Alexander.

I don't think the right answer is to just speed up the ISVNEditor implementation, because it's very hard to test that you got it "fast enough" for all cases, so there is always the potential for invalid XML.

It seems like the real answer is to read from the spool file while the HTTP response is still being pulled down.

In this code:

if (myIsSpoolResponse || myIsSpoolAll) {
    // Spooling: the entire response is drained to disk before parsing starts.
    OutputStream dst = null;
    try {
        tmpFile = new SpoolFile(mySpoolDirectory);
        dst = tmpFile.openForWriting();
        dst = new SVNCancellableOutputStream(dst, myRepository.getCanceller());
        // this will exhaust the http stream anyway.
        err = readData(request, dst);
        SVNFileUtil.closeFile(dst);
        dst = null;
        if (err != null) {
            return err;
        }
        // this stream always has to be closed.
        is = tmpFile.openForReading();
    } finally {
        SVNFileUtil.closeFile(dst);
    }
} else {
    is = createInputStream(request.getResponseHeader(), getInputStream());
}

Perhaps it would be best to return an input stream over tmpFile while it is still being written to?

I was mostly asking whether there was an official way around the problem I was experiencing (either invalid XML errors with spooling off, or unacceptable startup delays with spooling on for large repository checkouts).

If there isn't, I'll just accept that I need to modify SVNKit myself for our needs (hopefully just implementing an interface and injecting it from our code will be enough).

-Alex

Re: Spooling on = huge initial delay, Spooling off = invalid XML errors. Isn't there a better option for buffering the incoming http response?

by Alexander Kitaev-3

Hello Alex,

 > I don't think the right answer is to just speed up the ISVNEditor
 > implementation, because it's very hard to test that you got it "fast enough"
 > for all cases, so there is always the potential of having invalid XML.
OK, let it be one of the answers, maybe not the "right" one :)

 > It seems like the real answer is to read from the spooled http request while
 > it is being pulled down.
This solution came to my mind as well. The problems are that a) I
haven't tested whether it is possible to read from a file we're
currently writing to; it will probably require an intermediate stream
that knows precisely how many bytes have already been written to the
spool file, and b) reading and writing must be performed in different
threads (reading may need to block writing while it copies data into
its buffer, but it must not block writing to the file for long;
otherwise we would be in exactly the same situation we're trying to
avoid).
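A rough sketch of the design described above, assuming the spool file is on a local filesystem where bytes written through one handle are immediately readable through another. A writer thread drains the response into the file and publishes a running byte count (point a), and the reader only ever reads up to that count, waiting when it catches up (point b). `SpoolProgress`, `GrowingSpoolInputStream` and `SpoolWriter` are hypothetical names; this is not code from SVNKit.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Shared state: how many bytes the writer thread has committed to the spool file. */
final class SpoolProgress {
    private long written;
    private boolean done;
    private IOException failure;

    synchronized void advance(long n) { written += n; notifyAll(); }

    synchronized void finish(IOException e) { done = true; failure = e; notifyAll(); }

    /** Blocks until more than pos bytes exist or writing has ended; returns bytes available, or -1 at EOF. */
    synchronized long awaitBeyond(long pos) throws IOException {
        while (written <= pos && !done) {
            try {
                wait(); // releases the lock, so the writer is not held up
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new InterruptedIOException("interrupted while waiting for spooled data");
            }
        }
        if (written > pos) return written - pos;
        if (failure != null) throw failure;
        return -1;
    }
}

/** Reads the spool file while SpoolWriter is still appending to it. */
final class GrowingSpoolInputStream extends InputStream {
    private final InputStream file;
    private final SpoolProgress progress;
    private long pos;

    GrowingSpoolInputStream(Path spool, SpoolProgress progress) throws IOException {
        this.file = Files.newInputStream(spool);
        this.progress = progress;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n == -1 ? -1 : one[0] & 0xff;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) return 0;
        long avail = progress.awaitBeyond(pos); // blocks only the reader, never the writer
        if (avail < 0) return -1;
        int n = file.read(b, off, (int) Math.min(len, avail));
        if (n < 0) throw new IOException("spool file shorter than reported");
        pos += n;
        return n;
    }

    @Override
    public void close() throws IOException { file.close(); }
}

/** Drains the HTTP response into the spool file on its own thread, as fast as the network allows. */
final class SpoolWriter {
    static Thread start(InputStream response, Path spool, SpoolProgress progress) {
        Thread t = new Thread(() -> {
            IOException failure = null;
            try (OutputStream out = Files.newOutputStream(spool)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = response.read(buf)) != -1) {
                    out.write(buf, 0, n); // unbuffered: bytes reach the OS before being published
                    progress.advance(n);
                }
            } catch (IOException e) {
                failure = e;
            }
            progress.finish(failure);
        }, "svn-spool-writer");
        t.start();
        return t;
    }
}
```

The reader holds the `SpoolProgress` lock only while checking the counter and releases it while waiting, so the writer keeps draining the socket at full speed even when the XML parser is slow. The spool file must exist before the reader opens it (for example, created with `Files.createTempFile`).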

I'd rather not introduce that change into the 1.1.x branch, but we'll
consider implementing it in 1.2.0 or 1.2.1. Meanwhile, you may of course
use a patched version, and I'd appreciate it if you sent us the patch so
that everyone can benefit from it.

Alexander Kitaev,
TMate Software,
http://svnkit.com/ - Java [Sub]Versioning Library!
