Are log messages Unicode?

In pysvn I have assumed that log messages are in UTF-8 and decode
them to unicode.

A user has reported that their logs are in latin-1 and fail to decode
as UTF-8.

Did I make a bad assumption?

Barry

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@...
For additional commands, e-mail: dev-help@...
|
|
Re: Are log messages Unicode?

The repository stores all log messages as UTF8, and they travel that
way to the clients. The clients are responsible for converting UTF8 to
the native locale. Thus, if a user has latin-1 set as a native locale,
that's what 'svn log' will show him. For more info, see:

http://svnbook.red-bean.com/nightly/en/svn.advanced.l10n.html

On Thu, Jul 3, 2008 at 4:57 PM, Barry Scott <barry@...> wrote:
> In pysvn I have assumed that log messages are in UTF-8 and decode them to
> unicode.
>
> A user has reported that their logs are in latin-1 and fail to decode as
> UTF-8.
>
> Did I make a bad assumption?
>
> Barry
|
|
Re: Are log messages Unicode?

On Jul 3, 2008, at 23:12, Ben Collins-Sussman wrote:
> The repository stores all log messages as UTF8, and they travel that
> way to the clients. The clients are responsible for converting UTF8
> to the native locale. Thus, if a user has latin-1 set as a native
> locale, that's what 'svn log' will show him. For more info, see:
>
> http://svnbook.red-bean.com/nightly/en/svn.advanced.l10n.html

Using the svn_client API, is it possible for a client to write
non-UTF-8 log messages? Clearly, if this happened, it would be a bug in
the client, given the above statement.

Barry
|
|
Re: Are log messages Unicode?

On Sun, Jul 6, 2008 at 5:23 AM, Barry Scott <barry@...> wrote:
> Using the svn_client API is it possible for a client to write none-UTF-8 log
> messages?
> Clearly if this happened it would be a bug in the client given the above
> statement.

I don't recall the details, but it's actually the *programmers'* burden
to convert paths and log messages from native locale to UTF8 (and back
again). If you read the svn APIs, you'll notice that every path and log
message passed into APIs (or passed around between APIs) is presumed to
*already* be UTF8. So if you're writing your own client, it's your job
to convert user input to UTF8 before passing it to svn_client_*(). Look
at the commandline client to see how it's doing that; I believe there
are a number of convenience routines in libsvn_subr to help with
conversion.
|
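The conversion burden described in the message above can be sketched in Python, the language pysvn is written for. This is only an illustration under stated assumptions: `to_utf8` and `from_utf8` are hypothetical helper names, not part of pysvn or of the Subversion API, and the native encoding is taken from the process locale unless given explicitly.

```python
import locale
from typing import Optional


def to_utf8(native_bytes: bytes, native_encoding: Optional[str] = None) -> bytes:
    """Re-encode user input from the native encoding as UTF-8 (hypothetical helper)."""
    encoding = native_encoding or locale.getpreferredencoding(False)
    # Strict decoding: fail loudly rather than pass on bytes we cannot interpret.
    text = native_bytes.decode(encoding)
    return text.encode("utf-8")


def from_utf8(utf8_bytes: bytes, native_encoding: Optional[str] = None) -> bytes:
    """Convert a UTF-8 log message back to the native encoding for display."""
    encoding = native_encoding or locale.getpreferredencoding(False)
    # errors="replace" substitutes "?" for characters the locale cannot represent.
    return utf8_bytes.decode("utf-8").encode(encoding, errors="replace")


# The first line of the log message discussed in this thread, as latin-1 bytes.
latin1_message = b"Bitbucket r\xe9serv\xe9 \xe0 dev/null\n"
utf8_message = to_utf8(latin1_message, "latin-1")
```

A client that skips the `to_utf8` step and hands the latin-1 bytes straight to the library is the situation the thread is worried about.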
|
Re: Are log messages Unicode?

"Ben Collins-Sussman" <sussman@...> writes:
> On Sun, Jul 6, 2008 at 5:23 AM, Barry Scott <barry@...> wrote:
>> Using the svn_client API is it possible for a client to write
>> none-UTF-8 log messages?
>> Clearly if this happened it would be a bug in the client given the
>> above statement.
>
> I don't recall the details, but it's actually the *programmers'*
> burden to convert paths and log messages from native locale to UTF8
> (and back again).
> [...]

I think Barry's asking if the client and/or server do any validation.
That is, if the programmer supplies a non-UTF8 log message, our client
libraries should reject it; and if such a log message were to reach the
repository (perhaps because someone wrote their own client software from
scratch), the repository should reject it too.

I don't know whether we do such validation or not, but agree we should.

Barry, got time to test/trace it?
|
|
Re: Are log messages Unicode?

On Jul 7, 2008, at 17:15, Karl Fogel wrote:
> I think Barry's asking if the client and/or server do any validation.
> [...]
> I don't know whether we do such validation or not, but agree we
> should.
>
> Barry, got time to test/trace it?

Karl's correct: I'm not asking for programming help, I'm trying to
understand a user-reported problem using pysvn.

If the SVN API does not check that the strings are UTF-8 then I can
close the bug as not pysvn's problem. Another client must have messed
up the user's repos.

The user has not given me their repos to test against.

Barry
|
|
Re: Are log messages Unicode?

On Jul 7, 2008, at 17:15, Karl Fogel wrote:
> I think Barry's asking if the client and/or server do any validation.
> [...]
> Barry, got time to test/trace it?

I have the dump of the repos that causes pysvn to fail. In the
attachment is the fragment of the dump file for r219 that causes the
problems. If you need the whole 3MB of the full dump I'll have to ask
permission to pass it on to you.

Python cannot decode the svn:log as utf-8.

$ python2.5 extract_log_text.py
'Bitbucket r\xe9serv\xe9 \xe0 dev/null\nClassement dans Mail/spam seulement apr\xe8s le localstart qui lance spamc\n'
'\xe9s'
Traceback (most recent call last):
  File "extract_log_text.py", line 12, in <module>
    print log.decode( 'utf-8' )
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 11-13: invalid data

Is this proof that the repos has non-UTF-8 log text?

svn 1.4.6 is happy to show the log:

$ svn log -r219 file:///Users/barry/tmp/repos/trunk/dotfiles
------------------------------------------------------------------------
r219 | bortzmeyer | 2003-01-17 14:04:31 +0000 (Fri, 17 Jan 2003) | 3 lines

Bitbucket r?\233serv?\233 ?\224 dev/null
Classement dans Mail/spam seulement apr?\232s le localstart qui lance spamc

------------------------------------------------------------------------

But the \233 are supposed to be é, I understand.

Barry
|
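The failure reported above can be reproduced from the bytes shown in the message. The following is a minimal sketch in Python 3, not the `extract_log_text.py` script from the thread (which was not posted); `decode_log` is a hypothetical helper, and the latin-1 fallback is an assumption that happens to fit this repository.

```python
from typing import Tuple


def decode_log(raw: bytes, fallback: str = "latin-1") -> Tuple[str, str]:
    """Decode a log message as UTF-8, falling back to a guessed legacy encoding.

    Returns the decoded text and the name of the encoding that was used.
    """
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # latin-1 maps every byte to a character, so this never raises;
        # that also means it cannot tell us whether the guess was right.
        return raw.decode(fallback), fallback


# svn:log value of r219 as printed in the message above.
raw_log = (
    b"Bitbucket r\xe9serv\xe9 \xe0 dev/null\n"
    b"Classement dans Mail/spam seulement apr\xe8s le localstart qui lance spamc\n"
)

text, used = decode_log(raw_log)
```

Here `used` comes back as the fallback encoding, which is consistent with the log text not being UTF-8. A binding could use this kind of fallback to stay usable on damaged repositories, at the cost of guessing.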
|
Re: Are log messages Unicode?

On Jul 12, 2008, at 15:54, Barry Scott wrote:
> I have the dump of the repos that causes pysvn to fail. In the
> attachment is the fragment of the dump file for r219 that causes
> the problems.
> [...]

Oops, forgot the attachment.

Barry
|
|
Re: Are log messages Unicode?

Barry Scott wrote on Sun, 13 Jul 2008 at 00:01 +0100:
> On Jul 12, 2008, at 15:54, Barry Scott wrote:
> > I have the dump of the repos that causes pysvn to fail. ...
> >
> > Is this proof that the repos has none UTF-8 log text?

Yes.

> > svn 1.4.6 is happy to show the log:

svn trunk shows the log too, in the same way (with ?\ddd escapes).

> > $ svn log -r219 file:///Users/barry/tmp/repos/trunk/dotfiles
> > ------------------------------------------------------------------------
> > r219 | bortzmeyer | 2003-01-17 14:04:31 +0000 (Fri, 17 Jan 2003) | 3 lines
> >
> > Bitbucket r?\233serv?\233 ?\224 dev/null
> > Classement dans Mail/spam seulement apr?\232s le localstart qui lance
> > spamc
> >
> > ------------------------------------------------------------------------
> >
> > But the \233 are supposed to be é I understand.
|
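The "?\ddd" escapes in the quoted output are the decimal values of the offending bytes (0xe9 is 233, 0xe0 is 224, 0xe8 is 232). The sketch below imitates that output format in Python 3; it is an assumption-laden illustration, not Subversion's own code, and `escape_invalid_utf8` is a hypothetical name.

```python
def escape_invalid_utf8(raw: bytes) -> str:
    """Render bytes as text, showing each invalid UTF-8 byte as ?\\ddd (decimal)."""
    # surrogateescape maps every undecodable byte 0xNN to the lone surrogate U+DCNN.
    text = raw.decode("utf-8", errors="surrogateescape")
    parts = []
    for ch in text:
        code = ord(ch)
        if 0xDC80 <= code <= 0xDCFF:
            # Recover the original byte value and print it as three decimal digits.
            parts.append("?\\%03d" % (code - 0xDC00))
        else:
            parts.append(ch)
    return "".join(parts)


shown = escape_invalid_utf8(b"Bitbucket r\xe9serv\xe9 \xe0 dev/null")
```

Valid UTF-8 input passes through unchanged, so the escapes appear only where the stored bytes are not UTF-8.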
|
Re: Are log messages Unicode?

Karl Fogel wrote on Mon, 7 Jul 2008 at 12:15 -0400:
> I think Barry's asking if the client and/or server do any validation.
> That is, if the programmer supplies a non-UTF8 log message, our client
> libraries should reject it; and if such a log message were to reach the
> repository (perhaps because someone wrote their own client software from
> scratch), the repository should reject it too.
>
> I don't know whether we do such validation or not, but agree we should.

Since r31614 (Neels' fix of issue #1796) we do UTF-8 validation of log
messages in libsvn_repos. It has not been backported to 1.5.x.

The cmdline client also does some conversions; in my case, it
dropped the bytes it couldn't understand:

% svn ci iota -F dump-fragment.txt
Sending        iota
Transmitting file data .
Committed revision 2.

# It should have failed.  Let's see...
% xxd ../../repos1/db/revprops/0/2
...
00000a0: 370a 7376 6e3a 6c6f 670a 5620 3130 310a  7.svn:log.V 101.
00000b0: 4269 7462 7563 6b65 7420 7273 6572 7620  Bitbucket rserv
00000c0: 2064 6576 2f6e 756c 6c0a 436c 6173 7365   dev/null.Classe
...

# Ah, but that's not the log message I specified!
% xxd dump-fragment.txt
0000040: 380a 0a4b 2037 0a73 766e 3a6c 6f67 0a56  8..K 7.svn:log.V
0000050: 2031 3031 0a42 6974 6275 636b 6574 2072   101.Bitbucket r
0000060: e973 6572 76e9 20e0 2064 6576 2f6e 756c  .serv. . dev/nul
# It dropped these bytes:                         ^    ^ ^

> Barry, got time to test/trace it?
|
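The two hex dumps above can be compared mechanically. The Python 3 sketch below only checks that the stored bytes equal the supplied bytes with every byte of value 0x80 or higher removed; it says nothing about which code path in the command-line client did the dropping, and the byte strings are transcribed from the dumps in the message.

```python
# First line of the log message as supplied in dump-fragment.txt.
supplied = b"Bitbucket r\xe9serv\xe9 \xe0 dev/null\n"

# The same line as stored in the revprops file, transcribed from the xxd output.
stored = bytes.fromhex(
    "4269 7462 7563 6b65 7420 7273 6572 7620"
    "2064 6576 2f6e 756c 6c0a"
)

# Keep only 7-bit bytes, i.e. drop anything that is not plain ASCII.
ascii_only = bytes(b for b in supplied if b < 0x80)

# The bytes that went missing, in order.
dropped = bytes(b for b in supplied if b >= 0x80)
```

`ascii_only` matches `stored`, and `dropped` holds the three accented characters' bytes, which is the behaviour marked with carets in the dump.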
|
Re: Are log messages Unicode?

Hi list, long time no see :)

Daniel Shahaf wrote:
> Since r31614 (Neels' fix of issue #1796) we do UTF-8 validation of log
> messages in libsvn_repos. It has not been backported to 1.5.x.

Quoting message "[PATCH] issue 1796: ..." from 03 Jun 2008 by me:

"
The subversion server and client do not validate props in places where
they should:
- where the server receives props from a client out there. (#1796)
- where the server reads props from the repository file system.
- where the svn client reads props from a server out there.
(Approval by kfogel)

[My] patch starts by fixing the specific problems of issue 1796, only:
- where the server receives props from a client out there. (#1796)
, and limited only to the log message prop (SVN_PROP_REVISION_LOG).
"

I am still intending to continue on these issues... (I have been
diverted because of the social shock following a recent unexpected death
in my close family)

I am still at the point where I am trying to find out

- the best place to validate props being read from the repository file
  system by the server;

- how to write a unit test on whether the server validates props read
  from the file system (the code that writes *to* the file system now
  validates props; so, how do I get *unvalidated* props written to the
  file system in the first place?);

- the best place to validate props in the client, reading from a server
  out there;

- how to write a unit test on whether the client validates props read
  from a server out there;

- which other props need to be validated;

- what the formats for these other props are (are they, by chance, all
  UTF8 & LF? That would be nice.).

Since other/more people are taking interest in these issues, maybe it
would make sense to file separate issues in the issue tracker for the
remaining two cases? :

- where the server reads props from the repository file system.
- where the svn client reads props from a server out there.

> The cmdline client also does some conversions; in my case, it
> dropped the bytes it couldn't understand:
> [...]
> # It dropped these bytes:                         ^    ^ ^

Hm, that's not nice. Silently dropped bytes aren't good. The user should
at least be informed about what's happening...

--
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696 mobile: +49 177 2345869 fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
|
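The kind of check discussed above ("UTF8 & LF") can be sketched in a few lines of Python 3. This is not the libsvn_repos validation added in r31614; `validate_log_message` is a hypothetical function that rejects a property value instead of silently repairing it.

```python
def validate_log_message(raw: bytes) -> None:
    """Raise ValueError unless raw is valid UTF-8 with LF-only line endings."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Report the offset of the first bad byte so the caller can inform the user.
        raise ValueError(
            "log message is not valid UTF-8 at byte %d" % exc.start
        ) from exc
    if b"\r" in raw:
        raise ValueError("log message contains CR; only LF line endings are accepted")


def is_valid_log_message(raw: bytes) -> bool:
    """Boolean convenience wrapper around validate_log_message."""
    try:
        validate_log_message(raw)
    except ValueError:
        return False
    return True
```

Rejecting with an error, rather than dropping bytes, matches the preference stated in the message: the user is told what happened and the stored text is never altered behind their back.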
|
Re: Are log messages Unicode?

My user says that repos was created by cvs2svn and wonders if it is
the source of the bad log entry.

Barry

On Jul 13, 2008, at 22:37, Neels Janosch Hofmeyr wrote:
> Hi list, long time no see :)
> [...]
> Hm, that's not nice. Silently dropped bytes aren't good. The user
> should at least be informed about what's happening...
|
|
Re: Are log messages Unicode?

(patch manager hat on)

Neels Janosch Hofmeyr wrote on Sun, 13 Jul 2008 at 23:37 +0200:
> I am still intending to continue on these issues... (I have been
> diverted because of the social shock following a recent unexpected death
> in my close family)
>
> I am still at the point where I am trying to find out

Comments, anyone?

Neels, I think you can answer some of these questions yourself :)

> - the best place to validate props being read from the repository file
>   system by the server;
>
> - how to write a unit test on whether the server validates props read
>   from the file system (the code that writes *to* the file system now
>   validates props; so, how do I get *unvalidated* props written to the
>   file system in the first place?);
>
> - the best place to validate props in the client, reading from a server
>   out there;
>
> - how to write a unit test on whether the client validates props read
>   from a server out there;
>
> - which other props need to be validated;
>
> - what the formats for these other props are (are they, by chance, all
>   UTF8 & LF? That would be nice.).
>
> Since other/more people are taking interest in these issues, maybe it
> would make sense to file separate issues in the issue tracker for the
> remaining two cases? :
>
> - where the server reads props from the repository file system.
> - where the svn client reads props from a server out there.
>
> [...]
>
> Hm, that's not nice. Silently dropped bytes aren't good. The user should
> at least be informed about what's happening...

+1 (want to write the patch?)

Daniel
(who won't have time to review patches in the near future)
|
|
Re: Are log messages Unicode?

Daniel Shahaf wrote:
> (patch manager hat on)
>
> Neels Janosch Hofmeyr wrote on Sun, 13 Jul 2008 at 23:37 +0200:
>> Daniel Shahaf wrote:
>>> Karl Fogel wrote on Mon, 7 Jul 2008 at 12:15 -0400:
>>>> I think Barry's asking if the client and/or server do any validation.
>>>> That is, if the programmer supplies a non-UTF8 log message, our client
>>>> libraries should reject it; and if such a log message were to reach the
>>>> repository (perhaps because someone wrote their own client software from
>>>> scratch), the repository should reject it too.
>>>>
>>>> I don't know whether we do such validation or not, but agree we should.
>>>>
>>> Since r31614 (Neels' fix of issue #1796) we do UTF-8 validation of log
>>> messages in libsvn_repos. It has not been backported to 1.5.x.
>> Quoting message "[PATCH] issue 1796: ..." from 03 Jun 2008 by me:
>>
>> "
>> The subversion server and client do not validate props in places where
>> they should:
>> - where the server receives props from a client out there. (#1796)
>> - where the server reads props from the repository file system.
>> - where the svn client reads props from a server out there.
>> (Approval by kfogel)
>>
>> [My] patch starts by fixing the specific problems of issue 1796, only:
>> - where the server receives props from a client out there. (#1796)
>> , and limited only to the log message prop (SVN_PROP_REVISION_LOG).
>> "
>>
>> I am still intending to continue on these issues... (I have been
>> diverted because of the social shock following a recent unexpected death
>> in my close family)
>>
>> I am still at the point where I am trying to find out
>>
>
> Comments, anyone?
>
> Neels, I think you can answer some of these questions yourself :)
>
>> - the best place to validate props being read from the repository file
>> system by the server;
>>
>> - how to write a unit test on whether the server validates props read
>> from the file system (the code that writes *to* the file system now
>> validates props; so, how do I get *unvalidated* props written to the
>> file system in the first place?);
>>
>> - the best place to validate props in the client, reading from a server
>> out there;
>>
>> - how to write a unit test on whether the client validates props read
>> from a server out there;
>>
>> - which other props need to be validated;
>>
>> - what the formats for these other props are (are they, by chance, all
>> UTF8 & LF? That would be nice.).
>>
>> Since other/more people are taking interest in these issues, maybe it
>> would make sense to file separate issues in the issue tracker for the
>> remaining two cases? :
>>
>> - where the server reads props from the repository file system.
>> - where the svn client reads props from a server out there.
>>
>>> The cmdline client also does some conversions; in my case, it
>>> dropped the bytes it couldn't understand:
>>>
>>> % svn ci iota -F dump-fragment.txt
>>> Sending iota
>>> Transmitting file data .
>>> Committed revision 2.
>>>
>>> # It should have failed. Let's see...
>>> % xxd ../../repos1/db/revprops/0/2
>>> ...
>>> 00000a0: 370a 7376 6e3a 6c6f 670a 5620 3130 310a  7.svn:log.V 101.
>>> 00000b0: 4269 7462 7563 6b65 7420 7273 6572 7620  Bitbucket rserv
>>> 00000c0: 2064 6576 2f6e 756c 6c0a 436c 6173 7365   dev/null.Classe
>>> ...
>>>
>>> # Ah, but that's not the log message I specified!
>>> % xxd dump-fragment.txt
>>> 0000040: 380a 0a4b 2037 0a73 766e 3a6c 6f67 0a56  8..K 7.svn:log.V
>>> 0000050: 2031 3031 0a42 6974 6275 636b 6574 2072   101.Bitbucket r
>>> 0000060: e973 6572 76e9 20e0 2064 6576 2f6e 756c  .serv. . dev/nul
>>> # It dropped these bytes:                         ^    ^ ^
>>>
>>>> Barry, got time to test/trace it?
>> Hm, that's not nice. Silently dropped bytes aren't good. The user should
>> at least be informed about what's happening...
>>
>
> +1 (want to write the patch?)
>
> Daniel
> (who won't have time to review patches in the near future)

http://subversion.tigris.org/servlets/ReadMsg?listName=dev&msgNo=141457
amounting to not validating log messages traveling towards the user.

Answering the original question:

On Sun, Jul 6, 2008 at 5:23 AM, Barry Scott <barry@...> wrote:
> Using the svn_client API is it possible for a client to write
> none-UTF-8 log messages?

No, it is not possible to send a non-UTF8 log message using the svn
cmdline client, since it performs a conversion to UTF8-with-LF. It is,
however, possible to do so using any other, lenient client. But since
the patch for 1796 was committed (around 6 Jun 2008), the svn *server*
rejects all non-UTF8 log messages from whichever client.

The dropped bytes issue above is not yet accounted for, but probably
caused by that conversion in the svn cmdline client.

(I guess it's that "translate_string" line of code that I switched off
in my 2nd attachment to issue 1796 on the issue tracker site, trying to
prove a point.
Index: subversion/svn/util.c
===================================================================
--- subversion/svn/util.c (revision 31304)
+++ subversion/svn/util.c (working copy)
@@ -651,14 +651,10 @@
     svn_stringbuf_appendcstr(default_msg, APR_EOL_STR APR_EOL_STR);
 
   *tmp_file = NULL;
-  if (lmb->message)
+  if (1)
     {
-      svn_string_t *log_msg_string = svn_string_create(lmb->message, pool);
-
-      SVN_ERR_W(svn_subst_translate_string(&log_msg_string, log_msg_string,
-                                           lmb->message_encoding, pool),
-                _("Error normalizing log message to internal format"));
-
+      SVN_ERR(svn_cmdline_printf(pool, "*** TEST BUILD: FORGING COMMIT MESSAGE ***\n"));
+      svn_string_t *log_msg_string = svn_string_create("forged\r\ncommit\r\nmessage\r\n", pool);
       *log_msg = log_msg_string->data;
 
       /* Trim incoming messages the EOF marker text and the junk that
)

--
Neels Hofmeyr -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23458696 mobile: +49 177 2345869 fax: +49 30 23458695
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelsreg: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194
|
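For readers following along from the pysvn side, here is a minimal Python sketch of the byte-level behaviour discussed in this thread. It is not Subversion's or pysvn's code. The helper names `is_valid_utf8` and `to_utf8_lf` are hypothetical, and the guess that the original message was latin-1 is an assumption based on the 0xe9/0xe0 bytes in the xxd dump quoted above. It shows that those bytes are not valid UTF-8, that a lenient decode which skips undecodable bytes reproduces the text seen in the revprops file, and what a "UTF8-with-LF" normalization could look like when the source encoding is known.

```python
# Illustrative sketch only; not taken from Subversion or pysvn.

# Log message bytes copied from the xxd dump of dump-fragment.txt
# (rows 0000050 and 0000060), starting at "Bitbucket".
raw = bytes.fromhex(
    "4269746275636b65742072"  # "Bitbucket r"
    "e9 73657276 e9 20 e0 20"  # 0xe9 "serv" 0xe9 " " 0xe0 " "
    "6465762f6e756c"          # "dev/nul" (the dump row ends here)
)


def is_valid_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def to_utf8_lf(message: bytes, encoding: str) -> bytes:
    """Hypothetical helper: re-encode as UTF-8 and normalize line endings to LF.

    Raises UnicodeDecodeError instead of silently dropping bytes when
    the message does not match the stated encoding.
    """
    text = message.decode(encoding)
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.encode("utf-8")


if __name__ == "__main__":
    print(is_valid_utf8(raw))                    # the raw bytes are not UTF-8
    print(raw.decode("utf-8", errors="ignore"))  # undecodable bytes are skipped
    print(raw.decode("latin-1"))                 # assumed original encoding
    print(to_utf8_lf(raw, "latin-1"))            # UTF-8 bytes, LF line endings
```

A binding that receives such a message could check `is_valid_utf8` first and report an error rather than guess an encoding; whether to fall back to latin-1 is a policy choice this sketch does not settle.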