Literals, Introducers, Character Set NONE, suggestions

View: New views
4 Messages — Rating Filter:   Alert me  

Literals, Introducers, Character Set NONE, suggestions

by Bill Oliver-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello everybody!

Sorry for long post, question at bottom.

I have a question regarding non-ascii characters, client connection sets,
and introducers.

My application uses Firebird Embedded, connecting to the database using
CHARACTER SET NONE. I create a test database, with default character set
encoding of ISO8859_1. I'm testing Firebird 2.5.

When my client connects to the database, it is possible for the user to
enter SQL syntax like this, with non-ASCII characters.

  create table "ICU" ("BIN" int, "LINGSUB" int, "BINSUB" int, "NAME"
character(20));
  insert into "ICU" values (2, 4, 5, 'Adélaïde');

The problem here is that since the literal should really be specified with
an introducer, _UNICODE_FSS or _ISO8859_1. When I fetch this record, I find
the data is corrupted.

We originally fixed this issue in SAS Vulcan by having our driver rewrite
the query and add "_UNICODE_FSS" to all literals. This was kind of yuck,
since if the user had an error in the SQL, the error message would come back
with the _UNICODE_FSS token in the parsing error.

Then, I fixed the issue in SAS Vulcan directly in dsql/pass1, routine
pass1_constant(). I added code that forced _UNICODE_FSS on all literals,
unless the user supplied an introducer.

    // force UNICODE_FSS, unless user has supplied an introducer.
    if (!string->str_charset)
            string->str_charset = "UNICODE_FSS";

This worked great for me, but doesn't seem to be a fix that can go back to
HEAD. Too big of a hammer. ;-)

HEAD version of pass1_constant() now appears to have code that determines
the client connection set, and sets the text type of the literal correctly.

Would it be desirable to add a check in the Firebird engine to see if the
client connection set is NONE, and then treat the literal as UNICODE_FSS?
Perhaps there is a better way to make this check, and there may be other
solutions I haven't considered here, too.

My fallback is to put code back in my driver to add the introducers, but if
making a change in Firebird itself is helpful for others, that would help me
too.

Thanks in advance!

-bill




-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel

Re: Literals, Introducers, Character Set NONE, suggestions

by asfernandes :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Bill Oliver escreveu:

> Hello everybody!
>
> Sorry for long post, question at bottom.
>
> I have a question regarding non-ascii characters, client connection sets,
> and introducers.
>
> My application uses Firebird Embedded, connecting to the database using
> CHARACTER SET NONE. I create a test database, with default character set
> encoding of ISO8859_1.
I'm assuming your columns is also using ISO8859_1.

>  I'm testing Firebird 2.5.
>
> When my client connects to the database, it is possible for the user to
> enter SQL syntax like this, with non-ASCII characters.
>
>   create table "ICU" ("BIN" int, "LINGSUB" int, "BINSUB" int, "NAME"
> character(20));
>   insert into "ICU" values (2, 4, 5, 'Adélaïde');
>  
And that 'Adélaïde' is encoded in UNICODE_FSS.

> The problem here is that since the literal should really be specified with
> an introducer, _UNICODE_FSS or _ISO8859_1. When I fetch this record, I find
> the data is corrupted.
>
> We originally fixed this issue in SAS Vulcan by having our driver rewrite
> the query and add "_UNICODE_FSS" to all literals. This was kind of yuck,
> since if the user had an error in the SQL, the error message would come back
> with the _UNICODE_FSS token in the parsing error.
>
> Then, I fixed the issue in SAS Vulcan directly in dsql/pass1, routine
> pass1_constant(). I added code that forced _UNICODE_FSS on all literals,
> unless the user supplied an introducer.
>
>     // force UNICODE_FSS, unless user has supplied an introducer.
>     if (!string->str_charset)
>             string->str_charset = "UNICODE_FSS";
>
> This worked great for me, but doesn't seem to be a fix that can go back to
> HEAD. Too big of a hammer. ;-)
>
> HEAD version of pass1_constant() now appears to have code that determines
> the client connection set, and sets the text type of the literal correctly.
>
> Would it be desirable to add a check in the Firebird engine to see if the
> client connection set is NONE, and then treat the literal as UNICODE_FSS?
>  
No. :-)

> Perhaps there is a better way to make this check, and there may be other
> solutions I haven't considered here, too.
Yes. Why not you use UNICODE_FSS as connection character set?


Adriano


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel

Re: Literals, Introducers, Character Set NONE,

by Bill Oliver-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

> Yes. Why not you use UNICODE_FSS as connection character set?

Thanks for the reply!

The decision to connect with character set NONE, was done before my time. I
assume that it was done for performance reasons, or perhaps because of the
many problems with UNICODE_FSS present in pre-2.0 versions of Firebird.

I'm willing to try this. Any limitations I should know about? Also, I
thought that multi-byte character sets were prohibited as client connection
sets, but this seems not to be true? Perhaps this is because with
UNICODE_FSS, table names, etc. are still in 1-byte ascii representation?

-bill




-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel

Re: Literals, Introducers, Character Set NONE,

by asfernandes :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Bill Oliver escreveu:
>> Yes. Why not you use UNICODE_FSS as connection character set?
>>    
>
> Thanks for the reply!
>
> The decision to connect with character set NONE, was done before my time. I
> assume that it was done for performance reasons, or perhaps because of the
> many problems with UNICODE_FSS present in pre-2.0 versions of Firebird.
>  
The major problem before 2.0 is that buffer sizes aren't translated from
one charset to another. If you have a CHAR(10) ISO8859-1 field and
connect using UNICODE_FSS, the buffer in described has 10 bytes (instead
of the necessary 30).

> I'm willing to try this. Any limitations I should know about?
AFAIK, no.

>  Also, I thought that multi-byte character sets were prohibited as client connection
> sets, but this seems not to be true?
It's not true.

> Perhaps this is because with UNICODE_FSS, table names, etc. are still in 1-byte ascii representation?
The only limitation is that you can't create object names using more
than 31 bytes (instead of characters).


Adriano


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
Firebird-Devel mailing list, web interface at https://lists.sourceforge.net/lists/listinfo/firebird-devel