CSV with fails with doublequote inside the value

View: New views
3 Messages — Rating Filter:   Alert me  

CSV with fails with doublequote inside the value

by Albert Kam :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Apache Derby,

I'm using db-derby-10.4.1.3-bin, and when i'm importing a csv file like this :

ij> CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE (null,'SMS_OUT','/home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv',null,null,null,0);

an error pops out like this :
ERROR XIE0R: Import error on line 11 of file /home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv: Data found on line 11 for column 3 after the stop delimiter.
ERROR XIE03: Data found on line 11 for column 3 after the stop delimiter.

I check the file, and in the line 11, it goes like this :
"92","+628xxxxxxx","Testinggg -__-"","2006-08-03 11:43:42","0","0","-1","-1","-1"
(Notice the -__-"") the content should be Testinggg -__-"

So, i tried changing the Testinggg -__-" into Testinggg -__-\", but still errornous ..

Later after googling for a while, i found out about this : http://www.perlmonks.org/?node_id=678257

Here's one interesting quote :

CSV as defined by RFC 4180 does not "escape" double quotes with a backslash, but rather by an additional set of double quotes. Your parser fails to handle this format properly.

CSV is hard.


Is this a bug or is it something that i did ? For now, i removed the doublequotes and it worked like charm.

Regards,
Albert Kam

--
Do not pursue the past. Do not lose yourself in the future.
The past no longer is. The future has not yet come.
Looking deeply at life as it is in the very here and now,
the practitioner dwells in stability and freedom.
(Thich Nhat Hanh)

Re: CSV with fails with doublequote inside the value

by Rick Hillegas-2 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hi Albert,

Derby's understanding of CSV predates RFC 4180. For instance, Derby
trims leading and trailing spaces from imported columns although this
behavior is forbidden by RFC 4180. Derby's CSV dialect is described in
the Tools Guide in a section titled "File format for input and output".
This description is impressionistic rather than rigorous. I don't think
that the grammar of Derby's CSV dialect is published anywhere. Other
than that sketch in the Tools Guide, I think that the only contract is
that Derby promises to be able to re-import anything which Derby itself
exported.

In practice, I think that this means that plenty of edge cases plague
Derby when it imports files exported by other data sources. A lot of
these edge cases can be handled by using the middle arguments of
SYSCS_IMPORT_TABLE to fine-tune delimiters and code sets. If you can't
fine-tune SYSCS_IMPORT_TABLE well enough, you can always write your own
table function to wrap the foreign data file and then import the file
like this

  insert into MyTable
  select * from table( MyTableFunction( '/path/to/the/data/file' ) ) s

Given the guidelines sketched in the Tools Guide, I would expect Derby
to object to the file you are trying to import. If you want Derby to
include the extra double-quote in the imported text, then I would escape
it with another double-quote. For instance, the following file:

"foo", 1, "bar"
"wibble""", 2, "wombat"

imports successfully for me:

ij> call syscs_util.syscs_import_table
( 'APP', 'T', 'doubleQuote.csv', null, null, null, 0 );
0 rows inserted/updated/deleted
ij> select * from t;
A                                                 |B          
|C                                                
-----------------------------------------------------------------------------------------------------------------
foo                                               |1          
|bar                                              
wibble"                                           |2          
|wombat                                          

2 rows selected

Hope this helps,
-Rick


Albert Kam wrote:

> Hello Apache Derby,
>
> I'm using db-derby-10.4.1.3-bin, and when i'm importing a csv file
> like this :
>
> *ij> CALL SYSCS_UTIL.SYSCS_IMPORT_TABLE
> (null,'SMS_OUT','/home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv',null,null,null,0);
> *
> an error pops out like this :
> *ERROR XIE0R: Import error on line 11 of file
> /home/albert/mnt/linuxdb/rnd/sofco_sms/CSVs/sms_out.csv: Data found on
> line 11 for column 3 after the stop delimiter.
> ERROR XIE03: Data found on line 11 for column 3 after the stop delimiter.
> *
> I check the file, and in the line 11, it goes like this :
> "92","+628xxxxxxx","*Testinggg -__-"*","2006-08-03
> 11:43:42","0","0","-1","-1","-1"
> (Notice the -__-"") the content should be *Testinggg -__-"*
>
> So, i tried changing the *Testinggg -__-"* into *Testinggg -__-\"*,
> but still errornous ..
>
> Later after googling for a while, i found out about this :
> http://www.perlmonks.org/?node_id=678257
>
> Here's one interesting quote :
> /
>
> CSV as defined by RFC 4180 <http://tools.ietf.org/html/rfc4180> does
> not "escape" double quotes with a backslash, but rather by an
> additional set of double quotes. Your parser fails to handle this
> format properly.
>
> CSV is hard.
>
> /
> Is this a bug or is it something that i did ? For now, i removed the
> doublequotes and it worked like charm.
>
> Regards,
> Albert Kam
>
> --
> Do not pursue the past. Do not lose yourself in the future.
> The past no longer is. The future has not yet come.
> Looking deeply at life as it is in the very here and now,
> the practitioner dwells in stability and freedom.
> (Thich Nhat Hanh)


Parent Message unknown Re: CSV with fails with doublequote inside the value

by Albert Kam :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Dear Rick,

Thanks for the lengthy explanation.
I'm still quite new to derby, and beginning to notice it's flexibility, haha.

Regards,
Albert Kam

--
Do not pursue the past. Do not lose yourself in the future.
The past no longer is. The future has not yet come.
Looking deeply at life as it is in the very here and now,
the practitioner dwells in stability and freedom.
(Thich Nhat Hanh)
LightInTheBox - Buy quality products at wholesale price