MySQL Backend - NCI Thesaurus (Truncation, data too long)

MySQL Backend - NCI Thesaurus (Truncation, data too long)

by Jesus Bisbal Riera

Dear all,
    I'm trying to load the NCI Thesaurus (2008 OWL version, 89 MB) into Protege using a MySQL backend. I'm doing this programmatically through the API (not the GUI). With other ontologies (pizza.owl, for example), everything is fine.
    For NCI, however, it only partially works. After much processing, I get this error:

    ....
    Loaded 1110000 triples 9280

    com.mysql.jdbc.MysqlDataTruncation: Data truncation: Data too long for column 'frame' at row 1

    The database has over a million tuples in it.
    The problem seems to be that the 'frame' column is automatically created with a length of 255 characters. This is not enough for NCI Thesaurus 2008: for some tuples it needs to be at least 345 characters long.
    Can I change the table definition from the API? If I change it manually (MySQL admin) and re-run my program, it resets the column's size to 255.
    I'm using Protege 3.4 Beta (build 504), and MySQL 5.0.51b community edition.
    Many thanks for any pointers.
    Regards,
   
Jesus

PS: the program recovers from this error and continues, and I eventually get an out of memory error (even though I'm using 512M heap size), but that could be related to the previous error... so let's take one at a time.
-- 
______________________________________________________________________
Jesus Bisbal-Riera                   http://www.tecn.upf.es/~jbisbal
     Currently: Visiting Researcher at Dublin Institute of Technology
	        Kevin Street
		Dublin 8, Ireland
		Ph: +35314024929

Department of Information and Communication Technologies
Universitat Pompeu Fabra            http://www.upf.edu
Passeig de Circumval·lació, 8
08003 Barcelona                      Work Ph: +34 93 542 29 51 / 25 00
Spain                                Fax:     +34 93 542 25 17

_______________________________________________
protege-owl mailing list
protege-owl@...
https://mailman.stanford.edu/mailman/listinfo/protege-owl

Instructions for unsubscribing: http://protege.stanford.edu/doc/faq.html#01a.03 

Re: MySQL Backend - NCI Thesaurus (Truncation, data too long)

by Tania Tudorache

Hi Jesus,

Indeed, the new thesaurus has a URI that is over 300 chars long. You can
easily change the column definition used by Protege. Just edit the
protege.properties file and add this line:

Database.typename.varchar.com.mysql.jdbc.Driver=VARCHAR(500) COLLATE UTF8_BIN
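
If you prefer to set this from code rather than by hand, here is a minimal sketch using only java.util.Properties. The class name VarcharOverride and the method names are made up for illustration and are not part of the Protege API; the property key and value format are the ones from the line above. It assumes you know where your protege.properties file is, and note that Properties.store() rewrites the file, so existing comments and ordering are lost.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical helper (not part of Protege): sets the VARCHAR override
// in a protege.properties file.
class VarcharOverride {
    // Property key as given in this thread.
    static final String KEY = "Database.typename.varchar.com.mysql.jdbc.Driver";

    // Builds the column type string, e.g. "VARCHAR(500) COLLATE UTF8_BIN".
    static String columnType(int length) {
        if (length <= 0) {
            throw new IllegalArgumentException("length must be positive: " + length);
        }
        return "VARCHAR(" + length + ") COLLATE UTF8_BIN";
    }

    // Loads the file if it exists, sets the override, and writes it back.
    // Caveat: store() does not preserve comments or the original key order.
    static void apply(Path propertiesFile, int length) throws IOException {
        Properties props = new Properties();
        if (Files.exists(propertiesFile)) {
            try (InputStream in = Files.newInputStream(propertiesFile)) {
                props.load(in);
            }
        }
        props.setProperty(KEY, columnType(length));
        try (OutputStream out = Files.newOutputStream(propertiesFile)) {
            props.store(out, null);
        }
    }
}
```

You would call it as VarcharOverride.apply(pathToProtegeProperties, 500) before running the conversion. Editing the file by hand works just as well.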

Instead of 500, you can also use a larger number (if needed). If you are
using the non-streaming parser, then you need to allocate a heap size of
around 800M to Protege, because it first reads the whole ontology into
memory and then writes it out to the database. If you use the streaming
parser, the memory requirements are smaller, but I did not measure
exactly how much you need.
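
Since the conversion is being run from your own program, a quick check at startup can tell you whether the JVM really got the heap you intended (for example via -Xmx800M). This is only a sketch: the class name HeapCheck is made up, the 800 figure is the rough estimate from above, and Runtime.maxMemory() may report a value somewhat below the -Xmx setting depending on the JVM.

```java
// Hypothetical helper (not part of Protege): reports the maximum heap
// available to the current JVM and compares it with a required size.
class HeapCheck {
    static final long MB = 1024L * 1024L;

    // Converts a byte count to whole megabytes (rounded down).
    static long toMegabytes(long bytes) {
        return bytes / MB;
    }

    // True if the given maximum heap covers the required number of megabytes.
    static boolean isEnough(long maxHeapBytes, long requiredMb) {
        return toMegabytes(maxHeapBytes) >= requiredMb;
    }

    public static void main(String[] args) {
        // Rough estimate for the non-streaming parser, taken from this thread.
        long requiredMb = 800;
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + toMegabytes(maxHeap) + " MB");
        if (!isEnough(maxHeap, requiredMb)) {
            System.out.println("Heap may be too small; consider a larger -Xmx value.");
        }
    }
}
```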

We have a wiki page that describes how to programmatically convert an
OWL file to a database (both the streaming and non-streaming approaches):

http://protegewiki.stanford.edu/index.php/ConvertingToDatabaseProject

Tania



