The fill_cvtermpath stored procedure

by Robin Houston-2

I just noticed that the fill_cvtermpath stored procedure doesn't  
properly take account of relationship types: it thinks that a  
centromere is_a chromosome, for example.

Attached is a proposed replacement procedure that behaves as I believe  
it should (according to the docs on the wiki). I haven't put any  
effort into making it fast, on the grounds that it's not likely to be  
used frequently, though no doubt it could be made faster if necessary.

The result of this is more useful, certainly for what I'm using it for  
now. (I just need the reflexive transitive closure of the is_a  
relation.) But it's still rather unsatisfactory in some ways. In  
particular, it assumes that all relations are reflexive and  
transitive, which isn't true. For example 'non_functional_homolog_of'  
is not reflexive, and 'adjacent_to' is neither reflexive nor
transitive. The cvtermpath entries for relations of this sort are at
least partially nonsensical.
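
[Editor's sketch] To make the point above concrete, here is a minimal
Python sketch of a reflexive transitive closure that follows edges of one
relation type only. It is not the attached PL/pgSQL procedure; the
function name, the triple layout and the toy edges are assumptions made
for illustration, and it records only the shortest distance per pair.

```python
from collections import defaultdict


def closure_for_type(edges, rel_type, reflexive=True):
    """Return {(subject, object): distance} using edges of one type only.

    edges: iterable of (subject, relation, object) triples, loosely
    modelled on cvterm_relationship rows (an assumption of this sketch).
    """
    parents = defaultdict(set)
    terms = set()
    for subj, rel, obj in edges:
        terms.update((subj, obj))
        if rel == rel_type:  # edges of every other type are ignored
            parents[subj].add(obj)

    paths = {}
    for start in terms:
        if reflexive:
            paths[(start, start)] = 0
        # Breadth-first walk, so the first distance recorded is the shortest.
        seen, frontier, distance = {start}, {start}, 0
        while frontier:
            distance += 1
            next_frontier = set()
            for term in frontier:
                for parent in parents.get(term, ()):
                    if parent not in seen:
                        seen.add(parent)
                        paths[(start, parent)] = distance
                        next_frontier.add(parent)
            frontier = next_frontier
    return paths


# Toy edges for illustration only, not taken from a real ontology release.
TOY_EDGES = [
    ("centromere", "part_of", "chromosome"),
    ("chromosome", "is_a", "replicon"),
    ("replicon", "is_a", "region"),
]
is_a_paths = closure_for_type(TOY_EDGES, "is_a")
```

With these toy edges, the part_of edge contributes nothing to the is_a
closure, so no (centromere, chromosome) pair is produced. Passing
reflexive=False would suit a relation such as 'adjacent_to'.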

At least for the relationship ontology types, we do know which of them  
are reflexive and which transitive, because it's in the OBO file. (We  
also know that, for example, has_part is the inverse of part_of.) Is  
there any reason not to take this information into account when  
populating cvtermpath?
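
[Editor's sketch] One way the OBO information mentioned above could be
picked up is sketched below in Python. It reads the is_reflexive,
is_transitive and inverse_of tags from [Typedef] stanzas. The sample text
is hand-written for illustration rather than copied from a real OBO file,
and the function name and result layout are assumptions.

```python
def read_typedef_properties(obo_text):
    """Map each [Typedef] id to its reflexive/transitive/inverse_of tags."""
    properties = {}
    current = None
    for raw_line in obo_text.splitlines():
        line = raw_line.split("!", 1)[0].strip()  # drop trailing comments
        if line.startswith("["):
            # Start a fresh record for Typedef stanzas, skip all others.
            if line == "[Typedef]":
                current = {"reflexive": False, "transitive": False,
                           "inverse_of": None}
            else:
                current = None
            continue
        if current is None or ":" not in line:
            continue
        tag, value = (part.strip() for part in line.split(":", 1))
        if tag == "id":
            properties[value] = current
        elif tag == "is_reflexive":
            current["reflexive"] = value == "true"
        elif tag == "is_transitive":
            current["transitive"] = value == "true"
        elif tag == "inverse_of":
            current["inverse_of"] = value
    return properties


# Hand-written sample, for illustration only.
SAMPLE_OBO = """
[Typedef]
id: part_of
is_reflexive: true
is_transitive: true
inverse_of: has_part

[Typedef]
id: adjacent_to

[Term]
id: EXAMPLE:0000001
"""
relation_properties = read_typedef_properties(SAMPLE_OBO)
```

A closure routine could then consult relation_properties before adding
reflexive entries or chaining edges for a given relation.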

The other issue is that of relations between the relation types  
themselves. If A `proper_part_of` B, then certainly A `part_of` B, for  
example. It would be useful if cvtermpath could also include derived  
relations of this sort.
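
[Editor's sketch] The derived relations described above can be
illustrated with a short Python sketch. It assumes each relation has at
most one more general relation, supplied as a plain dict; both the
function name and that single-parent simplification are assumptions of
the sketch, not something Chado provides.

```python
def expand_subrelations(edges, parent_relation):
    """Return edges plus those implied by a relation hierarchy.

    parent_relation maps a relation to the more general relation it
    implies, for example {"proper_part_of": "part_of"}.
    """
    derived = set(edges)
    for subj, rel, obj in edges:
        seen = {rel}  # guards against a cyclic hierarchy
        while rel in parent_relation and parent_relation[rel] not in seen:
            rel = parent_relation[rel]
            seen.add(rel)
            derived.add((subj, rel, obj))
    return derived


expanded = expand_subrelations(
    [("A", "proper_part_of", "B")],
    {"proper_part_of": "part_of"},
)
```

Here expanded holds the original proper_part_of edge together with the
implied (A, part_of, B) edge; a closure would then be computed over the
expanded set.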

If this has been discussed ad nauseam in the past, I apologise for  
reopening old wounds. :-)

Robin




--
 The Wellcome Trust Sanger Institute is operated by Genome Research
 Limited, a charity registered in England with number 1021457 and a
 company registered in England with number 2742969, whose registered
 office is 215 Euston Road, London, NW1 2BE.






PS. I'm aware that there's also a Perl script make_cvtermpath.pl  
included in the Chado distribution. I haven't examined it in detail.  
From a cursory glance, it seems it wouldn't work as intended, since  
it assumes that the name of the relationship ontology CV is  
'Relationship Ontology' and it expects there to be a cvterm whose name  
is 'OBO_REL:0001'.


-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Gmod-schema mailing list
Gmod-schema@...
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Attachment: fill_cvtermpath.pgplsql (2K)

Re: The fill_cvtermpath stored procedure

by Chris Mungall-5

Hi Robin

In fact this has not been discussed much.

I'm not sure what procedures are currently used to fill the cvtermpath  
table. I suspect there are many groups who are not populating this at  
all, and thus not taking advantage of the ontology graphs in their  
annotations.

Thanks for the patch. I'm not sure if the procedure will continue to  
be used much, because (a) people seem to have a gut reaction against
procedural code in the DBMS, (b) there are problems with large
closures, since the procedure has to run inside a transaction, and
(c) as you say, it has to make certain assumptions about relations that
may not be true.

I would recommend instead moving towards using an external tool with  
the necessary logic to build the full transitive closure. The  
documentation on how to do this with the GO database is applicable to  
Chado:

http://wiki.geneontology.org/index.php/Transitive_closure

You can use oboedit2 in command line mode to generate a closure table  
that can easily be slurped into Chado (OK, we are lacking the script  
but it should be simple). Or if you are using an ontology registered  
with OBO, there are closure tables already built for you. See the  
above wiki for details.

The advantage of using this approach is that you don't have to
reimplement the reasoning logic. It should do the right thing as far as
relations are concerned (taking into account sub-relations, relation
composition, etc.).
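
[Editor's sketch] The missing loading script might look roughly like the
Python below. The closure file layout (tab-separated subject, relation,
object, distance) is an assumption and not the documented output of any
tool; the accession-to-cvterm_id lookup and the ids are hypothetical. The
tuples follow the cvtermpath columns type_id, subject_id, object_id,
cv_id and pathdistance.

```python
import csv
import io


def closure_rows(tsv_text, cvterm_id_by_accession, cv_id):
    """Turn closure lines into (type_id, subject_id, object_id, cv_id,
    pathdistance) tuples ready for insertion into cvtermpath."""
    rows = []
    for record in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not record:  # skip blank lines
            continue
        subject, relation, obj, distance = record
        rows.append((
            cvterm_id_by_accession[relation],
            cvterm_id_by_accession[subject],
            cvterm_id_by_accession[obj],
            cv_id,
            int(distance),
        ))
    return rows


# Hypothetical accessions and ids, for illustration only.
SAMPLE_CLOSURE = "X:1\tis_a\tX:2\t1\n\nX:1\tis_a\tX:3\t2\n"
LOOKUP = {"X:1": 101, "X:2": 102, "X:3": 103, "is_a": 7}
rows = closure_rows(SAMPLE_CLOSURE, LOOKUP, cv_id=5)
```

The resulting tuples could be handed to a database driver's bulk insert;
that step is left out because it depends on the local setup.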

On Sep 9, 2008, at 2:49 PM, Robin Houston wrote:
> [snip]

