Linking stock with sequence feature


by George Githinji

Hi,
What would be a good way of linking a stock entry with a sequence feature entry?

I have some features that are associated only with isolates of a single organism, and I am wondering what Chado's best practice is for storing such information.

Thank you

George

Website: http://biorelated.wordpress.com/
_______________________________________________
Gmod-schema mailing list
Gmod-schema@...
https://lists.sourceforge.net/lists/listinfo/gmod-schema

Re: Linking stock with sequence feature

by Scott Cain

Hi George,

It looks like nobody ever answered your question.  Sorry about that;
it's been a busy few weeks.  I think you are really asking two
questions:

1. How do I link a stock entry to a feature entry? and

2. How do I link a stock entry to a genotypically/phenotypically
identifiable sample?

The stock table is designed to link directly to the genotype of the
specific sample, so #2 is easy: the stock_genotype table does the
linking.  To link a stock to a feature, you first link the feature to
the genotype; once the stock is also linked to that genotype, the stock
and the feature are transitively linked.

Does that make sense?  I've not done this myself, but I am pretty sure
that is what people do.
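The transitive link described above can be sketched with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: it keeps only the key columns of feature, stock, genotype, stock_genotype and a feature_genotype linking table (the real Chado tables carry more columns, several of them required), and every row value is hypothetical. The query walks feature to feature_genotype to stock_genotype to stock.

```python
import sqlite3

# Hypothetical, trimmed-down versions of the Chado tables involved.
# Real Chado tables have more columns (type_id, organism_id, rank, ...).
SCHEMA = """
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE stock    (stock_id    INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE feature_genotype (
    feature_genotype_id INTEGER PRIMARY KEY,
    feature_id  INTEGER NOT NULL REFERENCES feature(feature_id),
    genotype_id INTEGER NOT NULL REFERENCES genotype(genotype_id)
);
CREATE TABLE stock_genotype (
    stock_genotype_id INTEGER PRIMARY KEY,
    stock_id    INTEGER NOT NULL REFERENCES stock(stock_id),
    genotype_id INTEGER NOT NULL REFERENCES genotype(genotype_id)
);
"""

def stocks_for_feature(conn, feature_uniquename):
    """Follow feature -> feature_genotype -> stock_genotype -> stock."""
    rows = conn.execute(
        """
        SELECT DISTINCT s.uniquename
        FROM feature f
        JOIN feature_genotype fg ON fg.feature_id = f.feature_id
        JOIN stock_genotype  sg ON sg.genotype_id = fg.genotype_id
        JOIN stock s            ON s.stock_id = sg.stock_id
        WHERE f.uniquename = ?
        ORDER BY s.uniquename
        """,
        (feature_uniquename,),
    )
    return [row[0] for row in rows]

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical example rows: one isolate feature, one genotype, one stock.
conn.execute("INSERT INTO feature  VALUES (1, 'isolate_A.assembly')")
conn.execute("INSERT INTO genotype VALUES (10, 'isolate_A.genotype')")
conn.execute("INSERT INTO stock    VALUES (100, 'sample_A')")
conn.execute("INSERT INTO feature_genotype VALUES (1, 1, 10)")
conn.execute("INSERT INTO stock_genotype  VALUES (1, 100, 10)")

print(stocks_for_feature(conn, "isolate_A.assembly"))  # ['sample_A']
```

Each hop is a join on an integer key, so with the linking columns indexed the extra tables may add little cost; that is worth checking with the query planner on a real database rather than assuming it.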

Scott



2008/7/4 George Githinji <georgkam@...>:

> Hi,
> Please what would be a good way of linking a stock with sequence feature
> entry?



--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl@...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


Re: Linking stock with sequence feature

by George Githinji

Hi Jay and Scott,
Thank you so much for your suggested solutions, and thanks to Scott for reformulating my question. :)

For the time being I have followed Jay's option 2, but I suspect it may not be the best solution.
Scott's transitive linking sounds good, but I am worried about how long some queries would take to execute.

As for Jay's option 4: if you store all stock data as attributes in the featureprop table, what do you store as features and what do you store as organisms?

Please bear with my naivety.

Thanks


On Wed, Jul 23, 2008 at 4:48 PM, Jay Sundaram <sundaram@...> wrote:
Hi George,

We also wanted to link the isolate (sequence) to stock data for our viral projects.
Since we did not have genotype information, the following strategies were considered:

1) create our own feature_stock table with the columns feature_stock_id, feature_id and stock_id

2) treat each isolate as its own organism and link through organism_id (feature.organism_id = organism.organism_id = stock.organism_id)

3) link feature.uniquename to stock.uniquename

4) store all stock data as attributes in the featureprop table
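To compare options 1 to 3 concretely, here is a hedged sketch, again using Python's sqlite3 with hypothetical, heavily trimmed tables (real Chado rows need more columns). The feature_stock table follows the three columns listed in option 1; the rows and the helper name linked_stocks are invented for illustration.

```python
import sqlite3

# Hypothetical, simplified tables; real Chado organism, feature and stock
# rows carry many more (often NOT NULL) columns.
SCHEMA = """
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, species TEXT NOT NULL);
CREATE TABLE feature (
    feature_id  INTEGER PRIMARY KEY,
    organism_id INTEGER NOT NULL REFERENCES organism(organism_id),
    uniquename  TEXT NOT NULL
);
CREATE TABLE stock (
    stock_id    INTEGER PRIMARY KEY,
    organism_id INTEGER REFERENCES organism(organism_id),
    uniquename  TEXT NOT NULL
);
-- Option 1: the custom linking table (feature_stock_id, feature_id, stock_id).
CREATE TABLE feature_stock (
    feature_stock_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    stock_id   INTEGER NOT NULL REFERENCES stock(stock_id),
    UNIQUE (feature_id, stock_id)
);
"""

# Each query returns the uniquenames of the stocks linked to one feature_id.
QUERIES = {
    # Option 1: an explicit row in feature_stock.
    "option 1": """SELECT s.uniquename FROM feature_stock fs
                   JOIN stock s ON s.stock_id = fs.stock_id
                   WHERE fs.feature_id = ?""",
    # Option 2: one organism per isolate, so organism_id is the shared key.
    "option 2": """SELECT s.uniquename FROM feature f
                   JOIN stock s ON s.organism_id = f.organism_id
                   WHERE f.feature_id = ?""",
    # Option 3: a naming convention, feature.uniquename = stock.uniquename.
    "option 3": """SELECT s.uniquename FROM feature f
                   JOIN stock s ON s.uniquename = f.uniquename
                   WHERE f.feature_id = ?""",
}

def linked_stocks(conn, option, feature_id):
    """Run one option's query and return the sorted stock uniquenames."""
    return sorted(row[0] for row in conn.execute(QUERIES[option], (feature_id,)))

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Hypothetical rows set up so that all three options find the same stock.
conn.execute("INSERT INTO organism VALUES (1, 'isolate A')")
conn.execute("INSERT INTO feature VALUES (21, 1, 'isolate_A')")
conn.execute("INSERT INTO stock VALUES (5, 1, 'isolate_A')")
conn.execute("INSERT INTO feature_stock VALUES (1, 21, 5)")

for option in QUERIES:
    print(option, linked_stocks(conn, option, 21))  # e.g. option 1 ['isolate_A']
```

In this sketch only option 1 records the link explicitly; options 2 and 3 depend on conventions (one organism per isolate, matching uniquenames) that the schema itself does not enforce.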

We settled on option 4.
Some of the types of attributes we're associating with the isolates are listed here:
sundaram@sundaram-lx % grep '^name:' viral_annotation_pipeline.obo
name: viral_annotation_pipeline
name: blinded_number
name: center_project
name: collection
name: collection_date
name: collection_location
name: contact_address
name: contact_email
name: contact_investigator
name: contact_PI
name: cov_group
name: date_received
name: extraction_date
name: extraction_method
name: genbank_def_line
name: inhabiting_organ
name: library_id
name: passage_history
name: sample
name: start_date
name: taxon
name: submission_date
name: author_list
name: collection_contact_address
name: collection_contact_email
name: collection_contact_name
name: collection_contact_phone
name: collection_start_date
name: genbank_accession

At some point I'd like to move the data into the stock module if that is appropriate.
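Option 4 amounts to one featureprop row per attribute, typed by a cvterm. The sketch below assumes simplified cvterm and featureprop tables (real cvterm rows also need cv_id and dbxref_id) and uses a few of the term names listed above, with values taken from the attached query results; the helper names set_prop and sample_record and the feature uniquename are hypothetical.

```python
import sqlite3

# Hypothetical, simplified tables; in Chado the cvterms would be loaded
# from an ontology (e.g. the .obo file above) and need cv_id/dbxref_id.
SCHEMA = """
CREATE TABLE cvterm  (cvterm_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, uniquename TEXT NOT NULL);
CREATE TABLE featureprop (
    featureprop_id INTEGER PRIMARY KEY,
    feature_id INTEGER NOT NULL REFERENCES feature(feature_id),
    type_id    INTEGER NOT NULL REFERENCES cvterm(cvterm_id),
    value      TEXT,
    rank       INTEGER NOT NULL DEFAULT 0,
    UNIQUE (feature_id, type_id, rank)
);
"""

def set_prop(conn, feature_id, term, value, rank=0):
    """Store one sample attribute as a featureprop row typed by a cvterm."""
    row = conn.execute("SELECT cvterm_id FROM cvterm WHERE name = ?", (term,)).fetchone()
    if row is None:
        raise KeyError(f"no cvterm named {term!r}")
    conn.execute(
        "INSERT INTO featureprop (feature_id, type_id, value, rank) VALUES (?, ?, ?, ?)",
        (feature_id, row[0], value, rank),
    )

def sample_record(conn, feature_id):
    """Gather every featureprop of one feature into a {term: value} dict.

    If a term has several ranks, the highest rank wins in this simple sketch.
    """
    rows = conn.execute(
        """SELECT c.name, fp.value FROM featureprop fp
           JOIN cvterm c ON c.cvterm_id = fp.type_id
           WHERE fp.feature_id = ?
           ORDER BY c.name, fp.rank""",
        (feature_id,),
    )
    return dict(rows)

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO cvterm (name) VALUES (?)",
    [("collection_date",), ("extraction_method",), ("passage_history",)],
)
conn.execute("INSERT INTO feature VALUES (21, 'isolate_A.assembly')")
set_prop(conn, 21, "collection_date", "1994")
set_prop(conn, 21, "extraction_method", "RNAeasy mini kit, Qiagen")
set_prop(conn, 21, "passage_history", "direct sequencing of original sample")

print(sample_record(conn, 21))
```

Reading a sample back is a single join on cvterm, which is the main appeal of this approach; the trade-off is that the attributes describe the sequence feature rather than a separate stock record.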

Jay



--
Jay Sundaram
Bioinformatics Engineer
Informatics Department
J. Craig Venter Institute
9712 Medical Center Rockville, Maryland, 20850
B1/1st FL/99N
301-795-7983
sundaram@...
http://www.jcvi.org/




--
---------------
Sincerely
George

Skype: george_g2
Website: http://biorelated.wordpress.com/


Re: Linking stock with sequence feature

by Jay Sundaram

Hi George,

I've attached some query results.
We're currently storing each isolate/sequence as an 'assembly' feature in
the feature table, along with the usual suspects (the canonical gene
objects: gene, transcript, CDS, exon and polypeptide).
As noted in the attachment, the product name is associated with the
'transcript' feature and the protein sequence is stored in the
'polypeptide' feature record.

Note that the coronavirus genome, a positive-sense, single-stranded RNA,
is translated into a single polyprotein that is then cleaved into the
individual products.
So our (JCVI) representation of this virus in Chado is not accurate, and
we're still thinking about how we will store viral data in the Sequence
module.

Jay



--
-- Contents of organism table.
-- We append strain info to the species.
--
1> select organism_id, substring(genus,1,15), substring(species,1,70),substring(abbreviation,1,30)
from organism
where genus != 'not known'
order by organism_id;
 organism_id  genus           species                                                                abbreviation
 ------------ --------------- ---------------------------------------------------------------------- ------------------------------
            1 Coronavirus     White-tailed deer coronavirus US/OH-WD470/1994                         white-tailed deer coronavirus
            2 Coronavirus     Bovine coronavirus E-AH187-TC                                          bovine coronavirus
            3 Coronavirus     Sambar deer coronavirus US/OH-WD388/1994                               sambar deer coronavirus
            4 Coronavirus     PRCV ISU-1                                                             prcv
            5 Coronavirus     Rat Coronavirus, strain Parker                                         rat coronavirus
            6 Coronavirus     Murine coronavirus strain MHV-3                                        murine coronavirus
            7 Coronavirus     Murine coronavirus strain A59 rBS                                      murine coronavirus
            8 Coronavirus     Calf-passaged Waterbuck coronavirus /US/OH-WD358-GnC/1994              waterbuck coronavirus
            9 Coronavirus     Murine coronavirus strain A59 R13rwt                                   murine coronavirus
           10 Coronavirus     TC-passaged Bovine respiratory coronavirus (Bovine/US/OH-440-TC/1996)  bovine coronavirus
           11 Coronavirus     Murine coronavirus repA59/RJHM                                         murine coronavirus
           12 Coronavirus     Murine coronavirus SA59/RJHM                                           murine coronavirus
           13 Coronavirus     Murine coronavirus strain JHM Arwt                                     murine coronavirus
           14 Coronavirus     Waterbuck coronavirus US/OH-WD358/1994                                 waterbuck coronavirus
           15 Coronavirus     Murine coronavirus strain MHV-1                                        murine coronavirus
           16 Coronavirus     Sambar Deer Coronavirus/US/OH-WD388-TC/1994                            sambar deer coronavirus
           17 Coronavirus     Murine coronavirus SJHM/RA59                                           murine coronavirus
           19 Coronavirus     Murine coronavirus strain JHM.IA                                       murine coronavirus
           20 Coronavirus     Bovine coronavirus E-DB2-TC                                            bovine coronavirus
           21 Coronavirus     Human enteric coronavirus strain 4408                                  human enteric coronavirus
           22 Coronavirus     Waterbuck coronavirus US/OH-WD358-TC/1994                              waterbuck coronavirus
           23 Coronavirus     TGEV Purdue P115                                                       tgev
           24 Coronavirus     Bovine respiratory coronavirus strain AH187                            bovine coronavirus

(23 rows affected)

--
-- Each isolate/sequence is treated as a single organism
--
1> select count(distinct(organism_id))
from feature f, cvterm c
where c.cvterm_id = f.type_id
and c.name = 'assembly';

 -----------
          23

(1 row affected)


--
-- Types of features associated with the first organism
--
1> select substring(c.name,1,20), count(f.type_id)
from feature f, cvterm c
where f.organism_id = 1
and f.type_id = c.cvterm_id
group by c.name;

 feature_type               count
 -------------------- -----------
 exon                           9
 transcript                     9
 CDS                            9
 gene                           9
 assembly                       1
 polypeptide                    9

(6 rows affected)

--
-- The gene, transcript, CDS and exon features are merely placeholder artifacts.
-- The protein sequence is stored in the polypeptide feature record.
--
1> select top 1 f.*
from feature f, cvterm c
where c.name = 'gene'
and c.cvterm_id = f.type_id
and f.organism_id = 1

feature_id:       153
dbxref_id:        36578
organism_id:      1
name:             cva1.gene.844.1
uniquename:       cva1.gene.844.1
residues:         NULL
seqlen:           4092
md5checksum:      NULL
type_id:          23825
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008  3:53PM
timelastmodified: Feb 15 2008  3:53PM

(1 row affected)
1> select top 1 f.*
from feature f, cvterm c
where c.name = 'transcript'
and c.cvterm_id = f.type_id
and f.organism_id = 1

feature_id:       116
dbxref_id:        36574
organism_id:      1
name:             cva1.transcript.842.1
uniquename:       cva1.transcript.842.1
residues:         NULL
seqlen:           837
md5checksum:      NULL
type_id:          23794
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008  3:53PM
timelastmodified: Feb 15 2008  3:53PM

(1 row affected)
1> select top 1 f.*
from feature f, cvterm c
where c.name = 'CDS'
and c.cvterm_id = f.type_id
and f.organism_id = 1

feature_id:       67
dbxref_id:        36574
organism_id:      1
name:             cva1.CDS.842.1
uniquename:       cva1.CDS.842.1
residues:         NULL
seqlen:           837
md5checksum:      NULL
type_id:          23437
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008  3:53PM
timelastmodified: Feb 15 2008  3:53PM

(1 row affected)
1> select top 1 f.*
from feature f, cvterm c
where c.name = 'exon'
and c.cvterm_id = f.type_id
and f.organism_id = 1

feature_id:       78
dbxref_id:        36729
organism_id:      1
name:             cva1.exon.846.1
uniquename:       cva1.exon.846.1
residues:         NULL
seqlen:           330
md5checksum:      NULL
type_id:          23268
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008  3:53PM
timelastmodified: Feb 15 2008  3:53PM

(1 row affected)
1> select top 1 f.*
from feature f, cvterm c
where c.name = 'polypeptide'
and c.cvterm_id = f.type_id
and f.organism_id = 1

feature_id:       194
dbxref_id:        36578
organism_id:      1
name:             cva1.polypeptide.817.1
uniquename:       cva1.polypeptide.817.1
residues:         MAVAYADKPNHFINFPLTQFQGFVLNYKGLQFQLLDEGVDCKIQTAPHISLAMLDIQPEDYRSVDVAIQEVIDDMHWGEGFQIKFENPHILGRCIVLDVKGVEELHDDLVNYIRDKGCVADQSRKWIGHCTIAQLTDAAL
                 SIKENVDFINSMQFNYKITINPSSPARLEIVKLGAEKKDGFYETIASHWMGIRFEYNPPTDKLAMIMGYCCLEVVRKELEEGDLPENDDDAWFKLSYHYENNSWFFRHVYRKSSYFRKSCQNLDCNCLGFYESSVEED*
seqlen:           279
md5checksum:      00000000000000000000000000000710
type_id:          23225
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008  3:53PM
timelastmodified: Feb 15 2008  3:53PM

(1 row affected)


--
-- The product name is associated with the transcript feature
--
1> select substring(c.name,1,20), substring(fp.value,1,50)
from featureprop fp, cvterm c
where c.cvterm_id = fp.type_id
and fp.feature_id = 116 ;

 property_type        value
 -------------------- --------------------------------------------------
 gene_product_name    32 kDa non-structural protein

(1 row affected)


--
-- Sample information being associated with the isolates/sequences
-- (i.e. the features where feature.type_id = cvterm.cvterm_id and cvterm.name = 'assembly')
--
1> select substring(c.name,1,20), substring(fp.value,1,60)
from featureprop fp, cvterm c
where c.cvterm_id = fp.type_id
and fp.feature_id = 21;

 property_type        value
 -------------------- ------------------------------------------------------------
 pi                   somebody, Ohio State University
 molecule_type        rna
 topology             linear
 host                 White-tailed deer
 blinded_number       TCVSP-SAIF-00016
 center_project       TIGR-GCV-16542
 collection_date      1994
 collection_location  The Ohio State University, Ohio, USA
 cov_group            2
 date_received        6/7/2006
 extraction_date      1/6/2006
 extraction_method    RNAeasy mini kit, Qiagen
 genbank_def_line     White-tailed deer coronavirus (White-tailed deer/US/OH-WD470
 library_id           CVDM
 passage_history      direct sequencing of original sample
 author_list          Spiro,D., Halpin,R., Wang,S., Hostetler,J., Overton,L., Tsit
 collection_contact_a The Ohio State University
Food Animal Health Research Progra
 collection_contact_e somebody@...
 collection_contact_n somebody somebody
 collection_contact_p (111)111-1111
 collection_start_dat 2007-01-16 10:32:38 EST

(21 rows affected)
