Linking stock with sequence feature

Hi,
Please, what would be a good way of linking a stock with a sequence feature entry? I have some features that are associated only with isolates of a single organism, and I am wondering what Chado's best practice is for storing such information.

Thank you,
George

Website: http://biorelated.wordpress.com/

_______________________________________________
Gmod-schema mailing list
Gmod-schema@...
https://lists.sourceforge.net/lists/listinfo/gmod-schema
Re: Linking stock with sequence feature

Hi George,
It looks like nobody ever answered your question. Sorry about that; it's been a busy few weeks. I think you are really asking two questions:

1. How do I link a stock entry to a feature entry? and
2. How do I link a stock entry to a genotypically/phenotypically identifiable sample?

The stock table is designed to link directly to the genotype for the specific sample, so #2 is easy: there is a stock_genotype table to do the linking. To link a stock to a feature, you first need to link the feature to the genotype; once the stock and the genotype are linked as well, the stock and the feature are linked transitively.

Does that make sense? I've not done this myself, but I am pretty sure that is what people do.

Scott

2008/7/4 George Githinji <georgkam@...>:
> Hi,
> Please what would be a good way of linking a stock with sequence feature
> entry?
>
> I have some features which are only associated with isolates of a single
> organism and I am wondering what is chado's best practice for storing such
> information.
>
> Thank you
>
> George
>
> Website: http://biorelated.wordpress.com/

--
Scott Cain, Ph. D. cain.cshl@...
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory
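The transitive link Scott describes (stock to genotype to feature) can be sketched as a three-way join. This is only an illustration, not a tested Chado recipe: it runs against an in-memory SQLite database, the table definitions are heavily simplified stand-ins for the Chado tables (feature_genotype is, as far as I know, the genetic-module table that ties a feature to a genotype), and the sample rows and the helper names `demo_genotype_db` and `features_for_stock` are invented.

```python
import sqlite3

# Simplified stand-ins for Chado tables. Real Chado tables carry more
# columns and constraints; these definitions are assumptions for the demo.
SCHEMA = """
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE genotype (genotype_id INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE stock    (stock_id    INTEGER PRIMARY KEY, uniquename TEXT);
CREATE TABLE feature_genotype (feature_id INTEGER, genotype_id INTEGER);
CREATE TABLE stock_genotype   (stock_id   INTEGER, genotype_id INTEGER);
"""

# stock -> stock_genotype -> feature_genotype -> feature
LINK_QUERY = """
SELECT f.uniquename
FROM stock s
JOIN stock_genotype   sg ON sg.stock_id    = s.stock_id
JOIN feature_genotype fg ON fg.genotype_id = sg.genotype_id
JOIN feature          f  ON f.feature_id   = fg.feature_id
WHERE s.uniquename = ?
ORDER BY f.uniquename
"""


def demo_genotype_db():
    """Build an in-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO feature VALUES (1, 'isolate_A.assembly')")
    conn.execute("INSERT INTO feature VALUES (2, 'isolate_B.assembly')")
    conn.execute("INSERT INTO genotype VALUES (10, 'genotype_A')")
    conn.execute("INSERT INTO stock VALUES (100, 'stock_A')")
    # Only isolate_A is tied to the genotype that stock_A carries.
    conn.execute("INSERT INTO feature_genotype VALUES (1, 10)")
    conn.execute("INSERT INTO stock_genotype VALUES (100, 10)")
    return conn


def features_for_stock(conn, stock_uniquename):
    """Return uniquenames of features reachable from a stock via its genotype."""
    rows = conn.execute(LINK_QUERY, (stock_uniquename,)).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    print(features_for_stock(demo_genotype_db(), "stock_A"))
```

On a real database, the cost of this join depends mostly on whether the genotype_id and stock_id columns of the two link tables are indexed, which is worth checking before ruling the approach out on performance grounds.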
Re: Linking stock with sequence feature

Hi George,
I've attached some query results. We're currently storing the isolate/sequence as an 'assembly' in the feature table, along with the usual suspects (canonical gene objects): gene, transcript, CDS, exon, and polypeptide. As noted in the attachment, the product name is associated with the 'transcript' feature and the protein sequence is stored in the 'polypeptide' feature record.

Note that the coronavirus positive-sense, single-stranded RNA (genomic sequence) is translated into a single polyprotein, and the polyprotein is then cleaved into individual products. So our (JCVI) representation of this virus in Chado is not accurate. We're thinking about how we will store viral data in the Sequence module.

Jay

George Githinji wrote:
> Hi Jay and Scott,
> Thank you so much for your suggested solutions, and Scott for
> reformulating my question. :)
>
> I seem to have followed Jay's solution 2 for the time being, but was
> feeling like it may not be the best solution.
> Cain's transitive linking sounds good, but I am worried about the time it
> would take to execute some queries.
>
> As for Jay's option (4):
> if you store all stock data as attributes in the featureprop, what do
> you store as features and what do you store as organisms?
>
> Please bear with my naivety
>
> Thanks
>
> On Wed, Jul 23, 2008 at 4:48 PM, Jay Sundaram <sundaram@...> wrote:
>
> > Hi George,
> >
> > We also wanted to link the isolate (sequence) to stock data for
> > our viral projects.
> > Since we did not have genotype information, the following
> > strategies were considered:
> >
> > 1) create our own feature_stock table:
> >      feature_stock_id
> >      feature_id
> >      stock_id
> >
> > 2) treat each isolate as a single organism and then link
> >    feature.organism_id, organism.organism_id, stock.organism_id
> >
> > 3) link feature.uniquename to stock.uniquename
> >
> > 4) store all stock data as attributes in the featureprop table
> >
> > We settled on option 4.
> > Some of the types of attributes we're associating with the
> > isolates are listed here:
> >
> > sundaram@sundaram-lx % grep '^name:' viral_annotation_pipeline.obo
> > name: viral_annotation_pipeline
> > name: blinded_number
> > name: center_project
> > name: collection
> > name: collection_date
> > name: collection_location
> > name: contact_address
> > name: contact_email
> > name: contact_investigator
> > name: contact_PI
> > name: cov_group
> > name: date_received
> > name: extraction_date
> > name: extraction_method
> > name: genbank_def_line
> > name: inhabiting_organ
> > name: library_id
> > name: passage_history
> > name: sample
> > name: start_date
> > name: taxon
> > name: submission_date
> > name: author_list
> > name: collection_contact_address
> > name: collection_contact_email
> > name: collection_contact_name
> > name: collection_contact_phone
> > name: collection_start_date
> > name: genbank_accession
> >
> > At some point I'd like to move the data into the stock module if
> > that is appropriate.
> >
> > Jay
>
> --
> ---------------
> Sincerely
> George
>
> Skype: george_g2
> Website: http://biorelated.wordpress.com/

--
Jay Sundaram
Bioinformatics Engineer
Informatics Department
J. Craig Venter Institute
9712 Medical Center Rockville, Maryland, 20850
B1/1st FL/99N
301-795-7983
sundaram@...
http://www.jcvi.org/

--
-- Contents of organism table.
-- We append strain info to the species.
--
1> select organism_id, substring(genus,1,15), substring(species,1,70), substring(abbreviation,1,30) from organism where genus != 'not known' order by organism_id;

organism_id
----------- --------------- ---------------------------------------------------------------------- ------------------------------
          1 Coronavirus     White-tailed deer coronavirus US/OH-WD470/1994                         white-tailed deer coronavirus
          2 Coronavirus     Bovine coronavirus E-AH187-TC                                          bovine coronavirus
          3 Coronavirus     Sambar deer coronavirus US/OH-WD388/1994                               sambar deer coronavirus
          4 Coronavirus     PRCV ISU-1                                                             prcv
          5 Coronavirus     Rat Coronavirus, strain Parker                                         rat coronavirus
          6 Coronavirus     Murine coronavirus strain MHV-3                                        murine coronavirus
          7 Coronavirus     Murine coronavirus strain A59 rBS                                      murine coronavirus
          8 Coronavirus     Calf-passaged Waterbuck coronavirus /US/OH-WD358-GnC/1994              waterbuck coronavirus
          9 Coronavirus     Murine coronavirus strain A59 R13rwt                                   murine coronavirus
         10 Coronavirus     TC-passaged Bovine respiratory coronavirus (Bovine/US/OH-440-TC/1996)  bovine coronavirus
         11 Coronavirus     Murine coronavirus repA59/RJHM                                         murine coronavirus
         12 Coronavirus     Murine coronavirus SA59/RJHM                                           murine coronavirus
         13 Coronavirus     Murine coronavirus strain JHM Arwt                                     murine coronavirus
         14 Coronavirus     Waterbuck coronavirus US/OH-WD358/1994                                 waterbuck coronavirus
         15 Coronavirus     Murine coronavirus strain MHV-1                                        murine coronavirus
         16 Coronavirus     Sambar Deer Coronavirus/US/OH-WD388-TC/1994                            sambar deer coronavirus
         17 Coronavirus     Murine coronavirus SJHM/RA59                                           murine coronavirus
         19 Coronavirus     Murine coronavirus strain JHM.IA                                       murine coronavirus
         20 Coronavirus     Bovine coronavirus E-DB2-TC                                            bovine coronavirus
         21 Coronavirus     Human enteric coronavirus strain 4408                                  human enteric coronavirus
         22 Coronavirus     Waterbuck coronavirus US/OH-WD358-TC/1994                              waterbuck coronavirus
         23 Coronavirus     TGEV Purdue P115                                                       tgev
         24 Coronavirus     Bovine respiratory coronavirus strain AH187                            bovine coronavirus

(23 rows affected)

--
-- Each isolate/sequence is treated as a single organism
--
1> select count(distinct(organism_id)) from feature f, cvterm c where c.cvterm_id = f.type_id and c.name = 'assembly';

-----------
         23

(1 row affected)

--
-- Types of features associated with the first organism
--
1> select substring(c.name,1,20), count(f.type_id) from feature f, cvterm c where f.organism_id = 1 and f.type_id = c.cvterm_id group by c.name;

-------------------- -----------
exon                           9
transcript                     9
CDS                            9
gene                           9
assembly                       1
polypeptide                    9

(6 rows affected)

--
-- The gene, transcript, CDS, exon are merely placeholder artifacts.
-- The protein sequence is stored in the polypeptide feature record.
--
1> select top 1 f.* from feature f, cvterm c where c.name = 'gene' and c.cvterm_id = f.type_id and f.organism_id = 1

feature_id:       153
dbxref_id:        36578
organism_id:      1
name:             cva1.gene.844.1
uniquename:       cva1.gene.844.1
residues:         NULL
seqlen:           4092
md5checksum:      NULL
type_id:          23825
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008 3:53PM
timelastmodified: Feb 15 2008 3:53PM

(1 row affected)

1> select top 1 f.* from feature f, cvterm c where c.name = 'transcript' and c.cvterm_id = f.type_id and f.organism_id = 1

feature_id:       116
dbxref_id:        36574
organism_id:      1
name:             cva1.transcript.842.1
uniquename:       cva1.transcript.842.1
residues:         NULL
seqlen:           837
md5checksum:      NULL
type_id:          23794
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008 3:53PM
timelastmodified: Feb 15 2008 3:53PM

(1 row affected)

1> select top 1 f.* from feature f, cvterm c where c.name = 'CDS' and c.cvterm_id = f.type_id and f.organism_id = 1

feature_id:       67
dbxref_id:        36574
organism_id:      1
name:             cva1.CDS.842.1
uniquename:       cva1.CDS.842.1
residues:         NULL
seqlen:           837
md5checksum:      NULL
type_id:          23437
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008 3:53PM
timelastmodified: Feb 15 2008 3:53PM

(1 row affected)

1> select top 1 f.* from feature f, cvterm c where c.name = 'exon' and c.cvterm_id = f.type_id and f.organism_id = 1

feature_id:       78
dbxref_id:        36729
organism_id:      1
name:             cva1.exon.846.1
uniquename:       cva1.exon.846.1
residues:         NULL
seqlen:           330
md5checksum:      NULL
type_id:          23268
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008 3:53PM
timelastmodified: Feb 15 2008 3:53PM

(1 row affected)

1> select top 1 f.* from feature f, cvterm c where c.name = 'polypeptide' and c.cvterm_id = f.type_id and f.organism_id = 1

feature_id:       194
dbxref_id:        36578
organism_id:      1
name:             cva1.polypeptide.817.1
uniquename:       cva1.polypeptide.817.1
residues:         MAVAYADKPNHFINFPLTQFQGFVLNYKGLQFQLLDEGVDCKIQTAPHISLAMLDIQPEDYRSVDVAIQEVIDDMHWGEGFQIKFENPHILGRCIVLDVKGVEELHDDLVNYIRDKGCVADQSRKWIGHCTIAQLTDAAL
                  SIKENVDFINSMQFNYKITINPSSPARLEIVKLGAEKKDGFYETIASHWMGIRFEYNPPTDKLAMIMGYCCLEVVRKELEEGDLPENDDDAWFKLSYHYENNSWFFRHVYRKSSYFRKSCQNLDCNCLGFYESSVEED*
seqlen:           279
md5checksum:      00000000000000000000000000000710
type_id:          23225
is_analysis:      0
is_obsolete:      0
timeaccessioned:  Feb 15 2008 3:53PM
timelastmodified: Feb 15 2008 3:53PM

(1 row affected)

--
-- The product name is associated with the transcript feature
--
1> select substring(c.name,1,20), substring(fp.value,1,50) from featureprop fp, cvterm c where c.cvterm_id = fp.type_id and fp.feature_id = 116;

-------------------- --------------------------------------------------
gene_product_name    32 kDa non-structural protein

(1 row affected)

--
-- Sample information being associated with the isolates/sequences
-- (i.e. the features where feature.type_id = cvterm.cvterm_id and cvterm.name = 'assembly')
--
1> select substring(c.name,1,20), substring(fp.value,1,60) from featureprop fp, cvterm c where c.cvterm_id = fp.type_id and fp.feature_id = 21;

-------------------- ------------------------------------------------------------
pi                   somebody, Ohio State University
molecule_type        rna
topology             linear
host                 White-tailed deer
blinded_number       TCVSP-SAIF-00016
center_project       TIGR-GCV-16542
collection_date      1994
collection_location  The Ohio State University, Ohio, USA
cov_group            2
date_received        6/7/2006
extraction_date      1/6/2006
extraction_method    RNAeasy mini kit, Qiagen
genbank_def_line     White-tailed deer coronavirus (White-tailed deer/US/OH-WD470
library_id           CVDM
passage_history      direct sequencing of original sample
author_list          Spiro,D., Halpin,R., Wang,S., Hostetler,J., Overton,L., Tsit
collection_contact_a The Ohio State University Food Animal Health Research Progra
collection_contact_e somebody@...
collection_contact_n somebody somebody
collection_contact_p (111)111-1111
collection_start_dat 2007-01-16 10:32:38 EST

(21 rows affected)
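Option 4 from the thread (sample data stored as featureprop rows on the 'assembly' feature) can be sketched as follows. This is a minimal illustration against an in-memory SQLite database, not the JCVI setup: the table definitions are simplified stand-ins for the Chado tables, the feature uniquename `example.assembly.1` and the helper names are hypothetical, and only three of the attribute values shown in the attachment are loaded.

```python
import sqlite3

# Simplified stand-ins for Chado tables; the real tables have more columns
# (for example, featureprop also has a rank). Assumed for this demo only.
SCHEMA = """
CREATE TABLE cvterm      (cvterm_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature     (feature_id INTEGER PRIMARY KEY, uniquename TEXT,
                          type_id INTEGER);
CREATE TABLE featureprop (feature_id INTEGER, type_id INTEGER, value TEXT);
"""

# Fetch every property of one feature, restricted to 'assembly' features.
PROPS_QUERY = """
SELECT pt.name, fp.value
FROM featureprop fp
JOIN cvterm  pt ON pt.cvterm_id = fp.type_id   -- property type
JOIN feature f  ON f.feature_id = fp.feature_id
JOIN cvterm  ft ON ft.cvterm_id = f.type_id    -- feature type
WHERE ft.name = 'assembly' AND f.uniquename = ?
ORDER BY pt.name
"""


def demo_featureprop_db():
    """Build an in-memory database holding one isolate and three attributes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cvterm VALUES (?, ?)",
        [(1, "assembly"), (2, "host"), (3, "collection_date"),
         (4, "extraction_method")],
    )
    # Hypothetical uniquename; the thread does not give one for this feature.
    conn.execute("INSERT INTO feature VALUES (21, 'example.assembly.1', 1)")
    conn.executemany(
        "INSERT INTO featureprop VALUES (21, ?, ?)",
        [(2, "White-tailed deer"), (3, "1994"),
         (4, "RNAeasy mini kit, Qiagen")],
    )
    return conn


def sample_attributes(conn, uniquename):
    """Return {attribute name: value} for one assembly feature."""
    return dict(conn.execute(PROPS_QUERY, (uniquename,)).fetchall())


if __name__ == "__main__":
    for name, value in sample_attributes(
            demo_featureprop_db(), "example.assembly.1").items():
        print(f"{name}: {value}")
```

The trade-off of this layout is that every sample attribute is an untyped text value keyed by a cvterm, so dates and controlled values have to be validated by the loader rather than by the schema.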