Any volunteers for doing a Nested Containment List implementation?

by Lincoln Stein

Hi Folks,

This paper describes an indexing algorithm for genome feature databases called Nested Containment Lists. Apparently it is substantially faster than the indexing systems we use in BioPerl's Bio::DB::GFF and Bio::DB::SeqFeature::Store modules and in Chado.

http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl647v1?papetoc

The concept and data structures are quite simple, and I think it would be straightforward to implement this system in MySQL tables. Would anybody be interested in taking this on as a summer project?
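A minimal in-memory sketch of the idea, in Python rather than Perl, assuming half-open [start, end) coordinates. It follows the general NCL scheme: sort by start ascending and end descending, nest each interval in the sublist of an interval that contains it, then binary-search each sublist at query time. It is not the authors' C library, and the names Node, build_ncl, first_ending_after and query are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    start: int  # half-open coordinates: [start, end)
    end: int
    label: str
    sublist: list = field(default_factory=list)  # intervals contained in this one


def build_ncl(intervals):
    """Build a nested containment list from (start, end, label) tuples."""
    # Sort by start ascending, then end descending, so every container
    # is visited before the intervals it contains.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    top = []
    stack = []  # chain of containers that may still enclose later intervals
    for start, end, label in ordered:
        node = Node(start, end, label)
        # Starts never decrease, so the stack top contains this interval
        # exactly when its end reaches at least as far.
        while stack and end > stack[-1].end:
            stack.pop()
        (stack[-1].sublist if stack else top).append(node)
        stack.append(node)
    return top


def first_ending_after(nodes, pos):
    """Binary search for the first node in a sublist whose end is > pos."""
    lo, hi = 0, len(nodes)
    while lo < hi:
        mid = (lo + hi) // 2
        if nodes[mid].end <= pos:
            lo = mid + 1
        else:
            hi = mid
    return lo


def query(nodes, qstart, qend):
    """Return labels of all intervals overlapping [qstart, qend)."""
    hits = []
    # Within one sublist no interval contains another, so starts and ends
    # both increase: binary-search the ends, then scan forward on starts.
    i = first_ending_after(nodes, qstart)
    while i < len(nodes) and nodes[i].start < qend:
        hits.append(nodes[i].label)
        # Search the hit's contained intervals the same way; intervals nested
        # inside a non-overlapping interval cannot overlap the query.
        hits.extend(query(nodes[i].sublist, qstart, qend))
        i += 1
    return hits


ncl = build_ncl([(0, 100, "gene"), (10, 20, "exon1"), (30, 40, "exon2"),
                 (50, 120, "tx2"), (200, 300, "gene2")])
print(query(ncl, 35, 60))  # ['gene', 'exon2', 'tx2']
```

The recursion only descends into sublists of intervals that actually overlap the query, which is presumably where the speedup over scanning every feature in a bin comes from.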

Lincoln

--
Lincoln D. Stein

Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Stacey Fairfield <Stacey.Fairfield@...>

Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724 USA
(516) 367-8380
Assistant: Sandra Michelsen <michelse@...>

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Gmod-gbrowse mailing list
Gmod-gbrowse@...
https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse

Re: [Bioperl-l] Any volunteers for doing a Nested Containment List implementation?

by Chris Fields

It's definitely worth exploring.  It might be nice to sort a few things
out first; for instance, do we want to use their C library, or roll our
own data structures?  And would we want this to retain an interface
similar to that of Bio::DB::SeqFeature::Store?

chris

On Jul 21, 2008, at 7:02 PM, Lincoln Stein wrote:

> Hi Folks,
>
> This paper describes an indexing algorithm for genome feature databases
> called Nested Containment Lists. Apparently it is substantially faster
> than the indexing systems we use in BioPerl for the Bio::DB::GFF,
> Bio::DB::SeqFeature::Store, and Chado.
>
> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl647v1?papetoc
>
> The concept and data structures are quite simple, and I think it would be
> straightforward to implement this system in MySQL tables. Would anybody be
> interested in taking this on as a summer project?
>
> Lincoln
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Marie-Claude Hofmann
College of Veterinary Medicine
University of Illinois Urbana-Champaign






Re: [Bioperl-l] Any volunteers for doing a Nested Containment List implementation?

by Lincoln Stein

I was thinking that the data structures could be reimplemented as MySQL tables. Both of the main data structures are arrays of fixed-width records. Their implementation does a binary search across these arrays, and MySQL can do the same thing with its B-tree indexes.
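To make the table idea concrete, here is a rough sketch that uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption; MySQL DDL would differ slightly). Each interval is one row in a hypothetical ncl_interval table, tagged with the id of the sublist it belongs to, and a composite B-tree index on (sublist_id, end_pos) plays the role of the per-sublist binary search. The table, column and function names are illustrative only.

```python
import sqlite3

SCHEMA = """
CREATE TABLE ncl_interval (
    id         INTEGER PRIMARY KEY,  -- also names the sublist of intervals it contains
    sublist_id INTEGER NOT NULL,     -- 0 = top level, otherwise the container's id
    start_pos  INTEGER NOT NULL,     -- half-open coordinates: [start_pos, end_pos)
    end_pos    INTEGER NOT NULL,
    label      TEXT NOT NULL
);
-- Composite B-tree index standing in for the per-sublist binary search.
CREATE INDEX ncl_by_end ON ncl_interval (sublist_id, end_pos);
"""


def load_ncl(conn, intervals):
    """Store (start, end, label) tuples as a nested containment list."""
    conn.executescript(SCHEMA)
    # Start ascending, end descending: containers come before their contents.
    ordered = sorted(intervals, key=lambda iv: (iv[0], -iv[1]))
    stack = []  # (id, end) of containers that may enclose later intervals
    for row_id, (start, end, label) in enumerate(ordered, 1):
        # Starts never decrease, so pop containers that end too early.
        while stack and end > stack[-1][1]:
            stack.pop()
        parent = stack[-1][0] if stack else 0
        conn.execute("INSERT INTO ncl_interval VALUES (?, ?, ?, ?, ?)",
                     (row_id, parent, start, end, label))
        stack.append((row_id, end))
    conn.commit()


def query_ncl(conn, qstart, qend):
    """Return labels of stored intervals overlapping [qstart, qend)."""
    hits = []
    pending = [0]  # sublists still to search, starting at the top level
    while pending:
        sublist = pending.pop()
        rows = conn.execute(
            "SELECT id, label FROM ncl_interval "
            "WHERE sublist_id = ? AND end_pos > ? AND start_pos < ?",
            (sublist, qstart, qend),
        ).fetchall()
        for row_id, label in rows:
            hits.append(label)
            pending.append(row_id)  # contained intervals may overlap as well
    return hits


conn = sqlite3.connect(":memory:")
load_ncl(conn, [(0, 100, "gene"), (10, 20, "exon1"), (30, 40, "exon2"),
                (50, 120, "tx2"), (200, 300, "gene2")])
print(sorted(query_ncl(conn, 35, 60)))  # ['exon2', 'gene', 'tx2']
```

One caveat: the index seek starts at the first row of a sublist with end_pos > qstart, but the database does not know that start_pos is sorted as well, so it may keep reading to the end of that sublist instead of stopping at the first start_pos >= qend as an array scan can. EXPLAIN QUERY PLAN would show whether the index is actually being used.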

Lincoln

On Mon, Jul 21, 2008 at 9:27 PM, Chris Fields <cjfields@...> wrote:
> It's definitely worth exploring.  Might be nice to sort a few things out,
> for instance, do we want to use their C library, or roll our own data
> structures?  And would we want this to retain a similar interface to
> Bio::DB::SeqFeature::Store?
>
> chris



Re: [Bioperl-l] Any volunteers for doing a Nested Containment List implementation?

by aaron.j.mackey

FYI, MySQL 5.1 and later also have native support for R-tree-based
spatial indexes:

  http://dev.mysql.com/doc/refman/5.1/en/optimizing-spatial-analysis.html

I once spent a day trying to implement this NCL algorithm as a MySQL UDF
(i.e., written entirely in C), but got bogged down and discouraged.  The
PostgreSQL implementation is available from the authors, if I remember
correctly.

-Aaron

bioperl-l-bounces@... wrote on 07/21/2008 08:02:36 PM:

> Hi Folks,
>
> This paper describes an indexing algorithm for genome feature databases
> called Nested Containment Lists. Apparently it is substantially faster than
> the indexing systems we use in BioPerl for the Bio::DB::GFF,
> Bio::DB::SeqFeature::Store, and Chado.
>
> http://bioinformatics.oxfordjournals.org/cgi/content/abstract/btl647v1?papetoc
>
> The concept and data structures are quite simple, and I think it would be
> straightforward to implement this system in MySQL tables. Would anybody be
> interested in taking this on as a summer project?
>
> Lincoln


