Taxonomy Schema Proposal

View: New views
20 Messages — Rating Filter:   Alert me  
< Prev | 1 - 2 - 3 | Next >

Taxonomy Schema Proposal

by Ryan Boren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Putting aside the rest of the debate, let's talk schema.  Here's a
proposed three table solution where we have terms (tags/categories),
taxonomies (link categories, post categories, post tags), and objects
(posts/pages and links).

// terms contains the actual
categories/tags/terms/classifiers/whatevers.  It stores only ID, name,
and slug.
CREATE TABLE $wpdb->terms (
 term_id bigint(20) NOT NULL auto_increment,
 term_name varchar(55) NOT NULL default '',
 term_slug varchar(200) NOT NULL default '',
 PRIMARY KEY  (term_ID),
 KEY term_slug (term_slug)
);
// term_taxonomy puts a term in the context of a taxonomy (link
category, post category, or tag).  Hierarchy is put here as well as
counts.
CREATE TABLE $wpdb->term_taxonomy (
 term_id bigint(20) NOT NULL default '0',
 taxonomy varchar(20) NOT NULL default '0',
 term_description longtext NOT NULL,
 parent bigint(20) NOT NULL default '0',
 count bigint(20) NOT NULL default '0'
);
// term_relationships relates a term to a post or link or undeclared
future object thingy.  The relationship is placed within the context
of a given taxonomy.
CREATE TABLE $wpdb->term_relationships (
 object_id bigint(20) NOT NULL default '0',
 term_id bigint(20) NOT NULL default '0',
 taxonomy varchar(20) NOT NULL default '0',
 PRIMARY KEY  (object_ID),
);

I haven't thought about the best keying yet.  TBD.

"taxonomy" is a string that can be 'post-category', 'link-category',
or 'post-tag'.  Plugins can add their own taxonomy types.  This could
be an id that relates to a taxonomy table instead of a stand-alone
string, but another table doesn't seem necessary. Another table would
mean having the taxonomy string in one place rather than two, but I
didn't drink that much of the normalization punch back in school.

This approach maintains one id for a given term name while separating
out the troublesome hierarchy and count fields into a table that can
maintain them in relation to specific taxonomies.  This avoids the
troubles encountered with the current category_parent field when link
and post categories share a table.  It also avoids having to add a
count field for every taxonomy type.  This schema seems well-suited to
plugins.  Plugins can easily add their own taxonomies. (Ryan's One
True Folksonomic Reverie Plugin here we come.) Generic API along the
lines of get_term($id, $taxonomy) and
get_terms($taxonomy[|$taxonomy]...) could be simple and powerful.  The
different taxonomies could be kept separate or joined in interesting
ways.

Ryan
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Mark Jaquith :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Apr 16, 2007, at 3:19 AM, Ryan Boren wrote:

> Putting aside the rest of the debate, let's talk schema.  Here's a
> proposed three table solution where we have terms (tags/categories),
> taxonomies (link categories, post categories, post tags), and objects
> (posts/pages and links).

I've been thinking about this for the past couple of hours since you  
pitched it to me, and I've yet to find fault with it.  It appears to  
deliver everything Matt's been trying to shoehorn into our current  
structure, but has several huge advantages.  The queries are sane,  
there aren't a zillion foo_count columns, you can have completely  
different Link Category and Post Category trees, and (this is a BIG  
one), it is easily and cleanly extendible via plugins.

Looking forward to the response on this.  Hope crowd from the other  
thread gives this a serious look.

To all the people who want pages to be categorizable, with this  
schema, a plugin can do it *without adding any new tables* and  
*without screwing up existing taxonomies*.  That should get some  
people interested.

--
Mark Jaquith
http://markjaquith.com/

Covered Web Services
http://coveredwebservices.com/


_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Peter Westwood :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


On Mon, April 16, 2007 9:05 am, Mark Jaquith wrote:

> On Apr 16, 2007, at 3:19 AM, Ryan Boren wrote:
>
>> Putting aside the rest of the debate, let's talk schema.  Here's a
>> proposed three table solution where we have terms (tags/categories),
>> taxonomies (link categories, post categories, post tags), and objects
>> (posts/pages and links).
>
> I've been thinking about this for the past couple of hours since you
> pitched it to me, and I've yet to find fault with it.  It appears to
> deliver everything Matt's been trying to shoehorn into our current
> structure, but has several huge advantages.  The queries are sane,
> there aren't a zillion foo_count columns, you can have completely
> different Link Category and Post Category trees, and (this is a BIG
> one), it is easily and cleanly extendible via plugins.
>
> Looking forward to the response on this.  Hope crowd from the other
> thread gives this a serious look.
>
> To all the people who want pages to be categorizable, with this
> schema, a plugin can do it *without adding any new tables* and
> *without screwing up existing taxonomies*.  That should get some
> people interested.
>

This new schema proposal looks really good.

I think we need to ensure we expose two levels of API.

A low level api that works directly with these individual tables should be
provided to allow plugins that extend it to add new taxonomies types
without needing to interact with the db directly.

e.g.
wp_add_term(term_name,term_slug)
wp_add_term_to_taxonomy(term_name,taxonomy,term_description,opt_parent)
wp_relate_term_to_object(term_name,taxonomy,object_id)


We can then write the tag/catgeory/link_category API on top of this low
level API so that they do not require any direct db access.


westi
--
Peter Westwood <peter.westwood@...>
http://blog.ftwr.co.uk
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Jamie Talbot :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ryan Boren wrote:

> Putting aside the rest of the debate, let's talk schema.  Here's a
> proposed three table solution where we have terms (tags/categories),
> taxonomies (link categories, post categories, post tags), and objects
> (posts/pages and links).
>
> // terms contains the actual
> categories/tags/terms/classifiers/whatevers.  It stores only ID, name,
> and slug.
> CREATE TABLE $wpdb->terms (
> term_id bigint(20) NOT NULL auto_increment,
> term_name varchar(55) NOT NULL default '',
> term_slug varchar(200) NOT NULL default '',
> PRIMARY KEY  (term_ID),
> KEY term_slug (term_slug)
> );
> // term_taxonomy puts a term in the context of a taxonomy (link
> category, post category, or tag).  Hierarchy is put here as well as
> counts.
> CREATE TABLE $wpdb->term_taxonomy (
> term_id bigint(20) NOT NULL default '0',
> taxonomy varchar(20) NOT NULL default '0',
> term_description longtext NOT NULL,
> parent bigint(20) NOT NULL default '0',
> count bigint(20) NOT NULL default '0'
> );
> // term_relationships relates a term to a post or link or undeclared
> future object thingy.  The relationship is placed within the context
> of a given taxonomy.
> CREATE TABLE $wpdb->term_relationships (
> object_id bigint(20) NOT NULL default '0',
> term_id bigint(20) NOT NULL default '0',
> taxonomy varchar(20) NOT NULL default '0',
> PRIMARY KEY  (object_ID),
> );
>
> I haven't thought about the best keying yet.  TBD.

This is a vast improvement on what we have, and a very nice way of providing flexibility for the
future.  Nice job Ryan!  I agree that we probably don't need another table for taxonomy types.  I'm
wondering whether we could get away with only writing taxonomy in one place anyway though:

($wpdb->terms remains the same)

CREATE TABLE $wpdb->term_taxonomy (
term_taxonomy_id bigint(20) NOT NULL default '0',
term_id bigint(20) NOT NULL default '0',
taxonomy varchar(20) NOT NULL default '0',
term_description longtext NOT NULL,
parent bigint(20) NOT NULL default '0',
count bigint(20) NOT NULL default '0'
PRIMARY KEY (term_taxonomy_id)
);

CREATE TABLE $wpdb->term_relationships (
object_id bigint(20) NOT NULL default '0',
term_taxonomy_id bigint(20) NOT NULL default '0'
PRIMARY KEY  (object_ID)
);

Basically the same, except that the term_relationships and term_taxonomy tables are related by an
integer, instead of a tuple.  We're going to have to join the two tables to get taxonomy data
anyway, and nothing beats joining on an int.  We also only have to store the taxonomy field strings
once.  Net result of the same amount of fields, but a bit cleaner.  Thoughts?

Cheers,

Jamie.

- --
http://jamietalbot.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGI0i/rovxfShShFARAtsPAJ4oxynYk6ufMOmvcOI1bknBWSsMtgCfcAfH
9KeGX3dA0Z/b6SXBxw7ksq8=
=Kg5l
-----END PGP SIGNATURE-----
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Jamie Talbot :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Jamie Talbot wrote:
> CREATE TABLE $wpdb->term_relationships (
> object_id bigint(20) NOT NULL default '0',
> term_taxonomy_id bigint(20) NOT NULL default '0'
> PRIMARY KEY  (object_ID)
> );
>

While I'm thinking about this, is the object_id from this table expected to always come from
wp_posts?  Ryan mentioned this would also be for associating links.  If so, object_id can't be a
primary key as we'll be using ids from different tables.  Better perhaps to have a primary key
across the tuple, which will always be unique.  We'd also have to make sure we specified the
taxonomy in every query to make sure we didn't get back the wrong type inadvertently.

wp_links
1 http://wordpress.org WordPress ...
2 http://photomatt.net Matt ...

wp_posts
1 ... Hello World hello-world ...

SELECT tr.* FROM $wpdb->term_taxonomy AS t INNER JOIN $wpdb->term_relationships AS tr USING
term_taxonomy_id WHERE t.object_id = 1;

Could potentially return post_category, post_tag or link_category taxonomy data.  Specifying the
taxonomy will avoid this (assuming we always store the taxonomy correctly, and we've got bigger
problems if we don't...)

SELECT tr.* FROM $wpdb->term_taxonomy AS t INNER JOIN $wpdb->term_relationships AS tr USING
term_taxonomy_id WHERE t.object_id = 1 AND tr.taxonomy = 'post_category';

Would only return the post_category data, which you would assume would only ever be associated with
posts/pages.  Of course, if there is the possibility that we separate out wp_posts/wp_pages in the
future, we might have to revisit that.

Just throwing out some thoughts.

Cheers,

Jamie.

- --
http://jamietalbot.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGI1EzrovxfShShFARArEYAJwJQPyTmnp/zNfdBI6Nl+SFr5Sk/gCgiRtq
Ib74hUp51CwoGwqs4GP1jLs=
=gcN/
-----END PGP SIGNATURE-----
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Ryan Boren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 4/16/07, Jamie Talbot <wphackers@...> wrote:
> Basically the same, except that the term_relationships and term_taxonomy tables are related by an
> integer, instead of a tuple.  We're going to have to join the two tables to get taxonomy data
> anyway, and nothing beats joining on an int.  We also only have to store the taxonomy field strings
> once.  Net result of the same amount of fields, but a bit cleaner.  Thoughts?

I had it like that in an earlier iteration, but how do plugins avoid
battling for new taxonomy ints?  A string provides less chance of
collision.  An int would be better for a join, however, so maybe we
should go ahead and add a taxonomy table that will act as a broker for
plugins to get a taxonomy id.

Ryan
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Ryan Boren :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 4/16/07, Jamie Talbot <wphackers@...> wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Jamie Talbot wrote:
> > CREATE TABLE $wpdb->term_relationships (
> > object_id bigint(20) NOT NULL default '0',
> > term_taxonomy_id bigint(20) NOT NULL default '0'
> > PRIMARY KEY  (object_ID)
> > );
> >
>
> While I'm thinking about this, is the object_id from this table expected to always come from
> wp_posts?  Ryan mentioned this would also be for associating links.  If so, object_id can't be a
> primary key as we'll be using ids from different tables.  Better perhaps to have a primary key
> across the tuple, which will always be unique.  We'd also have to make sure we specified the
> taxonomy in every query to make sure we didn't get back the wrong type inadvertently.

Indeed, object_ID can be a post id or a link id or something provided
by a plugin.   Definitely need to keep that in mind when laying out
the keys.

Ryan
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Steve Lewis-6 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 4/16/07, Ryan Boren <ryan@...> wrote:

>
> // terms contains the actual
> categories/tags/terms/classifiers/whatevers.  It stores only ID, name,
> and slug.
> CREATE TABLE $wpdb->terms (
> term_id bigint(20) NOT NULL auto_increment,
> term_name varchar(55) NOT NULL default '',
> term_slug varchar(200) NOT NULL default '',
> PRIMARY KEY  (term_ID),
> KEY term_slug (term_slug)
> );
> // term_taxonomy puts a term in the context of a taxonomy (link
> category, post category, or tag).  Hierarchy is put here as well as
> counts.
> CREATE TABLE $wpdb->term_taxonomy (
> term_id bigint(20) NOT NULL default '0',
> taxonomy varchar(20) NOT NULL default '0',
> term_description longtext NOT NULL,
> parent bigint(20) NOT NULL default '0',
> count bigint(20) NOT NULL default '0'
> );
> // term_relationships relates a term to a post or link or undeclared
> future object thingy.  The relationship is placed within the context
> of a given taxonomy.
> CREATE TABLE $wpdb->term_relationships (
> object_id bigint(20) NOT NULL default '0',
> term_id bigint(20) NOT NULL default '0',
> taxonomy varchar(20) NOT NULL default '0',
> PRIMARY KEY  (object_ID),
> );
>

I would like to ask you to expound on three use cases:
 - Aliases  (for instance: products -> projects)
 - Lookup by slug (myblog.com/projects)
 - Both together (myblog.com/products)

I don't want to presume too much about your intentions.

--
SteveL
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Jennifer Hodgdon :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ryan Boren wrote:

> // terms contains the actual
> categories/tags/terms/classifiers/whatevers.  It stores only ID, name,
> and slug.
> CREATE TABLE $wpdb->terms (
> term_id bigint(20) NOT NULL auto_increment,
> term_name varchar(55) NOT NULL default '',
> term_slug varchar(200) NOT NULL default '',
> PRIMARY KEY  (term_ID),
> KEY term_slug (term_slug)
> );

Can I suggest making the term_name field (or whatever it will be
called) either wider or a TEXT field? Multi-lingual plugins need more
space than 55 characters for category names and tags (they have to
store 2 or more versions of the text, plus some kind of tags
specifying languages). 200 characters is sufficient for most people
using multi-lingual plugins I think (see also
http://trac.wordpress.org/ticket/3729, reported by another user).

For comparison, link names are VARCHAR(255), and post titles are TEXT
(unlimited). I am not certain which is better VARCHAR or TEXT in terms
of database efficiency, in the case where (for most users) the actual
text is probably about 20-40 characters... will let the DB optimizers
weigh in on that one.

     Jennifer
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Jamie Talbot :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Ryan Boren wrote:

> On 4/16/07, Jamie Talbot <wphackers@...> wrote:
>> Basically the same, except that the term_relationships and
>> term_taxonomy tables are related by an
>> integer, instead of a tuple.  We're going to have to join the two
>> tables to get taxonomy data
>> anyway, and nothing beats joining on an int.  We also only have to
>> store the taxonomy field strings
>> once.  Net result of the same amount of fields, but a bit cleaner.
>> Thoughts?
>
> I had it like that in an earlier iteration, but how do plugins avoid
> battling for new taxonomy ints?  A string provides less chance of
> collision.  An int would be better for a join, however, so maybe we
> should go ahead and add a taxonomy table that will act as a broker for
> plugins to get a taxonomy id.
>

You're right that it might be a bit trickier, especially as we can't bank on subqueries (yet).  But,
we can still minimise queries, using INSERT ... SELECT.

Say I want to add a post to a category, cities.  I already know the post_id, which is stored in
$post_id.  Internally, this might look like the following semi-pseudocode, though obviously we'd
break this up into functions:

// First, try to add the tag if it doesn't already exist.
// I'm assuming we'll have a unique key on slug, right?  This won't generate a duplicate.
// As of 4.1, we could use ON DUPLICATE KEY UPDATE, but that will have to wait.
REPLACE INTO $wpdb->terms (term_name, term_slug) VALUES ('City', 'city')

// Now add the term_taxonomy entry.
if ($count = $wpdb->query("SELECT tt.count FROM $wpdb->term_taxonomy AS tt INNER JOIN $wpdb->terms
AS t ON tt.term_id = t.term_id WHERE tt.taxonomy = 'post_category' AND t.term_slug = 'city')) {
        // We'll be updating whatever's there, and incrementing the count.  Not a huge fan of
        // storing counts, by the way, but accept that there might be valid optimisation reasons for //
doing so.
        UPDATE $wpdb->term_taxonomy SET desciption = 'Cities visited', count = $count + 1
} else {
        // There was no existing entry for this taxonomy type, so add a new one.  Parent kept as 0 // for
now.
        INSERT INTO $wpdb->term_taxonomy SELECT term_id, 'post_category', 'Cities I\'ve visited', 0, 1 FROM
$wpdb->terms WHERE term_slug = 'city'
}

// Finally, associate the object with the new taxonomy term.  This query should always find
// the term_taxonomy_id correctly as we've just inserted or updated it.
INSERT INTO $wpdb->term_relationships (object_id, term_taxonomy_id) SELECT $post_id,
tt.term_taxonomy_id FROM $wpdb->term_taxonomy AS tt INNER JOIN $wpdb->terms AS t ON (t.term_id =
tt.term_id AND tt.taxonomy = 'post_category' AND t.term_slug = 'city')

You could of course rework this to add an extra query to determine the recently inserted term_id and
remove the need for the INSERT ... SELECTS, - not sure which would profile faster.  With correctly
applied indices and field sizes that match, I suspect it may be faster as above, even with many
terms.  I'm pretty sure that either of those approaches would be faster than a string join though.
At the end of the day, with 3 tables you're always going to need to do 3 or 4 queries.  But seeing
as this happens on the admin side, and we aren't overly worried about the performance of the admin
side, that shouldn't be a problem.

Cheers,

Jamie.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGJALLrovxfShShFARAtyRAJ4872/WgADBHA+EQ9WoLQOt0vkyzQCeJ+Zo
kED49DvXV4SaBSJZVoZ7Qgc=
=VvDi
-----END PGP SIGNATURE-----
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Jamie Talbot :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Lewis wrote:
> > On 4/16/07, Ryan Boren <ryan@...> wrote:
> > I would like to ask you to expound on three use cases:
> > - Aliases  (for instance: products -> projects)
> > - Lookup by slug (myblog.com/projects)
> > - Both together (myblog.com/products)
> >
> > I don't want to presume too much about your intentions.

In the trac ticket, it was initially thought that synonyms could be a feature added by a plugin.
Regardless of whether that ends up being the case or not, one standard way of representing groups
like this is to do a self join.  This might mean adding a column to the terms table called
'term_group' (which would also allow for synonyms of categories, which could be mighty handy for
multilingual plugins).

CREATE TABLE $wpdb->terms (
 term_id bigint(20) NOT NULL auto_increment,
 term_name varchar(55) NOT NULL default '',
 term_slug varchar(200) NOT NULL default '',
 term_group bigint(10) NOT NULL default 0
 PRIMARY KEY  (term_ID),
 KEY term_slug (term_slug)
);

1 Products products 1
2 Cities cities 0
3 Projects projects 1


// Returns all synonyms of 'products' (including 'products').
SELECT t2.* FROM $wpdb->terms AS t1 INNER JOIN $wpdb->terms AS t2 ON t1.term_group = t2.term_group
WHERE t1.term_group != 0 AND t1.slug = 'products'.

I also don't presume that this way will be chosen, but it is a possible solution.  I have
implemented code in a manner similar to this to store the relationships between post translations -
each post is in a translation group, from which you can find out all the other translations.  Since
I got the initial syntax worked out, it's performed pretty well.

I'm not certain it is a core feature either - do other tagging systems employ aliases?

Lookup by slug is easy.  Lookup by both is still easy, but a bit more inefficient as you'd have to
do an IN(term_ids) based on all the returned tags from the above query.  The alternative would be to
have a master slave tagging system, which sounds suspiciously like category parents and don't really
fit with the whole fast and loose tag thing.

Cheers,

Jamie.

- --
http://jamietalbot.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGJAkVrovxfShShFARAsgJAJwOr90zgpDFjfEok76wRc5Lj+6jKgCeI+mU
P5JJeQt5RyzgGbvFm4GBy+E=
=Jg3Q
-----END PGP SIGNATURE-----
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Parent Message unknown Re: Taxonomy Schema Proposal

by ozgreg-3 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Hello Everyone :-)

Overall I do agree with Jamie and the KISS principle is always key. In G2 (not that it's a reference or anything), we are doing the tags much like what Jamie explained and it's proven VERY performing so far.

We have 1 table that had the tags (tagID & tagName) and another table that does the mapping between the tags and then item being tagged.

It seems simplistic at first but overall it makes the all thing perform and I can tell you how much it's critical that things scale in todays world .... everybody is after fetting more hist, more post, more tags , well more .. so it's not efficient, people tends to look at other solutions/alternatives.

So teh design originally proposed seem to have a couple things to change (from my point of view :-)):

- I would not repeat the taxonomy field
- I would only use 2 tables (not three)
- I would try not to keep the non-tags with the tags.

Is it not more efficient in terms of coding to have a table that sole purpose is to hold tags ? this way a simple SELECT * gets all you need and only that :-)

Anyways, I just wanted to add a couple thoughs to the mix, I hope they help a bit :-)

Termitenshort




-------------------- m2f --------------------

Gallery2 Embedded Forums (http://www.galleryembedded.com/forums/)
-------------------- m2f --------------------


_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Knut-Olav Hoven :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On Monday 16 April 2007 09:19:56 Ryan Boren wrote:
> Putting aside the rest of the debate, let's talk schema.  Here's a
> proposed three table solution where we have terms (tags/categories),
> taxonomies (link categories, post categories, post tags), and objects
> (posts/pages and links).

The schema you propose could make it easier for plugin developers.

It would indeed require a lot of rewrites in the core to make it support this
schema. Maybe it should be implemented for use by plugins only, at least to
begin with. A new API for plugins would not break current functionality.

In time we could port the (post|link) categories and tags to use this, if the
schema works as intended (for the plugin developers). We should reserve some
names to prevent breaking plugins when/if we deside on porting cats/tags.



--
Knut-Olav Hoven
Systemutvikler               mob: +47 986 71 700
Linpro AS                    http://www.linpro.no/
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Steve Lewis-6 :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 4/16/07, Jamie Talbot <wphackers@...> wrote:
>
>
> In the trac ticket, it was initially thought that synonyms could be a
> feature added by a plugin.


At this point the API and upgrade patterns suggest that adding columns to
existing tables is a dangerous thing for plugins to attempt.  I wanted to at
least air that here.  It could be addressed a number of ways, but making
sure the topic is explored earlier then later seemed wise.

term_group bigint(10) NOT NULL default 0


I would suggest alias_term_id as a column name, but a self-join was my first
thought.  Using the word 'group' is very often a gateway to pain IME.  That
word is just too easily overloaded and vague, though the idea of using this
as a lever to achieve translation is clever I have to admit.  This solution
is distinct from the category type of hierarchy in that you only need to
manage a level of depth of 0 and 1.  That means when you add a new alias,
you only alias to root terms, not to aliases.  That simplifies the interface
and the data model tremendously.

Lookup by slug is easy.


Notice I slipped a trick into that one.  You don't know if 'products' is a
tag or a post category or ... based on the URL.  It isn't clear that we
should care.


> Lookup by both is still easy, but a bit more inefficient as you'd have to
> do an IN(term_ids) based on all the returned tags from the above query.


Couldn't you just self join, take the 'parent' term_id, and then lookup
objects based on that term_id?  No need for an IN clause if the 'parent'
term_id points at itself, as you suggested in your reply. Of course getting
that inserted id into the alias column is it's own challenge.  :)

--
SteveL
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Matt Mullenweg :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

Ryan Boren wrote:
> "taxonomy" is a string that can be 'post-category', 'link-category',
> or 'post-tag'.  Plugins can add their own taxonomy types.

How are URLs handled in this system?

Is the mapping of object_id to either post_id or link_id based on the
taxonomy type?

--
Matt Mullenweg
  http://photomatt.net | http://wordpress.org
http://automattic.com | http://akismet.com
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal

by Jamie Talbot :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Steve Lewis wrote:
> On 4/16/07, Jamie Talbot <wphackers@...> wrote:
>>
>>
> At this point the API and upgrade patterns suggest that adding columns to
> existing tables is a dangerous thing for plugins to attempt.  I wanted
> to at
> least air that here.  It could be addressed a number of ways, but making
> sure the topic is explored earlier then later seemed wise.

Sure.  I think this column might actually be a valuable addition to the core myself, but will leave
that decision to those with authority.

> term_group bigint(10) NOT NULL default 0
>
> I would suggest alias_term_id as a column name, but a self-join was my
> first thought.  Using the word 'group' is very often a gateway to pain IME.  That
> word is just too easily overloaded and vague, though the idea of using this
> as a lever to achieve translation is clever I have to admit.  This solution
> is distinct from the category type of hierarchy in that you only need to
> manage a level of depth of 0 and 1.  That means when you add a new alias,
> you only alias to root terms, not to aliases.  That simplifies the
> interface and the data model tremendously.

I think perhaps my original example was a little unclear.  The alias_term_id is not intended to be a
reference to a term_id, it is its own value that only makes sense in that column.  I would normally
agree with you about 'group', In this context, I would say that alias group is conceptually clearer.
 For example

1 Cities cities 0
2 Products products 1
3 Projects projects 1

The '1' in the third column signifies that they are in the same 'alias group' - it doesn't relate to
the row with a term_id of '1'.  The general pseudocode for adding an alias is:

Is the term I want to make an alias of already in an 'alias group'?
Yes:
        Add my new term with that same 'alias group'.
No:
        Select the next available 'alias group' (MAX() + 1 maybe)
        Update the existing term with the new 'alias group'
        Add my new term with that same 'alias group'.

Too many inverted commas but you get the idea.  There's no depth level at all - you're in an alias
group, or you're not.  Maintaining a 1-level hierarchy implies that the first term is the 'master'.
 What happens if you delete it?  In this scheme, it doesn't matter.  That's my take on it anyway.
You could make the column NULL by default, instead of 0, which simplifies certain operations, but is
slower (NOT NULL is virtually always faster).

>> Lookup by slug is easy.
>
> Notice I slipped a trick into that one.  You don't know if 'products' is a
> tag or a post category or ... based on the URL.  It isn't clear that we
> should care.

That's true, if you ignore the fact that category URLs will be prepended by category/, and tag URLs
by tag/.  Given that those bases exist though, I think we should be safe.

>> Lookup by both is still easy, but a bit more inefficient as you'd have to
>> do an IN(term_ids) based on all the returned tags from the above query.
>  
> Couldn't you just self join, take the 'parent' term_id, and then lookup
> objects based on that term_id?  No need for an IN clause if the 'parent'
> term_id points at itself, as you suggested in your reply. Of course getting
> that inserted id into the alias column is it's own challenge.  :)

There isn't a parent term_id (see above).  You'd do your self join to get all ids of the aliases of
the specified term, then use that in an IN() clause to filter only the posts/links that you wanted
from the term_references table.

Not many people have commented on this thread, despite the large number of people who were vocally
opposed to the existing schema.  Does this mean people approve of this new, suggested direction, or
are there any better solutions out there?

Cheers,

Jamie.

- --
http://jamietalbot.com
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (MingW32)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGJhLnrovxfShShFARAmqEAJ4g1g7s1cVytkHbpzqDJiavj7EQPACfV6lt
lYyrWsKErMOdFIZIzosoay0=
=W+GW
-----END PGP SIGNATURE-----
_______________________________________________
wp-hackers mailing list
wp-hackers@...
http://lists.automattic.com/mailman/listinfo/wp-hackers

Re: Taxonomy Schema Proposal