Tagging of messages and/or threads

by Brian Thomas

I've just been made aware that you are hosting archives of the xwiki-users and xwiki-dev mailing lists.  I like what you've done a lot, particularly with the keyword-stemming feature which makes searching far more powerful.

I don't recall exactly, but thought I had heard that Nabble supports user tagging of posts.  That sounded pretty fabulous - sort of like subscribing to the mailing list with a shared Google account, or tagging with a dedicated shared del.icio.us account: it adds the human cognitive association that generally escapes most search engines, even those using latent-semantic indexing with less than monstrous databases.

Alas, I find no mention of this feature here.  Have I missed it, or was I misinformed?

Re: Tagging of messages and/or threads

by fschmidt

Brian Thomas wrote:
I like what you've done a lot, particularly with the keyword-stemming feature which makes searching far more powerful.
We have this thanks to Lucene.

I don't recall exactly, but thought I had heard that Nabble supports user tagging of posts.  That sounded pretty fabulous - sort of like subscribing to the mailing list with a shared Google account, or tagging with a dedicated shared del.icio.us account: it adds the human cognitive association that generally escapes most search engines, even those using latent-semantic indexing with less than monstrous databases.

Alas, I find no mention of this feature here.  Have I missed it, or was I misinformed?
I don't think we have this, but I would like to understand the idea better.  There are already a lot of tagging services out there.  If we implemented tagging, what could we offer beyond what the generic tagging services offer?


Franklin Schmidt
Nabble.com

Re: Tagging of messages and/or threads

by Brian Thomas

I'm afraid I have to apologize; to tell you the truth, I haven't thought it through deeply myself...

However, I think that the main benefit, if there is one, over a general tagging service is in the location of the tag store itself - or rather its specificity, since when I use Nabble's search I'm looking strictly in the mailing list[s] of interest.

Of course, tagging services have two kinds of value, the synergy of which is the reason (I believe) for their success: value for the individual tagger, to keep his own list of bookmarks, indexed by his own conceptual or semantic associations; and value for searchers, because the aggregate of all those individual tags is a powerful semantic index of documents they haven't necessarily seen themselves. The service doesn't have to pay for its product, because its users provide it; and advertisers will part with bucks to be seen in highly-targeted contexts.

So the general tagging service tends to have the greater value for the individual tagger, who can have all his tags in one place, whereas the narrow service is more valuable to the searcher, who doesn't have to go to extra effort to narrow his search to results from one site.

The question, then, is: how can I have my cake and eat it too? Or what would that mean? It's in any tag contributor's interest to contribute tags to a more general service; it's in the Nabble-specific searcher's interest to have a store of Nabble-specific tagged links.

I suppose that the best idea would be to allow (rather, encourage, since you can hardly stop them) tag contributors to tag using the tagging service of their choice, and somehow to facilitate the aggregation of Nabble-search-specific results from all the tagbases and their integration with the Nabble search, since people will tag Nabble pages in their own preferred locations.

Technically speaking, this is a piece of cake; I don't know how the tagging services would like being aggregated, though. It's not absolutely necessary to aggregate them through your service, but would be more useful to be able to have Lucene integrate the results, to eliminate duplicates, assign weighting, and present them all together.

Still, there is a possible partial solution that we could actually implement in our own custom skins - lots of sites already offer tagging links to all of the popular services on their pages - but to present queries to them along with the Nabble search is a major bit of AJAXian work, not to mention that merging the results with Nabble's in the browser is much harder still.

I hope that I have satisfied your desire for better understanding of the idea; I know I've satisfied a similar desire that I didn't even know I had...:)

So according to my now enriched understanding of the problem, Nabble wouldn't need or even want its own tagbase, but Nabble's searchability could benefit greatly if its engine were able to search not only for messages that actually contain the search terms (or grammatical variations thereof), but also for messages that others have associated with them.

Further, the greater the number of tagged links that can be searched, the better the applicability metric can be for ranking results, and large tagbases are worthwhile even apart from the actual links if semantic associations can be inferred among different tags, i.e.: if a significant percentage of entries containing tag A also contain tag B, regardless of the relevance of their links to a given search query, then a semantic association between A and B may be inferred whose strength is some function of that percentage. Likewise, if tag C has a high rate of coincidence with both A and B, it has a small but significant effect on the strength of the association of A and B. In any case, the degree of confidence in that association is a function of the number of links available.

As a result of this, in case it's not obvious, items that may be relevant to the user's search would be returned even if they didn't contain any form of the actual search terms. "Latent-semantic" indexing, which I mentioned before, uses an inferred-association technique such as I've detailed above, but instead of relying on users putting their own associations on documents, it works by examining gargantuan quantities of documents and mechanically building massive indices on every word, calculating degrees of association between words found in the documents based on proximity, grammatical equivalence, and other such things, ranking documents by an aggregate of the strength of association of their words to the search terms and to other words with which those words are strongly associated. The technique is space- and processor-intensive, but tagging gives users large incentives to perform the same job in a labor-intensive way by giving them an easy way to do something that's very valuable to them.
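The co-occurrence inference described above (if many entries containing tag A also contain tag B, infer an association whose strength is a function of that percentage) can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `tag_associations`, the sample tag sets, and the choice of a plain conditional frequency as the "function of that percentage" are all hypothetical and come from no existing tagging service. The raw count of shared entries is returned alongside the strength as a crude stand-in for the degree of confidence.

```python
from collections import Counter
from itertools import combinations


def tag_associations(entries):
    """Map (a, b) to (strength, support).

    strength: fraction of entries containing tag a that also contain tag b.
    support:  number of entries containing both tags (a rough confidence cue).
    """
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in entries:
        unique = set(tags)
        tag_counts.update(unique)
        # Count each unordered pair once per entry, in both directions,
        # because the strength of A -> B can differ from B -> A.
        for a, b in combinations(sorted(unique), 2):
            pair_counts[(a, b)] += 1
            pair_counts[(b, a)] += 1
    return {
        (a, b): (count / tag_counts[a], count)
        for (a, b), count in pair_counts.items()
    }


# Hypothetical tagged links: each set holds the tags one user put on one link.
entries = [
    {"xwiki", "lucene", "search"},
    {"xwiki", "lucene"},
    {"xwiki", "plugin"},
    {"lucene", "search"},
]

associations = tag_associations(entries)
for pair in [("xwiki", "lucene"), ("search", "lucene")]:
    strength, support = associations[pair]
    print(pair, round(strength, 2), support)
```

With more tagged links the support figures grow, which is the sense in which confidence in an association depends on the number of links available. The second-order effect of a third tag C on the A/B association is left out of this sketch.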


Re: Tagging of messages and/or threads

by tomi

Brian Thomas wrote:
I suppose that the best idea would be to allow (rather, encourage, since you can hardly stop them) tag contributors to tag using the tagging service of their choice, and somehow to facilitate the aggregation of Nabble-search-specific results from all the tagbases and their integration with the Nabble search, since people will tag Nabble pages in their own preferred locations.
How do we find out, or get notified, when a Nabble post has been tagged on some tagging service? If we could do that, we could add the tag keywords as a tag: field for that post in our Lucene index, and this field would automatically be searched in the regular Nabble search (or the user could specify to search only tags).

I have never used a tagging site so I am not very familiar with how it works, but if I understand this correctly, users do tagging by using some plugin they have installed in their browser, or just going to the tagging site and entering the tag info there. We could provide them with a "save this page" link, but still wouldn't know what tags they entered after following the link to del.icio.us.
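To make the tag: field idea concrete, here is a toy sketch of an index that stores tag keywords in their own field, searches that field by default, and lets the user restrict a query to tags only. It is not Lucene and does not use Lucene's API; the class `TinyIndex`, its methods, and the sample posts are hypothetical, and only the `field:term` query form is borrowed from Lucene's query syntax.

```python
from collections import defaultdict


class TinyIndex:
    """Toy inverted index keyed by (field, term). Illustrative only."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, post_id, body, tags=()):
        # Body words and tag keywords go into separate fields.
        for term in body.lower().split():
            self.postings[("body", term)].add(post_id)
        for tag in tags:
            self.postings[("tag", tag.lower())].add(post_id)

    def search(self, query, fields=("body", "tag")):
        # "tag:lucene" restricts the lookup to one field; a bare term
        # is looked up in every default field, tags included.
        if ":" in query:
            field, term = query.split(":", 1)
            fields = (field,)
        else:
            term = query
        term = term.lower()
        hits = set()
        for field in fields:
            hits |= self.postings.get((field, term), set())
        return sorted(hits)


index = TinyIndex()
index.add(1, "How do I enable stemming", tags=["lucene", "search"])
index.add(2, "Lucene plugin setup for XWiki")
index.add(3, "Skin customization question", tags=["xwiki"])

print(index.search("lucene"))      # matches body text or tags
print(index.search("tag:lucene"))  # matches tags only
```

Post 1 never mentions Lucene in its text, yet a plain search for "lucene" finds it through its tag, which is the benefit being discussed.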


Re: Tagging of messages and/or threads

by Brian Thomas

tomi wrote:
How do we find out, or get notified, when a Nabble post has been tagged on some tagging service? If we could do that, we could add the tag keywords as a tag: field for that post in our Lucene index, and this field would automatically be searched in the regular Nabble search (or the user could specify to search only tags).

I have never used a tagging site so I am not very familiar with how it works, but if I understand this correctly, users do tagging by using some plugin they have installed in their browser, or just going to the tagging site and entering the tag info there. We could provide them with a "save this page" link, but still wouldn't know what tags they entered after following the link to del.icio.us.
Never used a tagging site?  O my...  Well, pop round to http://del.icio.us posthaste, and see what you've been missing!  In brief, though, if you haven't read the portion of this thread before Frank brought you in, I have a lengthy bit of thinking-out-loud which probably explains it as well as I've ever managed to.

As to how to be notified, I'd say that direct notification of when or where someone posted a link would not be necessary, because independent knowledge of posts is easy to come by.  As you say, most taggers use bookmarklets or browser extensions to drop their tags, so hijacking that flow would be very difficult, and the only reason I don't say impossible is that I haven't seriously considered how it could be done.  Fortunately, it isn't necessary to control or eavesdrop on their tagging activity, since it's public.

Tailored RSS feeds are a pretty standard feature of tagging sites; if you could tailor feeds to return only links to your site, then after an initial query to each tagging site you could poll relatively infrequently for those feeds to keep Lucene's indices fresh.
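As a rough sketch of the feed-polling step, the snippet below parses one feed and keeps only the items that link to a given site, collecting their tags per URL so they could later be written into the search index. Several things here are assumptions rather than facts about any tagging service: that the feed is RSS 2.0, that tags arrive as `<category>` elements, and the sample feed contents and URLs, which are made up. Fetching the feed over HTTP and scheduling the polling are left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical feed body; a real poller would download this text.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>hypothetical tag feed</title>
    <item>
      <link>http://www.nabble.com/example-thread-1.html</link>
      <category>xwiki</category>
      <category>Lucene</category>
    </item>
    <item>
      <link>http://example.org/unrelated.html</link>
      <category>misc</category>
    </item>
  </channel>
</rss>"""


def tags_for_site(feed_xml, site_host, tags_by_url=None):
    """Collect {url: set_of_tags} for feed items that link to site_host."""
    if tags_by_url is None:
        tags_by_url = {}
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        link = (item.findtext("link") or "").strip()
        host = urlparse(link).hostname or ""
        # Accept the host itself and its subdomains (www.nabble.com, ...).
        if host == site_host or host.endswith("." + site_host):
            tags = {
                c.text.strip().lower()
                for c in item.findall("category")
                if c.text
            }
            # Merging into one dict removes duplicate links across feeds.
            tags_by_url.setdefault(link, set()).update(tags)
    return tags_by_url


site_tags = tags_for_site(SAMPLE_FEED, "nabble.com")
for url, tags in site_tags.items():
    print(url, sorted(tags))
```

Passing the same `tags_by_url` dictionary to successive calls, one per tagging site, accumulates tags from every known tagstore under a single entry per URL.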

The most critical thing you need to do is to be sure that you capture the greatest practicable amount of tagging activity by knowing about as many of the users' tagstores as you can, so you know where to look.  Since it's in their interest, and doesn't require divulging any sensitive information, it shouldn't be hard to get the users to volunteer the information, by means of a questionnaire or some such thing.  All you need to learn is the address of the site, so you can go there and figure out how to set up a feed.

A good way to encourage that kind of activity might be to offer a form with a multiple-selection list of all the tagging sites to which you know how to post, so that if desired they can use it to post to multiple sites.  There are Firefox extensions that do just that, but not everyone has them, and some explanatory text can inform the underclued of what's available to them.  Also, if the tagger wants to use a site that you don't offer, give him a place to fill it in, so that you can research how to connect to it; in fact, you can even let them tell you how to do it, and many probably would; as I say, it's often plainly in their interest to do so.

I need to read up more on Lucene anyway because I'm planning to activate XWiki's Lucene plugin in some of our sites, but a question remains in my mind about how the weighting is assigned and/or calculated.

I think that your approach is sound, and if Lucene allows semantic constraints on search values (i.e.: return a match on field:key if and only if some portion of a document known as "field" matches the value of "key") as you seem to imply, that's a big plus, which means that I should put a higher priority on it.

Thank you for your time; this looks pretty interesting.