It got through the first time :-)
Maurice Nicholson wrote:
I think we are using a different definition of term frequency. To me
it's the number of occurrences of the *term*. However, the termFreqs
method is returning the number of *documents* (instances of domain
classes, in Grails) where the term occurs, disregarding the occurrences
of the term itself.
Let me simplify my previous example. Let's imagine there's a single
instance of Paragraph in our DB/index:
    def p = new Paragraph(text: "Hello John, my name is John and this is my friend John")
    p.save()
Now, according to my definition, the frequency of the term "John" is 3.
According to Paragraph.termFreqs(), it's 1, because there's only one
domain object where the term "John" appears, regardless of the fact that
it appears 3 times.
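To make the two definitions concrete, here is a minimal sketch in plain
Groovy, with no Compass or Lucene involved. The `texts` list and the naive
`tokenize` closure are made up for illustration; a real analyzer would
tokenize differently, and this is not how the plugin computes its numbers.

```groovy
// Hypothetical stand-in for the indexed text of Paragraph instances.
def texts = [
    "Hello John, my name is John and this is my friend John"
]

// Naive tokenizer (an assumption): lower-case, split on non-letters,
// and drop any empty tokens.
def tokenize = { String s -> s.toLowerCase().split(/[^a-z]+/).findAll { it } }

def term = 'john'

// My definition: total number of times the term occurs across all texts.
def occurrences = texts.sum { t -> tokenize(t).count { it == term } }

// What termFreqs appears to return: number of texts containing the term.
def documents = texts.count { t -> tokenize(t).contains(term) }

println "occurrences=${occurrences}, documents=${documents}"
// prints: occurrences=3, documents=1
```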
Of course, when searching, a Paragraph object/document where the term
"John" appears three times will rank higher than a Paragraph where it
appears only once, so this information must be stored somewhere in the
index. (Last night I spent some time working with Luke, which is amazing
and fun :-) )
I'm not in immediate need of this feature, I'm just being picky while I
learn :-)
I'll dig around Compass and Lucene a little more and see what I can find.
On a completely unrelated, but more important note:
How would you describe the query performance of Compass/Lucene versus
searching in the relational database using normal GORM/HSQL?
BarZ
> Anyway, I think the feature you describe makes sense, but it can be
> achieved now, if not especially optimised, by simply hunting for the
> term in the term-freqs, eg:
>
> Book.termFreqs.find { it.term == 'marmalade' }.freqs
>
> The information exists in the index, so it could be exposed in a
> simpler fashion, but is it required? Term-freqs are an advanced topic
> (IMHO) and I wonder how many people will use this feature?
>
> The other point is that the term frequency currently provided is the
> frequency of a term over the whole index, not just a single Book
> instance! Again the information is in the index on a per Lucene
> document (Book instance) basis, it's just a question of exposing it.
>
> As you said these are features that make sense in Compass itself so I
> think they are questions for the Compass forum.
>
> Cheers,
> Maurice
>
> On 21/04/2008, *Barzilai Spinak* <barcho@...> wrote:
>
> Hi Maurice.
> I'm glad to tell you that the two main bugs I had with previous
> versions, now seem to be fixed! (the NPE when cascade-saving, and some
> other errors with component references).
>
> The other thing was about termFreq, which I thought had a bug. Maybe
> it's not a bug after all, but some misunderstanding on my part, or
> it's not clearly explained in the docs, or it's a bug :-)
>
> Let me explain.
>
> When calling SomeClass.termFreqs('someTerm'), I thought the resulting
> number would be "the number of occurrences of someTerm within
> instances of SomeClass".
>
> For example, if I had:
> (new Album(title:'yeah yeah yeah')).save()
> (new Album(title:'Just say yeah')).save()
>
> and then I query Album.termFreqs('yeah'), I would expect a count of 4.
> However, what I seem to be getting is "the number of Album instances
> that have the term 'yeah' in any of their indexable properties".
>
> So... maybe this is the intended behaviour... maybe not... In any
> case, I think that 1) it should be explained more explicitly, and 2) a
> *real* term frequency, counting occurrences of the term itself, should
> be added.
> For example (completely made up), say I had a Book class which hasMany
> Paragraph, and I'm storing the text in the Paragraph. I'm doing some
> text analysis and want to know how many times a certain term appears
> in the Book. I don't want the count of paragraphs that contain that
> word, I want the actual number of occurrences of that word.
>
> Thinking a little more, maybe this behaviour is a "feature" of Compass?
>
>
> BarZ
>
>
> ---------------------------------------------------------------------
> To unsubscribe from this list, please visit:
>
> http://xircles.codehaus.org/manage_email