wizard for search in Lucene

Hi,

I want to build a wizard that suggests n-gram terms while searching. For example, if I want to search for "History", then after I type it the system should propose the following searches, drawn from the indexed terms:

history europe
history spain
history .....

Does this exist in Lucene?

Thank you,
Albert
Re: wizard for search in Lucene

Albert Juhe:
>
> Hi,
>
> I want to make a wizard that can help to find n-grams terms.
> For example:
> If i want to search History, after write it the system propose you the
> following searches:
> history europe
> history spain
> history .....
> Consulting the terms indexed.
>
> Does it exits in Lucene?

Hi.

I take your question to mean that you want autocompletion in your search system. In that case, I believe there are some Analyzers in the 'contrib' package that do this. I also created an Analyzer that produces "bigrams" (n-grams of size 2) for my master's thesis. Feel free to download it from this page:
http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/

Also, have a look at the package org.apache.lucene.analysis.ngram:
http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/ngram/package-summary.html

--
Asbjørn A. Fellinghaug
asbjorn@...

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@...
For additional commands, e-mail: java-user-help@...
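The bigram idea in the reply above can be illustrated without Lucene at all. The following is a minimal sketch, not the thesis Analyzer and not the contrib n-gram classes: the class name `BigramSuggester`, its methods, and the sample texts are all made up for illustration. It counts adjacent word pairs and proposes the most frequent followers of a typed word.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not part of Lucene): suggests "word follower" pairs
// ranked by how often the follower appears directly after the word.
public class BigramSuggester {
    // first word -> (following word -> number of times the pair was seen)
    private final Map<String, Map<String, Integer>> followers = new HashMap<>();

    // Lowercase the text, split on non-letters, and count each adjacent word pair.
    public void addText(String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        for (int i = 0; i + 1 < tokens.length; i++) {
            if (tokens[i].isEmpty()) {
                continue; // a leading separator yields one empty first token
            }
            followers.computeIfAbsent(tokens[i], k -> new HashMap<>())
                     .merge(tokens[i + 1], 1, Integer::sum);
        }
    }

    // Return up to max suggestions, most frequent first, ties in alphabetical order.
    public List<String> suggest(String word, int max) {
        String key = word.toLowerCase(Locale.ROOT);
        Map<String, Integer> counts = followers.get(key);
        if (counts == null) {
            return Collections.emptyList();
        }
        List<String> next = new ArrayList<>(counts.keySet());
        next.sort((a, b) -> {
            int byCount = Integer.compare(counts.get(b), counts.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        List<String> result = new ArrayList<>();
        for (int i = 0; i < next.size() && i < max; i++) {
            result.add(key + " " + next.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        BigramSuggester s = new BigramSuggester();
        s.addText("history europe medieval trade");
        s.addText("modern history europe");
        s.addText("history europe history spain");
        s.addText("art history spain");
        s.addText("history books");
        System.out.println(s.suggest("History", 2)); // [history europe, history spain]
    }
}
```

In a real index the pair counts would come from bigram terms produced at indexing time rather than from an in-memory map, so this only shows the ranking logic.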
Re: wizard for search in Lucene

From what I can understand, you want to enter the word "history" and then get "related" terms proposed in combination with your input query. In essence, this would mean doing a "look-up" of the top terms in the subset of documents matching the initial query "history". I'm not sure exactly how you would do this, but I suggest you read up on term frequency and the tf-idf scheme.

Also, take a look at the org.apache.lucene.search.similar package:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/package-summary.html
and read the motivation email quoted in the first section of
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/similar/MoreLikeThis.html

I couldn't really see how you would autocomplete after the word "history" without listing a bunch of uninteresting terms as suggestions... But I might be wrong...

Of course, if it was autocompletion you were looking for, Asbjørn answered that one just fine :)

Best regards,
Aleksander M. Stensby

On Thu, 09 Oct 2008 18:49:26 +0200, Asbjørn A. Fellinghaug <asbjorn@...> wrote:

> Albert Juhe:
>>
>> Hi,
>>
>> I want to make a wizard that can help to find n-grams terms.
>> For example:
>> If i want to search History, after write it the system propose you the
>> following searches:
>> history europe
>> history spain
>> history .....
>> Consulting the terms indexed.
>>
>> Does it exits in Lucene?
>
> Hi.
>
> I interpret your question in such a way that you want autocompletion in
> your search system? In that case, I believe there are some Analyzer's
> which does this in the 'contrib' package. Also, I've created an Analyzer
> which creates "bigrams" (n-gram of size 2) in my master thesis.
> Feel free to download it from this page:
> http://asbjorn.fellinghaug.com/blog/2008/08/the-code-for-my-master-thesis/
>
> Also, have a look at the package org.apache.lucene.analysis.ngram:
> http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/analysis/ngram/package-summary.html
>

--
Aleksander M. Stensby
Senior Software Developer
Integrasco A/S
+47 41 22 82 72
aleksander.stensby@...
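The "top terms in the subset of documents matching the initial query" idea can be sketched in plain Java. This is only an illustration under loud assumptions: the documents are already-tokenized word lists held in memory, the class `RelatedTerms` and its method are hypothetical, and the score is a simple tf × idf variant, not Lucene's Similarity and not what MoreLikeThis computes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch (not a Lucene API): rank terms that co-occur with a query term.
public class RelatedTerms {

    // Score every term found in documents containing `query` by
    // (occurrences in those documents) * ln(total docs / docs containing the term).
    public static List<String> topTerms(List<List<String>> docs, String query, int max) {
        Map<String, Integer> docFreq = new HashMap<>();  // term -> number of docs containing it
        Map<String, Integer> subsetTf = new HashMap<>(); // term -> occurrences in matching docs
        for (List<String> doc : docs) {
            Set<String> unique = new HashSet<>(doc);
            for (String t : unique) {
                docFreq.merge(t, 1, Integer::sum);
            }
            if (unique.contains(query)) {
                for (String t : doc) {
                    if (!t.equals(query)) {
                        subsetTf.merge(t, 1, Integer::sum);
                    }
                }
            }
        }
        Map<String, Double> score = new HashMap<>();
        for (Map.Entry<String, Integer> e : subsetTf.entrySet()) {
            double idf = Math.log((double) docs.size() / docFreq.get(e.getKey()));
            score.put(e.getKey(), e.getValue() * idf);
        }
        // Highest score first; equal scores fall back to alphabetical order.
        List<String> terms = new ArrayList<>(score.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(score.get(b), score.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(max, terms.size())));
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("history", "europe", "the"),
                Arrays.asList("history", "spain", "the"),
                Arrays.asList("history", "europe", "war", "the"),
                Arrays.asList("cooking", "pasta", "the"),
                Arrays.asList("music", "jazz", "the"));
        System.out.println(topTerms(docs, "history", 2)); // [europe, spain]
    }
}
```

A word that appears in every document, such as "the" here, gets an idf of zero and sinks to the bottom, which is one way to keep uninteresting terms out of the suggestions.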
Re: wizard for search in Lucene

Hi,

This is my first version. It isn't fast, because I want to get this information without modifying the index. I'm now working on improving it (including FreeLing).

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermFreqVector;
    import org.apache.lucene.index.TermPositions;

    public String docsTerme(IndexReader reader, String terme) {
        StringBuilder resultat = new StringBuilder();
        List<infoTerme> alDocs = new ArrayList<infoTerme>();
        long start = System.currentTimeMillis();

        // Step 1: find the documents that contain the term and record its positions.
        try {
            TermPositions tP = reader.termPositions(new Term("contingut", terme));
            try {
                while (tP.next()) {
                    infoTerme it = new infoTerme(terme, tP.doc(), tP.freq());
                    resultat.append(it.toString());
                    for (int i = 0; i < it.getFrequencia(); i++) {
                        it.add(tP.nextPosition());
                    }
                    alDocs.add(it); // we store: term, document id, positions
                    resultat.append("(" + it.toStringPosicions() + ")<br/>");
                }
            } finally {
                tP.close();
            }
        } catch (IOException e) {
            System.out.println("Error finding documents for term: " + e);
            return null;
        }

        // Step 2: for each of those documents, look for terms next to the search term.
        for (infoTerme iT : alDocs) {
            resultat.append("<br/>" + iT.getId_document() + ":<br/>"); // document id
            try {
                // All the terms found in the document (needs stored term vectors).
                TermFreqVector[] tfv = reader.getTermFreqVectors(iT.getId_document());
                if (tfv == null || tfv.length == 0) {
                    continue; // no term vectors were stored for this document
                }
                // Assumes "contingut" is the first field with a term vector.
                String[] llistatTermes = tfv[0].getTerms();
                int paraulesAnalitzades = 0; // words analysed so far
                int veinsTrobats = 0;        // neighbours found so far
                while (veinsTrobats < iT.getFrequencia() && paraulesAnalitzades < llistatTermes.length) {
                    resultat.append("," + llistatTermes[paraulesAnalitzades]);
                    // Documents where the candidate term appears
                    TermPositions termP = reader.termPositions(new Term("contingut", llistatTermes[paraulesAnalitzades]));
                    try {
                        while (termP.next()) {
                            if (termP.doc() == iT.getId_document()) {
                                // The word is in the same document, so it may be a neighbour.
                                boolean veins = false;
                                int ind = 0;
                                while (!veins && ind < termP.freq()) {
                                    int posicio = termP.nextPosition();
                                    if (iT.sonVeins(posicio)) {
                                        veins = true;
                                        resultat.append("<br/>" + veinsTrobats + "/" + iT.getFrequencia()
                                                + " They are neighbours (proximity 1):" + iT.getTerme() + " and "
                                                + llistatTermes[paraulesAnalitzades] + "(" + posicio + ")<br/>");
                                        veinsTrobats++;
                                    } else {
                                        ind++;
                                    }
                                }
                            }
                        }
                    } finally {
                        termP.close();
                    }
                    paraulesAnalitzades++;
                }
            } catch (IOException e) {
                System.out.println("Error: I can't find terms: " + e);
                return null;
            }
        }
        long end = System.currentTimeMillis();
        resultat.append("<br/>Time elapsed: " + (end - start) + "ms");
        return resultat.toString();
    }

infoTerme.java

Thank you,
Albert
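The attached infoTerme.java is not reproduced in the thread, so the helper that the method above relies on cannot be shown as written. Purely as a guess at its shape, a stand-in might look like the sketch below. The class name `InfoTermeSketch` is hypothetical, and the reading of `sonVeins` as "the position is directly before or after one of the stored positions" is an assumption based only on the "proximity 1" message in the posted code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the attached infoTerme.java, which is not shown in the thread.
public class InfoTermeSketch {
    private final String terme;       // the search term
    private final int idDocument;     // Lucene document id
    private final int frequencia;     // occurrences of the term in that document
    private final List<Integer> posicions = new ArrayList<>();

    public InfoTermeSketch(String terme, int idDocument, int frequencia) {
        this.terme = terme;
        this.idDocument = idDocument;
        this.frequencia = frequencia;
    }

    // Record one position of the term inside the document.
    public void add(int posicio) {
        posicions.add(posicio);
    }

    public String getTerme() { return terme; }
    public int getId_document() { return idDocument; }
    public int getFrequencia() { return frequencia; }

    // Assumed meaning of "neighbours (proximity 1)": the given position is
    // exactly one step before or after a stored position of the term.
    public boolean sonVeins(int posicio) {
        for (int p : posicions) {
            if (Math.abs(p - posicio) == 1) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        InfoTermeSketch it = new InfoTermeSketch("history", 7, 2);
        it.add(3);
        it.add(10);
        System.out.println(it.sonVeins(4)); // true: position 4 directly follows position 3
        System.out.println(it.sonVeins(3)); // false: same position, not a neighbour
    }
}
```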