Need help with DictionaryCompoundWordTokenFilterFactory

Need help with DictionaryCompoundWordTokenFilterFactory

by Kraus, Ralf | pixelhouse GmbH

Hi,

I am trying to solve the typical German "Donaudampfschiff" problem by
using the DictionaryCompoundWordTokenFilter.
Can anyone show me how to configure my schema.xml to use the
DictionaryCompoundWordTokenFilterFactory?

Greets -Ralf-

RE: Need help with DictionaryCompoundWordTokenFilterFactory

by Steven A Rowe

Hi Ralf,

On 10/10/2008 at 10:57 AM, Kraus, Ralf | pixelhouse GmbH wrote:
> I am trying to solve the typical German "Donaudampfschiff"
> problem by using the DictionaryCompoundWordTokenFilter.
> Can anyone show me how to configure my schema.xml to use the
> DictionaryCompoundWordTokenFilterFactory?

Minimally, add the following inside the <analyzer> section for your field type:

<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
        dictFile="/path/to/your/dictionary" />

You can also add the following (optional) attributes:

  - "minWordSize" (default: 5)
  - "minSubwordSize" (default: 2)
  - "maxSubwordSize" (default: 15)
  - "onlyLongestMatch" (default: true)

FYI, the compound package summary in the nightly trunk Lucene contrib javadocs has some useful information:

<http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc/contrib-analyzers/org/apache/lucene/analysis/compound/package-summary.html>

Steve


Re: Need help with DictionaryCompoundWordTokenFilterFactory

by Kraus, Ralf | pixelhouse GmbH

Thanks a lot!

I downloaded a dictionary called "de_DR.xml" and put it into my "conf"
directory.
Then I changed my schema.xml to:

<filter class="solr.DictionaryCompoundWordTokenFilterFactory"
        dictFile="./conf/de_DR.xml"
        minWordSize="5"
        minSubwordSize="2"
        maxSubwordSize="15"
        onlyLongestMatch="true" />

but Solr can't find the dictionary file :-(

SCHWERWIEGEND: Could not start SOLR. Check solr/home property
java.lang.RuntimeException: Error opening null
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:191)
    at org.apache.solr.core.SolrResourceLoader.getLines(SolrResourceLoader.java:237)
    at org.apache.solr.core.SolrResourceLoader.getLines(SolrResourceLoader.java:213)
    at org.apache.solr.analysis.DictionaryCompoundWordTokenFilterFactory.inform(DictionaryCompoundWordTokenFilterFactory.java:49)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:322)

Any hints ?

Greets -Ralf-

RE: Need help with DictionaryCompoundWordTokenFilterFactory

by Steven A Rowe

Hi Ralf,

On 10/13/2008 at 5:45 AM, Kraus, Ralf | pixelhouse GmbH wrote:
> but Solr can't find the dictionary file :-(

Try using the name of the file without a path - I believe the conf/ directory is in the search path used by Solr when loading resources, i.e.:

   dictFile="de_DR.xml"

As an alternative, you should definitely be able to give an absolute path to the dictionary file.

Steve

RE: Need help with DictionaryCompoundWordTokenFilterFactory

by hossman


: Try using the name of the file without a path - I believe the conf/ directory is in the search path used by Solr when loading resources, i.e.:
:
:    dictFile="de_DR.xml"

according to the code the param name is "dictionary" not dictFile.

I'll add a better error message.

-Hoss


RE: Need help with DictionaryCompoundWordTokenFilterFactory

by hossman


: :    dictFile="de_DR.xml"
:
: according to the code the param name is "dictionary" not dictFile.

PS: the dictionary file shouldn't be an XML file, it should look just
like a stopwords file (one word per line)


-Hoss
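
To illustrate the format Hoss describes, a word list for the "Donaudampfschiff" example might look like the sketch below. The file name german_words.txt and the three entries are assumptions chosen for illustration only; a real dictionary would contain many more words. The entries are lowercase on the assumption that tokens are lowercased before they reach the compound filter.

```text
donau
dampf
schiff
```

A file like this would sit in the conf/ directory next to the stopwords file.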


Re: Need help with DictionaryCompoundWordTokenFilterFactory

by Kraus, Ralf | pixelhouse GmbH

Chris Hostetter wrote:

> : :    dictFile="de_DR.xml"
> :
> : according to the code the param name is "dictionary" not dictFile.
>
> PS: the dictionary file shouldn't be an XML file, it should look just
> like a stopwords file (one word per line)
>
>
> -Hoss

Thanks!

It finally runs perfectly!

Greets -Ralf-

RE: Need help with DictionaryCompoundWordTokenFilterFactory

by Steven A Rowe

Oops, variable-name != attribute-name.

Thanks Hoss.

Steve

On 10/14/2008 at 1:12 AM, Chris Hostetter wrote:

> > Try using the name of the file without a path - I believe
> > the conf/ directory is in the search path used by Solr when
> > loading resources, i.e.:
> >
> >    dictFile="de_DR.xml"
>
> according to the code the param name is "dictionary" not dictFile.
>
> I'll add a better error message.
>
> -Hoss

 


Re: Need help with DictionaryCompoundWordTokenFilterFactory

by Kraus, Ralf | pixelhouse GmbH

Steven A Rowe wrote:
> Oops, variable-name != attribute-name.
>
> Thanks Hoss.
>
> Steve

So, is it "dictFile" or "dictionary"?

Greets -Ralf-




RE: Need help with DictionaryCompoundWordTokenFilterFactory

by Steven A Rowe

Hi Ralf,

On 10/14/2008 at 9:35 AM, Kraus, Ralf | pixelhouse GmbH wrote:
> Steven A Rowe wrote:
> > Oops, variable-name != attribute-name.
> >
> > Thanks Hoss.
> >
> > Steve
> So ....
>
> "dictFile" or "dictionary"  ???

Sorry, didn't mean to muddy the water.

Hoss is correct.  I misread the source code.  "dictionary" is the correct attribute name, not "dictFile".

Steve
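
Putting the corrections from this thread together, a schema.xml fragment might look like the sketch below. It uses the "dictionary" attribute with a bare file name resolved against conf/, plus the optional attributes at the default values listed earlier. The field type name "text_de", the tokenizer and lowercase filter, and the file name german_words.txt are assumptions for illustration, not something confirmed in the thread.

```xml
<!-- Hypothetical field type: the name "text_de" and the tokenizer and
     lowercase filter below are assumptions, not taken from the thread. -->
<fieldType name="text_de" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- The attribute is "dictionary", not "dictFile". The value names a
         plain-text word list (one word per line) in the conf/ directory;
         german_words.txt is a hypothetical file name. -->
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
            dictionary="german_words.txt"
            minWordSize="5"
            minSubwordSize="2"
            maxSubwordSize="15"
            onlyLongestMatch="true"/>
  </analyzer>
</fieldType>
```

The four size and match attributes are optional and could be dropped to rely on the defaults.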