
Re: unicode 5.1 again

by Wolfgang Meier


Hi David,

just a quick answer before I have to leave: the standard XPath
contains() function does *not* use SimpleTokenizer. It simply compares
strings codepoint by codepoint. The full-text index, by contrast,
tokenizes the string, and it uses SimpleTokenizer to do so.
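
To illustrate the difference, here is a small, self-contained Java sketch. It is not eXist code: `substringMatch` stands in for XPath contains() (a plain substring test), and `tokenize` is a hypothetical, simplified stand-in for a tokenizer such as SimpleTokenizer, assumed here to split wherever `Character.isLetterOrDigit` returns false. The private-use codepoint U+E000 plays the role of a character the JVM does not classify as a letter.

```java
import java.util.ArrayList;
import java.util.List;

public class ContainsVsTokenize {

    // Stand-in for XPath contains(): a plain substring test,
    // with no notion of words or letters.
    static boolean substringMatch(String haystack, String needle) {
        return haystack.contains(needle);
    }

    // Hypothetical, simplified tokenizer: a token is a maximal run of
    // codepoints for which Character.isLetterOrDigit(int) is true.
    // Every other codepoint is treated as a separator and dropped.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isLetterOrDigit(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
            i += Character.charCount(cp);
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // U+E000 (private use) is not a letter as far as the JVM is concerned.
        String word = "ab\uE000cd";
        System.out.println(substringMatch(word, "b\uE000c")); // true
        System.out.println(tokenize(word));                   // [ab, cd]
    }
}
```

The substring test still finds the sequence, while the tokenizer splits the word in two, so an index built from those tokens never contains "ab\uE000cd" as a single term.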

This probably means that your modification to SimpleTokenizer never
worked and you need to check the class. You could use
text:index-terms to print out the actual index contents (I'd like to
show an example, but I don't have time; searching the mail archive
should turn up some help, though).

Wolfgang


> A few weeks ago I posted an inquiry about getting eXist to index new
> Unicode 5.1 characters properly. Since eXist relies on ischar() and the
> versions of Java running on our machines have only a Unicode 5.0 sense
> of what is and isn't a character, eXist was failing to treat the new 5.1
> characters like the alphabetic characters they are. With help from the
> eXist developers, I patched SimpleTokenizer.java so that searches using
> the XPath contains() function and range indexes would work as I needed
> them to.
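
The patch mentioned above is not shown in the thread. As a hedged sketch of the general idea only, a tokenizer can call its own word-character test that falls back to an explicit list of codepoint ranges the JVM's Unicode tables may not know about. The class name `ExtendedLetters`, the method `isWordChar`, and the chosen range (Cyrillic Extended-B, U+A640 to U+A69F, a block introduced in Unicode 5.1) are illustrative assumptions, not eXist's actual code.

```java
public class ExtendedLetters {

    // Illustrative assumption: inclusive codepoint ranges to treat as word
    // characters even if the running JVM's Unicode tables predate them.
    // U+A640..U+A69F is the Cyrillic Extended-B block (Unicode 5.1).
    // Coarse on purpose: the whole block is accepted, including its
    // combining marks and punctuation.
    private static final int[][] EXTRA_LETTER_RANGES = {
        {0xA640, 0xA69F},
    };

    // Word-character test a tokenizer could use in place of calling
    // Character.isLetterOrDigit directly.
    static boolean isWordChar(int cp) {
        if (Character.isLetterOrDigit(cp)) {
            return true;
        }
        for (int[] range : EXTRA_LETTER_RANGES) {
            if (cp >= range[0] && cp <= range[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isWordChar(0xA641)); // true on any JVM
        System.out.println(isWordChar('-'));    // false
    }
}
```

A tokenizer built this way would call `isWordChar(cp)` wherever it decides whether a codepoint continues the current token; a real patch would list the ranges the project actually needs.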

_______________________________________________
Exist-open mailing list
Exist-open@...
https://lists.sourceforge.net/lists/listinfo/exist-open
