The solution to search-related problems: operators, quotes, phrase, Chinese

Hi Acarboni, Ticheler, GN developers,

Have you run into search-related problems in the web Advanced Search? I did.

1. Problem

These are the search-related problems I found:

1) Operators: the operators (and, or, not) have no effect.
2) Quotes: quotes have no effect either.
3) Phrase queries: a phrase query needs quotes, but quotes do not work (see 2).
4) Queries in Asian languages such as Chinese: the result is not exact. GN finds the metadata that contains each character of the query, not the query phrase. The effect is as if a search for "any more" found records containing "any" and "more" separately.

2. WHY?

So what are the reasons? The analyzer is the main one. In the Java class services.main.Search, the query string is sent to the MainUtil.splitWord function to be split into words:

    if (any != null)
        any.setText(MainUtil.splitWord(any.getText()));

The splitWord function uses StandardAnalyzer:

    public static String splitWord(String requestStr)
    {
        Analyzer a = new StandardAnalyzer();
        .....
    }

StandardAnalyzer filters out stop words such as "and", "or", "not", "as", ..., and it also drops the quotes ("), so the string this function returns has lost both the operators and the quotes. Because "and" is the default operator, GN then joins the remaining terms with "and" when it queries Lucene. That is where the problems come from.

3. Solution

How can we resolve this? Just stop using StandardAnalyzer? No, we need it to analyze the query string, for example the phrase inside the quotes. So we must find the quotes before analyzing and send only the phrase between the quotes to the analyzer. My solution makes quotes, operators and phrases take effect, resolves the problems above, and handles Chinese as well.

Here is my solution:

    if (any != null)
    {
        any.setText( splitWord(any.getText()) );
    }

That is, the new splitWord replaces MainUtil.splitWord here, and MainUtil.splitWord is called from inside the new splitWord.
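To make the mechanism in section 2 concrete without pulling in Lucene, here is a rough JDK-only imitation of what StandardAnalyzer is described as doing to the query string. It is not Lucene's implementation: the class AnalyzerDemo, the method roughAnalyze and the small stop-word set (a subset of Lucene's default English stop words) are assumptions for illustration. Following the thread's description of Chinese queries, it emits every Han character as a separate token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class AnalyzerDemo {
    // Assumed subset of English stop words; Lucene's default English
    // stop-word list also contains each of these.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "and", "as", "not", "or", "the", "to");

    // Rough imitation (not Lucene's code): lowercase the text, keep runs of
    // non-Han letters/digits as words, emit every Han character as its own
    // token, drop stop words, and discard punctuation such as '"'.
    public static List<String> roughAnalyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        for (char c : text.toLowerCase().toCharArray()) {
            boolean han = Character.UnicodeScript.of(c) == Character.UnicodeScript.HAN;
            if (Character.isLetterOrDigit(c) && !han) {
                word.append(c);                 // extend the current word
                continue;
            }
            flush(word, tokens);                // any other character ends the word
            if (han) {
                tokens.add(String.valueOf(c));  // one token per Han character
            }
        }
        flush(word, tokens);
        return tokens;
    }

    // Emits the buffered word unless it is a stop word, then clears the buffer.
    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            String w = word.toString();
            if (!STOP_WORDS.contains(w)) {
                tokens.add(w);
            }
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        // The quotes and the "or"/"not" operators vanish; only plain terms remain.
        System.out.println(roughAnalyze("\"any more\" or water not fish"));
        // prints [any, more, water, fish]
        // A two-character Chinese word falls apart into two independent tokens.
        System.out.println(roughAnalyze("水质"));
        // prints [水, 质]
    }
}
```

With the default "and" applied to what remains, a query meant as a phrase plus "or"/"not" effectively becomes any AND more AND water AND fish, which matches problems 1 to 3 above.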
Below is the splitWord function in Search.java:

    // code starts here; this code goes in the .service.main.Search.java file
    private static final String OPER_AND = " and ";
    private static final String OPER_OR  = " or ";
    private static final String OPER_NOT = " not ";

    private String splitWord( String strValue )
    {
        // basic string processing: trim, collapse runs of whitespace to one space
        String strQuoteSg = "\'";
        String strQuoteDb = "\"";
        // single quote -> double quote
        strValue = strValue.replaceAll( strQuoteSg, strQuoteDb );
        // trim
        strValue = strValue.trim();
        // collapse consecutive whitespace to a single space
        strValue = strValue.replaceAll( "\\s\\s+", " " );
        // lowercase: the search is not case sensitive
        strValue = strValue.toLowerCase();
        if( strValue.length() > 0 )
        {
            int nFirstIndex = strValue.indexOf( strQuoteDb );
            if( nFirstIndex < 0 )
            {
                // no quotes: use the operators and, or, not to supply the quotes
                strValue = replaceComponent( strValue );
            }
            return splitString( strValue );
        }
        else
            return strValue;
    }

    // " " --> " and "
    private String replaceComponent( String strValue )
    {
        String strQuoteDb = "\"";
        String strWhitespace = " ";
        // add quotes at head and tail
        strValue = strQuoteDb + strValue + strQuoteDb;
        // find the whitespace index
        int nIndex = strValue.indexOf( strWhitespace );
        if( nIndex < 0 )
            return strValue;
        else
        {
            // and, or, not
            strValue = checkKeyword( strValue );
            // if no operator is included, use "and" as the default
            if( strValue.contains( OPER_AND ) || strValue.contains( OPER_OR ) || strValue.contains( OPER_NOT ) )
            {
                return strValue;
            }
            else
            {
                return strValue.replace( strWhitespace, strQuoteDb + strWhitespace + "and" + strWhitespace + strQuoteDb );
            }
        }
    }

    private String checkKeyword( String strValue )
    {
        strValue = checkKeywordComponent( strValue, OPER_AND );
        strValue = checkKeywordComponent( strValue, OPER_OR );
        strValue = checkKeywordComponent( strValue, OPER_NOT );
        return strValue;
    }

    // add quotes before and after each occurrence of the keyword
    // strValue and keyword must be lowercase
    private String checkKeywordComponent( String strValue, String keyword )
    {
        StringBuffer sb = new StringBuffer();
        sb.append( strValue );
        int nIndex = sb.indexOf( keyword );
        int offset = keyword.length();
        String strQuoteDb = "\"";
        while( nIndex >= 0 )
        {
            // check the quotes
            if( !sb.substring( nIndex - 1, nIndex ).equals( strQuoteDb ) )
            {
                sb.insert( nIndex, strQuoteDb );
                offset++;
            }
            if( !sb.substring( nIndex + offset, nIndex + offset + 1 ).equals( strQuoteDb ) )
            {
                sb.insert( nIndex + offset, strQuoteDb );
            }
            nIndex = sb.indexOf( keyword, nIndex + 2 );
            offset = keyword.length();
        }
        return sb.toString();
    }

    private String splitString( String strValue )
    {
        // trim whitespace at head and tail
        strValue = strValue.trim();
        // collapse consecutive whitespace to one space
        strValue = strValue.replaceAll( "\\s\\s+", " " );
        // add quotes around the operators and, or, not
        strValue = checkKeyword( strValue );
        String strQuoteDb = "\"";
        StringBuffer sb = new StringBuffer();
        int nStartIndex = 0;
        int nFirstIndex = strValue.indexOf( strQuoteDb );
        while( nFirstIndex >= 0 )
        {
            // copy everything up to and including the opening quote
            sb.append( strValue.substring( nStartIndex, nFirstIndex + 1 ) );
            int nSecondQuote = strValue.indexOf( strQuoteDb, nFirstIndex + 1 );
            nStartIndex = nFirstIndex + 1;
            if( nSecondQuote < 0 ) // the closing quote is missing
            {
                String strLast = strValue.substring( nStartIndex );
                strLast = MainUtil.splitWord( strLast );
                sb.append( strLast );
                sb.append( strQuoteDb );
                nStartIndex = strValue.length();
                break;
            }
            else
            {
                // analyze only the text between the quotes
                String strLast = strValue.substring( nStartIndex, nSecondQuote );
                strLast = MainUtil.splitWord( strLast );
                sb.append( strLast );
                sb.append( strQuoteDb );
                nStartIndex = nSecondQuote + 1;
            }
            // find the next opening quote
            nFirstIndex = strValue.indexOf( strQuoteDb, nStartIndex );
        }
        // copy any text after the last closing quote
        if( nStartIndex < strValue.length() )
        {
            sb.append( strValue.substring( nStartIndex ) );
        }
        return sb.toString();
    }

You can test it.

4. COMMIT?

Who can commit this to the GN source? Or how can I commit it?
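The core idea of the patch, finding the quotes first and running only the quoted text through the analyzer while keeping the operators outside, can be condensed into a short self-contained sketch. It is a simplified variant, not a port of the code above: the class QuoteAwareSplitter, the method rewrite and the rule that every bare word becomes its own quoted term are assumptions, and the analyzer is passed in as a function so that, hypothetically, MainUtil::splitWord could be plugged in inside GeoNetwork.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.UnaryOperator;

public class QuoteAwareSplitter {
    private static final Set<String> OPERATORS = Set.of("and", "or", "not");

    // Splits a raw query into quoted phrases, bare words and operators, runs
    // each phrase or word through `analyzer`, re-quotes it, and inserts the
    // default operator "and" between two terms with no operator between them.
    public static String rewrite(String query, UnaryOperator<String> analyzer) {
        String q = query.replace('\'', '"').trim().replaceAll("\\s+", " ").toLowerCase();
        List<String> out = new ArrayList<>();
        boolean lastWasTerm = false;
        int i = 0;
        while (i < q.length()) {
            char c = q.charAt(i);
            if (c == ' ') {
                i++;
                continue;
            }
            String segment;
            if (c == '"') {
                int close = q.indexOf('"', i + 1);
                if (close < 0) {
                    close = q.length();         // an unclosed quote runs to the end
                }
                segment = q.substring(i + 1, close);
                i = close + 1;
            } else {
                int end = i;
                while (end < q.length() && q.charAt(end) != ' ' && q.charAt(end) != '"') {
                    end++;
                }
                segment = q.substring(i, end);
                i = end;
                if (OPERATORS.contains(segment)) {
                    out.add(segment);           // keep and/or/not as operators
                    lastWasTerm = false;
                    continue;
                }
            }
            String analyzed = analyzer.apply(segment).trim();
            if (analyzed.isEmpty()) {
                continue;                       // nothing left after analysis
            }
            if (lastWasTerm) {
                out.add("and");                 // default operator
            }
            out.add("\"" + analyzed + "\"");
            lastWasTerm = true;
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        UnaryOperator<String> identity = s -> s; // stand-in for a real analyzer
        System.out.println(rewrite("hello world", identity));
        // prints "hello" and "world"
        System.out.println(rewrite("'Any more' OR water", identity));
        // prints "any more" or "water"
    }
}
```

One behavioral difference: in the patch above, unquoted words that sit between operators are wrapped together as one phrase (hello world or foo becomes "hello world" or "foo"), whereas this sketch keeps each bare word as a separate term joined by "and".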
Yahoo! Mail, your lifetime mailbox!
-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
GeoNetwork-devel mailing list
GeoNetwork-devel@...
https://lists.sourceforge.net/lists/listinfo/geonetwork-devel
GeoNetwork OpenSource is maintained at http://sourceforge.net/projects/geonetwork
Re: The solution to search-related problems: operators, quotes, phrase, Chinese

Hi Zhuhua,
As for your problems 1 to 3, you're right: even though the documentation states that the "or", "not" and "phrase" operators are supported in queries, they are not. If we want these operators to work that way (i.e. within a query typed into a single search field), we should use a Lucene QueryParser, and we don't.

However, some time ago I implemented "or", "without" and "phrase" queries in a different way, by adding an extra search field for each of them, much like Google's advanced search page, which also has separate input fields for these. These fields are normally hidden (invisible) in the advanced search section; if you set their "display" property to either "inline" or "block", you should be able to use them straight away.

As for your 4th problem, I'm not entirely sure what you mean.

Kind regards
Heikki Doeleman

On Tue, Sep 9, 2008 at 6:19 AM, zhuhua zha <zhuhuazha2004@...> wrote:
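Heikki's alternative, separate input fields for "or", "without" and "phrase", is not shown in code in this thread. As a hypothetical illustration of how such fields could be combined, the sketch below builds one string in Lucene's classic QueryParser syntax, where +term marks a required term, -term a prohibited one, double quotes a phrase and parentheses a group. The class AdvancedFieldsQuery, its parameters and the "all words" field are invented for this example, and it does no escaping of Lucene's special characters.

```java
import java.util.ArrayList;
import java.util.List;

public class AdvancedFieldsQuery {
    // Combines four Google-style input fields into one query string in
    // Lucene's classic query-parser syntax. With the parser's default OR
    // operator, +(a b) means "at least one of a, b".
    public static String build(String allWords, String anyWords, String without, String phrase) {
        List<String> clauses = new ArrayList<>();
        for (String w : words(allWords)) {
            clauses.add("+" + w);                                   // every word required
        }
        List<String> any = words(anyWords);
        if (!any.isEmpty()) {
            clauses.add("+(" + String.join(" ", any) + ")");        // at least one of these
        }
        for (String w : words(without)) {
            clauses.add("-" + w);                                   // excluded words
        }
        if (phrase != null && !phrase.isBlank()) {
            clauses.add("+\"" + phrase.trim() + "\"");              // exact phrase
        }
        return String.join(" ", clauses);
    }

    // Splits a field on whitespace, ignoring null and empty input.
    private static List<String> words(String field) {
        List<String> result = new ArrayList<>();
        if (field == null) {
            return result;
        }
        for (String w : field.trim().split("\\s+")) {
            if (!w.isEmpty()) {
                result.add(w);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(build("river", "lake sea", "ocean", "water quality"));
        // prints +river +(lake sea) -ocean +"water quality"
    }
}
```

Keeping each kind of clause in its own field avoids having to recover operators from a single free-text box, which is exactly where problems 1 to 3 arise.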
Re: The solution to search-related problems: operators, quotes, phrase, Chinese

The 4th problem is related to Chinese.
In English a word is a single term: "hello body" is two words. When we use StandardAnalyzer, as MainUtil.splitWord does, the result is " hello body ": "hello" stays "hello" and "body" stays "body". But in Chinese, take "XX YY", where each "X" or "Y" stands for one character: the result is " X X Y Y ", so "XX" turns into " X X ". What we actually need is the phrase "XX", so we need a phrase query in Lucene.
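The difference described here can be reproduced without Lucene. Splitting text into single characters and requiring all of them only checks that each character appears somewhere, while a phrase query also requires them to be adjacent and in order. In this JDK-only sketch the class PhraseVsTerms, its methods and the sample strings are invented for illustration (水质 means "water quality").

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PhraseVsTerms {
    // Splits text into one token per character (code point), as the thread
    // describes the analyzer doing for Chinese.
    public static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<>();
        text.codePoints().forEach(cp -> tokens.add(new String(Character.toChars(cp))));
        return tokens;
    }

    // AND of single-character terms: each query token occurs somewhere in the document.
    public static boolean matchesAllTerms(List<String> doc, List<String> query) {
        return doc.containsAll(query);
    }

    // Phrase match: the query tokens occur consecutively and in order.
    public static boolean matchesPhrase(List<String> doc, List<String> query) {
        return Collections.indexOfSubList(doc, query) >= 0;
    }

    public static void main(String[] args) {
        List<String> query = unigrams("水质");            // "water quality"
        List<String> docA = unigrams("水质监测数据");      // "water quality monitoring data"
        List<String> docB = unigrams("质量与水资源");      // "quality and water resources"
        System.out.println(matchesAllTerms(docB, query)); // true: both characters occur
        System.out.println(matchesPhrase(docB, query));   // false: not adjacent and in order
        System.out.println(matchesPhrase(docA, query));   // true
    }
}
```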
Re: The solution to search-related problems: operators, quotes, phrase, Chinese

Okay,
if this is the case, I think we had better start using a Lucene QueryParser, which should probably handle multilingual cases better than the MainUtil.splitWord we have now, which is somewhat unfortunate anyway.

Do you, or anyone on this list, have any experience using Lucene QueryParsers with languages that use non-Western writing systems, such as Chinese? This issue must have come up among users of the very popular Lucene search library, so in my opinion we should look for a solution that already exists.

Did my answer to your problems 1 to 3 address those issues, or is everything blocked by your problem 4?

Are there no other Chinese-language implementations of GeoNetwork? And if anyone has one, how do you solve this problem?

Kind regards
Heikki Doeleman

On Tue, Sep 9, 2008 at 5:30 PM, zhuhua zha <zhuhuazha2004@...> wrote:
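One existing Lucene option along these lines is the CJKAnalyzer shipped with Lucene's contributed analyzers, which indexes CJK text as overlapping two-character bigrams rather than single characters, so a two-character word becomes one token. The JDK-only sketch below only imitates that bigram idea and is not Lucene's code; the class CjkBigrams and its method are hypothetical, and it assumes the input is a single uninterrupted run of CJK characters.

```java
import java.util.ArrayList;
import java.util.List;

public class CjkBigrams {
    // Overlapping two-character tokens for a run of CJK characters; a run of
    // a single character is kept as one token. Assumes the input is one
    // uninterrupted CJK run (no Latin text, spaces or punctuation).
    public static List<String> bigrams(String run) {
        int[] cps = run.codePoints().toArray();
        List<String> tokens = new ArrayList<>();
        if (cps.length == 1) {
            tokens.add(new String(cps, 0, 1));
        }
        for (int i = 0; i + 1 < cps.length; i++) {
            tokens.add(new String(cps, i, 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> query = bigrams("水质");          // [水质]
        List<String> docA = bigrams("水质监测数据");    // [水质, 质监, 监测, 测数, 数据]
        List<String> docB = bigrams("质量与水资源");    // [质量, 量与, 与水, 水资, 资源]
        System.out.println(docA.containsAll(query));   // true
        System.out.println(docB.containsAll(query));   // false
    }
}
```

Because the query word is now a single token, a record that merely contains both characters in separate places no longer matches, without needing a phrase query for two-character words.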