Re: re-ranking ....

2009-07-15 Thread KK
fetch all the search results along with their corresponding values for all the terms used for scoring and then you use those values and play around with them and re-rank your results to your heart's content. --kk On Wed, Jul 15, 2009 at 11:28 AM, henok sahilu wrote: > what i want to do
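A minimal Java sketch of the re-ranking idea in the snippet above, assuming the per-document values have already been fetched from the searcher. The `Result` class, its `luceneScore` and `boost` fields, and the multiply-to-combine rule are hypothetical and not part of Lucene's API; substitute whatever values and formula fit your data.

```java
import java.util.ArrayList;
import java.util.List;

class ReRanker {
    // Hypothetical holder for one search hit and the values used to re-score it.
    static final class Result {
        final int docId;
        final double luceneScore; // score returned by the original search
        final double boost;       // any extra value you want to factor in

        Result(int docId, double luceneScore, double boost) {
            this.docId = docId;
            this.luceneScore = luceneScore;
            this.boost = boost;
        }

        // Assumed combination rule; replace with your own formula.
        double combined() {
            return luceneScore * boost;
        }
    }

    // Returns doc ids ordered by the combined score, highest first.
    static List<Integer> rerank(List<Result> results) {
        List<Result> sorted = new ArrayList<>(results);
        sorted.sort((a, b) -> Double.compare(b.combined(), a.combined()));
        List<Integer> ids = new ArrayList<>();
        for (Result r : sorted) {
            ids.add(r.docId);
        }
        return ids;
    }
}
```

The input list is copied before sorting, so the original result order stays available for comparison.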

Re: Hindi, diacritics and search results

2009-07-13 Thread KK
languages mixed with english content. As you can see for english it applies the usual process of stemming/stop-word-removal etc. Try it out and do let us know if you face any issues. Thanks, KK. On Sat, Jul 11, 2009 at 8:05 AM, Robert Muir wrote: > there is really no default in lucene >

Re: Using IN to retrieve data after lucene search.

2009-07-08 Thread KK
you might go thru this as well, http://www.lucidimagination.com/blog/2009/03/18/exploring-lucenes-indexing-code-part-2/ HTH, KK On Thu, Jul 9, 2009 at 10:52 AM, Aditya R wrote: > > Hi all, > > I am new to lucene. In my sample application I have used lucene to index my > 17 field d

Re: Problem in Running the Lucene web application demo

2009-07-01 Thread KK
no idea about the webapp demo app, but are you sure you have all the required files like the jar in the right place? On Sat, Jun 27, 2009 at 9:50 PM, mayank juneja wrote: > Hi > > I am a new user to Lucene. > > I tried running the Lucene web application demo provided with the source. I > am able

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-11 Thread KK
Thank you very much Yonik. I downloaded the latest Solr build, pulled the WordDelimiterFilter and used it with the same option as used by Solr default and it worked like a charm. Thanks to Robert also. Thanks, KK On Tue, Jun 9, 2009 at 7:01 PM, Yonik Seeley wrote: > I just cut'n

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-11 Thread KK
t unicoded word endings and behaving as expected. Any idea on this issue is welcome. Help me fix the issue. BTW, lucene ppl when is that basic worddelimiterfilter going to be added to Lucene as well? Any idea? Thanks, KK. On Tue, Jun 9, 2009 at 7:01 PM, Yonik Seeley wrote: > I just cut

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-09 Thread KK
"साल"; int length = hindiStr.length(); System.out.println("str length " + length); for (int i=0; i wrote: > KK can you give me an example of some indian text for which it is doing > this? > > Thanks! > > On Mon, Jun 8, 2009 at 1:03 AM, KK wrote:

How to make wordDelimiterFilter[pulled from Solr nighly] to not break non-english words in a wrong way in lucene indexing/searching?

2009-06-08 Thread KK
Hi All, I'm trying to index some indian web page content which is basically a mix of indian and say 5% of english content in the same page itself. For all this I can not use the standard or simple analyzer as they break the non-english words in the wrong places, say [because the isLetter(ch) happens to be
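The snippet is cut off, but one way an `isLetter`-based check can split Indic words is easy to see in plain Java: vowel signs such as U+093E are combining marks, and `Character.isLetter` returns false for them. The sketch below uses hypothetical class and method names and does not reproduce what any particular Lucene analyzer does.

```java
class LetterCheck {
    // True if every code point counts as a letter for Character.isLetter.
    static boolean allLetters(String s) {
        return s.codePoints().allMatch(cp -> Character.isLetter(cp));
    }

    // True for Unicode combining marks (general categories Mn, Mc, Me).
    static boolean isMark(int codePoint) {
        int type = Character.getType(codePoint);
        return type == Character.NON_SPACING_MARK
            || type == Character.COMBINING_SPACING_MARK
            || type == Character.ENCLOSING_MARK;
    }

    // A more tolerant check that keeps combining marks attached to words.
    static boolean allLettersOrMarks(String s) {
        return s.codePoints().allMatch(cp -> Character.isLetter(cp) || isMark(cp));
    }

    public static void main(String[] args) {
        String hindi = "\u0938\u093E\u0932"; // the word "साल"
        System.out.println("letters only:     " + allLetters(hindi));
        System.out.println("letters or marks: " + allLettersOrMarks(hindi));
    }
}
```

A tokenizer that treats marks as word characters, as `allLettersOrMarks` does, would keep such a word in one piece.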

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-07 Thread KK
rs therein. I hope I made it clear. What could be the reason for this? Any idea on fixing the same. Thanks, KK On Sat, Jun 6, 2009 at 9:45 PM, Robert Muir wrote: > kk, i haven't had that experience with worddelimiterfilter on indian > languages, is it possible you could provide me

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-06 Thread KK
g 5 values, that's fine, but somehow it's messing with unicode content. How to get rid of that? Any thoughts? It seems setting those values in some proper way might fix the problem, I'm not sure, though. Thanks, KK. On Fri, Jun 5, 2009 at 7:37 PM, Robert Muir wrote: > kk an easier solution to

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
st it here. Thanks, KK. On Fri, Jun 5, 2009 at 7:37 PM, Robert Muir wrote: > kk an easier solution to your first problem is to use > worddelimiterfilterfactory if possible... you can get an instance of > worddelimiter filter from that. > > thanks, > robert > > On Fri, Ju

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
e of the other one. Anyway can you guide me getting rid of the above error. And yes I'll change the order of applying the filters as you said. Thanks, KK. On Fri, Jun 5, 2009 at 5:48 PM, Robert Muir wrote: > KK, you got the right idea. > > though I think you might want to ch

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
y out the delimiter. Will update you on that. Thanks a lot. KK On Fri, Jun 5, 2009 at 5:30 PM, Robert Muir wrote: > i think you are on the right track... once you build your analyzer, put it > in your classpath and play around with it in luke and see if it does what > you want. > >

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-05 Thread KK
ts = new PorterFilter(ts); return ts; } } Does this sound OK? I think it will do the job...let me try it out.. I dont need custom filter as per my requirement, at least not for these basic things I'm doing? I think so... Thanks, KK. On Thu, Jun 4, 2009 at 6:36 PM, Robert Muir wrote:

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
thing. What you say? Thanks, KK. On Thu, Jun 4, 2009 at 6:36 PM, Robert Muir wrote: > KK well you can always get some good examples from the lucene contrib > codebase. > For example, look at the DutchAnalyzer, especially: > > TokenStream tokenStream(String fieldName, Reader rea

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
ndEdn. Is something there? Do let me know. Thanks, KK. On Thu, Jun 4, 2009 at 6:19 PM, Robert Muir wrote: > KK, for your case, you don't really need to go to the effort of detecting > whether fragments are english or not. > Because the English stemmers in lucene will not modify y

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
Uwe, thanks for your lightning-fast response :-). I'm looking into that and let me see how far I can go... Also I request Muir to point me to the exact analyzer he mentioned in the previous mail. Thanks, KK On Thu, Jun 4, 2009 at 6:10 PM, Uwe Schindler wrote: > > I request Uwe to g

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-04 Thread KK
om analyzer only if thats not going to be too complex. LOL, I'm a new user to lucene and know basics of Java coding. Thank you very much. --KK. On Thu, Jun 4, 2009 at 5:30 PM, Robert Muir wrote: > yes this is true. for starters KK, might be good to startup solr and look > a

Re: How to support stemming and case folding for english content mixed with non-english content?

2009-06-03 Thread KK
ntifiers, but do we really need that? Because we've only non-english content mixed with english[and not french or russian etc]. What is the best way of approaching the problem? Any thoughts! Thanks, KK. On Wed, Jun 3, 2009 at 9:42 PM, Robert Muir wrote: > KK, is all of your latin script t

How to support stemming and case folding for english content mixed with non-english content?

2009-06-03 Thread KK
nt intermingled with non-english content. I must mention that we don't have stemming and case folding for this non-english content. I'm stuck with this. Someone please let me know how to proceed with fixing this issue. Thanks, KK.

Re: How to get top x[30 or 40] docs from result still alongwith the support for hit highlighting?

2009-06-02 Thread KK
Thanks for your response. BTW, I got it done using TopDocs in place of Hits and used this String content = searcher.doc(topDocs.scoreDocs[i].doc).get("content"); instead of String content = hits.doc(i).get("content"); Thanks, KK On Tue, Jun 2, 2009 at 6:52 PM, Erick Eric

How to get top x[30 or 40] docs from result still alongwith the support for hit highlighting?

2009-06-02 Thread KK
not figure out how to plug the same thing in the above code fragment, a good example will be helpful. As of now I think it's the highlighter that's taking the major part of the time consumed for search. So we can restrict the whole thing to only the part that we are going to show on the first page. Any idea on the same is very welcome. Thank you. --KK.

How to post date encoded in NCR(decimal) to lucene indexer?

2009-06-01 Thread KK
or I've to convert the same to \u format [this just replaces &# with \u and replaces the 4-digit number with its hex equivalent]. This manual method does not sound good to me. If there is any standard way of doing the same, please someone let me know. Thank you. --KK. One question? Is it mand
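Rather than rewriting each reference into \u form by hand, decimal NCRs can be decoded straight into characters before indexing. The following is a minimal sketch in plain Java, not a Lucene facility; the `NcrDecoder` class and its `decode` method are hypothetical names, and it handles only the decimal `&#NNNN;` form mentioned in the post.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NcrDecoder {
    // Matches decimal numeric character references such as &#2986;
    private static final Pattern DECIMAL_NCR = Pattern.compile("&#(\\d+);");

    // Replaces every decimal NCR with the character it stands for.
    // References that are not valid code points are left untouched.
    static String decode(String input) {
        Matcher m = DECIMAL_NCR.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint;
            try {
                codePoint = Integer.parseInt(m.group(1));
            } catch (NumberFormatException e) {
                codePoint = -1; // too large to be a code point
            }
            String replacement = Character.isValidCodePoint(codePoint)
                ? new String(Character.toChars(codePoint))
                : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The decoded string can then be handed to the indexer like any other Java string, since Java strings are already Unicode internally.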

Re: highlighting searched results in document

2009-05-28 Thread KK
the end of the day, then try to get that done. --KK On Thu, May 28, 2009 at 1:16 PM, Ritu choudhary wrote: > Is this possible through lucene or has anybody tried such thing? > > On 28/05/2009, Ritu choudhary wrote: > > well friend let me explain the whole thing to you then: > >

Re: highlighting searched results in document

2009-05-27 Thread KK
Yes, the getBestFragment() returns the matched fragment "fragmentcount" numbers each separated with the "fragmentseparator". what exactly you mean by "highlight the searched word in the document." what is this document??? first let us know what exactly you want to

Re: highlighting searched results in document

2009-05-27 Thread KK
what exactly is your requirement? Displaying the final search results in a webpage? or anything else. The results that you are getting is correct. Now you have to decide what you want to do with that. I thought you are trying to show the results in a webpage. --KK On Thu, May 28, 2009 at 11:54

Re: highlighting searched results in document

2009-05-27 Thread KK
Forgot: Are you trying all this from the command line? Because that's when you get the output as unprocessed html, with those span tags. When you pass the same to display the content as a webpage, they will be processed by the browser and you will see the colored matches. --KK On Thu, May 28, 2009 at 11:49

Re: highlighting searched results in document

2009-05-27 Thread KK
Yes, that's the expected output. Now put that full content [whatever the searcher returned] in the html page along with the styling for the same, and you will see the matches in yellow [you chose yellow as color for highlighting]. --KK On Thu, May 28, 2009 at 11:42 AM, Ritu choudhary wrote: >

Re: highlighting searched results in document

2009-05-27 Thread KK
ghter highlighter = new Highlighter(new QueryScorer(query)); You missed the formatter altogether but you added the styler at the end, though. Add it and it will work like a charm. --KK On Wed, May 27, 2009 at 10:40 PM, Ritu choudhary wrote: > Am i coding it wrongly ...please reply. >

Re: highlighting searched results in document

2009-05-27 Thread KK
@Ritu Wouter's reply must have fixed the problem, right? Or still stuck? --KK On Wed, May 27, 2009 at 1:46 PM, Wouter Heijke wrote: > Hi, > It sounds to me that you are highlighting the query string and not the > document. You will have to pass the document's content to

Re: highlighting searched results in document

2009-05-26 Thread KK
classpath. As you can see in the last part of the code the final output is being written to a file. As per your requirement remove that code as well as the part that adds html and style tags. Now the code adds the highlight span wherever there is a match. So now we've to put the style script in

Re: Hit highlighting for non-english unicode index/queries not working?

2009-05-26 Thread KK
le and will definitely go through the examples of LIA 2ndEdn. Thank you. --KK On Tue, May 26, 2009 at 6:55 PM, Erick Erickson wrote: > It's fairly easy to construct your own analyzer by stringing together some > filters and tokenizers. LIA (1st ed) > had a SynonymAnalyzer. You probably

Re: how to get the word before and the word after the matched Term?

2009-05-26 Thread KK
Thank you very much @ Grant. I used the whitespaceanalyzer and other highlighter methods provided for all unicoded docs and its working fine. Thank you all. The book LIA2ndEdn helped me a lot specifically the examples in the highlighting section. Thanks, KK. On Tue, May 26, 2009 at 4:43 PM

Re: New user in lucene

2009-05-26 Thread KK
more information you can always post in in this mailing list. HTH KK On Tue, May 26, 2009 at 12:39 PM, StanleyTan wrote: > > > Hi Alexander, > > thanks for your advise. but im thinking how am i suppose to integrate in? > because i tried to google, and after d/ling the zip folder

Re: Hit highlighting for non-english unicode index/queries not working?

2009-05-26 Thread KK
port both english and non-english indexing/searching/highlighting. Thank you all. Any ideas on the same are always welcome. Thanks, KK. On Tue, May 26, 2009 at 1:24 AM, Robert Muir wrote: > as mentioned previously, i dont think your text is being analyzed the way > you want. > >

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread KK
ormation on the net regarding the advanced features of lucene all of which are clearly explained in this book with examples. Thank you. KK. On Mon, May 25, 2009 at 7:47 PM, Michael McCandless < luc...@mikemccandless.com> wrote: > I would do some googling to find examples, or rea

Hit highlighting for non-english unicode index/queries not working?

2009-05-25 Thread KK
a lucene index from the command line using the raw unicode texts like this, [...@kk-laptop]$ java LuceneSearcher "\u0BAA\u0BBF\u0BB0\u0B9A\u0BC1\u0BB0" and it gives me the page that matches the above query. Now I tried to do the same along with highlighting. So in the code I posted above you

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread KK
Thanks @Michael. I've no idea about this contrib though I'm looking into highlighter. Can you throw some light on the same, and on the steps to be taken for achieving it? I'm completely new to this thing. Can you point me to some examples for the same? Thank you. KK. On Mon, May 2

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-25 Thread KK
Thanks for your response @Seid. Can any Lucene user give me directions on this regard? I'm stuck. Really appreciate your help. Thanks, KK On Mon, May 25, 2009 at 2:43 PM, Seid Muhie wrote: > actually I used the normal java standard libraries for this work. I > used lucene only to r

Re: how to get the word before and the word after the matched Term?

2009-05-25 Thread KK
One more piece of information I would like to add, # I'm building the index mostly for non-english texts/documents, and searching is done using unicode utf-8 texts [it's obvious, right?] Thanks KK On Mon, May 25, 2009 at 10:58 AM, KK wrote: > Hi All. > I want to do the same thing with say a wind

Re: How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-24 Thread KK
s per your mail, you used Java to extract the neighbors, Is that using the standard techniques i.e using those spanqueries/termvectors or something else. If you can elaborate all this a bit It'd be very helpful. Thank you. KK> On Mon, May 25, 2009 at 10:51 AM, Seid Muhie wrote: > fo

Re: how to get the word before and the word after the matched Term?

2009-05-24 Thread KK
to make use of SpanQuery, TermVector and TermVectorMapper for these purposes, right? NB:I also want to add hit highlighting after fixing the neighbor problem. Thanks, KK. On Thu, May 21, 2009 at 4:46 PM, Grant Ingersoll wrote: > See > http://www.lucidimagination.com/search/docu

How to extract 15/20 words around the matched query after getting results from lucene searcher?

2009-05-24 Thread KK
words after that and will show that to end user. Any idea on doing the same will be very helpful. Thank you. KK.
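One plain-Java way to pull a window of words around a match out of the stored text, in the spirit of the "normal java standard libraries" approach mentioned in the replies. This is a sketch with a hypothetical `ContextWindow` class. It splits on whitespace and matches whole words case-insensitively, which is a simplification; it is not what Lucene's highlighter does.

```java
import java.util.Arrays;

class ContextWindow {
    // Returns up to `window` words on each side of the first occurrence
    // of `term`, joined by single spaces, or "" if the term is absent.
    static String around(String text, String term, int window) {
        String[] words = text.trim().split("\\s+");
        for (int i = 0; i < words.length; i++) {
            if (words[i].equalsIgnoreCase(term)) {
                int from = Math.max(0, i - window);
                int to = Math.min(words.length, i + window + 1);
                return String.join(" ", Arrays.copyOfRange(words, from, to));
            }
        }
        return "";
    }
}
```

For the 15/20-word case in the subject line, `window` would be 15 or 20 and `text` would be the stored field value of a hit.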

Re: How to query/search unicoded docs in lucene using unicode text as query?

2009-05-23 Thread KK
e pointers on doing unicode normalization please let me know. If you think that might help I'd definitely give it a try. Thanks, KK On Thu, May 21, 2009 at 7:40 PM, Robert Muir wrote: > hello, your example (hindi), is probably suffering from a number of search > issues:

How to index pages containing NCR(dec) unicode encodings?

2009-05-23 Thread KK
've to get the utf-8 encoding for them. Is this the way to fix this? or there are other and better ways for doing the same. I need proper guidance from someone who has faced similar problems earlier. All are welcome to give their views/ideas on the same. Thanks, KK

Which analyzer to use for non-english unicoded text?

2009-05-22 Thread KK
ive me ideas on this. Along with this I would also like to do hit highlighting irrespective of language. Ideas on this will be equally helpful. Is simpleAnalyzer() good enough for indexing and searching? Thanks, KK

Re: hit highlighting in lucene ?

2009-05-21 Thread KK
some write ups on the same, do give me the pointers. it'll help me a lot. Pointers to the unicode default algorithms mentioned in your mail will be equally helpful. Thanks, KK. On Thu, May 21, 2009 at 8:03 PM, Robert Muir wrote: > its definitely an area in lucene that could use some i

How to query/search unicoded docs in lucene using unicode text as query?

2009-05-21 Thread KK
the analyzer solr was using in the default setting[i used the default setting only, and pretty sure it was using lot many analyzers/filter factory]. Thanks for all your time and appreciation. Thanks, KK.

Re: hit highlighting in lucene ?

2009-05-21 Thread KK
nguages. Cant we have a single indexer that handles non-eng and eng in equally good ways? Or any other ideas on the same ? Thanks, KK. On Thu, May 21, 2009 at 6:18 PM, Joel Halbert wrote: > The highlighter should be language independent. So long as you are > consistent with your use

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
Thank you very much. As you told me I just added a single line in the jsp page mentioning the charset as utf-8 and it worked like a charm. Thank you. KK On Thu, May 21, 2009 at 5:47 PM, Uwe Schindler wrote: > If you print the result e.g. to a webpage through the servlet API, the > out

hit highlighting in lucene ?

2009-05-21 Thread KK
s for these regional languages. Any other ideas of doing the same would be helpful as well. Thanks, KK.

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
'm able to see the regional text, but not through the browser. How to do the decoding when fetching the search results through the searcher? Thanks KK On Thu, May 21, 2009 at 1:05 PM, KK wrote: > Thanks @Uwe. > #To answer your last mails query, textOnly is the output of the method > downloadPa

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
'm not using the encoding for HTTP parameters, I'll use that and let you know. Thank you very much. KK, On Thu, May 21, 2009 at 12:50 PM, Uwe Schindler wrote: > I forgot: > > > byte [] utfEncodeByteArray = textOnly.getBytes(); > > String utfString = new S
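The quoted `textOnly.getBytes()` call takes no charset argument, so it uses the JVM's default charset, which on Java versions of that time depended on the platform. A minimal sketch of the explicit-charset variant follows; the `Utf8RoundTrip` class and its method names are hypothetical, and `StandardCharsets` needs Java 7 or later (older code would pass the charset name `"UTF-8"` instead).

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encodes to UTF-8 bytes and decodes them again with the same charset,
    // so the result does not depend on the platform default encoding.
    static String roundTrip(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    // Number of bytes the text occupies once encoded as UTF-8.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }
}
```

Naming the charset on both the encoding and the decoding side is what keeps the round trip lossless; mixing an explicit charset with the platform default is a common source of garbled regional text.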

Re: Posting unicode data to lucene not working during searching/retreival!

2009-05-21 Thread KK
alyzer used during > indexing and searching. Often analyzers written for specific languages > cannot correctly handle characters from foreign languages. But e.g. > StandardAnalyzer or WhitespaceAnalyzer does not modify the tokens in any > way > (if making them lowercase is not a problem).

Posting unicode data to lucene not working during searching/retreival!

2009-05-20 Thread KK
laces. Earlier I was using Solr and I was posting using the same method and retrieval was also working fine, but I don't know what the issue is with lucene, maybe I'm missing something. Can someone tell me what could be the issue? Thank you. KK,

Re: How to create a new index

2009-05-20 Thread KK
Thank you ag...@john. This is even better. I don't have to bother about the 3rd argument, right? I'll use the same one everytime for both registering a new core as well as adding docs to an existing one. Thanks, KK. On Wed, May 20, 2009 at 6:54 PM, John Byrne wrote: > Hi KK, >

Re: How to create a new index

2009-05-20 Thread KK
it fixed my problem very soon. Thank you all and special thanks to Lucene guys. Thanks, KK. On Wed, May 20, 2009 at 6:28 PM, John Byrne wrote: > I think the problem is that you are creating a new index every time you > add a document: > > IndexWriter writer = new IndexWriter(tru

Re: How to create a new index

2009-05-20 Thread KK
hits = searcher.search(query); } catch (Exception ex) { ex.printStackTrace(); } int hitCount = hits.length(); System.out.println("Results found :" + hitCount); for (int ix=0; (ix wrote: > Hi KK, > > Easier still, you cou

How to create a new index

2009-05-20 Thread KK
How to create a new index? Every time I need to do so, I've to create a new directory and put the path to that, right? How to automate the creation of a new directory? I'm a new user of lucene. Please help me out. Thanks, KK.
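For the directory part of the question alone, the Java standard library can create the folder on demand. The sketch below assumes a hypothetical `IndexDirs` helper and a one-folder-per-index layout; it uses `java.nio.file`, which is newer than this thread (Java 7+), and leaves the Lucene side (opening a writer on the resulting path) out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

class IndexDirs {
    // Creates baseDir/indexName (and any missing parents) if needed and
    // returns its path. Calling it again for an existing directory is fine.
    static Path ensureIndexDir(Path baseDir, String indexName) {
        Path dir = baseDir.resolve(indexName);
        try {
            return Files.createDirectories(dir);
        } catch (IOException e) {
            // Wrapped so callers are not forced to handle a checked exception.
            throw new UncheckedIOException(e);
        }
    }
}
```

The returned path can be passed to whatever code opens the index, so the caller only has to choose an index name.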

Does lucene support on the fly creation of new cores

2009-05-19 Thread KK
rJ user to switch to lucene? an estimate on this? Thanks, KK.