Re: Lucene Search Result with Line Numbers?

2005-04-11 Karl Øie
Oh, I forgot your last question; that's why the field "line" has to be stored. When you run a query, you have to get the "line" number from the document that represents the line, and for "forward" / "back" actions you will have to sort the result set by line value and print only chunks of that result. Best regards, Karl Øi
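A minimal sketch of the "sort by line, print a chunk" step, assuming the hits have already been retrieved and their stored "line" values read back. `LineHit` and `LinePager` are hypothetical helpers for illustration, not Lucene classes, and the page size is whatever you choose.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical holder for the stored fields of one per-line hit (not a Lucene class).
final class LineHit {
    final String srcDoc;
    final int line;
    final String text;

    LineHit(String srcDoc, int line, String text) {
        this.srcDoc = srcDoc;
        this.line = line;
        this.text = text;
    }
}

final class LinePager {
    // Sorts the hits by their stored line number and returns page `page`
    // (0-based, must be >= 0) holding at most `pageSize` hits.
    static List<LineHit> page(List<LineHit> hits, int page, int pageSize) {
        List<LineHit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((LineHit h) -> h.line));
        int from = Math.min(page * pageSize, sorted.size());
        int to = Math.min(from + pageSize, sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

"Forward" then simply asks for `page + 1` and "back" for `page - 1`; a page past the end comes back empty.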

Re: Lucene Search Result with Line Numbers?

2005-04-11 Karl Øie
Yes, the biggest drawback is text spanning lines. With line 1 "it was the best of times," and line 2 "it was the worst of times", the search "it was the best of times, it was the worst of times" (with quotes) will return no hits, because no single Lucene document contains the whole text by itself. I would be inte
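To make the limitation concrete, here is a small illustration. It is only a stand-in: plain substring matching plays the role of a Lucene phrase query, whereas real phrase matching works on analyzed tokens. The point is the same, though: a phrase can only match inside one document, and each line is its own document.

```java
import java.util.List;

// Illustration only: String.contains stands in for a phrase query.
final class SpanningPhrase {
    // One "document" per line: the phrase must fit inside a single line.
    static boolean anySingleLineMatches(List<String> lines, String phrase) {
        for (String line : lines) {
            if (line.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    // One "document" for the whole text: the phrase may cross line boundaries.
    static boolean wholeTextMatches(List<String> lines, String phrase) {
        return String.join(" ", lines).contains(phrase);
    }
}
```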

Re: Lucene Search Result with Line Numbers?

2005-04-11 Karl Øie
Most indexing creates one Lucene document for each source document. What you would need is to create a Lucene document for each line. String src_doc = "crash.java"; int line_number = 0; while(reader!=EOF) { String line = reader.readLine(); Document ld = new Document(); ld.add(ne
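A runnable sketch of the per-line loop, using only the standard library. Each "line document" is modelled as a plain field-name to value map standing in for a Lucene `Document`, since the exact `Field` constructors differ between Lucene versions. The field names, the 1-based numbering and the `LineDocs` class are assumptions for illustration. Note the loop condition: `readLine()` returns `null` at end of input, which replaces the `reader != EOF` test.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds one field map per source line (a stand-in for a Lucene Document).
final class LineDocs {
    static List<Map<String, String>> split(String srcDoc, BufferedReader reader) throws IOException {
        List<Map<String, String>> docs = new ArrayList<>();
        int lineNumber = 0;
        String line;
        // readLine() returns null once the input is exhausted.
        while ((line = reader.readLine()) != null) {
            lineNumber++; // assumed 1-based line numbers
            Map<String, String> ld = new LinkedHashMap<>();
            ld.put("src_doc", srcDoc);                    // which source file the line came from
            ld.put("line", Integer.toString(lineNumber)); // must be stored so hits can report it
            ld.put("text", line);                         // the searchable content of the line
            docs.add(ld);
        }
        return docs;
    }
}
```

In a real index you would add each map's entries as fields of a new Lucene document and hand it to the index writer.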

Re: Multi-analyzer ?

2005-04-11 Karl Øie
I don't think you can figure out the language from the input box value alone, so I can't see any way to select the correct language analyzer at this point. What you can do is put Chinese, Japanese, English and Dutch content in separate indexes and use MultiSearcher to search in all of them, and
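The shape of that setup can be sketched without Lucene. This is a toy stand-in, not MultiSearcher itself: each language gets its own in-memory "index", documents are routed by their known language at index time (where the matching analyzer would be applied), and a query of unknown language fans out to every index with the hits merged. The naive case-insensitive substring match and the `LanguageIndexes` class are assumptions for illustration; real per-language analysis, especially for Chinese and Japanese, is far more involved.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Toy stand-in for one index per language plus a search across all of them.
final class LanguageIndexes {
    // Language code -> documents in that language's "index" (TreeMap keeps a stable order).
    private final Map<String, List<String>> indexes = new TreeMap<>();

    // Index time: the language is known, so the document goes to its own index.
    void add(String language, String text) {
        indexes.computeIfAbsent(language, k -> new ArrayList<>()).add(text);
    }

    // Query time: the language is unknown, so search every index and merge the hits.
    List<String> searchAll(String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : indexes.entrySet()) {
            for (String doc : e.getValue()) {
                if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                    hits.add(e.getKey() + ": " + doc);
                }
            }
        }
        return hits;
    }
}
```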

Re: Urgent, please help Index/Search in UTF-8 ???

2005-04-11 Karl Øie
If you use a servlet and an HTML form to feed queries to the QueryParser, take good care of all the configuration around the servlet container. If you, like me, use Tomcat, you might have to recode the query so that its UTF-8 bytes are decoded correctly into a Java string before you pass it to Lucene. Read this: http://www.crazysqui
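A minimal sketch of that recoding, assuming the browser sent UTF-8 but the container decoded the parameter as ISO-8859-1 (historically Tomcat's default for query strings). Because ISO-8859-1 maps every byte to exactly one character, encoding the garbled string back to ISO-8859-1 recovers the original bytes, which can then be decoded as UTF-8. `QueryRecode` is a hypothetical helper name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: undo an ISO-8859-1 decode of bytes that were really UTF-8.
final class QueryRecode {
    static String latin1ToUtf8(String misDecoded) {
        // Lossless: each char of an ISO-8859-1 decode corresponds to one original byte.
        byte[] raw = misDecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }
}
```

Fixing it in configuration is usually cleaner: calling `request.setCharacterEncoding("UTF-8")` before reading parameters covers form POSTs, and Tomcat's connector `URIEncoding` attribute covers GET query strings.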

Re: indexing performance of little documents

2005-04-01 Karl Øie
This might sound a bit lame, but it has worked for me. I have had the same problem, where the sheer number of small Lucene documents slows down the building of large indexes. Search is pretty fast, and read-only, so in my case I just created three indexes and saved every three Lucene documents into on
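The message is cut off before the distribution scheme is spelled out, so the following is only one plausible reading: documents spread round-robin over three separate indexes, so document i goes to index i mod 3 and each index only receives every third document. The lists stand in for separate Lucene indexes, and `RoundRobin` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: round-robin documents over `shards` separate indexes.
final class RoundRobin {
    static <T> List<List<T>> distribute(List<T> docs, int shards) {
        if (shards <= 0) {
            throw new IllegalArgumentException("shards must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < shards; i++) {
            out.add(new ArrayList<>());
        }
        for (int i = 0; i < docs.size(); i++) {
            out.get(i % shards).add(docs.get(i)); // document i goes to index i mod shards
        }
        return out;
    }
}
```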