Re: Index missing documents

2006-02-20 Thread Michael van Rooyen
Thanks Otis. All the documents were written using the same IndexWriter, without ever closing it. Could this be responsible for the documents not being in the segments file, or is this bad practice? Maybe I should use a writer for a batch of documents (1000 or so maybe?), and then

webserverless search with lucene on offline HTML doc

2006-02-20 Thread paolo berto
Hello, I would like to figure out whether it is possible to write a Java applet that can search with Lucene through HTML documentation WITHOUT having a webserver installed on the system, and on multiple platforms. So I have a set of static offline HTML files forming a software documentation,

RE: webserverless search with lucene on offline HTML doc

2006-02-20 Thread Trieschnigg, R.B. (Dolf)
Hi Paulo, The main problem is that Lucene needs to store its index on a disk which under normal circumstances an applet may not read. The applet operates in a sandbox, which only allows "safe" operations. Reading and writing to disk is not allowed. An applet can only get resources from the host

Re: webserverless search with lucene on offline HTML doc

2006-02-20 Thread Fabio Insaccanebbia
Wouldn't this be a good case for the JarDirectory implementation somebody asked for? The index could then be statically written in a jar file downloaded with the applet (the original mail refers to static offline HTML files). It could even be a great idea for improving the Maven site-plugin :-) [I

Re: webserverless search with lucene on offline HTML doc

2006-02-20 Thread paolo berto
Hey Dolf. On Feb 20, 2006, at 12:11 PM, Trieschnigg, R.B. (Dolf) wrote: Hi Paulo, The main problem is that Lucene needs to store its index on a disk which under normal circumstances an applet may not read. The applet operates in a sandbox, which only allows "safe" operations. Reading a

Problem with TermDocs

2006-02-20 Thread Anton Potehin
ir is an IndexReader and termIdent is a Term: int freq = ir.docFreq(termIdent); if (freq > 1) { TermDocs termDocs = ir.termDocs(termIdent); int[] docsArr = new int[freq]; int[] freqArr = new int[freq]; int number = termDocs.read(docsArr,freqArr); System.out.println(number)

Re: webserverless search with lucene on offline HTML doc

2006-02-20 Thread paolo berto
On Feb 20, 2006, at 12:42 PM, Fabio Insaccanebbia wrote: Wouldn't this be a good case for the JarDirectory implementation somebody asked for? The index could then be statically written in a jar file downloaded with the applet (the original mail refers to static offline HTML files). It could eve

Re: Rebuilding after modifying JSP's

2006-02-20 Thread Erik Hatcher
On Feb 19, 2006, at 7:11 PM, Michael Dodson wrote: Turns out this isn't a Lucene-specific problem. I'm getting the same error no matter what I try to build. I suppose I should be asking the Ant mailing list... Do you have a CLASSPATH set in your environment? If so, remove or empty it a

Lucene CPU Utilization

2006-02-20 Thread Amany Moussa
Hello, I am building a Lucene index with over a million documents retrieved from a database. I am running the application on Unix, and I get 100% CPU utilization the moment the application starts. The application creates a list of small indices in a temp directory, then merges them all in the

Re: Custom Sorting

2006-02-20 Thread SOME ONE
Hi, Yes, my queries are like the first case. And as there have been no other suggestions to do it in a single search operation, will have to do it the way you suggested. This technique will do the job particularly because title's text is always in the body as well. So finally I will have to run tw

Re: Custom Sorting

2006-02-20 Thread Michael D. Curtin
SOME ONE wrote: Hi, Yes, my queries are like the first case. And as there have been no other suggestions to do it in a single search operation, will have to do it the way you suggested. This technique will do the job particularly because title's text is always in the body as well. So finally I

StandardAnalyzer question ...

2006-02-20 Thread Mufaddal Khumri
Hi, When StandardAnalyzer is used to index documents, aren't the terms, amongst other things, lower cased and stored that way in the index? I have an index field that I index like this: ramWriter = new IndexWriter(ramDir, standardAnalyzer, true); ... ... doc.add(Field.Text("categoryN

Re: StandardAnalyzer question ...

2006-02-20 Thread Oskar Berger
Hello, I'm not yet an expert in the field, but as I understand it, the terms are indexed as you specify them (through the filters), while the contents are stored depending on whether you want them stored or not (Field.UnStored(), which happens to be on its way to being deprecated). So maybe you search the

exact match ..

2006-02-20 Thread Mufaddal Khumri
Let's say I do this while indexing: doc.add(Field.Text("categoryNames", categoryNames)); Now while searching categoryNames, I do a search for "digital cameras". I only want to match the exact phrase digital cameras with documents that have exactly the phrase "digital cameras" in the categoryName

span first query and boosting ..

2006-02-20 Thread Mufaddal Khumri
Hi, I do this: SpanFirstQuery fullPhraseInCategoryNamesQuery = new SpanFirstQuery(new SpanTermQuery(new Term("categoryNames", "digital cameras")), 2); fullPhraseInCategoryNamesQuery.setBoost(8); In my log output I get this: spanFirst(categoryNames:digit camera, 2)) Why can't I boost a span q

Re: span first query and boosting ..

2006-02-20 Thread Erik Hatcher
On Feb 20, 2006, at 12:22 PM, Mufaddal Khumri wrote: Hi, I do this: SpanFirstQuery fullPhraseInCategoryNamesQuery = new SpanFirstQuery(new SpanTermQuery(new Term("categoryNames", "digital cameras")), 2); fullPhraseInCategoryNamesQuery.setBoost(8); In my log output I get this: spanFirst(c

Re: exact match ..

2006-02-20 Thread Steven Rowe
Mufaddal Khumri wrote: Let's say I do this while indexing: doc.add(Field.Text("categoryNames", categoryNames)); Now while searching categoryNames, I do a search for "digital cameras". I only want to match the exact phrase digital cameras with documents that have exactly the phrase "digital came

Re: exact match ..

2006-02-20 Thread Mufaddal Khumri
Hi Steve, If I understand you right, I could use something like the Keyword analyzer to tokenize the entire stream as a single token and store that in the index. I could definitely use the keyword analyzer while indexing this particular field "categoryNames". Now my question is on how to search

Re: exact match ..

2006-02-20 Thread Mufaddal Khumri
Hi, Just realized that the various fields I have are part of the same document. But in order to leverage the KeywordAnalyzer, I would now need two sets of documents. One document with the fields: title, content <--- analyzed by custom analyzer Other document with the fields: categoryNam

Re: exact match ..

2006-02-20 Thread Erik Hatcher
On Feb 20, 2006, at 1:02 PM, Mufaddal Khumri wrote: If I understand you right, I could use something like the Keyword analyzer to tokenize the entire stream as a single token and store that in the index. I could definitely use the keyword analyzer while indexing this particular field "categoryN

Re: exact match ..

2006-02-20 Thread Erik Hatcher
On Feb 20, 2006, at 1:22 PM, Mufaddal Khumri wrote: Just realized that the various fields I have are part of the same document. But in order to leverage the KeywordAnalyzer, I would now need two sets of documents. One document with the fields: title, content <--- analyzed by custom a

RE: Speedup indexing process

2006-02-20 Thread Mordo, Aviran (EXP N-NANNATEK)
After indexing is done, you can copy the index files and merge them to one large index. Or you can maintain several small indexes and search across indexes. Aviran http://www.aviransplace.com -Original Message- From: Java Programmer [mailto:[EMAIL PROTECTED] Sent: Friday, February 17, 2

Re: Lucene CPU Utilization

2006-02-20 Thread Otis Gospodnetic
I think I answered that question just the other day privately... No, there is nothing in Lucene to help you with CPU utilization. However, if you are running this on a UNIX box of some kind, you can (re)nice the process and thus lower its priority, giving other processes more time with the CP

Re: Index missing documents

2006-02-20 Thread Otis Gospodnetic
No, using the same IndexWriter is the way to go. If you want things to be written to disk more frequently, lower the maxBufferedDocs setting. Go down to 1, if you want. You'll use less memory (RAM), and Documents will be written to disk without getting buffered in RAM, but the indexing process wi

Re: Lucene CPU Utilization

2006-02-20 Thread Amany Moussa
Thank you so much for your reply. I know that you answered this question before. I just wanted to post the question to receive more feedback and share the information. Thanks again. Amany M. --- Otis Gospodnetic <[EMAIL PROTECTED]> wrote: > I think I answered that question just the other > da

Lucene in multithreaded enviroment

2006-02-20 Thread Klaus
Hi, I'm using Lucene in a web application. Every time a new object is added to the system, the index is updated. Could there be any problems if two objects are created at the same moment? I know Lucene has some locking mechanism. Thx klaus -Original Message- From: Amany Mouss

Re: Lucene CPU Utilization

2006-02-20 Thread jwang
We're going to run into this issue when dealing with some of our larger customers. What we plan on doing is to separate our indexers onto separate CPUs, and then throttle them by using sleep(100) or some other number to be determined in testing. We also plan on doing this over 2 weekends, sin
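
The sleep-based throttling described in this message could look roughly like the sketch below. It is only an illustration in plain Java: ThrottledIndexer, indexAll, batchSize and the sink callback are hypothetical names, not Lucene API, and in a real indexer the sink would presumably call addDocument(...) on an IndexWriter. The pause length is whatever testing suggests; 100 ms is just the figure mentioned above.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper, not part of Lucene: hands items to a sink and
// sleeps after every full batch so other processes get CPU time.
final class ThrottledIndexer {

    /**
     * Passes each item to the sink, pausing for pauseMillis after every
     * batchSize items. Returns the number of pauses taken. If the thread
     * is interrupted during a pause, the interrupt flag is restored and
     * processing stops early.
     */
    static <T> int indexAll(List<T> items, int batchSize, long pauseMillis, Consumer<T> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int pauses = 0;
        int inBatch = 0;
        for (T item : items) {
            sink.accept(item); // assumption: a real indexer would add the document here
            inBatch++;
            if (inBatch == batchSize) {
                try {
                    Thread.sleep(pauseMillis); // the throttle
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return pauses;
                }
                pauses++;
                inBatch = 0;
            }
        }
        return pauses;
    }

    public static void main(String[] args) {
        List<String> docs = List.of("a", "b", "c", "d", "e");
        int pauses = indexAll(docs, 2, 100L, doc -> System.out.println("indexing " + doc));
        System.out.println("pauses: " + pauses);
    }
}
```

Sleeping per batch rather than per document keeps the overhead of the throttle itself small; the right batch size and pause are workload-dependent.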

Re: Lucene in multithreaded enviroment

2006-02-20 Thread Otis Gospodnetic
Hi Klaus, If you use a single instance of IndexWriter, you can call addDocument(...) on it without synchronizing (things are thread safe inside the call). If you are opening/closing IndexWriters yourself, then you have to make sure you have only 1 IndexWriter open at a time. If you have Lucene
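
The "one shared writer, many threads" pattern from this reply can be sketched as follows. This is plain Java with a stand-in class: StubWriter is hypothetical and only mimics a writer whose addDocument(...) is safe to call concurrently; it is not the Lucene IndexWriter, and SharedWriterDemo and addConcurrently are likewise illustrative names.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SharedWriterDemo {

    // Stand-in for a thread-safe writer. NOT the Lucene IndexWriter class.
    static final class StubWriter {
        private final AtomicInteger count = new AtomicInteger();

        void addDocument(String doc) {
            count.incrementAndGet(); // a real writer would index the document
        }

        int docCount() {
            return count.get();
        }
    }

    /** Starts the given number of threads, all adding to ONE shared writer. */
    static int addConcurrently(int threads, int docsPerThread) {
        StubWriter writer = new StubWriter(); // the single shared instance
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            final int id = t;
            pool.execute(() -> {
                for (int i = 0; i < docsPerThread; i++) {
                    // No external synchronization: the writer handles it internally.
                    writer.addDocument("doc-" + id + "-" + i);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return writer.docCount();
    }

    public static void main(String[] args) {
        System.out.println("documents added: " + addConcurrently(4, 250));
    }
}
```

The point of the design is that the application never opens a second writer: every request thread reuses the same instance, which sidesteps the "only 1 IndexWriter open at a time" constraint mentioned above.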