RE: Can i use lucene to search the internet.

2006-03-22 Thread Babu, KameshNarayana (GE, Research, consultant)
Hi, Thanks for the idea. The second option is a nice idea. I don't want to use the Google API. Your second option says to use a crawler to crawl the web pages and use Lucene to search them; that's a nice idea. Is there any other possibility than this? Advice will be appreciated. -Original Mess

Speed up Indexing

2006-03-22 Thread hu andy
Hi, everyone. I have a large amount of XML files, 1 GB in size. I use Lucene (the .NET edition) to index them. There are 8 fields per document: 4 keyword fields and 4 unstored fields. I have set minMergeDocs to 1 and mergeFactor to 100. It took about 2.5 hours (main memory 3 GB, CPU P4). I a

Re: Can i use lucene to search the internet.

2006-03-22 Thread gekkokid
You could use the Google API; that would seem the easiest method. However, I'm not 100% sure what the program is meant to do: search the whole site or just that page? With the Google API, you create a window with inputs for the URL and the keyword and it w

Re: Can i use lucene to search the internet.

2006-03-22 Thread gekkokid
Hi, are you asking whether it has a crawler? No, it doesn't, but Nutch does: http://lucene.apache.org/nutch/ :)   _gk - Original Message - From: Babu, KameshNarayana (GE, Research, consultant) To: java-user@lucene.apache.org

Can i use lucene to search the internet.

2006-03-22 Thread Babu, KameshNarayana (GE, Research, consultant)
Hi all, Can I use Lucene to search the internet? Or do we have any open source applications? Thanks in advance. GE Global Research Kamesh NarayanaBabu John F. Welch Technology Centre Information Technology Management, Plot 122, Export Pro

MMapDirectory on Windows Broken?

2006-03-22 Thread Tom
Hi - I just tried MMapDirectory on Windows, running the app I use to populate my index, and it dies fairly quickly. Does it work for anyone? The same code works fine with FSDirectory on Windows, or on Linux with MMapDirectory. I get: [java] Constructing lucene index in ./test_repos [java] Canon:

RE: Search not working

2006-03-22 Thread Tuan, Frank
Thanks Koji, this "feature" always gets me. It's working now. -Original Message- From: Koji Sekiguchi [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 22, 2006 4:27 PM To: java-user@lucene.apache.org Subject: RE: Search not working Hi Frank, Your assertion fails because addOrUpdat

RE: Search not working

2006-03-22 Thread Koji Sekiguchi
Hi Frank, Your assertion fails because the addOrUpdate() method doesn't work properly. The method is called INDEX_LIMIT times in the populateIndex() for loop, but the index is newly created every time due to the CREATE_NEW flag passed to the IndexWriter constructor: > protected void addOrUpdate(Document doc

Re: Multiple threads in Lucene

2006-03-22 Thread Otis Gospodnetic
Yes, 1 IndexWriter + multiple IndexSearchers definitely work together :) I can't tell what you're doing wrong with the threads... it looks like you might be opening multiple IndexWriters on the same index/directory (big no no). Otis - Original Message From: Nikhil Goel <[EMAIL PROTECTED

RE: java.lang.OutOfMemoryError in lucene

2006-03-22 Thread Koji Sekiguchi
> What else could it be? maybe the ibm jvm? I'm not sure this is the case, but there is an entry about the IBM JDK in the FAQ. Please read: Why can't I use Lucene with IBM JDK 1.3.1? http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-1416be459d0bb822360b058aac3c2ccf8ecc133e regards, Koji ---

Multiple threads in Lucene

2006-03-22 Thread Nikhil Goel
Hi Lucene Developers, According to the Lucene documentation, an IndexWriter can coexist with multiple IndexSearchers and it's thread-safe. To verify that, I wrote a simple program to simulate that condition, but unfortunately I get an exception. Please let me know if anyone has ever tested the Lucene claim th

RE: Repeat Second time: Extract important terms by programming??

2006-03-22 Thread Edgar Meij
That's relatively easy, but not out of the box... Something like: private TreeMap getTFIDF(String index, int DocumentID, String Field ){ try{ IndexReader ir = IndexReader.open(index); TermFreqVector tv = ir.getTermFreqVector(DocumentID, Field); String[] Termstv=tv.getTerms(
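The snippet above is cut off, so the weighting it used is unknown. As a rough, Lucene-free sketch of the idea, the following ranks a document's terms by one common textbook weighting, tf * ln(numDocs / docFreq). The class `TfIdfSketch` and its methods are hypothetical names; in a real index the inputs would presumably come from `TermFreqVector.getTerms()` / `getTermFrequencies()`, `IndexReader.docFreq(Term)` and `IndexReader.numDocs()`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: not part of Lucene and not the code from the post.
class TfIdfSketch {
    // Assumed weighting: term frequency times ln(numDocs / docFreq).
    static double tfIdf(int termFreq, int docFreq, int numDocs) {
        return termFreq * Math.log((double) numDocs / docFreq);
    }

    // Returns the terms ordered from highest to lowest weight.
    static List<String> rank(String[] terms, int[] termFreqs, int[] docFreqs, int numDocs) {
        Map<String, Double> scores = new HashMap<>();
        for (int i = 0; i < terms.length; i++) {
            scores.put(terms[i], tfIdf(termFreqs[i], docFreqs[i], numDocs));
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration only.
        String[] terms = {"the", "lucene", "index"};
        int[] termFreqs = {10, 3, 2};
        int[] docFreqs = {100, 5, 20};
        System.out.println(rank(terms, termFreqs, docFreqs, 100));
    }
}
```

With these made-up counts, "the" occurs in every document, so its weight is zero and it ranks last despite having the highest raw frequency.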

Re: Query for a non-value

2006-03-22 Thread Daniel Noll
Nick Atkins wrote: Hi there, How do I do a query for the value of a field not being equal to something? For example, we all do Query("field:value") but I want to do Query("NOT field:value") to essentially return all the documents that do not have fields with this value? I've tried this but Luc

Re: Lookup Issues

2006-03-22 Thread Doug Cutting
The Hits-based search API is optimized for returning earlier hits. If you want the lowest-scoring matches, then you could reverse-sort the hits, so that these are returned first. Or you could use the TopDocs-based API to retrieve hits up to your "toHits". (Hits-based search is implemented us
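To illustrate why reversing the order helps, here is a small sketch that does not use Lucene at all: it keeps only the n lowest scores from a stream of scores in a bounded heap, so the "last" hits become the first results without holding every hit in sorted order. `LowestHits` and `lowest` are hypothetical names, and this is an assumption about the general technique rather than a description of how Hits or TopDocs work internally.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration: not Lucene's API.
class LowestHits {
    // Returns the n lowest scores in ascending order.
    static List<Float> lowest(float[] scores, int n) {
        // Max-heap: the largest retained score sits on top and is evicted first.
        PriorityQueue<Float> heap = new PriorityQueue<>(Collections.reverseOrder());
        for (float s : scores) {
            heap.add(s);
            if (heap.size() > n) {
                heap.poll(); // drop the current highest score
            }
        }
        List<Float> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        System.out.println(lowest(new float[] {0.9f, 0.1f, 0.5f, 0.3f}, 2));
    }
}
```

The heap never grows beyond n + 1 entries, which is the point when the full result set has a million hits and only 25 are wanted.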

Lookup Issues

2006-03-22 Thread Aigner, Thomas
Howdy all, I am having a performance issue. When I do a search for items, getting more information takes a long time. Ex. If there are 1M hits (I know, why look for that many or even allow it, but let's say we return 1M hits). When the user wants to see the last 25, it takes a LONG time

Search not working

2006-03-22 Thread Tuan, Frank
Hi, I'm new to Lucene, so I thought I'd write a test to understand Lucene. However, I'm running into issues with searching after adding documents into an index. The test fails at assertContains in testAddAndSearch() and testDelete(). I'm sure it's something trivial. Can someone please help?

Re: Query for a non-value

2006-03-22 Thread Otis Gospodnetic
Nick, FAQ entry: http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0cda565d913389773ca9c3246bde894c3e99084e Otis - Original Message From: Nick Atkins <[EMAIL PROTECTED]> To: java-user@lucene.apache.org Sent: Wednesday, March 22, 2006 3:16:47 PM Subject: Query for a non-value Hi the

Re: Errors when searching index and writing to index simultaneously

2006-03-22 Thread Otis Gospodnetic
Venu, Actually, make that 2) sound more like this: 2) Must try to reuse the same IndexSearcher for the same index being searched, unless the index has changed (adds/deletes) AND I want to have the latest "snapshot" of the index for searching (e.g. see the new documents added since IndexSearche

Re: Query question

2006-03-22 Thread Otis Gospodnetic
Hi Thomas, Use "Keyword" (untokenized) field to index your paths. Consider using PerFieldAnalyzerWrapper to specify KeywordAnalyzer for your path field. Use the force, Luke - http://www.getopt.org/luke/ , to ensure your paths are indexed correctly. Otis - Original Message From: WATHEL

Re: FileNotFoundException: Corrupted Index?

2006-03-22 Thread Otis Gospodnetic
Hi Olivier, You have shutdown hooks for read-only operations. They won't corrupt your index. I'd add shutdown hooks for IndexWriter. If that fixes your problem, it would be great if you could add your shutdown hook code to the FAQ on the Wiki, or at least post it to java-user, so somebody els

Query for a non-value

2006-03-22 Thread Nick Atkins
Hi there, How do I do a query for the value of a field not being equal to something? For example, we all do Query("field:value") but I want to do Query("NOT field:value") to essentially return all the documents that do not have fields with this value? I've tried this but Lucene always returns no

Repeat Second time: Extract important terms by programming??

2006-03-22 Thread thanh nguyen
Can anyone help me? --- Begin Message --- Hello, Suppose that I have indexed a document with Lucene. How can I

Re: java.lang.OutOfMemoryError in lucene

2006-03-22 Thread escobar5
I think the problem is not the memory, because I just tried to search in an 11k index that contains only one document, but I still get the same problem. What else could it be? Maybe the IBM JVM? -- View this message in context: http://www.nabble.com/java.lang.OutOfMemoryError-in-lucene-t1324911.ht

Re: java.lang.OutOfMemoryError in lucene

2006-03-22 Thread Olivier Jaquemet
Then you should probably try to increase them to a higher value to see if the problem still occurs. The memory consumption on your production server is probably much higher than what you are used to on your development platform. escobar5 wrote: I forgot to tell, i've already checked that and t

Re: java.lang.OutOfMemoryError in lucene

2006-03-22 Thread escobar5
I forgot to mention, I've already checked that and they are: -Xms = 306m, -Xmx = 320m -- View this message in context: http://www.nabble.com/java.lang.OutOfMemoryError-in-lucene-t1324911.html#a3536145 Sent from the Lucene - Java Users forum at Nabble.com. ---

Re: java.lang.OutOfMemoryError in lucene

2006-03-22 Thread Olivier Jaquemet
You should probably increase the memory allocated to the JVM using Java options such as -Xms128m -Xmx256m (this allocates 128 MB of heap at startup, which can grow to a maximum of 256 MB). escobar5 wrote: Hello, i'm having a problem when searching in lucene, i get a java.lang.OutOfMemoryError: JVMXE00
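One way to confirm which limit the server JVM actually runs with is to print it from inside the application. This is a minimal sketch using only the standard `Runtime` API; the class name `HeapReport` is hypothetical, and the reported figure can differ slightly from the -Xmx value depending on the JVM.

```java
// Hypothetical helper class; Runtime.maxMemory() is standard Java.
class HeapReport {
    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap available to this JVM: " + maxHeapMb() + " MB");
    }
}
```

Running it on both the development PC and the AIX server, for example with `java -Xms128m -Xmx256m HeapReport`, shows whether the options are really reaching the JVM that executes the search.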

java.lang.OutOfMemoryError in lucene

2006-03-22 Thread escobar5
Hello, I'm having a problem when searching in Lucene: I get a java.lang.OutOfMemoryError: JVMXE004:OutOfMemoryError, stAllocArray for executeJava failed. My index is about 17MB. When I run the search on my PC, it works OK, but when I deploy it on the AIX server I get the error. Can you tell m

FileNotFoundException: Corrupted Index?

2006-03-22 Thread Olivier Jaquemet
Hi all, We are using the latest version of Lucene (1.9.1), and sometimes we end up with an error like this when opening one of the indexes our application uses: java.io.FileNotFoundException: [...]/LuceneIndex/_46.fnm (No such file or directory) at java.io.RandomAccessFile.open(Native Method)

Query question

2006-03-22 Thread WATHELET Thomas
I use Lucene 1.9.1. How do I pass a UNC path like \\tom\share\5\tom.doc in a query to search the index's key field? String key="\\tom\share\5\tom.doc "; Ex: Hits hits = multisearch.search(new TermQuery(new Term("key", QueryParser.escape(key)))); I ask this question because this key exist int
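If the string literal was really written as shown, sequences such as `\t` and `\5` are read by the Java compiler as escape sequences (tab, octal), so the literal does not contain the intended path. A minimal sketch of the literal with each backslash doubled follows; `UncPathDemo` is a hypothetical class and only the path comes from the post. Separately, a TermQuery matches the term text as indexed, so running the key through QueryParser.escape may add backslashes that the indexed term does not contain; that part is an assumption about the cause, not something the post confirms.

```java
// Hypothetical demo class; only the path itself comes from the post.
class UncPathDemo {
    // In Java source, every literal backslash must be written as "\\".
    static String uncKey() {
        return "\\\\tom\\share\\5\\tom.doc";
    }

    public static void main(String[] args) {
        String key = uncKey();
        System.out.println(key);          // the path with single backslashes
        System.out.println(key.length()); // number of characters actually stored
    }
}
```

Printing the key before building the query is a quick way to check that the string being searched matches what was indexed, including any trailing space.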

RE: Errors when searching index and writing to index simultaneously

2006-03-22 Thread Satuluri, Venu_Madhav
I've figured out the problem: 1) I must make sure both the searching process and indexing process use the same lock directory (Thanks, Luc!) 2) I must not execute Hits.doc() after closing IndexSearcher. i.e. I must close IndexSearcher *after* I am done with retrieving documents from Hits. Thanks,

RE: Errors when searching index and writing to index simultaneously

2006-03-22 Thread Satuluri, Venu_Madhav
> Make sure both the indexing process and the searcher process use the > same directory to store the Lock files (default your home directory I > believe). I am not sure if earlier they were using the same directory. Now I've made sure they use the same directory; still I get the second type of exception

RE: Lucene 1.9.1 Query

2006-03-22 Thread WATHELET Thomas
Thanks -Original Message- From: Koji Sekiguchi [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 22, 2006 12:56 To: java-user@lucene.apache.org Subject: RE: Lucene 1.9.1 Query Please use an instance method version: QueryParser qp = new QueryParser( "text", new StandardAnalyzer() ); Query que

RE: Lucene 1.9.1 Query

2006-03-22 Thread WATHELET Thomas
Thanks -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 22, 2006 12:55 To: java-user@lucene.apache.org Subject: RE: Lucene 1.9.1 Query You need to create a QueryParser instance and use that instead: QueryParser qp = new QueryParser("text", new Stan

Adaptive fetch schedule

2006-03-22 Thread Raghavendra Prabhu
Hi, Does the inlink value problem solve the OPIC problem that was there? That is, on a recrawl, the page would have a higher score. Does this fix that problem? Rgds Prabhu

RE: Errors when searching index and writing to index simultaneously

2006-03-22 Thread Vanlerberghe, Luc
Make sure both the indexing process and the searcher process use the same directory to store the lock files (by default your home directory, I believe). Luc -Original Message- From: Satuluri, Venu_Madhav [mailto:[EMAIL PROTECTED] Sent: Wednesday, March 22, 2006 14:14 To: java-user@lucene.apache

Errors when searching index and writing to index simultaneously

2006-03-22 Thread Satuluri, Venu_Madhav
Hi, If I run IndexSearcher.search() at the same time an IndexWriter is adding a document to the index, I get the following kind of exception frequently: java.io.FileNotFoundException: /_3j.fnm (No such file or directory) at java.io.RandomAccessFile.open(Native Method) at java.io.Ra

RE: Lucene 1.9.1 Query

2006-03-22 Thread Koji Sekiguchi
Please use an instance method version: QueryParser qp = new QueryParser( "text", new StandardAnalyzer() ); Query query = qp.parse( searchvalue ); regards, Koji > -Original Message- > From: WATHELET Thomas [mailto:[EMAIL PROTECTED] > Sent: Wednesday, March 22, 2006 8:25 PM > To: java-use

RE: Lucene 1.9.1 Query

2006-03-22 Thread Tim.Wright
You need to create a QueryParser instance and use that instead: QueryParser qp = new QueryParser("text", new StandardAnalyzer()); Query query = qp.parse(this.searchvalue); Cheers, Tim. -Original Message- From: WATHELET Thomas [mailto:[EMAIL PROTECTED] Sent: 22 March 2006 11:25 To: java

Lucene 1.9.1 Query

2006-03-22 Thread WATHELET Thomas
How do I replace this expression in Lucene 1.9.1, since the static parse method is deprecated? Query query = QueryParser.parse(this.searchvalue, "text", new StandardAnalyzer());

Which field has a hit?

2006-03-22 Thread Frank Kunemann
Hi again, is there a way to retrieve the fields of a document that have a hit? My problem is that in my case a Lucene document consists of many different files that belong together. Each of the files has its own content field, but I don't store the content, to keep the index as small as possible. The

Re: Term Vector Question

2006-03-22 Thread Daniel Cortes
OK, thanks. I reread my question and it's no surprise I didn't get any reply :D. Excuse me. My index contains fields like CONTAIN, GROUPID, USERID, TOOL. My question is: how can I obtain a list of the terms contained in a group of results? For example, I want all the terms in field CONTAIN

Re: lucene highlighter

2006-03-22 Thread Raghavendra Prabhu
Hi Mark, Currently both of the terms have the same score (weightage). As you mentioned, I would want it to be decreased, so that during the next run for selecting the second fragment, term1 has less weightage and term2, which has not been selected, has more weightage. Thanks, Rgds Prabhu On 3/22/06, mark har

Re: lucene highlighter

2006-03-22 Thread mark harwood
>>How can I adjust the Lucene highlighter to make sure >> that at least each term is displayed in the query result First, some basic things to sanity-check: * A classic problem: are you using compatible analyzers for tokenizing the query and the document content (both index time and highlight tim