Re: Java Heap Space -Out Of Memory Error

2007-09-05 Thread Sebastin
Hi testn, here are my index details: Index fields: 5 fields. Stored fields: 10 fields. Index code: contents=new StringBuilder().append(compCallingPartyNumber).append(

Re: unable to search from a string containing numbers separated by comma.

2007-09-05 Thread Kai Hu
Hi, I have this problem too. I ran a test using StandardAnalyzer: Analyzer analyzer = new StandardAnalyzer(); Reader reader = new BufferedReader(new InputStreamReader(new StringBufferInputStream("Aaa,0982,abc"))); TokenStream tokenStream =

Re: Java Heap Space -Out Of Memory Error

2007-09-05 Thread testn
A couple of things to check: 1. When you open the IndexWriter, which analyzer do you use? StandardAnalyzer? 2. How many records are there? 3. Could you also check the number of terms in your indices? If there are too many terms, you could consider chopping something into smaller pieces, for example... store

Re: Look for strange encodings -- tokenization

2007-09-05 Thread Steven Rowe
poeta simbolista wrote: I'd want to know the best way to look for strange encodings in a Lucene index. I have several inputs that may have been encoded in different character sets. I do not always know whether my guess about the encoding was right. Hence, I'd thought of querying the index for some

Re: Extract terms not by reader, but by documents

2007-09-05 Thread Rafael Rossini
Thanks for the reply Grant, let me try to explain exactly what I'd like to do. Take the 2 docs: Doc1: Microsoft is a nice software company, and Xbox seems to be a nice product too. Doc2: Nintendo and Sony have been in the game industry for a long time, but now, Microsoft is trying to enter with

How to boost a document based on a field in the document

2007-09-05 Thread Adam Ruggles
I'm trying to find a query that would boost a document based on a field in the document. I have a simple index with title, description, date, ... I also have a field called vote. I want items that have been voted higher to be ranked as higher in the search results. Is there a query and or

Re: Java Heap Space -Out Of Memory Error

2007-09-05 Thread Sebastin
I use StandardAnalyzer. The daily record count ranges from 5 crore to 6 crore (50 to 60 million). I update my index every second. I instantiate the IndexSearcher object once for all the searches. Once an hour I can see the updated records in the index store by reinstantiating the IndexSearcher object, but the problem

Re: Look for strange encodings -- tokenization

2007-09-05 Thread poeta simbolista
Thank you Steven, I have problems running those searches. I think it is because the StandardAnalyzer treats those badly encoded characters as separators, and hence does not create such tokens when reading... Regarding the other idea you provided, did you mean then, that if a document

Re: How to boost a document based on a field in the document

2007-09-05 Thread Erick Erickson
What would happen if you sorted by vote? Perhaps within ranges of scores? There's a thread on the list, in response to a post I made about buckets, that might be relevant. Otherwise, you might think about boosting the relevant parts of the document at *index* time based on the value of vote

Re: How to boost a document based on a field in the document

2007-09-05 Thread Adam Ruggles
Well a sort would remove the relevance portion of the query, which I really don't want to do. I tried using the ValueSourceQuery but it doesn't seem to be able to handle negative vote values. Buckets sound interesting but since there is no max voting value it would be difficult to build the

Re: How to boost a document based on a field in the document

2007-09-05 Thread Erick Erickson
I think you misunderstand. The buckets are NOT the votes, they are the relevance scores from the search. So your search returns relevance scores (raw) from, say 1 - 100. You could collect the results in 5 buckets and sort by vote *within* the bucket. So the user still sees the most relevant
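A minimal sketch of the bucketing idea described above, in plain Java with no Lucene calls. The `Hit` class, the `rank` method, and the choice of equal-width buckets over the observed score range are all assumptions made for illustration; they are not from the thread or from the Lucene API. Hits are grouped into relevance buckets first, then ordered by vote (highest first) inside each bucket, so negative votes need no special handling.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class BucketedVoteSort {

    /** Hypothetical holder for one search hit: raw relevance score plus the stored vote. */
    public static final class Hit {
        public final String id;
        public final float score;
        public final int vote;

        public Hit(String id, float score, int vote) {
            this.id = id;
            this.score = score;
            this.vote = vote;
        }
    }

    /** Maps a score to a bucket index; bucket 0 holds the most relevant hits. */
    static int bucketOf(float score, float minScore, float maxScore, int bucketCount) {
        if (maxScore <= minScore) {
            return 0; // all scores equal: a single bucket
        }
        // 0.0 for the top score, 1.0 for the lowest score
        float fraction = (maxScore - score) / (maxScore - minScore);
        int bucket = (int) (fraction * bucketCount);
        return Math.min(bucket, bucketCount - 1); // clamp the lowest score into the last bucket
    }

    /** Returns a new list ordered by relevance bucket, then by vote descending within a bucket. */
    public static List<Hit> rank(List<Hit> hits, int bucketCount) {
        List<Hit> sorted = new ArrayList<>(hits);
        if (sorted.isEmpty()) {
            return sorted;
        }
        float min = Float.MAX_VALUE;
        float max = -Float.MAX_VALUE;
        for (Hit h : sorted) {
            min = Math.min(min, h.score);
            max = Math.max(max, h.score);
        }
        final float lo = min;
        final float hi = max;
        sorted.sort(Comparator
                .comparingInt((Hit h) -> bucketOf(h.score, lo, hi, bucketCount))
                .thenComparing(Comparator.comparingInt((Hit h) -> h.vote).reversed()));
        return sorted;
    }
}
```

With 5 buckets, two hits scoring 95 and 90 would share the top bucket and be ordered by vote, while a hit scoring 20 stays below both regardless of its vote.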

Re: How to boost a document based on a field in the document

2007-09-05 Thread Adam Ruggles
Doh... Thanks. Erick Erickson wrote: I think you misunderstand. The buckets are NOT the votes, they are the relevance scores from the search. So your search returns relevance scores (raw) from, say 1 - 100. You could collect the results in 5 buckets and sort by vote *within* the bucket.

Re: Extract terms not by reader, but by documents

2007-09-05 Thread Karl Wettin
Rafael, are you looking for IndexReader.getTermFreqVector? -- karl On 5 Sep 2007, at 16:48, Rafael Rossini wrote: Thanks for the reply Grant, let me try to explain exactly what I'd like to do. Take the 2 docs: Doc1: Microsoft is a nice software company, and Xbox seems to be a nice product

Re: Extract terms not by reader, but by documents

2007-09-05 Thread Grant Ingersoll
On Sep 5, 2007, at 10:48 AM, Rafael Rossini wrote: Thanks for the reply Grant, let me try to explain exactly what I'd like to do. Take the 2 docs: Doc1: Microsoft is a nice software company, and Xbox seems to be a nice product too. Doc2: Nintendo and Sony have been in the game industry

Re: Java Heap Space -Out Of Memory Error

2007-09-05 Thread Chris Hostetter
: I use StandardAnalyzer.the records daily ranges from 5 crore to 6 crore. for : every second i am updating my Index. i instantiate IndexSearcher object one : time for all the searches. for an hour can i see the updated records in the : indexstore by reinstantiating IndexSearcher object.but the

Re: Java Heap Space -Out Of Memory Error

2007-09-05 Thread Sebastin
I set the IndexSearcher as the application object after the first search. Here is my code: if(searcherOne.isOpen()==(true)){ Directory compressDir2 =

Re: Java Heap Space -Out Of Memory Error

2007-09-05 Thread Chris Hostetter
: I set IndexSearcher as the application Object after the first search. ... : how can i reconstruct the new IndexSearcher for every hour to see the : updated records . i'm confused ... my understanding based on the comments you made below (in an earlier message) was that you already