Re: Store the documents content in the index

2011-07-18 Thread Andrew Kane
Some file systems might be slow if too many files are in one folder, try splitting them into subfolders... Andrew. On Sun, Jul 17, 2011 at 8:40 AM, starz10de wrote: > HI, > > Currently my text source files (800 000) are stored in folder which make > retrieving it by many users some how slow. I
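
A minimal sketch of the subfolder idea, assuming the files can be addressed by name. The class and method names (`ShardedPaths`, `shardedPath`) are hypothetical and not part of Lucene; the two-level, 256-bucket layout is an illustrative choice, not something the thread specifies.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Lucene class): spreads files over nested
// subfolders so that no single directory holds all of them.
final class ShardedPaths {
    private ShardedPaths() {}

    /** Maps a file name to root/xx/yy/fileName, where xx and yy are taken from the name's hash. */
    static Path shardedPath(Path root, String fileName) {
        int hash = fileName.hashCode();
        // Two levels of 256 buckets each give 65,536 leaf folders.
        String level1 = String.format("%02x", (hash >>> 8) & 0xff);
        String level2 = String.format("%02x", hash & 0xff);
        return root.resolve(level1).resolve(level2).resolve(fileName);
    }
}
```

With 800 000 files this layout leaves roughly a dozen files per leaf folder, provided the hash spreads the names evenly. The same function is used for writing and for lookup, so no separate directory listing is needed to find a file.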

Re: Store the documents content in the index

2011-07-18 Thread Erick Erickson
It's certainly possible as others have said, but don't be surprised if it's not performant. At root, you still have a disk out there that's being used for fetching the data. Simply moving it from fetching individual files to fetching that data from the index doesn't change that fundamental fact. B

Re: Store the documents content in the index

2011-07-18 Thread starz10de
Thanks for your reply.

RE: Store the documents content in the index

2011-07-18 Thread starz10de
Thanks for your reply.

RE: Store the documents content in the index

2011-07-18 Thread Jagdish Vasani
Yes, you can store the text file content by passing Field.Store.YES as the field's store option. At the same time, you can also index it by passing Field.Index.ANALYZED as another parameter of the Field class constructor. Thanks, Jagdish -Original Message- From: starz10de [mailto:farag_ah...@yahoo.com] Sent: Sunday,

RE: teragram to Lucene

2011-07-18 Thread Jagdish Vasani
See the surround query in lucene/contrib. It supports proximity search; the query syntax is different, but you can customize it. By customizing the JavaCC grammar file "QueryParser.jj" you can get the behavior you want. Thanks, Jagdish -Original Message- From: Walt [mailto:junk...@comcast.net] Sent: Friday

Re: highlighting

2011-07-18 Thread Sabeer Hussain
I am using Lucene 4.0 and trying to use its highlighting feature. I am not getting the desired result due to some mistake that I am not able to identify. My source code looks like String sourceText = "liver disease kidney transplant"; String termString ="\"transplant\"";

[Announce] Solr 3.3 with RankingAlgorithm NRT capability, very high performance 10000 tps

2011-07-18 Thread Nagendra Nagarajayya
Hi! I would like to announce the availability of Solr 3.3 with RankingAlgorithm and Near Real Time (NRT) search capability now. The NRT performance is very high, 10,000 documents/sec with the MBArtists 390k index. The NRT functionality allows you to add documents without the IndexSearchers be

RE: Advanced NearSpanQuery

2011-07-18 Thread Jeroen Lauwers
For your information: After a closer inspection, I found a couple of errors in my code. I've fixed most of them so if anyone is interested, just let me know. Jeroen -Original Message- From: Jeroen Lauwers [mailto:jeroen.lauw...@ctlo.net] Sent: Friday 15 July 2011 17:08 To: java-user@lu

Re: HighFreqTerms for results set

2011-07-18 Thread Mihai Caraman
Faceted search is for single-term fields, right? Isn't it bad practice to apply it to each word in each field in the result set (if it's even possible)? Again, I want to find the most frequent word in a result set. Words are in fields that contain phrases, not in their own field. 2011/7/18

Re: HighFreqTerms for results set

2011-07-18 Thread Manish Bafna
Use Facet by that field. It will bring up top words. On Mon, Jul 18, 2011 at 6:03 PM, Mihai Caraman wrote: > So I looked around and found no viable solution for this problem: > How to extract the most frequent terms in the search result set after > submitting the query. > > HighFreqTerms >

HighFreqTerms for results set

2011-07-18 Thread Mihai Caraman
So I looked around and found no viable solution for this problem: How to extract the most frequent terms in the search result set after submitting the query. HighFreqTerms and docFreq
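
One way to picture the problem is to count terms over the text of the matching documents only, rather than over the whole index. The sketch below is a plain-Java stand-in under stated assumptions: it takes the matched field values as strings (for example, read from stored fields) and splits on non-alphanumeric characters instead of running a Lucene analyzer. `ResultTermCounter` and `topTerms` are hypothetical names, and the thread does not confirm that this is the approach the poster ended up using.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not a Lucene class): counts words across the text
// of the documents in a result set and returns the most frequent ones.
final class ResultTermCounter {
    private ResultTermCounter() {}

    static List<Map.Entry<String, Integer>> topTerms(List<String> resultTexts, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : resultTexts) {
            // Simplified tokenization: lower-case, split on anything that is not a letter or digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the output is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

The cost grows with the size of the result set and the length of each document, so for large result sets it may be worth counting over only the top hits.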

RE: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior

2011-07-18 Thread Uwe Schindler
Hi Thomas, Just one question: Are these docIds from Lucene or your own ones? And second, are the underlying indexes also built with the corresponding Lucene versions? The reason behind: Nothing in Lucene guarantees the order of docIds for same scores, they can be arbitrary. One change in Lucene
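
If the order of equally scored hits matters to an application, one option is to impose it explicitly instead of relying on internal docIds. The sketch below assumes each hit carries an application-level key (for example, a stored ID field); `Hit` and `StableRanking` are hypothetical types for illustration, not Lucene classes, and the sort runs on hits already collected from a search.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical value type: a hit with an application-level key and its score.
final class Hit {
    final String key;
    final float score;

    Hit(String key, float score) {
        this.key = key;
        this.score = score;
    }
}

// Hypothetical helper: orders hits by score, then by key, so that ties
// come out the same way regardless of the order in which they were collected.
final class StableRanking {
    private StableRanking() {}

    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score); // higher score first
            return byScore != 0 ? byScore : a.key.compareTo(b.key); // tie-break on key
        });
        return sorted;
    }
}
```

With this in place, two index versions that return the same documents with identical scores produce the same final order, because the tie-break no longer depends on docIds.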

Re: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior

2011-07-18 Thread Thomas Rewig
Hi Ian, yes the score is identical but the inner ordering of same scores seems to be different in the versions. In Lucene 3.3.0 it seems that terms with special characters will be preferred before the exact hit. My code is: PhraseQuery query = new PhraseQuery(); query.add(new

Re: Store the documents content in the index

2011-07-18 Thread Ian Lea
Of course. See the javadocs for Field, Field.Store and Field.Index. -- Ian. On Sun, Jul 17, 2011 at 1:40 PM, starz10de wrote: > HI, > > Currently my text source files (800 000) are stored in folder which make > retrieving it by many users some how slow. I heard it might be possible that > the

Re: TermQuery - ExactMatching, Lucene 3.1.0 vs. 3.3.0, special character behavior

2011-07-18 Thread Ian Lea
I'm not sure what you are getting at. A search using 3.1.0 and 3.3.0 returns the same docs with identical scores, except that one gives them in order A,B and the other in order B,A? What search method are you using? Does it guarantee anything about the order of returning docs with identical scor