Re: 400 MB Fields

2011-06-08 Thread Alexander Kanarsky
Otis, not sure about Solr, but with Lucene it was certainly doable. I saw fields way bigger than 400 MB indexed, sometimes having a large set of unique terms as well (think something like a log file with lots of alphanumeric tokens, a couple of gigs in size). While indexing and querying of such thi

RE: 400 MB Fields

2011-06-07 Thread Burton-West, Tom
blogs From: Otis Gospodnetic [otis_gospodne...@yahoo.com] Sent: Tuesday, June 07, 2011 6:59 PM To: solr-user@lucene.apache.org Subject: 400 MB Fields Hello, What are the biggest document fields that you've ever indexed in Solr or that you've heard of? Ah, it must be Tom's Hathi t

Re: 400 MB Fields

2011-06-07 Thread Lance Norskog
" wrote: > >>Hi, >> >> >>> I think the question is strange... May be you are wondering about >>>possible >>> OOM exceptions? >> >>No, that's an easier one. I was more wondering whether with 400 MB Fields >>(indexed, not store

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
> >> I think the question is strange... May be you are wondering about >>possible >> OOM exceptions? > >No, that's an easier one. I was more wondering whether with 400 MB Fields >(indexed, not stored) it becomes incredibly slow to: >* analyze >* commit / write

Re: 400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hi, > I think the question is strange... May be you are wondering about possible > OOM exceptions? No, that's an easier one. I was more wondering whether with 400 MB Fields (indexed, not stored) it becomes incredibly slow to: * analyze * commit / write to disk * search > I thi

Re: 400 MB Fields

2011-06-07 Thread Fuad Efendi
I think the question is strange... Maybe you are wondering about possible OOM exceptions? I think we can pass to Lucene a single document containing a comma-separated list of "term, term, ..." (a few billion times)... Except "stored" and "TermVectorComponent"... I believe thousands of companies already in
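
A toy illustration of the point about a repeated "term, term, ..." list, written in Python rather than against Lucene itself. It is only a sketch: the helper name toy_postings and the sample sizes are made up for illustration, and a Counter is not how Lucene stores postings. It shows that a field made of one repeated term yields a single dictionary entry however long the field is, while a field of mostly unique tokens (as in a large log file) yields one entry per token.

```python
from collections import Counter


def toy_postings(field_text):
    """Split a comma-separated field and count occurrences per unique term.

    Hypothetical helper for illustration only; it is not a Lucene API.
    """
    tokens = (t.strip() for t in field_text.split(","))
    return Counter(t for t in tokens if t)


# One term repeated many times: a long field, but a one-entry term dictionary.
repeated = ", ".join(["term"] * 100_000)

# Every token unique, as in a log file full of alphanumeric identifiers.
varied = ", ".join(f"tok{i}" for i in range(100_000))

repeated_postings = toy_postings(repeated)
varied_postings = toy_postings(varied)

print(len(repeated_postings), len(varied_postings))
```

Both fields hold the same number of tokens, so the difference in dictionary size comes entirely from how many distinct terms they contain, not from the raw size of the field.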

Re: 400 MB Fields

2011-06-07 Thread Erick Erickson
From older (2.4) Lucene days, I once indexed the 23 volume "Encyclopedia of Michigan Civil War Volunteers" in a single document/field, so it's probably within the realm of possibility at least ... Erick On Tue, Jun 7, 2011 at 6:59 PM, Otis Gospodnetic wrote: > Hello, > > What are the biggest do

400 MB Fields

2011-06-07 Thread Otis Gospodnetic
Hello, What are the biggest document fields that you've ever indexed in Solr or that you've heard of? Ah, it must be Tom's Hathi trust. :) I'm asking because I just heard of a case of an index where some documents have a field that can be around 400 MB in size! I'm curious if anyone has an