Re: Large Data Set Suggestions

2008-11-08 Thread Noble Paul നോബിള്‍ नोब्ळ्
...where ignoreerrors=break means that an error in Inner #2 would prevent Inner #3. Lance -----Original Message----- From: Noble Paul നോബിള്‍ नोब्ळ् Sent: Thursday, November 06, 2008 8:39 PM To: solr-user@lucene.apache.org Subject: Re: Large Data Set Suggestions Hi...
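
A minimal Java sketch of the break semantics Lance describes, modelled outside DIH. The ErrorPolicy enum, the InnerEntityRunner class and the BREAK/CONTINUE names are hypothetical; they are not DIH configuration attributes, and each inner entity is stood in for by a named Runnable.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical policy names; they only model the semantics discussed here.
enum ErrorPolicy { BREAK, CONTINUE }

// Models sibling inner entities that run in order for one outer row.
// This is not DIH code: each "entity" is just a named Runnable.
final class InnerEntityRunner {
    private InnerEntityRunner() {}

    /** Runs the entities in map iteration order and returns the names that completed. */
    static List<String> run(Map<String, Runnable> entities, ErrorPolicy policy) {
        List<String> completed = new ArrayList<>();
        for (Map.Entry<String, Runnable> entity : entities.entrySet()) {
            try {
                entity.getValue().run();
                completed.add(entity.getKey());
            } catch (RuntimeException e) {
                if (policy == ErrorPolicy.BREAK) {
                    break; // a failure stops every later sibling
                }
                // CONTINUE: skip the failed entity and go on to the next one
            }
        }
        return completed;
    }
}
```

With a LinkedHashMap (to keep the order) holding Inner #1, Inner #2 (which throws) and Inner #3, BREAK returns [Inner #1], while CONTINUE returns [Inner #1, Inner #3].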

Re: Large Data Set Suggestions

2008-11-07 Thread Noble Paul നോബിള്‍ नोब्ळ्
...നോബിള്‍ नोब्ळ् Sent: Thu 11/6/2008 11:38 PM To: solr-user@lucene.apache.org Subject: Re: Large Data Set Suggestions Hi Lance, This is one area we left open in DIH. What is the best way to handle this? On error, should it give up or continue with the next one? --Noble Paul

RE: Large Data Set Suggestions

2008-11-07 Thread Steven Anderson
Ideally, it would be a configuration option. Also, it would be great to have a hook to log or process an exception. Steve -----Original Message----- From: Noble Paul നോബിള്‍ नोब्ळ् Sent: Thu 11/6/2008 11:38 PM To: solr-user@lucene.apache.org Subject: Re: Large Data...
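
Steve's two requests (a configuration option and an exception hook) could look roughly like the following sketch. The option values "abort" and "continue", the ConfigurableImporter class and the BiConsumer-based hook are assumptions made for illustration, not an existing DIH or SolrJ API.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.BiConsumer;
import java.util.function.Consumer;

// Hypothetical importer: the error behaviour comes from a configuration
// string, and every failure is also passed to a pluggable hook.
final class ConfigurableImporter {
    enum OnError {
        ABORT, CONTINUE;

        // Parses a config value such as "abort" or "continue" (case-insensitive).
        static OnError parse(String value) {
            return valueOf(value.trim().toUpperCase(Locale.ROOT));
        }
    }

    private final OnError onError;
    private final BiConsumer<String, Exception> errorHook;

    ConfigurableImporter(String onErrorOption, BiConsumer<String, Exception> errorHook) {
        this.onError = OnError.parse(onErrorOption);
        this.errorHook = errorHook;
    }

    /** Feeds each record to the indexer and returns how many succeeded. */
    int importAll(List<String> records, Consumer<String> indexer) {
        int indexed = 0;
        for (String record : records) {
            try {
                indexer.accept(record);
                indexed++;
            } catch (RuntimeException ex) {
                errorHook.accept(record, ex); // report first (log, count, dead-letter...)
                if (onError == OnError.ABORT) {
                    throw new IllegalStateException("import aborted at record " + record, ex);
                }
                // CONTINUE: move on to the next record
            }
        }
        return indexed;
    }
}
```

Keeping the hook separate from the policy means logging happens the same way whether the import gives up or continues.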

RE: Large Data Set Suggestions

2008-11-07 Thread Lance Norskog
...Sent: Thursday, November 06, 2008 8:39 PM To: solr-user@lucene.apache.org Subject: Re: Large Data Set Suggestions Hi Lance, This is one area we left open in DIH. What is the best way to handle this? On error, should it give up or continue with the next one? On Fri, Nov 7...

RE: Large Data Set Suggestions

2008-11-06 Thread Steven Anderson
DIH is likely to be faster than SolrJ because it does not have the overhead of an HTTP request. Understood. However, we may not have the option of co-locating the data to be ingested with the Solr server. What is your data source? I am assuming it is XML. Yes.
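
On the SolrJ side, one common way to reduce per-request HTTP overhead is to send documents in batches rather than one per request. The sketch below is generic Java with a hypothetical Batcher class; with SolrJ, the sink would hand each batch to the client's add call that accepts a collection of documents (check the exact method in the SolrJ version in use).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: groups a stream of documents into fixed-size batches
// so that each request to the server carries many documents instead of one.
final class Batcher {
    private Batcher() {}

    /** Sends items to the sink in batches of at most batchSize; returns the number of batches. */
    static <T> int inBatches(Iterator<T> items, int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        int batches = 0;
        List<T> batch = new ArrayList<>(batchSize);
        while (items.hasNext()) {
            batch.add(items.next());
            if (batch.size() == batchSize) {
                sink.accept(batch);
                batches++;
                batch = new ArrayList<>(batchSize); // fresh list, the sink may keep the old one
            }
        }
        if (!batch.isEmpty()) {
            sink.accept(batch); // flush the final partial batch
            batches++;
        }
        return batches;
    }
}
```

At an example batch size of 1000, 10M documents become 10,000 requests instead of 10 million.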

RE: Large Data Set Suggestions

2008-11-06 Thread Steven Anderson
In that case, you may put the file in a mounted NFS directory, or you can serve it with an Apache server. That's one option, although someone else on the list mentioned that performance was 10x slower in their NFS experience. Another option is to serve up the files via Apache and pull them...
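
If the files are served by Apache, the indexing host can pull each one to local disk before parsing, so the parser never reads its input over NFS. A minimal sketch, assuming plain HTTP access to the files; DataFetcher is a hypothetical helper. URL.openStream() also accepts file: URIs, which makes it easy to try locally.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: copies a remote data file (for example one served by
// Apache httpd) to local disk before indexing.
final class DataFetcher {
    private DataFetcher() {}

    /** Streams the resource at source into target and returns the number of bytes copied. */
    static long fetch(URI source, Path target) {
        try (InputStream in = source.toURL().openStream()) {
            return Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException("could not fetch " + source, e);
        }
    }
}
```

For example, DataFetcher.fetch(URI.create("http://data-host/docs/part-001.xml"), Path.of("/tmp/part-001.xml")), where the host and paths are placeholders.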

Re: Large Data Set Suggestions

2008-11-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Thu, Nov 6, 2008 at 7:04 PM, Steven Anderson wrote: DIH is likely to be faster than SolrJ because it does not have the overhead of an HTTP request. Understood. However, we may not have the option of co-locating the data to be ingested with the Solr...

Re: Large Data Set Suggestions

2008-11-06 Thread Walter Underwood
100X, not 10X, and that was with the index on NFS. Reading the input data from NFS would be slower than reading it locally, but probably not 10X slower. --wunder On 11/6/08 5:56 AM, Steven Anderson wrote: That's one option, although someone else on the list mentioned that performance was 10x slower in...
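
Since slowdown figures like these depend heavily on the setup, it may be worth measuring input read throughput directly. A rough sketch with a hypothetical ReadThroughput helper: run it on a local copy and on the NFS copy of the same file and compare. It measures input reads only, not the index-on-NFS case. Repeated runs may be served from the OS page cache, so a file larger than RAM gives more honest numbers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: rough sequential-read timing for a single file.
final class ReadThroughput {
    private ReadThroughput() {}

    static final class Result {
        final long bytes;
        final long nanos;

        Result(long bytes, long nanos) {
            this.bytes = bytes;
            this.nanos = nanos;
        }

        /** Throughput in MiB per second. */
        double mibPerSecond() {
            return nanos == 0 ? Double.POSITIVE_INFINITY : (bytes / 1048576.0) / (nanos / 1e9);
        }
    }

    /** Reads the whole file in 1 MiB chunks and reports bytes read and elapsed time. */
    static Result measure(Path file) {
        byte[] buffer = new byte[1 << 20];
        long bytes = 0;
        long start = System.nanoTime();
        try (InputStream in = Files.newInputStream(file)) {
            int n;
            while ((n = in.read(buffer)) != -1) {
                bytes += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new Result(bytes, System.nanoTime() - start);
    }
}
```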

RE: Large Data Set Suggestions

2008-11-06 Thread Lance Norskog
...on an error. Lance -----Original Message----- From: Steven Anderson Sent: Thursday, November 06, 2008 5:57 AM To: solr-user@lucene.apache.org Subject: RE: Large Data Set Suggestions In that case, you may put the file in a mounted NFS directory or you can serve it out...

Re: Large Data Set Suggestions

2008-11-06 Thread Noble Paul നോബിള്‍ नोब्ळ्

Large Data Set Suggestions

2008-11-05 Thread Steven Anderson
Greetings! I've been asked to do some indexing performance testing on Solr 1.3 using large XML document data sets (10M-60M docs) with DIH versus SolrJ. Does anyone have suggestions on where I might find a good data set of this size? I saw the Wikipedia dump reference in the DIH wiki, but...
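
If no public dump of the right size turns up, one fallback is to generate synthetic documents in Solr's XML update format (add, doc and field elements). A minimal sketch using the JDK's StAX writer; SyntheticDocs and the field names id and text are hypothetical and must match the schema under test. Synthetic text will not have the term statistics of a real corpus, so timings may differ from real data.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical generator: streams `count` documents as
// <add><doc><field name="id">...</field>...</doc></add> without holding
// them in memory. Field names "id" and "text" are assumptions.
final class SyntheticDocs {
    private SyntheticDocs() {}

    static void write(Path target, long count) {
        try (OutputStream out = Files.newOutputStream(target)) {
            XMLStreamWriter w = XMLOutputFactory.newInstance().createXMLStreamWriter(out, "UTF-8");
            w.writeStartDocument("UTF-8", "1.0");
            w.writeStartElement("add");
            for (long i = 0; i < count; i++) {
                w.writeStartElement("doc");
                field(w, "id", "doc-" + i);
                field(w, "text", "synthetic body for document " + i);
                w.writeEndElement(); // </doc>
            }
            w.writeEndElement(); // </add>
            w.writeEndDocument();
            w.flush(); // push buffered output to the stream
            w.close(); // does not close `out`; try-with-resources does that
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void field(XMLStreamWriter w, String name, String value) throws XMLStreamException {
        w.writeStartElement("field");
        w.writeAttribute("name", name);
        w.writeCharacters(value);
        w.writeEndElement();
    }
}
```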

Re: Large Data Set Suggestions

2008-11-05 Thread Fergus McMenemie

Re: Large Data Set Suggestions

2008-11-05 Thread souravm

Re: Large Data Set Suggestions

2008-11-05 Thread Noble Paul നോബിള്‍ नोब्ळ्
DIH is likely to be faster than SolrJ because it does not have the overhead of an HTTP request. What is your data source? I am assuming it is XML. SolrJ cannot directly index XML; you may need to read the docs from the XML before SolrJ can index them. --Noble On Wed, Nov 5, 2008...
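
Reading the documents out of the XML before handing them to SolrJ can be done with a streaming parser, which avoids loading the whole file into memory at 10M-60M documents. A minimal sketch with the JDK's StAX reader, assuming the input uses Solr's add/doc/field layout (the actual format of the data was not stated). XmlDocReader is a hypothetical helper; each map it produces would then be copied into a SolrInputDocument, which is omitted to keep the example free of SolrJ dependencies.

```java
import java.io.InputStream;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical reader: streams <doc> elements (assumed Solr update layout)
// and hands each one to the sink as a field-name -> values map.
// The caller owns and closes `in`.
final class XmlDocReader {
    private XmlDocReader() {}

    static long read(InputStream in, Consumer<Map<String, List<String>>> sink) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance().createXMLStreamReader(in);
            long docs = 0;
            Map<String, List<String>> doc = null;
            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    if ("doc".equals(r.getLocalName())) {
                        doc = new LinkedHashMap<>();
                    } else if ("field".equals(r.getLocalName()) && doc != null) {
                        String name = r.getAttributeValue(null, "name");
                        String value = r.getElementText(); // consumes up to </field>
                        doc.computeIfAbsent(name, k -> new ArrayList<>()).add(value); // multi-valued fields
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT && "doc".equals(r.getLocalName())) {
                    sink.accept(doc);
                    docs++;
                    doc = null;
                }
            }
            r.close();
            return docs;
        } catch (XMLStreamException e) {
            throw new IllegalStateException("malformed XML input", e);
        }
    }
}
```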