In my DIH tests I ran a nested loop where the outer loop read an RSS feed that 
gave a list of feeds, and the inner loop walked each of those feeds. Some of 
the feeds were bogus, and the whole DIH import failed as soon as it hit one.
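
Roughly, the data-config.xml had this shape (the entity names, URLs, and 
xpaths below are placeholders, not my real config):

  <dataConfig>
    <dataSource type="HttpDataSource" />
    <document>
      <!-- Outer: an RSS feed whose items each point at another feed -->
      <entity name="feedList"
              processor="XPathEntityProcessor"
              url="http://example.com/feeds.rss"
              forEach="/rss/channel/item">
        <field column="feedUrl" xpath="/rss/channel/item/link" />

        <!-- Inner: walk the feed named by the current outer item -->
        <entity name="feedItems"
                processor="XPathEntityProcessor"
                url="${feedList.feedUrl}"
                forEach="/rss/channel/item">
          <field column="title" xpath="/rss/channel/item/title" />
          <field column="link"  xpath="/rss/channel/item/link" />
        </entity>
      </entity>
    </document>
  </dataConfig>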

It would be good to have at least an "ignoreerrors=true" option, the way 'ant' 
does. This would be set on each loop. Even better would be standard 
programming-language continue/break semantics. Example:

Outer ignoreerrors=continue
    Inner #1 ignoreerrors=ignore
        processing loop
    Inner #2 ignoreerrors=continue
        processing loop
    Inner #3 ignoreerrors=break
        processing loop
    Inner #4 ignoreerrors=break
        processing loop

After an error in an inner loop:
Inner #1 skips the bad item and continues with its next item.
Inner #2 stops its own loop, but processing continues on to Inner #3.
Inner #3 stops its own loop AND Inner #4 does not run.
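
In config terms I am imagining a (hypothetical, not-yet-existing) attribute on 
each entity, something like this sketch (other attributes elided):

  <entity name="outer" ignoreerrors="continue" ...>
    <!-- skip the bad item and keep looping -->
    <entity name="inner1" ignoreerrors="ignore" ... />
    <!-- give up on this loop, but go on to inner3 -->
    <entity name="inner2" ignoreerrors="continue" ... />
    <!-- give up on this loop AND skip inner4 -->
    <entity name="inner3" ignoreerrors="break" ... />
    <entity name="inner4" ignoreerrors="break" ... />
  </entity>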


Inner #3 has to succeed: if it fails, this iteration of Outer fails, but Outer 
still continues to its next item. Other cases need pinning down too: with a 
simple true/false flag, if Inner #2 is set to false and fails, does Inner #3 
get run? That is why I would rather have ignore/continue/break than true/false, 
where "ignoreerrors=break" means that an error in Inner #2 would prevent Inner 
#3 from running.

Lance

-----Original Message-----
From: Noble Paul നോബിള്‍ नोब्ळ् [mailto:[EMAIL PROTECTED]]
Sent: Thursday, November 06, 2008 8:39 PM
To: solr-user@lucene.apache.org
Subject: Re: Large Data Set Suggestions

Hi Lance,
This is one area we left open in DIH. What is the best way to handle this? On 
an error, should it give up or continue with the next record?



On Fri, Nov 7, 2008 at 12:44 AM, Lance Norskog <[EMAIL PROTECTED]> wrote:
> You can also do streaming XML upload for the XML-based indexing. This 
> can feed, say, 100k records in one XML file from a separate machine.
>
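
(For reference, the single big XML file I mean here is just the standard Solr 
update format, one <doc> block per record; the field names below are 
placeholders:)

  <add>
    <doc>
      <field name="id">record-1</field>
      <field name="title">first record</field>
    </doc>
    <doc>
      <field name="id">record-2</field>
      <field name="title">second record</field>
    </doc>
    <!-- ...and so on for each of the ~100k records... -->
  </add>
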
> All of these options ignore the case where there is an error in your 
> input records vs. the schema.  DIH gives up on an error. Streaming 
> XML gives up on an error.
>
> Lance
>
> -----Original Message-----
> From: Steven Anderson [mailto:[EMAIL PROTECTED]
> Sent: Thursday, November 06, 2008 5:57 AM
> To: solr-user@lucene.apache.org
> Subject: RE: Large Data Set Suggestions
>
>> In that case you may put the file in a mounted NFS directory, or you 
>> can serve it out with an Apache server.
>
> That's one option, although someone else on the list mentioned that, in 
> their experience, NFS was 10x slower.
>
> Another option is to serve up the files via Apache and pull them via 
> DIH HTTP.
>
> Thankfully, there are lots of options, but we need to determine which 
> one will perform best.
>
> Thanks,
>
> A. Steven Anderson
> 410-418-9908 VSTI
> 443-790-4269 cell
>



--
--Noble Paul
