Re: Bulk Indexing Problems

2014-09-09 Thread Joshua P
Thanks! Turns out I had less space on the VM than I thought; that, combined with a lack of decent error checking, meant I didn't catch the out-of-space problem. As soon as I added more space, I was able to index everything without a problem. Thanks again. On Tuesday, September 9, 2014 6:49:35 PM
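
The out-of-space failure described here could have been caught before indexing started. The following is only a minimal sketch using the standard `java.io.File.getUsableSpace()` call; the class name, the directory argument, and the 10 GiB threshold are illustrative assumptions and are not taken from the poster's indexer. The directory to check would be whatever `path.data` points at on the Elasticsearch host.

```java
import java.io.File;

class DiskSpaceCheck {
    // Returns true if the partition holding `dir` reports at least `requiredBytes` usable.
    static boolean hasEnoughSpace(File dir, long requiredBytes) {
        return dir.getUsableSpace() >= requiredBytes;
    }

    public static void main(String[] args) {
        // Assumption: the data directory is passed as the first argument;
        // the current directory is used only as a placeholder default.
        File dataDir = new File(args.length > 0 ? args[0] : ".");
        long required = 10L * 1024 * 1024 * 1024; // assumed threshold: 10 GiB

        if (hasEnoughSpace(dataDir, required)) {
            System.out.println("Disk space OK, safe to start indexing");
        } else {
            System.out.println("Only " + dataDir.getUsableSpace()
                    + " bytes usable in " + dataDir.getAbsolutePath()
                    + ", not starting the bulk index");
        }
    }
}
```

This check only sees the local filesystem, so it helps when the indexer runs on the same machine as the node; for a remote node the free space would have to be read some other way.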

Re: Bulk Indexing Problems

2014-09-09 Thread joergpra...@gmail.com
Code looks okay, so it might just be the full volume that is in the way. Jörg On Tue, Sep 9, 2014 at 8:44 PM, Joshua P wrote: > This is the code I've been using to index: > > I'm going to try to fix the running out of space issue and then try > slimming down settings. Thank you. > > public class

Re: Bulk Indexing Problems

2014-09-09 Thread Joshua P
This is the code I've been using to index: I'm going to try to fix the running out of space issue and then try slimming down settings. Thank you.
public class Indexer {
    private static final Logger logger = LogManager.getLogger("ESBulkUploader");
    public static void main(String[] args

Re: Bulk Indexing Problems

2014-09-09 Thread joergpra...@gmail.com
Check the path.data setting in config/elasticsearch.yml. Jörg On Tue, Sep 9, 2014 at 7:50 PM, Joshua P wrote: > Just reran the indexer and found this error coming up. I'm running out of > disk space on the partition ES wants to write to. > > F38KqHhnRDWtiJCss5Wz0g -- INTERNAL_SERVER_ERROR -- > T

Re: Bulk Indexing Problems

2014-09-09 Thread joergpra...@gmail.com
You mentioned problems around 200,000 docs. What are these problems and how do you think you can fix them? What does your bulk indexing procedure look like? By fine-tuning I mean slimming down all ES settings to the absolute minimum to slow down indexing and allocate fewer resources. But in your case

Re: Bulk Indexing Problems

2014-09-09 Thread Joshua P
Just reran the indexer and found this error coming up. I'm running out of disk space on the partition ES wants to write to. F38KqHhnRDWtiJCss5Wz0g -- INTERNAL_SERVER_ERROR -- TranslogException[[index_type][0] Failed to write operation [org.elasticsearch.index.translog.Translog$Create@6f1f6b1e]]

Re: Bulk Indexing Problems

2014-09-09 Thread Joshua P
Hi Jörg, Can you elaborate on what you mean by "I still need more fine tuning"? I've upped the heap size to 4g (in both places I mentioned before because it's not clear to me which one ES actually uses). I haven't tried to index again yet. Other than throttling my indexing, what are some other

Re: Bulk Indexing Problems

2014-09-09 Thread joergpra...@gmail.com
Set ES_HEAP_SIZE to at least 1 GB; for smaller heaps like 512m and indexing around 1 million docs, you need some more fine tuning, which is complicated. Your machine is OK to set the heap to 4 GB, which is 50% of 8 GB RAM. Jörg On Tue, Sep 9, 2014 at 5:39 PM, Joshua P wrote: > Here is /etc/defau

Re: Bulk Indexing Problems

2014-09-09 Thread Joshua P
Here is /etc/default/elasticsearch:
# Run Elasticsearch as this user ID and group ID
#ES_USER=elasticsearch
#ES_GROUP=elasticsearch
# Heap Size (defaults to 256m min, 1g max)
ES_HEAP_SIZE=512m
# Heap new generation
#ES_HEAP_NEWSIZE=
# max direct memory
#ES_DIRECT_SIZE=
# Maximum number of open

Re: Bulk Indexing Problems

2014-09-09 Thread vineeth mohan
Hello Joshua, I am not sure which variable you are referring to on the memory settings in the config file; please paste the comment and config. I usually change the config from the init.d script. The best approach would be to bulk index, say, 10,000 feeds in sync mode, wait until everything is indexe
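
The batching approach suggested here (send a fixed number of documents, wait for the call to finish, then send the next group) can be sketched as below. This is only an illustration under stated assumptions: `BatchedIndexer`, `partition`, and `indexInBatches` are hypothetical names, and the `sender` callback stands in for whatever blocking bulk request the real indexer issues. No Elasticsearch client calls are shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class BatchedIndexer {
    // Splits `records` into consecutive batches of at most `batchSize` items.
    static <T> List<List<T>> partition(List<T> records, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < records.size(); start += batchSize) {
            int end = Math.min(start + batchSize, records.size());
            batches.add(new ArrayList<>(records.subList(start, end)));
        }
        return batches;
    }

    // Sends each batch through `sender` one at a time and returns the number of
    // records handed over. `sender` is assumed to block until the batch is done.
    static <T> int indexInBatches(List<T> records, int batchSize, Consumer<List<T>> sender) {
        int sent = 0;
        for (List<T> batch : partition(records, batchSize)) {
            sender.accept(batch);
            sent += batch.size();
        }
        return sent;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            records.add(i);
        }
        // Placeholder sender: a real one would issue a synchronous bulk request.
        int sent = indexInBatches(records, 10,
                batch -> System.out.println("sending batch of " + batch.size()));
        System.out.println("total sent: " + sent);
    }
}
```

Because each batch is sent only after the previous call returns, a failure surfaces at a known batch boundary, which makes a repeatable, mostly automated run easier to resume.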

Re: Bulk Indexing Problems

2014-09-09 Thread Joshua P
You also said you wouldn't recommend indexing that much information at once. How would you suggest breaking it up and what status should I look for before doing another batch? I have to come up with some process that is repeatable and mostly automated. On Tuesday, September 9, 2014 11:12:59 AM

Re: Bulk Indexing Problems

2014-09-09 Thread Joshua P
Thanks for the reply, Vineeth! What's a practical heap size? I've seen some people saying they set it to 30gb but this confuses me because in the /etc/default/elasticsearch file, the comment suggests the max is only 1gb? I'll look into the threadpool issue. Is there a Java API for monitoring

Re: Some Bulk Indexing PRoblems

2014-09-09 Thread vineeth mohan
Hello Joshua, Please refrain from posting the same question twice. If you need to add additional information, just reply to the original thread. Thanks Vineeth On Tue, Sep 9, 2014 at 7:54 PM, Joshua P wrote: > Hi there! Sorry I posted two topics. I've somehow managed to post an > i

Re: Bulk Indexing Problems

2014-09-09 Thread vineeth mohan
Hello Joshua, I have a feeling this has something to do with the threadpool. There is a limit on the number of feeds that can be queued for indexing. Try increasing the size of the threadpool queue for index and bulk to a large number. Also, through the cluster node API on threadpool, you can see if any request ha
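
The effect of a bounded queue can be seen with a plain JDK thread pool. The sketch below is an analogy only, not Elasticsearch code: `BoundedQueueDemo` and `countRejected` are hypothetical names, and the pool uses a single worker so the outcome is easy to follow. Once the worker is busy and the queue is full, further submissions are rejected instead of waiting.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class BoundedQueueDemo {
    // Submits `tasks` blocking tasks to a one-thread pool whose queue holds at most
    // `queueCapacity` entries, and returns how many submissions were rejected.
    static int countRejected(int queueCapacity, int tasks) {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<Runnable>(queueCapacity));
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // Each task waits on the latch, so the worker stays busy
                    // and queued tasks are not drained while we submit.
                    pool.execute(() -> {
                        try {
                            release.await();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    // The default policy rejects when worker and queue are both full.
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        System.out.println("rejected: " + countRejected(2, 10));
    }
}
```

With a queue capacity of 2 and ten submissions, one task runs, two wait in the queue, and the remaining seven are rejected. A larger queue delays rejections but does not remove the need for the sender to notice them and slow down.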

Some Bulk Indexing PRoblems

2014-09-09 Thread Joshua P
Hi there! Sorry I posted two topics. I've somehow managed to post an incomplete post. I'm trying to do a one-time index of about 800,000 records into an instance of Elasticsearch. But I'm having a bit of trouble. It continually fails around 200,000 records. Looking at it in the Elasticsearch Head

Bulk Indexing Problems

2014-09-09 Thread Joshua P
Hi there! I'm trying to do a one-time index of about 800,000 records into an instance of Elasticsearch. But I'm having a bit of trouble. It continually fails around 200,000 records. Looking at it in the Elasticsearch Head plugin, my index goes offline and becomes unrecoverable. For now, I have