Using Pig/Spark on ElasticSearch (as External Storage)

2014-10-12 Thread Sang Dang

Hi All, Currently I am using ElasticSearch for a logging system. My first solution is that every log is put into ES and the index rolls over by date. For real-time stats, I will use Aggregations. For statistics, I will use Spark (or Hive, Shark, whatever) on the ES data (thanks to ElasticSearch-Hadoop

Re: Using Pig/Spark on ElasticSearch (as External Storage)

2014-10-12 Thread Costin Leau

It depends on various factors. Do you put all the data under one index, or is it one index per day/month/hour? What type of script are you running, and what performance degradation do you see? If it's easier, feel free to reach out on IRC. I'll be traveling this week but will be back next week. Cheers On Oct 12,

Re: Using Pig/Spark on ElasticSearch (as External Storage)

2014-10-12 Thread Sang Dang

Hi Costin Leau, Currently I just put all the data in one index (INDEX_NAME_DATE). In my benchmark, I run just two functions: count and count distinct on a field. P/S: Thanks for your fast response, I would be really happy to see you on IRC (just give me the time). On Sunday, October 12, 2014 8:02:57 PM
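
For reference, the two benchmark functions mentioned above (count and count distinct on a field) could also be expressed as Elasticsearch aggregations against a date-based index. The following is only a rough sketch: the `logs` prefix, the `YYYY.MM.DD` date suffix, and the `user_id` field are hypothetical and do not come from the thread, and the helper names are made up for illustration. It only builds the index name and the request body; sending them to a cluster is left out. Note that the `cardinality` aggregation returns an approximate distinct count.

```python
import json
from datetime import date


def daily_index(prefix: str, day: date) -> str:
    """Build a date-based index name, e.g. logs_2014.10.12 (naming scheme is an assumption)."""
    return f"{prefix}_{day:%Y.%m.%d}"


def count_and_distinct_body(field: str) -> dict:
    """Build a search body that counts values and approximate distinct values of a field."""
    return {
        "size": 0,  # return no hits, only the aggregation results
        "aggs": {
            "total": {"value_count": {"field": field}},
            "distinct": {"cardinality": {"field": field}},
        },
    }


# Hypothetical usage: index prefix and field name are placeholders.
index = daily_index("logs", date(2014, 10, 12))
body = count_and_distinct_body("user_id")
request_json = json.dumps(body)
```

The resulting `request_json` would be sent as the body of a search request against `index`; the same two numbers could then be compared with what the Spark or Pig job reports for that day.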