Re: Error on ElasticSearch-Hadoop with Scala11

2015-07-04 Thread Deepak Subhramanian
It worked when I added the Scala 2.10 dependency to the project: org.scala-lang:scala-library:2.10.1. On Sat, Jul 4, 2015 at 10:31 AM, Deepak Subhramanian wrote: > I am getting error while using es-hadoop with spark on scala11. It > works with the spark-shell binaries I have with s

Error on ElasticSearch-Hadoop with Scala11

2015-07-04 Thread Deepak Subhramanian
I am getting an error while using es-hadoop with Spark on scala11. It works with the spark-shell binaries I have with scala10. This line generates the error: val esRDD = sc.esRDD("realtimeanalytics/events") This works: finaloutput.toJSON.saveJsonToEs("realtimeanalytics/events") Any inputs will be app

Re: [Hadoop] Slow performance of Elasticsearch-Hadoop + Spark SQL

2015-06-01 Thread Costin Leau
k SQL? Environment: (everything is running on the same box): Elasticsearch 1.4.4 elasticsearch-hadoop 2.1.0.BUILD-SNAPSHOT Spark 1.3.0. CURL: curl -XPOST "http://localhost:9200/summary/intervals/_search" -d' { "query" : { "filtered"

[Hadoop] Slow performance of Elasticsearch-Hadoop + Spark SQL

2015-06-01 Thread Dmitriy Fingerman
performance? Are there any ways to tune performance of Elasticsearch + Spark SQL? Environment: (everything is running on the same box): Elasticsearch 1.4.4 elasticsearch-hadoop 2.1.0.BUILD-SNAPSHOT Spark 1.3.0. CURL: curl -XPOST "http://localhost:9200/summary/intervals/_search

Re: Unable to get elasticsearch-hadoop working with Hive/Beeline

2015-04-29 Thread Costin Leau
nsdag 29 april 2015 kl. 00:11:47 UTC+2 skrev Costin Leau: >> >> Hi, >> >> It seems you are running into a classpath problem. The class mentioned in >> the exception (org/elasticsearch/hadoop/serialization/dto/Node) is part of >> the elasticsearch-hadoop-hive-XXX.

Re: Unable to get elasticsearch-hadoop working with Hive/Beeline

2015-04-29 Thread Rasmus Aveskogh
Thanks. We got it working by adding the jar to the hive-config, rather than by "add jar" .. -ra On Wednesday, April 29, 2015 at 00:11:47 UTC+2, Costin Leau wrote: > > Hi, > > It seems you are running into a classpath problem. The class mentioned in > the exception

Re: Unable to get elasticsearch-hadoop working with Hive/Beeline

2015-04-28 Thread Costin Leau
Hi, It seems you are running into a classpath problem. The class mentioned in the exception (org/elasticsearch/hadoop/serialization/dto/Node) is part of the elasticsearch-hadoop-hive-XXX.jar - you can verify this yourself. The fact that it is not found at runtime suggests that the a different

Unable to get elasticsearch-hadoop working with Hive/Beeline

2015-04-28 Thread Rasmus Aveskogh
Hi! I've followed the various guides to get going with the elasticsearch-hadoop-integration in Hive, but I run into some issue: > add jar hdfs://host:9000//lib/elasticsearch-hadoop-hive-2.1.0.Beta4.jar; INFO : converting to local hdfs://host:9000//lib/elasticsearch-hadoop-hive-2.1

Re: about elasticsearch-hadoop error

2015-04-16 Thread Costin Leau
Based on your cryptic message I would guess the issue is likely that the jar you are building is incorrect as its manifest is invalid. Spark most likely is signed and thus extra content breaks this. See http://www.elastic.co/guide/en/elasticsearch/hadoop/master/troubleshooting.html#help

about elasticsearch-hadoop error

2015-04-15 Thread guoyiqincn
*Hello* *when add elasticsearch-hadoop jar * *this is a error* Spark assembly has been built with Hive, including Datanucleus jars on classpath Exception in thread "main" java.lang.SecurityException: Invalid signature file digest for Manifest main att

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2015-04-08 Thread Paul Chua
I'm having an issue very similar to this; I'm not sure exactly what you did to get the array contents. I've made a new post here: https://groups.google.com/forum/#!topic/elasticsearch/MpOqKthgqtA -- Paul Chua

Re: Understanding Elasticsearch-Hadoop

2015-04-04 Thread Costin Leau
Hi, Hadoop means a lot of things as it has a lot of components. I'm sorry to hear the resources you read don't give you enough answers. The 'definition' of Elasticsearch Hadoop is given in the documentation preface [1] which I quote below: " Elasticsearch for Apac

Understanding Elasticsearch-Hadoop

2015-04-04 Thread Bharvi Dixit
Hi, Even after going through so many resources and reading about es-hadoop i am unable to clarify some of my doubts like: How to run elasticsearch data nodes on your hadoop data nodes?? Can i install an elasticsearch cluster and store indexes on hadoop HDFS?? if yes then how?? Will i have to ke

elasticsearch-hadoop pyspark [Hadoop]

2015-03-19 Thread Jeffrey Hoekman
for Costin...? I enjoyed the talk at Spark Summit East on spark-elasticsearch integration in Spark 1.3 (sparkContext.esRDD and rdd.saveToEs APIs). Will these APIs eventually be available for pyspark context/rdd? Cheers, JH -- You received this message because you are subscribed to the Google Group

Re: spark version, elasticsearch-hadoop version, akka version sync up

2015-03-17 Thread Costin Leau
You're close: elasticsearch-hadoop snapshot (aka dev aka master) works on spark 1.2, 1.1 and 1.0, both core and sql elasticsearch-hadoop beta3 (not snapshot) works on spark 1.1 and spark 1.0, both core and sql elasticsearch-hadoop beta2 (not snapshot) works on spark 1.0 (core and sql) The su

Re: spark version, elasticsearch-hadoop version, akka version sync up

2015-03-17 Thread Jeff Steinmetz
Thank you for the summary - you are confirming (as a sanity check for myself): elasticsearch-hadoop beta3 (not snapshot) on spark core 1.1 only elasticsearch-hadoop-beta3-SNAPSHOT with spark core 1.1, 1.2 and 1.3 -- as long as I don't use Spark SQL when using 1.2 and 1.3 Costin - I am a

Re: spark version, elasticsearch-hadoop version, akka version sync up

2015-03-17 Thread Costin Leau
e in order, the same should apply for es-hadoop as well since it relies only on Spark (and Scala of course). On Tue, Mar 17, 2015 at 10:43 PM, Jeff Steinmetz < jeffrey.steinm...@gmail.com> wrote: > There are plenty of spark / akka / scala / elasticsearch-hadoop > dependencies to ke

spark version, elasticsearch-hadoop version, akka version sync up

2015-03-17 Thread Jeff Steinmetz
There are plenty of spark / akka / scala / elasticsearch-hadoop dependencies to keep track of. Is it true that elasticsearch-hadoop needs to be compiled for a specific spark version to run correctly on the cluster? I'm also trying to keep track of the akka version and scala version. i.e

Re: elasticsearch-hadoop-hive exception when writing array> column

2015-03-12 Thread Chen Wang
you need to pass the array of maps as a script parameter > and not use primitives instead (you can use Hive column mapping to extract > the ones you need)? > > On Thu, Mar 12, 2015 at 11:56 PM, Chen Wang > wrote: > >> Folks, >> I am using elasticsearch-hadoop-hive-2.1.0.B

Re: elasticsearch-hadoop-hive exception when writing array> column

2015-03-12 Thread Costin Leau
JSON form incorrect. Any reason why you need to pass the array of maps as a script parameter and not use primitives instead (you can use Hive column mapping to extract the ones you need)? On Thu, Mar 12, 2015 at 11:56 PM, Chen Wang wrote: > Folks, > I am using elasticsearch-hadoop-hive-2.1.

elasticsearch-hadoop-hive exception when writing array> column

2015-03-12 Thread Chen Wang
Folks, I am using elasticsearch-hadoop-hive-2.1.0.Beta3.jar. I defined the external table as: CREATE EXTERNAL TABLE IF NOT EXISTS ${staging_table}( customer_id STRING, store_purchase array>) ROW FORMAT SERDE 'org.elasticsearch.hadoop.hive.EsSerDe'

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-02-02 Thread Abhishek Patel
Hi Julien Yes. Probably that's the only possible work around to do this. What I am planning to do is, calculate the name of index prior to writing and add a field named "indexname" in my JSON and then I will use JavaEsSpark.saveJsonToEs(jrd, "index_{indexname}/type"); Thanks for the reply.
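
The "indexname" approach described above can be sketched roughly as follows. This is only a simplified model of how a pattern such as "index_{indexname}/type" might be resolved per document; resolve_resource and group_by_resource are hypothetical helper names, not part of the es-hadoop API, and the sample documents are made up for illustration.

```python
import json
import re
from collections import defaultdict

# Matches a {field} placeholder inside a resource pattern.
PLACEHOLDER = re.compile(r"\{([^{}]+)\}")


def resolve_resource(pattern, doc):
    """Replace each {field} in the pattern with that field's value from doc."""
    return PLACEHOLDER.sub(lambda m: str(doc[m.group(1)]), pattern)


def group_by_resource(pattern, json_docs):
    """Group raw JSON strings by the index/type each one resolves to."""
    groups = defaultdict(list)
    for raw in json_docs:
        groups[resolve_resource(pattern, json.loads(raw))].append(raw)
    return dict(groups)


# Hypothetical documents that already carry a precomputed "indexname" field.
docs = [
    '{"time": 1422904680, "indexname": "2015-02-02"}',
    '{"time": 1422991080, "indexname": "2015-02-03"}',
]

for resource, members in group_by_resource("index_{indexname}/type", docs).items():
    print(resource, len(members))
```

The point of the sketch is that a single write call can fan out to several indexes, since the target is derived from each document rather than fixed per job.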

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-02-02 Thread Julien Naour
Hi Abhishek, I'll probably do a previous step to process the day by line (like -MM-dd). And just do JavaEsSpark.saveJsonToEs(jrd, "index_{date}/type"); If the range is needed I'll probably do the same. Add a feature with { {time} - {time} % 86400 }_{ {time} + 86400 - {time} % 86400 } and just in
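
The range arithmetic mentioned above (time minus time modulo 86400, and that value plus 86400) can be checked with a short sketch. It assumes "time" is Unix time in seconds and that days are cut at UTC midnight; day_bucket and bucket_label are hypothetical names used only for illustration.

```python
from typing import Tuple

SECONDS_PER_DAY = 86400


def day_bucket(epoch_seconds: int) -> Tuple[int, int]:
    """Return (start, end) of the UTC day containing epoch_seconds, in seconds."""
    start = epoch_seconds - epoch_seconds % SECONDS_PER_DAY
    return start, start + SECONDS_PER_DAY


def bucket_label(epoch_seconds: int) -> str:
    """Build a 'start_end' label that could be stored in a field and used in an index pattern."""
    start, end = day_bucket(epoch_seconds)
    return f"{start}_{end}"


print(bucket_label(1422904680))
```

Two timestamps from the same UTC day produce the same label, so they would land in the same index.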

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-02-02 Thread Abhishek Patel
Hi Julien, I am trying to achieve something similar. In my case, my JSON contains a field "time" in Unix time. And i want to partition my indexes by this field. That is, if one JSON1 contains 1422904680 in "time" and JSON2 contains 1422991080 in time, then i want to create indexes which are pa
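
As a minimal sketch of the partitioning described here, the day for each document can be derived from the "time" field before writing. The sketch assumes Unix time in seconds and UTC day boundaries; daily_index and the "myindex" prefix are illustrative choices, not anything from es-hadoop.

```python
from datetime import datetime, timezone


def daily_index(prefix: str, epoch_seconds: int) -> str:
    """Return '<prefix>-YYYY-MM-DD' for the UTC day containing epoch_seconds."""
    day = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{prefix}-{day}"


# The two timestamps from the message are exactly one day apart.
print(daily_index("myindex", 1422904680))
print(daily_index("myindex", 1422991080))
```

The two sample values fall on consecutive days, so they would resolve to two different indexes.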

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-19 Thread Julien Naour
> > Example of data (~700 millions lines for ~90days): > > 2014-01-01,05,06,ici > 2014-01-04,05,06,la > > The first one have to be send to my-index-2014-01-01/my-type and the other > my-index-2014-01-04/my-type > I would like to do it without having to launch 90 saveJsonToES (us

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-19 Thread Julien Naour
e and the other my-index-2014-01-04/my-type I would like to do it without having to launch 90 saveJsonToES (using the elasticsearch-hadoop spark API) Is it more clear? It seems that the dynamic index could work for me. I'll try that right away. Thanks again Julien 2015-01-19 16:18 GMT+01

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-19 Thread Costin Leau
[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/configuration.html#cfg-multi-writes [2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/spark.html#spark-write-dyn [3] https://github.com/elasticsearch/elasticsearch-hadoop/issues/358 On 1/19/15 4:50 PM, J

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-19 Thread Julien Naour
a complex workflow using Spark (Parsing, Cleaning, Machine >> Learning). >> At the end of the workflow I want to send aggregated results to >> elasticsearch so my portal could query data. >> There will be two types of processing: streaming and the possibility to >>

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-15 Thread Julien Naour
Learning). > At the end of the workflow I want to send aggregated results to > elasticsearch so my portal could query data. > There will be two types of processing: streaming and the possibility to > relaunch workflow on all available data. > > Right now I use elasticsearch-h

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-15 Thread Julien Naour
on all available data. > > Right now I use elasticsearch-hadoop and particularly the spark part to > send document to elasticsearch with the saveJsonToEs(myindex, mytype) > method. > The target is to have an index by day using the proper template that we > build. > AFAIK you could not ad

elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-15 Thread Julien Naour
available data. Right now I use elasticsearch-hadoop and particularly the spark part to send document to elasticsearch with the saveJsonToEs(myindex, mytype) method. The target is to have an index by day using the proper template that we build. AFAIK you could not add consideration of a feature

Re: Elasticsearch-Hadoop Data Locality

2014-12-31 Thread Costin Leau
For the record, what spark and es-hadoop version are you using? For each shard in your index, es-hadoop creates one Spark task which gets informed of the whereabouts of the underlying shard. So in your case, you would end up with 20 tasks/workers, one per shard, streaming data back to the maste

Elasticsearch-Hadoop Data Locality

2014-12-31 Thread Elliott Bradshaw
I'm trying to get a spark job running that pulls several million documents from an Elasticsearch cluster for some analytics that cannot be done via aggregations. It was my understanding that es-hadoop maintained data locality when the spark cluster was running alongside the elasticsearch clust

Re: ElasticSearch hadoop - .EsHadoopSerializationException

2014-12-18 Thread Kamil Dziublinski
you need some more info let me know. Dummy input file was placed in src/test/resources/input/input.txt for test to read it. I tested this with gradle project (within my existing one). elasticsearch-hadoop 2.0.2 dependency and java 7. You can see the exception being thrown in console when running it

Re: ElasticSearch hadoop - .EsHadoopSerializationException

2014-12-16 Thread Costin Leau
[1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/troubleshooting.html On 12/15/14 6:03 PM, Kamil Dziublinski wrote: Hi, I had only one jar on classpath and none in hadoop cluster. I had different types of values in my MapWritable tho. It turns out this was the problem. So I

Re: ElasticSearch hadoop - .EsHadoopSerializationException

2014-12-15 Thread Kamil Dziublinski
changed everything to be Text it started working. Is this intended behaviour? Cheers, Kamil. On Friday, December 12, 2014 8:37:03 PM UTC+1, Costin Leau wrote: > > Hi, > > This error is typically tied to a classpath issue - make sure you have > only one elasticsearch-hadoop jar

Re: [hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-15 Thread CAI Longqi
Thanks, I managed to fix this issue by `export HADOOP_CLASSPATH=/path/to/my/elasticsearch-hadoop-2.0.2.jar`. Don't know why, but it works. I have already configured that using -libjars; I don't know why hadoop needs me to specify that again using that global variable. Another questi

Re: [hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-14 Thread Costin Leau
http://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html#Example%3A+WordCount+v2.0 [2] http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/ On 12/14/14 3:32 PM, CAI Longqi wrote: Hello, I’m using elasticsearch-hadoop-2.0.2.jar, and meet the pr

[hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-14 Thread CAI Longqi
Hello, I’m using elasticsearch-hadoop-2.0.2.jar, and meet the problem: Exception in thread "main" java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat at com.clqb.app.ElasticSearch.run(ElasticSearch.java:46) at org.apache.hadoop.util.ToolRunner.run(ToolRunn

Re: ElasticSearch hadoop - .EsHadoopSerializationException

2014-12-12 Thread Costin Leau
Hi, This error is typically tied to a classpath issue - make sure you have only one elasticsearch-hadoop jar version in your classpath and on the Hadoop cluster. On 12/12/14 5:56 PM, Kamil Dziublinski wrote: Hi guys, I am trying to run a MR job that reads from HDFS and stores into

ElasticSearch hadoop - .EsHadoopSerializationException

2014-12-12 Thread Kamil Dziublinski
Hi guys, I am trying to run a MR job that reads from HDFS and stores into ElasticSearch cluster. I am getting following error: Error: org.elasticsearch.hadoop.serialization.EsHadoopSerializationException: Cannot handle type [class org.apache.hadoop.io.MapWritable], instance [org.apache.hadoop

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Elias Abou Haydar
I actually went through the API and I get the big picture now. I appreciate your help. Thanks! :) On Wednesday, December 3, 2014 at 16:50:33 UTC+1, Costin Leau wrote: > > I'm not sure what you are expecting since the results are as expected. See > the javadocs [1] for ArrayWritable. > toStri

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Costin Leau
I'm not sure what you are expecting since the results are as expected. See the javadocs [1] for ArrayWritable. toStrings() returns a String[] while get() a Writable[]. In other words you get an array of Strings and Writables and neither implements toString natively. To get the actual content yo

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Elias Abou Haydar
OK, actually toStrings() returns the String[] array that has the contents, and that solved my problem. Thanks again Costin! :) On Wednesday, December 3, 2014 at 16:29:38 UTC+1, Elias Abou Haydar wrote: > > I've tried to call toStrings() > I got this : > title : [Ljava.lang.String;@35112ff7 > > wi

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Elias Abou Haydar
I've tried to call toStrings() I got this : title : [Ljava.lang.String;@35112ff7 with the get(), i'm getting this: title : [Lorg.apache.hadoop.io.Writable;@666f5678 On Wednesday, December 3, 2014 at 16:21:40 UTC+1, Costin Leau wrote: > > You're getting back an array ([Samsung EF-C])

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Elias Abou Haydar
I've already tried that. It doesn't work... :/ On Wednesday, December 3, 2014 at 16:21:40 UTC+1, Costin Leau wrote: > > You're getting back an array ([Samsung EF-C]) - a Writable wrapper > around org.apache.hadoop.io.ArrayWritable (to actually > allow it to be serialized). > So call toStrings() or ge

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Costin Leau
You're getting back an array ([Samsung EF-C]) - a Writable wrapper around org.apache.hadoop.io.ArrayWritable (to actually allow it to be serialized). So call toStrings() or get() to get its content. On 12/3/14 3:30 PM, Elias Abou Haydar wrote: I've tried that. It returns a org.elasticsearch.hadoo

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Elias Abou Haydar
I've tried that. It returns a org.elasticsearch.hadoop.mr.WritableArrayWritable object. How can I get my field content out of that? On Wednesday, December 3, 2014 at 14:10:24 UTC+1, Costin Leau wrote: > > That's because your MapWritable doesn't use Strings as keys but rather > org.apache.hadoop.i

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Costin Leau
That's because your MapWritable doesn't use Strings as keys but rather org.apache.hadoop.io.Text. In other words, you can see the data is in the map; however, you cannot retrieve it since you are using the wrong key (try inspecting the map object types). Try values.get(new Text("title")) On 12

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Elias Abou Haydar
That works fine for me, thank you! But I also wanted to be able to build an object from the MapWritable values in the mapper. Consider values as a MapWritable object. When I try to get a specified value with values.get("title"), for example, the returned value is null but the field exists in th

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-02 Thread Costin Leau
Simply specify the fields that you are interested in, in the query and you are good to go. [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html On 12/2/14 12:52 PM, Elias Abou Haydar wrote: I'm trying to write a mapreduce job where I can query e

elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-02 Thread Elias Abou Haydar
I'm trying to write a mapreduce job where I can query elasticsearch so it can return to me specific fields. Is there any way to do that? My mapping contains about 30 fields and I will need just 4 of them ("_id","title","description","category") The way I was doing it is to process each answer t

Re: Elasticsearch Hadoop WRITE operation not using Reducer

2014-11-03 Thread Sarath
Hi Telax, Even though I don't set the number of reduce tasks, Hadoop takes care of starting reducers if needed. In my case 1 reducer is running. The issue here is that the custom reducer defined as part of the Job configuration is not invoked by Hadoop. Thanks, Sarath

Re: Elasticsearch Hadoop WRITE operation not using Reducer

2014-11-03 Thread Telax
Hi, Doesn't look like you've set the number of reduce tasks in your job config. i.e. 'job.setNumReduceTasks(10);'

Re: Elasticsearch Hadoop WRITE operation not using Reducer

2014-11-03 Thread Sarath
Hi Costin, Thanks for the response. You are right. Map/Reduce integration relies on the Input/OutputFormat. Even after removing EsOutputFormat my custom reducer is not invoked. Should be some issue with hadoop configuration. Thanks, Sarath

Re: Elasticsearch Hadoop WRITE operation not using Reducer

2014-11-03 Thread Costin Leau
types and the job fails silently after invoking context.write method. In fact, you can just remove the EsOutputFormat and see whether it makes any difference (it shouldn't). On 11/1/14 7:20 AM, Sarath wrote: Hi All, Does the Elasticsearch Hadoop WRITE operation not use our cust

elasticsearch-hadoop

2014-11-03 Thread Pavan Kumar
hi, i am planning to start using elasticsearch for tweet analytics on company product. is it feasible "if i index and store data directly into the hdfs via elasticsearch, instead of storing raw data and fetching that raw data back to elasticsearch"?? and can we communicate and do some comp

Elasticsearch Hadoop WRITE operation not using Reducer

2014-10-31 Thread Sarath
Hi All, Does the Elasticsearch Hadoop WRITE operation not use our custom reducer? I tried with the following code and observed that our custom reducer is not invoked. job.setOutputFormatClass(EsOutputFormat.class); job.setMapOutputKeyClass(NullWritable.

elasticsearch fields and elasticsearch-hadoop

2014-10-17 Thread Akil Harris
Is there an easy way to rename the fields on an index? I have a field named "searchTerm" that I use for some event tracking. But the elasticsearch-hadoop library assumes all elasticsearch fields are lowercase and is converting all field names to lower case. When hadoop tries to re

Re: Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-10-14 Thread Jinyuan Zhou
Thanks, Jinyuan (Jack) Zhou On Tue, Oct 14, 2014 at 1:36 PM, Costin Leau wrote: > You need the appropriate hadoop jar on your classpath otherwise > es-hadoop repository plugin cannot connect to HDFS. In the repo, > you'll find two versions with vanilla hadoop1 and hadoop2 - however if > you are

Re: Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-10-14 Thread Costin Leau
You need the appropriate hadoop jar on your classpath otherwise es-hadoop repository plugin cannot connect to HDFS. In the repo, you'll find two versions with vanilla hadoop1 and hadoop2 - however if you are using a certain distro, for best compatibility you should use that distro client jars. Plea

Re: Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-10-14 Thread Jinyuan Zhou
My ES cluster nodes and Hadoop nodes are not collocated. The light version does not work for me without putting enough correct versions of hadoop related jars. Right now I don't want to create my jar as Brent did and I don't want to install hadoop or copy jars on the es nodes either. Right now I

[ANN] Elasticsearch Hadoop 2.0.2 and 2.1 Beta 2 with Storm and Spark SQL support

2014-10-08 Thread Costin Leau
Hi everyone, Elasticsearch Hadoop 2.0.2 and 2.1 Beta2, featuring Apache Storm integration and Apache Spark SQL, have been released. You can read all about them here [1]. Feedback is welcome! Cheers, [1] http://www.elasticsearch.org/blog/elasticsearch-hadoop-2-0-2-and-2-1-beta2/ -- Costin

Re: elasticsearch-hadoop sporadic timeouts

2014-10-03 Thread Zach Cox
whether there's load building up or if anything > unusual happens. > > Since it's unclear what the issue might be, take baby steps [1] and start > with minimal load (smaller bulk size + less tasks) see whether there are > any issues and keep on going. > > [1] ht

Re: elasticsearch-hadoop sporadic timeouts

2014-10-03 Thread Costin Leau
thing unusual happens. Since it's unclear what the issue might be, take baby steps [1] and start with minimal load (smaller bulk size + less tasks) see whether there are any issues and keep on going. [1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/troubleshooting.html

Re: elasticsearch-hadoop sporadic timeouts

2014-10-03 Thread Zach Cox
1m, it means things are not > going well at all. > > On 10/3/14 6:09 PM, Zach Cox wrote: > >> Is there anything else we could try here to debug elasticsearch-hadoop >> being unable to write to Elasticsearch? We're >> still seeing the same number of these fails durin

Re: elasticsearch-hadoop sporadic timeouts

2014-10-03 Thread Costin Leau
e we could try here to debug elasticsearch-hadoop being unable to write to Elasticsearch? We're still seeing the same number of these fails during the nightly batch runs even after switching to 2.0.2.BUILD-SNAPSHOT, and I don't see any additional lines from org.elasticsearch.hadoop.rest

Re: elasticsearch-hadoop sporadic timeouts

2014-10-03 Thread Zach Cox
Is there anything else we could try here to debug elasticsearch-hadoop being unable to write to Elasticsearch? We're still seeing the same number of these fails during the nightly batch runs even after switching to 2.0.2.BUILD-SNAPSHOT, and I don't see any additional

Re: elasticsearch-hadoop sporadic timeouts

2014-10-01 Thread Zach Cox
Hi Costin - by "bulk size/entries number" are you referring to the es.batch.size.bytes and es.batch.size.entries config values described here? http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/configuration.html#configuration-serialization It looks like the only ela
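
For readers unfamiliar with the two settings named above, the following is a simplified model of how a byte limit and an entry limit can jointly decide when a bulk request is sent. It is an assumption-laden sketch, not es-hadoop's implementation: BulkBuffer and its send callback are hypothetical, and the limits in the demo are tiny values chosen so the behaviour is easy to follow.

```python
class BulkBuffer:
    """Collects documents and flushes when either size limit is reached."""

    def __init__(self, max_bytes, max_entries, send):
        self.max_bytes = max_bytes        # plays the role of es.batch.size.bytes
        self.max_entries = max_entries    # plays the role of es.batch.size.entries
        self.send = send                  # callback that receives one batch (a list)
        self._entries = []
        self._size = 0

    def add(self, doc: bytes) -> None:
        # Flush first if this document would push the batch over the byte limit.
        if self._entries and self._size + len(doc) > self.max_bytes:
            self.flush()
        self._entries.append(doc)
        self._size += len(doc)
        # Flush once the entry limit is reached.
        if len(self._entries) >= self.max_entries:
            self.flush()

    def flush(self) -> None:
        if self._entries:
            self.send(list(self._entries))
            self._entries = []
            self._size = 0


sent = []
buf = BulkBuffer(max_bytes=10, max_entries=3, send=sent.append)
for doc in (b"aaaa", b"bbbb", b"cccc", b"d", b"e"):
    buf.add(doc)
buf.flush()  # send whatever is left at the end of the task
print(sent)
```

Under this model, lowering either limit means smaller and more frequent bulk requests, which is the kind of "minimal load" experiment suggested elsewhere in the thread.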

Re: elasticsearch-hadoop sporadic timeouts

2014-10-01 Thread Zach Cox
ulk size/entries number? > > > On 10/1/14 2:15 PM, Zach Cox wrote: >> >> Hi Costin - we updated our dependencies to use elasticsearch-hadoop >> 2.0.2.BUILD-SNAPSHOT, but that didn't seem to change >> anything. We're still seeing the same task failures whil

Re: elasticsearch-hadoop sporadic timeouts

2014-10-01 Thread Costin Leau
hat's your bulk size/entries number? On 10/1/14 2:15 PM, Zach Cox wrote: Hi Costin - we updated our dependencies to use elasticsearch-hadoop 2.0.2.BUILD-SNAPSHOT, but that didn't seem to change anything. We're still seeing the same task failures while trying to write to Elastic

Re: elasticsearch-hadoop sporadic timeouts

2014-10-01 Thread Zach Cox
Hi Costin - we updated our dependencies to use elasticsearch-hadoop 2.0.2.BUILD-SNAPSHOT, but that didn't seem to change anything. We're still seeing the same task failures while trying to write to Elasticsearch. The only difference in the logs is that now I don't

Re: elasticsearch-hadoop sporadic timeouts

2014-09-30 Thread Costin Leau
Can you please try the 2.0.2.BUILD-SNAPSHOT? I think you might be running into issue #256 which was fixed some time ago and will be part of the upcoming 2.0.2, 2.1 Beta2. Cheers, On 9/30/14 6:43 PM, Zach Cox wrote: Hi Costin: elasticsearch-hadoop 2.0.0 cascading 2.5.4 scalding 0.10.0 Thanks

Re: elasticsearch-hadoop sporadic timeouts

2014-09-30 Thread Zach Cox
Hi Costin: elasticsearch-hadoop 2.0.0 cascading 2.5.4 scalding 0.10.0 Thanks, Zach On Tuesday, September 30, 2014 10:25:10 AM UTC-5, Costin Leau wrote: > > What version of es-hadoop/es/cascading are you using? > > On 9/30/14 6:16 PM, Zach Cox wrote: > > Hi - we're havi

Re: elasticsearch-hadoop sporadic timeouts

2014-09-30 Thread Costin Leau
ke this: https://gist.githubusercontent.com/zcox/3d6cf4329d49ca03271b/raw/57c46a5e4c9ea04d5c4209414d6f847492d16c0d/gistfile1.txt Seems like elasticsearch-hadoop tries talking to an ES node, it times out, tries the next one, it times out, etc until all nodes in the cluster are exhausted and then it gives up. As far as I c

elasticsearch-hadoop sporadic timeouts

2014-09-30 Thread Zach Cox
d/gistfile1.txt Seems like elasticsearch-hadoop tries talking to an ES node, it times out, tries the next one, it times out, etc until all nodes in the cluster are exhausted and then it gives up. As far as I can tell, the ES cluster is healthy while this is occurring. Many map tasks are succeeding -
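
The behaviour described in this report (try a node, time out, move to the next, give up once every node has been tried) can be modelled with a short sketch. This is a guess at the general pattern based only on the description above, not the connector's actual code; send_with_failover, the node names and the flaky callback are all hypothetical.

```python
def send_with_failover(nodes, send):
    """Try each node in order; return the first success, fail when all time out."""
    failed = []
    for node in nodes:
        try:
            return send(node)
        except TimeoutError:
            failed.append(node)  # remember the node and move on to the next one
    raise ConnectionError(f"all {len(nodes)} nodes timed out: {failed}")


def flaky(node):
    """Stand-in for a bulk request: only the third node answers in time."""
    if node != "es3:9200":
        raise TimeoutError(node)
    return f"ok from {node}"


print(send_with_failover(["es1:9200", "es2:9200", "es3:9200"], flaky))
```

In this model a task only fails when every node in the list times out within the same attempt, which matches the "all nodes exhausted" wording of the report.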

Re: elasticsearch hadoop, dynamically decide index name too (not just type name), is it possible?

2014-09-11 Thread Jinyuan Zhou
or a mapreduce job >> to EsOutputFormat. Below is a part regarding >> dynamically decide the document type. >> (http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/ >> current/mapreduce.html). My question is is it possible to >> parameterize the "my-collec

Re: elasticsearch hadoop, dynamically decide index name too (not just type name), is it possible?

2014-09-11 Thread Costin Leau
://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html). My question is is it possible to parameterize the "my-collection"? Thanks, Jack writing to dynamic/multi-resources

elasticsearch hadoop, dynamically decide index name too (not just type name), is it possible?

2014-09-11 Thread Jinyuan Zhou
I saw the hadoop documentation regarding setting up an index for a mapreduce job with EsOutputFormat. Below is a part regarding dynamically deciding the document type. (http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html). My question is: is it possible to parameterize the

Re: Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-09-01 Thread Mateusz Kaczynski
(Much delayed) thank you Costin. Indeed, on Ubuntu, changing ES_CLASSPATH to include hadoop and hadoop/lib directories in /etc/default/elasticsearch (and exporting it in /etc/init.d/elasticsearch) and installing light plugin version did work.

Re: Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-09-01 Thread Mateusz Kaczynski
(Much delayed) thank you Costin. Indeed, on Ubuntu, changing ES_CLASSPATH to include hadoop and hadoop/lib directories in /etc/default/elasticsearch (and exporting it in /etc/init.d/elasticsearch) and installing light plugin version did work. On Thursday, 14 August 2014 20:59:39 UTC, Costin Le

Re: Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-08-14 Thread Costin Leau
Hi, The hdfs repository relies on vanilla Hadoop 2.2 since that's the official stable version of Yarn. Since you are using a different Hadoop version, use the 'light' version as explained in the docs - this contains only the repository-hdfs, without the Hadoop dependency (since you already hav

[ANN] Elasticsearch Hadoop 2.0.1 and 2.1.Beta with Spark support

2014-08-14 Thread Costin Leau
Hi everyone, Elasticsearch Hadoop 2.0.1 and 2.1 Beta1, featuring native Apache Spark integration, have been released. You can read all about them here [1]. Feedback is welcome! Cheers, [1] http://www.elasticsearch.org/blog/es-hadoop-2-0-1-and-2-1-beta1/ -- Costin

Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-08-14 Thread Mateusz Kaczynski
I'm trying to get es-hadoop repository plugin working on our hadoop 2.0.0-cdh4.6.0 distribution and it seems like I'm quite lost. I installed plugin's -hadoop2 version on the machines on our hadoop cluster (which also run our stage elasticsearch nodes). When attempting to create a repository o

[Connection refused] in elasticsearch-hadoop

2014-07-28 Thread M_20
may not be found. See JobConf(Class) or JobConf#setJar(String). 14/07/28 11:58:23 WARN mr.EsOutputFormat: Speculative execution enabled for reducer - consider disabling it to prevent data corruption 14/07/28 11:58:23 INFO util.Version: Elasticsearch Hadoop v2.0.0 [eb4487f75f] 14/07/28 11:58:23 INFO mr.E

Re: elasticsearch-hadoop: bulk indexing JSON

2014-07-25 Thread Costin Leau
Have you looked at the docs? http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html On Fri, Jul 25, 2014 at 11:04 PM, M_20 wrote: > Hi Guys, > > Could you please give me a java sample code of mapper and reducer in > Elasticsearch-hadoop? > I

Re: elasticsearch-hadoop: bulk indexing JSON

2014-07-25 Thread M_20
Hi Guys, Could you please give me a java sample code of mapper and reducer in Elasticsearch-hadoop? I'd appreciate it. Thanks -- You received this message because you are subscribed to the Google Groups "elasticsearch" group. To unsubscribe from this group and stop receiving

Re: ElasticSearch Hadoop

2014-07-17 Thread Costin Leau
On 7/17/14 8:38 PM, James Cook wrote: I've read through much of the documentation for es-hadoop, but I might be coming away with some misunderstandings. The setup docs for elasticsearch for apache hadoop (es-hadoop) uses the word /interact/ which is a bit vague. Elasticsearch for Apache H

ElasticSearch Hadoop

2014-07-17 Thread James Cook
I've read through much of the documentation for es-hadoop, but I might be coming away with some misunderstandings. The setup docs for elasticsearch for apache hadoop (es-hadoop) uses the word *interact* which is a bit vague. Elasticsearch for Apache Hadoop is an open-source, stand-alone, > sel

Re: ElasticSearch+Hadoop+Spark

2014-07-15 Thread Costin Leau
Hi, Issue #231, which I believe you raised, has been fixed in 2.x. Can you please try the latest 2.0.1.BUILD-SNAPSHOT and report back? Thanks! On 7/15/14 9:32 AM, János Háber wrote: Hi guys, I am writing a Spark application where I want to use ES with Hadoop. I have a lot of documents in

ElasticSearch+Hadoop+Spark

2014-07-14 Thread János Háber
Hi guys, I am writing a Spark application where I want to use ES with Hadoop. I have a lot of documents in ES that I now want to aggregate, but I can't. My documents have different fields: some have a "twitter" field with values, some have "facebook", etc. When I try to read the data from ES I g

Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-13 Thread James Campbell
u can create and >> update documents without having to include the id in the source document, >> so I think it would make sense to be able to do that with >> elasticsearch-hadoop also. >> >> On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote: >>

Re: Elasticsearch-Hadoop: EsOutputFormat and the 'date' type

2014-07-11 Thread Andrew Nixon
Hi Costin, thank you for your reply. My issue actually came down to the ordering of my matches. I had a 'match:*' as the first dynamic template which disabled norms. Although this template didn't explicitly define a type for any matched field it would automatically set the 'date' field to a string
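The ordering problem described here comes from Elasticsearch checking dynamic templates in the order they are listed and applying the first one that matches. A minimal sketch of that first-match behaviour, in plain Python rather than Elasticsearch code, follows. The template names and mapping bodies are hypothetical, and `pick_template` only models the `match` pattern, not the other matching options a real template can carry.

```python
from fnmatch import fnmatchcase


def pick_template(field_name, templates):
    """Return the name of the first template whose 'match' glob fits the field.

    Models first-match-wins ordering only; hypothetical helper for illustration.
    """
    for name, spec in templates:
        if fnmatchcase(field_name, spec["match"]):
            return name
    return None


# Catch-all listed first: it also captures the 'date' field.
catch_all_first = [
    ("disable_norms", {"match": "*", "mapping": {"norms": {"enabled": False}}}),
    ("dates", {"match": "date", "mapping": {"type": "date"}}),
]

# Specific template listed first: 'date' reaches the template meant for it.
specific_first = list(reversed(catch_all_first))

shadowed = pick_template("date", catch_all_first)
fixed = pick_template("date", specific_first)
```

With the catch-all first, the `date` field never reaches the template that sets its type, which matches the symptom reported in this message. Listing specific templates before the catch-all avoids it.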

Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Costin Leau
he id in the source document, > so I think it would make sense to be able to do that with > elasticsearch-hadoop also. > > On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote: > >> You need to specify the id of the document you want to update somehow. >> Since in

Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
elasticsearch, you can create and update documents without having to include the id in the source document, so I think it would make sense to be able to do that with elasticsearch-hadoop also. On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote: > > You need to specify the id

Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-11 Thread Brian Thomas
gets included in the source document. In elasticsearch, you can create and update documents without having to include the id in the source document, so I think it would make sense to be able to do that with elasticsearch-hadoop also. On Thursday, July 10, 2014 5:49:18 PM UTC-4, Costin Leau wrote

Re: Elasticsearch-Hadoop: EsOutputFormat and the 'date' type

2014-07-10 Thread Costin Leau
Make sure the template does match. This is not always obvious, but it is easy to test. First, check your template, and after defining it, send a request with a sample payload to see whether the doc gets properly created. A common mistake is defining the template after the ind

Re: Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-10 Thread Costin Leau
ternatives that you thought of? Cheers, On 7/7/14 10:48 PM, Brian Thomas wrote: I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the *es.mapping.id* configuration where you can specify that field in the document to use as an id, but in my case the source doc

Setting id of document with elasticsearch-hadoop that is not in source document

2014-07-07 Thread Brian Thomas
I am trying to update an elasticsearch index using elasticsearch-hadoop. I am aware of the *es.mapping.id* configuration where you can specify that field in the document to use as an id, but in my case the source document does not have the id (I used elasticsearch's autogenerated id
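For context on the request, the following sketch shows how Elasticsearch's own Bulk API carries the id: it travels in the action metadata line, separate from the document body. This is plain Python building the two-line bulk entry for an update, not es-hadoop code. `bulk_update_lines`, the index and type names, the id value, and the `status` field are all hypothetical.

```python
import json


def bulk_update_lines(index, doc_type, doc_id, partial_doc):
    """Build one Bulk API 'update' entry: a metadata line, then a body line.

    The id appears only in the metadata line, never in the document body.
    Hypothetical helper for illustration.
    """
    action = {"update": {"_index": index, "_type": doc_type, "_id": doc_id}}
    body = {"doc": partial_doc}
    # The Bulk API expects newline-delimited JSON with a trailing newline.
    return json.dumps(action) + "\n" + json.dumps(body) + "\n"


payload = bulk_update_lines("my-index", "my-type", "AUcx1", {"status": "processed"})
```

The point of the sketch is only that the wire format already separates the id from the source. How es-hadoop can be told to take the id from somewhere other than the source document is what the thread goes on to discuss.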

Re: Use arrays as update parameters with elasticsearch-hadoop-mr

2014-07-07 Thread James Campbell
, Jul 3, 2014 at 8:58 PM, James Campbell > wrote: > >> I would like to update an existing document that has an array from >> elasticsearch hadoop. >> >> I notice that I can do that from curl directly, for example: >> >> PUT arraydemo/temp/1 >> { >
