Re: Cannot Read Elasticsearch date type with format basic_date_time in Spark

2015-06-01 Thread Costin Leau
Hi, This has been fixed in master [1] - please try the latest snapshot. [1] https://github.com/elastic/elasticsearch-hadoop/issues/458 PS - We're moving to https://discuss.elastic.co/, please join us there for any future discussions! On 6/1/15 9:56 PM, Nicolas Phung wrote: Hi, It was with

Re: [Hadoop] Slow performance of Elasticsearch-Hadoop + Spark SQL

2015-06-01 Thread Costin Leau
The best way is to use a profiler to understand where time is spent. Spark, while significantly faster than Hadoop, cannot compete with cURL. The latter is a simple REST connection - the former triggers a JVM, Scala, Akka, Spark, which triggers es-hadoop which does the parallel call against

Re: Remote access about Spark and Elasticsearch

2015-05-16 Thread Costin Leau
Hi, As you pointed out, the issue is caused by using the AWS private IP vs the public one. The connector queries the nodes directly and will use the IP they advertise - in AWS, typically this is the private IP. As such, after the initial discovery (which is done using the public IP), the
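A minimal Scala sketch (not from the thread) of one common workaround: declare the public endpoint and disable node discovery so the connector sticks to it instead of the advertised private IPs. The address and index are placeholders, and `es.nodes.discovery` is assumed to be available in your connector version.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Placeholder public endpoint and index; adjust to your setup.
val conf = new SparkConf()
  .setAppName("es-from-aws")
  .set("es.nodes", "ec2-public-host.compute.amazonaws.com:9200") // reachable from outside AWS
  .set("es.nodes.discovery", "false") // do not switch to the (private) IPs the nodes advertise

val sc = new SparkContext(conf)
val docs = sc.esRDD("myindex/mytype")
println(docs.count())
```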

Re: Cannot Read Elasticsearch date type with format basic_date_time in Spark

2015-05-14 Thread Costin Leau
Looks like a bug. Can you please raise an issue? What Spark and Java versions are you using? Thanks, On 5/11/15 4:45 PM, Nicolas Phung wrote: Hello, I'm trying to build an RDD from Elasticsearch data with elasticsearch-spark 2.1.0.Beta4. I have the following field mapping : date:

Re: Load data into HDFS using ES-Spark

2015-05-14 Thread Costin Leau
Hi, I replied over at Discuss [1]. Thanks, [1] https://discuss.elastic.co/t/load-data-into-hdfs-using-es-spark/297/2 On 5/6/15 5:22 PM, Lucas Weissert wrote: Hello, I am reading data from Elasticsearch using Spark (ES-Spark). After I get the data using sc.esRDD(.../...) I want to store

Re: [Hadoop] - Difference between task creation for a write and read-update-write operation in ES

2015-05-14 Thread Costin Leau
Hi, First it would help to know what versions of Elasticsearch, Elasticsearch Hadoop, the JVM and Spark you are using. On 5/6/15 1:41 PM, piyush goyal wrote: Hi Costin, I saw a different behavior of task creation for a write to ES operation while working on my project. The difference is as follows:

Re: [hadoop] newbie question

2015-05-03 Thread Costin Leau
To add to Mark's answer: 1. Hadoop means a lot of things so typically, if you are not familiar with it or not a user, the answer tends to be no. 2. No. Data is indexed from Hadoop to Elasticsearch or vice-versa. See elastic.co/hadoop and the various presentations on this topic. Again, es-hadoop is

Re: es spark ignoring fields query parameter

2015-04-30 Thread Costin Leau
To specify the query, use the es.query param or the second string param in esRDD. The docs provide detailed information. In a future version, we'll throw an exception if a query is specified in the index. On Apr 30, 2015 5:03 PM, Israel Klein israel.kl...@gmail.com wrote: Hi, I am using
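A short Scala sketch of the two options the reply mentions; index, type and query strings are illustrative placeholders (use one of the two, they are shown together only for illustration).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf()
  .setAppName("es-query")
  .set("es.nodes", "localhost:9200")
  // Option 1: declare the query through es.query (URI query, query DSL or a file reference)
  .set("es.query", "?q=user:costin")

val sc = new SparkContext(conf)

// Option 2: pass the query as the second argument of esRDD - not inside the index/type string
val hits = sc.esRDD("myindex/mytype", "?q=user:costin")
```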

Re: Help on using Elastic Search Api from SPark

2015-04-30 Thread Costin Leau
It looks like it is related to an IntelliJ Maven import issue as it is not downloading any new dependency I am adding. Pardon? I don't understand what you are saying. Are we talking about the same project? Have you looked at the docs I mentioned? In particular [1] the install page? I ask since:

Re: Unable to get elasticsearch-hadoop working with Hive/Beeline

2015-04-29 Thread Costin Leau
Costin Leau wrote: Hi, It seems you are running into a classpath problem. The class mentioned in the exception (org/elasticsearch/hadoop/serialization/dto/Node) is part of the elasticsearch-hadoop-hive-XXX.jar - you can verify this yourself. The fact that it is not found at runtime suggests

Re: Unable to get elasticsearch-hadoop working with Hive/Beeline

2015-04-28 Thread Costin Leau
Hi, It seems you are running into a classpath problem. The class mentioned in the exception (org/elasticsearch/hadoop/serialization/dto/Node) is part of the elasticsearch-hadoop-hive-XXX.jar - you can verify this yourself. The fact that it is not found at runtime suggests that a different

Re: Cannot read from Elasticsearch using Spark SQL

2015-04-20 Thread Costin Leau
Beta3 works with Spark SQL 1.0 and 1.1. Spark SQL 1.2 was released after that and broke binary backwards compatibility; however this has been fixed in the master/dev version [1]. Note that Spark SQL 1.3 was released as well and again broke backwards compatibility, this time significantly, hence why there

Re: org.elasticsearch.hadoop.rest.EsHadoopNoNodesLeftException: Connection error (check network and/or proxy settings)- all nodes failed

2015-04-17 Thread Costin Leau
The error should be self-explanatory - the Elasticsearch cluster is not accessible. Make sure that 1.1.1.1 is accessible from the Spark cluster and that the REST interface is enabled and exposed. On Fri, Apr 17, 2015 at 11:50 AM, guoyiqi...@gmail.com wrote: my spark job running this is a error

Re: [hadoop] Not analyzed field

2015-04-17 Thread Costin Leau
By creating the index mapping before hand in Elasticsearch. This is also explained in the docs [1] [1] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/mapping.html#explicit-mapping On Fri, Apr 17, 2015 at 4:22 PM, jean.freg...@gmail.com wrote: Hi, Simple question : How do i tell

Re: about elasticsearch-hadoop error

2015-04-16 Thread Costin Leau
Based on your cryptic message I would guess the issue is likely that the jar you are building is incorrect as its manifest is invalid. Spark most likely is signed and thus extra content breaks this. See http://www.elastic.co/guide/en/elasticsearch/hadoop/master/troubleshooting.html#help On

Re: [hadoop] Reading severals index with Hive

2015-04-16 Thread Costin Leau
On Wednesday, April 15, 2015 at 11:02:43 AM UTC+2, Costin Leau wrote: First, what version of the es-hadoop connector are you using? Your syntax is correct (documentation explaining it here [1]) however it would help to use `es.resource.write` instead of just `es.resource`. Shouldn't really matter

Re: [hadoop] Reading severals index with Hive

2015-04-15 Thread Costin Leau
First, what version of the es-hadoop connector are you using? Your syntax is correct (documentation explaining it here [1]) however it would help to use `es.resource.write` instead of just `es.resource`. Shouldn't really matter but please try it out. Also memory should be an issue. Assuming you are

Re: Trouble with Timestamp format 'dd-MM-yyyy HH:mm:ss

2015-04-15 Thread Costin Leau
Hi, There are several components at play here and it's worth understanding which one does what. 1. Elasticsearch offers quite a number of options for dealing with Date objects as explained in the docs [1]. Note these options need to be defined beforehand on your index otherwise (or through an

Re: [Hadoop]Writing to Elastic thanks to Hive

2015-04-09 Thread Costin Leau
That's due to a bug in Hive (I assume you are running version 0.13). This has been fixed in the dev version which is available from maven [1] [1] http://www.elastic.co/guide/en/elasticsearch/hadoop/master/install.html#download-dev On 4/9/15 6:33 PM, valentin.dupont...@gmail.com wrote: Hi all,

Re: How to get aggregations working in Elasticsearch Spark adapter ?

2015-04-08 Thread Costin Leau
= sc.esRDD(logs/app, q5); What I get from the rdd are tuples (docID, Map[of the field=value]). Should I also expect to find facets? If so, how do I get them? On Wednesday, April 1, 2015 at 12:02:20 PM UTC+2, Costin Leau wrote: The short answer is that the connector relies

Re: Try to make es-hadoop run

2015-04-07 Thread Costin Leau
trouble? On Wednesday, April 1, 2015 at 11:17:37 AM UTC+2, Costin Leau wrote: See the configuration section of the page, in particular the es.version property. It defaults to 1.4.0 but you can change it to 1.5.0 or any other version. On 4/1/15 12:10 PM, stéphane Verdy wrote: Hi

Re: Try to make es-hadoop run

2015-04-07 Thread Costin Leau
... What can I do to solve my trouble? On Wednesday, April 1, 2015 at 11:17:37 AM UTC+2, Costin Leau wrote: See the configuration section of the page, in particular the es.version property. It defaults to 1.4.0 but you can change it to 1.5.0 or any other version. On 4/1/15 12:10 PM

Re: ES ignores queries through Spark

2015-04-07 Thread Costin Leau
You haven't specified the version of elasticsearch-spark. Either way the issue is likely caused by the fact that each query is executed locally on each target shard. In other words, your limit of 1000 entries is executed on each shard so at a maximum you will have 1000 x the number of shards. By
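A hedged Scala sketch of the per-shard behaviour described above: the size in the query applies per shard, so a global cap is easiest to enforce on the Spark side. Index name, query and node address are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(new SparkConf().setAppName("es-limit").set("es.nodes", "localhost:9200"))

// The 1000-entry limit in the query is applied per shard: an index with 5 shards
// can therefore return up to 5 x 1000 documents.
val perShard = sc.esRDD("myindex/mytype", """{"query":{"match_all":{}},"size":1000}""")

// For a global limit, cap the results on the driver instead.
val first1000 = perShard.take(1000)
```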

Re: Fields containing only whitespace are assigned null values by EsInputFormat in a JSON document

2015-04-07 Thread Costin Leau
Looks like a bug likely caused by the conversion to Hadoop Writable. Can you please raise an issue at Github (under es-hadoop)? Thanks, On 4/7/15 10:11 PM, Suchindra Agarwal wrote: I am using EsInputFormat on EC2 to query data from Elasticsearch. Example document: | { feature_key:{

Re: Writing data to elasticsearch from hive table

2015-04-06 Thread Costin Leau
Hi, Unfortunately there's not enough information to post a diagnostic. See this [1] section of the docs for more information on what's needed. In particular the Hive logs posted as a gist, since Hive tends to wrap the underlying issue with other exceptions, 'swallowing' the root cause.

Re: Understanding Elasticsearch-Hadoop

2015-04-04 Thread Costin Leau
Hi, Hadoop means a lot of things as it has a lot of components. I'm sorry to hear the resources you read don't give you enough answers. The 'definition' of Elasticsearch Hadoop is given in the documentation preface [1] which I quote below: Elasticsearch for Apache Hadoop is an ‘umbrella’

Re: Problem with ES- Shield

2015-04-03 Thread Costin Leau
, BEN SALEM Omar omar.bensa...@esprit.tn wrote: Hey Cosin, And when It comes to the marvel? Can you see why It's not seeing my indices? Thanks, On Thu, Apr 2, 2015 at 9:46 PM, Costin Leau costin.l...@gmail.com wrote

Re: Problem with ES- Shield

2015-04-02 Thread Costin Leau
The configuration example is correct - one can pass any configuration property there and it will be picked up, as explained here [1]. Additionally, to verify, one can turn on logging [2] to see that the connection is properly authenticated [1]

Re: How to make unstructured data available in Hadoop for analysis with Elasticsearch?

2015-04-01 Thread Costin Leau
is, I have created some dashboards with Kibana; now what I want to do is give my dashboards to the client I'm working with/for to do his analytics without letting him see what's going on in the ES cluster. What's the easiest way to do this? Thanks, On Wed, Apr 1, 2015 at 11:20 AM, Costin Leau

Re: How to make unstructured data available in Hadoop for analysis with Elasticsearch?

2015-04-01 Thread Costin Leau
If your data is in Hadoop, you would index it in Elasticsearch (through the es-hadoop connector) and then run your queries against it (either from Hadoop through the same connector) or through other tools (like Kibana) outside Hadoop. There are plenty of blog posts and documentation,
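A minimal Scala/Spark sketch of that flow, with a hypothetical HDFS path and record layout (tab-separated lines); the data is indexed through the connector and can then be queried from Kibana or the REST API.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(
  new SparkConf().setAppName("hdfs-to-es").set("es.nodes", "localhost:9200"))

// Hypothetical tab-separated input living in HDFS.
val lines = sc.textFile("hdfs:///data/events/part-*")
val docs = lines.map { line =>
  val Array(ts, user, msg) = line.split('\t')
  Map("timestamp" -> ts, "user" -> user, "message" -> msg)
}

// Index into Elasticsearch; placeholder index/type.
docs.saveToEs("events/log")
```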

Re: Try to make es-hadoop run

2015-04-01 Thread Costin Leau
step by step the online documentation which you have given me the link to! Do you have the right link for downloading the latest version of Elastic on YARN? I can't find it by myself, sorry. Regards, Stéphane 2015-03-31 17:04 GMT+02:00 Costin Leau costin.l...@gmail.com

Re: Elastichsearch CDH Hadoop Integration

2015-04-01 Thread Costin Leau
/elastic/elasticsearch/issues/9072 On 4/1/15 12:23 PM, Ravi sai kumar wrote: Thanks Costin Leau. My data will be in terabytes; local mode doesn't suit my requirement maybe. If I set the hadoop path, are there any changes we have to do for gateway.type? The existing property is gateway.type: local

Re: Elastichsearch CDH Hadoop Integration

2015-04-01 Thread Costin Leau
For best results and performance, point Elasticsearch to your local storage. It's just like any other service (MySQL, Postgres, etc.). On 4/1/15 11:48 AM, Ravi sai kumar wrote: I have a problem with Elasticsearch integrating with Hadoop. I am new to Elasticsearch. I have installed it on CDH

Re: How to get aggregations working in Elasticsearch Spark adapter ?

2015-04-01 Thread Costin Leau
The short answer is that the connector relies on scan/scroll search for its core functionality. And with aggs it needs to switch the way it queries the cluster to a count search. This is the last major feature that needs to be addressed before the 2.1 release. There's also an issue for it raised

Re: Settings Exception with repository-hdfs plugin

2015-04-01 Thread Costin Leau
You probably have a syntax error (make sure you are using spaces and NOT tabs in your yml file). I just did a quick check and it worked with the latest ES. As inspiration you can try the configuration used within the test suite (checked every night) available here [1]. Additionally you can

Re: [Hadoop] Specifying username/password information for Shield-configured Elasticsearch

2015-04-01 Thread Costin Leau
See this section of the docs: http://www.elastic.co/guide/en/shield/current/hadoop.html On 4/1/15 9:35 PM, Michael Young wrote: I have Elasticsearch 1.4.4 and Shield 1.0.2 configured in my environment. I'm able to successfully connect to my cluster without issues using Active Directory as my
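A small Scala sketch of passing the credentials through the connector configuration, as the linked Shield/es-hadoop docs describe; the `es.net.http.auth.user` / `es.net.http.auth.pass` properties are the ones mentioned later in this thread, and the values here are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val conf = new SparkConf()
  .setAppName("es-shield")
  .set("es.nodes", "localhost:9200")
  .set("es.net.http.auth.user", "es_user")     // placeholder credentials -
  .set("es.net.http.auth.pass", "es_password") // load them from a secure source in practice

val sc = new SparkContext(conf)
val rdd = sc.esRDD("secured-index/doc")
```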

Re: [Hadoop] Specifying username/password information for Shield-configured Elasticsearch

2015-04-01 Thread Costin Leau
.*, but not es.net.http.* that I can see. Thank you again! -- Michael On Wed, Apr 1, 2015 at 4:44 PM, Costin Leau costin.l...@gmail.com wrote: Actually they are also part of the es-hadoop documentation under the configuration

Re: [Hadoop] Specifying username/password information for Shield-configured Elasticsearch

2015-04-01 Thread Costin Leau
#_basic_authentication On 4/1/15 11:16 PM, Michael Young wrote: Thank you! I was looking in the ES-Hadoop documentation and didn't see these settings. I didn't realize they were in the Shield documentation. I'll give this a shot. -- Michael On Wed, Apr 1, 2015 at 2:44 PM, Costin Leau costin.l

Re: Try to make es-hadoop run

2015-03-31 Thread Costin Leau
Debugging Hadoop is ... tricky. Most likely the parameter you are passing (es.version) is incorrect - there's no such Elasticsearch version. 2.1.0.Beta3 is the connector (es-hadoop) version; es.version indicates the version of Elasticsearch itself. My advice is to first start with basic steps

Re: How to delete documents from Elasticsearch in Spark?

2015-03-30 Thread Costin Leau
If the issue is still open, then the feature hasn't been addressed. When an issue is closed, you'll see the reason why that is - typically with links to the commit and also the doc updates (in case of a new feature). The docs are built roughly every hour so any update will be seen on the website

Re: java.lang.UnsupportedClassVersionError: org/elasticsearch/ElasticsearchException : Unsupported major.minor version 51.0

2015-03-27 Thread Costin Leau
Double check you are using JDK 1.7 or higher for your Elastic/JBoss combination. It looks like you are using JDK 1.6 - the exception indicates that the Java runtime doesn't recognize the bytecode version (51.0 stands for Java 7 class files [1]). The class files major versions are: Java 8 =

Re: [Spark] SchemaRdd saveToEs produces Bad JSON errors

2015-03-24 Thread Costin Leau
Hi, It appears the problem lies with what type of document you are trying to save to ES - which is basically invalid. For some reason the table/rdd schema gets lost and the content is serialized without it: |[38,38]| I'm not sure why this happens hence why I raised an issue [1] Thanks, [1]

Re: spark version, elasticsearch-hadoop version, akka version sync up

2015-03-17 Thread Costin Leau
es-hadoop doesn't depend on Akka, only on Spark. The Scala version that es-hadoop is compiled against matches the one used by the Spark version it is compiled against for each release - typically this shouldn't pose a problem. Unfortunately, despite the minor version increments, some of the Spark APIs

Re: spark version, elasticsearch-hadoop version, akka version sync up

2015-03-17 Thread Costin Leau
with spark core 1.1, 1.2 and 1.3 -- as long as I don't use Spark SQL when using 1.2 and 1.3 Costin - I am amazed by your ability to keep all this straight - my head would explode dealing with all the dependencies in flux. Kudos to you. On Tuesday, March 17, 2015 at 2:12:06 PM UTC-7, Costin

Re: Kibana with Hadoop directly?

2015-03-12 Thread Costin Leau
On the bright side, you can use the es-hadoop connector [1] to easily get data from Hadoop/HDFS to Elasticsearch and back, whatever your Hadoop stack (Map/Reduce, Cascading, Pig, Hive, Spark, Storm). [1] https://www.elastic.co/products/hadoop On Fri, Mar 13, 2015 at 3:15 AM, aa...@definemg.com wrote:

Re: elasticsearch-hadoop-hive exception when writing arraymapstring,string column

2015-03-12 Thread Costin Leau
The exception occurs because you are trying to extract a field (the script parameters) from a complex type (array) and not a primitive. The issue with that (and why it's currently not supported) is that the internal structure of the complex type can get quite complex and its serialized, JSON

Re: Hive to elasticsearch Parsing exception.

2015-03-12 Thread Costin Leau
Likely the issue is caused by the fact that in your manual mapping, the NULL value is not actually mapped to null but actually to a string value. You should be able to get around it by converting NULL to a proper NULL value which es-hadoop can recognize; additionally you can 'translate' it to a

Re: [Hadoop][Spark] Exclude metadata fields from _source

2015-02-18 Thread Costin Leau
Hi Itay, Sorry I missed your email. I'm not clear from your post what your documents look like - can you post a gist somewhere with your JSON input that you are sending to Elasticsearch? Typically the metadata appears in the _source if it is declared that way. You should be able to go around

Re: eshadoop : why lowercasing field names ?

2015-02-15 Thread Costin Leau
Hi, Actually the documentation is incorrect. The problem lies mainly with Hive, which treats field names as case-insensitive. In Pig the behavior was inconsistent, depending on the version and the script used. Hence why using the mapping feature is recommended. However, the doc needs to be

Re: Hive query from elasticsearch

2015-02-10 Thread Costin Leau
You cannot use Hive SQL without Hive. In other words, with es-hadoop, you cannot just use arbitrary SQL on top of Elastic. What you can do, however, is interact with Elasticsearch from within a Hive environment, meaning you can execute Hive SQL on top of Hive, which underneath communicates with

Re: Elastic search service not stopping on windows server 2012 x64

2015-02-05 Thread Costin Leau
On the thread you mentioned, the OP didn't follow up on whether the links (containing connection pools, jTDS driver or options for the MS driver) helped. Again, it's not about removing the JDBC driver but rather replacing it or adding some options to make it behave correctly. On 2/5/15 5:28

Re: Accessing ES in Hadoop

2015-02-05 Thread Costin Leau
look like. We could not find anything online about that. Thanks in advance. - Douglas On Tuesday, February 3, 2015 at 4:10:50 PM UTC-5, Costin Leau wrote: Hi, Whether ES is running on YARN, Linux, Windows, Docker or AWS doesn't matter to the clients as long as they have access

Re: Accessing ES in Hadoop

2015-02-03 Thread Costin Leau
Hi, Whether ES is running on YARN, Linux, Windows, Docker or AWS doesn't matter to the clients as long as they have access to the instance. In other words, logstash doesn't see any difference in ES if it's running on Linux vs YARN. However one has to take into account the difference in the

Re: [HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-30 Thread Costin Leau
) at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979) at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107) On Thursday, 29 January 2015 23:00:57 UTC, Costin Leau wrote: Not sure if you've seen my previous message but please try out the master. On Fri, Jan 30

Re: [HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-30 Thread Costin Leau
Works fine in master - see the comment added to your gist. On 1/30/15 12:59 AM, Neil Andrassy wrote: I get the same problem with the json string approach too. On Thursday, 29 January 2015 22:24:07 UTC, Neil Andrassy wrote: I'm using ES-Hadoop 1.2.0.Beta3 Spark variant with Scala 2.10.4

Re: ES with Hadoop

2015-01-29 Thread Costin Leau
I'm not sure whether you have one or multiple questions but it's perfectly fine to use ES for both storage and search. You can use HDFS as a snapshot/backup store to further improve the resilience of your system. Millions of documents are not an issue for ES. On 1/29/15 4:29 PM, Manoj Singh

Re: [HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-29 Thread Costin Leau
What es-hadoop/Spark versions are you using? Can you post a snippet/gist on how you are calling saveToEs and what the es-spark configuration looks like (does the RDD contain JSON or rich objects, etc..)? There are multiple ways to specify the parentId and in master (dev build) this should work no problem.

Re: [HADOOP] [Spark] Problem with encoding of parentId containing backslash

2015-01-29 Thread Costin Leau
I suggest trying master (the dev build - see the docs for more information [1]). You should not have to use the JSON format. By the way, one addition in master is that you can use case classes instead of Maps and es-spark will know how to serialize them. That plus having the metadata separated from
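A hedged Scala sketch of what that could look like with the dev build the reply points to: a case class RDD saved with `es.mapping.id` and `es.mapping.parent`. Class, field and index names are illustrative, not from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

// Illustrative child document; parentId deliberately contains a backslash.
case class Comment(id: String, parentId: String, body: String)

val sc = new SparkContext(new SparkConf().setAppName("es-parent").set("es.nodes", "localhost:9200"))
val comments = sc.makeRDD(Seq(Comment("c1", """AB\CD""", "hello")))

// Point the connector at the fields holding the document id and the parent id.
comments.saveToEs("blog/comment", Map(
  "es.mapping.id"     -> "id",
  "es.mapping.parent" -> "parentId"))
```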

Re: Where is the data stored? ElasticSearch YARN

2015-01-19 Thread Costin Leau
Installing a plugin or changing a configuration means restarting each node and since YARN is not persistent, one would have to handle this outside. es-yarn could potentially address that, however at that point it becomes more of a Puppet/Chef feature, which is outside the scope of the project. The

Re: elasticsearch-hadoop for spark, index documents from a RDD in different index by day: myindex-2014-01-01 for example

2015-01-19 Thread Costin Leau
Hi Julien, I'm unclear on what you are trying to achieve and what doesn't work. es-hadoop allows either a static index/type or a dynamic one [1] [2]. One can also use a 'formatter', so for example you can use a pattern like {@timestamp:yyyy-MM-dd} - meaning the field @timestamp will be used as
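A Scala sketch of the dynamic-resource idea, assuming the `{field|date-format}` formatter syntax from the connector docs; the field values, index prefix and node address are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(new SparkConf().setAppName("es-daily").set("es.nodes", "localhost:9200"))

// Each document carries a @timestamp field that drives the target index.
val docs = sc.makeRDD(Seq(
  Map("@timestamp" -> "2014-01-01T10:15:00Z", "msg" -> "a"),
  Map("@timestamp" -> "2014-01-02T08:00:00Z", "msg" -> "b")))

// The index name is resolved per document, e.g. myindex-2014-01-01.
docs.saveToEs("myindex-{@timestamp|yyyy-MM-dd}/log")
```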

Re: Unable to write in ElasticSearch using Spark in java (throws java.lang.IncompatibleClassChangeError: Implementing class exception)

2015-01-16 Thread Costin Leau
Hi, Most likely you have some classes compiled against some old libraries - it could even be your jar. Spark relies on Java serialization so if your classes or library change, you need to make sure the updated version is used throughout the entire chain. Oh, and by the way, you seem to be

Re: Elasticsearch and Hadoop on the same cluster

2015-01-14 Thread Costin Leau
Hi Charles, No need to repost - if you are looking for real-time replies, try the IRC. What exactly are you looking for? Elasticsearch and Hadoop are 'services' that can share the same hardware. Since ES does not depend on Hadoop or vice-versa, you can install each one as you typically do - the

Re: Failed stopping 'elasticsearch-service-x64' service

2015-01-06 Thread Costin Leau
, Garrett On Saturday, January 3, 2015 10:10:42 AM UTC-6, Costin Leau wrote: Do you see anything in the logs? Can you try removing and reinstalling the service? What's your OS/configuration? On 1/2/15 10:32 PM, Garrett Johnson wrote: By own it's own I mean service stop

Re: Failed stopping 'elasticsearch-service-x64' service

2015-01-05 Thread Costin Leau
, Costin Leau wrote: Do you see anything in the logs? Can you try removing and reinstalling the service? What's your OS/configuration? On 1/2/15 10:32 PM, Garrett Johnson wrote: By own it's own I mean service stop or using services.msc and clicking restart on the service. Both attempts

Re: Failed stopping 'elasticsearch-service-x64' service

2015-01-03 Thread Costin Leau
Do you see anything in the logs? Can you try removing and reinstalling the service? What's your OS/configuration? On 1/2/15 10:32 PM, Garrett Johnson wrote: By own it's own I mean service stop or using services.msc and clicking restart on the service. Both attempts get the same error. On

Re: Elasticsearch-Hadoop Data Locality

2014-12-31 Thread Costin Leau
For the record, what Spark and es-hadoop versions are you using? For each shard in your index, es-hadoop creates one Spark task which gets informed of the whereabouts of the underlying shard. So in your case, you would end up with 20 tasks/workers, one per shard, streaming data back to the
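A quick Scala check of that behaviour: the RDD created by the connector exposes one partition per shard, so for a 20-shard index the count below should be 20. Index name and node address are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(new SparkConf().setAppName("es-partitions").set("es.nodes", "localhost:9200"))

// One Spark partition - and hence one task - per shard of the target index.
val rdd = sc.esRDD("myindex/mytype")
println(s"partitions = ${rdd.partitions.length}")
```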

Re: EsBolt - Schema Definition

2014-12-29 Thread Costin Leau
EsBolt 'passes' the information to Elasticsearch. To define how the data is indexed, simply define your mapping in Elasticsearch directly. Without any mapping, Elasticsearch will try to automatically detect and map your data accordingly which might match your expectation or not. For example see

Re: [hadoop] ElasticsearchIllegalArgumentException

2014-12-29 Thread Costin Leau
It looks like you have a mapping problem in Elasticsearch. Typically this occurs when you try incompatible/multiple value types on the same key (for example setting a string to a number field or an object to a string field - this looks like the case, etc...). The field in question looks to be

Re: Elasticsearch Spark EsHadoopNoNodesLeftException in cluster Mode

2014-12-29 Thread Costin Leau
Check the node status and see whether it behaves normally or not while data is being loaded. If the load is too high and the node not properly configured it could keep on rejecting data or a GC might be triggered, causing es-hadoop to fail the job. On 12/23/14, Rahul Kumar

Re: Newbie question about Spark and Elasticsearch

2014-12-18 Thread Costin Leau
of help :) thanks. chris On Monday, December 8, 2014 at 10:19:12 AM UTC-5, Costin Leau wrote: Hi, First off I recommend using the native integration (aka the Java/Scala APIs) instead of MapReduce. The latter works but the former is better performing and more flexible. ES works

Re: ElasticSearch hadoop - .EsHadoopSerializationException

2014-12-16 Thread Costin Leau
Text as a key, but depending on type Text, LongWritable, BooleanWritable or DoubleWritable as value in that map. When I changed everything to be Text it started working. Is this intended behaviour? Cheers, Kamil. On Friday, December 12, 2014 8:37:03 PM UTC+1, Costin Leau wrote: Hi

Re: Where is the data stored? ElasticSearch YARN

2014-12-16 Thread Costin Leau
I recommend reading the project documentation [1]; there's a dedicated section that covers storage [2]. [1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/index.html [2] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/ey-setup.html#_storage On 12/16/14

Re: [hadoop] java.lang.NoClassDefFoundError: org/elasticsearch/hadoop/mr/EsOutputFormat

2014-12-14 Thread Costin Leau
Hi, It looks like es-hadoop is not part of your classpath (hence the NCDFE). This might be either due to some misconfiguration of your classpath or due to the way the Configuration object is used. It looks like you are using it correctly though typically I use Job(Configuration) instead of

Re: ElasticSearch hadoop - .EsHadoopSerializationException

2014-12-12 Thread Costin Leau
Hi, This error is typically tied to a classpath issue - make sure you have only one elasticsearch-hadoop jar version in your classpath and on the Hadoop cluster. On 12/12/14 5:56 PM, Kamil Dziublinski wrote: Hi guys, I am trying to run a MR job that reads from HDFS and stores into

Re: Newbie question about Spark and Elasticsearch

2014-12-08 Thread Costin Leau
Hi, First off I recommend using the native integration (aka the Java/Scala APIs) instead of MapReduce. The latter works but the former is better performing and more flexible. ES works in a similar fashion to the HDFS store - the data doesn't go through the master; rather, each task has its own

Re: Progress on Hive Push Down Filtering

2014-12-04 Thread Costin Leau
Hi, There are two aspects when dealing with large tables. 1. Projection The table mapping/definition is necessary as it indicates what information is needed - a small mapping excludes a lot of unnecessary data. 2. Push Down filtering Unfortunately there hasn't been much happening on this

Re: How to Upgrade from 1.1.1 to 1.2.2 in a windows enviroment (as windows service)

2014-12-04 Thread Costin Leau
-specific and necessitate re-installing it after every upgrade? On Thursday, July 17, 2014 3:51:31 AM UTC-4, Costin Leau wrote: Hi, Remove the old service (service remove) then install it again using the new path. Going forward you might want to look into using file-system links

Re: Progress on Hive Push Down Filtering

2014-12-04 Thread Costin Leau
/in the pipeline? Cheers James On Thursday, 4 December 2014 20:04:17 UTC+11, Costin Leau wrote: Hi, There are two aspects when dealing with large tables. 1. Projection The table mapping/definition is necessary as it indicates what information is needed - a small mapping excludes

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Costin Leau
the debug output : [title, categoryId] {title=[Samsung EF-CI950BCEGWW S View], categoryId=[3485]} title : null category : null Thanks again! On Tuesday, December 2, 2014 at 12:36:57 PM UTC+1, Costin Leau wrote: Simply specify the fields that you are interested in, in the query and you are good to go

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Costin Leau
a org.elasticsearch.hadoop.mr.WritableArrayWritable object. How can I get my field content out of that? On Wednesday, December 3, 2014 at 2:10:24 PM UTC+1, Costin Leau wrote: That's because your MapWritables don't use Strings as keys but rather org.apache.hadoop.io.Text. In other words, you can see the data
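A hedged Scala sketch of a mapper that looks fields up with Text keys and unwraps array values, along the lines of the reply; class and field names are illustrative.

```scala
import org.apache.hadoop.io.{ArrayWritable, MapWritable, Text, Writable}
import org.apache.hadoop.mapreduce.Mapper

// EsInputFormat hands each document over as a MapWritable keyed by Text (not String);
// multi-valued fields arrive wrapped in an ArrayWritable.
class TitleMapper extends Mapper[Text, MapWritable, Text, Text] {
  override def map(docId: Text, doc: MapWritable,
                   ctx: Mapper[Text, MapWritable, Text, Text]#Context): Unit = {
    doc.get(new Text("title")) match {
      case arr: ArrayWritable => ctx.write(docId, new Text(arr.get().map(_.toString).mkString(",")))
      case w: Writable        => ctx.write(docId, new Text(w.toString))
      case null               => // field absent in this document
    }
  }
}
```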

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-03 Thread Costin Leau
: [Ljava.lang.String;@35112ff7 with the get(), I'm getting this: title : [Lorg.apache.hadoop.io.Writable;@666f5678 On Wednesday, December 3, 2014 at 4:21:40 PM UTC+1, Costin Leau wrote: You're getting back an array ([Samsung EF-C]) - a Writable wrapper around org.hadoop.io.ArrayWritable

Re: elasticsearch-hadoop - getting specified fields from elasticsearch as an input to a mapreduce job.

2014-12-02 Thread Costin Leau
Simply specify the fields that you are interested in, in the query and you are good to go. [1] http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-fields.html On 12/2/14 12:52 PM, Elias Abou Haydar wrote: I'm trying to write a mapreduce job where I can query
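A hedged Scala sketch of wiring that into the job configuration: the URI query asks Elasticsearch for just the needed fields before the records reach the mapper. Index, field names and node address are placeholders, and the `fields` URI parameter is assumed to be available in your ES 1.x version.

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{MapWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.elasticsearch.hadoop.mr.EsInputFormat

val conf = new Configuration()
conf.set("es.nodes", "localhost:9200")
conf.set("es.resource", "products/product")
// Ask only for the fields the job needs instead of the full _source.
conf.set("es.query", "?q=*:*&fields=title,categoryId")

val job = Job.getInstance(conf)
job.setInputFormatClass(classOf[EsInputFormat[Text, MapWritable]])
job.setMapOutputKeyClass(classOf[Text])
job.setMapOutputValueClass(classOf[MapWritable])
```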

Re: Can't integrate Elasticsearch with Hive

2014-11-28 Thread Costin Leau
, November 27, 2014 3:29:04 PM UTC-8, Costin Leau wrote: Hi, The issue is most likely caused by two different versions of es-hadoop within your classpath, probably es-hadoop 2.0.x (2.0.2) and 2.1.x (2.1.0.Beta3). If they are picked up by Hive or Hadoop it means

Re: Hive write data to elastic search

2014-11-28 Thread Costin Leau
On Thursday, November 27, 2014 2:59:49 PM UTC-8, Costin Leau wrote: Hi Atul, What does your Hive script look like? What versions of Hive and es-hadoop are you using? Can you post them along with the stacktrace on Gist or Pastebin [1]? The exception message is pretty straight

Re: Hive write data to elastic search

2014-11-27 Thread Costin Leau
Hi Atul, What does your Hive script look like? What versions of Hive and es-hadoop are you using? Can you post them along with the stacktrace on Gist or Pastebin [1]? The exception message is pretty straight-forward - either 'es.resource' is missing or the resource type is incorrectly

Re: Can't integrate Elasticsearch with Hive

2014-11-27 Thread Costin Leau
Hi, The issue is most likely caused by two different versions of es-hadoop within your classpath, probably es-hadoop 2.0.x (2.0.2) and 2.1.x (2.1.0.Beta3). If they are picked up by Hive or Hadoop it means the JVM will have two jars with classes under the same package name. This leads to weird

Re: Spark streaming elasticsearch dependencies

2014-11-19 Thread Costin Leau
Cascading and its dependencies are not available in Maven Central but in their own repo (which es-hadoop cannot specify through its pom). However since you are not using Cascading but rather Spark, I suggest you use the dedicated jar for that, namely: elasticsearch-spark_2.10

Re: Elasticsearch Hadoop WRITE operation not using Reducer

2014-11-03 Thread Costin Leau
Hi, es-hadoop does not use either mappers or reducers; the Map/Reduce integration relies on the Input/OutputFormat which can be invoked either from a Mapper or from a Reducer. Your Reducer might not be invoked for a variety of reasons; typically the map and reduce phases have different output

Re: [hadoop][pig] Using ES UDF to connect over HTTPS through Apache to ES

2014-10-22 Thread Costin Leau
That's because currently, es-hadoop does not support SSL (and thus HTTPS). There are plans to make this happen in 2.1 but we are not there yet. In the meantime I suggest trying to use either an HTTP proxy or an HTTP-to-HTTPS proxy. Cheers, On 10/22/14 7:11 PM, Aidan Higgins wrote: Hi, I am

Re: Elasticsearch-Hadoop repository plugin Cloudera Hadoop 2.0.0-cdh4.6.0

2014-10-14 Thread Costin Leau
You need the appropriate Hadoop jar on your classpath, otherwise the es-hadoop repository plugin cannot connect to HDFS. In the repo, you'll find two versions for vanilla Hadoop 1 and Hadoop 2 - however, if you are using a certain distro, for best compatibility you should use that distro's client jars.

Re: Is there a way to update ES records using Spark?

2014-10-13 Thread Costin Leau
You can use the mapping options [1], namely `es.mapping.id`, to specify the id field of your documents. [1] http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/master/configuration.html#cfg-mapping On Mon, Oct 13, 2014 at 12:55 PM, Preeti Raj - Buchhada pbuchh...@gmail.com wrote: Anyone has
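A short Scala sketch of that option; the id field, index and the update operation are illustrative assumptions rather than code from the thread.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._

val sc = new SparkContext(new SparkConf().setAppName("es-update").set("es.nodes", "localhost:9200"))

// Each document carries its own id in the "id" field (illustrative layout).
val docs = sc.makeRDD(Seq(Map("id" -> "42", "status" -> "shipped")))

docs.saveToEs("orders/order", Map(
  "es.mapping.id"      -> "id",     // use the "id" field as the Elasticsearch _id
  "es.write.operation" -> "update"  // assumption: switch from index to update if your connector version supports it
))
```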

Re: Using Pig/Spark on ElasticSearch (as External Storage)

2014-10-12 Thread Costin Leau
It depends on various factors. Do you put all the data under one index or is it one index per day/month/hour? What type of script and performance degradation do you see? If it's easier feel free to reach out on IRC. I'll be traveling this week but will be back the next one. Cheers On Oct 12,

Re: [Hadoop][Storm] Aim of EsSpout

2014-10-09 Thread Costin Leau
Hi, EsSpout returns the results of a query, which currently is a one-time event since Elastic is a store, not a queue. So in this regard it is useful for short-lived topologies that require the query data for processing. There are plans to extend this in the future for things like

[ANN] Elasticsearch Hadoop 2.0.2 and 2.1 Beta 2 with Storm and Spark SQL support

2014-10-08 Thread Costin Leau
Hi everyone, Elasticsearch Hadoop 2.0.2 and 2.1 Beta2, featuring Apache Storm integration and Apache Spark SQL, have been released. You can read all about them here [1]. Feedback is welcome! Cheers, [1] http://www.elasticsearch.org/blog/elasticsearch-hadoop-2-0-2-and-2-1-beta2/ -- Costin

Re: elasticsearch-hadoop sporadic timeouts

2014-10-03 Thread Costin Leau
. Elasticsearch cluster has 4 nodes. Where can I find the bulk size/entries numbers? Thanks, Zach On Wed, Oct 1, 2014 at 7:19 AM, Costin Leau costin.l...@gmail.com wrote: The error indicates the ES nodes don't reply

Re: elasticsearch-hadoop sporadic timeouts

2014-10-03 Thread Costin Leau
, Zach On Fri, Oct 3, 2014 at 10:46 AM, Costin Leau costin.l...@gmail.com wrote: You can always enable TRACE, though that is likely to create way too much information in production and slow things down considerably. The first thing you can do is minimize
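Along the lines of the reply, bulk behaviour is controlled by the connector's `es.batch.*` settings; a hedged Scala sketch with illustrative values (each task keeps its own bulk buffer, so the cluster sees tasks x batch size). Property availability depends on your connector version.

```scala
import org.apache.hadoop.conf.Configuration

// Illustrative values only - tune them against your cluster, they are not recommendations.
val conf = new Configuration()
conf.set("es.nodes", "localhost:9200")
conf.set("es.resource", "myindex/mytype")
conf.set("es.batch.size.entries", "500")     // documents per bulk request, per task
conf.set("es.batch.size.bytes", "1mb")       // upper bound on the bulk request size
conf.set("es.batch.write.retry.count", "5")  // retries before the task fails
conf.set("es.batch.write.retry.wait", "30s") // wait between retries
```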

Re: elasticsearch-hadoop sporadic timeouts

2014-10-01 Thread Costin Leau
in the logs is that now I don't see the SimpleHttpConnectionManager warnings. Any ideas what we could try next? Thanks, Zach On Tuesday, September 30, 2014 10:54:27 AM UTC-5, Costin Leau wrote: Can you please try the 2.0.2.BUILD-SNAPSHOT? I think you might be running into issue #256 which

Re: elasticsearch-hadoop sporadic timeouts

2014-09-30 Thread Costin Leau
What version of es-hadoop/es/cascading are you using? On 9/30/14 6:16 PM, Zach Cox wrote: Hi - we're having problems with one of our map-reduce jobs that writes to Elasticsearch. Lots of map tasks are failing due to ES being unavailable, with logs like this:

Re: elasticsearch-hadoop sporadic timeouts

2014-09-30 Thread Costin Leau
, Zach On Tuesday, September 30, 2014 10:25:10 AM UTC-5, Costin Leau wrote: What version of es-hadoop/es/cascading are you using? On 9/30/14 6:16 PM, Zach Cox wrote: Hi - we're having problems with one of our map-reduce jobs that writes to Elasticsearch. Lots of map tasks

Re: elasticsearch and spark

2014-09-28 Thread Costin Leau
)] On Sunday, August 10, 2014 4:53:39 AM UTC-7, Costin Leau wrote: Sorry to hear that. I strongly suggest updating to the latest SNAPSHOT as both dynamic and es.mapping.id (and co) are working (there are several tests hitting on exactly this in the suite
