Hi,

If it takes 1-5 seconds to get an ES response back, that's pretty slow.  Are
these ES aggregation queries?  Costin may be right about GC possibly causing
timeouts.  SPM <http://sematext.com/spm/> can give you all key Spark and
Elasticsearch metrics, including various JVM metrics.  If the problem is GC,
you'll see it.  If you monitor both the Spark side and the ES side, you
should be able to find the correlation with SPM.

Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/


On Wed, Apr 22, 2015 at 5:43 PM, Costin Leau <costin.l...@gmail.com> wrote:

> Hi,
>
> First off, for Elasticsearch questions it is worth pinging the Elastic
> mailing list, as that one is monitored more closely than this one.
>
> Back to your question, Jeetendra is right that the exception indicates
> no data is flowing back to the es-connector and Spark.
> The default timeout is 1m [1], which should be more than enough for a
> typical scenario. As a side note, the scroll size is 50 per task
> (so 150 suggests 3 tasks).
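>
> For reference, here is a minimal sketch of raising the timeout on the
> Spark side. The property names come from the configuration docs [1]; the
> values below are only illustrative:
>
>   import org.apache.spark.{SparkConf, SparkContext}
>
>   val conf = new SparkConf()
>     .setAppName("es-read")
>     .set("es.nodes", "localhost:9200")  // adjust to your cluster
>     .set("es.http.timeout", "5m")       // REST request timeout, default 1m
>     .set("es.scroll.size", "50")        // docs per scroll request, per task
>   val sc = new SparkContext(conf)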
>
> Once the query is made, scrolling the documents is fast - likely there is
> something else at hand that causes the connection to time out.
> In such cases, you can enable logging on the REST package and see what
> type of data transfer occurs between ES and Spark.
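>
> As a minimal sketch, assuming the stock log4j 1.x setup that Spark ships
> with, add this to conf/log4j.properties:
>
>   # verbose wire-level logging for the es-hadoop REST layer
>   log4j.logger.org.elasticsearch.hadoop.rest=TRACE
>
> TRACE is very chatty; DEBUG is usually enough to see the request/response
> traffic.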
>
> Do note that if a GC occurs, it can freeze Elasticsearch (or Spark), which
> might trigger the timeout. Consider monitoring Elasticsearch during the
> query and see whether anything jumps out - in particular memory pressure.
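>
> One way to check the Spark side is to enable GC logging on the executors -
> a sketch, assuming a Java 7 JVM (set in spark-defaults.conf or via --conf
> on spark-submit):
>
>   spark.executor.extraJavaOptions  -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
>
> Long GC pauses that coincide with the timeouts would point to memory
> pressure.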
>
> Hope this helps,
>
> [1]
> http://www.elastic.co/guide/en/elasticsearch/hadoop/master/configuration.html#_network
>
> On 4/22/15 10:44 PM, Adrian Mocanu wrote:
>
>> Hi
>>
>> Thanks for the help. My ES is up.
>>
>> Out of curiosity, do you know what the timeout value is? There are
>> probably other things happening to cause the timeout; I don’t think my ES
>> is that slow, but it’s possible that ES is taking too long to find the
>> data. What I see happening is that it uses scroll to get the data from
>> ES, about 150 items at a time. The usual delay when I perform the same
>> query from a browser plugin ranges from 1-5 sec.
>>
>> Thanks
>>
>> *From:* Jeetendra Gangele [mailto:gangele...@gmail.com]
>> *Sent:* April 22, 2015 3:09 PM
>> *To:* Adrian Mocanu
>> *Cc:* u...@spark.incubator.apache.org
>> *Subject:* Re: ElasticSearch for Spark times out
>>
>> Basically, a read timeout means that no data arrived within the specified
>> receive-timeout period.
>>
>> A few things I would suggest:
>>
>> 1. Is your ES cluster up and running? (See the quick check sketched below.)
>>
>> 2. If it is, then reduce the size of the index - make it a few KB - and
>> test again.
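>>
>> A minimal sketch of such a liveness check in Scala (the host and port are
>> illustrative - point it at one of your ES nodes):
>>
>>   import scala.io.Source
>>
>>   // Hit the cluster health endpoint; expect "status":"green" (or "yellow").
>>   val health = Source.fromURL("http://localhost:9200/_cluster/health").mkString
>>   println(health)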
>>
>> On 23 April 2015 at 00:19, Adrian Mocanu <amoc...@verticalscope.com
>> <mailto:amoc...@verticalscope.com>> wrote:
>>
>> Hi
>>
>> I use the ElasticSearch package for Spark and very often it times out
>> reading data from ES into an RDD.
>>
>> How can I keep the connection alive (why doesn’t it stay alive? Is this a
>> bug?)
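>>
>> For context, the read is presumably along these lines - a minimal sketch;
>> the index name and query are placeholders:
>>
>>   import org.apache.spark.{SparkConf, SparkContext}
>>   import org.elasticsearch.spark._  // adds esRDD() to SparkContext
>>
>>   val sc = new SparkContext(new SparkConf().setAppName("es-read"))
>>   val rdd = sc.esRDD("myindex/mytype", "?q=*")  // hypothetical index/type
>>   rdd.take(5).foreach(println)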
>>
>> Here’s the exception I get:
>>
>> org.elasticsearch.hadoop.serialization.EsHadoopSerializationException:
>> java.net.SocketTimeoutException: Read timed out
>>
>>                  at
>> org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:86)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.hadoop.serialization.ParsingUtils.doSeekToken(ParsingUtils.java:70)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.hadoop.serialization.ParsingUtils.seek(ParsingUtils.java:58)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.hadoop.serialization.ScrollReader.readHit(ScrollReader.java:149)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:102)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.hadoop.serialization.ScrollReader.read(ScrollReader.java:81)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.hadoop.rest.RestRepository.scroll(RestRepository.java:314)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.hadoop.rest.ScrollQuery.hasNext(ScrollQuery.java:76)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:46)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> ~[scala-library.jar:na]
>>
>>                  at
>> scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>> ~[scala-library.jar:na]
>>
>>                  at
>> scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:388)
>> ~[scala-library.jar:na]
>>
>>                  at
>> scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>> ~[scala-library.jar:na]
>>
>>                  at
>> scala.collection.Iterator$class.foreach(Iterator.scala:727)
>> ~[scala-library.jar:na]
>>
>>                  at
>> scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>> ~[scala-library.jar:na]
>>
>>                  at
>> org.apache.spark.shuffle.hash.HashShuffleWriter.write(HashShuffleWriter.scala:65)
>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>
>>                  at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>
>>                  at
>> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:41)
>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>
>>                  at org.apache.spark.scheduler.Task.run(Task.scala:54)
>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>
>>                  at
>> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>> ~[spark-core_2.10-1.1.0.jar:1.1.0]
>>
>>                  at
>> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>> [na:1.7.0_75]
>>
>>                  at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>> [na:1.7.0_75]
>>
>>                  at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
>>
>> Caused by: java.net.SocketTimeoutException: Read timed out
>>
>>                  at java.net.SocketInputStream.socketRead0(Native Method)
>> ~[na:1.7.0_75]
>>
>>                  at
>> java.net.SocketInputStream.read(SocketInputStream.java:152) ~[na:1.7.0_75]
>>
>>                  at
>> java.net.SocketInputStream.read(SocketInputStream.java:122) ~[na:1.7.0_75]
>>
>>                  at
>> java.io.BufferedInputStream.read1(BufferedInputStream.java:273)
>> ~[na:1.7.0_75]
>>
>>                  at
>> java.io.BufferedInputStream.read(BufferedInputStream.java:334)
>> ~[na:1.7.0_75]
>>
>>                  at
>> org.apache.commons.httpclient.WireLogInputStream.read(WireLogInputStream.java:69)
>> ~[commons-httpclient-3.1.jar:na]
>>
>>                  at
>> org.apache.commons.httpclient.ContentLengthInputStream.read(ContentLengthInputStream.java:170)
>> ~[commons-httpclient-3.1.jar:na]
>>
>>                  at
>> java.io.FilterInputStream.read(FilterInputStream.java:133) ~[na:1.7.0_75]
>>
>>                  at
>> org.apache.commons.httpclient.AutoCloseInputStream.read(AutoCloseInputStream.java:108)
>> ~[commons-httpclient-3.1.jar:na]
>>
>>                  at
>> org.elasticsearch.hadoop.rest.DelegatingInputStream.read(DelegatingInputStream.java:57)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  at
>> org.codehaus.jackson.impl.Utf8StreamParser.loadMore(Utf8StreamParser.java:172)
>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>
>>                  at
>> org.codehaus.jackson.impl.Utf8StreamParser.parseEscapedFieldName(Utf8StreamParser.java:1502)
>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>
>>                  at
>> org.codehaus.jackson.impl.Utf8StreamParser.slowParseFieldName(Utf8StreamParser.java:1404)
>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>
>>                  at
>> org.codehaus.jackson.impl.Utf8StreamParser._parseFieldName(Utf8StreamParser.java:1231)
>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>
>>                  at
>> org.codehaus.jackson.impl.Utf8StreamParser.nextToken(Utf8StreamParser.java:495)
>> ~[jackson-core-asl-1.9.11.jar:1.9.11]
>>
>>                  at
>> org.elasticsearch.hadoop.serialization.json.JacksonJsonParser.nextToken(JacksonJsonParser.java:84)
>> ~[elasticsearch-hadoop-2.1.0.Beta3.jar:2.1.0.Beta3]
>>
>>                  ... 22 common frames omitted
>>
>>
>>
>>
> --
> Costin
>
