Is there a list of missing optimizations for typed functions?

2017-02-22 Thread Justin Pihony
I was curious whether certain typed functions are introspected, so I ran the following two queries: ds.where($"col" > 1).explain and ds.filter(_.col > 1).explain, and found that the typed function does NOT result in a PushedFilter. I imagine this is due to a limited view of the function, so I have
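For anyone who wants to reproduce this, here is a minimal spark-shell sketch, assuming a Parquet-backed Dataset with an Int column named col (the path and case class are made up for illustration). Only the untyped Column expression shows up under PushedFilters in the physical plan; the lambda in the typed filter is opaque to Catalyst, so it becomes a plain post-scan filter:

    import spark.implicits._

    case class Rec(col: Int) // hypothetical schema

    // Write and re-read as Parquet so source-level filter pushdown can apply
    Seq(Rec(0), Rec(1), Rec(2)).toDS.write.mode("overwrite").parquet("/tmp/pushdown-demo")
    val ds = spark.read.parquet("/tmp/pushdown-demo").as[Rec]

    ds.where($"col" > 1).explain()  // plan lists GreaterThan(col,1) under PushedFilters
    ds.filter(_.col > 1).explain()  // no pushed filter: full scan, then a filter on the lambda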

Re: Practical configuration to run LSH in Spark 2.1.0

2017-02-22 Thread Nick Pentreath
And to be clear, are you doing a self-join for approx similarity? Or joining to another dataset? On Thu, 23 Feb 2017 at 02:01, nguyen duc Tuan wrote: > Hi Seth, > Here are the parameters that I used in my experiments. > - Number of executors: 16 > - Executor memory: varied from 1G -> 2G -> 3G
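For context, a self-join with the 2.1.0 LSH API looks roughly like the sketch below. The toy data, column names, and the 0.6 threshold are illustrative; note that approxSimilarityJoin thresholds on Jaccard distance, not similarity:

    import org.apache.spark.ml.feature.MinHashLSH
    import org.apache.spark.ml.linalg.Vectors

    // Toy dataset of sparse binary feature vectors (hypothetical)
    val dataA = spark.createDataFrame(Seq(
      (0, Vectors.sparse(6, Seq((0, 1.0), (1, 1.0), (2, 1.0)))),
      (1, Vectors.sparse(6, Seq((1, 1.0), (2, 1.0), (3, 1.0))))
    )).toDF("id", "features")

    val model = new MinHashLSH()
      .setNumHashTables(2)
      .setInputCol("features")
      .setOutputCol("hashes")
      .fit(dataA)

    // Self-join: pass the same dataset on both sides
    model.approxSimilarityJoin(dataA, dataA, 0.6).show()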

Is there any limit on number of tasks per stage attempt?

2017-02-22 Thread Parag Chaudhari
Hi, Is there any limit on the number of tasks per stage attempt? *Thanks,* *Parag*

Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Thanks! If Spark does not log these events in the event log, then why does the Spark history server provide an API to get RDD information? From the documentation: /applications/[app-id]/storage/rdd A list of stored RDDs for the given application. /applications/[app-id]/storage/rdd/[rdd-id] Details for
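A quick way to poke at that endpoint, as a sketch (the host, port 18080 as the history server default, and the app-id are all placeholders):

    import scala.io.Source

    // GET the stored-RDD list for one application from the history server REST API
    val url = "http://localhost:18080/api/v1/applications/app-20170222120000-0001/storage/rdd"
    println(Source.fromURL(url).mkString) // JSON array; empty when block updates were never logged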

DataframeWriter - How to change filename extension

2017-02-22 Thread Nirav Patel
Hi, I am writing a Dataframe as TSV using DataframeWriter as follows: myDF.write.mode("overwrite").option("sep","\t").csv("/out/path") The problem is that all part files have a .csv extension instead of .tsv, as follows: part-r-00012-f9f06712-1648-4eb6-985b-8a9c79267eef.csv All the records are stored in TSV
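There is no writer option for the file suffix; one common workaround, sketched below under the assumption of a spark-shell session (so spark is in scope), is to rename the part files afterwards with the Hadoop FileSystem API:

    import org.apache.hadoop.fs.{FileSystem, Path}

    val dir = new Path("/out/path")
    val fs = FileSystem.get(spark.sparkContext.hadoopConfiguration)

    fs.listStatus(dir)
      .map(_.getPath)
      .filter(_.getName.endsWith(".csv"))
      .foreach { p =>
        // swap the extension in place: part-...csv -> part-...tsv
        fs.rename(p, new Path(p.getParent, p.getName.stripSuffix(".csv") + ".tsv"))
      }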

Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Saisai Shao
It is too verbose, and would significantly increase the size of the event log. Here is the comment in the code: // No-op because logging every update would be overkill > override def onBlockUpdated(event: SparkListenerBlockUpdated): Unit = {} > > On Thu, Feb 23, 2017 at 11:42 AM, Parag Chaudhari wrote:

Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Thanks a lot for the information! Is there any reason why EventLoggingListener ignores this event? *Thanks,* *Parag* On Wed, Feb 22, 2017 at 7:11 PM, Saisai Shao wrote: > AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so it will > not be written into the event log, I think that's w

Re: Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Saisai Shao
AFAIK, Spark's EventLoggingListener ignores the BlockUpdated event, so it will not be written into the event log; I think that's why you cannot see such info in the history server. On Thu, Feb 23, 2017 at 9:51 AM, Parag Chaudhari wrote: > Hi, > > I am running spark shell in spark version 2.0.2. Here is my p

Why spark history server does not show RDD even if it is persisted?

2017-02-22 Thread Parag Chaudhari
Hi, I am running spark shell in spark version 2.0.2. Here is my program: var myrdd = sc.parallelize(Array.range(1, 10)) myrdd.setName("test") myrdd.cache myrdd.collect But I am not able to see any RDD info in the "storage" tab in the spark history server. I looked at this
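For readability, the same program laid out as it is typed into spark-shell (the comments are mine; cache only marks the RDD, and collect is the action that actually materializes the blocks):

    var myrdd = sc.parallelize(Array.range(1, 10))
    myrdd.setName("test") // the name shown under the web UI's Storage tab
    myrdd.cache           // mark for StorageLevel.MEMORY_ONLY; nothing is stored yet
    myrdd.collect         // first action: computes the RDD and caches its blocks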

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-22 Thread Cody Koeninger
If you're talking about the version of Scala used to build the broker, that shouldn't matter. If you're talking about the version of Scala used for the Kafka client dependency, it shouldn't have compiled at all to begin with. On Wed, Feb 22, 2017 at 12:11 PM, Muhammad Haseeb Javed <11besemja...@se
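For reference, a consistent dependency set for Spark 2.0.2 on Scala 2.11 might look like the build.sbt sketch below (versions taken from the thread; the %% operator appends the _2.11 suffix, which keeps the Spark and Kafka-integration artifacts aligned):

    // build.sbt sketch: every Spark artifact must carry the same Scala suffix
    scalaVersion := "2.11.8"

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-streaming" % "2.0.2" % "provided",
      // pulls in a matching Kafka 0.8.2.x client compiled for Scala 2.11
      "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.2"
    )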

RDD blocks on Spark Driver

2017-02-22 Thread prithish
Hello, I had a question. When I look at the executors tab in the Spark UI, I notice that some RDD blocks are assigned to the driver as well. Can someone please tell me why? Thanks for the help.

Re: Practical configuration to run LSH in Spark 2.1.0

2017-02-22 Thread nguyen duc Tuan
Hi Seth, Here are the parameters that I used in my experiments. - Number of executors: 16 - Executor memory: varied from 1G -> 2G -> 3G - Number of cores per executor: 1 -> 2 - Driver memory: 1G -> 2G -> 3G - The similarity threshold: 0.6 MinHash: - number of hash tables: 2 SignedRandomProjection: -
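Expressed as standard Spark properties (normally passed as spark-submit flags; this in-code form is only a sketch, and driver memory in particular must really be set before the driver JVM starts):

    import org.apache.spark.sql.SparkSession

    // The resources described above as a conf sketch; values show one point of the sweep
    val spark = SparkSession.builder
      .appName("lsh-experiment")
      .config("spark.executor.instances", "16")
      .config("spark.executor.memory", "2g") // varied 1g -> 2g -> 3g
      .config("spark.executor.cores", "2")   // varied 1 -> 2
      .config("spark.driver.memory", "2g")   // varied 1g -> 2g -> 3g
      .getOrCreate()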

Spark Streaming - parallel recovery

2017-02-22 Thread Dominik Safaric
Hi, As I am investigating, among other things, the fault recovery capabilities of Spark, I've been curious: what source code artifact initiates the parallel recovery process? In addition, how is a faulty node detected (from a driver's point of view)? Thanks in advance, Dominik

Re: Why does Spark Streaming application with Kafka fail with “requirement failed: numRecords must not be negative”?

2017-02-22 Thread Muhammad Haseeb Javed
I just noticed that the Spark version I am using (2.0.2) is built with Scala 2.11. However, I am using Kafka 0.8.2 built with Scala 2.10. Could this be the reason why we are getting this error? On Mon, Feb 20, 2017 at 5:50 PM, Cody Koeninger wrote: > So there's no reason to use checkpointing at

Re: Spark Streaming: Using external data during stream transformation

2017-02-22 Thread Abhisheks
If I understand correctly, you need to create a UDF (if you are using Java, extend the appropriate UDF interface, e.g. UDF1, UDF2, etc., depending on the number of arguments) and have this static list as a member variable in your class. You can use this UDF as a filter on your stream directly. On Tue, Feb 21, 2017 at 8:
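The same idea in Scala, as a sketch (the DataFrame df, the user column, and the lookup values are all hypothetical):

    import org.apache.spark.sql.functions.{col, udf}

    // Static lookup list lives on the driver and ships inside the UDF closure
    val allowed = Set("alice", "bob")
    val inAllowed = udf((user: String) => allowed.contains(user))

    // Use the UDF directly as a filter on the stream's DataFrame
    val filtered = df.filter(inAllowed(col("user")))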

Executor links in Job History

2017-02-22 Thread yohann jardin
Hello, I'm using Spark 2.1.0 and Hadoop 2.2.0. When I launch jobs on YARN, I can retrieve their information on the Spark History Server, except that the links to the stdout/stderr of executors are wrong: they lead to the URLs used while the job was running. We have the flag 'yarn.log-aggregation-enabl

Re: Practical configuration to run LSH in Spark 2.1.0

2017-02-22 Thread Seth Hendrickson
I'm looking into this a bit further, thanks for bringing it up! Right now the LSH implementation only uses OR-amplification. The practical consequence of this is that it will select too many candidates when doing approximate near neighbor search and approximate similarity join. When we add AND-ampl
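To see why OR-amplification alone over-selects: for MinHash, a pair with Jaccard similarity s collides in a single hash table with probability s, so with b OR-ed tables the candidate probability is 1 - (1 - s)^b. A tiny sketch of the numbers:

    // Probability that a pair at similarity s shares a bucket in at least
    // one of b hash tables when only OR-amplification is used
    def orCollision(s: Double, b: Int): Double = 1.0 - math.pow(1.0 - s, b)

    orCollision(0.3, 2)  // ~0.51: fairly dissimilar pairs already become candidates
    orCollision(0.3, 10) // ~0.97: adding tables makes the over-selection worse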

Re: Spark executors in streaming app always uses 2 executors

2017-02-22 Thread Jon Gregg
Spark offers a receiver-based approach or a direct approach with Kafka ( https://spark.apache.org/docs/2.1.0/streaming-kafka-0-8-integration.html ), and a note in the receiver-based approach says "topic partitions in Kafka does not correlate to partitions of RDDs generated in Spark Streaming." A fix migh
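With the direct approach, by contrast, each Kafka topic partition maps to one RDD partition, so parallelism follows the topic. A sketch against the 0-8 integration (the broker address and topic name are placeholders, and ssc is assumed to be an existing StreamingContext):

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Direct approach: no receivers; one RDD partition per Kafka topic partition
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))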

Re: Spark SQL : Join operation failure

2017-02-22 Thread Yong Zhang
Your error message is not clear about what really happened. Was your container killed by YARN, or did it indeed run OOM? When I run a Spark job with big data, here is normally what I do: 1) Enable GC output. You need to monitor the GC output in the executor to understand the GC pressure. If
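A sketch of step 1, turning on executor GC logging via the standard spark.executor.extraJavaOptions property (usually passed as a --conf flag to spark-submit; the in-code form below assumes the session is created fresh):

    import org.apache.spark.sql.SparkSession

    // JVM GC flags; the GC log appears in each executor's stdout,
    // viewable from the executors tab of the web UI
    val spark = SparkSession.builder
      .config("spark.executor.extraJavaOptions",
        "-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps")
      .getOrCreate()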

[ANNOUNCE] Apache Bahir 2.1.0 Released

2017-02-22 Thread Christian Kadner
The Apache Bahir community is pleased to announce the release of Apache Bahir 2.1.0 which provides the following extensions for Apache Spark 2.1.0: - Akka Streaming - MQTT Streaming - MQTT Structured Streaming - Twitter Streaming - ZeroMQ Streaming For more information about Apache