formats would be of great help.
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Efficient-approach-to-store-an-RDD-as-a-file-in-HDFS-and-read-it-back-as-an-RDD-tp25279.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
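For the archives, one common approach is the saveAsObjectFile/objectFile
round trip. A minimal sketch, assuming the element type and paths below
(both are illustrative):

case class Session(id: String, ts: Long)

// Write the RDD to HDFS as a SequenceFile of serialized objects.
val sessions = sc.parallelize(Seq(Session("a", 1L), Session("b", 2L)))
sessions.saveAsObjectFile("hdfs:///data/sessions")

// Read it back; the type parameter must match what was written.
val restored = sc.objectFile[Session]("hdfs:///data/sessions")

For text-friendly formats, saveAsTextFile with an explicit parse on read, or
sequence files via saveAsSequenceFile, are the usual alternatives.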
Other than setting the following:
sparkConf.set("spark.streaming.unpersist", "true")
sparkConf.set("spark.cleaner.ttl", "7200s")
On Wed, Nov 4, 2015 at 5:03 PM, swetha <swethakasire...@gmail.com> wrote:
> Hi,
>
> How to unpersist a DStream
Hi,
Has IndexedRDD been released yet?
Thanks,
Swetha
On Sun, Nov 1, 2015 at 1:21 AM, Gylfi <gy...@berkeley.edu> wrote:
> Hi.
>
> You may want to look into Indexed RDDs
> https://github.com/amplab/spark-indexedrdd
>
> Regards,
> Gylfi.
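For anyone finding this thread: a hedged sketch of the IndexedRDD usage from
the amplab repository above (exact imports and versions may differ):

import edu.berkeley.cs.amplab.spark.indexedrdd.IndexedRDD
import edu.berkeley.cs.amplab.spark.indexedrdd.IndexedRDD._

// Build an IndexedRDD from a pair RDD with Long keys.
val pairs = sc.parallelize((1L to 1000000L).map(x => (x, 0)))
val indexed = IndexedRDD(pairs).cache()

val updated = indexed.put(1234L, 1).cache() // point update, returns a new IndexedRDD
updated.get(1234L)                          // point lookup => Some(1)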
Hi,
I have a requirement wherein I have to load data from HDFS, build an RDD,
look up values by key to apply some updates, and then save the result back to
HDFS. How do I look up a value by key in an RDD?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list
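A hedged sketch of doing this with the core API (paths and types are
illustrative): PairRDDFunctions.lookup finds the values for a key, and
pre-partitioning narrows the scan to the partition that owns the key.

import org.apache.spark.HashPartitioner

val pairs = sc.objectFile[(String, String)]("hdfs:///data/pairs")
val partitioned = pairs.partitionBy(new HashPartitioner(64)).cache()

val values: Seq[String] = partitioned.lookup("someKey")  // scans one partition
val updated = partitioned.mapValues(v => v + "|updated") // apply the update
updated.saveAsObjectFile("hdfs:///data/pairs-updated")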
Hi,
We currently use reduceByKey to reduce by a particular metric name in our
Streaming/Batch job. It seems to be doing a lot of shuffles, which impacts
performance. Does using a custom partitioner before calling
reduceByKey improve performance?
Thanks,
Swetha
--
View this message
If you want some scheme other than hash-based partitioning, then you need to
> implement a custom partitioner. It can be used to mitigate data skew, etc.,
> which ultimately improves performance.
>
> On Tue, Oct 27, 2015 at 1:20 PM, swetha <swethakasire...@gmail.com> wrote:
>
>> Hi,
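Picking up the advice above, a sketch of a custom partitioner (the key
scheme is hypothetical); note that reduceByKey can take the partitioner
directly, so no separate partitionBy pass is needed:

import org.apache.spark.Partitioner

// Hypothetical partitioner over metric-name keys; adapt getPartition to
// whatever scheme spreads your skewed keys.
class MetricPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int =
    math.abs(key.toString.hashCode % numPartitions)

  override def equals(other: Any): Boolean = other match {
    case p: MetricPartitioner => p.numPartitions == numPartitions
    case _                    => false
  }
}

// Controls the shuffle's output partitioning in one step:
// metrics.reduceByKey(new MetricPartitioner(128), _ + _)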
You can tune the amount of memory allocated for shuffles by changing
> the configuration spark.shuffle.memoryFraction. A larger fraction causes
> less spilling.
>
>
> On Tue, Oct 27, 2015 at 6:35 PM, swetha kasireddy <
> swethakasire...@gmail.com> wrote:
>
>> So, Wouldn't
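For reference, a minimal sketch of the setting discussed above (0.4 is an
arbitrary example; the Spark 1.x default was 0.2):

sparkConf.set("spark.shuffle.memoryFraction", "0.4") // larger => less spilling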
ow it could affect performance.
>
> Used correctly it should improve performance as you can better control
> placement of data and avoid shuffling…
>
> -adrian
>
> From: swetha kasireddy
> Date: Monday, October 26, 2015 at 6:56 AM
> To: Adrian Tanase
> Cc: Bill Bejeck, "us
Hi,
Does the use of a custom partitioner in Streaming affect performance?
On Mon, Oct 5, 2015 at 1:06 PM, Adrian Tanase wrote:
> Great article, especially the use of a custom partitioner.
>
> Also, sorting by multiple fields by creating a tuple out of them is an
> awesome,
?
SparkContext.addFile()
SparkFiles.get(fileName)
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Distributed-caching-of-a-file-in-SPark-Streaming-tp25157.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
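A hedged sketch of the two calls above (the file name and its use are
illustrative): addFile ships the file with the job, and SparkFiles.get
resolves the local copy on each executor.

import org.apache.spark.SparkFiles

sc.addFile("hdfs:///config/whitelist.txt") // distributed once per executor

// events is an illustrative RDD[String].
val filtered = events.mapPartitions { iter =>
  val localPath = SparkFiles.get("whitelist.txt") // executor-local path
  val allowed = scala.io.Source.fromFile(localPath).getLines().toSet
  iter.filter(allowed.contains)
}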
to do shuffles?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Job-splling-to-disk-and-memory-in-Spark-Streaming-tp25149.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
e manually screwed up a topic, or ... ?
>
> If you really want to just blindly "recover" from this situation (even
> though something is probably wrong with your data), the most
> straightforward thing to do is monitor and restart your job.
> On W
(this.trackerClass)
Some(newCount)
}
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-have-Single-refernce-of-a-class-in-Spark-Streaming-tp25103.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
uncache the
> previous one, and cache a new one.
>
> TD
>
> On Fri, Oct 16, 2015 at 12:02 PM, swetha <swethakasire...@gmail.com>
> wrote:
>
>> Hi,
>>
>> How do I keep a changing object cached forever in Streaming? I know that
>> we
>> can do
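A hedged sketch of the unpersist-then-cache pattern TD describes
(loadReference and refresh are hypothetical helpers, not from the thread):

// Keep a mutable reference on the driver; swap in a refreshed RDD each batch.
var reference = loadReference(sc).cache() // hypothetical initial load

dstream.foreachRDD { batch =>
  val updated = refresh(reference, batch).cache() // hypothetical merge step
  updated.count()                   // materialize before dropping the old copy
  reference.unpersist(blocking = false)
  reference = updated
}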
Hi,
I have the following functions that I use in my Scala job. In the
getSessionId function I sometimes return null. If I return null, the only way
I can avoid processing those records is by filtering out the null records,
and I wanted to avoid another pass just for that filtering.
val groupedSessions = sessions.groupByKey()
val sortedSessions = groupedSessions.mapValues[List[(Long, String)]](
  iter => iter.toList.sortBy(_._1))
Does the use of transform for code reuse affect groupByKey performance?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-use
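On the null-return concern above, one common alternative (a sketch, not from
the thread) is to return Option and use flatMap, which drops empty results
without a separate filtering pass:

// extractId is a hypothetical stand-in for the real parsing logic.
def getSessionId(record: String): Option[String] =
  if (record.contains("sessionId")) Some(extractId(record)) else None

// flatMap unwraps Some and silently drops None; no second pass needed.
val sessions = records.flatMap(r => getSessionId(r).map(id => (id, r)))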
temp files. They are not
> necessary for checkpointing and only stored in your local temp directory.
> They will be stored in "/tmp" by default. You can use `spark.local.dir` to
> set the path if you find your "/tmp" doesn't have enough space.
>
> Best Regards,
> Shixiong Zhu
>
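A minimal sketch of the override mentioned above (the path is illustrative):

sparkConf.set("spark.local.dir", "/mnt/bigdisk/spark-tmp") // roomier temp dir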
12. You either provided an
invalid fromOffset, or the Kafka topic has been damaged
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Lost-leader-exception-in-Kafka-Direct-for-Streaming-tp24891.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
How do I set a system environment variable like the following when submitting a job in Spark?
-Dcom.w1.p1.config.runOnEnv=dev
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-set-System-environment-variables-in-Spark-tp24875.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
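A hedged sketch of one way to pass such -D properties (the property is taken
from the question): set extraJavaOptions for the driver and executors. In
client mode the driver option may need to go on the spark-submit command line
instead, since the driver JVM is already running when the SparkConf is read.

sparkConf.set("spark.driver.extraJavaOptions", "-Dcom.w1.p1.config.runOnEnv=dev")
sparkConf.set("spark.executor.extraJavaOptions", "-Dcom.w1.p1.config.runOnEnv=dev")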
Sep 17 18:52 shuffle_23103_6_0.data
-rw-r--r-- 1 371932812 Sep 17 18:52 shuffle_23125_6_0.data
-rw-r--r-- 1 19857974 Sep 17 18:53 shuffle_23291_19_0.data
-rw-r--r-- 1 55342005 Sep 17 18:53 shuffle_23305_8_0.data
-rw-r--r-- 1 92920590 Sep 17 18:53 shuffle_23303_4_0.data
Thanks,
Swetha
Hi,
How can I use reduceByKey inside updateStateByKey? Suppose I have a bunch of
keys for which I need to compute a sum and an average inside updateStateByKey
by merging with the old state. How do I accomplish that?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560
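A sketch of one way to combine the two (not from the thread): pre-aggregate
each batch with reduceByKey, then fold the batch aggregate into the running
state. Types below are illustrative.

case class Agg(sum: Double, count: Long) {
  def avg: Double = if (count == 0) 0.0 else sum / count
}

// metrics is an illustrative DStream[(String, Double)].
val batchAggs = metrics
  .mapValues(v => Agg(v, 1L))
  .reduceByKey((a, b) => Agg(a.sum + b.sum, a.count + b.count))

val state = batchAggs.updateStateByKey[Agg] { (news: Seq[Agg], old: Option[Agg]) =>
  Some(news.foldLeft(old.getOrElse(Agg(0.0, 0L))) { (acc, a) =>
    Agg(acc.sum + a.sum, acc.count + a.count)
  })
}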
Hi,
How do I obtain the current key in updateStateByKey?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-obtain-the-key-in-updateStateByKey-tp24792.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
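A hedged sketch: the Iterator-based overload of updateStateByKey exposes the
key, unlike the (Seq[V], Option[S]) variant. Types are illustrative.

import org.apache.spark.HashPartitioner

// counts is an illustrative DStream[(String, Long)].
val updated = counts.updateStateByKey[Long](
  (iter: Iterator[(String, Seq[Long], Option[Long])]) => iter.map {
    case (key, newValues, state) =>
      // `key` is in scope here, so per-key logic is possible.
      (key, state.getOrElse(0L) + newValues.sum)
  },
  new HashPartitioner(8),
  rememberPartitioner = true
)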
Hi,
How can I make a group-by more efficient? Is it recommended to use a custom
partitioner before doing a groupByKey? And can we use a custom partitioner
together with reduceByKey as an optimization?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3
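For context, a sketch of the usual guidance (not from this thread): when the
grouping feeds an aggregation, reduceByKey or aggregateByKey beat groupByKey
because they combine map-side before the shuffle. Here pairs is an
illustrative RDD[(String, Long)].

// groupByKey ships every value across the network:
// pairs.groupByKey().mapValues(_.sum)

// reduceByKey combines locally first, shuffling only partial sums:
val sums = pairs.reduceByKey(_ + _)

// aggregateByKey covers the case where the result type differs from the
// value type, e.g. (sum, count) pairs for an average:
val stats = pairs.aggregateByKey((0L, 0L))(
  (acc, v) => (acc._1 + v, acc._2 + 1), // fold a value into the accumulator
  (a, b) => (a._1 + b._1, a._2 + b._2)  // merge two accumulators
)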
Hi,
When I try to recover my Spark Streaming job from a checkpoint directory, I
get a StackOverflowError as shown below. Any idea why this is
happening?
15/09/18 09:02:20 ERROR streaming.StreamingContext: Error starting the
context, marking it as stopped
java.lang.StackOverflowError
Hi,
How is the ContextCleaner different from spark.cleaner.ttl? Is
spark.cleaner.ttl still needed when the ContextCleaner is present in a Streaming job?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/broadcast-variable-get-cleaned-by-ContextCleaner
Hi,
How do I set the number of executors and tasks in a Spark Streaming job on
Mesos? I have the following settings, but my job still shows 11 active
tasks and 11 executors. Any idea why this is happening?
sparkConf.set("spark.mesos.coarse", "true")
sparkConf.set("spark.cores.max", "128")
Hi,
I see the following error in my Spark job even with around 100 cores
and 16 GB of memory. Has anyone experienced this problem before?
15/08/18 21:51:23 ERROR shuffle.RetryingBlockFetcher: Failed to fetch block
input-0-1439959114400, and will not retry (0 retries)
Hi,
What is the optimal approach to a secondary sort in Spark? I have to sort
first by an id in the key, and then by a timestamp that is present
in the value.
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/What-is-the-optimal
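One standard approach (a sketch, assuming composite (id, timestamp) keys):
partition by id only, then let repartitionAndSortWithinPartitions order each
partition by the full key.

import org.apache.spark.Partitioner

// Partition on id alone so all records for an id land together.
class IdPartitioner(override val numPartitions: Int) extends Partitioner {
  override def getPartition(key: Any): Int = key match {
    case (id: String, _) => math.abs(id.hashCode % numPartitions)
    case _               => 0 // keys are (id, timestamp) pairs in this sketch
  }
}

// records is an illustrative RDD[((String, Long), V)] keyed by (id, timestamp).
// The implicit tuple Ordering sorts by id, then timestamp, within partitions.
val sorted = records.repartitionAndSortWithinPartitions(new IdPartitioner(64))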
keys
with different return values?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Multiple-UpdateStateByKey-Functions-in-the-same-job-tp24119.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
, classOf[LongWritable], classOf[Text]).
map { case (x, y) => (x.toString, y.toString) }
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-add-multiple-sequence-files-from-HDFS-to-a-Spark-Context-to-do-Batch-processing-tp24102.html
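For reference, a hedged sketch (paths are illustrative): the Hadoop input
layer behind sc.sequenceFile accepts comma-separated paths and globs, so
several HDFS files can be loaded into a single RDD.

import org.apache.hadoop.io.{LongWritable, Text}

val rdd = sc.sequenceFile("hdfs:///data/2015-08-*,hdfs:///data/extra/part-00000",
    classOf[LongWritable], classOf[Text])
  .map { case (k, v) => (k.toString, v.toString) }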
the performance
be impacted?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-Json-file-groupby-function-tp9618p24041.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
Scala?
val lines = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).
  map { case (x, y) => (x.toString,
    Utils.toJsonObject(y.toString).get(request).getAsJsonObject().get(queryString).toString) }
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560
Streaming?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-maintain-Windows-of-data-along-with-maintaining-session-state-using-UpdateStateByKey-tp23986.html
Sent from the Apache Spark User List mailing list archive at Nabble.com
with
active sessions are still available for joining with those in the current
job. So, what do we need to do to keep the data in memory between two batch
jobs? Can we use Tachyon?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-keep-RDDs
Hi,
Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
Streaming to do lookups/updates/deletes in RDDs using keys by storing them
as key/value pairs.
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Is-IndexedRDD
Hi Ankur,
Is IndexedRDD available in Spark 1.4.0? We would like to use this in Spark
Streaming to do lookups/updates/deletes in RDDs using keys by storing them
as key/value pairs.
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com
have
to use any code like ssc.checkpoint(checkpointDir)? Also, how is the
performance if I use both DStream checkpointing for maintaining the state
and the Kafka direct approach for exactly-once semantics?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560
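A hedged sketch of the standard checkpoint-recovery setup (path and batch
interval are illustrative): build the graph inside a creating function, and
let getOrCreate recover from the checkpoint when one exists.

import org.apache.spark.streaming.{Seconds, StreamingContext}

val checkpointDir = "hdfs:///checkpoints/myJob" // illustrative path

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sparkConf, Seconds(10))
  ssc.checkpoint(checkpointDir) // needed for stateful ops and driver recovery
  // ... define the Kafka direct stream and transformations here ...
  ssc
}

val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
ssc.start()
ssc.awaitTermination()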
Hi,
What happens if the master node fails in the case of Spark Streaming? Would
the data be lost?
Thanks,
Swetha
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Regarding-master-node-failure-tp23701.html
Sent from the Apache Spark User List mailing list archive at Nabble.com