Hi Team,
I am evaluating different ways to submit and monitor Spark jobs using REST
interfaces.
When should I use Livy versus Spark Job Server?
Regards,
Sam
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/LIVY-VS-Spark-Job-Server-tp27722.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.
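For reference, submitting and monitoring a batch through Livy's REST API looks roughly like the sketch below. The server address, jar path, class name, and batch id are all placeholders, not values from the question:

```shell
# Submit a batch job to Livy (POST /batches); file and className are placeholders
curl -s -X POST -H "Content-Type: application/json" \
  -d '{
        "file": "hdfs:///user/sam/my-app.jar",
        "className": "com.example.MyApp",
        "args": ["arg1"]
      }' \
  http://localhost:8998/batches

# Poll the state of a submitted batch (the id 0 is illustrative)
curl -s http://localhost:8998/batches/0
```

Spark Job Server exposes a comparable REST surface but adds managed, shareable SparkContexts and named objects; Livy is the lighter, Apache-maintained option and also supports interactive sessions.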
Thanks for the reply, RK.
Using the first option, my application doesn't recognize
spark.driver.extraJavaOptions.
With the second option, the issue remains the same:
2016-07-21 12:59:41 ERROR SparkContext:95 - Error initializing SparkContext.
org.apache.spark.SparkException: Found both spark.exe
Hi Team,
I am using *CDH 5.7.1* with Spark *1.6.0*.
I have a Spark Streaming application that reads from Kafka and does some
processing.
The issue: while starting the application in CLUSTER mode, I want to pass a
custom log4j.properties file to both the driver and the executors.
*I have the below command :-*
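The command itself was cut off in the archive. As a hedged sketch, one common way to ship a custom log4j.properties to both driver and executor JVMs in YARN cluster mode is to distribute it with `--files` and point both JVMs at it; the local path, class name, and jar name below are assumptions:

```shell
# --files copies log4j.properties into each container's working directory,
# so both JVMs can reference it by its bare file name
spark-submit \
  --master yarn --deploy-mode cluster \
  --files /local/path/log4j.properties \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=log4j.properties" \
  --class com.example.StreamingApp my-app.jar
```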
Hi Team,
I have a Spark application up and running on a 10-node standalone cluster.
When I launch the application in cluster mode, I am able to create a separate
log file for the driver and for the executors (common to all executors).
But my requirement is to create a separate log file for each executor. Is it
feasible?
Hi Team,
Is there a way we can consume from Kafka using the Spark Streaming direct API
with multiple consumers (belonging to the same consumer group)?
Regards,
Sam
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-streaming-Kafka-Direct-API-Multiple-consumers-
Hi All,
Is there any Pub-Sub for JMS provided by Spark out of the box, like Kafka?
Thanks.
Regards,
Sam
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Streaming-and-JMS-tp5371p25548.html
Hi All,
I have a Spark SQL application that fetches data from Hive; on top I have an
Akka layer to run multiple queries in parallel.
*Please suggest a mechanism to figure out the number of Spark jobs running in
the cluster at a given instant of time.*
I need to do the above as, I see the ave
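One possible mechanism, assuming Spark 1.4+ where the monitoring REST API is available: query the driver's `/api/v1` endpoint and filter jobs by status. Host, port, and app-id are placeholders:

```shell
# List the currently running jobs of one application via the monitoring REST API
curl -s "http://<driver-host>:4040/api/v1/applications/<app-id>/jobs?status=running"
```

This gives a per-application view; for a cluster-wide count you would query each running application's driver (or attach a SparkListener inside each application).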
It does depend on the network I/O within your cluster and on CPU usage. That
said, the difference in run time should not be huge (assuming you are not
running any other job in the cluster in parallel).
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Jo
Hi Team,
I have a Hive partitioned table whose partition column contains spaces.
When I try to run any query, say a simple "SELECT * FROM table_name", it
fails.
*Please note the same was working in Spark 1.2.0; I have now upgraded to
1.3.1. There is also no change in my application code base.*
If I
Reduce *spark.sql.shuffle.partitions* from its default of 200 to the total
number of cores.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/4-seconds-to-count-13M-lines-Does-it-make-sense-tp22360p22374.html
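That setting can be applied either at submit time or from code. A minimal Scala fragment, assuming an existing sqlContext and a cluster with 8 total cores (the 8 is only an example):

```scala
// Lower the number of post-shuffle partitions from the default of 200
// to match the total core count of the cluster
sqlContext.setConf("spark.sql.shuffle.partitions", "8")
```

The equivalent at submit time would be `--conf spark.sql.shuffle.partitions=8`.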
How is Spark faster than MR when the data is on disk in both cases?
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Spark-Vs-MR-tp22373.html
---
Hi Experts,
I have a Parquet dataset of 550 MB (9 blocks) in HDFS. I want to run SQL
queries against it repeatedly.
A few questions:
1. When I do the below (persist to memory after reading from disk), it takes a
lot of time to persist to memory; any suggestions on how to tune this?
val inputP
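The snippet above is truncated; a sketch of the read-then-cache pattern being discussed, for the Spark 1.x API, with a placeholder path and table name. MEMORY_ONLY_SER trades some CPU for a smaller in-memory footprint, which can help when the persist step itself is the bottleneck:

```scala
import org.apache.spark.storage.StorageLevel

// Read the Parquet dataset once, then keep a serialized copy in memory
// so that repeated SQL queries avoid re-reading from HDFS
val input = sqlContext.parquetFile("hdfs:///path/to/parquet")
input.persist(StorageLevel.MEMORY_ONLY_SER)
input.registerTempTable("events")

// Subsequent queries hit the cached data
sqlContext.sql("SELECT count(*) FROM events").collect()
```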
Hi All,
Suppose I have a Parquet file of 100 MB in HDFS and my HDFS block size is 64
MB, so I have 2 blocks of data.
When I do *sqlContext.parquetFile("path")* followed by an action, two tasks
are started on two partitions.
My intent is to read these 2 blocks into more partitions, to fully utilize my
cluster
Hi Experts,
I have a scenario wherein I want to write to an Avro file from a streaming
job that reads data from Kafka.
But the issue is that there are multiple executors, and when all of them try
to write to a given file I get a concurrency exception.
One way to mitigate the issue is to repartition & have a
Hi Experts,
Like saveAsParquetFile on SchemaRDD, is there an equivalent to store to an
ORC file?
I am using Spark 1.2.0.
As per the link below, it looks like it's not part of 1.2.0, so any update on
the latest status would be great.
https://issues.apache.org/jira/browse/SPARK-2883
Till the next release, is there a w
Hi Experts,
A few general queries:
1. Can a single block/partition in an RDD have more than one Kafka message, or
will there be one and only one Kafka message per block? More broadly, is the
message count related to the block in any way, or is it just that any
message received within a particular b
Resolved.
I changed to the Apache Hadoop 2.4.0 & Apache Spark 1.2.0 combination, and all
works fine.
It must be because the 1.2.0 version of Spark was compiled against Hadoop 2.4.0.
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/ReliableDeliverySupervisor-Association-wi
Hi All,
Please clarify:
Can we say one RDD is generated every batch interval?
If the above is true, is the foreachRDD() operator then executed once and only
once for each batch processed?
Regards,
Sam
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/Can-we
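For the question above: yes, in the DStream model each batch interval produces one RDD, and the closure passed to foreachRDD runs once per batch with that RDD. A small sketch, with an illustrative socket source and batch interval:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("demo"), Seconds(5))
val lines = ssc.socketTextStream("localhost", 9999)

// One RDD is generated per 5-second batch;
// this closure is invoked exactly once for each such batch
lines.foreachRDD { rdd =>
  println(s"records in this batch: ${rdd.count()}")
}
ssc.start()
```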
Sorry for the typo.
Apache Hadoop version is 2.6.0
Regards,
Sam
--
View this message in context:
http://apache-spark-user-list.1001560.n3.nabble.com/ReliableDeliverySupervisor-Association-with-remote-system-tp20859p20860.html
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
/** Created by samyamaiti on 12/25/14. */
object Driver {
  def main(args: Array[String]) {
    // Checkpoint dir in HDFS
    val checkpointDirectory =
      "hdfs://localhost:8020/user/samyamaiti/SparkCheckpoint1"
    // functionToCreateContext (body truncated in the archive; standard sketch below)
    def functionToCreateContext(): StreamingContext = {
      val ssc = new StreamingContext(new SparkConf().setAppName("Driver"), Seconds(10))
      ssc.checkpoint(checkpointDirectory)
      ssc
    }
    val ssc = StreamingContext.getOrCreate(checkpointDirectory, functionToCreateContext)
    ssc.start()
  }
}