Unusual bug, please help me, I can do nothing!!!

2022-03-30 Thread spark User
Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in Windows cmd. The first startup is normal, but after I use "ctrl+c" to force-close the Spark window, it can't start normally again. The error message is as follows "F

Error bug, please help me!!!

2022-03-20 Thread spark User
Hello, I am a Spark user. I use the "spark-shell.cmd" startup command in Windows cmd. The first startup is normal, but after I use "ctrl+c" to force-close the Spark window, it can't start normally again. The error message is as follows "F

Connection Reset by Peer : failed to remove cached rdd

2021-07-29 Thread Big data developer need help related to spark gateway roles in 2.0
Hi Team, We are facing an issue in production where we frequently get "Still have 1 request outstanding when connection with the hostname was closed" connection reset by peer errors, as well as warnings: failed to remove cached rdd or failed to remove broadcast variable. Please help us how to

Re: Application Timeout

2021-03-25 Thread Brett Spark
://twitter.com/jaceklaskowski > > <https://twitter.com/jaceklaskowski> > > > On Tue, Jan 19, 2021 at 11:27 PM Brett Spark > wrote: > >> Hello! >> When using Spark Standalone & Spark 2.4.4 / 3.0.0 - we are seeing our >> standalone Spark "applications&qu

Application Timeout

2021-01-19 Thread Brett Spark
Hello! When using Spark Standalone & Spark 2.4.4 / 3.0.0 - we are seeing our standalone Spark "applications" timeout and show as "Finished" after around an hour of time. Here is a screenshot from the Spark master before it's marked as finished. [image: image.png] H

Spark stable release for Hadoop 3

2020-04-28 Thread Piper Spark
Hello, We are considering whether to use Hadoop or Kubernetes as the cluster manager for Spark. We would prefer Hadoop 3 because of its native support for scheduling GPUs. Although there is a Spark 3.0.0 preview2 version available that is pre-built for Hadoop 3, I would like to know

Re: writing into oracle database is very slow

2019-04-19 Thread spark receiver
; much on Oracle? How many partitions do you have on the Oracle side? > > On 06.04.2019 at 16:59, Lian Jiang wrote: > > Hi, > > My spark job writes into oracle db using: > > df.coalesce(10).write.format("jdbc").option("url", url) > .option("driver&qu
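
A minimal sketch of a batched JDBC write for this case, assuming Spark 2.x with the Oracle driver on the classpath; the URL, table name and credentials are placeholders, and df is the poster's DataFrame. Larger JDBC batches cut round trips, while the repartition count bounds the number of concurrent Oracle connections:

df.repartition(10)
  .write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/SERVICE")  // placeholder
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "TARGET_TABLE")                         // placeholder
  .option("user", "user")
  .option("password", "password")
  .option("batchsize", "10000")  // default is 1000; bigger batches reduce round trips
  .mode(org.apache.spark.sql.SaveMode.Append)
  .save()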

Re: Hive to Oracle using Spark - Type(Date) conversion issue

2018-06-06 Thread spark receiver
ote: > > HI Jorn, > > Thanks for your sharing different options, yes we are trying to build a > generic tool for Hive to Spark export. > FYI, currently we are using sqoop, we are trying to migrate from sqoop to > spark. > > Thanks > -G > > On Tue, Mar 20, 2018 a

Re: [Structured Streaming] More than 1 streaming in a code

2018-04-13 Thread spark receiver
: 0
> ---
> +----+
> |aver|
> +----+
> | 3.0|
> +----+
>
> ---
> Batch: 1
> ---
> +----+
> |aver|
> +----+
> | 4.0|
> +----+
>

spark streaming kafka not displaying data in local eclipse

2018-01-16 Thread vr spark
Hi, I have a simple Java program to read data from Kafka using Spark Streaming. When I run it from Eclipse on my Mac, it connects to ZooKeeper and the bootstrap nodes, but it does not display any data and gives no error. It just shows 18/01/16 20:49:15 INFO Executor: Finished task

DataFrame joins with Spark-Java

2017-11-29 Thread sushma spark
Dear Friends, I am new to the Spark DataFrame API. My requirement: dataframe1 contains today's records and dataframe2 contains yesterday's records. I need to compare today's records with yesterday's and find the new records that do not exist in yesterday's records, based
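
A sketch of the comparison in Scala (the same call chain exists on the Java DataFrame API), assuming Spark 2.0+ and that records are identified by a key column "id" (a placeholder for the real key); a left_anti join keeps exactly the rows of the left side that have no match on the right:

// rows present today but absent yesterday
val newRecords = dataframe1.join(dataframe2, Seq("id"), "left_anti")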

Re: Reload some static data during struct streaming

2017-11-13 Thread spark receiver
I need it cached to improve throughput; I only hope it can be refreshed once a day, not every batch. > On Nov 13, 2017, at 4:49 PM, Burak Yavuz <brk...@gmail.com> wrote: > > I think if you don't cache the jdbc table, then it should auto-refresh. > > On Mon, Nov 13, 2

Reload some static data during struct streaming

2017-11-13 Thread spark receiver
Hi, I’m using Structured Streaming (Spark 2.2) to receive Kafka messages and it works great. The thing is I need to join the Kafka messages with a relatively static table stored in a MySQL database (let’s call it metadata here). So is it possible to reload the metadata table after some time interval (like
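
Caching the JDBC DataFrame pins it for the lifetime of the query, and the streaming plan is fixed when the query starts, so swapping in a fresh DataFrame from the driver does not help; leaving it uncached re-reads MySQL every micro-batch (as the reply above notes). One common workaround keeps the lookup outside the plan in a per-executor cache with a TTL, applied through a UDF. A rough Scala sketch, assuming the metadata fits in executor memory; the JDBC URL, credentials and query are placeholders:

import java.sql.DriverManager
import org.apache.spark.sql.functions.udf

object MetadataCache {
  private val ttlMs = 24L * 60 * 60 * 1000  // refresh at most once a day
  @volatile private var loadedAt = 0L
  @volatile private var table: Map[String, String] = Map.empty

  private def load(): Map[String, String] = {
    val conn = DriverManager.getConnection("jdbc:mysql://dbhost/meta", "user", "pass")
    try {
      val rs = conn.createStatement().executeQuery("SELECT k, v FROM metadata")
      val buf = scala.collection.mutable.Map[String, String]()
      while (rs.next()) buf(rs.getString(1)) = rs.getString(2)
      buf.toMap
    } finally conn.close()
  }

  def get(): Map[String, String] = {
    val now = System.currentTimeMillis()
    if (table.isEmpty || now - loadedAt > ttlMs) synchronized {
      if (table.isEmpty || now - loadedAt > ttlMs) { table = load(); loadedAt = now }
    }
    table
  }
}

// each executor refreshes its own copy at most once per TTL window
val lookup = udf((key: String) => MetadataCache.get().getOrElse(key, "unknown"))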

Re: Driver hung and happend out of memory while writing to console progress bar

2017-02-13 Thread Spark User
How much memory have you allocated to the driver? The driver stores state for tracking the task, stage and job history that you can see in the Spark console; it does take up a significant portion of the heap, anywhere from 200MB - 1G, depending on your map-reduce steps. Either way that is a good

Re: Question about best Spark tuning

2017-02-13 Thread Spark User
completing tasks quicker and let the spark scheduler (which is low cost and efficient based on my observation, it is never the bottleneck) do the work of distributing the work among the tasks. I have experimented with 1 task per core, 2-3 tasks per core and all the way up to 20+ tasks per core

Re: Is it better to Use Java or Python on Scala for Spark for using big data sets

2017-02-13 Thread Spark User
Spark has more support for Scala; by that I mean more APIs are available for Scala compared to Python or Java. Also Scala code will be more concise and easier to read. Java is very verbose. On Thu, Feb 9, 2017 at 10:21 PM, Irving Duran <irving.du...@gmail.com> wrote: > I would say Ja

Re: Performance bug in UDAF?

2017-02-09 Thread Spark User
one has solved similar problem. Thanks, Bharath On Mon, Oct 31, 2016 at 11:40 AM, Spark User <sparkuser2...@gmail.com> wrote: > Trying again. Hoping to find some help in figuring out the performance > bottleneck we are observing. > > Thanks, > Bharath > > On Sun, Oct

Re: convert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-24 Thread vr spark
Hi, The source file I have is on the local machine and it's pretty huge, around 150 GB. How do I go about it? On Sun, Nov 20, 2016 at 8:52 AM, Steve Loughran <ste...@hortonworks.com> wrote: > > On 19 Nov 2016, at 17:21, vr spark <vrspark...@gmail.com> wrote: > > Hi, > I am

Potential memory leak in yarn ApplicationMaster

2016-11-21 Thread Spark User
Hi All, It seems like the heap usage for org.apache.spark.deploy.yarn.ApplicationMaster keeps growing continuously. The driver crashes with OOM eventually. More details: I have a spark streaming app that runs on spark-2.0. The spark.driver.memory is 10G and spark.yarn.driver.memoryOverhead

convert local tsv file to orc file on distributed cloud storage(openstack).

2016-11-19 Thread vr spark
Hi, I am looking for Scala or Python code samples to convert a local tsv file to an orc file and store it on distributed cloud storage (OpenStack). So I need these 3 samples. Please suggest. 1. read tsv 2. convert to orc 3. store on distributed cloud storage thanks VR
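
A minimal Scala sketch covering the three steps, assuming Spark 1.x with a HiveContext (ORC needs Hive support there); the paths, column count and the swift:// URI are placeholders, and the matching Hadoop filesystem connector for the object store must be on the classpath:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext

val sc = new SparkContext(new SparkConf().setAppName("TsvToOrc"))
val sqlContext = new HiveContext(sc)
import sqlContext.implicits._

// 1. read the tsv and split on tabs
val df = sc.textFile("file:///path/to/local.tsv")
  .map(_.split("\t"))
  .map(a => (a(0), a(1)))   // assumes two columns; adjust to the real schema
  .toDF("col1", "col2")

// 2-3. convert to ORC and write to the object store
df.write.orc("swift://container.provider/path/orc-output")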

Re: Performance bug in UDAF?

2016-10-31 Thread Spark User
Trying again. Hoping to find some help in figuring out the performance bottleneck we are observing. Thanks, Bharath On Sun, Oct 30, 2016 at 11:58 AM, Spark User <sparkuser2...@gmail.com> wrote: > Hi All, > > I have a UDAF that seems to perform poorly when its input is skewed

Performance bug in UDAF?

2016-10-30 Thread Spark User
es down to 4 minutes. So I am trying to understand why is there such a big performance difference? What in UDAF causes the processing time to increase in orders of magnitude when there is a skew in the data as observed above? Any insight from spark developers, contributors, or anyone else who has

RDD to Dataset results in fixed number of partitions

2016-10-21 Thread Spark User
Hi All, I'm trying to create a Dataset from RDD and do groupBy on the Dataset. The groupBy stage runs with 200 partitions. Although the RDD had 5000 partitions. I also seem to have no way to change that 200 partitions on the Dataset to some other large number. This seems to be affecting the
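
The 200 is not inherited from the source RDD; it is the default shuffle partition count for DataFrame/Dataset operations (spark.sql.shuffle.partitions). A sketch of raising it, assuming a Spark 2.0 SparkSession named spark; set it before the groupBy runs:

spark.conf.set("spark.sql.shuffle.partitions", "5000")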

receiving stream data options

2016-10-13 Thread vr spark
Hi, I have a continuous REST API stream which keeps spitting out data as JSON. I access the stream using python requests.get(url, stream=True, headers=headers). I want to receive it in Spark and do further processing. I am not sure which is the best way to receive it in Spark. What
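
One option is a custom receiver that polls the endpoint and hands each line to Spark Streaming. A Scala sketch, assuming the response body is line-delimited JSON; the URL is a placeholder:

import java.io.{BufferedReader, InputStreamReader}
import java.net.URL
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class RestReceiver(url: String) extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart(): Unit = new Thread("rest-receiver") {
    override def run(): Unit = {
      val reader = new BufferedReader(new InputStreamReader(new URL(url).openStream()))
      var line = reader.readLine()
      while (!isStopped() && line != null) { store(line); line = reader.readLine() }
      reader.close()
    }
  }.start()
  def onStop(): Unit = {}  // the polling thread exits once isStopped() flips
}

// usage: val lines = ssc.receiverStream(new RestReceiver("https://example.com/stream"))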

Question about single/multi-pass execution in Spark-2.0 dataset/dataframe

2016-09-27 Thread Spark User
').count()
// similar counts for 20 attributes
// code to merge attr1Counts and attr2Counts and attr3Counts
// translate it to desired output format and save the result.
Some more details: 1) The application is a spark streaming application with batch interval in the order of 5 - 10 mins 2) Data set

Re: spark-submit failing but job running from scala ide

2016-09-26 Thread vr spark
Hi Jacek/All, I restarted my terminal and then I tried spark-submit and again got those errors. How do I see how many "runtimes" are running, and how do I keep only one? Somehow my Spark 1.6 and Spark 2.0 are conflicting. How do I fix it? I installed Spark 1.6 earlier using this

Running jobs against remote cluster from scala eclipse ide

2016-09-26 Thread vr spark
Hi, I use the Scala IDE for Eclipse. I usually run a job against my local Spark installed on my Mac, then export the jars, copy them to my company's Spark cluster and run spark-submit on it. This works fine. But I want to run the jobs from the Scala IDE directly using the Spark cluster of my company

Re: spark-submit failing but job running from scala ide

2016-09-25 Thread vr spark
Yes, I have both Spark 1.6 and Spark 2.0. I unset the SPARK_HOME environment variable and pointed spark-submit to 2.0. It's working now. How do I uninstall/remove Spark 1.6 from my Mac? Thanks On Sun, Sep 25, 2016 at 4:28 AM, Jacek Laskowski <ja...@japila.pl> wrote: > Hi, > > Can

spark-submit failing but job running from scala ide

2016-09-25 Thread vr spark
Hi, I have this simple Scala app which works fine when I run it as a Scala application from the Scala IDE for Eclipse. But when I export it as a jar and run it from spark-submit I get the below error. Please suggest *bin/spark-submit --class com.x.y.vr.spark.first.SimpleApp test.jar* 16/09/24 23

Re: Undefined function json_array_to_map

2016-08-17 Thread vr spark
Hi Ted/All, I did the below to get the full stack trace (see below), but I am not able to understand the root cause: except Exception as error: traceback.print_exc() and this is what I get: File "/usr/local/spark/python/lib/pyspark.zip/pyspark/sql/context.py", line 580, in sql return Data

Re: Attempting to accept an unknown offer

2016-08-17 Thread vr spark
sql ? > > On Wed, Aug 17, 2016 at 9:04 AM, vr spark <vrspark...@gmail.com> wrote: > >> spark 1.6.1 >> mesos >> job is running for like 10-15 minutes and giving this message and i >> killed it. >> >> In this job, i am creating data frame from a hive sql

Attempting to accept an unknown offer

2016-08-17 Thread vr spark
W0816 23:17:01.984846 16360 sched.cpp:1195] Attempting to accept an unknown offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910492 W0816 23:17:01.984987 16360 sched.cpp:1195] Attempting to accept an unknown offer b859f2f3-7484-482d-8c0d-35bd91c1ad0a-O162910493 W0816 23:17:01.985124 16360

Undefined function json_array_to_map

2016-08-17 Thread vr spark
Hi, I am getting an error in the below scenario. Please suggest. I have a virtual view in Hive, view name log_data; it has 2 columns: query_map map<string,string> and parti_date int. Here is my snippet for the Spark data frame: res=sqlcont.sql("select part
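
If json_array_to_map is a custom Hive UDF from the view's original environment, it has to be registered in the Hive-enabled context before the query runs. A sketch, assuming the UDF jar is available; the jar path and class name are placeholders:

sqlcont.sql("ADD JAR /path/to/custom-udfs.jar")
sqlcont.sql("CREATE TEMPORARY FUNCTION json_array_to_map AS 'com.example.hive.JsonArrayToMap'")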

Re: dataframe row list question

2016-08-12 Thread vr spark
Hi Experts, Please suggest On Thu, Aug 11, 2016 at 7:54 AM, vr spark <vrspark...@gmail.com> wrote: > > I have data which is json in this format > > myList: array > |||-- elem: struct > ||||-- nm: string (nullable = true) > ||||-- vL

dataframe row list question

2016-08-11 Thread vr spark
I have data which is JSON in this format:

myList: array
 |-- elem: struct
 |    |-- nm: string (nullable = true)
 |    |-- vList: array (nullable = true)
 |    |    |-- element: string (containsNull = true)

From my Kafka stream, I created a dataframe
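
A sketch of flattening that structure in Scala, assuming a DataFrame named df with the schema above; the outer explode yields one row per struct element, the inner one per string:

import org.apache.spark.sql.functions.explode

val flat = df
  .select(explode(df("myList")).as("elem"))
  .selectExpr("elem.nm", "explode(elem.vList) as v")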

Spark SQL -JDBC connectivity

2016-08-09 Thread Soni spark
Hi, I would like to know the steps to connect to Spark SQL from the Spring framework (web UI), and also how to run and deploy the web application.

Re: read only specific jsons

2016-07-27 Thread vr spark
, 2016 at 12:05 PM, Cody Koeninger <c...@koeninger.org> wrote: > Have you tried filtering out corrupt records with something along the > lines of > > df.filter(df("_corrupt_record").isNull) > > On Tue, Jul 26, 2016 at 1:53 PM, vr spark <vrspark...@gmail.com>
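
A fuller sketch of that suggestion in Scala, assuming the JSON was read with the default corrupt-record column name; note _corrupt_record only exists when some records actually failed to parse:

val df = sqlContext.read.json(rdd)
val clean = df.filter(df("_corrupt_record").isNull).drop("_corrupt_record")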

read only specific jsons

2016-07-26 Thread vr spark
I am reading data from Kafka using Spark Streaming, reading JSON and creating a dataframe. I am using pyspark:

kvs = KafkaUtils.createDirectStream(ssc, kafkaTopic1, kafkaParams)
lines = kvs.map(lambda x: x[1])
lines.foreachRDD(mReport)

def mReport(clickRDD):
    clickDF = sqlContext.jsonRDD

Error in Word Count Program

2016-07-19 Thread RK Spark
val textFile = sc.textFile("README.md")
val linesWithSpark = textFile.filter(line => line.contains("Spark"))
linesWithSpark.saveAsTextFile("output1")

Same error:

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/home/user/spark-1.5.1-bin-hadoop2.4/bin/README.md
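
A relative path is resolved against the process working directory (here the bin/ folder), so one fix is an explicit URI; a sketch, assuming the file sits in the Spark home directory:

val textFile = sc.textFile("file:///home/user/spark-1.5.1-bin-hadoop2.4/README.md")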

Input path does not exist error in giving input file for word count program

2016-07-15 Thread RK Spark
val count = inputfile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_ + _);

org.apache.hadoop.mapred.InvalidInputException: Input path does not exist:

Re: Getting error in inputfile | inputFile

2016-07-15 Thread RK Spark
scala> val count = inputfile.flatMap(line => line.split((" ").map(word => (word,1)).reduceByKey(_ + _)
     |
     |
You typed two blank lines.  Starting a new command.

That is what I am getting; how do I solve this? Regards, Ramkrishna KT
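
The REPL keeps waiting because the parentheses are unbalanced: split((" ").map(...) nests the map call inside split's argument list. The intended line is likely:

val count = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)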

Getting error in inputfile | inputFile

2016-07-15 Thread RK Spark
I am using Spark version 1.5.1 and I am getting errors in my first Spark program, i.e., word count. Please help me solve this.

scala> val inputfile = sc.textFile("input.txt")
inputfile: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[13] at textFile at :21

scal

Unable to Run Spark Streaming Job in Hadoop YARN mode

2016-03-30 Thread Soni spark
Hi All, I am unable to run a Spark Streaming job in my Hadoop cluster; it is behaving unexpectedly. When I submit a job, it fails by throwing a socket exception in HDFS; if I run the same job a second or third time, it runs for some time and then stops. I am confused. Is there any configuration in YARN

Re: overriding spark.streaming.blockQueueSize default value

2016-03-29 Thread Spark Newbie
experiences. Thanks, On Mon, Mar 28, 2016 at 10:40 PM, Spark Newbie <sparknewbie1...@gmail.com> wrote: > Hi All, > > The default value for spark.streaming.blockQueueSize is 10 in > https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/

overriding spark.streaming.blockQueueSize default value

2016-03-28 Thread Spark Newbie
Hi All, The default value for spark.streaming.blockQueueSize is 10 in https://github.com/apache/spark/blob/branch-1.6/streaming/src/main/scala/org/apache/spark/streaming/receiver/BlockGenerator.scala. In spark kinesis asl 1.4 the received Kinesis records are stored by calling addData on line 115
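
Since BlockGenerator reads the value from SparkConf (conf.getInt("spark.streaming.blockQueueSize", 10) in the linked branch-1.6 source), a sketch of overriding it, assuming the setting is applied before the StreamingContext is created:

val conf = new SparkConf().set("spark.streaming.blockQueueSize", "100")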

Issues facing while Running Spark Streaming Job in YARN cluster mode

2016-03-22 Thread Soni spark
Hi, I am able to run a Spark Streaming job in local mode, but when I try to run the same job in my YARN cluster, it throws errors. Any help is appreciated in this regard. Here are my exception logs: Exception 1: java.net.SocketTimeoutException: 48 millis timeout while waiting for channel

How to Catch Spark Streaming Twitter Exception ( Written Java)

2016-03-14 Thread Soni spark
Dear All, I am facing a problem with Spark Twitter Streaming code. Whenever twitter4j throws an exception, I am unable to catch that exception. Could anyone help me catch it? Here is pseudo code: SparkConf sparkConf = new SparkConf().setMaster("local[2]").setApp

Terminate Spark job in eclipse

2016-03-14 Thread Soni spark
Hi Friends, Can anyone help me with how to terminate a Spark job in Eclipse using Java code? Thanks Soniya
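
A sketch of a clean shutdown in Scala (the Java API mirrors it), assuming a streaming job with a StreamingContext named ssc; for a plain batch job, stopping the SparkContext with sc.stop() is enough:

ssc.stop(stopSparkContext = true, stopGracefully = true)  // let in-flight batches finish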

Spark Twitter streaming

2016-03-07 Thread Soni spark
Hello friends, I need urgent help. I am using Spark Streaming to get tweets from Twitter and load the data into HDFS. I want to find out the tweet source, whether it is from the web, mobile web, or Facebook, etc. Could you please help me with the logic? Thanks Soniya
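
In twitter4j, Status.getSource returns an HTML anchor naming the client. A Scala sketch, assuming a DStream of Status objects named tweets; the tag-stripping regex is a simple placeholder:

val sources = tweets.map(_.getSource.replaceAll("<[^>]*>", ""))
sources.countByValue().print()  // e.g. "Twitter Web Client", "Twitter for iPhone"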

Re: spark job submission on yarn-cluster mode failing

2016-01-21 Thread Soni spark
moidanalytics.com> wrote: > > Can you look in the executor logs and see why the sparkcontext is being > shutdown? Similar discussion happened here previously. > http://apache-spark-user-list.1001560.n3.nabble.com/RECEIVED-SIGNAL-15-SIGTERM-td23668.html > > Thanks > Best Regards

spark job submission on yarn-cluster mode failing

2016-01-21 Thread Soni spark
Hi Friends, My Spark job runs successfully in local mode but fails in cluster mode. Below is the error message I am getting; can anyone help me? 16/01/21 16:38:07 INFO twitter4j.TwitterStreamImpl: Establishing connection. 16/01/21 16:38:07 INFO twitter.TwitterReceiver: Twitter receiver

Re: ClassNotFoundException when executing spark jobs in standalone/cluster mode on Spark 1.5.2

2015-12-29 Thread Prem Spark
You need to make sure this class is accessible on all servers, since in cluster mode the driver can be on any of the worker nodes. On Fri, Dec 25, 2015 at 5:57 PM, Saiph Kappa <saiph.ka...@gmail.com> wrote: > Hi, > > I'm submitting a spark job like this: > > ~/spark-1.5.2-bi

Re: why one of Stage is into Skipped section instead of Completed

2015-12-27 Thread Prem Spark
heir local dirs and Spark recognizes that, so rather than > re-computing, it will start from the following stage. So, this is a good > thing in that you’re not re-computing a stage. In your case, it looks like > there’s already the output of the userreqs RDD (reduceByKey) so it doesn’t > re

Can anyone explain Spark behavior for below? Kudos in Advance

2015-12-27 Thread Prem Spark
Scenario 1:
val z = sc.parallelize(List("12","23","345",""),2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) => x + y)
res143: String = 10

Scenario 2:
val z = sc.parallelize(List("12","23","","345"),2)
z.aggregate("")((x,y) => math.min(x.length, y.length).toString, (x,y) =>
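
A sketch of why scenario 1 can print "10" (the combine order is not fixed, so "01" is equally possible):

// first partition  = ("12","23"): min("".length, 2) = 0 -> "0"; min("0".length, 2) = 1 -> "1"
// second partition = ("345",""):  min("".length, 3) = 0 -> "0"; min("0".length, 0) = 0 -> "0"
// combOp concatenates the per-partition results: "" + "1" + "0" = "10"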

why one of Stage is into Skipped section instead of Completed

2015-12-25 Thread Prem Spark
What does the below skipped stage mean? Can anyone help clarify? I was expecting 3 stages to succeed, but only 2 of them completed while one was skipped. Status: SUCCEEDED Completed Stages: 2 Skipped Stages: 1 Scala REPL Code Used: accounts is a basic RDD contains

Unable to create hive table using HiveContext

2015-12-23 Thread Soni spark
Hi friends, I am trying to create a hive table through Spark with Java code in Eclipse using the below code: HiveContext sqlContext = new org.apache.spark.sql.hive.HiveContext(sc.sc()); sqlContext.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING)"); but I am getting an error

create hive table in Spark with Java code

2015-12-20 Thread Soni spark
Hi Friends, I have created a hive external table with a partition. I want to alter the hive table partition through Spark with Java code: alter table table1 add if not exists partition(datetime='2015-12-01') location 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/' The above query
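
A sketch of issuing that DDL from Spark, assuming a HiveContext (a plain SQLContext cannot run Hive DDL); in Java the same sqlContext.sql(...) call applies:

sqlContext.sql(
  "ALTER TABLE table1 ADD IF NOT EXISTS PARTITION (datetime='2015-12-01') " +
  "LOCATION 'hdfs://localhost:54310/spark/twitter/datetime=2015-12-01/'")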

How do I link JavaEsSpark.saveToEs() to a sparkConf?

2015-12-14 Thread Spark Enthusiast
Folks, I have the following program:

SparkConf conf = new SparkConf().setMaster("local").setAppName("Indexer").set("spark.driver.maxResultSize", "2g");
conf.set("es.index.auto.create", "true");
conf.set("es.nodes", "localhost");
conf.set("es.port", "9200");
conf.set("es.write.operation",

epoch date time problem to load data into in spark

2015-12-08 Thread Soni spark
Hi Friends, I have written a Spark Streaming program in Java to access Twitter tweets and it is working fine. I am able to copy the Twitter feeds to an HDFS location batch-wise. For each batch, it creates a folder with an epoch timestamp. For example, if I give the HDFS location as *hdfs
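
The epoch suffix comes from saveAsTextFiles; writing inside foreachRDD lets you format the folder name yourself. A Scala sketch, assuming a DStream named tweets and the HDFS prefix from the post:

import java.text.SimpleDateFormat
import java.util.Date

tweets.foreachRDD { (rdd, time) =>
  if (!rdd.isEmpty()) {
    val folder = new SimpleDateFormat("yyyy-MM-dd/HH-mm-ss").format(new Date(time.milliseconds))
    rdd.saveAsTextFile("hdfs://localhost:54310/spark/twitter/" + folder)
  }
}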

Re: SparkException: Failed to get broadcast_10_piece0

2015-11-30 Thread Spark Newbie
Pinging again ... On Wed, Nov 25, 2015 at 4:19 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Which Spark release are you using ? > > Please take a look at: > https://issues.apache.org/jira/browse/SPARK-5594 > > Cheers > > On Wed, Nov 25, 2015 at 3:59 PM, Spark New

Error in block pushing thread puts the KinesisReceiver in a stuck state

2015-11-25 Thread Spark Newbie
Hi Spark users, I have been seeing this issue where receivers enter a "stuck" state after encountering the following exception: "Error in block pushing thread - java.util.concurrent.TimeoutException: Futures timed out". I am running the application on spark-1.4.1 and u

SparkException: Failed to get broadcast_10_piece0

2015-11-25 Thread Spark Newbie
Hi Spark users, I'm seeing the below exceptions once in a while which causes tasks to fail (even after retries, so it is a non recoverable exception I think), hence stage fails and then the job gets aborted. Exception --- java.io.IOException: org.apache.spark.SparkException: Failed to get

Re: SparkException: Failed to get broadcast_10_piece0

2015-11-25 Thread Spark Newbie
Using Spark-1.4.1 On Wed, Nov 25, 2015 at 4:19 PM, Ted Yu <yuzhih...@gmail.com> wrote: > Which Spark release are you using ? > > Please take a look at: > https://issues.apache.org/jira/browse/SPARK-5594 > > Cheers > > On Wed, Nov 25, 2015 at 3:59 PM, Spark New

Spark twitter streaming in Java

2015-11-18 Thread Soni spark
Dear Friends, I am struggling with Spark Twitter streaming. I am not getting any data. Please correct the below code if you find any mistakes.

import org.apache.spark.*;
import org.apache.spark.api.java.function.*;
import org.apache.spark.streaming.*;
import org.apache.spark.streaming.api.java
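
A minimal Scala sketch to verify that tweets arrive at all, assuming the four twitter4j OAuth system properties are set to real credentials (placeholders below) and spark-streaming-twitter is on the classpath:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

System.setProperty("twitter4j.oauth.consumerKey", "REPLACE_ME")
System.setProperty("twitter4j.oauth.consumerSecret", "REPLACE_ME")
System.setProperty("twitter4j.oauth.accessToken", "REPLACE_ME")
System.setProperty("twitter4j.oauth.accessTokenSecret", "REPLACE_ME")

val ssc = new StreamingContext(
  new SparkConf().setAppName("TweetCheck").setMaster("local[2]"), Seconds(10))
val tweets = TwitterUtils.createStream(ssc, None)
tweets.map(_.getText).print()  // confirm data flows before wiring up HDFS writes
ssc.start()
ssc.awaitTermination()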

Re: s3a file system and spark deployment mode

2015-10-15 Thread Spark Newbie
Are you using EMR? You can install Hadoop-2.6.0 along with Spark-1.5.1 in your EMR cluster. And that brings s3a jars to the worker nodes and it becomes available to your application. On Thu, Oct 15, 2015 at 11:04 AM, Scott Reynolds <sreyno...@twilio.com> wrote: > List, > > Right n

Re: Spark 1.5 java.net.ConnectException: Connection refused

2015-10-15 Thread Spark Newbie
l the number > retries (see Spark's configuration page). The job by default does not get > resubmitted. > > You could try getting the logs of the failed executor, to see what caused > the failure. Could be a memory limit issue, and YARN killing it somehow. > > > > On Wed,

Re: Spark 1.5 java.net.ConnectException: Connection refused

2015-10-14 Thread Spark Newbie
rdless of whether they were successfully processed or not. On Wed, Oct 14, 2015 at 11:01 AM, Spark Newbie <sparknewbie1...@gmail.com> wrote: > I ran 2 different spark 1.5 clusters that have been running for more than > a day now. I do see jobs getting aborted due to task retry's maxin

Re: Spark 1.5 java.net.ConnectException: Connection refused

2015-10-14 Thread Spark Newbie
I ran 2 different spark 1.5 clusters that have been running for more than a day now. I do see jobs getting aborted due to task retry's maxing out (default 4) due to ConnectionException. It seems like the executors die and get restarted and I was unable to find the root cause (same app code

Spark 1.5 java.net.ConnectException: Connection refused

2015-10-13 Thread Spark Newbie
Hi Spark users, I'm seeing the below exception in my spark streaming application. It happens in the first stage where the kinesis receivers receive records and perform a flatMap operation on the unioned Dstream. A coalesce step also happens as a part of that stage for optimizing the performance

DEBUG level log in receivers and executors

2015-10-12 Thread Spark Newbie
Hi Spark users, Is there an easy way to turn on DEBUG logs in receivers and executors? Setting sparkContext.setLogLevel seems to turn on DEBUG level only on the Driver. Thanks,

Re: Spark checkpoint restore failure due to s3 consistency issue

2015-10-09 Thread Spark Newbie
? I can send it if that will help dig into the root cause. On Fri, Oct 9, 2015 at 2:18 PM, Tathagata Das <t...@databricks.com> wrote: > Can you provide the before stop and after restart log4j logs for this? > > On Fri, Oct 9, 2015 at 2:13 PM, Spark Newbie <sparknewbie1...@

Spark checkpoint restore failure due to s3 consistency issue

2015-10-09 Thread Spark Newbie
Hi Spark Users, I'm seeing checkpoint restore failures causing the application startup to fail with the below exception. When I do "ls" on the s3 path I see the key listed sometimes and not listed sometimes. There are no part files (checkpointed files) in the specified S3 path. This i

Getting an error when trying to read a GZIPPED file

2015-09-02 Thread Spark Enthusiast
Folks, I have an input file which is gzipped. I use sc.textFile("foo.gz") and I see the following problem. Can someone help me fix this? 15/09/03 10:05:32 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id 15/09/03 10:05:32 INFO CodecPool: Got brand-new

Data Frame support CSV or excel format ?

2015-08-27 Thread spark user
Hi all, can we create a data frame from an Excel sheet or CSV file? In the below example it seems they support only JSON: DataFrame df = sqlContext.read().json("examples/src/main/resources/people.json");
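
CSV is possible via the external spark-csv package (built-in CSV support only arrived in Spark 2.0); Excel has no built-in reader. A sketch in Scala (the Java API mirrors it), assuming com.databricks:spark-csv is on the classpath; the path is a placeholder:

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("examples/src/main/resources/people.csv")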

Spark

2015-08-24 Thread Spark Enthusiast
I was running a Spark job to crunch a 9GB apache log file when I saw the following error:

15/08/25 04:25:16 WARN scheduler.TaskSetManager: Lost task 99.0 in stage 37.0 (TID 4115, ip-10-150-137-100.ap-southeast-1.compute.internal): ExecutorLostFailure (executor 29 lost)
15/08/25 04:25:16 INFO

How to parse multiple event types using Kafka

2015-08-23 Thread Spark Enthusiast
Folks, I use the following Streaming API from KafkaUtils:

public JavaPairInputDStream<String, String> inputDStream() {
    HashSet<String> topicsSet = new HashSet<String>(Arrays.asList(topics.split(",")));
    HashMap<String, String> kafkaParams = new HashMap<String, String>();

How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Spark Enthusiast
Folks, As I see it, the driver program is a single point of failure. I have seen ways to make it recover from failures on a restart (using checkpointing), but I have not seen anything on how to restart it automatically if it crashes. Will running the Driver as a Hadoop Yarn

Re: How to automatically relaunch a Driver program after crashes?

2015-08-19 Thread Spark Enthusiast
Thanks for the reply. Are Standalone or Mesos the only options? Is there a way to auto relaunch if driver runs as a Hadoop Yarn Application? On Wednesday, 19 August 2015 12:49 PM, Todd bit1...@163.com wrote: There is an option for the spark-submit (Spark standalone or Mesos
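
On YARN in cluster mode the driver runs inside the ApplicationMaster, so YARN itself re-attempts it on failure. A sketch, assuming a Spark version that honors spark.yarn.maxAppAttempts (capped by YARN's yarn.resourcemanager.am.max-attempts):

val conf = new SparkConf().set("spark.yarn.maxAppAttempts", "4")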

Re: Not seeing Log messages

2015-08-11 Thread Spark Enthusiast
Forgot to mention. Here is how I run the program :  ./bin/spark-submit --conf spark.app.master=local[1] ~/workspace/spark-python/ApacheLogWebServerAnalysis.py On Wednesday, 12 August 2015 10:28 AM, Spark Enthusiast sparkenthusi...@yahoo.in wrote: I wrote a small python program : def

Not seeing Log messages

2015-08-11 Thread Spark Enthusiast
I wrote a small python program:

def parseLogs(self):
    """Read and parse log file"""
    self._logger.debug("Parselogs() start")
    self.parsed_logs = (self._sc
        .textFile(self._logFile)
        .map(self._parseApacheLogLine)
        .cache())

How do I Process Streams that span multiple lines?

2015-08-03 Thread Spark Enthusiast
All examples of Spark Streaming programming that I see assume streams of lines that are then tokenised and acted upon (like the WordCount example). How do I process streams that span multiple lines? Are there examples that I can use?

Re: Spark 1.3.1 + Hive: write output to CSV with header on S3

2015-07-17 Thread spark user
password = ""; String url = "jdbc:hive2://quickstart.cloudera:1/default";  On Friday, July 17, 2015 2:29 AM, Roberto Coluccio roberto.coluc...@gmail.com wrote: Hello community, I'm currently using Spark 1.3.1 with Hive support for outputting processed data on an external Hive table

Re: Java 8 vs Scala

2015-07-15 Thread spark user
I struggled a lot with Scala, almost 10 days with no improvement, but when I switched to Java 8 things were smooth, and I used Data Frame with Redshift and Hive and all looked good. If you are very good in Scala then go with Scala, otherwise Java is the best fit. This is just my opinion, because I am

Data Frame for nested json

2015-07-14 Thread spark user
Does DataFrame support nested JSON to dump directly to a database? For simple JSON it is working fine: {"id":2,"name":"Gerald","email":"gbarn...@zimbio.com","city":"Štoky","country":"Czech Republic","ip":"92.158.154.75"}, but for nested JSON it failed to load: root |-- rows: array (nullable = true) |    |-- element:

Java 8 vs Scala

2015-07-14 Thread spark user
Hi All, To start a new project in Spark, which technology is good: Java 8 or Scala? I am a Java developer. Can I start with Java 8, or do I need to learn Scala? Which one is the better technology for a quick start on any POC project? Thanks - su

Re: spark - redshift !!!

2015-07-08 Thread spark user
Hi, I am looking at how to load data into Redshift. Thanks On Wednesday, July 8, 2015 12:47 AM, shahab shahab.mok...@gmail.com wrote: Hi, I did some experiment with loading data from s3 into spark. I loaded data from s3 using sc.textFile(). Have a look at the following code snippet

SparkR dataFrame read.df fails to read from aws s3

2015-07-08 Thread Ben Spark
I have Spark 1.4 deployed on AWS EMR, but the SparkR dataFrame read.df method cannot load data from AWS S3. 1) read.df error message: read.df(sqlContext, "s3://some-bucket/some.json", "json") 15/07/09 04:07:01 ERROR r.RBackendHandler: loadDF on org.apache.spark.sql.api.r.SQLUtils failed

spark - redshift !!!

2015-07-07 Thread spark user
Hi, Can you help me with how to load data from an S3 bucket into Redshift? If you have sample code, can you please send it to me? Thanks su
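
A sketch using the spark-redshift package, assuming com.databricks:spark-redshift is on the classpath and an S3 tempdir both Spark and Redshift can reach; df is the DataFrame to load and every value below is a placeholder:

df.write
  .format("com.databricks.spark.redshift")
  .option("url", "jdbc:redshift://host:5439/db?user=u&password=p")
  .option("dbtable", "target_table")
  .option("tempdir", "s3n://bucket/tmp")  // staging area for the COPY into Redshift
  .mode("append")
  .save()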

Can a Spark Driver Program be a REST Service by itself?

2015-07-01 Thread Spark Enthusiast
Folks, My Use case is as follows: My Driver program will be aggregating a bunch of Event Streams and acting on it. The Action on the aggregated events is configurable and can change dynamically. One way I can think of is to run the Spark Driver as a Service where a config push can be caught via

Can I do Joins across Event Streams ?

2015-07-01 Thread Spark Enthusiast
Hi, I have to build a system that reacts to a set of events. Each of these events are separate streams by themselves which are consumed from different Kafka Topics and hence will have different InputDStreams. Questions: Will I be able to do joins across multiple InputDStreams and collate the
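
Pair DStreams created in the same StreamingContext can be joined batch-by-batch. A Scala sketch, assuming both streams are keyed on a shared event id; the socket sources stand in for the real Kafka InputDStreams:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(new SparkConf().setAppName("JoinStreams"), Seconds(10))
val events1 = ssc.socketTextStream("host1", 9999).map(line => (line.split(",")(0), line))
val events2 = ssc.socketTextStream("host2", 9999).map(line => (line.split(",")(0), line))
val joined = events1.join(events2)  // per-batch inner join on the key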

Serialization Exception

2015-06-29 Thread Spark Enthusiast
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132

Re: s3 bucket access/read file

2015-06-29 Thread spark user
: S3 HEAD request failed for '/user%2Fdidi' - ResponseCode=400, ResponseMessage=Bad Request What does the user have to do here? I am using key and secret! How can I simply create an RDD from a text file on S3? Thanks Didi -- View this message in context: http://apache-spark-user-list.1001560.n3

Re: Scala/Python or Java

2015-06-25 Thread spark user
Spark is written in Scala. To debug and fix issues, I guess learning Scala is good for the long term? Any advice? On Thursday, June 25, 2015 1:26 PM, ayan guha guha.a...@gmail.com wrote: I am a python fan so I use python. But what I noticed some features

Scala/Python or Java

2015-06-25 Thread spark user
Hi All, I am new to Spark. I just want to know which technology is good/best for Spark learning: 1) Scala 2) Java 3) Python. I know Spark supports all 3 languages, but which one is best? Thanks su

Re: Spark or Storm

2015-06-17 Thread Spark Enthusiast
Again, by Storm, you mean Storm Trident, correct? On Wednesday, 17 June 2015 10:09 PM, Michael Segel msegel_had...@hotmail.com wrote: Actually the reverse. Spark Streaming is really a micro batch system where the smallest window is 1/2 a second (500ms). So for CEP, its not really

Re: Spark or Storm

2015-06-17 Thread Spark Enthusiast
Shioji eshi...@gmail.com wrote: We've evaluated Spark Streaming vs. Storm and ended up sticking with Storm. Some of the important draw backs are: Spark has no back pressure (receiver rate limit can alleviate this to a certain point, but it's far from ideal)There is also no exactly-once

Re: Spark or Storm

2015-06-16 Thread Spark Enthusiast
event Upstream services ---> KAFKA ---> Event Stream Processor ---> Complex Event Processor ---> Elastic Search. From what I understand, Storm will make a very good ESP and Spark Streaming will make a good CEP. But, we are also evaluating Storm with Trident

Re[2]: HBase 0.96+ with Spark 1.0+

2014-09-11 Thread spark
+ with Spark 1.0+ Hi Siyuan, Thanks for the input. We are preferring to use the SparkBuild.scala instead of maven. I did not see any protobuf.version related settings in that file. But - as noted by Sean Owen - in any case the issue we are facing presently is about the duplicate incompatible
