You can create a SparkContext in your program and run it as a standalone
application without using spark-submit.
Here's something that will get you started:
//Create SparkContext
val sconf = new SparkConf()
.setMaster("spark://spark-ak-master:7077")
.setAppName("Test")
.s
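The snippet above is cut off; for reference, a fuller self-contained sketch of such a standalone app (the master URL and jar path are placeholders, and setJars is only needed if the code must be shipped to executors):
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    val sconf = new SparkConf()
      .setMaster("spark://spark-ak-master:7077")            // placeholder master URL
      .setAppName("Test")
      .setJars(Seq("target/scala-2.10/test-assembly.jar"))  // hypothetical application jar
    val sc = new SparkContext(sconf)
    val evens = sc.parallelize(1 to 1000).filter(_ % 2 == 0).count()
    println(s"even numbers: $evens")
    sc.stop()
  }
}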
On using yarn-cluster, it works well.
On Mon, Jun 29, 2015 at 12:07 PM, ram kumar wrote:
> SPARK_CLASSPATH=$CLASSPATH:/usr/hdp/2.2.0.0-2041/hadoop-mapreduce/*
> in spark-env.sh
>
> I think I am facing the same issue
> https://issues.apache.org/jira/browse/SPARK-6203
>
>
>
> On Mon, Jun 29, 2015 a
Hi,
I am working on a legacy project using Spark Java code.
I have a function which takes a sqlContext as an argument; however, I need a
JavaSparkContext in that function.
It seems that sqlContext.sparkContext() returns a Scala SparkContext.
I did not find any API for casting a Scala SparkContext t
Hi
Have you tested the Cloudera project:
https://github.com/cloudera/spark-timeseries ?
Let me know how you progressed on that route, as I am also interested in
that topic.
Cheers
On 26 June 2015 at 14:07, Caio Cesar Trucolo wrote:
> Hi everyone!
>
> I am working with multiple time series
It seems that JavaSparkContext is just a wrapper around the Scala SparkContext.
In JavaSparkContext, the Scala one is used to do all the work.
If I pass the same Scala SparkContext to initialize a JavaSparkContext, I
still operate on the same SparkContext.
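For what it's worth, a one-line sketch of that wrapping (assuming Spark 1.x APIs and an existing sqlContext):
import org.apache.spark.api.java.JavaSparkContext
val jsc = new JavaSparkContext(sqlContext.sparkContext)  // wraps, not copies, the same underlying context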
Sorry for spamming.
Hao
On Mon, Jun 29, 2015 at
Hi Haviv,
have you tried sc.broadcast(model)? The broadcast method is a member of
the SparkContext class.
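A minimal sketch of that, assuming `model` is an already-trained model with a predict method and `points` is an RDD of feature vectors:
val modelBc = sc.broadcast(model)                            // one read-only copy per executor
val predictions = points.map(p => modelBc.value.predict(p))  // executors use the broadcast value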
Thanks
Himanshu
Hi there
I am running 30 apps in my Spark cluster, and some of the apps got an
exception like the one below:
[root@slave3 0]# cat stderr
15/06/29 17:20:08 INFO executor.CoarseGrainedExecutorBackend: Registered signal
handlers for [TERM, HUP, INT]
15/06/29 17:20:09 WARN util.NativeCodeLoader: Unable to
Hi Matt,
is there a reason you need to call coalesce every loop iteration? Most
likely it forces Spark to do lots of unnecessary shuffles. Also, for a
really large number of inputs this approach can lead to problems due to too
many nested RDD.union calls. A safer approach is to call union from
SparkContext
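A rough sketch of that safer shape (MyRecord, inputs, loadOne, and numPartitions are placeholders for the per-iteration pieces):
import org.apache.spark.rdd.RDD
val parts: Seq[RDD[MyRecord]] = inputs.map(loadOne)  // hypothetical per-input RDDs
val all = sc.union(parts)                            // one flat union, no deeply nested lineage
val result = all.coalesce(numPartitions)             // coalesce once at the end, if needed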
I can't see an obvious problem. Could you post the full minimal code that
reproduces the problem? Also, which version of Spark and Scala are you using?
Hi, I'm using spark streaming to process data. I do a simple flatMap on each
record as follows
package bb;
import java.io.*;
import java.net.*;
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.net.URI;
import java.util.List;
import java.util.Array
Hi Akhil,
Thanks for your reply.
Here it is
Launch Command: "/usr/lib/jvm/java-7-oracle/jre/bin/java" "-cp"
"/etc/spark/:/opt/spark/lib/spark-assembly-1.4.0-hadoop2.3.0.jar:/opt/spark-1.4.0-bin-hadoop2.3/lib/datanucleus-rdbms-3.2.9.jar:/opt/spark-1.4.0-bin
-hadoop2.3/lib/datanucleus-api-jdo-3.2
For prototyping purposes, I created a test program injecting dependencies using
Spring.
Nothing fancy. This is just a re-write of KafkaDirectWordCount. When I run
this, I get the following exception:
Exception in thread "main" org.apache.spark.SparkException: Task not
serializable
at
org.
I want to store the Spark application arguments, such as input file and output
file, into a Java properties file and pass that file to the Spark driver. I'm
using spark-submit for submitting the job but couldn't find a parameter to
pass the properties file. Have you got any suggestions?
Hi there,
I have some traces from my master and some workers where for some reason,
the ./work directory of an application can not be created on the workers.
There is also an issue with the master's temp directory creation.
master logs: http://pastebin.com/v3NCzm0u
worker's logs: http://pastebin.
No, Spark cannot do that, as it does not replicate partitions (so there is no retry
on a different worker). It seems your cluster is not provisioned with the correct
permissions. I would suggest automating node provisioning.
On Mon, Jun 29, 2015 at 11:04 PM, maxdml wrote:
> Hi there,
>
> I have some traces fr
Hi,
Any {fan-out -> process in parallel -> fan-in -> aggregate} pattern of data
flow can conceptually be Map-Reduce (MR, as it is done in Hadoop).
Apart from the bigger list of map, reduce, sort, filter, pipe, join,
combine, ... functions, which are many times more efficient and productive for
de
I'd like to toss out another idea that doesn't involve a complete end-to-end
Kerberos implementation. Essentially, have the driver authenticate to
Kerberos, instantiate a Hadoop file system, and serialize/cache it for the
executors to use instead of them having to instantiate their own.
- Dri
Currently, I am considering using Guava Suppliers for delayed
initialization in the workers:
Supplier<T> supplier = (Serializable & Supplier<T>) () -> new T();
Supplier<T> singleton = Suppliers.memoize(supplier);
On 26 June 2015 at 13:17, Igor Berman wrote:
> asked myself same question today...actually de
Hi, Akhil. Thank you for your reply. I tried what you suggested, but it
raises the following error.
The source code is:
JavaPairRDD<LongWritable, Text> distFile = sc.hadoopFile(
    "hdfs://cMaster:9000/wcinput/data.txt",
    DataInputFormat.class, LongWritable.class, Text.class);
while DataInputFormat class is defined as this:
cl
Thanks TD, this helps.
Looking forward to some fix where the framework handles batch failures via
callback methods. This will help avoid having to write try/catch in every
transformation / action.
Regards,
Amit
From: Tathagata Das mailto:t...@databricks.com>>
Date: Saturday, June 27, 2015 at
3. You need to use your own method, because you need to set up your job.
Read the checkpoint documentation.
4. Yes, if you want to checkpoint, you need to specify a url to store the
checkpoint at (s3 or hdfs). Yes, for the direct stream checkpoint it's
just offsets, not all the messages.
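A rough sketch of that checkpoint setup (the checkpoint URL, batch interval, and an existing sparkConf are placeholders/assumptions):
import org.apache.spark.streaming.{Seconds, StreamingContext}

def createContext(): StreamingContext = {
  val ssc = new StreamingContext(sparkConf, Seconds(10))  // batch interval is a placeholder
  ssc.checkpoint("hdfs:///user/app/checkpoints")          // an hdfs:// or s3n:// url
  // build the direct stream and the rest of the job here, before returning
  ssc
}
val ssc = StreamingContext.getOrCreate("hdfs:///user/app/checkpoints", createContext _)
ssc.start()
ssc.awaitTermination()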
On Sun
see also:
https://github.com/apache/spark/pull/6848
On Mon, Jun 29, 2015 at 12:48 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> sc.hadoopConfiguration.set("mapreduce.input.fileinputformat.split.maxsize",
> "67108864")
>
> sc.sequenceFile(getMostRecentDirectory(tablePath, _.startsWith("_")).get
> + "/*", classO
Hi
Let me take a shot at your questions. (I am sure people like Cody and TD
will correct me if I am wrong.)
0. This is an exact copy from the similar question in a mail thread from Akhil D:
Since you set local[4] you will have 4 threads for your computation, and
since you have 2 receivers, you are le
Hey all,
I try to make a DataFrame by inspection (using Spark 1.4.0), but run into a
parameter of my case class not being supported. Minimal example:
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
import com.vividsolutions.jts.geom.Coordinate
case class Fo
You may want to store the properties file either locally or, if you intend to
launch it in yarn modes, then in HDFS (as you do not know which node will
become your AM).
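One way to wire that up, as a sketch (the property names and the convention of passing the file path as the first program argument are assumptions):
import java.io.FileInputStream
import java.util.Properties

val props = new Properties()
props.load(new FileInputStream(args(0)))         // e.g. spark-submit ... myApp.jar /path/app.properties
val inputPath  = props.getProperty("input.path")
val outputPath = props.getProperty("output.path")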
On Mon, Jun 29, 2015 at 10:51 PM, diplomatic Guru
wrote:
> I want to store the Spark application arguments such as input file, output
>
Hi,
I'm trying to run Spark without Hadoop where the data would be read and
written to local disk.
For this I have a few questions:
1. Which download do I need to use? In the download options I don't see any
binary download which does not need Hadoop. Is the only way to do this to
download the sourc
1. Here you are basically creating 2 receivers and asking each of them to
consume 3 Kafka partitions each.
- In 1.2 we have high-level consumers, so how can we restrict the number of Kafka
partitions to consume from? Say I have 300 Kafka partitions in the Kafka topic
and, as above, I gave 2 receivers and 3 ka
On 29 Jun 2015, at 14:18, Dave Ariens
mailto:dari...@blackberry.com>> wrote:
I'd like to toss out another idea that doesn't involve a complete end-to-end
Kerberos implementation. Essentially, have the driver authenticate to
Kerberos, instantiate a Hadoop file system, and serialize/cache it f
Hi
You really do not need a Hadoop installation. You can download a pre-built
version with any Hadoop, unzip it, and you are good to go. Yes, it may
complain while launching the master and workers; safely ignore that. The only
problem is while writing to a directory. Of course you will not be able to
us
Thanks, Steve--I should have tested out this theory before spamming the list.
I haven't been able to get anything working after testing this theory out.
I'll hit up the Spark dev mailing list and try to garner enough interest to get
some JIRAs cut.
I really appreciate everyone's feedback, tha
Hi all,
What is the best way to remotely debug, with breakpoints, spark apps?
Thanks in advance,
Best regards!
Pietro
I'm running a query from the BigDataBenchmark, query 1B to be precise.
When running this with Spark (1.3.1)+ mesos(0.21) in coarse grained mode
with 5 mesos slaves, through a spark shell, all is well.
However rerunning the query a few times:
scala> sqlContext.sql("SELECT pageURL, pageRank FROM
Sourav:
Please see https://spark.apache.org/docs/latest/spark-standalone.html
Cheers
On Mon, Jun 29, 2015 at 7:33 AM, ayan guha wrote:
> Hi
>
> You really donot need hadoop installation. You can dowsload a pre-built
> version with any hadoop and unzip it and you are good to go. Yes it may
> com
Also, how do you suggest catching exceptions while using connector APIs
like saveAsNewAPIHadoopFiles?
From: amit assudani mailto:aassud...@impetus.com>>
Date: Monday, June 29, 2015 at 9:55 AM
To: Tathagata Das mailto:t...@databricks.com>>
Cc: Cody Koeninger mailto:c...@koeninger.org>>,
"us
The underlying issue is a filesystem corruption on the workers.
In the case where I use HDFS, with a sufficient number of replicas, would
Spark try to launch the task on another node where a block replica is
present?
Thanks :-)
--
Henri Maxime Demoulin
2015-06-29 9:10 GMT-04:00 ayan guha :
> No
Please read:
http://search-hadoop.com/m/q3RTtig4WhHTdMa1
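One common approach (a sketch; the class name, jar, and port are placeholders): start the driver JVM with a JDWP agent and attach your IDE's remote debugger to it, e.g.
spark-submit \
  --driver-java-options "-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=5005" \
  --class com.example.MyApp myApp.jar
Then connect the IDE debugger to port 5005. For executor-side breakpoints the same agent string can go into spark.executor.extraJavaOptions, but each executor then needs its own port.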
FYI
On Mon, Jun 29, 2015 at 8:37 AM, Pietro Gentile <
pietro.gentile89.develo...@gmail.com> wrote:
> Hi all,
>
> What is the best way to remotely debug, with breakpoints, spark apps?
>
>
> Thanks in advance,
> Best regards!
>
> Pietro
>
It's a scheduler question. Spark will retry the task on the same worker.
From Spark's standpoint the data is not replicated, because Spark provides fault
tolerance through lineage, not through replication.
On 30 Jun 2015 01:50, "Max Demoulin" wrote:
> The underlying issue is a filesystem corruption on the worker
Cool.
On 29 Jun 2015 21:10, "郭谦" wrote:
> Akhil Das,
>
> You give me a new idea to solve the problem.
>
> Vova provides me a way to solve the problem just before
>
> Vova Shelgunov
>
> Sample code for submitting job from any other java app, e.g. servlet:
>
> http://pastebin.com/X1S28ivJ
>
> I app
When you call collect, you are bringing the whole dataset back to driver
memory.
On 30 Jun 2015 01:43, "hbogert" wrote:
> I'm running a query from the BigDataBenchmark, query 1B to be precise.
>
> When running this with Spark (1.3.1)+ mesos(0.21) in coarse grained mode
> with 5 mesos slave, through a
No. He is collecting the results of the SQL query, not the whole dataset.
The REPL does retain references to prior results, so it's not really the
best tool to be using when you want no-longer-needed results to be
automatically garbage collected.
On Mon, Jun 29, 2015 at 9:13 AM, ayan guha wrote:
Hello,
I noticed that some of the spark-core APIs are not available in the version
1.4.0 release of SparkR, for example textFile(), flatMap(), etc. The code
seems to be there but is not exported in NAMESPACE. They were all available
as part of the AMPLab Extras previously. I wasn't able to find any
e
I see. Thank you for your help!
--
Henri Maxime Demoulin
2015-06-29 11:57 GMT-04:00 ayan guha :
> It's a scheduler question. Spark will retry the task on the same worker.
> From spark standpoint data is not replicated because spark provides fault
> tolerance but lineage not by replication.
> On
Actually, Hadoop InputFormats can still be used to read and write from
"file://", "s3n://", and similar schemes. You just won't be able to
read/write to HDFS without installing Hadoop and setting up an HDFS cluster.
To summarize: Sourav, you can use any of the prebuilt packages (i.e.
anything othe
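A minimal sketch of local-filesystem I/O with such a prebuilt package (the paths are placeholders):
val lines = sc.textFile("file:///data/input.txt")
lines.map(_.toUpperCase).saveAsTextFile("file:///data/output")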
Hi All ,
What is the best possible way to load multiple data tables using spark sql
Map options = new HashMap<>();
options.put("driver", MYSQLDR);
options.put("url", MYSQL_CN_URL);
options.put("dbtable","(select * from courses);
*can i add multiple tables to options map options.put("dbtable1"
Would there be a way to force the 'old' data out? Because at this point
I'll have to restart the shell every couple of queries to get meaningful
timings which are comparable to spark-submit.
On Jun 29, 2015 6:20 PM, "Mark Hamstra" wrote:
> No. He is collecting the results of the SQL query, not
To my knowledge this is not supported.
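Loading each table separately into its own DataFrame does work; a rough Scala sketch, reusing the driver/url constants from the question (table names are assumptions):
val courses = sqlContext.read.format("jdbc").options(Map(
  "driver"  -> MYSQLDR,
  "url"     -> MYSQL_CN_URL,
  "dbtable" -> "(select * from courses) as courses")).load()
val students = sqlContext.read.format("jdbc").options(Map(
  "driver"  -> MYSQLDR,
  "url"     -> MYSQL_CN_URL,
  "dbtable" -> "students")).load()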
On Mon, Jun 29, 2015 at 10:47 AM, Ashish Soni wrote:
>
> Hi All ,
>
> What is the best possible way to load multiple data tables using spark sql
>
> Map options = new HashMap<>();
> options.put("driver", MYSQLDR);
> options.put("url", MYSQL_CN_URL);
> optio
The RDD API is pretty complex and we are not yet sure we want to export all
those methods in the SparkR API. We are working towards exposing a more
limited API in upcoming versions. You can find some more details in the
recent Spark Summit talk at
https://spark-summit.org/2015/events/sparkr-the-pas
I'm having trouble using "select pow(col) from table". It seems the function
is not registered for Spark SQL. Is this on purpose or an oversight? I'm
using pyspark.
Hi
*Can't read a text file from S3 to create an RDD*
after setting the configuration:
val hadoopConf=sparkContext.hadoopConfiguration;
hadoopConf.set("fs.s3.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem")
hadoopConf.set("fs.s3.awsAccessKeyId",yourAccessKey)
hadoopConf.set("fs.s3.awsSecretAcc
Please check your ACL properties.
On Monday, June 29, 2015 11:29 AM, didi wrote:
Hi
*Cant read text file from s3 to create RDD
*
after setting the configuration
val hadoopConf=sparkContext.hadoopConfiguration;
hadoopConf.set("fs.s3.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSys
Hi Jey,
Thanks for your inputs.
Probably I'm getting the error as I'm trying to read a CSV file from the local
filesystem using the com.databricks.spark.csv package. Probably this package has a
hard-coded dependency on Hadoop, as it is trying to read the input format from
HadoopRDD.
Can you please confirm?
Here is wha
I want to apply some logic on the basis of a FIXED count of the number of
tuples in each RDD. *Suppose we emit one RDD for every 5 tuples of the previous
RDD.*
--
Thanks & Regards,
Anshu Shukla
Hi,
not sure what the context is but I think you can do something similar with
mapPartitions:
rdd.mapPartitions { iterator =>
iterator.grouped(5).map { tupleGroup => emitOneRddForGroup(tupleGroup) }
}
The edge case is when the final grouping doesn't have exactly 5 items, if
that matters.
On
Hi Sourav,
The error seems to be caused by the fact that your URL starts with
"file://" instead of "file:///".
Also, I believe the current version of the package for Spark 1.4 with Scala
2.11 should be "com.databricks:spark-csv_2.11:1.1.0".
-Jey
On Mon, Jun 29, 2015 at 12:23 PM, Sourav Mazumder
In pyspark, when I convert from rdds to dataframes it looks like the rdd is
being materialized/collected/repartitioned before it's converted to a
dataframe.
Just wondering if there are any guidelines for doing this conversion, and
whether it's best to do it early to get the performance benefits of
da
Hi Bob,
I tested your scenario with Spark 1.3, and I assumed you did not miss the
second parameter of pow(x,y).
from pyspark.sql import SQLContext
sqlContext = SQLContext(sc)
df = sqlContext.jsonFile("/vagrant/people.json")
# Displays the content of the DataFrame to stdout
df.show()
# These are all fine
1.4 and I did set the second parameter. The DSL works fine but trying out
with SQL doesn't.
On Mon, Jun 29, 2015, 4:32 PM Salih Oztop wrote:
> Hi Bob,
> I tested your scenario with Spark 1.3 and I assumed you did not miss the
> second parameter of pow(x,y)
>
> from pyspark.sql import SQLContext
Interesting. Looking at the definitions, sql.functions.pow is defined only
for (col,col). Just as an experiment, create a column with value 2 and see
if that works.
Cheers
On Mon, Jun 29, 2015 at 1:34 PM, Bob Corsaro wrote:
> 1.4 and I did set the second parameter. The DSL works fine but trying
I have a join + leftOuterJoin + reduceByKey.
The join operation worked but the left outer join failed.
There are millions of log lines, and when I did this:
~ dvasthimal$ cat ~/Desktop/errors | grep " ERROR " | grep -v
"client.TransportResponseHandler" | grep -v "shuffle.RetryingBlockFetcher"
| grep -
I recommend writing using dstream.foreachRDD, and then
rdd.saveAsNewAPIHadoopFile inside try catch. See the implementation of
dstream.saveAsNewAPIHadoopFiles
https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/PairDStreamFunctions.scala#L716
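A rough sketch of that pattern, assuming the DStream holds (Text, Text) pairs and `outputDir` is a placeholder:
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

dstream.foreachRDD { (rdd, time) =>
  try {
    rdd.saveAsNewAPIHadoopFile(
      s"$outputDir/batch-${time.milliseconds}",
      classOf[Text], classOf[Text],
      classOf[TextOutputFormat[Text, Text]])
  } catch {
    case e: Exception =>
      // log and decide whether to retry or skip this batch
      println(s"batch at $time failed: ${e.getMessage}")
  }
}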
On
My job has multiple stages; each time a stage fails I have to restart the
entire app.
I understand Spark restarts failed tasks.
However, is there a way to restart a Spark app from the failed stage?
--
Deepak
Hi All,
While using checkpoints (using HDFS), if connectivity to the Hadoop cluster is
lost for a while and gets restored some time later, what happens to the running
streaming job?
Is it always assumed that the connection to the checkpoint FS (in this case HDFS) would
ALWAYS be HA and would never fail for
Hi, tested with Spark 1.4.
We need to import pow, otherwise it uses the Python version of pow I guess.
>>> from pyspark.sql.functions import pow
>>> df.select(pow(df.age, df.age)).show()
15/06/29 22:36:05 INFO Ta
+-------------------+
|    POWER(age, age)|
+-------------------+
|               null|
|2.05891132094649E44|
+-------------------+
I'm trying to compute the eigendecomposition of a matrix in a portion of my
code, using mllib.linalg.EigenValueDecomposition
(https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/linalg/EigenValueDecomposition.scala
)
as follows:
val tol = 1e-10
val maxIter
HI Jey,
Not much of luck.
If I use the class com.databricks:spark-csv_2.11:1.1.0 or
com.databricks.spark.csv_2.11.1.1.0 I get a class-not-found
error. With com.databricks.spark.csv I don't get the class-not-found error,
but I still get the previous error even after using file:/// in the URI.
Regar
My error was related to Scala version. Upon further reading, I realized
that it takes some effort to get Spark working with Scala 2.11.
I've reverted to using 2.10 and moved past that error. Now I hit the issue
you mentioned. Waiting for 1.4.1.
Srikanth
On Fri, Jun 26, 2015 at 9:10 AM, Roberto Co
I get the same error even when I define covOperator not to use a matrix at
all:
def covOperator(v: BDV[Double]): BDV[Double] = { v }
Hi All
here goes my first question.
Here is my use case:
I have 1 TB of data I want to process on EC2 using Spark.
I have uploaded the data to an EBS volume.
The instructions on the Amazon EC2 setup explain:
"*If your application needs to access large datasets, the fastest way to do
that is to load them from
The format is still "com.databricks.spark.csv", but the parameter passed to
spark-shell is "--packages com.databricks:spark-csv_2.11:1.1.0".
On Mon, Jun 29, 2015 at 2:59 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:
> HI Jey,
>
> Not much of luck.
>
> If I use the class com.databricks
I am running a Spark Streaming example from the Learning Spark book with one
change. The change I made was for streaming a file from HDFS.
val lines = ssc.textFileStream("hdfs:/user/hadoop/spark/streaming/input")
I ran the application a number of times, and every time I dropped a new file into
the input dir
Hello,
It is my understanding that shuffles are written to disk and that they act
as checkpoints.
I wonder if this is true only within a job, or across jobs. Please note
that I use the words job and stage carefully here.
1. Can a shuffle created during JobN be used to skip many stages from
JobN+1
Ah, for #3, maybe this is what *rdd.checkpoint* does!
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.RDD
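A minimal sketch of how that is typically used (the checkpoint directory is a placeholder and baseRdd is assumed to be an RDD of Ints):
sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")            // placeholder directory
val derived = baseRdd.map(x => (x % 10, 1L)).reduceByKey(_ + _) // some expensive lineage
derived.cache()       // avoid recomputing when the checkpoint job materializes it
derived.checkpoint()  // lineage is truncated once the data is written
derived.count()       // forces materialization and the checkpoint write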
Thomas
On Mon, Jun 29, 2015 at 7:12 PM, Thomas Gerber
wrote:
> Hello,
>
> It is my understanding that shuffle are written on disk and that they act
> as chec
Hi, TD
In my code,
I write like this:
dstream.foreachRDD { rdd =>
try {
} catch {
}
}
it will still throw an exception, and the driver will be killed...
I need to catch the exception in rdd.foreachPartition,
just like this, so I need to retry by myself:
dstream.foreach
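A minimal sketch of that shape (process() is a hypothetical placeholder for the per-record work):
dstream.foreachRDD { rdd =>
  rdd.foreachPartition { partition =>
    try {
      partition.foreach(record => process(record))
    } catch {
      case e: Exception =>
        // log, then retry or skip just this partition's work
        println(s"partition failed: ${e.getMessage}")
    }
  }
}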
Regarding 1 and 2, yes shuffle output is stored on the worker local disks and
will be reused across jobs as long as they’re available. You can identify when
they’re used by seeing skipped stages in the job UI. They are periodically
cleaned up based on available space of the configured spark.loca
What do you mean by "new file", do you upload an already existing file onto
HDFS or create a new one locally and then upload it to HDFS?
bit1...@163.com
From: ravi tella
Date: 2015-06-30 09:59
To: user
Subject: spark streaming HDFS file issue
I am running a spark streaming example from learnin
Thanks Silvio.
On Mon, Jun 29, 2015 at 7:41 PM, Silvio Fiorito <
silvio.fior...@granturing.com> wrote:
> Regarding 1 and 2, yes shuffle output is stored on the worker local
> disks and will be reused across jobs as long as they’re available. You can
> identify when they’re used by seeing skipp
It seems the root cause of the delay was the sheer size of the DAG for
those jobs, which are towards the end of a long series of jobs.
To reduce it, you can probably try to checkpoint (rdd.checkpoint) some
previous RDDs. That will:
1. save the RDD on disk
2. remove all references to the parents of
Yes, the observation is correct. That connectivity is assumed to be HA.
On Mon, Jun 29, 2015 at 2:34 PM, Amit Assudani
wrote:
> Hi All,
>
> While using Checkpoints ( using HDFS ), if connectivity to hadoop
> cluster is lost for a while and gets restored in some time, what happens to
> the run
Hi Jey,
This solves the class not found problem. Thanks.
But the input format issue is still not resolved. It looks like it is still
trying to create a HadoopRDD; I don't know why. The error message goes like:
java.lang.RuntimeException: Error in configuring object
at
org.apache.hadoop.util.Refl
Order the tasks by status and see if there are any with status failed.
On Mon, 29 Jun 2015 at 2:26 pm ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> Attached pic shows the error that is displayed in Spark UI.
>
>
> On Mon, Jun 29, 2015 at 2:22 PM, ÐΞ€ρ@Ҝ (๏̯͡๏)
> wrote:
>
>> I have a join + leftOuterJoin + reduceByKey.
All InputFormats will use HadoopRDD or NewHadoopRDD. Do you use "file:///"
instead of "file://"?
On Mon, Jun 29, 2015 at 8:40 PM, Sourav Mazumder <
sourav.mazumde...@gmail.com> wrote:
> Hi Jey,
>
> This solves the class not found problem. Thanks.
>
> But still the inputs format is not yet resolve
There are many with status failed. The attached pic shows exception traces.
On Mon, Jun 29, 2015 at 9:07 PM, Matthew Jones wrote:
> Order the tasks by status and see if there are any with status failed.
> On Mon, 29 Jun 2015 at 2:26 pm ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
>
>> Attached pic shows the error that is display
I can't see any failed tasks in the attached pic.
On Mon, Jun 29, 2015 at 9:31 PM ÐΞ€ρ@Ҝ (๏̯͡๏) wrote:
> There are many with failed. Attached pic shows exception traces.
>
> On Mon, Jun 29, 2015 at 9:07 PM, Matthew Jones wrote:
>
>> Order the tasks by status and see if there are any with status
Hi Folks,
I just stepped up from 1.3.1 to 1.4.0; the most notable difference for me so
far is the DataFrame reader/writer. Previously:
val myData = hiveContext.load("s3n://someBucket/somePath/","parquet")
Now:
val myData = hiveContext.read.parquet("s3n://someBucket/somePath")
Using the ori
Facing the following error message while performing sbt/sbt assembly:
Error occurred during initialization of VM
Could not reserve enough space for object heap
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
Please give me a solution for it.
--
Hey Spark Users,
I'm writing a demo with Spark and HBase. What I've done is packaging a
**fat jar**: place dependencies in `build.sbt`, and use `sbt assembly` to
package **all dependencies** into one big jar. The rest work is copy the
fat jar to Spark master node and then launch by `spark-submit`.
You can pass `--packages your:comma-separated:maven-dependencies` to spark
submit if you have Spark 1.3 or greater.
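For example (the coordinates below are hypothetical; use the actual group:artifact:version of your dependencies):
spark-submit \
  --packages org.apache.hbase:hbase-client:1.0.1,com.google.guava:guava:14.0.1 \
  --class com.example.Demo demo.jar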
Best regards,
Burak
On Mon, Jun 29, 2015 at 10:46 PM, SLiZn Liu wrote:
> Hey Spark Users,
>
> I'm writing a demo with Spark and HBase. What I've done is packaging a
> **fat jar**:
Hi Burak,
Is the `--package` flag only available for Maven, with no sbt support?
On Tue, Jun 30, 2015 at 2:26 PM Burak Yavuz wrote:
> You can pass `--packages your:comma-separated:maven-dependencies` to spark
> submit if you have Spark 1.3 or greater.
>
> Best regards,
> Burak
>
> On Mon, Jun 29, 2015 a