While the job is running, just look in the directory and see what's the root
cause of it (is it the logs? is it the shuffle? etc). Here are a few
configuration options which you can try:
- Disable shuffle spill: spark.shuffle.spill=false (it might end up in an OOM)
- Enable log rotation (see the sketch below):
You can also set these in the spark-env.sh file :
export SPARK_WORKER_DIR=/mnt/spark/
export SPARK_LOCAL_DIR=/mnt/spark/
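As a rough sketch, assuming the standalone deploy mode, executor log rotation can be turned on with the rolling-log properties in conf/spark-defaults.conf (the values below are illustrative, not from the original message):
# roll executor logs on a time interval and keep only the last 7 files
spark.executor.logs.rolling.strategy time
spark.executor.logs.rolling.time.interval daily
spark.executor.logs.rolling.maxRetainedFiles 7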
Thanks
Best Regards
On Mon, Jul 6, 2015 at 12:29 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
While the job is running, just look in the directory and see whats
It's complaining about a missing JDBC driver. Add it to your driver classpath like:
./bin/spark-sql --driver-class-path
/home/akhld/sigmoid/spark/lib/mysql-connector-java-5.1.32-bin.jar
Thanks
Best Regards
On Mon, Jul 6, 2015 at 11:42 AM, sandeep vura sandeepv...@gmail.com wrote:
Hi Sparkers,
I am
If you want a long-running application, then go with Spark Streaming (which
kind of blocks your resources). On the other hand, if you use the job server
then you can actually use the resources (CPUs) for other jobs also when
your DB job is not using them.
Thanks
Best Regards
On Sun, Jul 5, 2015 at
Looks like it spent more time writing/transferring the 40GB of shuffle data
when you used Kryo. And surprisingly, JavaSerializer has 700MB of shuffle?
Thanks
Best Regards
On Sun, Jul 5, 2015 at 12:01 PM, Gavin Liu ilovesonsofanar...@gmail.com
wrote:
Hi,
I am using TeraSort benchmark from
With the binary I think it might not be possible, but if you download
the sources and build them yourself then you can remove this function
https://github.com/apache/spark/blob/master/repl/scala-2.10/src/main/scala/org/apache/spark/repl/SparkILoop.scala#L1023
which initializes the SQLContext.
I think you can open up a JIRA; I'm not sure if this PR
https://github.com/apache/spark/pull/2209/files (SPARK-2890
https://issues.apache.org/jira/browse/SPARK-2890) broke the validation
piece.
Thanks
Best Regards
On Fri, Jul 3, 2015 at 4:29 AM, Koert Kuipers ko...@tresata.com wrote:
i am
Can you paste the code? Something is missing.
Thanks
Best Regards
On Fri, Jul 3, 2015 at 3:14 PM, Jem Tucker jem.tuc...@gmail.com wrote:
In the driver when running spark-submit with --master yarn-client
On Fri, Jul 3, 2015 at 10:23 AM Akhil Das ak...@sigmoidanalytics.com
wrote:
Where does
Did you try:
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
Thanks
Best Regards
On Fri, Jul 3, 2015 at 2:27 PM, 1106944...@qq.com 1106944...@qq.com wrote:
Hi all,
Anyone build the Spark 1.4 source code for SparkR with maven/sbt? What's the
command? using
Where does it return null? Within the driver or in the executor? I just
tried System.console.readPassword in spark-shell and it worked.
Thanks
Best Regards
On Fri, Jul 3, 2015 at 2:32 PM, Jem Tucker jem.tuc...@gmail.com wrote:
Hi,
We have an application that requires a username/password to
RDDs which are no longer required will be removed from memory by Spark
itself (which you could consider lazy?).
Thanks
Best Regards
On Wed, Jul 1, 2015 at 7:48 PM, Jem Tucker jem.tuc...@gmail.com wrote:
Hi,
The current behavior of rdd.unpersist() appears to not be lazily executed
and
Have a look at sc.wholeTextFiles; you can use it to read the whole CSV
contents into the value, then split it on \n, add the pieces to a list
and return it.
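As a minimal sketch (the HDFS path is a placeholder), reading whole files and splitting each one into lines could look like:
// wholeTextFiles returns (file path, file contents) pairs
val files = sc.wholeTextFiles("hdfs:///data/csv/")
// split each file's contents into individual lines
val lines = files.flatMap { case (_, contents) => contents.split("\n") }
lines.take(10).foreach(println)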
*sc.wholeTextFiles:*
Read a directory of text files from HDFS, a local file system (available on
all nodes), or any Hadoop-supported
Looks like a jar conflict to me.
java.lang.NoSuchMethodException:
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData.getBytesWritten()
You probably have multiple versions of the same jar in the classpath.
Thanks
Best Regards
On Wed, Jul 1, 2015 at 6:58 AM, nkd kalidas.nimmaga...@gmail.com
It says:
Caused by: java.net.ConnectException: Connection refused: slave2/...:54845
Could you look in the executor logs (stderr on slave2) and see what made it
shut down? Since you are doing a join there's a high possibility of OOM etc.
Thanks
Best Regards
On Wed, Jul 1, 2015 at 10:20 AM,
Now I'm having a strange urge to try this on KBOX
http://kevinboone.net/kbox.html :/
Thanks
Best Regards
On Wed, Jul 1, 2015 at 9:10 AM, Exie tfind...@prodevelop.com.au wrote:
FWIW, I had some trouble getting Spark running on a Pi.
My core problem was using snappy for compression as it
Have a look at https://spark.apache.org/docs/latest/job-scheduling.html
Thanks
Best Regards
On Wed, Jul 1, 2015 at 12:01 PM, Nirmal Fernando nir...@wso2.com wrote:
Hi All,
Is there any additional configs that we have to do to perform $subject?
--
Thanks regards,
Nirmal
Associate
Have a look at the window and updateStateByKey operations (see the sketch
below). If you are looking for something more sophisticated, then you can
persist these streams in an intermediate storage (say for X duration) like
HBase or Cassandra or any other DB and do global aggregations with those.
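A minimal sketch of updateStateByKey, keeping a running count per key across batches (the socket source, batch interval and checkpoint path are placeholders):
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))
ssc.checkpoint("/tmp/checkpoint")   // stateful operations require checkpointing

val pairs = ssc.socketTextStream("localhost", 9999).map(word => (word, 1))

// merge each batch's new values into the running state for that key
val updateFunc = (values: Seq[Int], state: Option[Int]) => Some(values.sum + state.getOrElse(0))
val runningCounts = pairs.updateStateByKey(updateFunc)
runningCounts.print()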
Thanks
.addJar works for me when I run it as a stand-alone application (without
using spark-submit).
Thanks
Best Regards
On Tue, Jun 30, 2015 at 7:47 PM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, running into a pretty strange issue:
I'm setting
spark.executor.extraClassPath
Since it's a Windows machine, you are very likely hitting this one:
https://issues.apache.org/jira/browse/SPARK-2356
Thanks
Best Regards
On Wed, Jul 1, 2015 at 12:36 AM, Sourav Mazumder
sourav.mazumde...@gmail.com wrote:
Hi,
I'm running Spark 1.4.0 without Hadoop. I'm using the binary
Have a look at the StageInfo
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.scheduler.StageInfo
class; it has a stageFailed method you could make use of. I don't understand
the point of restarting the entire application.
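As a hedged sketch, assuming what you really want is to react to failed stages from your own code, you could check StageInfo.failureReason from a listener instead of calling stageFailed directly:
import org.apache.spark.scheduler.{SparkListener, SparkListenerStageCompleted}

sc.addSparkListener(new SparkListener {
  override def onStageCompleted(stageCompleted: SparkListenerStageCompleted): Unit = {
    val info = stageCompleted.stageInfo
    // failureReason is set when the stage failed
    if (info.failureReason.isDefined) {
      println(s"Stage ${info.stageId} failed: ${info.failureReason.get}")
    }
  }
})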
Thanks
Best Regards
On Tue, Jun 30, 2015 at
How much memory do you have on that machine? You can increase the heap space
with *export _JAVA_OPTIONS=-Xmx2g*
Thanks
Best Regards
On Tue, Jun 30, 2015 at 11:00 AM, Chintan Bhatt
chintanbhatt...@charusat.ac.in wrote:
Facing following error message while performing sbt/sbt assembly
Error
This:
Caused by: java.util.concurrent.TimeoutException: Futures timed out after
[30 seconds]
could happen for many reasons; one of them could be insufficient
memory. Are you running all 20 apps on the same node? How are you
submitting the apps? (with spark-submit?). I see you have
Try this way:
val data = sc.textFile("s3n://ACCESS_KEY:SECRET_KEY@mybucket/temp/")
Thanks
Best Regards
On Mon, Jun 29, 2015 at 11:59 PM, didi did...@gmail.com wrote:
Hi
*Cant read text file from s3 to create RDD
*
after setting the configuration
val
Cool.
On 29 Jun 2015 21:10, 郭谦 buptguoq...@gmail.com wrote:
Akhil Das,
You give me a new idea to solve the problem.
Vova provides me a way to solve the problem just before
Vova Shelgunovvvs...@gmail.com
Sample code for submitting job from any other java app, e.g. servlet:
http
Here's a bunch of configuration options for that:
https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior
Thanks
Best Regards
On Fri, Jun 26, 2015 at 10:37 PM, igor.berman igor.ber...@gmail.com wrote:
Hi,
wanted to get some advice regarding tunning spark application
I see for some of
Which version of Spark are you using? You can try changing the heap size
manually with *export _JAVA_OPTIONS=-Xmx5g*
Thanks
Best Regards
On Fri, Jun 26, 2015 at 7:52 PM, Yifan LI iamyifa...@gmail.com wrote:
Hi,
I just encountered the same problem, when I run a PageRank program which
has lots
You can create a SparkContext in your program and run it as a standalone
application without using spark-submit.
Here's something that will get you started:
//Create SparkContext
val sconf = new SparkConf()
.setMaster("spark://spark-ak-master:7077")
.setAppName("Test")
.
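To complete the sketch (the input path and the word-count body below are purely illustrative assumptions, not from the original message), you would then build the SparkContext from the conf and use it directly:
// build the context from the conf above
val sc = new SparkContext(sconf)
val counts = sc.textFile("/path/to/input")   // illustrative input path
  .flatMap(_.split(" "))
  .map((_, 1))
  .reduceByKey(_ + _)
counts.take(10).foreach(println)
sc.stop()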
The input size is 512.0 MB (hadoop) / 4159106. Can this be reduced to 64
MB so as to increase the number of tasks? Similar to the split size that
increases the number of mappers in Hadoop M/R.
On Thu, Jun 25, 2015 at 12:06 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Look in the tuning
Try adding them to SPARK_CLASSPATH in your conf/spark-env.sh file.
Thanks
Best Regards
On Thu, Jun 25, 2015 at 9:31 PM, Bin Wang binwang...@gmail.com wrote:
I am trying to run the Spark example code HBaseTest from command line
using spark-submit instead run-example, in that case, I can
You just need to set your HADOOP_HOME, which appears to be null in the
stack trace. If you don't have winutils.exe, then you can download
https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
and put it there.
Thanks
Best Regards
On Thu, Jun 25, 2015 at 11:30 PM, Ashic
Which distributed database are you referring to here? Spark can connect to
almost all the databases out there (you just need to pass the
Input/Output Format classes, or there are a bunch of connectors also
available).
Thanks
Best Regards
On Fri, Jun 26, 2015 at 12:07 PM, louis.hust
Why do you want to do that?
Thanks
Best Regards
On Thu, Jun 25, 2015 at 10:16 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
Apparently, the sc.parallelize(..) operation is performed in the driver
program, not in the workers! Is it possible to do this in the worker process
for the sake of
It's a Scala version conflict; can you paste your build.sbt file?
Thanks
Best Regards
On Fri, Jun 26, 2015 at 7:05 AM, stati srikanth...@gmail.com wrote:
Hello,
When I run a spark job with spark-submit it fails with below exception for
code line
/*val webLogDF =
JavaPairInputDStream<String, String> messages =
    KafkaUtils.createDirectStream(
        jssc,
        String.class,
        String.class,
        StringDecoder.class,
        StringDecoder.class,
        kafkaParams,
        topicsSet
    );
Here:
jssc = JavaStreamingContext
String.class = Key ,
Are those provided Spark libraries compatible with Scala 2.11?
Thanks
Best Regards
On Fri, Jun 26, 2015 at 4:48 PM, Srikanth srikanth...@gmail.com wrote:
Thanks Akhil for checking this out. Here is my build.sbt.
name := "Weblog Analysis"
version := "1.0"
scalaVersion := "2.11.5"
javacOptions
(๏̯͡๏) deepuj...@gmail.com wrote:
It's taking an hour, and on Hadoop it takes 1h 30m; is there a way to make
it run faster?
On Wed, Jun 24, 2015 at 11:39 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Cool. :)
On 24 Jun 2015 23:44, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
Its running now
a different guava dependency but the error
does go away this way
On Wed, Jun 24, 2015 at 10:04 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you add those jars to the SPARK_CLASSPATH and give it a try?
Thanks
Best Regards
On Wed, Jun 24, 2015 at 12:07 AM, Yana Kadiyska yana.kadiy
Here you go https://amplab-extras.github.io/SparkR-pkg/
Thanks
Best Regards
On Thu, Jun 25, 2015 at 12:39 PM, 1106944...@qq.com 1106944...@qq.com
wrote:
Hi all
I have installed spark1.4, then want to use sparkR. Assume spark
master ip = node1; how to start sparkR? and submit job to
Can you look in the worker logs and see what's going on? It may happen that
you ran out of disk space etc.
Thanks
Best Regards
On Thu, Jun 25, 2015 at 12:08 PM, barmaley o...@solver.com wrote:
I'm running Spark 1.3.1 on AWS... Having long-running application (spark
context) which accepts and
That totally depends on the way you extract the data. It would be helpful if
you could paste your code so that we can understand it better.
Thanks
Best Regards
On Wed, Jun 24, 2015 at 2:32 PM, William Ferrell wferr...@gmail.com wrote:
Hello -
I am using Apache Spark 1.2.1 via pyspark. Thanks,
Is this the official R package?
It says: *NOTE: The API from the upcoming Spark release (1.4)
will not have the same API as described here.*
Thanks,
JC
2015-06-25 10:55 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com:
Here you go https://amplab-extras.github.io/SparkR-pkg/
Thanks
Best
Depending on the amount of memory you have, you could allocate 60-80%
of it for the Spark worker process. The DataNode doesn't require too
much memory.
On 23 Jun 2015 21:26, maxdml max...@cs.duke.edu wrote:
I'm wondering if there is a real benefit for splitting my memory in two for
Can you look a bit more in the error logs? It could be getting killed
because of an OOM etc. One thing you can try is to set
spark.shuffle.blockTransferService to nio instead of netty.
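For example, as a one-line sketch on the SparkConf (the sparkConf variable name is illustrative):
// switch the shuffle block transfer service from the default netty to nio
sparkConf.set("spark.shuffle.blockTransferService", "nio")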
Thanks
Best Regards
On Wed, Jun 24, 2015 at 5:46 AM, ÐΞ€ρ@Ҝ (๏̯͡๏) deepuj...@gmail.com wrote:
I have a Spark job
Can you add those jars to the SPARK_CLASSPATH and give it a try?
Thanks
Best Regards
On Wed, Jun 24, 2015 at 12:07 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, I have been using Spark against an external Metastore service
which runs Hive with Cdh 4.6
In Spark 1.2, I was
A screenshot of your framework running would also be helpful. How many
cores does it have?
Did you try running it in coarse-grained mode?
Try adding these to the conf:
sparkConf.set("spark.mesos.coarse", "true")
sparkConf.set("spark.cores.max", "2")
Thanks
Best Regards
On Wed, Jun 24, 2015 at 1:35 AM,
)
at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
On Wed, Jun 24, 2015 at 7:16 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Can you look a bit more in the error logs? It could be getting
Why don't you do a normal .saveAsTextFiles?
Thanks
Best Regards
On Mon, Jun 22, 2015 at 11:55 PM, anshu shukla anshushuk...@gmail.com
wrote:
Thanks for the reply!!
Yes, either it should write on any machine of the cluster, or can you please
help me with how to do this. Previously I was
Looks like a hostname conflict to me.
15/06/22 17:04:45 WARN Utils: Your hostname, datasci01.dev.abc.com resolves
to a loopback address: 127.0.0.1; using 10.0.3.197 instead (on interface
eth0)
15/06/22 17:04:45 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to
another address
Can you paste
Maybe while producing the messages you can make them keyed messages with
the timestamp as the key; on the consumer end you can easily read the
key (which will be the timestamp) from the message. If the network is fast
enough, then I think there would only be a small millisecond lag.
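A minimal sketch of the producer side, assuming the old Kafka 0.8 producer API (the broker address, topic and payload are placeholders):
import java.util.Properties
import kafka.producer.{KeyedMessage, Producer, ProducerConfig}

val props = new Properties()
props.put("metadata.broker.list", "localhost:9092")
props.put("serializer.class", "kafka.serializer.StringEncoder")
val producer = new Producer[String, String](new ProducerConfig(props))

// use the send timestamp as the message key so the consumer can compute the lag
producer.send(new KeyedMessage[String, String]("events", System.currentTimeMillis.toString, "payload"))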
Thanks
Best
Well, that (the stage information) is an ASCII representation of the
web UI (running on port 4040). Since you set local[4] you will have 4
threads for your computation, and since you have 2 receivers, you are
left with 2 threads to process ((0 + 2) -- this 2 is your 2 threads). And
the
Did you happen to try this?
JavaPairRDD<Integer, String> hadoopFile = sc.hadoopFile(
    "/sigmoid", DataInputFormat.class, LongWritable.class,
    Text.class)
Thanks
Best Regards
On Tue, Jun 23, 2015 at 6:58 AM, 付雅丹 yadanfu1...@gmail.com wrote:
Hello, everyone! I'm new in spark.
Use *spark.cores.max* to limit the CPUs per job; then you can easily
accommodate your third job also.
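For instance, a hedged one-liner (the cap of 4 cores and the app name are illustrative):
// limit this application to 4 cores so the remaining cores stay free for other jobs
val conf = new SparkConf().setAppName("job-1").set("spark.cores.max", "4")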
Thanks
Best Regards
On Tue, Jun 23, 2015 at 5:07 PM, Wojciech Pituła w.pit...@gmail.com wrote:
I have set up a small standalone cluster: 5 nodes, every node has 5GB of
memory and 8 cores. As you
Yes.
Thanks
Best Regards
On Mon, Jun 22, 2015 at 8:33 PM, Murthy Chelankuri kmurt...@gmail.com
wrote:
I have more than one jar. Can we call sc.addJar multiple times, once for each
dependent jar?
On Mon, Jun 22, 2015 at 8:30 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Try sc.addJar instead
You can use fileStream for that; look at the XMLInputFormat
https://github.com/apache/mahout/blob/ad84344e4055b1e6adff5779339a33fa29e1265d/examples/src/main/java/org/apache/mahout/classifier/bayes/XmlInputFormat.java
from Mahout. It should give you the full XML object as one record (as opposed
to an XML
Could you elaborate a bit more? What do you mean by setting up a standalone
server? And what is leading you to those exceptions?
Thanks
Best Regards
On Mon, Jun 22, 2015 at 2:22 AM, nizang ni...@windward.eu wrote:
hi,
I'm trying to setup a standalone server, and in one of my tests, I got the
Totally depends on the use-case that you are solving with Spark, for
instance there was some discussion around the same which you could read
over here
http://apache-spark-user-list.1001560.n3.nabble.com/How-does-one-decide-no-of-executors-cores-memory-allocation-td23326.html
Thanks
Best Regards
It's pretty straightforward; this should get you started:
http://stackoverflow.com/questions/24896233/how-to-save-apache-spark-schema-output-in-mysql-database
Thanks
Best Regards
On Mon, Jun 22, 2015 at 12:39 PM, Manohar753
manohar.re...@happiestminds.com wrote:
Hi Team,
How to split and
How are you submitting the application? Could you paste the code that you
are running?
Thanks
Best Regards
On Mon, Jun 22, 2015 at 5:37 PM, Sean Barzilay sesnbarzi...@gmail.com
wrote:
I am trying to run a function on every line of a parquet file. The
function is in an object. When I run the
Like this?
val rawXmls = ssc.fileStream[LongWritable, Text, XmlInputFormat](path)
Thanks
Best Regards
On Mon, Jun 22, 2015 at 5:45 PM, Yong Feng fengyong...@gmail.com wrote:
Thanks a lot, Akhil
I saw this mail thread before, but still do not understand how
Option 1 should be fine; Option 2 would be bound a lot by the network as the
data increases over time.
Thanks
Best Regards
On Mon, Jun 22, 2015 at 5:59 PM, Ashish Soni asoni.le...@gmail.com wrote:
Hi All ,
What is the Best Way to install and Spark Cluster along side with Hadoop
Cluster , Any
Have a look at
http://s3.thinkaurelius.com/docs/titan/0.5.0/titan-io-format.html. You could
use those Input/Output formats with the newAPIHadoopRDD API call.
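The general shape of that call, as a hedged sketch (the format and key/value classes below are stand-ins; substitute the Titan classes from the page linked above):
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

val hadoopConf = new Configuration()   // put the Titan/storage-backend settings here
val rdd = sc.newAPIHadoopRDD(
  hadoopConf,
  classOf[TextInputFormat],   // replace with the Titan InputFormat
  classOf[LongWritable],      // key class expected by that format
  classOf[Text])              // value class expected by that format
println(rdd.count())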
Thanks
Best Regards
On Sun, Jun 21, 2015 at 8:50 PM, Madabhattula Rajesh Kumar
mrajaf...@gmail.com wrote:
Hi,
How to connect TItan
Not sure, but try removing the provided scope, or create a lib directory in
the project home and put that jar there.
On 20 Jun 2015 18:08, Ritesh Kumar Singh riteshoneinamill...@gmail.com
wrote:
Hi,
I'm using IntelliJ ide for my spark project.
I've compiled spark 1.3.0 for scala 2.11.4 and
One workaround would be to remove/move the files from the input directory
once they have been processed.
Thanks
Best Regards
On Fri, Jun 19, 2015 at 5:48 AM, Haopu Wang hw...@qilinsoft.com wrote:
Akhil,
From my test, I can see the files in the last batch will always be
reprocessed upon
This is how I used to build an assembly jar with sbt.
Your build.sbt file would look like this:
import AssemblyKeys._
assemblySettings
name := "FirstScala"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.3.1"
libraryDependencies +=
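For the AssemblyKeys/assemblySettings style used above you'd also need the sbt-assembly plugin declared in project/plugins.sbt; a hedged sketch (the plugin version is an assumption, not from the original message):
// project/plugins.sbt -- old-style sbt-assembly matching the AssemblyKeys import above
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")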
You can try setting these properties:
.set("spark.local.dir", "/mnt/spark/")
.set("java.io.tmpdir", "/mnt/spark/")
Thanks
Best Regards
On Fri, Jun 19, 2015 at 8:28 AM, yuemeng (A) yueme...@huawei.com wrote:
hi,all
if i want to change the /tmp folder to any other folder for spark ut use
Like this?
val add_msgs = KafkaUtils.createDirectStream[String, String, StringDecoder,
StringDecoder](
ssc, kafkaParams, Array("add").toSet)
val delete_msgs = KafkaUtils.createDirectStream[String, String,
StringDecoder, StringDecoder](
ssc, kafkaParams, Array("delete").toSet)
val
.setMaster("local") -- set it to "local[2]" or "local[*]" instead.
Thanks
Best Regards
On Thu, Jun 18, 2015 at 5:59 PM, Bartek Radziszewski bar...@scalaric.com
wrote:
hi,
I'm trying to run simple kafka spark streaming example over spark-shell:
sc.stop
import org.apache.spark.SparkConf
import
Why not something like this: your mobile app pushes data to your web server,
which pushes the data to Kafka or Cassandra or any other database, and a
Spark Streaming job running all the time operates on the incoming data and
pushes the calculated values back. This way, you don't have to start a
spark
Which version of Spark are you using, and what is your data source? For some
reason your processing delay is exceeding the batch duration, and it's
strange that you are not seeing any scheduling delay.
Thanks
Best Regards
On Thu, Jun 18, 2015 at 7:29 AM, Mike Fang chyfan...@gmail.com wrote:
Hi,
I have a
This might give you a good start
http://ampcamp.berkeley.edu/big-data-mini-course/movie-recommendation-with-mllib.html
though it's a bit old.
Thanks
Best Regards
On Thu, Jun 18, 2015 at 2:33 PM, texol t.rebo...@gmail.com wrote:
Hi,
I'm new to GraphX and I'd like to use Machine Learning
You could possibly open up a JIRA and shoot an email to the dev list.
Thanks
Best Regards
On Wed, Jun 17, 2015 at 11:40 PM, jcai jonathon@yale.edu wrote:
Hi,
I am running this on Spark stand-alone mode. I find that when I examine the
web UI, a couple bugs arise:
1. There is a
Can you try repartitioning the RDDs after creating the (K, V) pairs? And
also, when calling rdd1.join(rdd2, ...), pass the number-of-partitions
argument too (see the sketch below).
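A minimal sketch (rddA and rddB are stand-ins for your two RDDs; the key extraction and the partition count of 200 are illustrative):
// turn each RDD into (key, value) pairs and spread them over more partitions
val pairsA = rddA.map(x => (x.id, x)).repartition(200)
val pairsB = rddB.map(x => (x.id, x)).repartition(200)
// pass the number of partitions to the join as well
val joined = pairsA.join(pairsB, 200)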
Thanks
Best Regards
On Wed, Jun 17, 2015 at 12:15 PM, Al M alasdair.mcbr...@gmail.com wrote:
I have 2 RDDs I want to Join. We will call them RDD A and RDD
Not sure why spark-submit isn't shipping your project jar (maybe try with
--jars). You can also do sc.addJar("/path/to/your/project.jar"); it should
solve it.
Thanks
Best Regards
On Wed, Jun 17, 2015 at 6:37 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks,
running into a pretty
Not quite sure, but try pointing spark.history.fs.logDirectory to your
s3 location.
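For instance, in conf/spark-defaults.conf (the bucket name, path and credential handling are placeholders):
# point the history server at the event logs stored on S3
spark.history.fs.logDirectory s3n://your-bucket/spark-events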
Thanks
Best Regards
On Tue, Jun 16, 2015 at 6:26 PM, Gianluca Privitera
gianluca.privite...@studio.unibo.it wrote:
In Spark website it’s stated in the View After the Fact section (
of reprocess some files.
Thanks
Best Regards
On Mon, Jun 15, 2015 at 2:49 PM, Haopu Wang hw...@qilinsoft.com wrote:
Akhil, thank you for the response. I want to explore more.
If the application is just monitoring a HDFS folder and output the word
count of each streaming batch into also HDFS
Did you look inside all logs? Mesos logs and executor logs?
Thanks
Best Regards
On Mon, Jun 15, 2015 at 7:09 PM, Gary Ogden gog...@gmail.com wrote:
My Mesos cluster has 1.5 CPU and 17GB free. If I set:
conf.set("spark.mesos.coarse", "true");
conf.set("spark.cores.max", "1");
in the SparkConf
wrote:
Hi Akhil,
Thanks for your response.
I have 10 cores which sums of all my 3 machines and I am having 5-10
receivers.
I have tried to test the processed number of records per second by varying
number of receivers.
If I am having 10 receivers (i.e. one receiver for each core), then I
You can also look into https://spark.apache.org/docs/latest/tuning.html for
performance tuning.
Thanks
Best Regards
On Mon, Jun 15, 2015 at 10:28 PM, Rex X dnsr...@gmail.com wrote:
Thanks very much, Akhil.
That solved my problem.
Best,
Rex
On Mon, Jun 15, 2015 at 2:16 AM, Akhil Das ak
What's in your executor's (that .tgz file's) conf/spark-defaults.conf file?
Thanks
Best Regards
On Mon, Jun 15, 2015 at 7:14 PM, Gary Ogden gog...@gmail.com wrote:
I'm loading these settings from a properties file:
spark.executor.memory=256M
spark.cores.max=1
spark.shuffle.consolidateFiles=true
I'm assuming by spark-client you mean the Spark driver program. In that
case you can pick any machine (say Node 7), create your driver program on
it and use spark-submit to submit it to the cluster; or if you create the
SparkContext within your driver program (specifying all the properties)
then
Something like this?
val huge_data = sc.textFile("/path/to/first.csv").map(x =>
(x.split("\t")(1), x.split("\t")(0)))
val gender_data = sc.textFile("/path/to/second.csv").map(x =>
(x.split("\t")(0), x))
val joined_data = huge_data.join(gender_data)
joined_data.take(1000)
It's Scala btw; the Python API should
Have a look here https://spark.apache.org/docs/latest/tuning.html
Thanks
Best Regards
On Mon, Jun 15, 2015 at 11:27 AM, Proust GZ Feng pf...@cn.ibm.com wrote:
Hi, Spark Experts
I have played with Spark several weeks, after some time testing, a reduce
operation of DataFrame cost 40s on a
I think it should be fine; that's the whole point of checkpointing (in
case of driver failure etc).
Thanks
Best Regards
On Mon, Jun 15, 2015 at 6:54 AM, Haopu Wang hw...@qilinsoft.com wrote:
Hi, can someone help to confirm the behavior? Thank you!
-Original Message-
From: Haopu
Yes, if you have enabled WAL and checkpointing then after the store, you
can simply delete the SQS Messages from your receiver.
Thanks
Best Regards
On Sat, Jun 13, 2015 at 6:14 AM, Michal Čizmazia mici...@gmail.com wrote:
I would like to have a Spark Streaming SQS Receiver which deletes SQS
Are you looking for something like filter? See a similar example here:
https://spark.apache.org/examples.html
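A minimal sketch for the log-severity case (the HDFS path and the "ERROR" severity marker are assumptions about your data):
// keep only the lines whose severity field marks them as errors
val logs = sc.textFile("hdfs:///logs/")
val errors = logs.filter(line => line.contains("ERROR"))
errors.take(10).foreach(println)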
Thanks
Best Regards
On Sat, Jun 13, 2015 at 3:11 PM, Hao Wang bill...@gmail.com wrote:
Hi,
I have a bunch of large log files on Hadoop. Each line contains a log and
its severity. Is
I think the straight answer would be no, but you can actually hardcode
these parameters if you want. Look in SparkContext.scala
https://github.com/apache/spark/blob/master/core%2Fsrc%2Fmain%2Fscala%2Forg%2Fapache%2Fspark%2FSparkContext.scala#L364
where all these properties are being
Looks like your Spark is not able to pick up the HADOOP_CONF. To fix this,
you can actually add jets3t-0.9.0.jar to the classpath
(sc.addJar("/path/to/jets3t-0.9.0.jar")).
Thanks
Best Regards
On Thu, Jun 11, 2015 at 6:44 PM, shahab shahab.mok...@gmail.com wrote:
Hi,
I tried to read a csv file
This is a good start, if you haven't read it already
http://spark.apache.org/docs/latest/streaming-programming-guide.html#dataframe-and-sql-operations
Thanks
Best Regards
On Thu, Jun 11, 2015 at 8:17 PM, 唐思成 jadetan...@qq.com wrote:
Hi all:
We are trying to using spark to do some real
You can disable shuffle spill (spark.shuffle.spill
http://spark.apache.org/docs/latest/configuration.html#shuffle-behavior)
if you have enough memory to hold that much data. I believe adding
more resources would be your only choice.
Thanks
Best Regards
On Thu, Jun 11, 2015 at 9:46 PM, Al M
You can verify whether the jars are shipped properly by looking at the
Environment tab of the driver UI (running on port 4040).
Thanks
Best Regards
On Sat, Jun 13, 2015 at 12:43 AM, Jonathan Coveney jcove...@gmail.com
wrote:
Spark version is 1.3.0 (will upgrade as soon as we upgrade past mesos
0.19.0)...
How many cores are you allocating for your job? And how many receivers do
you have? It would be good if you could post your custom receiver code; it
will help people understand it better and shed some light.
Thanks
Best Regards
On Fri, Jun 12, 2015 at 12:58 PM, Chaudhary, Umesh
4040 is your driver port; you need to have an application running. Log in to
your cluster, start a spark-shell and try accessing 4040.
Thanks
Best Regards
On Wed, Jun 10, 2015 at 3:51 PM, mrm ma...@skimlinks.com wrote:
Hi,
I am using Spark 1.3.1 standalone and I have a problem where my cluster is
RDDs are immutable; why not join two DStreams?
Not sure, but you can try something like this also:
kvDstream.foreachRDD(rdd => {
  val file = ssc.sparkContext.textFile("/sigmoid/")
  val kvFile = file.map(x => (x.split(",")(0), x))
  rdd.join(kvFile)
})
Thanks
Best Regards
On
Opening up port 4040 manually, or SSH tunneling (ssh -L 4040:127.0.0.1:4040
master-ip, and then opening localhost:4040 in a browser) will work for you
then.
Thanks
Best Regards
On Wed, Jun 10, 2015 at 5:10 PM, mrm ma...@skimlinks.com wrote:
Hi Akhil,
Thanks for your reply! I still cannot see port
Maybe you should update your Spark version to the latest one.
Thanks
Best Regards
On Wed, Jun 10, 2015 at 11:04 AM, Chandrashekhar Kotekar
shekhar.kote...@gmail.com wrote:
Hi,
I have configured Spark to run on YARN. Whenever I start spark shell using
'spark-shell' command, it
Hopefully SWIG http://www.swig.org/index.php and JNA
https://github.com/twall/jna/ might help for accessing C++ libraries from
Java.
Thanks
Best Regards
On Wed, Jun 10, 2015 at 11:50 AM, mahesht mahesh.s.tup...@gmail.com wrote:
There is C++ component which uses some model which we want to replace
standalone mode.
Any ideas?
Thanks
Dong Lei
*From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
*Sent:* Tuesday, June 9, 2015 4:46 PM
*To:* Dong Lei
*Cc:* user@spark.apache.org
*Subject:* Re: ClassNotDefException when using spark-submit with
multiple jars and files located
Delete the checkpoint directory; you might have modified your driver
program.
Thanks
Best Regards
On Wed, Jun 10, 2015 at 9:44 PM, Ashish Nigam ashnigamt...@gmail.com
wrote:
Hi,
If checkpoint data is already present in HDFS, driver fails to load as it
is performing lookup on previous
Looks like the libphp version is 5.6 now; which version of Spark are you using?
Thanks
Best Regards
On Thu, Jun 11, 2015 at 3:46 AM, barmaley o...@solver.com wrote:
Launching using spark-ec2 script results in:
Setting up ganglia
RSYNC'ing /etc/ganglia to slaves...
...
Shutting down GANGLIA