You just pasted your Twitter credentials; consider changing them. :/
Thanks
Best Regards
On Wed, Aug 5, 2015 at 10:07 PM, narendra narencs...@gmail.com wrote:
Thanks Akash for the answer. I added the endpoint to the listener and now it is
working.
If you are using Kafka, then you can basically push an entire file as a
message to Kafka. In that case, in your DStream you will receive a single
message containing the contents of the file, which can of course span
multiple lines.
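A minimal sketch of what the receiving side could look like (the ZooKeeper address, consumer group and topic name are placeholders, and it assumes the spark-streaming-kafka artifact is on the classpath):
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("FileAsMessage")
val ssc = new StreamingContext(conf, Seconds(10))
// Each Kafka message carries the full contents of one file, newlines included.
val stream = KafkaUtils.createStream(ssc, "localhost:2181", "file-consumer", Map("files" -> 1))
stream.foreachRDD { rdd =>
  rdd.foreach { case (_, fileContents) =>
    // The value can span multiple lines; split it yourself if needed.
    println("Received a file with " + fileContents.split("\n").length + " lines")
  }
}
ssc.start()
ssc.awaitTermination()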
Thanks
Best Regards
On Mon, Aug 3, 2015 at 8:27 PM, Spark
One approach would be to use a Jobserver in between and create SparkContexts
in it. Let's say you create two: one configured to run in coarse-grained mode
and another set to fine-grained. Let the high-priority jobs hit the
coarse-grained SparkContext and the other jobs use the fine-grained one.
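Roughly, the two configurations could look like this (the Mesos master URL is a placeholder, and each SparkContext would live in its own Jobserver context rather than the same JVM):
import org.apache.spark.SparkConf

val coarseConf = new SparkConf()
  .setAppName("high-priority-context")
  .setMaster("mesos://mesos-master:5050")
  .set("spark.mesos.coarse", "true")   // coarse-grained: holds on to its cores

val fineConf = new SparkConf()
  .setAppName("low-priority-context")
  .setMaster("mesos://mesos-master:5050")
  .set("spark.mesos.coarse", "false")  // fine-grained: gives cores back between tasks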
Just to add, rdd.take(1) won't trigger the entire computation; it will just
pull out the first record. You need to do an rdd.count() or rdd.saveAs*Files
to trigger the complete pipeline. How many partitions do you see in the
last stage?
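A quick illustration of the difference (expensiveTransformation and the paths are hypothetical):
val processed = sc.textFile("hdfs:///input/").map(expensiveTransformation)
processed.take(1)                            // cheap: computes just enough to return one record
processed.count()                            // forces the whole pipeline
processed.saveAsTextFile("hdfs:///output/")  // also forces the whole pipeline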
Thanks
Best Regards
On Tue, Aug 4, 2015 at 7:10 AM, ayan guha
that I want to ask is that I have used Twitter's streaming API, and it seems
that the above solution uses the REST API. How can I use both simultaneously?
Any response will be much appreciated :)
Regards
On Tue, Aug 4, 2015 at 1:51 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Yes you can
Are you sitting behind a firewall and accessing a remote master machine? In
that case, have a look at this
http://spark.apache.org/docs/latest/configuration.html#networking; you
might want to set a few properties like spark.driver.host, spark.driver.port,
etc.
Thanks
Best Regards
On Mon, Aug 3,
Currently RDDs are not encrypted. I think you can go ahead and open a JIRA
to add this feature, and maybe it could be added in a future release.
Thanks
Best Regards
On Fri, Jul 31, 2015 at 1:47 PM, Matthew O'Reilly moreill...@qub.ac.uk
wrote:
Hi,
I am currently working on the latest version of
I guess it goes through those 500k files
(https://github.com/apache/spark/blob/master/streaming/src/main/scala/org/apache/spark/streaming/dstream/FileInputDStream.scala#L193)
the first time and then uses a filter from the next time on.
Thanks
Best Regards
On Fri, Jul 31, 2015 at 4:39 AM, Tathagata Das
LOL Brandon!
@ziqiu See http://spark.apache.org/community.html
You need to send an email to user-unsubscr...@spark.apache.org
Thanks
Best Regards
On Fri, Jul 31, 2015 at 2:06 AM, Brandon White bwwintheho...@gmail.com
wrote:
https://www.youtube.com/watch?v=JncgoPKklVE
On Thu, Jul 30, 2015
specific to my account?
Thanks in anticipation :)
On Thu, Jul 30, 2015 at 6:17 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Oh, this one fetches the public tweets, not the ones specific to your
account.
Thanks
Best Regards
On Thu, Jul 30, 2015 at 6:11 PM, Sadaf Khan sa
It seems to be an issue with the ES connector:
https://github.com/elastic/elasticsearch-hadoop/issues/482
Thanks
Best Regards
On Tue, Jul 28, 2015 at 6:14 AM, An Tran tra...@gmail.com wrote:
Hello all,
I am currently having an error with Spark SQL accessing Elasticsearch using
Elasticsearch Spark
What operation are you doing with streaming? Also, can you look in the
datanode logs and see what's going on?
Thanks
Best Regards
On Tue, Jul 28, 2015 at 8:18 AM, guoqing0...@yahoo.com.hk
guoqing0...@yahoo.com.hk wrote:
Hi,
I got an error when running Spark Streaming, as below.
Like this?
val data = sc.textFile("/sigmoid/audio/data/", 24).foreachPartition(urls =>
speachRecognizer(urls))
Let 24 be the total number of cores that you have on all the workers.
Thanks
Best Regards
On Wed, Jul 29, 2015 at 6:50 AM, Peter Wolf opus...@gmail.com wrote:
Hello, I am writing a
Did you try removing this jar? build/sbt-launch-0.13.7.jar
Thanks
Best Regards
On Tue, Jul 28, 2015 at 12:08 AM, Rahul Palamuttam rahulpala...@gmail.com
wrote:
Hi All,
I hope this is the right place to post troubleshooting questions.
I've been following the install instructions and I get
You can easily push data to an intermediate storage from Spark Streaming
(like HBase or a SQL/NoSQL DB, etc.) and then power your dashboards with
d3.js.
Thanks
Best Regards
On Tue, Jul 28, 2015 at 12:18 PM, UMESH CHAUDHARY umesh9...@gmail.com
wrote:
I have just started using Spark Streaming and
sc.parallelize takes a second parameter, which is the total number of
partitions; are you using that?
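Something like this (48 is just an example value, e.g. 3-4x your total cores):
val items = (1 to 512000).toList
val rdd = sc.parallelize(items, 48)   // second argument = number of partitions
println(rdd.partitions.length)        // 48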
Thanks
Best Regards
On Wed, Jul 29, 2015 at 9:27 PM, Kostas Kougios
kostas.koug...@googlemail.com wrote:
Hi, I do an sc.parallelize with a list of 512k items. But sometimes not all
executors
You can integrate it with any language (like php) and use ajax calls to
update the charts.
Thanks
Best Regards
On Thu, Jul 30, 2015 at 2:11 PM, UMESH CHAUDHARY umesh9...@gmail.com
wrote:
Thanks for the suggestion Akhil!
I looked at https://github.com/mbostock/d3/wiki/Gallery to know more
)
at java.lang.Thread.run(Thread.java:745)
*From:* Akhil Das [mailto:ak...@sigmoidanalytics.com]
*Sent:* Tuesday, July 28, 2015 2:30 PM
*To:* Manohar Reddy
*Cc:* user@spark.apache.org
*Subject:* Re: java.lang.ArrayIndexOutOfBoundsException: 0 on Yarn Client
You need to trigger an action on your
Put a try-catch inside your code, and inside the catch print out the length
or the list itself that causes the ArrayIndexOutOfBounds. It might happen
that some of your data is not proper.
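A rough sketch, assuming the input is an RDD[String] called rawLines and the records are comma-separated:
val parsed = rawLines.map { line =>
  try {
    val fields = line.split(",")
    (fields(0), fields(1))   // blows up if a row has fewer columns than expected
  } catch {
    case e: ArrayIndexOutOfBoundsException =>
      println("Bad record with " + line.split(",").length + " fields: " + line)
      ("invalid", line)
  }
}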
Thanks
Best Regards
On Mon, Jul 27, 2015 at 8:24 PM, Manohar753 manohar.re...@happiestminds.com
wrote:
Hi
You need to find the bottleneck here; it could be your network (if the data is
huge) or your producer code not pushing at 20k/s. If you are able to
produce at 20k/s, then make sure you are able to receive at that rate (try
it without Spark).
Thanks
Best Regards
On Sat, Jul 25, 2015 at 3:29 PM,
With s3n, try this out:
s3service.s3-endpoint: The host name of the S3 service. You should only
ever change this value from the default if you need to contact an
alternative S3 endpoint for testing purposes.
Default: s3.amazonaws.com
Thanks
Best Regards
On Tue, Jul 28, 2015 at 1:54 PM, Schmirr
Did you try it with just (comment out line 27):
println "Count of spark: " + file.filter({s -> s.contains('spark')}).count()
Thanks
Best Regards
On Sun, Jul 26, 2015 at 12:43 AM, tog guillaume.all...@gmail.com wrote:
Hi
I have been using Spark for quite some time using either scala or python.
One approach would be to store the batch data in an intermediate storage
(like HBase/MySQL or even in zookeeper), and inside your filter function
you just go and read the previous value from this storage and do whatever
operation that you are supposed to do.
Thanks
Best Regards
On Sun, Jul 26,
Did you try binding to 0.0.0.0?
Thanks
Best Regards
On Mon, Jul 27, 2015 at 10:37 PM, Wayne Song wayne.e.s...@gmail.com wrote:
Hello,
I am trying to start a Spark master for a standalone cluster on an EC2
node.
The CLI command I'm using looks like this:
Note that I'm specifying the
You need to trigger an action on your rowrdd for it to execute the map, you
can do a rowrdd.count() for that.
Thanks
Best Regards
On Tue, Jul 28, 2015 at 2:18 PM, Manohar Reddy
manohar.re...@happiestminds.com wrote:
Hi Akhil,
Thanks for the reply. I found the root cause but don't know how
)
2015-07-27 11:17 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com:
So you are able to access your AWS S3 with s3a now? What is the error
that
you are getting when you try to access the custom storage with
fs.s3a.endpoint?
Thanks
Best Regards
On Mon, Jul 27, 2015 at 2:44 PM, Schmirr Wurst
How about IntelliJ? It also has a Terminal tab.
Thanks
Best Regards
On Fri, Jul 24, 2015 at 6:06 PM, saif.a.ell...@wellsfargo.com wrote:
Hi all,
I tried Notebook Incubator Zeppelin, but I am not completely happy with it.
What do you people use for coding? Anything with auto-complete,
Have a look at the current security support
(https://spark.apache.org/docs/latest/security.html); Spark does not have
any encryption support for objects in memory out of the box. But if your
concern is to protect the data being cached in memory, then you can easily
encrypt your objects in memory
You can follow this doc
https://cwiki.apache.org/confluence/display/SPARK/Useful+Developer+Tools#UsefulDeveloperTools-IDESetup
Thanks
Best Regards
On Fri, Jul 24, 2015 at 10:56 AM, Siva Reddy ksiv...@gmail.com wrote:
Hi All,
I am trying to setup the Eclipse (LUNA) with Maven so that I
It's a serialization error with nested schemas, I guess. You can look at
Twitter's chill-avro serializer library. Here are two discussions on the same:
- https://issues.apache.org/jira/browse/SPARK-3447
-
What's in your build.sbt? You could be messing up the Scala version, it
seems.
Thanks
Best Regards
On Fri, Jul 24, 2015 at 2:15 AM, Dan Dong dongda...@gmail.com wrote:
Hi,
When I ran with spark-submit the following simple Spark program of:
import org.apache.spark.SparkContext._
import
This spark.shuffle.sort.bypassMergeThreshold might help. You could also try
setting the shuffle manager from sort to hash. You can see more
configuration options here:
https://spark.apache.org/docs/latest/configuration.html#shuffle-behavior.
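A sketch of the two options (the threshold value is just an example, not a recommendation):
val conf = new SparkConf()
  // Option 1: raise the bypass threshold for the default sort-based shuffle
  .set("spark.shuffle.sort.bypassMergeThreshold", "400")
// Option 2: switch the shuffle manager from sort to hash instead
// conf.set("spark.shuffle.manager", "hash")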
Thanks
Best Regards
On Fri, Jul 24, 2015 at 3:33
For each of your jobs, you can pass spark.ui.port to bind to a different
port.
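For example (the port number here is just an example):
val conf = new SparkConf()
  .setAppName("second-job")
  .set("spark.ui.port", "4041")   // avoids colliding with the first job on 4040
You can also pass it as --conf spark.ui.port=4041 to spark-submit.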
Thanks
Best Regards
On Fri, Jul 24, 2015 at 7:49 PM, Joji John jj...@ebates.com wrote:
Thanks Ajay.
The way we wrote our spark application is that we have a generic python
code, multiple instances of which can
?
2015-07-20 18:11 GMT+02:00 Schmirr Wurst schmirrwu...@gmail.com:
Thanks, that is what I was looking for...
Any Idea where I have to store and reference the corresponding
hadoop-aws-2.6.0.jar ?:
java.io.IOException: No FileSystem for scheme: s3n
2015-07-20 8:33 GMT+02:00 Akhil Das
alternative from Python?
And also, I want to write the raw bytes of my object into files on disk,
and not using some Serialization format to be read back into Spark.
Is it possible?
Any alternatives for that?
Thanks,
Oren
On Thu, Jul 23, 2015 at 8:04 PM Akhil Das ak...@sigmoidanalytics.com
Best Regards
On Fri, Jul 24, 2015 at 11:25 AM, Zoran Jeremic zoran.jere...@gmail.com
wrote:
Hi Akhil,
Thank you for sending this code. My apologies if I ask something obvious
here, since I'm a newbie in Scala, but I still don't see how I can
use this code. Maybe my original
PM, ayan guha guha.a...@gmail.com wrote:
Hi Akhil
Thanks. Will definitely take a look. Couple of questions:
1. Is it possible to use the newHadoopAPI from dataframe.write or saveAs?
2. Is esDF usable from Python?
On Fri, Jul 24, 2015 at 2:29 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:
Did
I guess it would wait for some time and throw up something like this:
Initial job has not accepted any resources; check your cluster UI to ensure
that workers are registered and have sufficient memory
Thanks
Best Regards
On Thu, Jul 23, 2015 at 7:53 AM, bit1...@163.com bit1...@163.com wrote:
Currently, the only way for you would be to create a proper schema for the
data. This is not a bug, but you could open a JIRA for the feature (since
this would help others solve similar use cases), and in a future version
it could be implemented and included.
Thanks
Best Regards
On Tue, Jul 21,
Here are a few more configurations:
https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2#SettingUpHiveServer2-ConfigurationPropertiesinthehive-site.xmlFile
I can't find anything on the timeouts though.
Thanks
Best Regards
On Wed, Jul 22, 2015 at 1:01 AM, Judy Nash
You can try adding that jar to SPARK_CLASSPATH (it's deprecated though) in
the spark-env.sh file.
Thanks
Best Regards
On Tue, Jul 21, 2015 at 7:34 PM, Michal Haris michal.ha...@visualdna.com
wrote:
I have a spark program that uses dataframes to query hive and I run it
both as a spark-shell for
Did you try:
val data = indexed_files.groupByKey
val modified_data = data.map { a =>
  var name = a._2.mkString(",")
  (a._1, name)
}
modified_data.foreach { a =>
  var file = sc.textFile(a._2)
  println(file.count)
}
Thanks
Best Regards
On Wed, Jul 22, 2015 at 2:18 AM, MorEru
It looks like it's picking up the wrong namenode URI from HADOOP_CONF_DIR;
make sure it is proper. Also, for submitting a Spark job to a remote cluster,
you might want to look at spark.driver.host and spark.driver.port
Thanks
Best Regards
On Wed, Jul 22, 2015 at 8:56 PM, rok
Did you happen to look into esDF
(https://github.com/elastic/elasticsearch-hadoop/issues/441)? You can open
an issue over here if that doesn't solve your problem:
https://github.com/elastic/elasticsearch-hadoop/issues
Thanks
Best Regards
On Tue, Jul 21, 2015 at 5:33 PM, ayan guha
You can look into .saveAsObjectFile
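A minimal sketch (the paths are placeholders):
val objects = sc.parallelize(Seq(("a", Array[Byte](1, 2, 3)), ("b", Array[Byte](4, 5))))
objects.saveAsObjectFile("hdfs:///tmp/objects")   // written as Java-serialized SequenceFiles
val restored = sc.objectFile[(String, Array[Byte])]("hdfs:///tmp/objects")   // read back into an RDD
Note that this does use Java serialization under the hood, so it is not quite raw bytes on disk.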
Thanks
Best Regards
On Thu, Jul 23, 2015 at 8:44 PM, Oren Shpigel o...@yowza3d.com wrote:
Hi,
I use Spark to read binary files using SparkContext.binaryFiles(), and then
do some calculations, processing, and manipulations to get new objects
(also
,#android.toLowerCase,#iphone.toLowerCase))*
val newRDD = samplehashtags.map { x => (x, 1) }
val joined = newRDD.join(rdd)
joined
})
filteredStream.print()
Thanks
Best Regards
On Wed, Jul 22, 2015 at 3:58 AM, Zoran Jeremic zoran.jere...@gmail.com
wrote:
Hi Akhil and Jorn,
I
where I have to store and reference the corresponding
hadoop-aws-2.6.0.jar ?:
java.io.IOException: No FileSystem for scheme: s3n
2015-07-20 8:33 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com:
Not in the URI, but you can specify it in the Hadoop configuration:
<property>
<name>fs.s3a.endpoint
I'd suggest upgrading to 1.4 as it has better metrics and UI.
Thanks
Best Regards
On Mon, Jul 20, 2015 at 7:01 PM, Shushant Arora shushantaror...@gmail.com
wrote:
Is coalesce not applicable to kafkaStream? How do I do coalesce on
kafkaDirectStream? It's not there in the API.
Shall calling
Do you have HADOOP_HOME, HADOOP_CONF_DIR and hadoop's winutils.exe in the
environment?
Thanks
Best Regards
On Mon, Jul 20, 2015 at 5:45 PM, nitinkalra2000 nitinkalra2...@gmail.com
wrote:
Hi All,
I am working with Spark 1.4 in a Windows environment. I have to set the
eventLog directory so that I can
Here are two ways of doing that:
Without the filter function:
JavaPairDStream<String, String> foo =
ssc.<String, String, SequenceFileInputFormat>fileStream("/tmp/foo");
With the filter function:
JavaPairInputDStream<LongWritable, Text> foo = ssc.fileStream("/tmp/foo",
LongWritable.class,
(fs.s3n.endpoint,test.com
)
And I continue to get my data from amazon, how could it be ? (I also
use s3n in my text url)
2015-07-21 9:30 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com:
You can add the jar in the classpath, and you can set the property like:
sc.hadoopConfiguration.set(fs.s3a.endpoint
...@gmail.com
wrote:
Hi Akhil,
I don't have HADOOP_HOME or HADOOP_CONF_DIR, or even winutils.exe. What's
the configuration required for this? Where can I get winutils.exe?
Thanks and Regards,
Nitin Kalra
On Tue, Jul 21, 2015 at 1:30 PM, Akhil Das ak...@sigmoidanalytics.com
wrote
It could be a GC pause or something; you need to check the Stages tab
and see what is taking time. If you upgrade to Spark 1.4, it has a better UI
and DAG visualization, which helps you debug better.
Thanks
Best Regards
On Mon, Jul 20, 2015 at 8:21 PM, Pa Rö paul.roewer1...@googlemail.com
wrote:
) is assumed.
</description>
</property>
Thanks
Best Regards
On Sun, Jul 19, 2015 at 9:13 PM, Schmirr Wurst schmirrwu...@gmail.com
wrote:
I want to use Pithos; where can I specify that endpoint? Is it
possible in the URL?
2015-07-19 17:22 GMT+02:00 Akhil Das ak...@sigmoidanalytics.com
Just make sure there is no firewall/network blocking the requests, as it's
complaining about a timeout.
Thanks
Best Regards
On Mon, Jul 20, 2015 at 1:14 AM, ankit tyagi ankittyagi.mn...@gmail.com
wrote:
Just to add more information. I have checked the status of this file, not
a single block is
Jorn meant something like this:
val filteredStream = twitterStream.transform(rdd => {
  val newRDD = scc.sc.textFile("/this/file/will/be/updated/frequently").map(x => (x, 1))
  rdd.join(newRDD)
})
newRDD will work like a filter when you do the join.
Thanks
Best Regards
On Sun, Jul 19, 2015 at 9:32
Could you name the storage service that you are using? Most of them
provide an S3-like REST API endpoint for you to hit.
Thanks
Best Regards
On Fri, Jul 17, 2015 at 2:06 PM, Schmirr Wurst schmirrwu...@gmail.com
wrote:
Hi,
I wonder how to use S3-compatible storage in Spark?
If I'm using
. (no matrices loaded), the same exception is coming.
Can anyone tell me what createDataFrame does internally? Are there any
alternatives to it?
On Fri, Jul 17, 2015 at 6:43 PM, Akhil Das ak...@sigmoidanalytics.com
wrote:
I suspect it's numpy filling up the memory.
Thanks
Best Regards
On Fri
Did you try inputs.repartition(1).foreachRDD(..)?
Thanks
Best Regards
On Fri, Jul 17, 2015 at 9:51 PM, PAULI, KEVIN CHRISTIAN
[AG-Contractor/1000] kevin.christian.pa...@monsanto.com wrote:
Spark newbie here, using Spark 1.3.1.
I’m consuming a stream and trying to pipe the data from the
Can you paste the code? How much memory does your system have and how big
is your dataset? Did you try df.persist(StorageLevel.MEMORY_AND_DISK)?
Thanks
Best Regards
On Fri, Jul 17, 2015 at 5:14 PM, Harit Vishwakarma
harit.vishwaka...@gmail.com wrote:
Thanks,
Code is running on a single
= sqlCtx.createDataFrame(rdd2)
4. df.save() # in parquet format
It throws an exception in the createDataFrame() call. I don't know what
exactly it is creating. Everything in memory? Or can I make it persist
simultaneously while it is getting created?
Thanks
On Fri, Jul 17, 2015 at 5:16 PM, Akhil Das ak
Which version of spark are you using? insertIntoJDBC is deprecated (from
1.4.0), you may use write.jdbc() instead.
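A minimal sketch of the 1.4+ API (the JDBC URL, table name and credentials are placeholders, and df is assumed to be your DataFrame):
import java.util.Properties

val props = new Properties()
props.setProperty("user", "dbuser")
props.setProperty("password", "dbpass")
df.write.mode("append").jdbc("jdbc:mysql://dbhost:3306/mydb", "my_table", props)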
Thanks
Best Regards
On Wed, Jul 15, 2015 at 2:43 PM, Manohar753 manohar.re...@happiestminds.com
wrote:
Hi All,
I am trying to add a few new rows to an existing table in MySQL using
Did you try this?
val out = lines.filter(xx => {
  val y = xx
  val x = broadcastVar.value
  var flag: Boolean = false
  for (a <- x) {
    if (y.contains(a))
      flag = true
  }
  flag
})
Thanks
Best Regards
On Wed, Jul 15, 2015 at 8:10 PM, Naveen Dabas naveen.u...@ymail.com wrote:
I
Yes you can do that, just make sure you rsync the same file to the same
location on every machine.
Thanks
Best Regards
On Thu, Jul 16, 2015 at 5:50 AM, Julien Beaudan jbeau...@stottlerhenke.com
wrote:
Hi all,
Is it possible to use Spark to assign each machine in a cluster the same
task, but
I think any request going to s3*:// requires the credentials. If they have
made it public (via HTTP), then you won't require the keys.
Thanks
Best Regards
On Wed, Jul 15, 2015 at 2:26 AM, Pagliari, Roberto rpagli...@appcomsci.com
wrote:
Hi Sujit,
I just wanted to access public datasets on
Try to repartition it to a higher number (at least 3-4 times the total number
of CPU cores). What operation are you doing? It may happen, if you are
doing a join/groupBy sort of operation, that the task which is taking time
has all the values; in that case you need to use a Partitioner which
will
This is where you can get started
https://spark.apache.org/docs/latest/sql-programming-guide.html
Thanks
Best Regards
On Mon, Jul 13, 2015 at 3:54 PM, vinod kumar vinodsachin...@gmail.com
wrote:
Hi Everyone,
I am developing an application which handles a bulk of data, around
millions (this may
1. Yes, open up the web UI running on 8080 to see the memory/cores allocated
to your workers, and open up the UI running on 4040 and click on the
Executors tab to see the memory allocated for the executor.
2. MLlib code can be found over here:
https://github.com/apache/spark/tree/master/mllib and
Why not add a trigger to your database table, and whenever it's updated push
the changes to Kafka etc. and use normal Spark Streaming? You can also write
a receiver-based architecture
(https://spark.apache.org/docs/latest/streaming-custom-receivers.html) for
this, but that will be a bit time consuming.
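A very rough sketch of the receiver-based route (pollDatabase() is a placeholder for whatever query or change-capture you run against your table):
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class TableReceiver extends Receiver[String](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart(): Unit = {
    new Thread("table-poller") {
      override def run(): Unit = {
        while (!isStopped()) {
          pollDatabase().foreach(store)   // push each changed row into the stream
          Thread.sleep(5000)
        }
      }
    }.start()
  }
  def onStop(): Unit = {}                 // the polling thread exits once isStopped() is true
  private def pollDatabase(): Seq[String] = Seq.empty   // placeholder
}
// val updates = ssc.receiverStream(new TableReceiver)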
Look in the worker logs and see what's going on.
Thanks
Best Regards
On Tue, Jul 14, 2015 at 4:02 PM, Arthur Chan arthur.hk.c...@gmail.com
wrote:
Hi,
I use Spark 1.4. When saving the model to HDFS, I got an error.
Please help!
Regards
my scala command:
have more memory, also if you have enough cores 4 records are
nothing.
Thanks
Best Regards
On Tue, Jul 14, 2015 at 3:09 PM, vinod kumar vinodsachin...@gmail.com
wrote:
Hi Akhil
Is my choice to switch to Spark good? Because I don't have enough
information regarding its limitations and workings
Someone else also reported this error with spark 1.4.0
Thanks
Best Regards
On Tue, Jul 14, 2015 at 6:57 PM, Arthur Chan arthur.hk.c...@gmail.com
wrote:
Hi, Below is the log form the worker.
15/07/14 17:18:56 ERROR FileAppender: Error writing stream to file
Try adding it in your SPARK_CLASSPATH inside conf/spark-env.sh file.
Thanks
Best Regards
On Tue, Jul 14, 2015 at 7:05 AM, Jerrick Hoang jerrickho...@gmail.com
wrote:
Hi all,
I have conf/hive-site.xml pointing to my Hive metastore but the Spark SQL
CLI doesn't pick it up (copying the same
Can you paste your conf/spark-env.sh file? Put SPARK_MASTER_IP as the
master machine's hostname in the spark-env.sh file. Also add your slaves'
hostnames to the conf/slaves file and do an sbin/start-all.sh.
Thanks
Best Regards
On Tue, Jul 14, 2015 at 1:26 PM, sivarani whitefeathers...@gmail.com
wrote:
wrote:
Hi Akhil,
It's interesting: are RDDs stored internally in a columnar format as
well?
Or is it only when an RDD is cached in the SQL context that it is converted
to columnar format?
What about data frames?
Thanks!
--
Ruslan Dautkhanov
On Fri, Jul 10, 2015 at 2:07 AM, Akhil Das ak
You are a bit confused about the master node, slave nodes and the driver
machine.
1. The master node can be kept as a smaller machine in your dev environment;
in production you will mostly be using the Mesos or YARN cluster manager.
2. Now, if you are running your driver program (the streaming job) on the
Just make sure you have the same installation of
spark-1.4.0-bin-hadoop2.6 everywhere (including the slaves, the master, and
the machine from where you start the spark-shell).
Thanks
Best Regards
On Mon, Jul 13, 2015 at 4:34 AM, Eduardo erocha@gmail.com wrote:
My installation of spark is not
Yes, that is correct. You can use this boilerplate to avoid spark-submit:
//The configurations
val sconf = new SparkConf()
  .setMaster("spark://spark-ak-master:7077")
  .setAppName("SigmoidApp")
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
Did you try setting the HADOOP_CONF_DIR?
Thanks
Best Regards
On Sat, Jul 11, 2015 at 3:17 AM, maxdml maxdemou...@gmail.com wrote:
Also, it's worth noting that I'm using the prebuilt version for hadoop 2.4
and higher from the official website.
Can you not use sc.wholeTextFiles() with a custom parser or a regex to
extract out the TransactionIDs?
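A sketch, assuming the IDs look something like "TransactionID: 12345" (the path and regex are placeholders for your actual format):
val files = sc.wholeTextFiles("hdfs:///logs/")        // RDD[(fileName, fileContents)]
val idPattern = """TransactionID:\s*(\d+)""".r
val transactionIds = files.flatMap { case (_, contents) =>
  idPattern.findAllMatchIn(contents).map(_.group(1))
}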
Thanks
Best Regards
On Sat, Jul 11, 2015 at 8:18 AM, ssbiox sergey.korytni...@gmail.com wrote:
Hello,
I have a very specific question on how to do a search between particular
lines of
Can you dig a bit more into the worker logs? Also make sure that Spark has
permission to write to /opt/ on that machine, as it's the one machine always
throwing up.
Thanks
Best Regards
On Sat, Jul 11, 2015 at 11:18 PM, gaurav sharma sharmagaura...@gmail.com
wrote:
Hi All,
I am facing this issue in
Here's an example https://github.com/przemek1990/spark-streaming
Thanks
Best Regards
On Thu, Jul 9, 2015 at 4:35 PM, diplomatic Guru diplomaticg...@gmail.com
wrote:
Hello all,
I'm trying to configure Flume to push data into a sink so that my
streaming job can pick up the data. My events
When you connect to the machines, you can create an SSH tunnel to access the
UI:
ssh -L 8080:127.0.0.1:8080 MasterMachinesIP
Then you can simply open localhost:8080 in your browser and it should
show the UI.
Thanks
Best Regards
On Thu, Jul 9, 2015 at 7:44 PM, rroxanaioana
It seems to be an issue with Azure; there was a discussion over here:
https://azure.microsoft.com/en-in/documentation/articles/hdinsight-hadoop-spark-install/
Thanks
Best Regards
On Thu, Jul 9, 2015 at 9:42 PM, Daniel Haviv
daniel.ha...@veracity-group.com wrote:
Hi,
I'm running Spark 1.4 on
https://spark.apache.org/docs/latest/sql-programming-guide.html#caching-data-in-memory
Thanks
Best Regards
On Fri, Jul 10, 2015 at 10:05 AM, vinod kumar vinodsachin...@gmail.com
wrote:
Hi Guys,
Can anyone please tell me how to use the caching feature of Spark via Spark
SQL queries?
-Vinod
That's because sc is already initialized. You can do sc.stop() before you
initialize another one.
Thanks
Best Regards
On Fri, Jul 10, 2015 at 3:54 PM, Prateek . prat...@aricent.com wrote:
Hi,
I am running single spark-shell but observing this error when I give val
sc = new
Looks like a configuration problem with your Spark setup; are you running
the driver on a different network? Can you try a simple program from
spark-shell (like sc.parallelize(1 to 1000).collect()) and make sure your
setup is proper?
Thanks
Best Regards
On Thu, Jul 9, 2015 at 1:02 AM, ÐΞ€ρ@Ҝ
On Wed, Jul 8, 2015 at 7:31 PM, Ashish Dutt ashish.du...@gmail.com wrote:
Hi,
We have a cluster with 4 nodes. The cluster uses CDH 5.4. For the past two
days I have been trying to connect my laptop to the server using the Spark
master ip:port, but it has been unsuccessful.
The server contains data
Did you try sc.stop() and creating a new one?
Thanks
Best Regards
On Wed, Jul 8, 2015 at 8:12 PM, Terry Hole hujie.ea...@gmail.com wrote:
I am using spark 1.4.1rc1 with default hive settings
Thanks
- Terry
Hi All,
I'd like to use the hive context in spark shell, i need to recreate the
Yes, just to add, see the following scenario of RDD lineage:
RDD1 -> RDD2 -> RDD3 -> RDD4
Here RDD2 depends on RDD1's output and the lineage goes till RDD4.
Now, if for some reason RDD3 is lost, Spark will recompute it from RDD2.
Thanks
Best Regards
On Thu, Jul 9, 2015 at 5:51 AM, canan
Have a look at
http://alvinalexander.com/scala/how-to-create-java-thread-runnable-in-scala,
create two threads and call thread1.start(), thread2.start().
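A minimal sketch of submitting two jobs concurrently from the same SparkContext (the RDDs here are just placeholders):
val t1 = new Thread(new Runnable {
  def run(): Unit = println("count A: " + sc.parallelize(1 to 1000).count())
})
val t2 = new Thread(new Runnable {
  def run(): Unit = println("sum B: " + sc.parallelize(1 to 5000).sum())
})
t1.start(); t2.start()
t1.join(); t2.join()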
Thanks
Best Regards
On Wed, Jul 8, 2015 at 1:06 PM, Ashish Dutt ashish.du...@gmail.com wrote:
Thanks for your reply Akhil.
How do you
It's showing connection refused; for some reason it was not able to connect
to the machine. Either it's the machine's start-up time or it's an issue with
the security group.
Thanks
Best Regards
On Wed, Jul 8, 2015 at 2:04 AM, Pagliari, Roberto rpagli...@appcomsci.com
wrote:
I'm following the tutorial
What's the point of creating them in parallel? You can multi-thread it to run
it in parallel though.
Thanks
Best Regards
On Wed, Jul 8, 2015 at 5:34 AM, Brandon White bwwintheho...@gmail.com
wrote:
Say I have a spark job that looks like following:
def loadTable1() {
val table1 =
Strange. What do you have in $SPARK_MASTER_IP? It may happen that it is
not able to bind to the given IP, but again, it should be in the logs.
Thanks
Best Regards
On Tue, Jul 7, 2015 at 12:54 AM, maxdml maxdemou...@gmail.com wrote:
Hi,
I've been compiling spark 1.4.0 with SBT, from the
Did you try Kryo? Wrap everything with Kryo and see if you are still
hitting the exception (at least you would see a different exception stack).
Thanks
Best Regards
On Tue, Jul 7, 2015 at 6:05 AM, Yana Kadiyska yana.kadiy...@gmail.com
wrote:
Hi folks, suffering from a pretty strange issue:
Can you try adding sc.stop() at the end of your program? Looks like it's
having a hard time closing off the SparkContext.
Thanks
Best Regards
On Tue, Jul 7, 2015 at 4:08 PM, Hafsa Asif hafsa.a...@matchinguu.com
wrote:
Hi,
I run the following simple Java spark standalone app with maven command
Here's a simplified example:
SparkConf conf = new SparkConf().setAppName("Sigmoid").setMaster("local");
JavaSparkContext sc = new JavaSparkContext(conf);
List<String> user = new ArrayList<String>();
user.add("Jack");
user.add("Jill");
instances having successively run on the same
machine?
--
Henri Maxime Demoulin
2015-07-07 4:10 GMT-04:00 Akhil Das ak...@sigmoidanalytics.com:
Strange. What are you having in $SPARK_MASTER_IP? It may happen that it
is not able to bind to the given ip but again it should be in the logs.
Thanks
If you don't want those logs to flood your screen, you can disable them simply
with:
import org.apache.log4j.{Level, Logger}
Logger.getLogger("org").setLevel(Level.OFF)
Logger.getLogger("akka").setLevel(Level.OFF)
Thanks
Best Regards
On Sun, Jul 5, 2015 at 7:27 PM, Hellen
Try with spark.cores.max; executor cores are usually used when you run it
in YARN mode.
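For example (the value is just an example):
val conf = new SparkConf()
  .setAppName("my-app")
  .set("spark.cores.max", "12")   // total cores for the app on standalone/Mesos, unlike spark.executor.cores on YARN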
Thanks
Best Regards
On Mon, Jul 6, 2015 at 1:22 AM, nizang ni...@windward.eu wrote:
hi,
We're running spark 1.4.0 on ec2, with 6 machines, 4 cores each. We're
trying to run an application on a number of