Hi,
I think the Spark UI will be accessible whenever you launch a Spark app in the
cluster; it should be the Application Tracker link.
Regards,
Natu
On Tue, Sep 13, 2016 at 9:37 AM, Divya Gehlot
wrote:
> Hi ,
> Thank you all..
> Hurray ...I am able to view the hadoop web UI now @ 8088 . even
Hi,
I am working on a data pipeline in a Spark Streaming app that receives data
as a CSV regularly.
After some enrichment we send the data to another storage layer (ES in this
case). Some of the records in the incoming CSV might be repeated.
I am trying to devise a strategy based on MD5s of the l
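The rough idea I am exploring looks like this (only a sketch; it hashes the raw
CSV line and assumes duplicates are exact copies, with "lines" standing in for
the incoming DStream[String] from the CSV source):

  import java.security.MessageDigest

  // Hypothetical helper: hex-encoded MD5 of a raw CSV line
  def md5(line: String): String =
    MessageDigest.getInstance("MD5").digest(line.getBytes("UTF-8"))
      .map("%02x".format(_)).mkString

  // Drop duplicate records within a micro-batch by keying on the digest
  val deduped = lines.map(l => (md5(l), l))
    .reduceByKey((first, _) => first)
    .map(_._2)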
Hi,
I wonder if it is possible to checkpoint only metadata and not the data in RDDs
and DataFrames.
Thanks,
Natu
Hi,
I am running some Spark loads. I notice that it only uses one of the machines
(instead of the 3 available) in the cluster.
Is there any parameter that can be set to force it to use the whole cluster?
I am using AWS EMR with YARN.
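For reference, the kind of settings I have been looking at (just a sketch; the
values are only examples):

  import org.apache.spark.SparkConf

  val conf = new SparkConf()
    .setAppName("my-load")
    .set("spark.executor.instances", "3")   // e.g. one executor per worker node
    .set("spark.executor.cores", "4")
    .set("spark.executor.memory", "4g")
    // the same values can be passed to spark-submit as --num-executors,
    // --executor-cores and --executor-memory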
Thanks,
Natu
Hi
Does anyone know which one AWS EMR uses by default?
Thanks,
Natu
On Jun 16, 2016 5:12 PM, "David Newberger"
wrote:
> DataFrame is a collection of data which is organized into named columns.
>
> DataFrame.write is an interface for saving the contents of a DataFrame to
> external storage.
>
>
>
Hi,
You can select the common columns and use DataFrame.unionAll.
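For example, a minimal sketch assuming the two frames are df1 and df2 and that
"id" and "name" stand in for the shared columns:

  import org.apache.spark.sql.functions.col

  val common = Seq("id", "name")
  val combined = df1.select(common.map(col): _*)
    .unionAll(df2.select(common.map(col): _*))   // unionAll matches columns by position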
Regards,
Natu
On Wed, Jun 15, 2016 at 8:57 PM, spR wrote:
> hi,
>
> how to concatenate spark dataframes? I have 2 frames with certain columns.
> I want to get a dataframe with columns from both the other frames.
>
> Regards,
Hi,
It seems to me that the checkpoint command is not persisting the SparkContext
Hadoop configuration correctly. Can this be a possibility?
Thanks,
Natu
On Mon, Jun 13, 2016 at 11:57 AM, Natu Lauchande
wrote:
> Hi,
>
> I am testing disaster recovery from Spark and having some is
Hi,
I am testing disaster recovery from Spark and having some issues when
trying to restore an input file from S3:
2016-06-13 11:42:52,420 [main] INFO
org.apache.spark.streaming.dstream.FileInputDStream$FileInputDStreamCheckpointData
- Restoring files for time 146581086 ms -
[s3n://bucketfoo
Hi,
I am having the following error when using checkpointing in a Spark Streaming
app:
java.io.NotSerializableException: DStream checkpointing has been enabled
but the DStreams with their functions are not serializable
I am following the example available in
https://github.com/apache/spark/blob/m
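The structure I am following is roughly this (a sketch; the checkpoint path, app
name, and batch interval are placeholders):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val checkpointDir = "hdfs:///checkpoints/app"   // placeholder path

  def createContext(dir: String): StreamingContext = {
    val conf = new SparkConf().setAppName("checkpointed-app")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(dir)
    // all DStreams and their transformations are defined here, inside the factory
    ssc
  }

  val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext(checkpointDir))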
Hi Mich,
I am also interested in the write up.
Regards,
Natu
On Thu, May 12, 2016 at 12:08 PM, Mich Talebzadeh wrote:
> Hi Al,,
>
>
> Following the threads in spark forum, I decided to write up on
> configuration of Spark including allocation of resources and configuration
> of driver, executo
Hi,
Not sure if this might be helpful to you:
https://github.com/ondra-m/ruby-spark .
Regards,
Natu
On Tue, May 10, 2016 at 4:37 PM, Lionel PERRIN
wrote:
> Hello,
>
>
>
> I’m looking for a solution to use jruby on top of spark. The only tricky
> point is that I need that every worker thread h
. If it’s 2 seconds then an RDD is created
> every 2 seconds.
>
>
>
> Cheers,
>
>
>
> *David*
>
>
>
> *From:* Natu Lauchande [mailto:nlaucha...@gmail.com]
> *Sent:* Tuesday, April 12, 2016 7:09 AM
> *To:* user@spark.apache.org
> *Subject:* DStream how
Hi,
What's the criterion for the number of RDDs created for each micro-batch
iteration?
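For context, this is how I am creating the streaming context (a sketch; the
2-second interval is just an example):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  val conf = new SparkConf().setAppName("streaming-app")
  val ssc = new StreamingContext(conf, Seconds(2))   // batch interval: one micro-batch every 2 seconds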
Thanks,
Natu
Hi,
Is it possible to have both a sqlContext and a hiveContext in the same
application?
If yes, would there be any performance penalties in doing so?
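What I have in mind is along these lines (a sketch, assuming Spark 1.x and an
existing SparkContext sc):

  import org.apache.spark.sql.SQLContext
  import org.apache.spark.sql.hive.HiveContext

  val sqlContext = new SQLContext(sc)
  val hiveContext = new HiveContext(sc)   // HiveContext extends SQLContext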
Regards,
Natu
How are you trying to run Spark? Locally? spark-submit?
On Sat, Apr 9, 2016 at 7:57 AM, maheshmath wrote:
> I have set SPARK_LOCAL_IP=127.0.0.1 still getting below error
>
> 16/04/09 10:36:50 INFO spark.SecurityManager: Changing view acls to: mahesh
> 16/04/09 10:36:50 INFO spark.SecurityMana
,
> Ben
>
> On Apr 8, 2016, at 9:15 PM, Natu Lauchande wrote:
>
> Hi Benjamin,
>
> I have done it . The critical configuration items are the ones below :
>
> ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
> "org.apache.hadoo
I don't see this happening without a store. You can try Parquet on top of HDFS.
This will at least avoid the burden of a third-party system.
On 09 Apr 2016 9:04 AM, "Daniela S" wrote:
> Hi,
>
> I would like to cache values and to use only the latest "valid" values to
> build a sum.
> In more detail, I r
s.aws.amazon.com/AmazonS3/latest/dev/NotificationHowTo.html
>
>
> On Fri, Apr 8, 2016 at 9:15 PM Natu Lauchande
> wrote:
>
>> Hi Benjamin,
>>
>> I have done it . The critical configuration items are the ones below :
>>
>>
Hi Benjamin,
I have done it. The critical configuration items are the ones below:
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
"org.apache.hadoop.fs.s3native.NativeS3FileSystem")
ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId",
AccessKeyId)
ssc.spar
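(A sketch of how the full set typically reads; the last line is an assumption
for the truncated part, with SecretAccessKey standing in for the matching
secret key value:)

  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.impl",
    "org.apache.hadoop.fs.s3native.NativeS3FileSystem")
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsAccessKeyId", AccessKeyId)
  // assumed continuation of the truncated line above:
  ssc.sparkContext.hadoopConfiguration.set("fs.s3n.awsSecretAccessKey", SecretAccessKey)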
Hi,
I am working on a Spark Streaming app; when running locally I use "local[*]" as
the master of my Spark Streaming Context.
I wonder what would be needed to develop locally and run it on YARN through the
IDE (I am using IntelliJ IDEA).
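One pattern I am considering is not hardcoding the master and picking it up at
launch time instead (just a sketch; SPARK_MASTER is a hypothetical environment
variable, not a Spark built-in):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // falls back to local[*] for IDE runs; set SPARK_MASTER (or pass --master to
  // spark-submit and drop setMaster entirely) when targeting YARN
  val master = sys.env.getOrElse("SPARK_MASTER", "local[*]")
  val conf = new SparkConf().setAppName("my-streaming-app").setMaster(master)
  val ssc = new StreamingContext(conf, Seconds(10))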
Thanks,
Natu
Hi,
I am setting up a Scala Spark Streaming app in EMR. I wonder if anyone on the
list can help me with the following questions:
1. What's the approach that you have been using to submit, in an EMR job step,
environment variables that will be needed by the Spark application? (See the
sketch below.)
2. Can i have
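To make question 1 concrete (the sketch mentioned above; MY_ENV is a
hypothetical variable name, and I am assuming it would be propagated by adding
--conf spark.yarn.appMasterEnv.MY_ENV=value and
--conf spark.executorEnv.MY_ENV=value to the step's spark-submit arguments):

  // read the propagated variable inside the application
  val myEnv = sys.env.getOrElse("MY_ENV", "default-value")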
Hi,
I am using Spark Streaming with the input strategy of watching for files in S3
directories, using the textFileStream method on the streaming context.
The filename contains information relevant to my pipeline manipulation; I wonder
if there is a more robust way to get this name other than captur
Hi Amit,
I don't see any default constructor in the JavaRDD docs
https://spark.apache.org/docs/latest/api/java/org/apache/spark/api/java/JavaRDD.html
.
Have you tried something like the following?
List<JavaRDD<String>> rddList = new ArrayList<>();
rddList.add(jsc.textFile("/file1.txt"));
rddList.add(jsc.textFile("/file2.txt"));
...
Natu
On S
Hi,
Looking at the lookup function here might help you:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.rdd.PairRDDFunctions
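A minimal sketch of how it reads, assuming a pair RDD and an existing
SparkContext sc:

  val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
  val valuesForA = pairs.lookup("a")   // returns all values for key "a": Seq(1, 3)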
Natu
On Sat, Oct 31, 2015 at 6:04 PM, swetha wrote:
> Hi,
>
> I have a requirement wherein I have to load data from hdfs, build an RDD
> and
I don't think so.
Spark is not keeping the results in memory unless you tell it to.
You have to explicitly call the cache method on your RDD:
linesWithSpark.cache()
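For example (a sketch adapted from the quick start; "README.md" is just a
sample path):

  val linesWithSpark = sc.textFile("README.md").filter(_.contains("Spark"))
  linesWithSpark.cache()    // mark the RDD for in-memory caching
  linesWithSpark.count()    // first action computes the RDD and caches it
  linesWithSpark.count()    // later actions reuse the cached data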
Thanks,
Natu
On Fri, Oct 9, 2015 at 10:47 AM, vinod kumar
wrote:
> Hi Guys,
>
> May I know whether cache is enabled in spark
Hi,
Are you using EMR?
Natu
On Sat, Sep 26, 2015 at 6:55 AM, SURAJ SHETH wrote:
> Hi Ankur,
> Thanks for the reply.
> This is already done.
> If I wait for a long amount of time(10 minutes), a few tasks get
> successful even on slave nodes. Sometime, a fraction of the tasks(20%) are
> complet
Are you using Scala in a distributed environment or in standalone mode?
Natu
On Thu, Dec 11, 2014 at 8:23 PM, ll wrote:
> hi.. i'm converting some of my machine learning python code into scala +
> spark. i haven't been able to run it on large dataset yet, but on small
> datasets (like http:/
elaborated in a chapter of an upcoming book that's available
> in early release; you can look at the accompanying source code to get
> some ideas too: https://github.com/sryza/aas/tree/master/kmeans
>
> On Mon, Nov 24, 2014 at 10:17 PM, Natu Lauchande
> wrote:
> > Hi all
Hi all,
I am getting started with Spark.
I would like to use it for a spike on anomaly detection in a massive stream of
metrics.
Can Spark easily handle this use case?
Thanks,
Natu