A clustering lib is necessary to manage multiple JVMs, Akka Cluster for
instance.
On Jan 30, 2017 8:01 AM, "Rohit Verma" wrote:
> Hi,
>
> If I am right, you need to launch the other context from another JVM. If you
> try to launch another context from the same JVM, it will return the existing
> context.
Hi,
If I am right, you need to launch the other context from another JVM. If you
try to launch another context from the same JVM, it will return the existing
context.
Rohit
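A minimal sketch of that behaviour (hypothetical master and app names): requesting a second context within the same JVM simply hands back the one that already exists.

import org.apache.spark.{SparkConf, SparkContext}

object SingleContextDemo {
  def main(args: Array[String]): Unit = {
    // First context (local[*] is used here only for the sketch)
    val sc1 = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[*]").setAppName("first-context"))

    // Asking for a "second" context in the same JVM returns the existing one,
    // even though this conf asks for a different master.
    val sc2 = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[2]").setAppName("second-context"))

    println(sc1 eq sc2)  // true: both references point to the same SparkContext
    sc1.stop()
  }
}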
On Jan 30, 2017, at 12:24 PM, Mark Hamstra wrote:
More than one Spark Context in a single Application is not supported.
On Sun, Jan 29, 2017 at 9:08 PM, wrote:
> Hi,
>
>
>
> I have a requirement in which my application creates one Spark context in
> distributed mode and another Spark context in local mode.
>
> When I am creating this, my c
Hi,
I sometimes get these random init failures in test and in prod. Is there a
known scenario that could lead to these errors?
For example: not enough cores? Driver and worker not on the same LAN? etc.
Running Spark 1.5.1. Retrying solves it.
Caused by: java.util.concurrent.TimeoutException: Futures timed out ...
Okay, you are saying that 2.0.0 doesn't have that patch? cc @dev
I don't like changing the service versions every time!
Thanks.
On Mon, Jan 30, 2017 at 1:10 AM, Jacek Laskowski wrote:
> Hi,
>
> I think you have to upgrade to 2.1.0. There were a few changes wrt that ERROR
> since then.
>
> Jacek
Dear all,
1) When we don't set the reducer class in the driver program, the IdentityReducer
is invoked.
2) When we call setNumReduceTasks(0), no reducer, not even the IdentityReducer,
is invoked.
Now, in the second scenario, we observed that the output is in part-m-xx
format (instead of part-r-xx format), which sho
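For reference, a minimal map-only driver sketch in Scala (hypothetical input/output paths): with zero reduce tasks no reducer runs, not even the identity one, and the mapper output lands directly in part-m-xxxxx files.

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

object MapOnlyJob {
  def main(args: Array[String]): Unit = {
    val job = Job.getInstance(new Configuration(), "map-only-example")
    job.setJarByClass(MapOnlyJob.getClass)

    // No mapper or reducer class is set: the identity Mapper runs, and with
    // zero reduce tasks the shuffle/sort phase is skipped, so mapper output
    // is written straight to part-m-xxxxx files.
    job.setNumReduceTasks(0)
    job.setOutputKeyClass(classOf[LongWritable])   // default TextInputFormat key type
    job.setOutputValueClass(classOf[Text])         // default TextInputFormat value type

    FileInputFormat.addInputPath(job, new Path(args(0)))     // hypothetical input dir
    FileOutputFormat.setOutputPath(job, new Path(args(1)))   // hypothetical output dir
    System.exit(if (job.waitForCompletion(true)) 0 else 1)
  }
}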
Hi,
I have a requirement in which my application creates one Spark context in
distributed mode and another Spark context in local mode.
When I do this, my complete application works on only one SparkContext
(the one created in distributed mode). The second Spark context is not getting
created.
Hi,
I am sorry, I made a really bad typo. What I meant in my email was actually
Structured Streaming, so I wish I could do s/Spark Streaming/Structured
Streaming/g. Thanks for the pointers; it looks like what I was looking for is
actually watermarking, since my question is all about what I should do if m
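A minimal watermarking sketch (hypothetical source path, schema and window sizes): records arriving later than the watermark delay are dropped from the aggregation and old window state can be purged.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

val spark = SparkSession.builder.appName("watermark-sketch").getOrCreate()
import spark.implicits._

val schema = new StructType()
  .add("eventTime", TimestampType)
  .add("userId", StringType)

// Hypothetical JSON source; any streaming source with an event-time column works
val events = spark.readStream.schema(schema).json("/tmp/events")

val counts = events
  .withWatermark("eventTime", "10 minutes")               // tolerate 10 minutes of lateness
  .groupBy(window($"eventTime", "5 minutes"), $"userId")  // 5-minute event-time windows
  .count()

counts.writeStream
  .outputMode("append")    // emit a window only once the watermark has passed it
  .format("console")
  .start()
  .awaitTermination()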
Hi,
I have two almost identical Spark pipeline applications, but I found a
significant difference between their performance.
Basically, the 1st application consumes the stream from Kafka, slices the
stream into batches of 1 minute, and for each record calculates a score
given the already load
Spark Streaming (DStreams) wasn't designed with event time in mind.
Instead, we designed Structured Streaming to deal naturally with event
time. You should check that out. Here are the pointers:
- Programming guide -
http://spark.apache.org/docs/latest/structured-streaming-programming-guide
You can use HDFS, S3, Azure, GlusterFS, Ceph, Ignite (in-memory); a Spark
cluster itself does not store anything, it just processes.
> On 29 Jan 2017, at 15:37, Alex wrote:
>
> But for persistance after intermediate processing can i use spark cluster
> itself or i have to use hadoop clust
Hi,
I think you have to upgrade to 2.1.0. There were a few changes wrt that ERROR
since then.
Jacek
On 29 Jan 2017 9:24 a.m., "Chetan Khatri" wrote:
Hello Spark Users,
I am getting error while saving Spark Dataframe to Hive Table:
Hive 1.2.1
Spark 2.0.0
Local environment.
Note: Job is getting execut
Which graph DB are you thinking about?
Here's one for neo4j
https://neo4j.com/blog/neo4j-3-0-apache-spark-connector/
From: Deepak Sharma
Sent: Sunday, January 29, 2017 4:28:19 AM
To: spark users
Subject: Examples in graphx
Hi There,
Are there any examples of usi
Thanks, that's helpful.
On Sun, Jan 29, 2017 at 12:54 PM, Anton Okolnychyi wrote:
> Hi,
>
> I recently extended the Spark SQL programming guide to cover user-defined
> aggregations, where I modified existing variables and returned them back in
> reduce and merge. This a
Could you explain why this would work?
Assaf.
From: Haviv, Daniel [mailto:dha...@amazon.com]
Sent: Sunday, January 29, 2017 7:09 PM
To: Mendelson, Assaf
Cc: user@spark.apache.org
Subject: Re: forcing dataframe groupby partitioning
If there's no built in local groupBy, You could do something like
Hi,
I recently extended the Spark SQL programming guide to cover user-defined
aggregations, where I modified existing variables and returned them back in
reduce and merge. This approach worked and it was approved by people who
know the context.
Hope that helps.
2017-01-29 17:17 GMT+01:00 Koert K
If there's no built-in local groupBy, you could do something like this:
df.groupby(C1,C2).agg(...).flatmap(x=>x.groupBy(C1)).agg
Thank you.
Daniel
On 29 Jan 2017, at 18:33, Mendelson, Assaf wrote:
Hi,
Consider the following example:
df.groupby(C1,C2).agg(s
Hi,
Consider the following example:
df.groupby(C1,C2).agg(some agg).groupby(C1).agg(some more agg)
By default, Spark would shuffle according to the combination of C1 and C2 and
then shuffle again by C1 only.
This behavior makes sense when one uses C2 to salt C1 for skew
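A sketch of the two-stage aggregation under discussion, with hypothetical column names and aggregates; whether the second exchange is actually elided depends on the Spark version and the physical plan, so check explain() rather than taking it on faith.

import org.apache.spark.sql.functions.{col, sum}

// Hash-partition by C1 up front so rows sharing C1 (and hence sharing C1,C2)
// are already co-located for both aggregations.
val firstPass = df
  .repartition(col("C1"))
  .groupBy("C1", "C2")
  .agg(sum("value").as("partialSum"))   // hypothetical aggregate

val secondPass = firstPass
  .groupBy("C1")
  .agg(sum("partialSum").as("totalSum"))

secondPass.explain()   // inspect how many Exchange operators remain in the plan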
Anyone?
If not I will follow the trail and try to deduce it myself.
On Mon, Jan 23, 2017 at 2:31 PM, Koert Kuipers wrote:
> looking at the docs for org.apache.spark.sql.expressions.Aggregator it
> says for the reduce method: "For performance, the function may modify `b` and
> return it instead of constructing a new object"
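A sketch of what that buffer mutation looks like in practice, as a hypothetical typed average: reduce and merge modify `b` and return it instead of allocating a new buffer per row.

import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

case class SumCount(var sum: Double, var count: Long)

object MyAverage extends Aggregator[Double, SumCount, Double] {
  def zero: SumCount = SumCount(0.0, 0L)

  // For performance, mutate the buffer `b` and hand it back instead of
  // constructing a new SumCount for every input value.
  def reduce(b: SumCount, value: Double): SumCount = {
    b.sum += value; b.count += 1; b
  }

  def merge(b1: SumCount, b2: SumCount): SumCount = {
    b1.sum += b2.sum; b1.count += b2.count; b1
  }

  def finish(b: SumCount): Double = if (b.count == 0L) 0.0 else b.sum / b.count
  def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
  def outputEncoder: Encoder[Double] = Encoders.scalaDouble
}

// Usage on a Dataset[Double], e.g. (with spark.implicits._ imported):
//   val ds = spark.range(1, 5).map(_.toDouble)
//   ds.select(MyAverage.toColumn).show()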
Hi Masf,
Do try the official HBase-Spark module:
https://hbase.apache.org/book.html#spark
I think you will have to build the jar from source and run your Spark
program with --packages .
https://spark-packages.org/package/hortonworks-spark/shc says it's not yet
published to Spark Packages or the Maven repo.
Sorry, my mistake:
1. Put the csv files into HDFS /apps//data/staging/
2. Multiple csv files for the same table can co-exist
3. like val df1 = spark.read.option("header", false).csv(location)
4. Once the csv files are read into the df you can do loads of things (see the
sketch below). The csv files have to reside
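A sketch of those steps with a hypothetical staging path; every CSV file under the directory is read into a single DataFrame (assumes an existing SparkSession named spark).

// Hypothetical HDFS staging directory holding one or more CSV files for the table
val location = "hdfs:///apps/myuser/data/staging/mytable/"

val df1 = spark.read
  .option("header", "false")       // files carry no header row
  .option("inferSchema", "true")   // or pass an explicit schema in production
  .csv(location)                   // reads every CSV file in the directory

df1.createOrReplaceTempView("staging_table")
spark.sql("SELECT COUNT(*) FROM staging_table").show()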
You can use Spark directly on the csv files.
1. Put the csv files into HDFS /apps//data/staging/
2. Multiple csv files for the same table can co-exist
3. like df1 = spark.read.option("header", false).csv(location)
4.
Dr Mich Talebzadeh
LinkedIn:
https://www.linkedin.com/profile/view
But for persistence after intermediate processing, can I use the Spark cluster
itself or do I have to use a Hadoop cluster?
On Jan 29, 2017 7:36 PM, "Deepak Sharma" wrote:
The better way is to read the data directly into spark using spark sql read
jdbc .
Apply the udf's locally .
Then save the data frame
I meant a distributed file system such as Ceph, Gluster, etc.
> On 29 Jan 2017, at 14:45, Jörn Franke wrote:
>
> One alternative could be the oracle Hadoop loader and other Oracle products,
> but you have to invest some money and probably buy their Hadoop Appliance,
> which you have to eva
The better way is to read the data directly into Spark using the Spark SQL
JDBC reader.
Apply the UDFs locally.
Then save the data frame back to Oracle using the DataFrame's JDBC writer.
Thanks
Deepak
On Jan 29, 2017 7:15 PM, "Jörn Franke" wrote:
> One alternative could be the oracle Hadoop loader and
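For Deepak's read-transform-write suggestion above, a minimal sketch with hypothetical connection details, table names, column and UDF:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, udf}

val spark = SparkSession.builder.appName("oracle-roundtrip").getOrCreate()

val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL"    // hypothetical URL
val props = new java.util.Properties()
props.setProperty("user", "scott")                      // hypothetical credentials
props.setProperty("password", "tiger")
props.setProperty("driver", "oracle.jdbc.OracleDriver")

// Read the source table straight into a DataFrame over JDBC
val src = spark.read.jdbc(jdbcUrl, "SOURCE_TABLE", props)

// Apply the transformation locally as a UDF (hypothetical trim/upper-case cleanup)
val normalize = udf((s: String) => if (s == null) null else s.trim.toUpperCase)
val transformed = src.withColumn("NAME", normalize(col("NAME")))

// Write the result back to Oracle through the DataFrame JDBC writer
transformed.write.mode("append").jdbc(jdbcUrl, "TARGET_TABLE", props)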
Hi everyone,
We're starting a new meetup in London: Open-source Graph Technologies. Our
goal is to increase awareness of open-source graph technologies and their
applications among the London developer community. In the past week, 119 people
have signed up to the group, and we're hoping
One alternative could be the Oracle Hadoop loader and other Oracle products,
but you have to invest some money and probably buy their Hadoop appliance,
so you have to evaluate whether it makes sense (it can get expensive with
large clusters etc.).
Another alternative would be to get rid of Oracle altogether
Hi There,
Are there any examples of using GraphX along with any graph DB?
I am looking to persist the graph in a graph-based DB and then read it back
into Spark and process it using GraphX.
--
Thanks
Deepak
www.bigdatabig.com
www.keosha.net
I'm trying to build an application where it is necessary to do bulkGets and
bulkLoads on HBase.
I think that I could use this component:
https://github.com/hortonworks-spark/shc
Is it a good option?
But I can't import it into my project: sbt cannot resolve the HBase
connector.
This is my build.sbt:
This is classic, nothing special about it.
1. Your source is Oracle schema tables.
2. You can use an Oracle JDBC connection with DIRECT CONNECT and parallel
processing to read your data from the Oracle table into Spark FP using JDBC.
Ensure that you are getting data from the Oracle DB at a time whe
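A sketch of such a parallel JDBC read (hypothetical URL, table and partition column): Spark opens numPartitions connections, each pulling one slice of the ID range.

val oracleDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")   // hypothetical URL
  .option("dbtable", "SCOTT.SOURCE_TABLE")                 // hypothetical table
  .option("user", "scott")
  .option("password", "tiger")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("partitionColumn", "ID")     // numeric column with a known range
  .option("lowerBound", "1")
  .option("upperBound", "10000000")
  .option("numPartitions", "8")        // 8 concurrent reads against Oracle
  .load()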
Hi All,
Thanks for your response. Please find the flow diagram below.
Please help me simplify this architecture using Spark:
1) Can I skip steps 1 to 4 and directly store the data in Spark?
If I am storing it in Spark, where is it actually getting stored?
Do I need to retain Hadoop to store the data?
Hello Spark Users,
I am getting an error while saving a Spark DataFrame to a Hive table:
Hive 1.2.1
Spark 2.0.0
Local environment.
Note: the job is getting executed successfully and in the way I want, but the
exception is still raised.
Source Code:
package com.chetan.poc.hbase
/**
* Created by chetan on 24/1/17.
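For reference, a minimal sketch (hypothetical table name) of writing a DataFrame to a Hive table from a SparkSession built with Hive support enabled:

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder
  .appName("hive-save-sketch")
  .enableHiveSupport()          // requires Hive classes/config on the classpath
  .getOrCreate()

val df = spark.range(0, 10).toDF("id")   // stand-in for the real DataFrame

df.write
  .mode(SaveMode.Overwrite)
  .saveAsTable("default.sample_table")   // hypothetical target Hive table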