Re: Having multiple spark context

2017-01-29 Thread vincent gromakowski
A clustering lib is necessary to manage multiple JVMs. Akka Cluster, for instance. On Jan 30, 2017 8:01 AM, "Rohit Verma" wrote: > Hi, > > If I am right, you need to launch the other context from another JVM. If you > are trying to launch another context from the same JVM, it will return you the > existing context.

Re: Having multiple spark context

2017-01-29 Thread Rohit Verma
Hi, If I am right, you need to launch the other context from another JVM. If you are trying to launch another context from the same JVM, it will return you the existing context. Rohit On Jan 30, 2017, at 12:24 PM, Mark Hamstra <m...@clearstorydata.com> wrote: More than one Spark Context in a single Application is not supported.

Re: Having multiple spark context

2017-01-29 Thread Mark Hamstra
More than one Spark Context in a single Application is not supported. On Sun, Jan 29, 2017 at 9:08 PM, wrote: > Hi, > > I have a requirement in which my application creates one Spark context in > distributed mode and another Spark context in local mode. > > When I am creating this, my complete application is working on only one SparkContext.
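For illustration, a minimal sketch of the behaviour described above, assuming Spark 2.0-era APIs (the master URL is a placeholder):

    import org.apache.spark.{SparkConf, SparkContext}

    // First context: "distributed" mode (placeholder master URL).
    val sc = new SparkContext(
      new SparkConf().setMaster("spark://master:7077").setAppName("distributed"))

    // Asking for a second, local-mode context in the same JVM hands back
    // the context that already exists rather than creating a new one.
    val sc2 = SparkContext.getOrCreate(
      new SparkConf().setMaster("local[*]").setAppName("local"))
    assert(sc eq sc2)

    // Calling new SparkContext(...) again here would throw unless
    // spark.driver.allowMultipleContexts=true is set, and even then the
    // combination is unsupported.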

Failures on JavaSparkContext - "Futures timed out after [10000 milliseconds]"

2017-01-29 Thread Gili Nachum
Hi, I sometimes get these random init failures in test and prod. Is there a scenario that could lead to these errors? For example: not enough cores? Driver and worker not on the same LAN? etc. Running Spark 1.5.1. Retrying solves it. Caused by: java.util.concurrent.TimeoutException: Futures timed out after [10000 milliseconds]
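Since a manual retry reportedly works, one hedged workaround is to wrap context creation in a small retry helper. This is a sketch only; the 5-second backoff is arbitrary, and raising the RPC timeouts (e.g. spark.rpc.lookupTimeout) may also be worth trying:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.api.java.JavaSparkContext

    // Sketch: retry SparkContext creation a few times, since the init
    // timeout is intermittent and a retry reportedly succeeds.
    def createWithRetry(conf: SparkConf, attempts: Int = 3): SparkContext =
      try new SparkContext(conf)
      catch {
        case e: Exception if attempts > 1 =>
          Thread.sleep(5000) // arbitrary backoff before the next attempt
          createWithRetry(conf, attempts - 1)
      }

    // Wrap for the Java API if needed.
    val jsc = new JavaSparkContext(createWithRetry(new SparkConf().setAppName("test")))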

Re: Error Saving Dataframe to Hive with Spark 2.0.0

2017-01-29 Thread Chetan Khatri
Okay, so you are saying that 2.0.0 doesn't have that patch? @dev cc -- I don't like changing the service versions every time! Thanks. On Mon, Jan 30, 2017 at 1:10 AM, Jacek Laskowski wrote: > Hi, > > I think you have to upgrade to 2.1.0. There were a few changes wrt the ERROR > since. > > Jacek

No Reducer scenarios

2017-01-29 Thread रविशंकर नायर
Dear all, 1) When we don't set the reducer class in the driver program, IdentityReducer is invoked. 2) When we set setNumReduceTasks(0), no reducer, not even IdentityReducer, is invoked. Now, in the second scenario, we observed that the output is in part-m-xx format (instead of part-r-xx format), which sho
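For reference, a map-only driver sketch (Hadoop MapReduce API called from Scala; the pass-through mapper and input/output paths are placeholders). With zero reduce tasks the OutputFormat writes the map output directly, which is why the files are named part-m-NNNNN:

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.io.{LongWritable, Text}
    import org.apache.hadoop.mapreduce.{Job, Mapper}
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat

    // Pass-through mapper: emits each input line unchanged.
    class PassThroughMapper extends Mapper[LongWritable, Text, LongWritable, Text] {
      override def map(key: LongWritable, value: Text,
                       ctx: Mapper[LongWritable, Text, LongWritable, Text]#Context): Unit =
        ctx.write(key, value)
    }

    object MapOnlyJob {
      def main(args: Array[String]): Unit = {
        val job = Job.getInstance(new Configuration(), "map-only example")
        job.setJarByClass(classOf[PassThroughMapper])
        job.setMapperClass(classOf[PassThroughMapper])
        job.setNumReduceTasks(0) // map-only: no shuffle, no sort, no reducer at all
        FileInputFormat.addInputPath(job, new Path(args(0)))
        FileOutputFormat.setOutputPath(job, new Path(args(1)))
        System.exit(if (job.waitForCompletion(true)) 0 else 1)
      }
    }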

Having multiple spark context

2017-01-29 Thread jasbir.sing
Hi, I have a requirement in which my application creates one Spark context in distributed mode and another Spark context in local mode. When I am creating this, my complete application is working on only one SparkContext (created in distributed mode). The second Spark context is not getting created

Re: question on spark streaming based on event time

2017-01-29 Thread kant kodali
Hi, I am sorry, I made a really bad typo. What I meant in my email was actually Structured Streaming, so I wish I could do s/Spark Streaming/Structured Streaming/g. Thanks for the pointers; it looks like what I was looking for is actually watermarking, since my question is all about what I should do if m

Streaming jobs getting longer

2017-01-29 Thread Saulo Ricci
Hi, I have 2 almost identical Spark pipeline applications, but I found a significant difference in their performance. Basically, the 1st application consumes the stream from Kafka, slices this stream into batches of 1 minute, and for each record calculates a score given the already load

Re: question on spark streaming based on event time

2017-01-29 Thread Tathagata Das
Spark Streaming (DStreams) wasn't designed with event time in mind. Instead, we have designed Structured Streaming to naturally deal with event time. You should check that out. Here are the pointers. - Programming guide - http://spark.apache.org/docs/latest/structured-streaming-programming-guide
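A minimal watermarking sketch in the Spark 2.1 API (the socket source, host/port, and line format are placeholders):

    import java.sql.Timestamp
    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.window

    val spark = SparkSession.builder.appName("event-time-demo").getOrCreate()
    import spark.implicits._

    // Placeholder source producing "2017-01-29 10:00:00,word" lines.
    val events = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()
      .as[String]
      .map { line =>
        val Array(ts, word) = line.split(",")
        (Timestamp.valueOf(ts), word)
      }
      .toDF("eventTime", "word")

    // The watermark bounds how late data may arrive; state older than
    // the watermark can be dropped instead of being kept forever.
    val counts = events
      .withWatermark("eventTime", "10 minutes")
      .groupBy(window($"eventTime", "5 minutes"), $"word")
      .count()

    counts.writeStream.outputMode("append").format("console").start()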

Re: spark architecture question -- Please Read

2017-01-29 Thread Jörn Franke
You can use HDFS, S3, Azure, GlusterFS, Ceph, Ignite (in-memory); a Spark cluster itself does not store anything, it just processes. > On 29 Jan 2017, at 15:37, Alex wrote: > > But for persistence after intermediate processing can i use spark cluster > itself or i have to use hadoop cluster

Re: Error Saving Dataframe to Hive with Spark 2.0.0

2017-01-29 Thread Jacek Laskowski
Hi, I think you have to upgrade to 2.1.0. There were a few changes wrt the ERROR since. Jacek On 29 Jan 2017 9:24 a.m., "Chetan Khatri" wrote: Hello Spark Users, I am getting an error while saving a Spark DataFrame to a Hive table: Hive 1.2.1, Spark 2.0.0, local environment. Note: the job is getting executed successfully and the way I want, but still an exception is raised.

Re: Examples in graphx

2017-01-29 Thread Felix Cheung
Which graph DB are you thinking about? Here's one for Neo4j: https://neo4j.com/blog/neo4j-3-0-apache-spark-connector/ From: Deepak Sharma Sent: Sunday, January 29, 2017 4:28:19 AM To: spark users Subject: Examples in graphx Hi There, Are there any examples of using GraphX along with any graph DB?

Re: Aggregator mutate b1 in place in merge

2017-01-29 Thread Koert Kuipers
thanks, that's helpful On Sun, Jan 29, 2017 at 12:54 PM, Anton Okolnychyi < anton.okolnyc...@gmail.com> wrote: > Hi, > > I recently extended the Spark SQL programming guide to cover user-defined > aggregations, where I modified existing variables and returned them back in > reduce and merge. This approach worked

RE: forcing dataframe groupby partitioning

2017-01-29 Thread Mendelson, Assaf
Could you explain why this would work? Assaf. From: Haviv, Daniel [mailto:dha...@amazon.com] Sent: Sunday, January 29, 2017 7:09 PM To: Mendelson, Assaf Cc: user@spark.apache.org Subject: Re: forcing dataframe groupby partitioning If there's no built-in local groupBy, you could do something like

Re: Aggregator mutate b1 in place in merge

2017-01-29 Thread Anton Okolnychyi
Hi, I recently extended the Spark SQL programming guide to cover user-defined aggregations, where I modified existing variables and returned them back in reduce and merge. This approach worked and it was approved by people who know the context. Hope that helps. 2017-01-29 17:17 GMT+01:00 Koert K
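A small sketch of that pattern with the Dataset Aggregator API: an average built from a mutable sum/count buffer, where reduce and merge update the buffer in place and return it instead of allocating a new object.

    import org.apache.spark.sql.{Encoder, Encoders}
    import org.apache.spark.sql.expressions.Aggregator

    case class SumCount(var sum: Long, var count: Long)

    object AvgAgg extends Aggregator[Long, SumCount, Double] {
      def zero: SumCount = SumCount(0L, 0L)
      // Mutate the buffer and hand it back -- no per-row allocation.
      def reduce(b: SumCount, a: Long): SumCount = { b.sum += a; b.count += 1; b }
      def merge(b1: SumCount, b2: SumCount): SumCount = {
        b1.sum += b2.sum; b1.count += b2.count; b1
      }
      def finish(r: SumCount): Double = r.sum.toDouble / r.count
      def bufferEncoder: Encoder[SumCount] = Encoders.product[SumCount]
      def outputEncoder: Encoder[Double] = Encoders.scalaDouble
    }

    // Usage on a Dataset[Long]: ds.select(AvgAgg.toColumn)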

Re: forcing dataframe groupby partitioning

2017-01-29 Thread Haviv, Daniel
If there's no built-in local groupBy, you could do something like this: df.groupby(C1,C2).agg(...).flatmap(x => x.groupBy(C1)).agg Thank you. Daniel On 29 Jan 2017, at 18:33, Mendelson, Assaf <assaf.mendel...@rsa.com> wrote: Hi, Consider the following example: df.groupby(C1,C2).agg(some agg)

forcing dataframe groupby partitioning

2017-01-29 Thread Mendelson, Assaf
Hi, Consider the following example: df.groupby(C1,C2).agg(some agg).groupby(C1).agg(some more agg) By default, Spark would shuffle according to a combination of C1 and C2 and then shuffle again by C1 only. This behavior makes sense when one uses C2 to salt C1 for skew
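One hedged possibility, not from the thread itself: pre-partition by C1 alone. A hash partitioning on C1 also co-locates all rows sharing (C1, C2), so the planner may be able to satisfy both aggregations with that single exchange. The value column v is a placeholder, and the plan should be verified with explain():

    import org.apache.spark.sql.functions.sum
    // assumes the df, C1, C2 from the example above and import spark.implicits._

    // Sketch only: one explicit shuffle by C1, then both aggregations.
    // Check with result.explain() that no further Exchange is planned.
    val pre = df.repartition($"C1")
    val result = pre
      .groupBy($"C1", $"C2").agg(sum($"v").as("s"))  // "some agg" stand-in
      .groupBy($"C1").agg(sum($"s").as("total"))     // "some more agg" stand-in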

Re: Aggregator mutate b1 in place in merge

2017-01-29 Thread Koert Kuipers
anyone? If not, I will follow the trail and try to deduce it myself. On Mon, Jan 23, 2017 at 2:31 PM, Koert Kuipers wrote: > looking at the docs for org.apache.spark.sql.expressions.Aggregator it > says for the reduce method: "For performance, the function may modify `b` and > return it instead of constructing a new object for `b`."

Re: Hbase and Spark

2017-01-29 Thread Sudev A C
Hi Masf, Do try the official HBase Spark module: https://hbase.apache.org/book.html#spark I think you will have to build the jar from source and run your Spark program with --packages . https://spark-packages.org/package/hortonworks-spark/shc says it's not yet published to Spark Packages or the Maven repo.

Re: spark architecture question -- Please Read

2017-01-29 Thread Mich Talebzadeh
Sorry, a mistake: 1. Put the csv files into HDFS /apps//data/staging/ 2. Multiple csv files for the same table can co-exist 3. like val df1 = spark.read.option("header", false).csv(location) 4. once the csv file is read into the df you can do loads of things. The csv files have to reside
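A sketch of those steps end to end, assuming Spark 2.x (the path and table name are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder.appName("csv-staging").getOrCreate()

    // Multiple csv files for the same table can co-exist under one directory.
    val location = "hdfs:///apps/<env>/data/staging/mytable/*.csv"  // placeholder path
    val df1 = spark.read
      .option("header", "false")
      .option("inferSchema", "true")  // or supply an explicit schema
      .csv(location)

    // Once read into a df you can "do loads of things", e.g. query it as SQL.
    df1.createOrReplaceTempView("staging_mytable")
    spark.sql("SELECT COUNT(*) FROM staging_mytable").show()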

Re: spark architecture question -- Please Read

2017-01-29 Thread Mich Talebzadeh
you can use Spark directly on csv files. 1. Put the csv files into HDFS /apps//data/staging/ 2. Multiple csv files for the same table can co-exist 3. like df1 = spark.read.option("header", false).csv(location) 4. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view

Re: spark architecture question -- Please Read

2017-01-29 Thread Alex
But for persistence after intermediate processing, can I use the Spark cluster itself or do I have to use a Hadoop cluster?! On Jan 29, 2017 7:36 PM, "Deepak Sharma" wrote: The better way is to read the data directly into Spark using Spark SQL's read jdbc. Apply the UDFs locally. Then save the data frame

Re: spark architecture question -- Please Read

2017-01-29 Thread Jörn Franke
I meant with a distributed file system such as Ceph, Gluster, etc. > On 29 Jan 2017, at 14:45, Jörn Franke wrote: > > One alternative could be the Oracle Hadoop loader and other Oracle products, > but you have to invest some money and probably buy their Hadoop Appliance, > which you have to evaluate

Re: spark architecture question -- Please Read

2017-01-29 Thread Deepak Sharma
The better way is to read the data directly into Spark using Spark SQL's read jdbc. Apply the UDFs locally. Then save the DataFrame back to Oracle using the DataFrame's write jdbc. Thanks Deepak On Jan 29, 2017 7:15 PM, "Jörn Franke" wrote: > One alternative could be the Oracle Hadoop loader and other Oracle products
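A hedged sketch of that flow; the JDBC URL, credentials, table and column names are placeholders, and the UDF is a stand-in for the real transformation logic:

    import org.apache.spark.sql.{SaveMode, SparkSession}
    import org.apache.spark.sql.functions.{col, udf}

    val spark = SparkSession.builder.appName("oracle-roundtrip").getOrCreate()

    val jdbcUrl = "jdbc:oracle:thin:@//dbhost:1521/ORCL"  // placeholder
    val props = new java.util.Properties()
    props.setProperty("user", "scott")                    // placeholder
    props.setProperty("password", "tiger")                // placeholder
    props.setProperty("driver", "oracle.jdbc.OracleDriver")

    // Read directly into Spark over JDBC.
    val in = spark.read.jdbc(jdbcUrl, "SRC_TABLE", props)

    // Apply the UDF locally, on the Spark side.
    val clean = udf((s: String) => if (s == null) "" else s.trim)
    val out = in.withColumn("NAME", clean(col("NAME")))

    // Save the DataFrame back to Oracle.
    out.write.mode(SaveMode.Append).jdbc(jdbcUrl, "TGT_TABLE", props)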

Invitation to speak about GraphX at London Opensource Graph Technologies Meetup

2017-01-29 Thread haikal
Hi everyone, We're starting a new meetup in London: Opensource Graph Technologies. Our goal is to increase the awareness of opensource graph technologies and their applications within the London developer community. In the past week, 119 people have signed up to the group, and we're hoping

Re: spark architecture question -- Please Read

2017-01-29 Thread Jörn Franke
One alternative could be the Oracle Hadoop loader and other Oracle products, but you have to invest some money and probably buy their Hadoop Appliance, which you have to evaluate to see if it makes sense (it can get expensive with large clusters etc). Another alternative would be to get rid of Oracle altogether

Examples in graphx

2017-01-29 Thread Deepak Sharma
Hi There, Are there any examples of using GraphX along with any graph DB? I am looking to persist the graph in a graph-based DB and then read it back in Spark and process it using GraphX. -- Thanks Deepak www.bigdatabig.com www.keosha.net

Hbase and Spark

2017-01-29 Thread Masf
I'm trying to build an application where it is necessary to do bulkGets and bulkLoads on HBase. I think that I could use this component: https://github.com/hortonworks-spark/shc *Is it a good option??* But *I can't import it into my project*. Sbt cannot resolve the hbase connector. This is my build.sbt:
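One hedged build.sbt sketch: since shc is not on Maven Central or Spark Packages, sbt needs the Hortonworks public repository added as a resolver. The coordinates and version string below are illustrative only -- pick the artifact matching your Spark and Scala versions from the repository listing:

    // Add the Hortonworks repo so sbt can resolve shc.
    resolvers += "Hortonworks Repository" at "http://repo.hortonworks.com/content/groups/public/"

    // Illustrative coordinates; verify the exact version in the repo.
    libraryDependencies += "com.hortonworks" % "shc-core" % "1.0.1-1.6-s_2.10"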

Re: spark architecture question -- Please Read

2017-01-29 Thread Mich Talebzadeh
This is classic, nothing special about it. 1. Your source is Oracle schema tables 2. You can use an Oracle JDBC connection with DIRECT CONNECT and parallel processing to read your data from the Oracle table into Spark FP using JDBC. Ensure that you are getting data from the Oracle DB at a time whe
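A sketch of the parallel JDBC read in step 2, assuming an existing SparkSession `spark`; the connection details, table, and partition column are placeholders. Spark opens numPartitions concurrent connections, each reading one slice of the ID range:

    val df = spark.read
      .format("jdbc")
      .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCL")  // placeholder
      .option("driver", "oracle.jdbc.OracleDriver")
      .option("dbtable", "MYSCHEMA.MYTABLE")                  // placeholder
      .option("user", "scott")                                // placeholder
      .option("password", "tiger")                            // placeholder
      .option("partitionColumn", "ID")  // must be a numeric column
      .option("lowerBound", "1")
      .option("upperBound", "1000000")
      .option("numPartitions", "8")     // 8 parallel reads of the ID range
      .load()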

Re: spark architecture question -- Please Read

2017-01-29 Thread Alex
Hi All, Thanks for your response. Please find the flow diagram below. Please help me simplify this architecture using Spark. 1) Can I skip steps 1 to 4 and directly store it in Spark? If I am storing it in Spark, where actually is it getting stored? Do I need to retain Hadoop to store data o

Error Saving Dataframe to Hive with Spark 2.0.0

2017-01-29 Thread Chetan Khatri
Hello Spark Users, I am getting an error while saving a Spark DataFrame to a Hive table: Hive 1.2.1, Spark 2.0.0, local environment. Note: the job is getting executed successfully and the way I want, but still an exception is raised. *Source Code:* package com.chetan.poc.hbase /** * Created by chetan on 24/1/17.
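For context, a minimal sketch of the kind of call involved, assuming a Hive-enabled session; the table name and DataFrame are placeholders, not the actual job:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder
      .appName("hive-save")
      .enableHiveSupport()   // required to write managed Hive tables
      .getOrCreate()

    val df = spark.range(10).toDF("id")  // stand-in DataFrame
    df.write.mode(SaveMode.Overwrite).saveAsTable("default.mytable")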