Re: Can Spark support exactly-once with Kafka, given the following questions?

2016-12-04 Thread Michal Šenkýř
Hello John, 1. If a task completes the operation, it will notify the driver. The driver may not receive the message due to the network and will think the task is still running. Then the child stage won't be scheduled? Spark's fault tolerance policy is, if there is a problem in processing a task or a
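For reference, the usual pattern these exactly-once discussions center on is the Kafka direct stream: track each batch's offset ranges and commit them only after the output has succeeded, with exactly-once then requiring the output itself to be idempotent or transactional. A minimal Scala sketch against the spark-streaming-kafka-0-10 API (broker, topic, and group id are placeholders):

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010._

// ssc: an existing StreamingContext (assumed to be in scope)
val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "broker:9092",             // placeholder
  "key.deserializer" -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "example-group",                    // placeholder
  "enable.auto.commit" -> (false: java.lang.Boolean))

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  LocationStrategies.PreferConsistent,
  ConsumerStrategies.Subscribe[String, String](Seq("events"), kafkaParams))

stream.foreachRDD { rdd =>
  val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  // ... write the batch idempotently or transactionally here ...
  // Commit offsets only after the output succeeded, so a failed batch is replayed.
  stream.asInstanceOf[CanCommitOffsets].commitAsync(offsetRanges)
}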

Re: How to convert a unix timestamp column into date format(yyyy-MM-dd) ?

2016-12-04 Thread Marco Mistroni
Hi In Python you can use datetime.fromtimestamp(..).strftime('%Y-%m-%d') Which Spark API are you using? Kr On 5 Dec 2016 7:38 am, "Devi P.V" wrote: > Hi all, > > I have a dataframe like following, > > ++---+ > |client_id
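In Spark SQL (Scala) the same conversion can be done with from_unixtime; a minimal sketch, assuming df is the dataframe from the question and the timestamp column holds epoch milliseconds as in the sample data:

import org.apache.spark.sql.functions.{col, from_unixtime}

// from_unixtime expects seconds, so divide the millisecond values by 1000.
val withDate = df.withColumn("date", from_unixtime(col("timestamp") / 1000, "yyyy-MM-dd"))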

How to convert a unix timestamp column into date format(yyyy-MM-dd) ?

2016-12-04 Thread Devi P.V
Hi all, I have a dataframe like the following,

+--------------------------+-------------+
|client_id                 |timestamp    |
+--------------------------+-------------+
|cd646551-fceb-4166-acbc-b9|1477989416803|
|3bc61951-0f49-43bf-9848-b2|1477983725292|
|688a

Re: Access multiple cluster

2016-12-04 Thread ayan guha
Thank you guys. I will try the JDBC route if I get access and let you know. On Mon, Dec 5, 2016 at 5:17 PM, Jörn Franke wrote: > If you do it frequently then you may simply copy the data to the > processing cluster. Alternatively, you could create an external table in > the processing cluster to the

Re: Access multiple cluster

2016-12-04 Thread Jörn Franke
If you do it frequently then you may simply copy the data to the processing cluster. Alternatively, you could create an external table in the processing cluster that points to the analytics cluster. However, this has to be supported by appropriate security configuration and might be less efficient than c
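A short sketch of the external-table route (table name, schema, and HDFS path are placeholders), issued from the processing cluster:

// spark: a SparkSession built with enableHiveSupport()
// The table's metadata lives in the processing cluster's metastore,
// but its data stays on the analytics cluster's HDFS.
spark.sql("""
  CREATE EXTERNAL TABLE analytics_events (client_id STRING, ts BIGINT)
  STORED AS PARQUET
  LOCATION 'hdfs://analytics-namenode:8020/warehouse/events'
""")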

how to use Dataset of transform method

2016-12-04 Thread LQ
How to write this in Java?

public <U> Dataset<U> transform(scala.Function1<Dataset<T>,Dataset<U>> t)

Concise syntax for chaining custom transformations.

def featurize(ds: Dataset[T]): Dataset[U] = ...
ds.transform(featurize).transform(...)

Parameters: t - (undocumented)
Returns: (undocumented)
Since
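For context, a runnable Scala sketch of the chaining pattern the javadoc hints at (the column names and transformations are made up). From Java on Scala 2.11 you cannot pass a lambda directly, since scala.Function1 is not a Java functional interface there; the usual workaround is to extend scala.runtime.AbstractFunction1 and override apply.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// spark: an existing SparkSession (assumed to be in scope)
import spark.implicits._

// Two small custom transformations to chain (hypothetical logic).
def onlyPositive(df: DataFrame): DataFrame = df.filter(col("value") > 0)
def withDoubled(df: DataFrame): DataFrame = df.withColumn("doubled", col("value") * 2)

val ds = Seq(-1L, 2L, 3L).toDF("value")
ds.transform(onlyPositive).transform(withDoubled).show()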

Can Spark support exactly-once with Kafka, given the following questions?

2016-12-04 Thread John Fang
1. If a task completes the operation, it will notify the driver. The driver may not receive the message due to the network and will think the task is still running. Then the child stage won't be scheduled? 2. How does Spark guarantee that the downstream task receives the shuffle data completely? In fact, I

Re: SVM regression in Spark

2016-12-04 Thread Evgenii Morozov
I don’t think there is such an algorithm. Originally SVM is for classification; there is a tweaked version that does regression, but unfortunately that is not available in Apache Spark, AFAIK. > On 01 Dec 2016, at 02:53, roni wrote: > > Hi Spark expert, > Can anyone help for doing SVR (Suppor

Re: Spark shuffle: FileNotFound exception

2016-12-04 Thread Evgenii Morozov
Swapnil, What do you think might be the size of the file that’s not found? For Spark versions below 2.0.0 there may be issues with shuffle blocks of 2 GB or larger. Is the file actually on the file system? I’d try increasing the default parallelism to make sure partitions get smaller. Hope this helps. > On 04
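A minimal sketch of the parallelism knobs (the value 400 is a placeholder to tune; both settings must be in place before the context/session is created):

import org.apache.spark.SparkConf

// Raise parallelism so each shuffle block stays well under the 2 GB limit.
val conf = new SparkConf()
  .set("spark.default.parallelism", "400")      // RDD shuffles
  .set("spark.sql.shuffle.partitions", "400")   // DataFrame/SQL shuffles

// Or repartition explicitly before the shuffle-heavy stage
// (rdd: the input RDD of the failing job, assumed to be in scope):
val finer = rdd.repartition(400)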

Re: Access multiple cluster

2016-12-04 Thread Mich Talebzadeh
The only way I can think of would be accessing the Hive tables through their respective thrift servers running on the different clusters, but I am not sure you can do it within Spark. Basically two different JDBC connections. HTH, Dr Mich Talebzadeh
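A sketch of the two-connection idea (hosts, table names, and the join key are placeholders; note that Spark's JDBC source combined with the Hive JDBC driver has known rough edges, e.g. column names coming back prefixed with the table name):

// spark: an existing SparkSession (assumed to be in scope)
val analytics = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://analytics-host:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "events")
  .load()

val processing = spark.read.format("jdbc")
  .option("url", "jdbc:hive2://processing-host:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "clients")
  .load()

// Join across the two clusters, then write back wherever needed.
val joined = processing.join(analytics, Seq("client_id"))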

Access multiple cluster

2016-12-04 Thread ayan guha
Hi, Is it possible to access Hive tables sitting on multiple clusters in a single Spark application? We have a data processing cluster and an analytics cluster. I want to join a table from the analytics cluster with another table in the processing cluster and finally write back to the analytics cluster. Best Ay

Re: Design patterns for Spark implementation

2016-12-04 Thread Pradeep Gaddam
I was hoping for someone to answer this question, as it resonates with many developers who are new to Spark and trying to adopt it at their work. Regards, Pradeep On Dec 3, 2016, at 9:00 AM, Vasu Gourabathina <vgour...@gmail.com> wrote: Hi, I know this is a broad question. If this is no

RE: Creating schema from json representation

2016-12-04 Thread Mendelson, Assaf
Answering my own question (for those who are interested):

import org.apache.spark.sql.types.{DataType, StructType}

val schema = df.schema        // the dataframe's StructType
val jsonString = schema.json  // JSON string representation
val backToSchema = DataType.fromJson(jsonString).asInstanceOf[StructType]

Creating schema from json representation

2016-12-04 Thread Mendelson, Assaf
Hi, I am trying to save a Spark dataframe schema in Scala. I can do df.schema.json to get the JSON string representation. Now I want to get the schema back from the JSON. However, it seems I need to parse the JSON string myself, get its fields object, and generate the fields manually. Is there a b

Re: RDD getPartitions() size and HashPartitioner numPartitions

2016-12-04 Thread Manish Malhotra
It's a pretty nice question! I'll try to understand the problem and see if I can help further. When you say CustomRDD, I believe you will be using it in the transformation stage, once the data is read from a source like HDFS, Cassandra, or Kafka. Now RDD.getPartitions() should return the pa
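For reference, a minimal custom RDD sketch showing the contract (all names are made up): getPartitions returns one Partition per index, its length is the RDD's partition count, and a shuffle through HashPartitioner(n) produces an RDD whose own getPartitions has length n, independent of the parent's count.

import org.apache.spark.{HashPartitioner, Partition, SparkContext, TaskContext}
import org.apache.spark.rdd.RDD

// Each entry of getPartitions carries its own index; Spark schedules one
// task per entry, so the array length IS the RDD's partition count.
class CustomRDD(sc: SparkContext, numParts: Int) extends RDD[Int](sc, Nil) {

  override protected def getPartitions: Array[Partition] =
    (0 until numParts).map { i =>
      new Partition { override def index: Int = i }
    }.toArray

  override def compute(split: Partition, context: TaskContext): Iterator[Int] =
    Iterator.single(split.index)  // trivial payload: the partition index
}

// sc: an existing SparkContext (assumed to be in scope).
// The partitioner's numPartitions, not the parent's 8, decides the output.
val pairs = new CustomRDD(sc, 8).map(i => (i, i))
val reshuffled = pairs.partitionBy(new HashPartitioner(4))
assert(reshuffled.getNumPartitions == 4)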