[no subject]

2016-08-13 Thread Jestin Ma
Hi, I'm currently trying to perform an outer join between two DataFrames/Datasets on a column, id; one is ~150 GB, the other about ~50 GB. df1.id is skewed in that there are many 0's, the rest being unique IDs. df2.id is not skewed. If I filter df1.id != 0, then the join works well. If I don't, then the
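A minimal sketch of one common workaround: split the hot key out and join the two halves separately (this assumes DataFrames df1 and df2 joined on id, that id is never null, and that the broadcast hint applies; the hint only helps for inner/one-sided outer joins and may be ignored for a full outer join):

import org.apache.spark.sql.functions.{broadcast, col}

// Split both sides on the hot key so each sub-join sees a disjoint key set.
val hot1  = df1.filter(col("id") === 0)
val rest1 = df1.filter(col("id") =!= 0)
val hot2  = df2.filter(col("id") === 0)
val rest2 = df2.filter(col("id") =!= 0)

// The well-distributed keys shuffle as usual.
val joinedRest = rest1.join(rest2, Seq("id"), "outer")

// The id = 0 slice of df2 should be tiny, so broadcasting it keeps the
// many id = 0 rows of df1 from piling onto a single reducer.
val joinedHot = hot1.join(broadcast(hot2), Seq("id"), "outer")

val result = joinedRest.union(joinedHot)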

Re: Does Spark SQL support indexes?

2016-08-13 Thread Jörn Franke
Use a format that has built-in indexes, such as Parquet or ORC. Do not forget to sort the data on the columns that you filter on. > On 14 Aug 2016, at 05:03, Taotao.Li wrote: > > > hi, guys, does Spark SQL support indexes? if so, how can I create an index > on my temp table? if not, how can
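For instance, a rough sketch of the write side (table, column and path names are made up):

import org.apache.spark.sql.functions.col

// Write ORC sorted on the column you filter by, so each stripe's min/max
// statistics become selective; the same idea applies to Parquet.
df.sortWithinPartitions("customer_id")
  .write
  .format("orc")
  .save("/data/events_orc")

// A read filtering on the sorted column can then skip whole stripes.
val hits = spark.read.format("orc").load("/data/events_orc")
  .filter(col("customer_id") === 42)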

How Spark sql query optimisation work if we are using .rdd action ?

2016-08-13 Thread mayur bhole
Hi All, Let's say we have val df = bigTableA.join(bigTableB, bigTableA("A") === bigTableB("A"), "left") val rddFromDF = df.rdd println(rddFromDF.count) My understanding is that Spark will convert all DataFrame operations before "rddFromDF.count" into equivalent RDD operations, as we are not performin
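One way to check this yourself, reusing the names from the post (a sketch, not a definitive answer):

// explain(true) prints the parsed, analyzed, optimized and physical plans
// Catalyst produced for everything up to this point.
val df = bigTableA.join(bigTableB, bigTableA("A") === bigTableB("A"), "left")
df.explain(true)

// .rdd executes that optimized plan to produce the rows, but anything
// chained onto rddFromDF afterwards is plain RDD code that Catalyst never
// sees (e.g. this count misses DataFrame-level count optimizations).
val rddFromDF = df.rdd
println(rddFromDF.count)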

Re: Why I can't use broadcast var defined in a global object?

2016-08-13 Thread Ted Yu
Can you (or David) resend David's reply? I don't see the reply in this thread. Thanks > On Aug 13, 2016, at 8:39 PM, yaochunnan wrote: > > Hi David, > Your answers have solved my problem! Detailed and accurate. Thank you very > much!

Re: Why I can't use broadcast var defined in a global object?

2016-08-13 Thread yaochunnan
Hi David, Your answers have solved my problem! Detailed and accurate. Thank you very much!

Re: Does Spark SQL support indexes?

2016-08-13 Thread Chanh Le
Hi Taotao, Spark SQL doesn't support indexes :). > On Aug 14, 2016, at 10:03 AM, Taotao.Li wrote: > > > hi, guys, does Spark SQL support indexes? if so, how can I create an index > on my temp table? if not, how can I handle some specific queries on a very > large table? it would iterate al

Does Spark SQL support indexes?

2016-08-13 Thread Taotao.Li
Hi guys, does Spark SQL support indexes? If so, how can I create an index on my temp table? If not, how can I handle specific queries on a very large table? It would scan the entire table even though all I want is just a small piece of it. Great thanks,
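Besides sorted ORC/Parquet (see Jörn's reply above), partition pruning is the usual way to keep a query from scanning the whole table; a sketch with made-up names:

import org.apache.spark.sql.functions.col

// Partition the data on a column your queries filter by, so a query only
// touches the matching directories.
df.write
  .partitionBy("event_date")
  .parquet("/data/events")

// This read prunes down to a single partition directory.
val oneDay = spark.read.parquet("/data/events")
  .filter(col("event_date") === "2016-08-13")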

Re: KafkaUtils.createStream not picking smallest offset

2016-08-13 Thread Diwakar Dhanuskodi
Not using checkpointing now. Source is producing 1.2 million messages to the topic. We are using ZooKeeper offsets for other downstreams too. That's the reason for going with createStream, which stores offsets in ZooKeeper.
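For reference, a sketch of the createStream overload that takes consumer properties (addresses and names are placeholders); note that auto.offset.reset only kicks in when the group has no offset stored in ZooKeeper yet, since an existing offset always wins:

import kafka.serializer.StringDecoder
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.kafka.KafkaUtils

val kafkaParams = Map(
  "zookeeper.connect" -> "zk1:2181",
  "group.id"          -> "my-consumer-group",
  "auto.offset.reset" -> "smallest")

// Receiver-based stream; offsets are tracked in ZooKeeper as before.
val stream = KafkaUtils.createStream[String, String, StringDecoder, StringDecoder](
  ssc, kafkaParams, Map("my-topic" -> 1), StorageLevel.MEMORY_AND_DISK_SER)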

Re: mesos or kubernetes ?

2016-08-13 Thread Jacek Laskowski
Hi, Thanks Michael! That's exactly what I missed in my understanding of the different options for Spark on XYZ. Thanks! And the last sentence was excellent in helping me compare DC/OS to, say, CDH. Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

Re: [SQL] Why does (0 to 9).toDF("num").as[String] work?

2016-08-13 Thread Jacek Laskowski
Hi, The point is that I could go fully typed with Dataset[String] and wondered why it's possible with Ints. You're working with DataFrames, which are Dataset[Row]. That's too little for me these days :) Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski

Re: mesos or kubernetes ?

2016-08-13 Thread Michael Gummelt
DC/OS Spark *is* Apache Spark on Mesos, along with some packaging that makes it easy to install and manage on DC/OS. For example: $ dcos package install spark $ dcos spark run --submit-args="--class SparkPi ..." The single-command install runs the cluster dispatcher and the history server

Re: [SQL] Why does (0 to 9).toDF("num").as[String] work?

2016-08-13 Thread Mich Talebzadeh
Would it not be as simple as: scala> (0 to 9).toDF res14: org.apache.spark.sql.DataFrame = [value: int] scala> (0 to 9).toDF.map(_.toString) res13: org.apache.spark.sql.Dataset[String] = [value: string] with my little knowledge. Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw

Re: Spark 2.0.0 - Java API - Modify a column in a dataframe

2016-08-13 Thread Jacek Laskowski
Hi, Could Encoders.STRING work? Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Thu, Aug 11, 2016 at 5:28 AM, Aseem Bansal wrote: > Hi > > I have a Dataset >
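For illustration, the Scala equivalents of the two usual approaches (the Java API is analogous; dataset and column names are made up):

import org.apache.spark.sql.Encoders
import org.apache.spark.sql.functions.{col, upper}

// Untyped: withColumn replaces a column via an expression.
val modified = df.withColumn("name", upper(col("name")))

// Typed: map with an explicit encoder, as suggested above.
val names   = df.select("name").as(Encoders.STRING)
val shouted = names.map(_.toUpperCase)(Encoders.STRING)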

Re: mesos or kubernetes ?

2016-08-13 Thread Jacek Laskowski
Hi, I'm wondering why not DC/OS (with Mesos)? Regards, Jacek Laskowski https://medium.com/@jaceklaskowski/ Mastering Apache Spark 2.0 http://bit.ly/mastering-apache-spark Follow me at https://twitter.com/jaceklaskowski On Sat, Aug 13, 2016 at 11:24 AM, guyoh wrote: > My company is tryi

[SQL] Why does (0 to 9).toDF("num").as[String] work?

2016-08-13 Thread Jacek Laskowski
Hi, I just ran into this and can't explain why it works. Please help me understand it. Q1: Why can I `as[String]` with Ints? Is this type safe? scala> (0 to 9).toDF("num").as[String] res12: org.apache.spark.sql.Dataset[String] = [num: int] Q2: Why can I map over strings even though there are really
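For what it's worth, one reading (hedged, not verified against the source) is that as[String] works because the analyzer inserts an implicit upcast from int to string when it binds the encoder, so the schema still reports int while the typed view yields Strings:

val ds = (0 to 9).toDF("num").as[String]  // schema stays [num: int]
val bang = ds.map(_ + "!")                // the lambda really sees Strings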

Re: mesos or kubernetes ?

2016-08-13 Thread Shuai Lin
Good summary! One more advantage of running Spark on Mesos: community support. There is quite a big user base running Spark on Mesos, so if you encounter a problem with your deployment, it's very likely you can get the answer with a simple Google search, or by asking on the Spark/Mesos user lists. By

Re: mesos or kubernetes ?

2016-08-13 Thread Michael Gummelt
Spark has a first-class scheduler for Mesos, whereas it doesn't have one for Kubernetes. Running Spark on Kubernetes means running Spark in standalone mode, wrapped in a Kubernetes service: https://github.com/kubernetes/kubernetes/tree/master/examples/spark So you're effectively comparing standalone vs. M

mesos or kubernetes ?

2016-08-13 Thread guyoh
My company is trying to decide whether to use Kubernetes or Mesos. Since we are planning to use Spark in the near future, I was wondering what is the best choice for us. Thanks, Guy

Re: call a mysql stored procedure from spark

2016-08-13 Thread Mich Talebzadeh
To be executed in MySQL with results sent back to Spark? No, I don't think so. On the other hand, a stored procedure is nothing but compiled code, so can you use the raw SQL behind the stored proc? You can certainly send the SQL via JDBC and get the result set back. HTH Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
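A sketch of the JDBC route (URL, credentials and query are placeholders): push the SELECT behind the proc down as a subquery. A procedure with side effects would instead need plain JDBC on the driver (java.sql.DriverManager plus a CallableStatement).

val df = spark.read.format("jdbc")
  .option("url", "jdbc:mysql://dbhost:3306/mydb")
  .option("dbtable", "(SELECT * FROM orders WHERE status = 'OPEN') AS t")
  .option("user", "user")
  .option("password", "secret")
  .load()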

call a mysql stored procedure from spark

2016-08-13 Thread sujeet jog
Hi, Is there a way to call a stored procedure using Spark? Thanks, Sujeet

Re: Accessing HBase through Spark with Security enabled

2016-08-13 Thread Jacek Laskowski
Hi Aneela, My (little to no) understanding of how to make it work is to set the hbase.security.authentication property to kerberos (see [1]). Spark on YARN uses it to get the tokens for Hive, HBase et al. (see [2]). It happens when Client starts its conversation with YARN RM (see [3]). You should not d
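A sketch of that client-side setting in code (normally it would come from hbase-site.xml on the classpath instead):

import org.apache.hadoop.hbase.HBaseConfiguration

val hbaseConf = HBaseConfiguration.create()
hbaseConf.set("hbase.security.authentication", "kerberos")
hbaseConf.set("hadoop.security.authentication", "kerberos")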

Spark stage concurrency

2016-08-13 Thread Mazen
Suppose a Spark job has two independent stages (they do not depend on each other) that are submitted concurrently/simultaneously (as TaskSets) by the DAG scheduler to the task scheduler. Can someone give more detailed insight on how the cores available on executors are distrib
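A related, concrete knob (a sketch, not an answer to the internals question): with spark.scheduler.mode=FAIR, task sets from jobs submitted on separate threads share executor cores rather than draining strictly FIFO; rddA and rddB below are made up:

import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Assumes spark.scheduler.mode=FAIR in the configuration.
Future { rddA.count() }  // job 1
Future { rddB.count() }  // job 2 competes for the same executor cores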

Spark Streaming fault tolerance benchmark

2016-08-13 Thread Dominik Safaric
A few months ago I started investigating, as part of an empirical research project, several stream processing engines, including but not limited to Spark Streaming. As the benchmark should extend its scope beyond performance metrics such as throughput and latency, I've focused on fault tolerance a

Re: Spark 2 cannot create ORC table when CLUSTERED. This worked in Spark 1.6.1

2016-08-13 Thread Mich Talebzadeh
Hi, SPARK-17047 created. Thanks, Dr Mich Talebzadeh LinkedIn * https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw * http://tale

Unsubscribe

2016-08-13 Thread bijuna
Unsubscribe