Re: RFC: Remove "HBaseTest" from examples?

2016-08-18 Thread Ignacio Zendejas
I'm very late to this party and I get hbase-spark... what's the recommendation for pyspark + hbase? I realize this isn't necessarily a concern of the spark project, but it'd be nice to at least document it here with a very short and sweet response because I haven't found anything useful in the

Parquet partitioning / appends

2016-08-18 Thread Jeremy Smith
Hi, I'm running into an issue wherein Spark (both 1.6.1 and 2.0.0) will fail with a GC Overhead limit when creating a DataFrame from a parquet-backed partitioned Hive table with a relatively large number of parquet files (~ 175 partitions, and each partition contains many parquet files). If I

Early Draft Structured Streaming Machine Learning

2016-08-18 Thread Holden Karau
Hi Everyone (that cares about structured streaming and ML), Seth and I have been giving some thought to supporting structured streaming in machine learning - we've put together an early design doc (it's been in JIRA (SPARK-16424) for a while, but

Re: Setting YARN executors' JAVA_HOME

2016-08-18 Thread Ryan Williams
Ah, I guess I missed that by only looking in the YARN config docs, but this is a more general parameter and not documented there. Thanks! On Thu, Aug 18, 2016 at 2:51 PM dhruve ashar wrote: > Hi Ryan, > > You can get more info on this here: Spark documentation >

Re: Setting YARN executors' JAVA_HOME

2016-08-18 Thread dhruve ashar
Hi Ryan, You can get more info on this here: Spark documentation. The page addresses what you need. You can look for spark.executorEnv.[EnvironmentVariableName] and set your Java home as spark.executorEnv.JAVA_HOME= Regards, Dhruve
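
For illustration only, here is a minimal sketch of how the spark.executorEnv.[EnvironmentVariableName] setting mentioned above could be passed on a spark-submit command line. The helper name executor_env_conf and the application name app.jar are hypothetical, and /path/to/java8 is the placeholder path from the original question, not a real location. The sketch only assembles the arguments; it does not launch anything, and it covers executors only.

```python
import shlex


def executor_env_conf(env):
    """Turn a dict of environment variables into spark-submit --conf arguments.

    Each variable NAME becomes spark.executorEnv.NAME=value, the per-executor
    environment setting referred to in the reply above.
    """
    args = []
    for name, value in sorted(env.items()):
        args += ["--conf", f"spark.executorEnv.{name}={value}"]
    return args


# Placeholder path from the thread; substitute the real Java 8 location.
conf_args = executor_env_conf({"JAVA_HOME": "/path/to/java8"})

# app.jar is a hypothetical application name used only for this example.
command = ["spark-submit", "--master", "yarn", *conf_args, "app.jar"]
print(shlex.join(command))
```

The same key/value pair could presumably go in spark-defaults.conf instead of on the command line; which is preferable depends on whether the setting should apply to every job or only to one.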

Setting YARN executors' JAVA_HOME

2016-08-18 Thread Ryan Williams
I need to tell YARN a JAVA_HOME to use when spawning containers (to run a Java 8 app on Java 7 YARN). The only way I've found that works is setting SPARK_YARN_USER_ENV="JAVA_HOME=/path/to/java8". The code

Re: How to convert spark data-frame to datasets?

2016-08-18 Thread Oscar Batori
From the docs, DataFrame is just Dataset[Row]. There are various converters for subtypes of Product if you want, using "as[T]", where T <:

How to convert spark data-frame to datasets?

2016-08-18 Thread Minudika Malshan
Hi all, Most Spark ML algorithms require a dataset to train the model. I would like to know how to convert a Spark *data-frame* to a *dataset* using Java. Your support is much appreciated. Thank you! Minudika

Re: Found a typo in Catalyst's exception and want to write a test -- help needed

2016-08-18 Thread Reynold Xin
I'd use the new SQLQueryTestSuite. Test cases are defined in SQL files. On Wed, Aug 17, 2016 at 11:46 PM, Jacek Laskowski wrote: > Hi devs, > > While reviewing the code in Catalyst for doing query parsing I found > that UnresolvedStar has this typo in the exception [1]. > > I do

Found a typo in Catalyst's exception and want to write a test -- help needed

2016-08-18 Thread Jacek Laskowski
Hi devs, While reviewing the code in Catalyst for doing query parsing I found that UnresolvedStar has this typo in the exception [1]. I do understand that it's a very trivial issue but I thought I'd write a test for it as part of the change so I could improve my understanding of the low-level

Re: Aggregations with scala pairs

2016-08-18 Thread Jean-Baptiste Onofré
Agreed. Regards JB On Aug 18, 2016, 07:32, at 07:32, Olivier Girardot wrote: >CC'ing dev list, you should open a Jira and a PR related to it to >discuss it c.f.

Re: Aggregations with scala pairs

2016-08-18 Thread Olivier Girardot
CC'ing dev list, you should open a Jira and a PR related to it to discuss it c.f. https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingCodeChanges On Wed, Aug 17, 2016 4:01 PM, Andrés Ivaldi iaiva...@gmail.com wrote: Hello, I'd like to

Re: Spark SQL and Kryo registration

2016-08-18 Thread Olivier Girardot
Hi everyone, it seems that it now works out of the box. So never mind, registration is compatible with Spark 2.0 when using DataFrames. Regards, Olivier. On Fri, Aug 5, 2016 10:07 AM, Maciej Bryński mac...@brynski.pl wrote: Hi Olivier, Did you check performance of Kryo ? I have observations