Re: [SQL] parse_url does not work for Internationalized domain names?

2018-01-12 Thread yash datta
Thanks for the prompt reply! Opened a ticket here: https://issues.apache.org/jira/browse/SPARK-23056 BR, Yash On Fri, Jan 12, 2018 at 3:41 PM, StanZhai wrote: > This problem was introduced by > which is designed to > improve performance of P

[SQL] parse_url does not work for Internationalized domain names?

2018-01-11 Thread yash datta
Hi devs, Stumbled across an interesting problem with the parse_url function that was implemented in Spark in https://issues.apache.org/jira/browse/SPARK-16281 When using internationalized domain names in URLs like: val url = "http://правительство.рф " T
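The truncated report above can be reproduced at the JDK level, independent of Spark: the post-SPARK-16281 `parse_url` is built on `java.net.URI`, which does not recognize a raw Unicode hostname as a server-based authority, so `getHost()` yields `null` (or parsing fails outright). Converting the name to its punycode form with `java.net.IDN.toASCII` sidesteps this. A minimal sketch of that behavior (not Spark's actual code):

```java
import java.net.IDN;
import java.net.URI;
import java.net.URISyntaxException;

// Sketch of the underlying JDK behavior: java.net.URI cannot extract a
// host from a raw Unicode domain name, which is consistent with
// parse_url(..., 'HOST') returning NULL for internationalized domains.
public class IdnHostDemo {
    static String hostOf(String url) {
        try {
            return new URI(url).getHost();   // null for non-ASCII hosts
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String unicodeHost = "правительство.рф";
        // Raw Unicode host: java.net.URI yields no host.
        System.out.println(hostOf("http://" + unicodeHost));
        // IDN.toASCII converts the name to its punycode ("xn--...") form,
        // which java.net.URI parses as an ordinary hostname.
        System.out.println(hostOf("http://" + IDN.toASCII(unicodeHost)));
    }
}
```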

Re: Dataframe Partitioning

2016-03-01 Thread yash datta
+1. This is one of the most common problems we encounter in our flow. Mark, I am happy to help if you would like to share some of the workload. Best, Yash On Wednesday 2 March 2016, Mark Hamstra wrote: > I don't entirely agree. You're best off picking the right size :). > That's almost impossib

Re: [discuss] dropping Python 2.6 support

2016-01-05 Thread yash datta
+1 On Tue, Jan 5, 2016 at 1:57 PM, Jian Feng Zhang wrote: > +1 > > We use Python 2.7+ and 3.4+ to call PySpark. > > 2016-01-05 15:58 GMT+08:00 Kushal Datta : > >> +1 >> >> >> Dr. Kushal Datta >> Senior Research Scientist >> Big Data Research & Pathfinding >> Intel Corporation, USA. >> >> On

Re: KryoSerializer for closureSerializer in DAGScheduler

2015-08-31 Thread yash datta
https://github.com/apache/spark/pull/6361 and > https://issues.apache.org/jira/browse/SPARK-7708 for some discussion of > the difficulties here. > > On Mon, Aug 31, 2015 at 3:44 AM, yash datta wrote: > >> Hi devs, >> >> Curently the only supported serializer

KryoSerializer for closureSerializer in DAGScheduler

2015-08-31 Thread yash datta
The UnionRDD finally created has 127 partitions. Calling unioned.collect leads to serialization of the UnionRDD. I am using Spark 1.2.1. Any help regarding this will be highly appreciated. Best Yash Datta -- When events unfold with calm and ease When the winds that blow are merely breeze Learn from natur
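For context on what the closure serializer does: by default Spark serializes task closures with plain Java serialization, which requires every captured object to be Serializable. A minimal stdlib-only sketch of that round trip (Kryo itself is a third-party library, so it is not shown; the `SerSupplier` type here is a hypothetical stand-in for a task closure):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.function.Supplier;

// Sketch of a Java-serialization-based closure round trip. The lambda's
// target type must extend Serializable, just as Spark's default closure
// serializer requires of task closures and everything they capture.
public class ClosureRoundTrip {
    interface SerSupplier extends Supplier<String>, Serializable {}

    static byte[] serialize(Object o) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static <T> T deserialize(byte[] bytes) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (T) ois.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String captured = "hello";                     // captured by the closure
        SerSupplier closure = () -> captured + " world";
        SerSupplier restored = deserialize(serialize(closure));
        System.out.println(restored.get());            // prints "hello world"
    }
}
```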

External Shuffle service over yarn

2015-06-25 Thread yash datta
Hi devs, Can someone point out if there are any distinct advantages of using the external shuffle service on YARN (runs on the NodeManager as an auxiliary service, https://issues.apache.org/jira/browse/SPARK-3797) instead of the default shuffle handling inside the executor containers? Please also mention if y
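For reference, the SPARK-3797 setup registers the service with the NodeManager and enables it on the Spark side. This sketch follows the property names in Spark's running-on-YARN documentation:

```xml
<!-- yarn-site.xml on each NodeManager: register the auxiliary service -->
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle,spark_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
  <value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
```

plus `spark.shuffle.service.enabled true` in spark-defaults.conf. The main documented advantage is that shuffle files are served by the NodeManager rather than the executor, so executors can be removed (e.g. under dynamic allocation) without losing shuffle output.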

Re: creating hive packages for spark

2015-04-27 Thread yash datta
Hi, you can build the spark-project Hive from here: https://github.com/pwendell/hive/tree/0.13.1-shaded-protobuf Hope this helps. On Mon, Apr 27, 2015 at 3:23 PM, Manku Timma wrote: > Hello Spark developers, > I want to understand the procedure to create the org.spark-project.hive > jars. Is th

Re: Stackoverflow in createDataFrame.

2015-04-24 Thread yash datta
This is already reported: https://issues.apache.org/jira/browse/SPARK-6999 On 24 Apr 2015 18:11, "Jan-Paul Bultmann" wrote: > Hey, > I get a stack overflow when calling the following method on SQLContext. > > def createDataFrame(rowRDD: JavaRDD[Row], columns: > java.util.List[String]): DataFram
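The bug class behind SPARK-6999 is an overload that was meant to convert its argument and delegate to a different overload, but instead resolved back to itself, recursing until the stack blew. A minimal, hypothetical illustration of that pattern (not Spark's actual code; `describe` is a made-up method name):

```java
// Hypothetical illustration of the self-delegating-overload pattern:
// the call inside describe resolves back to describe itself, so every
// invocation recurses until a StackOverflowError is thrown.
public class SelfCallOverload {
    static int describe(java.util.List<String> columns) {
        return describe(columns);   // self-call, never reaches another overload
    }

    static boolean overflows() {
        try {
            describe(java.util.List.of("a", "b"));
            return false;
        } catch (StackOverflowError e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(overflows()
            ? "StackOverflowError, as in SPARK-6999"
            : "no overflow");
    }
}
```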

Re: Building spark 1.2 from source requires more dependencies

2015-03-30 Thread yash datta
Hi all, When selecting large data in Spark SQL (SELECT * query), I see a buffer overflow exception from Kryo: 15/03/27 10:32:19 WARN scheduler.TaskSetManager: Lost task 6.0 in stage 3.0 (TID 30, machine159): com.esotericsoftware.kryo.KryoException: Buffer overflow. Available: 1, required: 2 Seri
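The usual remedy for "Buffer overflow. Available: N, required: M" from Kryo is to raise the serializer's output buffer ceiling. A sketch using the documented property names (the values shown are illustrative, not recommendations; note that older releases such as 1.2.x used the megabyte-valued names `spark.kryoserializer.buffer.mb` / `spark.kryoserializer.buffer.max.mb` instead):

```
# spark-defaults.conf: enlarge Kryo's serialization buffers
spark.serializer                 org.apache.spark.serializer.KryoSerializer
spark.kryoserializer.buffer      64k
spark.kryoserializer.buffer.max  512m
```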

Re: Spark SQL, Hive & Parquet data types

2015-02-20 Thread yash datta
For the old Parquet path (available in 1.2.1), I made a few changes to be able to read from and write to a table partitioned on a timestamp-type column: https://github.com/apache/spark/pull/4469 On Fri, Feb 20, 2015 at 8:28 PM, The Watcher wrote: > > > > > >1. In Spark 1.3.0, timestamp support wa