Re: Learning Spark

2019-07-05 Thread Kurt Fehlhauer
lso.

> On Fri, 5 Jul 2019 at 10:32, Kurt Fehlhauer wrote:
>
>> Are you a data scientist or data engineer?
>>
>> On Thu, Jul 4, 2019 at 10:34 PM Vikas Garg wrote:
>>
>>> Hi,
>>>
>>> I am a new Spark learner. Can someone guide me with the strategy towards getting expertise in PySpark?
>>>
>>> Thanks!!!

Re: Learning Spark

2019-07-04 Thread Kurt Fehlhauer
Are you a data scientist or data engineer?

On Thu, Jul 4, 2019 at 10:34 PM Vikas Garg wrote:
> Hi,
>
> I am a new Spark learner. Can someone guide me with the strategy towards getting expertise in PySpark?
>
> Thanks!!!

Re: Can a UDF return a custom class other than a case class?

2019-01-06 Thread Kurt Fehlhauer
Is there a reason why case classes won't work for your use case?

On Sun, Jan 6, 2019 at 10:43 PM wrote:
> Hi,
>
> Is it possible to return a custom class from a UDF other than a case class?
>
> If so, how can we avoid this exception?:
> java.lang.UnsupportedOperationException:
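For reference, a case class returned from a UDF is encoded by Spark as a struct column. A minimal sketch, assuming an active SparkSession named spark; the Person and parsePerson names are illustrative, not from the thread:

    import org.apache.spark.sql.functions.udf
    import spark.implicits._

    // A case class return type works because Spark can derive a schema
    // for Product types; the result appears as a struct column.
    case class Person(name: String, age: Int)

    val parsePerson = udf { raw: String =>
      val Array(name, age) = raw.split(",")
      Person(name.trim, age.trim.toInt)
    }

    val df = Seq("Ada,36", "Grace,45").toDF("raw")
    df.withColumn("person", parsePerson($"raw")).show(false)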

Re: Why doesn't spark use broadcast join?

2018-04-18 Thread Kurt Fehlhauer
Try running AnalyzeTableCommand on both tables first.

On Wed, Apr 18, 2018 at 2:57 AM Matteo Cossu wrote:
> Can you check the value for spark.sql.autoBroadcastJoinThreshold?
>
> On 29 March 2018 at 14:41, Vitaliy Pisarev wrote:
>> I am
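AnalyzeTableCommand is the plan node behind the ANALYZE TABLE statement, so the equivalent SQL is the easiest way to run it. A sketch, assuming tables named t1 and t2 joined on an id column (all placeholder names):

    // Collect table-level statistics so the optimizer can compare sizes
    // against spark.sql.autoBroadcastJoinThreshold (10 MB by default).
    spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS")
    spark.sql("ANALYZE TABLE t2 COMPUTE STATISTICS")

    // Or bypass the size estimate entirely with an explicit hint:
    import org.apache.spark.sql.functions.broadcast
    val joined = spark.table("t1").join(broadcast(spark.table("t2")), "id")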

Re: EDI (Electronic Data Interchange) parser on Spark

2018-03-13 Thread Kurt Fehlhauer
If no pre-built solution exists, writing your own would not be that difficult. I suggest looking at a parser combinator library such as FastParse: http://www.lihaoyi.com/fastparse/

Regards,
Kurt

On Tue, Mar 13, 2018 at 7:47 AM Aakash Basu wrote:
> Thanks
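To make that concrete, a minimal FastParse sketch for an X12-style segment splitter; the '~' and '*' delimiters and the sample segments are assumptions for illustration (real interchanges declare their delimiters in the ISA header):

    import fastparse._, NoWhitespace._

    // An element is any run of characters up to the next delimiter;
    // a segment is '*'-separated elements terminated by '~'.
    def element[_: P]: P[String] = P(CharsWhile(c => c != '*' && c != '~', 0).!)
    def segment[_: P]: P[Seq[String]] = P(element.rep(sep = "*") ~ "~")
    def document[_: P]: P[Seq[Seq[String]]] = P(segment.rep ~ End)

    val sample = "ST*850*0001~BEG*00*SA*XX-1234~SE*3*0001~"
    parse(sample, document(_)) match {
      case Parsed.Success(segments, _) => segments.foreach(println)
      case f: Parsed.Failure           => println(s"Parse failed at index ${f.index}")
    }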

Re: parquet vs orc files

2018-02-22 Thread Kurt Fehlhauer
Hi Kane,

It really depends on your use case. I generally use Parquet because it seems to have better support beyond Spark. However, if you are dealing with partitioned Hive tables, the current versions of Spark have an issue where compression will not be applied. This will be fixed in version
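For plain DataFrame writes (as opposed to inserts into partitioned Hive tables), the codec can be set explicitly on the writer. A sketch, assuming a DataFrame df and a placeholder output path:

    // Write Snappy-compressed Parquet; the codec can also be set globally
    // via spark.sql.parquet.compression.codec.
    df.write
      .option("compression", "snappy")
      .parquet("/tmp/output/parquet_table")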

Re: how to create a DataType Object using the String representation in Java using Spark 2.2.0?

2018-01-25 Thread Kurt Fehlhauer
Can you share your code and a sample of your data? Without seeing it, I can't give a definitive answer, but I can offer some hints. If you have a column of strings, you should be able to create a new column cast to Integer. This can be accomplished two ways: df.withColumn("newColumn",
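The snippet above is truncated in the archive; a sketch of the two casts it appears to describe, with oldColumn and newColumn as placeholder names:

    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.IntegerType

    // 1. Cast using the type's string representation:
    val viaString = df.withColumn("newColumn", col("oldColumn").cast("integer"))

    // 2. Cast using a DataType object:
    val viaType = df.withColumn("newColumn", col("oldColumn").cast(IntegerType))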