What is the best way to take the top N entries from a hive table/data source?

2020-04-13 Thread yeikel valdes
When I use .limit(), the returned DataFrame has only 1 partition, which normally causes most jobs to fail.

    val df = spark.sql("select * from table limit n")
    df.write.parquet()

Thanks!
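
One possible workaround, sketched here under stated assumptions rather than offered as a definitive answer: a global limit collapses the result into a single partition, so calling repartition() right after it spreads the rows back out before the write. The table name my_table, the values of n and numPartitions, and the output path are all hypothetical placeholders, not details from the original post. Note also that without an ORDER BY the N rows returned are arbitrary, and that the limit step itself still funnels through one partition; repartitioning only helps the stages that follow.

```scala
import org.apache.spark.sql.SparkSession

// Assumes a Spark installation with Hive support available.
val spark = SparkSession.builder()
  .appName("top-n-sketch")
  .enableHiveSupport()
  .getOrCreate()

val n = 1000000          // hypothetical: number of rows to keep
val numPartitions = 200  // hypothetical: tune to cluster size and data volume

// limit() yields a single-partition DataFrame; repartition() redistributes
// those rows so the write runs as numPartitions parallel tasks.
val df = spark.table("my_table") // hypothetical table name
  .limit(n)
  .repartition(numPartitions)

// Hypothetical output location.
df.write.parquet("/tmp/top_n_output")
```

Checking df.rdd.getNumPartitions before and after the repartition() call is a quick way to confirm whether the partition count actually changed.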

Re: Spark hangs while reading from jdbc - does nothing Removing Guess work from trouble shooting

2020-04-13 Thread jane thorpe
This tool may be useful for troubleshooting your problems. https://www.javacodegeeks.com/2020/04/simplifying-apm-remove-the-guesswork-from-troubleshooting.html "APM tools typically use a waterfall-type view to show the blocking time of different components cascading through the contr