I appreciate your time,
~Randy
You can see from the GitHub blame history for toPandas() that the function
has been in the code for five years:
https://github.com/apache/spark/blame/a075cd5b700f88ef447b559c6411518136558d78/python/pyspark/sql/dataframe.py#L923
When I google IllegalArgumentException: 'Unsupported class file major
version
> On Wed, May 27, 2020 at 6:52 PM Dark Crusader <
> relinquisheddra...@gmail.com> wrote:
>
>> Hi Randy,
>>
>> Yes, I'm using parquet on both S3 and hdfs.
>>
>> On Thu, 28 May, 2020, 2:38 am randy clinto
https://databricks.com/blog/2017/05/31/top-5-reasons-for-choosing-s3-over-hdfs.html
Is the file Parquet on S3, or is it some other file format?
In general I would assume that HDFS reads and writes are more performant for
Spark jobs.
For instance, consider how well partitioned your HDFS file is versus the S3
file.
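The partitioning point comes down to partition pruning: when data is laid
out by a partition column, a filter on that column lets the reader skip
whole partitions instead of scanning every row. Here is a minimal
plain-Python sketch of that idea (not actual Spark code; the column name
"year" and the row values are made up for illustration):

```python
# Plain-Python sketch of partition pruning. When data is written grouped
# by a partition column (like df.write.partitionBy("year") in Spark), a
# filter on that column only has to open the matching partition.
from collections import defaultdict

rows = [{"year": y, "value": i}
        for i, y in enumerate([2018, 2019, 2019, 2020, 2020, 2020])]

# "Write" the data partitioned by year.
partitions = defaultdict(list)
for row in rows:
    partitions[row["year"]].append(row)

# Unpartitioned read: every row must be scanned to evaluate year == 2020.
scanned_flat = len(rows)

# Partitioned read: only the year=2020 partition is opened.
wanted = partitions[2020]
scanned_pruned = len(wanted)
```

With the toy data above, the pruned read touches half the rows the flat
scan does; on real tables the gap is what makes a well-partitioned HDFS
layout fast relative to an unpartitioned S3 object.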
On Wed, May 27, 2020 at 1:51 PM Dark Crusader
wrote:
> Hi Jörn,
>
> Thank
Does it still plan an inner join if you remove a filter on both tables?
It seems like you are asking for a left join, but your filters demand the
behavior of an inner join.
Maybe you could do the filters on the tables first and then join them.
Something roughly like:
s_DF = s_DF.filter(s_DF.year == wanted_year)  # wanted_year stands in for the (truncated) original value
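The inner-join-vs-left-join point can be sketched in plain Python (not
PySpark; table and column names are invented for illustration). A left
join keeps unmatched left rows with NULL right-side columns, so a filter
on a right-side column applied after the join discards exactly those NULL
rows, which is the behavior of an inner join. Filtering the right table
before the join preserves left-join semantics:

```python
# Sketch of why filtering on the right table's column *after* a left join
# collapses it into an inner join. In a real join engine every right-side
# column of an unmatched row would be NULL; here only "year" is modeled.

left = [{"id": 1}, {"id": 2}, {"id": 3}]
right = [{"id": 1, "year": 2020}, {"id": 2, "year": 2019}]

def left_join(l_rows, r_rows, key):
    """Left join: unmatched left rows survive with a None 'year'."""
    out = []
    for l in l_rows:
        matches = [r for r in r_rows if r[key] == l[key]]
        if matches:
            out.extend({**l, **r} for r in matches)
        else:
            out.append({**l, "year": None})  # unmatched -> NULL column
    return out

joined = left_join(left, right, "id")

# Post-join filter on the right-side column: the None rows are dropped,
# i.e. exactly the rows a plain inner join would have dropped.
post_filter = [row for row in joined if row["year"] == 2020]

# Pre-join filter on the right table: unmatched left rows are kept.
right_2020 = [r for r in right if r["year"] == 2020]
pre_filter = left_join(left, right_2020, "id")
```

Here post_filter has a single row while pre_filter still carries all three
left rows, two of them with a NULL year, which is presumably what the
original left-join query intended.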