Hello, I am writing my DataFrame to S3 using the DataFrame `write` method. It mostly does a great job, but it fails one of my requirements. Here are my requirements:
- Write to S3
- Use `partitionBy` to automatically make folders based on my chosen partition columns
- Control the resultant filename (whole or in part)

I can get the first two requirements met, but not the third. Here's an example. When I use the commands...

```python
df.write.partitionBy("year", "month").mode("append") \
    .json('s3a://bucket_name/test_folder/')
```

...I get the partitions I need. However, the filenames are something like:

    part-00000-0e2e2096-6d32-458d-bcdf-dbf7d74d80fd.c000.json

Now, I understand Spark's need to include the partition number in the filename. However, it sure would be nice to control the rest of the file name.

Any advice? Please and thank you.

Marco.
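One common workaround (not a Spark feature, just a post-processing sketch) is to let Spark write its `part-*` files as usual and then rename them afterwards, keeping the partition index but substituting your own prefix. The helper below, `rename_part_files`, is a hypothetical name and works on a local filesystem with `os.rename`; against S3 you would do the equivalent copy-and-delete with boto3 or the Hadoop `FileSystem` API instead, since S3 has no true rename.

```python
import glob
import os


def rename_part_files(output_dir, prefix):
    """Rename Spark-style part files (part-*.json) under output_dir,
    including partition subfolders like year=2020/month=1/, to
    <prefix>-<NNNNN>.json, preserving the partition index."""
    renamed = []
    pattern = os.path.join(output_dir, "**", "part-*.json")
    for path in sorted(glob.glob(pattern, recursive=True)):
        folder = os.path.dirname(path)
        # "part-00000-0e2e2096-....c000.json" -> keep the "00000" index
        idx = os.path.basename(path).split("-")[1]
        new_path = os.path.join(folder, "{}-{}.json".format(prefix, idx))
        os.rename(path, new_path)
        renamed.append(new_path)
    return renamed
```

After `df.write.partitionBy(...).json(...)` completes, calling something like `rename_part_files("test_folder", "sales")` would turn `year=2020/month=1/part-00000-<uuid>.c000.json` into `year=2020/month=1/sales-00000.json`. The trade-off is that this is a second pass over the output, and it must run after the Spark job finishes rather than during the write.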