Hi Subash,

Short answer: It’s effectively random.
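To make "effectively random" a bit more concrete, here is a minimal Scala sketch that splits a part file name like the one in your example into its pieces. It assumes the layout is part-<partition number>-<UUID>-c<file counter><extensions>, which matches your example and, as far as I know, is how Spark 2.x names these files, with a fresh UUID for each write job. Treat that layout as an assumption, not a documented contract. `PartFileName`, `PartFileNames` and `parsePartFileName` are hypothetical names made up for this illustration; they are not Spark APIs.

```scala
import java.util.UUID
import scala.util.Try

// Hypothetical holder for the pieces of a part file name (illustration only, not a Spark class).
final case class PartFileName(partition: Int, jobUuid: UUID, fileCounter: Int, extension: String)

object PartFileNames {
  // Assumed layout: part-<partition>-<UUID>-c<counter><optional extensions>
  private val Pattern =
    """part-(\d+)-([0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})-c(\d+)(\..*)?""".r

  // Returns None when the name does not follow the assumed layout.
  def parsePartFileName(name: String): Option[PartFileName] = name match {
    case Pattern(part, uuid, counter, ext) =>
      Try(
        PartFileName(
          partition = part.toInt,
          jobUuid = UUID.fromString(uuid),
          fileCounter = counter.toInt,
          extension = Option(ext).getOrElse("") // the optional group is null when absent
        )
      ).toOption
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    val example = "part-00001-1e4c5369-6694-4012-894a-73b971fe1ab1-c000.txt.gz"
    parsePartFileName(example).foreach { p =>
      println(s"partition    = ${p.partition}")
      println(s"uuid         = ${p.jobUuid}")
      println(s"uuid version = ${p.jobUuid.version}") // 4 means a randomly generated UUID
      println(s"file counter = ${p.fileCounter}")
      println(s"extension    = ${p.extension}")
    }
  }
}
```

For the file name in your example, the UUID part reports version 4, i.e. a randomly generated UUID, so there is nothing to decode in it.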
Longer answer: In general, the DataFrameWriter expects to receive data from multiple partitions. Let’s say you were writing to ORC instead of text. In that case, even when you specify the output path, the writer creates a directory at that path and saves one of those funny-named files per partition.

Even longer: Assume you are running on top of YARN (or Mesos or K8s...). In this case, the resource manager is responsible for provisioning disk on request, and it is the programmer’s responsibility to implement the upstream business logic. The implication is that it’s probably not a good idea to violate that responsibility boundary because, if you do, you are probably going to violate some implicit assumptions that the YARN designers are relying upon. For example (just making this up): YARN will calculate available disk after each write action completes.

HTH,
Jason

On Mon, Apr 8, 2019 at 19:55 Subash Prabakar <subashpraba...@gmail.com> wrote:

> Hi,
> While saving in Spark2 as text file - I see encoded/hash value attached in
> the part files along with part number. I am curious to know what that
> value is about?
>
> Example:
> ds.write.mode(SaveMode.Overwrite).option("compression","gzip").text(path)
>
> Produces,
> part-00001-1e4c5369-6694-4012-894a-73b971fe1ab1-c000.txt.gz
>
>
> 1e4c5369-6694-4012-894a-73b971fe1ab1-c000 => what is this value?
>
> Are there any options available to remove this part, or is it attached for
> some reason?
>
> Thanks,
> Subash
>

--
Thanks,
Jason