Given that you already know your input files (input_file_name), why not
get their sizes and sum them up?
    import java.io.File
    import java.net.URI
    import org.apache.spark.sql.functions.{input_file_name, sum}
    import spark.implicits._ // encoders for .as[String] / .map, and $"value"

    ds.select(input_file_name().as("filename"))
      .distinct.as[String]
      .map(filename => new File(new URI(filename).getPath).length) // bytes; local files only
      .select(sum($"value"))
      .show()
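The map step above relies on each input_file_name value being a URI string. Below is a minimal JDK-only sketch of that conversion that runs without Spark. The helper names localFileSize and sizeReport are hypothetical. Like the Spark version, it reads sizes from the local filesystem of the JVM that runs it. For HDFS or S3 paths you would presumably go through Hadoop's FileSystem API instead (for example getFileStatus(path).getLen).

```scala
import java.io.File
import java.net.URI
import java.nio.charset.StandardCharsets
import java.nio.file.Files

// Hypothetical helper: turn a URI string such as
// "file:///tmp/data/part-00000.parquet" into a size in bytes.
// Only works for files on the local filesystem of the JVM running it.
def localFileSize(fileName: String): Long =
  new File(new URI(fileName).getPath).length // 0L if the file does not exist

// Hypothetical helper: per-file sizes (like the size column of `ls -l`)
// for the distinct file names, plus their total.
def sizeReport(fileNames: Seq[String]): (Map[String, Long], Long) = {
  val sizes = fileNames.distinct.map(name => name -> localFileSize(name)).toMap
  (sizes, sizes.values.sum)
}

// Usage: two temporary files standing in for Spark input files.
val dir = Files.createTempDirectory("input-files")
val a = Files.write(dir.resolve("part-0.txt"), "hello".getBytes(StandardCharsets.UTF_8))
val b = Files.write(dir.resolve("part 1.txt"), "hello world".getBytes(StandardCharsets.UTF_8))
val names = Seq(a.toUri.toString, b.toUri.toString, a.toUri.toString) // a appears twice
val (sizes, total) = sizeReport(names)
println(sizes.values.toList.sorted) // List(5, 11)
println(total) // 16
```

Using URI.getPath rather than the raw string strips the file: scheme and decodes escapes such as %20. This is why the file with a space in its name is still found.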
Enrico
On 19.06.22 at 03:16, Yong Walt wrote:
    import java.io.File

    val someFile = new File("somefile.txt")
    val fileSize = someFile.length
This one?
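One caveat with File.length: its Javadoc says it returns 0L when the file does not exist, so a missing file and an empty file look the same. Below is a sketch of a stricter variant using java.nio.file.Files.size, which throws NoSuchFileException for a missing file instead. strictSize is a hypothetical helper name.

```scala
import java.io.File
import java.nio.file.{Files, NoSuchFileException, Paths}
import scala.util.{Failure, Success, Try}

// java.io.File.length: 0L for a missing file, indistinguishable from an empty one.
val legacySize: Long = new File("somefile.txt").length

// java.nio.file.Files.size: fails with NoSuchFileException for a missing file.
def strictSize(name: String): Try[Long] = Try(Files.size(Paths.get(name)))

strictSize("somefile.txt") match {
  case Success(bytes)                  => println(s"$bytes bytes")
  case Failure(_: NoSuchFileException) => println("no such file")
  case Failure(other)                  => println(s"could not read size: $other")
}
```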
On Sun, Jun 19, 2022 at 4:33 AM mbreuer <msbre...@gmail.com> wrote:
Hello Community,
I am working on optimizing file sizes and the number of files.
For DataFrames there is the function input_file_name, which returns the
file name. I am missing a counterpart that returns the size of the file,
just the size, as "ls -l" shows it. Is there something like that?
Kind regards,
Markus
---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org