for clarification...
rdd.saveAsTextFile(path) writes to the local fs, but not to HDFS.
anyone?
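A minimal sketch of forcing the target filesystem with a fully qualified URI (the namenode host/port and output paths below are placeholders, not from the thread):

  // With no scheme, the path is resolved against fs.defaultFS,
  // which on a plain local setup is the local filesystem.
  val rdd = sc.parallelize(Seq("a", "b", "c"))
  rdd.saveAsTextFile("hdfs://namenode:8020/user/me/out")  // explicit HDFS target
  rdd.saveAsTextFile("file:///tmp/out")                   // explicit local target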
Hi Denis, great to see you here :)
It works, thanks!
Do you know how Spark generates data file names? They look like part- with a
UUID appended, e.g.:
part-0-124a8c43-83b9-44e1-a9c4-dcc8676cdb99.c000.snappy.parquet
2018-03-17 14:15 GMT+01:00 Denis Bolshakov :
Hello Serega,
https://spark.apache.org/docs/latest/sql-programming-guide.html
Please try the SaveMode.Append option. Does it work for you?
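A minimal sketch of that suggestion, applied to the write from the original message below (the path and partition columns are taken from it):

  import org.apache.spark.sql.SaveMode

  // Append adds new files/partitions without touching existing ones.
  ds.write
  .mode(SaveMode.Append)
  .partitionBy("year", "month", "day", "hour", "workflowId")
  .parquet("/here/is/my/dir")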
Sat, 17 Mar 2018, 15:19 Serega Sheypak :
> Hi, I'm using spark-sql to process my data and store the result as parquet
> partitioned
Hi, I'm using spark-sql to process my data and store the result as parquet
partitioned by several columns:
ds.write
.partitionBy("year", "month", "day", "hour", "workflowId")
.parquet("/here/is/my/dir")
I want to run more jobs that will produce new partitions or add more files
to existing ones.
> Not sure why you are dividing by 1000. from_unixtime expects a long type
It expects seconds, I have milliseconds.
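A minimal sketch of the seconds-vs-milliseconds point (the column name ts_millis is a made-up placeholder):

  import org.apache.spark.sql.functions.{col, from_unixtime}

  // from_unixtime interprets its input as Unix seconds, so a
  // millisecond timestamp must be divided by 1000 first.
  val withTime = df.withColumn("event_time", from_unixtime(col("ts_millis") / 1000))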
2018-03-12 6:16 GMT+01:00 vermanurag :
> Not sure why you are dividing by 1000. from_unixtime expects a long type
> which is time in milliseconds
Hi,
I am trying to figure out a way to find the size of persisted dataframes
using sparkContext.getRDDStorageInfo().
The RDDStorageInfo object has information about the number of bytes stored
in memory and on disk.
For example:
I have 3 dataframes which I have cached:
df1.cache()
df2.cache()
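A minimal sketch of reading the cached sizes back (assuming a SparkContext named sc; getRDDStorageInfo is a developer API whose entries report per-RDD memSize and diskSize in bytes):

  // Note: cache() is lazy, so run an action first (e.g. df1.count())
  // to materialize the cache before reading the storage info.
  sc.getRDDStorageInfo.foreach { info =>
    println(s"${info.name}: memory=${info.memSize} B, disk=${info.diskSize} B")
  }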