RE: Understanding Spark S3 Read Performance

2023-05-16 Thread info
Hi, For clarification: are those 12 / 14 minutes cumulative CPU time or wall-clock time? How many executors executed those 1 / 375 tasks? Cheers, Enrico -- Original Message -- From: Shashank Rao Date: 16.05.23 19:48 (GMT+01:00) To: user@spark.apache.org Subject:
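For anyone hitting the same question: one way to see both numbers per task is a SparkListener that logs executorRunTime (wall clock spent in the executor) next to executorCpuTime. This is a minimal sketch, not code from the thread; the app name and S3 path are placeholders:

    import org.apache.spark.scheduler.{SparkListener, SparkListenerTaskEnd}
    import org.apache.spark.sql.SparkSession

    object TaskTimeLogger {
      def main(args: Array[String]): Unit = {
        // App name and S3 path are placeholders, not values from the thread.
        val spark = SparkSession.builder().appName("s3-read-metrics").getOrCreate()

        // Log each task's wall-clock run time next to its CPU time as it finishes.
        spark.sparkContext.addSparkListener(new SparkListener {
          override def onTaskEnd(taskEnd: SparkListenerTaskEnd): Unit = {
            val m = taskEnd.taskMetrics
            if (m != null) {
              // executorRunTime is reported in milliseconds, executorCpuTime in nanoseconds.
              println(s"task=${taskEnd.taskInfo.taskId} " +
                s"runTimeMs=${m.executorRunTime} cpuTimeMs=${m.executorCpuTime / 1000000}")
            }
          }
        })

        spark.read.parquet("s3a://some-bucket/some-prefix/").count()
        spark.stop()
      }
    }

The same per-task metrics are also visible in the Spark UI's stage pages, which is usually enough to tell CPU-bound work apart from time spent waiting on S3.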

Handling very large volume (500TB) of data using Spark

2018-08-25 Thread Great Info
Hi All, I have a large volume of data, nearly 500TB (from 2016 till date), and I have to do some ETL on it. The data sits in AWS S3, so I am planning to use an AWS EMR setup to process it, but I am not sure what configuration I should select. 1. Do I need to process monthly
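One common approach for a backlog this size is to split the job by time prefix rather than reading all 500TB in one application. A minimal sketch, assuming the data is laid out under hypothetical year=/month= prefixes in S3 (the bucket names, layout, and Parquet format are assumptions, not from the original mail):

    import org.apache.spark.sql.SparkSession

    object MonthlyEtl {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("monthly-etl").getOrCreate()

        // Hypothetical layout: s3a://source-bucket/events/year=YYYY/month=MM/
        // The 2016-2018 range mirrors the window mentioned in the mail.
        val months = for (year <- 2016 to 2018; month <- 1 to 12)
          yield f"s3a://source-bucket/events/year=$year/month=$month%02d/"

        months.foreach { path =>
          val df = spark.read.parquet(path)
          // ... per-month transformations for the ETL step go here ...
          df.write.mode("append").parquet("s3a://target-bucket/etl-output/")
        }
        spark.stop()
      }
    }

Processing one month per run also keeps the cluster size bounded and makes reruns cheap when a single month fails.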

Spark: rename or access columns which have special chars " ?:

2018-07-13 Thread Great Info
I have columns like below:
root
 |-- metadata: struct (nullable = true)
 |    |-- "drop":{"dropPath":" https://dstpath.media27.ec2.st-av.net/drop?source_id: string (nullable = true)
 |    |-- "selection":{"AlllURL":" https://dstpath.media27.ec2.st-av.net/image?source_id: string
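Column (and nested field) names containing quotes, spaces, '?' or ':' have to be either renamed or quoted with backticks when referenced. A minimal sketch of both approaches, assuming a hypothetical JSON input at s3a://some-bucket/input.json (the bucket and object names are illustrative, not from the original mail):

    import org.apache.spark.sql.SparkSession

    object SanitizeColumnNames {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("sanitize-cols").getOrCreate()

        // Hypothetical input standing in for the DataFrame with the schema above.
        val df = spark.read.json("s3a://some-bucket/input.json")

        // Rename top-level columns: replace every character outside
        // [A-Za-z0-9_] (quotes, spaces, '?', ':', ...) with an underscore.
        val sanitized = df.columns.foldLeft(df) { (acc, name) =>
          acc.withColumnRenamed(name, name.replaceAll("[^A-Za-z0-9_]", "_"))
        }

        // Nested struct fields can still be referenced by wrapping the raw
        // name in backticks, e.g. df.select(col("metadata.`odd?:name`"))
        // with org.apache.spark.sql.functions.col.
        sanitized.printSchema()
        spark.stop()
      }
    }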