Re: Spark Configuration

2019-07-19 Thread Amarnath Venkataswamy
Use case: 250 million records per day, partitioned by date. Of those 250 million records, 20 to 30% are updates to previous days' partitions, spread across the last 100 days. Currently I rebuild all 100 days of partitions. My goal is to build a similar 25-billion-row table and do an upsert using 250 million
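
To put the figures above in proportion, here is a rough back-of-the-envelope sketch in Python. It only restates the numbers quoted in the message. The function name `upsert_sizing` is hypothetical, and treating the remaining 70 to 80% of each day's records as new inserts is an assumption, not something the message states.

```python
# Figures quoted in the message; everything else is derived from them.
DAILY_RECORDS = 250_000_000      # records arriving per day
PARTITION_SPAN_DAYS = 100        # date partitions that can receive updates
UPDATE_PCT_RANGE = (20, 30)      # % of each day's records that are updates


def upsert_sizing(daily_records, span_days, update_pct):
    """Return rough sizing numbers for one daily run (hypothetical helper)."""
    table_rows = daily_records * span_days
    # Integer arithmetic avoids floating-point rounding on the row counts.
    updates = daily_records * update_pct // 100
    # Assumption: whatever is not an update is a brand-new insert.
    inserts = daily_records - updates
    return {
        "table_rows": table_rows,
        "updates": updates,
        "inserts": inserts,
        "updated_pct_of_table": 100 * updates / table_rows,
    }


if __name__ == "__main__":
    for pct in UPDATE_PCT_RANGE:
        s = upsert_sizing(DAILY_RECORDS, PARTITION_SPAN_DAYS, pct)
        print(
            f"{pct}% updates: {s['updates']:,} updated rows, "
            f"{s['inserts']:,} inserts, "
            f"{s['updated_pct_of_table']:.2f}% of {s['table_rows']:,} table rows"
        )
```

Under these assumptions a daily run changes only a fraction of a percent of the table, which is why rebuilding all 100 partitions looks expensive compared with an upsert.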

Re: Spark Configuration

2019-07-19 Thread Vinoth Chandar
sg! As with any database-like system, performance depends on key design and configuration. Happy to share more tips on tuning if you can give more details on:
- the use case: what operation are you using?
- the % of the 25 billion records updated in each run (e.g., if you are upserting the entire