It would help if you could share the code. With a snippet to look at, someone here or I can guide you better.
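
In the meantime, here is a minimal sketch of what I meant by re-partitioning before the write, in Python. The helper names `target_partitions` and `write_hourly_parquet`, the two-tasks-per-core factor, and the 32 MB row group are assumptions for illustration only and are not taken from your job. `parquet.block.size` is the Parquet property for the row group size; whether passing it as a write option takes effect may depend on your Spark version, and setting the same key on the Hadoop configuration is the other common route.

```python
def target_partitions(total_cores, tasks_per_core=2):
    """Pick a partition count so that every core gets a few write tasks.

    The factor of 2 is an assumed starting point, not a tuned value.
    """
    if total_cores < 1 or tasks_per_core < 1:
        raise ValueError("total_cores and tasks_per_core must be >= 1")
    return total_cores * tasks_per_core


def write_hourly_parquet(df, path, total_cores, row_group_bytes=32 * 1024 * 1024):
    """Repartition df, then write it as Parquet.

    df is expected to be a Spark DataFrame. Each partition is written by
    its own task, so the write is no longer limited to a single core.
    Returns the number of partitions used.
    """
    n = target_partitions(total_cores)
    (df.repartition(n)                                   # spread rows over n tasks
       .write
       .option("parquet.block.size", row_group_bytes)   # requested row group size
       .parquet(path))
    return n
```

With 8 cores in the cluster, `write_hourly_parquet(df, "/mnt/nfs/hourly", total_cores=8)` would write 16 part files in parallel; the path here is only a placeholder.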

Thanks
Best Regards

On Wed, Jan 20, 2016 at 10:54 PM, Pavel Plotnikov <pavel.plotni...@team.wrike.com> wrote:

> Thanks, Akhil! It helps, but these jobs are still not fast enough; maybe I
> missed something.
>
> Regards,
> Pavel
>
> On Wed, Jan 20, 2016 at 9:51 AM Akhil Das <ak...@sigmoidanalytics.com>
> wrote:
>
>> Did you try re-partitioning the data before doing the write?
>>
>> Thanks
>> Best Regards
>>
>> On Tue, Jan 19, 2016 at 6:13 PM, Pavel Plotnikov <
>> pavel.plotni...@team.wrike.com> wrote:
>>
>>> Hello,
>>> I'm using Spark on some machines in standalone mode; data storage is
>>> mounted on these machines via NFS. I have an input data stream, and when
>>> I try to store all the data for an hour in Parquet, the job executes
>>> mostly on one core and the hourly data takes 40-50 minutes to store. It
>>> is very slow! And it is not an IO problem. After researching how Parquet
>>> files work, I found that writing can be parallelized at the row group
>>> level.
>>> I think the row group for my files is too large. How can I change it?
>>> When I create a too big DataFrame, it divides into parts very well and
>>> writes quickly!
>>>
>>> Thanks,
>>> Pavel