Use pandas or Dask. If you do want to use Spark, store the dataset as Parquet/ORC first, and then run your analytical queries against that dataset.
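For the pandas route, a minimal sketch of out-of-core processing with `chunksize` (column names are my guesses from the dataset description below; the sample file is created inline just so the sketch runs end to end — point `read_csv` at the real 3.6 GB file instead):

```python
import pandas as pd

# Tiny stand-in for the real CSV; replace with the actual file path.
sample = (
    "user_id,item_id,category_id,behavior,timestamp\n"
    "1,10,100,pv,1511544070\n"
    "1,11,100,buy,1511561733\n"
    "2,12,101,cart,1511572885\n"
)
with open("user_behavior.csv", "w") as f:
    f.write(sample)

# chunksize keeps memory bounded: each chunk is read, aggregated,
# and discarded, so 2 GB of RAM is enough for a 3.6 GB file.
counts = pd.Series(dtype="int64")
for chunk in pd.read_csv("user_behavior.csv", chunksize=2):
    counts = counts.add(chunk["behavior"].value_counts(), fill_value=0)

print(counts.sort_index())
```

Dask gives you the same bounded-memory behavior with a DataFrame API (`dd.read_csv(...)` plus a one-time `.to_parquet(...)` conversion), which is the natural bridge to the Parquet/ORC suggestion above.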
Raymond Xie <xie3208...@gmail.com> wrote on Tue, Jun 19, 2018 at 04:29:

> I have a 3.6GB csv dataset (4 columns, 100,150,807 rows), my environment
> is 20GB ssd harddisk and 2GB RAM.
>
> The dataset comes with
> User ID: 987,994
> Item ID: 4,162,024
> Category ID: 9,439
> Behavior type ('pv', 'buy', 'cart', 'fav')
> Unix Timestamp: span between November 25 to December 03, 2017
>
> I would like to hear any suggestion from you on how I should process the
> dataset with my current environment.
>
> Thank you.
>
> Sincerely yours,
>
> Raymond