Use pandas or Dask. If you do want to use Spark, store the dataset as Parquet/ORC first, and then run your analytical queries against that dataset.
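For the pandas route, a minimal sketch of out-of-core processing with `chunksize` (column names are my guesses from the dataset description below; the sample file is created inline just so the sketch runs end to end — point `read_csv` at the real 3.6 GB file instead):

```python
import pandas as pd

# Tiny stand-in for the real CSV; replace with the actual file path.
sample = (
    "user_id,item_id,category_id,behavior,timestamp\n"
    "1,10,100,pv,1511544070\n"
    "1,11,100,buy,1511561733\n"
    "2,12,101,cart,1511572885\n"
)
with open("user_behavior.csv", "w") as f:
    f.write(sample)

# chunksize keeps memory bounded: each chunk is read, aggregated,
# and discarded, so 2 GB of RAM is enough for a 3.6 GB file.
counts = pd.Series(dtype="int64")
for chunk in pd.read_csv("user_behavior.csv", chunksize=2):
    counts = counts.add(chunk["behavior"].value_counts(), fill_value=0)

print(counts.sort_index())
```

Dask gives you the same bounded-memory behavior with a DataFrame API (`dd.read_csv(...)` plus a one-time `.to_parquet(...)` conversion), which is the natural bridge to the Parquet/ORC suggestion above.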
Raymond Xie <xie3208...@gmail.com> wrote on Tue, Jun 19, 2018 at 04:29:

> I have a 3.6GB csv dataset (4 columns, 100,150,807 rows), my environment
> is 20GB ssd harddisk and 2GB RAM.
>
> The dataset comes with
> User ID: 987,994
> Item ID: 4,162,024
> Category ID: 9,439
> Behavior type ('pv', 'buy', 'cart', 'fav')
> Unix Timestamp: span between November 25 to December 03, 2017
>
> I would like to hear any suggestion from you on how I should process the
> dataset with my current environment.
>
> Thank you.
>
> Sincerely yours,
>
> Raymond