I only found time to generate the data a few minutes ago. It generated 100 GB of data for me in tens of minutes on my 5-node cluster.
Thanks all for helping me.

Regards,
Xiaohe

On Fri, Apr 3, 2015 at 9:00 PM, Fabio C. <anyte...@gmail.com> wrote:

> Thanks Gopal, but since it was a while ago and I didn't have to generate
> too much data I just run the tpc-ds generator binaries in parallel and
> uploaded it manually. Anyway if you want to have a look at the error:
> http://hortonworks.com/community/forums/topic/hive-testbench-error/
> Maybe it's trivial and it can help someone else.
>
> Regards
>
> Fabio
>
> On Thu, Apr 2, 2015 at 7:20 PM, Gopal Vijayaraghavan <gop...@apache.org>
> wrote:
>
>>
>> > https://github.com/hortonworks/hive-testbench
>> >
>> > The official procedure to generate and upload the data has never worked
>> >for me (and it looks like it's not a supported software), so it could be
>> >a bit tricky to do it manually and on a single host.
>>
>> I wrote the MapReduce jobs for that (tpcds-gen/tpch-gen) after waiting a
>> whole weekend for 1Tb of data to be generated on a single machine.
>>
>> If you or anyone else has issues with it, I can take a look at it.
>>
>> Cheers,
>> Gopal
>>