Dataset for hive

2015-04-01 Thread xiaohe lan
Hi All, I am new to Hive. I just set up a 5-node Hadoop environment and want to try HiveQL. Is there any dataset I can download to play with HiveQL? The dataset should have several tables so I can write some complex joins. About 100 GB should be fine. Thanks, Xiaohe

Re: Dataset for hive

2015-04-01 Thread vivek veeramani
Hi Xiaohe, If it's a dataset that you're looking for, you can find Wikipedia data dumps at http://dumps.wikimedia.org/enwiki/. There is also documentation on the dumps at http://meta.wikimedia.org/wiki/Data_dumps. Hope this helps.

Re: Dataset for hive

2015-04-01 Thread xiaohe lan
Hi Vivek Veeramani, Actually, I already have that. But with the wiki dataset, I can only do "select *" queries. Thanks, Xiaohe

Re: Dataset for Hive

2015-04-01 Thread Chao Sun
Hi Xiaohe, You can try TPC-DS from https://github.com/hortonworks/hive-testbench. It contains a large number of queries with complex joins. Chao

Re: Dataset for hive

2015-04-02 Thread Fabio C.
https://github.com/hortonworks/hive-testbench The official procedure to generate and upload the data has never worked for me (and it looks like it's not a supported software), so it could be a bit tricky to do it manually and on a single host. The good point is you already have several queries and

Re: Dataset for hive

2015-04-02 Thread Gopal Vijayaraghavan
> https://github.com/hortonworks/hive-testbench
> The official procedure to generate and upload the data has never worked for me (and it looks like it's not a supported software), so it could be a bit tricky to do it manually and on a single host.
I wrote the MapReduce jobs for that (tpcds-g

Re: Dataset for hive

2015-04-03 Thread Fabio C.
Thanks Gopal, but since it was a while ago and I didn't have to generate too much data, I just ran the TPC-DS generator binaries in parallel and uploaded the data manually. Anyway, if you want to have a look at the error: http://hortonworks.com/community/forums/topic/hive-testbench-error/ Maybe it's trivia
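
For readers who want to try the manual route described above, here is a minimal sketch of running the generator in parallel and uploading the result. It is an illustration only, not the hive-testbench procedure: the binary path `./dsdgen`, the output directories, and the helper names are assumptions, and the flags shown (`-scale`, `-dir`, `-parallel`, `-child`, `-force`) should be checked against the version of the TPC-DS toolkit you build.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_dsdgen_commands(scale_gb, children, out_dir, binary="./dsdgen"):
    """Return one dsdgen argument list per parallel child (children are numbered from 1).

    The binary path and flag spellings are assumptions; verify them against your toolkit.
    """
    return [
        [binary, "-scale", str(scale_gb), "-dir", out_dir,
         "-parallel", str(children), "-child", str(child), "-force", "Y"]
        for child in range(1, children + 1)
    ]


def generate(scale_gb, children, out_dir):
    """Run every child concurrently on this host; raises if any child exits non-zero."""
    commands = build_dsdgen_commands(scale_gb, children, out_dir)
    with ThreadPoolExecutor(max_workers=children) as pool:
        # list() forces evaluation so that exceptions from failed children surface here.
        list(pool.map(lambda cmd: subprocess.run(cmd, check=True), commands))


def upload(out_dir, hdfs_dir):
    """Copy the generated files into HDFS (hypothetical target directory)."""
    subprocess.run(["hdfs", "dfs", "-put", out_dir, hdfs_dir], check=True)
```

Calling `generate(100, 4, "/tmp/tpcds")` followed by `upload("/tmp/tpcds", "/user/hive/tpcds")` would correspond to a 100 GB run split across four local processes; both paths are placeholders.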

Re: Dataset for hive

2015-04-15 Thread xiaohe lan
I only found time to generate the data a few minutes ago. It generated 100 GB of data for me in tens of minutes on my 5-node cluster. Thanks all for helping me. Regards, Xiaohe

Re: Dataset for hive

2015-04-15 Thread venkatanathen kannan
Hi Gopal & Xiaohe, Thanks for sharing. Thanks, VK