https://github.com/hortonworks/hive-testbench

The official procedure for generating and uploading the data has never worked for me (and it doesn't look like supported software), so you may have to do it manually on a single host, which can be a bit tricky. On the plus side, it already comes with several queries, and you can choose the size of the data you want to generate.
On Thu, Apr 2, 2015 at 8:29 AM, xiaohe lan <zombiexco...@gmail.com> wrote:

> Hi Vivek Veeramani,
>
> Actually, I already have that. But with the wiki dataset, I can only do
> "select *" queries.
>
> Thanks,
> Xiaohe
>
> On Thu, Apr 2, 2015 at 1:44 PM, vivek veeramani <
> vivek.veeraman...@gmail.com> wrote:
>
>> Hi Xiaohe,
>>
>> If it's data set that you're looking for, you can find wikipedia data
>> dumps @ http://dumps.wikimedia.org/enwiki/. Also documentation on the
>> dumps @ http://meta.wikimedia.org/wiki/Data_dumps.
>>
>> Hope this helps..
>>
>>
>> On Thu, Apr 2, 2015 at 10:56 AM, xiaohe lan <zombiexco...@gmail.com>
>> wrote:
>>
>>> Hi All,
>>>
>>> I am new to Hive. Just set up a 5 nodes Hadoop environment and want to
>>> have a try on HiveQL.
>>> Is there any dataset I can download to play HiveQL. The dataset should
>>> have several tables some I can write some complex join. About 100G should
>>> be fine.
>>>
>>> Thanks,
>>> Xiaohe
>>>
>>
>> --
>> Thanks ,
>> Vivek Veeramani
>>
>>
>> cell : +91-9632 975 975
>>        +91-9895 277 101