How does hive decide to launch how many map tasks?

2012-11-15 Thread Cheng Su
Hi all, How does Hive decide how many map tasks to launch? I know there are some configs that help Hive decide how many reduce tasks to launch. But what about map tasks? I thought the number of map tasks equals the number of store files. I have a table now with 2 partitions, and one has 4

RE: Re:

2012-11-15 Thread Kanna Karanam
Hi Dean, HDInsight enables developers to deploy and run Hadoop on Windows-based personal computers. You can download and install it from http://www.microsoft.com/en-us/download/details.aspx?id=35397 For more info, please check this YouTube video

Re:

2012-11-15 Thread Carl Steinbach
Hi Dean, I think using a Linux VM is the current path of least resistance for Windows users who want to experiment with Hive and Hadoop. Cloudera has a CentOS VM with CDH4.1 pre-installed and configured that can be downloaded for free from here: https://ccp.cloudera.com/display/SUPPORT/Cloudera%2

Re: hive 0.7.1 Error: Non-Partition column appears in the partition specification

2012-11-15 Thread Nitin Pawar
You are complicating your query a little with AS and tmp tables. You can just write a simple query for the same: INSERT OVERWRITE TABLE table2 PARTITION (author) SELECT text, author FROM table1 Tolstoy is not a column in table1, so even that fails at query parsing. If you want all the data where autho

Re: Jaspersoft reports over Hive

2012-11-15 Thread Manish Malhotra
Hi, As per my understanding, the JDBC driver for Hive is not scalable; it's a single-threaded model. Even if you get a handle on the Data Access API, the latency to generate a report would be high. If you are OK with that, then please check out the Thrift API and CLI Driver class code: http://hi

Re: hive 0.7.1 Error: Non-Partition column appears in the partition specification

2012-11-15 Thread Павел Мезенцев
Thank you for the right idea. It is very strange, but the query that executes normally looks like: INSERT OVERWRITE TABLE table2 PARTITION (author) SELECT text, author FROM (SELECT text, 'Tolstoy' AS author FROM table1) tmp; Best regards Mezentsev Pavel 2012/11/15 Nitin Pawar > when you add data to a par
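
For reference, the pattern discussed in this thread can be sketched as follows. The table definitions and the 'Tolstoy' literal come from the messages in the thread. The nonstrict mode setting is an assumption added here, since Hive's default strict mode requires at least one static partition column. The last statement is an alternative that uses a static partition value; it was not proposed in the thread.

```sql
-- Tables as defined in the original question.
CREATE TABLE table1 (text STRING);
CREATE TABLE table2 (text STRING) PARTITIONED BY (author STRING);

-- Enable dynamic partitioning.
SET hive.exec.dynamic.partition = true;
-- Assumption: nonstrict mode, because every partition column below is dynamic.
SET hive.exec.dynamic.partition.mode = nonstrict;

-- Dynamic partition insert: the last SELECT column supplies the partition
-- value, and the PARTITION clause names the table's partition column (author).
INSERT OVERWRITE TABLE table2 PARTITION (author)
SELECT tmp.text, tmp.author
FROM (SELECT text, 'Tolstoy' AS author FROM table1) tmp;

-- Alternative (not from the thread): a static partition value,
-- which needs no dynamic partitioning at all.
INSERT OVERWRITE TABLE table2 PARTITION (author = 'Tolstoy')
SELECT text FROM table1;
```

In both forms the name inside PARTITION (...) is the partition column from the table definition, never a data value used as a column name.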

Re: Can I merge files after I loaded them into hive?

2012-11-15 Thread Cheng Su
Thank you guys. I will try this later. And sorry for the additional questions: if I do this, could the file become too big? Does Hive have a config to control the max file size? Can Hive automatically split files? On Thu, Nov 15, 2012 at 6:20 PM, Роман Павленко wrote: > Example: > insert overwri

Re: Can I merge files after I loaded them into hive?

2012-11-15 Thread Роман Павленко
Example: insert overwrite table my_table PARTITION (year=2012,month=9,day=4) select `data`, `timestamp`, `hour`, `minute`, `second` from my_table WHERE year=2012 AND month=9 AND day=4; 2012/11/15 Bejoy KS > Hi Chen > > You can do it in hive as well. Enable hive merge and Insert OverWrite the

Re: Can I merge files after I loaded them into hive?

2012-11-15 Thread Bejoy KS
Hi Chen You can do it in Hive as well. Enable Hive merge and INSERT OVERWRITE the partition once again with SELECT *. hive.merge.mapfiles=true. Regards Bejoy KS Sent from handheld, please excuse typos. -Original Message- From: "Bejoy KS" Date: Thu, 15 Nov 2012 08:10:12 To: Reply-To:
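
A minimal sketch of the merge approach described above, reusing the table name, columns, and partition values from the example Роман Павленко posted in this thread. hive.merge.mapfiles is the setting named in the message. The other three hive.merge.* settings are related Hive properties added here as assumptions about what would be useful, and the byte values are illustrative rather than recommendations.

```sql
-- Merge small files at the end of map-only jobs (the setting named in the thread).
SET hive.merge.mapfiles = true;
-- Assumption: also merge the output of jobs that have a reduce stage.
SET hive.merge.mapredfiles = true;
-- Assumption: target size, in bytes, of each merged file (illustrative value).
SET hive.merge.size.per.task = 256000000;
-- Assumption: run the merge step when the average output file size
-- is below this many bytes (illustrative value).
SET hive.merge.smallfiles.avgsize = 16000000;

-- Rewrite one partition onto itself so that the merge step runs on its files.
INSERT OVERWRITE TABLE my_table PARTITION (year=2012, month=9, day=4)
SELECT `data`, `timestamp`, `hour`, `minute`, `second`
FROM my_table
WHERE year=2012 AND month=9 AND day=4;
```

Listing the non-partition columns explicitly, as in that example, avoids selecting the partition columns a second time, which SELECT * on a partitioned table would do. As far as the follow-up question about file size goes, hive.merge.size.per.task is best read as a target for merged output rather than a guaranteed cap.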

Re: hive 0.7.1 Error: Non-Partition column appears in the partition specification

2012-11-15 Thread Nitin Pawar
When you add data to a partitioned table, the partition column name in the insert statement should match the table definition. So try changing your insert query to "INSERT OVERWRITE TABLE table2 PARTITION (author)", where author is the column in your table definition. Thanks, Nitin On Thu, Nov 15, 2012

Re: Hive UDAF convert problem

2012-11-15 Thread Cheng Su
Thanks a lot. I will look into it. > I don't have the entire code so it's hard for me to say but does it help if > you change: > private TreeMap sessionMap =... > to be > private Map sessionMap =... > Actually I did change the source like what you suggested to avoid the convert exception. But I

hive 0.7.1 Error: Non-Partition column appears in the partition specification

2012-11-15 Thread Павел Мезенцев
Hello all! I have a problem with dynamic partitions in Hive 0.7.1. For example, I have 2 tables: CREATE TABLE table1 (text STRING); CREATE TABLE table2 (text STRING) PARTITIONED BY (author STRING); and I insert into a dynamic partition from table1 into table2: SET hive.exec.dynamic.partition = tru

Re: Can I merge files after I loaded them into hive?

2012-11-15 Thread Bejoy KS
Hi Chen You can use Flume for ingestion into HDFS. Flume takes care of the file sizes, combines the files and stores them as one large file. This is a better approach. You can have custom MR jobs to merge these files in HDFS as well. Use CombineFileInputFormat and start a map-only job with Identit

Can I merge files after I loaded them into hive?

2012-11-15 Thread Cheng Su
Hi, all. Can I merge files after I loaded them into Hive? This is my situation: There is a log table partitioned by date, which stores the nginx access logs. The raw log files are loaded into Hive every hour. For now, a single log file is small, say 10 MB or even smaller. So there are 24 sm