Re: Hive and engine performance tez vs mr

2015-04-06 Thread max scalf
Try setting the below in Hive and see what happens. By the way, what are your configs in Hive, if any? set mapred.map.tasks = 20; On Thu, Apr 2, 2015 at 11:01 AM, Erwan MAS er...@mas.nom.fr wrote: Hello, I have an issue on Hive with the Tez engine. When I try to execute a query with the Tez engine,

Re: Over-logging by ORC packages

2015-04-06 Thread Owen O'Malley
Sorry for the excessive logging. The pushdown logging should only be at the start. Is there a particular message that was being repeated per row? Thanks, Owen On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas douglas.mo...@thinkbiganalytics.com wrote: On a cluster recently upgraded to

Re: Over-logging by ORC packages

2015-04-06 Thread Moore, Douglas
Owen, we're seeing millions of those log entries. There are three, one for each package listed below in the revised hive-log4j.settings, with one full example provided below. They seem to repeat less often than per row (that would be billions). Perhaps repeats for each and every partition in a table

Re: Hive and engine performance tez vs mr

2015-04-06 Thread Erwan MAS
On Mon, Apr 06, 2015 at 12:15:05PM -0500, max scalf wrote: Try setting the below in Hive and see what happens. By the way, what are your configs in Hive, if any? set mapred.map.tasks = 20; Does not change the behavior :( -- /

RE: Can WebHCat show non-MapReduce jobs?

2015-04-06 Thread Xiaoyong Zhu
Ping... not sure anybody has more ideas here...? Xiaoyong From: Xiaoyong Zhu [mailto:xiaoy...@microsoft.com] Sent: Thursday, March 26, 2015 9:07 AM To: user@hive.apache.org Cc: ekoif...@hortonworks.com Subject: RE: Can WebHCat show non-MapReduce jobs? Thanks for the reply, Eugene. However, when

Re: How efficient is memory allocation in tez.

2015-04-06 Thread Gopal Vijayaraghavan
I have a map join in which the smaller tables together are 200 MB and trying to have one block of main table be processed by one tez task. ... What am I missing and is this even the right way of approaching the problem ? You need to be more specific about the Hive version. Hive-13 needs ~6x

Re: Hive and engine performance tez vs mr

2015-04-06 Thread Carter Shanklin
Erwan, faced with a similar situation last week, I found that decreasing mapred.max.split.size increased my parallelism by 6x. Yes, mapred, even though it was a Tez job. I reduced it to 10 MB from 256 MB, which I believe is the default. The other variables to try are: tez.grouping.min-size (make it
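
As a minimal sketch of the change described above, assuming it is applied per session from the Hive CLI or Beeline before running the query: the property takes a value in bytes, so 10 MB is written out as 10 * 1024 * 1024. Only the setting with a stated value is shown.

```sql
-- Lower the maximum split size from 256 MB to 10 MB for this session.
-- The value is in bytes: 10 * 1024 * 1024 = 10485760.
set mapred.max.split.size=10485760;
```

Smaller splits generally mean more tasks, so the right value depends on the input size and the cluster capacity.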

Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread Eugene Koifman
Can you check that the schema in your metastore db has transaction-related tables? You can find the list of tables in hive-txn-schema-0.14.0.mysql.sql, for example. From: Mich Talebzadeh m...@peridale.co.uk Reply-To: user@hive.apache.org
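
One way to run this check, sketched under the assumptions that the metastore is backed by MySQL and that its database is named `metastore` (a hypothetical name; substitute your own). The table names listed are those expected from the 0.14.0 transaction schema script; treat that script as the authoritative list.

```sql
-- Run against the MySQL server hosting the Hive metastore, not in Hive itself.
-- 'metastore' is an assumed database name.
SELECT table_name
FROM information_schema.tables
WHERE table_schema = 'metastore'
  AND table_name IN (
    'TXNS', 'TXN_COMPONENTS', 'COMPLETED_TXN_COMPONENTS', 'NEXT_TXN_ID',
    'HIVE_LOCKS', 'NEXT_LOCK_ID',
    'COMPACTION_QUEUE', 'NEXT_COMPACTION_QUEUE_ID'
  );
```

If any of these tables are missing from the result, the transaction schema was probably not applied to the metastore database.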

hive 0.14 hive 1.1.0 lost some columinfo

2015-04-06 Thread r7raul1...@163.com
I found a difference in the logs: In hive 0.14 DEBUG lazy.LazySimpleSerDe: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe initialized with: columnNames=[date_id, chanl_id, sessn_id, gu_id, prov_id, city_id, landing_page_type_id, landing_track_time, landing_url, nav_refer_tracker_id,

Re: documentation link wrong?

2015-04-06 Thread Lefty Leverenz
The TaskController class is gone from Hadoop 2.6 (the current stable release, where the link points) as well as 2.5.2, but I found it in 1.2.1: http://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapred/TaskController.html . For now, I can change the WebHCat doc to that link. But a

documentation link wrong?

2015-04-06 Thread Xiaoyong Zhu
It seems that the link (Class TaskController, http://hadoop.apache.org/docs/stable/api/org/apache/hadoop/mapred/TaskController.html) is wrong on this page: https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive It returns a 404 for me. Not sure what the correct link is...

Re: documentation link wrong?

2015-04-06 Thread Lefty Leverenz
I've fixed the link to the TaskController class in four places: - WebHCat Reference -- Hive Job https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Hive#WebHCatReferenceHive-Results - WebHCat Reference -- MapReduce Job

Unable to make Sort Merge Bucket Join work

2015-04-06 Thread Timothy Manuel
Hi, I have two large tables which I need to perform an equijoin on. I have bucketed and sorted the two tables on the join key. I have then made the following specifications when running the join SQL: SET hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat; SET

admin user in hive

2015-04-06 Thread Megha Garg
Hi, I want to enable authentication and authorization on my Hive server, but I want only one admin user who can create other users as admin/public. By default, any user who types 'set role admin' can become admin and grant any user any permission. How can I avoid this behavior? Hive version is

Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread Mich Talebzadeh
Hi, I turned on concurrency for hive for DML with settings in hive-site.xml as follows: hive.support.concurrency=true hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager hive.compactor.initiator.on=true hive.compactor.worker.threads=2 hive.support.concurrency=true

HiveServer2 addressing standby namenode

2015-04-06 Thread Daniel Haviv
Hi, We get a lot of error messages on the standby namenode indicating that Hive is trying to address the standby namenode. As all of our jobs function normally, my guess is that Hive is constantly trying to address both namenodes and only works with the active one. Is this correct? Can this be

Re: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread @Sanjiv Singh
Not sure. It should work. Try adding the configuration below and then check: <property> <name>hive.in.test</name> <value>true</value> </property> Regards Sanjiv Singh Mob : +091 9990-447-339 On Mon, Apr 6, 2015 at 7:21 PM, Mich Talebzadeh m...@peridale.co.uk wrote: Hi, I turned on concurrency

RE: Concurrency issue, Setting hive.txn.manager to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

2015-04-06 Thread Mich Talebzadeh
Thanks Sanjiv. Unfortunately, after resetting hive.txn.manager=org.apache.hadoop.hive.ql.lockmgr.DbTxnManager and adding <property> <name>hive.in.test</name> <value>true</value> </property> I am still getting the same error message: hive> show databases; FAILED: LockException [Error

Over-logging by ORC packages

2015-04-06 Thread Moore, Douglas
On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found that gigabytes and millions more INFO-level hive.log entries from ORC packages were being logged. I feel these log entries should be at the DEBUG level. Is there an existing bug in Hive or ORC? Here is one example: 2015-04-06