Re: Hive Generic UDF invoking Hbase

2015-09-29 Thread Moore, Douglas
I'm guessing you might be using Tez now where you were using MR before. You can tell Hive to run in MapReduce mode by setting the Hive execution engine from within the Hive script. See this page for details: https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties To answer…
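The switch described above can be sketched in a Hive script; a minimal sketch, assuming the standard `hive.execution.engine` property (available in Hive 0.13+):

```sql
-- Force this script to run on MapReduce instead of Tez.
SET hive.execution.engine=mr;

-- ... run the queries that behaved differently under Tez ...

-- Optionally switch back to Tez for the rest of the session.
SET hive.execution.engine=tez;
```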

Re: UDF Configure method not getting called

2015-08-28 Thread Moore, Douglas
Writing side files from a MapReduce job was more common a while ago. There are severe disadvantages to doing so, and resulting complexities. One complexity is failure handling and retry; another is speculative execution running multiple attempts over the same split. You say you want to look a…
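Because speculative execution can run multiple attempts over the same split, jobs that write side files typically disable it first. A hedged sketch, assuming the Hadoop 2.x property names:

```sql
-- Disable speculative execution so only one attempt runs per split
-- (otherwise duplicate attempts may each write the same side file).
SET mapreduce.map.speculative=false;
SET mapreduce.reduce.speculative=false;
```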

Re: Unable to move files on Hive/Hdfs

2015-05-04 Thread Moore, Douglas
…is issue, I am looking for a resolution. On Tue, May 5, 2015 at 4:42 AM, Moore, Douglas <douglas.mo...@thinkbiganalytics.com> wrote: Yep, permission problem. Weird though, it seems to be moving a file within the same dir. Thanks for the update! - Douglas From: amit kumar <ak3…

Re: Unable to move files on Hive/Hdfs

2015-05-04 Thread Moore, Douglas
…Conf.checkAclsConfigFlag(NNConf.java:85) at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAclStatus(FSNamesystem.java:8553) After rolling those same changes out, the problem resolved itself. On Tue, May 5, 2015 at 4:28 AM, Moore, Douglas <douglas.mo...@thinkbiganalytics.com…

Re: Unable to move files on Hive/Hdfs

2015-05-04 Thread Moore, Douglas
Hi Amit, We've seen the same error on MoveTask with the Hive 0.14 / HDP 2.2 release. There are lots of possible causes, though. Can you provide more details about the stack trace and version so we can compare? For our problem we've seen some relief with SET hive.metastore.client.socket.timeout=60s…
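The relief mentioned above can be applied per session; a minimal sketch, assuming Hive 0.14's support for time-unit suffixes on this property:

```sql
-- Raise the metastore client socket timeout before running the
-- statement that fails in MoveTask (default was much lower).
SET hive.metastore.client.socket.timeout=60s;
```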

Re: Hive and Impala

2015-04-27 Thread Moore, Douglas
Hive is great for the massive transformations needed in ETL-type processing and full-data-set analytics. Impala is better suited for fast analytical queries returning a tiny subset of the original data set. Both are improving in terms of concurrency and latency; however, they have a long way to go to…

Re: Hive documentation update for isNull, isNotNull etc.

2015-04-18 Thread Moore, Douglas
…unction. I think it would be clearer to just write the query as 'column IS NULL'; that would be a more portable query. On Sat, Apr 18, 2015 at 6:26 PM, Moore, Douglas <douglas.mo...@thinkbiganalytics.com> wrote: Dmitry & Lefty, the Hive docs updated <https://cwiki.a…

Re: Hive documentation update for isNull, isNotNull etc.

2015-04-18 Thread Moore, Douglas
…wrote: I also recently realized that NVL function is available, but not documented :( Dmitry Tolpeko -- PL/HQL - Procedural SQL-on-Hadoop - www.plhql.org On Sat, Apr 18, 2015 at 12:22 AM, Moore, Douglas <douglas.mo...@thinkbiganalytics.com> wrot…

Hive documentation update for isNull, isNotNull etc.

2015-04-17 Thread Moore, Douglas
I'm having major trouble finding documentation on the Hive functions isNull and isNotNull. At first I assumed the functions just weren't available; now I believe they are simply not documented. I believe that the LanguageManual+UDF#LanguageManualUDF-Built-inFunctions…
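For illustration, the sparsely documented function form and the portable predicate form should behave the same; a sketch using a hypothetical table `t` with nullable column `c`:

```sql
-- Hive built-in function form (the one that is hard to find docs for):
SELECT * FROM t WHERE isnull(c);

-- Standard SQL predicate form, portable across engines:
SELECT * FROM t WHERE c IS NULL;

-- NVL(c, fallback) returns fallback when c is NULL:
SELECT nvl(c, 'missing') FROM t;
```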

Re: Over-logging by ORC packages

2015-04-06 Thread Moore, Douglas
…row? Thanks, Owen On Mon, Apr 6, 2015 at 9:15 AM, Moore, Douglas <douglas.mo...@thinkbiganalytics.com> wrote: On a cluster recently upgraded to Hive 0.14 (HDP 2.2) we found that gigabytes and millions more INFO-level hive.log entries from ORC packages were being logged. I feel…

Over-logging by ORC packages

2015-04-06 Thread Moore, Douglas
On a cluster recently upgraded to Hive 0.14 (HDP 2.2), we found gigabytes (millions of entries) more INFO-level hive.log output from the ORC packages being logged. I feel these log entries should be at the DEBUG level. Is there an existing bug in Hive or ORC? Here is one example: 2015-04-06 15:12:4…

Re: view over partitioned table

2015-03-16 Thread Moore, Douglas
Mich, What version of Hive are you running? Have you seen this? https://cwiki.apache.org/confluence/display/Hive/PartitionedViews - Douglas From: Mich Talebzadeh <m...@peridale.co.uk> Reply-To: <user@hive.apache.org> Date: Sun, 15 Mar 2015 19:01:57 + To: <user@hive.apache.…

Re: Hive Insert overwrite creating a single file with large block size

2015-01-09 Thread Moore, Douglas
There's nothing intrinsically wrong with a large output file that's in a splittable format such as Avro. Are your downstream queries too slow? Are you using some kind of compression? Within an Avro file there are blocks of Avro objects. Each block can be compressed. Splits can occur only on a bl…
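The block-level compression described above can be enabled from the Hive session; a hedged sketch, assuming an Avro-backed destination table (the table names here are placeholders):

```sql
-- Write Snappy-compressed Avro output; each Avro block is compressed
-- independently, so the resulting files remain splittable.
SET hive.exec.compress.output=true;
SET avro.output.codec=snappy;

INSERT OVERWRITE TABLE avro_out  -- hypothetical Avro-backed table
SELECT * FROM src;               -- hypothetical source table
```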

Re: custom binary format

2014-12-12 Thread Moore, Douglas
You want to look into ADD JAR and CREATE FUNCTION (for UDFs) and STORED AS 'full.class.name' for the SerDe. For tutorials, search for "adding custom serde"; I found one from Cloudera: http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/ Depending on your numbers (rows / file, byt…
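Putting those pieces together might look like the following; a sketch only, with a hypothetical jar path and class names, using the ROW FORMAT SERDE / STORED AS INPUTFORMAT syntax from the Hive DDL manual:

```sql
-- Make the custom classes visible to the session (path is hypothetical).
ADD JAR /tmp/my-custom-format.jar;

-- Register a UDF backed by a class in that jar (class name is hypothetical).
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.MyUDF';

-- Bind a table to a custom SerDe and input format (names are hypothetical).
CREATE TABLE custom_bin (line STRING)
ROW FORMAT SERDE 'com.example.MyBinarySerDe'
STORED AS INPUTFORMAT 'com.example.MyBinaryInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```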

Re: Question

2014-12-05 Thread Moore, Douglas
We use Hive to manage hundreds of millions of machine log data files. These files are semi-structured: semi-structured in that we don't care about the full structure of the file up front, nor do they have a format that's easy to understand. Even for data with less structure (e.g. medical notes) there is a…

Re: what is the bench mark using SSD for HDFS over HDD

2014-12-03 Thread Moore, Douglas
Run I/O-intensive tests, such as TestDFSIO and TeraSort. From: Amit Behera <amit.bd...@gmail.com> Reply-To: <user@hive.apache.org> Date: Wed, 3 Dec 2014 00:09:39 +0530 To: <user@hive.apache.org> Subject: what is the bench mark using SSD for HDFS over HDD Hi User, I want to kn…

Re: HIVE::START WITH and CONNECT BY implementation in Hive

2014-10-20 Thread Moore, Douglas
Look up "transitive closure". It's a fast technique to analyze hierarchical data without proprietary SQL extensions. - Douglas On Oct 20, 2014, at 3:12 AM, yogesh dhari <yogeshh...@gmail.com> wrote: Hello All, How can we achieve START WITH .. CONNECT BY? The clause can be used to select data that…
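The transitive-closure idea can be sketched without CONNECT BY by self-joining an edge table once per level of depth; a minimal sketch, assuming a hypothetical `edges(child, parent)` table:

```sql
-- One self-join reaches grandparents; each additional join reaches
-- one level deeper. Full closure repeats (typically from a driver
-- script) until no new child/ancestor pairs appear.
SELECT e1.child, e2.parent AS grandparent
FROM edges e1
JOIN edges e2 ON e1.parent = e2.child;
```

For bounded hierarchies (e.g., a fixed number of organizational levels), a fixed chain of joins like this often replaces the proprietary START WITH .. CONNECT BY syntax entirely.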