RE: Hive Join returns incorrect results on bigint=string

2014-10-07 Thread java8964
Based on this wiki page: https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-TypeSystem The string will be implicitly converted to double, as "Double" is the only common ancestor of bigint and string. So the result is unpredictable once you are comparing doubles. Yong Date:
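
A minimal sketch of the usual workaround, assuming hypothetical tables t1 (id BIGINT) and t2 (id STRING): cast one side explicitly so the join key is compared on a single exact type instead of going through the implicit double conversion.

  -- Hedged example; table and column names are made up for illustration.
  -- Casting avoids the bigint/string -> double coercion described above.
  SELECT t1.id, t2.val
  FROM t1
  JOIN t2
    ON CAST(t1.id AS STRING) = t2.id;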

How to put a third party jar first in the classpath of a Hive UDF

2014-10-02 Thread java8964
Hi, currently our production is using Hive 0.9.0. There is already a complex Hive query running on Hadoop daily to generate millions of output records. What I want to do is transfer this result to Cassandra. I tried to do it in a UDF, as then I can send the data at the reducer level, to maximize the t

RE: does the HBase-Hive integration support using HBase index (primary key or secondary index) in the JOIN implementation?

2014-07-24 Thread java8964
I don't think the HBase-Hive integration is that smart, i.e. able to utilize the indexes existing in HBase. But I think it depends on the version you are using. From my experience, there is a lot of room for improvement in the HBase-Hive integration, especially "push down" logic into the HBase engi

RE: python UDF and Avro tables

2014-07-24 Thread java8964
Are you trying to read the Avro file directly in your UDF? If so, that is not the correct way to do it in a UDF. Hive supports Avro files natively. I don't know your UDF requirement, but here is what I would normally do: create the table in Hive using AvroContainerInputFormat: create external tabl
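
A rough sketch of the kind of DDL being described; the table name, location, and schema URL are placeholders, and the SerDe/input format classes are the stock Hive Avro ones rather than whatever the thread's cluster actually shipped.

  -- Hedged example; adjust names, LOCATION and schema URL to your setup.
  CREATE EXTERNAL TABLE events_avro
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED AS INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  LOCATION '/data/events'
  TBLPROPERTIES ('avro.schema.url' = 'hdfs:///schemas/events.avsc');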

RE: Efficient Equality Joins of Large Tables

2014-06-10 Thread java8964
I agree that the original request is not very clear. From my understanding, the reference_id is unique in both the Ad load and Ad click tables, but both tables could contain a huge amount of data. (But in theory, the click table should be much smaller than the load table, right? But let's just

RE: Vectorization with UDFs returns incorrect results

2014-05-30 Thread java8964
(18.778 seconds) On Fri, May 30, 2014 at 10:52 AM, java8964 wrote: When you turn "vectorized" on, does the following query consistently return 1 in the output? select ten_thousand() from testTabOrc Yong Date: Fri, 30 May 2014 08:24:43 -0400 Subject: Vectorizat

RE: Hive Avro union data access

2014-05-30 Thread java8964
Your "alias_host" column is an array, from your Avro specification, right? If so, just use [] to access the specified element in the array select alias_host[0] from array_tests where aliat_host[0] like '%test%' If you want to query all the elements in the array, google "explode lateral view" of hi

RE: Vectorization with UDFs returns incorrect results

2014-05-30 Thread java8964
When you turn "vectorized" on, does the following query consistently return 1 in the output? select ten_thousand() from testTabOrc Yong Date: Fri, 30 May 2014 08:24:43 -0400 Subject: Vectorization with UDFs returns incorrect results From: bbowman...@gmail.com To: user@hive.apache.org Hive 0.

RE: LEFT SEMI JOIN

2014-05-16 Thread java8964
From the Hive manual, there is only "left semi join", no "semi join", nor "inner semi join". From the database world, it is just the traditional name for this kind of join: "LEFT semi join", a reminder to the reader that the result set comes from the LEFT table ONLY. Yong > From: lukas.e..
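
A small sketch of the semantics, with hypothetical orders and customers tables: LEFT SEMI JOIN behaves like an IN/EXISTS filter, and only columns from the left table may appear in the select list.

  -- Hedged example; table and column names are made up.
  SELECT o.order_id, o.amount
  FROM orders o
  LEFT SEMI JOIN customers c
    ON (o.customer_id = c.customer_id);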

RE: get_json_object for nested field returning a String instead of an Array

2014-04-07 Thread java8964
Hi, Narayanan: The current problem is that for a generic solution, there is no way for us to know that an element in the JSON is an array. Keep in mind that any element of the JSON could be any valid structure: an array, another structure, a map, etc. You know your data, so you can sa
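
A hedged illustration of the behavior being discussed: get_json_object returns the matched fragment as a string even when the underlying JSON element is an array; the column and path below are placeholders.

  -- json_col is a hypothetical STRING column holding JSON documents.
  -- The result is a string such as '["a","b"]', not a Hive array type.
  SELECT get_json_object(json_col, '$.items') AS items_as_string
  FROM json_table;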

RE: Does hive instantiate new udf object for each record

2014-03-25 Thread java8964
The reason you saw that is that when you provided the evaluate() method, you didn't specify the type of column it can be used with. So Hive will just create the test instance again and again for every new row, as it doesn't know how or to which column to apply your UDF. I changed your code as below: public

RE: Joins Failing

2014-03-24 Thread java8964
It looks like his job failed with OOM in the mapper tasks: Job failed as tasks failed. failedMaps:1 failedReduces:0 So what he needs is to increase the mapper heap size request. Yong Date: Mon, 24 Mar 2014 16:16:50 -0400 Subject: Re: Joins Failing From: divakarredd...@gmail.com To: user@hive.apache.org C

RE: Does hive instantiate new udf object for each record

2014-03-24 Thread java8964
Your UDF object will only be initialized once per mapper or reducer. When you say your UDF object is being initialized for each row, why do you think so? Do you have a log that makes you think that way? If so, please provide more information so we can help you, like your example code, logs, etc. Yong Date

RE: Issue with Querying External Hive Table created on hbase

2014-03-21 Thread java8964
I am not sure about your question. Do you mean the query runs very fast if you run something like 'select * from hbase_table', but very slow for 'select * from hbase where row_key = ?' I think it should be the other way around, right? Yong Date: Wed, 19 Mar 2014 11:42:39 -0700 From: sunil_ra...@yahoo.com S

RE: Using an UDF in the WHERE (IN) clause

2014-03-11 Thread java8964
but a static value. Thanks, Petter 2014-03-11 0:16 GMT+01:00 java8964 : I don't know from syntax point of view, if Hive will allow to do "columnA IN UDF(columnB)". What I do know that even let's say above work, it won't do the partition pruning. The partition

RE: Using an UDF in the WHERE (IN) clause

2014-03-10 Thread java8964
I don't know, from a syntax point of view, if Hive will allow "columnA IN UDF(columnB)". What I do know is that even if the above worked, it won't do partition pruning. Partition pruning in Hive is strictly static; any dynamic value provided for the partition column won't enable partition pru

RE: Setting | Verifying | Hive Query Parameters from Java

2014-03-06 Thread java8964
If you want to set some Hive properties, just run them as-is over your JDBC connection. Any command sent through Hive JDBC goes to the server the same as if you ran "set hive.server2.async.exec.threads=50;" in a Hive session. Run the command "set hive.server2.async.exec.threads=50;" as a SQL

RE: Best way to avoid cross join

2014-03-05 Thread java8964
t_keywords) prep_kw; ... Hadoop job information for Stage-1: number of mappers: 3; number of reducers: 1 What could be set up wrong here? Or can this ugly cross join be avoided altogether? I mean my original problem is actually something else ;-) Cheers Wolli 2014-03-05 15:07 GMT+01:0

RE: Best way to avoid cross join

2014-03-05 Thread java8964
Hi, Wolli: A cross join doesn't mean Hive has to use one reducer. From the query point of view, the following cases will use one reducer: 1) Order by in your query (instead of using sort by) 2) Only one reducer group, which means all the data has to be sent to one reducer, as there is only one reducer gro
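
A small sketch of case 1, assuming a hypothetical table t: ORDER BY forces a single reducer to produce a total ordering, while SORT BY only orders within each reducer and keeps the reduce phase parallel.

  -- Global ordering: all rows funnel through one reducer.
  SELECT k, v FROM t ORDER BY k;

  -- Per-reducer ordering: parallel reducers, output sorted only within each reducer.
  SELECT k, v FROM t SORT BY k;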

RE: Hive query parser bug resulting in "FAILED: NullPointerException null"

2014-02-27 Thread java8964
Can you reproduce with an empty table? I can't reproduce it. Also, can you paste the stack trace? Yong From: krishnanj...@gmail.com Date: Thu, 27 Feb 2014 12:44:28 + Subject: Hive query parser bug resulting in "FAILED: NullPointerException null" To: user@hive.apache.org Hi all, we've experien

RE: Metastore performance on HDFS-backed table with 15000+ partitions

2014-02-27 Thread java8964
That is good to know. We are using Hive 0.9. Right now the biggest table contains 2 years of data, and we partitioned it by hour, as the data volume is big. So right now it has 2*365*24, around 17000+ partitions. So far we haven't seen too many problems yet, but I do have some concerns about it. We are us

RE: Hive trunk unit test failed

2014-02-26 Thread java8964
OK. Now I understand that this error is due to the missing Hadoop native library. If I manually add "libhadoop.so" to java.library.path for this unit test, it passes. So the hadoop 2.2.0 coming from the Maven repository either includes a 32-bit Hadoop native library, or is missing it entirely. Now the

Hive trunk unit test failed

2014-02-26 Thread java8964
Hi, I tried to run all the tests on my local Linux x64 box against the current Hive trunk code. My "mvn clean package -DskipTests -Phadoop-2 -Pdist" works fine if I skip tests. The following unit test failed, and then it stopped. I traced the code down to a native method invoked at "org.apache.hadoop.sec

RE: part-m-00000 files and their size - Hive table

2014-02-25 Thread java8964
Yes, it is good that the file sizes are close to even, but it is not very important, unless some files are very small (compared to the block size). The reasons are: Your files should be splittable to be used in Hadoop (or in Hive, it is the same thing). If they are splittable, then a 1G file will use 10 bl

RE: hive query to calculate percentage

2014-02-25 Thread java8964
One query won't work, as totalcount is not in the "group by". You have 2 options: 1) use a sub query: select a.timestamp_dt, a.totalcount/b.total_sum from daily_count_per_kg_domain a join (select timestamp_dt, sum(totalcount) as total_sum from daily_count_per_kg_domain group by timestamp_dt) b on (a.tim
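
A hedged, fuller version of option 1, keeping the thread's table and column names; the outer query divides each day's count by the per-day total computed in the subquery (the preview above is cut off before the join condition).

  -- Hedged reconstruction for illustration only.
  SELECT a.timestamp_dt,
         a.totalcount / b.total_sum AS pct
  FROM daily_count_per_kg_domain a
  JOIN (SELECT timestamp_dt, SUM(totalcount) AS total_sum
        FROM daily_count_per_kg_domain
        GROUP BY timestamp_dt) b
    ON (a.timestamp_dt = b.timestamp_dt);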

RE: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col2, 2:_col3]

2014-02-25 Thread java8964
it works there. If you get a chance to reproduce this problem on Hive 0.10, please let me know. Thanks. On Monday, February 24, 2014 10:59 PM, java8964 wrote: My guess is that your UDTF will return an array of struct. I don't have Hive 0.10 handy right now, but I write a s

RE: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col2, 2:_col3]

2014-02-24 Thread java8964
ss(MapOperator.java:658) ... 9 more On Friday, February 21, 2014 11:18 AM, java8964 wrote: What is your stack trace? Can you paste it here? It may be a different bug. What if you put e.f3 <> null in an outer query? Does that work? Or maybe you have to enhance your UDTF to push tha

RE: java.lang.RuntimeException: cannot find field key from [0:_col0, 1:_col2, 2:_col3]

2014-02-21 Thread java8964
What is your stack trace? Can you paste it here? It may be a different bug. What if you put e.f3 <> null in an outer query? Does that work? Or maybe you have to enhance your UDTF to push that filter into the UDTF itself. It is not perfect, but maybe a solution for you for now. You can create a new Jira if i

Hbase + Hive scan performance

2014-02-10 Thread java8964
Hi, I know this has been asked before. I did google around this topic and tried to understand as much as possible, but I got different answers from different places. So I would like to describe what I have faced and see if someone can help me again on this topic. I created one table with one colu

RE: HiveMetaStoreClient only sees one of my DBs ?

2013-12-30 Thread java8964
The best mailing list for this question is the Hive one, but I will try to give my guess here anyway. If you only see the 'default' database, most likely you are using the Hive 'LocalMetaStore'. To help yourself find the problem, try to find out the following information: 1) What kind of Hive metastore you a

Why from_utc_timestamp works for some bigint, but not others

2013-12-06 Thread java8964
Hi, I am using Hive 0.9.0, and I am not sure why from_utc_timestamp gives me an error for the following value, but works for others. The following example shows 2 bigints as 2 epoch values at millisecond precision. They are only 11 seconds apart. One works fine in Hive 0.9.0 with the from_utc_timestamp UD
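
For context, a hedged sketch of the kind of call being discussed; the epoch value below is a placeholder, not one of the two values from the thread, and the second form simply sidesteps the bigint path by converting through seconds first.

  -- Hedged example; 1386300000000 is an arbitrary epoch value in milliseconds.
  SELECT from_utc_timestamp(1386300000000, 'America/New_York');

  -- Alternative that avoids passing a raw bigint: reduce to seconds, then format.
  SELECT from_utc_timestamp(from_unixtime(CAST(1386300000000 / 1000 AS BIGINT)), 'America/New_York');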

hive partition pruning on joining on partition column

2013-10-11 Thread java8964 java8964
I have a requirement I am trying to support in Hive, and I am not sure if it is doable. I have Hadoop 1.1.1 with Hive 0.9.0 (using Derby as the metastore). I partition my data by a dt column, so my table 'foo' has partitions like 'dt=2013-07-01' to 'dt=2013-07-30'. Now the user wants to query al
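
A hedged sketch of the contrast in question, using the thread's 'foo' table partitioned by dt and a hypothetical dates_to_query table: a static literal predicate prunes partitions, while deriving the dt values through a join does not on this Hive version.

  -- Prunes: the partition predicate is a static literal list.
  SELECT * FROM foo WHERE dt IN ('2013-07-01', '2013-07-02');

  -- Does not prune on Hive 0.9: the dt values only become known at run time.
  SELECT f.*
  FROM foo f
  JOIN dates_to_query d
    ON (f.dt = d.dt);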

RE: Question about how to add the debug info into the hive core jar

2013-03-20 Thread java8964 java8964
conf while the hive client is running. #hive -hiveconf hive.root.logger=ALL,console -e " DDL statement ;" #hive -hiveconf hive.root.logger=ALL,console -f ddl.sql ; Hope this helps Thanks On Mar 20, 2013, at 1:45 PM, java8964 java8964 wrote: Hi, I have Hadoop running in pseudo-distributed mode on

A bug belongs to Hive or Elephant-bird

2013-03-08 Thread java8964 java8964
Hi, Hive 0.9.0 + Elephant-Bird 3.0.7. I faced a problem using elephant-bird with Hive. I know what may cause this problem, but I don't know which side this bug belongs to. Let me explain what the problem is. If we define a Google protobuf file with a field name like 'dateString' (the

RE: difference between add jar in hive session and hive --auxpath

2013-03-08 Thread java8964 java8964
This is in HIVE-0.9.0 hive> list jars; /nfs_home/common/userlibs/google-collections-1.0.jar /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar /nfs_home/common/userlibs/protobuf-java-2.3.0.jar /nfs_home/common/userlibs/elephant-bird-core-3.0.7.jar file:/usr/lib/hive/lib/hive-builtins-0.9.0-cdh4.1.

difference between add jar in hive session and hive --auxpath

2013-03-07 Thread java8964 java8964
Hi, I have a Hive table which uses the jar files provided by elephant-bird, which is a framework integrating lzo and Google protobuf data with Hadoop/Hive. If I use the hive command like this: hive --auxpath path_to_jars, querying my table works fine, but if I use add jar aft
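
For reference, a minimal sketch of the two ways of supplying the jars that the thread compares; the paths echo the list jars output in the reply above and are otherwise placeholders.

  -- Option 1: pass the jars when starting the CLI (shell command, shown as a comment).
  -- hive --auxpath /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar,/nfs_home/common/userlibs/protobuf-java-2.3.0.jar

  -- Option 2: register the jars inside an already running session.
  ADD JAR /nfs_home/common/userlibs/elephant-bird-hive-3.0.7.jar;
  ADD JAR /nfs_home/common/userlibs/protobuf-java-2.3.0.jar;
  LIST JARS;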

RE: hive add jar question

2012-12-21 Thread java8964 java8964
can access them just by their name in your code. > > About #2, doesn't sound normal to me. Did you figure that out or still > running into it? > > Mark > > On Thu, Dec 20, 2012 at 5:01 PM, java8964 java8964 > wrote: > > Hi, I have 2 questions related to the h

RE: reg : getting table values in inputFormat in serde

2012-12-21 Thread java8964 java8964
Actually I am backing this question. In addition to that, I wonder if it is possible to access the table properties from the UDF too. I also have XML data, but with namespaces in it. The XPATH UDF that comes with Hive doesn't support namespaces. Supporting namespaces in XML is simple, j

RE: xpath UDF in hive support namespace?

2012-12-19 Thread java8964 java8964
ou are > welcome to do so by creating a JIRA and posting a patch. UDFs are an > easy and excellent way to contribute back to the Hive community. > > Thanks! > > Mark > > On Wed, Dec 19, 2012 at 8:52 AM, java8964 java8964 > wrote: > > Hi, I have a question related to

xpath UDF in hive support namespace?

2012-12-19 Thread java8964 java8964
Hi, I have a question related to the XPATH UDF currently in Hive. From the original Jira story about this UDF, https://issues.apache.org/jira/browse/HIVE-1027, it looks like the UDF won't support namespaces in the XML; is that true? Does any later Hive version support namespaces, and if so, what i
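
For context, a hedged example of the built-in xpath family being discussed; plain element paths work, while namespace-qualified elements are the case the stock UDFs do not resolve.

  -- Works: no namespaces involved; returns 'bb'.
  SELECT xpath_string('<a><b>bb</b></a>', '/a/b');

  -- The problem case: elements qualified like <ns:a xmlns:ns="..."> are not
  -- matched by the stock xpath UDFs, which is what the question is about.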

RE: Array index support non-constant expresssion

2012-12-12 Thread java8964 java8964
optimize.cp=false; > set hive.optimize.ppd=false; > > 2012/12/13 java8964 java8964 : > > Hi, > > > > I played my query further, and found out it is very puzzle to explain the > > following behaviors: > > > > 1) The following query works: > > > > select

RE: Array index support non-constant expresssion

2012-12-12 Thread java8964 java8964
Hi, I played with my query further, and found it very puzzling to explain the following behaviors: 1) The following query works: select c_poi.provider_str, c_poi.name from (select darray(search_results, c.rank) as c_poi from nulf_search lateral view explode(search_clicks) clickTable as c) a I g

RE: Array index support non-constant expresssion

2012-12-12 Thread java8964 java8964
OK. I followed the Hive source code of org.apache.hadoop.hive.ql.udf.generic.GenericUDFArrayContains and wrote the UDF. It is quite simple. It works fine as I expected for the simple case, but when I try to run it in some complex queries, the Hive MR jobs fail with some strange errors. What I

Array index support non-constant expresssion

2012-12-11 Thread java8964 java8964
Hi, in our project using Hive on the CDH3U4 release (Hive 0.7.1), I have a Hive table like the following: Table foo ( search_results array> search_clicks array>) As you can see, the 2nd column, which represents the list of search results clicked, contains the index location of which result

Hive 0.7 use the old mapred API

2012-12-03 Thread java8964 java8964
Hi, our company currently uses the CDH3 release, which comes with Hive 0.7.1. Right now, I have data coming from another team, which also provides a custom InputFormat and RecordReader, but using the new mapreduce API. I am trying to build a Hive table on this data, and hope I can reuse t

RE: need help on writing hive query

2012-11-03 Thread java8964 java8964
This is not a Hive question but a SQL one. You need to be clearer about your data, and try to think of a way to solve your problem. Without the details about your data, there is no easy way to answer your question. For example, just based on the example data you provide, do 'abc' and 'cde' only happen

RE: need help on writing hive query

2012-10-31 Thread java8964 java8964
If you don't need to join current_web_page and previous_web_page, assuming you can just trust the time stamp, as Phil points out, a custom UDF of collect_list() is the way to go. You need to implement the collect_list() UDF yourself; Hive doesn't have one by default. But it should be straight fo
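
As a side note, collect_list() was later added as a built-in aggregate in newer Hive releases (around 0.13), so on a current version no custom UDF is needed; a hedged sketch with hypothetical table and column names:

  -- Hypothetical clickstream table: one row per page view.
  SELECT user_id,
         collect_list(web_page) AS visited_pages   -- element order is not guaranteed
  FROM page_views
  GROUP BY user_id;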

is it possible to disable running terminatePartial and merge() methods in UDAF

2012-10-01 Thread java8964 java8964
Hi, I am trying to implement a UDAF for Kurtosis (http://en.wikipedia.org/wiki/Kurtosis) in Hive. I already found a library to do it, from Apache Commons Math (http://commons.apache.org/math/apidocs/org/apache/commons/math/stat

RE: How can I get the constant value from the ObjectInspector in the UDF

2012-09-26 Thread java8964 java8964
2012 at 4:17 AM, java8964 java8964 wrote: Hi, I am using the Cloudera release cdh3u3, which has Hive version 0.7.1. I am trying to write a Hive UDF to calculate the moving sum. Right now, I am having trouble getting the constant value passed in at the initialization stage. For exampl

How can I get the constant value from the ObjectInspector in the UDF

2012-09-25 Thread java8964 java8964
Hi, I am using the Cloudera release cdh3u3, which has Hive version 0.7.1. I am trying to write a Hive UDF to calculate the moving sum. Right now, I am having trouble getting the constant value passed in at the initialization stage. For example, let's assume the function is like the fo

Question about org.apache.hadoop.hive.contrib.serde2.RegexSerDe

2012-04-03 Thread java8964 java8964
Hi, I have a question about the behavior of the class org.apache.hadoop.hive.contrib.serde2.RegexSerDe. Here is the example I tested using the Cloudera hive-0.7.1-cdh3u3 release. The above class did NOT do what I expected; does anyone know the reason? user:~/tmp> more Test.java import java.io.*; impor
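
For reference, a hedged sketch of how this contrib RegexSerDe is typically wired into a table; the table name, columns, and regex below are illustrative only and are not the thread's actual test case.

  -- Hedged example; each capturing group in input.regex maps to one column in order.
  CREATE TABLE apache_log (
    host    STRING,
    ts      STRING,
    request STRING
  )
  ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
  WITH SERDEPROPERTIES (
    "input.regex" = "([^ ]*) \\[([^\\]]*)\\] \"([^\"]*)\"",
    "output.format.string" = "%1$s %2$s %3$s"
  )
  STORED AS TEXTFILE;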