RE: Failed to create table in Hive (AlreadyExistsException)

2017-06-23 Thread Markovitz, Dudu
Kaustubh, there is not much to do without you supplying a way to reproduce the issue and/or the relevant logs. Dudu From: Kaustubh Deshpande [mailto:kaustubh.deshpa...@exadatum.com] Sent: Friday, June 23, 2017 10:29 AM To: user@hive.apache.org; dev-subscr...@hive.apache.org Subject: Failed to cr

RE: Unable to use "." in column name

2017-05-06 Thread Markovitz, Dudu
Hi Ben Check http://stackoverflow.com/questions/43808435/cannot-use-a-in-a-hive-table-column-name Dudu From: Ben Johnson [mailto:b...@timber.io] Sent: Friday, May 05, 2017 6:25 PM To: user@hive.apache.org Subject: Unable to use "." in column name Hi, I have a fairly basic question. I'm attemp

RE: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-04 Thread Markovitz, Dudu
it not the case that the LOAD causes the data to get inserted into Hive? Based on that I'd like to understand whether we can get away with using LOAD INPATH instead of INSERT/SELECT FROM. On Apr 4, 2017, at 1:43 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: I just wa

RE: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-04 Thread Markovitz, Dudu
mitry On Tue, Apr 4, 2017 at 12:20 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: Are your files already in Parquet format? From: Dmitry Goldenberg [mailto:dgoldenb...@hexastax.com] Sent: Tuesday, April 04, 2017 7:03 PM To: user@hive.apach

RE: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-04 Thread Markovitz, Dudu
altogether. Am I missing something in grokking this latter part of your response? Thanks, - Dmitry On Tue, Apr 4, 2017 at 11:26 AM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: Since LOAD DATA INPATH only moves files the answer is very simple. If your files are already in a

RE: Is it possible to use LOAD DATA INPATH with a PARTITIONED, STORED AS PARQUET table?

2017-04-04 Thread Markovitz, Dudu
Since LOAD DATA INPATH only moves files the answer is very simple. If your files are already in a format that matches the destination table (storage type, number and types of columns etc.) then – yes and if not, then – no. But – You don’t need to load the files into an intermediary table. You sh
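
A minimal sketch of the point above, assuming the files are already in Parquet and match the table layout; the table name, columns and path are made up for illustration:

-- destination table, partitioned and stored as Parquet
create table sales (id int, amount double)
    partitioned by (dt string)
    stored as parquet;

-- LOAD DATA INPATH only moves the files under the partition directory;
-- nothing is converted or validated, so the files must already match the table definition
load data inpath '/landing/sales/2017-04-04'
    into table sales partition (dt = '2017-04-04');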

How to reset textinputformat.record.delimiter?

2017-03-16 Thread Markovitz, Dudu
Good morning If you know how to reset the textinputformat.record.delimiter within hive cli / beeline I'd appreciate it if you could share it. The full question is posted on Stack Overflow and it has an open bounty worth +50 reputation ending tomorrow. How to reset textinputfor

RE: Hive table for a single file: CREATE/ALTER TABLE differences

2017-01-25 Thread Markovitz, Dudu
Wow. This is gold. Dudu From: Dmitry Tolpeko [mailto:dmtolp...@gmail.com] Sent: Wednesday, January 25, 2017 6:47 PM To: user@hive.apache.org Subject: Hive table for a single file: CREATE/ALTER TABLE differences I accidentally noticed one feature: (it is well known that in CREATE TABLE yo

RE: import sql file

2016-11-23 Thread Markovitz, Dudu
Hi Patcharee The question is not clear. Dudu -Original Message- From: patcharee [mailto:patcharee.thong...@uni.no] Sent: Wednesday, November 23, 2016 11:37 AM To: user@hive.apache.org Subject: import sql file Hi, How can I import .sql file into hive? Best, Patcharee

RE: Nested JSON Parsing

2016-11-12 Thread Markovitz, Dudu
And your issue/question is? From: Ajay Tirpude [mailto:tirpudeaj...@gmail.com] Sent: Sunday, November 13, 2016 4:46 AM To: user@hive.apache.org Subject: Nested JSON Parsing Dear All, I am trying to parse this json file given below and my intention is to convert this json file into a csv. { "

RE: Hive Left Join inequality condition

2016-11-11 Thread Markovitz, Dudu
My pleasure ☺ Dudu From: Goden Yao [mailto:goden@gmail.com] Sent: Friday, November 11, 2016 1:26 AM To: user@hive.apache.org Subject: Re: Hive Left Join inequality condition This worked!! Thanks so much Dudu!! On Sat, Nov 5, 2016 at 1:03 PM Markovitz, Dudu mailto:dmarkov...@paypal.com

RE: Hive Left Join inequality condition

2016-11-05 Thread Markovitz, Dudu
Ugly as hell, but should work. Dudu SELECT r_id, CASE WHEN table1.property_value = 'False' THEN FALSE WHEN table1.property_value = 'True' THEN TRUE WHEN r.rea < rg.laa THEN FALSE WHEN r.rea >= rg.laa THEN TRUE ELSE FALSE END AS flag FROM

RE: HDFS small files to Sequence file using Hive

2016-09-23 Thread Markovitz, Dudu
Hi I’m not sure how this will solve the issue you mentioned, but just for the fun of it – Here is the code. Dudu set textinputformat.record.delimiter='\0'; set hive.mapred.supports.subdirectories=true; set mapred.input.dir.recursive=true; create external table if not exists files_ext (txt

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
If these are dimension tables, what do you need to update there? Dudu From: Vijay Ramachandran [mailto:vi...@linkedin.com] Sent: Friday, September 23, 2016 1:46 PM To: user@hive.apache.org Subject: Re: on duplicate update equivalent? On Fri, Sep 23, 2016 at 3:47 PM, Mich Talebzadeh mailto:mich

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
alesce (s.i,t.i) as i,coalesce (s.c,t.c) from src as s full join trg as t on t.i = s.i; alter view trg as select * from trg1; drop table if exists trg2; etc… From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Friday, September 23, 2016 1:02 PM To: user@hive.apache.org Subject: RE: on

RE: on duplicate update equivalent?

2016-09-23 Thread Markovitz, Dudu
We’re not there yet… https://issues.apache.org/jira/browse/HIVE-10924 Dudu From: Vijay Ramachandran [mailto:vi...@linkedin.com] Sent: Friday, September 23, 2016 11:47 AM To: user@hive.apache.org Subject: on duplicate update equivalent? Hello. Is there a way to write a query with a behaviour equi

RE: Extracting data from ELB log date format

2016-09-21 Thread Markovitz, Dudu
select to_date(ts),year(ts),month(ts),day(ts),hour(ts),minute(ts),second(ts) from (select from_unixtime (unix_timestamp ('2016-09-15T23:45:22.943762Z',"yyyy-MM-dd'T'HH:mm:ss")) as ts) as t; OK 2016-09-15 2016 9 15 23 45 22 Dudu From: Manish Rangari [mailto:linuxtricksfordev...@gmail.co

RE: ELB Log processing

2016-09-20 Thread Markovitz, Dudu
','aid') as aid, parse_url(url, 'QUERY','tid') as tid, parse_url(url, 'QUERY','eid') as eid, parse_url(url, 'QUERY','did') as did, protocol, useragent, ssl_cipher, ssl_protocol from elblog; Dudu From: Markovitz, Dudu [

RE: ELB Log processing

2016-09-20 Thread Markovitz, Dudu
create view elb_raw_log_detailed as select request_date, elbname, requestip, requestport, backendip, backendport, requestprocessingtime, backendprocessingtime, clientresponsetime, elbresponsecode, backendresponsecode, receivedbytes, sentbytes, requestverb, url, u.aid, u.tid, u.eid,u.did, protoco

RE: What's the best way to find the nearest neighbor in Hive? Any windowing function?

2016-09-14 Thread Markovitz, Dudu
It seems you’ll have to go with JOIN. Here are 2 options. Dudu select t0.id as id_0 ,min (named_struct ("dist",abs((t1.price - t0.price)/100) + abs((t1.number - t0.number)/
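
A hedged completion of the idea in the snippet above (the normalizing denominators, table and column names are illustrative): taking min() over a struct whose first field is the distance returns the whole struct of the nearest neighbor, since Hive compares structs field by field in declaration order.

select  x.id_0
       ,x.nearest.id   as nearest_id
       ,x.nearest.dist as nearest_dist
from   (select  t0.id as id_0
               ,min(named_struct('dist', abs((t1.price  - t0.price)  / 100)
                                        + abs((t1.number - t0.number) / 10)
                                ,'id'  , t1.id)) as nearest
        from    t as t0
                cross join t as t1
        where   t1.id <> t0.id
        group by t0.id) as x;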

RE: Beeline throws OOM on large input query

2016-09-06 Thread Markovitz, Dudu
d a work around and troubleshooting beeline wasn't my primary goal :) Reply to Markovitz, Dudu The query is basically finding geometry intersections. If you are familiar with Postgis, it is a Java version of the Postgis function ST_Intersects (http://postgis.net/docs/ST_Intersects.html) wrapped

RE: Beeline throws OOM on large input query

2016-09-03 Thread Markovitz, Dudu
Hi Adam I’m not clear about what you are trying to achieve in your query. Can you please give a small example? Thanks Dudu From: Adam [mailto:work@gmail.com] Sent: Friday, September 02, 2016 4:13 PM To: user@hive.apache.org Subject: Re: Beeline throws OOM on large input query I set the he

RE: Create Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Markovitz, Dudu
The managed/external table question is a non-issue since you can create the table as managed and afterwards alter it to external. ALTER TABLE t SET TBLPROPERTIES('EXTERNAL'='TRUE') Dudu From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Sunday, August 07, 2016 5:02 PM To: Marcin Tustin Cc: us
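
A small sketch of that point, borrowing the sample column names from the clarification further down this thread (the CTAS itself is just one way to build the non-partitioned copy):

-- the new table is created managed, with the partition columns simply left out
create table dst as
select  cust_id, cust_first_name, cust_last_name
from    src;

-- then flip it to external; only the metadata flag changes
alter table dst set tblproperties ('EXTERNAL' = 'TRUE');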

RE: Create Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Markovitz, Dudu
It was not clear from the original post so I asked for clarification, and the answer was that the partition columns should be excluded. Dudu From: Marcin Tustin [mailto:mtus...@handybook.com] Sent: Sunday, August 07, 2016 4:40 PM To: Mich Talebzadeh Cc: user Subject: Re: Create Non-partitio

RE: Create Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-07 Thread Markovitz, Dudu
It won’t help him since ‘*’ represents all columns, including the partition columns which he wants to exclude. Dudu From: Marcin Tustin [mailto:mtus...@handybook.com] Sent: Sunday, August 07, 2016 3:17 PM To: user@hive.apache.org Subject: Re: Create Non-partitioned table from partitioned table usin

RE: Create Non-partitioned table from partitioned table using CREATE TABLE .. LIKE

2016-08-06 Thread Markovitz, Dudu
Hi Should your destination table contain the source partition values? e.g. Assuming this is the source table: create table src (cust_id int,cust_first_name string,cust_last_name string) partitioned by (yr string,mn string,dt string); Should the destination table look like create table dst (

RE: Error running SQL query through Hive JDBC

2016-08-05 Thread Markovitz, Dudu
e.printStackTrace(); } catch (SQLException e) { e.printStackTrace(); } From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Friday, August 05, 2016 3:04 PM To: user@hive.apache.org<mailto:user@

RE: Error running SQL query through Hive JDBC

2016-08-05 Thread Markovitz, Dudu
Can you please share the query? From: Amit Bajpai [mailto:amit.baj...@flextronics.com] Sent: Friday, August 05, 2016 10:40 PM To: user@hive.apache.org Subject: Error running SQL query through Hive JDBC Hi, I am getting the below error when running the SQL query through Hive JDBC. Can suggestion

RE: Alternatives to self join

2016-07-26 Thread Markovitz, Dudu
Hi Can you please send your original query and perhaps a small dataset sample? Thanks Dudu From: Buntu Dev [mailto:buntu...@gmail.com] Sent: Tuesday, July 26, 2016 10:46 AM To: user@hive.apache.org Subject: Alternatives to self join I'm currently doing a self-join on a table 4 times on varying

RE: Any way in hive to have functionality like SQL Server collation on Case sensitivity

2016-07-14 Thread Markovitz, Dudu
. however not sure if this meets your use case. Sent from my iPhone On 13 Jul 2016, at 19:50, Markovitz, Dudu <dmarkov...@paypal.com> wrote: Hi I’m personally not aware of other methods to achieve case-insensitive comparison but to use lower() / upper() Dudu From: Mahende

RE: Any way in hive to have functionality like SQL Server collation on Case sensitivity

2016-07-14 Thread Markovitz, Dudu
rg Subject: Re: Any way in hive to have functionality like SQL Server collation on Case sensitivity You can use any Java function in Hive without (!) the need to wrap it in a UDF via the reflect command. However, not sure if this meets your use case. Sent from my iPhone On 13 Jul 2016, at

RE: Any way in hive to have functionality like SQL Server collation on Case sensitivity

2016-07-13 Thread Markovitz, Dudu
? /MS On 5/25/2016 9:05 AM, Markovitz, Dudu wrote: It will not be suitable for JOIN operation since it will cause a Cartesian product. Any chosen solution should determine a single representation for any given string. Dudu From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent

RE: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Markovitz, Dudu
property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On 12 July 2016 at 08:16, Markovitz, Dudu <dmarkov...@paypal.com> wrote:

RE: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Markovitz, Dudu
ng from such loss, damage or destruction. On 12 July 2016 at 08:16, Markovitz, Dudu <dmarkov...@paypal.com> wrote: This is a simple task – Read the files, find the local max value and combine the results (find the global max value). How do you explain the differences in the results? Sp

RE: Using Spark on Hive with Hive also using Spark as its execution engine

2016-07-12 Thread Markovitz, Dudu
This is a simple task – Read the files, find the local max value and combine the results (find the global max value). How do you explain the differences in the results? Spark reads the files and finds a local max 10X (+) faster than MR? Can you please attach the execution plan? Thanks Dudu F

RE: RegexSerDe with Filters

2016-07-02 Thread Markovitz, Dudu
planning to join these 3 views based on TID. Do I need to take any special considerations? Regards, Venkat On Fri, Jun 24, 2016 at 5:17 PM, Arun Patel <arunp.bigd...@gmail.com> wrote: Dudu, Thanks for the clarification. Looks like I have an issue with my Hive installation.

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-07-01 Thread Markovitz, Dudu
ber () over (partition by a.group_id order by null) + a.accumulated_rows as ETL_ROW_ID from INTER_ETL as t join group_rows_accumulated as a on a.group_id = abs(hash(MyColumn))%1 ; From: Markovitz, D

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-30 Thread Markovitz, Dudu
1 39567412227 40529759537 From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Wednesday, June 29, 2016 11:37 PM To: sanjiv.is...@gmail.com Cc: user@hive.apache.org Subject: RE: Query Performance Issue : Group By and Distinct and load on reducer 1. This is

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-29 Thread Markovitz, Dudu
a1.group_id group by a1.group_id ; From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Wednesday, June 29, 2016 10:55 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer Hi Dudu, I tried the same on same ta

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
: Tuesday, June 28, 2016 11:52 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and load on reducer ETL_ROW_ID is to be consecutive number. I need to check if having unique number would not break any logic. Considering unique number for

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
I’m guessing ETL_ROW_ID should be unique but not necessarily contain only consecutive numbers? From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 10:57 PM To: Markovitz, Dudu Cc: user@hive.apache.org Subject: Re: Query Performance Issue : Group By and Distinct and

RE: Query Performance Issue : Group By and Distinct and load on reducer

2016-06-28 Thread Markovitz, Dudu
The row_number operation seems to be skewed. Dudu From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Tuesday, June 28, 2016 8:54 PM To: user@hive.apache.org Subject: Query Performance Issue : Group By and Distinct and load on reducer Hi All, I am having performance issue with data skew o

RE: Hive error : Can not convert struct<> to

2016-06-28 Thread Markovitz, Dudu
The staging table has no partitions, so no issue there. Also, the error specifically refers to the conversion between the struct types. Dudu FAILED: SemanticException [Error 10044]: Line 2:23 Cannot insert into target table because column number/types are different ''CA'': Cannot convert c

RE: Hive error : Can not convert struct<> to

2016-06-28 Thread Markovitz, Dudu
Hi The fields' names are part of the struct definition. Different names, different types of structs. Dudu e.g. Setup create table t1 (s struct); create table t2 (s struct); insert into table t1 select named_struct('c1',1,'c2',2);
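
The struct type parameters in the snippet above were stripped by the archive; a hedged sketch of the point being made, with made-up field names (a struct's field names are part of its type, so structs with different field names are different types):

create table t1 (s struct<c1:int, c2:int>);
create table t2 (s struct<c3:int, c4:int>);

-- matches t1's struct type
insert into table t1 select named_struct('c1', 1, 'c2', 2);

-- fails with a "Cannot convert" SemanticException: the field names differ,
-- so struct<c1:int,c2:int> and struct<c3:int,c4:int> are different types
insert into table t2 select s from t1;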

RE: Querying Hive tables from Spark

2016-06-27 Thread Markovitz, Dudu
Hi Mich I could not figure out what is the point you are trying to make. Could you please clarify? Thanks Dudu From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Monday, June 27, 2016 12:20 PM To: user @spark ; user Subject: Querying Hive tables from Spark Hi, I have done some e

RE: RegexSerDe with Filters

2016-06-24 Thread Markovitz, Dudu
Filters Looks like Regex pattern is not working. I tested the pattern on https://regex101.com/ and it does not find any match. Any suggestions? On Thu, Jun 23, 2016 at 3:01 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: My pleasure. Please feel free to reach me if needed. Dudu

RE: Optimize Hive Query

2016-06-23 Thread Markovitz, Dudu
Thanks, I wanted to rule out skewness over m_d_key,sb_gu_key Dudu From: @Sanjiv Singh [mailto:sanjiv.is...@gmail.com] Sent: Thursday, June 23, 2016 11:55 PM To: user@hive.apache.org; Markovitz, Dudu ; sanjiv singh (ME) Subject: Re: Optimize Hive Query Hi Dudu, find below query response

RE: Optimized Hive query

2016-06-23 Thread Markovitz, Dudu
Any progress on this one? Dudu From: Aviral Agarwal [mailto:aviral12...@gmail.com] Sent: Wednesday, June 15, 2016 1:04 PM To: user@hive.apache.org Subject: Re: Optimized Hive query I'm OK with digging down into the AST Builder class. Can you guys point me to the right class? Meanwhile "explain (rew

RE: RegexSerDe with Filters

2016-06-23 Thread Markovitz, Dudu
, Markovitz, Dudu <dmarkov...@paypal.com> wrote: Hi Here is the code (without the log data). I’ve created some of the views using different text processing techniques. The rest of the views could be created in similar ways. Dudu

RE: Optimize Hive Query

2016-06-23 Thread Markovitz, Dudu
Could you also add the results of the following query? Thanks Dudu select m_d_key ,sb_gu_key ,count (*) as cnt from tuning_dd_key group by m_d_key ,sb_gu_key order by cnt desc limit 100 ; -Original Message-

RE: RegexSerDe with Filters

2016-06-21 Thread Markovitz, Dudu
,tid ,regexp_extract (txt,'rowkey (\\S+)',1) as rowkey ,regexp_extract (txt,'lock id (\\S+)',1) as lock_id from log_V where txt like 'Unlock request for schema DU %' ; From: Markovitz, Dudu [mailto:dmarkov...@paypal.com]

RE: if else condition in hive

2016-06-21 Thread Markovitz, Dudu
I understand that you’re looking for the functionality of the MERGE statement. 1) MERGE is currently an open issue. https://issues.apache.org/jira/browse/HIVE-10924 2) UPDATE and DELETE (and MERGE in the future) work under a bunch of limitations, e.g. – Currently only ORC tables are supported ht
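
Until MERGE exists, the workaround sketched in the coalesce / full join reply above rebuilds the target from a FULL OUTER JOIN of source and target; a minimal sketch with invented table and column names:

-- src holds the new/changed rows, trg is the current state, i is the key
create table trg_new as
select  coalesce(s.i, t.i) as i
       ,coalesce(s.c, t.c) as c     -- the source value wins when both sides exist
from    src as s
        full outer join trg as t
        on  t.i = s.i;

-- then swap trg_new in place of trg (rename the table or re-point a view)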

RE: RegexSerDe with Filters

2016-06-21 Thread Markovitz, Dudu
Hi I would suggest creating a single external table with daily partitions and multiple views each with the appropriate filtering. If you’ll send me log sample (~100 rows) I’ll send you an example. Dudu From: Arun Patel [mailto:arunp.bigd...@gmail.com] Sent: Tuesday, June 21, 2016 1:51 AM To: us

RE: Is there any GROUP_CONCAT Function in Hive

2016-06-15 Thread Markovitz, Dudu
Have you tried to increase the heap size (worked for me)? E.g. - bash mkdir t awk 'BEGIN{OFS=",";for(i=0;i<1000;++i){print i,i}}' > t/t.csv hdfs dfs -put t /tmp export HADOOP_OPTS="$HADOOP_OPTS -Xmx1024m" hive create external table t (i int,s string) row format delimited fields terminated b
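
For reference, the usual Hive substitute for MySQL's GROUP_CONCAT is collect_list (or collect_set) combined with concat_ws; a small sketch against a table like the one created above:

-- concatenate all s values per i, comma separated
select  i
       ,concat_ws(',', collect_list(s)) as s_concat
from    t
group by i;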

RE: Is there any GROUP_CONCAT Function in Hive

2016-06-15 Thread Markovitz, Dudu
Hi Out of curiosity, could you share what is the motivation for that? Thanks Dudu From: Mahender Sarangam [mailto:mahender.bigd...@outlook.com] Sent: Thursday, June 16, 2016 1:47 AM To: user@hive.apache.org Subject: Is there any GROUP_CONCAT Function in Hive Hi, We have Hive table with 3 GB

RE: column statistics for non-primitive types

2016-06-15 Thread Markovitz, Dudu
Guys, While some types of statistics are quite cheap to compute (min / max / count etc.) some are quite expensive (distinct values / histograms etc.). Furthermore, some of my experience involves working with tables of hundreds and thousands of columns, where due to usage, only few of them are re

RE: column statistics for non-primitive types

2016-06-15 Thread Markovitz, Dudu
Hi Michael Case ‘b’ (“answer query directly”) seems to be risky in an open system. Files/directories can be deleted directly in the filesystem without Hive having any knowledge about it, which will lead to wrong query results. Dudu From: Michael Häusler [mailto:mich...@akatose.de] Sent: Tuesda

RE: Optimized Hive query

2016-06-14 Thread Markovitz, Dudu
CCdOABUrV8Pw http://talebzadehmich.wordpress.com On 14 June 2016 at 17:46, Markovitz, Dudu <dmarkov...@paypal.com> wrote: It’s not the query that is being optimized but the syntax tree that is created upon the query (execute “explain extended selec

RE: Optimized Hive query

2016-06-14 Thread Markovitz, Dudu
It’s not the query that is being optimized but the syntax tree that is created upon the query (execute “explain extended select …”) At no point do we have a “flattened query” Dudu From: Aviral Agarwal [mailto:aviral12...@gmail.com] Sent: Tuesday, June 14, 2016 10:37 AM To: user@hive.apache.org S

RE: Issue in Insert Overwrite directory operation

2016-06-14 Thread Markovitz, Dudu
There seems to be a known bug fixed on version 1.3 https://issues.apache.org/jira/browse/HIVE-12364 Dudu From: Udit Mehta [mailto:ume...@groupon.com] Sent: Tuesday, June 14, 2016 2:55 AM To: user@hive.apache.org Subject: Issue in Insert Overwrite directory operation Hi All, I see a weird issue

RE: same hdfs location with different schema exception

2016-06-14 Thread Markovitz, Dudu
Hi Can you please share the query? Thanks Dudu From: 赵升/赵荣生 [mailto:roncenz...@qq.com] Sent: Tuesday, June 14, 2016 5:26 AM To: user Subject: same hdfs location with different schema exception Hi all: I have a question when using hive. It's described as follows: Firstly, I create two tab

RE: Optimized Hive query

2016-06-13 Thread Markovitz, Dudu
Hi You don’t need to do anything; the optimizer does it for you. You can see that you get identical execution plans for the nested query and the flattened one. Dudu > create multiset table t (i int); > explain select * from t; +---
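
To see the flattening for yourself, compare the two plans; a trivial sketch (the table name is arbitrary):

explain extended select * from t;
explain extended select * from (select * from t) as nested;
-- both statements should produce the same operator tree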

RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-11 Thread Markovitz, Dudu
/2016 7:24 AM, Markovitz, Dudu wrote: regexp_extract ('(,?[^,]*){0,10}',0) (...){0,10} The expression surrounded by brackets repeats 0 to 10 times. (,?[…]*) An optional comma followed by a sequence (0 or more) of characters [^,] Any character which is not a comma regexp_extract (...,0

RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-10 Thread Markovitz, Dudu
June 10, 2016 2:54 PM To: user@hive.apache.org Subject: Re: Get 100 items in Comma Separated strings from Hive Column. Thanks Dudu. I will check. Can you please throw some light on regexp_replace (((,?[^,]*){0,10}).*','$1') regexp_extract ('(,?[^,]*){0,10}',0), On 6/9/2

RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Markovitz, Dudu
+ bug fix This version will differentiate between empty strings and strings with a single token (both have no commas) hive> select RowID,length(regexp_replace(stringColumn,'[^,]',''))+if(length(stringColumn)=0,0,1) as count from t; From: Markovitz, Dudu [mailto:dmar

RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Markovitz, Dudu
17 5 8 6 11 7 26 8 18 9 9 From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Thursday, June 09, 2016 11:30 PM To: user@hive.apache.org Subject: RE: Get 100 items in Comma Separated s

RE: Get 100 items in Comma Separated strings from Hive Column.

2016-06-09 Thread Markovitz, Dudu
-- bash mkdir t cat>t/data.txt 1|44,85 2|56,37,83,68,43 3|33,48,42,18,23,80,31,86,48,42,37,52,9

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
d","rail","shock"],["Quick","as","a","wink"],["She","changed","her","mind"]] 2 Goodnight Saigon [["We","met","as","soul","ma

RE: Need Your Inputs For Below Scenario

2016-06-09 Thread Markovitz, Dudu
Explode + joins -- bash mkdir t1 mkdir t2 cat>t1/data.txt A B1 B2B4

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
"stood","on","the","tracks","Waving","her","arms","Leading","me","to","that","third","rail","shock","Quick","as","a","wink&q

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-09 Thread Markovitz, Dudu
should resolve your issue. Plz try and let me know if it works. Abhi Sent from my iPhone On Jun 3, 2016, at 3:33 AM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: Here is an example, but first – some warnings: · You should set textinputformat.record.delimiter not only

RE: Why does the user need write permission on the location of external hive table?

2016-06-06 Thread Markovitz, Dudu
P.S. There are some risky data manipulations going on there. I’m not sure this is the desired result… ☺ hive> select CAST(REGEXP_REPLACE('And the Lord spake, saying, "First shalt thou take out the Holy Pin * Then shalt thou count to 3, no more, no less * 3 shall be the number thou shalt count, and t

RE: Why does the user need write permission on the location of external hive table?

2016-06-06 Thread Markovitz, Dudu
Hi guys I would strongly recommend not to work with zipped files. “Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel” https://cwiki.apache.org/confluence/display/Hive/CompressedStorage Dudu From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] S

RE: alter partitions on hive external table

2016-06-06 Thread Markovitz, Dudu
< date '2015-02-01') or (user='raj' and substr(cust,-1) = 3) ; 1 dudu cust1 2015-01-22 hdfs://quickstart.cloudera:8020/tmp/t/20150122/dudu/cust1/data.txt 2 dudu cust1 2015-01-22 hdfs://quickstart.cloudera:8020/tmp/t/201501

RE: alter partitions on hive external table

2016-06-06 Thread Markovitz, Dudu
… are just logical connections between certain values and specific directories … From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Monday, June 06, 2016 6:07 PM To: user@hive.apache.org Subject: RE: alter partitions on hive external table Hi Raj 1. I don’t understand the reason

RE: alter partitions on hive external table

2016-06-06 Thread Markovitz, Dudu
Hi Raj 1. I don’t understand the reason for this change, can you please elaborate? 2. An external table is just an interface: instructions for how to read existing data. Partitions of an external table are just logical connections between certain values and specific directories.
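
To illustrate the "logical connection" point, a partition of an external table can be attached, re-pointed or dropped without touching the files themselves; the partition column and directory names below are invented:

-- attach an existing directory as a partition
alter table t add if not exists partition (dt = '2015-01-22')
    location '/tmp/t/20150122';

-- re-point the same partition at a different directory
alter table t partition (dt = '2015-01-22')
    set location 'hdfs://quickstart.cloudera:8020/tmp/t/20150122_fixed';

-- dropping the partition of an external table removes only the metadata, not the files
alter table t drop if exists partition (dt = '2015-01-22');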

RE: Convert date in string format to timestamp in table definition

2016-06-05 Thread Markovitz, Dudu
data with integer or string column partition and see which one is more convenient for our use. On Sat, Jun 4, 2016 at 2:20 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: I’m not aware of an option to do what you request in the external table definition but you might want to

RE: Convert date in string format to timestamp in table definition

2016-06-04 Thread Markovitz, Dudu
I’m not aware of an option to do what you request in the external table definition but you might want to do that using a view. P.S. It seems to me that defining the partition column as a string would be more user friendly than an integer, e.g. – select * from threads_test where yyyymmdd like ‘2016%’ –
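
A sketch of the view-based approach mentioned above, assuming an external table with a string partition column named yyyymmdd (all names here are invented):

create external table threads_test (id int, txt string)
    partitioned by (yyyymmdd string)
    stored as textfile
    location '/tmp/threads_test';

-- the view exposes the partition value as a proper date while keeping the raw string
create view threads_test_v as
select  id
       ,txt
       ,cast(concat(substr(yyyymmdd, 1, 4), '-'
                   ,substr(yyyymmdd, 5, 2), '-'
                   ,substr(yyyymmdd, 7, 2)) as date) as thread_date
       ,yyyymmdd
from    threads_test;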

RE: External partitioned table based on yyyy/mm/dd HDFS structure

2016-06-03 Thread Markovitz, Dudu
'Alter Table...'. Does Hive have a function similar to EXEC in MS SQL? I am thinking of constructing an 'alter table...' string dynamically every day and executing it somehow. On Fri, Jun 3, 2016 at 5:22 AM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: 1. If the directory n

RE: LINES TERMINATED BY only supports newline '\n' right now

2016-06-03 Thread Markovitz, Dudu
Here is an example, but first – some warnings: · You should set textinputformat.record.delimiter not only for populating the table but also for querying it · There seem to be many issues around this area – o When I tried to insert multiple values in a single stateme

RE: External partitioned table based on yyyy/mm/dd HDFS structure

2016-06-03 Thread Markovitz, Dudu
table t; if I want partitions automatically picked up. Hive does not have this feature. Correct? What is the optimal directory size for a partition? Is about 2GB OK? On Wed, Jun 1, 2016 at 4:38 PM, Markovitz, Dudu <dmarkov...@paypal.com> wrote: The short answer: In this naming convention it

RE: External partitioned table based on yyyy/mm/dd HDFS structure

2016-06-01 Thread Markovitz, Dudu
The short answer: With this naming convention you will need to define each partition explicitly. If the naming convention were yyyy=2016/mm=11/dd=28 instead of 2016/11/28, it would have been straightforward. Dudu + The long answer: --
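
A sketch contrasting the two layouts (paths and table name invented): with the key=value convention the partitions can be discovered automatically, while the plain 2016/11/28 layout requires declaring each partition explicitly with its location.

-- layout /data/logs/yyyy=2016/mm=11/dd=28 : Hive can discover the partitions
msck repair table logs;

-- layout /data/logs/2016/11/28 : every partition must be added by hand
alter table logs add if not exists
    partition (yyyy = '2016', mm = '11', dd = '28')
    location '/data/logs/2016/11/28';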

RE: How to disable SMB join?

2016-05-31 Thread Markovitz, Dudu
Hi The documentation describes a scenario where SMB join leads to the same error you’ve got. It claims that changing the order of the tables solves the problem. Dudu https://cwiki.apache.org/confluence/display/Hive/LanguageManual+JoinOptimization#LanguageManualJoinOptimization-SMBJoinacrossTab

RE: Does hive need exact schema in Hive Export/Import?

2016-05-30 Thread Markovitz, Dudu
Hi 1) I was able to do the import by doing the following manipulation: · Export table dev101 · Create an empty table dev102 · Export table dev102 · replace the _metadata file of dev101 with the _metadata file of dev102 · import table dev101 to table de

RE: Test

2016-05-29 Thread Markovitz, Dudu
Hi ☺ From: Igor Kravzov [mailto:igork.ine...@gmail.com] Sent: Sunday, May 29, 2016 8:02 PM To: user@hive.apache.org Subject: Test Please someone reply. Not sure if subscribed properly

RE: Any way in hive to have functionality like SQL Server collation on Case sensitivity

2016-05-25 Thread Markovitz, Dudu
It will not be suitable for a JOIN operation since it will cause a Cartesian product. Any chosen solution should determine a single representation for any given string. Dudu From: Mich Talebzadeh [mailto:mich.talebza...@gmail.com] Sent: Wednesday, May 25, 2016 1:31 AM To: user Subject: Re: Any w
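
A minimal sketch of the lower()/upper() approach for an equi-join (table and column names are hypothetical): normalizing both sides keeps a single representation per string, which is what avoids the Cartesian-product problem mentioned above.

select  a.id, b.val
from    t1 as a
        join t2 as b
        on  lower(a.name) = lower(b.name);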

RE: Hive 2 database Entity-Relationship Diagram

2016-05-19 Thread Markovitz, Dudu
Thanks Mich I’m afraid the current format is not completely user friendly. I would suggest dividing the tables into multiple sets by subject / graph connectivity (BTW, it seems odd that most of the tables are disconnected) Also – · HIVEUSER.PARTITION_KEY_VALS is partially covering anoth

RE: Could i use Hive SQL parser in our application?

2016-05-18 Thread Markovitz, Dudu
Hi Can you please share what was the problem? Thanks Dudu From: Heng Chen [mailto:heng.chen.1...@gmail.com] Sent: Thursday, May 19, 2016 7:07 AM To: user@hive.apache.org Subject: Re: Could i use Hive SQL parser in our application? Got it now! Thanks again for your help! guys! 2016-05-19 11:0

RE: Would like to be a user

2016-05-17 Thread Markovitz, Dudu
Thanks ☺ From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: Tuesday, May 17, 2016 10:06 PM To: user@hive.apache.org Subject: Re: Would like to be a user Done. Welcome to the Hive wiki team, Dudu! -- Lefty On Tue, May 17, 2016 at 3:04 AM, Markovitz, Dudu mailto:dmarkov...@paypal.com

RE: Would like to be a user

2016-05-17 Thread Markovitz, Dudu
Hi Lefty Can you add me as well? My user is dudu.markovitz Thanks Dudu From: Lefty Leverenz [mailto:leftylever...@gmail.com] Sent: Monday, May 16, 2016 11:44 PM To: user@hive.apache.org Subject: Re: Would like to be a user Welcome Gail, I've given you write access to the Hive wiki. Thanks in

RE: Hive cte Alias problem

2016-05-11 Thread Markovitz, Dudu
Hi It seems that you are right and it is a bug with the CTE when there’s an “IS NULL” predicate involved. I’ve opened a bug for this. https://issues.apache.org/jira/browse/HIVE-13733 Dudu hive> create table t (i int,a string,b string); hive> insert into t values (1,'hello','world'),(2,'bye',null);

RE: Create external table

2016-05-11 Thread Markovitz, Dudu
Could not reproduce that issue on the Cloudera quickstart VM. I’ve created an HDFS directory with 10,000 files. I’ve created an external table from within beeline. The creation was immediate. Dudu --- bash -

RE: Any difference between LOWER and LCASE

2016-05-10 Thread Markovitz, Dudu
Hi According to the documentation LCASE is a synonym for LOWER. From what I've seen in the source code, it seems right. https://github.com/apache/hive/blob/f089f2e64241592ecf8144d044bec8a0659ff422/ql/src/java/org/apache/hadoop/hive/ql/exec/FunctionRegistry.java system.registerGenericUDF("lower",

RE: Unsupported SubQuery Expression '1': Only SubQuery expressions that are top level conjuncts are allowed

2016-05-10 Thread Markovitz, Dudu
dification as per our requirement. Your query helped me to modify as per our requirement. On 5/4/2016 10:57 AM, Markovitz, Dudu wrote: Hi The syntax is not Hive specific but SQL ANSI/ISO. In a series of “JOIN … ON …” any “ON” can (but not necessarily have to) refer to any of its preceding t

RE: Unsupported SubQuery Expression '1': Only SubQuery expressions that are top level conjuncts are allowed

2016-05-04 Thread Markovitz, Dudu
below logic, I see First you are starting join of table1 with result set of Group by > 1 and perform left join with table2, how can we get reference a. alias of joined result or will hive pickup "a" column from table 1 and 3 column in table2. thanks in advance On 5/3/2016 11:24 A

RE: multiple selects on a left join give incorrect result

2016-05-03 Thread Markovitz, Dudu
There is no issue on Cloudera VM Dudu [cloudera@quickstart ~]$ hadoop version Hadoop 2.6.0-cdh5.5.0 Subversion http://github.com/cloudera/hadoop -r fd21232cef7b8c1f536965897ce20f50b83ee7b2 Compiled by jenkins on 2015-11-09T20:37Z Compiled with protoc 2.5.0 From source with checksum 98e07176d178

RE: Query fails if condition placed on Parquet struct field

2016-05-03 Thread Markovitz, Dudu
Hi Can you send the execution plans of both versions? Thanks Dudu From: Jose Rozanec [mailto:jose.roza...@mercadolibre.com] Sent: Tuesday, May 03, 2016 11:13 PM To: Haas, Nichole Cc: user@hive.apache.org Subject: Re: Query fails if condition placed on Parquet struct field Hi! Is not due to m

RE: Unsupported SubQuery Expression '1': Only SubQuery expressions that are top level conjuncts are allowed

2016-05-03 Thread Markovitz, Dudu
Forget about the BTW… Apparently Hive behaves like SQLite in that matter and not like other databases hive> select 1 from table1 having 1=1; FAILED: SemanticException HAVING specified without GROUP BY From: Markovitz, Dudu [mailto:dmarkov...@paypal.com] Sent: Tuesday, May 03, 2016 8:36 PM

RE: Unsupported SubQuery Expression '1': Only SubQuery expressions that are top level conjuncts are allowed

2016-05-03 Thread Markovitz, Dudu
ol2B.Col3 From Table1 A LEFT OUTER JOIN Table2 B ON A.Col3= B.Col3 AND NOT EXISTS(SELECT 1 FROM Table2 B WHERE B.Col1= A.Col1 GROUP BY A.Col1 HAVING COUNT(*)>1 ) AND (CASE WHEN ISNULL(A.Col2,'\;') = '\;' THEN 'NOT-NULL' ELSE 'NULL' END

RE: Unsupported SubQuery Expression '1': Only SubQuery expressions that are top level conjuncts are allowed

2016-05-02 Thread Markovitz, Dudu
Hi Before dealing with the issue itself, can you please fix the query? There are 3 aliased tables - Table1 (A), Table2 (B) & Table2 (mb) but you’re using 2 additional aliases – ma & adi1. Thanks Dudu select A.Col1,A.Col2B.Col3 From Table1 A LEFT OUTER JOIN Table2 B ON A.Col3= B.Col3 AND
