Re: [ANN] Hivemall: Hive scalable machine learning library

2013-10-03 Thread Dean Wampler
scalable > than Mahout for classification/regression tasks, please check it by > yourself. If you have a Hive environment, you can evaluate Hivemall > within 5 minutes or so. > > Hope you enjoy the release! Feedback (and pull request) is always welcome. > > Thank you, > Makoto > -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com

Re: [ANNOUNCE]: Apache Sentry 1.2.0 released

2013-09-26 Thread Dean Wampler
> stability of the code, it does indicate that the project has yet to be > fully endorsed by the ASF. > > Regards, > > Sentry team

Re: Inner Map key and value separators

2013-09-13 Thread Dean Wampler
Unfortunately, I believe there's no way to do this. Sent from my rotary phone. On Sep 13, 2013, at 6:42 PM, Sanjay Subramanian wrote: > Hi guys > > I have to load data into the following data type in hive > > map > > > Is there a way to define custom SEPARATORS (while creating the tabl

Re: question about partition table in hive

2013-09-13 Thread Dean Wampler
og data and put it in HDFS. I want to use > Hive to do some calculations and queries based on time range, so I want to use a partitioned > table, > but the data file in HDFS is one big file. How can I put it into a partitioned > table in Hive?
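One way to get a single large log file into a date-partitioned table is to pre-split it so that each partition gets its own directory. The sketch below is a hypothetical Python helper, not something from the thread: it assumes tab-separated lines whose first field begins with an ISO date (`YYYY-MM-DD ...`) and writes one `dt=YYYY-MM-DD/part-00000` file per day, mirroring the `key=value` directory layout Hive uses for partitions. Alternatively, Hive's dynamic partitioning (load the raw file into a staging table, then `INSERT OVERWRITE TABLE ... PARTITION (dt) SELECT ...`) lets Hive do the split itself.

```python
import os
import tempfile
from collections import defaultdict


def split_into_partitions(lines, out_dir, ts_index=0, sep="\t"):
    """Write lines into Hive-style partition directories named dt=YYYY-MM-DD.

    Assumes (hypothetically) that field `ts_index` of each line starts with
    an ISO date such as '2013-09-13 18:42:00'. Returns a dict mapping each
    partition directory to the number of lines written there.
    """
    buckets = defaultdict(list)
    for line in lines:
        row = line.rstrip("\n")
        dt = row.split(sep)[ts_index][:10]  # keep just the YYYY-MM-DD prefix
        buckets[dt].append(row)

    counts = {}
    for dt, rows in sorted(buckets.items()):
        part_dir = os.path.join(out_dir, f"dt={dt}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-00000"), "w") as f:
            f.write("\n".join(rows) + "\n")
        counts[part_dir] = len(rows)
    return counts


if __name__ == "__main__":
    sample = ["2013-09-12 10:00:00\tGET /a\n", "2013-09-13 11:00:00\tGET /b\n"]
    print(split_into_partitions(sample, tempfile.mkdtemp()))
```

Each resulting directory could then be copied under the table's HDFS location and registered with `ALTER TABLE ... ADD PARTITION (dt='...')`.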

Re: Interesting claims that seem untrue

2013-09-12 Thread Dean Wampler

Re: DISCUSS: Hive language manual to be source control managed

2013-09-02 Thread Dean Wampler
(as in every > udf, or every input format) but I believe the language manual surely does. > > Please review the current wiki and discuss the concept of moving the > language manual to source control, or suggest other options. > > Thank you, > Edward

Re: FAILED: Error in metadata: MetaException

2013-08-16 Thread Dean Wampler
I wonder if the message is misleading. Could there be a problem with the metastore? 1. MySQL isn't running, or the JDBC connection is wrong. 2. You upgraded to a newer Hive and didn't migrate the metadata. 3. ...? I'm speculating here. Perhaps the logs have useful info. Dean

Re: documentation issue for RLIKE/REGEXP

2013-08-11 Thread Dean Wampler
rg/Hive/languagemanual-udf.html >> >> says: >> >> A RLIKE B >> if A or B is NULL, TRUE if any (possibly empty) substring of A >> matches the Java regular expression B, otherwise FALSE. E.g. 'foobar' >> RLIKE 'foo' evaluates to FALSE whereas 'foobar' RLIKE '^f.*r$' >> evaluates to TRUE. >> >> 1) "if A or B is NULL" seems like an unfinished part. >> 2) "any (possibly empty) substring of A [that] matches the Java >> regular expression B" should be "foo" at 0 for 'foobar' RLIKE 'foo', >> and result in TRUE, right? >> > > > > -- > Lefty
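Lefty's reading can be checked with a small model. This Python sketch assumes that Hive's RLIKE behaves like Java's `Matcher.find()` (true if the pattern occurs anywhere in A) and yields NULL when either operand is NULL. Python's `re` syntax agrees with Java's for these simple patterns but not in every case. Under that assumption `'foobar' RLIKE 'foo'` is TRUE, which suggests the FALSE in the manual is the documentation error.

```python
import re
from typing import Optional


def rlike(a: Optional[str], b: Optional[str]) -> Optional[bool]:
    """Model of `A RLIKE B`: NULL in, NULL out; otherwise True if any
    substring of `a` matches the regular expression `b` (a find-style
    search, not a whole-string match)."""
    if a is None or b is None:
        return None  # stands in for SQL NULL
    return re.search(b, a) is not None


if __name__ == "__main__":
    print(rlike("foobar", "foo"))     # True: 'foo' matches at offset 0
    print(rlike("foobar", "^f.*r$"))  # True: the anchors force a full match
    print(rlike(None, "foo"))         # None
```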

Re: Hive - external (dynamically) partitioned table

2013-07-26 Thread Dean Wampler
> Regards, > > Omkar Joshi

Re: Hive Query

2013-07-12 Thread Dean Wampler
ive does not support IN clause. > Then what is the effective replacement for this? I need to execute around > 250 inputs. I'm using Hive version 0.9.0. > > Please guide me. > > > Thanks, > Manickam P
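A common workaround for a long value list in older Hive versions is to load the values into a small one-column table and filter with `LEFT SEMI JOIN`, Hive's semi-join form of `IN`/`EXISTS`. The Python helpers below only generate text; the table and column names (`events`, `user_id`, `wanted_ids`, `value`) are hypothetical.

```python
def lookup_file_contents(values):
    """One distinct value per line (first-seen order), e.g. for LOAD DATA
    into a single-column text table."""
    return "".join(f"{v}\n" for v in dict.fromkeys(values))


def semi_join_query(fact_table, key_col, lookup_table, lookup_col="value"):
    """HiveQL keeping only fact rows whose key appears in the lookup table.
    In a LEFT SEMI JOIN the right-hand table may be referenced only in ON."""
    return (
        f"SELECT f.* FROM {fact_table} f "
        f"LEFT SEMI JOIN {lookup_table} l ON (f.{key_col} = l.{lookup_col})"
    )


if __name__ == "__main__":
    print(lookup_file_contents(["u1", "u2", "u1"]), end="")
    print(semi_join_query("events", "user_id", "wanted_ids"))
```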

Re: Hive - UDF

2013-07-09 Thread Dean Wampler
write separate UDF for each? > Please let me know. > > > > Thanks, > Manickam P

Re: Partition performance

2013-07-03 Thread Dean Wampler

Re: Performance difference between tuning reducer num and partition table

2013-06-29 Thread Dean Wampler
, in my observation of some more complex >>> queries, the second solution is about 15% faster than the first solution. >>> Is it simply because the setting of reducer num is not optimal? >>> If the resource is not a limit and it is possible to set the proper >>> reducer nums in the first solution, can they achieve the same performance? >>> Is there any other factor that can cause a performance difference between >>> them (non-partition vs. partition+concurrent) besides the job parameter >>> issues? >>> >>> Thanks!

Re: Question regarding nested complex data type

2013-06-21 Thread Dean Wampler
at 9:34 PM, Stephen Sprague wrote: > Look at it the other way around if you want. Knowing an array of a two-element > struct is topologically the same as a map - they darn well better > be the same. :) > > > > On Thu, Jun 20, 2013 at 7:00 PM, Dean Wampler wrote:

Re: Hive built-in functions not working

2013-06-21 Thread Dean Wampler
t;>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) >>> at >>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451) >>> at >>> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407) >>> at >>> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186) >>> at >>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) >>> at >>> org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:543) >>> at >>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375) >>> at >>> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100) >>> ... 18 more >>> Caused by: java.lang.ClassNotFoundException: >>> org.codehaus.jackson.JsonFactory >>> >>> what am i doing wrong here? the jackson-core-asl-1.8.8.jar is in the >>> $HIVE_HOME/lib directory ... >>> >>> SHOW FUNCTIONS; >>> >>> shows me that these functions are in there ... i already tried >>> downgrading to hive 0.10 but the error is the same over there. i need to >>> work with hadoop 0.20, so unfortunately i can't try hadoop 1.x.x >>> >>> thanks in advance >>> cheers >>> Wolli >>> >> >> > -- Dean Wampler, Ph.D. @deanwampler http://polyglotprogramming.com

Re: Question regarding nested complex data type

2013-06-20 Thread Dean Wampler
process. Hive used ^A for the field separator, ^B for the collection separator (in this case, to separate the structs in the array), and ^C to separate the elements in each struct, e.g.: Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3 In other words, the structure you would expect for this table: CREAT
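The delimiter scheme described above can be exercised outside Hive. This Python sketch parses and re-encodes the example row, assuming a hypothetical schema of `name STRING` plus an `ARRAY<STRUCT<label:STRING, n:INT>>` column (the original CREATE TABLE is cut off, so the field names are invented). The characters `\x01`, `\x02` and `\x03` are ^A, ^B and ^C.

```python
FIELD_SEP = "\x01"       # ^A: between top-level columns
COLLECTION_SEP = "\x02"  # ^B: between array elements (the structs here)
STRUCT_SEP = "\x03"      # ^C: between the fields of one struct


def parse_row(line):
    """Split one text row into (name, [(label, n), ...])."""
    name, array_text = line.split(FIELD_SEP)
    pairs = []
    for element in array_text.split(COLLECTION_SEP):
        label, n = element.split(STRUCT_SEP)
        pairs.append((label, int(n)))
    return name, pairs


def encode_row(name, pairs):
    """Inverse of parse_row: build the delimited text Hive would read."""
    array_text = COLLECTION_SEP.join(f"{label}{STRUCT_SEP}{n}" for label, n in pairs)
    return f"{name}{FIELD_SEP}{array_text}"


if __name__ == "__main__":
    row = "Dean Wampler\x01first\x031\x02second\x032\x02third\x033"
    print(parse_row(row))
    # ('Dean Wampler', [('first', 1), ('second', 2), ('third', 3)])
```

Viewed this way, an array of two-field structs carries the same information as a map, which is the point made in the reply above.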

Re: Create table like with partitions

2013-06-11 Thread Dean Wampler
I confirmed it is a pirate site. On Jun 11, 2013, at 10:33 AM, Edward Capriolo wrote: > For reference, any site that puts the entire book online like this is likely > pirated. > > > > > On Tue, Jun 11, 2013 at 8:34 AM, Richa Sharma > wrote: >> Hi all, >> >> Found

Re: Difference between like %A% and %a%

2013-05-24 Thread Dean Wampler
that the rlike is based on regex and can be told to do > case insensitive matching. > > > On Fri, May 24, 2013 at 9:16 AM, Dean Wampler wrote: > >> Hortonworks has announced plans to make Hive more SQL compliant. I >> suspect bugs like this will be addressed sooner or later

Re: Difference between like %A% and %a%

2013-05-24 Thread Dean Wampler
to > include it in training that may or may not work. I've added this comment > to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278 for > fun. :) > > Please? :) > > > > > On Fri, May 24, 2013 at 7:53 AM, Dean Wampler wrote: > >> Your where

Re: Difference between like %A% and %a%

2013-05-24 Thread Dean Wampler
eviation l > > > Unlike MySQL, strings in Hive are case sensitive, so '%A%' is not equal to > '%a%'. > > > -- > Jov > blog: http://amutu.com/blog
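The case-sensitivity point can be illustrated with a tiny LIKE matcher. This Python sketch translates `%` and `_` into a regex and matches the whole string case-sensitively, as the thread describes for Hive; it does not handle escaped wildcards. In HiveQL the analogous workarounds would be normalizing both sides, e.g. `lower(col) LIKE '%a%'`, or a case-insensitive regex such as `col RLIKE '(?i)a'` (Java's inline flag); treat those as suggestions to verify on your Hive version.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern to a regex: '%' -> '.*', '_' -> '.'.
    Escaped wildcards are not handled in this sketch."""
    out = []
    for ch in pattern:
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def like(value, pattern):
    """Case-sensitive LIKE: the whole value must match the pattern."""
    return re.fullmatch(like_to_regex(pattern), value, flags=re.DOTALL) is not None


if __name__ == "__main__":
    print(like("ABC", "%A%"))                  # True
    print(like("abc", "%A%"))                  # False: case differs
    print(like("abc".lower(), "%A%".lower()))  # True: both sides normalized
```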

Re: Hive skipping first line

2013-05-23 Thread Dean Wampler

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Dean Wampler
> > > Where should I create the HDFS directory? > > > *From:* Sanjay Subramanian > *To:* "user@hive.apache.org" ; Raj Hadoop < > hadoop...@yahoo.com>; Dean Wampler > *Cc:* User > *Sent:* Tuesday, May 21, 2013 1:53 PM > > *Subject:* Re: hive.met

Re: hive.metastore.warehouse.dir - Should it point to a physical directory

2013-05-21 Thread Dean Wampler
variable hive.metastore.warehouse.dir > > Thanks, > Raj

Re: [ANNOUNCE] Apache Hive 0.11.0 Released

2013-05-16 Thread Dean Wampler
eNote.jspa?version=12323587&styleName=Html&projectId=12310843 > > We would like to thank the many contributors who made this release > possible. > > Regards, > > The Apache Hive Team

Re: Hive query problem on S3 table

2013-04-18 Thread Dean Wampler
> at > org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) > > > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > > > at > java.util.concur

Re: Partition performance

2013-04-04 Thread Dean Wampler
>>> is a better partition strategy? >>> >>> Thanks. -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Metastore question

2013-04-03 Thread Dean Wampler
e I understand this correctly. All databases and tables > are stored in hive.metastore.warehouse.dir, but the actual metadata for > the databases and tables (columns, types, partitions, etc.) is stored in the > hive database (i.e., MySQL)? > > Is that correct?

Re: Bucketing external tables

2013-03-30 Thread Dean Wampler
al tables have to be managed tables, not external tables, > right? > > Thanks again for your time and help. > > Sadu > > > > On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler < > dean.wamp...@thinkbiganalytics.com> wrote: > >> I don't know of an

Re: Bucketing external tables

2013-03-29 Thread Dean Wampler
> jobs while creating the Avro files? > > Any help / insight would greatly be appreciated. > > Thank you very much for your time and help. > > Sadu

Re: Noob question on creating tables

2013-03-29 Thread Dean Wampler

Re: Noob question on creating tables

2013-03-29 Thread Dean Wampler
if you're using a compression scheme supported by Hadoop. > > Thanks

Re: Using TABLESAMPLE on inner queries

2013-03-20 Thread Dean Wampler
> join >> (select distinct s from test2table) table2 >> on table1.s=table2.s >> >> >> How do I use TABLESAMPLE in this case to sample the results of the outer >> query? I tried placing TABLESAMPLE(BUCKET 1 OUT OF 4 ON s) in various >> places of my query, but it always returns some sort of syntax error and >> won't let the query run. >> >> Any help is appreciated. >> >> Robert

Re: Hive 0.10.0 metastore thrift server installation error

2013-03-14 Thread Dean Wampler
Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:601) > at org.apache.hadoo

Re: UDFs and Thread Safety?

2013-03-10 Thread Dean Wampler
wes wrote: > Hi All, > > Could anyone describe what the required thread safety for a UDF is? I > understand that one is instantiated for each use of the function in an > expression, but can there be multiple threads executing the methods of a > single UDF object at once? > >

Re: Error while table creation

2013-03-10 Thread Dean Wampler
Duplicate entry 'X' for key 'PRIMARY' >>>>at >>>> org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313) >>>>at >>>> org.datanucleus.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:660

Re: Hive query started map task being killed during execution

2013-03-08 Thread Dean Wampler
log:2013-03-08 > 08:46:54,394 ERROR o.apache.hadoop.hive.ql.exec.FileSinkOperator: > StatsPublishing error: cannot connect to database > > Please suggest if I need to set anything in Hive when I invoke this query. > The query that runs successfully has far fewer rows than the one that > fails. > > Thanks, > DK

Re: difference between add jar in hive session and hive --auxpath

2013-03-08 Thread Dean Wampler
nd like this: > > hive --auxpath path_to_jars, it works fine to query my table, > > but if I use add jar after I started the hive session, I get a > ClassNotFoundException at query runtime for the classes in those > jars. > > My questions are: > > 1) What is the difference between hive --auxpath and "add jar" in the hive > session? > 2) This problem makes it hard to access my table in HUE, as it only > supports "add jar", but not the --auxpath option. Any suggestions? > > > Thanks > > Yong

Re: Find current db we r using in Hive

2013-03-08 Thread Dean Wampler
here is any command in Hive which will show us the >> current db we are using, similar to pwd in Unix. >> Thanks >> Sai

Re: java.lang.NoClassDefFoundError: com/jayway/jsonpath/PathUtil

2013-03-08 Thread Dean Wampler
> at java.net.URLClassLoader$1.run(URLClassLoader.java:217) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:205) > at java.lang.ClassLoader.loadClass(ClassLoader.java:321) > at java.lang.ClassLoader.loadClass(ClassLoader.java:266) > ... 23 more > FAILED: Execution Error, return code -101 from > org.apache.hadoop.hive.ql.exec.DDLTask > > > Any help would be really appreciated. > Thanks > Sai

Re: Accessing sub column in hive

2013-03-08 Thread Dean Wampler
uld like to access/display country column from my address struct. > > I have tried this: > > select address["country"] from employees; > > I get an error. > > Please help. > > Thanks > > Sai

Re: Rename external table, including HDFS directory

2013-03-07 Thread Dean Wampler
"Luminous beings are we, not this crude matter." > -- Yoda > > > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Variable Substitution

2013-03-06 Thread Dean Wampler
( > clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1 > > I was originally planning to use this for partition pruning, but it > doesn't appear to be the cause as the calendar table is not partitioned. > > Is there something that I've overlooked? > > Thanks!

Re: Combine two overlapping schema?

2013-03-06 Thread Dean Wampler

Re: Where is the location of hive queries

2013-03-06 Thread Dean Wampler
splay. >> If so, where is the location of the results? >> Thanks >> Sai >> > > > > -- > Nitin Pawar

Re: Best table storage for analytical use case

2013-03-06 Thread Dean Wampler
s the case, since I'm in pseudo distrib for the moment, my number > of mappers =1, so I could try to configure my setup with additional mappers. > > > Does this make sense ? > > Thank you for your help ! > > Sekine > > > > > 2013/3/4 Dean Wampler > >

Re: Hive sample test

2013-03-05 Thread Dean Wampler
something > group by something=something > > to > > select really_expensive_select_clause > from > ( > select > * > from > really_big_table > limit 100 > )t > where > something=something > group by something=something > > > On Tue, Mar 5, 2013 at

Re: Hive sample test

2013-03-05 Thread Dean Wampler
before I run it > against all records. > > Is there a way to test the query against a small subset of the data, > without going into full MapReduce? As silly as this sounds, is there a way > to MapReduce without the overhead of MapReduce? That way I can check my > query is doing wh

Re: Location of external table in hdfs

2013-03-05 Thread Dean Wampler
RMINATED BY '\t' LOCATION '/tmp/states' ; > > Any help is really appreciated. > Thanks > Sai

Re: Error while exporting table data from hive to Oracle through Sqoop

2013-03-05 Thread Dean Wampler
> > 13/03/05 19:22:09 INFO mapreduce.ExportJobBase: Exported 0 records. > > 13/03/05 19:22:09 ERROR tool.ExportTool: Error during export: Export job > failed! > > [hadoop@NHCLT-PC44-2 sqoop-oper]$ > > Regards, > > Ajit Kumar Shreevastava

Re: doubt with LEFT OUTER JOIN

2013-03-04 Thread Dean Wampler
file? Here the tables are of different format > types. > > Regards, > Kumar > > > > -Original Message- > From: Dean Wampler > To: user > Sent: Fri, Mar 1, 2013 12:23 pm > Subject: Re: doubt with LEFT OUTER JOIN > > I just tried an experiment where

Re: Best table storage for analytical use case

2013-03-04 Thread Dean Wampler
I thought getting values in columns would speed up the aggregate process. > Maybe the dataset is too small to tell, or I missed something? Will adding > Snappy compression help (not sure whether RCFiles are compressed or not)? > > Thank you!

Re: doubt with LEFT OUTER JOIN

2013-03-01 Thread Dean Wampler
IGHT SIDE table doesn't > have at least one record that matches JOIN condition in Hive? > > Regards, > Kumar
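With a LEFT OUTER JOIN, every left-side row survives even when the right side has no matching record; the right-side columns simply come back NULL. The Python sketch below models those semantics on lists of dicts with hypothetical data. It illustrates the result, not how Hive executes joins, and it assumes the two sides share no column names other than the join key.

```python
def left_outer_join(left, right, key):
    """Model LEFT OUTER JOIN on lists of dicts.

    Every left row is kept. If no right row shares its key, the right-hand
    columns are filled with None (SQL NULL). Assumes the two sides share no
    column names other than `key`.
    """
    index = {}
    for rrow in right:
        if rrow[key] is not None:  # NULL keys never match in SQL
            index.setdefault(rrow[key], []).append(rrow)
    right_cols = sorted({c for rrow in right for c in rrow if c != key})

    out = []
    for lrow in left:
        matches = index.get(lrow[key], []) if lrow[key] is not None else []
        for rrow in matches or [None]:  # [None] means "no match": one padded row
            row = dict(lrow)
            row.update({c: (rrow.get(c) if rrow is not None else None) for c in right_cols})
            out.append(row)
    return out


if __name__ == "__main__":
    left = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
    right = [{"id": 1, "b": "p"}]
    print(left_outer_join(left, right, "id"))
    # [{'id': 1, 'a': 'x', 'b': 'p'}, {'id': 2, 'a': 'y', 'b': None}]
```

A left row that matches several right rows appears once per match, as in SQL.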

Re: regexp_replace with unicode chars

2013-03-01 Thread Dean Wampler
would work as the set of things to remove is > massive. > Yeah, it's a one-off cleanup job while exporting to try redshift on our > datasets. > My guess is it's something about the way hive handles strings? Tried > "\\ufffd" as the replacement str but no joy either

Re: regexp_replace with unicode chars

2013-03-01 Thread Dean Wampler
> Tom > > > [1] > http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane > [2] > http://grokbase.com/t/hive/dev/131a4n562y/unicode-character-as-delimiter
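The cleanup being attempted (dropping the U+FFFD replacement character and anything outside the Basic Multilingual Plane) can be prototyped in Python before translating it to `regexp_replace`. This is a sketch, not Hive: Python strings are sequences of code points while Java (and therefore Hive) strings are UTF-16, and the HiveQL string-literal escaping that the thread struggles with is not modeled here.

```python
import re

REPLACEMENT_CHAR = "\ufffd"                 # U+FFFD REPLACEMENT CHARACTER
NON_BMP = re.compile(r"[^\u0000-\uFFFF]")   # code points above U+FFFF


def strip_problem_chars(text):
    """Remove U+FFFD and any supplementary-plane (non-BMP) characters."""
    return NON_BMP.sub("", text.replace(REPLACEMENT_CHAR, ""))


if __name__ == "__main__":
    print(strip_problem_chars("ok\ufffd \U0001F600done"))  # ok done
```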

Re: Books and good starting point for Hive

2013-02-24 Thread Dean Wampler
Wow! You guys are my new best friends! Seriously, I'm grateful you've found my participation in the list and the book helpful. I'm sure Ed and Jason would agree (at least about the book ;) Yours, dean On Sun, Feb

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

2013-02-18 Thread Dean Wampler
","colnameyhgb":["1234","12345","2345","56789"],"colnamepoix":["12","4567","123","5678"],"colnamedswer":["100","567","123","678"],"colnamewerui"

Re: Loading json files into hive table is giving NULL as output(data is in s3 bucket)

2013-02-18 Thread Dean Wampler
":"test_name2","_ts":"2012-01-13","_ip":"IP2"} > {"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"} > > > When I query :- > select uname from table_test; > > Output :- > NULL 13Feb2012 > NULL 13Feb2012 > NULL 13Feb2012 > > > Please help me and let me know how to add json data in a table. > > Thanks, > Chunky.
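The NULLs are consistent with a name-based JSON SerDe looking up the column `uname` as a JSON key: the records only contain `_u`, `_ts` and `_ip`, so the lookup finds nothing. The Python sketch below models that lookup. Its `mapping` argument (rename table columns to JSON keys) is a hypothetical stand-in for whatever column-mapping option a particular JSON SerDe offers, or for simply naming the table column after the key.

```python
import json


def select_columns(json_lines, columns, mapping=None):
    """Return one tuple per JSON record, looking each table column up as a
    JSON key (optionally renamed via `mapping`); missing keys become None."""
    mapping = mapping or {}
    rows = []
    for line in json_lines:
        record = json.loads(line)
        rows.append(tuple(record.get(mapping.get(col, col)) for col in columns))
    return rows


if __name__ == "__main__":
    data = ['{"_u":"test_name2","_ts":"2012-01-13","_ip":"IP2"}']
    print(select_columns(data, ["uname"]))                   # [(None,)]
    print(select_columns(data, ["uname"], {"uname": "_u"}))  # [('test_name2',)]
```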

Re: Install / Download of Hive 0.7.0 or 0.7.1

2013-02-15 Thread Dean Wampler
te: > Hi, > > Where can I get an install/download of Hive 0.7.0 or 0.7.1? > > Thx… > > Regards, > > Vince George

Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread Dean Wampler
the directory--wasn't clear on that.. > > Joey > > > > -- > *From:* Dean Wampler > *To:* user@hive.apache.org; Joseph D Antoni > *Sent:* Friday, February 15, 2013 11:37 AM > *Subject:* Re: CREATE EXTERNAL TABLE Fails on Some Directories > > You confirmed that 715 is an

Re: CREATE EXTERNAL TABLE Fails on Some Directories

2013-02-15 Thread Dean Wampler
lines terminated by '\n' > stored as textfile > location '/715/file.csv'; > > This is failing with: > > Error in Metadata MetaException(message:Got except: > org.apache.hadoop.fs.FileAlreadyExistsException Parent Path is not a > directory: /715 715... > > Li

Re: Generate Hive DDL

2013-02-15 Thread Dean Wampler
schema as DDL so that we can keep it under >> version control. We could replicate the entire metastore, but we just >> want 4 tables out of the 100 we already have in place. >> >> Thanks, >> murtaza

Re:

2013-02-14 Thread Dean Wampler
Operator.java:40) > at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357) > at > org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150) > ... 14 more > 2013-02-13 23:30:29,819 INFO org.apache.hadoop.mapred.TaskInProgress: > TaskInProgress task

Re: INSERT INTO table with STRUCT, SELECT FROM

2013-02-13 Thread Dean Wampler
> but when I do the above I get: > > FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into > target table because column number/types are different 'oc': Cannot convert > column 0 from struct to struct.

Re: Change timestamp format in hive

2013-02-13 Thread Dean Wampler
because of difference in > format. > > > > Is there any way to set the timestamp format while creating the table? > Or is there some other solution for this issue? > > > > Thanks, > > Chunky. > > -- > Alexander Alten-Lorenz > http://mapredit.blogspot.com > German Hadoop LinkedIn Group: http://goo.gl/N8pCF
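Hive's TIMESTAMP type reads text in the `yyyy-MM-dd HH:mm:ss[.fffffffff]` form, so one option is to rewrite the values before loading. Inside Hive, `unix_timestamp(str, pattern)` combined with `from_unixtime()` can do a similar conversion. The Python sketch assumes a hypothetical source format like the `13Feb2012` values that appear in another of these threads; `%b` month names depend on the locale.

```python
from datetime import datetime

HIVE_TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # text form that Hive's TIMESTAMP parses


def to_hive_timestamp(value, source_format="%d%b%Y"):
    """Reformat a timestamp string. The default source format ('13Feb2012')
    is a hypothetical example; month names depend on the locale."""
    return datetime.strptime(value, source_format).strftime(HIVE_TS_FORMAT)


if __name__ == "__main__":
    print(to_hive_timestamp("13Feb2012"))  # 2012-02-13 00:00:00
```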

Re: How to load hive metadata from conf dir

2013-02-12 Thread Dean Wampler
e.org/docs/r0.7.0/api/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.Client.html#get_table(java.lang.String > , > java.lang.String) > > Table t = > t.getSd().getLocation() > > > On Tue, Feb 12, 2013 at 9:41 AM, Dean Wampler > wrote: > > I'll men

Re: Transfer Data to new location

2013-02-12 Thread Dean Wampler
er solution for safe data transfer? > > -- > *Muhammad Hamza Asad*

Re: How to load hive metadata from conf dir

2013-02-12 Thread Dean Wampler
library. > 2. Use HiveMetastoreClient[2] > > Is this correct? If yes, how to read the hive configuration[3] from > HIVE_CONF_DIR? > > [1] http://mvnrepository.com/artifact/org.apache.hive/hive-metastore > [2] > http://hive.apache.org/docs/r0.7.1/api/org/apache/hadoop/hive/met

Re: Combine multiple row values based upon a condition.

2013-02-03 Thread Dean Wampler
o rows where the offsets of > the two rows are no more than 1 character apart. > > Is this type of data manipulation possible, and if it is, could someone > point me in the right direction, hopefully with some explanation? > > Kind regards > Martijn
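The question is cut off, so the exact schema is unknown. As a hypothetical reading, if each row is an `(offset, text)` pair and "combine" means concatenating runs of rows whose offsets are at most one apart, the logic is a single pass over rows sorted by offset. In Hive that pass could run, for example, inside a TRANSFORM script fed by `DISTRIBUTE BY`/`SORT BY`; the Python below is only a sketch of the merge step itself.

```python
def merge_adjacent(rows, max_gap=1):
    """Merge (offset, text) rows whose offset is within max_gap of the
    previous row after sorting by offset; texts are concatenated in order.
    Returns a list of (start_offset, merged_text)."""
    groups = []  # each entry: [start_offset, last_offset, text]
    for offset, text in sorted(rows):
        if groups and offset - groups[-1][1] <= max_gap:
            groups[-1][1] = offset
            groups[-1][2] += text
        else:
            groups.append([offset, offset, text])
    return [(start, text) for start, _, text in groups]


if __name__ == "__main__":
    print(merge_adjacent([(10, "b"), (9, "a"), (20, "c"), (11, "d")]))
    # [(9, 'abd'), (20, 'c')]
```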

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread Dean Wampler
> StreamThread.run(): Java heap space >>>>> Cause: null >>>>> 2013-01-29 08:27:34,277 WARN >>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: java.lang.OutOfMemoryError: >>>>> Java heap space >>>>> at java.util.Arrays.copyOfRange(Arrays.java:3209) >>>>> at java.lang.String.<init>(String.java:215) >>>>> at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542) >>>>> at java.nio.CharBuffer.toString(CharBuffer.java:1157) >>>>> at org.apache.hadoop.io.Text.decode(Text.java:350) >>>>> at org.apache.hadoop.io.Text.decode(Text.java:327) >>>>> at org.apache.hadoop.io.Text.toString(Text.java:254) >>>>> at java.lang.String.valueOf(String.java:2826) >>>>> at java.lang.StringBuilder.append(StringBuilder.java:115) >>>>> at >>>>> org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:873) >>>>> at >>>>> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181) >>>>> at >>>>> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163) >>>>> at >>>>> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76) >>>>> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471) >>>>> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762) >>>>> at >>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator$OutputStreamProcessor.processLine(ScriptOperator.java:477) >>>>> at >>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator$StreamThread.run(ScriptOperator.java:563) >>>>> >>>>> 2013-01-29 08:27:34,306 INFO >>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: ErrorStreamProcessor >>>>> calling >>>>> reporter.progress() >>>>> 2013-01-29 08:27:34,307 INFO >>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: StreamThread ErrorProcessor >>>>> done >>>>> 2013-01-29 08:27:34,307 ERROR >>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: Script failed with code 1

Re: ALTER TABLE CHANGE COLUMN issue

2013-01-30 Thread Dean Wampler
n bug? >> >> Thanks, >> Hardik. >> >> PS:- My alter command: ALTER TABLE hardiktest CHANGE COLUMN col1 col2 >> array>. >> > > > > -- > Nitin Pawar

Re: Automating the partition creation process

2013-01-29 Thread Dean Wampler
>>>>>> For example, say the M/R job loads files into the following 3 >>>>>> sub-folders >>>>>> >>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=21 >>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=22 >>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=23 >>>>>> >>>>>> Then it should create 3 alter table statements >>>>>> >>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21); >>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22); >>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23); >>>>>> >>>>>> I thought of changing M/R jobs to load all files into same folder, >>>>>> then first load the files into non-partitioned table and then to load the >>>>>> partitioned table from non-partitioned table (using dynamic partition); >>>>>> but >>>>>> would prefer to avoid that extra step if possible (esp. since data is >>>>>> already in the correct sub-folders). >>>>>> >>>>>> Any help would greatly be appreciated. >>>>>> >>>>>> Regards, >>>>>> Sadu

Re: Automating the partition creation process

2013-01-29 Thread Dean Wampler
artitioned table and then to load the >> partitioned table from non-partitioned table (using dynamic partition); but >> would prefer to avoid that extra step if possible (esp. since data is >> already in the correct sub-folders). >> >> Any help would greatly be appreciated. >> >> Regards, >> Sadu
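Since the directories already follow Hive's `key=value` convention, the statements can be generated from a directory listing (obtained, for example, with `hadoop fs -ls`, not shown). This hypothetical Python helper turns such paths into the `ALTER TABLE ... ADD PARTITION` statements from the question; it leaves purely numeric values bare, as in the thread, and single-quotes everything else.

```python
import re

PARTITION_DIR = re.compile(r"^(\w+)=(.+)$")  # one key=value path component


def partition_spec(path, table_root):
    """Turn '<table_root>/year=2013/month=1/day=21' into [('year', '2013'), ...]."""
    root = table_root.rstrip("/")
    if not path.startswith(root + "/"):
        raise ValueError(f"{path!r} is not under {table_root!r}")
    pairs = []
    for part in path[len(root):].strip("/").split("/"):
        m = PARTITION_DIR.match(part)
        if m is None:
            raise ValueError(f"not a key=value directory: {part!r}")
        pairs.append((m.group(1), m.group(2)))
    return pairs


def add_partition_statements(table, table_root, paths):
    """One ALTER TABLE ... ADD PARTITION statement per partition directory."""
    statements = []
    for path in paths:
        spec = ", ".join(
            f"{k}={v}" if v.isdigit() else f"{k}='{v}'"
            for k, v in partition_spec(path, table_root)
        )
        statements.append(f"ALTER TABLE {table} ADD PARTITION ({spec});")
    return statements


if __name__ == "__main__":
    root = "/user/hive/warehouse/sales"
    dirs = [f"{root}/year=2013/month=1/day={d}" for d in (21, 22, 23)]
    print("\n".join(add_partition_statements("sales", root, dirs)))
```

The generated script could then be run with `hive -f`.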

Re: A few JIRAs closed in v0.10.0 that don't actually appear to be working

2013-01-29 Thread Dean Wampler
Thanks! On Tue, Jan 29, 2013 at 5:34 AM, Navis류승우 wrote: > HIVE-446 - Implement TRUNCATE : is on trunk (v0.11.0) > > HIVE-887 - Allow SELECT without a mapreduce job : It needs "set > hive.fetch.task.conversion=more" > > 2013/1/29 Dean Wampler : > >

Re: A few JIRAs closed in v0.10.0 that don't actually appear to be working

2013-01-28 Thread Dean Wampler
Oh, another one is https://issues.apache.org/jira/browse/HIVE-446 - Implement TRUNCATE. The CLI doesn't recognize it. dean On Mon, Jan 28, 2013 at 11:44 AM, Dean Wampler < dean.wamp...@thinkbiganalytics.com> wrote: > I've noticed a few JIRA items for new features that a

A few JIRAs closed in v0.10.0 that don't actually appear to be working

2013-01-28 Thread Dean Wampler
tried use MR. Did I misread the JIRA items? dean -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Cartesian product detection in the query plan?

2013-01-28 Thread Dean Wampler
ndition expressions: >>>> 0 {VALUE._col0} {VALUE._col1} >>>> 1 {VALUE._col1} >>>> handleSkewJoin: false >>>> outputColumnNames: _col0, _col1, _col3 >>>> File Output Operator >>>> compressed: true >>>> GlobalTableId: 0 >>>> table: >>>> input format: >>>> >>> org.apache.hadoop.mapred.SequenceFileInputFormat >>> >>>> output format: >>>> >>> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat >>> >>>> >>>> Is there anything in there that should have alerted me? >>>> >>>> I found out by looking at the query, but I wonder if the query plan (if >>>> I could read it) would have given me that information. >>>> >>>> Thanks a lot >>>> >>>> David Morel >>>> >>>> > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Whatever happened to the MACRO facility, Hive-2655

2013-01-26 Thread Dean Wampler
Great! It would be nice to have. dean On Sat, Jan 26, 2013 at 10:30 AM, Edward Capriolo wrote: > That is my fault; I was hoping it would get in because it seems close. I'll > see if I can shove the ticket along. It is a cool feature. > > > On Sat, Jan 26, 2013 at 8:59 A

Whatever happened to the MACRO facility, Hive-2655

2013-01-26 Thread Dean Wampler
We mentioned it in our book and now I realize it's not actually implemented, even in 0.10.0. OOPS!! https://issues.apache.org/jira/browse/HIVE-2655 dean -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: querying objects and list fields

2013-01-25 Thread Dean Wampler
then to be able to query down > into the contexts like above. Is there some way my ObjectInspector could > respond to > > select messageId, lastmodifiedDate,contexts; as if it were select > messageId,lastmodifiedDate.contexts.contextId > but also still respond correctly to > select messageId. lastmodifiedDate.contexts.conceptId > ? > > Thanks for the help, > Lauren > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Real-life experience of forcing smaller input splits?

2013-01-25 Thread Dean Wampler
... > > That will be my approach for now, or disabling compression altogether for > these files. The only problem I have is that compression is so efficient > that any operation in the mapper (so on the uncompressed data) just makes > the mapper throw an OOM exception, no matter how much memory I

Re: Loading a Hive table simultaneously from 2 different sources

2013-01-24 Thread Dean Wampler
r for 1 load job and while the >>> other job loaded the data successfully into the table.. >>> >>> I guess it was because of lock acquired on the table by the first load >>> process. >>> >>> Is there anyway to handle this ? >>> >>> Please give your insights. >>> >>> Regards, >>> Krishnan >>> >>> >>> >> > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: serde jar causing problems in loading other jars.

2013-01-23 Thread Dean Wampler
erFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:255) > at > org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:182) > ... 43 more > > > > On Wed, Jan 23, 2013 at 3:05 PM, Ehsan Haq

Re: Problem with using Postgres as hive meta store DB.

2013-01-23 Thread Dean Wampler
>> You can set standard_conforming_strings = off in postgresql.conf to avoid >> this. >> >> > > > -- > *Muhammad Ehsan ul Haque* > Klarna AB > Norra Stationsgatan 61 > SE-113 43 Stockholm > > Tel: +46 (0)8- 120 120 00 > Fax: +46 (0)8- 120 120 99 > Web: www.klarna.com > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: serde jar causing problems in loading other jars.

2013-01-23 Thread Dean Wampler
fine if put it somewhere else and add it via add jar. > Any idea what might be wrong? > > /Ehsan > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Missing tables!

2013-01-22 Thread Dean Wampler
s > are showing in warehouse. > It seems some configuration error but exact solution is yet to know. > Any idea? > > Regards, > Ashish > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: HWI use on AWS/EMR

2013-01-18 Thread Dean Wampler
ar.file >> >> lib/hive-hwi-0.8.1.war >> >> This is the WAR file with the jsp content for Hive Web >> Interface >> >> Run this command to start up hwi: >> >> hive --service hwi >> >> And finally point your browser

Re: HWI use on AWS/EMR

2013-01-18 Thread Dean Wampler
n Fri, Jan 18, 2013 at 10:06 AM, Dean Wampler < dean.wamp...@thinkbiganalytics.com> wrote: > That's the internal hostname, not visible outside. Use the name like > ec2-NNN-NN-NN-NNN.compute-1.amazonaws.com. It's shown in the EMR console > and the elastic-mapreduce script you

Re: HWI use on AWS/EMR

2013-01-18 Thread Dean Wampler
Gambling Commission (reg. no. > 000-027343-R-308898-001). Any financial promotion contained herein has > been issued > and approved by Sporting Index Ltd. > > Outbound email has been scanned for viruses and SPAM > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Execution of udf

2013-01-18 Thread Dean Wampler
feel > that reduce phase can be there > > > On Friday, January 18, 2013, Dean Wampler wrote: > >> There is no reduce phase needed in this query. >> >> On Fri, Jan 18, 2013 at 6:59 AM, nagarjuna kanamarlapudi < >> nagarjuna.kanamarlap...@gmail.com> wr

Re: Execution of udf

2013-01-18 Thread Dean Wampler
udf at reducer > phase rather than at Mapper phase. > > > Regards, > Nagarjuna > > > -- > Sent from iPhone > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Interaction between Java and Transform Scripts on Hive

2013-01-16 Thread Dean Wampler
in a transform script but not run > without java around? > > I am curious on what steps I can take to trouble shoot or eliminate this > problem. > > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Re: create a hive table: always a tab space before each line

2013-01-14 Thread Dean Wampler
output is continuously used by > hive, it is fine. The problem is that I may use a self-defined map-reduce > job to read these files. Does that mean I have to take care of > this \t by myself? > > Is there any option I can use to disable this \t in hive? > > > > At 2013-01-09 22:38:11
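If a custom MapReduce job does have to consume these rows, dealing with the extra tab is a small parsing step. A minimal Python sketch, assuming (the thread does not confirm the cause) that the leading tab separates an empty record key from the row text, and that fields use the '\001' delimiter declared in the table DDL quoted in the original question; parse_row is a hypothetical helper.

```python
# Matches "fields terminated by '\001'" in the table DDL from the thread.
FIELD_DELIMITER = "\x01"


def parse_row(line: str) -> list[str]:
    """Split one dumped row into its column values.

    Hypothetical helper. Assumes any single leading tab is the separator
    after an empty key, so only that one character is dropped; tabs inside
    field values are left untouched.
    """
    row = line.rstrip("\n")
    if row.startswith("\t"):
        row = row[1:]
    return row.split(FIELD_DELIMITER)
```

Stripping only one leading character, rather than splitting on the first tab, keeps the sketch safe for rows that have no key prefix at all.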

Re: Best practice for automating jobs

2013-01-10 Thread Dean Wampler
h (potentially) > >overlapping job, it will be difficult to keep track of the partitions > >that have been added. In the context of the preceding question, what > >is the best way to add metadata about new partitions? > > > >Thanks in advance! > > > >--Tom > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
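One way to track partitions when jobs overlap is to compare what a job wrote against what the metastore already lists. A minimal Python sketch, assuming the output of SHOW PARTITIONS for the table has been captured as text (one spec per line, such as year=2013/month=1/day=21) and that the job reports the specs it wrote in the same slash-separated form; new_partitions is a hypothetical helper, not part of Hive.

```python
def new_partitions(show_partitions_output: str, written_specs: list[str]) -> list[str]:
    """Return partition specs written by a job that Hive does not know yet.

    Hypothetical helper. Preserves the job's order and drops duplicates,
    which can appear when overlapping jobs write the same partition.
    """
    registered = {
        line.strip()
        for line in show_partitions_output.splitlines()
        if line.strip()
    }
    seen: set[str] = set()
    missing: list[str] = []
    for spec in written_specs:
        if spec not in registered and spec not in seen:
            seen.add(spec)
            missing.append(spec)
    return missing
```

Only the returned specs would then need ALTER TABLE ... ADD PARTITION statements, so rerunning the step after an overlapping job is harmless.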

Re: Stack function in Hive : how to specify multiple aliases?

2013-01-10 Thread Dean Wampler
/Hive/languagemanual-udf.html#LanguageManualUDF-BuiltinTableGeneratingFunctions%2528UDTF%2529 >>>>> >>>>> Hive asks me to provide the multiple aliases for the resulting columns >>>>> ("The number of aliases in the AS clause does not match the number of >>>>> colums output by the UDTF, expected 3 aliases but got 1"). >>>>> >>>>> What's the syntax to provide multiple aliases ? >>>>> >>>>> Thanks, >>>>> Mathieu >>>>> >>>> >>>> >>>> >>>> -- >>>> Nitin Pawar >>>> >>> >>> >> >> >> -- >> Nitin Pawar >> > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
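For reference, stack(n, v1, ..., vk) breaks its k value arguments into n rows of k/n columns, so the AS clause needs k/n aliases, written in parentheses, e.g. SELECT stack(2, a, b, c, d, e, f) AS (x, y, z) FROM src, where src and its columns are hypothetical. The Python sketch below only emulates the row-splitting to make the column count concrete. It is not Hive's implementation, and it rejects a k that is not a multiple of n rather than guess what Hive does in that case.

```python
def stack(n: int, *values):
    """Emulate the shape of Hive's stack(n, v1, ..., vk).

    Splits k values into n rows of k/n columns each. Sketch only: raises
    ValueError unless k is a positive multiple of n.
    """
    k = len(values)
    if n <= 0 or k == 0 or k % n != 0:
        raise ValueError("number of values must be a positive multiple of n")
    width = k // n  # columns per row = number of aliases Hive expects
    return [tuple(values[i * width:(i + 1) * width]) for i in range(n)]
```

With two rows and six values, each row has three columns, which is why Hive reports "expected 3 aliases" when only one alias is given.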

Re: create a hive table: always a tab space before each line

2013-01-09 Thread Dean Wampler
pay bigint, >> spay bigint, >> ipv bigint, >> sellerid string, >> cate string >> ) >> partitioned by(ds string) >> row format delimited fields terminated by '\001' lines terminated by '\n' >> stored as sequencefile >> location '${HADOOP_PATH_4_MY_HIVE}/${HIVETBL_my_table}'; >> >> >> thanks for help. >> >> >> Richard >> >> >> >> >> >> > > > -- > Nitin Pawar > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Map Reduce Local Task

2013-01-08 Thread Dean Wampler
m sure >> how to implement Map reduce local tasks with hash tables. >> >> Good wishes,always ! >> Santosh >> > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: Does Hue (Hadoop User Experience) works with Apache HIVE/HADOOP

2012-12-29 Thread Dean Wampler
can > tell me whether HUE can be used with this setup instead of a CDH Hadoop cluster. > If not, is there any alternative UI similar to HUE? > > Please help. > Thanks, > Chunky. > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330

Re: hive regular expression

2012-12-26 Thread Dean Wampler
grdg&olfll3onsl' > > or > > '?MovieTitle=949303sjkskld&sososodn' > > > how to extract 'MovieTitle=321grgrdg' or 'MovieTitle=949303sjkskld' using > Hive reg expression functions > > > thanks > and happy holidays > > -- *Dean Wampler, Ph.D.* thinkbiganalytics.com +1-312-339-1330
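Since the wanted substring is the literal key followed by everything up to the next &, a negated character class is enough. In Hive this would be along the lines of regexp_extract(col, 'MovieTitle=[^&]*', 0), where col is a placeholder column name and group 0 is the whole match. The Python sketch below checks the same pattern against the example string from the question; Java and Python regex syntax agree for this simple pattern.

```python
import re

# Literal key, then any run of characters that are not '&'.
MOVIE_TITLE = re.compile(r"MovieTitle=[^&]*")


def extract_movie_title(query_string: str):
    """Return the first 'MovieTitle=...' fragment, or None if absent."""
    match = MOVIE_TITLE.search(query_string)
    return match.group(0) if match else None
```

The pattern also works when MovieTitle is the last parameter, because [^&]* simply runs to the end of the string.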

Re: Reflect MySQL updates into Hive

2012-12-24 Thread Dean Wampler
ested flow as follows: > > MySQL ---(Extract / Load)---> HDFS (Table/Year/Month/Day) ---> Load in > Hive as External Table ---(Transform Data & Join Tables)--> Save it in Hive > tables for reporting. > > > Correct? > > Appreciated. > > > -- > Ibrahi

Re: Reflect MySQL updates into Hive

2012-12-24 Thread Dean Wampler
es for reporting. > > My questions are: > >1. What is the best way to reflect MySQL updates into Hive with >minimal resources? >2. Is sqoop the right tool to do the ETL? >3. Is Hive the right tool to do this kind of queries or we should >search for
