scalable
> than Mahout for classification/regression tasks, please check it by
> yourself. If you have a Hive environment, you can evaluate Hivemall
> within 5 minutes or so.
>
> Hope you enjoy the release! Feedback (and pull request) is always welcome.
>
> Thank you,
> Makoto
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
> stability of the code, it does indicate that the project has yet to be
> fully endorsed by the ASF.
>
> Regards,
>
> Sentry team
>
>
>
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
Unfortunately, I believe there's no way to do this.
Sent from my rotary phone.
On Sep 13, 2013, at 6:42 PM, Sanjay Subramanian
wrote:
> Hi guys
>
> I have to load data into the following data type in hive
>
> map >
>
> Is there a way to define custom SEPARATORS (while creating the tabl
og data and put it in HDFS. I want to use
> Hive to do some calculations and queries based on time range. I want to use a partitioned
> table,
> but the data file in HDFS is one big file. How can I put it into a partitioned
> table in Hive?
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
rivileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
(as in every
> udf, or every input format) but I believe the language manual surely does.
>
> Please review the current wiki and discuss the concept of moving the
> language manual to source control, or suggest other options.
>
> Thank you,
> Edward
>
>
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
I wonder if the message is misleading. Could there be a problem with the
metastore?
1. MySQL isn't running or the JDBC connection is wrong.
2. You upgraded to a newer Hive and didn't migrate the metadata
3. ...?
I'm speculating here. Perhaps the logs have useful info.
Dean
Sent from my rota
rg/Hive/languagemanual-udf.html
>>
>> says:
>>
>> A RLIKE B
>> if A or B is NULL, TRUE if any (possibly empty) substring of A
>> matches the Java regular expression B, otherwise FALSE. E.g. 'foobar'
>> RLIKE 'foo' evaluates to FALSE whereas 'foobar' RLIKE '^f.*r$'
>> evaluates to TRUE.
>>
>> 1) "if A or B is NULL" seems like an unfinished part.
>> 2) "any (possibly empty) substring of A [that] matches the Java
>> regular expression B" should be "foo" at 0 for 'foobar' RLIKE 'foo',
>> and result in TRUE, right?
>>
>
>
>
> --
> Lefty
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
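The RLIKE question quoted above comes down to "find a match anywhere in A" semantics. Here is a minimal Python sketch of that behavior. It assumes NULL can be modeled as None and that Python's `re` dialect is close enough to Java's for these simple patterns; it is an illustration, not Hive's implementation.

```python
import re

def rlike(a, b):
    """Approximate `A RLIKE B`: True if any substring of `a` matches regex `b`.

    Assumption: NULL inputs are modeled as None and propagate as None.
    """
    if a is None or b is None:
        return None
    # re.search scans for a match anywhere in the string, which is the
    # "any substring of A matches" behavior the quoted text describes.
    return re.search(b, a) is not None

print(rlike("foobar", "foo"))     # True: "foo" matches at offset 0
print(rlike("foobar", "^f.*r$"))  # True: the anchored pattern spans the string
print(rlike("foobar", "^bar"))    # False: "bar" is not at the start
print(rlike(None, "foo"))         # None
```

Under this reading, 'foobar' RLIKE 'foo' comes out TRUE, which is what the poster expected.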
> Regards,
>
> Omkar Joshi
>
> ** **
>
> ------
> The contents of this e-mail and any attachment(s) may contain confidential
> or privileged information for the intended recipient(s). Unintended
> recipients are prohibited from taking action on the basis of information in
> this e-mail and using or disseminating the information, and must notify the
> sender and delete it from their system. L&T Infotech will not accept
> responsibility or liability for the accuracy or completeness of, or the
> presence of any virus or disabling code in this e-mail"
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
ive does not support IN clause.
> Then what is the effective replacement for this? i need to execute around
> 250 inputs. I'm using hive 0.9.0 version.
>
> Please guide me.
>
>
> Thanks,
> Manickam P
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
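For the question above about replacing an IN list of around 250 values, one possible workaround is to generate an equivalent chain of OR comparisons. The sketch below only builds the predicate text. The column name `id_col` and the sample values are hypothetical, and the helper refuses values containing quotes or backslashes rather than guessing at escaping rules.

```python
def or_predicate(column, values):
    """Build "(col = 'v1' OR col = 'v2' ...)" as a stand-in for col IN (...)."""
    if not values:
        raise ValueError("need at least one value")
    parts = []
    for v in values:
        # Keep the sketch safe: do not attempt to escape awkward characters.
        if "'" in v or "\\" in v:
            raise ValueError(f"refusing to quote {v!r}")
        parts.append(f"{column} = '{v}'")
    return "(" + " OR ".join(parts) + ")"

# `id_col` and the values are made up for illustration.
print(or_predicate("id_col", ["a1", "b2", "c3"]))
# (id_col = 'a1' OR id_col = 'b2' OR id_col = 'c3')
```

Another common approach, not shown here, is to load the values into a small table and join against it.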
write separate UDF for each?
> Please let me know.
>
>
>
> Thanks,
> Manickam P
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
ed recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.*
> ***
>
> ** **
>
> ** **
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
, in my observation of some more complex
>>> queries, the second solution is about 15% faster than the first solution,
>>> is it simply because the setting of reducer num is not optimal?
>>> If the resource is not a limit and it is possible to set the proper
>>> reducer nums in the first solution , can they achieve the same performance?
>>> Is there any other fact that can cause performance difference between
>>> them(non-partition VS partition+concurrent) besides the job parameter
>>> issues?
>>>
>>> Thanks!
>>>
>>
>>
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
at 9:34 PM, Stephen Sprague wrote:
> look at it the other around if you want. knowing an array of a two
> element struct is topologically the same as a map - they darn well better
> be the same. :)
>
>
>
> On Thu, Jun 20, 2013 at 7:00 PM, Dean Wampler wrote:
>
>>
t;>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>>> at
>>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
>>> at
>>> org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
>>> at
>>> org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
>>> at
>>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>>> at
>>> org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:543)
>>> at
>>> org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
>>> at
>>> org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100)
>>> ... 18 more
>>> Caused by: java.lang.ClassNotFoundException:
>>> org.codehaus.jackson.JsonFactory
>>>
>>> what am i doing wrong here? the jackson-core-asl-1.8.8.jar is in the
>>> $HIVE_HOME/lib directory ...
>>>
>>> SHOW FUNCTIONS;
>>>
>>> shows me that these functions are in there ... i already tried
>>> downgrading to hive 0.10 but the error is the same over there. i need to
>>> work with hadoop 0.20, so unfortunately i can't try hadoop 1.x.x
>>>
>>> thanks in advance
>>> cheers
>>> Wolli
>>>
>>
>>
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
process. Hive used ^A for the
field separator, ^B for the collection separator, in this case, to separate
structs in the array, and ^C to separate the elements in each struct, e.g.,:
Dean Wampler^Afirst^C1^Bsecond^C2^Bthird^C3
In other words, the structure you would expect for this table:
CREAT
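The delimiter layout described above (^A between fields, ^B between array elements, ^C between struct members) can be checked with a short Python sketch that parses the sample row. The struct member types (a string and an integer) are an assumption, since the table definition is cut off in this snippet.

```python
FIELD_SEP = "\x01"       # ^A separates top-level fields
COLLECTION_SEP = "\x02"  # ^B separates the structs inside the array
ITEM_SEP = "\x03"        # ^C separates the members of each struct

def parse_row(line):
    """Split one text row into (name, [(label, number), ...])."""
    name, raw_array = line.split(FIELD_SEP, 1)
    structs = []
    for raw_struct in raw_array.split(COLLECTION_SEP):
        label, number = raw_struct.split(ITEM_SEP)
        structs.append((label, int(number)))  # int is an assumed member type
    return name, structs

# The sample row from the message, with the control characters written out.
row = "Dean Wampler\x01first\x031\x02second\x032\x02third\x033"
print(parse_row(row))
# ('Dean Wampler', [('first', 1), ('second', 2), ('third', 3)])
```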
I confirmed it is a pirate site.
Sent from my rotary phone.
On Jun 11, 2013, at 10:33 AM, Edward Capriolo wrote:
> For reference, any that puts the entire book online like this is likely
> pirated.
>
>
>
>
> On Tue, Jun 11, 2013 at 8:34 AM, Richa Sharma
> wrote:
>> Hi all,
>>
>> Found
that the rlike is based on regex and can be told to do
> case insensitive matching.
>
>
> On Fri, May 24, 2013 at 9:16 AM, Dean Wampler wrote:
>
>> Hortonworks has announced plans to make Hive more SQL compliant. I
>> suspect bugs like this will be addressed sooner or later
to
> include it in training that may or may not work. I've added this comment
> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278 for
> fun. :)
>
> Please? :)
>
>
>
>
> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler wrote:
>
>> Your where
eviation l
>
>
> Unlike MySQL, string comparison in Hive is case sensitive, so '%A%' is not equal to
> '%a%'.
>
>
> --
> Jov
> blog: http://amutu.com/blog
>
>
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
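The case-sensitivity point in this thread can be illustrated with a small Python sketch. It mimics a `LIKE '%needle%'` test for a literal needle and shows the two usual workarounds: lowercasing both sides, or using a regex with the inline `(?i)` flag. This only models the matching logic in Python; it does not run Hive.

```python
import re

def contains(haystack, needle, ignore_case=False):
    """Mimic haystack LIKE '%needle%' for a literal needle."""
    if ignore_case:
        # Same idea as comparing lower(column) against a lowercased pattern.
        haystack, needle = haystack.lower(), needle.lower()
    return needle in haystack

print(contains("bar", "A"))                    # False: case-sensitive by default
print(contains("bar", "A", ignore_case=True))  # True after lowercasing both sides
print(re.search("(?i)A", "bar") is not None)   # True: regex with the (?i) flag
```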
ny unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.*
> ***
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
>
>
> Where should I create the HDFS directory ?
>
>
> *From:* Sanjay Subramanian
> *To:* "user@hive.apache.org" ; Raj Hadoop <
> hadoop...@yahoo.com>; Dean Wampler
> *Cc:* User
> *Sent:* Tuesday, May 21, 2013 1:53 PM
>
> *Subject:* Re: hive.met
variable hive.metastore.warehouse.dir
>
> Thanks,
> Raj
>
>
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
eNote.jspa?version=12323587&styleName=Html&projectId=12310843
>
> We would like to thank the many contributors who made this release
> possible.
>
> Regards,
>
> The Apache Hive Team
>
--
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com
> at
> org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
>
>
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>
>
> at
> java.util.concur
>>> is a better partition strategy?
>>>
>>> Thanks.
>>>
>>>
>>>
>>> CONFIDENTIALITY NOTICE
>>> ==
>>> This email message and any attachments are for the exclusive use of the
>>> intended recipient(s) and may contain confidential and privileged
>>> information. Any unauthorized review, use, disclosure or distribution is
>>> prohibited. If you are not the intended recipient, please contact the
>>> sender by reply email and destroy all copies of the original message along
>>> with any attachments, from your computer system. If you are the intended
>>> recipient, please be advised that the content of this message is subject to
>>> access, review and disclosure by the sender's Email System Administrator.
>>>
>>
>>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
e I understand this correctly. All databases and tables
> are stored in hive.metastore.warehouse.dir but the actual metadata for
> the database and tables (columns, types, partitions, etc) are stored in the
> hive database (ie.. mysql)?
>
> Is that correct?
>
--
al tables have to be managed tables; not external tables,
> right?
> .
> Thank again for your time and help.
>
> Sadu
>
>
>
> On Fri, Mar 29, 2013 at 5:57 PM, Dean Wampler <
> dean.wamp...@thinkbiganalytics.com> wrote:
>
>> I don't know of an
> jobs while creating the Avro files?
>
> Any help / insight would greatly be appreciated.
>
> Thank you very much for your time and help.
>
> Sadu
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
e and any attachments are for the exclusive use of the
> intended recipient(s) and may contain confidential and privileged
> information. Any unauthorized review, use, disclosure or distribution is
> prohibited. If you are not the intended recipient, please contact the
> sender by reply email and destroy all copies of the original message along
> with any attachments, from your computer system. If you are the intended
> recipient, please be advised that the content of this message is subject to
> access, review and disclosure by the sender's Email System Administrator.
> >
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
if you're using a compression scheme supported by Hadoop.
>
> Thanks
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
> * join*
>> * (select distinct s from test2table)table2*
>> * on table1.s=table2.s*
>>
>>
>> How do I use TABLESAMPLE in this case to sample the results of the outer
>> query? I tried placing TABLESAMPLE(BUCKET 1 OUT OF 4 ON s) in various
>> places of my query but it always returns some sort of syntax error and thus
>> not allowing the query to run.
>>
>> Any help is appreciated.
>>
>> Robert
>> **
>>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:601)
> at org.apache.hadoo
wes wrote:
> Hi All,
>
> Could anyone describe what the required thread safety for a UDF is? I
> understand that one is instantiated for each use of the function in an
> expression, but can there be multiple threads executing the methods of a
> single UDF object at once?
>
>
Duplicate entry 'X' for key 'PRIMARY'
>>>>at
>>>> org.datanucleus.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:313)
>>>>at
>>>> org.datanucleus.jdo.JDOPersistenceManager.jdoMakePersistent(JDOPersistenceManager.java:660
log:2013-03-08
> 08:46:54,394 ERROR o.apache.hadoop.hive.ql.exec.FileSinkOperator:
> StatsPublishing error: cannot connect to database
>
> Please suggest if I need to set anything in Hive when I invoke this query.
> The query that runs successfully has far fewer rows than the one that
> fails.
>
> Thanks,
> DK
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
nd like this:
>
> hive --auxpath path_to_jars, it works fine to query my table,
>
> but if I use the add jar after I started the hive session, I will get
> ClassNotFoundException in the runtime of my query of the classes in those
> jars.
>
> My questions are:
>
> 1) What is the difference between hive --auxpath and "add jar" in the hive
> session?
> 2) This problem makes it hard to access my table in HUE, as it only
> supports "add jar", but not the --auxpath option. Any suggestions?
>
>
> Thanks
>
> Yong
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
here is any command in Hive which will show us the
>> current db we r using similar to pwd in Unix.
>> Thanks
>> Sai
>>
>>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
> at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
> ... 23 more
> FAILED: Execution Error, return code -101 from
> org.apache.hadoop.hive.ql.exec.DDLTask
>
>
> Any help would be really appreciated.
> Thanks
> Sai
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
uld like to access/display country column from my address struct.
>
> I have tried this:
>
> select address["country"] from employees;
>
> I get an error.
>
> Please help.
>
> Thanks
>
> Sai
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
"Luminous beings are we, not this crude matter."
> -- Yoda
>
>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
(
> clndr_dt >= "2013-02-01" AND clndr_dt <= "2013-02-10" ) LIMIT 1
>
> I was originally planning to use this for partition pruning, but it
> doesn't appear to be the cause as the calendar table is not partitioned.
>
> Is there something that I've overlooked?
>
> Thanks!
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
h
> that
> itched but would never itch the scratch from the itch that scratched."
> -- Keith Wiley
>
>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
splay.
>> If so where is the location of the results.
>> Thanks
>> Sai
>>
>
>
>
> --
> Nitin Pawar
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
s the case, since I'm in pseudo distrib for the moment, my number
> of mappers =1, so I could try to configure my setup with additional mappers.
>
>
> Does this make sense ?
>
> Thank you for your help !
>
> Sekine
>
>
>
>
> 2013/3/4 Dean Wampler
>
>
something
> group by something=something
>
> to
>
> select really_expensive_select_clause
> from
> (
> select
> *
> from
> really_big_table
> limit 100
> )t
> where
> something=something
> group by something=something
>
>
> On Tue, Mar 5, 2013 at
before I run it
> against all records.
>
> Is there a way to test the query against a small subset of the data,
> without going into full MapReduce? As silly as this sounds, is there a way
> to MapReduce without the overhead of MapReduce? That way I can check my
> query is doing wh
RMINATED BY '\t' LOCATION '/tmp/states' ;
>
> Any help is really appreciated.
> Thanks
> Sai
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
>
> 13/03/05 19:22:09 INFO mapreduce.ExportJobBase: Exported 0 records.
>
> 13/03/05 19:22:09 ERROR tool.ExportTool: Error during export: Export job
> failed!
>
> *[hadoop@NHCLT-PC44-2 sqoop-oper]$*
>
> * *
>
> *Regards,*
>
> *Ajit Kumar Shreevastava*
>
>
>
> ::DISCLAIMER::
>
>
>
> The contents of this e-mail and any attachment(s) are confidential and
> intended for the named recipient(s) only.
> E-mail transmission is not guaranteed to be secure or error-free as
> information could be intercepted, corrupted,
> lost, destroyed, arrive late or incomplete, or may contain viruses in
> transmission. The e mail and its contents
> (with or without referred errors) shall therefore not attach any liability
> on the originator or HCL or its affiliates.
> Views or opinions, if any, presented in this email are solely those of the
> author and may not necessarily reflect the
> views or opinions of HCL or its affiliates. Any form of reproduction,
> dissemination, copying, disclosure, modification,
> distribution and / or publication of this message without the prior
> written consent of authorized representative of
> HCL is strictly prohibited. If you have received this email in error
> please delete it and notify the sender immediately.
> Before opening any email and/or attachments, please check them for viruses
> and other defects.
>
>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
file? Here the tables are of different format
> types.
>
> Regards,
> Kumar
>
>
>
> -Original Message-
> From: Dean Wampler
> To: user
> Sent: Fri, Mar 1, 2013 12:23 pm
> Subject: Re: doubt with LEFT OUTER JOIN
>
> I just tried an experiment where
I thought getting values in columns would speed up the aggregate process.
> Maybe the dataset is too small to tell, or I missed something ? Will adding
> Snappy compression help (not sure whether RCFiles are compressed or not) ?
>
> Thank you !
>
>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
IGHT SIDE table doesn't
> have at least one record that matches JOIN condition in Hive?
>
> Regards,
> Kumar
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
would work as the set of things to remove is
> massive.
> Yeah, it's a one-off cleanup job while exporting to try redshift on our
> datasets.
> My guess is it's something about the way hive handles strings? Tried
> "\\ufffd" as the replacement str but no joy either
> Tom
>
>
> [1]
> http://en.wikipedia.org/wiki/Plane_%28Unicode%29#Basic_Multilingual_Plane
> [2]
> http://grokbase.com/t/hive/dev/131a4n562y/unicode-character-as-delimiter
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
Wow! You guys are my new best friends!
Seriously, I'm grateful you've found my participation in the list and the
book helpful. I'm sure Ed and Jason would agree (at least about the book ;)
Yours,
dean
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
On Sun, Feb
","colnameyhgb":["1234","12345","2345","56789"],"colnamepoix":["12","4567","123","5678"],"colnamedswer":["100","567","123","678"],"colnamewerui"
":"test_name2","_ts":"2012-01-13","_ip":"IP2"}
> {"_u":"test_name3","_ts":"2012-01-13","_ip":"IP3"}
>
>
> When I query :-
> select uname from table_test;
>
> Output :-
> NULL 13Feb2012
> NULL 13Feb2012
> NULL 13Feb2012
>
>
> Please help me and let me know how to add json data in a table.
>
> Thanks,
> Chunky.
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
te:
> Hi,
>
> Where can I get an install/download of Hive 0.7.0 or 0.7.1?
>
> Thx…
>
> Regards,
>
> Vince George
>
> Composite Software
>
> Mobile: 201-519-3777
>
>
the directory--wasn't clear on that..
>
> Joey
>
>
>
> --
> *From:* Dean Wampler
> *To:* user@hive.apache.org; Joseph D Antoni
> *Sent:* Friday, February 15, 2013 11:37 AM
> *Subject:* Re: CREATE EXTERNAL TABLE Fails on Some Directories
>
> You confirmed that 715 is an
lines terminated by '\n'
> stored as textfile
> location '/715/file.csv';
>
> This is failing with:
>
> Error in Metadata MetaException(message:Got except:
> org.apache.hadoop.fs.FileAlreadyExistsException Parent Path is not a
> directory: /715 715...
>
> Li
schema as DDL so that we can keep it in under
>> version control. We could replicate the entire metastore would we just
>> want 4 tables out of the 100 we already have in place.
>>
>> Thanks,
>> murtaza
>>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
Operator.java:40)
> at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:357)
> at
> org.apache.hadoop.hive.ql.exec.ExecReducer.configure(ExecReducer.java:150)
> ... 14 more
> 2013-02-13 23:30:29,819 INFO org.apache.hadoop.mapred.TaskInProgress:
> TaskInProgress task
> but when I do the above I get:
>
> FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into
> target table because column number/types are different 'oc': Cannot convert
> column 0 from struct to struct.
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
because of difference in
> format.
> >
> > Is there any way to set the timestamp format while creating the table.
> Or is there some other solution for this issue ?
> >
> > Thanks,
> > Chunky.
>
> --
> Alexander Alten-Lorenz
> http://mapredit.blogspot.com
> German Hadoop LinkedIn Group: http://goo.gl/N8pCF
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
e.org/docs/r0.7.0/api/org/apache/hadoop/hive/metastore/api/ThriftHiveMetastore.Client.html#get_table(java.lang.String
> ,
> java.lang.String)
>
> Table t =
> t.getSd().getLocation()
>
>
> On Tue, Feb 12, 2013 at 9:41 AM, Dean Wampler
> wrote:
> > I'll men
er solution for safely data transfer??
>
> --
> *Muhammad Hamza Asad *
> +923457261988
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
library.
> 2. Use HiveMetastoreClient[2]
>
> Is this correct? If yes, how to read the hive configuration[3] from
> HIVE_CONF_DIR?
>
> [1] http://mvnrepository.com/artifact/org.apache.hive/hive-metastore
> [2]
> http://hive.apache.org/docs/r0.7.1/api/org/apache/hadoop/hive/met
o rows where the offset of
> the two rows are no more than 1 character apart.
>
> Is this type of data manipulation possible, and if it is, could someone
> point me in the right direction, hopefully with some explanation?
>
> Kind regards
> Martijn
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
> StreamThread.run(): Java heap space
>>>>> Cause: null
>>>>> 2013-01-29 08:27:34,277 WARN
>>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: java.lang.OutOfMemoryError:
>>>>> Java heap space
>>>>> at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>>>> at java.lang.String.<init>(String.java:215)
>>>>> at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
>>>>> at java.nio.CharBuffer.toString(CharBuffer.java:1157)
>>>>> at org.apache.hadoop.io.Text.decode(Text.java:350)
>>>>> at org.apache.hadoop.io.Text.decode(Text.java:327)
>>>>> at org.apache.hadoop.io.Text.toString(Text.java:254)
>>>>> at java.lang.String.valueOf(String.java:2826)
>>>>> at java.lang.StringBuilder.append(StringBuilder.java:115)
>>>>> at
>>>>> org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:873)
>>>>> at
>>>>> org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
>>>>> at
>>>>> org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
>>>>> at
>>>>> org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
>>>>> at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>>>>> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>>>>> at
>>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator$OutputStreamProcessor.processLine(ScriptOperator.java:477)
>>>>> at
>>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator$StreamThread.run(ScriptOperator.java:563)
>>>>>
>>>>> 2013-01-29 08:27:34,306 INFO
>>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: ErrorStreamProcessor
>>>>> calling
>>>>> reporter.progress()
>>>>> 2013-01-29 08:27:34,307 INFO
>>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: StreamThread ErrorProcessor
>>>>> done
>>>>> 2013-01-29 08:27:34,307 ERROR
>>>>> org.apache.hadoop.hive.ql.exec.ScriptOperator: Script failed with code 1
>>>>>
>>>>
>>>>
>>>
>>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
n bug?
>>
>> Thanks,
>> Hardik.
>>
>> PS:- My alter command: ALTER TABLE hardiktest CHANGE COLUMN col1 col2
>> array>.
>>
>
>
>
> --
> Nitin Pawar
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
>>>>>> For example, say the M/R job loads files into the following 3
>>>>>> sub-folders
>>>>>>
>>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=21
>>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=22
>>>>>> /user/hive/warehouse/sales/year=2013/month=1/day=23
>>>>>>
>>>>>> Then it should create 3 alter table statements
>>>>>>
>>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=21);
>>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=22);
>>>>>> ALTER TABLE sales ADD PARTITION (year=2013, month=1, day=23);
>>>>>>
>>>>>> I thought of changing M/R jobs to load all files into same folder,
>>>>>> then first load the files into non-partitioned table and then to load the
>>>>>> partitioned table from non-partitioned table (using dynamic partition);
>>>>>> but
>>>>>> would prefer to avoid that extra step if possible (esp. since data is
>>>>>> already in the correct sub-folders).
>>>>>>
>>>>>> Any help would greatly be appreciated.
>>>>>>
>>>>>> Regards,
>>>>>> Sadu
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
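The quoted request above (one ALTER TABLE statement per partition folder that the M/R job wrote) can be scripted. Here is a minimal Python sketch that derives each statement from the directory path. It assumes every `key=value` path segment is a partition column and that the values can be emitted unquoted, as in the quoted example; string-valued partitions would need quoting.

```python
def add_partition_statement(table, path):
    """Turn .../year=2013/month=1/day=21 into an ALTER TABLE ... ADD PARTITION."""
    # Assumption: every path segment containing "=" is a partition key=value pair.
    parts = [seg for seg in path.split("/") if "=" in seg]
    if not parts:
        raise ValueError(f"no partition segments in {path!r}")
    return f"ALTER TABLE {table} ADD PARTITION ({', '.join(parts)});"

paths = [
    "/user/hive/warehouse/sales/year=2013/month=1/day=21",
    "/user/hive/warehouse/sales/year=2013/month=1/day=22",
    "/user/hive/warehouse/sales/year=2013/month=1/day=23",
]
for p in paths:
    print(add_partition_statement("sales", p))
```

The generated lines could be written to a file and run from the Hive CLI after the job finishes, which avoids the extra load through a non-partitioned table.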
artitioned table and then to load the
>> partitioned table from non-partitioned table (using dynamic partition); but
>> would prefer to avoid that extra step if possible (esp. since data is
>> already in the correct sub-folders).
>>
>> Any help would greatly be appreciated.
>>
>> Regards,
>> Sadu
>>
>>
>>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
Thanks!
On Tue, Jan 29, 2013 at 5:34 AM, Navis류승우 wrote:
> HIVE-446 - Implement TRUNCATE : is on trunk (v0.11.0)
>
> HIVE-887 - Allow SELECT without a mapreduce job : It needs "set
> hive.fetch.task.conversion=more"
>
> 2013/1/29 Dean Wampler :
> >
Oh, another one is
https://issues.apache.org/jira/browse/HIVE-446 - Implement TRUNCATE.
The CLI doesn't recognize it.
dean
On Mon, Jan 28, 2013 at 11:44 AM, Dean Wampler <
dean.wamp...@thinkbiganalytics.com> wrote:
> I've noticed a few JIRA items for new features that a
tried use MR.
Did I misread the JIRA items?
dean
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
ndition expressions:
>>>> 0 {VALUE._col0} {VALUE._col1}
>>>> 1 {VALUE._col1}
>>>> handleSkewJoin: false
>>>> outputColumnNames: _col0, _col1, _col3
>>>> File Output Operator
>>>> compressed: true
>>>> GlobalTableId: 0
>>>> table:
>>>> input format:
>>>>
>>> org.apache.hadoop.mapred.SequenceFileInputFormat
>>>
>>>> output format:
>>>>
>>> org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
>>>
>>>>
>>>> Is there anything in there that should have alerted me?
>>>>
>>>> I found out by looking at the query, but I wonder if the query plan (if
>>>> I could read it) would have given me that information.
>>>>
>>>> Thanks a lot
>>>>
>>>> David Morel
>>>>
>>>>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
Great! It would be nice to have.
dean
On Sat, Jan 26, 2013 at 10:30 AM, Edward Capriolo wrote:
> That is my fault. I was hoping it would get in because it seems close. I'll
> see if I can shove the ticket along. It is a cool feature.
>
>
> On Sat, Jan 26, 2013 at 8:59 A
We mentioned it in our book and now I realize it's not actually
implemented, even in 0.10.0. OOPS!!
https://issues.apache.org/jira/browse/HIVE-2655
dean
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
then to be able to query down
> into the contexts like above. Is there some way my ObjectInspector could
> respond to
>
> select messageId, lastmodifiedDate,contexts; as if it were select
> messageId,lastmodifiedDate.contexts.contextId
> but also still respond correctly to
> select messageId. lastmodifiedDate.contexts.conceptId
> ?
>
> Thanks for the help,
> Lauren
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
...
>
> That will be my approach for now, or disabling compression altogether for
> these files. The only problem I have is that compression is so efficient
> that any operation in the mapper (so on the uncompressed data) just makes
> the mapper throw an OOM exception, no matter how much memory I
r for 1 load job and while the
>>> other job loaded the data successfully into the table..
>>>
>>> I guess it was because of lock acquired on the table by the first load
>>> process.
>>>
>>> Is there anyway to handle this ?
>>>
>>> Please give your insights.
>>>
>>> Regards,
>>> Krishnan
>>>
>>>
>>>
>>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
erFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:255)
> at
> org.datanucleus.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:182)
> ... 43 more
>
>
>
> On Wed, Jan 23, 2013 at 3:05 PM, Ehsan Haq
>
>> You can set standard_conforming_strings = off in postgresql.conf to avoid
>> this.
>>
>>
>
>
> --
> *Muhammad Ehsan ul Haque*
> Klarna AB
> Norra Stationsgatan 61
> SE-113 43 Stockholm
>
> Tel: +46 (0)8- 120 120 00
> Fax: +46 (0)8- 120 120 99
> Web: www.klarna.com
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
> fine if put it somewhere else and add it via add jar.
>Any idea what might be wrong?
>
> /Ehsan
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
s
> are showing in warehouse.
> It seems to be some configuration error, but the exact solution is not yet known.
> Any idea?
>
> Regards,
> Ashish
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
ar.file
>>
>> lib/hive-hwi-0.8.1.war
>>
>> This is the WAR file with the jsp content for Hive Web
>> Interface
>>
>>
>>
>> Run this command to start up hwi:
>>
>> hive --service hwi
>>
>> And finally point your browser
n Fri, Jan 18, 2013 at 10:06 AM, Dean Wampler <
dean.wamp...@thinkbiganalytics.com> wrote:
> That's the internal hostname, not visible outside. Use the name like
> ec2-NNN-NN-NN-NNN.compute-1.amazonaws.com. It's shown in the EMR console
> and the elastic-mapreduce script you
Gambling Commission (reg. no.
> 000-027343-R-308898-001). Any financial promotion contained herein has
> been issued
> and approved by Sporting Index Ltd.
>
> Outbound email has been scanned for viruses and SPAM
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
feel
> that reduce phase can be there
>
>
> On Friday, January 18, 2013, Dean Wampler wrote:
>
>> There is no reduce phase needed in this query.
>>
>> On Fri, Jan 18, 2013 at 6:59 AM, nagarjuna kanamarlapudi <
>> nagarjuna.kanamarlap...@gmail.com> wr
udf at reducer
> phase rather than at Mapper phase.
>
>
> Regards,
> Nagarjuna
>
>
> --
> Sent from iPhone
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
in a transform script but not run
> without java around?
>
> I am curious on what steps I can take to trouble shoot or eliminate this
> problem.
>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
output is continuously used by
> hive, it is fine. The problem is that I may use a self-defined map-reduce
> job to read these files. Does that mean I have to take care of
> this \t by myself?
>
> is there any option that I can disable this \t in hive?
>
>
>
> At 2013-01-09 22:38:11
h (potentially)
> >overlapping job, it will be difficult to keep track of the partitions
> >that have been added. In the context of the preceding question, what
> >is the best way to add metadata about new partitions?
> >
> >Thanks in advance!
> >
> >--Tom
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
/Hive/languagemanual-udf.html#LanguageManualUDF-BuiltinTableGeneratingFunctions%2528UDTF%2529
>>>>>
>>>>> Hive asks me to provide the multiple aliases for the resulting columns
>>>>> ("The number of aliases in the AS clause does not match the number of
>>>>> colums output by the UDTF, expected 3 aliases but got 1").
>>>>>
>>>>> What's the syntax to provide multiple aliases ?
>>>>>
>>>>> Thanks,
>>>>> Mathieu
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Nitin Pawar
>>>>
>>>
>>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
> 41 pay bigint,
>> 42 spay bigint,
>> 43 ipv bigint,
>> 44 sellerid string,
>> 45 cate string
>> 46 )
>> 47 partitioned by(ds string)
>> 48 row format delimited fields terminated by '\001' lines terminated by '\n'
>> 49 stored as sequencefile
>> 50 location '${HADOOP_PATH_4_MY_HIVE}/${HIVETBL_my_table}';
>>
>>
>> thanks for help.
>>
>>
>> Richard
>>
>>
>>
>>
>>
>>
>
>
> --
> Nitin Pawar
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
m sure
>> how to implement Map reduce local tasks with hash tables.
>>
>> Good wishes,always !
>> Santosh
>>
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
can
> tell me whether HUE can be used with this setup instead of a CDH Hadoop cluster.
> If not, then is there any alternate UI similar to HUE.
>
> Please help.
> Thanks,
> Chunky.
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
grdg&olfll3onsl'
>
> or
>
> '?MovieTitle=949303sjkskld&sososodn'
>
>
> how to extract 'MovieTitle=321grgrdg' or 'MovieTitle=949303sjkskld' using
> Hive reg expression functions
>
>
> thanks
> and happy holidays
>
>
--
*Dean Wampler, Ph.D.*
thinkbiganalytics.com
+1-312-339-1330
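For the MovieTitle question above, the extraction amounts to matching `MovieTitle=` followed by everything up to the next `&`. Here is a minimal Python sketch of that pattern. Whether the same pattern behaves identically when passed to Hive's `regexp_extract` is an assumption that has not been tested here.

```python
import re

# "MovieTitle=" followed by one or more characters that are not "&".
TITLE_PATTERN = re.compile(r"MovieTitle=[^&]+")

def extract_title(query_string):
    """Return the 'MovieTitle=...' pair, or None if it is absent."""
    match = TITLE_PATTERN.search(query_string)
    return match.group(0) if match else None

print(extract_title("?MovieTitle=949303sjkskld&sososodn"))
# MovieTitle=949303sjkskld
```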
ested flow as follows:
>
> MySQL ---(Extract / Load)---> HDFS (Table/Year/Month/Day) ---> Load in
> Hive as External Table ---(Transform Data & Join Tables)--> Save it in Hive
> tables for reporting.
>
>
> Correct?
>
> Appreciated.
>
>
> --
> Ibrahi
es for reporting.
>
> My questions are:
>
>1. What is the best way to reflect MySQL updates into Hive with
>minimal resources?
>2. Is sqoop the right tool to do the ETL?
>3. Is Hive the right tool to do this kind of queries or we should
>search for