Re: how to write a SerDe

2009-07-24 Thread Zheng Shao
Sorry about the delay on this. Here are several example SerDes that got added to the code base recently: RegexSerDe: a SerDe for parsing text using a regex (and an example for parsing Apache logs with a regex) https://issues.apache.org/jira/browse/HIVE-167 contrib/src/java/org/apache/hadoop/hi
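As a sketch of how such a SerDe is wired up at table-creation time (the column names and regex below are illustrative, loosely modeled on the Apache-log example attached to HIVE-167, not copied from it):

```sql
-- Sketch: parse Apache access-log lines with the contrib RegexSerDe.
-- Each capturing group in input.regex maps to one column, in order.
CREATE TABLE apache_log (
  host STRING, identity STRING, user STRING, time STRING,
  request STRING, status STRING, size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)"
)
STORED AS TEXTFILE;
```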

Re: loading data from HDFS or local file to

2009-07-24 Thread Zheng Shao
Hi Manhee, You don't need to do "load" for an external table. You already specified the location of the external table in the "create external table" command, so you can directly use that external table. Zheng On Wed, Jul 22, 2009 at 7:12 PM, Manhee Jo wrote: > Hi Zheng, > > I've tried to load a
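A minimal sketch of what Zheng describes (path and columns are hypothetical): once the external table's LOCATION points at existing data, it is queryable immediately, with no LOAD step.

```sql
-- Hypothetical example: LOCATION already points at an HDFS directory
-- that contains the data, so no LOAD DATA is needed afterwards.
CREATE EXTERNAL TABLE weblogs (line STRING)
ROW FORMAT DELIMITED
LOCATION '/user/manhee/weblogs';

SELECT COUNT(1) FROM weblogs;  -- works directly against the files in LOCATION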

Re: Importing log files in custom (non-delimited) format

2009-07-24 Thread Saurabh Nanda
Hi Zheng, Thanks for the reply, but I gave up on UDFs & SerDe and resorted to custom map/reduce scripts instead. In case you're interested, I've written about my Hive experience at http://nandz.blogspot.com/2009/07/using-hive-for-weblog-analysis.html Saurabh. On Thu, Jul 23, 2009 at 2:15 AM, Zhe
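For context, the custom map/reduce-script route Saurabh mentions is typically expressed in Hive with TRANSFORM; a rough sketch (script name and output columns are hypothetical):

```sql
-- Sketch: stream rows through an external script instead of a SerDe/UDF.
ADD FILE parse_log.py;
SELECT TRANSFORM (line)
USING 'python parse_log.py'
AS (host STRING, time STRING, request STRING)
FROM weblogs_raw;
```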

Re: bz2 Splits.

2009-07-24 Thread Saurabh Nanda
Please excuse my ignorance, but can I import gzip-compressed files directly as Hive tables? I have a separate gzip file for each day's weblog data. Right now I am gunzipping them and then importing into a raw table. Can I import the gzipped files directly into Hive? Saurabh. On Wed, Jul 22, 2009 at

Re: Re: bz2 Splits.

2009-07-24 Thread bcraig7
Have not checked gzip out yet, but Hive is happy with .bz2 files. The documentation on this is spotty; it seems that any Hadoop-supported compression will work. The issue with .gz files is that they are not splittable. That is, one map will process an entire file, so if your .gz files are
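A minimal sketch of loading a compressed file directly (the file name is hypothetical); Hadoop selects the decompression codec from the file extension at read time:

```sql
-- Sketch: compressed text files can be loaded as-is; the BZip2 codec
-- decompresses them transparently when the table is scanned.
CREATE TABLE weblogs_raw (line STRING) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/var/logs/access-20090722.log.bz2'
INTO TABLE weblogs_raw;
```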

Re: Re: bz2 Splits.

2009-07-24 Thread Neal Richter
gz files work fine. We're attaching daily directories of gzipped logs in S3 as Hive table partitions. Best to have your log rotator do hourly rotation to create lots of gz files for better mapping. Or one could use zcat, split, and gzip to divide into smaller chunks if you really only have one gz
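The zcat/split/gzip approach mentioned above can be sketched like this (chunk size and file names are illustrative):

```shell
# Sketch: break one large .gz into smaller gzipped chunks so each chunk
# can be processed by its own mapper (a single .gz file is not splittable).
seq 1 100 | gzip > access.log.gz            # stand-in for a real weblog file
zcat access.log.gz | split -l 40 - chunk_   # 40-line pieces: chunk_aa, chunk_ab, ...
for f in chunk_*; do gzip "$f"; done        # recompress each piece
ls chunk_*.gz
```

Each resulting `chunk_*.gz` can then sit in its own partition directory, or all of them in one table directory, and Hadoop will assign one map per file.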

Hive Class path libjars, auxjars, etc

2009-07-24 Thread Edward Capriolo
I have been following some threads on the hadoop mailing list about speeding up MR jobs. I have a few questions; I am sure I could find the answers if I dug into the source code, but I thought I could get a quick answer. 1. ADD JAR 'myfile.jar' uses the distributed cache. Using the distributed cache

Re: Hive Class path libjars, auxjars, etc

2009-07-24 Thread Zheng Shao
Hive only needs to be installed at the node that runs the hive query. All the jars will be sent to the hadoop JobClient via -libjars. The code is in ExecDriver.java. In hadoop 0.17, I don't think there is a way to add a path to classpath for a job (unless we put it in hadoop-env.sh and start TaskT
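A sketch of the per-session mechanism Zheng describes (jar path and class name are hypothetical):

```sql
-- ADD JAR ships the jar to the cluster via the distributed cache
-- (passed to the JobClient as -libjars, handled in ExecDriver.java).
ADD JAR /home/edward/myudfs.jar;
CREATE TEMPORARY FUNCTION my_lower AS 'com.example.hive.udf.Lower';
SELECT my_lower(host) FROM weblogs;
```

By contrast, jars supplied at startup (e.g. via `hive --auxpath`) go onto Hive's own classpath for the whole session rather than being added per query.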

Re: Hive Class path libjars, auxjars, etc

2009-07-24 Thread Edward Capriolo
On Fri, Jul 24, 2009 at 1:36 PM, Zheng Shao wrote: > Hive only needs to be installed at the node that runs the hive query. > All the jars will be sent to the hadoop JobClient via -libjars. The > code is in ExecDriver.java. > > In hadoop 0.17, I don't think there is a way to add a path to > classpat

error code 2

2009-07-24 Thread Keven Chen
I wrote a simple program that runs a query to pull some information from the database. Only one query is called, with different parameters. It was going fine at the beginning, but it throws exceptions once the query has been executed many times (around 70 times). Here is the error information: 09/07

Re: error code 2

2009-07-24 Thread Prasad Chakka
The information below is not enough to figure out what is going on. Send the complete stack trace from /tmp//hive.log. You can change the log level to DEBUG in log4j.properties and rerun the query. Also, if the mappers/reducers are failing, check the logs of the failed tasks from hadoop. Prasa
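The log-level change Prasad suggests would look roughly like this in Hive's log4j.properties (a sketch; the exact property name and appender may vary by Hive version):

```
# Raise Hive's log verbosity from the default so the full stack trace
# of the failing query is written to the hive.log file.
hive.root.logger=DEBUG,DRFA
```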