Re: how to run jobs every 30 minutes?

2010-12-14 Thread Alejandro Abdelnur
Ed, Actually Oozie is quite different from Cascading. * Cascading allows you to write 'queries' using a Java API and they get translated into MR jobs. * Oozie allows you to compose sequences of MR/Pig/Hive/Java/SSH jobs in a DAG (workflow jobs) and has timer+data dependency triggers (coordinator

Libfb303.jar

2010-12-14 Thread Adarsh Sharma
Dear all, I am using Hadoop-0.20.2 and Hadoopdb Hive on a 5 node cluster. I am connecting Hive through Eclipse but I got the error below : Hive history file=/tmp/hadoop/hive_job_log_hadoop_201012141618_1092196256.txt 10/12/14 16:18:37 INFO exec.HiveHistory: Hive history

Symbol Link as InputFormat Folder

2010-12-14 Thread lamfeeli...@gmail.com
Dear All, I've got a folder A, and a symbolic link folder A' linked to A, but when I add A' as one of the inputformat folders, it gives me this error: Exception in thread "main" org.apache.hadoop.hdfs.protocol.UnresolvedPathException: hdfs://localhost:9000/user/songliu/W at

Re: Task fails: starts over with first input key?

2010-12-14 Thread Keith Wiley
Hmmm, I'll take that under advisement. So, even if I manually avoided redoing earlier work (by keeping a log of which input key/values have been processed and short-circuiting the map() if a key/value has already been processed), you're saying those previously completed key/values would not be

Re: files that don't really exist?

2010-12-14 Thread Allen Wittenauer
On Dec 13, 2010, at 3:14 PM, Seth Lepzelter wrote: Alright, a little further investigation along that line (thanks for the hint, can't believe I didn't think of that), shows that there's actually a carriage return character (%0D, aka \r) at the end of the filename. This falls into

Re: Task fails: starts over with first input key?

2010-12-14 Thread Keith Wiley
On Dec 13, 2010, at 17:58 , li ping wrote: I think the *org.apache.hadoop.mapred.SkipBadRecords* is what you are looking for. Yes, I considered that at one point. I don't like how it insists on iteratively retrying the records. I wish it would simply skip the failed records and move on, just

Re: Task fails: starts over with first input key?

2010-12-14 Thread Keith Wiley
On Dec 14, 2010, at 09:30 , Harsh J wrote: Hi, On Tue, Dec 14, 2010 at 10:43 PM, Keith Wiley kwi...@keithwiley.com wrote: I wish there were a less burdensome version of skipbadrecords. I don't want it to perform a binary search trying to find the bad record while reprocessing data over

Re: how to run jobs every 30 minutes?

2010-12-14 Thread Chris K Wensel
I see it this way. You can glue a bunch of discrete command line apps together that may or may not have dependencies between one another in a new syntax. which is darn nice if you already have a bunch of discrete ready to run command line apps sitting around that need to be strung together,

Re: Question about AvatarNode

2010-12-14 Thread ChingShen
Hi Ted, Thanks for your reply. Shen On Tue, Dec 14, 2010 at 1:37 PM, Ted Yu yuzhih...@gmail.com wrote: Check out the code on github You can find contrib/highavailability/src/java/org/apache/hadoop/hdfs/AvatarZooKeeperClient.java On Sun, Dec 12, 2010 at 11:54 PM, ChingShen

Hive import question

2010-12-14 Thread Mark
When I load a file from HDFS into Hive I notice that the original file has been removed. Is there any way to prevent this? If not, how can I go back and dump it as a file again? Thanks

Re: Hive import question

2010-12-14 Thread 김영우
Hi Mark, You can use 'External table' in Hive. http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL Hive external table does not move or delete files. - Youngwoo 2010/12/15 Mark static.void@gmail.com When I load a file from HDFS into