[CVE-2012-3376] Apache Hadoop HDFS information disclosure vulnerability

2012-07-06 Thread Aaron T. Myers
ctions. The project team will be announcing a release vote shortly for Apache Hadoop 2.0.1-alpha, which will be comprised of the contents of Apache Hadoop 2.0.0-alpha, this security patch, and a few patches for YARN. Best, Aaron T. Myers Software Engineer, Cloudera CVE-2012-3376: Apache Hado

Re: Hadoop Datacenter Setup

2012-01-31 Thread Aaron Tokhy
ael Segel wrote: If you are going this route why not net boot the nodes in the cluster? Sent from my iPhone On Jan 30, 2012, at 8:17 PM, "Patrick Angeles" wrote: Hey Aaron, I'm still skeptical when it comes to flash drives, especially as pertains to Hadoop. The write cycl

Re: Hadoop Datacenter Setup

2012-01-31 Thread Aaron Tokhy
u don't want a large Hive query to knock out your RegionServers thereby causing cascading failures. We were thinking about another cluster that would just run Hive jobs. We do not have that flexibility at the moment. On 01/30/2012 09:17 PM, Patrick Angeles wrote: Hey Aaron, I'm sti

Re: Hadoop Datacenter Setup

2012-01-30 Thread Aaron Tokhy
ed Java VM overhead across services, which comes to a maximum of around 16-20GB used. This gives us around 4-8GB for tasks that would work with HBase. We may also use Hive on the same cluster for queries. On 01/30/2012 05:40 PM, Aaron Tokhy wrote: Hi, Our group is trying to set up a pro

Hadoop Datacenter Setup

2012-01-30 Thread Aaron Tokhy
Hi, Our group is trying to set up a prototype for what will eventually become a cluster of ~50 nodes. Anyone have experiences with a stateless Hadoop cluster setup using this method on CentOS? Are there any caveats with a read-only root file system approach? This would save us from having

Pig Output

2011-12-05 Thread Aaron Griffith
Using PigStorage() my Pig script output gets put into partial files on the Hadoop file system. When I use the copyToLocal function from Hadoop it creates a local directory with all the partial files. Is there a way to copy the partial files from Hadoop into a single local file? Thanks
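For reference, one common route is `hadoop fs -getmerge <hdfs-dir> <local-file>`, which concatenates every part file in an HDFS directory into one local file. The same merge can be done from Java with FileUtil.copyMerge; a minimal sketch, assuming Hadoop APIs of this era and hypothetical paths:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class MergePigOutput {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem hdfs = FileSystem.get(conf);                // HDFS holding the part-* files
    FileSystem local = FileSystem.getLocal(conf);          // local destination file system
    Path partDir = new Path("/user/aaron/pig-output");     // hypothetical Pig output directory
    Path merged = new Path("/tmp/pig-output-merged.txt");  // hypothetical local target file
    // Concatenate every part file under partDir into a single local file.
    FileUtil.copyMerge(hdfs, partDir, local, merged, false, conf, null);
  }
}
```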

December 2011 SF Hadoop User Group

2011-11-16 Thread Aaron Kimball
schedule: - 6pm - Welcome - 6:30pm - Introductions; start creating agenda - Breakout sessions begin as soon as we're ready - 8pm - Conclusion Food and refreshments will be provided, courtesy of Splunk. Please RSVP at http://www.meetup.com/hadoopsf/events/41427512/ Regards, - Aaron Kimball

Writing an Hbase Result object out to SequenceFileOutputFormat

2011-10-24 Thread Aaron Baff
So, I'm trying to write out an Hbase Result object (same one I get from my TableMapper) to a SequenceFileOutputFormat from my Reducer as the value, but I'm getting an error when it's trying to get a serializer. It looks like the SerializationFactory can't find a Serialization (only one listed in

October SF Hadoop Meetup

2011-09-30 Thread Aaron Kimball
up.com/hadoopsf/events/35650052/ Regards, - Aaron Kimball

RE: Running multiple MR Job's in sequence

2011-09-29 Thread Aaron Baff
Yea, we don't want it to sit there waiting for the Job to complete, even if it's just a few minutes. --Aaron -Original Message- From: turboc...@gmail.com [mailto:turboc...@gmail.com] On Behalf Of John Conwell Sent: Thursday, September 29, 2011 10:50 AM To: common-user@hadoop.

RE: Running multiple MR Job's in sequence

2011-09-29 Thread Aaron Baff
obTracker which runs them all in order without the client application needing to do anything further. Sounds like that doesn't really exist as part of the Hadoop framework, and needs something like Oozie (or a home-built system) to do this. --Aaron -Original Message- From: Harsh J [

Running multiple MR Job's in sequence

2011-09-28 Thread Aaron Baff
or is a fire & forget, and occasionally check back to see if it's done. So client-side doesn't need to really know anything or keep track of anything. Does something like that exist within the Hadoop framework? --Aaron

RE: I need help talking to HDFS over a firewall

2011-09-23 Thread Aaron Baff
Are you sure you have the right port number? As you say, if it's been reconfigured, could they have changed the port the NN runs on? Also, could they have changed the hostname of the NN? Could it be that instead of connecting to the NN you're actually trying to connect to one of the datanodes? --

Re: Submitting Jobs from different user to a queue in capacity scheduler

2011-09-18 Thread Aaron T. Myers
o disregard all permissions on HDFS, you can just set the config value "dfs.permissions" to "false" and restart your NN. This is still overkill, but at least you could roll back if you change your mind later. :) -- Aaron T. Myers Software Engineer, Cloudera

RE: Datanodes going down frequently

2011-09-16 Thread Aaron Baff
don't really have a clue why we're seeing this behavior. We're running on FreeBSD with the Diablo-JVM (Java 1.6), which a guy on their list feels is a pretty unusual configuration that people aren't really running. --Aaron -Original Message- From: john smith [mailt

RE: Datanodes going down frequently

2011-09-16 Thread Aaron Baff
n you attach a KVM to a machine when it becomes unreachable and take a look? Or add some monitoring to keep an eye on the network mbufs? Don't know if this is your problem as well or not. --Aaron -Original Message- From: john smith [mailto:js1987.sm...@gmail.com] Sent: Thursday, Se

RE: Datanodes going down frequently

2011-09-15 Thread Aaron Baff
s its check, it resumes talking to the NN and the NN adds it back in. --Aaron -Original Message- From: john smith [mailto:js1987.sm...@gmail.com] Sent: Thursday, September 15, 2011 3:07 PM To: common-user@hadoop.apache.org Subject: Datanodes going down frequently Hi all, I am running a 10

Re: NPE in TaskLogAppender

2011-08-18 Thread aaron morton
n the cassandra config those appenders are not re-activated. I'll put up a patch to make the TaskLogAppender a little safer by checking if it's closed before flush. But we need to keep the diff configs away from each other. Cheers ----- Aaron Morton Freelance Cass

RE: NPE in TaskLogAppender

2011-08-18 Thread Aaron Baff
that would keep my code's static blocks from reconfiguring log4j. -Original Message----- From: aaron morton [mailto:aa...@thelastpickle.com] Sent: Thursday, August 18, 2011 9:30 AM To: common-user@hadoop.apache.org Subject: Re: NPE in TaskLogAppender An update in case anyone else has this p

Re: NPE in TaskLogAppender

2011-08-18 Thread aaron morton
this problem ? Cheers - Aaron Morton Freelance Cassandra Developer @aaronmorton http://www.thelastpickle.com On 15/08/2011, at 2:04 PM, aaron morton wrote: > I'm running the Cassandra Brisk server with Haddop core 20.203 on OSX, > everything is local. > > I

NPE in TaskLogAppender

2011-08-14 Thread aaron morton
* setting mapred.acls.enabled to true * setting mapred.queue.default.acl-submit-job and mapred.queue.default.acl-administer-jobs to * There was no discernible increase in joy though. Any thoughts ? Cheers - Aaron Morton Freelance Cassandra Developer @aaronm

RE: Skipping Bad Records in M/R Job

2011-08-09 Thread Aaron Baff
I'm curious, what error could be thrown that can't be handled via try/catch by catching Exception or Throwable? --Aaron -Original Message- From: Maheshwaran Janarthanan [mailto:ashwinwa...@hotmail.com] Sent: Tuesday, August 09, 2011 10:41 AM To: HADOOP USERGROUP Subject: RE: Sk

RE: Skipping Bad Records in M/R Job

2011-08-09 Thread Aaron Baff
If the 3rd party library is used as part of your Map() function, you could just catch the appropriate Exceptions, and simply not emit that record and return from the Map() normally. --Aaron -Original Message- From: Maheshwaran Janarthanan [mailto:ashwinwa...@hotmail.com] Sent: Tuesday
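A minimal sketch of that approach in the org.apache.hadoop.mapreduce API; ThirdPartyParser is a hypothetical stand-in for whatever library call is failing:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class SkipBadRecordsMapper extends Mapper<LongWritable, Text, Text, Text> {
  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    try {
      // Hypothetical third-party call that throws on malformed input.
      String parsed = ThirdPartyParser.parse(value.toString());
      context.write(new Text(parsed), value);
    } catch (Exception e) {
      // Bad record: count it and return without emitting anything.
      context.getCounter("app", "bad-records").increment(1);
    }
  }
}
```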

RE: next gen map reduce

2011-07-29 Thread Aaron Baff
It was my understanding (could easily be wrong) that 0.21.0 was never going to be considered a stable, production version and 0.22.0 was going to be the next big stable revision. --Aaron -Original Message- From: Roger Chen [mailto:rogc...@ucdavis.edu] Sent: Friday, July 29, 2011 10:20

RE: next gen map reduce

2011-07-28 Thread Aaron Baff
Does this mean 0.22.0 has reached stable and will be released as the stable version soon? --Aaron -Original Message- From: Robert Evans [mailto:ev...@yahoo-inc.com] Sent: Thursday, July 28, 2011 6:39 AM To: common-user@hadoop.apache.org Subject: Re: next gen map reduce It has not been

RE: NullPointerException when running multiple reducers with Hadoop 0.22.0-SNAPSHOT

2011-06-30 Thread Aaron Baff
when submitting from a Windows machine. --Aaron -Original Message- From: Paolo Castagna [mailto:castagna.li...@googlemail.com] Sent: Wednesday, June 29, 2011 11:54 PM To: common-user@hadoop.apache.org Subject: NullPointerException when running multiple reducers with Hadoop 0.22.0-SNAPSHOT

Meetup Announcement: July 2011 SF HUG (7/13/2011)

2011-06-15 Thread Aaron Kimball
Breakout sessions begin as soon as we're ready * 8pm - Conclusion Food and refreshments will be provided, courtesy of CBSi. I hope to see you there! Please RSVP at http://bit.ly/kLpLQR so we can get an accurate count for food and beverages. Cheers, - Aaron Kimball

Make reducer task exit early

2011-06-03 Thread Aaron Baff
the output, and as soon as that is exceeded, simply return at the top of the reduce() function. Is there any way to optimize it even more to tell the Reduce task, "stop reading data, I don't need any more data"? --Aaron
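A sketch of the counting approach described above, assuming a reducer that sums long counts; once the cap is reached each remaining reduce() call returns immediately (the framework still streams the data in, it just isn't processed):

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CappedSumReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
  private static final long LIMIT = 1000;  // hypothetical cap on output records
  private long emitted = 0;

  @Override
  protected void reduce(Text key, Iterable<LongWritable> values, Context context)
      throws IOException, InterruptedException {
    if (emitted >= LIMIT) {
      return;  // cap reached: skip the remaining keys
    }
    long sum = 0;
    for (LongWritable v : values) {
      sum += v.get();
    }
    context.write(key, new LongWritable(sum));
    emitted++;
  }
}
```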

Next SF HUG: June 8, at RichRelevance

2011-05-19 Thread Aaron Kimball
ons begin as soon as we're ready - 8pm - Conclusion Food and refreshments will be provided, courtesy of RichRelevance. If you're going to attend, please RSVP at http://bit.ly/kxaJqa. Hope to see you all there! - Aaron Kimball

NPE during RunningJob.getCounters()

2011-05-03 Thread Aaron Baff
the NPE from the getCounters(). Stack trace is below. Anyone have any ideas what's happening? Is the JobClient not meant to be persistent and I should create a new one every single time? --Aaron java.lang.NullPointerException at org.apache.hadoop.mapred.Counters.downgrade

April SFHUG recap, May SFHUG meetup announcement

2011-04-18 Thread Aaron Kimball
Conclusion Food and refreshments will be provided, courtesy of Cloudera. Please RSVP at http://bit.ly/hwMCI2 Looking forward to seeing you there! Regards, - Aaron Kimball

RE: Are there any Hadoop books in print that use the new API?

2011-04-06 Thread Aaron Baff
I'll volunteer to proof read & test it ;) I've been meaning to get around to using the new API, just haven't had the time to learn it and convert all the existing MR jobs to it. --Aaron -Original Message- From: Mark Kerzner [mailto:markkerz...@gmail.com] Sent: Wednes

Using FileContext for FS API

2011-03-22 Thread Aaron Baff
ntext class states that only the default filesystem and umask are pulled from the Configuration object. Any documentation that I'm missing? Do I need to go look through the source code? --Aaron

RE: Test, please respond

2011-03-22 Thread Aaron Baff
Ok, thanks. Guess I'm just having no luck getting my posts replied to. Aaron Baff | Developer | Telescope, Inc. email: aaron.b...@telescope.tv | office: 424 270 2913 | www.telescope.tv

Test, please respond

2011-03-22 Thread Aaron Baff
Does anyone see this? Can someone at least respond to this to indicate that it's getting to the mailing list fine? I've just gotten 0 replies to a few previous emails so I'm wondering if nobody is seeing these, or if people just don't have any idea. --Aaron

Problem trying to append file

2011-03-21 Thread Aaron Baff
he sysadmin to restart the DFS. This will be early tomorrow at the earliest, but I can try just about any other suggestions. Help! --Aaron 03-21-11 15:58:17 [INFO ] Exception in createBlockOutputStream java.io.EOFException 03-21-11 15:58:17 [WARN ] Error Recovery for block blk_8212105008236569520_

Re: How does sqoop distribute it's data evenly across HDFS?

2011-03-17 Thread Aaron Kimball
ise unavailable, and only a specific subset of the nodes are capable of actually receiving the writes. But in this regard, Sqoop is no different than any other custom MapReduce program you might write; it's not particularly more or less resilient to any pathological conditions of the underlying

Reducer error/hang on small single file Map input

2011-03-15 Thread Aaron Baff
files (5, 10, whichever, just >1), then it runs fine, no issues. Very strange, anyone have any ideas? --Aaron stderr logs log4j:ERROR Failed to flush writer, java.io.InterruptedIOException at java.io.FileOutputStream.writeBytes(Native Method) at java.io.FileOu

SF Hadoop Meetup - March review and April announcement (April 13)

2011-03-11 Thread Aaron Kimball
s begin as soon as we're ready * 8pm - Conclusion Regards, - Aaron Kimball

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Aaron Kimball
ate memory" errors are coming from. If they're from the OS, could it be because it needs to fork() and momentarily exceed the ulimit before loading the native libs? - Aaron On Fri, Mar 4, 2011 at 1:26 PM, Aaron Kimball wrote: > I don't know if putting native-code .so files in

Re: Problem running a Hadoop program with external libraries

2011-03-04 Thread Aaron Kimball
't know if it is true for native libs as well.) - Aaron On Fri, Mar 4, 2011 at 12:53 PM, Ratner, Alan S (IS) wrote: > We are having difficulties running a Hadoop program making calls to > external libraries - but this occurs only when we run the program on our > cluster and not from

Reminder: SF Hadoop meetup in 1 week

2011-03-02 Thread Aaron Kimball
d volunteer to facilitate a discussion. All members of the Hadoop community are welcome to attend. While all Hadoop-related subjects are on topic, this month's discussion theme is "integration." Regards, - Aaron Kimball

Small file Map performance

2011-03-02 Thread Aaron Baff
's its TTL, and then that slot becomes available for another Job. Is there a way to adjust this TTL? Or be able to re-use the JVM for a different Job? This is all with 0.21.0. --Aaron

Error in Reducer stderr preventing reducer from completing

2011-02-25 Thread Aaron Baff
r.freeHost(ShuffleScheduler.java:345) at org.apache.hadoop.mapreduce.task.reduce.Fetcher.run(Fetcher.java:152) Aaron Baff | Developer | Telescope, Inc. email: aaron.b...@telescope.tv | office: 424 270 2913 | www.telescope.tv

March 2011 San Francisco Hadoop User Meetup ("integration")

2011-02-23 Thread Aaron Kimball
the theme of "integration." Yelp has asked that all attendees RSVP in advance, to comply with their security policy. Please join the meetup group and RSVP at http://www.meetup.com/hadoopsf/events/16678757/ Refreshments will be provided. Regards, - Aaron Kimball

RE: How do I get a JobStatus object?

2011-02-18 Thread Aaron Baff
> On Thu, Feb 17, 2011 at 12:09 AM, Aaron Baff wrote: >> I'm submitting jobs via JobClient.submitJob(JobConf), and then waiting until >> it completes with RunningJob.waitForCompletion(). I then want to get how >> long the entire MR takes, which appears t

RE: How do I get a JobStatus object?

2011-02-17 Thread Aaron Baff
't give me access to the data I'm looking for. I'm specifically looking at org.apache.hadoop.mapreduce.JobStatus and its getStartTime() and getFinishTime() methods. The only place I've seen to get a JobStatus object is the JobClient getAllJobs(), getJobsFromQueue(), and job

How do I get a JobStatus object?

2011-02-16 Thread Aaron Baff
way I can see how to do it right now is JobClient.getAllJobs(), which gives me an array of all the jobs that are submitted (currently running? all previous?). Anyone know how I could go about doing this? --Aaron

SF Hadoop meetup report

2011-02-11 Thread Aaron Kimball
etup announcement! Sign up at http://www.meetup.com/hadoopsf/ Regards, - Aaron Kimball

Re: Unable to accesss the HDFS hadoop .21 please help

2011-02-04 Thread Aaron Eng
I think it wants you to type a capital Y, as silly as that may sound... On Feb 4, 2011, at 7:38 AM, ahmednagy wrote: > > I have a cluster with a master and 7 nodes when i try to start hadoop it > starts the mapreduce processes and the hdfs processes on all the nodes. > formated the hdfs but

Re: Benchmarking performance in Amazon EC2/EMR environment

2011-02-01 Thread Aaron Eng
ed in multiple availability zones in the us-west and us-east regions and the experience has been the same. For cc1.4xlarge instances I've only tested in us-east. On Tue, Feb 1, 2011 at 7:48 AM, Steve Loughran wrote: > On 31/01/11 23:22, Aaron Eng wrote: > >> Hi all, >> >

Benchmarking performance in Amazon EC2/EMR environment

2011-01-31 Thread Aaron Eng
Hi all, I was wondering if any of you have had a similar experience working with Hadoop in Amazon's environment. I've been running a few jobs over the last few months and have noticed them taking more and more time. For instance, I was running teragen/terasort/teravalidate as a benchmark and I'v

Re: Click Stream Data

2011-01-30 Thread Aaron Kimball
Start with the student's CS department's web server? I believe the wikimedia foundation also makes the access logs to wikipedia et al. available publicly. That is quite a lot of data though. - Aaron On Sun, Jan 30, 2011 at 10:54 AM, Bruce Williams wrote: > Does anyone know of a so

Re: How do I log from my map/reduce application?

2010-12-15 Thread Aaron Kimball
up(Context context) { logger.setLevel(Level.DEBUG); } - Aaron On Wed, Dec 15, 2010 at 2:23 PM, W.P. McNeill wrote: > I'm running on a cluster. I'm trying to write to the log files on the > cluster machines, the ones that are visible through the jobtracker web > interfa
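A sketch of a mapper that raises its own log4j level in setup(), as described above; the messages then land in the task's "syslog" file visible through the JobTracker web UI:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class LoggingMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
  private static final Logger logger = Logger.getLogger(LoggingMapper.class);

  @Override
  protected void setup(Context context) {
    // Raise this class's level so debug output is not filtered away.
    logger.setLevel(Level.DEBUG);
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    logger.debug("processing record at byte offset " + key.get());
    context.write(value, new LongWritable(1));
  }
}
```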

Re: How do I log from my map/reduce application?

2010-12-15 Thread Aaron Kimball
"syslog" in the right-most column. - Aaron On Mon, Dec 13, 2010 at 10:05 AM, W.P. McNeill wrote: > I would like to use Hadoop's Log4j infrastructure to do logging from my > map/reduce application. I think I've got everything set up correctly, but > I > am still una

Re: Hadoop/Elastic MR on AWS

2010-12-09 Thread Aaron Eng
Pros: - Easier to build out and tear down clusters vs. using physical machines in a lab - Easier to scale up and scale down a cluster as needed Cons: - Reliability. In my experience I've had machines die, had machines fail to start up, had network outages between Amazon instances, etc. These pro

Worker JVM's not terminating quickly with JVM reuse

2010-12-02 Thread Aaron Baff
they need to terminate the worker JVM's? Or is there a setting to reduce the time that the worker JVM's hang around before terminating? --Aaron

Re: Not a host:port pair: local

2010-11-24 Thread Aaron Eng
Can you send the mapred-site.xml config for reference? It could be a formatting issue. I've seen that problem when there was a typo in the XML after hand-editing. On Tue, Nov 23, 2010 at 10:35 AM, Skye Berghel wrote: > On 11/19/2010 10:07 PM, Harsh J wrote: > >> How are you starting your JobTr

Re: Not a host:port pair: local

2010-11-19 Thread Aaron Eng
Maybe try doing a "grep -R local " to see if its picking it up from somewhere in there. Also, maybe try specifying an actual IP instead of myserver as a test to see if name resolution is an issue. On Fri, Nov 19, 2010 at 5:56 PM, Skye Berghel wrote: > I'm trying to set up a Hadoop cluster. Howe

RE: Problem with custom WritableComparable

2010-11-12 Thread Aaron Baff
>On Thu, Nov 11, 2010 at 4:29 PM, Aaron Baff wrote: > >> I'm having a problem with a custom WritableComparable that I created >> to use as a Key object. I basically have a number of identifier's with >> a timestamp, and I'm wanting to group the Identifier&

Problem with custom WritableComparable

2010-11-11 Thread Aaron Baff
f the 2 Identifiers. Have I made a wrong assumption somewhere about how it's supposed to work? Did I do something wrong? --Aaron public class IdentifierTimestampKey implements WritableComparable { private String identifier = ""; private long timestamp = 0L;
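A sketch of how such a key is usually written, reusing the field names quoted in the post; everything beyond those two fields is illustrative, not the poster's original class:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class IdentifierTimestampKey implements WritableComparable<IdentifierTimestampKey> {
  private String identifier = "";
  private long timestamp = 0L;

  public IdentifierTimestampKey() {}  // Hadoop needs a no-arg constructor

  public IdentifierTimestampKey(String identifier, long timestamp) {
    this.identifier = identifier;
    this.timestamp = timestamp;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeUTF(identifier);
    out.writeLong(timestamp);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    identifier = in.readUTF();
    timestamp = in.readLong();
  }

  @Override
  public int compareTo(IdentifierTimestampKey other) {
    int cmp = identifier.compareTo(other.identifier);  // group by identifier first
    if (cmp != 0) return cmp;
    return timestamp < other.timestamp ? -1 : (timestamp == other.timestamp ? 0 : 1);
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof IdentifierTimestampKey)) return false;
    IdentifierTimestampKey k = (IdentifierTimestampKey) o;
    return identifier.equals(k.identifier) && timestamp == k.timestamp;
  }

  @Override
  public int hashCode() {
    // The default HashPartitioner routes keys by hashCode(), so it must agree with equals().
    return identifier.hashCode() * 31 + (int) (timestamp ^ (timestamp >>> 32));
  }
}
```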

Re: Single setup documentation error

2010-11-09 Thread Aaron Eng
>bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z]+' Have you tried specifying the actual file name instead of using the '*' wildcard? On Tue, Nov 9, 2010 at 2:10 PM, Fabio A. Miranda wrote: > Give a fresh installation, I followed the Single Node Setup doc from > hadoop websit

Re: fs.defaultFS value

2010-11-09 Thread Aaron Eng
Did you set the namenode URI? 2010-11-09 15:38:38,255 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority. You should have some config defined in the core-site.xml file similar t

Re: Cluster setup

2010-11-09 Thread Aaron Eng
Hi Fabio, I found this site extremely helpful in explaining how to do a one node setup for a first time user: http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29 On Tue, Nov 9, 2010 at 10:54 AM, Fabio A. Miranda wrote: > Hello, > > > > You don't need 4 mach

San Francisco Hadoop meetup

2010-11-04 Thread Aaron Kimball
in joining us, please fill out the following: * I've created a short survey to help understand days / times that would work for the most people: http://bit.ly/ajK26U * Please also join the meetup group at http://meetup.com/hadoopsf -- We'll use this to plan the event, RSVP information, et

Re: Dumping Cassandra into Hadoop

2010-10-19 Thread aaron morton
read from Cassandra and create a file should be OK. You can then copy it onto HDFS and read from there. Hope that helps. Aaron On 20 Oct 2010, at 04:01, Mark wrote: > As the subject implies I am trying to dump Cassandra rows into Hadoop. What > is the easiest way for me to accomplis

Re: Finding replicants of an HDFS file

2010-10-13 Thread Aaron Myers
. I'm pretty sure you can give it either a single file or a directory, in which case it will show you the details for every file under that directory. Hope that helps. Aaron

Custom Key classes with same parent class

2010-09-14 Thread Aaron Baff
class that distinguishes between the two 'types' based on an additional field? Aaron Baff | Developer | Telescope, Inc. email: aaron.b...@telescope.tv | office: 424 270 2913 | www.telescope.tv

RE: TOP N items

2010-09-10 Thread Aaron Baff
b where you map the Count as the Key, and Item as the Value, use 1 Reducer, and Identity Reduce it (e.g. don't do any reducing, just output the Count,Item). Aaron Baff | Developer | Telescope, Inc. email: aaron.b...@telescope.tv | office: 424 270 2913 | www.telescope.tv
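A sketch of that second, sort-only job, assuming the first job wrote (item, count) pairs to a SequenceFile; with no reducer class set, the default identity Reducer and a single reduce task produce one output file ordered by count (ascending by default; plugging a descending comparator into job.setSortComparatorClass would put the largest counts first):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class TopNSortJob {

  // Flips each (item, count) record to (count, item) so the shuffle sorts by count.
  public static class SwapMapper extends Mapper<Text, LongWritable, LongWritable, Text> {
    @Override
    protected void map(Text item, LongWritable count, Context context)
        throws java.io.IOException, InterruptedException {
      context.write(count, item);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "top-n sort");
    job.setJarByClass(TopNSortJob.class);
    job.setInputFormatClass(SequenceFileInputFormat.class);
    job.setMapperClass(SwapMapper.class);
    job.setNumReduceTasks(1);  // single reducer, identity reduce
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```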

Custom Key class not working correctly

2010-09-10 Thread Aaron Baff
/** * @return the oli */ public byte getOli() { return oli; } /** * @param oli the oli to set */ public void setOli(byte oli) { this.oli = oli; } } =

Re: Partitioned Datasets Map/Reduce

2010-07-05 Thread Aaron Kimball
and /dataset2/part-(n) in your mapper. If you wanted to be more clever, it might be possible to subclass MultiFileInputFormat to group together both datasets "file-number-wise" when generating splits, but I don't have specific guidance here. - Aaron On Sat, Jul 3, 2010 at 9

Re: create error

2010-07-05 Thread Aaron Kimball
Is there a reason you're using that particular interface? That's very low-level. See http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample for the proper API to use. - Aaron On Sat, Jul 3, 2010 at 1:36 AM, Vidur Goyal wrote: > Hi, > > I am trying to create a file in
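A minimal sketch of the kind of FileSystem-level read/write the linked wiki page describes, with a hypothetical path:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();  // picks up core-site.xml from the classpath
    FileSystem fs = FileSystem.get(conf);
    Path file = new Path("/tmp/hello.txt");    // hypothetical path

    // Write a file in HDFS.
    FSDataOutputStream out = fs.create(file, true);  // true = overwrite if present
    out.writeUTF("hello, hdfs");
    out.close();

    // Read it back.
    FSDataInputStream in = fs.open(file);
    System.out.println(in.readUTF());
    in.close();
  }
}
```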

Re: Text files vs. SequenceFiles

2010-07-05 Thread Aaron Kimball
ight? For data at either "edge" of your problem--either input or final output data--you might want the greater ubiquity of text-based files. - Aaron On Fri, Jul 2, 2010 at 3:35 PM, Joe Stein wrote: > David, > > You can also set compression to occur of your data between your map

Re: Is it possible ....!!!

2010-06-10 Thread Aaron Kimball
estion is probably more "correct," but might incur additional work on your part. Cheers, - Aaron On Thu, Jun 10, 2010 at 3:54 PM, Allen Wittenauer wrote: > > On Jun 10, 2010, at 3:25 AM, Ahmad Shahzad wrote: > > Reason for doing that is that i want all the communication to happen

Re: help on CombineFileInputFormat

2010-05-10 Thread Aaron Kimball
urce at http://github.com/cloudera/sqoop) :) Cheers, - Aaron On Thu, May 6, 2010 at 7:32 AM, Zhenyu Zhong wrote: > Hi, > > I tried to use CombineFileInputFormat in 0.20.2. It seems I need to extend > it because it is an abstract class. > However, I need to implement getRecordRe

Sqoop is moving to github!

2010-03-29 Thread Aaron Kimball
f you have any questions about this move process, please ask me. Regards, - Aaron Kimball Cloudera, Inc.

Re: Sqoop Installation on Apache Hadop 0.20.2

2010-03-17 Thread Aaron Kimball
rate package to install Sqoop independent of the rest of CDH; thus no extra download link on our site. I hope this helps! Good luck, - Aaron On Wed, Mar 17, 2010 at 4:30 AM, Reik Schatz wrote: > At least for MRUnit, I was not able to find it outside of the Cloudera > distribution (CDH). Wha

Re: CombineFileInputFormat in 0.20.2 version

2010-03-16 Thread Aaron Kimball
rsion of Hadoop and recompile, but that might be tricky since the filenames will most likely not line up (due to the project split). - Aaron On Tue, Mar 16, 2010 at 8:11 AM, Aleksandar Stupar < stupar.aleksan...@yahoo.com> wrote: > Hi all, > > I want to use CombineFileInputFormat i

Re: Unexpected termination of a job

2010-03-03 Thread Aaron Kimball
If it's terminating before you even run a job, then you're in luck -- it's all still running on the local machine. Try running it in Eclipse and use the debugger to trace its execution. - Aaron On Wed, Mar 3, 2010 at 4:13 AM, Rakhi Khatwani wrote: > Hi, >I am ru

Re: Separate mail list for streaming?

2010-03-03 Thread Aaron Kimball
We've already got a lot of mailing lists :) If you send questions to mapreduce-user, are you not getting enough feedback? - Aaron On Wed, Mar 3, 2010 at 12:09 PM, Michael Kintzer wrote: > Hi, > > Was curious if anyone else thought it would be useful to have a separate > mail li

Re: dataset

2010-03-03 Thread Aaron Kimball
Look at implementing your own Partitioner implementation to control which records are sent to which reduce shards. - Aaron On Wed, Mar 3, 2010 at 12:15 PM, Gang Luo wrote: > Hi all, > I want to generate some datasets with data skew to test my mapreduce jobs. > I am using TPC-DS but i
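A minimal sketch of a custom Partitioner under the new API; the routing rule here (first letter of the key) is just an arbitrary illustration of steering records to particular reduce shards:

```java
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class FirstLetterPartitioner extends Partitioner<Text, LongWritable> {
  @Override
  public int getPartition(Text key, LongWritable value, int numPartitions) {
    String k = key.toString();
    char first = k.isEmpty() ? '_' : Character.toLowerCase(k.charAt(0));
    // Map the chosen feature of the key onto the available reduce shards.
    return (first & Integer.MAX_VALUE) % numPartitions;
  }
}
```

It is registered on the job with job.setPartitionerClass(FirstLetterPartitioner.class).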

Re: Why is $JAVA_HOME/lib/tools.jar in the classpath?

2010-02-17 Thread Aaron Kimball
(including the one in the most recently-released CDH2: 0.20.1+169.56-1) include MAPREDUCE-1146 which eliminates that dependency. - Aaron On Tue, Feb 16, 2010 at 3:19 AM, Steve Loughran wrote: > Thomas Koch wrote: > >> Hi, >> >> I'm working on the Debian package fo

Re: mapred.system.dir

2010-02-12 Thread Aaron Kimball
pose of the config file comment is to let you know that you're free to pick a path name like "/system/mapred" here even though your local Linux machine doesn't have a path named "/system"; this HDFS path is in a separate (HDFS-specific) namespace from "/home",

Re: Ubuntu Single Node Tutorial failure. No live or dead nodes.

2010-02-12 Thread Aaron Kimball
job, that's another story, but you can accomplish that with: $HADOOP_HOME/hadoop dfsadmin -safemode wait ... which will block until HDFS is ready for user commands in read/write mode. - Aaron On Fri, Feb 12, 2010 at 8:44 AM, Sonal Goyal wrote: > Hi > > I had faced a similar i

Re: Identity Reducer

2010-02-11 Thread Aaron Kimball
Can you post the entire exception with its accompanying stack trace? - Aaron On Thu, Feb 11, 2010 at 5:26 PM, Prabhu Hari Dhanapal < dragonzsn...@gmail.com> wrote: > @ Jeff > I seem to have used the Mapper you are pointing to ... > > > import org.apache.hadoop.mapred.Ma

Re: Any alternative for MultipleOutputs class in hadoop 0.18.3

2010-02-11 Thread Aaron Kimball
There's an older mechanism called MultipleOutputFormat which may do what you need. - Aaron On Fri, Feb 5, 2010 at 10:13 AM, Udaya Lakshmi wrote: > Hi, > MultipleOutput class is not available in hadoop 0.18.3. Is there any > alternative for this class? Please point me useful li
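A sketch of that older mechanism from the org.apache.hadoop.mapred API (which is what 0.18.3 offers): subclass MultipleTextOutputFormat and derive the output file name from each record:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.lib.MultipleTextOutputFormat;

// Writes each record under a directory named after its key,
// e.g. <output>/<key>/part-00000.
public class KeyBasedOutputFormat extends MultipleTextOutputFormat<Text, Text> {
  @Override
  protected String generateFileNameForKeyValue(Text key, Text value, String leafName) {
    return key.toString() + "/" + leafName;
  }
}
```

The job then uses it via JobConf.setOutputFormat(KeyBasedOutputFormat.class).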

Re: map side only behavior

2010-01-31 Thread Aaron Kimball
dvice. - Aaron On Fri, Jan 29, 2010 at 8:32 AM, Jones, Nick wrote: > A single unity reducer should enforce a merge and sort to generate one > file. > > Nick Jones > > -Original Message- > From: Jeff Zhang [mailto:zjf...@gmail.com] > Sent: Friday, January 29, 20

Re: DBOutputFormat Speed Issues

2010-01-31 Thread Aaron Kimball
pretty good support for parallel imports, and uses this InputFormat instead. - Aaron On Thu, Jan 28, 2010 at 11:39 AM, Nick Jones wrote: > Hi all, > I have a use case for collecting several rows from MySQL of > compressed/unstructured data (n rows), expanding the data set, and storin

Re: hadoop under cygwin issue

2010-01-31 Thread Aaron Kimball
Brian, it looks like you missed a step in the instructions. You'll need to format the hdfs filesystem instance before starting the NameNode server: You need to run: $ bin/hadoop namenode -format .. then you can do bin/start-dfs.sh Hope this helps, - Aaron On Sat, Jan 30, 2010 at 12:

Re: build and use hadoop-git

2010-01-24 Thread Aaron Kimball
See http://wiki.apache.org/hadoop/HowToContribute for more step-by-step instructions. - Aaron On Fri, Jan 22, 2010 at 7:36 PM, Kay Kay wrote: > Start with hadoop-common to start building . > > hadoop-hdfs / hadoop-mapred pull the dependencies from apache snapshot > repository that

Re: Multiple file output

2010-01-07 Thread Aaron Kimball
Note that org.apache.hadoop.mapreduce.lib.output.MultipleOutputs is scheduled for the next CDH 0.20 release -- ready "soon." - Aaron 2010/1/6 Amareshwari Sri Ramadasu > No. It is part of branch 0.21 onwards. For 0.20*, people can use old api > only, though JobCo

Re: Problems on configure FairScheduler

2009-12-11 Thread Aaron Kimball
You'll need to configure mapred.fairscheduler.allocation.file to point to your fairscheduler.xml file; this file must contain at least the following: - Aaron On Thu, Dec 10, 2009 at 10:34 PM, Rekha Joshi wrote: > What’s your hadoop version/distribution? In anycase, to eliminate th

Re: Re: Re: Re: Doubt in Hadoop

2009-11-30 Thread Aaron Kimball
You need to send a jar to the cluster so it can run your code there. Hadoop doesn't magically know which jar is the one containing your main class, or that of your mapper/reducer -- so you need to tell it via that call so it knows which jar file to upload. - Aaron On Sun, Nov 29, 2009 at 7:

Re: Processing 10MB files in Hadoop

2009-11-27 Thread Aaron Kimball
f course, by the time you've got several hundred GB of data to work with, your current workload imbalance issues should be moot anyway. - Aaron On Fri, Nov 27, 2009 at 4:33 PM, CubicDesign wrote: > > > Aaron Kimball wrote: > >> (Note: this is a tasktracker setting, not a jo

Re: Processing 10MB files in Hadoop

2009-11-27 Thread Aaron Kimball
ual records require around a minute each to process as you claimed earlier, you're nowhere near in danger of hitting that particular performance bottleneck. - Aaron On Thu, Nov 26, 2009 at 12:23 PM, CubicDesign wrote: > > > Are the record processing steps bound by a local machine

Re: Good idea to run NameNode and JobTracker on same machine?

2009-11-27 Thread Aaron Kimball
fault.name and mapred.job.tracker; when the day comes that these services are placed on different nodes, you'll then be able to just move one of the hostnames over and not need to reconfigure all 20--40 other nodes. - Aaron On Thu, Nov 26, 2009 at 8:27 PM, Srigurunath Chakravarthi < srig..

Re: part-00000.deflate as output

2009-11-27 Thread Aaron Kimball
ppear in plaintext if a human operator is inspecting the output for debugging. - Aaron On Thu, Nov 26, 2009 at 4:59 PM, Mark Kerzner wrote: > It worked! > > But why is it "for testing?" I only have one job, so I need by related as > text, can I use this fix all the time? &

Re: Re: Doubt in Hadoop

2009-11-27 Thread Aaron Kimball
When you set up the Job object, do you call job.setJarByClass(Map.class)? That will tell Hadoop which jar file to ship with the job and to use for classloading in your code. - Aaron On Thu, Nov 26, 2009 at 11:56 PM, wrote: > Hi, > I am running the job from command line. The job runs f
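A minimal driver sketch showing where that call fits; the empty Map and Reduce classes stand in for the job's own code:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDriver {
  // Placeholder identity stages; the real map()/reduce() bodies go here.
  public static class Map extends Mapper<LongWritable, Text, LongWritable, Text> {}
  public static class Reduce extends Reducer<LongWritable, Text, LongWritable, Text> {}

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "my job");
    job.setJarByClass(Map.class);  // ship the jar containing this class to the cluster
    job.setMapperClass(Map.class);
    job.setReducerClass(Reduce.class);
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(Text.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```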

Re: RE: please help in setting hadoop

2009-11-27 Thread Aaron Kimball
(probably either 'root' or 'hadoop') will need the ability to mkdir /home/hadoop/hadoop-root underneath of /home/hadoop. If that directory doesn't exist, or is chown'd to someone else, this will probably be the result. - Aaron On Thu, Nov 26, 2009 at 10:22

Re: error setting up hdfs?

2009-11-10 Thread Aaron Kimball
' --> directory not found. When you mkdir'd 'lol', you were actually effectively doing "mkdir -p /user/hadoop/lol", so then it created your home directory underneath of that. - Aaron On Tue, Nov 10, 2009 at 1:30 PM, zenkalia wrote: > ok, things are working.. i mu
