Re: Cygwin not working with Hadoop and Eclipse Plugin

2011-07-26 Thread James Seigel
Try using virtual box/vmware and downloading either an image that has hadoop on it or a linux image and installing it there. Good luck James. On 2011-07-26, at 12:33 PM, A Df wrote: Dear All: I am trying to run Hadoop on Windows 7 so as to test programs before moving to Unix/Linux. I

Re: Writing out a single file

2011-07-05 Thread James Seigel
Single reducer. On 2011-07-05, at 9:09 AM, Mark wrote: Is there any way I can write out the results of my mapreduce job into 1 local file... ie the opposite of getmerge? Thanks

Re: Sanity check re: value of 10GbE NICs for Hadoop?

2011-06-28 Thread James Seigel
If you are very adhoc-y, more bandwidth the merry-er! James Sent from my mobile. Please excuse the typos. On 2011-06-28, at 5:03 PM, Matei Zaharia ma...@eecs.berkeley.edu wrote: Ideally, to evaluate whether you want to go for 10GbE NICs, you would profile your target Hadoop workload and

Re: Poor IO performance on a 10 node cluster.

2011-05-30 Thread James Seigel
Not sure that will help ;) Sent from my mobile. Please excuse the typos. On 2011-05-30, at 9:23 AM, Boris Aleksandrovsky balek...@gmail.com wrote:

Re: No. of Map and reduce tasks

2011-05-26 Thread James Seigel
have more data for it to process :) On 2011-05-26, at 4:30 PM, Mohit Anchlia wrote: I ran a simple pig script on this file: -rw-r--r-- 1 root root 208348 May 26 13:43 excite-small.log that orders the contents by name. But it only created one mapper. How can I change this to distribute

Re: No. of Map and reduce tasks

2011-05-26 Thread James Seigel
mohitanch...@gmail.com wrote: I think I understand that by last 2 replies :) But my question is can I change this configuration to say split file into 250K so that multiple mappers can be invoked? On Thu, May 26, 2011 at 3:41 PM, James Seigel ja...@tynt.com wrote: have more data for it to process

Re: Why Only 1 Reducer is running ??

2011-05-20 Thread James Seigel Tynt
The job could be designed to use one reducer On 2011-05-20, at 7:19 AM, praveenesh kumar praveen...@gmail.com wrote: Hello everyone, I am using wordcount application to test on my hadoop cluster of 5 nodes. The file size is around 5 GB. Its taking around 2 min - 40 sec for execution. But

Re: What's the easiest way to count the number of Key, Value pairs in a directory?

2011-05-20 Thread James Seigel
The cheapest way would be to check the counters as you write them in the first place and keep a running score. :) Sent from my mobile. Please excuse the typos. On 2011-05-20, at 10:35 AM, W.P. McNeill bill...@gmail.com wrote: I've got a directory with a bunch of MapReduce data in it. I want

Re: Reducer granularity and starvation

2011-05-18 Thread James Seigel
W.P., Hard to help out without knowing more about the characteristics of your data? How many keys are you expecting? How many values per key? Cheers James. On 2011-05-18, at 3:25 PM, W.P. McNeill wrote: I'm working on a cluster with 360 reducer slots. I've got a big job, so when I launch

Re: Reducer granularity and starvation

2011-05-18 Thread James Seigel
W.P, Sounds like you are going to be taking a long time no matter what. With a keyspace of about 10^7 that means that either hadoop is going to eventually allocate 10^7 reducers (if you set your reducer count to 10^7) or is going to re-use the ones you have 10^6 / (number of reducers you

Re: Reducer granularity and starvation

2011-05-18 Thread James Seigel
W.P, Upping the reduce.tasks to a huge number just means that it will eventually spawn reducers = to (that huge number). You still only have slots for 360 so there is no real advantage, UNLESS you are running into OOM errors, which we’ve seen with higher re-use on the smaller number of
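The slot arithmetic in the message above can be sketched in a few lines. This is only an illustration, assuming every reduce task occupies exactly one slot and that tasks run in back-to-back rounds; the function name `reduce_waves` is made up for this example and is not a Hadoop API.

```python
import math


def reduce_waves(num_reduce_tasks: int, reduce_slots: int) -> int:
    """Rounds needed to run all reduce tasks when only `reduce_slots` run at once.

    Simplifying assumption: one task per slot, and a new round starts only
    when slots free up. Real schedulers are less tidy than this.
    """
    if num_reduce_tasks < 0 or reduce_slots <= 0:
        raise ValueError("need a non-negative task count and at least one slot")
    return math.ceil(num_reduce_tasks / reduce_slots)


# 360 slots, as in the thread: asking for more tasks adds rounds, not parallelism.
print(reduce_waves(360, 360))
print(reduce_waves(3600, 360))
```

Under these assumptions, raising the task count beyond the slot count only lengthens the queue. The benefit mentioned in the message (less data per task, so fewer OOM errors) comes from each task being smaller, not from more tasks running at once.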

Re: so many failures on reducers.

2011-05-02 Thread James Seigel
:37 hadoop-juneng.2 To Harsh. yes, our cluster has 96 occupied reducer slots. and my job is using 90 reduce tasks at one time to complete it. thanks for all. Junyoung Kim (juneng...@gmail.com) On 05/02/2011 08:32 PM, James Seigel wrote: What are your permissions on your

Re: number of maps it lower than the cluster capacity

2011-05-01 Thread James Seigel
Also an input split size thing in there as well. But definitely # of mappers are proportional to input data size Sent from my mobile. Please excuse the typos. On 2011-05-01, at 11:26 AM, ShengChang Gu gushengch...@gmail.com wrote: If I'm not mistaken,then the map slots = input data size /
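As a rough illustration of "mappers are proportional to input data size", here is a back-of-the-envelope estimate. It assumes a single splittable file and ignores the per-file boundaries and rounding slack that the real input format applies; `estimated_map_tasks` is a hypothetical helper, and the 64 MB split size is an assumed value, not one taken from the thread.

```python
import math


def estimated_map_tasks(input_bytes: int, split_bytes: int) -> int:
    """Approximate number of map tasks for one splittable input file."""
    if input_bytes < 0 or split_bytes <= 0:
        raise ValueError("need non-negative input size and positive split size")
    return math.ceil(input_bytes / split_bytes)


small_file = 208348              # size of the excite-small.log file quoted in this listing
assumed_split = 64 * 1024 * 1024  # assumption: a 64 MB split size

print(estimated_map_tasks(small_file, assumed_split))
print(estimated_map_tasks(small_file, 50_000))
```

With a file far smaller than the split size the estimate is one mapper, which matches the "have more data for it to process" reply elsewhere in this listing. Shrinking the split size is the other lever.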

Re: fair scheduler issue

2011-04-26 Thread James Seigel
I know cloudera has a bug in their version. They should have filed a Jira for it. Are you getting NPE in the logs? James Sent from my mobile. Please excuse the typos. On 2011-04-26, at 6:53 AM, Saurabh bhutyani s4saur...@gmail.com wrote: Which version of hadoop are you referring to? Thanks

Re: Fixing a bad HD

2011-04-25 Thread James Seigel
Quicker: Shut off power Throw hard drive out put new one in Turn power back on. Sent from my mobile. Please excuse the typos. On 2011-04-25, at 5:38 PM, Mayuran Yogarajah mayuran.yogara...@casalemedia.com wrote: Hello, One of our nodes has a bad hard disk which needs to be replaced. I'm

Re: Fixing a bad HD

2011-04-25 Thread James Seigel
become inaccessible between boots if you simultaneously lose another node. Probably not an issue at 3 replicas, but definitely an issue at 2. Brian On Apr 25, 2011, at 7:58 PM, James Seigel wrote: Quicker: Shut off power Throw hard drive out put new one in Turn power back on. Sent

Re: HDFS permission denied

2011-04-24 Thread James Seigel
Check where the hadoop tmp setting is pointing to. James Sent from my mobile. Please excuse the typos. On 2011-04-24, at 12:41 AM, Peng, Wei wei.p...@xerox.com wrote: Hi, I need a help very bad. I got an HDFS permission error by starting to run hadoop job

Re: Error while using distcp

2011-04-18 Thread James Seigel
Same versions of hadoop in each cluster? Sent from my mobile. Please excuse the typos. On 2011-04-18, at 6:31 PM, sonia gehlot sonia.geh...@gmail.com wrote: Hi All, I am trying to copy files from one hadoop cluster to another hadoop cluster but I am getting following error:

Re: Estimating Time required to compute M/Rjob

2011-04-17 Thread James Seigel Tynt
Yup. I'm boring On 2011-04-17, at 6:07 PM, Ted Dunning tdunn...@maprtech.com wrote: Turing completion isn't the central question here, really. The truth is, map-reduce programs have considerable pressure to be written in a scalable fashion which limits them to fairly simple behaviors that

Re: Dynamic Data Sets

2011-04-14 Thread James Seigel Tynt
If all the seigel/seigal/segel gang don't chime in It'd be weird. What size of data are we talking? James On 2011-04-14, at 11:06 AM, Michael Segel michael_se...@hotmail.com wrote: James, If I understand you get a set of immutable attributes, then a state which can change. If

Re: Including Additional Jars

2011-04-04 Thread James Seigel
James’ quick and dirty, get your job running guideline: -libjars -- for jars you want accessible by the mappers and reducers classpath or bundled in the main jar -- for jars you want accessible to the runner Cheers James. On 2011-04-04, at 12:31 PM, Shuja Rehman wrote: well...i think to

Hadoop in Canada

2011-03-29 Thread James Seigel
. It might be harder to grab coffee, but it would be fun to see where everyone is. Shout out if you’d like or ping me, I think it’d be fun to chat! Cheers James Seigel Captain Hammer at Tynt.com

Re: Hadoop in Canada

2011-03-29 Thread James Seigel
that I think about it, there's probably enough Canucks around here that use Hadoop that we could have our own little user group. If you want to have a nice vacation and geek out with us, feel free to stop by and say hi. /rant J-D On Tue, Mar 29, 2011 at 6:21 AM, James Seigel ja

Re: CDH and Hadoop

2011-03-23 Thread James Seigel
If you are using one of the supported platforms, then it is easy to get up and going fairly quickly as well. ...advice from another seigel/segel Cheers james. On 2011-03-23, at 9:32 AM, Michael Segel wrote: Rita, It sounds like you're only using Hadoop and have no intentions to really

Re: Backupnode web UI showing upgrade status..

2011-03-22 Thread James Seigel
There is a step which flips a bit. finalizeupgrade or something that needs to be run. Should be straightforward Cheers James Sent from my mobile. Please excuse the typos. On 2011-03-22, at 7:32 AM, Gokulakannan M gok...@huawei.com wrote: Hi all, A newbie question reg backupnode . I just

Re: Is there any way to add jar when invoking hadoop command

2011-03-22 Thread James Seigel
Hello, some quick advice for you which portion of your job needs the jar? if answer = mapper or reducer, then add it to the -libjars flag. If it is in the job initiation..bundle it in your job jar for fun. Cheers James. On 2011-03-22, at 7:35 PM, Jeff Zhang wrote: Another work around I

Re: Is there any way to add jar when invoking hadoop command

2011-03-22 Thread James Seigel
sequence file. On Wed, Mar 23, 2011 at 9:41 AM, James Seigel ja...@tynt.com wrote: Hello, some quick advice for you which portion of your job needs the jar? if answer = mapper or reducer, then add it to the -libjars flag. If it is in the job initiation..bundle it in your job

Re: decommissioning node woes

2011-03-18 Thread James Seigel
Just a note. If you just shut the node off, the blocks will replicate faster. James. On 2011-03-18, at 10:03 AM, Ted Dunning wrote: If nobody else more qualified is willing to jump in, I can at least provide some pointers. What you describe is a bit surprising. I have zero experience

Re: decommissioning node woes

2011-03-18 Thread James Seigel
I agree. J On 2011-03-18, at 11:34 AM, Ted Dunning wrote: I like to keep that rather high. If I am decommissioning nodes, I generally want them out of the cluster NOW. That is probably a personality defect on my part. On Fri, Mar 18, 2011 at 9:59 AM, Michael Segel

Re: YYC/Calgary/Alberta Hadoop Users?

2011-03-16 Thread James Seigel
Hello again. I am guessing with the lack of response that there are either no hadoop people from Calgary, or they are afraid to meetup :) How about just speaking up if you use hadoop in Calgary :) Cheers James. On 2011-03-07, at 8:40 PM, James Seigel wrote: Hello, Just wondering

Re: Cloudera Flume

2011-03-16 Thread James Seigel
I believe sir there should be a flume support group on cloudera. I'm guessing most of us here haven't used it and therefore aren't much help. This is vanilla hadoop land. :) Cheers and good luck! James On a side note, how much data are you pumping through it? Sent from my mobile. Please

Re: Question regardin the block size and the way that a block is used in Hadoop

2011-03-12 Thread James Seigel
Yes. Just your FAT is consumed. Sent from my mobile. Please excuse the typos. On 2011-03-12, at 11:04 AM, Florin Picioroaga florinp...@yahoo.com wrote: Hello! I've been reading in the Hadoop Definitive guide by Tom White about the block emptiness when a file is not large enough to occupy

Re: Hadoop EC2 setup

2011-03-12 Thread James Seigel
Do you have the amazon tools installed and in the appropriate path? James Sent from my mobile. Please excuse the typos. On 2011-03-12, at 11:04 AM, JJ siung latenight...@gmail.com wrote: Hi, I am following a setup guide here: http://wiki.apache.org/hadoop/AmazonEC2 but runs into problems

Re: Setting up hadoop on a cluster

2011-03-10 Thread James Seigel
How many nodes? Sent from my mobile. Please excuse the typos. On 2011-03-10, at 7:05 AM, Lai Will l...@student.ethz.ch wrote: Hello, Currently I've been playing around with my single node cluster. I'm planning to test my code on a real cluster in the next few weeks. I've read some

Re: Setting up hadoop on a cluster

2011-03-10 Thread James Seigel
Sorry, and where are you hosting the cluster? Cloud? Physical? Garage? Sent from my mobile. Please excuse the typos. On 2011-03-10, at 7:05 AM, Lai Will l...@student.ethz.ch wrote: Hello, Currently I've been playing around with my single node cluster. I'm planning to test my code on a

Re: How to count rows of output files ?

2011-03-08 Thread James Seigel
Simplest case, if you need a sum of the lines for A,B, and C is to look at the output that is normally generated which tells you Reduce output records. This can be accessed like the others are telling you, as a counter, which you could access and explicitly print out or with your eyes as the

YYC/Calgary/Alberta Hadoop Users?

2011-03-07 Thread James Seigel
Hello, Just wondering if there are any YYC hadoop users in the crowd and if there is any interest in a meetup of any sort? Cheers James Seigel Captain Hammer Tynt

Re: k-means

2011-03-04 Thread James Seigel
Mahout project? Sent from my mobile. Please excuse the typos. On 2011-03-04, at 6:41 AM, MANISH SINGLA coolmanishh...@gmail.com wrote: Hey ppl... I need some serious help...I m not able to run kmeans code in hadoop...does anyone have a running code...that they would have tried... Regards

Re: k-means

2011-03-04 Thread James Seigel
the typos. On 2011-03-04, at 7:37 AM, MANISH SINGLA coolmanishh...@gmail.com wrote: are u suggesting me that??? if yes can u plzzz tell me the steps to use that...because I havent used it yet...a quick reply will really be appreciated... Thanx Manish On Fri, Mar 4, 2011 at 7:39 PM, James

Re: k-means

2011-03-04 Thread James Seigel
doesn't release it until may. Thanks! Mike Nute --Original Message-- From: James Seigel To: common-user@hadoop.apache.org ReplyTo: common-user@hadoop.apache.org Subject: Re: k-means Sent: Mar 4, 2011 9:46 AM I am not near a computer so I won't be able to give you specifics. So instead

Re: TaskTracker not starting on all nodes

2011-03-04 Thread James Seigel
on shared storage, that might be the issue? Btw, how do I start the services independently on each node? -bikash On Sun, Feb 27, 2011 at 11:05 PM, James Seigel ja...@tynt.com wrote: Did you get it working? What was the fix? Sent from my mobile. Please excuse the typos. On 2011-02-27, at 8

Re: Comparison between Gzip and LZO

2011-03-02 Thread James Seigel
slightly not on point for this conversation, but I thought it worth mentioning: LZO is splittable, which makes it a good fit for hadoopy things. Just something to remember when you do get some final results on performance. Cheers James. On 2011-03-02, at 8:12 PM, Brian Bockelman wrote:

Re: why quick sort when spill map output?

2011-02-28 Thread James Seigel
Sorting out of the map phase is core to how hadoop works. Are you asking why sort at all? or why did someone use quick sort as opposed to _sort? Cheers James On 2011-02-28, at 3:30 AM, elton sky wrote: Hello forumers, Before spill the data in kvbuffer to local disk in map task, k/v

Re: TaskTracker not starting on all nodes

2011-02-27 Thread James Seigel
, the same problem persists, so was bit confused. Also, checked the TaskTracker logs on those nodes, there does not seem to be any error. -bikash On Sat, Feb 26, 2011 at 10:30 AM, James Seigel ja...@tynt.com wrote: Maybe your ssh keys aren’t distributed the same on each machine or the machines

Re: TaskTracker not starting on all nodes

2011-02-26 Thread James Seigel
Maybe your ssh keys aren’t distributed the same on each machine or the machines aren’t configured the same? J On 2011-02-26, at 8:25 AM, bikash sharma wrote: Hi, I have a 10 nodes Hadoop cluster, where I am running some benchmarks for experiments. Surprisingly, when I initialize the

Re: Packaging for Hadoop - what about the Hadoop libraries?

2011-02-25 Thread James Seigel
The ones that are present. It is a little tricky for the other ones however, well not really once you “get it” -libjars list of supporting jars on the commandline will ship the “supporting” jars out with the job to the map reducers, however if you, for some reason need them in the job

Re: Catching mapred exceptions on the client

2011-02-25 Thread James Seigel
Hello, It is hard to give advice without the specific code. However, if you don’t have your job submission set up to wait for completion then it might be launching all your jobs at the same time. Check to see how your jobs are being submitted. Sorry, I can’t be more helpful. James On

Re: Trouble in installing Hbase

2011-02-24 Thread James Seigel
You probably should ask on the cloudera support forums as cloudera has for some reason changed the users that things run under. James Sent from my mobile. Please excuse the typos. On 2011-02-24, at 8:00 AM, JAGANADH G jagana...@gmail.com wrote: Hi All I was trying to install CDH3 Hbase in

Re: Check lzo is working on intermediate data

2011-02-24 Thread James Seigel
Run a standard job before. Look at the summary data. Run the job again after the changes and look at the summary. You should see less file system bytes written from the map stage. Sorry, might be most obvious in shuffle bytes. I don't have a terminal in front of me right now. James Sent from

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread James Seigel
Well the first thing I'd ask to see (if we can) is the code or a description of what your reducer is doing. If it is holding on to objects too long or accumulating lists well then with the right amount of data you will run OOM. Another thought is that you've just not allocated enough mem for the

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread James Seigel
...oh sorry I didn't scroll below the exception the first time. Try part 2 James Sent from my mobile. Please excuse the typos. On 2011-02-16, at 8:00 AM, Kelly Burkhart kelly.burkh...@gmail.com wrote: Hello, I'm seeing frequent fails in reduce jobs with errors similar to this: 2011-02-15

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread James Seigel
the same md5sum. On Wed, Feb 16, 2011 at 10:15 AM, James Seigel ja...@tynt.com wrote: He might not have that conf distributed out to each machine Sent from my mobile. Please excuse the typos. On 2011-02-16, at 9:10 AM, Kelly Burkhart kelly.burkh...@gmail.com wrote: Our clust admin (who's out

Re: Reduce java.lang.OutOfMemoryError

2011-02-16 Thread James Seigel
the cluster. I'm running again; we'll see if it completes this time. -K On Wed, Feb 16, 2011 at 10:30 AM, James Seigel ja...@tynt.com wrote: Hrmmm. Well as you've pointed out. 200m is quite small and is probably the cause. Now there might be some overriding settings in something you are using

Re: recommendation on HDDs

2011-02-12 Thread James Seigel
The only thing of concern is that the hdfs stuff doesn't seem to do exceptionally well with different sized disks in practice James Sent from my mobile. Please excuse the typos. On 2011-02-12, at 8:43 AM, Edward Capriolo edlinuxg...@gmail.com wrote: On Fri, Feb 11, 2011 at 7:14 PM, Ted

Re: IdentityReducer is called instead of my own

2011-02-02 Thread James Seigel
Share code from your mapper? Check to see if there are any errors on the job tracker reports that might indicate the inability to find the class. James. On 2011-02-02, at 2:23 PM, Christian Kunz wrote: Without seeing the source code of the reduce method of the InvIndexReduce class my

Re: Reduce progress goes backward?

2011-02-01 Thread James Seigel
It means that the scheduler is killing off some of your reducer jobs or some of them are dying. Maybe they are taking too long. You should check out your job tracker and look at some of the details and then drill down to see if you are getting any errors in some of your reducers. Cheers

Re: Distributed indexing with Hadoop

2011-01-29 Thread James Seigel
Has anyone tried to do the reuters example with both approaches? I seem to have problems getting them to run. Cheers James. On 2011-01-29, at 9:25 AM, Ted Yu wrote: $MAHOUT_HOME/examples/bin/build-reuters.sh FYI On Sat, Jan 29, 2011 at 12:57 AM, Marco Didonna m.didonn...@gmail.com wrote:

Re: Can MapReduce run simultaneous producer/consumer processes?

2011-01-06 Thread James Seigel
Not sure if this would work, or the right approach, but looking into hadoop streaming, ?might? find something? Cheers James. On 2011-01-06, at 3:27 PM, W.P. McNeill wrote: Say I have two MapReduce processes, A and B. The two are algorithmically dissimilar, so they have to be implemented as

Re: When does Reduce job start

2011-01-04 Thread James Seigel
As the other gentleman said. The reduce task kinda needs to know all the data is available before doing its work. By design. Cheers James Sent from my mobile. Please excuse the typos. On 2011-01-04, at 6:14 PM, sagar naik sn...@attributor.com wrote: Hi Jeff, To be clear on my end I m not

Re: Retrying connect to server

2010-12-30 Thread James Seigel
Or 3) The configuration (or lack thereof) on the machine you are trying to run this, has no idea where your DFS or JobTracker is :) Cheers James. On 2010-12-30, at 8:53 PM, Adarsh Sharma wrote: Cavus,M.,Fa. Post Direkt wrote: I process this ./hadoop jar ../../hadoopjar/hd.jar

Re: ClassNotFoundException

2010-12-28 Thread James Seigel
jar -tvf the jar file and double check that it is a class that is listed. Can't be in an included jar file. Sent from my mobile. Please excuse the typos. On 2010-12-28, at 7:58 AM, Cavus,M.,Fa. Post Direkt m.ca...@postdirekt.de wrote: Hi, I process this command: ./hadoop jar

Re: UI doesn't work

2010-12-28 Thread James Seigel
For job tracker go to port 50030 see if that helps James Sent from my mobile. Please excuse the typos. On 2010-12-28, at 1:36 PM, maha m...@umail.ucsb.edu wrote: James said: Is the job tracker running on that machine? YES Is there a firewall in the way? I don't think so, because it

Re: UI doesn't work

2010-12-28 Thread James Seigel
PM, James Seigel wrote: For job tracker go to port 50030 see if that helps James Sent from my mobile. Please excuse the typos. On 2010-12-28, at 1:36 PM, maha m...@umail.ucsb.edu wrote: James said: Is the job tracker running on that machine? YES Is there a firewall in the way? I

Re: help for using mapreduce to run different code?

2010-12-28 Thread James Seigel
Not sure what you mean. Can you write custom code for your map functions?: yes Cheers James Sent from my mobile. Please excuse the typos. On 2010-12-28, at 3:54 PM, Jander g jande...@gmail.com wrote: Hi, all Whether Hadoop supports the map function running different code? If yes, how to

Re: Hadoop/Elastic MR on AWS

2010-12-27 Thread James Seigel
Thank you for sharing. Sent from my mobile. Please excuse the typos. On 2010-12-27, at 11:18 AM, Sudhir Vallamkondu sudhir.vallamko...@icrossing.com wrote: We recently crossed this bridge and here are some insights. We did an extensive study comparing costs and benchmarking local vs EMR for

Re: UI doesn't work

2010-12-27 Thread James Seigel
Two quick questions first. Is the job tracker running on that machine? Is there a firewall in the way? James Sent from my mobile. Please excuse the typos. On 2010-12-27, at 4:46 PM, maha m...@umail.ucsb.edu wrote: Hi, I get Error 404 when I try to use hadoop UI to monitor my job

Re: Friends of friends with MapReduce

2010-12-19 Thread James Seigel
I may be wrong but it seems that you are approaching this problem like you would in the normal programming world and rightly so. Depending on the set of the data ( triangle square or other ), you could do some simple things like spit out the right side of your pair as the key and the left side as
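The one step the message does spell out, emitting the right side of each pair as the key and the left side as the value, can be simulated locally. This sketch stops at that step because the rest of the message is cut off; the function names and the sample pairs are invented for illustration, and the in-memory grouping only stands in for what the shuffle would do.

```python
from collections import defaultdict


def map_pair(left, right):
    """Map step: flip the pair so the right side becomes the key."""
    yield right, left


def group_by_key(pairs):
    """Stand-in for the shuffle: collect every value emitted under each key."""
    grouped = defaultdict(list)
    for left, right in pairs:
        for key, value in map_pair(left, right):
            grouped[key].append(value)
    return dict(grouped)


# Hypothetical (person, friend) pairs.
pairs = [("alice", "bob"), ("carol", "bob"), ("alice", "dave")]
print(group_by_key(pairs))
```

After grouping, each key arrives at a reducer together with everyone who listed that person, which is the kind of reshaping the message seems to be steering toward.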

Re: Please help with hadoop configuration parameter set and get

2010-12-17 Thread James Seigel
If you're wondering where to get a counter from, I'll point you to context Sent from my mobile. Please excuse the typos. On 2010-12-17, at 7:39 AM, Ted Yu yuzhih...@gmail.com wrote: You can use hadoop counter to pass this information. This way, you see the counters in job report. On Thu,

Re: Please help with hadoop configuration parameter set and get

2010-12-17 Thread James Seigel
Exactly. It is hard to assume anything about order or coordination between maps or reducers. You should try and design with “sloppy” coordination strategies. James. On 2010-12-17, at 10:32 AM, Ted Dunning wrote: Statics won't work the way you might think because different mappers and

Re: Hadoop Certification Progamme

2010-12-15 Thread James Seigel
But it would give you the right creds for people that you’d want to work for :) James On 2010-12-15, at 10:26 AM, Konstantin Boudnik wrote: Hey, commit rights won't give you a nice looking certificate, would it? ;) On Wed, Dec 15, 2010 at 09:12, Steve Loughran ste...@apache.org wrote: On

Re: Hadoop File system performance counters

2010-12-15 Thread James Seigel
They represent the amount data written to the physical disk on the slaves, as intermediate files before or during the shuffle phase. Where HDFS bytes are the files written back into hdfs containing the data you wish to see. J On 2010-12-15, at 10:37 AM, abhishek sharma wrote: Hi, What do

Re: Two questions.

2010-11-03 Thread James Seigel
Option 1 = good Sent from my mobile. Please excuse the typos. On 2010-11-03, at 8:27 PM, shangan shan...@corp.kaixin001.com wrote: I don't think the first two options can work, even you stop the tasktracker these to-be-retired nodes are still connected to the namenode. Option 3 can work.

Re: Urgent Need: Sr. Developer - Hadoop Hive | Cupertino, CA

2010-10-29 Thread James Seigel
Seems a little light ;) J On 2010-10-29, at 3:45 PM, Pablo Cingolani wrote: Seriously? 10 of experience in Hive, Hadoop and MongoDb? :-) On Fri, Oct 29, 2010 at 5:38 PM, Ram Prakash ram.prak...@e-solutionsinc.com wrote: Job Title: Sr. Developer - Hadoop Hive Location: Cupertino, CA

Re: BUG: Anyone use block size more than 2GB before?

2010-10-18 Thread James Seigel
If there is a hard requirement for input split being one block you could just make your input split fit a smaller block size. Just saying, in case you can't overcome the 2G ceiling J Sent from my mobile. Please excuse the typos. On 2010-10-18, at 5:08 PM, elton sky eltonsky9...@gmail.com

Re: TOP N items

2010-09-10 Thread James Seigel
Welcome to the land of the fuzzy elephant! Of course there are many ways to do it. Here is one, it might not be brilliant or the right way, but I am sure you will get more :) Use the identity mapper... job.setMapperClass(Mapper.class); then have one reducer
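The message is truncated right after "then have one reducer", so what follows is a guess at the general shape rather than the author's actual recipe: a single reducer that sees every record can keep a bounded heap of the N largest. The `top_n` helper and the sample records are hypothetical.

```python
import heapq


def top_n(records, n):
    """Keep the n largest (item, count) records using a size-n min-heap."""
    if n <= 0:
        return []
    heap = []  # smallest kept record sits at heap[0]
    for item, count in records:
        entry = (count, item)
        if len(heap) < n:
            heapq.heappush(heap, entry)
        elif entry > heap[0]:
            # New record beats the current smallest, so swap it in.
            heapq.heapreplace(heap, entry)
    # Largest first, back in (item, count) order.
    return [(item, count) for count, item in sorted(heap, reverse=True)]


records = [("a", 5), ("b", 9), ("c", 1), ("d", 7)]
print(top_n(records, 2))
```

The heap never holds more than N entries, so memory stays flat no matter how many records pass through the one reducer.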

Re: Custom Key class not working correctly

2010-09-10 Thread James Seigel
Is the footer on this email a little rough for content that will be passed around and made indexable on the internets? Just saying :) Cheers James Sent from my mobile. Please excuse the typos. On 2010-09-10, at 8:01 PM, Kaluskar, Sanjay skalus...@informatica.com wrote: Have you considered

Re: Question on classpath

2010-09-10 Thread James Seigel
Are the libs exploded inside the main jar? If not then no it probably won't work. James Sent from my mobile. Please excuse the typos. On 2010-09-10, at 7:43 PM, Mark static.void@gmail.com wrote: If I deploy 1 jar (that contains a lib directory with all the required dependencies)

Re: Sorting Numbers using mapreduce

2010-09-06 Thread James Seigel
There is a call to set the sort order as well, by changing the comparator. James Sent from my mobile. Please excuse the typos. On 2010-09-06, at 12:06 AM, Owen O'Malley omal...@apache.org wrote: The critical item is that your map's output key should be IntWritable instead of Text. The
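Why the key type matters is easy to see outside Hadoop. This small sketch only demonstrates the ordering difference between text keys and integer keys, plus a reversed order as a stand-in for swapping the comparator; it does not use any Hadoop classes.

```python
values = ["10", "9", "100", "2"]

# Keys compared as text sort character by character.
as_text = sorted(values)

# Keys compared as integers sort by magnitude.
as_ints = sorted(int(v) for v in values)

# A different comparator changes the order, for example descending.
descending = sorted((int(v) for v in values), reverse=True)

print(as_text)
print(as_ints)
print(descending)
```

Text ordering puts "100" before "2", which is the symptom you get when numeric keys are emitted as text.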

Re: From X to Hadoop MapReduce

2010-09-01 Thread James Seigel
added to the examples. On Thu, Jul 22, 2010 at 4:59 AM, James Seigel ja...@tynt.com wrote: Oh yeah, it would help if I put the url: http://github.com/seigel/MRPatterns James On 2010-07-21, at 2:55 PM, James Seigel wrote: Here is a skeleton project I stuffed up on github (feel free

Re: Basic question

2010-08-25 Thread James Seigel
The output of the reducer is Text/IntWritable. To set the input to the reducer you set the mapper output classes. Cheers James Sent from my mobile. Please excuse the typos. On 2010-08-25, at 8:13 PM, Mark static.void@gmail.com wrote: job.setOutputKeyClass(Text.class);

Re: Starting a job on a hadoop cluster remotly

2010-07-28 Thread James Seigel
Not sure exactly your goals, but look into SOCKS proxy stuff as well. You can have the hadoop command binary running locally and talking over a socks proxy to the actual cluster, without having to have the machines exposed all over the place. Cheers James. On 2010-07-28, at 10:42 AM,

Re: From X to Hadoop MapReduce

2010-07-21 Thread James Seigel
Jeff, I agree that cascading looks cool and might/should have a place in everyone’s tool box, however at some corps it takes a while to get those kinds of changes in place and therefore they might have to hand craft some java code before moving (if they ever can) to a different technology. I

From X to Hadoop MapReduce

2010-07-20 Thread James Seigel
Hello! Here is what I have been thinking over the last while. There are probably a number of us that have prototyped stuff in pig or hive and now think that we should convert it to some java map reduce code. Would anyone be interested in building a “patterns” area somewhere possibly with

Re: Newbie to HDFS compression

2010-06-24 Thread James Seigel
Cool. Maybe we should start a page. J On 2010-06-24, at 8:16 PM, Harsh J wrote: On Fri, Jun 25, 2010 at 2:42 AM, Raymond Jennings III raymondj...@yahoo.com wrote: Oh, maybe that's what I meant :-) I recall reading something on this mail group that the compression is not included with the

Re: Newbie to HDFS compression

2010-06-24 Thread James Seigel
...this is a no brainer. Stick with it...it is complicated ( a bit ) to install Cheers J On 2010-06-24, at 8:45 PM, James Seigel wrote: Cool. Maybe we should start a page. J On 2010-06-24, at 8:16 PM, Harsh J wrote: On Fri, Jun 25, 2010 at 2:42 AM, Raymond Jennings III raymondj...@yahoo.com wrote

Re: Why hadoop-u...@lucene.a.o ?

2010-06-18 Thread James Seigel
Great segue! Is there a definitive guide to indexing with lucene and hadoop and serving up the results some how distributed like! Thanks gang James Sent from my mobile. Please excuse the typos. On 2010-06-18, at 3:44 AM, Steve Loughran ste...@apache.org wrote: Otis Gospodnetic wrote:

Re: Hadoop JobTracker Hanging

2010-06-17 Thread James Seigel
Up the memory from the default to about 4x the default (heap setting). This should make it better I’d think! We’d been having the same issue...I believe this fixed it. James On 2010-06-17, at 3:00 PM, Li, Tan wrote: Folks, I need some help on job tracker. I am running a two hadoop

Re: the same key in different reducers

2010-06-09 Thread James Seigel
Oleg, Are you wanting to have them in different reducers? If so then you can write a Comparable object to make that happen. If you want them to be on the same reducer, then that is what hadoop will do. :) On 2010-06-09, at 3:06 PM, Ted Yu wrote: Can you disclose more about how K3 is
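The "same key, same reducer" behaviour follows from partitioning on a hash of the key. The sketch below imitates that idea with `zlib.crc32` as a deterministic stand-in hash; it is not Hadoop's own hash function, and `partition_for` is a name made up for this example.

```python
import zlib


def partition_for(key: str, num_reducers: int) -> int:
    """Pick a reducer index from a deterministic hash of the key."""
    if num_reducers <= 0:
        raise ValueError("need at least one reducer")
    # crc32 is an assumption made for repeatable results, not Hadoop's hash.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


# The same key always maps to the same partition, wherever it was emitted.
print(partition_for("K3", 4) == partition_for("K3", 4))
```

Because the partition depends only on the key and the reducer count, two records with equal keys cannot end up on different reducers unless the partitioning logic itself is replaced.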

Re: copying file into hdfs

2010-04-10 Thread James Seigel
Maybe copy your hdfs config here and we can see why it took up 16 gigs of space. Cheers Sent from my mobile. Please excuse the typos. On 2010-04-10, at 3:22 PM, Michael Segel michael_se...@hotmail.com wrote: Mike, First, you need to see what you set your block size to in Hadoop. By

Distributed Clusters

2010-04-07 Thread James Seigel
I am new to this group, and relatively new to hadoop. I am looking at building a large cluster. I was wondering if anyone has any best practices for a cluster in the hundreds of nodes? As well, has anyone had experience with a cluster spanning multiple data centers. Is this a bad practice?