Thank you for your time and suggestions. I've already tried Starfish, but
not jmap. I'll check it out.
Thanks again,
Mark
On Wed, Feb 29, 2012 at 1:17 PM, Charles Earl wrote:
> I assume you have also just tried running locally and using the jdk
> performance tools (e.g. jmap) to gain insight by c
Varun, sorry for my late response. Today I have deployed a new version and I
can confirm that the patches you provided work well. I've been running some
jobs on a 5-node cluster for an hour under full load without a core dump, so
now things work as expected.
Thank you again!
I have used just your first optio
Mohit,
I'm positive the real exception lies a few scrolls below that message
on the attempt page. Possibly a class not found issue.
The message you see on top appears when something throws an exception
while being configure()-ed. It is most likely a job config or
setup-time issue from your code or
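As a rough illustration (class and jar names made up), anything thrown inside
configure() gets wrapped in that generic top-level message, and the real
cause only shows up further down the attempt log:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class ConfigureFailureExample extends MapReduceBase {
      @Override
      public void configure(JobConf job) {
        try {
          // If this class lives in a jar that never made it onto the task
          // classpath, the ClassNotFoundException thrown here surfaces as
          // the generic "failed to configure" message on the attempt page.
          Class.forName("com.example.SomeHelper");  // hypothetical dependency
        } catch (ClassNotFoundException e) {
          throw new RuntimeException("helper jar missing on task classpath", e);
        }
      }
    }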
Thanks for the example. I did look at the logs and also at the admin page
and all I see is the exception that I posted initially.
I am not sure why adding an extra jar to the classpath via DistributedCache
causes that exception. I tried to look at the Configuration code in the
hadoop.util package but it do
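For reference, what I'm doing is roughly this (jar path and job class are
made up), inside the job's run() method:

    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);        // MyJob is made up
    // the jar already sits on HDFS; addFileToClassPath only records the
    // path in the job conf, and the TaskTracker localizes it and prepends
    // it to the child's classpath
    DistributedCache.addFileToClassPath(new Path("/libs/extra-helpers.jar"), conf);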
I think I've found the problem. There was one line of code that caused this
issue :) and that was output.collect(key, value);
I had to add more logging to the code to get to it. For some reason kill
-QUIT didn't send the stack trace to the userLogs///syslog , I
searched all the logs and couldn't find
I assume you have also just tried running locally and using the JDK performance
tools (e.g. jmap) to gain insight, by configuring Hadoop to run the absolute
minimum number of tasks?
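For instance, something along these lines (untested sketch, 0.20-era property
names, MyJob is made up) forces everything into a single local JVM that jmap
can then be pointed at:

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);   // MyJob is made up
    conf.set("mapred.job.tracker", "local");   // LocalJobRunner, no cluster
    conf.set("fs.default.name", "file:///");   // read input from local disk
    conf.setNumMapTasks(1);
    conf.setNumReduceTasks(0);
    // With the local runner the task executes inside the submitting JVM,
    // so find that pid with jps and then, e.g.:
    //   jmap -histo:live <pid>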
Perhaps the discussion
http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-t
Hi,
On Wed, Feb 29, 2012 at 19:13, Robert Evans wrote:
> What I really want to know is how well does this new CompressionCodec
> perform in comparison to the regular gzip codec in
> various different conditions and what type of impact does it have on
> network traffic and datanode load. My gut
I can't seem to find what's causing this slowness. Nothing in the logs.
It's just painfully slow. However, a Pig job with the same logic performs
great. Here is the mapper code and the Pig code:
public static class Map extends MapReduceBase
    implements Mapper {
  public vo
The documentation on Starfish http://www.cs.duke.edu/starfish/index.html
looks promising, but I have not used it. I wonder if others on the list have
found it more useful than setting mapred.task.profile.
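For comparison, turning the built-in profiler on is just a few JobConf calls
(a sketch with the old 0.20 API; MyJob and the hprof options are only
illustrative):

    import org.apache.hadoop.mapred.JobConf;

    JobConf conf = new JobConf(MyJob.class);  // MyJob is made up
    conf.setProfileEnabled(true);             // same as mapred.task.profile=true
    conf.setProfileTaskRange(true, "0-2");    // profile the first few map tasks
    conf.setProfileParams(
        "-agentlib:hprof=cpu=samples,heap=sites,force=n,"
        + "thread=y,verbose=n,file=%s");
    // the profile output is pulled back next to where the job was submitted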
C
On Feb 29, 2012, at 3:53 PM, Mark question wrote:
> I've used hadoop profiling (.prof) to sho
I've used Hadoop profiling (.prof) to show the stack trace but it was hard
to follow. I tried jConsole locally, since I couldn't find a way to set a port
number for the child processes when running them remotely. Linux commands
(top, /proc) showed me that the virtual memory is almost twice my
physical, which m
Mark,
So if I understand, it is more the memory management that you are interested
in, rather than a need to run an existing C or C++ application on the MapReduce
platform?
Have you done profiling of the application?
C
On Feb 29, 2012, at 2:19 PM, Mark question wrote:
> Thanks Charles .. I'm running
Thanks Charles .. I'm running Hadoop for research to perform duplicate
detection methods. To go deeper, I need to understand what's slowing my
program down, which usually starts with analyzing memory to predict the best
input size for a map task. So you're saying piping can help me control memory
even though
Mark,
Both streaming and pipes allow this, perhaps more so pipes at the level of the
mapreduce task. Can you provide more details on the application?
On Feb 29, 2012, at 1:56 PM, Mark question wrote:
> Hi guys, thought I should ask this before I use it ... will using C over
> Hadoop give me the u
Hi guys, thought I should ask this before I use it ... will using C over
Hadoop give me the usual C memory management? For example, malloc(),
sizeof()? My guess is no since this all will eventually be turned into
bytecode, but I need more control over memory, which obviously is hard for me
to do wit
If many people are going to use it then by all means put it in. If there is
only one person, or a very small handful of people that are going to use it
then I personally would prefer to see it a separate project. However, Edward,
you have convinced me that I am trying to make a logical judgmen
I can perform HDFS operations from the command line like "hadoop fs -ls
/". Doesn't that mean that the datanode is up?
I am going to try a few things today. I have a JAXBContext object that
marshals the xml; this is a static instance, but my guess at this point is
that since this is in a separate jar from the one where the job runs, and I
used DistributedCache.addClassPath, this context is being created on every
call for some re
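To rule that out I'm going to cache it explicitly in configure(); roughly
this (the record class name is made up):

    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.JAXBException;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class XmlMapperBase extends MapReduceBase {
      // one JAXBContext per task JVM instead of one per map() call;
      // building the context is expensive, the (un)marshallers are cheap
      private static JAXBContext jaxbCtx;

      @Override
      public void configure(JobConf job) {
        try {
          if (jaxbCtx == null) {
            jaxbCtx = JAXBContext.newInstance(MyRecord.class);  // MyRecord is made up
          }
        } catch (JAXBException e) {
          throw new RuntimeException("could not build JAXBContext", e);
        }
      }
    }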
Too bad we can not up the replication on the first few blocks of the
file or distributed cache it.
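Per-block replication isn't exposed, but for a small file either of these
works today (path is made up, exceptions omitted):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    // bump replication for the whole file (not just the first blocks)
    fs.setReplication(new Path("/data/lookup/table.gz"), (short) 10);
    // or ship it to every task via the distributed cache
    DistributedCache.addCacheFile(URI.create("/data/lookup/table.gz"), conf);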
The contrib statement is arguable. I could make a case that the
majority of stuff should not be in hadoop-core. NLineInputFormat, for
example, is nice to have. Took a long time to get ported to the new
I can see a use for it, but I have two concerns about it. My biggest concern
is maintainability. We have had lots of things get thrown into contrib in the
past, very few people use them, and inevitably they start to suffer from bit
rot. I am not saying that it will happen with this, but if yo
Hi,
On Wed, Feb 29, 2012 at 16:52, Edward Capriolo wrote:
...
> But being able to generate split info for them and process them
> would be good as well. I remember that was a hot thing to do with lzo
> back in the day. The pain of doing a once-over of the gz files to generate
> the split info is detra
Hi,
On Wed, Feb 29, 2012 at 13:10, Michel Segel wrote:
> Let's play devil's advocate for a second?
>
I always like that :)
> Why?
Because then datafiles from other systems (like the Apache HTTP webserver)
can be processed more efficiently, without preprocessing.
> Snappy exists.
>
Compared to
Mike,
Snappy is cool and all, but I was not overly impressed with it.
GZ zips much better than Snappy. Last time I checked, for our log files
gzip took them down from 100MB -> 40MB, while Snappy compressed them
from 100MB -> 55MB. That was only with sequence files. But still that is
pretty significan
Yes this is fine to do. TTs are not dependent on co-located DNs, but
only benefit if they are.
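On the compute-only node, just point mapred.job.tracker at your existing
JobTracker in mapred-site.xml and start only the TaskTracker daemon
(bin/hadoop-daemon.sh start tasktracker); skip the DataNode entirely.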
On Wed, Feb 29, 2012 at 8:14 PM, Daniel Baptista
wrote:
> Forgot to mention that I am using Hadoop 0.20.2
>
> From: Daniel Baptista
> Sent: 29 February 2012 14:44
> To: common-user@hadoop.apache.org
> S
Forgot to mention that I am using Hadoop 0.20.2
From: Daniel Baptista
Sent: 29 February 2012 14:44
To: common-user@hadoop.apache.org
Subject: TaskTracker without datanode
Hi All,
I was wondering (network traffic considerations aside) is it possible to run a
TaskTracker without a DataNode. I was
Hi All,
I was wondering (network traffic considerations aside): is it possible to run
a TaskTracker without a DataNode? I was hoping to test this as a means of
temporarily scaling processing power.
Are there better approaches? I don't (currently) need the additional storage
that a DataNo
How can I set the fair scheduler such that all jobs submitted from a
particular user group go to a pool with the group name?
I have set up the fair scheduler and I have two users: A and B (both belonging
to the user group hadoop).
When these users submit hadoop jobs, the jobs from A go to a pool named A
an
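(My guess from the fair scheduler docs is that the knob is
mapred.fairscheduler.poolnameproperty in the JobTracker's mapred-site.xml;
it defaults to user.name, and setting it to group.name is supposed to give
one pool per Unix group:

    mapred.fairscheduler.poolnameproperty = group.name

but I haven't verified that yet.)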
Let's play devil's advocate for a second?
Why? Snappy exists.
The only advantage is that you don't have to convert from gzip to snappy and
can process gzip files natively.
The next question is: how large are the gzip files in the first place?
I don't disagree, I just want to have a solid argument in
Cross-posting with common-user, since there's little activity on hdfs-user
these days.
Evert
> Hi list,
>
> I'm having trouble starting up a DN (0.20.2) with Kerberos
> authentication and SSL enabled - I'm getting a NullPointerException
> during startup and the daemon exits. It's a b