Re: different input/output formats

2012-05-29 Thread Mark question
the status after the change On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com wrote: Hi guys, this is a very simple program, trying to use TextInputFormat and SequenceFileOutputFormat. Should be easy but I get the same error. Here are my configurations

Re: different input/output formats

2012-05-29 Thread Mark question
. I am not getting any error. On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com wrote: Hi guys, this is a very simple program, trying to use TextInputFormat and SequenceFileOutputFormat. Should be easy but I get the same error. Here are my configurations
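
Since both messages truncate before the actual configuration, here is a minimal sketch of a job wired this way on the old (0.20) mapred API, assuming the default identity map; paths and class names are illustrative. A common cause of a repeated error in this setup is a mismatch between setOutputKeyClass/setOutputValueClass and the types the mapper emits.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class TextToSeqJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(TextToSeqJob.class);
        conf.setJobName("text-to-seq");
        conf.setInputFormat(TextInputFormat.class);           // reads (LongWritable offset, Text line)
        conf.setOutputFormat(SequenceFileOutputFormat.class); // writes a binary SequenceFile
        // With no mapper set, the identity mapper passes the input pair through,
        // so the output classes must be LongWritable/Text to match.
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);
        conf.setNumReduceTasks(0); // map-only
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);
      }
    }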

Re: How to add debugging to map- red code

2012-04-20 Thread Mark question
I'm interested in this too, but could you tell me where to apply the patch, and is the following the right one to apply: https://issues.apache.org/jira/secure/attachment/12416955/MAPREDUCE-336_0_20090818.patch

Has anyone installed HCE and built it successfully?

2012-04-18 Thread Mark question
Hey guys, I've been stuck with HCE installation for two days now and can't figure out the problem. The error I get from running (sh build.sh) is cannot execute binary file. I tried setting my JAVA_HOME and ANT_HOME manually and using the script build.sh, no luck. So, please if you've used HCE

Re: Hadoop streaming or pipes ..

2012-04-07 Thread Mark question
. --Bobby Evans On 4/5/12 1:54 PM, Mark question markq2...@gmail.com wrote: Hi guys, quick question: Are there any performance gains from hadoop streaming or pipes over Java? From what I've read, it's only to ease testing by using your favorite language. So I guess

Hadoop pipes and streaming ..

2012-04-05 Thread Mark question
Hi guys, Two quick questions: 1. Are there any performance gains from hadoop streaming or pipes? As far as I read, it is to ease testing using your favorite language, which I think implies that everything is eventually translated to bytecode and executed.

Hadoop streaming or pipes ..

2012-04-05 Thread Mark question
Hi guys, quick question: Are there any performance gains from hadoop streaming or pipes over Java? From what I've read, it's only to ease testing by using your favorite language. So I guess it is eventually translated to bytecode then executed. Is that true? Thank you, Mark

Re: Hadoop streaming or pipes ..

2012-04-05 Thread Mark question
:54 PM, Mark question markq2...@gmail.com wrote: Hi guys, quick question: Are there any performance gains from hadoop streaming or pipes over Java? From what I've read, it's only to ease testing by using your favorite language. So I guess it is eventually translated to bytecode then executed

Re: Custom Seq File Loader: ClassNotFoundException

2012-03-05 Thread Mark question
a default constructor. On Sat, Mar 3, 2012 at 4:56 AM, Mark question markq2...@gmail.com wrote: Hello, I'm trying to debug my code through eclipse, which worked fine with the given Hadoop applications (e.g. wordcount), but as soon as I run it on my application with my custom sequence input

Re: Custom Seq File Loader: ClassNotFoundException

2012-03-05 Thread Mark question
Unfortunately, public didn't change my error ... Any other ideas? Has anyone run Hadoop on eclipse with custom sequence inputs? Thank you, Mark On Mon, Mar 5, 2012 at 9:58 AM, Mark question markq2...@gmail.com wrote: Hi Madhu, it has the following line: TermDocFreqArrayWritable

Re: Streaming Hadoop using C

2012-03-01 Thread Mark question
Starfish worked great for wordcount .. I didn't run it on my application because I have only map tasks. Mark On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl charles.ce...@gmail.com wrote: How was your experience of starfish? C On Mar 1, 2012, at 12:35 AM, Mark question wrote: Thank you

Streaming Hadoop using C

2012-02-29 Thread Mark question
Hi guys, thought I should ask this before I use it ... will using C over Hadoop give me the usual C memory management? For example, malloc() , sizeof() ? My guess is no since this all will eventually be turned into bytecode, but I need more control on memory which obviously is hard for me to do

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
:56 PM, Mark question wrote: Hi guys, thought I should ask this before I use it ... will using C over Hadoop give me the usual C memory management? For example, malloc() , sizeof() ? My guess is no since this all will eventually be turned into bytecode, but I need more control on memory

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
...@gmail.com wrote: Mark, So if I understand, it is more the memory management that you are interested in, rather than a need to run an existing C or C++ application on the MapReduce platform? Have you done profiling of the application? C On Feb 29, 2012, at 2:19 PM, Mark question wrote

Re: Streaming Hadoop using C

2012-02-29 Thread Mark question
) to gain insight by configuring hadoop to run absolute minimum number of tasks? Perhaps the discussion http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task might be relevant? On Feb 29, 2012, at 3:53 PM, Mark question wrote: I've used hadoop

Re: memory of mappers and reducers

2012-02-16 Thread Mark question
of mapred.child.ulimit value of mapred.child.java.opts On Thu, Feb 16, 2012 at 12:38 AM, Mark question markq2...@gmail.com wrote: Thanks for the reply Srinivas, so option 2 will be enough, however, when I tried setting it to 512MB, I see through the system monitor that the map task is given 275MB

memory of mappers and reducers

2012-02-15 Thread Mark question
Hi, My question is what's the difference between the following two settings: 1. mapred.task.default.maxvmem 2. mapred.child.java.opts The first one is used by the TT to monitor the memory usage of tasks, while the second one is the maximum heap space assigned for each task. I want to limit

Re: memory of mappers and reducers

2012-02-15 Thread Mark question
, Mark question markq2...@gmail.com wrote: Hi, My question is what's the difference between the following two settings: 1. mapred.task.default.maxvmem 2. mapred.child.java.opts The first one is used by the TT to monitor the memory usage of tasks, while the second one
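
For reference, a sketch of how the two knobs differ in practice, per the distinction quoted above; values are illustrative, and mapred.task.default.maxvmem is normally a cluster-side default with mapred.task.maxvmem as the per-job override. Note also that -Xmx is an upper bound, not a reservation, which is consistent with a monitor showing a task using less than the configured heap.

    import org.apache.hadoop.mapred.JobConf;

    public class MemorySettings {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Maximum heap for each child task JVM; this is what bounds Java allocations.
        conf.set("mapred.child.java.opts", "-Xmx512m");
        // Virtual-memory ceiling the TaskTracker's monitor enforces per task;
        // set it comfortably above heap + JVM overhead.
        conf.setLong("mapred.task.maxvmem", 1024L * 1024 * 1024); // 1 GB, illustrative
      }
    }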

Namenode no lease exception ... what does it mean?

2012-02-09 Thread Mark question
Hi guys, Even though there is enough space on HDFS as shown by -report ... I get the following 2 errors, the first shown in the log of a datanode and the second in the Namenode log: 1)2012-02-09 10:18:37,519 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.addToInvalidates:

Re: Too many open files Error

2012-01-27 Thread Mark question
to run 1 million block transfer (in/out) threads by doing that. It does not take up resources by default, sure, but now it can be abused with requests to make your DN run out of memory and crash because it's not bound to proper limits now. On Fri, Jan 27, 2012 at 5:49 AM, Mark question markq2

Re: Too many open files Error

2012-01-26 Thread Mark question
-a? That should give you the number of open files allowed by a single user... Sent from a remote device. Please excuse any typos... Mike Segel On Jan 26, 2012, at 5:13 AM, Mark question markq2...@gmail.com wrote: Hi guys, I get this error from a job trying to process 3Million records

Re: connection between slaves and master

2012-01-11 Thread Mark question
are on different machines. Praveen On Mon, Jan 9, 2012 at 11:41 PM, Mark question markq2...@gmail.com wrote: Hello guys, I'm requesting from a PBS scheduler a number of machines to run Hadoop and even though all hadoop daemons start normally on the master and slaves, the slaves don't have worker

connection between slaves and master

2012-01-09 Thread Mark question
Hello guys, I'm requesting from a PBS scheduler a number of machines to run Hadoop and even though all hadoop daemons start normally on the master and slaves, the slaves don't have worker tasks in them. Digging into that, there seems to be some blocking between nodes (?) don't know how to

Re: Expected file://// error

2012-01-08 Thread Mark question
mapred-site.xml:
    <configuration>
      <property>
        <name>mapred.job.tracker</name>
        <value>localhost:10001</value>
      </property>
      <property>
        <name>mapred.child.java.opts</name>
        <value>-Xmx1024m</value>
      </property>
      <property>
        <name>mapred.tasktracker.map.tasks.maximum</name>
        <value>10</value>
      </property>

Re: Expected file://// error

2012-01-08 Thread Mark question
) in there or it won't pick up the correct configs. -Joey On Sun, Jan 8, 2012 at 12:59 PM, Mark question markq2...@gmail.com wrote: mapred-site.xml: <configuration> <property> <name>mapred.job.tracker</name> <value>localhost:10001</value> </property> <property> <name>mapred.child.java.opts</name>

Expected file://// error

2012-01-06 Thread Mark question
Hello, I'm running two jobs on Hadoop-0.20.2 consecutively, such that the second one reads the output of the first which would look like: outputPath/part-0 outputPath/_logs But I get the error: 12/01/06 03:29:34 WARN fs.FileSystem: localhost:12123 is a deprecated filesystem name.

Re: Expected file://// error

2012-01-06 Thread Mark question
fs.default.name set to? It should be set to hdfs://host:port and not just host:port. Can you ensure this and retry? On 06-Jan-2012, at 5:45 PM, Mark question wrote: Hello, I'm running two jobs on Hadoop-0.20.2 consecutively, such that the second one reads the output of the first which would
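
A minimal client-side sketch of the fix being suggested, using the host:port from the warning in the original message; the hdfs:// scheme must be present or the deprecation warning appears:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;

    public class FsUriCheck {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // A bare host:port here triggers the deprecation warning quoted above;
        // the full URI with scheme does not.
        conf.set("fs.default.name", "hdfs://localhost:12123");
        FileSystem fs = FileSystem.get(conf);
        System.out.println(fs.getUri()); // expect hdfs://localhost:12123
      }
    }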

Connection reset by peer Error

2011-11-20 Thread Mark question
Hi, I've been getting this error multiple times now, the namenode mentions something about the peer resetting the connection, but I don't know why this is happening, because I'm running on a single machine with 12 cores any ideas? The job, which contains about 200 mappers, started running normally

reading Hadoop output messages

2011-11-16 Thread Mark question
Hi all, I'm wondering if there is a way to get output messages that are printed from the main class of a Hadoop job. Usually redirecting with 2>&1 > out.log would work, but in this case it only saves the output messages printed in the main class before starting the job. What I want is the output messages that

Re: Cannot access JobTracker GUI (port 50030) via web browser while running on Amazon EC2

2011-10-24 Thread Mark question
I have the same issue and the output of curl localhost:50030 is like yours, and I'm running on a remote cluster in pseudo-distributed mode. Can anyone help? Thanks, Mark On Mon, Oct 24, 2011 at 11:02 AM, Sameer Farooqui cassandral...@gmail.com wrote: Hi guys, I'm running a 1-node Hadoop

Re: Cannot access JobTracker GUI (port 50030) via web browser while running on Amazon EC2

2011-10-24 Thread Mark question
ACCEPT: filter [ OK ] iptables: Unloading modules: [ OK ] iptables: Applying firewall rules: [ OK ] On Mon, Oct 24, 2011 at 1:37 PM, Mark question markq2...@gmail.com wrote: I have the same issue and the output of curl

Remote Blocked Transfer count

2011-10-21 Thread Mark question
Hello, I wonder if there is a way to measure how many of the data blocks have transferred over the network? Or more generally, how many times was there a connection/contact between different machines? I thought of checking the Namenode log file which usually shows blk_ from src= to dst

fixing the mapper percentage viewer

2011-10-19 Thread Mark question
Hi all, I've written a custom MapRunner, but it seems to have broken the percentage shown for maps on the console. I want to know which part of the code is responsible for adjusting the map percentage ... Is it the following in MapRunner: if(incrProcCount) {

Re: hadoop input buffer size

2011-10-10 Thread Mark question
of lines into a buffer, Actually, for the TextInputFormat, it reads io.file.buffer.size bytes of text into a buffer each time; this can be seen in the hadoop source file LineReader.java 2011/10/5 Mark question markq2...@gmail.com Hello, Correct me if I'm wrong, but when

hadoop input buffer size

2011-10-05 Thread Mark question
Hello, Correct me if I'm wrong, but when a program opens n files at the same time to read from, and starts reading from each file one line at a time. Isn't hadoop actually fetching dfs.block.size worth of lines into a buffer, and not actually one line? If this is correct, I set up my

Mapper Progress

2011-07-21 Thread Mark question
Hi, I have a custom MapRunner which apparently affects the progress report of the mapper, showing 100% while the mapper is still reading files to process. Where can I change/add a progress object to be shown in the UI? Thank you, Mark
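
The percentage shown for a map task is derived from the RecordReader's getProgress(), which the framework samples between calls to next(); a custom runner mainly needs to keep reading through the framework's reader and reporting. A sketch on the old API, with illustrative types:

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class ProgressAwareRunner implements MapRunnable<LongWritable, Text, Text, Text> {
      public void configure(JobConf job) { }

      public void run(RecordReader<LongWritable, Text> input,
                      OutputCollector<Text, Text> output,
                      Reporter reporter) throws IOException {
        LongWritable key = input.createKey();
        Text value = input.createValue();
        while (input.next(key, value)) {
          // ... per-record work here ...
          // The console/UI percentage is driven by input.getProgress();
          // calling reporter.progress() marks the task alive and prompts
          // the framework to refresh that figure.
          reporter.progress();
        }
      }
    }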

Re: One file per mapper

2011-07-05 Thread Mark question
Hi Govind, You should override the isSplitable function of FileInputFormat in a class, say MyFileInputFormat extends FileInputFormat, as follows: @Override public boolean isSplitable(FileSystem fs, Path filename){ return false; } Then use your MyFileInputFormat class (see the sketch below).
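
A slightly fuller sketch of the same idea on the old (0.20) mapred API; the class name is illustrative, and isSplitable is protected in FileInputFormat:

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.TextInputFormat;

    // One whole file per mapper: marking every file non-splittable makes
    // each input file exactly one split.
    public class WholeFileTextInputFormat extends TextInputFormat {
      @Override
      protected boolean isSplitable(FileSystem fs, Path filename) {
        return false;
      }
    }
    // Usage: conf.setInputFormat(WholeFileTextInputFormat.class);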

One node with Rack-local mappers ?!!!

2011-06-16 Thread Mark question
Hi, this is weird ... I'm running a job on a single node with 32 mappers, running one at a time. Output says this: .. 11/06/16 00:59:43 INFO mapred.JobClient: Rack-local map tasks=18 == 11/06/16 00:59:43 INFO mapred.JobClient: Launched map tasks=32 11/06/16 00:59:43 INFO

Hadoop Runner

2011-06-11 Thread Mark question
Hi, 1) Where can I find the main class of hadoop? The one that calls the InputFormat then the MapperRunner and ReducerRunner and others? This will help me understand what is in memory or still on disk, and the exact flow of data between splits and mappers. My problem is, assuming I have a

org.apache.hadoop.mapred.Utils can not be resolved

2011-06-09 Thread Mark question
Hi, My question here is general to this problem. How can you know which jar file will solve such an error: org.apache.hadoop.mapred.Utils can not be resolved. I don't plan to include all hadoop jars ... Well, hope so .. Can you tell me your techniques? Thanks, Mark

DiskUsage class DU Error

2011-06-09 Thread Mark question
Hi, Has anyone tried using the DU class to report hdfs-file sizes? Both of the following lines are causing errors, running on Mac: DU DiskUsage = new DU(new File(outDir.getPath()), 12L); DU DiskUsage = new DU(new File(outDir.getName()), (Configuration) conf); where, Path outDir =

Re: re-reading

2011-06-08 Thread Mark question
()... 2011/6/8 Mark question markq2...@gmail.com: Hi, I'm trying to read the inputSplit over and over using following function in MapperRunner: @Override public void run(RecordReader input, OutputCollector output, Reporter reporter) throws IOException { RecordReader copyInput

Re: re-reading

2011-06-08 Thread Mark question
? Thanks, Mark On Wed, Jun 8, 2011 at 9:13 AM, Mark question markq2...@gmail.com wrote: Thanks for the replies, but input doesn't have 'clone' I don't know why ... so I'll have to write my custom inputFormat ... I was hoping for an easier way though. Thank you, Mark On Wed, Jun 8, 2011 at 1

Re: re-reading

2011-06-08 Thread Mark question
I assumed before reading the split API that it is the actual split, my bad. Thanks a lot Harsh, it's working great! Mark

re-reading

2011-06-07 Thread Mark question
Hi, I'm trying to read the inputSplit over and over using the following function in MapperRunner: @Override public void run(RecordReader input, OutputCollector output, Reporter reporter) throws IOException { RecordReader copyInput = input; //First read while(input.next(key,value));
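
A RecordReader cannot be rewound, which is why copying the reference does not help. One hypothetical way to make a second pass on the old API is to rebuild this task's FileSplit from the localized map.input.* properties and ask the InputFormat for a fresh reader; a sketch under those assumptions, not necessarily the exact fix reached later in the thread:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class RereadSketch {
      // Hypothetical helper: re-open this task's split for a second pass.
      static RecordReader<LongWritable, Text> reopen(JobConf job, Reporter reporter)
          throws IOException {
        // These localized properties describe the split assigned to this task.
        Path file = new Path(job.get("map.input.file"));
        long start = job.getLong("map.input.start", 0);
        long length = job.getLong("map.input.length", 0);
        FileSplit split = new FileSplit(file, start, length, new String[0]);
        TextInputFormat format = new TextInputFormat();
        format.configure(job);
        return format.getRecordReader(split, job, reporter);
      }
    }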

Reducing Mapper InputSplit size

2011-06-06 Thread Mark question
Hi, Does anyone have a way to reduce the InputSplit size in general? By default, the minimum size chunk that map input should be split into is set to 0 (i.e. mapred.min.split.size). Can I change dfs.block.size or some other configuration to reduce the split size and spawn many mappers? Thanks, Mark

Re: Reducing Mapper InputSplit size

2011-06-06 Thread Mark question
Great! Thanks guys :) Mark 2011/6/6 Panayotis Antonopoulos antonopoulos...@hotmail.com Hi Mark, Check: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html I think that setMaxInputSplitSize(Job job, long size)
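
From the linked Javadoc, a minimal sketch of capping the split size with the new API so more mappers are spawned; the size is illustrative:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    public class SmallSplits {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "many-mappers");
        // Cap each split at 16 MB so more map tasks are spawned,
        // without touching dfs.block.size.
        FileInputFormat.setMaxInputSplitSize(job, 16L * 1024 * 1024);
      }
    }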

SequenceFile.Reader

2011-06-02 Thread Mark question
Hi, Does anyone know if SequenceFile.next(key) is actually not reading the value into memory? next: http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next%28org.apache.hadoop.io.Writable%29

Re: SequenceFile.Reader

2011-06-02 Thread Mark question
skips to the next key? Thanks, Mark On Thu, Jun 2, 2011 at 3:49 PM, John Armstrong john.armstr...@ccri.com wrote: On Thu, 2 Jun 2011 15:43:37 -0700, Mark question markq2...@gmail.com wrote: Does anyone know if SequenceFile.next(key) is actually not reading the value into memory I think

Re: SequenceFile.Reader

2011-06-02 Thread Mark question
) would skip reading the value from disk. Mark On Thu, Jun 2, 2011 at 6:20 PM, Mark question markq2...@gmail.com wrote: Hi John, thanks for the reply. But I'm not asking about the key memory allocation here. I'm just saying what's the difference between: next(key,value) and next(key
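
A sketch of the key-only scan under discussion, assuming Text keys; per the thread's conclusion, with next(key) the value bytes are not deserialized unless getCurrentValue(...) is called:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class KeyOnlyScan {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Reader reader = new SequenceFile.Reader(fs, new Path(args[0]), conf);
        try {
          Text key = new Text(); // assumes Text keys
          while (reader.next(key)) {
            // Only the key is deserialized; call reader.getCurrentValue(value)
            // for the records whose values you actually need.
            System.out.println(key);
          }
        } finally {
          reader.close();
        }
      }
    }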

UI not working

2011-05-28 Thread Mark question
Hi, My UI for hadoop 20.2 on a single machine suddenly is giving the following errors for NN and JT web-sites respectively: HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp Powered by Jetty:// http://jetty.mortbay.org/ HTTP ERROR: 503 SERVICE_UNAVAILABLE

Increase node-mappers capacity in single node

2011-05-27 Thread Mark question
Hi, I tried changing mapreduce.job.maps to be more than 2, but since I'm running in pseudo-distributed mode, the JobTracker is local and hence this property is not changed. I'm running on a 12-core machine and would like to make use of that ... Is there a way to trick Hadoop? I also tried

Re: How to copy over using dfs

2011-05-27 Thread Mark question
I don't think so, because I read somewhere that this is to ensure the safety of the produced data. Hence Hadoop will force you to do this to know what exactly is happening. Mark On Fri, May 27, 2011 at 12:28 PM, Mohit Anchlia mohitanch...@gmail.com wrote: If I have to overwrite a file I

Re: web site doc link broken

2011-05-27 Thread Mark question
I also got the following from Learn About: Not Found The requested URL /common/docs/stable/ was not found on this server. -- Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c Server at hadoop.apache.org Port 80 Mark On Fri, May 27, 2011 at 8:03 AM, Harsh J

Re: Sorting ...

2011-05-26 Thread Mark question
that is fairly fast and a lot less dev work to get going, you might want to look at Pig. They can do a distributed order-by that is fairly good. --Bobby Evans On 5/26/11 2:45 AM, Luca Pireddu pire...@crs4.it wrote: On May 25, 2011 22:15:50 Mark question wrote: I'm using SequenceFileInputFormat

Re: one question about hadoop

2011-05-26 Thread Mark question
web.xml is in: hadoop-releaseNo/webapps/job/WEB-INF/web.xml Mark On Thu, May 26, 2011 at 1:29 AM, Luke Lu l...@vicaya.com wrote: Hadoop embeds jetty directly into hadoop servers with the org.apache.hadoop.http.HttpServer class for servlets. For jsp, web.xml is auto generated with the

Re: I can't see this email ... So to clarify ..

2011-05-25 Thread Mark question
24, 2011 at 9:26 PM, Mapred Learn mapred.le...@gmail.com wrote: Do you have the right permissions on the new dirs? Try stopping and starting the cluster... -JJ On May 24, 2011, at 9:13 PM, Mark question markq2...@gmail.com wrote: Well, you're right ... moving it to hdfs-site.xml had an effect at least

Re: Sorting ...

2011-05-25 Thread Mark question
I'm using SequenceFileInputFormat, but then what do I write in my mappers? Each mapper takes a split from the SequenceInputFile and then sorts its split?! I don't want that.. Thanks, Mark On Wed, May 25, 2011 at 2:09 AM, Luca Pireddu pire...@crs4.it wrote: On May 25, 2011 01:43:22 Mark

UI not working ..

2011-05-25 Thread Mark question
Hi, My UI for hadoop 20.2 on a single machine suddenly is giving the following errors for NN and JT web-sites respectively: HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp Powered by Jetty:// http://jetty.mortbay.org/ HTTP ERROR: 503 SERVICE_UNAVAILABLE

Re: UI not working ..

2011-05-25 Thread Mark question
Hi, My UI for hadoop 20.2 on a single machine suddenly is giving the following errors for NN and JT web-sites respectively: HTTP ERROR: 404 /dfshealth.jsp RequestURI=/dfshealth.jsp Powered by Jetty:// http://jetty.mortbay.org/ HTTP ERROR: 503 SERVICE_UNAVAILABLE

Re: get name of file in mapper output directory

2011-05-24 Thread Mark question
On Sat, May 21, 2011 at 8:03 PM, Mark question markq2...@gmail.com wrote: Hi, I'm running a job with maps only and I want by the end of each map (i.e. its close() function) to open the file that the current map has written using its output.collector. I know job.getWorkingDirectory

Re: Sorting ...

2011-05-24 Thread Mark question
:21:53 Mark question wrote: I'm trying to sort Sequence files using the Hadoop-Example TeraSort. But after taking a couple of minutes .. output is empty. snip I'm trying to find what the input format for the TeraSort is, but it is not specified. Thanks for any thought, Mark

Cannot lock storage, directory is already locked

2011-05-24 Thread Mark question
Hi guys, I'm using an NFS cluster consisting of 30 machines, but only specified 3 of the nodes to be my hadoop cluster. So my problem is this. Datanode won't start in one of the nodes because of the following error: org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage

I can't see this email ... So to clarify ..

2011-05-24 Thread Mark question
Hi guys, I'm using an NFS cluster consisting of 30 machines, but only specified 3 of the nodes to be my hadoop cluster. So my problem is this. Datanode won't start in one of the nodes because of the following error: org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage

Re: I can't see this email ... So to clarify ..

2011-05-24 Thread Mark question
On Tue, May 24, 2011 at 10:22 PM, Mark question markq2...@gmail.com wrote: Hi guys, I'm using an NFS cluster consisting of 30 machines, but only specified 3 of the nodes to be my hadoop cluster. So my problem is this. Datanode won't start in one of the nodes because of the following error

I didn't see my email sent yesterday ... So here is the question again ..

2011-05-22 Thread Mark question
Hi, I'm running a job with maps only and I want by the end of each map (i.e. in its close() function) to open the file that the current map has written using its output.collector. I know job.getWorkingDirectory() would give me the parent path of the file written, but how to get the full path or

Re: How hadoop parse input files into (Key,Value) pairs ??

2011-05-22 Thread Mark question
The case you're talking about is when you use FileInputFormat ... So usually the InputFormat interface is the one responsible for that. For FileInputFormat, it uses a LineRecordReader which will take your text file and assign the key to be the offset within your text file and the value to be the line
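
Concretely, the pair a map call receives from LineRecordReader looks like this (old-API sketch):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.*;

    public class OffsetEchoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, LongWritable, Text> {
      public void map(LongWritable offset, Text line,
                      OutputCollector<LongWritable, Text> out, Reporter reporter)
          throws IOException {
        // key = byte offset of the line in the file, value = the line itself,
        // e.g. (0, "first line"), (11, "second line").
        out.collect(offset, line);
      }
    }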

get name of file in mapper output directory

2011-05-21 Thread Mark question
Hi, I'm running a job with maps only and I want by the end of each map (i.e. its close() function) to open the file that the current map has written using its output.collector. I know job.getWorkingDirectory() would give me the parent path of the file written, but how to get the full path or the name
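
A hedged sketch of one way to resolve that path on the old API: combine FileOutputFormat.getWorkOutputPath (the task's temporary output directory, promoted on commit) with the framework's default part-file naming. The helper below is hypothetical, and inside close() the file may not yet be flushed:

    import java.io.IOException;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobConf;

    public class TaskOutputPath {
      // Hypothetical helper: path of the part file this map task writes to.
      static Path currentTaskOutput(JobConf conf) throws IOException {
        int partition = conf.getInt("mapred.task.partition", 0);
        String name = String.format("part-%05d", partition); // e.g. part-00000
        // getWorkOutputPath is the task's temporary output directory,
        // promoted into the job output directory when the task commits.
        return new Path(FileOutputFormat.getWorkOutputPath(conf), name);
      }
    }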

Sorting ...

2011-05-21 Thread Mark question
I'm trying to sort Sequence files using the Hadoop-Example TeraSort. But after taking a couple of minutes .. output is empty. HDFS has the following Sequence files: -rw-r--r-- 1 Hadoop supergroup 196113760 2011-05-21 12:16 /user/Hadoop/out/part-0 -rw-r--r-- 1 Hadoop supergroup 250935096

Re: current line number as key?

2011-05-21 Thread Mark question
What if you run a MapReduce program to generate a Sequence File from your text file where the key is the line number and the value is the whole line, then for the second job, the splits are done record-wise; hence, each mapper will be getting a split/block of records [lineNumber, line] ~Cheers, Mark On Wed,

Re: outputCollector vs. Localfile

2011-05-20 Thread Mark question
I thought it was, because of FileBytesWritten counter. Thanks for the clarification. Mark On Fri, May 20, 2011 at 4:23 AM, Harsh J ha...@cloudera.com wrote: Mark, On Fri, May 20, 2011 at 10:17 AM, Mark question markq2...@gmail.com wrote: This is puzzling me ... With a mapper producing

outputCollector vs. Localfile

2011-05-19 Thread Mark question
This is puzzling me ... With a mapper producing output of size ~ 400 MB ... which one is supposed to be faster? 1) output collector: which will write to local file then copy to HDFS since I don't have reducers. 2) Open a unique local file inside mapred.local.dir for each mapper. I

Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
Hi I need to use hadoop-tool-kit for monitoring. So I followed http://code.google.com/p/hadoop-toolkit/source/checkout and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2 and set the property mapred.performance.diagnose to true in mapred-site.xml. but I don't

Again ... Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
Sorry for the spam, but I didn't see my previous email yet. I need to use hadoop-tool-kit for monitoring. So I followed http://code.google.com/p/hadoop-toolkit/source/checkout and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2 and set a property

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
So what other memory consumption tools do you suggest? I don't want to do it manually and dump statistics into file because IO will affect performance too. Thanks, Mark On Tue, May 17, 2011 at 2:58 PM, Allen Wittenauer a...@apache.org wrote: On May 17, 2011, at 1:01 PM, Mark question wrote

Re: Hadoop tool-kit for monitoring

2011-05-17 Thread Mark question
, 2011, at 3:11 PM, Mark question wrote: So what other memory consumption tools do you suggest? I don't want to do it manually and dump statistics into file because IO will affect performance too. We watch memory with Ganglia. We also tune our systems such that a task will only

Re: How do you run HPROF locally?

2011-05-17 Thread Mark question
I usually do this setting inside my java program (in the run function) as follows: JobConf conf = new JobConf(this.getConf(), My.class); conf.set("mapred.task.profile", "true"); then I'll see some output files in that same working directory. Hope that helps, Mark On Tue, May

Re: How do you run HPROF locally?

2011-05-17 Thread Mark question
or conf.setBoolean("mapred.task.profile", true); Mark On Tue, May 17, 2011 at 4:49 PM, Mark question markq2...@gmail.com wrote: I usually do this setting inside my java program (in the run function) as follows: JobConf conf = new JobConf(this.getConf(), My.class); conf.set
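
Putting the thread's advice together, a sketch of the stock profiling knobs on the old API; the attempt ranges and HPROF options are illustrative:

    import org.apache.hadoop.mapred.JobConf;

    public class ProfiledJob {
      public static void main(String[] args) {
        JobConf conf = new JobConf(ProfiledJob.class);
        conf.setBoolean("mapred.task.profile", true);
        // Profile only a few task attempts; ranges are illustrative.
        conf.set("mapred.task.profile.maps", "0-1");
        conf.set("mapred.task.profile.reduces", "0-1");
        // HPROF options handed to the child JVM; %s becomes the output file.
        conf.set("mapred.task.profile.params",
            "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
      }
    }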

Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
Hi I'm using FileInputFormat which will split files logically according to their sizes into splits. Can the mapper get a pointer to these splits, and know which split it is assigned? I tried looking at the Reporter class to see how it prints the logical splits on the UI for each
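
While tasks cannot see the full split list (as the replies below note), a mapper can read its own split's file, offset, and length from the localized configuration; a sketch on the old API:

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;

    public class SplitAwareMapperBase extends MapReduceBase {
      private String file;
      private long start, length;

      @Override
      public void configure(JobConf job) {
        file = job.get("map.input.file");           // file this split comes from
        start = job.getLong("map.input.start", 0);  // byte offset of the split
        length = job.getLong("map.input.length", 0);
      }
    }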

I can't see my messages immediately, and sometimes they don't even arrive ... why!

2011-05-12 Thread Mark question

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
omal...@apache.org wrote: On Thu, May 12, 2011 at 8:59 PM, Mark question markq2...@gmail.com wrote: Hi I'm using FileInputFormat which will split files logically according to their sizes into splits. Can the mapper get a pointer to these splits? and know which split it is assigned

Re: how to get user-specified Job name from hadoop for running jobs?

2011-05-12 Thread Mark question
You mean by user-specified the job name you set via JobConf.setJobName("myTask")? Then using the same object you can recall your name as follows: JobConf conf; conf.getJobName(); ~Cheers Mark On Tue, May 10, 2011 at 10:16 AM, Mark Zand mz...@basistech.com wrote: While I can get

Re: Can Mapper get paths of inputSplits ?

2011-05-12 Thread Mark question
: On Thu, May 12, 2011 at 9:23 PM, Mark question markq2...@gmail.com wrote: So there is no way I can see the other possible splits (start+length)? like some function that returns strings of map.input.file and map.input.offset of the other mappers ? No, there isn't any way to do

Space needed to user SequenceFile.Sorter

2011-04-28 Thread Mark question
I don't know why I can't see my emails sent to the group immediately ... anyways, I'm sorting a sequenceFile using its sorter on my local filesystem. The inputFile size is 1937690478 bytes, but after 14 minutes of sorting I get: TEST SORTING .. java.io.FileNotFoundException: File does not

Reading from File

2011-04-26 Thread Mark question
Hi, My mapper opens a file and reads records using next(). However, I want to stop reading if there is no memory available. What confuses me here is that even though I'm reading record by record with next(), hadoop actually reads them in dfs.block.size chunks. So, I have two questions: 1. Is it true

Re: Sequence.Sorter Performance

2011-04-25 Thread Mark question
Thanks Owen ! Mark On Mon, Apr 25, 2011 at 11:43 AM, Owen O'Malley omal...@apache.org wrote: The SequenceFile sorter is ok. It used to be the sort used in the shuffle. *grin* Make sure to set io.sort.factor and io.sort.mb to appropriate values for your hardware. I'd usually use
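
A sketch putting that advice together with SequenceFile.Sorter; the key/value classes and tuning values are assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SortSeqFile {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Merge fan-in and in-memory sort buffer, as advised above;
        // values are illustrative.
        conf.setInt("io.sort.factor", 100);
        conf.setInt("io.sort.mb", 200);
        FileSystem fs = FileSystem.get(conf);
        SequenceFile.Sorter sorter =
            new SequenceFile.Sorter(fs, Text.class, Text.class, conf);
        sorter.sort(new Path[] { new Path(args[0]) }, new Path(args[1]), false);
      }
    }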

SequenceFile.Sorter performance

2011-04-24 Thread Mark question
Hi guys, I'm trying to sort a 2.5 GB sequence file in one mapper using its implemented Sort function, but it's taking so long that the map is killed for not reporting. I would increase the default time to get reports from the mapper, but I'll do this only if sorting using SequenceFile.sorter

SequenceFile.Sorter

2011-04-24 Thread Mark question
Hi guys, I'm trying to sort a 2.5 GB sequence file in one mapper using its implemented Sort function, but it's taking so long that the map is killed for not reporting. I would increase the default time to get reports from the mapper, but I'll do this only if sorting using SequenceFile.sorter