the status after the change
On Wed, May 30, 2012 at 1:27 AM, Mark question markq2...@gmail.com
wrote:
Hi guys, this is a very simple program, trying to use TextInputFormat
and SequenceFileOutputFormat. Should be easy, but I get the same error.
Here are my configurations
.
I am not getting any error.
I'm interested in this too, but could you tell me where to apply the patch,
and is the following the right command to apply it:
https://issues.apache.org/jira/secure/attachment/12416955/MAPREDUCE-336_0_20090818.patch
Hey guys, I've been stuck with the HCE installation for two days now and
can't figure out the problem.
The error I get from running (sh build.sh) is "cannot execute binary file".
I tried setting my JAVA_HOME and ANT_HOME manually and using the script
build.sh, no luck. So, please, if you've used HCE
.
--Bobby Evans
On 4/5/12 1:54 PM, Mark question markq2...@gmail.com wrote:
Hi guys,
quick question:
Are there any performance gains from hadoop streaming or pipes over
Java? From what I've read, it's only to ease testing by using your
favorite
language. So I guess
Hi guys,
Two quick questions:
1. Are there any performance gains from hadoop streaming or pipes? As
far as I've read, it is only to ease testing using your favorite language,
which I think implies that everything is translated to bytecode eventually
and executed.
Hi guys,
quick question:
Are there any performance gains from hadoop streaming or pipes over
Java? From what I've read, it's only to ease testing by using your favorite
language. So I guess it is eventually translated to bytecode then executed.
Is that true?
Thank you,
Mark
a default constructor.
On Sat, Mar 3, 2012 at 4:56 AM, Mark question markq2...@gmail.com wrote:
Hello,
I'm trying to debug my code through eclipse, which worked fine with
given Hadoop applications (eg. wordcount), but as soon as I run it on my
application with my custom sequence input
Unfortunately, public didn't change my error ... Any other ideas? Has
anyone run Hadoop in Eclipse with custom sequence inputs?
Thank you,
Mark
On Mon, Mar 5, 2012 at 9:58 AM, Mark question markq2...@gmail.com wrote:
Hi Madhu, it has the following line:
TermDocFreqArrayWritable
Starfish worked great for wordcount .. I didn't run it on my application
because I have only map tasks.
Mark
On Thu, Mar 1, 2012 at 4:34 AM, Charles Earl charles.ce...@gmail.com wrote:
How was your experience of starfish?
C
On Mar 1, 2012, at 12:35 AM, Mark question wrote:
Thank you
Hi guys, thought I should ask this before I use it ... will using C over
Hadoop give me the usual C memory management? For example, malloc(),
sizeof()? My guess is no, since all of this will eventually be turned into
bytecode, but I need more control over memory, which obviously is hard for
me to do
...@gmail.com wrote:
Mark,
So if I understand, it is more the memory management that you are
interested in, rather than a need to run an existing C or C++ application
on the MapReduce platform?
Have you done profiling of the application?
C
On Feb 29, 2012, at 2:19 PM, Mark question wrote
) to gain insight by configuring hadoop to run
the absolute minimum number of tasks?
Perhaps the discussion
http://grokbase.com/t/hadoop/common-user/11ahm67z47/how-do-i-connect-java-visual-vm-to-a-remote-task
might be relevant?
On Feb 29, 2012, at 3:53 PM, Mark question wrote:
I've used hadoop
of mapred.child.ulimit > value of mapred.child.java.opts
On Thu, Feb 16, 2012 at 12:38 AM, Mark question markq2...@gmail.com
wrote:
Thanks for the reply Srinivas, so option 2 will be enough; however, when I
tried setting it to 512MB, I see through the system monitor that the map
task is given 275MB
Hi,
My question is what's the difference between the following two settings:
1. mapred.task.default.maxvmem
2. mapred.child.java.opts
The first one is used by the TT to monitor the memory usage of tasks, while
the second one is the maximum heap space assigned for each task. I want to
limit
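A minimal sketch, not from the thread, of setting both from a driver's
run() with the old JobConf API; MyJob and the 512MB figures are
hypothetical, and maxvmem is assumed to be in bytes:

JobConf conf = new JobConf(getConf(), MyJob.class);
conf.set("mapred.child.java.opts", "-Xmx512m");                  // per-task max heap
conf.setLong("mapred.task.default.maxvmem", 512L * 1024 * 1024); // vmem limit watched by the TT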
Hi guys,
Even though there is enough space on HDFS as shown by -report ... I get the
following 2 errors, the first shown in
the log of a datanode and the second in the Namenode log:
1)2012-02-09 10:18:37,519 INFO org.apache.hadoop.hdfs.StateChange: BLOCK*
NameSystem.addToInvalidates:
to run 1 million block transfer
(in/out) threads by doing that. It does not take up resources by
default, sure, but now it can be abused with requests to make your DN
run out of memory and crash because it's not bound to proper limits now.
On Fri, Jan 27, 2012 at 5:49 AM, Mark question markq2
ulimit -a?
That should give you the number of open files allowed by a single
user...
Sent from a remote device. Please excuse any typos...
Mike Segel
On Jan 26, 2012, at 5:13 AM, Mark question markq2...@gmail.com wrote:
Hi guys,
I get this error from a job trying to process 3 million records
are on different machines.
Praveen
On Mon, Jan 9, 2012 at 11:41 PM, Mark question markq2...@gmail.com
wrote:
Hello guys,
I'm requesting from a PBS scheduler a number of machines to run Hadoop
and even though all hadoop daemons start normally on the master and slaves,
the slaves don't have worker tasks in them. Digging into that, there seems
to be some blocking between nodes (?); I don't know how to
mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:10001</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1024m</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>10</value>
  </property>
) in there or
it won't pick up the correct configs.
-Joey
On Sun, Jan 8, 2012 at 12:59 PM, Mark question markq2...@gmail.com
wrote:
Hello,
I'm running two jobs on Hadoop-0.20.2 consecutively, such that the second
one reads the output of the first which would look like:
outputPath/part-0
outputPath/_logs
But I get the error:
12/01/06 03:29:34 WARN fs.FileSystem: localhost:12123 is a deprecated
filesystem name.
What is fs.default.name set to? It should be set to hdfs://host:port
and not just host:port. Can you ensure this and retry?
On 06-Jan-2012, at 5:45 PM, Mark question wrote:
Hi,
I've been getting this error multiple times now, the namenode mentions
something about peer resetting connection, but I don't know why this is
happening, because I'm running on a single machine with 12 cores. Any
ideas?
The job starts running normally, and contains about 200 mappers
Hi all,
I'm wondering if there is a way to get the output messages that are printed
from the main class of a Hadoop job.
Usually redirecting 2>&1 to out.log would work, but in this case it only
saves the output messages printed in the main class before starting the job.
What I want is the output messages that
I have the same issue and the output of curl localhost:50030 is like
yours, and I'm running on a remote cluster in pseudo-distributed mode.
Can anyone help?
Thanks,
Mark
On Mon, Oct 24, 2011 at 11:02 AM, Sameer Farooqui
cassandral...@gmail.com wrote:
Hi guys,
I'm running a 1-node Hadoop
ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
iptables: Applying firewall rules: [ OK ]
On Mon, Oct 24, 2011 at 1:37 PM, Mark question markq2...@gmail.com
wrote:
I have the same issue and the output of curl
Hello,
I wonder if there is a way to measure how many of the data blocks have been
transferred over the network? Or more generally, how many times was there
a connection/contact between different machines?
I thought of checking the Namenode log file, which usually shows blk_
from src= to dst
Hi all,
I've written a custom MapRunner, but it seems to have ruined the percentage
shown for maps on the console. I want to know which part of the code is
responsible for adjusting the percentage of maps ... Is it the following in
MapRunner:
if (incrProcCount) {
of lines
into a buffer,
Actually, for TextInputFormat, it reads io.file.buffer.size bytes of text
into a buffer each time;
this can be seen from the hadoop source file LineReader.java
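A small sketch of that buffering behavior, assuming a Hadoop version where
org.apache.hadoop.util.LineReader is public; the file name is hypothetical
and the usual conf/fs/io imports are assumed:

Configuration conf = new Configuration();   // io.file.buffer.size defaults to 4096
FileSystem fs = FileSystem.get(conf);
LineReader reader = new LineReader(fs.open(new Path("input.txt")), conf);
Text line = new Text();
while (reader.readLine(line) > 0) {         // one line per call, buffered reads underneath
    System.out.println(line);
}
reader.close();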
2011/10/5 Mark question markq2...@gmail.com
Hello,
Correct me if I'm wrong, but when
Hello,
Correct me if I'm wrong, but when a program opens n files at the same time
to read from, and starts reading from each file one line at a time:
isn't Hadoop actually fetching dfs.block.size worth of lines into a buffer,
and not just one line?
If this is correct, I set up my
Hi,
I have my custom MapRunner, which apparently seems to affect the progress
report of the mapper, showing 100% while the mapper is still reading
files to process. Where can I change/add a progress object to be shown in
the UI?
Thank you,
Mark
Hi Govind,
You should override the isSplitable function of FileInputFormat in a
class, say MyFileInputFormat extends FileInputFormat, as follows:
@Override
public boolean isSplitable(FileSystem fs, Path filename) {
    return false;
}
Then use your MyFileInputFormat class.
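For example, registering it in the driver with the old mapred API; MyJob is
a hypothetical driver class:

JobConf conf = new JobConf(getConf(), MyJob.class);
conf.setInputFormat(MyFileInputFormat.class);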
Hi, this is weird ... I'm running a job on a single node with 32 mappers,
running one at a time.
The output says this: ..
11/06/16 00:59:43 INFO mapred.JobClient: Rack-local map tasks=18
==
11/06/16 00:59:43 INFO mapred.JobClient: Launched map tasks=32
11/06/16 00:59:43 INFO
Hi,
1) Where can I find the main class of hadoop? The one that calls the
InputFormat, then the MapperRunner and ReducerRunner and others?
This will help me understand what is in memory or still on disk, and the
exact flow of data between splits and mappers.
My problem is, assuming I have a
Hi,
My question here is general to this problem. How can you know which jar
file will solve such an error:
org.apache.hadoop.mapred.Utils can not be resolved.
I don't plan to include all hadoop jars ... Well, hope so .. Can you
tell me your techniques?
Thanks,
Mark
Hi,
Has anyone tried using the DU class to report hdfs-file sizes?
Both of the following lines are causing errors, running on Mac:
DU diskUsage = new DU(new File(outDir.getPath()), 12L);
DU diskUsage = new DU(new File(outDir.getName()), (Configuration) conf);
where, Path outDir =
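For the archive, a hedged sketch of how org.apache.hadoop.fs.DU is
typically driven once it constructs cleanly; note DU shells out to the
platform's du command, which may behave differently on Mac. outDir and
conf are assumed to be as above:

DU du = new DU(new File(outDir.toString()), (Configuration) conf);
du.start();                 // starts the periodic refresh thread
long bytes = du.getUsed();  // last measured usage, in bytes
du.shutdown();              // stop the refresh thread when done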
()...
2011/6/8 Mark question markq2...@gmail.com:
?
Thanks,
Mark
On Wed, Jun 8, 2011 at 9:13 AM, Mark question markq2...@gmail.com wrote:
Thanks for the replies, but input doesn't have 'clone'; I don't know why ...
so I'll have to write my custom InputFormat ... I was hoping for an easier
way though.
Thank you,
Mark
On Wed, Jun 8, 2011 at 1
I assumed before reading the split API that it is the actual split; my bad.
Thanks a lot Harsh, it's working great!
Mark
Hi,
I'm trying to read the InputSplit over and over using the following
function in MapperRunner:
@Override
public void run(RecordReader input, OutputCollector output, Reporter
reporter) throws IOException {
    RecordReader copyInput = input; // copies only the reference, not the stream
    // First read
    while (input.next(key, value));
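One hedged way around this, rather than aliasing the reader: build a fresh
reader over the same split. This sketch assumes the old mapred API, a text
split reachable via reporter.getInputSplit(), and that job is the JobConf;
for other formats you'd construct the matching RecordReader instead:

FileSplit split = (FileSplit) reporter.getInputSplit(); // this mapper's own split
RecordReader<LongWritable, Text> again = new LineRecordReader(job, split);
// 'again' starts over at the beginning of the same byte range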
Hi,
Does anyone have a way to reduce the InputSplit size in general?
By default, the minimum size chunk that map input should be split into is
set to 0 (i.e. mapred.min.split.size). Can I change dfs.block.size or some
other configuration to reduce the split size and spawn many mappers?
Thanks,
Mark
Great! Thanks guys :)
Mark
2011/6/6 Panayotis Antonopoulos antonopoulos...@hotmail.com
Hi Mark,
Check:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapreduce/lib/input/FileInputFormat.html
I think that setMaxInputSplitSize(Job job,
long size)
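For example, a sketch with the new mapreduce API; the 16 MB cap is a
hypothetical value:

Job job = new Job(conf, "many-mappers");
FileInputFormat.setMaxInputSplitSize(job, 16 * 1024 * 1024);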
Hi,
Does anyone know whether SequenceFile.next(key) actually avoids reading the
value into memory, i.e. whether
next (http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.Reader.html#next%28org.apache.hadoop.io.Writable%29)
skips to the next key?
Thanks,
Mark
On Thu, Jun 2, 2011 at 3:49 PM, John Armstrong john.armstr...@ccri.com wrote:
On Thu, 2 Jun 2011 15:43:37 -0700, Mark question markq2...@gmail.com
wrote:
Does anyone know whether SequenceFile.next(key) actually avoids reading
the value into memory
I think
) would skip reading value from disk.
Mark
On Thu, Jun 2, 2011 at 6:20 PM, Mark question markq2...@gmail.com wrote:
Hi John, thanks for the reply. But I'm not asking about the key memory
allocation here. I'm just asking what's the difference between:
next(key, value) and next(key
Hi,
My UI for Hadoop 0.20.2 on a single machine is suddenly giving the following
errors for the NN and JT web pages respectively:
HTTP ERROR: 404
/dfshealth.jsp
RequestURI=/dfshealth.jsp
*Powered by Jetty:// http://jetty.mortbay.org/*
HTTP ERROR: 503
SERVICE_UNAVAILABLE
Hi,
I tried changing mapreduce.job.maps to be more than 2, but since I'm
running in pseudo-distributed mode, the JobTracker is local and hence this
property is not changed.
I'm running on a 12-core machine and would like to make use of that ... Is
there a way to trick Hadoop?
I also tried
I don't think so, because I read somewhere that this is to ensure the safety
of the produced data. Hence Hadoop will force you to do this so you know
what exactly is happening.
Mark
On Fri, May 27, 2011 at 12:28 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
If I have to overwrite a file I
I also got the following from the "learn about" link:
Not Found
The requested URL /common/docs/stable/ was not found on this server.
--
Apache/2.3.8 (Unix) mod_ssl/2.3.8 OpenSSL/1.0.0c Server at
hadoop.apache.org Port 80
Mark
On Fri, May 27, 2011 at 8:03 AM, Harsh J
that is fairly fast and a lot less dev work to
get going, you might want to look at Pig. It can do a distributed order-by
that is fairly good.
--Bobby Evans
On 5/26/11 2:45 AM, Luca Pireddu pire...@crs4.it wrote:
On May 25, 2011 22:15:50 Mark question wrote:
I'm using SequenceFileInputFormat
web.xml is in:
hadoop-releaseNo/webapps/job/WEB-INF/web.xml
Mark
On Thu, May 26, 2011 at 1:29 AM, Luke Lu l...@vicaya.com wrote:
Hadoop embeds jetty directly into hadoop servers with the
org.apache.hadoop.http.HttpServer class for servlets. For jsp, web.xml
is auto generated with the
24, 2011 at 9:26 PM, Mapred Learn mapred.le...@gmail.com wrote:
Do you have the right permissions on the new dirs?
Try stopping and starting the cluster...
-JJ
On May 24, 2011, at 9:13 PM, Mark question markq2...@gmail.com wrote:
Well, you're right ... moving it to hdfs-site.xml had an effect at
least
I'm using SequenceFileInputFormat, but then what do I write in my mappers?
Each mapper takes a split from the SequenceInputFile and then sorts its
split?! I don't want that..
Thanks,
Mark
On Wed, May 25, 2011 at 2:09 AM, Luca Pireddu pire...@crs4.it wrote:
On May 25, 2011 01:43:22 Mark
On Sat, May 21, 2011 at 8:03 PM, Mark question markq2...@gmail.com
wrote:
Hi,
I'm running a job with maps only and I want, by the end of each map
(i.e. its close() function), to open the file that the current map has
written using its output.collector.
I know job.getWorkingDirectory
:21:53 Mark question wrote:
I'm trying to sort Sequence files using the Hadoop-Example TeraSort. But
after taking a couple of minutes .. output is empty.
snip
I'm trying to find what the input format for the TeraSort is, but it is
not
specified.
Thanks for any thought,
Mark
Hi guys,
I'm using an NFS cluster consisting of 30 machines, but only specified 3 of
the nodes to be my hadoop cluster. So my problem is this. Datanode won't
start in one of the nodes because of the following error:
org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage
Hi,
I'm running a job with maps only and I want, by the end of each map (i.e. in
its close() function), to open the file that the current map has written
using its output.collector.
I know job.getWorkingDirectory() would give me the parent path of the
file written, but how do I get the full path or
The case you're talking about is when you use FileInputFormat ... So usually
the InputFormat interface is the one responsible for that.
For FileInputFormat, it uses a LineRecordReader, which will take your text
file and assign the key to be the offset within your text file and the
value to be the line
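A sketch of that key/value contract in a mapper, old mapred API; the
identity-style body is just for illustration and the org.apache.hadoop.io
and org.apache.hadoop.mapred imports are assumed:

public class LineMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, LongWritable, Text> {
  public void map(LongWritable offset, Text line,
                  OutputCollector<LongWritable, Text> out, Reporter reporter)
      throws IOException {
    // offset = byte position of this line in the file, line = its contents
    out.collect(offset, line);
  }
}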
I'm trying to sort Sequence files using the Hadoop-Example TeraSort. But
after taking a couple of minutes .. output is empty.
HDFS has the following Sequence files:
-rw-r--r-- 1 Hadoop supergroup 196113760 2011-05-21 12:16
/user/Hadoop/out/part-0
-rw-r--r-- 1 Hadoop supergroup 250935096
What if you run a MapReduce program to generate a SequenceFile from your
text file, where the key is the line number and the value is the whole line;
then for the second job, the splits are done record-wise, hence each mapper
will be getting a split/block of records [lineNumber, line]. ~Cheers,
Mark
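A hedged sketch of that first job's output, shown as a plain local writer
rather than a full MR job; the file names are hypothetical and the usual
conf/fs/io imports plus java.io are assumed:

Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
SequenceFile.Writer writer = SequenceFile.createWriter(
    fs, conf, new Path("lines.seq"), LongWritable.class, Text.class);
BufferedReader in = new BufferedReader(new FileReader("input.txt"));
long lineNo = 0;
for (String line; (line = in.readLine()) != null; ) {
    writer.append(new LongWritable(lineNo++), new Text(line)); // key = line number
}
writer.close();
in.close();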
On Wed,
I thought it was, because of the FileBytesWritten counter. Thanks for the
clarification.
Mark
On Fri, May 20, 2011 at 4:23 AM, Harsh J ha...@cloudera.com wrote:
Mark,
On Fri, May 20, 2011 at 10:17 AM, Mark question markq2...@gmail.com
wrote:
This is puzzling me ...
With a mapper producing output of size ~400 MB ... which one is supposed
to be faster?
1) The output collector: which will write to a local file then copy to HDFS,
since I don't have reducers.
2) Opening a unique local file inside mapred.local.dir for each mapper.
I
Hi
I need to use hadoop-tool-kit for monitoring. So I followed
http://code.google.com/p/hadoop-toolkit/source/checkout
and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2
and set the property "mapred.performance.diagnose" to true in
mapred-site.xml.
but I don't
Sorry for the spam, but I didn't see my previous email yet.
I need to use hadoop-tool-kit for monitoring. So I followed
http://code.google.com/p/hadoop-toolkit/source/checkout
and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2
and set a property
So what other memory consumption tools do you suggest? I don't want to do it
manually and dump statistics into a file, because the IO will affect
performance too.
Thanks,
Mark
On Tue, May 17, 2011 at 2:58 PM, Allen Wittenauer a...@apache.org wrote:
On May 17, 2011, at 1:01 PM, Mark question wrote
We watch memory with Ganglia. We also tune our systems such that
a task will only
I usually do this setting inside my Java program (in the run function) as
follows:
JobConf conf = new JobConf(this.getConf(), My.class);
conf.set("mapred.task.profile", "true");
then I'll see some output files in that same working directory.
Hope that helps,
Mark
On Tue, May
or conf.setBoolean("mapred.task.profile", true);
Mark
On Tue, May 17, 2011 at 4:49 PM, Mark question markq2...@gmail.com wrote:
Hi
I'm using FileInputFormat, which will split files logically according to
their sizes into splits. Can the mapper get a pointer to these splits, and
know which split it is assigned?
I tried looking at the Reporter class to see how it prints the
logical splits on the UI for each
omal...@apache.org wrote:
On Thu, May 12, 2011 at 8:59 PM, Mark question markq2...@gmail.com
wrote:
you mean by user-specified is when you set your job name via
JobConf.setJobName("myTask")?
Then using the same object you can recall your name as follows:
JobConf conf;
conf.getJobName();
~Cheers
Mark
On Tue, May 10, 2011 at 10:16 AM, Mark Zand mz...@basistech.com wrote:
While I can get
:
On Thu, May 12, 2011 at 9:23 PM, Mark question markq2...@gmail.com
wrote:
So there is no way I can see the other possible splits (start+length)?
Like some function that returns the map.input.file and map.input.offset
strings of the other mappers?
No, there isn't any way to do
I don't know why I can't see my emails immediately after sending to the
group ... anyways,
I'm sorting a SequenceFile using its sorter on my local filesystem. The
input file size is 1937690478 bytes.
but after 14 minutes of sorting.. I get :
TEST SORTING ..
java.io.FileNotFoundException: File does not
Hi,
My mapper opens a file and reads records using next(). However, I want to
stop reading if there is no memory available. What confuses me here is that
even though I'm reading record by record with next(), hadoop actually reads
them in dfs.block.size chunks. So, I have two questions:
1. Is it true
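One hedged way to implement the "stop reading if there is no memory
available" idea is to check JVM headroom between next() calls; the 64 MB
floor is a hypothetical threshold:

Runtime rt = Runtime.getRuntime();
long headroom = rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
if (headroom < 64L * 1024 * 1024) {
    // stop calling next() and process what has been buffered so far
}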
Thanks Owen !
Mark
On Mon, Apr 25, 2011 at 11:43 AM, Owen O'Malley omal...@apache.org wrote:
The SequenceFile sorter is ok. It used to be the sort used in the shuffle.
*grin*
Make sure to set io.sort.factor and io.sort.mb to appropriate values for
your hardware. I'd usually use
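For example, with hypothetical values to be tuned per hardware; SortJob is
a stand-in driver class:

JobConf conf = new JobConf(getConf(), SortJob.class);
conf.setInt("io.sort.factor", 100); // streams merged at once
conf.setInt("io.sort.mb", 200);     // memory for sorting, in MB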
Hi guys,
I'm trying to sort a 2.5 GB sequence file in one mapper using its
implemented sort function, but it's taking so long that the map is killed
for not reporting.
I would increase the default time to get reports from the mapper, but I'll
do this only if sorting using SequenceFile.sorter