Can I change the configured capacity? Or is it set up automatically by
Hadoop based on the available resources?
Thanks,
Maha
from memory .. right? If yes, what parameter is used for the buffer size?
Thank you,
Maha
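A hedged sketch for the buffer-size question above, assuming it is about the stream that SequenceFile.Reader reads through; the relevant knob there is the generic io.file.buffer.size property (default 4096 bytes), and the path argument here is hypothetical:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;

public class ReaderBufferExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.setInt("io.file.buffer.size", 65536); // 64 KB instead of the 4 KB default
    SequenceFile.Reader reader =
        new SequenceFile.Reader(FileSystem.get(conf), new Path(args[0]), conf);
    reader.close();
  }
}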
On Mar 31, 2011, at 11:59 PM, Harsh J wrote:
> On Fri, Apr 1, 2011 at 9:00 AM, maha wrote:
>> Hello Everyone,
>>
>>As far as I know, when my java program opens a sequence file
with input of about 6 MB, the memory allocated was
13 MB! .. which might be a fragmentation problem, but I doubt it.
Thank you,
Maha
ase
that is shown in the UI .. which I think is supposed to be ... right?
Thank you,
Maha
> Hello,
>
> My map tasks are freezing after 100% .. I'm suspecting my mapper.close().
>
> output is the following:
>
> 11/03/30 08:13:54 INFO mapred.JobClient: map 9...% reduce 0%
...
Thank you for any thought,
Maha
...does the mapper get the data
record-by-record from memory?
Assuming the single-thread mapper class.
Thanks,
Maha
On Mar 22, 2011, at 11:22 AM, Harsh J wrote:
> NullOutputFormat
I get: java.io.IOException: Undefined
job output-path
How can I tell the job configuration not to prepare an output path (or anything
produced by output.collect()) ?
Thank you,
Maha
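A minimal sketch of Harsh's NullOutputFormat suggestion, assuming the old 0.20 JobConf API (NoOutputJob is a placeholder driver class): with it the job needs no output path, and anything passed to output.collect() is discarded.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NullOutputFormat;

public class NoOutputJob {
  public static void main(String[] args) {
    JobConf conf = new JobConf(NoOutputJob.class);
    conf.setOutputFormat(NullOutputFormat.class); // discards all collected output
    // note: no FileOutputFormat.setOutputPath(...) call is required now
  }
}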
The Reader idea worked fine I guess :) Thanks,
Maha
On Mar 22, 2011, at 1:28 AM, Harsh J wrote:
> I do not know of an API-side thing that does this, but basically the
> first three bytes of a given sequence file would be 'S', 'E', 'Q'
> (which is ch
Hello,
Is there a way to check if a file foo is a Hadoop SequenceFile ?
Thanks,
Maha
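A minimal sketch of the byte check Harsh describes above: a SequenceFile begins with the three magic bytes 'S', 'E', 'Q' (followed by a version byte), so peeking at the header answers the question.

import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class IsSequenceFile {
  public static boolean check(FileSystem fs, Path p) throws IOException {
    FSDataInputStream in = fs.open(p);
    try {
      byte[] magic = new byte[3];
      in.readFully(magic); // first three bytes of the header
      return magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q';
    } finally {
      in.close();
    }
  }
}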
That's absolutely correct :) thanks Simon.
Maha
On Mar 19, 2011, at 7:13 PM, Simon wrote:
> It is hard to judge without the code. But my guess is that your
> TermFreqArrayWritable
> is not properly compiled or imported into your job control file.
>
> HTH.
> Simon
>
and TermFreqArrayWritable is inside the same project under a default package.
Has any one tried their custom Writable with SequenceFiles ?
Thank you,
Maha
causes a NullPointerException.
>
Do you mean I have to use Integer objects instead of primitive types ??
> You can fix this in CreateNewVector(), by explicitly allocating a new
> twoInteger object for each location in the "vector" array, or in the
> readFields() loop.
Hello,
I'm stuck with this for two days now ... I found a previous post discussing
this, but not with arrays.
I know how to write a Writable class with primitive-type elements, but this time
I'm using an ARRAY of primitive-type elements; here it is in a simplified
version for easy readability:
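The simplified code did not survive in this archive; below is a hedged reconstruction of the pattern under discussion (the element type and names are illustrative, not the original ones): serialize the array length first, and allocate fresh storage for every slot inside readFields(), as the fix quoted above suggests.

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class IntArrayWritable implements Writable {
  private int[] vector = new int[0];

  public void write(DataOutput out) throws IOException {
    out.writeInt(vector.length); // length prefix lets readFields size the array
    for (int v : vector) {
      out.writeInt(v);
    }
  }

  public void readFields(DataInput in) throws IOException {
    int len = in.readInt();
    vector = new int[len]; // allocate here; never reuse stale contents
    for (int i = 0; i < len; i++) {
      vector[i] = in.readInt();
    }
  }
}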
I found it :)
http://hadoop.apache.org/mapreduce/docs/current/api/org/apache/hadoop/mapreduce/lib/map/MultithreadedMapper.html
Maha
On Mar 15, 2011, at 2:18 PM, maha wrote:
> By the way, how do I know if my map task is single threaded (ie. one thread
> executing for each record ) ? and
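A minimal usage sketch for the MultithreadedMapper linked above (new mapreduce API; InnerMapper stands in for your own single-threaded map logic):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper;

public class MultithreadedSetup {
  public static class InnerMapper extends Mapper<Object, Object, Object, Object> {
    // your existing single-threaded map() logic goes here
  }

  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "multithreaded-example");
    job.setMapperClass(MultithreadedMapper.class);
    MultithreadedMapper.setMapperClass(job, InnerMapper.class);
    MultithreadedMapper.setNumberOfThreads(job, 4); // threads per map task
  }
}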
By the way, how do I know if my map task is single threaded (ie. one thread
executing for each record ) ? and how to change that into multi-threading ?
Thank you,
Maha
On Mar 12, 2011, at 9:11 PM, Harsh J wrote:
> Hello,
>
> On Sat, Mar 12, 2011 at 3:51 PM, Jun Young Kim wrote:
Try running ...
$ bin/hadoop dfs -lsr /
to view your HDFS file system ... do you see your input file in there?
Maha
On Mar 14, 2011, at 10:46 AM, vishalgoyal wrote:
> hello,
>
> i am new user to hadoop. when i tried to run a task, it successfully
> compiled my file wo
I'd better restate my problem as it turns out to be my SequenceFile.Writer.
Thanks everyone,
Maha
On Mar 14, 2011, at 10:29 AM, maha wrote:
> Hi,
>
> I'm using SequenceFileAsBinaryOutputFormat to write the job output. Both
> Reduce key,value are of type BytesWritable.
adoop/hadoop-0.20.2/SeqFile at 0
By the way, I "-copyToLocal" the output file then try to read it using the
SequenceFile.reader.
Any idea is appreciated.
Maha
...overcome this problem, which I don't
appreciate :( If you have any other idea, let me know.
Thank you,
Maha
On Mar 10, 2011, at 6:44 PM, Harsh J wrote:
> Once you have a JobConf/Configuration conf object in your Mapper (via
> setup/configure methods), you can do the following to get the
The answer is NO. After checking, I realized that my
mapper's HDFS isn't the same as the HDFS in my main function.
How can I open the same HDFS in maps as the one used in main?
Thank you,
Maha
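A minimal sketch of the fix Harsh hints at above: build the FileSystem from the job's own configuration inside configure(), instead of from a fresh Configuration object, so the mapper talks to the same HDFS (fs.default.name) as the driver (old 0.20 API assumed):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class HdfsAwareMapperBase extends MapReduceBase {
  protected FileSystem fs;

  @Override
  public void configure(JobConf job) {
    try {
      fs = FileSystem.get(job); // same fs.default.name as the submitted job
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
}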
So you're suggesting that using HBase would be an alternative to creating my own
stuff?!! By the way, why don't you use binary inputs? Do you think it's not
going to have a great effect on performance?
Thanks Mike.
On Mar 9, 2011, at 5:27 PM, Michael Segel wrote:
>
>
eliminate the
benefits of using Binary files.
If I decided to write my own InputFormat that defines splits based on my
binary protocol, and a RecordReader also based on my binary protocol,
will that interfere with the streaming stuff? Or is it doable?
Thank you,
Maha
Thanks again Harsh, I actually got the book 2 days ago, but didn't have time to
read it yet.
Maha
On Mar 4, 2011, at 7:54 PM, Harsh J wrote:
> Hi,
>
> On Sat, Mar 5, 2011 at 9:03 AM, maha wrote:
>> Hi,
>>
>> I have 2 questions:
>>
>> 1) Is a Se
InputFormat, Do I need to stick to the
header protocol defined in
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/io/SequenceFile.html
?
Thanks everyone,
Maha
Hi,
Using 3 machines, each of which has an input file 'f' on its local disk in addition
to HDFS, and assuming my program spawns a mapper per file.
Does that mean that mappers will be running on different machines?
Thank you,
Maha
...conf.get("map.input.file");
}
:)
Maha
On Mar 2, 2011, at 9:03 PM, maha wrote:
> Thanks Harsh!!! but you don't think there is another way for the mappers to
> get configuration and access 'map.input.file' because I didn't write my
> recordReader.
>
> I truly apprecia
Thanks Harsh!!! But don't you think there is another way for the mappers to get
the configuration and access 'map.input.file'? Because I didn't write my own
RecordReader.
I truly appreciate it.
Maha
On Mar 2, 2011, at 8:09 PM, Harsh J wrote:
> The property 'map.input
Hi,
If FileInputFormat is used with isSplitable() returning false, then each mapper will
get a full file. I want the mapper to also know the path, or at least the name,
of the file it's assigned.
Please help, any ideas are appreciated.
Thank you,
Maha
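A minimal sketch of the 'map.input.file' answer from the replies above (old 0.20 API): the stock FileInputFormat publishes the current file's path in each map task's JobConf, so no custom RecordReader is needed.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;

public class FileAwareMapperBase extends MapReduceBase {
  protected String inputFile;

  @Override
  public void configure(JobConf job) {
    inputFile = job.get("map.input.file"); // full path of this task's input file
  }
}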
In pseudo-distributed mode, it actually just "moves" the copy and does not
reproduce it :)
Thanks anyways,
Maha
On Mar 2, 2011, at 1:04 PM, maha wrote:
> Thanks Mike :)
>
> I was also wondering what if:
>
> hdfs.CopyToLocal( src-file, dst-file) ; // is executed
just move
that copy to the dst-file path?
OR
Will HDFS go ahead with the copy, so that node N will have two copies of the
src-file (ie. one in the HDFS namespace and another in the local file system)?
Thanks,
Maha
On Mar 2, 2011, at 12:38 PM, Michael Segel wrote:
>
>
> Run is local to
Hi,
Assuming my program implements ToolRunner, my question is: where does the
"run" function execute? ie. on which daemon (DataNode/TT)? Or is it on the local
machine where it is run?
Thank you,
Maha
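A skeleton matching Mike's (truncated) answer above, assuming the usual ToolRunner pattern: run() executes in the client JVM on the machine where 'bin/hadoop jar ...' is invoked, not inside any DataNode/TaskTracker daemon; only the map and reduce tasks run on the cluster.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyDriver extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // job setup and submission happen here, still on the client machine
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new Configuration(), new MyDriver(), args));
  }
}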
Hi,
Is it right that Map output bytes and Map FILE_BYTES_WRITTEN are a
little different because of serialization to store in sequence files?
Thank you,
Maha
0 9 9
Thanks again,
Maha
On Feb 26, 2011, at 8:45 PM, maha wrote:
> Ok got this point, thanks Harsh. But my experiment now is to eliminate # of
> spilled records for this small light job.
>
> This part of the map log:
> 2011-02-26 16:05:35,307 INFO org.apache.had
...Harsh J wrote:
> Hello,
>
> On Sun, Feb 27, 2011 at 9:30 AM, maha wrote:
>> 2011-02-26 16:05:35,571 INFO org.apache.hadoop.mapred.MapTask: Finished
>> spill 0 <--- WHY IS THIS ZERO WHEN FINAL JOB COUNTER
>> SAYS IT'S 9 SPILLED RECORDS FROM MA
...the io.sort.factor and io.sort.mb
parameters, but no luck. Is that how it's supposed to be???
Please any idea would be helpful.
Thank you,
Maha
...the FileSystem instance created is distributed,
but the job counters never show it used for intermediate results (e.g. for
reducers to read map outputs).
So if you can answer my question further, I'd truly appreciate it!
Maha
On Feb 25, 2011, at 12:00 PM, Harsh J wrote:
> From what I could gather, all FileSystem instanc
user to see.
Thank you in advance,
Maha
and other times they're
different (nothing else was changed).
Please any explanation is appreciated !
Thank you,
Maha
On Feb 24, 2011, at 11:00 AM, maha wrote:
> Silly question..
>
>
> bin/hadoop dfs -
has a size of 83 bytes??
Thanks,
Maha
Hi Yang,
The problem could be solved using the following link:
http://www.roseindia.net/java/java-get-example/get-memory-usage.shtml
You need to work with the garbage collector and its finalization step to
measure memory accurately.
Good Luck,
Maha
On Feb 23, 2011
Based on the Java documentation, the function gives only an approximation of
the available memory, so I need to combine it with other calls.
So it's a Java issue, not a Hadoop one.
Thanks anyways,
Maha
On Feb 23, 2011, at 6:31 PM, maha wrote:
> Hello Everyone,
>
> I'm using &
Hello Everyone,
I'm using " Runtime.getRuntime().freeMemory()" to see current memory
available before and after creation of an object, but this doesn't seem to work
well with Hadoop?
Why? and is there another alternative?
Thank you,
Maha
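A hedged sketch of the measurement described earlier in the thread: nudge the garbage collector and wait for finalizers before sampling, since freeMemory() alone reflects an arbitrary GC state (which is how 6 MB of input can appear to cost 13 MB).

public class MemorySample {
  static long usedBytes() {
    Runtime rt = Runtime.getRuntime();
    rt.gc();
    rt.runFinalization();
    rt.gc(); // two passes give finalized objects a chance to be reclaimed
    return rt.totalMemory() - rt.freeMemory();
  }

  public static void main(String[] args) {
    long before = usedBytes();
    byte[] blob = new byte[6 * 1024 * 1024]; // stand-in for the ~6 MB input
    long after = usedBytes();
    System.out.println("delta = " + (after - before) + " for " + blob.length + " bytes");
  }
}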
Thanks a bunch Saurabh! I'd better start optimizing my code then :)
Maha
On Feb 22, 2011, at 3:26 PM, Saurabh Dutta wrote:
> Even if you have 4 GB RAM you should be able to optimize spills. I don't
> think it should be an issue. What you need to do is write the program
...memory being 4 GB??
I'm using the pseudo distributed mode.
Thank you,
Maha
On Feb 21, 2011, at 7:46 PM, Saurabh Dutta wrote:
> Hi Maha,
>
> The spilled record has to do with the transient data during the map and
> reduce operations. Note that it's not just the
Does changing io.sort.record.percent to .9 instead of .8 possibly produce
unexpected exceptions?
Thank you,
Maha
How can I then produce one output file per mapper, not per map task?
Thank you,
Maha
On Feb 20, 2011, at 10:22 PM, Ted Dunning wrote:
> This is the most important thing that you have said. The map function
> is called once per unit of input but the mapper object persists for
> many input
Thanks for your answers Ted and Jim :)
Maha
On Feb 21, 2011, at 6:41 AM, Jim Falgout wrote:
> Your scenario matches the capability of NLineInputFormat exactly, so that
> looks to be the best solution. If you wrote your own input format, it would
> have to basically do what NL
Yet the map function was invoked 16 times, as described by
NLineInputFormat. I want the map function to run once for the whole input split
of 5 lines, and not once for each of the 16 lines.
Any ideas other than building my own InputFormat?
Thank you,
Maha
On Feb 20, 2011, at 11:59 AM, maha wrote:
...setInt("mapred.line.input.format.linespermap", 5); // # of lines per mapper = 5
If you have any thoughts on whether the above solution is worse than writing my
own InputSplit of about 5 lines, let me know.
Thanks everyone !
Maha
On Feb 20, 2011, at 11:47 AM, maha wrote:
> Hi again J
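A minimal sketch of the NLineInputFormat setup quoted above (old 0.20 API; NLineSetup is a placeholder driver class). Note the caveat from this thread: each split holds N lines, but map() is still invoked once per line within the split.

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.NLineInputFormat;

public class NLineSetup {
  public static void main(String[] args) {
    JobConf conf = new JobConf(NLineSetup.class);
    conf.setInputFormat(NLineInputFormat.class);
    conf.setInt("mapred.line.input.format.linespermap", 5); // 5 lines per split
  }
}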
...like map1 has 8 lines and
map2 has 8 lines.
So, first question: is there a difference between mappers and maps?
Second: does that mean I need to write my own InputFormat to make the
InputSplit span multiple lines?
Thank you,
Maha
On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote
Thanks Ted and Jim :)
Maha
On Feb 18, 2011, at 11:55 AM, Jim Falgout wrote:
> That's right. The TextInputFormat handles situations where records cross
> split boundaries. What your mapper will see is "whole" records.
>
> -Original Message-
> From: ma
that right?
Thank you,
Maha
...opening a file.
Is there a faster way to do that, such as background loggers saving mappers'
output??
Thank you,
Maha
...'#'+word1.substring(word1.indexOf(',')+1, word1.indexOf('>'))+'#'));
Yet the intermediate output still includes "d1":
#d1##1#
#d1##2#
#d1##1#
#d1##5#
#d1##3#
..
I put '#' to see if there was a space or newline included. Any ideas?
Thank you,
Maha
Thanks Ted. Then I have to write my own InputFormat to read a block of lines
per mapper.
NLineInputFormat didn't work for me; any working example of it would be
appreciated.
Thanks again,
Maha
On Feb 7, 2011, at 6:32 PM, Mark Kerzner wrote:
> Thanks!
> Mark
>
> On Mo
) will be slower because of scheduling time ?
Thank you,
Maha
Thanks Ted, I needed to know that there is no way I can make my program less
IO-intensive.
Maha
On Feb 7, 2011, at 12:04 PM, Ted Dunning wrote:
> That isn't going to happen.
>
> Remember that all of the mappers are running in different JVM's on
> (typically) different
My question is simply how to have a global variable (e.g. a Hashtable) in Hadoop,
available to all mappers. Please help,
Thank you,
Maha
On Feb 7, 2011, at 11:21 AM, maha wrote:
> Thanks Vijay, now my question is how can I build one inverted index and have
> it ready to be acces
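A hedged sketch of one standard answer to the global-variable question (this may differ from what Vijay proposed): ship a read-only file to every mapper via the DistributedCache and load it into a HashMap once per task; the path used here is hypothetical.

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class SharedLookupSetup {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(SharedLookupSetup.class);
    DistributedCache.addCacheFile(new URI("/user/maha/lookup.txt"), conf);
    // then, in the mapper's configure(JobConf job):
    //   Path[] cached = DistributedCache.getLocalCacheFiles(job);
    //   ...read cached[0] into a HashMap once per task
  }
}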
Null.
Any help is appreciated ,
Maha
Depending on the scale of data, between the two it would be best stored in
HDFS, using the built-in InputFormats, as that is more scalable.
If necessary (depending on how the data is stored), build a custom
InputFormat as per the API and set it
until now is around 1000 mappers. Appreciate
any thought :)
Thank you,
Maha
...is still vague; is there a way to skip reading a specific
disk block?
Thanks,
Maha
Forgot to mention that, the benchmark is for hadoop so any parallel system
optimization provided by hadoop is appreciated.
Maha
On Jan 15, 2011, at 11:25 AM, maha wrote:
> Hi,
>
> I'm preparing a benchmark and would like to know how to best optimize my
> java program (ign
Hi,
I'm preparing a benchmark and would like to know how to best optimize my java
program (ignoring the IO time). Any links to read from? Or has anyone tried the
Java Optimizer and Decompile Environment (JODE)?
Thanks in advance,
Maha
I also use another solution for the namespace incompatibility, which is to run:
rm -Rf /tmp/hadoop-<username>/*
then format the namenode. Hope that helps,
Maha
On Jan 9, 2011, at 9:08 PM, Adarsh Sharma wrote:
> Shuja Rehman wrote:
>> hi
>>
>> i have format the name node a
Nice ! I'd better try that. So the trick is only to add "hdfs" to the path to
access that namespace.
Thanks a ton :)
Maha
On Jan 7, 2011, at 1:55 PM, Jacob R Rideout wrote:
>> I'm wondering if there is a way to doing the following commands to HDFS
>>
Hi everyone,
I'm wondering if there is a way of doing the following with HDFS ...
File LocalinputDir = new File("/user/maha/inputDir");
String[] file = LocalinputDir.list();
I'm given Hadoop and an input directory with files {f1,f2 ..}. I wo
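A minimal sketch of the HDFS equivalent of the java.io.File listing above: ask the FileSystem for the directory's entries instead.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListHdfsDir {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    for (FileStatus status : fs.listStatus(new Path("/user/maha/inputDir"))) {
      System.out.println(status.getPath().getName());
    }
  }
}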
Never mind. I just saw the tags on the left side of the page in question, found
on the "Search Hadoop" site.
Thanks all,
Maha
On Jan 3, 2011, at 11:29 AM, maha wrote:
> Hi,
>
> I remember discussing the following error one time, but when I searched for
> it I can
...Could not reserve enough space for object heap
Could not create the Java virtual machine.
Thank you,
Maha
Very helpful :) thanks Ping.
Maha
On Dec 30, 2010, at 6:13 PM, li ping wrote:
> On Fri, Dec 31, 2010 at 9:28 AM, maha wrote:
>
>> Hi,
>>
>> (1) I declared a global variable in my hadoop mainClass which gets
>> initialized in the 'run' function of this
...running before the maps. My question is: on which
node? The JobTracker node?
Thank you,
Maha
Hi Cavus,
Please check that the Hadoop JobTracker and other daemons are running by typing
"jps". If one of (JobTracker, TaskTracker, NameNode, DataNode) is missing,
then you need to 'stop-all', format the NameNode, and 'start-all' again.
Maha
On Dec 30, 2010, at 7:52 A
hadoop daemons. Isn't
this a clean start??
Maha
On Dec 28, 2010, at 6:02 PM, Sudhir Vallamkondu wrote:
> I recently had this issue. UI links were working for some nodes meaning when
> I go to dfsHealth.jsp page and following some cluster data node links some
> would work and some wo
Hi Jander,
You mean write Map in another language, like Python or C? Then yes. Check
http://hadoop.apache.org/common/docs/r0.18.0/streaming.html for Hadoop
Streaming.
Maha
On Dec 28, 2010, at 2:53 PM, Jander g wrote:
> Hi, all
>
> Whether Hadoop supports the map functio
...hadoop.mapred.JobTracker: Initializing job_201012281415_0001
2010-12-28 14:18:29,386 INFO org.apache.hadoop.mapred.JobInProgress: Initializing job_201012281415_0001
2010-12-28 14:18:29,585 INFO org.apache.hadoop.mapred.JobInProgress: Input size for job job_201012281415_0001 = 459393. Number
Hi James,
I'm accessing http://speed.cs.ucsb.edu:50030/ for the JobTracker
and port 50070 for the NameNode, just like the Hadoop quick start.
Did you mean to change the port in my mapred-site.xml file?
<property>
  <name>mapred.job.tracker</name>
  <value>speed.cs.ucsb.edu:9001</value>
</property>
Maha
On Dec
that?
Harsh said:
Did you do any ant operation on your release copy of Hadoop prior to
starting it, by the way?
NO, I get the following error:
BUILD FAILED
/cs/sandbox/student/maha/hadoop-0.20.2/build.xml:316: Unable to find a javac
compiler;
com.sun.tools.javac.Main is not on the classpath.
...mapred.job.tracker
speed.cs.ucsb.edu:9001
when I try to open: http://speed.cs.ucsb.edu:50070/ I get the 404 Error.
Any ideas?
Thank you,
Maha
...tions:
FileSystem fs = FileSystem.get(new Configuration()); // it worked!
Any reason for that?
Thank you,
Maha
On Dec 17, 2010, at 2:59 PM, Peng, Wei wrote:
>
> You can put your local file to distributed file system by hadoop fs -put
> localfile DFSfile.
>
(eg.
split1: /tmp/f1, split2: /tmp/f2, split4: /tmp/f4); instead I want ->
(split1: content of file1, ...).
Thank you,
Maha
On Dec 16, 2010, at 2:49 PM, Ted Dunning wrote:
> Maha,
>
> Remember that the mapper is not running on the same machine as the main
> clas
unt.myconf);
hdfs.copyFromLocalFile(new Path("/Users/file"), new
Path("/tmp/file"));
} catch (Exception e) { System.err.print("\nError"); }
Also, the print statement will never print to the console unless it's in my run
function.
Appreciate it :)
Maha
Hi Allen, and thanks for responding ..
Your answer actually gave me another clue: I set numSplits = numFiles*100;
in myInputFormat and it worked :D ... Do you think there are side effects to
doing that?
Thank you,
Maha
On Dec 15, 2010, at 12:16 PM, Allen Wittenauer
Actually, I just realized that numSplits can't be set definitively. Even
if I write numSplits = 5, it's just a hint.
Then how come MultiFileInputFormat claims to use MultiFileSplit to hold one
file per split?? Or is that also just a hint?
Maha
On Dec 15, 2010, at
new myRecordReader((MultiFileSplit) split));
}
Yet in myRecordReader, for example, one split has the following:
" /tmp/input/file1:0+300
/tmp/input/file2:0+199 "
instead of each line in its own split.
Why? Any clues?
Thank you,
Maha
Thanks for the advice Harsh! This worked :)
Maha
On Dec 11, 2010, at 8:48 PM, Harsh J wrote:
> Try adding the commons-logging jar to your build path. It is available
> in the lib/ folder of your Hadoop distribution.
>
> If you use the MapReduce eclipse plugin which comes wit
...though, I still appreciate your thoughts. Thanks,
Maha
What I do is create a new pro
On Dec 11, 2010, at 6:27 PM, li ping wrote:
> Can you try to add the jar file in your Hadoop lib directory.
>
> On Sun, Dec 12, 2010 at 8:00 AM, Maha A. Alabduljalil
> wrote:
>
>>
>>
return (new LineRecordReader());
}
}
Can someone guide me on how to solve this in a different way
(ie. make each input file unsplittable) ... or how to add the required
missing log class?
Thank you so much,
Maha
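A minimal sketch of the 'make each input file unsplittable' route (old 0.20 API): subclass the stock TextInputFormat and override isSplitable(), so every file becomes exactly one split and the default LineRecordReader still applies.

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false; // one split per file, regardless of size
  }
}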
der.
>
> How can I change this property to be FileInputSpilt and Record is the whole
> File ?
>
> something like JobConf.set ("File.input.format","FileInptSplit");
>
> Is there such way?
>
> Thanks in advance,
> Maha
>
something like JobConf.set("File.input.format","FileInptSplit");
Is there such a way?
Thanks in advance,
Maha
On Nov 26, 2010, at 9:09 PM, li ping wrote:
> org.apache.hadoop.mapreduce.lib.input.TextInputForma
...();
How do we know that map in this case takes a line and not the whole
input document?
Happy Thanksgiving everyone,
Maha
A much easier way is to use the open-source WordCount.java example and give it an
input directory including all the text files. This will output one text file
containing all the words and their frequencies across all the files.
Maha
On Nov 25, 2010, at 1:31 PM, Tri Doan wrote:
> Thursday 25
FileOutputFormat) ?
Maha
On Nov 17, 2010, at 10:11 PM, Alex Baranau wrote:
> In case you need to process the files separately, use one MR job for each
> file. You can add a single file as input. I believe you'll need to iterate
> over all files in input dir and start job instance for
for fileN.txt
Thanks,
Maha
That is exactly what I needed :) Thanks again Alex,
Maha
On Nov 14, 2010, at 9:54 PM, Alex Baranau wrote:
> You might find this search tool valuable: http://search-hadoop.com. You can
> do search in sources and javadocs separately.
>
> Alex Baranau
>
> Sematext :: h
Never mind, Jeff ... I guess your answer would be to read the Hadoop manual pages
and to keep practicing Java programming!
Because I'm trying to write a Hadoop program and it's taking me time to figure out
which class to use for my purpose.
So thanks anyways,
Maha
On Nov 11, 2010,
Thanks Jeff :) How could you recall all possible readInputFile methods from
different classes? Is there some special way to search the Java APIs?
Maha
On Nov 10, 2010, at 5:13 PM, Jeff Zhang wrote:
> Use FileInputFormat.setInputPaths
>
>
>
> On Thu, Nov 11, 2010 at 5:45
...can
I add it to the list? I couldn't even edit JobConf.class because the
source code is unavailable.
Any link to where this issue is handled?
Thanks,
Maha
intermediate values?
Thanks,
Maha
Hi Rohit,
I really learned a lot from this link:
http://www.infosci.cornell.edu/hadoop/windows.html
Maha
Quoting Rohit Mishra :
I need clarification on how to run a Hadoop program. I am getting a
ClassNotFoundException error when I try to run the test example given in the
book [Ch 2].
Do
That's exactly what I needed to know! Thanks for the thorough explanation, HJ :)
I'll try this today without the ssh and see how it goes.
Maha
On Oct 13, 2010, at 9:56 AM, Harsh J wrote:
> Do the 12 hosts have no identity/address known? AFAIK, you need to
> install Hadoop to
...computer? Is it through the hadoop.tmp.dir, by
including 'snoopy.cs.ucsb.edu' and 'booboo.cs.ucsb.edu' as hosts? Master and
slave?
Thanks,
Maha
On Oct 12, 2010, at 9:04 PM, Medha Atre wrote:
> http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_