But the close() function doesn't supply a Collector to put pairs in.
Is it reasonable for me to store a reference to the collector in advance?
I'm not sure whether the collector is still available at that point.
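[Editor's note: a minimal sketch of the pattern being discussed, assuming the old mapred API; the key/value formatting is illustrative. Save the collector passed to map() and flush the buffered record in close(). In the old API, close() runs before the task commits its output, so the saved collector is still usable there.]

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class FlagMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  private final List<String> buffer = new ArrayList<String>();
  private OutputCollector<Text, Text> out; // saved so close() can still emit

  public void map(LongWritable key, Text line,
                  OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    out = output; // the same collector instance is passed on every call
    String value = line.toString();
    if (value.equals("flag")) {
      flush(); // emit the record accumulated since the previous flag
    } else {
      buffer.add(value);
    }
  }

  private void flush() throws IOException {
    if (out != null && !buffer.isEmpty()) {
      out.collect(new Text("record"), new Text(buffer.toString()));
      buffer.clear();
    }
  }

  public void close() throws IOException {
    flush(); // emits the final record (f in the example discussed below)
  }
}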
On Sat, Oct 4, 2008 at 12:17 PM, Joman Chu <[EMAIL PROTECTED]> wrote:
> Hello,
>
> Does MapReduceBase.close() fit your needs?
Hi.
I have a strange problem with Hadoop when I run jobs under Windows (my
laptop runs XP, but all cluster machines, including the namenode, run Ubuntu). I
run a job (which runs perfectly under Linux, and all configs and Java versions
are the same); all mappers finish successfully, and so does redu
Appreciate any assistance on this opportunity in New York City if you or someone
you know might be interested in a F/T gig... please contact me ASAP!
Software Engineer - Hadoop Guru, NYC, F/T
2-5 yrs experience, 130K+
Responsibilities
* Develop and
Hello,
Does MapReduceBase.close() fit your needs? Take a look at
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapReduceBase.html#close()
On Fri, October 3, 2008 11:36 pm, Zhou, Yunqing said:
> the input is as follows:
> flag
> a
> b
> flag
> c
> d
> e
> flag
> f
>
> then I used a mapper
Next NY Hadoop meetup will take place on Thursday, 10/9 at 6:30 pm.
Jeff Hammerbacher will present HIVE: Data Warehousing using Hadoop.
About HIVE:
- Data Organization into Tables with logical and hash partitioning
- A Metastore to store metadata about Tables/Partitions, etc.
- A SQL-like query language
The input is as follows:
flag
a
b
flag
c
d
e
flag
f
Then I used a mapper to first store values and then emit them all when a
line containing "flag" is encountered. But when the file reaches its end, I
have no chance to emit the last record (in this case, f).
So how can I detect the end of the mapper's life?
Nathan,
On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote:
Hello,
We have been doing some profiling of our MapReduce jobs, and we are
seeing that about 20% of our jobs' time is spent calling
"FileSystem$Statistics.incrementBytesRead" when we interact with the
FileSystem. Is there a way to turn this stats-collection off?
Hello,
We have been doing some profiling of our MapReduce jobs, and we are
seeing that about 20% of our jobs' time is spent calling
"FileSystem$Statistics.incrementBytesRead" when we interact with the
FileSystem. Is there a way to turn this stats-collection off?
Thanks,
Nathan Marz
Rapleaf
I wonder if I am missing something.
I have a .txt file for input, and I placed it under the "input" directory of
HDFS.
Then I called
FileInputFormat.setInputPaths(c, new Path("input"));
and I got an error:
Exception in thread "main"
org.apache.hadoop.mapred.InvalidInputException: Input
The approach that you've described does not fit well into the MapReduce
paradigm. You may want to consider randomizing your data in a different
way.
Unfortunately some things can't be solved well with MapReduce, and I think
this is one of them.
Can someone else say more?
Alex
On Fri, Oct 3, 2
First, you need to point a MapReduce job at a directory, not an individual
file. Second, when you specify a path in your job conf using the Path
object, the path you supply is an HDFS path, not a local path.
Yes, you can use the output files of another MapReduce job as input for a
second job, bu
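[Editor's note: to make that concrete, a short sketch with the old mapred API; all paths and the MyJob class are illustrative. The first job reads an HDFS directory, and the second job reuses the first job's output directory as its input.]

JobConf first = new JobConf(MyJob.class);
FileInputFormat.setInputPaths(first, new Path("/user/me/input"));  // HDFS directory, not a local path
FileOutputFormat.setOutputPath(first, new Path("/user/me/step1"));
JobClient.runJob(first);

JobConf second = new JobConf(MyJob.class);
FileInputFormat.setInputPaths(second, new Path("/user/me/step1")); // the output of the first job
FileOutputFormat.setOutputPath(second, new Path("/user/me/final"));
JobClient.runJob(second);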
On Oct 3, 2008, at 12:20 PM, Billy Pearson wrote:
Do we not have an option to store the map results in hdfs?
It might be possible eventually, but not soon. The performance would
be lower and it would substantially stress the NameNode.
-- Owen
Do we not have an option to store the map results in hdfs?
Billy
"Owen O'Malley" <[EMAIL PROTECTED]> wrote in
message news:[EMAIL PROTECTED]
It isn't optimal, but it is the expected behavior. In general, when we
lose a TaskTracker, we want the map outputs regenerated so that any
reduces that need to re-run
Hi all,
I have a possibly naive question about providing input to a MapReduce program:
how can I specify the input with respect to the HDFS path?
Right now I can specify an input file from my local directory, say, the hadoop
trunk. I can also specify an absolute path for a DFS file using where it is
actua
Hi Owen,
Thanks a lot for the pointers.
In order to use MultithreadedMapRunner, if I set it via the
setMapRunnerClass() method on the JobConf, does the rest of my code
remain the same (apart from making it thread-safe)?
Thanks in advance,
Dev
On Sat, Oct 4, 2008 at 12:29 AM, Owen O'Malley
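[Editor's note: for reference, a sketch of the configuration change being discussed, assuming the old mapred API; MyJob is a placeholder. Aside from this, the mapper code should stay the same, provided map() is thread-safe.]

JobConf conf = new JobConf(MyJob.class);
// Run several map() invocations concurrently inside each map task.
conf.setMapRunnerClass(org.apache.hadoop.mapred.lib.MultithreadedMapRunner.class);
// Thread count per task; 10 is the default.
conf.setInt("mapred.map.multithreadedrunner.threads", 10);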
On Oct 3, 2008, at 7:49 AM, Devajyoti Sarkar wrote:
Briefly going through the DistributedCache information, it seems to
be a way
to distribute files to mappers/reducers.
Sure, but it handles the distribution problem for you.
One still needs to read the
contents into each map/reduce task VM.
Thanks, Owen.
So this may be an enhancement?
- Prasad.
On Thursday 02 October 2008 09:58:03 pm Owen O'Malley wrote:
> It isn't optimal, but it is the expected behavior. In general when we
> lose a TaskTracker, we want the map outputs regenerated so that any
> reduces that need to re-run (includi
I'm running MapReduce and have the following lines of code:

public void configure(JobConf job) {
  mapTaskId = job.get("mapred.task.id");
  inputFile = job.get("mapred.input.file");
}
The problem I'm facing is that the inputFile I'm getting is null (the
mapTaskId works fine).
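[Editor's note: a minimal sketch of that configure() override, old mapred API. One assumption worth verifying against your Hadoop version: the parameter name documented in the mapred tutorial for the current input file is "map.input.file", not "mapred.input.file", which could explain the null value.]

public void configure(JobConf job) {
  mapTaskId = job.get("mapred.task.id");
  // Assumption: the documented key is "map.input.file"; if so, the
  // "mapred.input.file" lookup would return null as described above.
  inputFile = job.get("map.input.file");
}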
Sorry for the confusion, I did make some typos. My example should have looked
like...
> A|B|C
> D|E|G
>
> pivots too...
>
> D|A
> E|B
> G|C
>
> Then for each row, shuffle the contents around randomly...
>
> D|A
> B|E
> C|G
>
> Then pivot the data back...
>
> A|E|G
> D|B|C
The general goal is to
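[Editor's note: as an aside, a rough sketch of the first pass of that pivot/shuffle idea in the old mapred API; the class names and key formats are illustrative. The mapper pivots each field to its column, and each column's reducer shuffles the values among the row offsets. A second pass, not shown, would group by row offset and sort by column to rebuild the rows. Each column must fit in one reducer's memory, which echoes the scaling concern raised elsewhere in this thread.]

import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class ColumnShuffle {

  // Pivot: emit (column index, "rowOffset:value") for every field.
  public static class PivotMapper extends MapReduceBase
      implements Mapper<LongWritable, Text, IntWritable, Text> {
    public void map(LongWritable offset, Text row,
                    OutputCollector<IntWritable, Text> out, Reporter r)
        throws IOException {
      String[] fields = row.toString().split("\\|");
      for (int col = 0; col < fields.length; col++) {
        // The byte offset of the line serves as a stable row id.
        out.collect(new IntWritable(col),
                    new Text(offset.get() + ":" + fields[col]));
      }
    }
  }

  // Shuffle one column's values among its row ids.
  public static class ShuffleReducer extends MapReduceBase
      implements Reducer<IntWritable, Text, Text, Text> {
    public void reduce(IntWritable col, Iterator<Text> vals,
                       OutputCollector<Text, Text> out, Reporter r)
        throws IOException {
      List<String> rowIds = new ArrayList<String>();
      List<String> values = new ArrayList<String>();
      while (vals.hasNext()) {
        String[] parts = vals.next().toString().split(":", 2);
        rowIds.add(parts[0]);
        values.add(parts[1]);
      }
      Collections.shuffle(values); // permute this column independently
      for (int i = 0; i < rowIds.size(); i++) {
        out.collect(new Text(rowIds.get(i)),
                    new Text(col.get() + ":" + values.get(i)));
      }
    }
  }
}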
Can you confirm that the example you've presented is accurate? I think you
may have made some typos, because the letter "G" isn't in the final result;
I also think your first pivot accidentally swapped C and G. I'm having a
hard time understanding what you want to do, because it seems like your
o
Hi Arun,
Briefly going through the DistributedCache information, it seems to be a way
to distribute files to mappers/reducers. One still needs to read the
contents into each map/reduce task VM. Therefore, the data gets replicated
across the VMs in a single node. It seems it does not address my bas
On Oct 3, 2008, at 1:10 AM, Devajyoti Sarkar wrote:
Hi Alan,
Thanks for your message.
The object can be read-only once it is initialized - I do not need
to modify
Please take a look at DistributedCache:
http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache
An e
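[Editor's note: for anyone following along, a minimal usage sketch of DistributedCache with the old mapred API; the HDFS path and the MyJob class are illustrative. The file is copied once per node, and each task reads the node-local copy rather than re-fetching it from HDFS.]

import java.net.URI;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

// At job submission time:
JobConf conf = new JobConf(MyJob.class);
DistributedCache.addCacheFile(URI.create("/shared/lookup.dat"), conf);

// In the mapper's or reducer's configure():
public void configure(JobConf job) {
  try {
    Path[] cached = DistributedCache.getLocalCacheFiles(job);
    // cached[0] is the node-local copy of lookup.dat
  } catch (java.io.IOException e) {
    throw new RuntimeException(e);
  }
}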
Suppose I use TextInputFormat and set isSplitable() to false, and there are 5
files.
What happens to numSplits now? Will it be set to 0?
S.Chandravadana
owen.omalley wrote:
>
> On Oct 2, 2008, at 1:50 AM, chandravadana wrote:
>
>> If we don't specify numSplits in getSplits(), then what
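[Editor's note: a sketch of the pattern in question, assuming the old mapred API; the subclass name is made up. The numSplits passed to getSplits() is only a hint; FileInputFormat still produces at least one split per file, so with isSplitable() returning false, five input files yield five splits, one per file, never zero.]

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.TextInputFormat;

public class WholeFileTextInputFormat extends TextInputFormat {
  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false; // each file becomes exactly one split
  }
}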
Hi Alan,
Thanks for your message.
The object can be read-only once it is initialized; I do not need to modify
it. Essentially it is an object that allows me to analyze/modify the data that I
am mapping/reducing. It comes to about 3-4 GB of RAM. The problem I have is
that if I run multiple mappers, th