Re: A question about Mapper

2008-10-03 Thread Zhou, Yunqing
But the close() function doesn't give me a collector to emit pairs with. Is it reasonable to store a reference to the collector in advance? I'm not sure whether the collector is still available at that point. On Sat, Oct 4, 2008 at 12:17 PM, Joman Chu <[EMAIL PROTECTED]> wrote: > Hello, > > Does Map

hadoop under windows.

2008-10-03 Thread Dmitry Pushkarev
Hi. I have a strange problem with Hadoop when I run jobs under Windows (my laptop runs XP, but all cluster machines including the namenode run Ubuntu). I run a job (which runs perfectly under Linux, and all configs and Java versions are the same); all mappers finish successfully, and so does redu

Seeking Hadoop Guru

2008-10-03 Thread howard23
Appreciate any assistance with this opportunity in New York City. If you or someone you know might be interested in a F/T gig, please contact me ASAP! Software Engineer-Hadoop Guru NYC F/T 2-5yrs experience 130K+ Responsibilities * Develop and

Re: A question about Mapper

2008-10-03 Thread Joman Chu
Hello, Does MapReduceBase.close() fit your needs? Take a look at http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/mapred/MapReduceBase.html#close() On Fri, October 3, 2008 11:36 pm, Zhou, Yunqing said: > the input is as follows. flag a b flag c d e flag f > > then I used a mappe

[Hadoop NY User Group Meetup] HIVE: Data Warehousing using Hadoop 10/9

2008-10-03 Thread Alex Dorman
The next NY Hadoop meetup will take place on Thursday, 10/9 at 6:30 pm. Jeff Hammerbacher will present HIVE: Data Warehousing using Hadoop. About HIVE: - Data Organization into Tables with logical and hash partitioning - A Metastore to store metadata about Tables/Partitions etc - A SQL-like query l

A question about Mapper

2008-10-03 Thread Zhou, Yunqing
The input is as follows: flag a b flag c d e flag f. I used a mapper that first stores values and then emits them all when it meets a line containing "flag". But when the file reaches its end, I have no chance to emit the last record (in this case, f). So how can I detect the mapper's end of its life
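
A minimal sketch of the buffer-then-flush idea described in this question, written in plain Java without any Hadoop classes. The class `FlagGrouper`, its `accept`/`close` methods, and the `emit` callback are hypothetical names introduced for illustration: `accept` stands in for a per-line map call, `close` for an end-of-input hook, and `emit` for whatever collector the job writes to. Whether a stored collector reference is still usable at close time is exactly what the thread asks, and this sketch does not settle that. It also assumes that lines before the first "flag" are dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for a mapper that groups lines under a "flag" marker.
class FlagGrouper {
    private final Consumer<List<String>> emit; // kept so close() can still emit
    private List<String> pending = null;       // values since the last "flag"; null before the first one

    FlagGrouper(Consumer<List<String>> emit) {
        this.emit = emit;
    }

    // Called once per input line.
    void accept(String line) {
        if (line.equals("flag")) {
            flush();                       // emit the previous group, if any
            pending = new ArrayList<>();   // start a new group
        } else if (pending != null) {
            pending.add(line);             // assumption: lines before the first flag are ignored
        }
    }

    // Called once after the last line; emits the final buffered group.
    void close() {
        flush();
        pending = null;
    }

    private void flush() {
        if (pending != null) {
            emit.accept(pending);
        }
    }

    // Convenience driver: feeds all lines through a grouper and collects the groups.
    static List<List<String>> group(List<String> lines) {
        List<List<String>> out = new ArrayList<>();
        FlagGrouper grouper = new FlagGrouper(out::add);
        for (String line : lines) {
            grouper.accept(line);
        }
        grouper.close();
        return out;
    }
}
```

With the input from the question fed one token per line, `FlagGrouper.group` yields three groups, and the last one (containing only f) is emitted by `close()` rather than by a following "flag" line.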

Re: Turning off FileSystem statistics during MapReduce

2008-10-03 Thread Arun C Murthy
Nathan, On Oct 3, 2008, at 5:18 PM, Nathan Marz wrote: Hello, We have been doing some profiling of our MapReduce jobs, and we are seeing that about 20% of our jobs' time is spent calling "FileSystem$Statistics.incrementBytesRead" when we interact with the FileSystem. Is there a way to t

Turning off FileSystem statistics during MapReduce

2008-10-03 Thread Nathan Marz
Hello, We have been doing some profiling of our MapReduce jobs, and we are seeing that about 20% of our jobs' time is spent calling "FileSystem$Statistics.incrementBytesRead" when we interact with the FileSystem. Is there a way to turn this stats-collection off? Thanks, Nathan Marz Raple

Re: mapreduce input file question

2008-10-03 Thread Ski Gh3
I wonder if I am missing something. I have a .txt file for input, and I placed it under the "input" directory of hdfs. Then I called FileInputFormat.setInputPaths(c, new Path("input")); and I got an error: Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input

Re: architecture diagram

2008-10-03 Thread Alex Loddengaard
The approach that you've described does not fit well into the MapReduce paradigm. You may want to consider randomizing your data in a different way. Unfortunately some things can't be solved well with MapReduce, and I think this is one of them. Can someone else say more? Alex On Fri, Oct 3, 2

Re: mapreduce input file question

2008-10-03 Thread Alex Loddengaard
First, you need to point a MapReduce job at a directory, not an individual file. Second, when you specify a path in your job conf, using the Path object, the path you supply is an HDFS path, not a local path. Yes, you can use the output files of another MapReduce job as input for a second job, bu

Re: Maps running after reducers complete successfully?

2008-10-03 Thread Owen O'Malley
On Oct 3, 2008, at 12:20 PM, Billy Pearson wrote: Do we not have an option to store the map results in hdfs? It might be possible eventually, but not soon. The performance would be lower and it would substantially stress the NameNode. -- Owen

Re: Maps running after reducers complete successfully?

2008-10-03 Thread Billy Pearson
Do we not have an option to store the map results in hdfs? Billy "Owen O'Malley" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED] It isn't optimal, but it is the expected behavior. In general when we lose a TaskTracker, we want the map outputs regenerated so that any reduces that n

mapreduce input file question

2008-10-03 Thread Ski Gh3
Hi all, I have a possibly naive question on providing input to a mapreduce program: how can I specify the input with respect to the hdfs path? Right now I can specify an input file from my local directory, say, hadoop trunk I can also specify an absolute path for a dfs file using where it is actua

Re: Sharing an object across mappers

2008-10-03 Thread Devajyoti Sarkar
Hi Owen, Thanks a lot for the pointers. In order to use the MultiThreadedMapRunner, if I change the setMapRunnerClass() method in the jobConf, then does the rest of my code remain the same (apart from making it thread-safe)? Thanks in advance, Dev On Sat, Oct 4, 2008 at 12:29 AM, Owen O'Malley

Re: Sharing an object across mappers

2008-10-03 Thread Owen O'Malley
On Oct 3, 2008, at 7:49 AM, Devajyoti Sarkar wrote: Briefly going through the DistributedCache information, it seems to be a way to distribute files to mappers/reducers. Sure, but it handles the distribution problem for you. One still needs to read the contents into each map/reduce task V

Re: Maps running after reducers complete successfully?

2008-10-03 Thread pvvpr
thanks Owen, So this may be an enhancement? - Prasad. On Thursday 02 October 2008 09:58:03 pm Owen O'Malley wrote: > It isn't optimal, but it is the expected behavior. In general when we > lose a TaskTracker, we want the map outputs regenerated so that any > reduces that need to re-run (includi

Unable to retrieve filename using "mapred.input.file"

2008-10-03 Thread Yair Even-Zohar
I'm running map reduce and have the following lines of code: public void configure(JobConf job) { mapTaskId = job.get("mapred.task.id"); inputFile = job.get("mapred.input.file"); The problem I'm facing is that the inputFile I'm getting is null (the mapTaskId works fine).

Re: architecture diagram

2008-10-03 Thread Terrence A. Pietrondi
Sorry for the confusion, I did make some typos. My example should have looked like... > A|B|C > D|E|G > > pivots to... > > D|A > E|B > G|C > > Then for each row, shuffle the contents around randomly... > > D|A > B|E > C|G > > Then pivot the data back... > > A|E|G > D|B|C The general goal is to
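
The three steps in this example (pivot, shuffle each pivoted row, pivot back) can be sketched in memory as follows. This is only an illustration of the transformation on a small table, not a MapReduce formulation, and the class and method names (`PivotShuffle`, `pivot`, `shuffleColumns`) are hypothetical. It assumes a rectangular table. The pivot here is a plain transpose, so a pivoted row reads A|D rather than D|A as in the example; since each row is shuffled afterwards, that ordering does not affect the result.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class PivotShuffle {
    // Transpose a rectangular table: each column becomes a row.
    static List<List<String>> pivot(List<List<String>> table) {
        List<List<String>> out = new ArrayList<>();
        if (table.isEmpty()) {
            return out;
        }
        int cols = table.get(0).size();
        for (int c = 0; c < cols; c++) {
            List<String> row = new ArrayList<>();
            for (List<String> r : table) {
                row.add(r.get(c));
            }
            out.add(row);
        }
        return out;
    }

    // Pivot, shuffle the contents of each pivoted row, then pivot back.
    // The net effect is that values move only within their original column.
    static List<List<String>> shuffleColumns(List<List<String>> table, Random rng) {
        List<List<String>> pivoted = pivot(table);
        for (List<String> row : pivoted) {
            Collections.shuffle(row, rng);
        }
        return pivot(pivoted);
    }
}
```

For the table A|B|C, D|E|G, one possible result is A|E|G, D|B|C, matching the final step of the example: every value stays in its column, and only the row it lands in changes.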

Re: architecture diagram

2008-10-03 Thread Alex Loddengaard
Can you confirm that the example you've presented is accurate? I think you may have made some typos, because the letter "G" isn't in the final result; I also think your first pivot accidentally swapped C and G. I'm having a hard time understanding what you want to do, because it seems like your o

Re: Sharing an object across mappers

2008-10-03 Thread Devajyoti Sarkar
Hi Arun, Briefly going through the DistributedCache information, it seems to be a way to distribute files to mappers/reducers. One still needs to read the contents into each map/reduce task VM. Therefore, the data gets replicated across the VMs in a single node. It seems it does not address my bas

Re: Sharing an object across mappers

2008-10-03 Thread Arun C Murthy
On Oct 3, 2008, at 1:10 AM, Devajyoti Sarkar wrote: Hi Alan, Thanks for your message. The object can be read-only once it is initialized - I do not need to modify Please take a look at DistributedCache: http://hadoop.apache.org/core/docs/current/mapred_tutorial.html#DistributedCache An e

Re: 1 file per record

2008-10-03 Thread chandravadana
Suppose I use TextInputFormat, set isSplitable to false, and there are 5 files. What happens to numSplits now? Will it be set to 0? S.Chandravadana owen.omalley wrote: > > On Oct 2, 2008, at 1:50 AM, chandravadana wrote: > >> If we dont specify numSplits in getsplits(), then what

Re: Sharing an object across mappers

2008-10-03 Thread Devajyoti Sarkar
Hi Alan, Thanks for your message. The object can be read-only once it is initialized - I do not need to modify it. Essentially it is an object that allows me to analyze/modify data that I am mapping/reducing. It comes to about 3-4GB of RAM. The problem I have is that if I run multiple mappers, th