Re: How to connect to a cluster by using eclipse

2012-07-04 Thread Jason Yang
All right, thanks~ On Thursday, July 5, 2012, Marcos Ortiz wrote: > Jason, > Ramon is right. > The best way to debug a MapReduce job is to set up a local cluster, and > then, once you have tested your code enough, you can > deploy it on a real distributed cluster. > On 07/04/2012 10

Re: How to connect to a cluster by using eclipse

2012-07-04 Thread Jason Yang
Using iptables :( On Wednesday, July 4, 2012, wrote: > Jason, > > >the easiest way to debug a MapReduce program with eclipse is to work > in Hadoop local mode. > http://hadoop.apache.org/common/docs/r0.20.2/quickstart.html#Local In > this mode all the components run locally on the same
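For reference, the local (standalone) mode mentioned in the quickstart link above needs no cluster at all: everything runs in a single JVM against the local filesystem. A minimal sketch of the relevant configuration, using the 0.20-era property names:

```xml
<!-- core-site.xml / mapred-site.xml: run all components in one local JVM -->
<property>
  <name>fs.default.name</name>
  <value>file:///</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>local</value>
</property>
```

With these settings a job can be launched and stepped through directly from an Eclipse debugger before being deployed to the real cluster.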

How to connect to a cluster by using eclipse

2012-07-04 Thread Jason Yang
Hi all, I have a hadoop cluster with 3 nodes; the network topology is like this: 1. For each DataNode, its IP address is like 192.168.0.XXX; 2. The NameNode has two network cards: one is connected to the DataNodes on a local LAN with IP address 192.168.0.110, while the other one is connec

Re: Could I call a DLL in MR by using JNI

2012-05-23 Thread jason Yang
.dll into the PATH. > And everything works. > > Zhu, Guojun > Modeling Sr Graduate > 571-3824370 > guojun_...@freddiemac.com > Financial Engineering > Freddie Mac > > > *jason Yang * > >05/23/2012 05:37 AM > Please respond to > mapreduce-user@had

Could I call a DLL in MR by using JNI

2012-05-23 Thread jason Yang
Hi all~ Currently, I'm trying to rewrite an algorithm in MapReduce form. Since the algorithm depends on some third-party DLLs written in C++, I was wondering: could I call a DLL from Map() / Reduce() by using JNI? Thanks. -- YANG, Lin

Re: Multiple avro outputs from a reducer

2011-07-30 Thread Jason
You can extend/customize MultipleOutputs and pass schema-related settings via properties prefixed with the MO name, just as is done with the format classes there. Also, to send a dummy key or value, why not just use NullWritable? It's efficient as it does not consume any space.

Re: Algorithm for cross product

2011-06-23 Thread Jason
several times until the mapper > had generated all sets of the form > > > On Wed, Jun 22, 2011 at 5:13 PM, Jason wrote: > I remember I had a similar problem. > The way I approached it was by partitioning one of the data sets. At a high > level these are the steps: >

Re: Algorithm for cross product

2011-06-22 Thread Jason
I remember I had a similar problem. The way I approached it was by partitioning one of the data sets. At a high level, these are the steps: Suppose you decide to partition set A. Each partition represents a subset/range of the A keys and must be small enough for its records to fit in memory. Each partit
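The partitioning idea above can be sketched in plain Java without any Hadoop classes (class and method names here are illustrative, not from the original thread): each memory-sized partition of A is held in memory while B is streamed past it once, which is exactly what a mapper would do after loading one A-partition in its setup phase.

```java
import java.util.*;

public class CrossProductSketch {
    // Split set A into partitions small enough to hold in memory; each
    // partition is then crossed against a full streaming pass over B.
    static List<String[]> crossProduct(List<String> a, List<String> b, int partitionSize) {
        List<String[]> pairs = new ArrayList<>();
        for (int start = 0; start < a.size(); start += partitionSize) {
            // one "partition" of A, loaded in memory (e.g. in a mapper's setup)
            List<String> part = a.subList(start, Math.min(start + partitionSize, a.size()));
            for (String bRec : b) {            // stream B once per A-partition
                for (String aRec : part) {
                    pairs.add(new String[] { aRec, bRec });
                }
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        List<String[]> out = crossProduct(Arrays.asList("a1", "a2", "a3"),
                                          Arrays.asList("b1", "b2"), 2);
        System.out.println(out.size()); // 3 * 2 = 6 pairs
    }
}
```

In a real job, each A-partition would typically be shipped via the distributed cache or read in setup(), while B arrives as the mapper's normal input split.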

Re: Mapping one key per Map Task

2011-05-23 Thread Jason
Look at NLineInputFormat. On May 23, 2011, at 2:09 AM, Vincent Xue wrote: > Hello Hadoop Users, > > I would like to know if anyone has ever tried splitting an input > sequence file by key instead of by size. I know that this is unusual > for the map reduce paradigm but I am
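NLineInputFormat hands each map task at most N consecutive input lines, so with N = 1 every line (e.g. every key, if the input is one key per line) gets its own mapper. A toy sketch of that splitting rule in plain Java (illustrative names, not Hadoop's actual split code):

```java
import java.util.*;

public class NLineSplitSketch {
    // Mimics NLineInputFormat's rule: each split holds at most n consecutive
    // lines, so with n = 1 every line becomes its own map task.
    static List<List<String>> split(List<String> lines, int n) {
        List<List<String>> splits = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            splits.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return splits;
    }

    public static void main(String[] args) {
        List<List<String>> s = split(Arrays.asList("key1", "key2", "key3"), 1);
        System.out.println(s.size()); // 3 splits -> 3 map tasks
    }
}
```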

Re: How to merge several SequenceFile into one?

2011-05-11 Thread jason
An M/R job with a single reducer would do the job. This way you can utilize the distributed sort and merge/combine/dedupe key/values as you wish. On 5/11/11, 丛林 wrote: > Hi all, > > There are lots of SequenceFiles in HDFS; how can I merge them into one > SequenceFile? > > Thanks for your suggestion. > > -L

Re: Is there any way I could keep both the Mapper and Reducer output in hdfs?

2011-05-03 Thread Jason
It is actually trivial to do using MultipleOutputs. You just need to emit your key-values to both MO and the standard output context/collector in your mapper. Two things you should know about MO: 1. The early implementation has a serious (couple of orders of magnitude) performance bug 2. Output files not

Re: Best approach for accessing secondary map task outputs from reduce tasks?

2011-02-13 Thread Jason
I think this kind of partitioner is a little hackish. A more straightforward approach is to emit the extra data N times under special keys and write a partitioner that recognizes these keys and dispatches them accordingly between partitions 0..N-1. Also, if this data needs to be shipped to reduc
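The special-key trick above can be sketched as a pure function (the "#<n>" key convention and class name here are my own illustration, not from the thread): side data is emitted once under each of "#0".."#N-1", and the partitioner routes those keys to the named partition while hashing ordinary keys as usual.

```java
public class SpecialKeyPartitionerSketch {
    // Keys like "#3" are "special": the digits after '#' name the target
    // partition directly, so data emitted N times under "#0".."#N-1"
    // reaches every reducer. Ordinary keys are hash-partitioned as usual.
    static int getPartition(String key, int numPartitions) {
        if (key.startsWith("#")) {
            return Integer.parseInt(key.substring(1)) % numPartitions;
        }
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        System.out.println(getPartition("#2", 4)); // special key -> partition 2
    }
}
```

In a real Hadoop job the same logic would live in a Partitioner subclass; a paired comparator or key design keeps the special keys sorting ahead of the normal ones so each reducer sees its side data first.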

Re: cross product of two files using MapReduce - pls suggest

2011-01-19 Thread Jason
I am afraid that by reading an HDFS file manually in your mapper, you are losing data locality. You can try putting the smaller vectors into the distributed cache and preloading them all in memory in the mapper setup. This implies that they fit in memory and also that you can change your m/r to run ov
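The preload-in-setup idea can be sketched without Hadoop classes (names are illustrative): the small vector set is loaded into memory once, as one would do in a mapper's setup() after pulling the file from DistributedCache, and each streamed record is then paired with every cached vector locally.

```java
import java.util.*;

public class MapSideCrossSketch {
    // 'cached' stands for the small vector set preloaded in setup() from the
    // distributed cache; 'streamed' stands for records arriving at map().
    static List<String> crossWithCache(List<String> cached, List<String> streamed) {
        List<String> out = new ArrayList<>();
        for (String rec : streamed) {
            for (String vec : cached) {
                out.add(rec + "x" + vec);   // emit one pair per combination
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> out = crossWithCache(Arrays.asList("v1", "v2"),
                                          Arrays.asList("r1", "r2", "r3"));
        System.out.println(out.size()); // 3 records x 2 cached vectors = 6
    }
}
```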

Re: Passing messages

2010-12-18 Thread Jason
On Dec 18, 2010, at 3:19 PM, Martin Becker <_martinbec...@web.de> wrote: > Hello Jason, > > real time values are not required. Some lagging is tolerable. The > value/threshold communication is only needed to keep other reducers > from doing unnecessary work.

Re: Passing messages

2010-12-18 Thread Jason
> Reducers would retrieve that increased value when accessing the same > Counter? I do not think counters reflect real-time values. Even if they get updated, the values will lag. If you require an up-to-date value, I am afraid you will have to run a single reducer. On Dec 18, 201

Re: Map-Reduce Applicability With All-In Memory Data

2010-12-09 Thread Jason
Take a look at NLineInputFormat. You might want to use it in combination with DistributedCache. Sent from my iPhone On Dec 9, 2010, at 5:02 AM, Narinder Kumar wrote: > Hi All, > > We have a problem in hand which we would like to solve using Distributed and > Parallel Processing. > > Probl

How to report cleanup progress (new API)?

2010-12-06 Thread Jason
When most of the work is done by the reducer in cleanup() (taking 90% of the job time), how can I report proper progress for the overall job? By default the job tracker shows 100% right after all records are passed through reduce(). I would rather see 10% after all reduce() calls and the

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jason
Great! mapred.map.tasks and mapred.task.partition work perfectly for me, even for the local job runner. Thanks On Dec 3, 2010, at 5:59 PM, Harsh J wrote: > Hi, > > (Answers may be 0.20 specific) > > On Sat, Dec 4, 2010 at 6:41 AM, Jason wrote: >> In my mapper code I n

Re: Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jason
> BTW, why not take the task attempt id context.getTaskAttemptID() as the > prefix of the unique id? The task attempt id for each task should be > different The reason is that I would prefer not to have big gaps in my int id sequence, so I'd rather store the mapper task ID in the low bits (suffix instead of
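The low-bits scheme described above, combined with the mapred.map.tasks and mapred.task.partition values mentioned earlier in the thread, amounts to simple arithmetic (class and method names are my own sketch): each mapper keeps a local counter and interleaves it with its partition number, so ids are unique across mappers and dense when mappers emit similar record counts.

```java
public class UniqueIdSketch {
    // Unique int ids across mappers without big gaps: the mapper's partition
    // number (mapred.task.partition) occupies the low "digits" and a
    // per-mapper record counter the high part, so mappers can never collide.
    static long uniqueId(long localSeq, int taskPartition, int numMapTasks) {
        return localSeq * numMapTasks + taskPartition;
    }

    public static void main(String[] args) {
        // two mappers (partitions 0 and 1) out of numMapTasks = 2:
        System.out.println(uniqueId(0, 0, 2)); // mapper 0, record 0 -> 0
        System.out.println(uniqueId(0, 1, 2)); // mapper 1, record 0 -> 1
        System.out.println(uniqueId(1, 0, 2)); // mapper 0, record 1 -> 2
        System.out.println(uniqueId(1, 1, 2)); // mapper 1, record 1 -> 3
    }
}
```

Using the task attempt id as a prefix would also be unique, but as the thread notes it scatters the ids; the suffix form keeps the sequence contiguous when all mappers process about the same number of records.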

Is it possible to get the number of mapper tasks?

2010-12-03 Thread Jason
In my mapper code I need to know the total number of mappers, which is the same as the number of input splits (I need it for unique int id generation). Basically I'm looking for an analog of context.getNumReduceTasks() but can't find it. Thanks

Surge 2010 Early Registration ends Tuesday!

2010-08-27 Thread Jason Dixon
t and guarantee your seat to this year's event! -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

Register now for Surge 2010

2010-08-02 Thread Jason Dixon
your seat to this year's event! http://omniti.com/surge/2010/register Thanks, -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

Last day to submit your Surge 2010 CFP!

2010-07-09 Thread Jason Dixon
your business sponsor/exhibit at Surge 2010, please contact us at su...@omniti.com. Thanks! -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

CFP for Surge Scalability Conference 2010

2010-07-02 Thread Jason Dixon
icipating as an exhibitor, please visit the Surge website or contact us at su...@omniti.com. Thanks, -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

CFP for Surge Scalability Conference 2010

2010-06-14 Thread Jason Dixon
n Surge is just what you've been waiting for. For more information, including CFP, sponsorship of the event, or participating as an exhibitor, please contact us at su...@omniti.com. Thanks, -- Jason Dixon OmniTI Computer Consulting, Inc. jdi...@omniti.com 443.325.1357 x.241

Re: Hadoop streaming job issue

2009-11-16 Thread Jason Venner
There is a very clear picture in chapter 8 of Pro Hadoop of all of the separators for streaming jobs. On Tue, Nov 10, 2009 at 6:53 AM, wd wrote: > You mean the ^A? > I tried \u0001 and \x01; the streaming job recognises it as a string, not > ^A.. > > :( > > 2009/11/10 Amogh Vasekar > > Hi, >
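The "\x01 treated as a string" symptom above is usually a quoting problem: the literal ^A (0x01) byte has to be produced by the shell before it reaches streaming; spelling "\x01" as plain text hands streaming a four-character string. A hedged sketch (the -D property names are from the 0.20-era streaming docs; the jar path is illustrative):

```shell
# Produce a real 0x01 byte via the shell; command substitution only strips
# trailing newlines, so the byte survives into $SEP.
SEP="$(printf '\001')"
printf '%s' "$SEP" | wc -c    # 1 -> a single real byte, not the 4-char "\x01"

# Then pass it to streaming, e.g.:
#   hadoop jar hadoop-streaming.jar \
#     -D stream.map.output.field.separator="$SEP" \
#     -D mapred.textoutputformat.separator="$SEP" \
#     -input in -output out -mapper cat -reducer cat
```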

Re: MapReduce Child don't exit?

2009-11-16 Thread Jason Venner
The DFS client code waits until all of the datanodes that are going to hold a replica of your output's blocks have ack'd. If you are pausing there, most likely something is wrong in your HDFS cluster. On Thu, Nov 12, 2009 at 7:06 AM, Ted Xu wrote: > hi all, > > We are using hadoop-0.19.1 on

Re: Hadoop Streaming overhead

2009-11-16 Thread Jason Venner
All of your data has to be converted back and forth to strings and passed through pipes from the JVM to your task and back from the task to the JVM. On Thu, Nov 12, 2009 at 10:06 PM, Alexey Tigarev wrote: > Hi All! > > How much overhead does using Hadoop Streaming add vs. native Java steps? > >

Re: Why doesn't my mapper speak to me :(

2009-11-16 Thread Jason Venner
Your log messages to stdout, stderr and syslog will end up in the logs/userlogs directory of your task tracker. If the job is still visible via the web UI for the job tracker host (usually port 50030), you can select the individual tasks that were run for your job, and if you click through enough s

Re: noobie question on hadoop's NoClassDefFoundError

2009-11-16 Thread Jason Venner
Your eclipse instance doesn't have the jar files from the lib directory of your hadoop installation on its classpath. On Sat, Nov 14, 2009 at 7:51 PM, felix gao wrote: > I wrote a simple code in my eclipse as > > Text t = new Text("hadoop"); > System.out.println((char)t.charAt(2)); > > when I tr

Re: Map output compression leads to JVM crash (0.20.0)

2009-10-27 Thread Jason Venner
The failure appears to occur in code in the system dynamic linker, which implies a shared library compatibility problem or a heap shortfall. On Mon, Oct 26, 2009 at 2:25 PM, Ed Mazur wrote: > Err, disregard that. > > $ cat /proc/version > Linux version 2.6.9-89.0.9.plus.c4smp (mockbu...@builder10

Re: MultipleTextOutputFormat giving "Bad connect ack with firstBadLink"

2009-10-27 Thread Jason Venner
This error is very common in applications that run out of file descriptors or simply open vast numbers of files on an HDFS with a very high block density per datanode. It is quite easy to open hundreds of thousands of files with the Multi*OutputFormat classes. If you can collect your output in

Re: MapRed Job Completes; Output Ceases Mid-Job

2009-10-08 Thread Jason Venner
Are you perhaps creating large numbers of files and running out of file descriptors in your tasks? On Wed, Oct 7, 2009 at 1:52 PM, Geoffry Roberts wrote: > All, > > I have a MapRed job that ceases to produce output about halfway through. > The obvious question is why? > > This job reads a file a

Re: What does MAX_FAILED_UNIQUE_FETCHES mean?

2009-07-28 Thread Jason Venner
I have seen this happen when there are inconsistent hostname to ip address lookups across the cluster and a node running a reducer is not connecting to the host that actually has the map output due to getting a different ip address for the node name. On Mon, Jul 27, 2009 at 9:46 AM, Geoffry Robert