(/users/jeastman/...) but that throws
'input path does not exist' errors.
Jeff
-Original Message-
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Monday, January 21, 2008 11:15 AM
To: hadoop-user@lucene.apache.org
Subject: RE: Platform reliability with Hadoop
Is it really t
There has been significant work on building a WebDAV interface for HDFS. I
haven't heard any news for some time, however.
On 1/21/08 11:32 AM, "Dawid Weiss" <[EMAIL PROTECTED]> wrote:
>
>> The Eclipse plug-in also features a DFS browser.
>
> Yep. That's all true, I don't mean to self-promot
The Eclipse plug-in also features a DFS browser.
Yep. That's all true, I don't mean to self-promote, because there really isn't
that much to advertise ;) I was just quite attached to a file manager-like user
interface; the mucommander clone I posted served me as a browser, but also for
rudime
Sent: Sunday, January 20, 2008 11:44 AM
To: hadoop-user@lucene.apache.org
Subject: Re: Platform reliability with Hadoop
You might want to change the hadoop.tmp.dir entry alone. Since the others are
derived from it, everything should be fine.
I am wondering if hadoop.tmp.dir might be used elsewhere.
thanks,
lohit
https://issues.apache.org/jira/browse/HADOOP-2534
Works well for me, read the docs -- you'll need to update JARs (and
re-sign them) if you work with the HEAD.
Dawid
l need to update JARs (and
re-sign them) if you work with the HEAD.
Dawid
--
~~~
101tec GmbH
Halle (Saale), Saxony-Anhalt, Germany
http://www.101tec.com
The web interface can also be used. This is handy if you are following the
progress of the job via the web.
Scroll to the bottom of the page.
On 1/20/08 11:39 PM, "Jeff Hammerbacher" <[EMAIL PROTECTED]>
wrote:
> ./bin/hadoop job -Dmapred.job.tracker=<host>:<port>
> -kill <job-id>
>
> you can find the required constants by pointing your browser to your jobtracker.
I posted something like this a while ago. See here.
https://issues.apache.org/jira/browse/HADOOP-2534
Works well for me, read the docs -- you'll need to update JARs (and re-sign
them) if you work with the HEAD.
Dawid
bin/hadoop dfs -copyFromLocal (Command Line)
or
read the hadoop wiki (Programming)
On Jan 21, 2008 3:31 PM, wang daming <[EMAIL PROTECTED]> wrote:
> I have setup the HDFS cluster under linux enviroment, but I need write
> files to the DFS from windows application, how can I achieve this? thanks
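For the programming route, here is a minimal sketch using the Java FileSystem API (the namenode address and both paths are placeholders, not taken from this thread):
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
// Sketch: copy a local file into the DFS from any Java client that can reach
// the namenode, e.g. a Windows application. Host, port and paths are placeholders;
// use your cluster's fs.default.name and your own paths.
public class CopyToDfs {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://namenode.example.com:9000"); // placeholder
    FileSystem fs = FileSystem.get(conf);
    fs.copyFromLocalFile(new Path("C:/data/input.txt"),       // local source
                         new Path("/user/daming/input.txt")); // DFS destination
    fs.close();
  }
}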
Hello,
I meant like a GUI. I found out this is possible through the web interface,
however.
thanks.
On Jan 21, 2008 3:17 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> hadoop dfs -ls DIR
>
> (etc) allows you to see the file system
>
> Miles
>
> On 21/01/2008, Cam Bazz <[EMAIL PROTECTED]> wrote:
> >
>
hadoop dfs -ls DIR
(etc) allows you to see the file system
Miles
On 21/01/2008, Cam Bazz <[EMAIL PROTECTED]> wrote:
>
> hello
> is there any utility to browse hadoop fs?
> Best Regards,
> -C.B.
>
Use the scanner as described below (conf, tableName and columns are assumed to
be defined already):
HTable table = new HTable(conf, tableName);
HScannerInterface scanner = table.obtainScanner(columns, new Text(""));
HStoreKey key = new HStoreKey();
TreeMap<Text, byte[]> results = new TreeMap<Text, byte[]>();
while (scanner.next(key, results)) {
  // process key and results here
}
scanner.close();
On 1/21/0
no, the jobtracker will take care of that for you.
On Jan 20, 2008 11:44 PM, ma qiang <[EMAIL PROTECTED]> wrote:
> So I must terminate these jobs using this method you mentioned on all
> the computers in my cluster, am I right?
>
>
>
>
> On Jan 21, 2008 3:39 PM, Jeff Hammerbacher <[EMAIL PROTECTE
So I must terminate these jobs using this method you mentioned on all
the computers in my cluster, am I right?
On Jan 21, 2008 3:39 PM, Jeff Hammerbacher <[EMAIL PROTECTED]> wrote:
> ./bin/hadoop job -Dmapred.job.tracker=<host>:<port>
> -kill <job-id>
>
> you can find the required constants by pointing your browse
./bin/hadoop job -Dmapred.job.tracker=<host>:<port>
-kill <job-id>
you can find the required constants by pointing your browser to your
jobtracker.
On Jan 20, 2008 11:36 PM, ma qiang <[EMAIL PROTECTED]> wrote:
> Dear colleagues:
> I use eclipse to develop some programs in hadoop, when
> exceptions took plac
I would say that it is generally better practice to deploy hadoop.jar in the
lib directory of the war file that you are deploying so that you can change
versions of hadoop more easily.
Your problem is that you have dropped the tomcat support classes from your
CLASSPATH in the process of getting h
We effectively have this situation on a significant fraction of our
work-load as well. Much of our data is summarized hourly and is encrypted
and compressed which makes it unsplittable. This means that the map
processes are often not local to the data since the data is typically spread
only to
Sent: Sunday, January 20, 2008 11:05:28 AM
Subject: RE: Platform reliability with Hadoop
I am almost operational again but something in my configuration is
still
not quite right. Here's what I did:
- I created a directory /u1/cloud-data on every machine's local disk
- I created a new u
ilto:[EMAIL PROTECTED]
Sent: Wednesday, January 16, 2008 10:04 AM
To: hadoop-user@lucene.apache.org
Subject: Re: Platform reliability with Hadoop
The /tmp default has caught us once or twice too. Now we put the files
elsewhere.
[EMAIL PROTECTED] wrote:
>> The DFS is stored in /tmp o
On 1/18/08 3:29 PM, "Jason Venner" <[EMAIL PROTECTED]> wrote:
> We were thinking of doing this with some machines that do not have
> decent disks but have plenty of network bandwidth.
We were doing it for a while, in particular for our data loaders*
but that was months and months ago.
Try like this:
HTable table = new HTable(conf, Text tableName);
long lockId = table.startUpdate(Text row);
table.put(lockId, Text columnName, byte[] value);
table.commit(lockId);
On 1/20/08, ma qiang <[EMAIL PROTECTED]> wrote:
> Dear colleagues:
> I don't know how to insert new rows
Try like this:
HTable table = new HTable(conf, tableName);  // conf, tableName, columnName and value assumed defined
long lockId = table.startUpdate(getRow());
table.put(lockId, columnName, value);
table.commit(lockId);
On 1/20/08, ma qiang <[EMAIL PROTECTED]> wrote:
> Dear colleagues:
> I don't know how to insert new
HADOOP-2343 describes regionservers 'hanging' inexplicably. Do you
think you are experiencing a similar phenomenon?
St.Ack
Billy wrote:
I think it might be related to something in the region server, as it never
happens to more than one region at a time, but they all have failed over time,
even t
I think it might be related to something in the region server, as it never
happens to more than one region at a time, but they all have failed over time,
even the one on the same node as the master, so that rules out network/switch
problems. If it was the master then all the region servers would go
regionservers will shut themselves down if they are unable to contact
the master. Can you figure out what the master was doing such that it
became non-responsive during this time?
St.Ack
Billy wrote:
I've been getting these errors from time to time, seemingly when the region
servers are under load
Oh, so the task running on the other node (ncdm-15) fails and Hadoop
re-runs the task on the local node (ncdm-8). (I only have two nodes, ncdm-8
and ncdm-15. Both namenode and jobtracker are running on ncdm-8. The
program is also started on ncdm-8).
08/01/18 19:08:27 INFO
Hi Yunhong,
As per the output it seems the job ran to successful completion (albeit with
some failures)...
Devaraj
> -Original Message-
> From: Yunhong Gu1 [mailto:[EMAIL PROTECTED]
> Sent: Saturday, January 19, 2008 8:56 AM
> To: hadoop-user@lucene.apache.org
> Subj
Yes, it looks like HADOOP-1374
The program actually failed after a while:
[EMAIL PROTECTED]:~/hadoop-0.15.2$ ./bin/hadoop jar hadoop-0.15.2-test.jar
mrbench
MRBenchmark.0.0.2
08/01/18 18:53:08 INFO mapred.MRBench: creating control file: 1 numLines,
ASCENDING sortOrder
08/01/18 18:53:08 INFO
Looks like we still have this unsolved mysterious problem:
http://issues.apache.org/jira/browse/HADOOP-1374
Could it be related to HADOOP-1246? Arun?
Thanks,
--Konstantin
Yunhong Gu1 wrote:
Hi,
If someone knows how to fix the problem described below, please help me
out. Thanks!
I am test
The program "mrbench" takes 1 second on a single node, so I think waiting
for 1 minute should be long enough. And I also restarted Hadoop after I
updated the config file.
Yunhong
On Fri, 18 Jan 2008, Miles Osborne wrote:
I think it takes a while to actually work, so be patient!
Miles
On
I am using 0.15.2, and in my case, CPUs on both nodes are idle. It looks
like the program is trapped in a synchronization deadlock or some
waiting state that will never be woken up.
Yunhong
On Fri, 18 Jan 2008, Jason Venner wrote:
When this was happening to us, there was a block replication
When this was happening to us, there was a block replication error and
one node was in an endless loop trying to replicate a block to another
node which would not accept it. In our case most of the cluster was idle,
but a CPU on the machine trying to send the block was heavily used.
We never were
I think it takes a while to actually work, so be patient!
Miles
On 18/01/2008, Yunhong Gu1 <[EMAIL PROTECTED]> wrote:
>
>
> Hi, Miles,
>
> Thanks for your information. I applied this but the problem still exists.
> By the way, when this happens, the CPUs are idle and doing nothing.
>
> Yunhong
>
Hi, Miles,
Thanks for your information. I applied this but the problem still exists.
By the way, when this happens, the CPUs are idle and doing nothing.
Yunhong
On Fri, 18 Jan 2008, Miles Osborne wrote:
I had the same problem. If I recall, the fix is to add the following to
your hadoop-si
I had the same problem. If I recall, the fix is to add the following to
your hadoop-site.xml file:
<property>
  <name>mapred.reduce.copy.backoff</name>
  <value>5</value>
</property>
See HADOOP-1984
Miles
On 18/01/2008, Yunhong Gu1 <[EMAIL PROTECTED]> wrote:
>
>
> Hi,
>
> If someone knows how to fix the problem described below, please help me
>
may be running on different input. Tell us if this is a map/reduce or data problem.
thanks,
lohit
- Original Message
From: Ted Dunning <[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Friday, January 18, 2008 9:04:37 AM
Subject: Re: Hadoop only processing the first 64 meg bl
more into this when I'm
back next week. Thanks for your help so far. If anyone else has
suggestions over the weekend feel free to share.
--Matt
-Original Message-
From: Ted Dunning [mailto:[EMAIL PROTECTED]
Sent: Friday, January 18, 2008 12:05 PM
To: hadoop-user@lucene.apache.org
3 AM
> To: hadoop-user@lucene.apache.org
> Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig
> file
>
>
> Go into the web interface and look at the file.
>
> See if you can see all of the blocks.
>
>
> On 1/18/08 7:46 AM, "Matt Herndon"
Subject: Re: Hadoop only processing the first 64 meg block of a 2 gig
file
Go into the web interface and look at the file.
See if you can see all of the blocks.
On 1/18/08 7:46 AM, "Matt Herndon" <[EMAIL PROTECTED]> wrote:
> Hello,
>
>
>
> I'm trying to ge
Go into the web interface and look at the file.
See if you can see all of the blocks.
On 1/18/08 7:46 AM, "Matt Herndon" <[EMAIL PROTECTED]> wrote:
> Hello,
>
>
>
> I'm trying to get Hadoop to process a 2 gig file but it seems to only be
> processing the first block. I'm running the exact
It seems that your hbase master did not start up properly.
By default, hadoop and hbase should run correctly without any changes.
To locate your problem,
1. run HADOOP_HOME/bin/start-dfs.sh
2. run HADOOP_HBASE/bin/start-hbase.sh
3. run HADOOP_HBASE/bin/hbase shell
4. in hbase shell, run "show tables;"
Dear Mafish liu:
I haven't changed anything in my hbase-default.xml. I only
changed hbase-site.xml.
I don't know what happened in my hadoop. I deleted the whole
hadoop installation to reinstall it, and then the error disappeared. I
don't know what was wrong.
Thanks very much!
Port 60020 is the default hbase.regionserver port in my hbase-default.xml:
<property>
  <name>hbase.regionserver</name>
  <value>0.0.0.0:60020</value>
  <description>The host and port a HBase region server runs at.</description>
</property>
How about yours?
On Jan 18, 2008 4:04 PM, Mafish Liu <[EMAIL PROTECTED]> wrote:
> Hi, ma qiang:
> It seems that you
Hi, ma qiang:
It seems that you have modified hadoop-site.xml to assign a new ip/port.
Try to do the following (I'm not quite sure it will work):
1. Stop your hadoop and hbase.
2. In your HADOOP_HOME, run "ant jar"
3. Restart hadoop and hbase.
On Jan 18, 2008 3:28 PM, ma qiang <[EMAIL PROTECTED]> wro
Yes, I'm sure I have my hadoop running first, and I can run another
test on my hadoop.
On Jan 18, 2008 3:20 PM, Mafish Liu <[EMAIL PROTECTED]> wrote:
> Are you sure that you had your hadoop running before you ran your test?
> 127.0.1.1:60020 is the namenode's ip:port; hbase is based on hadoop
Are you sure that you had your hadoop running before you ran your test?
127.0.1.1:60020 is the namenode's ip:port; hbase is based on hadoop, so it needs
to connect to hadoop.
On Jan 18, 2008 12:16 PM, ma qiang <[EMAIL PROTECTED]> wrote:
> Dear colleagues;
> I run the code as described below:
>
Thanks, Russell Smith.
I fixed the GRUB, but the ext3 file system restored from Norton Ghost
did not seem to be preserved correctly.
In the end, I used G4L (Ghost for Linux) to copy the whole hard disk from the source
drive to the destination drive, and it works very well. Both GRUB and the ext3 file
system work corre
Okay, that makes sense. It seems strange to return a reference to a
string object that's not const. Doesn't this break some C++ coding
standards? It seems it should return a pointer or have a set method
to maintain symmetry between the two languages. The counter argument,
of course, is that it'
Bin,
Did you try using dd from the source to the destination drive? That
should preserve grub.
Russell Smith
UKD1 Limited
Bin YANG wrote:
I used Norton Ghost 8.0 to ghost a whole Ubuntu hard disk to an image, and
restored another hard disk from the image, but the restored hard disk cannot
s
Dear ma qiang.
Firstly, please check the hbase status using the hbase shell or the hbase daemons' logs.
Also, if you didn't restart hadoop, please restart it (otherwise the
configuration file isn't loaded).
B. Regards,
Edward yoon @ NHN, corp.
> Date: Thu, 17 Jan 2008 20:21:47
rom: [EMAIL PROTECTED]
> > To: hadoop-user@lucene.apache.org
> > Subject: Re: about using HBase?
> >
> > I have met this problem, When I run the code HBaseAdmin admin = new
> > HbaseAdmin(conf); the console print these messeages as below:
> > 08/01/17 18:46:46 INFO ipc.Client: R
hbase-site.xml should be located in ${hadoop-home}/conf.
B. Regards,
Edward yoon @ NHN, corp.
> Date: Thu, 17 Jan 2008 18:51:11 +0800
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Subject: Re: about using HBase?
>
> I have met this problem, When I run the cod
>
>
> > Date: Thu, 17 Jan 2008 16:50:29 +0800
> > From: [EMAIL PROTECTED]
> > To: hadoop-user@lucene.apache.org
> > Subject: Re: about using HBase?
> >
> > Thank you for your help!
> >
> > You mentioned hadoop-0.16.*, but I still use hadoop-0.15, I can
Thu, 17 Jan 2008 16:50:29 +0800
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Subject: Re: about using HBase?
>
> Thank you for your help!
>
> You mentioned hadoop-0.16.*, but I still use hadoop-0.15, I can't see
> hadoop-0.16.* in the http:
.
>
>
> > From: [EMAIL PROTECTED]
> > To: hadoop-user@lucene.apache.org
> > Subject: RE: about using HBase?
> > Date: Thu, 17 Jan 2008 08:33:58 +
>
> >
> >
> > It's a org.apache.hadoop.hbase.hql.
> >
> > Or, Simply just us
Grandmotherly, if you want to retrieve the cell values, you should use the
HTable class, because the ResultSet return policy isn't implemented yet.
Thanks.
B. Regards,
Edward yoon @ NHN, corp.
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Subject: RE: abo
It's org.apache.hadoop.hbase.hql.
Or, simply use the 'Ctrl + Shift + O' shortcut in Eclipse. (smile)
Thanks.
B. Regards,
Edward yoon @ NHN, corp.
> Date: Thu, 17 Jan 2008 16:29:28 +0800
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Sub
");
>
>
> B. Regards,
>
> Edward yoon @ NHN, corp.
>
>
> > Date: Thu, 17 Jan 2008 15:58:24 +0800
> > From: [EMAIL PROTECTED]
> > To: hadoop-user@lucene.apache.org
> > Subject: Re: about using HBase?
> >
> > Thanks very much!
"create table webtable('content','title');");
B. Regards,
Edward yoon @ NHN, corp.
> Date: Thu, 17 Jan 2008 15:58:24 +0800
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Subject: Re: about using HBase?
>
> Thanks very much!
> When
t want the info server to run.
>
>
>
> <property>
>   <name>hbase.rootdir</name>
>   <value>/tmp/hbase</value>
>   <description>location of HBase instance in dfs</description>
> </property>
>
>
>
>
>
>
>
> B. Regards,
>
> Edward yoon @ NHN, corp.
>
>
> > From: [EMAIL PROTECTED]
> > To: hadoop-user@lu
o run.
<property>
  <name>hbase.rootdir</name>
  <value>/tmp/hbase</value>
  <description>location of HBase instance in dfs</description>
</property>
B. Regards,
Edward yoon @ NHN, corp.
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Subject: RE: about using HBase?
> Date: Thu, 17 Jan 2008 07:46:00 +
>
>
>
Please copy the hadoop-0.16.*-hbase.jar to the ${hadoop_home}/lib folder.
And here's an example of hadoop-site.xml:
hbase.master
a51066.nhncorp.com:6
The port for the hbase master web UI
Set to -1 if you do not want the info server to run.
hbase
Myles Grant wrote:
I would like the values for a key to exist in a single file, and only
the values for that key.
Reducer.reduce() gets invoked once per key, i.e. just once per key, along
with all the values associated with it, as Reducer.reduce(key, values).
So what I suggested should help you generate one f
I would like the values for a key to exist in a single file, and only
the values for that key. Each reduced key/value would get its own
file. If I understand correctly, all output of the reducers is
written to a single file.
-Myles
On Jan 16, 2008, at 9:29 PM, Amar Kamat wrote:
Hi,
Why
Hi,
Why couldn't you just write this logic in your reducer class? The reduce
[reduceClass.reduce()] method is invoked with a key and an iterator over
the values associated with the key. You can simply dump the values into
a file. Since the input to the reducer is sorted you can simply dump the
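A minimal sketch of that idea against the old mapred interfaces used elsewhere in this thread (the property name, base directory and class name are assumptions): since reduce() runs once per key, a file created inside it holds only that key's values.
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
// Sketch: write one DFS file per key directly from reduce(), bypassing the
// usual part-NNNNN output. The property name and base directory are made up.
public class PerKeyFileReducer extends MapReduceBase implements Reducer {
  private FileSystem fs;
  private Path baseDir;
  public void configure(JobConf job) {
    try {
      fs = FileSystem.get(job);
      baseDir = new Path(job.get("perkey.output.dir", "/tmp/perkey")); // assumed property
    } catch (IOException e) {
      throw new RuntimeException(e);
    }
  }
  public void reduce(WritableComparable wc, Iterator values,
      OutputCollector out, Reporter r) throws IOException {
    // reduce() is invoked once per key, so everything written here belongs
    // to exactly one key; the key's string form names the file (sanitize if needed).
    FSDataOutputStream file = fs.create(new Path(baseDir, wc.toString()));
    while (values.hasNext()) {
      file.write(values.next().toString().getBytes());
      file.write('\n');
    }
    file.close();
  }
}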
I made an issue for this problem.
https://issues.apache.org/jira/browse/HADOOP-2635
Thanks.
B. Regards,
Edward yoon @ NHN, corp.
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Subject: RE: copyFromLocal bug?
> Date: Thu, 17 Jan 2008 04:55:20 +
>
>
>
It seems to be a bug in the RawLocalFileSystem class.
I'll file an issue.
B. Regards,
Edward yoon @ NHN, corp.
> Date: Wed, 16 Jan 2008 23:03:03 -0500
> From: [EMAIL PROTECTED]
> To: hadoop-user@lucene.apache.org
> Subject: copyFromLocal bug?
>
> I was trying to copy a bunch of data over to my hadoop in
Thanks, I will try a safer place for the DFS.
Jeff
-Original Message-
From: Jason Venner [mailto:[EMAIL PROTECTED]
Sent: Wednesday, January 16, 2008 10:04 AM
To: hadoop-user@lucene.apache.org
Subject: Re: Platform reliability with Hadoop
The /tmp default has caught us once or twice too
The /tmp default has caught us once or twice too. Now we put the files
elsewhere.
[EMAIL PROTECTED] wrote:
The DFS is stored in /tmp on each box.
The developers who own the machines occasionally reboot and reprofile them
Won't you lose your blocks after a reboot, since /tmp gets cleaned up?
>The DFS is stored in /tmp on each box.
> The developers who own the machines occasionally reboot and reprofile them
Won't you lose your blocks after a reboot, since /tmp gets cleaned up? Could this
be the reason you see data corruption?
A good idea is to configure the DFS to be anywhere other than /tmp.
Thanks, Miles.
On Jan 16, 2008 11:51 AM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> The number of reduces should be a function of the amount of data needing
> reducing, not the number of mappers.
>
> For example, your mappers might delete 90% of the input data, in which
> case you should only ne
The number of reduces should be a function of the amount of data needing
reducing, not the number of mappers.
For example, your mappers might delete 90% of the input data, in which
case you should only need 1/10 of the number of reducers as mappers.
Miles
On 16/01/2008, Jim the Standing Bear <
>On 1/15/08 12:54 PM, "Miles Osborne" <[EMAIL PROTECTED]> wrote:
>
>> surely the clean way (in a streaming environment) would be to define a
>> representation of some kind which serialises the output.
>>
>> http://en.wikipedia.org/wiki/Serialization
>>
>> after your mappers and reducers have comp
hmm.. interesting... these are supposed to be the output from mappers
(and default reducers since I didn't specify any for those jobs)...
but shouldn't the number of reducers match the number of mappers? If
there was only one reducer, it would mean I only had one mapper task
running?? That is why
The part nomenclature does not refer to splits. It refers to how many
reduce processes were involved in actually writing the output file. Files
are split at read-time as necessary.
You will get more of them if you have more reducers.
On 1/16/08 8:25 AM, "Jim the Standing Bear" <[EMAIL PROTECT
Thanks Ted. I just didn't ask it right. Here is a stupid 101
question, which I am sure the answer lies in the documentation
somewhere, just that I was having some difficulties in finding it...
when I do an "ls" on the dfs, I would see this:
/user/bear/output/part-0
I probably got confused
Parallelizing the processing of data occurs at two steps. The first is
during the map phase where the input data file is (hopefully) split across
multiple tasks. This should happen transparently most of the time unless
you have a perverse data format or use unsplittable compression on your
file
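To make the two knobs concrete, a small illustrative sketch (class name and counts are not from this thread): the map count is only a hint, since the framework derives the real number of map tasks from the input splits, while the reduce count is taken as given and should be sized by how much data survives the map phase.
import org.apache.hadoop.mapred.JobConf;
// Illustrative only -- mapper/reducer classes and input/output paths omitted.
public class JobSizingExample {
  public static void main(String[] args) {
    JobConf conf = new JobConf(JobSizingExample.class);
    conf.setNumMapTasks(100);   // a hint; actual maps = number of input splits
    conf.setNumReduceTasks(8);  // size by reduce-side data volume, not by map count
  }
}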
This isn't really a question about Hadoop, but is about system
administration basics.
You are probably missing a master boot record (MBR) on the disk. Ask a
local linux expert to help you or look at the Norton documentation.
On 1/16/08 4:59 AM, "Bin YANG" <[EMAIL PROTECTED]> wrote:
> I use th
There is some considerable and very understandable confusion about map
tasks, mappers and input splits.
It is true that for large inputs the input should ultimately be split into
chunks so that each core that you have has to process 10-100 pieces of data.
To do that, however, you only need one m
https://issues.apache.org/jira/browse/HADOOP-2164
>
> Runping
>
>
>
> > -Original Message-
> > From: Vadim Zaliva [mailto:[EMAIL PROTECTED]
> > Sent: Tuesday, January 15, 2008 9:59 PM
> > To: hadoop-user@lucene.apache.org
> > Subject: Re: unable to figure out this exception from red
I used Norton Ghost 8.0 to ghost a whole Ubuntu hard disk to an image, and
restored another hard disk from the image, but the restored hard disk cannot
start up Ubuntu successfully.
The GRUB said error 22.
Does somebody know how to fix the problem?
Thanks.
Bin YANG
On Jan 16, 2008 4:54 AM, Sagar
I simply followed the wiki "The right level of parallelism for maps
seems to be around 10-100 maps/node",
http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces
We have 8 cores in each machine, so perhaps 100 mappers ought to be
right, it's set to 157 in the config but hadoop used ~200 for
I encountered a similar case.
Here is the Jira: https://issues.apache.org/jira/browse/HADOOP-2164
Runping
> -Original Message-
> From: Vadim Zaliva [mailto:[EMAIL PROTECTED]
> Sent: Tuesday, January 15, 2008 9:59 PM
> To: hadoop-user@lucene.apache.org
> Subject: Re: u
On Jan 15, 2008, at 22:09, Jim the Standing Bear wrote:
The only thing I noticed (compared to my code) is that this is missing:
client.setConf(conf);
before client.run(conf).
In this case the default input format is used, which uses LongWritable as
a key. Not sure if this is the case, but something wo
Well, I also wish it was this simple, but as I said in the original
message, I never wanted to use LongWritable at all. Here is how I set
the job conf, and after that, is the reduce task. Also, if I got the
incorrect output key/value type, shouldn't it always fail as soon as
the reduce task is ru
On Jan 15, 2008, at 21:53, Jim the Standing Bear wrote:
I was asking a lot of questions today, so I am glad to contribute at
least one answer. I had this problem when there was a type mismatch
for keys or values. You need to set up the right type in your JobConf like
this:
conf.setOutputKeyCl
On Jan 15, 2008, at 18:08, Ted Dunning wrote:
One option that should work reasonably well is to have each mapper
output
with a constant key (as Rui suggests) and use a combiner to pre-
select the
top N elements. Communication between mappers and combiners is very
fast,
so this will be just
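A rough sketch of that constant-key idea (class and value format are assumptions; values are taken to be numeric text): register the same class as both combiner and reducer, so each map's output is trimmed to N records before anything crosses the network, and the lone reduce sees at most N records per map.
import java.io.IOException;
import java.util.Iterator;
import java.util.PriorityQueue;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
// Sketch: keep only the N largest numeric values seen under the single shared key.
// Works as both combiner and reducer because its input and output types match.
public class TopNSelector extends MapReduceBase implements Reducer {
  private static final int N = 10;
  public void reduce(WritableComparable key, Iterator values,
      OutputCollector output, Reporter reporter) throws IOException {
    PriorityQueue<Long> top = new PriorityQueue<Long>(); // min-heap of current top N
    while (values.hasNext()) {
      top.add(Long.parseLong(values.next().toString()));
      if (top.size() > N) {
        top.poll(); // evict the smallest of the N+1 candidates
      }
    }
    for (Long v : top) { // heap order, not sorted; sort afterwards if needed
      output.collect(key, new Text(v.toString()));
    }
  }
}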
Output a constant key in the map function.
On 1/15/08 9:31 PM, "Vadim Zaliva" <[EMAIL PROTECTED]> wrote:
> On Jan 15, 2008, at 17:56, Peter W. wrote:
>
> That would output last 10 values for each key. I need
> to do this across all the keys in the set.
>
> Vadim
>
>> Hello,
>>
>> Try using
On Jan 15, 2008, at 17:56, Peter W. wrote:
That would output last 10 values for each key. I need
to do this across all the keys in the set.
Vadim
Hello,
Try using Java collection.
untested code follows...
public static class R extends MapReduceBase implements Reducer
{
public void reduc
op-user@lucene.apache.org
> Sent: Tuesday, January 15, 2008 4:13:11 PM
> Subject: Re: single output file
>
>
>
> On Jan 15, 2008, at 13:57, Ted Dunning wrote:
>
>> This is happening because you have many reducers running, only one
>> of which
>> gets any data
>the question remains, how to return, say, last 10 records from Reducer.
>I need to know when last record is processed.
How about storing the last 10 records seen so far? Each time you see another record,
discard the oldest one and keep a fresh copy of the last 10 records seen.
Now, to know the end, how about s
Hello,
Try using Java collection.
untested code follows...
public static class R extends MapReduceBase implements Reducer {
  public void reduce(WritableComparable wc, Iterator it,
      OutputCollector out, Reporter r) throws IOException {
    Stack s = new Stack();
    int cnt = 0;
On Jan 15, 2008, at 17:02, Rui Shi wrote:
As far as I understand, letting each mapper produce the top N records does not
work, as each mapper only has partial knowledge of the data, which will
not lead to a global optimum... I think your mapper needs to output all records
(combined) and let the reducer to
- Original Message
From: Vadim Zaliva <[EMAIL PROTECTED]>
To: hadoop-user@lucene.apache.org
Sent: Tuesday, January 15, 2008 4:13:11 PM
Subject: Re: single output file
On Jan 15, 2008, at 13:57, Ted Dunning wrote:
> This is happening because you have many reducers running, only one
&
On Jan 15, 2008, at 13:57, Ted Dunning wrote:
This is happening because you have many reducers running, only one
of which
gets any data.
Since you have combiners, this probably isn't a problem. That reducer
should only get as many records as you have maps. It would be a
problem if
your
"Use the code, Jeff"
1) Missing blocks are reported only when all replicas are missing
2) The files are history
3) The dfs won't actually do anything in safe mode
4) Try creating /lost+found first
Jeff
-Original Message-
From: Jeff Eastman [mailto:[EMAIL PROTECTED]
Sent: Tuesday, Januar
This is happening because you have many reducers running, only one of which
gets any data.
Since you have combiners, this probably isn't a problem. That reducer
should only get as many records as you have maps. It would be a problem if
your reducer were getting lots of input records.
You can
Also, this gives you a solution to your race condition (by using hadoop's
mechanisms) and it also gives you much higher
throughput/reliability/scalability than writing to NFS can possibly give
you.
On 1/15/08 12:54 PM, "Miles Osborne" <[EMAIL PROTECTED]> wrote:
> surely the clean way (in a str
surely the clean way (in a streaming environment) would be to define a
representation of some kind which serialises the output.
http://en.wikipedia.org/wiki/Serialization
after your mappers and reducers have completed, you would then have some
code which deserialises (unpacks) the output as desir
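For instance, a minimal sketch (the field layout and percent-escaping scheme are assumptions, not from this thread): pack each multi-field record onto one escaped line so it survives streaming's line-oriented stdin/stdout, then unpack it after the job completes.
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
// Sketch: serialise a record to one tab-separated line and back.
public class LineCodec {
  // Percent-escape the characters that streaming treats as delimiters.
  static String serialise(List<String> fields) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < fields.size(); i++) {
      if (i > 0) sb.append('\t');
      sb.append(fields.get(i)
          .replace("%", "%25")   // escape the escape character first
          .replace("\t", "%09")
          .replace("\n", "%0A"));
    }
    return sb.toString();
  }
  // Reverse the escaping once the job output has been collected.
  static List<String> deserialise(String line) {
    List<String> fields = new ArrayList<String>();
    for (String part : line.split("\t", -1)) {
      fields.add(part.replace("%09", "\t")
                     .replace("%0A", "\n")
                     .replace("%25", "%")); // unescape the escape character last
    }
    return fields;
  }
  public static void main(String[] args) {
    String packed = serialise(Arrays.asList("key1", "a value\nwith a newline"));
    System.out.println(deserialise(packed));
  }
}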
Hi
We at Visvo have developed a small script
for command processing on a cluster.
We would like to share it with you and have it reviewed.
It is available under APL.
We would like to make a project so that we all
can contribute to this script.
For now, you can download this script
from http:/