Thanks Mahadev,
Thanks for letting me know about the patch. I have already applied it, and
the archiving seems to run fine for an input directory size of about 5GB.
Currently I am testing the same programmatically, but since it is working
from the command line, it should ideally also work this way.
t
Thank you for all the interest.
BTW, please subscribe to the Hama developer mailing list instead of
sending mail to [EMAIL PROTECTED]
[EMAIL PROTECTED]
- Edward
On Thu, Jul 17, 2008 at 11:26 AM, Edward J. Yoon <[EMAIL PROTECTED]> wrote:
> Hello all,
>
> The Hama team which is trying to port typical
Hi,
Can anyone help me understand how to read data distributed over a cluster?
For instance, if we give a path /user/hadoop/parsed_data/part-/data to
the map reduce program, will it find the data on the same path on all the
servers in the cluster, or will it be only the local file?
If it on
Count me as another interested party.
--Matt
On Fri, Jul 18, 2008 at 8:59 AM, Alex Dorman <[EMAIL PROTECTED]> wrote:
> Please let me know if you would be interested in joining NY Hadoop user group
> if one existed.
>
> I know about 5-6 people in New York City running Hadoop. I am sure there are
I don't know if there is any mechanism in place for what you're looking for.
However, you could write a partitioner that distributes data in a way that
lower keys go to lower-numbered reduces, and higher keys go to higher-numbered
reduces. (e.g. keys starting with 'A~D' go to part-, 'E~H' goes
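A minimal sketch of such a range partitioner for the old mapred API (the class
name, the Text types, and the letter ranges are illustrative, not from the
original mail):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.Partitioner;

  public class RangePartitioner implements Partitioner<Text, Text> {
    public void configure(JobConf job) {}

    // Keys are routed by their first letter, so part-00000 ends up with the
    // lowest keys and the last part file with the highest.
    public int getPartition(Text key, Text value, int numPartitions) {
      String s = key.toString();
      if (s.length() == 0) return 0;
      char first = Character.toUpperCase(s.charAt(0));
      if (first < 'A') return 0;
      if (first > 'Z') return numPartitions - 1;
      return (first - 'A') * numPartitions / 26;
    }
  }

It would be registered with conf.setPartitionerClass(RangePartitioner.class);
the output files are then ordered by key range rather than by hash.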
I'm unable to access my logs with the JobTracker/TaskTracker web
interface for a Hadoop job running on Amazon EC2. The URLs given for
the task logs are of the form:
http://domu-[...].compute-1.internal:50060/
The Hadoop-EC2 docs suggest that I should be able to get onto port
50060 for t
Hi,
Apologies if this question has been answered before, but I could not find it
in the archive or the twiki pages. I am wondering what the max number
of files open for writing at the same time is, given an HDFS cluster. I am
streaming data into many different files (on the order of thousands) at
the
I am following the example in
http://hadoop.apache.org/core/docs/current/streaming.html about Hadoop's
partitioner: org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner. It seems
that the values are sorted in dictionary order, e.g.:
1
12
15
2
28
What if I want to get a numerically sorted list?
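One workaround, assuming you can put a class on the job classpath, is a
comparator that parses the keys as numbers instead of comparing them as
strings; a rough sketch (the class name is illustrative):

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.io.WritableComparable;
  import org.apache.hadoop.io.WritableComparator;

  public class NumericKeyComparator extends WritableComparator {
    public NumericKeyComparator() {
      super(Text.class, true);   // true: deserialize keys so compare() gets objects
    }

    // Compare the textual keys by numeric value rather than dictionary order.
    public int compare(WritableComparable a, WritableComparable b) {
      long x = Long.parseLong(a.toString().trim());
      long y = Long.parseLong(b.toString().trim());
      return (x < y) ? -1 : ((x == y) ? 0 : 1);
    }
  }

It would be wired in with conf.setOutputKeyComparatorClass(NumericKeyComparator.class).
A simpler trick that needs no Java at all is to zero-pad the numbers in the
mapper (0001, 0002, 0012, ...), since fixed-width numbers sort correctly in
dictionary order.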
HQL will be integrated into the HRdfStore project.
See http://groups.google.com/group/hrdfstore
Thanks,
Edward J. Yoon
On 7/22/08, stack <[EMAIL PROTECTED]> wrote:
> lucio Piccoli wrote:
> > hi Tho Pham
> >
> > i have checked the HQL api but the only reference i found was the
> org.apache.hadoop.hbase.
Hey all,
I was wondering if it's possible to split up the reduce task amongst
more than one machine. I figured it might be possible for the map
output to be copied to multiple machines; then each reducer could sort
its keys and then combine them into one big sorted output (a la
mergesort).
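For what it's worth, this is roughly what running several reduce tasks already
does: each reducer gets a disjoint slice of the key space (chosen by the
partitioner) and writes its own sorted part file. A tiny sketch (the number is
illustrative):

  conf.setNumReduceTasks(4);   // four reducers, possibly on four machines
  // part-00000 .. part-00003 are each sorted by key; reading them back in
  // order is only globally sorted if the partitioner assigns contiguous,
  // ordered key ranges to the reducers (it is hash-based by default).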
That's great, thanks a lot!
Daniel
2008/7/21 Christian Ulrik Søttrup <[EMAIL PROTECTED]>:
> Hi,
>
> I use a counter in my reducer to check whether another iteration (of the
> map/reduce cycle) is necessary. I have a similar declaration to yours.
> Then in my main program I have:
>
> ***
> client.setC
Hi,
I use a counter in my reducer to check whether another iteration (of the
map/reduce cycle) is necessary. I have a similar declaration to yours.
Then in my main program I have:
***
client.setConf(conf);
RunningJob rj = JobClient.runJob(conf);
Counters cs = rj.getCounters();
long swaps=cs.getCou
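The snippet above is cut off; a hedged guess at how the rest of that pattern
usually looks (the enum name SWAPS is purely illustrative, not taken from the
original mail):

  long swaps = cs.getCounter(MyReducer.IterationCounter.SWAPS);  // illustrative enum
  if (swaps > 0) {
    // something still changed in this round: feed the output directory back
    // in as the next job's input and submit another iteration
  }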
Hi there,
It looks like the current Hadoop DFS puts the DFSClient in the role of the
"primary node". See http://wiki.apache.org/hadoop/DFS_requirements
In the Google File System, write synchronization by multiple clients
is controlled by the primary node, which decides the sequence of the
mutations to a block and
Hi,
I defined a counter of my own and updated it in the map method:
protected static enum MyCounter {
INPUT_WORDS
};
...
public void map(...) {
...
reporter.incrCounter(MyCounter.INPUT_WORDS, 1);
}
Can I fetch the counts later, like in the run() method after the job is
finis
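For reference, the counters can indeed be read back from the RunningJob handle
once JobClient.runJob() returns, much as in the earlier reply; a minimal sketch
using the enum declared above:

  RunningJob job = JobClient.runJob(conf);
  Counters counters = job.getCounters();
  long inputWords = counters.getCounter(MyCounter.INPUT_WORDS);
  System.out.println("INPUT_WORDS = " + inputWords);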
I am using Hadoop Streaming.
I figured out a way to do combining in the mapper; is it the same as using a
separate combiner?
For example: the input is a list of words, and I want to count the total number
of occurrences of each word.
The traditional mapper is:
while (<STDIN>) {
chomp ($_);
$word = $_;
print ($
Never mind, I figured out my problem: I did not configure the OutputFormat.
On Mon, Jul 21, 2008 at 1:44 PM, Khanh Nguyen <[EMAIL PROTECTED]> wrote:
> Hi Daniel,
>
> The outputformat of my 1st hadoop job is TextOutputFormat. The
> skeleton of my code follows:
>
> public int run(String[] args) throws Ex
Hi Pratyush,
I think this bug was fixed in
https://issues.apache.org/jira/browse/HADOOP-3545.
Can you apply the patch and see if it works?
Mahadev
On 7/21/08 5:56 AM, "Pratyush Banerjee" <[EMAIL PROTECTED]> wrote:
> Hi All,
>
> I have been using hadoop archives programmatically to generat
Hi Daniel,
The outputformat of my 1st hadoop job is TextOutputFormat. The
skeleton of my code follows:
public int run(String[] args) throws Exception {
//set up and run job 1
...
conf.setOutputFormat(TextOutputFormat.class);
FileOutputFormat.setOutputPath(conf, new
Hi K,
I think you should look at your map output format setting and check whether it
matches your reduce input.
Daniel
2008/7/21 Khanh Nguyen <[EMAIL PROTECTED]>:
> Hello,
>
> I am getting this error
>
> java.io.IOException: Type mismatch in key from map: expected
> org.apache.hadoop.io.LongWritable, re
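For context, that exception usually means the declared map output key class
(which defaults to the job output key class, LongWritable) does not match what
the mapper actually emits. A hedged sketch of the usual fix, with Text chosen
only as an example:

  conf.setMapOutputKeyClass(Text.class);     // what map() really writes
  conf.setMapOutputValueClass(Text.class);
  // or conf.setOutputKeyClass/ValueClass if the map and reduce output types match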
Hello,
I am getting this error
java.io.IOException: Type mismatch in key from map: expected
org.apache.hadoop.io.LongWritable, recieved org.apache.hadoop.io.Text
Could someone please explain to me what I am doing wrong? Following is
the code I think is responsible...
public int run() {
.
sor
A reminder that the next user group meeting is scheduled for July 22nd
from 6 - 7:30 pm at Yahoo! Mission College, Building 1, Training Rooms 3
and 4.
Agenda:
Cascading - Chris Wensel
Performance Benchmarking on Hadoop (Terabyte Sort, Gridmix) - Sameer
Paranjpye, Owen O'Malley, Runping Qi
Hi all,
I have a mapper's Value type which comes from a record like this one:
module org {
class Something {
AnotherRecord aRecord;
int number1;
int number2;
}
}
So, I'm creating one of these Someth
You are correct.
The default 1MB/sec is too low.
1GB/sec is too high.
I changed it to 10MB/sec and it's humming along.
Thanks.
Taeho Kang wrote:
> By setting "dfs.balance.bandwidthPerSec" to 1GB/sec, each datanode is able
> to utilize up to 1GB/sec for block balancing. It seems to be too high as
>
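For reference, the value of dfs.balance.bandwidthPerSec is in bytes per second
(the 1MB/sec default is 1048576), so 10MB/sec is 10485760. Expressed as a
Configuration call purely for illustration (in practice the property normally
lives in the datanodes' hadoop-site.xml):

  // bytes per second that each datanode may spend on balancing traffic
  conf.setLong("dfs.balance.bandwidthPerSec", 10L * 1024 * 1024);  // 10485760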
On Mon, Jul 21, 2008 at 03:52:01PM +0200, tim robertson wrote:
> Is there a user base in Scandinavia that would be interested in meeting to
> exchange feedback / ideas ?
> (in English...)
>
Yeah, I'd be interested although I barely qualify as a hadoop user yet.
> I can probably host a meeting in
Hi all,
I think these user groups are a great idea, but I can't get to any easily...
Is there a user base in Scandinavia that would be interested in meeting to
exchange feedback / ideas ?
(in English...)
I can probably host a meeting in Copenhagen if there were interest.
Cheers
Tim
Hi All,
I have been using Hadoop archives programmatically to generate har
archives from some logfiles which are being dumped into HDFS.
When the input directory to the Hadoop archiving program has files larger
than 2GB, strangely the archiving fails with an error message saying
INF
I'd be up for a New York user group.
Alex Newman-3 wrote:
>
> I am down as well.
>
>
I see that Cygwin is the only supported option for building Hadoop Pipes
for Windows. I'm trying a MinGW build, and it looks like the only
thing needing porting is the communications layer, from BSD sockets to, say,
Winsock? Is that correct?
Thanks,
Marc
I've tried it and it works.
Thank you very much
On Mon, Jul 21, 2008 at 6:33 PM, Miles Osborne <[EMAIL PROTECTED]> wrote:
> Then just do what I said: set the number of reducers to zero. This should
> just run the mapper phase.
>
> 2008/7/21 Zhou, Yunqing <[EMAIL PROTECTED]>:
>
> > since the whol
Then just do what I said: set the number of reducers to zero. This should
just run the mapper phase.
2008/7/21 Zhou, Yunqing <[EMAIL PROTECTED]>:
> Since the whole dataset is 5TB, the identity reducer still costs a lot of
> time.
>
> On Mon, Jul 21, 2008 at 5:09 PM, Christian Ulrik Søttrup <[EMAIL
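A one-line sketch of the map-only setup Miles describes (everything else about
the job left unchanged):

  conf.setNumReduceTasks(0);  // no shuffle or reduce; map output is written
                              // directly to the job's output directory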
Since the whole dataset is 5TB, the identity reducer still costs a lot of time.
On Mon, Jul 21, 2008 at 5:09 PM, Christian Ulrik Søttrup <[EMAIL PROTECTED]>
wrote:
> Hi,
>
> you can simply use the built in reducer that just copies the map output:
>
> conf.setReducerClass(org.apache.hadoop.mapred.lib
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Did you try to use the IdentityReducer?
Zhou, Yunqing wrote:
> I only use it to do something in parallel, but the reduce step will cost me
> several additional days. Is it possible to make Hadoop not use a reduce
> step?
>
> Thanks
>
-BEGIN P
... or better still, set the number of reducers to zero.
Miles
2008/7/21 Christian Ulrik Søttrup <[EMAIL PROTECTED]>:
> Hi,
>
> you can simply use the built in reducer that just copies the map output:
>
> conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
>
> Cheers,
> Chr
Hi,
you can simply use the built in reducer that just copies the map output:
conf.setReducerClass(org.apache.hadoop.mapred.lib.IdentityReducer.class);
Cheers,
Christian
Zhou, Yunqing wrote:
I only use it to do something in parallel, but the reduce step will cost me
several additional days. Is
I only use it to do something in parallel, but the reduce step will cost me
several additional days. Is it possible to make Hadoop not use a reduce
step?
Thanks
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
I found out that it is not a bug in my code. When I run
bin $ ./hadoop fs -ls /seDNS/data/33
ls: timed out waiting for rpc response
It times out for this directory, but before it does so, the name node
takes 2GB more heap and never gives it back.
A
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1
Hi,
I am running some code dealing with file system operations (copying
files and deleting). While it is running, the web interface of the name
node tells me that the heap size grows dramatically.
Are there any server-side data structures that I have t