The HBase 0.2.0 release includes 291 changes [1]. New features include a
richer API, a new Ruby IRB-based shell, an improved UI, and many
improvements to overall stability. To download, visit [4].
HBase 0.2.0 is not backward compatible with the HBase 0.1 API (see [2] for an
overview of the changes).
Hi everybody,
When I was running Hadoop 0.17.1 it gave me some WARNs like this:
2008-08-09 10:53:37,728 WARN org.apache.hadoop.dfs.StateChange: DIR*
FSDirectory.unprotectedDelete: failed to
remove /tmp/hadoop-thientd/mapred/system because it does not exist
2008-08-09 10:56:05,836 WARN org.apache.h
If restarting the entire dfs helped, then you might be hitting
http://issues.apache.org/jira/browse/HADOOP-3633
When we were running 0.17.1, I had to grep for OutOfMemory in the
datanode ".out" files at least every day and restart those zombie
datanodes.
Once a datanode gets to this state, as Konst
Thanks Andreas. I'll try it.
On Fri, Aug 8, 2008 at 5:47 PM, Andreas Kostyrka <[EMAIL PROTECTED]> wrote:
> On Friday 08 August 2008 15:43:46 Lucas Nazário dos Santos wrote:
> > You are completely right. It's not safe at all. But this is what I have
> for
> > now:
> > two computers distributed acr
Thank you for the reply. Apparently whatever it was is now gone after a
Hadoop restart, but I'll keep that in mind should it happen again.
Piotr
On Fri, 2008-08-08 at 17:31 -0700, Dhruba Borthakur wrote:
> It is possible that your namenode is overloaded and is not able to
> respond to RPC request
It is possible that your namenode is overloaded and is not able to
respond to RPC requests from clients. Please check the namenode logs
to see if you see lines of the form "discarding calls...".
dhruba
On Fri, Aug 8, 2008 at 3:41 AM, Alexander Aristov
<[EMAIL PROTECTED]> wrote:
> I came across the
On 8/8/08 1:25 PM, "James Graham (Greywolf)" <[EMAIL PROTECTED]> wrote:
> 226GB of available disk space on each one;
> 4 processors (2 x dualcore)
> 8GB of RAM each.
Some simple stuff:
(Assuming SATA):
Are you using AHCI?
Do you have the write cache enabled?
Is the topologyProgram providing prop
When I try the map-side join example (under Hadoop 0.17.1, running in
standalone mode under Win32), it attempts to dereference a null pointer.
$ cat One/some.txt
A 1
B 1
C 1
E 1
$ cat Two/some.txt
A 2
B 2
C 2
D 2
$ bin/hadoop jar *examples.jar join -
Thus spake lohit::
It depends on your machine configuration, how much resource it has and
what you can afford to lose in case of failures.
It would be good to run the NameNode and JobTracker on their own dedicated
nodes, and the datanodes and tasktrackers on the rest of the nodes. We have
seen cases where tas
I think at present only SequenceFiles can be compressed.
http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.html
If you have plain text files, they are stored as-is into blocks. You can store
them as .gz and Hadoop recognizes and processes the gz files. But it's not
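To illustrate the SequenceFile route, here is a minimal sketch against the
0.17-era API (the path, key, and value types are placeholders, not anything
from this thread) of writing a block-compressed SequenceFile:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class CompressedSeqFileWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/tmp/example.seq");
    // BLOCK compression compresses batches of records together,
    // which usually gives the best ratio for text-like data.
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, out, LongWritable.class, Text.class,
        SequenceFile.CompressionType.BLOCK);
    try {
      writer.append(new LongWritable(1L), new Text("one line of the document"));
    } finally {
      writer.close();
    }
  }
}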
It depends on your machine configuration, how much resource it has and what you
can afford to lose in case of failures.
It would be good to run the NameNode and JobTracker on their own dedicated nodes,
and the datanodes and tasktrackers on the rest of the nodes. We have seen cases
where tasktrackers take down
I have tried to connect to it via jconsole.
Apart from that, I have seen people on this list use Ganglia to collect metrics
or just dump them to a file.
To start off you could easily use FileContext (dumping metrics to a file). Check
out the metrics config file (hadoop-metrics.properties) under the conf dir.
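For reference, a minimal sketch of what enabling FileContext in
conf/hadoop-metrics.properties might look like (the file names and period
below are placeholders, not values from this thread); the daemons read this
file at startup, so they need a restart after editing it:

# route the dfs and mapred metrics contexts to flat files
dfs.class=org.apache.hadoop.metrics.file.FileContext
dfs.period=10
dfs.fileName=/tmp/dfsmetrics.log

mapred.class=org.apache.hadoop.metrics.file.FileContext
mapred.period=10
mapred.fileName=/tmp/mrmetrics.log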
Thanks very much, Chris!
Cheers,
John
-----Original Message-----
From: Chris Douglas [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 08, 2008 1:57 PM
To: core-user@hadoop.apache.org
Subject: Re: "Join" example
The contrib/data_join framework is different from the map-side join
framework, under
The contrib/data_join framework is different from the map-side join
framework, under o.a.h.mapred.join.
To see what the example is doing in an outer join, generate a few
sample text input files, tab-separated:
join/a.txt:
a0
a1
a2
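As a rough sketch of how the o.a.h.mapred.join framework gets wired up (the
directory names and job settings below are illustrative, not taken from this
thread, and some helper method names moved around between 0.17 and 0.19), a
map-only outer join over two tab-separated inputs looks roughly like this:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.KeyValueTextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.join.CompositeInputFormat;
import org.apache.hadoop.mapred.join.TupleWritable;
import org.apache.hadoop.mapred.lib.IdentityMapper;

public class MapSideOuterJoin {
  public static void main(String[] args) throws Exception {
    JobConf job = new JobConf(MapSideOuterJoin.class);
    job.setJobName("map-side outer join");
    // The join happens in the input format, before the mapper runs,
    // so both sources must be sorted and identically partitioned on the key.
    job.setInputFormat(CompositeInputFormat.class);
    job.set("mapred.join.expr", CompositeInputFormat.compose(
        "outer", KeyValueTextInputFormat.class, "join/One", "join/Two"));
    // Map-only: the identity mapper just emits the joined tuples.
    job.setNumReduceTasks(0);
    job.setMapperClass(IdentityMapper.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(TupleWritable.class);
    job.setOutputFormat(TextOutputFormat.class);
    FileOutputFormat.setOutputPath(job, new Path("join/out"));
    JobClient.runJob(job);
  }
}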
On Friday 08 August 2008 15:43:46 Lucas Nazário dos Santos wrote:
> You are completely right. It's not safe at all. But this is what I have for
> now:
> two computers distributed across the Internet. I would really appreciate it if
> anyone could give me a hint on how to configure the namenode's IP in
Which is better, to have the namenode and jobtracker as distinct nodes
or as a single node, and are there pros/cons regarding using either or
both as datanodes?
--
James Graham (Greywolf) |
650.930.1138|925.768.4053
Greetings,
I'm very very new to this (as you could probably tell from my other postings).
I have 20 nodes available as a cluster, less one as the namenode and one as
the jobtracker (unless I can use them too). Specs are:
226GB of available disk space on each one;
4 processors (2 x dualcore)
8G
Hello, I have a simple question. How do I configure DFS to store compressed
block files? I've noticed by looking at the "blk_" files that the text
documents I am storing are uncompressed. Currently our Hadoop deployment is
taking up 10x the disk space as compared to our system before moving to
ha
On Friday 08 August 2008 11:43:50 Rong-en Fan wrote:
> After looking into streaming source, the answer is via environment
> variables. For example, mapred.task.timeout is in
> the mapred_task_timeout environment variable.
Well, another typical way to deal with that is to pass the parameters via
c
Hi,
While submitting a job to Hadoop, how can I set system properties that are
required by my code?
Passing -Dmy.prop=myvalue to the hadoop job command is not going to work, as
the hadoop command will pass this to my program as a command-line argument.
Is there any way to achieve this?
Thanks,
Taran
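One common pattern (a sketch, not the only way; "my.prop" is just the name
used in the question) is to implement Tool, so that ToolRunner and
GenericOptionsParser consume -D options into the job Configuration instead of
handing them to main() as plain arguments:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class MyJob extends Configured implements Tool {
  public int run(String[] args) throws Exception {
    // Invoked as: bin/hadoop jar myjob.jar MyJob -Dmy.prop=myvalue <other args>
    // The -D option never reaches args[]; it lands in the Configuration.
    Configuration conf = getConf();
    String value = conf.get("my.prop", "some-default");
    System.out.println("my.prop = " + value);
    // ... build a JobConf from conf and submit the actual job here ...
    return 0;
  }

  public static void main(String[] args) throws Exception {
    System.exit(ToolRunner.run(new MyJob(), args));
  }
}

Note that this puts the value into the Hadoop Configuration rather than into
JVM system properties; if the code really needs a true System property inside
the task JVMs, it can instead be appended to mapred.child.java.opts (e.g.
"-Dmy.prop=myvalue").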
On Thursday 07 August 2008 16:43:10 John Heidemann wrote:
> On Thu, 07 Aug 2008 19:42:05 +0200, "Leon Mergen" wrote:
> >Hello John,
> >
> >On Thu, Aug 7, 2008 at 6:30 PM, John Heidemann <[EMAIL PROTECTED]> wrote:
> >> I have a large Hadoop streaming job that generally works fine,
> >> but a few (2-
Hi Sebastian.
Setting of times doesn't work, but ls, rm, rmdir, mkdir, cp, etc. should
work.
Things that are not currently supported include:
touch, chown, chmod, permissions in general, and obviously random writes, for
which you would get an IO error.
This is what I get on 0.17 for df -h:
> 1) Katta and Distributed Lucene are different projects though, right? Both
> being based on kind of the same paradigm (Distributed Index)?
The design of Katta and that of Distributed Lucene are quite different
last time I checked. I pointed out the Katta project because you can
find the code for D
You are completely right. It's not safe at all. But this is what I have for
now:
two computers distributed across the Internet. I would really appreciate it if
anyone could give me a hint on how to configure the namenode's IP in a
datanode. As I could identify in the log files, the datanode keeps trying to
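For what it's worth, the datanode normally finds the namenode through
fs.default.name in conf/hadoop-site.xml; a minimal sketch (the host and port
below are placeholders, not values from this thread):

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:9000</value>
  </property>
</configuration>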
Hi,
I am not an expert on Hadoop configuration, but is this safe? As far as I
understand, the IP address is public and the connection to the datanode port is
not secured. Am I correct?
Lukas
On Fri, Aug 8, 2008 at 8:35 AM, Lucas Nazário dos Santos <
[EMAIL PROTECTED]> wrote:
> Hello again,
>
> In fac
Hello,
I was wondering what the correct way to submit a job to Hadoop using the
Pipes API is -- currently, I invoke a command similar to this:
/usr/local/hadoop/bin/hadoop pipes -conf
/usr/local/mapreduce/reports/reports.xml -input
/store/requests/archive/*/*/* -output out
However, this way of i
Hi,
I have been unable to find any examples on how to use the MBeans
provided by HDFS.
Could anyone who has experience on the topic share some info?
What is the URL to use to connect to the MBeanServer?
Is it done through RMI, or only through the JVM?
Any help is highly appreciated.
Plea
I came across the same issue, also with Hadoop 0.17.1.
It would be interesting if someone could say what the cause of the issue is.
Alex
2008/8/8 Steve Loughran <[EMAIL PROTECTED]>
> Piotr Kozikowski wrote:
>
>> Hi there:
>>
>> We would like to know what are the most likely causes of this sort of
>> error:
>>
After looking into streaming source, the answer is via environment
variables. For example, mapred.task.timeout is in
the mapred_task_timeout environment variable.
On Fri, Aug 8, 2008 at 4:26 PM, Rong-en Fan <[EMAIL PROTECTED]> wrote:
> I'm using streaming with a mapper written in perl. However, an
Piotr Kozikowski wrote:
Hi there:
We would like to know what are the most likely causes of this sort of
error:
Exception closing
file
/data1/hdfs/tmp/person_url_pipe_59984_3405334/_temporary/_task_200807311534_0055_m_22_0/part-00022
java.io.IOException: Could not get block locations. Abort
I'm using streaming with a mapper written in Perl. However, an
issue is that I want to pass some arguments via the command line.
In a regular Java mapper, I can access the JobConf in the Mapper.
Is there a way to do this?
Thanks,
Rong-En Fan
Hi!
I've gotten Hadoop to run a search as I want, but now I'm trying to
add a servlet component to it.
All of Hadoop works properly, but when I set components from the
servlet instead of setting them via the command-line, Hadoop only
produces temporary output files and doesn't complete.
I've look
Hi Pete,
From within the 0.19 source I did:
ant jar
ant metrics.jar
ant test-core
This resulted in 3 jar files within $HADOOP_HOME/build:
[EMAIL PROTECTED] hadoop-0.19]# ls -l build/*.jar
-rw-r--r-- 1 root root 2201651 Aug 8 08:26 build/hadoop-0.19.0-dev-core.jar
-rw-r--r-- 1 root root 10966