Hi,
We are using hbase-0.20.6 with HDFS (single-node setup). While pushing
data into HBase using the Java APIs, lots of TCP CLOSE_WAIT
connections crop up. These connections persist for a long time, even for a day
or two. The Linux setting for TCP connections is 72 sec., which is overri
Yes, I use this in a batch job driver. There is a common file with
global configs, and then a per-job config. The driver command line is:
driver -c common-site.xml batchjob.xml
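A minimal sketch of such a driver, assuming the arguments arrive exactly as on that command line; the class name and the crude option handling are illustrative, not taken from the thread:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

// Hypothetical batch job driver: layer a per-job config on top of a common one.
// Resources added later override values set by resources added earlier.
public class BatchJobDriver {
    public static void main(String[] args) throws Exception {
        if (args.length < 3 || !"-c".equals(args[0])) {
            System.err.println("usage: driver -c common-site.xml batchjob.xml");
            System.exit(1);
        }
        Configuration conf = new Configuration();
        conf.addResource(new Path(args[1]));   // common-site.xml: shared settings
        conf.addResource(new Path(args[2]));   // batchjob.xml: per-job overrides
        // ... build and submit the Job with this Configuration
    }
}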
On Tue, Oct 26, 2010 at 11:40 AM, Marc Sturlese wrote:
>
> Thanks, it worked. In case it can help someone else:
>
>
I prefer the latter (MultipleOutputFormat), as I would not have had to
change my code.
Everything would have stayed in the OutputFormat, and I hardly need
the extra features.
Oh well, got to keep with the times.
Cheers
Saptarshi
On Tue, Oct 26, 2010 at 2:44 AM, Rekha Joshi wrote:
> Hi Saptarshi,
It's worth checking out the "har" tool as well.
I would say that HBase is a good fit for binaries so long as the binaries
aren't huge. Anything under a few MB should be fine.
-Todd
On Tue, Oct 26, 2010 at 10:56 AM, Ananth Sarathy wrote:
> Thanks, but that's more of a one-time use, not ongoing man
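As a point of reference, a minimal sketch of storing a small binary as a cell value, assuming an HBase 0.20-era client, a pre-created 'files' table with a 'f' column family, and an illustrative row key; none of these names come from the thread:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Sketch: keep small binaries (well under a few MB) as cell values in HBase.
public class StoreSmallBinary {
    public static void main(String[] args) throws Exception {
        byte[] payload = new byte[] { 1, 2, 3 };             // the small binary
        HTable table = new HTable(new HBaseConfiguration(), "files");
        Put put = new Put(Bytes.toBytes("doc-0001"));        // one row per object
        put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), payload);
        table.put(put);
        table.flushCommits();
    }
}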
[Apologies for cross-posting]
Hi all,
I am rewriting Hadoop Java code for the new (0.20.2) API; the code
was originally written for versions <= 0.19.
1. What is the equivalent of the getCounter() method? For example,
the old code is as follows:
//import org.apache.hadoop.mapred.RunningJob;
R
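For comparison, a minimal sketch of the counter idiom in the new org.apache.hadoop.mapreduce API, using a hypothetical enum counter; this is an illustrative example, not the poster's code:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch: counters with the 0.20.2 "new" API.
public class CounterSketchMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    // Hypothetical counter, for illustration only.
    enum MyCounters { BAD_RECORDS }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        if (value.getLength() == 0) {
            // Replaces the old reporter.incrCounter(...) call.
            context.getCounter(MyCounters.BAD_RECORDS).increment(1);
        }
    }
}
// Driver side, after job.waitForCompletion(true), the old
// RunningJob.getCounters() pattern becomes:
//   long bad = job.getCounters()
//                 .findCounter(CounterSketchMapper.MyCounters.BAD_RECORDS)
//                 .getValue();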
Maybe this message can solve your problem as well:
@Shi Yu:
Yes, there are built-in functions to get the input file Path in the Mapper
(you can use these for counters by putting the file name in the counter
name); however, there are some issues if you use MultipleInputs in your job.
Here's some sam
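Since the sample itself is cut off here, a minimal sketch of that idea for the plain FileInputFormat case; the mapper types and counter group name are illustrative:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Sketch: put the input file name into the counter name, as described above.
public class FileNameCounterMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    private String fileName;

    @Override
    protected void setup(Context context) {
        // Caveat from the thread: with MultipleInputs the split is wrapped,
        // so this cast does not work as-is.
        FileSplit split = (FileSplit) context.getInputSplit();
        fileName = split.getPath().getName();
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        context.getCounter("Records per input file", fileName).increment(1);
    }
}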
Hi,
I am running a Hadoop job which manipulates ~4000 files (the files are gzipped), and
suppose one of these gz files is corrupted. From the web console / log files I can see
which task got the exception, but isolating which file is corrupted is
really hard. Is there a way to know which files were produced by which hado
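One hedged way to narrow this down (a sketch, not a confirmed answer from the thread): log the split's path when each map task starts, so the failed attempt's log in the web console names the offending gz. The mapper types here are illustrative:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Sketch: record which file each map task reads; gz files are not split,
// so one failed task maps to exactly one input file.
public class TraceInputFileMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

    @Override
    protected void setup(Context context) {
        System.err.println("Processing split: "
                + ((FileSplit) context.getInputSplit()).getPath());
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // normal map logic goes here
    }
}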
Thanks, it worked. In case it can help someone else:
try {
    Configuration c = new Configuration();
    FileSystem fs = FileSystem.get(c);
    InputStream is = new FSDataInputStream(fs.open(new
        Path("hdfs://hadoop_cluster/user/me/conf/extra-props.xml")));
    c.addResource(is);
} catch (IOException e) {
    e.printStackTrace();
}
On Oct 26, 2010, at 11:25 AM, Hazem Mahmoud wrote:
> That raises a question that I am currently looking into and would appreciate
> any and all advice people have.
>
> We are replacing our current NetApp solution, which has served us well but we
> have outgrown it.
>
> I am looking at either
That raises a question that I am currently looking into and would appreciate
any and all advice people have.
We are replacing our current NetApp solution, which has served us well but we
have outgrown it.
I am looking at either upgrading to a bigger and meaner NetApp or possibly
going with Had
Thanks, but that's more of a one-time use, not ongoing management.
Ananth T Sarathy
On Tue, Oct 26, 2010 at 12:31 PM, Mark Kerzner wrote:
> http://stuartsierra.com/2008/04/24/a-million-little-files
>
> On Tue, Oct 26, 2010 at 11:28 AM, Ananth Sarathy <
> ananth.t.sara...@gmail.com
> > wrote:
>
>
Yeah, I had looked into HBase, but they are pretty adamant about not using it
for binaries. We use HBase for other stuff, so that would have been our
preference. I know that BigTable serves some image tiles for Google Maps,
but image tiles are a lot smaller in general.
Ananth T Sarathy
On Tue, O
Marc,
addResource takes an InputStream, which you could get from a
FileSystem instance; however, you'd have something of a
chicken-and-egg situation in that you'd need a Configuration to get a
FileSystem (via FileSystem.get()), but then you could always just add
it on and hit 'reloadConfigurat
HBase might fit the bill.
On Tue, Oct 26, 2010 at 12:28 PM, Ananth Sarathy wrote:
> I was wondering if there were any projects out there doing a small file
> management layer on top of Hadoop? I know that HDFS is primarily for
> map/reduce but I think companies are going to start using hdfs clus
Is it possible to add a custom-site.xml resource (which is placed in HDFS) to
a Configuration?
Something like:
Configuration c = new Configuration();
Path p = new Path("hdfs://hadoop_cluster/user/me/conf/extra-props.xml");
c.addResource(p);
It doesn't seem to work for me. If I convert 'c' to St
On Oct 26, 2010, at 9:28 AM, Ananth Sarathy wrote:
> I was wondering if there were any projects out there doing a small file
> management layer on top of Hadoop? I know that HDFS is primarily for
> map/reduce but I think companies are going to start using hdfs clusters as
> storage in the cloud,
http://stuartsierra.com/2008/04/24/a-million-little-files
On Tue, Oct 26, 2010 at 11:28 AM, Ananth Sarathy wrote:
> I was wondering if there were any projects out there doing a small file
> management layer on top of Hadoop? I know that HDFS is primarily for
> map/reduce but I think companies ar
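The linked post's approach amounts to packing the small files into one SequenceFile keyed by file name. A minimal sketch of that idea, assuming local input files and an output path given on the command line (both illustrative):

import java.io.File;
import java.io.FileInputStream;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: pack many small local files into a single SequenceFile on HDFS,
// keyed by file name, so the namenode tracks one large file instead of many.
public class PackSmallFiles {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[0]);                  // e.g. /user/me/packed.seq
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (int i = 1; i < args.length; i++) {    // remaining args: local files
                File f = new File(args[i]);
                byte[] bytes = new byte[(int) f.length()];
                FileInputStream in = new FileInputStream(f);
                try {
                    IOUtils.readFully(in, bytes, 0, bytes.length);
                } finally {
                    in.close();
                }
                writer.append(new Text(f.getName()), new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}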
I was wondering if there were any projects out there doing a small file
management layer on top of Hadoop? I know that HDFS is primarily for
map/reduce but I think companies are going to start using hdfs clusters as
storage in the cloud, and I was wondering if any work had been done on this.
Ananth
This is not CDH3-specific... it's related to the Kerberos security patch, so
these upgrade issues will pop up in the Y! distribution, and eventually in
0.22 as well.
These aren't bugs in the code per se; it's just that the upgrade process
going from pre- to post-security is somewhat tricky, and c
Hi,
While running Terrier on Hadoop, I am getting the following error again and
again; can someone please point out where the problem is?
attempt_201010252225_0001_m_09_2: WARN - Error running child
attempt_201010252225_0001_m_09_2: java.lang.OutOfMemoryError: GC
overhead limit exceeded
att
With a 755 permission the jobtracker could not operate on the directory, and
with
775 permission the datanode's log said "Expecting 755, found 775. Exiting".
I will do a more careful attempt today.
Raj
From: Michael Segel
To: common-user@hadoop.apache.or
Calling close() on the MultipleOutputs objects in the cleanup() method of
the reducer fixed the lzo file problem. Thanks!
~Ed
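For anyone hitting the same symptom, a minimal sketch of the shape of that fix, assuming the new-API (org.apache.hadoop.mapreduce.lib.output) MultipleOutputs and an illustrative named output called "text" registered in the driver via MultipleOutputs.addNamedOutput():

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;

// Sketch: MultipleOutputs must be closed in cleanup(), otherwise buffered
// (e.g. compressed) output streams may never be flushed.
public class ClosingReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {

    private MultipleOutputs<Text, LongWritable> mos;

    @Override
    protected void setup(Context context) {
        mos = new MultipleOutputs<Text, LongWritable>(context);
    }

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        mos.write("text", key, new LongWritable(sum));   // write via named output
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        mos.close();   // the fix: flush and close all named outputs
    }
}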
On Thu, Oct 21, 2010 at 9:12 PM, ed wrote:
> Hi Todd,
>
> I don't have the code in front of me right now, but I was looking over the API
> docs and it looks like I forgot to
In general, unless you run newer kernels and versions of FUSE as that ticket
suggests, it is significantly slower in raw throughput.
However, we generally don't have a day go by at my site where we don't push
FUSE over 30Gbps, as the bandwidth is spread throughout nodes. Additionally,
as we ar
Yes, this is my scenario:
I have one tasktracker. I configured 10 dirs (volumes) in
mapred.local.dir; if one of the volumes goes bad, the tasktracker does not execute further tasks.
I remember that in the datanode a similar scenario is handled: when one of the
volumes fails, it will mark that volume as
On 26/10/10 04:10, Gokulakannan M wrote:
Hi,
I faced a problem: when a volume configured in *mapred.local.dir* fails,
the tasktracker continuously tries to create the directory
and fails.
Eventually all the running jobs fail and new jobs cannot
be executed.
I think you can provid
Hi Saptarshi,
AFAIK, this is an intermediate stage where the old API is supported while
the new API evolves.
In 0.21, the old API MultipleOutputFormat is not deprecated:
http://hadoop.apache.org/mapreduce/docs/r0.21.0/api/index.html. In future it
might be.
From a usage perspective, what Mult
Ugh, wrong mailing list. Silly GMail.
On Mon, Oct 25, 2010 at 11:45 PM, Bradford Stephens
wrote:
> Hey datamigos,
>
> I'm having trouble getting a finicky .20.6 cluster to behave.
>
> The Master, Zookeeper, and RegionServers all seem to be happy --
> except the Master doesn't see any RSs. Doing a