> org.apache.hadoop.fs.s3native.NativeS3FileSystem
>
> org.apache.hadoop.fs.ftp.FTPFileSystem
>
> org.apache.hadoop.fs.HarFileSystem
>
>
> org.apache.hadoop.hdfs.DistributedFileSystem
>
> org.apache.hadoop.hdfs.HftpFileSystem
>
> org.apache.hadoop.hdfs.HsftpFileSystem
>
> org.apache.hadoop.hdfs.web.WebHdfsFileSystem
>
>
/CDH configuration deployment issue and not
> something specific to HBase. In the future please consider sending these
> kinds of vendor-specific questions to the community support mechanisms of
> said vendor. In Cloudera's case, that's http://community.cloudera.com/
>
> -S
s.jar
org.gbif.metrics.cube.occurrence.backfill.Backfill
Thanks,
Tim
On Wed, Nov 5, 2014 at 4:30 PM, Sean Busbey wrote:
> How are you submitting the job?
>
> How are your cluster configuration files deployed (i.e. are you using CM)?
>
> On Wed, Nov 5, 2014 at 8:50 AM, Tim Robertson
> wrote:
>
> >
Hi all,
I'm seeing the following
java.io.IOException: No FileSystem for scheme: file
at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2584)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2591)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java
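For reference, this error usually means no FileSystem implementation is registered for the scheme, a common symptom when a fat/shaded jar overwrites the META-INF/services registration files. Assuming that cause, one hedged workaround is to pin the implementation classes explicitly in the job configuration (e.g. core-site.xml or the submitted Configuration), since Hadoop consults `fs.<scheme>.impl` before service discovery:

```xml
<!-- Workaround sketch: map schemes to implementations explicitly so the
     job does not depend on META-INF/services auto-discovery. -->
<property>
  <name>fs.file.impl</name>
  <value>org.apache.hadoop.fs.LocalFileSystem</value>
</property>
<property>
  <name>fs.hdfs.impl</name>
  <value>org.apache.hadoop.hdfs.DistributedFileSystem</value>
</property>
```

If the jar is built with the Maven shade plugin, merging the service files at build time is the cleaner fix.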
> There isn't an hbase-mapreduce module yet. For now, you need to include
> the hbase-server module.
>
> Cheers
>
> On Nov 5, 2014, at 1:27 AM, Tim Robertson
> wrote:
>
> > Hey folks,
> >
> > I'm upgrading an application from CDH4.3 to CDH5.2 so jumpin
Hey folks,
I'm upgrading an application from CDH4.3 to CDH5.2 so jumping from 0.94 to
0.98 and wanted to just ask for confirmation on the dependencies now hbase
has split into hbase-client and hbase-server etc.
If I am submitting MR jobs (to Yarn) that use things like
TableMapReduceUtil it seems
Like Amandeep says, it really depends on the access patterns and jobs
running on the cluster.
We are using a single cluster for HBase and MR, with each node running DN,
TT and RS.
We have tried mixed clusters with only some running RS but you start to
suffer from data locality issues during scans.
Cheers,
Tim
On Mon, May 28, 2012 at 3:54 PM, Tim Robertson wrote:
> Thanks Stack. We're looking into this a lot.
>
> As far as we can tell DNS is correct, machine host names are correct etc.
> In .META. it uses fully qualified names (c4n5.gbif.org) so I guess I'll
regionLocation = regionLocation.replaceAll("130.226.238.185", "c4n5.gbif.org");
regionLocation = regionLocation.replaceAll("130.226.238.186", "c4n6.gbif.org");
More when we know more.
Tim
On Mon, May 28, 2012 at 12:32 AM, Stack wrote:
> On Sun, May 27, 2012 at 1:05 PM, Tim Robertson
> wrote:
Hi all,
When I run MR jobs, I don't see data locality because the TT sees
/default-rack/c4n1.gbif.org but the TableInputFormat is
giving /default-rack/130.226.238.181 (the same machine) when it determines
the splits for the job.
Clearly we have set something up wrong - has anyone seen this? I've
Hey Something,
We can share everything, and even our ganglia is public [1] . We are just
setting up a new cluster with Puppet and the HBase master just came up.
HBase RS will be up probably tomorrow, where the first task will be a bulk
load of 400M records - we're just finishing our working day
Hi,
You can call context.write() multiple times in the Reduce(), to emit
more than one row.
If you are creating the Puts in the Map function then you need to
setMapSpeculativeExecution(false) on the job conf, or else Hadoop
*might* spawn more than 1 attempt for a given task, meaning you'll get
duplicates.
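The speculative-execution advice above can also be applied purely through job configuration rather than code; a sketch assuming the classic (pre-YARN) property name used in Hadoop of that era:

```xml
<!-- Disable map-side speculative execution so a map writing Puts
     is never run twice for the same input split. -->
<property>
  <name>mapred.map.tasks.speculative.execution</name>
  <value>false</value>
</property>
```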
Hi,
Assuming you use TableOutputFormat [1] you can emit as many PUTs as
you want from a reducer. You will need to handle the row key as you
create the PUT to emit.
HTH,
Tim
[1]
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/mapreduce/TableOutputFormat.html
On Tue, Feb 28, 2012 at 3:3
Hi all,
(cross posted to a few Hadoop mailing lists - apologies for the SPAM)
Are there any users around the Copenhagen area that would like a HUG meetup?
Just reply with +1 and I'll gauge interest. We could probably host a
1/2 or full day if people were coming from Sweden...
We are using Hadoo
> Is HIVE involved? Or is it just raw scan compared to TFIF?
No Hive
> Is this a MR scan or just a shell serial scan (or is it still PE?)?
We are using PE scan to try and "standardize" as much as possible.
> You want to get this scan speed up only? You are not interested in figuring
> how
>
Hey Stack,
We see the difference between a scan and TextFileInputFormat of the
same data as csv being 10x slower. This is what prompted me to look
at MR using an HFIF just out of curiosity.
Cheers,
Tim
On Thu, Feb 9, 2012 at 7:32 PM, Stack wrote:
> On Thu, Feb 9, 2012 at 12:55 AM,
e HBase at all?
>
> (I'm not trying to shoo you away from HBase. Just curious what you are
> trying to accomplish)
>
> Amandeep
>
> On Feb 9, 2012, at 12:19 AM, Tim Robertson wrote:
>
>> Hi all,
>>
>> Can anyone elaborate on the pitfalls or implication
Hi all,
Can anyone elaborate on the pitfalls or implications of running
MapReduce using an HFileInputFormat extending FileInputFormat?
I'm sure scanning goes through the RS for good reasons (guessing
handling splits, locking, RS monitoring etc) but can it ever be "safe"
to run MR over HFiles directly?
Hey Stack,
Because we run a couple clusters now, we're using templating for the
*.site.xml etc.
You'll find them in:
http://code.google.com/p/gbif-common-resources/source/browse/cluster-puppet/modules/hadoop/templates/
The values for the HBase 3 node cluster come from:
http://code.google.c
ing and fine tuning a cluster is something you
> have to do on your own. I guess I could say your numbers look fine to me for
> that config... But honestly, it would be a swag.
>
>
> Sent from a remote device. Please excuse any typos...
>
> Mike Segel
>
> On Feb 1, 2012,
ing did you do?
> Why such a small cluster?
>
> Sorry, but when you start off with a bad hardware configuration, you can get
> Hadoop/HBase to work, but performance will always be sub-optimal.
>
>
>
> Sent from my iPhone
>
> On Feb 1, 2012, at 6:52 AM, "Tim Rober
Hi all,
We have a 3 node cluster (CD3u2) with the following hardware:
RegionServers (+DN + TT)
CPU: 2x Intel(R) Xeon(R) CPU E5630 @ 2.53GHz (quad)
Disks: 6x250G SATA 5.4K
Memory: 24GB
Master (+ZK, JT, NN)
CPU: Intel(R) Xeon(R) CPU X3363 @ 2.83GHz, 2x6MB (quad)
Disks: 2x500G SATA 7.2K
Hi Laxman,
We use both #1 and #3 from MySQL, which also has high-speed exports.
For our 300G and 340M rows, #1 takes us around 3 hours, with Sqoop it
is closer to 8 hrs to our 3 node cluster.
We are having issues with delimiters though (since we have \r, \t and
\n in the database), and now using Avro.
Hey Peter,
I am trying to benchmark our 3 node cluster now and trying to optimize
for scanning.
Using the PerformanceEvaluation tool I did a random write to populate
5M rows (I believe they are 1k each but whatever the tool does by
default).
I am seeing 33k records per second (which I believe to
Hey stack
>> This gave me 32 regions across 2 of our 3 region servers (we have HDFS
>> across 17 nodes but only 3 machines running RS).
>>
>
> The balancer ran? I'd think it'd balance the regions across the three
> servers. Something stuck in transition stopping the balancer running
> (See maste
Hi all,
I am trying to sanitize our setup, and using the PerformanceEvaluation
as a basis to check.
To do this, I ran the following to load it up:
$HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation
randomWrite 5
This gave me 32 regions across 2 of our 3 region servers (we have
Hi Stuti,
I would have thought it was something like:
conf.setOutputFormat(TextOutputFormat.class);
FileOutputFormat.setOutputPath(conf, new Path());
Cheers,
Tim
On Thu, Nov 10, 2011 at 8:31 AM, Stuti Awasthi wrote:
> Hi
> Currently I am understanding HBase MapReduce support. I followed
http://www.cloudera.com/blog/2010/03/clouderas-support-team-shares-some-basic-hardware-recommendations/
"4 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration
2 quad core CPUs, running at least 2-2.5GHz
16-24GBs of RAM (24-32GBs if you’re considering HBase)
Gigabit Ethernet"
HTH,
Tim
regions over and shut down the RS properly.
>
> Lars
>
> On Nov 25, 2010, at 18:24, Tim Robertson wrote:
>
>> Hi all,
>>
>> Please forgive this rather naive question - I have a cluster and want
>> to decommission nodes (including the RS that hold the -ROOT-
Hi all,
Please forgive this rather naive question - I have a cluster and want
to decommission nodes (including the RS that hold the -ROOT- and
.META). Could someone please advise me the best way to do this
gracefully? Can I force HBase to move regions onto the region servers
I will keep up? The
>> I'm using
>> http://hbase.apache.org/docs/r0.20.6/api/org/apache/hadoop/hbase/mapreduce/IdentityTableReducer.html
>
> Did you set it up with TableMapReduceUtil?
>
>> Not explicitly set be me
>
> If you use TableMapReduceUtil, then it's set to 2MB by default, but
> looking at the RS logs the wri
op.hbase.regionserver.HRegion:
Blocking updates for 'IPC Server handler 8 on 60020' on region
I guess this is bad, but could benefit from some guidance...
> Are you monitoring the GCs?
> If so, do you see some pauses longer than a second?
What's the best way to do this p
Hi all,
I am running an MR job that is loading an HBase table in the reduce,
and I am seeing hopeless performance - 10 million records of <1Kb in 2
hours so far.
Please bear in mind I am software guy, so go easy ;) but here is what
I know so far:
(http://code.google.com/p/gbif-occurrencestore/wi
>So updating is okay but Handling deletes is not possible in the current version
> of the data unless a new version of the data is written down.
Not quite. You can delete a record and it will not show up in scans
and gets etc, but physically it will still take up space on the disk
until HBase cleans it up during a major compaction.
We just set up a cluster with Dells, and have a pretty fine
relationship with a local Dell supplier.
Tim
On Wed, Nov 3, 2010 at 2:21 PM, Jason Lotz wrote:
> We are in the process of analyzing our options for the future purchases of
> our Hadoop/HBase DN/RS servers. Currently, we purchase Dell
>
> Sean
>
> On Thu, Oct 28, 2010 at 2:52 AM, Tim Robertson
> wrote:
>
>> Hi all,
>>
>> We are setting up a small Hadoop 13 node cluster running 1 HDFS
>> master, 9 region servers for HBase and 3 map reduce nodes, and are just
>> installing zookeeper to
Thanks again. One of the things we struggle with currently on the
RDBMS, is the organisation of 250million records to complex
taxonomies, and also point-in-polygon intersections. Having such
memory available to the MR jobs allows us to consider loading taxonomies
/ polygons / RTree indexes into memory.
of them (this means you
> will only have 9 RS). You don't really need an ensemble, unless you're
> planning to share that ZK setup with other apps.
>
> In any case, you should test all setups.
>
> J-D
>
> On Thu, Oct 14, 2010 at 4:51 AM, Tim Robertson
> wrot
Hi all,
We are about to setup a new installation using the following machines,
and CDH3 beta 3:
- 10 nodes of single quad core, 8GB memory, 2x500GB SATA
- 3 nodes of dual quad core, 24GB memory, 6x250GB SATA
We are finding our feet, and will blog tests, metrics etc as we go but
our initial usage
Just in case this was misinterpreted - my proposal was not to use
mapreduce at all, but to take the Set of keys submitted and
simply iterate over them, calling the getByKey(key) to populate an H2
DB (or simply an in memory structure) to do the final analytics. My
understanding is that HBase is design
Or build a temporary H2 database and then issue SQL for the final
group by type counts? H2 is hugely fast (20,000 record inserts per
second) when you run it in the same JVM.
Tim
On Sun, Sep 19, 2010 at 7:30 PM, Tim Robertson
wrote:
> If you are only doing 1k-100k record set analytics, wo
If you are only doing 1k-100k record set analytics, would it be
feasible to use the HBase client directly, perform a filtered scan and
do the analytics in memory using Java Collections? It depends on the
number of dimensions you need but 100k rows of a few Integers is not
absurd to hold in memory,
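A minimal sketch of that in-memory approach using plain Java collections, with no HBase dependency (the Occurrence type, its fields, and the sample values are invented for illustration; in practice the list would be populated from a filtered scan):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class InMemoryAnalytics {
    // Invented record type standing in for one scanned HBase row.
    static class Occurrence {
        final String country;
        final int year;
        Occurrence(String country, int year) { this.country = country; this.year = year; }
    }

    // Group-by-country counts over an in-memory result set: the kind of
    // small-set analytics the scan-then-aggregate approach describes.
    static Map<String, Long> countByCountry(List<Occurrence> rows) {
        return rows.stream().collect(
            Collectors.groupingBy(o -> o.country, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Occurrence> rows = Arrays.asList(
            new Occurrence("DK", 2010),
            new Occurrence("DK", 2011),
            new Occurrence("SE", 2010));
        System.out.println(countByCountry(rows)); // prints {DK=2, SE=1}
    }
}
```

For 100k rows of a few integers this stays comfortably within a default heap, which is the point being made above.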
Hi all,
Disclosure: I have been an active member of the Hadoop / HBase / Hive
mailing lists for some time. I am not a recruiter, but looking to
increase a development team that I lead. I sincerely apologize if
this message is against mailing list etiquette; I have not seen any
guidelines forbidding it.
Hi,
To my knowledge, there is nothing built in so you would have to build
and maintain the spatial index yourself.
If you are only doing a distance query, you might consider keeping a
column containing something like a geohash
(http://en.wikipedia.org/wiki/Geohash) and then build a secondary
index.
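The geohash idea can be sketched with the textbook encoding (this is the standard algorithm from the Wikipedia article linked above, not any particular production key scheme; the class name is hypothetical). Because nearby points share key prefixes, a prefix scan over a geohash-keyed column approximates a bounding-box query, which can then be refined by exact distance:

```java
public class Geohash {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Standard geohash: repeatedly bisect the lon/lat ranges, emitting one
    // bit per bisection (longitude first), and pack each 5 bits into a
    // base-32 character.
    static String encode(double lat, double lon, int precision) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder();
        boolean evenBit = true; // bits alternate, starting with longitude
        int bit = 0, ch = 0;
        while (hash.length() < precision) {
            if (evenBit) {
                double mid = (minLon + maxLon) / 2;
                ch <<= 1;
                if (lon >= mid) { ch |= 1; minLon = mid; } else { maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                ch <<= 1;
                if (lat >= mid) { ch |= 1; minLat = mid; } else { maxLat = mid; }
            }
            evenBit = !evenBit;
            if (++bit == 5) {
                hash.append(BASE32.charAt(ch));
                bit = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        // Canonical example from the Wikipedia article.
        System.out.println(encode(57.64911, 10.40744, 11)); // u4pruydqqvj
    }
}
```

A shorter precision is always a prefix of a longer one for the same point, which is what makes the column usable as a secondary index via prefix scans.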
> - Do you plan to serve data out of HBase or will you just use it for
> MapReduce? Or will it be a mix (not recommended)?
I am also curious what would be the recommended deployment when you
have this need (e.g. building multiple Lucene indexes which hold only
the Row ID, so building is MR intens
Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
>
>
>
> - Original Message
>> From: Tim Robertson
>> To: hbase-u...@hadoop.apache.org
>> Sent: Sat, March 27, 2010 2:46:00 PM
>> Subject: elastic search or other Lucene for HBase?
>>
Hi Alex,
Is there a publicly visible roadmap somewhere for CDH3 please?
http://archive.cloudera.com/docs/cdh3-top.html doesn't yet mention
HBase but I gather it is actually in there.
I am curious what Hive and Sqoop integration you might have setup out
of the box. I could imagine a CDH3 installat