Splitlog Replication

2014-01-07 Thread Patrick Schless
I need to run through some server maintenance on my data nodes, including a reboot. My splitlogs, though, only seem to have a replication factor of 1 (when a data node is taken offline, I sometimes have missing blocks for them). I know I can decommission data nodes with the exclude.dfs file, but t

Re: 3-Hour Periodic Network/CPU/Disk/Latency Spikes

2013-12-16 Thread Patrick Schless
piece of software yourself. > > Best regards, > Vladimir Rodionov > Principal Platform Engineer > Carrier IQ, www.carrieriq.com > e-mail: vrodio...@carrieriq.com > > ____ > From: Patrick Schless [patrick.schl...@gmail.com] > Sent:

Re: 3-Hour Periodic Network/CPU/Disk/Latency Spikes

2013-12-13 Thread Patrick Schless
q.com > > > From: Ted Yu [yuzhih...@gmail.com] > Sent: Friday, December 13, 2013 3:33 PM > To: user@hbase.apache.org > Cc: user > Subject: Re: 3-Hour Periodic Network/CPU/Disk/Latency Spikes > > Patrick: > Attachment didn't go th

Re: 3-Hour Periodic Network/CPU/Disk/Latency Spikes

2013-12-13 Thread Patrick Schless
rs enough store files build up to require compactions. > > There's nothing else automated in HDFS or HBase that I could see causing > this. > > On Fri, Dec 13, 2013 at 3:07 PM, Patrick Schless > wrote: > > > CDH4.1.2 > > HBase 0.92.1 > > HDFS 2.0.0 &

3-Hour Periodic Network/CPU/Disk/Latency Spikes

2013-12-13 Thread Patrick Schless
CDH4.1.2 HBase 0.92.1 HDFS 2.0.0 Every 3 hours, our production HBase cluster does something that causes all the data nodes to have a sustained spike in CPU/network/disk. The spike lasts about 30 mins, and during this time the cluster has greatly increased latencies for our typical application usa

Re: Scanner Caching with wildly varying row widths

2013-11-04 Thread Patrick Schless
to make use of the setBatch > feature > > Regards, > Dhaval > > > ____________ > From: Patrick Schless > To: user > Sent: Monday, 4 November 2013 6:03 PM > Subject: Scanner Caching with wildly varying row widths > > > We have an ap

Scanner Caching with wildly varying row widths

2013-11-04 Thread Patrick Schless
We have an application where a row can contain anywhere between 1 and 360 cells (there's only 1 column family). In practice, most rows have under 100 cells. Now we want to run some mapreduce jobs that touch every cell within a range (e.g. count how many cells we have). With scanner caching set
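The sizing concern above can be sketched with a back-of-envelope bound. This is a hypothetical calculation, assuming the 1-to-360-cell rows from this thread and the semantics of HBase's Scan.setCaching (rows fetched per RPC) and Scan.setBatch (cap on cells per Result); the batch value of 10 is illustrative, not from the thread:

```shell
# Rough worst-case client-buffer sizing for a scan over wide rows.
CACHING=100            # rows fetched per RPC (Scan.setCaching)
MAX_CELLS_PER_ROW=360  # widest row in the table, per the thread
BATCH=10               # cells per Result if Scan.setBatch is used (assumed value)
echo "without batching: up to $((CACHING * MAX_CELLS_PER_ROW)) cells buffered per RPC"
echo "with setBatch($BATCH): up to $((CACHING * BATCH)) cells buffered per RPC"
```

The point of the arithmetic: with wildly varying row widths, caching alone bounds rows, not cells, so the rare 360-cell rows set the worst case unless batching caps cells per Result.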

Re: Consultant Wanted

2013-11-04 Thread Patrick Schless
I should mention that I shot an email to Cloudera today, and expect to hear from them soon. It might be that they're the standard go-to for this sort of thing, but I'm wondering if there are other options I should be considering. On Mon, Nov 4, 2013 at 4:36 PM, Patrick Schless wro

Consultant Wanted

2013-11-04 Thread Patrick Schless
Our team's strengths lie more around application development and devops than JVM and Hadoop tuning. We currently run two CDH4 clusters (without the Cloudera Manager), and are interested in establishing a relationship with a consultancy that can help us tune and maintain things (config, server spec

ageOfLastAppliedOp, ageOfLastShippedOp

2013-08-23 Thread Patrick Schless
We run two hbase clusters, and one (master) replicates to the other (standby). We did some maintenance last night which involved bringing all of hbase down while we made changes to HDFS. After bringing things back up, our ageOfLastShippedOp on a few of the master region servers jumped to around -9

Re: How to recover data from hadoop/hbase cluster

2013-08-09 Thread Patrick Schless
Are you missing data? Replication/recovery of blocks is automatic, and there isn't a manual process for it. FWIW, for something like changing the hard drive configs on the box, it would be a better idea to unbalance the node ahead of time and then rebalance it. On Fri, Aug 9, 2013 at 7:18 AM, oc

Re: HDFS Restart with Replication

2013-08-08 Thread Patrick Schless
Doing a stop on the master and the region > > servers will screw things up. > > > > J-D > > > > On Fri, Aug 2, 2013 at 3:28 PM, Patrick Schless > > wrote: > > > Doesn't stop-hbase.sh (and its ilk) require the server to be able to > > manage > >

Fix Borked Replication

2013-08-08 Thread Patrick Schless
I run HBase replication, and while improperly restarting my standby cluster I lost a few splitlog blocks in my replicated table (on the standby cluster). I'm thinking that my standby table is possibly borked now (I can't use the VerifyRep job because I use the increment API). Is it reasonable to f

Re: HDFS Restart with Replication

2013-08-06 Thread Patrick Schless
ng a stop on the master and the region > servers will screw things up. > > J-D > > On Fri, Aug 2, 2013 at 3:28 PM, Patrick Schless > wrote: > > Doesn't stop-hbase.sh (and its ilk) require the server to be able to > manage > > the clients (using unpassworded SSH k

Re: Cleaning up after failed table rename

2013-08-02 Thread Patrick Schless
at org.apache.hadoop.hbase.util.HBaseFsck.main(HBaseFsck.java:3205) On Fri, Aug 2, 2013 at 11:31 AM, Ted Yu wrote: > Can you try running: > > hbck -repair > > On Fri, Aug 2, 2013 at 9:28 AM, Patrick Schless > wrote: > > > I was testing an hbase table rename script I found in a

Re: HDFS Restart with Replication

2013-08-02 Thread Patrick Schless
iel Cryans wrote: > Doing a bin/stop-hbase.sh is the way to go, then on the Hadoop side > you do stop-all.sh. I think your ordering is correct but I'm not sure > you are using the right commands. > > J-D > > On Fri, Aug 2, 2013 at 8:27 AM, Patrick Schless > wrote: > >

Cleaning up after failed table rename

2013-08-02 Thread Patrick Schless
I was testing an hbase table rename script I found in a JIRA, and it didn't work for me. Not a huge deal (I went with a different solution), but it left some data I want to clean up. I was trying to rename a table from "t1" to "t1.renamed". Now in HBase, 'list' shows 't1.renamed'. In HDFS, I have

Re: HDFS Restart with Replication

2013-08-02 Thread Patrick Schless
the Namenode and > datanode logs? I'd suggest you start by doing a fsck on one of those > files with the option that gives the block locations first. > > By the way why do you have split logs? Are region servers dying every > time you try out something? > > On Thu, Aug 1, 20

Re: HDFS Restart with Replication

2013-08-01 Thread Patrick Schless
nyone else. On Thu, Aug 1, 2013 at 5:04 PM, Jean-Daniel Cryans wrote: > I can't think of a way how your missing blocks would be related to > HBase replication, there's something else going on. Are all the > datanodes checking back in? > > J-D > > On Thu, Aug 1,

Reload configs

2013-08-01 Thread Patrick Schless
Is there a way to reload the HBase configs without restarting the whole system (in other words, without an interruption of service)? I'm on: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Thanks, Patrick
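In the 0.92-era code the question targets, most HBase config changes require a process restart; the usual way to avoid a full outage is a rolling restart. A minimal sketch, assuming HBase's bin/graceful_stop.sh (which unloads a regionserver's regions, restarts it, and reloads the regions); hostnames are hypothetical, and the loop only echoes the commands it would run:

```shell
# Rolling restart sketch: restart regionservers one at a time so the
# cluster stays up while new configs are picked up.
for rs in data01.example.com data02.example.com; do
  echo "bin/graceful_stop.sh --restart --reload --debug $rs"
done
```

The master would still need its own restart for master-side settings, so this is a reduced-impact path rather than a true zero-restart reload.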

HDFS Restart with Replication

2013-08-01 Thread Patrick Schless
I'm running: CDH4.1.2 HBase 0.92.1 Hadoop 2.0.0 Is there an issue with restarting a standby cluster with replication running? I am doing the following on the standby cluster: - stop hmaster - stop name_node - start name_node - start hmaster When the name node comes back up, it's reliably missing
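The restart ordering from the question can be written out step by step, with one extra step that often explains transient "missing blocks" after a namenode restart: the namenode's block map is rebuilt from datanode block reports, so checking that it has left safe mode before starting HBase avoids reading an incomplete map. The daemon-script invocations in the comments are illustrative, not taken from the thread:

```shell
# Standby-cluster restart ordering, annotated.
echo "stop hmaster"             # e.g. bin/hbase-daemon.sh stop master
echo "stop namenode"            # e.g. sbin/hadoop-daemon.sh stop namenode
echo "start namenode"           # e.g. sbin/hadoop-daemon.sh start namenode
echo "wait for safe mode exit"  # e.g. hdfs dfsadmin -safemode wait
echo "start hmaster"            # e.g. bin/hbase-daemon.sh start master
```

If blocks still show as missing after the safe-mode wait, an fsck with block locations (as suggested later in the thread) is the next diagnostic step.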

Re: Replication - some timestamps off by 1 ms

2013-07-11 Thread Patrick Schless
tables in two different clusters. > WARNING: It doesn't work for incrementColumnValues'd cells since the > timestamp is changed after being appended to the log. > > > The problem is that increments' timestamps are different in the WAL > and in the final KV that's sto

Re: Replication - some timestamps off by 1 ms

2013-07-11 Thread Patrick Schless
, Jul 11, 2013 at 12:53 PM, Jean-Daniel Cryans wrote: > Are those incremented cells? > > J-D > > On Thu, Jul 11, 2013 at 10:23 AM, Patrick Schless > wrote: > > I have had replication running for about a week now, and have had a lot > of > > data flowing to o

Replication - some timestamps off by 1 ms

2013-07-11 Thread Patrick Schless
I have had replication running for about a week now, and have had a lot of data flowing to our slave cluster over that time. Now, I'm running the verifyrep MR job over a 1-hour period a couple days ago (which should be fully replicated), and I'm seeing a small number of "BADROWS". Spot-checking a f

Re: hbase.client.scanner.caching - default 1, not 100

2013-07-11 Thread Patrick Schless
n section 2.3.1, you would see that its value is 1. > > Cheers > > On Thu, Jul 11, 2013 at 9:28 AM, Patrick Schless > wrote: > > > In 0.94 I noticed (in the "Job File") my job VerifyRep job was running > with > > hbase.client.scanner.caching set to 1, even

hbase.client.scanner.caching - default 1, not 100

2013-07-11 Thread Patrick Schless
In 0.94 I noticed (in the "Job File") my job VerifyRep job was running with hbase.client.scanner.caching set to 1, even though the hbase docs [1] say it defaults to 100. I didn't have that property being set in any of my configs. I added the properties to hbase-site.xml (set to 100), and now that j
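The fix the poster describes is a config fragment along these lines (the property name is the real hbase.client.scanner.caching; 100 is the value chosen in the thread):

```xml
<!-- hbase-site.xml: make the scanner-caching value explicit, since the
     shipped default in this era of HBase was 1 even though the online
     docs said 100. -->
<property>
  <name>hbase.client.scanner.caching</name>
  <value>100</value>
</property>
```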

Re: VerifyRep - "Replication needs to be enabled to verify it."

2013-07-11 Thread Patrick Schless
11, 2013 at 8:46 AM, Patrick Schless wrote: > Yes [1], I set that in hbase-site.xml when I turned on replication. This > box is solely my job-tracker, so maybe it doesn't pick up the > hbase-site.xml? Trying this job from the HMaster didn't work, because it > doesn't h

Re: VerifyRep - "Replication needs to be enabled to verify it."

2013-07-11 Thread Patrick Schless
configuration value is false. > Below is the relevant code: > if (!conf.getBoolean(HConstants.REPLICATION_ENABLE_KEY, false)) { > throw new IOException("Replication needs to be enabled to verify > it."); > } > > Jieshan > -Original Message- &g

VerifyRep - "Replication needs to be enabled to verify it."

2013-07-10 Thread Patrick Schless
On 0.92.1, I have (recently) enabled replication, and I'm trying to verify that it's working correctly. I am getting an error saying that replication needs to be enabled, but replication *is* enabled, so I assume I'm doing something wrong. Looking at the age of the last shipped op (on the master cl

Re: HBase replication - EOF while reading

2013-07-03 Thread Patrick Schless
es.apache.org/jira/browse/HBASE-7122 > > Thanks, > Himanshu > > > On Tue, Jul 2, 2013 at 3:09 PM, Patrick Schless > wrote: > > > I've just enabled replication (to 1 peer), and I'm seeing a bunch of > > errors, along the lines of [1]. Replication doe

HBase replication - EOF while reading

2013-07-02 Thread Patrick Schless
I've just enabled replication (to 1 peer), and I'm seeing a bunch of errors, along the lines of [1]. Replication does seem to work, though (data is showing up in the standby cluster). The file exists (I can see it in the HDFS web GUI), but it seems be empty. Is this an error I need to worry about

Re: stop_replication dangerous?

2013-07-01 Thread Patrick Schless
sure thing: https://issues.apache.org/jira/browse/HBASE-8844 On Mon, Jul 1, 2013 at 3:59 PM, Jean-Daniel Cryans wrote: > Yeah that package documentation ought to be changed. Mind opening a jira? > > Thx, > > J-D > > On Mon, Jul 1, 2013 at 1:51 PM, Patrick Schless >

stop_replication dangerous?

2013-07-01 Thread Patrick Schless
The first two tutorials for enabling replication that google gives me [1], [2] take very different tones with regard to stop_replication. The HBase docs [1] make it sound fine to start and stop replication as desired. The Cloudera docs [2] say it may cause data loss. Which is true? If data loss is

Re: CopyTable

2013-06-20 Thread Patrick Schless
in this case you've to disable your table first. > > > Matteo > > > > On Wed, Jun 19, 2013 at 6:19 PM, Patrick Schless > wrote: > > > Unfortunately, I'm on 0.92.1, and the snapshot approach you linked isn't > > available until 0.94. Bummer, looked

Re: CopyTable

2013-06-19 Thread Patrick Schless
sn't seem to be a good way to rename a table >> >> Have you looked at http://hbase.apache.org/book.html#table.rename ? >> >> Cheers >> >> On Mon, Jun 17, 2013 at 12:20 PM, Patrick Schless < >> patrick.schl...@gmail.com >> > wrote: >>

Re: Replication - ports/hosts

2013-06-19 Thread Patrick Schless
On Wed, Jun 19, 2013 at 12:41 AM, Stack wrote: > On Mon, Jun 17, 2013 at 12:06 PM, Patrick Schless < > patrick.schl...@gmail.com > > wrote: > > > Working on setting up HBase replication across a VPN tunnel, and > following > > the docs here: [1] (and here: [2]).

Re: CopyTable

2013-06-17 Thread Patrick Schless
hbase.apache.org/book.html#table.rename ? > > Cheers > > On Mon, Jun 17, 2013 at 12:20 PM, Patrick Schless < > patrick.schl...@gmail.com > > wrote: > > > Context: > > I'm working on getting replication set up, and a prerequisite for me is > to > >

CopyTable

2013-06-17 Thread Patrick Schless
Context: I'm working on getting replication set up, and a prerequisite for me is to rename the table (since you have to replicate to the same name as the source). For this, I'm testing a CopyTable strategy, since there doesn't seem to be a good way to rename a table (please correct me if I'm wrong)
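The CopyTable-based rename described above boils down to a single MapReduce invocation. A sketch, only echoing the command rather than running it (CopyTable's --new.name flag is real; the table names follow the message's example):

```shell
# Copy t1 into a new table named t1.renamed on the same cluster.
# The destination table must already exist with the same column families.
CMD="hbase org.apache.hadoop.hbase.mapreduce.CopyTable --new.name=t1.renamed t1"
echo "$CMD"
```

After verifying the copy, the original table can be disabled and dropped to complete the "rename".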

Replication - ports/hosts

2013-06-17 Thread Patrick Schless
Working on setting up HBase replication across a VPN tunnel, and following the docs here: [1] (and here: [2]). Two questions, regarding firewall allowances required: 1) The docs say that the zookeeper clusters must be able to reach each other. I don't see any docs on why this is (the high-level di

Web Admin Pages & SSL

2012-07-30 Thread Patrick Schless
I like having access to the web admin pages that HBase, HDFS, etc. provide. I can't find a way to put them behind SSL, though. For the HMaster it's easy enough (nginx+SSL as a reverse proxy), but the HMaster generates links like data01.company.com:60030. Is there a way to change the scheme and port

CellCounter -- Exceeded limits on number of counters

2012-07-11 Thread Patrick Schless
I am trying to find out the number of data points (cells) in a table with "hbase org.apache.hadoop.hbase.mapreduce.CellCounter ". On a very small table (3 cells), it works fine. On a table with a couple thousand cells, I get this error (4 times): org.apache.hadoop.mapred.Counters$CountersExceeded
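The error is the MapReduce framework's per-job counter cap (120 by default), which CellCounter exceeds because it creates counters keyed on row/qualifier combinations, so the counter count grows with the data. A sketch of the usual workaround, raising the cap in mapred-site.xml; the property is mapreduce.job.counters.limit on MRv1-era Hadoop (later renamed mapreduce.job.counters.max), and 10000 is an arbitrary example value:

```xml
<!-- mapred-site.xml: raise the per-job counter limit that CellCounter
     exceeds on non-trivial tables. -->
<property>
  <name>mapreduce.job.counters.limit</name>
  <value>10000</value>
</property>
```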

Re: Migrating Clusters - Broken Metadata

2012-07-06 Thread Patrick Schless
the old nodes), I was able to remove the /etc/hosts entries and bounce hbase without any problem. I still have no idea where the new hbase is getting the references to the old nodes. Filed a bug report: https://issues.apache.org/jira/browse/HBASE-6343 On Thu, Jul 5, 2012 at 6:44 PM, Patrick

Migrating Clusters - Broken Metadata

2012-07-05 Thread Patrick Schless
I have an existing hbase cluster (old.domain.com) and I am trying to migrate the data to a new set of boxes (new.domain.com). Both are running hbase 0.90.x. I would like to minimize downtime, so I'm looking at the Backup tool from mozilla ( http://blog.mozilla.org/data/2011/02/04/migrating-hbase-i