Re: EC2 instance type recommendation?

2013-07-16 Thread Bryan Keller
For me, it came down to the cc2.8xlarge vs the hi1.4xlarge. In the end I went with the hi1.4xlarge. On Jul 16, 2013, at 11:18 AM, Amit Mor wrote: > Hello, I am curious to hear the recommendations people have here for > running HBase on EC2 instances. I failed to find an instance that has a > g

Re: Poor HBase map-reduce scan performance

2013-07-01 Thread Bryan Keller
rformance >> >> Looking at the tail of HBASE-8369, there were some comments which are yet >> to be addressed. >> >> I think trunk patch should be finalized before backporting. >> >> Cheers >> >> On Mon, Jul 1, 2013 at 12:23 PM, Bryan Keller wrote:

Re: Poor HBase map-reduce scan performance

2013-06-30 Thread Bryan Keller
I'll attach my patch to HBASE-8369 tomorrow. On Jun 28, 2013, at 10:56 AM, lars hofhansl wrote: > If we can make a clean patch with minimal impact to existing code I would be > supportive of a backport to 0.94. > > -- Lars > > > > - Original Message ---

Re: Poor HBase map-reduce scan performance

2013-06-25 Thread Bryan Keller
>>>>>> with my P/C scanner). However, when I set scanner caching to 5000, >>>>>> it's >>>>>> more of a wash compared to the standard ClientScanner: ~53k >>> records/sec >>>>>> with the ClientScanner and ~60k rec

Re: Poor HBase map-reduce scan performance

2013-06-04 Thread Bryan Keller
not (yet) a strength of HBase. >> >> So with HDFS you get to 75% of the theoretical maximum read throughput; >> hence with HBase you get to 25% of the theoretical cluster-wide maximum disk >> throughput? >> >> >> -- Lars >> >> >> >> --

Re: Poor HBase map-reduce scan performance

2013-05-23 Thread Bryan Keller
It means that Scanner caching and larger block sizes work only to >>>> amortize >>>> the fixed overhead of disk IOs and RPCs -- they do nothing to keep the >>>> IO >>>> subsystems saturated during sequential reads. What *should* happen is >>>&

Re: Poor HBase map-reduce scan performance

2013-05-10 Thread Bryan Keller
time to scan is about the same. Adding in HBase slows things down another 3x. So I'm seeing 9x faster I/O scanning an uncompressed sequence file vs scanning a compressed table. On May 8, 2013, at 10:15 AM, Bryan Keller wrote: > Thanks for the offer Lars! I haven't made much prog

Re: Poor HBase map-reduce scan performance

2013-05-08 Thread Bryan Keller
lar shape. > > -- Lars > > > > ____ > From: Bryan Keller > To: user@hbase.apache.org > Sent: Friday, May 3, 2013 3:44 AM > Subject: Re: Poor HBase map-reduce scan performance > > > Actually I'm not too confident in my resul

Re: Poor HBase map-reduce scan performance

2013-05-03 Thread Bryan Keller
Actually I'm not too confident in my results re block size, they may have been related to major compaction. I'm going to rerun before drawing any conclusions. On May 3, 2013, at 12:17 AM, Bryan Keller wrote: > I finally made some progress. I tried a very large HBase block size (

Re: Poor HBase map-reduce scan performance

2013-05-03 Thread Bryan Keller
ou'd have to > drill in to find the allocate()). > > > During normal scanning (again, without encoding) there should be no > allocation happening except for blocks read from disk (and they should all be > the same size, thus allocation should be cheap). > > -- Lars >

Re: Poor HBase map-reduce scan performance

2013-05-02 Thread Bryan Keller
changing the block size, either HDFS or HBase, help here? Also, if anyone has tips on how else to profile, that would be appreciated. VisualVM can produce a lot of noise that is hard to sift through. On May 1, 2013, at 9:49 PM, Bryan Keller wrote: > I used exactly 0.94.4, pulled from the tag

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
nal Message ----- > From: Bryan Keller > To: "user@hbase.apache.org" > Cc: > Sent: Wednesday, May 1, 2013 6:01 PM > Subject: Re: Poor HBase map-reduce scan performance > > I tried running my test with 0.94.4, unfortunately performance was about the > same. I'

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
I tried running my test with 0.94.4, unfortunately performance was about the same. I'm planning on profiling the regionserver and trying some other things tonight and tomorrow and will report back. On May 1, 2013, at 8:00 AM, Bryan Keller wrote: > Yes I would like to try this, if

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
ve wide >>> rows and/or large key portions. That in turns makes scans scale better >>> across cores, since RAM is shared resource between cores (much like >> disk). >>> >>> >>> It's not hard to build the latest HBase against Cloudera's ve

Re: Poor HBase map-reduce scan performance

2013-05-01 Thread Bryan Keller
simple patch to pom.xml to do that. > > -- Lars > > > > ____ > From: Bryan Keller > > To: user@hbase.apache.org > Sent: Tuesday, April 30, 2013 11:02 PM > Subject: Re: Poor HBase map-reduce scan performance > > > The table has hashed keys

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
mple program to generate some data - not 700g, though > :) - I'll try to do a bit of profiling during the next days as my day job > permits, but I do not have any machines with SSDs). > > -- Lars > > > > > > From: Bryan Kelle

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
Yes, I have it enabled (forgot to mention that). On Apr 30, 2013, at 9:56 PM, Ted Yu wrote: > Have you tried enabling short circuit read ? > > Thanks > > On Apr 30, 2013, at 9:31 PM, Bryan Keller wrote: > >> Yes, I have tried various settings for setCaching() and I

Re: Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
pReduce jobs > scan.setCacheBlocks(false); // don't set to true for MR jobs > > I guess you have used the above setting. > > 0.94.x releases are compatible. Have you considered upgrading to, say > 0.94.7 which was recently released ? > > Cheers > > On Tue, Apr 30
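
For reference, a minimal sketch of an MR scan set up with the settings quoted above (0.94-era API; the table name is a placeholder, and IdentityTableMapper simply passes rows through):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.IdentityTableMapper;
    import org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.NullOutputFormat;

    public class ScanJobSetup {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "full-table-scan");
        job.setJarByClass(ScanJobSetup.class);
        Scan scan = new Scan();
        scan.setCaching(500);        // rows per RPC; raise for long sequential scans
        scan.setCacheBlocks(false);  // keep a full scan from churning the block cache
        TableMapReduceUtil.initTableMapperJob("mytable", scan,  // "mytable" is a placeholder
            IdentityTableMapper.class, ImmutableBytesWritable.class, Result.class, job);
        job.setNumReduceTasks(0);
        job.setOutputFormatClass(NullOutputFormat.class);
        job.waitForCompletion(true);
      }
    }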

Poor HBase map-reduce scan performance

2013-04-30 Thread Bryan Keller
I have been attempting to speed up my HBase map-reduce scans for a while now. I have tried just about everything without much luck. I'm running out of ideas and was hoping for some suggestions. This is HBase 0.94.2 and Hadoop 2.0.0 (CDH4.2.1). The table I'm scanning: 20 mil rows, hundreds of col

Re: Maximizing throughput

2013-01-15 Thread Bryan Keller
l max on a given box (client -> switch -> datanode 1 -> switch -> datanode 2). It was an eye opener that I was network I/O limited. I will probably move to a 10gbit/sec switch and/or use bonded NICs. On Jan 11, 2013, at 9:37 AM, Bryan Keller wrote: > Thanks for the responses. I'

Re: Maximizing throughput

2013-01-11 Thread Bryan Keller
Thanks for the responses. I'm running HBase 0.92.1 (Cloudera CDH4). The program is very simple, it inserts batches of rows into a table via multiple threads. I've tried running it with different parameters (column count, threads, batch size, etc.), but throughput didn't improve. I've pasted the

Maximizing throughput

2013-01-10 Thread Bryan Keller
I am attempting to configure HBase to maximize throughput, and have noticed some bottlenecks. In particular, with my configuration, write performance is well below theoretical throughput. I have a test program that inserts many rows into a test table. Network I/O is less than 20% of max, and dis
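
The test program itself isn't in the archive snippet; a rough reconstruction of that kind of load test, assuming a table "testtable" with a family "f" (both placeholders), might look like:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PutLoadTest {
      public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
          final int thread = t;
          pool.submit(new Runnable() {
            public void run() {
              try {
                // One HTable per thread; HTable instances are not thread-safe
                HTable table = new HTable(conf, "testtable");
                table.setAutoFlush(false); // buffer writes client-side
                List<Put> batch = new ArrayList<Put>();
                for (int i = 0; i < 10000; i++) {
                  Put put = new Put(Bytes.toBytes(thread + "-" + i));
                  put.add(Bytes.toBytes("f"), Bytes.toBytes("c"), Bytes.toBytes(i));
                  batch.add(put);
                  if (batch.size() == 1000) { table.put(batch); batch.clear(); }
                }
                table.put(batch);
                table.close(); // flushes the remaining buffered writes
              } catch (Exception e) { throw new RuntimeException(e); }
            }
          });
        }
        pool.shutdown();
      }
    }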

Re: HBaseClient.call() hang

2012-12-18 Thread Bryan Keller
ould have been reset, thus > causing the client to interrupt, right? It shouldn't be a matter of timeout > at all. > > On Dec 17, 2012, at 7:18 PM, Bryan Keller wrote: > >> It seems there was a cascading effect. The regionservers were busy with >> scanning a t

Re: HBaseClient.call() hang

2012-12-18 Thread Bryan Keller
oid this issue. what size of your block > size? and can you paste your JVM options here? > > I also met a long GC problem, but I tuned jvm options, it works very well > now. > > > On Tue, Dec 18, 2012 at 1:18 AM, Bryan Keller wrote: > >> It seems there was a ca

Re: HBaseClient.call() hang

2012-12-17 Thread Bryan Keller
ing down or being busy. I assume it was not > often that regionserver(s) went down. For busy region server, did you try > jstack'ing regionserver process ? > > Thanks > > On Fri, Dec 14, 2012 at 2:59 PM, Bryan Keller wrote: > >> I have encountered a problem wit

Re: HBaseClient.call() hang

2012-12-14 Thread Bryan Keller
Forgot to mention that. It's version 0.92.1 (Cloudera CDH4.1.1), running on CentOS 6 64 bit, Java 1.6.0_31 On Dec 14, 2012, at 5:31 PM, lars hofhansl wrote: > Hey Bryan, > > > which version of HBase it this? > > -- Lars > > > > ________

HBaseClient.call() hang

2012-12-14 Thread Bryan Keller
I have encountered a problem with HBaseClient.call() hanging. This occurs when one of my regionservers goes down while performing a table scan. What exacerbates this problem is that the scan I am performing uses filters, and the region size of the table is large (4gb). Because of this, it can ta

Re: dfs.replication

2012-12-12 Thread Bryan Keller
coming > from somewhere. Double check that classpath of yours with bin/hbase > classpath > > J-D > > On Wed, Dec 12, 2012 at 11:32 AM, Bryan Keller wrote: >> I noticed in some of the documentation that it states to add the Hadoop >> config directory to the HBase classpath

dfs.replication

2012-12-12 Thread Bryan Keller
I noticed in some of the documentation that it states to add the Hadoop config directory to the HBase classpath if you want HBase to use any DFS client settings, like dfs.replication. Is this still true? It seems like HBase is using the dfs.replication setting. I have it set to 2 in the Hadoop c
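
A quick way to verify which value the HBase client side actually sees, assuming (per the advice in this thread) that the Hadoop conf directory shows up in the output of bin/hbase classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class CheckReplication {
      public static void main(String[] args) {
        // create() layers hbase-site.xml over whatever Hadoop *-site.xml
        // files are visible on the classpath
        Configuration conf = HBaseConfiguration.create();
        System.out.println("dfs.replication = " + conf.get("dfs.replication", "3 (default)"));
      }
    }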

Re: Poor data locality of MR job

2012-08-02 Thread Bryan Keller
have in a different thread). Rack-local mappers run roughly 2x slower than data-local ones, so the performance hit is significant. On Aug 2, 2012, at 11:37 AM, Jean-Daniel Cryans wrote: > On Wed, Aug 1, 2012 at 11:31 PM, Bryan Keller wrote: >> I have an 8 node cluster an

Re: Region balancing question

2012-08-02 Thread Bryan Keller
u might be using some older > version? > > -Anoop- > > From: Bryan Keller [brya...@gmail.com] > Sent: Thursday, August 02, 2012 11:37 AM > To: user@hbase.apache.org > Subject: Region balancing question > > I have a table on a 4 node test cluster. I also have some o

Re: Poor data locality of MR job

2012-08-02 Thread Bryan Keller
to > regionservers for you ? > > Did your regionserver(s) fail ? > > On Thu, Aug 2, 2012 at 8:31 AM, Bryan Keller wrote: > >> I have an 8 node cluster and a table that is pretty well balanced with on >> average 36 regions/node. When I run a mapreduce job on the cluste

Poor data locality of MR job

2012-08-01 Thread Bryan Keller
I have an 8 node cluster and a table that is pretty well balanced with on average 36 regions/node. When I run a mapreduce job on the cluster against this table, the data locality of the mappers is poor, e.g. 100 rack local mappers and only 188 data local mappers. I would expect nearly all of the

Region balancing question

2012-08-01 Thread Bryan Keller
I have a table on a 4 node test cluster. I also have some other tables on the cluster. The table in question has a total of 12 regions. I noticed that 1 node has 6 regions, another has zero, and the remaining two nodes have the expected 3 regions. I'm a little confused how this can happen. The

WAL corruption

2012-07-02 Thread Bryan Keller
During an upgrade of my cluster from 0.90 to 0.92 over the weekend, the WAL (files in the /hbase/.logs directory) was corrupted and it prevented HBase from starting up. The exact exception was "java.io.IOException: Could not obtain the last block locations" on the WAL files. I was able to recover

Re: Leap second bug

2012-07-02 Thread Bryan Keller
We were also caught by this - we're running CentOS 6. Likewise, once we reset the date/time, HBase was happy again. I wonder why Java processes seemed to be affected more than other processes. On Jul 2, 2012, at 8:34 AM, Dean Banks wrote: > We were caught by this issue. It impacted all of our

Re: Small cluster setup

2012-06-19 Thread Bryan Keller
ness. > You can put ZK on any lightweight server like HMaster, NN. > I use almost the same configuration as yours: 10 Region Server + DataNode + Task > Tracker, 1 HMaster + ZK, 1 NameNode + JobTracker + ZK, 1 ZK alone. > > Mikael.S > > On Tue, Jun 19, 2012 at 11:27 PM, Bryan Keller wro

Small cluster setup

2012-06-19 Thread Bryan Keller
I have a small cluster with 10 nodes. 8 nodes are datanodes/regionservers, and 2 nodes are running HA namenodes and HMaster. The question I have is, what would be the best way to configure Zookeeper in my cluster? Currently I have it running on one of the HMaster nodes. Running an instance on th

Task tracker timeout with filtered table scan

2012-05-31 Thread Bryan Keller
I have a large table that I am running a map reduce job on. The job scans for a particular column value in the table using a TableInputFormat with a filter on the scan. This value only matches a few rows, so most of the rows are filtered out. The problem is that the TableInputFormat will not r
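
One possible mitigation, sketched against the Hadoop 1.x-era property name, is to give the job more time between progress reports (the alternative is to report progress from within the mapper or record reader):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.mapreduce.Job;

    public class FilteredScanJob {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Allow 20 minutes without progress before the task tracker kills the
        // task (the default is 10); value is in milliseconds
        conf.setLong("mapred.task.timeout", 20 * 60 * 1000L);
        Job job = new Job(conf, "filtered-scan");
        // ... TableMapReduceUtil.initTableMapperJob(...) with the filtered Scan ...
      }
    }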

Re: Created date field

2012-02-24 Thread Bryan Keller
> thanks > > On Thu, Feb 23, 2012 at 3:57 PM, Bryan Keller wrote: > >> Does anyone know of any strategies for tracking the created date of a row >> or column, without a checkAndPut() type of solution? I am trying to avoid >> reading from the table to see if the va

Created date field

2012-02-23 Thread Bryan Keller
Does anyone know of any strategies for tracking the created date of a row or column, without a checkAndPut() type of solution? I am trying to avoid reading from the table to see if the value already exists before putting. One thought I had was to store a timestamp of every update as a column, bu
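
One way to realize the "timestamp of every update" idea is to lean on cell versioning: write the same qualifier on every update and read back the oldest surviving version. A sketch, with the caveat that the family must keep enough versions (and no TTL) or compaction will discard the history; family and qualifier names here are made up:

    import java.util.List;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatedDate {
      static final byte[] FAM = Bytes.toBytes("m");   // assumed family, created with high VERSIONS
      static final byte[] QUAL = Bytes.toBytes("ts"); // assumed qualifier

      // Every write stamps the same cell; versions accumulate
      static void touch(HTable table, byte[] row) throws Exception {
        Put put = new Put(row);
        put.add(FAM, QUAL, Bytes.toBytes(System.currentTimeMillis()));
        table.put(put);
      }

      // Created date = timestamp of the oldest surviving version
      static long created(HTable table, byte[] row) throws Exception {
        Get get = new Get(row);
        get.addColumn(FAM, QUAL);
        get.setMaxVersions(); // fetch every version, newest first
        Result result = table.get(get);
        List<KeyValue> versions = result.getColumn(FAM, QUAL);
        if (versions.isEmpty()) return -1L; // row never written
        return versions.get(versions.size() - 1).getTimestamp();
      }
    }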

Re: Lease does not exist exceptions

2012-02-20 Thread Bryan Keller
LeaseException. I increased hbase.rpc.timeout to resolve the issue. On Feb 20, 2012, at 4:47 PM, Bryan Keller wrote: > I'm seeing "lease does not exist" exceptions under some circumstances, e.g. > > org.apache.hadoop.ipc.RemoteException: > org.apache.hadoop.hbase.regions

Lease does not exist exceptions

2012-02-20 Thread Bryan Keller
I'm seeing "lease does not exist" exceptions under some circumstances, e.g. org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hbase.regionserver.LeaseException: lease '4341530003498786620' does not exist After reading some about the exception, I was wondering if the following will cause

Re: Max region file size

2012-02-17 Thread Bryan Keller
Actually, I had pre-created too many regions, so most were only half full; I had a couple of regions that were 4gb. On Feb 17, 2012, at 3:48 PM, Bryan Keller wrote: > Is the max region file size the size of the data uncompressed or the max size > of the store file? I noticed my store fil

Max region file size

2012-02-17 Thread Bryan Keller
Is the max region file size the size of the data uncompressed or the max size of the store file? I noticed my store files are ~2.1 gb though I have the max region size set to 4 gb. This is after a major compaction. Also, is the max region size 4 gb in HBase 0.90.4 or can it be larger? The docs s
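
A sketch of setting the split size per table via the 0.90-era admin API (table and family names are placeholders); as the follow-up notes, compressed store files can still come in well under this value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;

    public class BigRegionTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTableDescriptor desc = new HTableDescriptor("orders"); // placeholder name
        desc.addFamily(new HColumnDescriptor("d"));             // placeholder family
        // MAX_FILESIZE is the per-store split trigger, not a hard cap on
        // on-disk size once compression is applied
        desc.setMaxFileSize(4L * 1024 * 1024 * 1024);
        new HBaseAdmin(conf).createTable(desc);
      }
    }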

Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster?

2012-02-17 Thread Bryan Keller
I was thinking (wrongly it seems) that having the region server read directly from the local file system would be faster than going through the data node, even with sequential access. On Feb 17, 2012, at 1:28 PM, Jean-Daniel Cryans wrote: > On Fri, Feb 17, 2012 at 1:21 PM, Bryan Keller wr

Re: Is hadoop 1.0.0 + HBase 0.90.5 the best combination for production cluster?

2012-02-17 Thread Bryan Keller
I have been experimenting with local reads. For me, enabling it did not help read performance at all; I get the same performance either way. I can see in the data node logs it is passing back the local path, so it is enabled properly. Perhaps the benefits of local reads are dependent on th

Re: xceiver count, regionserver shutdown

2012-02-07 Thread Bryan Keller
-Daniel Cryans wrote: > On Mon, Feb 6, 2012 at 4:47 PM, Bryan Keller wrote: >> I increased the max region file size to 4gb so I should have fewer than 200 >> regions per node now, more like 25. With 2 column families that will be 50 >> memstores per node. 5.6gb would then

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
WALs. > > Note that you could set to have bigger WALs or more of them in order > to match the lower barrier (you'd tweak hbase.regionserver.maxlogs and > hbase.regionserver.hlog.blocksize) but it's still not as good as > having a few regions or using less of them at the same time. >

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
servers. > - Use a more sequential pattern so that you hit only a few regions at > a time, this is like the second solution but trying to make it work > with your current setup. This might not be practical for you as it > really depends on how easily you can sort your data source. > &

Re: xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
regions will have filled memstores so > you'd end up with hundreds of super small files... > > Please tell us more about the context of when this issue happens. > > J-D > > On Mon, Feb 6, 2012 at 11:42 AM, Bryan Keller wrote: >> I am trying to resolve an issue with my c

xceiver count, regionserver shutdown

2012-02-06 Thread Bryan Keller
I am trying to resolve an issue with my cluster when I am loading a bunch of data into HBase. I am reaching the "xciever" limit on the data nodes. Currently I have this set to 4096. The data node is logging "xceiverCount 4097 exceeds the limit of concurrent xcievers 4096". The regionservers even

Re: Using Scans in parallel

2011-10-10 Thread Bryan Keller
To follow up, the problem I was having w/ parallel scanners appears to be an issue w/ my app, I wasn't able to reproduce it in a more controlled test. On Oct 9, 2011, at 8:21 PM, Bryan Keller wrote: > BTW, a map reduce job can scan the table in 6m (both column families), > inc

Re: Using Scans in parallel

2011-10-09 Thread Bryan Keller
BTW, a map reduce job can scan the table in 6m (both column families), including some processing. So that is the fastest approach. On Oct 9, 2011, at 8:03 PM, Bryan Keller wrote: > Sure. 2 region servers with 5 disks each. Table has 2 column families and 113 > regions total for 2m row

Re: Using Scans in parallel

2011-10-09 Thread Bryan Keller
shtha wrote: > Interesting. > > Hey Bryan, can you please share the stats about: how many Regions, how > many Region Servers, time taken by Serial scanner and with 8 parallel > scanners. > > Himanshu > > On Sun, Oct 9, 2011 at 6:49 PM, Bryan Keller wrote: >> This i

Re: Using Scans in parallel

2011-10-09 Thread Bryan Keller
> So in theory it would be possible that multiple concurrent scans draw the > same scanner id. > > Since these are longs, this is astronomically unlikely, though (picking the > same number out of 2^64 just does not happen :) ). > > > > ______

Re: Using Scans in parallel

2011-10-09 Thread Bryan Keller
ile you do the scanning? > > I am pretty sure this has nothing to do with concurrent scans. > > From: Bryan Keller > To: Bryan Keller > Cc: user@hbase.apache.org > Sent: Sunday, October 9, 2011 11:03 AM > Subject: Re: Using Scans in parallel > > On further thought

Re: Using Scans in parallel

2011-10-09 Thread Bryan Keller
On further thought, it seems this might be a serious issue, as two unrelated processes within an application may be scanning the same table at the same time. On Oct 9, 2011, at 10:59 AM, Bryan Keller wrote: > I was not able to get consistent results using multiple scanners in parallel >

Re: Using Scans in parallel

2011-10-09 Thread Bryan Keller
I was not able to get consistent results using multiple scanners in parallel on a table. I implemented a counter test that used 8 scanners in parallel on a table with 2m rows with 2k+ columns each, and the results were not consistent. There were no errors thrown, but the count was off by as much
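
A sketch of the kind of partitioned parallel count being described, assuming row keys spread roughly uniformly over the byte range (e.g. hashed keys) so single-byte split points give balanced ranges; the table name is a placeholder, and each thread gets its own HTable since HTable is not thread-safe:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.AtomicLong;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HConstants;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;

    public class ParallelCount {
      public static void main(String[] args) throws Exception {
        final Configuration conf = HBaseConfiguration.create();
        final AtomicLong count = new AtomicLong();
        // 8 disjoint [start, stop) ranges on the first key byte; only balanced
        // if keys are spread evenly over the byte range (e.g. hashed keys)
        byte[][] bounds = new byte[9][];
        bounds[0] = HConstants.EMPTY_START_ROW;
        for (int i = 1; i < 8; i++) bounds[i] = new byte[] { (byte) (i * 32) };
        bounds[8] = HConstants.EMPTY_END_ROW;
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int i = 0; i < 8; i++) {
          final byte[] start = bounds[i], stop = bounds[i + 1];
          pool.submit(new Runnable() {
            public void run() {
              try {
                HTable table = new HTable(conf, "mytable"); // placeholder; one HTable per thread
                Scan scan = new Scan(start, stop);
                scan.setCaching(1000);
                ResultScanner scanner = table.getScanner(scan);
                while (scanner.next() != null) count.incrementAndGet();
                scanner.close();
                table.close();
              } catch (Exception e) { throw new RuntimeException(e); }
            }
          });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
        System.out.println("rows: " + count.get());
      }
    }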

Re: Speculative execution and TableOutputFormat

2011-09-12 Thread Bryan Keller
Ah, that is a very interesting solution, Leif; this seems optimal to me. I am going to try this and I'll report back. On Sep 12, 2011, at 9:09 AM, Leif Wickland wrote: > > Bryan, > > Have you considered writing your MR output to HFileFormat and then asking > the regions to adopt the result? Th
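
A rough outline of that bulk-load approach against the mapreduce API of the time; the table name and output path are placeholders, and the mapper (elided) must emit ImmutableBytesWritable/KeyValue pairs:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadOutline {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = new Job(conf, "bulk-load");
        // ... job.setMapperClass(...) with a mapper emitting these types ...
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);
        HTable table = new HTable(conf, "mytable");             // placeholder
        HFileOutputFormat.configureIncrementalLoad(job, table); // wires partitioner + sort reducer
        Path out = new Path("/tmp/hfiles");                     // placeholder path
        FileOutputFormat.setOutputPath(job, out);
        if (job.waitForCompletion(true)) {
          // Regions adopt the finished HFiles; nothing goes through the write
          // path, so a duplicated (speculative) map attempt never half-commits
          new LoadIncrementalHFiles(conf).doBulkLoad(out, table);
        }
      }
    }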

Re: Speculative execution and TableOutputFormat

2011-09-10 Thread Bryan Keller
educe.specex > > That's a good suggestion, and perhaps moving that config to > TableMapReduceUtil would be beneficial. > > > > > On 9/10/11 4:22 PM, "Bryan Keller" wrote: > >> I believe there is a problem with Hadoop's speculative execution

Speculative execution and TableOutputFormat

2011-09-10 Thread Bryan Keller
I believe there is a problem with Hadoop's speculative execution (which is on by default), and HBase's TableOutputFormat. If I understand correctly, speculative execution can launch the same task on multiple nodes, but only "commit" the one that finishes first. The other tasks that didn't comple
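
A sketch of the workaround discussed here: disabling speculative execution on the job configuration before submitting (property names are the Hadoop 1.x/CDH-era keys):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.mapreduce.Job;

    public class NoSpecEx {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Hadoop 1.x/CDH-era keys; duplicate attempts of the same task all
        // write live Puts through TableOutputFormat, so turn speculation off
        conf.setBoolean("mapred.map.tasks.speculative.execution", false);
        conf.setBoolean("mapred.reduce.tasks.speculative.execution", false);
        Job job = new Job(conf, "hbase-output-job");
        // ... TableMapReduceUtil.initTableReducerJob(...) as usual ...
      }
    }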

Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

2011-04-16 Thread Bryan Keller
I opened a bug for this. https://issues.apache.org/jira/browse/HBASE-3792 On Apr 16, 2011, at 4:56 AM, Bryan Keller wrote: > I did more research and found the issue. > > The TableInputFormat creates an HTable using a new Configuration object, and > it never cleans it up. When runn

Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

2011-04-16 Thread Bryan Keller
rue); > on line 52 before calling obj.wait(), situation should be different. > > Cheers > > On Fri, Apr 15, 2011 at 11:56 PM, Bryan Keller wrote: > >> FWIW, I created a test program that demonstrates the issue. The program >> creates an HBase table, populates it with 10 ro

Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

2011-04-15 Thread Bryan Keller
): > HConnectionManager.deleteAllConnections(true); > But there is no such call in TableInputFormat / TableInputFormatBase / > TableRecordReader > > Do you mind filing a JIRA ? > > On Fri, Apr 15, 2011 at 3:41 PM, Bryan Keller wrote: > >> I am having this sam

Re: zk connection leak with TableInput/OutputFormat (CDH3b4, 0.90.1)

2011-04-15 Thread Bryan Keller
I am having this same problem. After every run of my map-reduce job which uses TableInputFormat, I am leaking one ZK connection. The connections that are not being cleaned up are connected to the node that submitted the job, not the cluster nodes. I tried explicitly cleaning up the connection u
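
Until the fix landed, one workaround shape (a sketch, not verbatim from the thread) was to tear down all cached connections once the client JVM is done submitting jobs:

    import org.apache.hadoop.hbase.client.HConnectionManager;

    public class JobRunner {
      public static void main(String[] args) throws Exception {
        try {
          // ... configure and run the TableInputFormat job here ...
        } finally {
          // Drop every cached HConnection (and its ZooKeeper session) held
          // by this JVM; the 0.90-era signature takes a stopProxy flag
          HConnectionManager.deleteAllConnections(true);
        }
      }
    }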

Is there a setting to cap row size?

2011-04-07 Thread Bryan Keller
I have a wide table schema for an HBase table, where I model a one-to-many relationship of purchase orders and line items. Each row is a purchase order, and I add columns for each line item. Under normal circumstances I don't expect more than a few thousand columns per row, totalling less than 1

Re: Connecting to HBase from OSGi

2011-03-23 Thread Bryan Keller
Yes you can; I use Felix and Eclipse Virgo Blueprint in my app, and it connects to HBase. The HBase and Hadoop jars are not bundles, however, so you will need to either put them on the boot classpath, package them inside your bundle jar, or turn them into bundles yourself. On Mar 21, 2011, at 1

Re: Long client pauses with compression

2011-03-14 Thread Bryan Keller
to >128MB to flush > bigger files. > > Hope this helps, > > J-D > > On Mon, Mar 14, 2011 at 10:29 AM, Jean-Daniel Cryans > wrote: >> Thanks for the report Bryan, I'll try your little program against one >> of our 0.90.1 cluster that has similar hard

Re: Long client pauses with compression

2011-03-14 Thread Bryan Keller
s have enough heap (eg more than 3 > or 4GB). You should also consider setting MAX_FILESIZE to >1GB to > limit the number of regions and MEMSTORE_FLUSHSIZE to >128MB to flush > bigger files. > > Hope this helps, > > J-D > > On Mon, Mar 14, 2011 at 10:29 AM, Jean-Danie
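
Putting the quoted advice together, a sketch of creating a pre-split table with larger region and memstore-flush sizes (table, family, and split points are assumptions; the splits here suppose row keys that start with a hex digit):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class PreSplitTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTableDescriptor desc = new HTableDescriptor("orders"); // placeholder name
        desc.addFamily(new HColumnDescriptor("d"));             // placeholder family
        desc.setMaxFileSize(2L * 1024 * 1024 * 1024);   // fewer, larger regions
        desc.setMemStoreFlushSize(256L * 1024 * 1024);  // flush bigger files
        // Pre-split so early writes fan out instead of hammering one region
        byte[][] splits = { Bytes.toBytes("4"), Bytes.toBytes("8"), Bytes.toBytes("c") };
        new HBaseAdmin(conf).createTable(desc, splits);
      }
    }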

Re: Long client pauses with compression

2011-03-13 Thread Bryan Keller
If interested, I wrote a small program that demonstrates the problem (http://vancameron.net/HBaseInsert.zip). It uses Gradle, so you'll need that. To run, enter "gradle run". On Mar 13, 2011, at 12:14 AM, Bryan Keller wrote: > I am using the Java client API to write 10,0

Long client pauses with compression

2011-03-13 Thread Bryan Keller
I am using the Java client API to write 10,000 rows with about 6000 columns each, via 8 threads making multiple calls to the HTable.put(List) method. I start with an empty table with one column family and no regions pre-created. With compression turned off, I am seeing very stable performance. A

HBase Java client dependencies

2011-02-28 Thread Bryan Keller
In my application I'm using the Java HBase API. I'm using Maven (well, Gradle actually) to declare a dependency on HBase. Unfortunately, the dependency on HBase drags every transitive dependency but the kitchen sink into my app, including Ant, Jasper, Jetty, and others. I am hoping not all of th

Re: Insert into tall table 50% faster than wide table

2010-12-23 Thread Bryan Keller
Correction: I ran the wrong test. Consolidating the Puts increased performance back to that of the tall table. So it appears row locks were the issue. Thanks for the help everyone. On Dec 23, 2010, at 2:28 PM, Bryan Keller wrote: > I revised the test so that it creates a single Put for e
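
The consolidation being described, sketched: one Put per customer row carrying all the order columns, so the row lock is taken once rather than once per order (the family name is a placeholder):

    import java.util.List;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class WideRowInsert {
      static final byte[] FAM = Bytes.toBytes("o"); // assumed order family

      // One Put per customer row: the row lock is taken once for the whole
      // batch of orders instead of once per order column
      static void insertOrders(HTable table, byte[] customerRow,
          List<byte[]> orderIds, List<byte[]> orderData) throws Exception {
        Put put = new Put(customerRow);
        for (int i = 0; i < orderIds.size(); i++) {
          put.add(FAM, orderIds.get(i), orderData.get(i));
        }
        table.put(put);
      }
    }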

Re: Insert into tall table 50% faster than wide table

2010-12-23 Thread Bryan Keller
one row in a single Put instance? Or are you creating many Put's for >>> each order but the same row? >>> >>> Lars >>> >>> On Thu, Dec 23, 2010 at 9:57 AM, Andrey Stepachev wrote: >>>> 2010/12/23 Ted Dunning >>>> >>&

Re: Insert into tall table 50% faster than wide table

2010-12-22 Thread Bryan Keller
rder.) >>> So you're doing one column write for each order and you have a total of 10K >>> rows. >>> >>> Unless I'm missing something, part of the 'slowness' could be how you're >>> writing your orders on your wide table. There are a co

Re: Insert into tall table 50% faster than wide table

2010-12-22 Thread Bryan Keller
Actually I don't think this is the problem as HBase versions cells, not rows, if I understand correctly. On Dec 22, 2010, at 5:03 PM, Bryan Keller wrote: > Perhaps slow wide table insert performance is related to row versioning? If I > have a customer row and keep adding order col

Re: Insert into tall table 50% faster than wide table

2010-12-22 Thread Bryan Keller
is no versioning going on. Could this be causing performance problems? On Dec 22, 2010, at 4:16 PM, Bryan Keller wrote: > It appears to be the same or better, not to derail my original question. The > much slower write performance will cause problems for me unless I can resolve > that. >

Re: Insert into tall table 50% faster than wide table

2010-12-22 Thread Bryan Keller
ide, > doing a lookup/scan? > > Thanks > > -Pete > > -----Original Message----- > From: Bryan Keller [mailto:brya...@gmail.com] > Sent: Wednesday, December 22, 2010 3:41 PM > To: user@hbase.apache.org > Subject: Insert into tall table 50% faster than wide table

Insert into tall table 50% faster than wide table

2010-12-22 Thread Bryan Keller
I have been testing a couple of different approaches to storing customer orders. One is a tall table, where each order is a row. The other is a wide table where each customer is a row, and orders are columns in the row. I am finding that inserts into the tall table, i.e. adding rows for every or

Re: Composite key, scan on partial key

2010-12-14 Thread Bryan Keller
r of 'foo:'. > > The start,end key wont work with variable length in this way. But the > good news is prefix filter is very efficient. > > Good luck! > -ryan > > On Tue, Dec 14, 2010 at 3:28 PM, Bryan Keller wrote: >> I had a question about using a Scan o

Composite key, scan on partial key

2010-12-14 Thread Bryan Keller
I had a question about using a Scan on part of a composite key. Say I have order line item rows, and the ID is order ID + line item ID. Each ID is a random string. I want to get all line items for an order with my Scan object. Setting the startRow on Scan is easy enough: just set it to the order
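
A sketch of the prefix-scan approach suggested in the reply above: start the scan at the order ID and let a PrefixFilter cut it off once keys no longer begin with that order ID:

    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.ResultScanner;
    import org.apache.hadoop.hbase.client.Scan;
    import org.apache.hadoop.hbase.filter.PrefixFilter;
    import org.apache.hadoop.hbase.util.Bytes;

    public class LineItemScan {
      // Row keys are orderId + lineItemId; fetch all line items of one order
      static void scanOrder(HTable table, String orderId) throws Exception {
        byte[] prefix = Bytes.toBytes(orderId);
        Scan scan = new Scan(prefix);             // start at the first key with this prefix
        scan.setFilter(new PrefixFilter(prefix)); // stop once keys no longer match it
        ResultScanner scanner = table.getScanner(scan);
        for (Result result : scanner) {
          System.out.println(Bytes.toString(result.getRow()));
        }
        scanner.close();
      }
    }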

Re: Schema design, one-to-many question

2010-11-29 Thread Bryan Keller
the important queries is something like "get me all the info > for this order". If so, it would be important that all fields for an order > are together. > > JG > >> -Original Message- >> From: Bryan Keller [mailto:brya...@gmail.com] >> Sent:

Schema design, one-to-many question

2010-11-29 Thread Bryan Keller
I have read comments on modeling one-to-many relationships in HBase and wanted to get some feedback. I have millions of customers, and each customer can make zero to thousands of orders. I want to store all of this data in HBase. The data is always accessed by customer. It seems there are a few sc