Re: Prewarming in-memory column families

2013-03-01 Thread Eric Czech
ever been a fan of that setting. > > J-D > > On Tue, Feb 26, 2013 at 5:04 PM, Sergey Shelukhin > wrote: > > should we make this built-in? Sounds like default user intent for > in-memory. > > > > On Tue, Feb 26, 2013 at 2:13 PM, Stack wrote: > > > >>
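The thread is about whether HBase should prewarm IN_MEMORY column families automatically. A rough sketch against the 0.94-era Java client (table and family names are made up): the family is flagged with setInMemory(true), and since there is no built-in prewarm, a throwaway scan with block caching enabled is one way to pull its blocks into the cache by hand.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;

public class InMemoryFamilyExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();

    // Flag the family as IN_MEMORY so its blocks get the in-memory
    // portion of the block cache (hypothetical table/family names).
    HTableDescriptor desc = new HTableDescriptor("lookup_table");
    HColumnDescriptor fam = new HColumnDescriptor("f");
    fam.setInMemory(true);
    desc.addFamily(fam);
    new HBaseAdmin(conf).createTable(desc);

    // No built-in prewarm: a full scan with block caching enabled is one
    // way to pull the family's blocks into the cache up front.
    HTable table = new HTable(conf, "lookup_table");
    Scan warmup = new Scan();
    warmup.setCacheBlocks(true);   // keep the blocks we touch in the cache
    ResultScanner scanner = table.getScanner(warmup);
    for (Result r : scanner) { /* discard; we only want the caching side effect */ }
    scanner.close();
    table.close();
  }
}
```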

Re: Hbase sequential row merging in MapReduce job

2012-10-19 Thread Eric Czech
ong as you know your keyspace, you should be able to create your own > > splits. See TableInputFormatBase for the default implementation (which > is > > 1 input split per region) > > > > > > > > > > > > On 10/19/12 9:32 AM, "Eric Czech"
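The reply points at TableInputFormatBase, whose default getSplits produces one input split per region, and suggests building your own splits when the keyspace is known. A rough sketch of that idea (hypothetical class name and hard-coded boundaries; a real version would read the boundaries from the job configuration), so that all sequential rows in a range land in the same mapper and can be merged there:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

/**
 * Sketch: one split per known keyspace range (instead of one per region),
 * so that all sequential rows sharing a logical prefix reach the same mapper.
 */
public class KeyRangeTableInputFormat extends TableInputFormat {

  // Hypothetical, hard-coded boundaries for illustration only.
  private static final byte[][] BOUNDARIES = {
      Bytes.toBytes("a"), Bytes.toBytes("g"), Bytes.toBytes("n"), Bytes.toBytes("t")
  };

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> splits = new ArrayList<InputSplit>();
    byte[] tableName = getHTable().getTableName();
    for (int i = 0; i < BOUNDARIES.length; i++) {
      byte[] start = BOUNDARIES[i];
      byte[] end = (i + 1 < BOUNDARIES.length)
          ? BOUNDARIES[i + 1]
          : HConstants.EMPTY_END_ROW;           // open-ended last range
      // Empty location string: we give up locality hints for simplicity.
      splits.add(new TableSplit(tableName, start, end, ""));
    }
    return splits;
  }
}
```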

Re: Indexing w/ HBase

2012-10-12 Thread Eric Czech
> against the idea, just that its more of an issue of design.) > > -Mike > > On Oct 12, 2012, at 7:03 AM, Eric Czech wrote: > > > Hi everyone, > > > > Are there any tools or libraries for managing HDFS files that are used > > solely for the purpose of

Re: Managing MapReduce jobs with concurrent client reads

2012-09-07 Thread Eric Czech
or did you find that that isn't a reliable throttling mechanism? Also, is replication to that batch cluster done via HBase replication or some other approach? On Thu, Sep 6, 2012 at 4:08 PM, Stack wrote: > > On Wed, Sep 5, 2012 at 6:25 AM, Eric Czech wrote: > > Hi everyone, &
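On the replication question: feeding the batch cluster via HBase replication means the column families involved carry a global replication scope. A minimal sketch with made-up table and family names; it also assumes hbase.replication is enabled and that the peer cluster is registered separately (for example with the shell's add_peer command).

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HConstants;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ReplicatedFamilyExample {
  public static void main(String[] args) throws Exception {
    // Requires hbase.replication=true on both clusters; the batch-cluster
    // peer itself is registered separately (e.g. shell add_peer).
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor("events");      // hypothetical table
    HColumnDescriptor fam = new HColumnDescriptor("d");          // hypothetical family
    fam.setScope(HConstants.REPLICATION_SCOPE_GLOBAL);           // 1 = ship edits to peers
    desc.addFamily(fam);
    new HBaseAdmin(conf).createTable(desc);
  }
}
```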

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
u want, you could hash the key and > then you wouldn't have a problem of hot spotting. > > > On Sep 4, 2012, at 1:51 PM, Eric Czech wrote: > > > How does the data flow in to the system? One source at a time? > > Generally, it will be one source at a time where these ro
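The quoted suggestion is to hash the key so writes stop concentrating on one region. A small sketch of one way to do that, prepending a few bytes of the key's MD5 digest (names are illustrative); note that range scans over the natural key order are lost.

```java
import java.security.MessageDigest;
import org.apache.hadoop.hbase.util.Bytes;

public class HashedRowKey {
  /** Prepend 4 bytes of MD5(naturalKey) so consecutive keys scatter across regions. */
  public static byte[] toRowKey(byte[] naturalKey) throws Exception {
    byte[] digest = MessageDigest.getInstance("MD5").digest(naturalKey);
    byte[] prefix = new byte[4];
    System.arraycopy(digest, 0, prefix, 0, 4);
    // With a hashed prefix you can no longer range-scan on the natural key.
    return Bytes.add(prefix, naturalKey);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(Bytes.toStringBinary(toRowKey(Bytes.toBytes("source01#2012-09-04"))));
  }
}
```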

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
to be > some sort of incremental larger than a previous value? (Are you always > inserting to the left side of the queue?) > > How are you using the data when you pull it from the database? > > 'Hot spotting' may be unavoidable and depending on other factors, it may

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
ed out. > > (Note: I'm not an expert on how hbase balances the regions across a region > server so I couldn't tell you how it chooses which nodes to place each > region.) > > But what are you trying to do? Avoid a hot spot on the initial load, or > are you looking at

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
be moved to another server to balance the load. You can also > move it manually. > > JM > > 2012/9/4, Eric Czech : > > Thanks again, both of you. > > > > I'll look at pre splitting the regions so that there isn't so much > initial > > conte
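A sketch of the pre-splitting idea mentioned here, creating the table with one region per leading source value so the initial load doesn't all land on a single region (table and family names are made up):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.util.Bytes;

public class PreSplitTable {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTableDescriptor desc = new HTableDescriptor("sourced_data");  // hypothetical name
    desc.addFamily(new HColumnDescriptor("d"));

    // One region boundary per leading source id ("02".."30"), zero-padded so
    // the split points sort the same way the row keys do.
    byte[][] splitKeys = new byte[29][];
    for (int i = 2; i <= 30; i++) {
      splitKeys[i - 2] = Bytes.toBytes(String.format("%02d", i));
    }
    new HBaseAdmin(conf).createTable(desc, splitKeys);
  }
}
```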

Re: Key formats and very low cardinality leading fields

2012-09-04 Thread Eric Czech
our entries starting with 1, or 3, they > > will go on one unique region. Only entries starting with 2 are going to > > be sometimes on region 1, sometimes on region 2. > > > > Of course, the more data you will load, the more regions you will > > have, the less hotspotting you will h

Re: Key formats and very low cardinality leading fields

2012-09-03 Thread Eric Czech
ou write millions of lines starting with a 1. > > If you have one hundred regions, you will face the same issue at the > beginning, but the more data you will add, the more your table will > be split across all the servers and the less hotspotting you will have. > > Can't

Re: Key formats and very low cardinality leading fields

2012-09-03 Thread Eric Czech
between 1 and 30 for each > write, then you will reach multiple region/servers if you have, else, > you might have some hot-spotting. > > JM > > 2012/9/3, Eric Czech : >> Hi everyone, >> >> I was curious whether or not I should expect any write hot spots if I >>
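The 1-to-30 prefix suggestion is essentially salting the key into a fixed number of buckets. A sketch that derives the bucket from the key itself instead of picking it at random, so the same record always maps to the same bucket and a point read only has to check one prefix (names are illustrative):

```java
import org.apache.hadoop.hbase.util.Bytes;

public class SaltedRowKey {
  private static final int BUCKETS = 30;

  /** "07|naturalKey" style salted key; the same input always maps to the same bucket. */
  public static byte[] toRowKey(String naturalKey) {
    int bucket = (naturalKey.hashCode() & Integer.MAX_VALUE) % BUCKETS + 1;
    return Bytes.toBytes(String.format("%02d|%s", bucket, naturalKey));
  }
  // A scan over the full natural-key order now requires 30 bucket scans,
  // one per prefix, merged client-side.
}
```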

Re: MemStore and prefix encoding

2012-08-26 Thread Eric Czech
timestamp it is undefined >> which one you'll get back when you read the next time). So that does not >> make sense. Writes with the same key, column family, qualifier (each with a >> different timestamp) count towards the version limit. >> >> -- Lars >>
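A small illustration of the versioning behaviour described here: writes to the same row, family, and qualifier with different timestamps accumulate as versions (up to the family's version limit), and reading with setMaxVersions returns them all. Table and cell names are made up.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class VersionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "t");                    // hypothetical table
    byte[] row = Bytes.toBytes("r1"), fam = Bytes.toBytes("f"), qual = Bytes.toBytes("q");

    // Two writes to the same cell with different timestamps: two versions.
    table.put(new Put(row).add(fam, qual, 1000L, Bytes.toBytes("v1")));
    table.put(new Put(row).add(fam, qual, 2000L, Bytes.toBytes("v2")));

    Get get = new Get(row);
    get.setMaxVersions();                                     // ask for every retained version
    Result result = table.get(get);
    for (KeyValue kv : result.raw()) {
      System.out.println(kv.getTimestamp() + " -> " + Bytes.toString(kv.getValue()));
    }
    table.close();
  }
}
```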

MemStore and prefix encoding

2012-08-25 Thread Eric Czech
Hi everyone, Does prefix encoding apply to rows in MemStores or does it only apply to rows on disk in HFiles? I'm trying to decide if I should still favor larger values in order to not repeat keys, column families, and qualifiers more than necessary and while prefix encoding seems to negate that
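For reference, data block encoding is enabled per column family; as far as I can tell it applies to HFile blocks (and the block cache), not to the KeyValues sitting in the MemStore, which is what the question is getting at. A minimal sketch with made-up names:

```java
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class PrefixEncodingExample {
  public static void main(String[] args) {
    HTableDescriptor desc = new HTableDescriptor("wide_index");   // hypothetical name
    HColumnDescriptor fam = new HColumnDescriptor("f");
    // PREFIX encoding shortens repeated row/family/qualifier bytes inside
    // HFile blocks; it does not change how KeyValues are held in the MemStore.
    fam.setDataBlockEncoding(DataBlockEncoding.PREFIX);
    desc.addFamily(fam);
  }
}
```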

Multiple scan input split for MR job

2012-08-08 Thread Eric Czech
Hi everyone, I've been searching for a way to specify an MR job on an HBase table using multiple key ranges (instead of just one), and as far as I can tell, the best way is still to create a custom InputFormat like MultiSegmentTableInputFormat and override getSplits to return splits based on multi
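MultiSegmentTableInputFormat is the poster's own class name, not something that ships with HBase. A rough sketch of what such a getSplits override could look like, clipping the default one-split-per-region list against a set of key ranges (ranges hard-coded here for illustration; a real version would pass them through the job configuration):

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.hbase.mapreduce.TableInputFormat;
import org.apache.hadoop.hbase.mapreduce.TableSplit;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.JobContext;

/** Sketch of a multi-range input format: intersect each requested key range
 *  with the default per-region splits and emit one clipped split per overlap. */
public class MultiSegmentTableInputFormat extends TableInputFormat {

  // Hypothetical hard-coded ranges; read them from the job config in practice.
  private static final byte[][][] RANGES = {
      { Bytes.toBytes("0005"), Bytes.toBytes("0010") },
      { Bytes.toBytes("0400"), Bytes.toBytes("0450") },
  };

  @Override
  public List<InputSplit> getSplits(JobContext context) throws IOException {
    List<InputSplit> clipped = new ArrayList<InputSplit>();
    for (InputSplit regionSplit : super.getSplits(context)) {   // one per region
      TableSplit ts = (TableSplit) regionSplit;
      for (byte[][] range : RANGES) {
        byte[] start = max(ts.getStartRow(), range[0]);
        byte[] regionEnd = ts.getEndRow();
        // An empty region end row means "to the end of the table".
        byte[] end = (regionEnd.length == 0) ? range[1] : min(regionEnd, range[1]);
        if (Bytes.compareTo(start, end) < 0) {                  // non-empty overlap
          clipped.add(new TableSplit(ts.getTableName(), start, end, ts.getRegionLocation()));
        }
      }
    }
    return clipped;
  }

  private static byte[] max(byte[] a, byte[] b) { return Bytes.compareTo(a, b) >= 0 ? a : b; }
  private static byte[] min(byte[] a, byte[] b) { return Bytes.compareTo(a, b) <= 0 ? a : b; }
}
```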

Re: Ideal row size

2012-08-08 Thread Eric Czech
the size of a block, so 64KB > including the keys. Having bigger cells would inflate the size of your > blocks but then you'd be outside of the normal HBase settings. > > That, and do some experiments. > > J-D > > On Tue, Aug 7, 2012 at 6:35 AM, Eric Czech wrote:
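For context, the 64KB figure is the per-family HFile block size, which is where it would be changed if bigger cells made that necessary. A one-line sketch (family name made up):

```java
import org.apache.hadoop.hbase.HColumnDescriptor;

public class BlockSizeExample {
  public static void main(String[] args) {
    HColumnDescriptor fam = new HColumnDescriptor("f");   // hypothetical family
    // 64 KB is the default HFile block size; rows much larger than this
    // inflate blocks beyond what HBase is normally tuned for.
    fam.setBlocksize(64 * 1024);
  }
}
```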

Ideal row size

2012-08-07 Thread Eric Czech
Hello everyone, I'm trying to store many small values in indexes created via MR jobs, and I was hoping to get some advice on how to structure my rows. Essentially, I have complete control over how large the rows should be as the values are small, consistent in size, and can be grouped together in

more tables or more rows

2012-08-07 Thread Eric Czech
data store, it wouldn't be wise to have > several small rows. The major purpose of Hbase is to host very large > tables that may go beyond billions of rows and millions of columns. > > Regards, > Mohammad Tariq > > > On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech wrote:

more tables or more rows

2012-08-05 Thread Eric Czech
I need to support data that comes from 30+ sources and the structure of that data is consistent across all the sources, but what I'm not clear on is whether or not I should use 30+ tables with roughly the same format or 1 table where the row key reflects the source. Anybody have a strong argument
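If the single-table route is taken, the usual pattern is a composite row key with a fixed-width source prefix. A small sketch with illustrative names; note that the low-cardinality leading field is exactly the hot-spotting concern discussed in the key-format thread above.

```java
import org.apache.hadoop.hbase.util.Bytes;

public class SourcePrefixedKey {
  /** One table for all sources: "sourceId|recordKey" with a fixed-width source prefix. */
  public static byte[] toRowKey(int sourceId, String recordKey) {
    return Bytes.toBytes(String.format("%02d|%s", sourceId, recordKey));
  }
  // Per-source reads become prefix scans over the "NN|" prefix, while all
  // sources still share one set of regions and one schema.
}
```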

Re: Index building process design

2012-07-25 Thread Eric Czech
new data > into HDFS directly as flat files, run MR over them to create the index and > also put them into HBase for serving. > > Hope that gives you some more ideas to think about. > > -ak > > > On Wednesday, July 11, 2012 at 10:26 PM, Eric Czech wrote: > > > Hi e
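The suggestion here (land raw data as flat files in HDFS, build the index with MR, then put it into HBase for serving) pairs naturally with a bulk load for that last step, which keeps most of the index-building work off the serving regionservers. A hedged sketch using HFileOutputFormat and LoadIncrementalHFiles; the paths, table, family, and line format are all assumptions.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class IndexBulkLoad {

  /** Turn each flat-file line into an index Put (tab-separated key/value lines assumed). */
  public static class IndexMapper
      extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = line.toString().split("\t", 2);
      byte[] row = Bytes.toBytes(parts[0]);
      Put put = new Put(row);
      put.add(Bytes.toBytes("f"), Bytes.toBytes("v"), Bytes.toBytes(parts[1]));
      ctx.write(new ImmutableBytesWritable(row), put);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    Job job = new Job(conf, "index-bulk-load");
    job.setJarByClass(IndexBulkLoad.class);
    job.setMapperClass(IndexMapper.class);
    job.setMapOutputKeyClass(ImmutableBytesWritable.class);
    job.setMapOutputValueClass(Put.class);
    FileInputFormat.addInputPath(job, new Path("/raw/flat-files"));   // hypothetical path
    Path hfiles = new Path("/tmp/index-hfiles");                      // hypothetical path
    FileOutputFormat.setOutputPath(job, hfiles);

    HTable indexTable = new HTable(conf, "index");                    // hypothetical table
    // Sorts and partitions the mapper output into HFiles matching region boundaries.
    HFileOutputFormat.configureIncrementalLoad(job, indexTable);

    if (job.waitForCompletion(true)) {
      // Hand the finished HFiles to the regionservers; much lighter than many Puts.
      new LoadIncrementalHFiles(conf).doBulkLoad(hfiles, indexTable);
    }
  }
}
```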

Re: Index building process design

2012-07-23 Thread Eric Czech
forth between clusters instead of building indexes on one cluster and copying them to another? On Thu, Jul 12, 2012 at 1:26 AM, Eric Czech wrote: > Hi everyone, > > I have a general design question (apologies in advance if this has > been asked before). > > I'd like to bui

Index building process design

2012-07-11 Thread Eric Czech
Hi everyone, I have a general design question (apologies in advance if this has been asked before). I'd like to build indexes off of a raw data store and I'm trying to think of the best way to control processing so some part of my cluster can still serve reads and writes without being affected h