ever been a fan of that setting.
>
> J-D
>
> On Tue, Feb 26, 2013 at 5:04 PM, Sergey Shelukhin
> wrote:
> > should we make this built-in? Sounds like default user intent for
> in-memory.
> >
> > On Tue, Feb 26, 2013 at 2:13 PM, Stack wrote:
> >
> >>
As long as you know your keyspace, you should be able to create your own
> > splits. See TableInputFormatBase for the default implementation (which
> is
> > 1 input split per region)
> >
> >
> >
> >
> >
> > On 10/19/12 9:32 AM, "Eric Czech"
> against the idea, just that it's more of an issue of design.)
>
> -Mike
>
> On Oct 12, 2012, at 7:03 AM, Eric Czech wrote:
>
> > Hi everyone,
> >
> > Are there any tools or libraries for managing HDFS files that are used
> > solely for the purpose of
or did you find that that isn't a reliable
throttling mechanism?
Also, is replication to that batch cluster done via HBase replication
or some other approach?
On Thu, Sep 6, 2012 at 4:08 PM, Stack wrote:
>
> On Wed, Sep 5, 2012 at 6:25 AM, Eric Czech wrote:
> > Hi everyone,
> If you want, you could hash the key and
> then you wouldn't have a problem of hot spotting.
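Hashing here usually means prefixing the row key with a stable, hash-derived bucket so consecutive keys spread across regions. A minimal sketch in plain Java (the `saltedKey` helper name and the 16-bucket count are illustrative assumptions, not from this thread):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedKey {
    // Prefix the original key with a hash-derived bucket so sequential
    // keys land in different regions instead of hot-spotting one server.
    public static String saltedKey(String key, int buckets) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(key.getBytes(StandardCharsets.UTF_8));
            int bucket = (digest[0] & 0xFF) % buckets;
            return String.format("%02d-%s", bucket, key);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The same key always maps to the same bucket, so point reads can
        // recompute the salt; range scans must fan out over all buckets.
        System.out.println(saltedKey("row-000001", 16));
    }
}
```

The trade-off: salting removes hot spots on sequential writes but turns one ordered scan into one scan per bucket.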
>
>
> On Sep 4, 2012, at 1:51 PM, Eric Czech wrote:
>
> > How does the data flow into the system? One source at a time?
> > Generally, it will be one source at a time where these ro
to be
> some sort of increment larger than a previous value? (Are you always
> inserting to the left side of the queue?)
>
> How are you using the data when you pull it from the database?
>
> 'Hot spotting' may be unavoidable and, depending on other factors, it may
> be evened out.
>
> (Note: I'm not an expert on how HBase balances the regions across a region
> server so I couldn't tell you how it chooses which nodes to place each
> region.)
>
> But what are you trying to do? Avoid a hot spot on the initial load, or
> are you looking at
be moved to another server to balance the load. You can also
> move it manually.
>
> JM
>
> 2012/9/4, Eric Czech :
> > Thanks again, both of you.
> >
> > I'll look at pre-splitting the regions so that there isn't so much
> initial
> > contention
Your entries starting with 1 or 3 will
> > go on one unique region. Only entries starting with 2 are going to
> > be sometimes on region 1, sometimes on region 2.
> >
> > Of course, the more data you load, the more regions you will
> > have, the less hotspotting you will have.
you write millions of lines starting with a 1.
>
> If you have one hundred regions, you will face the same issue at the
> beginning, but the more data you add, the more your table will
> be split across all the servers and the less hotspotting you will have.
>
> Can't
between 1 and 30 for each
> write, then you will reach multiple regions/servers; otherwise,
> you might have some hot-spotting.
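Pre-splitting along those 1-30 prefixes means handing the table its region boundaries at creation time instead of waiting for organic splits. A sketch of computing the split keys in plain Java (the zero-padded two-digit prefix format is an assumption; on the HBase side these bytes would go to `Admin.createTable(descriptor, splits)`):

```java
import java.nio.charset.StandardCharsets;

public class SplitPoints {
    // One region per source prefix: boundaries at "02", "03", ..., "30"
    // yield 30 regions for zero-padded row-key prefixes "01".."30".
    public static byte[][] prefixSplits(int sources) {
        byte[][] splits = new byte[sources - 1][];
        for (int i = 2; i <= sources; i++) {
            splits[i - 2] = String.format("%02d", i)
                    .getBytes(StandardCharsets.UTF_8);
        }
        return splits;
    }

    public static void main(String[] args) {
        byte[][] splits = prefixSplits(30);
        System.out.println(splits.length + " split points, first = "
                + new String(splits[0], StandardCharsets.UTF_8));
    }
}
```

Zero-padding matters: with unpadded prefixes, "10" sorts before "2" lexicographically and the boundaries no longer line up with the numeric sources.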
>
> JM
>
> 2012/9/3, Eric Czech :
>> Hi everyone,
>>
>> I was curious whether or not I should expect any write hot spots if I
>>
timestamp it is undefined
>> which one you'll get back when you read the next time). So that does not
>> make sense. Writes with the same key, column family, qualifier (each with a
>> different timestamp) count towards the version limit.
>>
>> -- Lars
>>
Hi everyone,
Does prefix encoding apply to rows in MemStores or does it only apply
to rows on disk in HFiles? I'm trying to decide if I should still
favor larger values in order to not repeat keys, column families, and
qualifiers more than necessary and while prefix encoding seems to
negate that
Hi everyone,
I've been searching for a way to specify an MR job on an HBase table
using multiple key ranges (instead of just one), and as far as I can
tell, the best way is still to create a custom InputFormat like
MultiSegmentTableInputFormat and override getSplits to return splits
based on multi
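The core of such a getSplits override is intersecting each requested key range with the region boundaries and emitting one split per non-empty overlap. A simplified, self-contained model of that logic in plain Java, using strings for keys and an empty stop key to mean "to the end" (real code would work with `byte[]` against TableInputFormatBase; `MultiSegmentTableInputFormat` is just the hypothetical subclass name from the question):

```java
import java.util.ArrayList;
import java.util.List;

public class MultiRangeSplitter {
    // A half-open key range [start, stop); an empty stop means "to the end".
    record Range(String start, String stop) {}

    // Intersect every requested range with every region and emit one
    // split per non-empty overlap, mirroring a getSplits override.
    public static List<Range> splits(List<Range> regions, List<Range> ranges) {
        List<Range> out = new ArrayList<>();
        for (Range region : regions) {
            for (Range r : ranges) {
                String start = max(region.start(), r.start());
                String stop = min(region.stop(), r.stop());
                if (stop.isEmpty() || start.compareTo(stop) < 0) {
                    out.add(new Range(start, stop));
                }
            }
        }
        return out;
    }

    private static String max(String a, String b) {
        return a.compareTo(b) >= 0 ? a : b;
    }

    private static String min(String a, String b) {
        if (a.isEmpty()) return b;
        if (b.isEmpty()) return a;
        return a.compareTo(b) <= 0 ? a : b;
    }
}
```

With two regions split at "m" and requested ranges ["a","c") and ["x","z"), this produces exactly two splits, one per region that actually overlaps a range.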
the size of a block, so 64KB
> including the keys. Having bigger cells would inflate the size of your
> blocks but then you'd be outside of the normal HBase settings.
>
> That, and do some experiments.
>
> J-D
>
> On Tue, Aug 7, 2012 at 6:35 AM, Eric Czech wrote:
Hello everyone,
I'm trying to store many small values in indexes created via MR jobs,
and I was hoping to get some advice on how to structure my rows.
Essentially, I have complete control over how large the rows should be
as the values are small, consistent in size, and can be grouped
together in
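One common way to control row size with small, uniformly sized values is to pack a fixed number of them per row and use the in-row offset as the column qualifier. A plain-Java sketch of the key/qualifier arithmetic (the bucket size of 1000 and the zero-padded formats are illustrative assumptions):

```java
public class GroupedRowKey {
    static final int VALUES_PER_ROW = 1000;

    // Map a global value index to a row key so each row holds a fixed
    // number of small values as separate columns.
    public static String rowKey(long index) {
        return String.format("%010d", index / VALUES_PER_ROW);
    }

    // The in-row offset becomes the column qualifier.
    public static String qualifier(long index) {
        return String.format("%04d", index % VALUES_PER_ROW);
    }

    public static void main(String[] args) {
        System.out.println(rowKey(123456) + ":" + qualifier(123456));
    }
}
```

This keeps rows at a predictable size while still letting a single Get fetch a whole group of values.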
data store, it wouldn't be wise to have
> several small rows. The major purpose of HBase is to host very large
> tables that may go beyond billions of rows and millions of columns.
>
> Regards,
> Mohammad Tariq
>
>
> On Mon, Aug 6, 2012 at 3:18 AM, Eric Czech wrote:
I need to support data that comes from 30+ sources and the structure
of that data is consistent across all the sources, but what I'm not
clear on is whether or not I should use 30+ tables with roughly the
same format or 1 table where the row key reflects the source.
Anybody have a strong argument
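With the single-table design, the usual pattern is a composite row key that leads with the source id, so each source's data stays contiguous and reading one source is just a prefix scan. A sketch (the zero-padded source id and the `'|'` separator are assumptions, not from the thread):

```java
public class CompositeKey {
    // Lead with a fixed-width source id so rows sort by source first
    // and a single source can be read with a prefix scan.
    public static String rowKey(int sourceId, String naturalKey) {
        return String.format("%02d|%s", sourceId, naturalKey);
    }

    // Prefix used to scan everything belonging to one source.
    public static String scanPrefix(int sourceId) {
        return String.format("%02d|", sourceId);
    }
}
```

The fixed width matters for the same lexicographic reason as pre-split prefixes: without padding, source 10 would sort between sources 1 and 2.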
new data
> into HDFS directly as flat files, run MR over them to create the index and
> also put them into HBase for serving.
>
> Hope that gives you some more ideas to think about.
>
> -ak
>
>
> On Wednesday, July 11, 2012 at 10:26 PM, Eric Czech wrote:
>
> > Hi e
forth between clusters instead of building indexes on one cluster
and copying them to another?
On Thu, Jul 12, 2012 at 1:26 AM, Eric Czech wrote:
> Hi everyone,
>
> I have a general design question (apologies in advance if this has
> been asked before).
>
> I'd like to bui
Hi everyone,
I have a general design question (apologies in advance if this has
been asked before).
I'd like to build indexes off of a raw data store and I'm trying to
think of the best way to control processing so some part of my cluster
can still serve reads and writes without being affected heavily