Kevin,
You would want to make your row keys the words.
HBase defines its tablets (called Regions) by a startRow and endRow. So, as you say, a given region may contain "ro" through "ru". Looking up the word "round" would use that region. This is handled automatically by the META table.
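To make that concrete, here is a minimal lookup sketch with the 0.20-era Java client; the table name "words" is a hypothetical stand-in:

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

HBaseConfiguration conf = new HBaseConfiguration();
HTable table = new HTable(conf, "words");

// The client consults META once to find the region whose
// [startRow, endRow) range covers "round", then talks to that
// regionserver directly.
Get get = new Get(Bytes.toBytes("round"));
Result result = table.get(get);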
For a
Answers inline.
-Original Message-
From: Imran M Yousuf [mailto:imyou...@gmail.com]
Sent: Monday, May 17, 2010 8:14 AM
To: hbase-user@hadoop.apache.org
Subject: Availability Transaction and data integrity
Hi,
Currently we are designing an architecture for an Accounting SaaS and
Transaction and data integrity
Thanks, my answers are inline too.
On Mon, May 17, 2010 at 9:50 PM, Jonathan Gray jg...@facebook.com
wrote:
Answers inline.
<snip/>
* We will go live in January 2011; in that time frame, should we develop using 0.21-SNAPSHOT or should we stick to 0.20.x?
I'm not sure I understand why you distinguish between small HFiles and a single behemoth HFile. Are you trying to understand more about disk space or about I/O patterns?
It looks like your understanding is correct. At the worst point, a given Region will use twice its disk space during a major compaction.
We should do better at scheduling major compactions over a longer period of
time if we keep it as a background process.
Also, there's been some discussion about adding heuristics to never major compact very old and/or very large HFiles, to prevent old, rarely read data from being rewritten over and over.
So the question is how large to make your regions if you have 100s of TBs?
How many nodes will this be on and what are the specs of each node?
Many people run with 1-2GB regions or higher.
Primarily the issue will be memory usage and also the propensity for splitting.
With that dataset size,
To: hbase-user@hadoop.apache.org
Subject: Re: Additional disk space required for Hbase compactions..
Hello List,
On 17/05/10 20:26, Jonathan Gray wrote:
Same with major compactions (you would definitely need to turn them
off and control them manually if you need them at all).
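For reference, a hedged sketch of doing exactly that with the 0.20-era admin API; the table name is hypothetical, and hbase.hregion.majorcompaction is the periodic major-compaction interval (0 disables it):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

HBaseConfiguration conf = new HBaseConfiguration();
// Turn off time-based major compactions (interval of 0 = off).
conf.setLong("hbase.hregion.majorcompaction", 0);

// Trigger one manually, e.g. from a cron-driven admin job.
HBaseAdmin admin = new HBaseAdmin(conf);
admin.majorCompact("mytable");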
How would you
The HBase process just died? The logs end suddenly with nothing about shutting
down, no exceptions, etc? Did you check the .out files as well?
-Original Message-
From: Jorome m [mailto:jorom...@gmail.com]
Sent: Tuesday, May 11, 2010 5:58 PM
To: hbase-user@hadoop.apache.org
I would argue that the primary reasons for versioning have nothing to do with rescuing users or being able to recover data.
To reiterate what others have said, the reason that HBase/BigTable is versioned is the immutable nature of its data (an update is a newer version written on top of the old one).
Hey Saajan,
Does your data have any large pieces or is it mostly just short indexed fields?
A Solr/HBase hybrid definitely sounds interesting but is a big undertaking.
To build on what Edward is suggesting, to be able to efficiently do this type of query directly on HBase you may need to have ... under your 4-second requirement.
What is the concurrency and load like for this application? How many
queries/sec do you expect?
-Original Message-
From: Jonathan Gray [mailto:jg...@facebook.com]
Sent: Monday, May 03, 2010 9:49 AM
To: hbase-user@hadoop.apache.org
Subject: RE: HBase
One option would be to just do the delete. Deletes are cheap and nothing bad will happen if you delete data which doesn't exist (unless you issue a delete of the latest version, which does require a value to exist).
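A rough sketch of both variants with the 0.20-era client (table, row, family, and qualifier names are hypothetical):

import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.util.Bytes;

// assuming an open HTable named 'table'
Delete d = new Delete(Bytes.toBytes("row1"));

// Safe even if the cell never existed: removes all versions.
d.deleteColumns(Bytes.toBytes("fam"), Bytes.toBytes("qual"));

// This variant removes only the latest version, so it does
// depend on a value actually being there:
// d.deleteColumn(Bytes.toBytes("fam"), Bytes.toBytes("qual"));

table.delete(d);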
-Original Message-
From: Michael Dalton [mailto:mwdal...@gmail.com]
Sent:
Hey Chris,
That's a really significant slowdown. I can't think of anything obvious that
would cause that in your setup.
Any chance of some regionserver and master logs from the time it was going
slow? Is there any activity in the logs of the regionservers hosting the
regions of the table
Agreed that it's good to try to be agenda-less, but in the past we've always
taken the first couple hours to do a group discussion around some of the key
topics. Given there's a bunch of fairly major changes/testing going on these
days, I think there is a good bit of stuff that would benefit
and each entry might be updated/modified at least once in a
week.
Regards,
kranthi
On Wed, Mar 31, 2010 at 10:23 PM, Jonathan Gray jg...@facebook.com
wrote:
Kranthi,
HBase can handle a good number of tables, but that means tens or maybe a hundred. If you have 500 tables you should definitely
Your client caches META information so it only needs to look it up once per
client. If regions split or move, the client will get a
NotServingRegionException from the regionserver, and only then will it re-query
META for a new location.
Can you explain more about what exactly your goal is
Shen,
You are right. Currently the default flush size is 64MB, the
compactionThreshold is 3, and the splitSize/max.filesize is 256MB. So we end
up compacting into a 192MB file when filling an empty region.
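Those defaults map to config keys you can override; a hedged sketch with the 0.20-era keys (the values shown are the defaults, and 3 flushes x 64MB gives the 192MB file mentioned above):

import org.apache.hadoop.hbase.HBaseConfiguration;

HBaseConfiguration conf = new HBaseConfiguration();
conf.setLong("hbase.hregion.memstore.flush.size", 64 * 1024 * 1024); // flush size
conf.setInt("hbase.hstore.compactionThreshold", 3);                  // files before a compaction
conf.setLong("hbase.hregion.max.filesize", 256 * 1024 * 1024);       // split size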
Take a look at HBASE-2375 (https://issues.apache.org/jira/browse/HBASE-2375).
That
, Jonathan Gray jg...@facebook.com
wrote:
Imran,
It's impossible to give good advice on cluster size and hardware
configuration without some idea of the requirements.
Sorry my mistake, I should have elaborated a little bit more. Please
find some requirements below inline.
How much
Can you explain more about what information you are trying to find out?
You have an existing HDFS cluster and want to measure the additional impact of adding HBase? Is that in terms of reads/writes/IOPS or data size?
If you have a steady-state set of metrics for HDFS w/o HBase, can you not just
with a scan result as the input that deletes a range on each task could be an efficient way to do these kinds of mass deletes?
On 04/03/2010 01:26 AM, Jonathan Gray wrote:
Juhani,
Deletes are really special versions of Puts (so they are equally
fast). I suppose it would be possible to have
Imran,
It's impossible to give good advice on cluster size and hardware configuration
without some idea of the requirements.
How much data? How will the data be queried? What kind of load do you expect?
You are going to be doing offline batch/MapReduce, online random access, as
well as
It's likely not the actual deserialization itself but rather the time
to read the entire row from hdfs. There are some optimizations that
can be made here (using block index to get all blocks for a row with a
single hdfs read, tcp socket reuse, etc)
On Apr 3, 2010, at 11:35 AM, Sammy Yu
Row=product:zip:day ?
Basically you can create additional tables with other keys to give yourselves
the aggregates you need. You'll need to decide how many to make. With the
above row, you could actually get grouping by state by scanning a range of
zips. But if that's not efficient enough,
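For example, a hedged sketch of such a range scan over the product:zip:day layout (table and key values hypothetical):

import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// assuming an open HTable named 'table'
// All rows for product "hammer" in zips 94000-94999, any day:
Scan scan = new Scan(Bytes.toBytes("hammer:94"), Bytes.toBytes("hammer:95"));
ResultScanner scanner = table.getScanner(scan);
for (Result r : scanner) {
  // aggregate here
}
scanner.close();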
Juhani,
Deletes are really special versions of Puts (so they are equally fast). I
suppose it would be possible to have some kind of special filter that issued
deletes server-side but seems dangerous :) That's beyond even the notion of
stateful scanners which are tricky as is.
MultiDelete
Chen,
In general, you're going to get significantly different performance on clusters
of the size you are testing with. What is the disk setup?
Also, 2GB of ram is simply not enough to do any real testing. I recommend a
minimum of 2GB of heap for each RegionServer alone, though I strongly
Three cheers for Andrew and Trend Micro! This is very awesome. HBaseCon?
HBase Summit?
-Original Message-
From: Andrew Purtell [mailto:apurt...@apache.org]
Sent: Friday, April 02, 2010 11:39 AM
To: hbase-user@hadoop.apache.org
Subject: come to HUG10!
We are holding an all day
For 1/2, it seems that your row key design is ideal for those queries. You say it's inefficient because you need to scan the whole session of data containing "hammer"... but wouldn't you always have to do that unless you were doing some kind of summary/rollups? Even in a relational database you
on bigger-than-memory data, the cache effectiveness would be greatly improved.
2010/3/31 Jonathan Gray jg...@facebook.com
There are many implications related to this. The core trade-off as I see it is between storage and read performance.
With the current setup, after we read blocks
Kranthi,
HBase can handle a good number of tables, but that means tens or maybe a hundred. If you have 500 tables you should definitely be rethinking your schema design. The issue is less about HBase being able to handle lots of tables, and much more about whether scattering your data across lots of
Stack pointed this out to me yesterday which could be of interest to you:
http://wiki.apache.org/incubator/HeartProposal
http://heart.korea.ac.kr/
-Original Message-
From: Andrew Purtell [mailto:apurt...@apache.org]
Sent: Wednesday, March 31, 2010 9:27 AM
To:
There are many implications related to this. The core trade-off as I see it is
between storage and read performance.
With the current setup, after we read blocks from HDFS into memory, we can just
usher KeyValues straight out of the on-disk format and to the client without
any further
I'm not sure exactly what you're referring to with currentTimeMillis() being
unreliable on virtual machines.
Regardless of your environment, you should be running NTP to synchronize clocks.
Otherwise, take a look in the mailing archives, there have been a number of
lengthy discussions on HBase
Victor,
Rows, column qualifiers, and values are all byte[] in HBase. Since they can be arbitrary binary (and you cannot put arbitrary binary data into XML or other text formats), they must be encoded in some way. Base64 is a common way to represent binary data in ASCII.
JG
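For illustration, a hedged sketch of round-tripping a binary value through Base64, using java.util.Base64 from modern JDKs; the family/qualifier names are hypothetical:

import java.util.Base64;
import org.apache.hadoop.hbase.util.Bytes;

// assuming a Result named 'result' from a Get or Scan
byte[] value = result.getValue(Bytes.toBytes("fam"), Bytes.toBytes("qual"));

// Encode to ASCII-safe text for embedding in XML or JSON...
String encoded = Base64.getEncoder().encodeToString(value);

// ...and decode back to the original bytes on the other side.
byte[] decoded = Base64.getDecoder().decode(encoded);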
-Original Message-
Good thing it throws that exception. It definitely would not perform any
server-side actions as Ryan said.
-Original Message-
From: Jeyendran Balakrishnan [mailto:jbalakrish...@docomolabs-usa.com]
Sent: Thursday, March 25, 2010 9:34 AM
To: hbase-user@hadoop.apache.org
Subject: RE:
that one can't use the iterator
to modify the iterable.
-jp
-Original Message-
From: Jonathan Gray [mailto:jg...@facebook.com]
Sent: Thursday, March 25, 2010 9:45 AM
To: hbase-user@hadoop.apache.org
Subject: RE: Is it safe to delete a row inside a scanner loop?
Good thing it throws
How many regions in this table?
Can you describe in more detail what exactly the test does?
Random read, then join (with another hbase table?), then random write back to
HBase?
-Original Message-
From: y_823...@tsmc.com [mailto:y_823...@tsmc.com]
Sent: Tuesday, March 23, 2010 10:57
As Edward said, try increasing HBase RegionServer heap to 4GB. Look around the
wiki for GC tuning information.
What does your data look like and what is your read/write pattern? Do you have
large rows or columns?
-Original Message-
From: Edward Capriolo
At some point joins may be necessary when denormalization is not possible.
There is no built-in mechanism to do it. It would be a series of additional
Get calls to the second table you are joining against. This would be helped
significantly with a parallel MultiGet which will hopefully make
the data? I've been
searching in the samples and I can't find a clear and simple example.
Thanks
Raffi
-Original Message-
From: Jonathan Gray [mailto:jg...@facebook.com]
Sent: Friday, March 19, 2010 12:03 PM
To: hbase-user@hadoop.apache.org
Subject: RE: How to join tables in HBase
No one has petabytes in HBase today. I would say the minimum scale at which it makes sense is hundreds of gigabytes to terabytes. As is being said now, medium data, not necessarily big data :)
The other reasons to use HBase would be for high availability, distribution,
and for the very different
)
at
org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:648)
at
org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:915)
On Fri, Mar 12, 2010 at 9:08 PM, Jonathan Gray jl...@streamy.com wrote:
Seems like something weird is going on with your regionservers and
balancing.
Can you post big
Just FYI, after sharing this thread with my client, they've decided to go
for some monthly dedicated servers from softlayer.com instead of EC2. For
one, they will be using lots of inbound traffic and they have a promo for
free inbound. 2TB/mo outbound for free as well. When you take that
exception at all:
http://pastebin.com/80949RK2
On Sat, Mar 13, 2010 at 10:10 AM, Jonathan Gray jl...@streamy.com
wrote:
Ted,
Your attachments didn't come through. Try putting them up on the web or pastebin somewhere.
What's happening in the RegionServer logs between the time
Best regards,
- Andy
- Original Message
From: Jonathan Gray
To: hbase-user@hadoop.apache.org
Sent: Thu, March 11, 2010 3:01:22 PM
Subject: RE: [databasepro-48] HUG9
Pardon the link vomit, hopefully this comes across okay...
HBase Project Update by Jonathan Gray
For anyone not in the bay area, we had HUG9 last night. Links to the
presentations below.
JG
From: databasepro-48-annou...@meetup.com
[mailto:databasepro-48-annou...@meetup.com] On Behalf Of Jonathan Gray
Sent: Thursday, March 11, 2010 1:57 PM
To: databasepro-48-annou...@meetup.com
Fleming,
We're looking at a few different ideas for this problem right now.
One is to make an efficient method for warming up a client's META cache by issuing a META scan for a single table or all tables. This will be significantly faster than lots of Gets.
The other bigger change is that META
will hopefully fill in
some details: http://www.slideshare.net/ghelmling/hbase-at-meetup
There are also some great presentations by Ryan Rawson and Jonathan Gray on how they've used HBase for realtime serving on their sites. See the presentations wiki page:
http://wiki.apache.org/hadoop/HBase
Ferdy,
Another strategy might be to not issue the delete and just insert a new
version on top of the old one.
Whether this makes sense or not depends on whether the columns for that row
change between versions. If it's always the same columns then you can just
re-insert and when you grab the
Hey Michal,
There was an issue in the past where ROOT would not be properly reassigned
if there was only a single server left.
https://issues.apache.org/jira/browse/HBASE-1908
But that was fixed back in 0.20.2.
Can you post the master log?
JG
-Original Message-
From: Michał
This is not an HBase or Hadoop requirement... this is how Java works when
pointing the classpath to jars.
-Original Message-
From: N Kapshoo [mailto:nkaps...@gmail.com]
Sent: Thursday, March 04, 2010 12:30 PM
To: hbase-user@hadoop.apache.org
Subject: Re: ClassNotFoundException for
What version of HBase are you running? There were some recent fixes related
to DNS issues causing regionservers to check-in to the master as a different
name. Anything strange about the network or DNS setup of your cluster?
ZooKeeper is sensitive to pauses and network latency, as would any
Just to reiterate and confirm what Erik is saying, building the Map will
internally iterate all of the KeyValues, dissect each one, and do lots of
insertions in the map. It will be less efficient than just iterating the
list of KVs directly yourself and pulling out only what you need from each.
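A hedged sketch of that direct iteration with the 0.20-era API:

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

// assuming a Result named 'result' from a Get or Scan
// Walk the raw KeyValues instead of building the nested Map.
for (KeyValue kv : result.raw()) {
  String qualifier = Bytes.toString(kv.getQualifier());
  long timestamp = kv.getTimestamp();
  byte[] value = kv.getValue();
  // pull out only what you need here
}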
Yes, you could have issues if data has the same timestamp (only one of them
being returned).
As far as inserting things not in chronological order, there are no issues if
you are doing scans and not deleting anything. If you're asking for the latest
version of something with a Get, there are
You can either do exports at the HBase API level (a la Export class), or you
can force flush all your tables and do an HDFS level copy of the /hbase
directory (using distcp for example).
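A hedged sketch of the flush half (the admin call is 0.20-era; the table name is hypothetical, and the copy itself happens outside HBase):

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

HBaseAdmin admin = new HBaseAdmin(new HBaseConfiguration());
// Force memstore contents into HFiles so the on-disk copy is complete.
admin.flush("mytable");
// then, from the shell, something like:
//   hadoop distcp hdfs://src/hbase hdfs://dest/hbase-backup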
-Original Message-
From: Ted Yu [mailto:yuzhih...@gmail.com]
Sent: Tuesday, March 02, 2010 4:49 AM
FYI
Looks like they'll be talking at least somewhat about the new HBase
integration.
-Original Message-
From: Zheng Shao [mailto:zsh...@gmail.com]
Sent: Friday, February 26, 2010 1:56 PM
To: hive-u...@hadoop.apache.org; hive-...@hadoop.apache.org;
core-u...@hadoop.apache.org;
What are the issues with developing w/ HBase on Windows 7 x64? I'm doing
that right now and nothing was any different from doing it on Windows XP
x86.
I haven't run it to the point of actually doing a start-hbase.sh, but rather
running things like HBaseClusterTestCase w/o a problem.
JG
A bit late to the party but my two cents...
I am currently using a single node HBase instance in production (beta) for a
client.
The use case is simply to add random access capabilities atop some large
HDFS files. It's static data (rebuilt every few weeks) and close to 1TB or
so (with plans to
There is not currently a built-in method of doing parallel Gets. It would not be especially difficult to implement something in Java with an ExecutorService and Futures.
This is a proposed feature for 0.21 and there is a rough patch available
over in HBASE-1845.
JG
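A rough sketch of that approach against a pre-0.21 client (table name and row list are hypothetical; connection reuse and error handling omitted):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;

final HBaseConfiguration conf = new HBaseConfiguration();
ExecutorService pool = Executors.newFixedThreadPool(8);
List<Future<Result>> futures = new ArrayList<Future<Result>>();

// 'rows' is the list of row keys to fetch (assumed given).
for (final byte[] row : rows) {
  futures.add(pool.submit(new Callable<Result>() {
    public Result call() throws Exception {
      // HTable is not thread-safe, so each task gets its own instance.
      HTable table = new HTable(conf, "mytable");
      return table.get(new Get(row));
    }
  }));
}

for (Future<Result> f : futures) {
  Result r = f.get(); // blocks until that Get completes
  // process r
}
pool.shutdown();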
TuxRacer69 wrote:
Hello
If you need to be able to scan/lookup based on two different key/values,
then you will most likely need duplicate tables or duplicate rows.
This is common when you need to support two different lookup/read patterns.
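A hedged sketch of the duplicate-table pattern, with two Puts per logical write (all names hypothetical):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Primary table keyed by user id...
Put byId = new Put(Bytes.toBytes(userId));
byId.add(Bytes.toBytes("info"), Bytes.toBytes("email"), Bytes.toBytes(email));
usersTable.put(byId);

// ...and a second table keyed by email for the other read pattern.
Put byEmail = new Put(Bytes.toBytes(email));
byEmail.add(Bytes.toBytes("ref"), Bytes.toBytes("userid"), Bytes.toBytes(userId));
usersByEmailTable.put(byEmail);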
Lars Francke wrote:
I have another schema design question. I hope you don't
Peter,
It's difficult to know what might cause performance issues on a
standalone instance. It often does not give a good idea of the
performance you would get on a fully distributed setup.
Are you monitoring the hbase logs? Anything interesting? How much heap
are you giving the
It's fairly easy to run HDFS into the ground if you eat up all the
resources.
It's also fairly easy to run a Linux machine into the ground if you eat
up all the resources; or just about anything by starving it of CPU.
I don't disagree with a read-only mode if the server is full, but in
Could be possible that if the compactions are very slow running, and
we're not counting snapshots as part of the heap usage, then we won't
start forcing more compactions because of heap pressure (not that this
would even help much if io is saturated).
Throw some heavy concurrent reading in
These client error messages are not particularly descriptive as to the root cause (they are fatal errors, or close to it).
What is going on in your regionservers when these errors happen? Check
the master and RS logs.
Also, you definitely do not want 19 ZooKeeper nodes. Reduce that to 3 or 5.
Created HBASE-1937
https://issues.apache.org/jira/browse/HBASE-1937
Head over there to discuss this further. Thanks.
JG
Doug Meil wrote:
Hi there-
I'd like to suggest a convenience method on Result for getting the timestamp of
a value if it hasn't already been suggested before.
Getting
Personally, when I need to dig into a complex result with multiple
columns and versions, I iterate over the KeyValues directly rather than
messing with the Map-based return formats from Result.
In your example, are you just returning versions/values for a single
column? Maybe we could add
Do you see the files/blocks in HDFS?
Ananth T. Sarathy wrote:
I just restarted Hbase and when I go into the shell and type list, none of
my tables are listed, but I see all the data/blocks in s3.
here is the master log when it's restarted
http://pastebin.com/m1ebb7217
this happened once
I'm not exactly sure what you are doing, but it is not intended that you
would copy any code into the HBase Master.
You can run client programs standalone, they just need to have the
proper jars in their classpath (hadoop, hbase, zookeeper, log4j).
JG
Liu Xianglong wrote:
Hi, everyone. I am
Dmitriy,
Are you using any system/resource monitoring software? You should be able to see if you are IO, CPU, Memory/GC, or Network bound by doing some investigating during the import. This should tell you whether you can get better performance or not (and if things are maxed, you can figure
Not S3, HDFS. Can you check the web UI or use the command-line interface?
$HADOOP_HOME/bin/hadoop dfs -lsr /hbase
...would be a good start
Ananth T. Sarathy wrote:
i see all my blocks in my s3 bucket.
Ananth T Sarathy
On Mon, Oct 26, 2009 at 12:17 PM, Jonathan Gray jl...@streamy.com
Ananth T. Sarathy ananth.t.sara...@gmail.com wrote:
I am confused, why would I need a hadoop home if I am using s3 and the jets3t package to write to s3?
Ananth T Sarathy
On Mon, Oct 26, 2009 at 12:25 PM, Jonathan Gray jl...@streamy.com wrote:
Not S3, HDFS. Can you check the web UI or use
Needs to be run from $HADOOP_HOME not hbase home.
Ananth T. Sarathy wrote:
When I run this from my hbase home I get
-bash: bin/hadoop: No such file or directory
here are my libs
AgileJSON-2009-03-30.jar jetty-util-6.1.14.jar
commons-cli-2.0-SNAPSHOT.jar jruby-complete-1.2.0.jar
Doug,
1. This is a known issue and is currently being addressed in HBASE-1829
(https://issues.apache.org/jira/browse/HBASE-1829). This is currently
targeted at 0.21, but feel free to review the current patch and add in
your comments, if we get a working and tested patch soon then I would
Erik,
I just put up a patch with the fix you described and a unit test that
replicates the behavior.
Please test to confirm it works. If so, drop a note in the issue and I
will commit.
Thanks for finding the bug.
JG
Erik Rozendaal wrote:
Issue created: HBASE-1927
On 21 okt 2009, at
You're generally on the right track. In many cases, rather than using
secondary indexes in the relational world, you would have multiple
tables in HBase with different keys.
You may not need a table for each query, but that depends on your
requirements of performance and the specific details
While you set the max versions to 1, that is only enforced on major
compactions.
So re-inserting all the data will actually mean you have double the data
for some period of time. After a certain amount of time, a major
compaction will occur in the background, and at that point only 1
You are running all of these virtual machines on a single host node?
And they are all sharing 4GB of memory?
That is a major issue. First, GC pauses will start to lock things up
and create time outs. Then swapping will totally kill performance of
everything. Is that happening on your
That depends on how much memory you have for each node. I recommend
setting heap to 1/2 total memory
In general, I do not recommend running with VMs... Running two hbase
nodes on a single node in VMs vs running one hbase node on the same node
w/o VM, I don't really see where you'd get any
There is a distinct difference between adding columns and adding
column families.
As you hinted at in a previous e-mail, you really wanted a single family
with multiple qualifiers in it.
Creating a table, disabling it, modifying it (adding column _families_),
enabling it, and repeating
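That cycle looks roughly like this with the 0.20-era admin API (table and family names hypothetical):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// assuming an HBaseAdmin named 'admin'
// Adding a column family requires the table to be offline.
admin.disableTable("mytable");
admin.addColumn("mytable", new HColumnDescriptor("newfam"));
admin.enableTable("mytable");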
Are you currently being limited by network throughput? I wouldn't
become obsessed with data locality until it becomes the bottleneck.
Even the naive implementation of this would not be entirely simple...
but then what do you do if the regions on that node changed during the
course of the map
Yannis,
Excellent debug work! Thanks.
I just filed HBASE-1908 and will do some testing on this issue today.
https://issues.apache.org/jira/browse/HBASE-1908
JG
Yannis Pavlidis wrote:
Hey Ryan,
I performed additional testing with some alternate configurations and the problem arises (ONLY)
Mark,
I'm not sure exactly what you mean.
Each Result object is for a single row. You can determine the row with
Result.getRow().
A row contains families, qualifiers, timestamps, and values.
To get the value for familyA and qualifierB use:
Result.getValue(Bytes.toBytes("familyA"), Bytes.toBytes("qualifierB"))
One recommendation: be sure to put the documents in a separate family from the metadata. This will prevent you from having to rewrite the documents during compactions (since you expect frequent updates to the metadata but not to the documents).
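A hedged sketch of such a schema at table-creation time (names hypothetical):

import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;

// assuming an HBaseAdmin named 'admin'
HTableDescriptor desc = new HTableDescriptor("docs");
desc.addFamily(new HColumnDescriptor("meta")); // small, frequently updated
desc.addFamily(new HColumnDescriptor("doc"));  // large documents, rarely rewritten
admin.createTable(desc);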
stack wrote:
On Wed, Oct 14, 2009 at 2:44 AM, Dan Harvey
Nothing in HBase is designed to handle an eventual consistency data
store underneath.
In general, if a file that HBase thinks exists is not accessible on the
file system, HBase will become unstable and you would probably lose
access to that region until the system was restarted or the region
Digging in myself, but filed HBASE-1889.
I put up a quick patch already, would you mind giving it a try Zheng?
https://issues.apache.org/jira/browse/HBASE-1889
Thanks.
JG
On Tue, October 6, 2009 1:40 am, Zheng Shao wrote:
I compiled hbase trunk and started it using bin/start-hbase.sh.
I
This is being worked on. Ideally, a solution would batch things by region and then by regionserver, so that the total number of RPC calls would be at most the number of servers.
Follow HBASE-1845 and related issues.
You can use threads to add some parallelism to the multiple Gets in your
That is the behavior for SCVF. The other filters generally don't pay attention to versions, but SCVF is special because it makes its decision once it trips over the sought-after column (the first/most recent version of it).
What exactly are you trying to do? Could you use ValueFilter instead?
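For reference, a hedged SCVF usage sketch with the 0.20-era filter API (family, qualifier, and value are hypothetical):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

SingleColumnValueFilter filter = new SingleColumnValueFilter(
    Bytes.toBytes("fam"), Bytes.toBytes("qual"),
    CompareFilter.CompareOp.EQUAL, Bytes.toBytes("someValue"));
// Skip rows that lack the column entirely.
filter.setFilterIfMissing(true);

Scan scan = new Scan();
scan.setFilter(filter);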
Is there a reason you have the split size set to 2MB? That's rather
small and you'll end up constantly splitting, even once you have good
distribution.
I'd go for pre-splitting, as others suggest, but with larger region sizes.
Ryan Rawson wrote:
An interesting thing about HBase is it really
Very strange.
Are you able to use the shell? $HBASE_HOME/bin/hbase shell
Type 'help' to see the options. To scan your table, type: scan 'tableName'
Zheng Lv wrote:
Hello J.G,
Thank you for your reply.
My hbase version is the newest: 0.20.0.
I have two tables, both having
Guillaume,
Thanks for providing more detail.
So, as I understand it, you are already storing the URL -> Group relationship (1:1), but you need to store the Group -> URLs relationship (1:N).
My solution would be to have a urls family in your GROUPS table. And
for each URL within a group, you
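A hedged sketch of that layout, storing each member URL as a column qualifier in the urls family (names hypothetical):

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// One row per group; each URL becomes a qualifier, so
// Group -> URLs is a single-row read.
Put p = new Put(Bytes.toBytes(groupId));
p.add(Bytes.toBytes("urls"), Bytes.toBytes(url), Bytes.toBytes(""));
groupsTable.put(p);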
My feeling is the same as others. It is nice, but I always dig into logs
instead.
+1 on dropping it for now.
JG
On Thu, September 17, 2009 11:04 pm, stack wrote:
It's a sweet feature, I know how it works, but I find myself never really
using it. Instead I go to logs because there I can get
How many rows are you scanning? The code where you are iterating
through ... is also relevant, can you post it? And it would be
helpful if you could post more of the regionserver log file.
Also, which version are you running?
Zheng Lv wrote:
Hello Everyone,
I got some exceptions when I
First, I would recommend you try upgrading to HBase 0.20.0. There are a
number of significant improvements to performance and stability. Also,
you have plenty of memory, so give more of it to the HBase Regionserver
(especially if you upgrade to 0.20, give HBase 4GB or more) and you will
see
I just committed fixes to 0.20.1 branch for SingleColumnValueFilter.
You can grab the latest 0.20 branch from SVN, or you can apply the fix
yourself from HBASE-1821.
https://issues.apache.org/jira/browse/HBASE-1821
There was also another filter patch that went in, HBASE-1828.
Please check
In a number of cases, I don't do any insert-time transactions at all and
rely on periodic consistency checks. I can deal with stale indexes for
short periods of time without a problem and would rather not pay the
upfront cost.
As for updating 1000 data rows and then 1000 index updates, you'd
What happened on the server? Did you look at the regionserver logs?
You seem to be misusing the API. scan.addColumn(foobar) is incorrect; that is the old-API style (we should mark it deprecated or throw a warning on it, if not there already).
I think what you were looking for was
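Presumably the two-argument form of the newer API; a hedged sketch (family and qualifier hypothetical):

import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

Scan scan = new Scan();
// New-style API: family and qualifier as separate byte[] arguments,
// not a single "family:qualifier" string.
scan.addColumn(Bytes.toBytes("fam"), Bytes.toBytes("qual"));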
Sometimes you have to, simple as that.
There are tools out there like Cascading (http://www.cascading.org) that
are designed to help write multi-job chains.
JG
Xine Jar wrote:
Hello,
I have already written several simple MapReduce applications, always 1 job/application.
Assume I want to