Using a simple count of records or bytes returned is not enough to
define a good complexity measurement. Things like rows scanned, rows
filtered, RPC calls, etc. in ScanMetrics are very helpful to inform it, though!
Good on you sir,
Alex
On Mon, Oct 29, 2018 at 11:05 AM Stack wrote:
> On Wed, Oct 24, 2
with some client or
server-side logic?
Any ideas are welcome!
Thank you in advance,
Alex Baranau
plan for the first event to be hosted by Cask at its HQ in Palo Alto in
end of June.
Thank you,
Alex Baranau
would cause
that behavior... Btw, why is the 25th not collocated with a datanode?
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase
On Fri, May 15, 2015 at 8:12 PM, Louis Hust wrote:
> Hi, Esteban,
>
> Hadoop Version 2.2.0, r1537062
2).
Here's the BaseRegionObserver implementation [2].
On a side note, be sure not to overuse the versions of a Cell. Many times
using columns is a better schema design.
Cheers,
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase
[1]
https://
d to some extent by upping the region
size.
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications on
Hadoop & HBase
On Wed, Mar 11, 2015 at 7:00 PM, David chen wrote:
> hbase.store.delete.expired.storefile is true in file
> hbase-0.98.5/hbase-serv
Quick question: have you by any chance noticed the region number growing a
lot over the course of your measurements? Note that regions are not merged
back automatically if they shrink (incl. due to TTL) after being split (
http://hbase.apache.org/book.html#ops.regionmgt)
Alex Baranau
--
http
unless files are deleted,
they occupy space in HDFS).
Alex Baranau
http://cdap.io - open source framework to build and run data applications on
Hadoop & HBase
On Tue, Mar 10, 2015 at 9:15 PM, David chen wrote:
> Thanks lars,
> I ever ran scan to test TTL for several times, the data ex
Also, you could use an RDBMS behind a key-value abstraction, to start with,
while keeping your app design clean of RDBMS specifics.
Alex Baranau
[1] https://github.com/google/leveldb
[2] https://github.com/dain/leveldb
[3] http://cdap.io
[4]
https://github.com/caskdata/cdap/blob/develop/cdap-api/s
CCing HBase's user ML.
Could you give an example of the row key and example of two different
queries you are making to better understand your case?
Thank you,
Alex Baranau
--
http://cdap.io - open source framework to build and run data applications
on Hadoop & HBase
On Mon, Mar 9,
This is an improved
version with range support, a better API, and documentation.
Alex Baranau
On Thu, Mar 28, 2013 at 10:38 AM, Robert Hamilton <
rhamil...@whalesharkmedia.com> wrote:
> Hi all. It it possible to test FuzzyRowFilter from the shell? If so, could
> somebody kindly point me to
s you will use a
hash-based solution, at least in the beginning and in the simplest cases.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
[1] http://search-hadoop.com/m/TjkXd11qhLS
On Wed, Dec 19, 2012 at 6:04 PM, David Arthur wrote:
> I wasn't
y for that).
Thank you,
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
[1] https://github.com/sematext/HBaseWD
On Tue, Dec 18, 2012 at 12:24 PM, Michael Segel
wrote:
> Quick answer...
>
> Look at the salt.
> Its just a number from
by choosing the number of possible 'salt' prefixes (which
could be derived from hashed values, etc.) you can balance between
efficient write distribution and the ability to run fast range scans.
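A minimal sketch of that trade-off in plain Java (hypothetical names; HBaseWD's actual API differs): the bucket count decides how widely writes spread and how many per-bucket scans a range query must fan out into.

```java
public class SaltedKeys {
    // Assumed bucket count: more buckets spread writes better, but a range
    // scan must then be fanned out into one scan per bucket prefix.
    static final int BUCKETS = 8;

    // Prefix the original key with a bucket id derived from its hash,
    // so the same key always maps to the same bucket.
    static String salt(String originalKey) {
        int bucket = Math.abs(originalKey.hashCode() % BUCKETS);
        return bucket + "_" + originalKey;
    }

    // A "distributed" range scan = one ordinary scan per bucket prefix.
    static String[] scanStartKeys(String startKey) {
        String[] starts = new String[BUCKETS];
        for (int i = 0; i < BUCKETS; i++) {
            starts[i] = i + "_" + startKey;
        }
        return starts;
    }
}
```

With few buckets the fan-out stays cheap; with many buckets sequential writes distribute almost evenly but every range query costs BUCKETS scans.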
Hope this helps
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase
the contract here is a transaction), so (currently) you would
> get unnecessarily reduced concurrency using that API for changes that do
> not need to be atomic.
>
>
> Also note that a Put(List) operation already writes multiple updates
> to a single WALEdit (doing a b
Or is it simply not efficient (is there more to it
besides what I described above)?
Thank you,
Alex Baranau
--
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
[1] https://issues.apache.org/jira/browse/HBASE-5229
se-case, on how granular the pieces of
data that can be deleted are. E.g. storing minTs for each record doesn't make
sense, while keeping it for larger pieces of data may work.
You probably thought about this approach though.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - H
Hi Jerry,
Just out of curiosity: what is your use-case? Why do you want to do
that? To gain extra protection from software error, or smth else?
Alex Baranau
--
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
On Tue, Sep 18, 2012 at 6:32 PM, lars hofhansl wrote
nless some
of them have large values, so that it takes longer to simply transfer
those values over the network (is your network fast, btw?).
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Thu, Sep 13, 2012 at 11:02 AM, Jacques wrote:
> Not
> An average row size is ~200 Bytes.
How many columns do you have?
I assume that every time you fetch data it is not cached in the RS's block
cache (i.e. you're making a "true test"), right?
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Sol
ation of cluster, does the performance sounds OK
> for timestamp filtering?
>
> Thanks,
> Anil
>
> On Mon, Aug 20, 2012 at 1:07 PM, Alex Baranau >wrote:
>
> > Created: https://issues.apache.org/jira/browse/HBASE-6618
> >
> > Alex Baranau
> > --
> &
scan.setMaxVersions(2). Not sure if KeyValues are fed into the filter ordered
by their timestamp...
How about returning the 2 most recent values to the client and filtering on
the client side? Why doesn't this work in your case? (Are the values in the
columns large, or...?)
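A sketch of the client-side variant in plain Java (hypothetical helper; in real code you would pull timestamped values out of an HBase Result): collect all fetched versions of a column and keep only the n newest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LatestVersions {
    // Given all fetched versions of one column (timestamp -> value),
    // keep only the n most recent values, newest first.
    static List<String> mostRecent(Map<Long, String> versions, int n) {
        TreeMap<Long, String> byTs = new TreeMap<>(versions);
        List<String> result = new ArrayList<>();
        for (Long ts : byTs.descendingKeySet()) {
            if (result.size() == n) break;
            result.add(byTs.get(ts));
        }
        return result;
    }
}
```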
Alex Baranau
--
Sematext :: htt
Created: https://issues.apache.org/jira/browse/HBASE-6618
Alex Baranau
--
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
On Sat, Aug 18, 2012 at 5:02 PM, anil gupta wrote:
> Hi Alex,
>
> Apart from the query which i mentioned in last email. Till no
like FuzzyRowFilter with range
Yes, something like this would be very valuable. It would also be
interesting to implement. Let's see if I can find time for it in my
work plan. If you want to try it yourself, go for it! Let me know if you
need help in that case ;)
Alex Baranau
--
er. Just grab the patch from HBASE-6509 and copy the filter. No
need to patch & rebuild HBase.
Alex Baranau
--
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
[1]
Anil Gupta added a comment - 18/Aug/12 04:37
Hi Alex,
I have a question related to this filter. I have a
Indeed. Wrote a simple unit test [1] and it fails.
And there's a JIRA for that too:
https://issues.apache.org/jira/browse/HBASE-4364. I attached a patch with
the simple failing unit test to it.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Sol
Hi Jerry,
Out of curiosity, what is your use-case? How do you want to use this?
Also, I guess, feel free to file a JIRA issue for this functionality (I
believe there's no such issue yet).
Alex Baranau
--
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
On Tue,
paction will remain the same.
I believe someone is currently working on making the replication process
(replica balancer) smarter. Hopes are to see this work soon :)
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Thu, Aug 2, 2012 at 5:5
your comments at HBASE-6509).
Alex Baranau
--
Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr
On Fri, Aug 3, 2012 at 5:23 AM, Christian Schäfer wrote:
> Hi Alex,
>
> thanks a lot for the hint about setting the timestamp of the put.
> I didn't know t
ng on client-side when
you can do it on server-side just feels wrong. Esp. given that there's a
lot of data in HBase (otherwise why would you use it).
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Thu, Aug 2, 2012 at 7:09 PM, Matt Cor
g some time ago. If this idea works for
you I could look for the implementation and share it if it helps. Or maybe
even simply add it to the HBase codebase.
Hope this helps,
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Thu, Aug 2, 2012 at
These questions have been raised many times on this ML and in other sources
(blogs, etc.). You can find them with a little effort.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Wed, Aug 1, 2012 at 1:33 AM, Mohammad Tariq wrote:
> Hello Mo
uch more requests in parallel than you have clients (depends on your
number of clients, of course, but I assume you don't have more than several,
incl. MR jobs).
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Tue, Jul 31, 2012 at 3:27 PM, T
me to execute and utilize network differently: pipelined *may* be slower
but can saturate network bandwidth better.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Tue, Jul 31, 2012 at 9:09 PM, Mohit Anchlia wrote:
> In the HBase book i
3+1+6+1+1=12 bytes.
I'd rather use the Bytes.toBytesBinary(String) method, which converts it back
to a byte array. Or, if you are using the ResultScanner API for fetching data,
just invoke Result.getRow().length.
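To illustrate what that conversion does, here is a minimal plain-Java mimic of the \xNN handling (hypothetical and simplified; HBase's real Bytes.toBytesBinary covers more cases):

```java
public class BinaryKeys {
    // Turns a shell-style string such as "\x00user\x01" back into raw
    // bytes: "\xNN" escapes become single bytes, everything else is kept.
    static byte[] toBytesBinary(String s) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length() && s.charAt(i + 1) == 'x') {
                // parse the two hex digits after "\x" as one byte
                out.write(Integer.parseInt(s.substring(i + 2, i + 4), 16));
                i += 3;
            } else {
                out.write(c);
            }
        }
        return out.toByteArray();
    }
}
```

So a 12-byte key printed by the shell with escapes has length 12 again after conversion, not the length of the printed string.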
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch
Yeah, your row keys start with \x00, which is (byte) 0. This is not the
same as "0" (which is (byte) 48). You know what to fix now ;)
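To see it in bytes, a plain-Java sketch (regionIndex is a hypothetical helper modeling a table pre-split on the string boundaries "1".."9"): any key whose first byte is a raw 0-9 sorts below the character '0' (48), so it always lands in the first region.

```java
public class KeyRegions {
    // Model of a table pre-split on row-key boundaries "1".."9" (ASCII).
    // Returns the index of the region a key would land in.
    static int regionIndex(byte[] key) {
        int first = key.length == 0 ? 0 : (key[0] & 0xFF); // unsigned compare
        if (first < '1') return 0;  // everything sorting below "1"
        if (first > '9') return 9;  // everything from "9" upward
        return first - '0';
    }
}
```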
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Fri, Jul 27, 2012 at 8:43 PM, Mohit Anchlia wr
t first byte of your key to anything from (byte) 0 to (byte) 9, all of
them will fall into the first region, which holds records with prefixes
(byte) 0 through (byte) 48.
Could you check that?
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Fri, Jul 27, 20
n next releases.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Fri, Jul 27, 2012 at 2:21 PM, syed kather wrote:
> Thank you so much for your valuable information. I had not yet used any
> monitoring tool .. can please suggest m
memstore flush): 1566523617482885717,
size: 1993369 bytes.
btw, 2MB looks weird: a very small flush size (in this case; in other cases
this may happen - long story). Maybe compression does very well :)
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Fri,
Very good explanation (and food for thought) about using bloom filters in
HBase in the answers here:
http://www.quora.com/How-are-bloom-filters-used-in-HBase.
Should we link to it from the Apache HBase book (ref guide)?
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase
oking at HDFS - this way you make sure your data is
flushed to HDFS (and not hanging in Memstores).
You may want to check the START/END keys of this region (via master web ui
or in .META.). Then you can compare with the keys generated by your app.
This should give you some info about what's g
> Another problem is with data locality immediately after bulk loading
> through MR.
You might find this recent discussion about that useful: [1]
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
[1] The start is here:
http:
hen use it:
** for a bigger memstore (I believe that should especially improve your
timings for fetching data older than an hour (there's kind of a spike on the
fetch time chart there))
** for bigger block caches
** having more "hot" regions per RS
Alex Baranau
--
Sematext :: http://blog.semate
"US_FL"
"US_KN"
"US_MS"
"US_NC"
"US_VM"
"V"
so that data is more or less evenly distributed (note: there's no need to
split other countries into regions as they will have a small amount of
data).
No standard splitter will know what your
...,
(byte) 9 (i.e. with 0x00, 0x01, ..., 0x09) then no need to convert to
String.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Thu, Jul 26, 2012 at 11:43 AM, Mohit Anchlia wrote:
> On Thu, Jul 26, 2012 at 7:16 AM, Alex Baranau >wro
This leads to a region
server
> hotspots.
Again, maybe an obvious question: have you tried to (or is it possible in
your case to) pre-split the table so that regions are distributed over the
cluster from the start?
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
, you can define more), with start keys: "", "1", "2", ...,
"9" [1].
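A sketch of building those start keys in plain Java (only constructs the byte[][]; the exact createTable signature to pass it to varies by HBase version):

```java
import java.nio.charset.StandardCharsets;

public class PreSplits {
    // Start keys "", "1", ..., "9" for a table whose row keys are salted
    // with one decimal digit: ten regions covering the key space.
    static byte[][] startKeys() {
        byte[][] keys = new byte[10][];
        keys[0] = new byte[0]; // first region starts at the empty key
        for (int i = 1; i <= 9; i++) {
            keys[i] = String.valueOf(i).getBytes(StandardCharsets.US_ASCII);
        }
        return keys;
    }
}
```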
Btw, since you are salting your keys to achieve distribution, you might
also find this small lib helpful which implements most of the stuff for you
[2].
Hope this helps.
Alex Baranau
---
there's much more to that as to why this cannot be done. So, you have
to figure out a way to set the row key in your client code...
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Tue, Jul 24, 2012 at 10:58 AM, Daniel Gorgan - SKIN <
danie
such things.
And of course, you can use the HBase Java API to fetch some data about the
cluster state as well. I guess you should start by looking at the HBaseAdmin
class.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
[1]
hbase(main):001:0> s
> I read somewhere that HBase is not
> good at handling more than 100 column families
Heh. Usually it is not good to have more than two or three, actually.
See [1], and may be also [2].
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
[1
that 3 (or whatever the replication factor is) replicas of this file (and
hence of this region) are "full" replicas, which makes it easier to preserve
data locality if an RS fails (or when anything else causes re-assignment of
the region). But since the region size is usually much bigger (usually
aunch certain Reducer tasks, this would help us. I believe this is not
possible with MR1; please correct me if I'm wrong. Perhaps this is
possible with MR2?
I assume there's no way to provide a "hint" to the NameNode about where to
place blocks of a new file either, right?
Thank you,
opposed to the situation
when this hot data is distributed over many more RSs (which will act like a
distributed cache), e.g. with salting.
In general, yes, you will not see as big issues with uneven *read* load
distribution over the cluster as you might see with uneven *write* load
distribution.
Ale
> How do you create your scan(ner)? Could you paste the code here?
Sorry, I meant to ask how you instantiate the HTable and configuration
objects.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Tue, Jul 17, 2012 at 11:37 AM, Alex Baranau wr
omposite key with these two attributes and added timestamp to
> > make it unique.
> >
> > To filter the data, I use rowkey filter with regex string comparator and
> > it works well with sample seed data. Now I am afraid whether this set up
> > will lead to region server hotspotting when we load production data in
> > HBase. I read hashing may solve this problem. Can some one help me in
> > implementing hashing the row key? Also I would want the row filter to
> work
> > as I have to display the number of components in a web page and I use row
> > key filter for implementing that functionality? Any guidance would be of
> > great help.
> >
> > --
> > Regards,
> > Anand
> >
>
>
>
> --
> Regards,
> Anand
>
--
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
enefit from data locality). I.e. it creates
one Map task per region. I wonder if this can be related.
Sorry for the obvious check...
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Tue, Jul 17, 2012 at 11:11 AM, Whitney Sorenson wrote:
> I
* The first lock guards closes of the Region, i.e. it forbids
reading from / writing to a Region which is being closed.
* The second lock is a row lock.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -
Solr
On Mon, Jul 16, 2012 at 10:14 AM, Howard
Thank you guys for the pointers/info! I'll try to make use of it. If it
turns into smth re-usable (like a script, etc.) I will open a JIRA issue
and add it for others to use.
Thanx again,
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Wed, J
ted" by removing HFiles: I will specify
a timerange on scans anyway (in this example to omit things older than 1
week).
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Mon, Jul 9, 2012 at 3:44 PM, Jonathan Hsieh wrote:
> You could set your ttls and
Heh, this is what I want to avoid actually: restarting RSs.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Mon, Jul 9, 2012 at 3:38 PM, Amandeep Khurana wrote:
> I _think_ you should be able to do it and be just fine but you'll need t
Hello,
I wonder, for purging old data: if I'm OK with the "remove all StoreFiles
which are older than ..." approach, can I do that? To me it seems like this
could be a very effective way to remove old data, similar to the fast bulk
import functionality, but for deletion.
Thank you
node (even if you open the shell on a slave node) and it will
decide where to place regions (depending on the region # on the slaves).
You can probably try to manually move regions to the desired RSs, but that
is also not a good way to go.
Alex Baranau
--
Sematext :: http://blog.sematext.com
I'd agree that HBase is not designed to be run in such an
"inter-continental" single-cluster setup. Latency in communication between
nodes (slaves) is vital for the health of the cluster.
So, the short answer: just don't do it that way.
What is the reason to have nodes in th
cific cases can (like when row keys are
"randomized", as explained above and in the earlier message). So, as far as
I understand, this should be addressed at a higher level.
Alex Baranau
--
Sematext :: http://blog.sematext.com
On Thu, May 17, 2012 at 10:23 AM, Alex Baranau wrote:
> Hi,
imit.
Not sure if it would make sense to separate these two things though:
* the mark at which memstore flushes are forced and updates are blocked
* the mark at which memstore flushes are forced (without blocking updates)
As of now, hbase.regionserver.global.memstore.lowerLimit is used for both of
these things
I've seen the need for such a conversion many times before. Should we add it
as a public method in some utility class? (Create a JIRA for that?)
Alex Baranau
--
Sematext :: http://blog.sematext.com/
On Mon, May 21, 2012 at 4:26 PM, Jean-Daniel Cryans wrote:
> How exactly are you building the
f each thread is writing into a relatively small number of
all the RSs, though, I think. Otherwise they will perform more or less the
same.
Am I completely crazy thinking about this? Does it make sense to you
at all?
Alex Baranau
--
Sematext :: http://blog.sematext.com/
Should I may be create a JIRA issue for that?
Alex Baranau
--
Sematext :: http://blog.sematext.com/
On Tue, May 8, 2012 at 4:00 PM, Alex Baranau wrote:
> Hi!
>
> Just trying to check that I understand things correctly about configuring
> memstore flushes.
>
> Basically, th
tores)? E.g.:
B.1 given setting X%, trigger a flush of the biggest memstore (or whatever
the logic is for selecting the memstore to flush) when memstores take up X%
of the heap (similar to (1), but triggering flushing when there's no need
to block updates yet)
B.2 any other which takes into ac
!
Alex Baranau
--
Sematext :: http://blog.sematext.com/
On Tue, May 1, 2012 at 9:02 AM, Dhaval Shah wrote:
>
>
> Not sure if its related (or even helpful) but we were using cdh3b4 (which
> is 0.90.1) and we saw similar issues with region servers going down.. we
> didn't lo
from HBase native API, which
might be OK or not OK in your case.
Alex Baranau
--
Sematext :: http://blog.sematext.com/
[1]
(Note: HTable is not thread-safe by itself, so this code isn't going to be
accessed from multiple threads and hence there's no synchronization here)
private HTable hTa
ot a "stop-the-world"
process.
Any advice?
HBase: hbase-0.90.4-cdh3u3
Hadoop: 0.20.2-cdh3u3
Thank you,
Alex Baranau
[1]
last lines from RS log (no errors before too, and nothing written in *.out
file):
2012-04-30 18:52:11,806 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSpl
Why not just define startRow & stopRow for Scan [1]? Am I missing smth?
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
[1]
Smth like:
byte[] startRow = Bytes.toBytes("example key");
byte[] stopRow = Arrays.copyOf(startRow, startRow.length + 1); // appends one
0x00 byte: the smallest key sorting after startRow
helped me.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
[1]
Same error in log as you have when trying to access the region.
hbck showed:
ERROR: Region agg-sa-1.3,0011|
qb|5mhb|\x00\x00\x00\x00\x00C\xA3\x98\x004\x00\x00\x00\x015\xA0\x83K\xC4\x00\x
.
Note: setCacheBlocks(true) will not override your column family settings, so
do not disable it at that level.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Thu, Apr 19, 2012 at 12:52 PM, Kevin M wrote:
> Thanks for the reply.
>
> I see. Wo
Are you sure you need to do table.close() after each put? Looks incorrect.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Thu, Apr 19, 2012 at 2:48 AM, Marcin Cylke wrote:
> On 17/04/12 18:45, Alex Baranau wrote:
> > I don't think t
t
should still be served.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Mon, Apr 16, 2012 at 11:21 AM, Bryan Beaudreault <
bbeaudrea...@hubspot.com> wrote:
> Hello,
>
> We've recently had a problem where regions will get stuc
lves using localhost, and at other times your hostname. Since (I suppose)
those two didn't match, you got the error.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Tue, Apr 17, 2012 at 9:34 AM, Marcin Cylke wrote:
> On 17/04/12 15:15, Alex Baranau wrote:
nning on your machine
(sudo jps)
3) clean up your /tmp dir
I see "java.net.ConnectException: Connection refused", which may indicate
that some parts of your cluster failed to start. A bigger log would be more
helpful.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene -
Here's some code that worked for me [1]. You may also find it useful to look
at the pom's dependencies [2].
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
[1]
From
https://github.com/sematext/HBaseHUT/blob/CPs/src/test/java/com/sematext/hb
In case you haven't checked yet:
* http://hbase.apache.org/bulk-loads.html
* http://hbase.apache.org/book.html
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Wed, Apr 11, 2012 at 10:06 PM, Neha wrote:
> I am a newbie in HBase. I am wo
Compression applies to the files stored on disk. All versions of a column
are stored the same way (HBase doesn't differentiate them at write time,
and they are not placed "near" each other in the file). Given that, yes,
you are likely to get the same level of compression (compression ratio) if
y
work well.
Alex Baranau
--
Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase
On Mon, Apr 9, 2012 at 3:39 PM, Ian Varley wrote:
> Thanks, Andy. Yeah, a tool that compares a schema definition with a
> running cluster, and gives you a way to apply changes (without off
-sequential-keys/
Alex Baranau
.
> It is really frustrating that i cannot point on what was the real problem.
> Even log with debug did not point on problems (perhaps because it is also
> missing some debug statement like when a scanner lease is added to the RS)
>
> Mikael.S
>
>
> On Sun, Feb 12, 2012 a
in will try the brutal variant: set
caching = 10 (or even 1), set batch = 10 (or even 1).
Alex
On Sun, Feb 12, 2012 at 1:49 PM, Alex Baranau wrote:
> Hi,
>
> 0.90.4-cdh3u2
>
> Alex
>
>
> On Sun, Feb 12, 2012 at 1:44 PM, wrote:
>
>> Which version of hbase are you usin
Hi,
0.90.4-cdh3u2
Alex
On Sun, Feb 12, 2012 at 1:44 PM, wrote:
> Which version of hbase are you using ?
>
> Thanks
>
>
>
> On Feb 12, 2012, at 10:41 AM, Alex Baranau
> wrote:
>
> > Hello,
> >
> > I'm getting scanner lease exceptions during
Hello,
I'm getting scanner lease exceptions during a mapreduce job [1] after running
it for less than 7 minutes, though I have set
hbase.regionserver.lease.period to 600000 (i.e. 10 min) in the hbase
configuration on the master and all regionservers (and restarted
all). Also set it in job's confi
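For reference, this is what the setting looks like in hbase-site.xml (the value is in milliseconds; assuming the intended 10 minutes):

```xml
<!-- hbase-site.xml: scanner lease timeout, in milliseconds (10 min) -->
<property>
  <name>hbase.regionserver.lease.period</name>
  <value>600000</value>
</property>
```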
ally and parse html
to fetch data you want.
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Sat, Jan 7, 2012 at 10:51 AM, Christian Schäfer
wrote:
>
> Hello,
>
> I want to measure requests per second for each Region Server during
> ins
Just published a post about current state of Flume & HBase integration
(HBase sinks for Flume) at
http://blog.sematext.com/2011/07/28/flume-and-hbase-integration.
Might be useful for those who are looking at this topic.
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - N
Just published a post about Flume & HBase integration which might be
helpful. It describes the possible issues & workarounds for them.
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Wed, Jul 27, 2011 at 9:39 PM, Mark wrote:
> Unfortu
-0.1.0-SNAPSHOT-2011.05.19.jar
(downloadable from https://github.com/sematext/HBaseWD)
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
P.S.
> Can you summarize HBaseWD in your blog
That is on my todo list! You pushed it higher to the top priority it
te hash of the
original key (https://github.com/sematext/HBaseWD/issues/2).
Either way, you don't need to delete a record to update some of its cells or
add new cells.
Please let me know if you have more Qs!
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -
https://issues.apache.org/jira/browse/HBASE-3811
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu wrote:
> My plan was to make regions that have active scanners more stable - trying
> not to move the
d and use with
his/her own cluster.
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Thu, Apr 21, 2011 at 6:04 PM, Eric Charles wrote:
> Hi Alex,
>
> Yep, saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase"
> threa
s (with no extra functionality)
just to distinguish it from the base one.
If you can share why/how you want to treat them differently on the server
side, that would be helpful.
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Thu, Apr 21, 2011 at 4
For those who are looking for a solution to this or a similar issue, this
can be useful:
Take a look at the HBaseWD lib (https://github.com/sematext/HBaseWD), which
implements a solution close to what Lars described.
Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO
Alex Baranau
Sematext
It will be an ordinary scan. Though
the number of scans will increase, given that the typical situation is "many
regions for a single table", the scans of the same "distributed scan" are
likely not to hit the same region.
Not sure if I answered your questions here. Feel free to ask
hare details on your case, that will help to understand what
effect(s) to expect from using this approach.
Alex Baranau
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase
On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu wrote:
> Interesting project, Alex.
> Since ther