Re: What is the best way to return scanner stats to client from co-processor?

2018-10-31 Thread Alex Baranau
sing simple # of records or bytes returned is not enough to define a good complexity measurement. Things like rows scanned, filtered, rpc calls, etc. in ScanMetrics are very helpful to inform it though! Good on you sir, Alex On Mon, Oct 29, 2018 at 11:05 AM Stack wrote: > On Wed, Oct 24, 2

What is the best way to return scanner stats to client from co-processor?

2018-10-24 Thread Alex Baranau
with some client or server-side logic? Any ideas are welcome! Thank you in advance, Alex Baranau

[OFFTOPIC] Big Data Application Meetup

2015-06-01 Thread Alex Baranau
plan for the first event to be hosted by Cask at its HQ in Palo Alto in end of June. Thank you, Alex Baranau

Re: HBase Block locality always 0

2015-05-18 Thread Alex Baranau
would cause that behavior... Btw, why 25th is not collocated with datanode? Alex Baranau -- http://cdap.io - open source framework to build and run data applications on Hadoop & HBase On Fri, May 15, 2015 at 8:12 PM, Louis Hust wrote: > Hi, Esteban, > > Hadoop Version 2.2.0, r1537062

Re: cell level coprocessor

2015-05-14 Thread Alex Baranau
2). Here's the BaseRegionObserver implementation [2]. On a side note, be sure to not overuse the versions of a Cell. Many times using columns is a better schema design. Cheers, Alex Baranau -- http://cdap.io - open source framework to build and run data applications on Hadoop & HBase [1] https://

Re: Re: Re: Re: Why can the capacity of a table with TTL grow continuously?

2015-03-11 Thread Alex Baranau
d to some extent by upping the region size. Alex Baranau -- http://cdap.io - open source framework to build and run data applications on Hadoop & HBase On Wed, Mar 11, 2015 at 7:00 PM, David chen wrote: > hbase.store.delete.expired.storefile is true in file > hbase-0.98.5/hbase-serv

Re: Re: Re: Why can the capacity of a table with TTL grow continuously?

2015-03-11 Thread Alex Baranau
Quick question: have you by any chance noticed the region number to grow a lot over the time of your measurements? Note that regions are not merged automatically back if they shrink (incl. due to TTL) after being split ( http://hbase.apache.org/book.html#ops.regionmgt) Alex Baranau -- http

Re: Re: Re: Why can the capacity of a table with TTL grow continuously?

2015-03-11 Thread Alex Baranau
nless files are deleted, they occupy space in hdfs). Alex Baranau http://cdap.io - open source framework to build and run data applications on Hadoop & HBase On Tue, Mar 10, 2015 at 9:15 PM, David chen wrote: > Thanks lars, > I ever ran scan to test TTL for several times, the data ex

Re: Standalone == Dev Only?

2015-03-10 Thread Alex Baranau
Also, you could use RDBMs behind key-value abstraction, to start with, while keeping your app design clean out of RDBMs specifics. Alex Baranau [1] https://github.com/google/leveldb [2] https://github.com/dain/leveldb [3] http://cdap.io [4] https://github.com/caskdata/cdap/blob/develop/cdap-api/s

Re: Regarding a doubt I am having for HBase

2015-03-10 Thread Alex Baranau
CCing HBase's user ML. Could you give an example of the row key and example of two different queries you are making to better understand your case? Thank you, Alex Baranau -- http://cdap.io - open source framework to build and run data applications on Hadoop & HBase On Mon, Mar 9,

Re: FuzzyRowFilter in hbase shell

2013-03-28 Thread Alex Baranau
his is an improved version with ranges support, better API and documentation. Alex Baranau On Thu, Mar 28, 2013 at 10:38 AM, Robert Hamilton < rhamil...@whalesharkmedia.com> wrote: > Hi all. Is it possible to test FuzzyRowFilter from the shell? If so, could > somebody kindly point me to

Re: Is it necessary to set MD5 on rowkey?

2012-12-19 Thread Alex Baranau
s you will use hash-based solution. At least in the beginning and in simplest cases. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] http://search-hadoop.com/m/TjkXd11qhLS On Wed, Dec 19, 2012 at 6:04 PM, David Arthur wrote: > I wasn't

Re: Is it necessary to set MD5 on rowkey?

2012-12-18 Thread Alex Baranau
y for that). Thank you, Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] https://github.com/sematext/HBaseWD On Tue, Dec 18, 2012 at 12:24 PM, Michael Segel wrote: > Quick answer... > > Look at the salt. > Its just a number from

Re: Is it necessary to set MD5 on rowkey?

2012-12-18 Thread Alex Baranau
by choosing number of possible 'salt' prefixes (which could be derived from hashed values, etc.) you can balance between distributing writes efficiency and ability to run fast range scans. Hope this helps Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase
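A minimal sketch of the trade-off described in this reply, assuming a one-byte salt derived from a hash of the original key. The class and method names (`SaltedKeys`, `salt`, `scanStartKeys`) are hypothetical and are not the HBaseWD API; the code uses only the Java standard library. It shows that more buckets spread writes wider, while a range scan over the original keys then needs one scan per bucket.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: hypothetical helper, not part of HBase or HBaseWD. */
class SaltedKeys {

    /** Prepends a one-byte salt in [0, buckets) derived from a hash of the key. */
    static byte[] salt(byte[] key, int buckets) {
        if (buckets < 1 || buckets > 128) {
            throw new IllegalArgumentException("buckets must be in [1, 128]");
        }
        // floorMod keeps the bucket non-negative even for negative hash codes.
        int bucket = Math.floorMod(Arrays.hashCode(key), buckets);
        return withPrefix((byte) bucket, key);
    }

    /** One start key per bucket: a single logical range scan turns into `buckets` scans. */
    static List<byte[]> scanStartKeys(byte[] startKey, int buckets) {
        List<byte[]> starts = new ArrayList<>();
        for (int b = 0; b < buckets; b++) {
            starts.add(withPrefix((byte) b, startKey));
        }
        return starts;
    }

    private static byte[] withPrefix(byte prefix, byte[] key) {
        byte[] out = new byte[key.length + 1];
        out[0] = prefix;
        System.arraycopy(key, 0, out, 1, key.length);
        return out;
    }

    public static void main(String[] args) {
        byte[] key = "event-0001".getBytes(StandardCharsets.UTF_8);
        System.out.println("salt bucket: " + salt(key, 8)[0]);
        System.out.println("scans needed for one range: " + scanStartKeys(key, 8).size());
    }
}
```

With a small bucket count the read side stays cheap (few scans to merge) but fewer regions absorb the writes; a larger count does the opposite.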

Re: HBase: "small" WAL transactions Q

2012-10-02 Thread Alex Baranau
the contract here is a transaction), so (currently) you would > get unnecessarily reduced concurrency using that API for changes that do > not need to be atomic. > > > Also note that a Put(List) operation already writes multiple updates > to a single WALEdit (doing a b

HBase: "small" WAL transactions Q

2012-10-02 Thread Alex Baranau
Or is it simply not efficient (there's more to that besides what I described above)? Thank you, Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] https://issues.apache.org/jira/browse/HBASE-5229

Re: Undelete Rows

2012-09-19 Thread Alex Baranau
se-case, on how granular those pieces of data which can be deleted. E.g. storing minTs for each record doesn't make sense. While keeping it for larger pieces of data may work. You probably thought about this approach though. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - H

Re: Undelete Rows

2012-09-19 Thread Alex Baranau
Hi Jerry, Just out of the curiosity: what is your use-case? Why do you want to do that? To gain extra protection from software error or smth else? Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Sep 18, 2012 at 6:32 PM, lars hofhansl wrote

Re: Hbase Scan - number of columns make the query performance way different

2012-09-17 Thread Alex Baranau
nless some of them have large values, so that it makes it longer to simply transfer those values over the network (is your network fast, btw?). Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Sep 13, 2012 at 11:02 AM, Jacques wrote: > Not

Re: Optimizing table scans

2012-09-17 Thread Alex Baranau
> An average row size is ~200 Bytes. How many columns do you have? I assume every time you try to fetch "non-cached in RSs block cache" data (i.e. making "true test"), right? Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Sol

Re: Can I specify the range inside of fuzzy rule in FuzzyRowFilter?

2012-08-22 Thread Alex Baranau
ation of cluster, does the performance sounds OK > for timestamp filtering? > > Thanks, > Anil > > On Mon, Aug 20, 2012 at 1:07 PM, Alex Baranau >wrote: > > > Created: https://issues.apache.org/jira/browse/HBASE-6618 > > > > Alex Baranau > > -- > &

Re: Column Value Reference Timestamp Filter

2012-08-20 Thread Alex Baranau
scan.setMaxVersions(2). Not sure if keyvalues are fed into filter ordered by their timestamp.. How about returning 2 most recent values to the client and filtering on the client-side? Why this doesn't work in your case? (large values in columns in size or?). Alex Baranau -- Sematext :: htt

Re: Can I specify the range inside of fuzzy rule in FuzzyRowFilter?

2012-08-20 Thread Alex Baranau
Created: https://issues.apache.org/jira/browse/HBASE-6618 Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Sat, Aug 18, 2012 at 5:02 PM, anil gupta wrote: > Hi Alex, > > Apart from the query which i mentioned in last email. Till no

Re: Can I specify the range inside of fuzzy rule in FuzzyRowFilter?

2012-08-18 Thread Alex Baranau
ike FuzzyRowFilter with range Yes, smth like this looks like would be very valuable. It would be interesting to implement too. Let's see if I find the time for that in my work plan. If you want to try it by yourself, go for it! Let me know if you need a help in that case ;) Alex Baranau --

Can I specify the range inside of fuzzy rule in FuzzyRowFilter?

2012-08-17 Thread Alex Baranau
er. Just grab the patch from HBASE-6509 and copy the filter. No need to patch & rebuild HBase. Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] Anil Gupta added a comment - 18/Aug/12 04:37 Hi Alex, I have a question related to this filter. I have a

Re: Filtering values and Get.addColumn

2012-08-15 Thread Alex Baranau
Indeed. Wrote simple unit-test [1] and it fails. And there's a JIRA for that also: https://issues.apache.org/jira/browse/HBASE-4364. I added patch with the simple unit-test that fails to it. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Sol

Re: CheckAndAppend Feature

2012-08-07 Thread Alex Baranau
Hi Jerry, Out of curiosity, what is your use-case? How do you want to use this? Also, I guess, feel free to file a jira issue for this functionality (I believe there's no such yet) . Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue,

Re: Poor data locality of MR job

2012-08-07 Thread Alex Baranau
paction will remain same. I believe someone is working on making replication process (replicas balancer) to be more smart at the moment. Hopes are to see this work soon :) Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Aug 2, 2012 at 5:5

Re: How to query by rowKey-infix

2012-08-03 Thread Alex Baranau
your comments at HBASE-6509). Alex Baranau -- Sematext :: http://sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Aug 3, 2012 at 5:23 AM, Christian Schäfer wrote: > Hi Alex, > > thanks a lot for the hint about setting the timestamp of the put. > I didn't know t

Re: How to query by rowKey-infix

2012-08-02 Thread Alex Baranau
ng on client-side when you can do it on server-side just feels wrong. Esp. given that there's a lot of data in HBase (otherwise why would you use it). Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Aug 2, 2012 at 7:09 PM, Matt Cor

Re: How to query by rowKey-infix

2012-08-02 Thread Alex Baranau
g some time ago. If this idea works for you I could look for the implementation and share it if it helps. Or may be even simply add it to HBase codebase. Hope this helps, Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Aug 2, 2012 at

Re: Multiple CF and versions

2012-08-01 Thread Alex Baranau
These questions were raised many times in this ML and in other sources (blogs, etc.). You can find them with a little effort. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Wed, Aug 1, 2012 at 1:33 AM, Mohammad Tariq wrote: > Hello Mo

Re: Parallel scans

2012-08-01 Thread Alex Baranau
uch more requests in parallel than you have clients (depends on your clients number of course, but I assume you don't have more than several, incl. MR jobs). Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Jul 31, 2012 at 3:27 PM, T

Re: sync on writes

2012-08-01 Thread Alex Baranau
me to execute and utilize network differently: pipelined *may* be slower but can saturate network bandwidth better. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Jul 31, 2012 at 9:09 PM, Mohit Anchlia wrote: > In the HBase book i

Re: Cluster load

2012-07-30 Thread Alex Baranau
3+1+6+1+1=12 bytes. I'd better use Bytes.toBytesBinary(String) method, which converts back to byte array. Or, if you are using ResultScanner API for fetching data, just invoke Result.getRow().length. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch

Re: Cluster load

2012-07-27 Thread Alex Baranau
Yeah, your row keys start with \x00 which is = (byte) 0. This is not the same as "0" (which is = (byte) 48). You know what to fix now ;) Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 8:43 PM, Mohit Anchlia wr

Re: Cluster load

2012-07-27 Thread Alex Baranau
t first byte of your key to anything from (byte) 0 - (byte) 9, all of them will fall into first regions which holds records with prefixes (byte) 0 - (byte) 48. Could you check that? Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 20
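A small sketch of why this happens, assuming the table was pre-split with the start keys "", "1", ..., "9" mentioned in the Row distribution thread. The `RegionLookup` class is hypothetical and uses only the Java standard library; it mimics the way row keys are ordered as unsigned bytes, so a key whose first byte is (byte) 5 sorts before the start key "1" (which is (byte) 49) and lands in the first region.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: hypothetical helper, not an HBase class. */
class RegionLookup {

    /** Start keys "", "1", "2", ..., "9": ten regions, the first one open at the bottom. */
    static byte[][] digitStartKeys() {
        byte[][] starts = new byte[10][];
        starts[0] = new byte[0];
        for (int i = 1; i <= 9; i++) {
            starts[i] = String.valueOf(i).getBytes(StandardCharsets.UTF_8);
        }
        return starts;
    }

    /** Index of the last region whose start key is <= rowKey in unsigned byte order. */
    static int regionIndex(byte[][] sortedStartKeys, byte[] rowKey) {
        int index = 0;
        for (int i = 0; i < sortedStartKeys.length; i++) {
            if (Arrays.compareUnsigned(sortedStartKeys[i], rowKey) <= 0) {
                index = i;
            }
        }
        return index;
    }

    public static void main(String[] args) {
        byte[][] starts = digitStartKeys();
        byte[] binaryPrefixed = {5, 'a', 'b', 'c'};                       // first byte is (byte) 5
        byte[] textPrefixed = "5abc".getBytes(StandardCharsets.UTF_8);   // first byte is (byte) 53
        System.out.println("binary prefix -> region " + regionIndex(starts, binaryPrefixed));
        System.out.println("text prefix   -> region " + regionIndex(starts, textPrefixed));
    }
}
```

Either the salt should be written as a character ("0" to "9") or the split points should be defined as raw bytes; mixing the two puts every row into one region.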

Re: Cluster load

2012-07-27 Thread Alex Baranau
n next releases. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri, Jul 27, 2012 at 2:21 PM, syed kather wrote: > Thank you so much for your valuable information. I had not yet used any > monitoring tool .. can please suggest m

Re: Cluster load

2012-07-27 Thread Alex Baranau
memstore flush): 1566523617482885717, size: 1993369 bytes. btw, 2MB looks weird: very small flush size (in this case, in other cases this may happen - long story). May be compression does very well :) Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Fri,

Re: Bloom Filter

2012-07-27 Thread Alex Baranau
Very good explanation (and food for thinking) about using bloom filters in HBase in answers here: http://www.quora.com/How-are-bloom-filters-used-in-HBase. Should we put the link to it from Apache HBase book (ref guide)? Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase

Re: Cluster load

2012-07-27 Thread Alex Baranau
oking at hdfs - this way you make sure your data is flushed to hdfs (and not hanged in Memstores). You may want to check the START/END keys of this region (via master web ui or in .META.). Then you can compare with the keys generated by your app. This should give you some info about what's g

Re: Bulk loading disadvantages

2012-07-27 Thread Alex Baranau
> Another problem is with data locality immediately after bulk loading > through MR. You might find this recent discussion about that useful: [1] Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] The start is here: http:

Re: Hbase Data Model to purge old data.

2012-07-26 Thread Alex Baranau
hen use it: ** for bigger memstore (I believe that should esp. improve your timings for fetching data older than hour (there's kinda a spike on fetch time chart there)) ** for bigger block caches ** having more "hot" regions per RS Alex Baranau -- Sematext :: http://blog.semate

Re: Row distribution

2012-07-26 Thread Alex Baranau
"US_FL" "US_KN" "US_MS" "US_NC" "US_VM" "V" so that data is more or less evenly distributed (note: there's no need to split other countries in regions as they will have small amount of data). No standard splitter will know what your

Re: Row distribution

2012-07-26 Thread Alex Baranau
..., (byte) 9 (i.e. with 0x00, 0x01, ..., 0x09) then no need to convert to String. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Thu, Jul 26, 2012 at 11:43 AM, Mohit Anchlia wrote: > On Thu, Jul 26, 2012 at 7:16 AM, Alex Baranau >wro

Re: Hbase Data Model to purge old data.

2012-07-26 Thread Alex Baranau
This leads to a region server > hotspots. Again, may be an obvious q: have you tried to (or is it possible in your case to) pre-split table so that regions are distributed over the cluster from the start? Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch -

Re: Row distribution

2012-07-26 Thread Alex Baranau
, you can define more), with start keys: "", "1", "2", ..., "9" [1]. Btw, since you are salting your keys to achieve distribution, you might also find this small lib helpful which implements most of the stuff for you [2]. Hope this helps. Alex Baranau ---

Re: Modify rowKey in prePut hook

2012-07-25 Thread Alex Baranau
there's much more to that, why this cannot be done. So, you have to figure out the way to set row key in your client code... Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Jul 24, 2012 at 10:58 AM, Daniel Gorgan - SKIN < danie

Re: Row distribution

2012-07-25 Thread Alex Baranau
such things. And of course, you can use HBase Java API to fetch some data of the cluster state as well. I guess you should start looking at it from HBaseAdmin class. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1] hbase(main):001:0> s

Re: Rowkey hashing to avoid hotspotting

2012-07-19 Thread Alex Baranau
> I read somewhere that HBase is not > good at handling more than 100 column families Heh. Usually it is not good to have more than two or three, actually. See [1], and may be also [2]. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr [1

Fwd: Bulk Import & Data Locality

2012-07-18 Thread Alex Baranau
that 3 (or whatever is replication) replicas of this file (and hence of this region) are "full" replicas, which makes it easier to preserve data locality if an RS goes down (or when anything else causes the region to be re-assigned). But since Region size is usually much bigger (usually

Re: Bulk Import & Data Locality

2012-07-18 Thread Alex Baranau
that 3 (or whatever is replication) replicas of this file (and hence of this region) are "full" replicas, which makes it easier to preserve data locality if an RS goes down (or when anything else causes the region to be re-assigned). But since Region size is usually much bigger (usually

Bulk Import & Data Locality

2012-07-18 Thread Alex Baranau
aunch certain Reducer tasks, this would help us. I believe this is not possible with MR1, please correct me if I'm wrong. Perhaps this is possible with MR2? I assume there's no way to provide a "hint" to a NameNode where to place blocks of a new File too, right? Thank you,

Re: Rowkey hashing to avoid hotspotting

2012-07-17 Thread Alex Baranau
opposed to situation when this hot data distributed over many more RSs (which will act like distributed cache) e.g. with salting. In general, yes, you will not see as big issues with uneven *read* load distribution over the cluster as you might see in case of uneven *write* load distribution. Ale

Re: Scan only talks to a single region server

2012-07-17 Thread Alex Baranau
> How do you create your scan(ner)? Could you paste the code here? Sorry, meant to ask how do you instantiate HTable, configuration objects. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Jul 17, 2012 at 11:37 AM, Alex Baranau wr

Re: Rowkey hashing to avoid hotspotting

2012-07-17 Thread Alex Baranau
omposite key with these two attributes and added timestamp to > > make it unique. > > > > To filter the data, I use rowkey filter with regex string comparator and > > it works well with sample seed data. Now I am afraid whether this set up > > will lead to region server hotspotting when we load production data in > > HBase. I read hashing may solve this problem. Can some one help me in > > implementing hashing the row key? Also I would want the row filter to > work > > as I have to display the number of components in a web page and I use row > > key filter for implementing that functionality? Any guidance would be of > > great help. > > > > -- > > Regards, > > Anand > > > > > > -- > Regards, > Anand > -- Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr

Re: Scan only talks to a single region server

2012-07-17 Thread Alex Baranau
enefit from data locality). I.e. it creates one Map task per region. I wonder if this can be related. Sorry for obvious check... Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Tue, Jul 17, 2012 at 11:11 AM, Whitney Sorenson wrote: > I

Re: Why startRegionOperation get the lock.readLock().lock(),still need row lock?

2012-07-16 Thread Alex Baranau
* The first lock is for guarding closes of Region. I.e. for forbidding reading/writing to the Region which is being closed. * The second lock is row lock. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Hadoop - HBase - ElasticSearch - Solr On Mon, Jul 16, 2012 at 10:14 AM, Howard

Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?

2012-07-11 Thread Alex Baranau
Thank you guys for the pointers/info! I'll try to make use of it. If it turns out into smth (like script, etc.) re-usable I will open a JIRA issue and add it for others to use. Thanx again, Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Wed, J

Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?

2012-07-09 Thread Alex Baranau
ted" by removing HFiles: I will specify timerange on scans anyways (in this example to omit things older than 1 week). Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Mon, Jul 9, 2012 at 3:44 PM, Jonathan Hsieh wrote: > You could set your ttls and

Re: Can manually remove HFiles (similar to bulk import, but bulk remove)?

2012-07-09 Thread Alex Baranau
Heh, this is what I want to avoid actually: restarting RSs. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Mon, Jul 9, 2012 at 3:38 PM, Amandeep Khurana wrote: > I _think_ you should be able to do it and be just fine but you'll need t

Can manually remove HFiles (similar to bulk import, but bulk remove)?

2012-07-09 Thread Alex Baranau
Hello, I wonder, for purging old data, if I'm OK with "remove all StoreFiles which are older than ..." way, can I do that? To me it seems like this can be a very effective way to remove old data, similar to fast bulk import functionality, but for deletion. Thank you

Re: ways to improve performance of Scan with SingleColumnValueFilter..Please help!!!

2012-06-29 Thread Alex Baranau
node (even if you open shell on slave node) and it will decide where to place regions (depending on the regions # on the slaves). You can probably try to manually move regions to desired RSs, but that is also not a good way to go with. Alex Baranau -- Sematext :: http://blog.sematext.com

Re: ways to improve performance of Scan with SingleColumnValueFilter..Please help!!!

2012-06-29 Thread Alex Baranau
I'd agree that HBase is not designed to be run in such "inter-continental" single cluster setup. Latency in communication between nodes (slaves) is vital for the health of the cluster. So, the short answer: just don't do it that way. What is the reason to have nodes in th

Re: Consider individual RSs performance when writing records with random keys?

2012-05-23 Thread Alex Baranau
cific cases can (like when row keys are "randomized", as explained above and in earlier message). So, as far as I understand this should be addressed on higher level. Alex Baranau -- Sematext :: http://blog.sematext.com On Thu, May 17, 2012 at 10:23 AM, Alex Baranau wrote: > Hi,

Re: About HBase Memstore Flushes

2012-05-23 Thread Alex Baranau
imit. Not sure if that would make sense to separate these two things though: * mark until memstore flushes are forced and updates are blocked * mark when memstore flushes are forced (without blocking updates) As for now for two these things hbase.regionserver.global.memstore.lowerLimit is used

Re: Can we store a HBase Result object using Put

2012-05-23 Thread Alex Baranau
I saw the need for such converting many times before. Should we add it as a public method in some utility class? (create JIRA for that?) Alex Baranau -- Sematext :: http://blog.sematext.com/ On Mon, May 21, 2012 at 4:26 PM, Jean-Daniel Cryans wrote: > How exactly are you building the

Consider individual RSs performance when writing records with random keys?

2012-05-17 Thread Alex Baranau
f each thread is writing into relatively small number of all RSs though only, I think. Otherwise they will perform more or less the same. Am I completely crazy when thinking about this? Does it make sense to you at all? Alex Baranau -- Sematext :: http://blog.sematext.com/

Re: About HBase Memstore Flushes

2012-05-09 Thread Alex Baranau
Should I may be create a JIRA issue for that? Alex Baranau -- Sematext :: http://blog.sematext.com/ On Tue, May 8, 2012 at 4:00 PM, Alex Baranau wrote: > Hi! > > Just trying to check that I understand things correctly about configuring > memstore flushes. > > Basically, th

About HBase Memstore Flushes

2012-05-08 Thread Alex Baranau
tores)? E.g.: B.1 given setting X%, trigger flush of biggest memstore (or whatever is logic for selecting memstore to flush) when memstore takes up X% of heap (similar to (1), but triggers flushing when there's no need to block updates yet) B.2 any other which takes into ac

Re: RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-05-01 Thread Alex Baranau
! Alex Baranau -- Sematext :: http://blog.sematext.com/ On Tue, May 1, 2012 at 9:02 AM, Dhaval Shah wrote: > > > Not sure if its related (or even helpful) but we were using cdh3b4 (which > is 0.90.1) and we saw similar issues with region servers going down.. we > didn't lo

Re: How is reconnection handled?

2012-05-01 Thread Alex Baranau
from HBase native API, which might be OK or not OK in your case. Alex Baranau -- Sematext :: http://blog.sematext.com/ [1] (Note: HTable is not thread-safe by itself, so this code isn't going to be accessed from multiple threads and hence no synchronization is here) private HTable hTa

RegionServer silently stops (only "issue": CMS-concurrent-mark ~80sec)

2012-04-30 Thread Alex Baranau
ot a "stop-the-world" process. Any advice? HBase: hbase-0.90.4-cdh3u3 Hadoop: 0.20.2-cdh3u3 Thank you, Alex Baranau [1] last lines from RS log (no errors before too, and nothing written in *.out file): 2012-04-30 18:52:11,806 DEBUG org.apache.hadoop.hbase.regionserver.CompactSpl

Re: Scan on compound key

2012-04-30 Thread Alex Baranau
Why not just define startRow & stopRow for Scan [1]? Am I missing smth? Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase [1] Smth like: byte[] startRow = Bytes.toBytes("example key"); byte[] stopRow = Arrays.copyOf(startRow, startRow.le
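The snippet above is cut off, so the following is only a sketch of one common way to derive an exclusive stop row from a key prefix, not necessarily what the original message did. `PrefixScanRange` and `stopRowForPrefix` are hypothetical names, and the code uses only the Java standard library so it runs without HBase on the classpath. It assumes that an empty stop row is read as "scan to the end of the table".

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Illustrative only: hypothetical helper, not an HBase class. */
class PrefixScanRange {

    /**
     * Returns the smallest key that is strictly greater than every key starting
     * with the given prefix. Trailing 0xFF bytes cannot be incremented, so they
     * are dropped first; if nothing is left, an empty array is returned.
     */
    static byte[] stopRowForPrefix(byte[] prefix) {
        int end = prefix.length;
        while (end > 0 && prefix[end - 1] == (byte) 0xFF) {
            end--;
        }
        if (end == 0) {
            return new byte[0];
        }
        byte[] stop = Arrays.copyOf(prefix, end);
        stop[end - 1]++; // safe: this byte is known not to be 0xFF
        return stop;
    }

    public static void main(String[] args) {
        byte[] startRow = "example key".getBytes(StandardCharsets.UTF_8);
        byte[] stopRow = stopRowForPrefix(startRow);
        System.out.println(new String(stopRow, StandardCharsets.UTF_8));
    }
}
```

The start row is the prefix itself and the stop row is exclusive, so the pair covers exactly the rows that begin with the prefix.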

Re: some region Could not seek StoreFileScanner[HFileScanner for reader

2012-04-30 Thread Alex Baranau
helped me. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase [1] Same error in log as you have when trying to access the region. hbck showed: ERROR: Region agg-sa-1.3,0011| qb|5mhb|\x00\x00\x00\x00\x00C\xA3\x98\x004\x00\x00\x00\x015\xA0\x83K\xC4\x00\x

Re: Applying filters to ResultScanner

2012-04-19 Thread Alex Baranau
. Note: setCacheBlocks(true) will not override your columnfamily settings, so do not disable it on that level. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Thu, Apr 19, 2012 at 12:52 PM, Kevin M wrote: > Thanks for the reply. > > I see. Wo

Re: hbase coprocessor unit testing

2012-04-19 Thread Alex Baranau
Are you sure you need to do table.close() after each put? Looks incorrect. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Thu, Apr 19, 2012 at 2:48 AM, Marcin Cylke wrote: > On 17/04/12 18:45, Alex Baranau wrote: > > I don't think t

Re: regions stuck in transition

2012-04-17 Thread Alex Baranau
t should still be served. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Mon, Apr 16, 2012 at 11:21 AM, Bryan Beaudreault < bbeaudrea...@hubspot.com> wrote: > Hello, > > We've recently had a problem where regions will get stuc

Re: hbase coprocessor unit testing

2012-04-17 Thread Alex Baranau
lves using localhost, at other - your hostname. Since (I suppose) those two didn't match - you got error. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Tue, Apr 17, 2012 at 9:34 AM, Marcin Cylke wrote: > On 17/04/12 15:15, Alex Baranau wrote:

Re: [ hbase ] Re: hbase coprocessor unit testing

2012-04-17 Thread Alex Baranau
nning on your machine (sudo jps) 3) cleanup your /tmp dir I see "java.net.ConnectException: Connection refused", which may indicate some of your cluster parts failed to start. Bigger log should be more helpful. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene -

Re: hbase coprocessor unit testing

2012-04-16 Thread Alex Baranau
Here's some code that worked for me [1]. You may also find useful to look at the pom's dependencies [2]. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase [1] From https://github.com/sematext/HBaseHUT/blob/CPs/src/test/java/com/sematext/hb

Re: Performance Optimization techniques HBase

2012-04-12 Thread Alex Baranau
In case you haven't checked yet: * http://hbase.apache.org/bulk-loads.html * http://hbase.apache.org/book.html Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Wed, Apr 11, 2012 at 10:06 PM, Neha wrote: > I am a newbie in HBase. I am wo

Re: How many data versions should I keep in HBase?

2012-04-10 Thread Alex Baranau
Compression applies to the files stored on disks. All versions of a column are stored the same way (HBase doesn't differentiate them at the time of writing and they are not placed "near" each other in the file). Given that, yes you are likely to get the same level of compression (compr. ratio) if y

Re: Schema Updates: what do you do today?

2012-04-09 Thread Alex Baranau
work well. Alex Baranau -- Sematext :: http://blog.sematext.com/ :: Solr - Lucene - Hadoop - HBase On Mon, Apr 9, 2012 at 3:39 PM, Ian Varley wrote: > Thanks, Andy. Yeah, a tool that compares a schema definition with a > running cluster, and gives you a way to apply changes (without off

Blog post: HBaseWD: Avoid RegionServer Hotspotting Despite Sequential Keys

2012-04-09 Thread Alex Baranau
-sequential-keys/ Alex Baranau

Re: LeaseException despite high hbase.regionserver.lease.period

2012-02-13 Thread Alex Baranau
. > It is really frustrating that i cannot point on what was the real problem. > Even log with debug did not point on problems (perhaps because it is also > missing some debug statement like when a scanner lease is added to the RS) > > Mikael.S > > > On Sun, Feb 12, 2012 a

Re: LeaseException despite high hbase.regionserver.lease.period

2012-02-12 Thread Alex Baranau
in will try brutal variant: set caching = 10 (or even 1), set batch = 10 (or even 1). Alex On Sun, Feb 12, 2012 at 1:49 PM, Alex Baranau wrote: > Hi, > > 0.90.4-cdh3u2 > > Alex > > > On Sun, Feb 12, 2012 at 1:44 PM, wrote: > >> Which version of hbase are you usin

Re: LeaseException despite high hbase.regionserver.lease.period

2012-02-12 Thread Alex Baranau
Hi, 0.90.4-cdh3u2 Alex On Sun, Feb 12, 2012 at 1:44 PM, wrote: > Which version of hbase are you using ? > > Thanks > > > > On Feb 12, 2012, at 10:41 AM, Alex Baranau > wrote: > > > Hello, > > > > I'm getting scanner lease exceptions during

LeaseException despite high hbase.regionserver.lease.period

2012-02-12 Thread Alex Baranau
Hello, I'm getting scanner lease exceptions during a mapreduce job [1] after running it for less than 7 minutes. Though I have set hbase.regionserver.lease.period to 60 (i.e. 10 min) in hbase configuration on master and all regionservers (and restarted all). Also set it in job's confi

Re: Capturing RegionServerMetrics during inserts

2012-01-08 Thread Alex Baranau
ally and parse html to fetch data you want. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Sat, Jan 7, 2012 at 10:51 AM, Christian Schäfer wrote: > > Hello, > > I want to measure requests per second for each Region Server during > ins

Flume & HBase integration status

2011-07-29 Thread Alex Baranau
Just published a post about current state of Flume & HBase integration (HBase sinks for Flume) at http://blog.sematext.com/2011/07/28/flume-and-hbase-integration. Might be useful for those who are looking at this topic. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - N

Re: Design/Schema questions

2011-07-29 Thread Alex Baranau
Just published a post about Flume & HBase integration which might be helpful. It describes the possible issues & workarounds for them. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Wed, Jul 27, 2011 at 9:39 PM, Mark wrote: > Unfortu

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-19 Thread Alex Baranau
-0.1.0-SNAPSHOT-2011.05.19.jar (downloadable from https://github.com/sematext/HBaseWD) Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase P.S. > Can you summarize HBaseWD in your blog That is on my todo list! You pushed it higher to the top priority it

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-05-18 Thread Alex Baranau
te hash of original key (https://github.com/sematext/HBaseWD/issues/2). In either way you don't need to delete record to update some cells of it or add new cells. Please let me know if you have more Qs! Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop -

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-21 Thread Alex Baranau
https://issues.apache.org/jira/browse/HBASE-3811 Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Thu, Apr 21, 2011 at 5:57 PM, Ted Yu wrote: > My plan was to make regions that have active scanners more stable - trying > not to move the

Re: Hash keys

2011-04-21 Thread Alex Baranau
d and use with his/her own cluster. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Thu, Apr 21, 2011 at 6:04 PM, Eric Charles wrote: > Hi Alex, > > Yep, saw the "[ANN]: HBaseWD: Distribute Sequential Writes in HBase" > threa

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-21 Thread Alex Baranau
s (with no extra functionality) just to distinguish it from the base one. If you can share why/how do you want to treat them differently on server side, that would be helpful. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Thu, Apr 21, 2011 at 4

Re: Hash keys

2011-04-21 Thread Alex Baranau
For those who are looking for the solution to this or similar issue, this can be useful: Take a look at HBaseWD (https://github.com/sematext/HBaseWD) lib, which implements solution close to what Lars described. Also some info here: http://search-hadoop.com/m/AQ7CG2GkiO Alex Baranau Sematext

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-21 Thread Alex Baranau
It will be an ordinary scan. Though the number of scan will increase, given that the typical situation is "many regions for single table", the scans of the same "distributed scan" are likely not to hit the same region. Not sure if I answered your questions here. Feel free to ask

Re: [ANN]: HBaseWD: Distribute Sequential Writes in HBase

2011-04-19 Thread Alex Baranau
hare details on your case, that will help to understand what effect(s) to expect from using this approach. Alex Baranau Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - Hadoop - HBase On Wed, Apr 20, 2011 at 8:17 AM, Ted Yu wrote: > Interesting project, Alex. > Since ther
