Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-21 Thread Eugeny Morozov
have a Class Foo and HBase has a Class Foo, your code will never see the light of day. Perhaps I'm stating the obvious, but it's something to think about when working with Hadoop. On Jan 19, 2013, at 3:36 AM, Eugeny Morozov emoro...@griddynamics.com wrote: Ted, that is correct. HBase 0.92

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-21 Thread Eugeny Morozov
. Perhaps I'm stating the obvious, but it's something to think about when working with Hadoop. On Jan 19, 2013, at 3:36 AM, Eugeny Morozov emoro...@griddynamics.com wrote: Ted, that is correct. HBase 0.92.x and we use part of the patch 6509. I use the filter as a custom

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-20 Thread Eugeny Morozov
exactly where to go? On Sat, Jan 19, 2013 at 5:16 PM, Ted yuzhih...@gmail.com wrote: In your original email you said the first key looked like the start key of a region; can you verify that? Thanks On Jan 19, 2013, at 1:36 AM, Eugeny Morozov emoro...@griddynamics.com wrote: Ted

Re: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-19 Thread Eugeny Morozov
yuzhih...@gmail.com wrote: To my knowledge CDH-4.1.2 is based on HBase 0.92.x. Looks like you were using the patch from HBASE-6509, which was integrated into trunk only. Please confirm. Copying Alex, who wrote the patch. Cheers On Fri, Jan 18, 2013 at 3:28 PM, Eugeny Morozov emoro

Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-18 Thread Eugeny Morozov
Hi, folks! The HBase, Hadoop, etc. version is CDH-4.1.2. I'm using a custom FuzzyRowFilter, which I got from http://blog.sematext.com/2012/08/09/consider-using-fuzzyrowfilter-when-in-need-for-secondary-indexes-in-hbase/ and suddenly, after quite some time, we found that it started losing data. Basically the
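For readers unfamiliar with the filter under discussion, here is a minimal Python sketch of the matching rule a fuzzy row filter applies. It assumes the mask convention from the FuzzyRowFilter Javadoc: a mask byte of 0 marks a fixed position that must match, and 1 marks a position where any byte is accepted. The function name `fuzzy_match` and the sample key layout are hypothetical. This models the idea only; it is not the HBase implementation and does not cover the SEEK_NEXT_USING_HINT logic.

```python
def fuzzy_match(row: bytes, fuzzy_key: bytes, mask: bytes) -> bool:
    """Return True if `row` satisfies `fuzzy_key` under `mask`.

    mask[i] == 0: row[i] must equal fuzzy_key[i] (fixed position)
    mask[i] == 1: row[i] may be any byte (wildcard position)
    Bytes of `row` beyond len(fuzzy_key) are ignored (a simplification).
    """
    if len(fuzzy_key) != len(mask):
        raise ValueError("fuzzy_key and mask must have the same length")
    if len(row) < len(fuzzy_key):
        return False  # row too short to cover every position of the key
    return all(m == 1 or r == k for r, k, m in zip(row, fuzzy_key, mask))


# Hypothetical row key layout: 4-byte user id, "_", 2-byte action code.
fuzzy_key = b"????_99"
mask = bytes([1, 1, 1, 1, 0, 0, 0])

print(fuzzy_match(b"0001_99", fuzzy_key, mask))  # user id is a wildcard
print(fuzzy_match(b"0001_42", fuzzy_key, mask))  # action code differs
```

A filter that loses data would correspond to rows for which this predicate is true but which the scan never returns, for example because a seek hint jumped past them.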

Re: Many scanner opening

2012-12-23 Thread Eugeny Morozov
like it's enough to get contention =) On Thu, Dec 20, 2012 at 10:51 PM, lars hofhansl lhofha...@yahoo.com wrote: Cool. You probably made it less likely that your scanners will scan the same HFile in parallel. -- Lars From: Eugeny Morozov emoro

Re: Many scanner opening

2012-12-20 Thread Eugeny Morozov
...@yahoo.com wrote: You might have run into HBASE-7336. (Not available in any official release, yet) If you're using 0.94 (and probably 0.92) you can just apply this patch (it's safe and simple). From: Eugeny Morozov emoro...@griddynamics.com To: user

Many scanner opening

2012-12-18 Thread Eugeny Morozov
Hello! We recently faced an issue: the more map tasks are completed, the longer it takes to complete one more map task. In our architecture we have two scanners to read the table. The first one, called the 'outer' scanner, reads the table and filters some rowkeys. These rowkeys are used

Re: Debugging Coprocessor code in Eclipse

2012-10-16 Thread Eugeny Morozov
Anil, you could also benefit from using HBaseTestingUtility. It can run an HBase cluster in standalone mode, all in one JVM. Of course, it requires some code to create tables, assign the coprocessor to a table and populate it with data, and then run client code against it. All of

Re: Does TotalOrderPartitioner refresh its partitions selection tree

2012-10-09 Thread Eugeny Morozov
Chris, In this case nothing scary actually happens. * If the partitions are the same, then HBase simply copies all your HFiles during the bulk-loading procedure. * If the partitions have changed, then it still copies them, but in addition, some of these files (according to the number of split regions) would be

Re: Questions on Table design for time series data

2012-10-03 Thread Eugeny Morozov
I'd suggest thinking about manual major compactions and splits. Using manual compactions and bulk load allows you to split HFiles manually. For example, if you want to read the last 3 months more often than all other data, you could have three HFiles, one per month, and one HFile for everything else.

Re: Distribution of regions to servers

2012-09-27 Thread Eugeny Morozov
queries. Hope it makes sense to you. Best Wishes Dan Han On Wed, Sep 26, 2012 at 3:19 PM, Eugeny Morozov emoro...@griddynamics.com wrote: Dan, I have additional questions. What is the access pattern of your queries? I mean that, e.g., PrefixFilters have to be applied to all KeyValue

Re: Distribution of regions to servers

2012-09-26 Thread Eugeny Morozov
Dan, I have additional questions. What is the access pattern of your queries? I mean that, e.g., PrefixFilters have to be applied to all KeyValue pairs in HFiles, which could be slow. Or, e.g., the scanner's setCaching option can decrease the number of network hops needed to get data from the RegionServer.
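The effect of scanner caching on network traffic can be estimated with simple arithmetic. The sketch below is a simplified model, not HBase code: it assumes each round trip to the RegionServer returns up to `caching` rows and ignores the extra calls for opening and closing the scanner. The function name `round_trips` and the row counts are hypothetical.

```python
import math


def round_trips(total_rows: int, caching: int) -> int:
    """Estimate the fetch calls needed to read `total_rows` rows
    when each call returns at most `caching` rows."""
    if caching < 1:
        raise ValueError("caching must be at least 1")
    return math.ceil(total_rows / caching)


# Compare a row-at-a-time scan with a batched one over the same data.
for caching in (1, 100, 500):
    print(caching, round_trips(10_000, caching))
```

Larger caching values trade client and server memory for fewer round trips, so the right value depends on row size.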

Re: scan.setTimeRange performance

2012-09-25 Thread Eugeny Morozov
have much to contribute other than to point to a recent conversation that you can find here: http://comments.gmane.org/gmane.comp.java.hadoop.hbase.user/28722 Hope this helps, J-D On Fri, Sep 21, 2012 at 5:03 AM, Eugeny Morozov emoro...@griddynamics.com wrote: Hello! It is known

Re: Simple way to unit test HBase Map reduce jobs?

2012-09-25 Thread Eugeny Morozov
Hi, Elazar, I've found that MRUnit is a pretty convenient way to test MR jobs. On the other hand, there is HBaseTestingUtility, which is helpful for running a miniCluster. Hope this helps. On Mon, Sep 24, 2012 at 8:43 PM, Elazar Leibovich elaz...@gmail.com wrote: Is there a way similar to miniserver to

scan.setTimeRange performance

2012-09-21 Thread Eugeny Morozov
Hello! It is known, and I saw it in the code, that the time range set by scan.setTimeRange is used to filter out HFiles before scanning. This means that the time taken by the following scanner.next should be almost zero if I set the time range far in the future. I am sure that I do not have HFiles that fall
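The pruning described above can be modelled in a few lines. This is a conceptual Python sketch, not HBase code: it assumes each store file records the minimum and maximum cell timestamps it contains, and that the scan range is half-open, [min_stamp, max_stamp), as with scan.setTimeRange. The names `StoreFile` and `files_to_scan` and the sample timestamps are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreFile:
    name: str
    min_ts: int  # smallest cell timestamp in the file
    max_ts: int  # largest cell timestamp in the file (inclusive)


def files_to_scan(files, min_stamp, max_stamp):
    """Keep only files whose [min_ts, max_ts] span overlaps the
    half-open scan range [min_stamp, max_stamp)."""
    return [f for f in files if f.max_ts >= min_stamp and f.min_ts < max_stamp]


files = [StoreFile("hfile-a", 100, 200), StoreFile("hfile-b", 300, 400)]

# A range far in the future overlaps no file, so nothing is left to read.
print(files_to_scan(files, 1_000_000, 2_000_000))
# A range straddling both files keeps both.
print([f.name for f in files_to_scan(files, 150, 350)])
```

Under this model a scan with a far-future range has no files to open, which is the expectation stated in the message; data still held in memory is outside what this sketch covers.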