Re: [Announce] 张铎 (Duo Zhang) is Apache HBase PMC chair

2019-07-20 Thread Anoop Sam John
Congrats Duo. Thanks Misty for your great work as the PMC chair. Anoop On Sat, Jul 20, 2019 at 12:07 AM Xu Cang wrote: > Thank you Misty! > Congratulations Duo, thanks for taking extra work! > > On Fri, Jul 19, 2019 at 11:23 AM Zach York > wrote: > > > Congratulations Duo! Thanks for

HBase Meetups in India

2019-05-08 Thread Anoop Sam John
Hi all, I have seen HBase meetups happening in the Bay area as well as in different cities in PRC. I believe we have many devs and users based in India. Some of us were discussing starting this kind of meetup. In order to know the devs/users and which city they are based out of, I have

RE: Data not loaded in table via ImportTSV

2013-04-16 Thread Anoop Sam John
Hi Have you used the tool, LoadIncrementalHFiles after the ImportTSV? -Anoop- From: Omkar Joshi [omkar.jo...@lntinfotech.com] Sent: Tuesday, April 16, 2013 12:01 PM To: user@hbase.apache.org Subject: Data not loaded in table via ImportTSV
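
The two-step bulk load the reply refers to can be sketched as below (a sketch only; the table name, column mapping, and paths are placeholders). Without the second step the HFiles generated by ImportTsv stay in the output directory and never become visible in the table:

```
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
    -Dimporttsv.columns=HBASE_ROW_KEY,cf:col1 \
    -Dimporttsv.bulk.output=/tmp/hfiles mytable /tmp/input.tsv
$ hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable
```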

RE: HBase random read performance

2013-04-15 Thread Anoop Sam John
Ankit I guess you might be having the default HFile block size, which is 64KB. For random gets a lower value will be better. Try with something like 8KB and check the latency? Yes, of course blooms can help (if major compaction was not done at the time of testing) -Anoop-
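
The block-size change suggested here is a per-column-family setting; as a sketch (table and family names are placeholders, and the value is in bytes):

```
hbase> alter 'mytable', {NAME => 'cf', BLOCKSIZE => '8192'}
```

Smaller blocks mean more block-index entries held in memory, so this trades index size for random-get latency.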

RE: coprocessor load test metrics

2013-04-15 Thread Anoop Sam John
PM To: Anoop Sam John; 'user@hbase.apache.org' Subject: coprocessor load test metrics Hi Anoop, Do we have any metrics for load testing for HBase Coprocessors? Regards, Deepak

RE: Essential column family performance

2013-04-09 Thread Anoop Sam John
Good finding Lars team :) -Anoop- From: lars hofhansl [la...@apache.org] Sent: Wednesday, April 10, 2013 9:46 AM To: user@hbase.apache.org Subject: Re: Essential column family performance That part did not show up in the profiling session. It was just

RE: Scanner returning subset of data

2013-04-08 Thread Anoop Sam John
Randy As Ted suggested can you see the client logs closely (RS side also)? Is there next() call retries happening from the client side because of RPC timeouts? In such a case this kind of issue can happen. I doubt he hit HBASE-5974 -Anoop- From: Ted

RE: Disabling balancer permanently in HBase

2013-04-07 Thread Anoop Sam John
HBASE-6260 made the balancer state to be persisted in ZK so that the restart of the Master wont have an issue. But this is available with 0.95 only. Just telling FYI -Anoop- From: Jean-Marc Spaggiari [jean-m...@spaggiari.org] Sent: Monday, April 08, 2013

RE: Getting less write throughput due to more number of columns

2013-03-26 Thread Anoop Sam John
When the number of columns (qualifiers) is more, yes it can impact the performance. In HBase everywhere the storage will be in terms of KVs. The key will be something like rowkey+cfname+columnname+TS... So when you have 26 cells in a put then there will be repetition of many bytes in the
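
The repetition being described can be made concrete with plain arithmetic (a sketch with made-up lengths, not the exact HBase on-disk layout):

```java
// Illustration only: per-cell key repetition in HBase's KV storage.
// Every cell carries the row key, family, qualifier and timestamp again,
// so a 26-column Put repeats the row key and family 26 times.
public class KvRepetition {
    public static void main(String[] args) {
        int rowKeyLen = 16, familyLen = 2, timestampLen = 8; // assumed lengths
        int qualifierLen = 4, cells = 26;
        int keyBytesPerCell = rowKeyLen + familyLen + qualifierLen + timestampLen;
        // key bytes written for one 26-cell Put, independent of value sizes
        System.out.println(cells * keyBytesPerCell); // prints 780
    }
}
```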

RE: Compaction problem

2013-03-26 Thread Anoop Sam John
@tarang As per 4G max heap size, you will get by default 1.4G total memory for all the memstores (5/6 regions).. By default you will get 35% of the heap size for memstore. Is your process only write centric? If rare read happens, think of increasing this global heap space setting..Else can
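
The quoted 1.4G follows from the default global memstore fraction; a sketch of the arithmetic (assuming 0.35, the 0.94-era default for hbase.regionserver.global.memstore.upperLimit):

```java
// Arithmetic behind the reply: global memstore budget = heap size * fraction.
public class MemstoreBudget {
    public static void main(String[] args) {
        double heapMb = 4 * 1024;   // -Xmx4g region server heap
        double fraction = 0.35;     // default global memstore upper limit
        System.out.println(heapMb * fraction); // 1433.6 MB, i.e. roughly 1.4G
    }
}
```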

RE: Truncate hbase table based on column family

2013-03-26 Thread Anoop Sam John
varaprasad Pls see HBaseAdmin#deleteColumn().. You should disable the table before making any schema changes and enable it back after that. -Anoop- From: varaprasad.bh...@polarisft.com [varaprasad.bh...@polarisft.com] Sent: Tuesday, March 26, 2013 2:15

RE: Is there a way to only scan data in memstore

2013-03-21 Thread Anoop Sam John
How can you be sure the data will be in the memstore only? What if a flush happens in between? Which version is in use? In the 94.x version (I am not sure about the .x version no#) there is a preStoreScannerOpen() CP hook. This impl can return a KVScanner for a store (In your impl the scanner can be only for

RE: NameNode of Hadoop Crash?

2013-03-18 Thread Anoop Sam John
Can you ask this question in HDFS user group pls? -Anoop- From: bhushan.kandalkar [bhushan.kandal...@harbingergroup.com] Sent: Monday, March 18, 2013 12:29 PM To: user@hbase.apache.org Subject: NameNode of Hadoop Crash? Hi Following is the error log in

RE: Regionserver goes down while endpoint execution

2013-03-15 Thread Anoop Sam John
Himanshu told it clearly. To make it more clear I am adding :) When the range of rowkeys that you are looking for spreads across 5 regions, at the client side there will be 5 exec requests created and submitted to a thread pool. [HBase client side thread pool associated with HTable] Now as per

RE: region server down when scanning using mapreduce

2013-03-12 Thread Anoop Sam John
How is the GC pattern in your RSs which are getting down? In RS logs you might be having YouAreDeadExceptions... Pls try tuning your RS memory and GC opts. -Anoop- From: Lu, Wei [w...@microstrategy.com] Sent: Tuesday, March 12, 2013 1:42 PM To:

RE: Welcome our newest Committer Anoop

2013-03-10 Thread Anoop Sam John
Thanks to all.. Hope to work more and more for HBase! -Anoop- From: Andrew Purtell [apurt...@apache.org] Sent: Monday, March 11, 2013 7:33 AM To: user@hbase.apache.org Subject: Re: Welcome our newest Committer Anoop Congratulations Anoop. Welcome! On

RE: How HBase perform per-column scan?

2013-03-10 Thread Anoop Sam John
ROWCOL bloom says whether for a given row (rowkey) a given column (qualifier) is present in an HFile or not. But the user does not know the rowkeys. He wants all the rows with column 'x' -Anoop- From: Liu, Raymond [raymond@intel.com] Sent:

RE: can we use same column name for 2 different column families?

2013-03-10 Thread Anoop Sam John
can we have column name dob under column family F1 F2? Just fine.. Go ahead.. :) -Anoop- From: Ramasubramanian Narayanan [ramasubramanian.naraya...@gmail.com] Sent: Sunday, March 10, 2013 11:41 PM To: user@hbase.apache.org Subject: can we use same column

RE: Odd WARN in hbase 0.94.2

2013-03-07 Thread Anoop Sam John
Hi Bryan, Is this change needed with usage of any of the open src HDFS releases, or is it only in CDH? Is this related with HDFS-347? In such a case forget about my previous mail about adding it in the book :) -Anoop- From: Kevin O'dell

RE: Why InternalScanner doesn't have a method that returns entire row or object of Result

2013-03-07 Thread Anoop Sam John
Asaf You are correct! You mean the RegionScanner I think.. The 'limit' is applied at this level. HRegion$RegionScannerImpl -Anoop- From: Asaf Mesika [asaf.mes...@gmail.com] Sent: Thursday, March 07, 2013 6:04 PM To: user@hbase.apache.org Subject:

RE: Odd WARN in hbase 0.94.2

2013-03-06 Thread Anoop Sam John
Hi Kevin Thanks for the information. In HBase book, we have added a note on how to use short circuit reads.. Can we update it accordingly? Which version of HDFS need this attribute to be present in DN side also? It would be great if you can file a JIRA and give a change in the

RE: Miserable Performance of gets

2013-03-05 Thread Anoop Sam John
Hi Kiran When you say doing a batch get with 20 Gets, whether the rowkeys for these 20 Gets are in the same region? How many RS are you having? Can you observe, out of these 20, which gets are targeting which regions. Some information on this can help explain the slowness...

RE: HBase CheckSum vs Hadoop CheckSum

2013-02-26 Thread Anoop Sam John
I was typing a reply and by the time Liang replied :) Ya agree with him. It is only the HDFS client (At RS) not doing the checksum verification based on the HDFS stored checksum. Instead HBase only check for the correctness by comparing with stored checksum values. Still the periodic operation

RE: HBase CheckSum vs Hadoop CheckSum

2013-02-26 Thread Anoop Sam John
/26 Anoop Sam John anoo...@huawei.com: I was typing a reply and by the time Liang replied :) Ya agree with him. It is only the HDFS client (At RS) not doing the checksum verification based on the HDFS stored checksum. Instead HBase only check for the correctness by comparing with stored

RE: attributes - basic question

2013-02-22 Thread Anoop Sam John
We have used setAttribute() along with Scan which we are using in the CP. Yes, it will work fine. Please try with your use case and if you find any issue please report -Anoop- From: Toby Lazar [tla...@gmail.com] Sent: Saturday, February 23, 2013 4:07 AM To:

RE: Optimizing Multi Gets in hbase

2013-02-18 Thread Anoop Sam John
It will instantiate one scan op per Get -Anoop- From: Varun Sharma [va...@pinterest.com] Sent: Monday, February 18, 2013 3:27 PM To: user@hbase.apache.org Subject: Optimizing Multi Gets in hbase Hi, I am trying to batched get(s) on a cluster. Here is

RE: Co-Processor in scanning the HBase's Table

2013-02-17 Thread Anoop Sam John
I wanna use a custom code after scanning a large table and prefer to run the code after scanning each region Exactly at what point you want to run your custom code? We have hooks at points like opening a scanner at a region, closing scanner at a region, calling next (pre/post) etc -Anoop-

RE: Co-Processor in scanning the HBase's Table

2013-02-17 Thread Anoop Sam John
scanning a region or after scanning the regions that to belong to one regionserver. On Mon, Feb 18, 2013 at 7:45 AM, Anoop Sam John anoo...@huawei.com wrote: I wanna use a custom code after scanning a large table and prefer to run the code after scanning each region Exactly at what point you want

RE: Using HBase for Deduping

2013-02-15 Thread Anoop Sam John
? On Friday, February 15, 2013, Anoop Sam John wrote: When max versions set as 1 and duplicate key is added, the last added will win removing the old. This is what you want Rahul? I think from his explanation he needs the reverse way -Anoop- From

RE: Using Hbase for Dedupping

2013-02-14 Thread Anoop Sam John
Hi Rahul When you say that some events can come with duplicate UUID, what is the probability of such duplicate events? Is it like most of the events wont be unique and only few are duplicate? Also whether this same duplicated events come again and again (I mean same UUID for so

RE: Using HBase for Deduping

2013-02-14 Thread Anoop Sam John
When max versions set as 1 and duplicate key is added, the last added will win removing the old. This is what you want Rahul? I think from his explanation he needs the reverse way -Anoop- From: Asaf Mesika [asaf.mes...@gmail.com] Sent: Friday, February

RE: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space

2013-02-12 Thread Anoop Sam John
them to the expected. The question is: is it legal to change a KV I received from the InternalScanner before adding it the Result - i..e returning it from my own InternalScanner? On Feb 12, 2013, at 8:44 AM, Anoop Sam John wrote: Asaf, You have created a wrapper around

RE: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space

2013-02-12 Thread Anoop Sam John
loss, when I count the values and compare them to the expected. The question is: is it legal to change a KV I received from the InternalScanner before adding it the Result - i..e returning it from my own InternalScanner? On Feb 12, 2013, at 8:44 AM, Anoop Sam John wrote: Asaf

RE: Get on a row with multiple columns

2013-02-11 Thread Anoop Sam John
You mean the end point is geetting executed with high QoS? You checked with some logs? -Anoop- From: Varun Sharma [va...@pinterest.com] Sent: Monday, February 11, 2013 4:05 AM To: user@hbase.apache.org; lars hofhansl Subject: Re: Get on a row with

RE: restrict clients

2013-02-11 Thread Anoop Sam John
HBase supports Kerberos based authentication. Only those client nodes with a valid Kerberos ticket can connect with the HBase cluster. -Anoop- From: Rita [rmorgan...@gmail.com] Sent: Monday, February 11, 2013 6:37 PM To: user@hbase.apache.org Subject: Re:

RE: Custom preCompact RegionObserver crashes entire cluster on OOME: Heap Space

2013-02-11 Thread Anoop Sam John
Asaf, You have created a wrapper around the original InternalScanner instance created by the compaction flow? Where do the KV generated during the compaction process queue up before being written to the disk? Is this buffer configurable? When I wrote the Region Observer my assumption

RE: Start key and End key in HBase

2013-02-03 Thread Anoop Sam John
Can you pls make the question clear? You mean use in Scan? -Anoop- From: raviprasa...@polarisft.com [raviprasa...@polarisft.com] Sent: Monday, February 04, 2013 10:11 AM To: user@hbase.apache.org Subject: Start key and End key in HBase Hi all, Can

RE: Start key and End key in HBase

2013-02-03 Thread Anoop Sam John
When you do a scan without specifying any start/end keys, it is a full table scan. The scan from the client side will go through all the regions one after the other. But when you know the rowkey range that you want to scan you can specify that using start/end keys. This time the client will evaluate
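
The region pruning the reply describes can be pictured with a sorted map (an analogy only, not HBase client code): region start keys are sorted, and a scan's start/end keys select just the regions the range overlaps.

```java
import java.util.NavigableMap;
import java.util.TreeMap;

// Analogy: regions keyed by their sorted start keys; a scan with start/end
// keys touches only the overlapping slice instead of every region.
public class RangeScanSketch {
    public static void main(String[] args) {
        NavigableMap<String, String> regionsByStartKey = new TreeMap<>();
        regionsByStartKey.put("", "region-1"); // first region: empty start key
        regionsByStartKey.put("g", "region-2");
        regionsByStartKey.put("n", "region-3");
        regionsByStartKey.put("t", "region-4");

        String scanStart = "h", scanStop = "p";
        // region containing scanStart, plus any region starting before scanStop
        String first = regionsByStartKey.floorKey(scanStart);
        NavigableMap<String, String> touched =
            regionsByStartKey.subMap(first, true, scanStop, false);
        System.out.println(touched.values()); // prints [region-2, region-3]
    }
}
```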

RE: HBase Checksum

2013-01-31 Thread Anoop Sam John
You can check with HDFS level logs whether the checksum meta file is getting read to the DFS client? In the HBase handled checksum, this should not happen. Have you noticed any perf gain when you configure the HBase handled checksum option? -Anoop- From:

RE: HBase Checksum

2013-01-31 Thread Anoop Sam John
shortcircuit but I sure didn't see any performance increase. I haven't tried enabling hbase checksum yet but I'd like to be able to verify that works too. On Thu, Jan 31, 2013 at 9:55 PM, Anoop Sam John anoo...@huawei.com wrote: You can check with HDFS level logs whether the checksum meta file

RE: Pagination with HBase - getting previous page of data

2013-01-30 Thread Anoop Sam John
@Anil I could not understand that why it goes to multiple regionservers in parallel. Why it cannot guarantee results = page size( my guess: due to multiple RS scans)? If you have used it then maybe you can explain the behaviour? Scan from the client side never goes to multiple RS in parallel. Scan

RE: Pagination with HBase - getting previous page of data

2013-01-30 Thread Anoop Sam John
100 rows from the 2nd region is using extra time and resources. Why not ask for only the number of missing lines? JM 2013/1/30, Anoop Sam John anoo...@huawei.com: @Anil I could not understand that why it goes to multiple regionservers in parallel. Why it cannot guarantee results = page size( my

RE: Find the tablename in Observer

2013-01-28 Thread Anoop Sam John
Will the CoprocessorEnvironment reference in the start() method be instanceof RegionCoprocessorEnvironment too No. It will be a reference of RegionEnvironment. This is not a public class so you won't be able to do the casting. As I read your need, you want to get the table name just once and

RE: Find the tablename in Observer

2013-01-28 Thread Anoop Sam John
Oh sorry... Not checked the interface... We were doing it in postOpen()... Thanks Gary for correcting me...:) -Anoop- From: Gary Helmling [ghelml...@gmail.com] Sent: Tuesday, January 29, 2013 11:29 AM To: user@hbase.apache.org Subject: Re: Find the tablename

RE: paging results filter

2013-01-24 Thread Anoop Sam John
@Toby You mean to say that you need a mechanism for directly jumping to a page. Say you are in page#1 (1-20) now and you want to jump to page#4(61-80).. Yes this is not there in PageFilter... The normal way of next page , next page will work fine as within the server the next() calls on the

RE: Region server Memory Use is double the -Xmx setting

2013-01-23 Thread Anoop Sam John
Are you using compression for HFiles? Yes we are using MaxDirectMemorySize and we dont use off-heap cache. -Anoop- From: Buckley,Ron [buckl...@oclc.org] Sent: Wednesday, January 23, 2013 8:49 PM To: user@hbase.apache.org Subject: RE: Region server

RE: HBase split policy

2013-01-22 Thread Anoop Sam John
Jean, good topic. When a region splits it is the HFile(s) that get split. You know an HFile is logically split into n HFileBlocks and we will be having index meta data for these blocks at every HFile level. HBase will find the midkey from this block index data. It will take the mid block as the

RE: ResultCode.NEXT_ROW and scans with batching enabled

2013-01-22 Thread Anoop Sam John
Hi, In a scan, when a filter's filterKeyValue method returns ReturnCode.NEXT_ROW - does it actually skip to the next row or just the next batch It will go to the new row. In HBase 0.92 hasFilterRow has not been overridden for certain filters which effectively do filter out rows

RE: HBase split policy

2013-01-22 Thread Anoop Sam John
What will trigger the split? The things which can trigger a split 1. Explicit split call from the client side using admin API 2. A memstore flush 3. A compaction So even though there are no write operations happening on the region (no flushes), still a compaction performed for that region can

RE: Custom Filter and SEEK_NEXT_USING_HINT issue

2013-01-21 Thread Anoop Sam John
I suppose if scanning process has started at once on all regions, then I would find in log files at least one value per region, but I have found one value per region only for those regions, that resides before the particular one. @Eugeny - FuzzyFilter like any other filter works at the server

RE: Loading data, hbase slower than Hive?

2013-01-20 Thread Anoop Sam John
Austin, You are using HFileOutputFormat or TableOutputFormat? -Anoop- From: Austin Chungath [austi...@gmail.com] Sent: Monday, January 21, 2013 11:15 AM To: user@hbase.apache.org Subject: Re: Loading data, hbase slower than Hive? Thank you Tariq.

RE: Hbase Mapreduce- Problem in using arrayList of pust in MapFunction

2013-01-20 Thread Anoop Sam John
And also how can I use autoflush buffer client side in Map function for inserting data to Hbase Table? You are using TableOutputFormat right? Here autoFlush is turned OFF ... You can use the config param hbase.client.write.buffer to set the client side buffer size. -Anoop-
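
The parameter mentioned is set on the client side; a sketch of the hbase-site.xml entry (the 2 MB value is illustrative):

```xml
<property>
  <name>hbase.client.write.buffer</name>
  <!-- bytes; a larger buffer means fewer RPCs but more client-side memory -->
  <value>2097152</value>
</property>
```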

RE: Loading data, hbase slower than Hive?

2013-01-20 Thread Anoop Sam John
solutions and let you guys know what the problem was. I might have to rethink the Rowkey design. Regards, Austin. On Mon, Jan 21, 2013 at 11:24 AM, Anoop Sam John anoo...@huawei.com wrote: Austin, You are using HFileOutputFormat or TableOutputFormat? -Anoop

RE: ValueFilter and VERSIONS

2013-01-17 Thread Anoop Sam John
Can you make use of SingleColumnValueFilter. In this you can specify whether the condition to be checked only on the latest version or not. SCVF#setLatestVersionOnly ( true) -Anoop- From: Li, Min [m...@microstrategy.com] Sent: Friday, January 18, 2013

RE: ValueFilter and VERSIONS

2013-01-17 Thread Anoop Sam John
can't identify the qualifier. Thanks, Min -Original Message- From: Anoop Sam John [mailto:anoo...@huawei.com] Sent: Friday, January 18, 2013 2:28 PM To: user@hbase.apache.org Subject: RE: ValueFilter and VERSIONS Can you make use of SingleColumnValueFilter. In this you can specify whether

RE: Hbase as mongodb

2013-01-16 Thread Anoop Sam John
Such as I can directly say Mongodb to get me all the objects having timestamp value of xxx date where timestamp is a field in Json objects stored in Mongodb It is possible to store any data in HBase which can be converted into byte[]. Yes using filters one can perform above kind of query. There

RE: Hbase as mongodb

2013-01-16 Thread Anoop Sam John
library for converting Java Object to JSON String and eventually to byte[] and vice-versa; but that is not scan/query friendly, so we integrated Apache Solr to the stack to get that done. http://smart-cms.org Thank you, Imran On Wed, Jan 16, 2013 at 7:27 PM, Anoop Sam John anoo...@huawei.com

RE: Coprocessor / threading model

2013-01-15 Thread Anoop Sam John
Thanks Andrew. A detailed and useful reply Nothing more needed to explain the anti pattern.. :) -Anoop- From: Andrew Purtell [apurt...@apache.org] Sent: Wednesday, January 16, 2013 12:50 AM To: user@hbase.apache.org Subject: Re: Coprocessor /

RE: Maximizing throughput

2013-01-10 Thread Anoop Sam John
Hi You mind telling the configs that you changed and set? BTW which version of HBase you are using? -Anoop- From: Bryan Keller [brya...@gmail.com] Sent: Friday, January 11, 2013 10:01 AM To: user@hbase.apache.org Subject: Maximizing throughput I

RE: HBase - Secondary Index

2013-01-08 Thread Anoop Sam John
... On Jan 7, 2013, at 7:49 AM, Anoop Sam John anoo...@huawei.com wrote: Hi, It is inverted index based on column(s) value(s) It will be region wise indexing. Can work when some one knows the rowkey range or NOT. -Anoop- From: Mohit Anchlia [mohitanch

RE: HBase - Secondary Index

2013-01-07 Thread Anoop Sam John
@hbase.apache.org Subject: Re: HBase - Secondary Index Hi Anoop, Am I correct in understanding that this indexing mechanism is only applicable when you know the row key? It's not an inverted index truly based on the column value. Mohit On Sun, Jan 6, 2013 at 7:48 PM, Anoop Sam John anoo...@huawei.com wrote

RE: HBase - Secondary Index

2013-01-06 Thread Anoop Sam John
a drawback of any index approach. Thanks for the explanation. Shengjie On 28 December 2012 04:14, Anoop Sam John anoo...@huawei.com wrote: Do you have link to that presentation? http://hbtc2012.hadooper.cn/subject/track4TedYu4.pdf -Anoop- From

RE: responsetooslow from regionserver

2013-01-04 Thread Anoop Sam John
This logs warns that the operation at the region server side is taking too much time... This is not an error... Pls check your cluster. You have hot spotting ? Also can check the GC logs at that server side... -Anoop- From: hua beatls

RE: which API is to get table meta data in hbase

2012-12-27 Thread Anoop Sam John
But I say, there need some meta data which record how many row number in the give table, I say , hbase has this meta data, is it, And which API is to get it, and how to use API, There is no such meta data for a table. You can check whether you can do this work on your own using co

RE: HBase - Secondary Index

2012-12-27 Thread Anoop Sam John
. This is better option in our case of handling the scan index usage also at sever side. There is no index data fetch to client side.. What happens when regions get splitted ? do you update the startkey on the index table? -Shengjie On 14 December 2012 08:54, Anoop Sam John anoo...@huawei.com wrote

RE: HBase - Secondary Index

2012-12-27 Thread Anoop Sam John
Yes as you say when the no of rows to be returned becomes more and more, the latency will become more. Seeks within an HFile block are a somewhat expensive op now. (Not much but still) The new encoding prefix trie will be a huge bonus here. There the seeks will be flying.. [Ted also

RE: HBase - Secondary Index

2012-12-27 Thread Anoop Sam John
On Thu, Dec 27, 2012 at 7:33 PM, Anoop Sam John anoo...@huawei.com wrote: Yes as you say when the no of rows to be returned is becoming more and more the latency will be becoming more. seeks within an HFile block is some what expensive op now. (Not much but still) The new encoding prefix trie

RE: how to use API to statistic how many message has been store in the table in hbase

2012-12-26 Thread Anoop Sam John
So you want to know the no# of rows in a table? Have a look at AggregationClient#rowCount() -Anoop- From: tgh [guanhua.t...@ia.ac.cn] Sent: Thursday, December 27, 2012 7:51 AM To: user@hbase.apache.org Subject: how to use API to statistic how many message

RE: HBase - Secondary Index

2012-12-19 Thread Anoop Sam John
- Secondary Index Very cool design. Just curious, for the index did you write something custom or using an existing library like Lucene? -David On 12/4/12 3:10 AM, Anoop Sam John wrote: Hi All Last week I got a chance to present the secondary indexing solution what we have done

RE: MR missing lines

2012-12-19 Thread Anoop Sam John
deleteColumn was doing. JM 2012/12/19, Anoop Sam John anoo...@huawei.com: Jean: just one thought after seeing the description and the code.. Not related to the missing as such You want to delete the row fully right? My table is only one CF with one C with one version And your code is like

RE: HBase - Secondary Index

2012-12-18 Thread Anoop Sam John
: HBase - Secondary Index Hi Anoop, Please find my reply inline. Thanks, Anil Gupta On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John anoo...@huawei.com wrote: Hi Anil During the scan, there is no need to fetch any index data to client side. So there is no need to create any

RE: HBase - Secondary Index

2012-12-18 Thread Anoop Sam John
. Thanks, Anil Gupta On Sun, Dec 16, 2012 at 8:02 PM, Anoop Sam John anoo...@huawei.com wrote: Hi Anil During the scan, there is no need to fetch any index data to client side. So there is no need to create any scanner on the index table at the client side. This happens

RE: MR missing lines

2012-12-18 Thread Anoop Sam John
Jean: just one thought after seeing the description and the code.. Not related to the missing as such You want to delete the row fully right? My table is only one CF with one C with one version And your code is like Delete delete_entry_proposed = new Delete(key);

RE: HBase - Secondary Index

2012-12-16 Thread Anoop Sam John
prefix. Hope u got it now :) -Anoop- From: anil gupta [anilgupt...@gmail.com] Sent: Friday, December 14, 2012 11:31 PM To: user@hbase.apache.org Subject: Re: HBase - Secondary Index On Fri, Dec 14, 2012 at 12:54 AM, Anoop Sam John anoo...@huawei.com wrote

RE: Re:Re: Counter and Coprocessor Musing

2012-12-11 Thread Anoop Sam John
Agree with Azury Ted : He mentions some thing different than HBASE-5982. If the count of the rows maintained in another meta table, then getting the rows count from that will be much faster than the AggregateImplementation getRowNum I think. Specific to the use case some one can make this using

RE: Heterogeneous cluster

2012-12-10 Thread Anoop Sam John
But if the job is running there, it can also be considered as running locally, right? Or will it always be retrieved from the datanode linked to the RS hosting the region we are dealing with? Not sure I'm clear :( Hi Jean, Sorry I have not seen the history of this mailing thread.

RE: .META. region server DDOSed by too many clients

2012-12-05 Thread Anoop Sam John
is the META table cached just like other tables Yes Varun I think so. -Anoop- From: Varun Sharma [va...@pinterest.com] Sent: Thursday, December 06, 2012 6:10 AM To: user@hbase.apache.org; lars hofhansl Subject: Re: .META. region server DDOSed by too

RE: Reg:delete performance on HBase table

2012-12-05 Thread Anoop Sam John
Hi Manoj If I read you correctly, I think you want to aggregate some 3,4 days of data and those data you want to get deleted. Can you think of creating tables for this period (one table for 4 days) and aggregate and drop the table? Then for the next 4 days another table? Or another

HBase - Secondary Index

2012-12-04 Thread Anoop Sam John
Hi All Last week I got a chance to present the secondary indexing solution what we have done in Huawei at the China Hadoop Conference. You can see the presentation from http://hbtc2012.hadooper.cn/subject/track4Anoop%20Sam%20John2.pdf I would like to hear what others think on

RE: Data Locality, HBase? Or Hadoop?

2012-12-03 Thread Anoop Sam John
I think all is clear now.. Just to conclude, the data locality is feature provided by HDFS. When DFS client writes some data, hadoop will try to maintain the data locality. HBase region server writes and reads data via the DFS client which is in the same process as that of the RS. When the

RE: Long row + column keys

2012-12-03 Thread Anoop Sam John
Hi Varun It looks to be very clear that you need to use some sort of encoding scheme. PrefixDeltaEncoding would be fine may be.. You can see the other algos also like the FastDiff... and see how much space it can save in your case. Also suggest you can use the encoding for

RE: Changing column family in hbase

2012-11-28 Thread Anoop Sam John
If you are having data in the current table schema? You want some how to move the data to new CF? If yes I dont think it is possible. Some similar question was asked in the mailing list today. Is your scenario also same? -Anoop- From:

RE: Aggregation while Bulk Loading into HBase

2012-11-28 Thread Anoop Sam John
Hi, Looks like you do not want more than one table instance in Mapper. On one table instance you want a Get before doing the Put. See TableOutputFormat and try changing the code to implement your req and use this custom output format. -Anoop-

RE: Regarding rework in changing column family

2012-11-27 Thread Anoop Sam John
Also what about the current data in the table. Now all are under the single CF. Modifying the table with addition of a new CF will not move data to the new family! Remember HBase only deals with CF at the table schema level. There is no qualifiers in the schema as such. When data is

RE: Hbase Region Split not working with JAVA API

2012-11-15 Thread Anoop Sam John
Pls give your used command for Put as well as the java code for put. -Anoop- From: msmdhussain [msmdhuss...@gmail.com] Sent: Thursday, November 15, 2012 2:13 PM To: user@hbase.apache.org Subject: Hbase Region Split not working with JAVA API Hi, I

RE: Column Family level bloom filters

2012-11-05 Thread Anoop Sam John
about column family level bloom filters You mean column blooms right? [Bloom on rowkey cf+qualifier] Are these filters in memory or are they just persisted as part of the HFile or both All blooms will get persisted while writing the HFile. When the HFile is opened for read the bloom info will

RE: Bulk Loading - LoadIncrementalHFiles

2012-11-01 Thread Anoop Sam John
Hi Yes while doing the bulk load the table can be presplit. It will have the same number of reducers as the number of regions, one per region. Each HFile that the reducer generates will have a max size of the HFile max size configuration. You can see that while bulk loading also there will

RE: Filters for hbase scans require reboot.

2012-11-01 Thread Anoop Sam John
Yes Jonathan as of now we need a reboot.. Take a look at HBASE-1936. This is not completed. You can give your thoughts there and have a look at the patch/discussion... -Anoop- From: Jonathan Bishop [jbishop@gmail.com] Sent: Friday, November 02,

RE: Best technique for doing lookup with Secondary Index

2012-10-25 Thread Anoop Sam John
. Scan the secondary table by using prefix filter and startRow. How is the startRow determined for every query ? Regards Ram -Original Message- From: Anoop Sam John [mailto:anoo...@huawei.com] Sent: Thursday, October 25, 2012 10:15 AM To: user@hbase.apache.org Subject: RE: Best

RE: Hbase import Tsv performance (slow import)

2012-10-25 Thread Anoop Sam John
As per Anoop and Ram, WAL is not used with bulk loading so turning off WAL wont have any impact on performance. This is if HFileOutputFormat is being used.. There is a TableOutputFormat which also can be used as the OutputFormat for MR.. Here write to wal is applicable This one, instead of

RE: problem with fliter in scan

2012-10-25 Thread Anoop Sam John
Use SingleColumnValueFilter#setFilterIfMissing(true) s.setBatch(10); How many total columns in the Schema? When using the SingleColumnValueFilter, setBatch() might not work out always.. FYI -Anoop- From: jian fan [xiaofanhb...@gmail.com] Sent: Friday,

RE: Best technique for doing lookup with Secondary Index

2012-10-25 Thread Anoop Sam John
is rowkey B here? 1. Scan the secondary table by using prefix filter and startRow. How is the startRow determined for every query ? Regards Ram -Original Message- From: Anoop Sam John [mailto:anoo...@huawei.com] Sent: Thursday, October 25, 2012 10:15 AM To: user

RE: Best technique for doing lookup with Secondary Index

2012-10-24 Thread Anoop Sam John
I build the secondary table B using a prePut RegionObserver. Anil, In the prePut hook you call HTable#put()? Why use network calls from the server side here then? Can it not be handled from the client alone? You can have a look at the Lily project. Thoughts after seeing your idea on put and scan..

RE: repetita iuvant?

2012-10-24 Thread Anoop Sam John
Hi Can you tell more details? How much data is your scan going to retrieve? What is the time taken in each attempt? Can you observe the cache hit ratio? What is the memory available in the RS? Also the cluster details and regions -Anoop- From: surfer

RE: A question of storage structure for memstore?

2012-10-22 Thread Anoop Sam John
To be precise there will be one memstore per family per region.. If table having 2 CFs and there are 10 regions for that table then totally 2*10=20 memstores.. -Anoop- From: Kevin O'dell [kevin.od...@cloudera.com] Sent: Monday, October 22, 2012 5:55 PM

RE: HRegionInfo returns empty values.

2012-10-19 Thread Anoop Sam John
Actually how many regions are in your table? Only one region? In that case it will be having the startkey and endkey as empty.. So in your case what it prints looks to be correct. -Anoop- From: Henry JunYoung KIM [henry.jy...@gmail.com] Sent: Friday, October 19,

RE: Coprocessor end point vs MapReduce?

2012-10-18 Thread Anoop Sam John
A CP and Endpoints operate at a region level.. Any operation within one region we can perform using this.. I have seen in the below use case that along with the delete there was a need for inserting data into some other table also.. Also this was a kind of periodic action.. I really doubt how the

RE: Unable to add co-processor to table through HBase api

2012-10-18 Thread Anoop Sam John
hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(className, new Path(hdfs://hbasecluster/tmp/hbase_cdh4.jar), Coprocessor.PRIORITY_USER,map); Anil, Don't you have to modify the table calling Admin API?? ! Not seeing that code here... -Anoop-

RE: Unable to add co-processor to table through HBase api

2012-10-18 Thread Anoop Sam John
what you mean by have to modify the table calling Admin API??. Am i missing some other calls in my code? Thanks, Anil Gupta On Thu, Oct 18, 2012 at 9:43 PM, Anoop Sam John anoo...@huawei.com wrote: hAdmin.getTableDescriptor(Bytes.toBytes(tableName)).addCoprocessor(cla

RE: Where is code in hbase that physically delete a record?

2012-10-17 Thread Anoop Sam John
You can see the code in ScanQueryMatcher. Basically in a major compaction a scan happens over all the files... As per the delete markers, the deleted KVs won't come out of the scanner and thus get eliminated. Also in case of major compaction the delete markers themselves will get deleted (
