How to use avro with HBase now?

2013-02-01 Thread Andrey Kouznetsov
Hello HBase users! My project integrates with other projects that use Avro. The project has started using HBase, and it would be useful to have an HBase Avro API, but according to HBASE-6553 [https://issues.apache.org/jira/browse/HBASE-6553], Avro gateway support has been removed in HBase 0.96.0.

Re: How to use avro with HBase now?

2013-02-01 Thread Nicolas Liochon
Hi, IIRC, it's still there in 0.94. 0.96 is not yet released; it's still in development, so 0.94 is the version to use anyway. HBASE-6553 contains the patch to revert if you want to build your own 0.96 version with Avro. From the mail archive, the reasons for deprecating and then removing it were:

Hbase Read/Write throughput measure

2013-02-01 Thread Dalia Sobhy
Dear all, I want to measure the read/write throughput of some code on a cluster of 10 nodes. Is there any code or method to measure it? I have seen in a Cloudera-based presentation that HBase read/write throughput is millions of queries per second. Any help, please? Thanks. Best Regards,

Re: HBase Checksum

2013-02-01 Thread Jean-Marc Spaggiari
Hi Robert, That's perfectly fine, it was my next question ;) Anoop, I saw a 5% performance increase by activating HBase Checksum. Can I disable it again to retry the baseline and see the difference? Or now that it's there, is it too late? Also, regarding BlockReaderLocal, I don't find that in my

Re: Hbase Read/Write throughput measure

2013-02-01 Thread Mohammad Tariq
Hello Dalia, I think the easiest way to measure the read/write throughput is to use the PerformanceEvaluation tool that comes with the HBase distribution. It spawns a map-reduce job to do the reads/writes in parallel. Apart from this there are several other ways to benchmark your HBase

Re: Parallel scan in HBase

2013-02-01 Thread Mohammad Tariq
Hello Farrokh, Scans work sequentially with one region after the other. Scans from client side do not go to regionservers in parallel. And, for the second question, the code will run at the client side. Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Fri, Feb 1, 2013

Re: Parallel scan in HBase

2013-02-01 Thread Alexander Ignatov
You could use the Coprocessor framework. To do that, you have to implement your own coprocessor module and deploy it on each RegionServer. Here is an introductory article on how to use coprocessors: https://blogs.apache.org/hbase/entry/coprocessor_introduction -- Regards, Alexander Ignatov On

Re: Parallel scan in HBase

2013-02-01 Thread Jean-Marc Spaggiari
An MR job does almost that. The map method is called for each row, and you can have multiple jobs running at the same time. That's how the rowcounter works: it scans every row to count it, but spreads the work over all the nodes... Give it a look. JM 2013/2/1, Alexander Ignatov

Re: Parallel scan in HBase

2013-02-01 Thread lars hofhansl
The scan contract in HBase is that all rows are returned in order, so all regions have to be traversed in order as well. It would be nice to add some facility to HBase to perform the scanning in parallel. From: Farrokh Shahriari mohandes.zebeleh...@gmail.com

Re: Parallel scan in HBase

2013-02-01 Thread Mohammad Tariq
Do you need to scan each and every row within that range? Or do you need specific rows based on some filter? Warm Regards, Tariq https://mtariq.jux.com/ cloudfront.blogspot.com On Fri, Feb 1, 2013 at 9:16 PM, lars hofhansl la...@apache.org wrote: The scan contract in HBase is that all rows are

Re: Announcing Phoenix: A SQL layer over HBase

2013-02-01 Thread James Taylor
Thanks, everyone, and sorry to keep you waiting :-) We're using Phoenix in lots of different use cases and product areas at Salesforce: Product Metrics; Data Archival; Server Metrics and Monitoring; Trending over time-series data; Reporting. I'll go into more detail on a future post at my blog:

Re: Parallel scan in HBase

2013-02-01 Thread James Taylor
If you run a SQL query that does aggregation (i.e. uses a built-in aggregation function like COUNT or does a GROUP BY), Phoenix will orchestrate the running of a set of queries in parallel, segmented along your row key (driven by the start/stop key plus region boundaries). We take advantage of
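
The segmentation described in this message can be illustrated with a small, self-contained sketch. This is not Phoenix's code and it does not use the HBase client API: a sorted in-memory map stands in for the table, a sorted list of split keys stands in for the region boundaries, and the names ParallelCountSketch, segments and parallelCount are hypothetical. It only shows the idea of cutting a start/stop key range at region boundaries and running a COUNT-style aggregation over the pieces in parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical stand-in: a sorted map plays the role of an HBase table and
// a sorted list of split keys plays the role of its region boundaries.
class ParallelCountSketch {

    // Cuts [start, stop) at every region boundary that falls strictly inside it.
    // Assumes start < stop and that regionSplits is sorted in ascending order.
    static List<String[]> segments(String start, String stop, List<String> regionSplits) {
        List<String[]> out = new ArrayList<>();
        String lo = start;
        for (String split : regionSplits) {
            if (split.compareTo(lo) > 0 && split.compareTo(stop) < 0) {
                out.add(new String[] {lo, split});
                lo = split;
            }
        }
        out.add(new String[] {lo, stop});
        return out;
    }

    // Counts the rows in [start, stop) with one task per segment, then sums the partial counts.
    // The map is only read here, so sharing it between the worker threads is safe.
    static long parallelCount(NavigableMap<String, String> table, String start, String stop,
                              List<String> regionSplits) {
        List<String[]> segs = segments(start, stop, regionSplits);
        ExecutorService pool = Executors.newFixedThreadPool(segs.size());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (String[] seg : segs) {
                Callable<Long> task =
                        () -> Long.valueOf(table.subMap(seg[0], true, seg[1], false).size());
                partials.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> partial : partials) {
                total += partial.get();
            }
            return total;
        } catch (Exception e) {
            throw new RuntimeException("segment scan failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Because a count does not depend on row order, the partial results can be combined in any order; a scan that must return rows in key order would instead have to concatenate the segment results in segment order.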

Re: How to set proxy excludes on http component?

2013-02-01 Thread Christian Schäfer
Sorry...wrong mailing list :/ From: Christian Schäfer syrious3...@yahoo.de To: user@hbase.apache.org user@hbase.apache.org Sent: 13:39 Friday, 1 February 2013 Subject: How to set proxy excludes on http component? Hello there, just wonder how to set

Re: How to use avro with HBase now?

2013-02-01 Thread Andrew Purtell
We removed the Avro gateway because the implementation as contributed was a work in progress that was not subsequently maintained. You don't want it anyway, you should build your own or consider Kiji: http://www.kiji.org/ On Fri, Feb 1, 2013 at 12:28 AM, Andrey Kouznetsov

Meetup in March in San Francisco. Any preference for 3/12 or 3/13 or 3/14?

2013-02-01 Thread Stack
Any preference for date? You all good w/ it? AdRoll have kindly offered to host. If you want to talk anything hbasey, write me off list. Thanks, St.Ack

How to set proxy excludes on http component?

2013-02-01 Thread Christian Schäfer
Hello there, I just wonder how to set proxy excludes on the http component, as I didn't find any note in the docs. Proxy excludes that are set globally (e.g. System.setProperty("http.nonProxyHosts", "localhost")) are ignored by the http camel component. Any suggestions? Regards, Christian
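
For reference, the sketch below shows what the globally set property does at the JVM level, using only java.net classes. It does not configure the Camel http component, which, as the message reports, may not consult this property at all. The proxy host, port and exclusion list are hypothetical values, and NonProxyHostsSketch is an illustrative name.

```java
import java.net.Proxy;
import java.net.ProxySelector;
import java.net.URI;
import java.util.List;

class NonProxyHostsSketch {

    // Hypothetical proxy host, port and exclusion list, for illustration only.
    static void configure() {
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "8080");
        // Entries are separated by '|' and may use '*' as a wildcard.
        System.setProperty("http.nonProxyHosts", "localhost|*.internal");
    }

    // Returns the kind of proxy the JVM's default selector picks for the given URL.
    static Proxy.Type proxyTypeFor(String url) {
        List<Proxy> proxies = ProxySelector.getDefault().select(URI.create(url));
        return proxies.get(0).type();
    }
}
```

After configure(), calling proxyTypeFor on a host matched by the exclusion list and on one that is not shows whether the JVM itself honours the excludes. A client library that manages its own proxy settings has to be configured separately.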

Re: HBase Checksum

2013-02-01 Thread Robert Dyer
Yes that log is a debug level log, as I saw in the source. But I too enabled DEBUG and still never saw that log message. But I, unlike you, see absolutely no change in performance. One test I did however that makes me think it is actually enabled: if I submit from another user I start getting

Re: HBase Checksum

2013-02-01 Thread Jean-Marc Spaggiari
I have done the major compaction just to be sure. From what I understand, checksums are not there if this is not activated... So I think files need to be rewritten to have those checksums added. I will still try to find a way to see that from the logs. Worst case, I will add some logs directly

Re: HBase Checksum

2013-02-01 Thread lars hofhansl
Doing HBase level checksums (as opposed to HDFS level) will mostly yield results for random gets. Scans (like rowcounting and similar) will probably see a negligible improvement. In HDFS a block and its checksum are stored in different local files on each datanode. So loading a block requires 2
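
The contrast with HDFS keeping a block and its checksum in separate files can be illustrated with a toy block layout in which the checksum travels in the same byte array as the data. This is a minimal sketch only, not the HFile format; InlineChecksumSketch, writeBlock and readBlock are hypothetical names, and CRC32 is used simply because it ships with the JDK.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

class InlineChecksumSketch {

    // Returns the payload followed by its 4-byte CRC32, so a single read fetches both.
    static byte[] writeBlock(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        ByteBuffer buf = ByteBuffer.allocate(payload.length + 4);
        buf.put(payload);
        buf.putInt((int) crc.getValue());
        return buf.array();
    }

    // Recomputes the checksum over the payload and compares it with the stored one.
    // Throws IllegalStateException if the block was corrupted.
    static byte[] readBlock(byte[] block) {
        int payloadLength = block.length - 4;
        byte[] payload = new byte[payloadLength];
        ByteBuffer buf = ByteBuffer.wrap(block);
        buf.get(payload);
        int stored = buf.getInt();
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payloadLength);
        if ((int) crc.getValue() != stored) {
            throw new IllegalStateException("checksum mismatch");
        }
        return payload;
    }
}
```

With this layout, verifying a block needs nothing beyond the bytes already read for the block itself.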

Re: HBase Checksum

2013-02-01 Thread Jean-Marc Spaggiari
Thanks for the clarification, Lars. Is there any UI or specific startup log we can check to validate that it's activated? If not, would it be nice to have something like that? 2013/2/1, lars hofhansl la...@apache.org: Doing HBase level checksums (as opposed to HDFS level) will mostly yield

Re: Meetup in March in San Francisco. Any preference for 3/12 or 3/13 or 3/14?

2013-02-01 Thread Jean-Daniel Cryans
Seems a bit close to the other meetup on 02/28 in the South Bay but maybe because it's in SF it's ok. No personal preference since I'll be on the other side of the pond. J-D On Fri, Feb 1, 2013 at 11:06 AM, Stack st...@duboce.net wrote: Any preference for date? You all good w/ it? AdRoll

Re: Meetup in March in San Francisco. Any preference for 3/12 or 3/13 or 3/14?

2013-02-01 Thread Andrew Purtell
Same here, I'll be in Asia. On Fri, Feb 1, 2013 at 1:14 PM, Jean-Daniel Cryans jdcry...@apache.org wrote: Seems a bit close to the other meetup on 02/28 in the South Bay but maybe because it's in SF it's ok. No personal preference since I'll be on the other side of the pond. J-D On Fri,

Re: Hbase Read/Write throughput measure

2013-02-01 Thread Dalia Sobhy
Dear Mohamed, I checked this link: http://wiki.apache.org/hadoop/Hbase/PerformanceEvaluation But there is no clue how to use the commands; note that I am using Cloudera Manager. I also couldn't understand how to use the other Yahoo link. Any help, please? Sent from my iPad On Feb 1, 2013,

Re: Hbase Read/Write throughput measure

2013-02-01 Thread Jean-Marc Spaggiari
Once you have figured out how the command works, you will have to understand the output ;) This is how I just tried it from my HBase directory: bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation randomRead 10 And this is the output (last lines): 1006632656314 10171183
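
To measure the throughput of your own code rather than the bundled benchmark, the arithmetic is simply operations divided by elapsed time. The sketch below assumes a plain java.util.Map as a stand-in for the store being measured; ThroughputSketch, opsPerSecond and measure are hypothetical names, and against a real cluster the put/get calls would be replaced by the client calls under test.

```java
import java.util.Map;

class ThroughputSketch {

    // Operations per second, given an operation count and the elapsed time in nanoseconds.
    static double opsPerSecond(long ops, long elapsedNanos) {
        // Guard against a zero reading from a very short run.
        long nanos = Math.max(1L, elapsedNanos);
        return ops / (nanos / 1_000_000_000.0);
    }

    // Times `ops` writes followed by `ops` reads against the stand-in store.
    // Returns {writes per second, reads per second}.
    static double[] measure(Map<Integer, Integer> store, int ops) {
        long t0 = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            store.put(i, i);
        }
        long t1 = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            store.get(i);
        }
        long t2 = System.nanoTime();
        return new double[] {opsPerSecond(ops, t1 - t0), opsPerSecond(ops, t2 - t1)};
    }
}
```

A single short run like this is noisy; repeating the measurement and discarding the first few runs gives steadier numbers.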

Re: HBase Checksum

2013-02-01 Thread lars hofhansl
Agreed. One should be able to monitor these things. Mind filing a jira describing your experience? From: Jean-Marc Spaggiari jean-m...@spaggiari.org To: user@hbase.apache.org; lars hofhansl la...@apache.org Sent: Friday, February 1, 2013 1:09 PM Subject: Re:

Re: Parallel scan in HBase

2013-02-01 Thread Farrokh Shahriari
Thank you guys. @Mohammad: Yeah, I need to retrieve all the rows and compare each of them to a specific value. As I understand it, HBase doesn't support parallel scans by default, but I can implement one on my own through coprocessors, knowing the start/end row key of each region. Am I correct?