Re: performance problem during read

2011-07-16 Thread Ted Yu
Yes. On Saturday, July 16, 2011, Mingjian Deng wrote: > Do you mean I need to open a new issue? > > 2011/7/16 Stack > >> Yes. Please file an issue. A few fellas are messing with block cache >> at the moment so they might be up for taking a detour to figure the >> why on your interesting observati

Re: performance problem during read

2011-07-16 Thread Mingjian Deng
Do you mean I need to open a new issue? 2011/7/16 Stack > Yes. Please file an issue. A few fellas are messing with block cache > at the moment so they might be up for taking a detour to figure the > why on your interesting observation. > > Thanks, > St.Ack > > On Thu, Jul 14, 2011 at 8:41 PM, Min

Re: Hash indexing of HFiles

2011-07-16 Thread Eric Charles
On 16/07/11 14:11, Michel Segel wrote: Eric, It depends... 1) does it work well enough? 2) I'm a contractor so it's not my call. It's up to my client. Having said that, I think that if our tests go well, it might get out to GitHub. At least that's our plan. But there's definitely more work to

Re: hbase table as a queue.

2011-07-16 Thread Stack
Yes. I should have mentioned this. Thanks Ted. On Jul 16, 2011, at 17:52, Ted Dunning wrote: > Up to a pretty high transaction rate, you can simply use Zookeeper, > especially if you check out a block of tasks at once. > > With blocks of 100-1000, you should be able to handle a million even

Re: hbase table as a queue.

2011-07-16 Thread Ted Dunning
Up to a pretty high transaction rate, you can simply use Zookeeper, especially if you check out a block of tasks at once. With blocks of 100-1000, you should be able to handle a million events per second with very simple ZK data structures. On Sat, Jul 16, 2011 at 1:24 PM, Stack wrote: > Do not
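The "check out a block of tasks at once" idea can be sketched as below. In ZooKeeper this would typically be a shared counter znode advanced with a versioned conditional write (`setData` with an expected version), retried on conflict; here an `AtomicLong` with `compareAndSet` stands in for that znode so the sketch runs on its own. `TaskBlockAllocator` and `claimBlock` are hypothetical names, not part of any ZooKeeper API.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: an AtomicLong stands in for a shared counter znode.
// Each worker claims a contiguous block of task ids with one conditional
// update, so the coordination cost is paid once per block, not once per task.
final class TaskBlockAllocator {
    private final AtomicLong nextTaskId = new AtomicLong(0);
    private final long totalTasks;

    TaskBlockAllocator(long totalTasks) {
        this.totalTasks = totalTasks;
    }

    /** Returns {start, endExclusive} of the claimed block, or null when no tasks remain. */
    long[] claimBlock(int blockSize) {
        while (true) {
            long start = nextTaskId.get();
            if (start >= totalTasks) {
                return null; // queue drained
            }
            long end = Math.min(start + blockSize, totalTasks);
            // Like a versioned setData: succeeds only if no other worker moved
            // the counter since we read it; otherwise re-read and retry.
            if (nextTaskId.compareAndSet(start, end)) {
                return new long[] {start, end};
            }
        }
    }
}
```

With blocks of 1000, a million tasks cost roughly a thousand coordination round trips instead of a million, which is the effect Ted describes.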

Re: compressions and security

2011-07-16 Thread Stack
What Doug said but also, if cells are of a size and/or data type that is compressible (no point compressing small stuff, same for trying to compress image types), then there are big wins all around if you have the application compress the cells before putting them into hbase; network traffic is le
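A minimal sketch of application-side cell compression, assuming GZIP from the JDK (not LZO) and a hypothetical one-byte header that marks whether the stored value is compressed. Small values and types the caller flags as incompressible (e.g. already-compressed images) are stored raw. `CellCodec` and the 1 KB threshold are illustrative assumptions; the resulting bytes would be what the application hands to its put.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical codec: compress a cell value in the client before writing it.
final class CellCodec {
    static final int MIN_COMPRESS_BYTES = 1024; // assumed threshold; tune per workload
    private static final byte RAW = 0;
    private static final byte GZIP = 1;

    static byte[] encode(byte[] value, boolean compressibleType) {
        if (compressibleType && value.length >= MIN_COMPRESS_BYTES) {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            bos.write(GZIP); // header byte
            try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
                gz.write(value);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            byte[] compressed = bos.toByteArray();
            if (compressed.length <= value.length) {
                return compressed; // keep it only if it actually saved space
            }
        }
        byte[] raw = new byte[value.length + 1];
        raw[0] = RAW;
        System.arraycopy(value, 0, raw, 1, value.length);
        return raw;
    }

    static byte[] decode(byte[] stored) {
        if (stored[0] == RAW) {
            byte[] value = new byte[stored.length - 1];
            System.arraycopy(stored, 1, value, 0, value.length);
            return value;
        }
        try (InputStream in = new GZIPInputStream(
                new ByteArrayInputStream(stored, 1, stored.length - 1))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the bytes are compressed before they leave the client, the saving applies on the wire and in the RegionServer's memory, not only on disk.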

Re: User of FilterList

2011-07-16 Thread Jack Levin
Be mindful that if you are using a scanner with filters, RowKey remains the index of the table, and that filter just filters your results based on how you run your scanner, similarly to "cat file | grep filter", where if "file" is your table and has many lines (rows), your scan might be very ineffi
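The "cat file | grep" point can be illustrated with a small in-memory model: a `TreeMap` plays the table (rows sorted by key, as in HBase), and the sketch counts how many rows each strategy touches. In HBase itself the range version corresponds to giving the `Scan` a start row (inclusive) and stop row (exclusive); `ScanModel` is a hypothetical stand-in, not HBase code.

```java
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.Predicate;

// Hypothetical model of a table sorted by row key, counting rows examined.
final class ScanModel {
    final NavigableMap<String, String> rows = new TreeMap<>();
    int rowsExamined;

    /** Filter-only scan: every row is read and then tested, like "cat file | grep". */
    int filterScan(Predicate<String> rowFilter) {
        rowsExamined = 0;
        int matches = 0;
        for (String key : rows.keySet()) {
            rowsExamined++;
            if (rowFilter.test(key)) {
                matches++;
            }
        }
        return matches;
    }

    /** Range scan: the sorted row key bounds the work to [startRow, stopRow). */
    int rangeScan(String startRow, String stopRow) {
        NavigableMap<String, String> range = rows.subMap(startRow, true, stopRow, false);
        rowsExamined = range.size(); // only rows inside the key range are touched
        return range.size();
    }
}
```

For 1,000 rows `row0000`..`row0999`, both `filterScan(k -> k.startsWith("row05"))` and `rangeScan("row05", "row06")` return 100 rows, but the filter version examines all 1,000 while the range version examines only 100.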

Re: hbase table as a queue.

2011-07-16 Thread Stack
I learned Friday that our fellas on the frontend are using an hbase table to do simple queuing. They insert stuff to be processed by distributed processes and when processes are done with the work, they'll remove the processed element from the hbase table. They are queuing, processing, and remov
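The queue, process, remove cycle described above can be modeled as follows. One gap in the plain pattern is that two workers may pick up the same row; this sketch assumes an extra "claimed" marker set with an atomic put-if-absent (in HBase a conditional write such as `checkAndPut` could play that role, which is an assumption about an adaptation, not what the frontend team does). `WorkTable` and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical in-memory model of a table used as a work queue.
final class WorkTable {
    // row key -> payload, kept sorted like an HBase table
    private final ConcurrentSkipListMap<String, String> rows = new ConcurrentSkipListMap<>();
    // row key -> worker id; stands in for a "claimed" column
    private final ConcurrentHashMap<String, String> claims = new ConcurrentHashMap<>();

    void enqueue(String rowKey, String payload) {
        rows.put(rowKey, payload);
    }

    /** Claims up to n unclaimed rows; putIfAbsent acts as the conditional write. */
    List<String> claim(String workerId, int n) {
        List<String> claimed = new ArrayList<>();
        for (String key : rows.keySet()) {
            if (claimed.size() == n) {
                break;
            }
            if (claims.putIfAbsent(key, workerId) == null) {
                claimed.add(key);
            }
        }
        return claimed;
    }

    /** Removes the row (and its claim) once processing is done. */
    void complete(String rowKey) {
        rows.remove(rowKey);
        claims.remove(rowKey);
    }

    int size() {
        return rows.size();
    }
}
```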

Re: Hash indexing of HFiles

2011-07-16 Thread Stack
On Fri, Jul 15, 2011 at 10:06 AM, Claudio Martella wrote: > On 7/15/11 6:24 PM, Stack wrote: >> How do you figure the N in the below Claudio? > N is the total amount of pairs in the sequence file. You know that when > you finish flushing a memstore or compacting files. So a perfect index? If thi
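Claudio's observation is that N is known once a memstore flush or compaction finishes, so a hash index can be sized exactly at write time. The sketch below assumes ordinary open addressing with linear probing (not a minimal perfect hash) mapping each key to a hypothetical file offset; `HFileHashIndex` is an illustrative name, not an HBase class.

```java
// Hypothetical hash index built after a file is written, when N is known.
final class HFileHashIndex {
    private final String[] keys;
    private final long[] offsets;

    HFileHashIndex(String[] keysInFile, long[] fileOffsets, double loadFactor) {
        if (loadFactor <= 0 || loadFactor > 1) {
            throw new IllegalArgumentException("loadFactor must be in (0, 1]");
        }
        int n = keysInFile.length;
        int capacity = Math.max(1, (int) Math.ceil(n / loadFactor)); // sized from N
        keys = new String[capacity];
        offsets = new long[capacity];
        for (int i = 0; i < n; i++) {
            int slot = slotFor(keysInFile[i]);
            while (keys[slot] != null) {
                slot = (slot + 1) % capacity; // linear probing
            }
            keys[slot] = keysInFile[i];
            offsets[slot] = fileOffsets[i];
        }
    }

    private int slotFor(String key) {
        return Math.floorMod(key.hashCode(), keys.length);
    }

    /** Returns the stored offset, or -1 if the key is not in this file. */
    long lookup(String key) {
        int slot = slotFor(key);
        for (int probes = 0; probes < keys.length && keys[slot] != null; probes++) {
            if (keys[slot].equals(key)) {
                return offsets[slot];
            }
            slot = (slot + 1) % keys.length;
        }
        return -1;
    }
}
```

A lower load factor trades memory for shorter probe chains; a true perfect index would need a different construction.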

Re: Hash indexing of HFiles

2011-07-16 Thread Michel Segel
Eric, It depends... 1) does it work well enough? 2) I'm a contractor so it's not my call. It's up to my client. Having said that, I think that if our tests go well, it might get out to GitHub. At least that's our plan. But there's definitely more work to be done in terms of testing and tweaking

Re: User of FilterList

2011-07-16 Thread Arun Sanjay J
Yes. It works fine. Thx for your response. I was just skeptical about whether it would work. The javadoc of FilterList as well confirms the same, " Since you can use Filter Lists as children of Filter Lists, you can create a hierarchy of filters to be evaluated.". Sanjay On Fri, Jul 15, 2011 at 3
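The nesting the javadoc describes is the composite pattern. The pure-Java model below (not the HBase API) mirrors `FilterList`'s two operators, `MUST_PASS_ALL` and `MUST_PASS_ONE`, and lets a list contain other lists; `FilterNode` is a hypothetical name and it tests row keys only.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of a filter list whose children may be filter lists.
final class FilterNode implements Predicate<String> {
    enum Op { MUST_PASS_ALL, MUST_PASS_ONE }

    private final Op op;
    private final List<Predicate<String>> children;

    @SafeVarargs
    FilterNode(Op op, Predicate<String>... children) {
        this.op = op;
        this.children = Arrays.asList(children);
    }

    @Override
    public boolean test(String rowKey) {
        if (op == Op.MUST_PASS_ALL) {
            return children.stream().allMatch(c -> c.test(rowKey)); // logical AND
        }
        return children.stream().anyMatch(c -> c.test(rowKey)); // logical OR
    }
}
```

For example, `new FilterNode(MUST_PASS_ALL, k -> k.startsWith("user"), new FilterNode(MUST_PASS_ONE, k -> k.endsWith("1"), k -> k.endsWith("2")))` accepts `user1` but rejects `user3` and `item1`.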

Re: hbase table as a queue.

2011-07-16 Thread Jack Levin
One thing I need to point out is that we do not need the Queue items to be worked on in order, so there is no traditional head and tail of the Queue. The Queue table is simply a set of work orders that can be fetched randomly or by applying a scan for a particular set of Rows that can even come o
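Since order does not matter, one way to fetch "randomly" is for each worker to start its scan at a random point in the key space and wrap around, so workers spread out instead of contending for one end of the table. This sketch assumes row keys begin with a two-hex-digit prefix (an assumption, e.g. a hash prefix); `RandomStartFetcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.Random;

// Hypothetical sketch: fetch a batch of work orders starting at a random key.
final class RandomStartFetcher {
    static List<String> fetch(NavigableSet<String> rowKeys, String randomStart, int batch) {
        List<String> out = new ArrayList<>();
        for (String k : rowKeys.tailSet(randomStart, true)) { // from the start point onward
            if (out.size() == batch) {
                return out;
            }
            out.add(k);
        }
        for (String k : rowKeys.headSet(randomStart, false)) { // wrap around to the beginning
            if (out.size() == batch) {
                return out;
            }
            out.add(k);
        }
        return out;
    }

    /** Assumes row keys start with a two-hex-digit prefix such as "7f". */
    static String randomStartKey(Random rnd) {
        return String.format("%02x", rnd.nextInt(256));
    }
}
```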

hbase table as a queue.

2011-07-16 Thread Jack Levin
Hello, we are thinking about using an HBase table as a simple queue which will dispatch the work for a MapReduce job, as well as real-time fetching of data to present to the end user. In simple terms, suppose you had a data source table and a queue table. The queue table has a smaller set of Rows that p

Re: php to thrift vs java api

2011-07-16 Thread Jack Levin
Yes, we are using the latest .so, but unfortunately it does not make any difference. I think this is just a matter of the language: PHP is stateless, whereas Java runs as a servlet inside the JVM with hot JARs. With PHP, even if IO to Thrift is not an issue itself, given the task, say, merge join two arr

Re: designing date range schema

2011-07-16 Thread Doug Meil
Hi there- I just submitted a patch to the book here... https://issues.apache.org/jira/browse/HBASE-4110 You can see the contents in the patch. On 7/15/11 3:48 PM, "large data" wrote: >thank Doug! > >Writing to hbase would be driven by asyn events (rather than M/R jobs) >fired >on user activ
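The patch contents are not reproduced here. As one commonly used approach for event-driven writes queried by date range (an assumption, not necessarily what HBASE-4110 recommends), the row key can combine a user id with a reversed, zero-padded timestamp so a user's newest events sort first and a date range becomes a start/stop row pair. `ActivityKeys` and the `|` separator are hypothetical.

```java
// Hypothetical row-key layout: <userId>|<Long.MAX_VALUE - epochMillis>,
// zero-padded to 19 digits so string order matches numeric order.
final class ActivityKeys {
    static String rowKey(String userId, long epochMillis) {
        return userId + "|" + String.format("%019d", Long.MAX_VALUE - epochMillis);
    }

    /**
     * Start row (inclusive) and stop row (exclusive) covering events with
     * fromMillis <= t <= toMillis. Assumes fromMillis > 0.
     */
    static String[] scanRange(String userId, long fromMillis, long toMillis) {
        // Reversal flips the order: the newest time (toMillis) gives the smallest key.
        return new String[] {rowKey(userId, toMillis), rowKey(userId, fromMillis - 1)};
    }
}
```

For user `u1` with events at 1000, 2000 and 3000 ms, `scanRange("u1", 1500, 3000)` covers the 3000 and 2000 events, in that order.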

Re: compressions and security

2011-07-16 Thread Doug Meil
Hi there, see this in the book: http://hbase.apache.org/book.html#compression And this... http://hbase.apache.org/book.html#trouble.client.longpauseswithcompression And see this thread which was the original discussion on the long client pause entry. http://search-hadoop.com/m/WUnLM6ojHm1/Lon

compressions and security

2011-07-16 Thread Sam Seigal
Hi All, A quick question on compression. I saw that HBase can use LZO compression for storing data into the HFile. Has anyone done experiments with using compression at the application level instead of letting HBase handle it? Are there advantages/disadvantages of this approach? Is it

Re: Hash indexing of HFiles

2011-07-16 Thread Eric Charles
On 15/07/11 16:48, Michael Segel wrote: Claudio, I'm not sure on how to answer this... Yes, we've got a prototype of a Lucene on HBase w Spatial that we're starting to test. That's Cool Michael. Is there a chance to read more on your prototype ? ;) With respect to hashing... In one pro

Re: HBase backup and outage scenarios in practice?

2011-07-16 Thread Eric Charles
On 15/07/11 16:39, Michael Segel wrote: I should clarify. 4 months ago is 'old' with respect to the rate of change within Hadoop/HBase. (It's a scary good thing.) Ted is correct that I'm looking at MapRTech's ability to do fast snapshots. It's important to note that while the Apache stack has