Re: HBase tuning

2012-10-05 Thread Amandeep Khurana
Mohit, getting the maximum performance out of HBase isn't just about tuning the cluster. There are several other factors to take into account, the two most important being: 1. schema design (the most important factor) and 2. how you are using the APIs. Starting with the default configs is okay. A
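
A minimal sketch of one API-usage factor along these lines -- enabling the client-side write buffer for bulk puts. The table name, column family, and row count are assumptions for illustration, not from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BufferedPuts {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");   // assumed table name
        table.setAutoFlush(false);                    // buffer puts client-side
        table.setWriteBufferSize(2 * 1024 * 1024);    // flush roughly every 2 MB
        for (int i = 0; i < 100000; i++) {
          Put put = new Put(Bytes.toBytes("row-" + i));
          put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("value-" + i));
          table.put(put);                             // no RPC until the buffer fills
        }
        table.flushCommits();                         // push whatever is left in the buffer
        table.close();
      }
    }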

RE: Lucene instead of HFiles?

2012-10-05 Thread Fuad Efendi
If you don't like HFiles, and prefer Solr instead, consider Map. It is very nice... :-) What about EhCache? Still synchronized? Use LinkedHashMap... You just need an "inverted table" for a search by secondary index, and you are comparing Lucene with HTable... wow... everything depen

Re: Lucene instead of HFiles?

2012-10-05 Thread Otis Gospodnetic
Hi Lars, Yeah, maybe. Somewhere in the back of my head was a completely fuzzy idea that if one were to sneak in Lucene at that low level one could get that full-text search over HBase data that comes up periodically. Also, I was thinking, having Lucene down there could make it possible to get ad-

RE: Lucene instead of HFiles?

2012-10-05 Thread Fuad Efendi
Lucene sucks with traditional "secondary indices" for traditional tables... engineering overhead, too much... and you indeed already have kind of "secondary indices" with HFile and Bloom Filter structure... just design "secondary" Bloom filters etc... Yes, Lucene/Solr already implement this fu

Re: Lucene instead of HFiles?

2012-10-05 Thread Otis Gospodnetic
Hi Renaud, On Fri, Oct 5, 2012 at 4:48 AM, Renaud Delbru wrote: > Hi, > > With respect to point 3, I know there is a new codec in Lucene 4.0 for > append-only filesystem such as hdfs (LUCENE-2373) Yeah. Though I think nobody wants to search indices directly in HDFS for performance reasons. > A

Re: Lucene instead of HFiles?

2012-10-05 Thread Otis Gospodnetic
Hi, On Fri, Oct 5, 2012 at 2:36 AM, Adrien Mogenet wrote: > "Don't bother trying this in production" ;-) > > 1. Are you sure lookups by key are faster? No clue. But I also didn't say it's faster, just fast. :) > 2. Updating Lucene files in a lock-free manner and ensuring good > concurrency can

Re: HBase tuning

2012-10-05 Thread Mohit Anchlia
I have timeseries data and each row has up to 1000 cols. I just started with the defaults and I have not tuned any parameters on the client or server. My reads read all the cols in a row, but the request for a given row is completely random. On Fri, Oct 5, 2012 at 6:05 PM, Kevin O'dell wrote: > Mohit

Re: HBase tuning

2012-10-05 Thread Kevin O'dell
Mohit, Michael is right; most parameters usually go one way or the other depending on what you are trying to accomplish. Memstore - raise for high writes; Blockcache - raise for high reads; hbase blocksize - higher for sequential workloads, lower for random; client caching - lower for really wide ro
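
A short sketch of where some of those knobs live in the client/admin API of that era; the column family name and values are assumed, and the memstore/block-cache fractions are typically set cluster-wide in hbase-site.xml (hbase.regionserver.global.memstore.upperLimit, hfile.block.cache.size):

    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.client.Scan;

    public class TuningKnobs {
      public static void main(String[] args) {
        // Column-family level: block size and block cache (assumed family "cf").
        HColumnDescriptor cf = new HColumnDescriptor("cf");
        cf.setBlocksize(16 * 1024);        // smaller blocks favor random reads
        cf.setBlockCacheEnabled(true);     // keep hot blocks cached for read-heavy work

        // Client level: scanner caching -- lower it for very wide rows so a
        // single RPC does not have to carry too many cells at once.
        Scan scan = new Scan();
        scan.setCaching(10);
      }
    }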

Re: HBase tuning

2012-10-05 Thread Michael Segel
Depends. What sort of system are you tuning? Sorry, but we have to start somewhere and if we don't know what you have in terms of hardware, we don't have a good starting point. On Oct 5, 2012, at 7:47 PM, Mohit Anchlia wrote: > Do most people start out with default values and then tune HBase

Re: Issue with column-counting filters accepting multiple versions of a column

2012-10-05 Thread Andrew Olson
Jira filed: https://issues.apache.org/jira/browse/HBASE-6954

Re: bulk deletes

2012-10-05 Thread lars hofhansl
Does it work? :) How did you do the deletes before? I assume you used the HTable.delete(List) API? (Doesn't really help you, but) In 0.92+ you could hook up a coprocessor into the compactions and simply filter out any KVs you want to have removed. -- Lars F
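
For reference, a minimal sketch of the HTable.delete(List) batch API mentioned above; the table name and the source of the row keys are assumptions:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.Delete;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.util.Bytes;

    public class BatchDelete {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "mytable");       // assumed table name
        List<Delete> deletes = new ArrayList<Delete>();
        for (int i = 0; i < 10000; i++) {                 // stand-in for the real key source
          deletes.add(new Delete(Bytes.toBytes("row-" + i)));
        }
        table.delete(deletes);   // batches the deletes instead of one call per row
        table.close();
      }
    }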

Re: bulk deletes

2012-10-05 Thread Jacques
While I didn't spend a lot of time with your code, I believe your approach is sound. Depending on your consistency requirements, I would suggest you consider utilizing a coprocessor to handle the deletes. Coprocessors can intercept compaction scans. Then just shift your delete logic to be an add
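
A rough sketch of the compaction-hook idea, assuming HBase 0.92+ coprocessors. The exact RegionObserver/InternalScanner signatures vary between releases (some declare additional next(...) overloads that a real wrapper must also delegate), and shouldDelete() is a hypothetical stand-in for the application's own delete logic:

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.List;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
    import org.apache.hadoop.hbase.coprocessor.ObserverContext;
    import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
    import org.apache.hadoop.hbase.regionserver.InternalScanner;
    import org.apache.hadoop.hbase.regionserver.Store;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DeleteOnCompactObserver extends BaseRegionObserver {
      @Override
      public InternalScanner preCompact(ObserverContext<RegionCoprocessorEnvironment> e,
          Store store, final InternalScanner scanner) {
        // Wrap the compaction scanner and drop the KVs we want gone; anything
        // removed here is simply not rewritten into the new HFile.
        return new InternalScanner() {
          public boolean next(List<KeyValue> results) throws IOException {
            boolean more = scanner.next(results);
            Iterator<KeyValue> it = results.iterator();
            while (it.hasNext()) {
              if (shouldDelete(it.next())) {
                it.remove();
              }
            }
            return more;
          }
          public boolean next(List<KeyValue> results, int limit) throws IOException {
            return next(results);
          }
          // NOTE: some releases declare further next(...) overloads on
          // InternalScanner; delegate those the same way.
          public void close() throws IOException {
            scanner.close();
          }
        };
      }

      // Placeholder predicate -- the real test would encode the business logic
      // from the original question (here: a hypothetical "purge" qualifier).
      private boolean shouldDelete(KeyValue kv) {
        return Bytes.equals(kv.getQualifier(), Bytes.toBytes("purge"));
      }
    }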

bulk deletes

2012-10-05 Thread Paul Mackles
We need to do deletes pretty regularly and sometimes we could have hundreds of millions of cells to delete. TTLs won't work for us because we have a fair amount of bizlogic around the deletes. Given their current implementation (we are on 0.90.4), this delete process can take a really long time

Re: questions to append data always from table end!!!

2012-10-05 Thread Jean-Marc Spaggiari
Hi, HBase tables are sorted alphabetically. So to add to the end, just take the biggest key and increment the last byte. But by doing so, your inserts are all going to go to the same region server until it moves on to the next one, and you will end up hotspotting one server, which will result in
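
A small sketch contrasting the two key layouts being discussed; the bucket count and entity id used for the salt are assumptions for illustration:

    import org.apache.hadoop.hbase.util.Bytes;

    public class RowKeys {
      public static void main(String[] args) {
        // Purely increasing key: always sorts after everything else, so every
        // insert lands in the last region (the hotspotting described above).
        byte[] sequentialKey = Bytes.toBytes(System.currentTimeMillis());

        // Salted key: a small bucket prefix spreads writes over N regions,
        // at the cost of N parallel scans when reading back in time order.
        String entityId = "sensor-42";             // assumed source of the salt
        int buckets = 16;                          // assumed bucket count
        byte bucket = (byte) (Math.abs(entityId.hashCode()) % buckets);
        byte[] saltedKey = Bytes.add(new byte[] { bucket },
                                     Bytes.toBytes(System.currentTimeMillis()));

        System.out.println(Bytes.toStringBinary(sequentialKey));
        System.out.println(Bytes.toStringBinary(saltedKey));
      }
    }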

questions to append data always from table end!!!

2012-10-05 Thread JUN YOUNG KIM
Hi, hbase users. This question is about a row-key design pattern, I believe. To always append data at the end of a table, which row-key structures are recommendable? Multiple threads put lots of data into the table. In this condition, I want to be sure that all of the data is going to be appended at the end of the tab

Re: Multiple Aggregate functions in map reduce program

2012-10-05 Thread Bejoy KS
Hi, it is definitely possible. In your map, make the dept name the output key and salary the value. In the reducer, for every key you can initialize a counter and a sum. Add to the sum for all values and increment the counter by 1 for each value. Output the dept key and the new aggregat
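
A minimal sketch of that single-pass sum-and-count reducer, assuming plain text input with "name,dept,salary" lines (the input format is an assumption, not from the thread):

    import java.io.IOException;
    import org.apache.hadoop.io.DoubleWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class DeptSalaryAggregate {

      // Map: emit (department, salary) for each input record.
      public static class DeptMapper
          extends Mapper<LongWritable, Text, Text, DoubleWritable> {
        protected void map(LongWritable key, Text value, Context ctx)
            throws IOException, InterruptedException {
          String[] parts = value.toString().split(",");   // name,dept,salary (assumed)
          ctx.write(new Text(parts[1]),
                    new DoubleWritable(Double.parseDouble(parts[2])));
        }
      }

      // Reduce: one pass over the salaries keeps a running sum and a count,
      // so sum, count and average all come out of a single scan of the data.
      public static class DeptReducer
          extends Reducer<Text, DoubleWritable, Text, Text> {
        protected void reduce(Text dept, Iterable<DoubleWritable> salaries, Context ctx)
            throws IOException, InterruptedException {
          double sum = 0;
          long count = 0;
          for (DoubleWritable s : salaries) {
            sum += s.get();
            count++;
          }
          ctx.write(dept, new Text("sum=" + sum + "\tcount=" + count
                                   + "\tavg=" + (sum / count)));
        }
      }
      // Job wiring (input/output paths, setMapperClass/setReducerClass) omitted.
    }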

Re: Lucene instead of HFiles?

2012-10-05 Thread Jacques
Abstractly, isn't this what Elastic Search and Katta already are: range-sharded data stores built on top of Lucene? J On Thu, Oct 4, 2012 at 8:34 PM, Otis Gospodnetic wrote: > Hi, > > Has anyone attempted using Lucene instead of HFiles (see > https://twitter.com/otisg/status/254047978174701568

Re: Lucene instead of HFiles?

2012-10-05 Thread Michael Segel
Actually I think you'd want to do the reverse. Store your Lucene index in HBase. Which is what we did a while back. This could be extended to SOLR, but we never had time to do it. On Oct 5, 2012, at 4:11 AM, Lars George wrote: > Hi Otis, > > My initial reaction was, "interesting idea". On

Re: ways to make orders when it puts

2012-10-05 Thread Michael Segel
You need to be a bit more specific. Your design doesn't make any sense and you're now starting a separate thread on this topic... On Oct 4, 2012, at 8:58 PM, Henry JunYoung KIM wrote: > yes, this is needed for our indexer data. > > I mean that hbase needs to store some kind of data list base

Re: Zookeeper error

2012-10-05 Thread Bharadwaj Yadati
Hi JM, Thanks for your reply. The problem was with the /etc/hosts file. After changing the entry 127.0.1.1 to the local LAN IP (e.g. 192.168.2.40), it's working for me. Thanks, Bharadwaj On Thu, Oct 4, 2012 at 6:29 PM, Jean-Marc Spaggiari wrote: > Hi Bharadwaj, > > Have you tried to connect to yo
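
For anyone hitting the same thing, the change amounts to replacing the loopback-style alias with the machine's real LAN address in /etc/hosts (the hostname below is a placeholder; the IP is the one from Bharadwaj's report):

    # before -- the region server / ZooKeeper registers an unreachable address
    127.0.1.1    myhbasehost

    # after -- use the box's real LAN IP
    192.168.2.40 myhbasehost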

Re: Hbase clustering

2012-10-05 Thread Sonal Goyal
Hi, Please check the instructions in the HBase guide: http://hbase.apache.org/book/standalone_dist.html Best Regards, Sonal Crux: Reporting for HBase Nube Technologies On Fri, Oct 5, 2012 a

Re: Lucene instead of HFiles?

2012-10-05 Thread Lars George
Hi Otis, My initial reaction was, "interesting idea". On second thought, though, I do not see how this makes more sense compared to what we have now. HFiles combined with Bloom filters are fast to look up anyway. Adding Lucene as another "Storage Engine" (getting us close to Voldemort or MySQL

Re: Lucene instead of HFiles?

2012-10-05 Thread Renaud Delbru
Hi, With respect to point 3, I know there is a new codec in Lucene 4.0 for append-only filesystems such as hdfs (LUCENE-2373). Also, it would depend on the use case. At the moment, for storing data, I would expect HFile to be much more efficient in terms of compression than the Lucene file sys

Re: Multiple Aggregate functions in map reduce program

2012-10-05 Thread Bertrand Dechoux
> It takes time for big data. I heard map reduce Java code will be faster. Is it true? Or should I go for Pig programming? I guess one important question is what you mean by 'it takes time', and what goal you want to reach. It may be that your current implementation is naive and can be

Re: questions to append data always from table end!!!

2012-10-05 Thread yuzhihong
This would make the last region a hot spot. Why do you want this design ? Thanks On Oct 4, 2012, at 11:50 PM, Henry JunYoung KIM wrote: > hi, hbase users. > > this question is about a row-key design pattern I believe. > To append data always an end of table, which row-key structures are >

Re: Multiple Aggregate functions in map reduce program

2012-10-05 Thread Khang Pham
Hi, ideally you want to "scan" through the data once and get the (sum, count). One simple solution is to write your own map-reduce with key = department, value = new VectorWritable(vector), where vector is an array with array[0] = salary and array[1] = 1. In the reduce phase all you need to do is the aggr