PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-15 Thread Bradford Stephens
Greetings, On the heels of our smashing success last month, we're going to be convening the Pacific Northwest (Oregon and Washington) Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the 24th. The meeting should start at 6:45, organized chats will end around 8:00, and then there sh

Re: K-means clustering algorithm on HBase

2009-06-15 Thread Ryan Rawson
I'm sorry, we can't help you with designing your algorithm. But we can answer any specific questions you have about hbase-specific schema, interoperability between hbase and map-reduce and hbase operations. Good luck! -ryan On Mon, Jun 15, 2009 at 9:26 PM, Puri, Aseem wrote: > I know how to wor

RE: K-means clustering algorithm on HBase

2009-06-15 Thread Puri, Aseem
I know how to work with map reduce on HBase, but my requirement is to cluster similar data together. The page at http://cwiki.apache.org/MAHOUT/k-means.html gives a starting point but does not explain in detail how to do it. If anyone knows how, please tell me. Thanks & Regards Aseem Puri

Re: K-means clustering algorithm on HBase

2009-06-15 Thread Ryan Rawson
I'm not sure what that is, but you can scan the entirety of a table into a map-reduce job. From there, you can do whatever you want. On Mon, Jun 15, 2009 at 9:15 PM, Puri, Aseem wrote: > Hi > >I want to use K-means algorithm for clustering of HBase row > data with help of map reduce. I don

Re: K-means clustering algorithm on HBase

2009-06-15 Thread Andrew Purtell
I haven't heard of anyone doing this on HBase specifically, but the ASF Mahout project is building mapreduce formulations of various clustering, recommender, and other related machine learning and data mining algorithms: http://lucene.apache.org/mahout/ Hope that helps, - Andy (a Mahout
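For readers wondering what a "mapreduce formulation" of clustering looks like, here is a minimal sketch of one k-means iteration in plain Java. It is not Mahout's implementation and uses no Hadoop or HBase classes; the class name `KMeansStep` and its methods are hypothetical. The "map" side assigns each point to its nearest centroid, and the "reduce" side averages the points assigned to each centroid. In a real job the points would come from a table scan and the step would be repeated until the centroids stop moving.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: one k-means iteration, phrased as map + reduce.
class KMeansStep {
    // "Map" side: index of the centroid nearest to p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < p.length; i++) {
                double diff = p[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                best = c;
            }
        }
        return best;
    }

    // "Reduce" side: component-wise mean of all points assigned to one centroid.
    static double[] mean(List<double[]> pts) {
        double[] m = new double[pts.get(0).length];
        for (double[] p : pts) {
            for (int i = 0; i < m.length; i++) {
                m[i] += p[i];
            }
        }
        for (int i = 0; i < m.length; i++) {
            m[i] /= pts.size();
        }
        return m;
    }

    // One full iteration: group points by nearest centroid, then recompute centroids.
    static double[][] iterate(List<double[]> points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = new HashMap<>();
        for (double[] p : points) {
            groups.computeIfAbsent(nearest(p, centroids), k -> new ArrayList<>()).add(p);
        }
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> g = groups.get(c);
            // A centroid that attracted no points is left where it was.
            next[c] = (g == null) ? centroids[c].clone() : mean(g);
        }
        return next;
    }

    public static void main(String[] args) {
        List<double[]> points = Arrays.asList(
            new double[] {0, 0}, new double[] {0, 2},
            new double[] {10, 10}, new double[] {10, 12});
        double[][] next = iterate(points, new double[][] {{1, 1}, {9, 9}});
        System.out.println(Arrays.toString(next[0])); // [0.0, 1.0]
        System.out.println(Arrays.toString(next[1])); // [10.0, 11.0]
    }
}
```

The grouping by centroid index plays the role of the shuffle: the centroid index is the intermediate key, and each reducer sees all points for one key.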

K-means clustering algorithm on HBase

2009-06-15 Thread Puri, Aseem
Hi, I want to use the K-means algorithm to cluster HBase row data with the help of map reduce. I don't know whether it is possible or not. If anybody has tried this, please help me. Thanks & Regards Aseem Puri

Re: Help with Map/Reduce program

2009-06-15 Thread llpind
Thanks Ryan. You are right it is very much like word count. Here is what I have: private final static IntWritable one = new IntWritable(1); MAPPER = @Override public void map( Im

Re: HBase Write to Regionservers behavior

2009-06-15 Thread Bradford Stephens
Right now, we're storing the documents in HBase. The indices are stored in HDFS and then 'sharded' to each node using Katta. Not sure if there's much of an advantage to storing the index itself in HBase, though I'd be interested to see some use cases for it. On Sat, Jun 13, 2009 at 11:27 AM, zsong

Re: HMaster and /etc/hosts

2009-06-15 Thread Fredrik Möllerstrand
On Mon, Jun 15, 2009 at 6:18 PM, Lars George wrote: > Hi Fredrik, > > Stack suggested it could be that your servers have in the nsswitch.conf to > use files before dns? Could you try for us and switch that, revert the entry > in the /etc/hosts and then try if the options J-D suggest to see if they

Re: HMaster and /etc/hosts

2009-06-15 Thread Fredrik Möllerstrand
> wrt your problem, have you tried setting the following configs? > hbase.master.dns.interface > hbase.master.dns.nameserver > Indeed I did. Any such lookup was overriden by /etc/hosts as per /etc/nsswitch.conf. Now, if I only could get a hold of the person who put that hosts entry there in the fi

Re: HMaster and /etc/hosts

2009-06-15 Thread Lars George
Hi Fredrik, Stack suggested it could be that your servers are set in nsswitch.conf to use files before dns. Could you try switching that for us, revert the entry in /etc/hosts, and then try the options J-D suggested to see if they work for this problem? Then we can document this proper

Re: Searching for the rows

2009-06-15 Thread Piotr Praczyk
Thank you :-) Am I right that using those filters has the drawback of applying them to all the rows following the needed one, instead of stopping the entire process? If I have some small number of rows with a given prefix, near the beginning of a region, will the filters make the MR task scan throu

Re: HMaster and /etc/hosts

2009-06-15 Thread Jean-Daniel Cryans
Fredrik, First, thanks for trying out trunk. wrt your problem, have you tried setting the following configs? hbase.master.dns.interface hbase.master.dns.nameserver This works just like in Hadoop. The reason we removed the master address is that the master can now failover to any other waiting m

HMaster and /etc/hosts

2009-06-15 Thread Fredrik Möllerstrand
Hello list! I've spent the better part of the afternoon upgrading from 0.19.3 to trunk, and I did fall into a hole or two. Specifically, it turns out that we rely on DNS lookups to find out what address HMaster binds to, which caused me some grief. The documentation is also weak on what part Zooke

Re: Searching for the rows

2009-06-15 Thread stack
On Mon, Jun 15, 2009 at 9:45 AM, Piotr Praczyk wrote: > > Does anybody maybe know, if there exists a method of finding the first row > larger (and smaller) in the lexicographical order than given (not > necessarily existing) row id ? If you open a scanner with a first row, hbase will find the
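The prefix question can be modelled without HBase at all. The sketch below uses a `java.util.TreeSet` as a stand-in for a table's sorted row keys; it is not the HBase client API, and the class `PrefixBounds` and its methods are hypothetical names. It assumes ASCII row keys, so that Java string order matches byte order, and a non-empty prefix. The first row with a prefix is the first key at or after the prefix itself, and the last one is the key just before the smallest key that sorts after every key with that prefix.

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy model of a sorted row-key space; NOT the HBase client API.
class PrefixBounds {
    // Smallest key that sorts after every key starting with prefix.
    // Assumes a non-empty prefix whose last char is below Character.MAX_VALUE.
    static String stopKeyFor(String prefix) {
        int last = prefix.length() - 1;
        return prefix.substring(0, last) + (char) (prefix.charAt(last) + 1);
    }

    // First key >= prefix, provided it really carries the prefix; otherwise null.
    static String firstWithPrefix(TreeSet<String> rows, String prefix) {
        String k = rows.ceiling(prefix);
        return (k != null && k.startsWith(prefix)) ? k : null;
    }

    // Last key < stop key, provided it really carries the prefix; otherwise null.
    static String lastWithPrefix(TreeSet<String> rows, String prefix) {
        String k = rows.lower(stopKeyFor(prefix));
        return (k != null && k.startsWith(prefix)) ? k : null;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
            Arrays.asList("apple", "user1", "user20", "user3", "zebra"));
        System.out.println(firstWithPrefix(rows, "user")); // user1
        System.out.println(lastWithPrefix(rows, "user"));  // user3
        System.out.println(firstWithPrefix(rows, "b"));    // null
    }
}
```

Against a real table, the same idea would translate to opening a scanner at the prefix and stopping at the computed stop key, so the scan never runs past the rows of interest.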

Searching for the rows

2009-06-15 Thread Piotr Praczyk
Hi, Does anybody know whether there is a method for finding the first row larger (and smaller) in lexicographical order than a given (not necessarily existing) row id? In particular, I would like to find the first and the last row having a given prefix as a row id. I would be very grateful

Re: Row filters

2009-06-15 Thread Ryan Rawson
The scanner api does not support that. You can use multiple scanners to get the same effect. The speed won't be much slower either (in 0.20). In 0.21 with a new api we will cut down on the number of server roundtrips thus improving the speed even more. On Jun 15, 2009 2:04 AM, "Piotr Praczyk" wr
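The "multiple scanners" approach can be sketched as one pass per row range, with the results concatenated in order. The code below again uses a `TreeSet` as a stand-in for the sorted table rather than the HBase client API; `MultiRangeScan` and `scanRanges` are hypothetical names, and treating each range as start-inclusive and stop-exclusive is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Toy model: one "scanner" per [start, stop) range over sorted row keys.
class MultiRangeScan {
    // ranges[i] = {startRow, stopRow}; start is inclusive, stop is exclusive.
    // Each range must have start <= stop, or subSet throws IllegalArgumentException.
    static List<String> scanRanges(TreeSet<String> rows, String[][] ranges) {
        List<String> out = new ArrayList<>();
        for (String[] r : ranges) {
            // Stands in for opening a scanner on one range and draining it.
            out.addAll(rows.subSet(r[0], true, r[1], false));
        }
        return out;
    }

    public static void main(String[] args) {
        TreeSet<String> rows = new TreeSet<>(
            Arrays.asList("a1", "a2", "b1", "c1", "c2", "d1"));
        String[][] ranges = {{"a2", "b2"}, {"c2", "d"}};
        System.out.println(scanRanges(rows, ranges)); // [a2, b1, c2]
    }
}
```

If the ranges are sorted and do not overlap, the combined output is itself in row order, which is usually what a caller iterating over several fragments expects.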

Re: Row filters

2009-06-15 Thread Piotr Praczyk
Thanks. I meant something a little different, though. By fragment I meant all the rows in the table lying (in lexicographical order) between the row X and Y. The getScanner calls of HTable allow me to specify such rows. However, I wanted to have a sequence of such fragments: X_1 Y_1 ... X_n

Re: Row filters

2009-06-15 Thread Ryan Rawson
And let me follow up a bit... The best configuration for a m-r job is to have the # of map tasks = # of regions in the table. While a scanner can iterate between regions, once the table size gets really big, it's best in my experience, more reliable as well, to have a 1:1 correspondence between m

Re: Row filters

2009-06-15 Thread Ryan Rawson
Hey, The client-side scanner code already will move it to the next region when it hits the end of a region. -ryan On Mon, Jun 15, 2009 at 1:52 AM, Piotr Praczyk wrote: > 2009/6/12 stack > > > On Fri, Jun 12, 2009 at 8:41 AM, Erik Holstad > > wrote: > > > > > ... > > > not really sure how thi

Re: Row filters

2009-06-15 Thread Piotr Praczyk
2009/6/12 stack > On Fri, Jun 12, 2009 at 8:41 AM, Erik Holstad > wrote: > > > ... > > not really sure how this > > was done in 0.19 and earlier. > > > There's a stoprow filter in 0.19.x and earlier. There is also a getScanner > override that takes a start and stop row in 0.19.x (under the wrap