Greetings,
On the heels of our smashing success last month, we're going to be
convening the Pacific Northwest (Oregon and Washington)
Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the
24th. The meeting should start at 6:45, organized chats will end
around 8:00, and then there sh
I'm sorry, we can't help you with designing your algorithm. But we can
answer any specific questions you have about HBase-specific schema design,
interoperability between HBase and map-reduce, and HBase operations.
Good luck!
-ryan
On Mon, Jun 15, 2009 at 9:26 PM, Puri, Aseem wrote:
> I know how to work with map reduce on HBase.
I know how to work with map reduce on HBase. But my requirement is to
cluster similar data together. The link
http://cwiki.apache.org/MAHOUT/k-means.html gives some initial starting
points, but not the details of how to do it. So if someone knows how,
please tell me.
Thanks & Regards
Aseem Puri
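To make the k-means idea concrete, here is a minimal plain-Java sketch of one iteration on toy data (no Mahout or HBase APIs; in a map-reduce job the assignment loop would run in the mappers and the averaging in the reducers):

```java
import java.util.Arrays;

// Minimal sketch of one k-means iteration: assign each point to its
// nearest centroid, then recompute each centroid as the mean of its
// assigned points.
public class KMeansSketch {

    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        for (int c = 1; c < centroids.length; c++) {
            if (distance(point, centroids[c]) < distance(point, centroids[best])) {
                best = c;
            }
        }
        return best;
    }

    // One iteration: returns the updated centroids.
    static double[][] iterate(double[][] points, double[][] centroids) {
        int k = centroids.length, dim = points[0].length;
        double[][] sums = new double[k][dim];
        int[] counts = new int[k];
        for (double[] p : points) {
            int c = nearest(p, centroids);             // "map" step
            counts[c]++;
            for (int i = 0; i < dim; i++) sums[c][i] += p[i];
        }
        for (int c = 0; c < k; c++) {                  // "reduce" step
            if (counts[c] == 0) { sums[c] = centroids[c].clone(); continue; }
            for (int i = 0; i < dim; i++) sums[c][i] /= counts[c];
        }
        return sums;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {1.5, 2}, {8, 8}, {9, 9} };
        double[][] centroids = { {0, 0}, {10, 10} };
        System.out.println(Arrays.deepToString(iterate(points, centroids)));
        // prints [[1.25, 1.5], [8.5, 8.5]]
    }
}
```

Iterate until the centroids stop moving; each pass over the table would be one map-reduce job.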
I'm not sure what that is, but you can scan the entirety of a table into
a map reduce and do whatever you want from there.
On Mon, Jun 15, 2009 at 9:15 PM, Puri, Aseem wrote:
> Hi
>
> I want to use the K-means algorithm for clustering of HBase row
> data with the help of map reduce. I don't know if it is possible or not.
I haven't heard of anyone doing this on HBase specifically, but the ASF Mahout
project is building mapreduce formulations of various clustering, recommender,
and other related machine learning and data mining algorithms:
http://lucene.apache.org/mahout/
Hope that helps,
- Andy (a Mahout
Hi
I want to use the K-means algorithm for clustering of HBase row
data with the help of map reduce. I don't know if it is possible or not. If
anybody has tried this, please help me.
Thanks & Regards
Aseem Puri
Thanks Ryan. You are right, it is very much like word count. Here is what I
have:

private final static IntWritable one = new IntWritable(1);

MAPPER
======

@Override
public void map(
Im
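For reference, the word-count pattern the snippet above follows can be sketched in plain Java without the MapReduce machinery (a hypothetical stand-in, not the poster's actual mapper; the HashMap plays the role of the shuffle + reduce):

```java
import java.util.HashMap;
import java.util.Map;

// Word-count pattern: emit (word, 1) per token, then sum per word.
// In the map-reduce version the mapper emits the pairs and the
// reducer does the summing.
public class WordCountSketch {

    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum); // the "+1" is the IntWritable one
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("the quick brown fox jumps over the lazy dog the end"));
    }
}
```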
Right now, we're storing the documents in HBase. The indices are
stored in HDFS and then 'sharded' to each node using Katta. Not sure
if there's much of an advantage to storing the index itself in HBase,
though I'd be interested to see some use cases for it.
On Sat, Jun 13, 2009 at 11:27 AM, zsong
On Mon, Jun 15, 2009 at 6:18 PM, Lars George wrote:
> Hi Fredrik,
>
> Stack suggested it could be that your servers have in the nsswitch.conf to
> use files before dns? Could you try for us and switch that, revert the entry
> in the /etc/hosts and then try if the options J-D suggest to see if they
> work for this problem?
>
> wrt your problem, have you tried setting the following configs?
> hbase.master.dns.interface
> hbase.master.dns.nameserver
>
Indeed I did. Any such lookup was overridden by /etc/hosts as per
/etc/nsswitch.conf. Now, if only I could get hold of the person who
put that hosts entry there in the first place...
Hi Fredrik,
Stack suggested it could be that your servers have in the nsswitch.conf
to use files before dns? Could you try for us and switch that, revert
the entry in the /etc/hosts and then try if the options J-D suggest to
see if they work for this problem?
Then we can document this properly.
Thank you :-)
Am I right that using those filters has the drawback of applying them to all
the rows following a necessary one, instead of stopping the entire process?
If I have a small number of rows with a given prefix near the beginning of
a region, will the filters make the MR task scan through the rest of the
region anyway?
Fredrik,
First, thanks for trying out trunk.
wrt your problem, have you tried setting the following configs?
hbase.master.dns.interface
hbase.master.dns.nameserver
This works just like in Hadoop.
The reason we removed the master address is that the master can now
fail over to any other waiting master.
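For example, the two DNS settings go in hbase-site.xml; the interface and nameserver values below are just placeholders for a specific environment:

```xml
<!-- hbase-site.xml: pin the interface/nameserver used for the
     master's DNS lookup (values here are site-specific examples) -->
<property>
  <name>hbase.master.dns.interface</name>
  <value>eth0</value>
</property>
<property>
  <name>hbase.master.dns.nameserver</name>
  <value>default</value>
</property>
```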
Hello list!
I've spent the better part of the afternoon upgrading from 0.19.3 to
trunk, and I did fall into a hole or two. Specifically, it turns out
that we rely on DNS lookups to find out what address HMaster binds to,
which caused me some grief. The documentation is also weak on what
part ZooKeeper plays.
On Mon, Jun 15, 2009 at 9:45 AM, Piotr Praczyk wrote:
>
> Does anybody maybe know if there exists a method of finding the first row
> larger (or smaller) in lexicographical order than a given (not
> necessarily existing) row id?
If you open a scanner with a first row, hbase will find the
Hi
Does anybody maybe know if there exists a method of finding the first row
larger (or smaller) in lexicographical order than a given (not
necessarily existing) row id?
In particular I would like to find the first and the last row having a given
prefix as a row id.
I would be very grateful
The scanner api does not support that. You can use multiple scanners to get
the same effect. The speed won't be much slower either (in 0.20).
In 0.21, with a new api, we will cut down on the number of server roundtrips,
thus improving the speed even more.
On Jun 15, 2009 2:04 AM, "Piotr Praczyk" wrote:
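To make the prefix case concrete: the usual trick is to scan from the prefix itself up to an exclusive stop row made by incrementing the prefix's last non-0xFF byte. A self-contained sketch (plain Java, no HBase API):

```java
import java.util.Arrays;

// Sketch: compute the exclusive stop row for scanning all rows that
// start with a given prefix. Increment the last non-0xFF byte of the
// prefix and drop everything after it. A null return means "scan to
// the end of the table" (the prefix was all 0xFF bytes).
public class PrefixStopRow {

    static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1); // drop trailing 0xFF bytes
            }
        }
        return null; // prefix was all 0xFF: no exclusive upper bound
    }

    public static void main(String[] args) {
        System.out.println(new String(stopRowForPrefix("abc".getBytes()))); // abd
    }
}
```

Opening a scanner with start row = prefix and stop row = this value then returns exactly the rows carrying the prefix.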
Thanks. I meant something a little different, though. By fragment I meant
all the rows in the table lying (in lexicographical order) between
row X and row Y.
The getScanner calls of HTable allow me to specify such rows, although I
wanted to have a sequence of such fragments: X_1 Y_1 ... X_n
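That sequence-of-fragments idea can be sketched with a sorted map standing in for the table (a toy illustration, not the HTable API; each subMap call plays the role of one scanner opened with a start and stop row):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Sketch of reading several [start, stop) row ranges in order, with a
// TreeMap as a stand-in for the table.
public class MultiRangeScan {

    static List<String> scanRanges(TreeMap<String, String> table, String[][] ranges) {
        List<String> rows = new ArrayList<>();
        for (String[] range : ranges) {
            // subMap(start inclusive, stop exclusive) ~ one scanner
            rows.addAll(table.subMap(range[0], range[1]).keySet());
        }
        return rows;
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String row : new String[] {"a1", "a2", "b1", "c1", "c2", "d1"}) {
            table.put(row, "value");
        }
        String[][] ranges = { {"a", "b"}, {"c", "d"} }; // X_1 Y_1, X_2 Y_2
        System.out.println(scanRanges(table, ranges));  // [a1, a2, c1, c2]
    }
}
```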
And let me follow up a bit...
The best configuration for an m-r job is to have the # of map tasks = # of
regions in the table. While a scanner can iterate between regions, once the
table size gets really big it's best in my experience, and more reliable as
well, to have a 1:1 correspondence between map tasks and regions.
Hey,
The client-side scanner code will already move on to the next region when it
hits the end of a region.
-ryan
On Mon, Jun 15, 2009 at 1:52 AM, Piotr Praczyk wrote:
> 2009/6/12 stack
>
> > On Fri, Jun 12, 2009 at 8:41 AM, Erik Holstad
> > wrote:
> >
> > > ...
> > > not really sure how this was done in 0.19 and earlier.
2009/6/12 stack
> On Fri, Jun 12, 2009 at 8:41 AM, Erik Holstad
> wrote:
>
> > ...
> > not really sure how this
> > was done in 0.19 and earlier.
>
>
> There's a stoprow filter in 0.19.x and earlier. There is also a getScanner
> override that takes a start and stop row in 0.19.x (under the wrap