Re: Scan vs map-reduce

2014-04-14 Thread Li Li
Thanks, I will try List later.

On Tue, Apr 15, 2014 at 3:39 AM, Doug Meil wrote:
> re: "my first version is using 20,000 Get"
>
> Just throwing this out there, but have you looked at multi-get? Multi-get
> will group the gets by RegionServer internally.
>
> You are doing a lot of IO for a web-

Re: Scan vs map-reduce

2014-04-14 Thread Doug Meil
re: "my first version is using 20,000 Get"

Just throwing this out there, but have you looked at multi-get? Multi-get will group the gets by RegionServer internally.

You are doing a lot of IO for a web-app so this is going to be tough to make "fast", but there are ways to make it "faster." But
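
To illustrate what "group the gets by RegionServer" means, here is a minimal sketch in plain Java. It does not call HBase; `MultiGetSketch`, `groupByRegion`, and the region start keys are hypothetical names and data used only for illustration. It assumes each region is identified by its start key and that the first region starts at the empty string. In the HBase Java client, the equivalent step happens inside the client when a list of `Get` objects is passed to the table's `get` method in one call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the HBase client API.
class MultiGetSketch {

    // Groups row keys by the region that would hold them. A row belongs to
    // the region with the greatest start key that is <= the row key.
    static Map<String, List<String>> groupByRegion(List<String> regionStartKeys,
                                                   List<String> rowKeys) {
        TreeMap<String, List<String>> groups = new TreeMap<>();
        for (String start : regionStartKeys) {
            groups.put(start, new ArrayList<String>());
        }
        for (String row : rowKeys) {
            String region = groups.floorKey(row);
            if (region == null) {
                // Only possible if no region starts at "" (assumption violated).
                throw new IllegalArgumentException("No region covers row " + row);
            }
            groups.get(region).add(row);
        }
        return groups;
    }
}
```

Each resulting group could then be sent as one request per server instead of one round trip per row, which is where the saving over 20,000 individual Gets would come from.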

Re: Scan vs map-reduce

2014-04-14 Thread Jean-Marc Spaggiari
This might help you: http://phoenix.incubator.apache.org/

JM

On 2014-04-14 07:53, "Li Li" wrote:
> I need to get about 20,000 rows from the table. the table is about
> 1,000,000 rows.
> my first version is using 20,000 Get and I found it's very slow. So I
> modified it to a scan and filter un

Re: Scan vs map-reduce

2014-04-14 Thread Li Li
I need to get about 20,000 rows from the table. The table has about 1,000,000 rows. My first version used 20,000 Gets, and I found it very slow, so I changed it to a scan that filters out unrelated rows in the client. Maybe I should write a coprocessor. By the way, is there any filter available for me? some
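
The "scan and filter in the client" approach described above can be sketched as follows. This is a plain-Java stand-in under stated assumptions: the table is modeled as an in-memory sorted map, and `ClientSideFilterSketch` and `scanAndFilter` are hypothetical names, not HBase calls. It shows why the approach is costly: every row is read, even though only the wanted ones are kept.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;

// Hypothetical helper, not part of the HBase client API.
class ClientSideFilterSketch {

    // Walks every row in key order (the "full scan") and keeps only the
    // values whose row key is in the wanted set.
    static List<String> scanAndFilter(SortedMap<String, String> table,
                                      Set<String> wantedKeys) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (wantedKeys.contains(row.getKey())) {
                hits.add(row.getValue());
            }
        }
        return hits;
    }
}
```

With 20,000 wanted rows out of about 1,000,000, roughly 98% of the rows transferred this way would be discarded, which is the motivation for filtering on the server side instead.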

Re: Scan vs map-reduce

2014-04-14 Thread Jean-Marc Spaggiari
Hi Li Li, if you have more than one region, it might be useful. MR will scan all the regions in parallel. If you do a full scan from a client API with no parallelism, then the MR job might be faster. But it will take more resources on the cluster and might impact the SLA of the other clients, if any,
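
The idea of scanning all regions in parallel can be sketched without MapReduce as one task per region range. This is a plain-Java illustration under stated assumptions: the table is an in-memory sorted map, region boundaries are given as a sorted list of start keys, and `ParallelScanSketch`, `scanRange`, and `parallelCount` are hypothetical names rather than HBase or MapReduce APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper, not part of the HBase or MapReduce APIs.
class ParallelScanSketch {

    // Counts rows in [startKey, stopKey); a null stopKey means "to the end".
    // Stands in for a range scan over a single region.
    static int scanRange(NavigableMap<String, String> table,
                         String startKey, String stopKey) {
        NavigableMap<String, String> slice = (stopKey == null)
                ? table.tailMap(startKey, true)
                : table.subMap(startKey, true, stopKey, false);
        return slice.size();
    }

    // Submits one scan task per region range and sums the per-range counts.
    // regionStartKeys is assumed to be sorted; the table must not be
    // modified while the tasks run.
    static int parallelCount(final NavigableMap<String, String> table,
                             List<String> regionStartKeys, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (int i = 0; i < regionStartKeys.size(); i++) {
                final String start = regionStartKeys.get(i);
                final String stop = (i + 1 < regionStartKeys.size())
                        ? regionStartKeys.get(i + 1) : null;
                Callable<Integer> task = () -> scanRange(table, start, stop);
                futures.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : futures) {
                total += f.get();
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

As the message notes, parallelism only helps when there is more than one region, and the extra concurrent load is paid for by the cluster.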

Re: Scan vs map-reduce

2014-04-13 Thread Mohammad Tariq
Well, it depends. Could you please provide some more details? It will help us in giving a proper answer.

Warm Regards,
Tariq
cloudfront.blogspot.com

On Mon, Apr 14, 2014 at 11:38 AM, Li Li wrote:
> I have a full table scan which cost about 10 minutes. it seems a
> bottleneck for our application

Scan vs map-reduce

2014-04-13 Thread Li Li
I have a full table scan that takes about 10 minutes, and it seems to be a bottleneck for our application. If I rewrite it using map-reduce, will it be faster?