Sorry for all these unclear queries.
I turned off the WAL on both the doc and index tables.
In my system all documents have a UUID (assigned before they come into the
system); I just use this UUID as the rowkey. So "duplicates" basically means
documents with the same ID, even if the contents are the same.
What happens when you have a poem like Mary had a little lamb?
Did you turn off the WAL on both table inserts, or just the index?
If you want to avoid processing duplicate docs, you could do this a couple of
ways. The simplest way is to record the doc ID and a checksum for the doc. If
the d
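One way to read the doc ID plus checksum suggestion, sketched in Python. The message is cut off before it says what to do with the checksum, so the three outcomes below (`new`, `duplicate`, `changed`) are an assumption, as are the names `SeenDocs` and `classify`. SHA-256 is just one reasonable checksum choice, and a real job would likely keep the ID-to-checksum map in a table rather than in memory.

```python
import hashlib


def checksum(content: bytes) -> str:
    """Return a hex digest of the document body (SHA-256 chosen for illustration)."""
    return hashlib.sha256(content).hexdigest()


class SeenDocs:
    """Hypothetical in-memory record of doc ID -> checksum."""

    def __init__(self):
        self._checksums = {}

    def classify(self, doc_id: str, content: bytes) -> str:
        """Classify a document as 'new', 'duplicate', or 'changed'."""
        digest = checksum(content)
        previous = self._checksums.get(doc_id)
        if previous is None:
            # First time this ID is seen: remember it and process the doc.
            self._checksums[doc_id] = digest
            return "new"
        if previous == digest:
            # Same ID and same content: safe to skip.
            return "duplicate"
        # Same ID but different content: record the new checksum.
        self._checksums[doc_id] = digest
        return "changed"
```

With this shape, two documents that carry the same text under different UUIDs are both reported as `new`, which matches the definition of duplicates given earlier in the thread (same ID).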
>>>>
>>>> Does that explain it?
>>>>
>>>> -Mike
>>>>
>>>> On Feb 18, 2013, at 4:57 AM, yonghu wrote:
>>>>
>>>>> Hi, Michael
>>>>>
>>>>> I don't quite understand what do you mean by "round trip back to the
>>>>> client". In my understanding, as the RegionServer and TaskTracker can
>>>>> be the same node, MR don't have to pull data into client and then
>>>>> process. And you also mention the "unnecessary overhead", can you
>>>>> explain a little bit what operations or data processing can be seen as
>>>>> "unnecessary overhead".
>>>>>
>>>>> Thanks
>>>>>
>>>>> yong
>>>>> On Mon, Feb 18, 2013 at 10:35 AM, Michael Segel
>>>>> wrote:
>>>>>> Why?
>>>>>>
>>>>>> This seems like an unnecessary overhead.
>>>>>>
>>>>>> You are writing code within the coprocessor on the server.
>>>>>> Pessimistic code really isn't recommended if you are worried about
>>>>>> performance.
>>>>>>
>>>>>> I have to ask... by the time you have executed the code in your
>>>>>> co-processor, what would cause the initial write to fail?
>>>>>>
>>>>>>
>>>>>> On Feb 18, 2013, at 3:01 AM, Prakash Kadel wrote:
>>>>>>
>>>>>>> its a local read. i just check the last param of PostCheckAndPut
>>>>>>> indicating if the Put succeeded. Incase if the put success, i insert a row
>>>>>>> in another table
>>>>>>>
>>>>>>> Sincerely,
>>>>>>> Prakash Kadel
>>>>>>>
>>>>>>> On Feb 18, 2013, at 2:52 PM, Wei Tan wrote:
>>>>>>>
>>>>>>>> Is your CheckAndPut involving a local or remote READ? Due to the
>>>>>>>> nature of LSM, read is much slower compared to a write...
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Wei
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> From: Prakash Kadel
>>>>>>>> To: "user@hbase.apache.org" ,
>>>>>>>> Date: 02/17/2013 07:49 PM
>>>>>>>> Subject:coprocessor enabled put very slow, help please~~~
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> hi,
>>>>>>>> i am trying to insert few million documents
>> handle the thread safeness in using HTable? Any replacement for
>> HTablePool, in plan?
>> Thanks,
>>
>>
>> Best Regards,
>> Wei
>>
>>
>>
>>
>> From: Michel Segel
>> To: "user@hbase.apache.org" ,
>>
Re: coprocessor enabled put very slow, help please~~~
Why are you using an HTable Pool?
Why are you closing the table after each iteration through?
Try using one HTable object. Turn off the WAL.
Instantiate it in start().
Close it in stop().
Surround its use in a try/catch.
If an exception is caught, re-instantiate it.
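The lifecycle described above (open one handle when the coprocessor starts, reuse it for every write, replace it only after a failure, close it on stop) can be sketched in a language-neutral way. This is a minimal Python model, not HBase code: `FakeTable` and `IndexWriter` are hypothetical names, and the single retry after reopening is an assumption that the message does not spell out.

```python
class FakeTable:
    """Hypothetical stand-in for a table handle; can fail once on request."""

    def __init__(self, fail_first=False):
        self.fail_first = fail_first
        self.rows = []
        self.closed = False

    def put(self, row):
        if self.fail_first:
            self.fail_first = False
            raise IOError("simulated write failure")
        self.rows.append(row)

    def close(self):
        self.closed = True


class IndexWriter:
    """Holds one long-lived table handle instead of opening one per write."""

    def __init__(self, open_table):
        self._open_table = open_table  # factory that returns a fresh handle
        self._table = None

    def start(self):
        # Open the handle once, when the component starts.
        self._table = self._open_table()

    def stop(self):
        # Close the handle once, when the component shuts down.
        if self._table is not None:
            self._table.close()
            self._table = None

    def write(self, row):
        try:
            self._table.put(row)
        except IOError:
            # Assume the handle is broken: discard it, reopen, retry once.
            self._table.close()
            self._table = self._open_table()
            self._table.put(row)
```

In a real coprocessor the same shape would live in a Java observer class. Since the thread also raises the thread safety of HTable, sharing one handle across handler threads would need additional care that this sketch does not model.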
To: "user@hbase.apache.org" ,
Date: 02/18/2013 04:04 AM
Subject: Re: coprocessor enabled put very slow, help please~~~
It's a local read. I just check the last param of postCheckAndPut,
which indicates whether the Put succeeded. If the Put succeeded, I insert a row
in another table.
Sincerely,
Prakash Kadel
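The flow described in this message (a conditional put on the doc table, then an index row only when that put succeeded) can be modelled as follows. This is a toy Python sketch with in-memory dictionaries, not the HBase API: `Table`, `check_and_put`, and `put_doc_with_index` are hypothetical names, and the "row must not exist yet" condition is an assumption about what the check tests.

```python
class Table:
    """Hypothetical in-memory table: row key -> value."""

    def __init__(self):
        self.rows = {}

    def check_and_put(self, key, expected, value):
        """Write value only if the current value equals expected (None = absent)."""
        if self.rows.get(key) != expected:
            return False
        self.rows[key] = value
        return True


def put_doc_with_index(docs, index, doc_id, content, index_key):
    """Insert a doc if it is absent; add an index entry only on success."""
    succeeded = docs.check_and_put(doc_id, None, content)
    if succeeded:
        # Mirrors the post-hook step: index only the writes that happened.
        index.rows.setdefault(index_key, set()).add(doc_id)
    return succeeded
```

In this model both tables live in the same process, so the second write is free. In the real system the index row may belong to a region on a different server, which is the cost the rest of the thread is concerned with.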
t
> defeats the performance purpose.
>
>
>
>
> From: Prakash Kadel
> To: "user@hbase.apache.org"
> Sent: Sunday, February 17, 2013 5:26 PM
> Subject: Re: coprocessor enabled put very slow, help please~~~
>
> thanks again
Does your checkAndPut involve a local or a remote read? Due to the nature of
LSM trees, a read is much slower than a write...
Best Regards,
Wei
From: Prakash Kadel
To: "user@hbase.apache.org" ,
Date: 02/17/2013 07:49 PM
Subject:coprocessor enabled put very slow, h
Sent: Sunday, February 17, 2013 5:26 PM
Subject: Re: coprocessor enabled put very slow, help please~~~
Thanks again,
I did try making the indexes with MR. I don't have exact evaluation data, but
inserting the indexes directly with MapReduce does seem to be much, much faster
than making the indexes with the coprocessor
whether that performs better.
>
>
>
>
> From: Prakash Kadel
> To: "user@hbase.apache.org"
> Sent: Sunday, February 17, 2013 5:13 PM
> Subject: Re: coprocessor enabled put very slow, help please~~~
>
> thank you lars,
Subject: Re: coprocessor enabled put very slow, help please~~~
Thank you, Lars.
That is my guess too. I am confused: isn't that something that cannot be
controlled? Is this approach of creating some kind of index wrong?
Sincerely,
Prakash Kadel
On Feb 18, 2013, at 10:07 AM, lars hofhansl wrote:
Presumably the coprocessor issues Puts to another region server in most cases;
that could explain it being (much) slower.
From: Prakash Kadel
To: "user@hbase.apache.org"
Sent: Sunday, February 17, 2013 4:52 PM
Subject: Re: coprocessor enabled put
Forgot to mention. I am using 0.92.
Sincerely,
Prakash
On Feb 18, 2013, at 9:48 AM, Prakash Kadel wrote:
> hi,
> i am trying to insert few million documents to hbase with mapreduce. To
> enable quick search of docs i want to have some indexes, so i tried to use
> the coprocessors, but they
Hi,
I am trying to insert a few million documents into HBase with MapReduce. To
enable quick search of docs I want to have some indexes, so I tried to use
coprocessors, but they are slowing down my inserts. Aren't coprocessors
supposed not to increase the latency?
My settings:
3 regio