Re: Duplicated events in HBase

Tom Chan Tue, 09 Aug 2016 12:07:18 -0700

Looking at the source code on develop branch,

https://github.com/apache/incubator-predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala#L270


When events are exported, the eventId is included, so at import time that
eventId will be used as the rowKey:

https://github.com/apache/incubator-predictionio/blob/develop/data/src/main/scala/org/apache/predictionio/data/storage/hbase/HBEventsUtil.scala#L147-L150

So the imported event should replace the existing row, because both have the
same row key. The same cannot be said if you manually create a replacement
event with the same name, entityId, etc.
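A minimal Scala sketch of that replace-by-row-key behaviour, under a simplified in-memory model. The `Event` case class and `FakeEventTable` are hypothetical stand-ins, not PredictionIO classes, and the random UUID fallback is only a simplification of how HBEventsUtil builds a new row key for an event without an eventId:

```scala
import java.util.UUID
import scala.collection.mutable

// Hypothetical, simplified event record; not PredictionIO's actual Event class.
final case class Event(eventId: Option[String], event: String, entityId: String)

// In-memory stand-in for an HBase table: row key -> stored event.
// Writing to an existing key overwrites the entry here. In HBase, a Put to the
// same row key writes new cell versions, and reads return the latest one.
final class FakeEventTable {
  private val rows = mutable.LinkedHashMap.empty[String, Event]

  // Reuse the eventId as the row key when present; otherwise mint a new key
  // (a simplification, assumed for illustration only).
  def insert(e: Event): String = {
    val rowKey = e.eventId.getOrElse(UUID.randomUUID().toString)
    rows(rowKey) = e.copy(eventId = Some(rowKey))
    rowKey
  }

  def size: Int = rows.size
}

object UpsertDemo {
  def main(args: Array[String]): Unit = {
    val table = new FakeEventTable
    val exported = Event(Some("abc123"), "rate", "u1")

    table.insert(exported) // first import
    table.insert(exported) // second import of the same export: same row key
    println(table.size)    // 1

    // A manually recreated event without the original eventId gets a new key.
    table.insert(Event(None, "rate", "u1"))
    println(table.size)    // 2
  }
}
```

Re-importing the same export collapses onto one row, while the hand-made "replacement" lands in a second row.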

To be sure, you can export your events to a file, import that file into a
test appId/channelId twice, and see whether there are any duplicated events
(for example, you can check using the event server).
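Once the test events have been read back (for example from the event server), a check like the following Scala sketch could flag both kinds of duplicates. `FetchedEvent` and `DuplicateCheck` are hypothetical names, and the fields used to decide that two events have "the same content" are an assumption you would adapt to your own data:

```scala
// Hypothetical shape of an event as read back; only the fields needed for the
// check are modeled, and eventTime is kept as a plain string for simplicity.
final case class FetchedEvent(
  eventId: String,
  event: String,
  entityType: String,
  entityId: String,
  eventTime: String
)

object DuplicateCheck {
  // eventIds that appear more than once (should be empty if rows were replaced).
  def duplicateIds(events: Seq[FetchedEvent]): Set[String] =
    events.groupBy(_.eventId).collect { case (id, es) if es.size > 1 => id }.toSet

  // Groups of events with identical content but different eventIds,
  // which replacing rows by row key would NOT collapse.
  def contentDuplicates(events: Seq[FetchedEvent]): Seq[Seq[FetchedEvent]] =
    events
      .groupBy(e => (e.event, e.entityType, e.entityId, e.eventTime))
      .values
      .filter(_.map(_.eventId).distinct.size > 1)
      .toSeq

  def main(args: Array[String]): Unit = {
    val sample = Seq(
      FetchedEvent("a1", "rate", "user", "u1", "2016-08-08T19:09:00Z"),
      FetchedEvent("b2", "rate", "user", "u1", "2016-08-08T19:09:00Z"),
      FetchedEvent("c3", "buy", "user", "u2", "2016-08-08T19:10:00Z")
    )
    println(duplicateIds(sample))           // Set()
    println(contentDuplicates(sample).size) // 1
  }
}
```

After a double import of the same export, you would expect `duplicateIds` to be empty. `contentDuplicates` catches the manually recreated case, where the eventIds differ.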

Tom



On Mon, Aug 8, 2016 at 7:09 PM, Jose Rivera-Rubio <
[email protected]> wrote:

> Hi all,
>
> *Problem*:
>
> I'll be generating dumps of my event data using pio export and then
> running pio import using these dumps without doing pio app data-delete.
>
> *Question*:
> Does pio import run any duplicate checks, or will the data be imported as
> is, generating duplicated eventIds?
>
> Many thanks!
>
