We use PouchDB sync (via ember-pouch) to CouchDB over poor, sometimes
offline wifi, and it is very reliable. I'm not sure about your 10G per day,
but it should be OK if you have the bandwidth while you are online. We had
to implement indicators to let users know when they are walking out of wifi
range. Indicator example: https://bloggr.exmer.com/ Perhaps if you set up
a 10G CouchDB-to-CouchDB replication you will need to implement indicators
on the client side too? I have not needed purge much, so I can't help you
answer that.
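
A rough sketch of the logic behind such an indicator, written in Python for
brevity rather than as Ember client code. It treats the connection as lost
when no successful sync has been seen within a timeout. SyncIndicator,
timeout_s, and the 30-second default are hypothetical names and values for
illustration, not something ember-pouch provides.

```python
import time


class SyncIndicator:
    """Reports whether replication looks alive, based on the last successful sync.

    Hypothetical helper: the caller is assumed to invoke record_success()
    whenever its sync layer reports a completed or active replication.
    """

    def __init__(self, timeout_s=30.0, clock=time.monotonic):
        self.timeout_s = timeout_s  # assumed threshold, tune per deployment
        self.clock = clock          # injectable so the logic can be tested
        self.last_ok = None         # time of the last successful sync, if any

    def record_success(self):
        """Remember that a sync just succeeded."""
        self.last_ok = self.clock()

    def status(self):
        """Return 'unknown' before any sync, else 'online' or 'offline'."""
        if self.last_ok is None:
            return "unknown"
        age = self.clock() - self.last_ok
        return "online" if age <= self.timeout_s else "offline"
```

The UI would poll status() and show a warning once it reports "offline".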

- Martin

On Tue, Jun 20, 2017 at 9:24 PM, Vovan Vovan <vova...@gmail.com> wrote:

> Hi guys,
>
> I'm new to CouchDB. I'm planning to use CouchDB 2.0 to transfer some usage
> logs from a number of devices located on customer premises to a cloud
> solution for analysis. By logs I mean a constant stream of small JSON
> documents (up to 1K each); I expect up to 100K such documents from each
> device daily.
>
> 1. I'm planning to implement this by setting up one-way replication from
> CouchDB on the device to a CouchDB cluster (up to 8 instances) in the
> cloud.
> 2. As logs are replicated to the cloud side, I'm going to handle the
> data stream by subscribing to the _changes feed and passing device logs to
> the pipeline for further processing.
> 3. Once new log entries have been taken and sent to the pipeline I
> basically don't need them in CouchDB, so I'm going to expire data by
> maintaining an index on ctime and having a separate job do bulk deletion
> of documents older than a few days.
>
> Now, I know that in CouchDB documents are not really deleted, just marked
> as 'deleted', so the database will grow permanently. I have the option
> either to use periodic _purge (which I heard may not be safe, especially
> in a clustered environment) or to implement this as a monthly rotating
> database (which is more complex, and I don't really want to follow this
> route).
>
> My questions are:
>
> - Is this a valid use case for CouchDB? I want to use it primarily because
> of its good replication capabilities, especially in unreliable
> environments with some periods of being offline etc. Otherwise I'll have to
> write the whole set of data sync APIs with buffering, retries etc. myself.
> - I find conflicting opinions on the internet: some people say CouchDB is
> a perfect tool for replication over the internet, others say to use
> replication ONLY in the local network. What is the truth? Should I use
> CouchDB as a tool to sync data over an unreliable network?
> - Is it a recommended practice to set up a chain of replications? Due to
> security considerations I want each customer device to replicate to its
> own database in the cloud. Then I want those databases to replicate to a
> single central log database whose _changes feed I'd subscribe to. The
> reason is that it's easier for me to have a single source of _changes
> rather than multiple databases.
> - Is using _purge safe in my case? From the official doc I read "In
> clustered or replicated environments it is very difficult to guarantee that
> a particular purged document has been removed from all replicas". I don't
> think this is a problem for me as I primarily care about database size so
> it shouldn't be critical if some documents fail to delete.
> - Considering I may have up to 10G of new data daily (or ~10 million new
> entries daily, estimating 100 devices), I'm probably going to expire
> documents older than 48 hours. How often do I need to run _purge?
> - Does _purge block the database for reads/writes while running? Does
> _compact (which I have to run after _purge) block?
>
> thanks,
> --Vovan
>
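
For the expiration job described in point 3 of the quoted message, a
minimal sketch of selecting old documents and building a bulk-delete
request body might look like the following. It assumes each row carries
the document id, its current revision, and a ctime in epoch seconds; that
row shape and the name build_expiry_payload are hypothetical. The
resulting body is the kind CouchDB accepts on POST /{db}/_bulk_docs, where
"_deleted": true marks a document as deleted (leaving a tombstone, as
noted in the question).

```python
import time


def build_expiry_payload(rows, max_age_s, now=None):
    """Build a _bulk_docs body that deletes every row older than max_age_s.

    rows: iterable of dicts with "id", "rev" and "ctime" (epoch seconds);
          this shape is an assumption, e.g. the output of a view keyed on ctime.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_s
    docs = [
        # Deleting requires the current revision of each document.
        {"_id": row["id"], "_rev": row["rev"], "_deleted": True}
        for row in rows
        if row["ctime"] < cutoff
    ]
    return {"docs": docs}
```

With a 48-hour expiry, max_age_s would be 48 * 3600, and the job would
send the payload in batches rather than as one request for a whole day of
documents.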
