To add a fourth and fifth option, or a mixture if you prefer, to Jan's suggestions:

Database-per-business-case:

- group types by independent business objective, e.g. Addressing, Calendar
  (meetings & results), Assets, etc.
- put all documents related to a single objective in a single database
- very useful if you want to further isolate business logic or switch to a
  microservice approach.

Database-per-scaling-case:

- create a database for data that has a finite limit, like users, addresses,
  assets.
- create databases for data which will grow infinitely, like calendar entries
  or anything that has time as a primary key.
- by recognising early which data will decrease in value over time, you're
  preparing yourself for more effective sharding in the future.

Another factor to bear in mind is the opportunity of using different database
systems for different use-cases. My approach now, for example, is to use
CouchDB for backend persistence, which it does extremely well, and use a
background task subscribed to the changes feed to push changes to, say, a
Firebase instance. This allows you to group data into collections around
users, optimised for the UIs.

As with all software architecture decisions, there are tradeoffs in
complexity, so you should always do what you feel most comfortable with to
solve the business problem as soon as possible, and be prepared to worry
about the other issues later.

If you're wondering what I'm doing in my current project: I namespace
databases by microservice, create a single database per type, and soft-shard
infinite datasets by year, or whatever time-based key makes the most sense,
all while using the changes feeds to synchronise to other databases more
suitable for querying or sending documents to a data lake.

But I also think over-engineering is underrated :-)

cheers
sam

On Thu, 8 Apr 2021 at 11:12, Jan Lehnardt <[email protected]> wrote:

> Hi Kiril,
>
> first, this is a good question :)
>
> I’d like to add that you are missing one more approach to consider: use a
> single database.
>
> Let’s first look at the other approaches, and I’ll start with
> database-per-type:
>
> - there is no real benefit of doing this, other than theoretical purity
> - it won’t allow you to query across types, say: give me all or a part of
>   types for a single user, which is, I assume, going to be a very common thing
> - to get to all types for a user, you have to make a request to each
>   database
> - the more types you get, the more queries you have to make
> - reducing the number of queries, and reducing the number of databases you
>   need to talk to to get your most common operation is beneficial
>
> So I would recommend against that. I know of some apps using this pattern
> successfully, but that’s because they have very strong separation of types,
> so they will never have to query more than one type at a time. If someone
> else reading this is happy with their setup this way, I won’t begrudge
> them, but I would *not* recommend anyone starts with this pattern.
>
> Next, database-per-user:
>
> - you always clearly know where a user’s data is. Wanna delete a user:
>   delete their db. Easy for making sure data is separated from other users,
>   easy to comply with legislation about data deletion, etc.
> - if you plan to replicate the data onto a per-user device (like an
>   offline web app, or native app), access control in CouchDB is such that it
>   is per database. So if that’s a requirement for you, then you are already
>   locked into database-per-user. If not, it is still neat from a data
>   organisation point of view.
> - downsides include not being able to query across multiple users (how many
>   address book entries are there in total, what’s the median per user, etc).
>   Sometimes this doesn’t matter, sometimes it is important.
> - there are workarounds to query across all databases (replicate all
>   individual databases into a central big database for querying). Again, this
>   is usually the case for setups where there needs to be per-user
>   replication.
>   But it is a workaround.
> - having many small databases requires a bit more work on CouchDB’s side
>   when dealing with many concurrent users being active
>
> Finally, single database:
>
> - you can query across all users
> - you can get all data per user in one request
>   - either via _all_docs if your doc _ids are prefixed with the user id
>   - or a view/mango by user-id
> - you can’t authenticate/replicate by-user (but if you don’t have that
>   requirement, that’s not a downside)
> - database sharding scales up linearly with your data usage
> - concurrent users can be handled very efficiently
> - partitioned databases with a partition key of the user id give you
>   additional benefits at larger scales (likely matters less with 1000 users),
>   but ymmv, and you know the option is there for later
>
> * * *
>
> Given the situation you described, I’d strongly recommend a single
> database for this app, because the benefits are overwhelming.
>
> I’d strongly recommend against using db-per-type, just because it doesn’t
> gain you anything practically.
>
> If you need per-user replication, db-per-user is your only choice. It has
> drawbacks, but no show stoppers at the data sizes you quote.
>
> HTH,
> Jan
> --
>
> > On 8. Apr 2021, at 10:07, Kiril Stankov <[email protected]> wrote:
> >
> > Hi all,
> >
> > I have a design question.
> > Let's say that my solution will serve hundreds (or even more than 1000)
> > customers.
> > For each customer I will need 4 to 6 sets of objects, say: assets,
> > address book, meetings and meeting results.
> > I wonder what would be the best approach here:
> > - have userDB's for each customer with documents identified by type
> >   (this means potentially thousands of db's), or:
> > - have 6 db's, one per document type and filter by customer ID.
> >
> > There might be thousands of documents of each type per customer,
> > eventually bringing the total number of documents to millions.
> >
> > How will each of the approaches above affect clustering, and what will be
> > the best cluster setup?
> > In terms of performance and indexing, does one of the designs
> > outperform the other?
> >
> > Thanks in advance,
> > Kiril.
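
PS: for anyone wondering what Jan's "_all_docs if your doc _ids are prefixed with the user id" looks like in practice, here is a rough sketch in Python. It only builds the request URL, it does not talk to a server. The "user:type:local-id" id scheme, the helper names, the database name and the localhost URL are all made up for illustration; the startkey/endkey/include_docs parameters and the "\ufff0" high sentinel are the usual CouchDB range-query idiom.

```python
import json
from urllib.parse import urlencode

# A high code point commonly used as the upper bound of a prefix range.
HIGH_SENTINEL = "\ufff0"


def make_doc_id(user_id, doc_type, local_id):
    """Hypothetical id scheme: '<user>:<type>:<local id>'."""
    return f"{user_id}:{doc_type}:{local_id}"


def all_docs_url_for_user(base_url, db, user_id, include_docs=True):
    """Build an _all_docs URL covering every doc whose _id starts with '<user>:'."""
    prefix = f"{user_id}:"
    params = {
        # CouchDB expects key parameters to be JSON-encoded strings.
        "startkey": json.dumps(prefix),
        "endkey": json.dumps(prefix + HIGH_SENTINEL),
        "include_docs": json.dumps(include_docs),
    }
    return f"{base_url}/{db}/_all_docs?{urlencode(params)}"


if __name__ == "__main__":
    # Assumed local server and database name, for illustration only.
    print(all_docs_url_for_user("http://localhost:5984", "app", "user42"))
```

Including the ":" separator in the prefix keeps "user4" from matching "user42". The scheme assumes user ids never contain ":" themselves.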

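
PPS: the "soft-shard infinite datasets by year" idea mostly comes down to a naming convention, so here is a minimal sketch of one. The "<service>-<type>-<year>" pattern and the function name are assumptions for illustration, not a description of any particular project. The regular expression reflects CouchDB's database naming rules (lowercase letters, digits and _ $ ( ) + - /, starting with a letter).

```python
import re
from datetime import datetime, timezone

# CouchDB database names: start with a lowercase letter, then lowercase
# letters, digits, or any of _ $ ( ) + - /
DB_NAME_RE = re.compile(r"^[a-z][a-z0-9_$()+/-]*$")


def shard_db_name(service, doc_type, when):
    """Return a per-year database name such as 'calendar-meetings-2021'.

    Hypothetical convention: '<service>-<type>-<UTC year>'.
    """
    if when.tzinfo is None:
        # Refuse naive datetimes so the shard never depends on local time.
        raise ValueError("timestamp must be timezone-aware")
    year = when.astimezone(timezone.utc).year
    name = f"{service}-{doc_type}-{year}"
    if not DB_NAME_RE.match(name):
        raise ValueError(f"not a valid CouchDB database name: {name!r}")
    return name


if __name__ == "__main__":
    ts = datetime(2021, 4, 8, 11, 12, tzinfo=timezone.utc)
    print(shard_db_name("calendar", "meetings", ts))
```

Pinning the year to UTC means a document written around New Year lands in the same shard regardless of which time zone the writer is in.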