Hello everyone, I'm having trouble working out how to implement bulk loading of data in a particular application I'm working on.
Making it as simple to understand as possible, I've got two types of document that need to be stored:

- "accounts", which includes all the information about a user account (e.g. name of the user, account ID, address, account creation date, ...)
- "transactions", which are tied to a specific account. Transaction information would include e.g. a transaction ID, account ID, transaction date, ...

Importantly, I don't know which fields I'll receive in my "transaction" records, so I need a schema-less storage model.

If this was SQL, I'd have 2 separate tables, with a foreign key from each "transactions" record pointing to a record in "accounts". Nice and simple for my SQL-trained brain to work with, but the data I have to work with is inherently schema-less, so an RDBMS isn't going to work.

With CouchDB, I've got this data going into a single database. This seems to be the accepted best practice, and makes sense for this specific application. This works fine now, but is taking too long - I can't keep up with the rate of incoming data as long as I'm loading it in one record at a time.

Assume the following naming conventions:

- A1 is account number 1, A2 is account number 2, ...
- T1A1 is the first transaction against account number 1, T3A4 is the third transaction against account number 4

The data I'm loading may come in the following sequence:

A1, T1A1, A2, T2A1, A3, T3A1, T1A2, T1A3, A4, T2A2, ...

In other words, I'm receiving new account data intermixed with new transaction data. I'll never receive a transaction for an account that doesn't already exist. Again, nothing unusual for a real life application.

I'd really like to be bulk-loading the data, as the need to load it quickly overrides all other requirements at this point. However, as I understand it, bulk loading the data will require that accounts already exist for any transactions, and that's difficult given the intermixing of account and transaction data coming in.
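For concreteness, here's roughly how I picture the two document types sitting in the one database, and the request body a bulk load would send. This is only a sketch: the field names, the "type" discriminator and the "_id" layout are my own illustration rather than anything CouchDB requires. The one CouchDB-specific part is the {"docs": [...]} wrapper, which is the body that gets POSTed to the database's _bulk_docs endpoint.

```python
import json


def account_doc(account_id, **fields):
    """Build an account document. "type" and the "_id" layout are assumptions."""
    return {
        "_id": f"account:{account_id}",
        "type": "account",
        "account_id": account_id,
        **fields,
    }


def transaction_doc(transaction_id, account_id, **fields):
    """Build a transaction document; any extra fields pass straight through,
    since I don't know in advance which ones a transaction will carry."""
    return {
        "_id": f"transaction:{account_id}:{transaction_id}",
        "type": "transaction",
        "transaction_id": transaction_id,
        "account_id": account_id,
        **fields,
    }


def bulk_docs_body(docs):
    """Serialise documents into the JSON body for POST /<db>/_bulk_docs."""
    return json.dumps({"docs": list(docs)})


# A1 followed by T1A1, using made-up field values.
docs = [
    account_doc(1, name="Example User"),
    transaction_doc(1, 1, date="2009-01-01", amount=12.5),
]
body = bulk_docs_body(docs)
```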
One possibility is that I could conceivably force the end of a bulk load "transaction" every time I see a new account number; doing that would ensure that I'm never trying to generate a transaction against an account that isn't already in the database. However, I'm wondering if this is the best way of dealing with this situation, which is presumably fairly common.

Any thoughts/ideas/suggestions welcome.

Thanks in advance,
Dave M.
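P.S. In case it helps, here's a rough sketch of the batch-splitting idea I described. It closes the current batch as soon as a new account record arrives, so each account is always sent in an earlier batch than any of its transactions. The function name, the is_account callback and the max_batch cap are all hypothetical, and I'm using the A1/T1A1 labels from above in place of real documents.

```python
def split_batches(records, is_account, max_batch=1000):
    """Yield lists of records, ending a batch right after each account record.

    Because a batch is closed as soon as an account is added, every later
    transaction for that account lands in a subsequent batch.
    """
    batch = []
    for record in records:
        batch.append(record)
        if is_account(record) or len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left once the input runs out.
        yield batch


stream = ["A1", "T1A1", "A2", "T2A1", "A3", "T3A1", "T1A2", "T1A3", "A4", "T2A2"]
batches = list(split_batches(stream, is_account=lambda r: r.startswith("A")))
# batches == [["A1"], ["T1A1", "A2"], ["T2A1", "A3"],
#             ["T3A1", "T1A2", "T1A3", "A4"], ["T2A2"]]
```

Each yielded batch would then become one bulk request. The obvious downside is that a stream with lots of new accounts produces lots of small batches, which is part of why I'm asking whether there's a better approach.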
