Tahoe-LAFS devchat 02-May-2017

Attendees: warner, exarkun, meejah
* magic-wormhole Dockerfile - https://github.com/warner/magic-wormhole/pull/149
* new 1382 branch: fails some integration tests, can use more review
* reviewing PR 415 on I2CP
* PR408: verifycaps
  * leave web/filenode in place, add new (very small) web/verifynode
    that only does GET(t=info), POST(t=check,repair)
  * really we need IRepairable, ICheckable, instead of adding the
    awkward IVerifierNode

We spent most of the time studying the 1382 branch (PR 416). I think
it's pretty close to landable, but there's a behavior regression (we
walked through some potential fixes, but I think we'll probably just
land it anyway and fix it later).

The old uploader would, if necessary, try every single server in the
hopes of placing all shares. The new uploader in this branch will only
ever try 2*N servers.

The deal is that there's a relatively expensive algorithm that finds a
suitable share placement, and its runtime grows (a lot? a little? I
don't know) with the number of servers involved. To tolerate the
pathological placement bugs that 1382 wants to solve, we need to know
about existing shares before we decide on the placement, and to do
that we need to send out queries before running this algorithm. In the
normal/ideal case, most servers can accept our shares and don't
already have a share, so 2*N servers will be plenty. Asking more than
that is a waste of traffic, and increases the chances that one of them
will be slow to respond, slowing down the upload (we can't upload to
anybody until we get back the answer from everybody). But in a really
full grid, the first 2*N servers in the permuted list might be full,
and in this new branch, that would cause the upload to fail.

The longer-term fix we discussed would be a more iterative approach.
For each server under consideration, it would remember which shares it
wants to place there, and also what we know about any existing shares
(whether we've even asked, whether they've answered, and what shares
they've committed to accepting). Then there'd be a loop somewhat like
the following (sketched in code after this discussion):

1: start with no information, and an empty placement plan
2: compute a potential placement plan
3: does our potential placement meet the happiness requirement?
   * no: add a server from the introducer's list, go to step 2. If
     we've run out of servers, abort
4: send share-reservation requests to all servers
5: as responses come back, update the placement plan
   * if the server rejects our request, go back to step 2 (to try
     putting that share somewhere else)
   * if the server reveals a pre-existing share, pause everything
     while we send out a bunch more queries in the hopes of finding
     all the other "dark matter" shares. When all the responses have
     come back, or we get tired of waiting, go back to step 2 to come
     up with a new plan
6: have we gotten commitments from all the servers in our plan?
   * yes: start the upload

We may have to cancel a reservation if we learn that the share we were
going to place is already present on some other server.

There's a tradeoff between how many queries we make (more network
traffic) and our chances of discovering/using pre-existing shares
(also network traffic, for uploading shares). I think the most likely
case is that the file has never been uploaded before, so I think the
algorithm should basically assume this until proven otherwise. But as
soon as we discover a pre-existing share, we should switch into a mode
where we find most or all of the expected N shares, so we can avoid
duplicate uploads.
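Just to make the shape of that loop concrete, here's a rough Python
sketch. This is NOT the code in the 1382 branch: compute_placement(),
send_reservation(), and query_more_servers() are invented stand-ins
for the real (Twisted, asynchronous) machinery, the happiness check is
simplified (the real metric is a maximum matching between servers and
shares), and reservation-cancellation is omitted.

class UploadUnhappinessError(Exception):
    pass  # Tahoe has an exception of this name for this situation

class ServerTracker:
    """Everything we track for one candidate server."""
    def __init__(self, server):
        self.server = server
        self.asked = False      # have we sent it a query yet?
        self.answered = False   # has it responded?
        self.existing = set()   # shares it already holds ("dark matter")
        self.committed = set()  # shares it has agreed to accept

def upload_plan(permuted_servers, happiness,
                compute_placement, send_reservation, query_more_servers):
    # the three callables are injected stand-ins for the real machinery
    candidates = []                           # step 1: no info, empty plan
    remaining = list(permuted_servers)
    while True:
        # step 2: plan maps each ServerTracker to a set of share numbers
        plan = compute_placement(candidates)
        # step 3: simplified happiness check (count servers holding at
        # least one planned share; the real metric is a maximum matching)
        if sum(1 for shares in plan.values() if shares) < happiness:
            if not remaining:
                raise UploadUnhappinessError("ran out of servers")
            candidates.append(ServerTracker(remaining.pop(0)))
            continue                          # back to step 2, one more server
        replan = False
        for tracker, shares in plan.items():  # step 4: reserve the shares
            needed = shares - tracker.committed
            if not needed:
                continue
            accepted, existing = send_reservation(tracker, needed)
            tracker.asked = tracker.answered = True
            tracker.committed |= accepted
            tracker.existing |= existing
            if accepted != needed:            # step 5: rejected some shares
                replan = True
            if existing:                      # step 5: found "dark matter":
                query_more_servers(candidates, remaining)  # hunt for the rest
                replan = True
        if replan:
            continue                          # back to step 2, better informed
        return plan                           # step 6: commitments from all

The point is just that every surprise (a rejection, a discovered
share) feeds back into a replan, rather than being handled ad-hoc; the
loop terminates because compute_placement() works from the accumulated
knowledge rather than re-trying servers that have already said no.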
We don't *have* to find all N shares: it's probably better to upload a
new share than to query every single server in the grid in the hopes
of finding it. (That's for immutable files. For *mutable* files, it's
more important to find and replace the old shares, because they
represent a rollback hazard for subsequent downloaders, so we should
be more aggressive about tracking down those shares.) A sketch of what
such a policy might look like appears after this discussion.

Meejah pointed out that some applications may be more likely to have
pre-existing shares, like "tahoe backup" where you've moved a lot of
files from one directory to another, or some other uncoordinated-
uploads scenario, so I shouldn't be so quick to tune the algorithm to
prefer brand-new files, performance-wise.

The algorithm above, if we implemented it, would talk to exactly N
servers in the ideal case (no pre-existing shares, all servers accept
our share, all servers answer quickly). But if necessary it would try
every server in the grid, and most pre-existing shares would be
included correctly. But we might have to call the placement algorithm
several times, and depending upon the size of the grid, that might
start to get expensive (I haven't benchmarked it, or even asked Daira
what order of computational complexity it has).
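On that immutable-vs-mutable point, the "how hard do we hunt for
pre-existing shares" decision might look something like this. A
hypothetical helper, not Tahoe code; the names and the 2x batch factor
are invented for illustration.

def extra_query_batch(is_mutable, shares_found, N, servers_left):
    """How many more servers to query after discovering a pre-existing
    share."""
    if shares_found >= N:
        return 0                    # all N shares accounted for
    if is_mutable:
        # old mutable shares are a rollback hazard: hunt them all down,
        # even if it means asking every remaining server in the grid
        return len(servers_left)
    # immutable: uploading a duplicate share is merely wasteful, so cap
    # the search rather than query the whole grid (factor is arbitrary)
    return min(len(servers_left), 2 * (N - shares_found))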
cheers,
 -Brian