FASD paper: http://freenetproject.org/papers/kronfol_final_thesis.pdf
The scheme in the FASD paper enables keyword searching, but it is spammable. Basically it is really easy (I haven't read it in a while, so my terminology may be completely different to his):

- Keywords are hashed, of course, and these hashes translate to routing locations.
- An index element has a list of keywords connected to it.
- These are inserted as normal keys, except that they are inserted to the locations of *each* of the keywords.
- A search can be routed to all of the keyword locations, or to some subset of them. It will include a list of keywords.
- The receiving nodes will send back any index elements that match the search.
- Standard relevance algorithms are used to rank results (based on relative frequency etc). This can easily be extended to arbitrary insert-time relevance numbers.
- I don't recall whether flow control was specified; it could become a problem, but the obvious solution is just to limit the number of returned items.

(A minimal sketch of this insert/search flow is below.)

The fundamental problem has always been spam: these are effectively KSKs, and when you fetch them you propagate all the spam that the spammer has inserted. So solve it the same way we solve this for chat. All we have to do to make FASD feasible is:

- Make the items signed.
- Send a compressed list of the inserter identities we are interested in along with the request (see the Bloom filter sketch below).

Clearly scalability is limited in much the same way as it is for chat. However, it should be workable provided there is a certain amount of aggregation and centralisation at the end-user level. And it's a big improvement on having to download several huge indexes and intersect them on the client. Arguably it may be more efficient to use this mechanism only for popular keywords; for unpopular keywords the btree indexes should be very efficient. And it's probably a useful primitive for many applications...

Could it be made fully distributed, without needing a WoT? Yes, but only with:

1) Some sort of reputation feedback mechanism. This most likely involves not propagating, or de-propagating, content that was fetched and not wanted, and
2) A sufficiently high cost for inserting keywords, to deter spamming. E.g. something similar to the Scarce KSKs proposal - limit the number of keyword index items that can be inserted to one per X seconds per link.

These are in principle feasible, but they may open serious attacks. E.g. the first may introduce censorship attacks, and it links to the wider question of whether it is ever safe to let a node discover the location of a document without propagating it. IMHO the main issue there is path folding, and we can turn path folding off for keyword searches (just as we turn it off for SSKs and inserts). Plus, will users give sufficient useful feedback?

KISS: the simplest mechanism: keep an LRU (random replacement is better for data storage, but not here). When a fetch happens, generate an unforgeable token for each returned datum. If the user gives positive feedback, the token is sent back to us, we "spend" it (i.e. commit it to persistent storage so it cannot be spent twice, like digicash), and promote the item. This would be an operation similar to an insert, which would promote on those nodes where it matched the original fetch. Obviously it will only work for a relatively short period after the original request. We might keep recently fetched items in a special cache, to be promoted into the main cache if feedback comes in, and dropped if it doesn't. To prevent abuse, we could perhaps use rate limiting similar to that proposed for inserts? (A sketch of the token scheme is below as well.)
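To make the insert/search flow above concrete, here is a minimal, self-contained sketch in Python. The toy routing layer is just a dict from hashed keyword locations to stored elements; all class and function names are mine, not the paper's, and real FASD routes over the network rather than a local dict.

# Toy model of the FASD-style keyword index described above.
import hashlib
from collections import defaultdict

def keyword_location(keyword):
    """Hash a keyword to a routing location (here just a hex digest)."""
    return hashlib.sha256(keyword.lower().encode()).hexdigest()

class IndexElement:
    def __init__(self, uri, keywords, relevance=1.0):
        self.uri = uri                         # document the element points to
        self.keywords = {k.lower() for k in keywords}
        self.relevance = relevance             # arbitrary insert-time relevance

class ToyNetwork:
    def __init__(self):
        self.store = defaultdict(list)         # location -> index elements

    def insert(self, element):
        # Inserted as a normal key, but to the location of *each* keyword.
        for kw in element.keywords:
            self.store[keyword_location(kw)].append(element)

    def search(self, keywords, limit=20):
        # Route to every keyword location; receiving nodes return matches,
        # ranked by keyword overlap weighted by insert-time relevance.
        # `limit` is the crude flow control mentioned above.
        query = {k.lower() for k in keywords}
        best = {}                              # uri -> (score, element)
        for kw in query:
            for el in self.store[keyword_location(kw)]:
                score = len(el.keywords & query) * el.relevance
                if el.uri not in best or score > best[el.uri][0]:
                    best[el.uri] = (score, el)
        ranked = sorted(best.values(), key=lambda se: -se[0])
        return [el for _, el in ranked[:limit]]

# Usage:
#   net = ToyNetwork()
#   net.insert(IndexElement("CHK@...", ["freenet", "keyword", "search"]))
#   hits = net.search(["keyword", "search"])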
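The "signed items plus compressed identity list" refinement could look something like the sketch below. I'm assuming each index element gains an inserter_id field (bytes) whose signature the node has already verified elsewhere, and using a small Bloom filter as the compressed list; none of this is WoT's real API or a wire format, and the sizes and hash counts are made up.

# Hedged sketch: compressed trusted-identity list as a Bloom filter.
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1024, n_hashes=4):
        self.size = size_bits
        self.n_hashes = n_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        for i in range(self.n_hashes):
            digest = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(digest[:4], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

# Requester side: build the filter from identities we trust and send its
# `bits` along with the search request (identities here are made up).
trusted = BloomFilter()
for identity in (b"id:alice", b"id:bob"):
    trusted.add(identity)

def filter_results(elements, trusted):
    # Receiving-node side: only return index elements whose (verified)
    # inserter is probably in the requester's trusted set. Bloom false
    # positives only let the odd spam item through; they never drop a
    # wanted result.
    return [el for el in elements if el.inserter_id in trusted]

A Bloom filter is a natural fit here because the trusted set only needs approximate membership, and a kilobyte or so piggybacks cheaply on the request.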
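The spend-once token could be as simple as an HMAC under a node-local secret. A sketch, with made-up names and an in-memory set standing in for the digicash-style persistent spent store; a real node would also embed a timestamp so tokens expire shortly after the original request.

# Sketch of the spend-once feedback token: an HMAC over (nonce, datum id)
# keyed with a node-local secret.
import hashlib, hmac, os

NODE_SECRET = os.urandom(32)      # node-local key, never leaves the node
spent = set()                     # persisted in a real node, like digicash

def issue_token(datum_id):
    """Issued alongside each datum returned for a fetch."""
    nonce = os.urandom(8)
    mac = hmac.new(NODE_SECRET, nonce + datum_id.encode(),
                   hashlib.sha256).digest()
    return nonce + mac

def spend_token(datum_id, token):
    """True exactly once per valid token; the caller then promotes the
    item (e.g. out of the special recently-fetched cache)."""
    nonce, mac = token[:8], token[8:]
    expected = hmac.new(NODE_SECRET, nonce + datum_id.encode(),
                        hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):
        return False              # forged or corrupted token
    if token in spent:
        return False              # double spend attempt
    spent.add(token)
    return True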
Could a spammer keep stuff in the cache indefinitely without anyone else fetching it? Possibly, by repeatedly requesting it (he can't repeatedly insert) - but not if we decline to promote items that we have already returned in response to a request from a node close to the originator. Or we just impose rate limiting on search requests as well as inserts (the same per-link throttle sketched below would do). Or ignore it - spam is only a problem if it is in volume, and inserts are throttled anyway.

Hmmm, what exactly does FASD say about this? My recollection was that FASD was spammable. I should really read the paper again! It looks like it *might* be a tractable problem...

And can these mechanisms prevent spam on chat too?? Yes, chat is essentially a search problem.
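For completeness, the per-link limit ("one per X seconds per link") is just a token bucket kept per peer connection, usable for both keyword-index inserts and search requests. A sketch; X and the burst allowance are invented for illustration.

# Per-link token bucket: at most ~one keyword-index insert (or search)
# per `interval_secs` per link, with a small burst allowance.
import time

class LinkThrottle:
    def __init__(self, interval_secs=30.0, burst=3):
        self.rate = 1.0 / interval_secs     # tokens regained per second
        self.burst = float(burst)
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False                        # reject (or queue) the message

# Usage: keep one LinkThrottle per peer link and call allow() before
# accepting each keyword-index insert or search from that link.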
