FASD paper:
http://freenetproject.org/papers/kronfol_final_thesis.pdf

FASD enables keyword searching but is spammable. The basic scheme is 
straightforward (I haven't read it in a while, so my terminology may be 
completely different from his):
- Keywords are hashed, of course, and these hashes translate to routing 
locations.
- An index element has a list of keywords connected to it.
- These are inserted as normal keys, except that they are inserted to the 
locations of *each* of the keywords.
- A search can be routed to all of the keyword locations or to some subset of 
them. It will include a list of keywords.
- The receiving nodes will send back any index elements that match the search.
- Standard algorithms for relevance are used to rank results (based on relative 
frequency etc). This can easily be extended for arbitrary insert-time relevance 
numbers.
- I don't recall whether flow control was specified; it could become a problem, 
but the obvious solution is simply to limit the number of returned items.
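As a rough illustration of the scheme above (a minimal single-node Python 
sketch; `IndexElement`, the in-memory store, and the count-based ranking are my 
assumptions for illustration, not FASD's actual data structures):

```python
import hashlib

def keyword_location(keyword: str) -> float:
    """Hash a keyword to a routing location in [0, 1), as on the keyspace."""
    digest = hashlib.sha256(keyword.lower().encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class IndexElement:
    """An index element: a document reference plus its list of keywords."""
    def __init__(self, uri, keywords):
        self.uri = uri
        self.keywords = set(k.lower() for k in keywords)

class Node:
    """Toy single-node store; a real network routes each insert/search to
    the nodes closest to each keyword's location."""
    def __init__(self):
        self.store = {}  # location -> list of index elements

    def insert(self, element):
        # Insert the element at the location of *each* of its keywords.
        for kw in element.keywords:
            self.store.setdefault(keyword_location(kw), []).append(element)

    def search(self, keywords):
        # Visit each keyword location, collect matching elements, and rank
        # by how many of the searched keywords matched (crude relevance;
        # real ranking would use relative frequency etc.).
        keywords = set(k.lower() for k in keywords)
        hits = {}
        for kw in keywords:
            for el in self.store.get(keyword_location(kw), []):
                hits[el.uri] = hits.get(el.uri, 0) + 1
        return sorted(hits, key=hits.get, reverse=True)
```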

The fundamental problem has always been spam: These are effectively KSKs, and 
when you fetch them you propagate all the spam that the spammer has inserted.

So we solve it the same way we solve it for chat. All we have to do to make 
FASD feasible is:
- Make the items signed.
- Send a compressed list of the inserter identities we are interested in along 
with the request.
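The encoding of the compressed identity list isn't specified here; one obvious 
choice is a Bloom filter over the trusted inserters' key hashes, sent with the 
request. A sketch under that assumption (signature verification itself is 
elided; Bloom false positives would be weeded out by the requester when it 
verifies signatures on the returned items):

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter to compress the list of trusted inserter identities."""
    def __init__(self, size_bits=1024, hashes=4):
        self.size = size_bits
        self.hashes = hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive k independent bit positions from salted SHA-256 hashes.
        for i in range(self.hashes):
            h = hashlib.sha256(bytes([i]) + item).digest()
            yield int.from_bytes(h[:4], "big") % self.size

    def add(self, item: bytes):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: bytes):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(item))

def filter_results(elements, trusted: BloomFilter):
    """At the receiving node: return only index elements whose signer appears
    in the requester's compressed trust list, so spam from unknown identities
    is never propagated back."""
    return [el for el in elements if el["signer"] in trusted]
```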

Clearly scalability is limited in much the same way as it is for chat. However 
it should be workable provided there is a certain amount of aggregation and 
centralisation at the end-user level. And it's a big improvement over having to 
download several huge indexes and intersect them on the client. Arguably it may
be more efficient to use this mechanism only for popular keywords; for 
unpopular keywords the btree indexes should be very efficient. And it's 
probably a useful primitive for many applications...

Could it be made fully distributed, without needing a WoT? Yes, but only with:
1) Some sort of reputation feedback mechanism. This most likely involves not 
propagating, or de-propagating, content that was fetched and not wanted, and
2) A sufficiently high cost for inserts of keywords to deter spamming. E.g. 
something similar to the Scarce KSKs proposal - limit the number of keyword 
index items that can be inserted to one per X seconds per link.
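The "one per X seconds per link" throttle could be implemented as a per-link 
token bucket, along these lines (a sketch; `LinkRateLimiter` and its interface 
are mine, not part of the Scarce KSKs proposal):

```python
import time

class LinkRateLimiter:
    """One insert per `interval` seconds per link, token-bucket style: each
    peer link accrues one insert credit every `interval` seconds, capped at
    `burst` so idle links cannot stockpile unlimited credits."""
    def __init__(self, interval: float, burst: int = 1):
        self.interval = interval
        self.burst = burst
        self.credits = {}  # link id -> (tokens, time of last update)

    def allow_insert(self, link: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        tokens, last = self.credits.get(link, (self.burst, now))
        # Refill proportionally to elapsed time, up to the burst cap.
        tokens = min(self.burst, tokens + (now - last) / self.interval)
        if tokens >= 1.0:
            self.credits[link] = (tokens - 1.0, now)
            return True
        self.credits[link] = (tokens, now)
        return False
```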

These are in principle feasible but may open serious attacks. The first may 
introduce censorship attacks, and it links to the wider question of whether it 
is ever safe to allow a node to discover the location of a document without 
propagating it. IMHO the main issue there is path folding, and we can turn 
path folding off for keyword searches (just as we turn it off for SSKs and 
inserts). Plus, will users give sufficiently useful feedback?

KISS: the simplest mechanism. Keep an LRU cache (random replacement is better 
for data storage, but not here). When a fetch happens, generate an unforgeable 
token for each returned datum. If the user gives positive feedback, the token 
is sent back to us; we "spend" it (i.e. commit it to persistent storage so 
that, like digicash, it cannot be spent twice) and promote the item. This 
would be an operation similar to an insert, promoting on those nodes where it 
matched the original fetch. Obviously it will only work for a relatively short 
period after the original request. We might keep recently fetched items in a 
special cache, promoting them into the main cache if feedback comes in and 
dropping them if it doesn't.

To prevent abuse, we could perhaps use rate limiting similar to that proposed 
for inserting. A spammer might be able to keep stuff in the cache indefinitely 
without anyone else fetching it by repeatedly requesting it (he can't 
repeatedly insert) - but not if we refuse to promote items that we have 
already returned to a node close to the originator. Or we just impose rate 
limiting on search requests as well as inserts. Or we ignore it - spam is only 
a problem if it is in volume, and inserts are throttled anyway.
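The token scheme above might look like this (a minimal Python sketch; the HMAC 
construction and the in-memory `spent` set are my assumptions about how 
"unforgeable, spend-once" could be realised, not a specified design):

```python
import hashlib
import hmac
import os

class FeedbackTokens:
    """The node HMACs (request id, datum id) with a node-local secret at
    fetch time; positive user feedback returns the token, which is verified
    and committed to a spent set so it cannot be spent twice."""
    def __init__(self):
        self.key = os.urandom(32)  # node-local secret, never revealed
        self.spent = set()         # would be persistent storage on a real node

    def issue(self, request_id: str, datum_id: str) -> bytes:
        """Generate the unforgeable token handed out with a returned datum."""
        msg = f"{request_id}/{datum_id}".encode()
        return hmac.new(self.key, msg, hashlib.sha256).digest()

    def spend(self, request_id: str, datum_id: str, token: bytes) -> bool:
        """Return True (and promote the datum) only on the first valid spend."""
        expected = self.issue(request_id, datum_id)
        if not hmac.compare_digest(expected, token) or token in self.spent:
            return False
        self.spent.add(token)
        return True
```

Since only the issuing node knows the key, tokens cannot be forged, and the 
spent set makes each one single-use.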

Hmmm, what exactly does FASD say about this? My recollection was that FASD was 
spammable. I should really read the paper again! It looks like it *might* be a 
tractable problem ...

And can these mechanisms prevent spam on chat too?? Yes, chat is essentially a 
search problem.