On Monday 04 August 2003 04:20 am, Gabriel K wrote:
> > A. Just broadcast your request to everyone.
> > Doesn't scale.
>
> In DC you broadcast your request to everyone. But a better approach is to
> organize the search the way Chord does, I think.
> Anyway, let's say it would be possible for a node to send a request to
> another neighbour, which passes it on, and in time the request forks, so
> that the requester doesn't have to wait for the message to pass ALL nodes.
> And even though it forks, and the same request is asked simultaneously,
> there are NO looping problems. A node receives the message only once.
> I would say that scales, would you?

No. No it does not. Even if you could somehow make an optimal network 
(impossible unless you serialize everything), where you only asked the 
minimum number of nodes needed to find the data and there were never any 
loopbacks, you would still have to ask (on average) N*M nodes for every 
request, where N is the number of nodes and M is the fraction of the network 
you have to search before finding a copy. And every time the data is not 
available at all, you have to ask all N nodes. This sucks royally.
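
To put a number on it, here is a toy flood search in Python (purely my own
sketch, with a made-up random topology, not anyone's actual protocol). It
forks the query and suppresses duplicates so every node is asked at most
once, and the number of nodes touched still grows with N; a miss touches
essentially the whole network:

    import random

    def build_network(n, degree=4):
        # Toy topology: every node links to `degree` random peers.
        net = {}
        for i in range(n):
            peers = set()
            while len(peers) < degree:
                j = random.randrange(n)
                if j != i:
                    peers.add(j)
            net[i] = list(peers)
        return net

    def flood_search(net, start, holders):
        # Forked broadcast with duplicate suppression: no loops, every
        # node is asked at most once. Returns how many nodes were asked.
        asked, frontier = {start}, [start]
        while frontier:
            node = frontier.pop()
            if node in holders:
                return len(asked)
            for peer in net[node]:
                if peer not in asked:
                    asked.add(peer)
                    frontier.append(peer)
        return len(asked)  # a miss: essentially everyone was asked

    net = build_network(10_000)
    rare = set(random.sample(range(10_000), 10))
    print(flood_search(net, 0, rare))    # on the order of 1,000 nodes asked
    print(flood_search(net, 0, set()))   # a miss touches nearly all 10,000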

> > B. Have a central server to keep track of it for you.
> > Centralized, and vulnerable to attack.
>
> Modification: you can have a central server making decisions for the
> network and individual nodes, and STILL it can be safe from attack IF no
> node knows its IP address!
> It is VERY practical to have such an authority that every node trusts and
> obeys. If they trust ONE instance, this makes it a lot easier to solve many
> problems, because you avoid scalability problems with voting processes
> (many times, if any ONE node can decide on its own, that power can be
> abused).

1. You can't hide a server that people have to connect to.
2. If you could, would you trust a server when you don't know who controls 
it?
3. The servers still need to know where the data is. Are they supposed to 
trust that the individual clients are honest about what they have and 
whether they sent it? And if the data is going through the server, it would 
be VERY slow.
4. It can still be attacked by crackers, lawyers, and network failures.

> > C. Use some sort of predefined routing scheme, where it is determined in
> > advance which node holds which data.
> > The holder of any given piece of data can easily be determined.
>
> This COULD be solved by letting the "hidden" central server take care of
> such things, but it is better to put as much work as possible on the nodes,
> to relieve the importance of and load on the central point (if there is
> one). This is also why I think it's good to leave the data at the nodes
> that hold it from the beginning. No sorting into the network, and no extra
> storage space on each node for someone else's files. You are only
> responsible for the stuff you WANT yourself.

If you don't store copies in the network, download speed is bounded by the 
upstream bandwidth of the one person sharing the data.

> Umm, are you saying that the HOLDER of the data is malicious and doesn't
> send the data? I would say there is no defense against that attack.

YES THERE IS! Don't route data there next time. If you can't do this, any node 
can just claim it has every piece of data on the network and always answer 
requests very quickly; pretty soon it's killing half the network's traffic. 
Plus a single malicious node could contact every other node and ask each to 
be a proxy for it. Then it can pretend to be a ridiculous number of nodes and 
essentially bring down the whole network.
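
For what it's worth, the defense is simple bookkeeping. A hypothetical
sketch in Python (the names and scoring scheme are my own illustration, not
Freenet's actual code): keep a delivery score per peer and ask the reliable
ones first.

    from collections import defaultdict

    class PeerTable:
        # Hypothetical reputation tracking: peers that claim data but
        # never deliver it sink to the bottom of the routing order.
        def __init__(self):
            self.successes = defaultdict(int)
            self.failures = defaultdict(int)

        def record(self, peer, delivered):
            if delivered:
                self.successes[peer] += 1
            else:
                self.failures[peer] += 1

        def score(self, peer):
            s, f = self.successes[peer], self.failures[peer]
            return (s + 1) / (s + f + 2)  # smoothed delivery rate

        def route_order(self, peers):
            return sorted(peers, key=self.score, reverse=True)

    table = PeerTable()
    table.record("liar", delivered=False)   # claimed the data, never sent it
    table.record("honest", delivered=True)
    print(table.route_order(["liar", "honest"]))  # ['honest', 'liar']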

> That is also why it's good to just leave the data at the node that
> introduced it to the network. If THAT node is malicious, it might not even
> join the network and share its data! The only harm in saying it shares
> data and then not sending it, is that a requester gets his hopes up and
> then gets very very sad :)
> Of course you can't force a malicious node to send you data. Hey, I can ask
> my friend to send me a tune, and if he sends me some other tune I didn't
> request, there is nothing I can do about it except ask him nicely! :)
>
> > From the performance perspective, it's true it does incur overhead.
> > However this is only over the whole network, not for any individual
> > user. So each user gets a download rate equal to their maximum network
> > throughput under peak demand. However supposing the network averages
> > using 10 hops and the load balancing works well, the total network
> > cannot be receiving at more than 1/10th of its TOTAL bandwidth at any
> > given time. This is not a problem, because under most circumstances
> > this is not the case. Yes, this prevents dial-up users from
> > contributing to the network, but for broadband it's a non-issue. A
> > 128k link can transmit a theoretical 10.8GB a day. Show me someone who
> > has a 128k link and AVERAGES downloading more than 1GB a day, and I'll
> > show you someone who needs a faster connection. Sure you could change
> > it from averaging 10 hops to 5 hops, but then you cut the rate that
> > nodes learn in half.
>
> Well, you are assuming user activity is spread equally throughout the day...

This is a fair assumption, because we are routing based on the hash of the 
data!
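
Here is what routing on the hash means, as a rough Python sketch (the
distance metric and peer table are illustrative assumptions, not Freenet's
exact algorithm): requests are addressed by a key derived from the data, so
where the requester lives doesn't decide where the load lands.

    import hashlib

    def key_of(data: bytes) -> int:
        # The request is addressed by the hash of the data, not by who
        # asked for it, so load spreads across the whole keyspace.
        return int.from_bytes(hashlib.sha1(data).digest(), "big")

    def next_hop(request_key: int, peers: dict) -> str:
        # peers maps a peer's name to the key it is known to specialize
        # in; forward to whoever is numerically closest to the target.
        return min(peers, key=lambda p: abs(peers[p] - request_key))

    peers = {"a": key_of(b"alpha"), "b": key_of(b"bravo"), "c": key_of(b"charlie")}
    print(next_hop(key_of(b"some file"), peers))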

> I say that it is likely that networks will form where most users are from a
> specific region of the world, for instance Scandinavia. So their activity
> will have its peak at some time. And at this time I think it's not hard to
> reach the limit!

It should be very hard to reach the limit. The nodes the data is coming from 
are randomly distributed. The intermediate nodes are fairly randomly 
distributed, and if more original requests are coming from one confined 
area, the load balancing should shift so that more of the intermediate hops 
are routed elsewhere to compensate.

> Anyway, you are missing the point here... Sure, they might not reach this
> limit, BUT the PROTOCOL should not require so much BW! Maybe a user doesn't
> only want to use freeNet. Maybe he runs DirectConnect as well while doing
> other stuff that requires some BW...

> Not in freeNet... and that's why I think it's not suited for large file
> transfers... at least not with the level of activity I think one should
> assume when writing a protocol.

What is better? Any individual can max out their download speed, even if the 
file is unpopular. You are making the flawed assumption that inserts are very 
common, nearly as frequent as requests, when in reality requests probably 
outweigh inserts 1000 to 1! Think about any web page: how many times is it 
viewed compared to how many times it is updated? In Freenet the same applies 
to any file. It is not like other networks where, once you download 
something, it becomes "shared". Content only needs to be inserted into the 
network ONCE. (Two people CAN'T insert the same content.)
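
That's a consequence of content-hash keys: the key is a function of the
bytes themselves. In miniature (plain SHA-1 here stands in for the real key
construction, which this sketch does not claim to reproduce):

    import hashlib

    def content_key(data: bytes) -> str:
        # The key is derived purely from the content, so a second insert
        # of the same bytes produces the same key and is a no-op: the
        # network already holds that key.
        return hashlib.sha1(data).hexdigest()

    print(content_key(b"same tune") == content_key(b"same tune"))  # True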

Now let's think about what you are proposing. If you have a centralized 
system in charge of routing, you want it to be somewhat distributed, you want 
end-to-end encryption, and you want none of the central servers to know 
anything incriminating. So take MixMinion. Then you want to run files over 
it. But you want to be anonymous, so you allow any client to connect to 
another client as a proxy. Then, when data is found, the proxies transfer the 
data to one another. Then you start adding optimizations. It would be a good 
idea for the proxies to cache the content for the next request. It would also 
help if the proxies learned a little about their immediately surrounding 
network, so they wouldn't have to go through the main server if they could 
find the file locally. For security, you want to encrypt the files, but it 
would be better if they were broken up so you could do BitTorrent-style 
downloading from the network. Then, to make the server's job easier, you want 
to categorize the proxies based on content (i.e. you connect to the other 
proxies that are as close to what you want to share as possible). That scheme 
doesn't have to be perfect, just the best match among the nodes you already 
know. Then, as I explained before, to prevent the network from being 
attacked, you have to return the data along the request path, though you can 
cut a few steps out. Then, both for security and to speed up the network, it 
would be best for you to offload all your shared files directly to the 
proxies and dedicate all your storage space to your area of specialization, 
as determined by the nodes that are using you as a proxy.

Congratulations! You have recreated Freenet! The only difference is that 
there is a big server in there. If you want Freenet with a big server, there 
is a shorter way:
1. Find a very fast server with HUGE bandwidth.
2. Install Freenet on it.