On Mon, Oct 03, 2005 at 02:00:52PM +0200, Guido Winkelmann wrote:
> Hi
>
> On Monday, 19 September 2005 17:07, Matthew Toseland wrote:
> > On Sun, Sep 18, 2005 at 04:12:19PM +0200, Guido Winkelmann wrote:
> > > Hi,
> [...]
> > > If 0.7 should be done this way, the mentioned external daemon should,
> > > of course, come bundled with the standard Freenet package. A stripped
> > > down fred on its own would be of no use to most, just like a TCP/IP
> > > stack on its own - or i2p...
> >
> > And it would have to run in the same JVM, for performance and user
> > friendliness reasons (an extra several tens of megs for a second VM is
> > *BAD*). Why is it beneficial?
>
> It doesn't have to run at all, and if it runs, it doesn't have to run on
> the same computer. Given the extremely high computing resource needs for
> Freenet, I think this is beneficial.
99.999% of users will not do this. We have to optimize the common case
first.

> > > Things that should be thrown out that are present in current versions
> > > (5xxx) include:
> > >
> > > - The Web interface (definitely - but it might be wise to leave a
> > > stripped down version for strictly node-maintenance related purposes.)
> >
> > If you're going to chuck it out, then chuck it out. We do have FCP,
> > after all.
>
> I'm sorry, I'm not a native English speaker. I haven't quite understood
> what you were trying to tell me there.

If you are going to get rid of the web interface, then get rid of the web
interface. Do not include anything but FCP.

> > > - The Distribution Servlet (definitely)
> > > - Everything related to splitfile handling (maybe, more talk on that
> > > further down)
> >
> > Yuck. That would mean that we have to have two separate FCP protocols -
> > high level FCP and low level FCP.
>
> No, not necessarily. If you move splitfile handling out of the node, you
> cannot supply a high-level FCP at all. If you leave it in, you could
> support both a low-level and a high-level FCP, but there's no good reason
> for the low-level one then. So, however you do it, you'll need only one
> FCP (/maybe/ a second one for streams, but that's an entirely different
> discussion).

No. We will need a high-level FCP in order for it to be easy to write
clients. It is very important for it to be easy to write clients,
especially with smaller blocksizes (=> compressed, binary, split metadata
etc).

> > IMHO we must not require every single
> > client author to implement binary metadata, compressed metadata,
> > hierarchical metadata, FEC codecs, and everything else required; we must
> > have a "high level FCP" including reasonably transparent splitfile
> > support,
>
> No, not "reasonably" transparent. Fully transparent support or none at
> all - loading a 500 MiB file should be no different (to the programmer of
> a third-party app) than loading a 4 KiB file.
> (well, except for resource consumption...) But that's a different
> discussion again.

Different discussion, but the way you handle it in practice might well be
different - not just resource consumption but resource allocation too.

> > and it should probably include download direct to disk,
> > download when the client is not connected, and queueing, as well, for
> > reasons related to starting splitfile downloads from fproxy.
>
> I don't think that's a good idea. Better have a properly done download
> manager outside the node than a half-assed one inside it.

Why would it be half-assed? It wouldn't have much of a GUI, so third
parties can build one themselves for it. But it would be accessible from
ALL CLIENTS, which solves all manner of integration problems, e.g. the
presence of large files on fproxy sites.

> > > - The command line client (I don't think there's a good reason to
> > > include that in the normal freenet.jar)
> >
> > It's tiny, and it's a useful debugging tool. There is no compelling
> > reason to include it in the jar except to save 10kB or whatever it is.

Which is insufficient reason given Fred's memory footprint!

> Well, this is not an important part of the whole discussion anyway, but I
> still think it should get its own .jar. Having to call it by its class
> name from the freenet.jar may be intuitive for Java programmers, but not
> to anyone else. This isn't how things usually work on the command line;
> usually you have one utility = one binary. The way it is now, most people
> won't even know there is a supplied cl client.

You can provide one binary if you want to. Just like you can provide one
binary for Fred.

> > > and maybe other stuff I don't know about.
> > >
> > > The advantages to this approach are:
> > >
> > > - The node source code becomes smaller and easier to maintain.
> >
> > It does? How so? We'd have to maintain two separate source modules!
> Yes, but it won't have to be the same people maintaining the separate
> daemon source who currently maintain fred's. I can, of course, not
> guarantee that there will be people who will want to do this, but this
> way, your chances of ever being able to delegate a significant portion of
> the work off to others will increase dramatically.

It's the same repository. It would not solve any access control issues, for
example, because either somebody has access to the repository or he
doesn't.

> > > - The operation of the node becomes more reliable. (There is less
> > > stuff in there that might cause it to crash.)
> >
> > The node must be cleanly separated from the UI code, including e.g.
> > splitfile assembly. This happens via interfaces, not via running it in
> > a separate VM, and not via having it in a separate CVS module.
>
> It happens by running it in a different process or even on a different
> computer.

Which would slow things down significantly and be used by exactly 3 users
out of 100,000.

> > > - Resource consumption of the node will be lower and much more
> > > predictable.
> >
> > Only if it runs without the extra daemon. Which it won't for the
> > majority.
>
> You're making a big assumption here. Who says everyone will really want
> to use the services you provide by default with Freenet? Even right now
> there are probably lots of users who only use fproxy for status info
> about the node and mostly run their nodes only for Frost or pm4pigs or
> something like that. For these users, fproxy is mostly bloat. (I am one
> of those users.)

Fproxy is indeed mostly bloat in Fred 0.5; however, in 0.7 we are moving
all the useful functionality that is implemented exclusively for fproxy and
the command line client into FCPv2, so that it is used by everyone. Fproxy
itself is then a very thin layer on top. I accept that there WAS a problem
with there being a lot of code in Fred which was solely for the benefit of
fproxy, but *we are fixing the problem*.
And your proposed solution is to make the current problems much, much worse
by forcing everyone to implement FEC encoding, splitfiles, binary metadata,
split metadata, and everything else, in their own mutually incompatible
ways, at the cost of thousands of lines of duplicated code. We *MUST* have
FCPv2, which means that if we do as you suggest we will have two separate
versions of FCP. This is quite possible, but it needs to be justified.

> Freenet is, in broad terms, a new anonymizing data transport layer. As
> such, it can be used by a broad range of applications, most of which none
> of us ever even thought about. "Freesite-browsing" might become the least
> important application before you know it.

It is immensely important for political reasons IMHO. And we can't bundle
Frost, while we might be able to bundle a java-based email/freemail/IM
client.

> > The memory overhead for running two JVMs instead of one will be
> > significant,
>
> As I said:
> - It doesn't always have to be running
> - It can run on a different computer

99% of the time it will run on the same computer. 99% of users would not
make such extreme tweaks; they would use what they were given and then
install more stuff on top.

> > unless the JVM automatically coalesces, in which case there
> > is no point anyway.
>
> > > - The code for enduser-related functionality becomes a lot more
> > > accessible for new programmers. (Patches to fad are pretty much
> > > guaranteed not to interfere with the node's core operations and,
> > > thus, are more easily accepted.)
> >
> > Nobody (in the OSS world) codes java. And of those who do, they don't
> > use Fred's existing interfaces. This is a matter of documentation and
> > communication and stubbornness, not a matter of what is in what jar.
>
> There definitely are people in the OSS world who code Java. Look at
> Azureus for example, or at I2P. It's also not simply a stubbornness thing.
> Look at other large OS projects like Gnome, KDE, mldonkey[1] and whatnot.
> These are thriving. Freenet isn't. I think this is a problem and I think
> it should be addressed, and I think that splitting user-related
> functionality and core functionality is an important step in that
> direction. (Documentation is probably even more important.)

No, making the interfaces MORE COMPLEX AND MORE LOW LEVEL will not
encourage the production of new apps. What we are doing - implementing easy
to use, high level interfaces that hide the complexity of splitfile
decoding from the app, implementing new services other than store and
retrieve, and implementing and bundling basic communication tools such as
IRC, email and ideally CVS - will help to get new tools. Making everyone
code a redundant several thousand lines of code in their apps just to save
on memory usage when the user never uses splitfiles is ridiculous.

> If a new programmer decides he wants to help out, the first thing he'll
> have to do is find out how things work in the already-existing code,
> where he'll have to tinker with it to achieve certain goals and where his
> own code might fit into the whole thing. The bigger and more complex the
> existing codebase is, the harder and more frustrating this process will
> get. Right now, if a new programmer has some ground-breaking ideas for
> fproxy, for example, he will have to wade through tons of code having to
> do with things like datastore management, routing table management, key
> processing and whatnot.

He can ask. It is more important that it be simple to build 3rd party apps
than that it be simple to add on to the node, because the latter is always
going to be harder anyway. But the key problem here is a lack of
communication, which mostly emerges from the fact that you cannot easily
and safely communicate with the devs from within the network. The way to
fix this is:
- Implement and bundle an IRC proxy.
- Implement and bundle an email proxy, and tie it into the email lists.
- In the meantime, use Frost.
- Look into distributed version control systems.

> If the code were in a separate program, this would be a lot easier. The
> general outline of the program, "the way of doing things" in there, could
> be simpler, more streamlined and better adapted to the needs of an
> end-user tool.

The whole idea of "a separate program" is pretty dubious in java. A
separate code module maybe, but that's a lot of extra complexity for zero
gain - you would ALWAYS, and I mean ALWAYS, need to run the second module,
to provide FCPv2, if you run any clients. Now, it might be a slight gain
from a development view, but the same could be achieved by having a
separate module and having the two built into a single jar with internal
interfaces between them.

> Then there is also the issue that people working on enduser-related code
> might easily break some of the node's core functionality if it's all one
> big program.

End-user code such as? CVS does not have fine-grained access control. You
cannot grant somebody write access to one module but not to another. Now,
maybe this is fixed by Subversion...

> Someone from the project (I think it was Ian Clarke) said the project had
> learnt the hard way that lots of times parts of the code that shouldn't
> affect the performance of the node, did. A consequence of that, IMHO,
> should be "well then don't put so much code in there that might go
> wrong." "Keep It Simple, Stupid", they say.

Einstein once said that the trick with Occam's Razor (~= KISS) is not to
slit your own throat. Things should be as simple as possible - AND NO
SIMPLER. In this instance, in order for the clients to have a simple
interface, it is absolutely vital that we move the complexity of e.g.
splitfile handling into the node. Because this involves significant
resource allocation issues, it is necessary to have an actual download
manager in the node.
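For concreteness, the kind of high-level, queue-aware request this implies
might look something like the sketch below. The message and field names are
illustrative guesses at what a high-level FCPv2 could offer, not a settled
wire format:

```
ClientGet
URI=CHK@.../some-large-file.iso
Identifier=myapp-request-1
ReturnType=disk            # download direct to disk, not over the socket
Filename=/home/user/some-large-file.iso
Persistence=forever        # queued; survives the client disconnecting
EndMessage
```

The client says nothing about splitfiles, FEC or metadata; all of that,
plus the queueing and the disk I/O, would live behind this one message in
the node's download manager.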
> (Note how many of the internet technologies that enjoy real long-term
> success, like IP, TCP, UDP, SMTP, NNTP, rfc822-messages, HTTP... did just
> that.)

Yes, and they are built on how many layers? How many applications use IP
directly? We must have a high-level FCPv2 API!

> > > - The external daemon can be written in a language other than Java
> > > (Whether that's an advantage may depend on your POV. IMO, it's a big
> > > one.)
> >
> > Not if it is bundled with Fred it can't. The high-level daemon must be
> > maintained by the project itself, meaning it has to be in java.
>
> Why? Have you sworn a holy oath to Java or something?

Convince Ian. Then convince me. Then we can talk about it.

> > And, without intending to start a language flamewar here, you do know
> > that java can be compiled, right?
>
> I have basic knowledge of Java. It's mandatory at the uni. Still, I don't
> see what this has to do with anything.

GCJ 4.1 will in all likelihood be able to compile Fred 0.7; that's the
point I am making.

> > > - Third party implementations of the node itself can be done more
> > > easily. (The programmers won't have to rewrite all of the most basic
> > > user interface stuff from scratch.) (I think this is a pretty
> > > important issue in the long run.)
> >
> > Why does it matter? As long as there are clear internal interfaces it
> > is quite possible to plug in an alternative node implementation. And
> > given the SUBSTANTIAL effort involved in cloning either the node or the
> > high level code, this is not a big deal.
>
> Cloning a Unix-like operating system takes substantial effort, too. Yet,
> people have done it; Linux is proof of that. The availability of
> userspace tools which are more or less independent from the kernel
> they're running on is an immeasurable help in such a case. Had the GNU
> tools been tightly integrated into the kernel they were designed for back
> in 1991, Linux would not have come very far.
Indeed, which is why it is essential that there is a simple interface for
the third party clients to use. This is called "high level FCPv2". It
would be quite possible to make a low level FCPv2 interface between the
node and the high level functionality, but it would take significant
effort for no immediately apparent gain, and it would complicate the code
enormously.

Your whole case is based on the idea that the low level FCPv2 will be
usable by most clients. It won't. Freenet 0.7 has a 32kB block size. This
means that the metadata for a large splitfile may not fit into a single
block, which means we have to split the metadata itself. The client stuff
is really quite complex, and it is already agreed that the Fred 0.5 FEC
API is a monster. We really *have* to implement splitfiles in the node.

> > By definition all user friendly code must be left out of the node!
>
> Not quite. What I am saying is that the user interface-related code in
> the node should be as little as possible but as much as necessary. Some
> stuff just needs to be done at that level - resource restrictions (i.e.
> limits for bandwidth and diskspace usage), peer management in the darknet
> case, keeping the node up-to-date, stuff like this. This is all just
> node-administration/maintenance stuff, i.e. none of the "essential"
> functionality. (Essential functionality being the sort of functionality
> that can be the reason for people to run a node in the first place, i.e.
> things like freesite-surfing, using some sort of message-board or mail
> software via Freenet or something like that.)
>
> This might be done via a stripped down version of fproxy or, better yet,
> via some external utility that simply edits the config file(s) and sends
> the node some sort of signal when it's done, to make it reread its
> configuration.

Wonderful, so Fred ends up taking up THREE ports for HTTP. And two for
FCP. This is all pointless extra complexity.
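To put rough numbers on the block-size argument: with 32 KiB blocks, even a
moderately large file produces enough block references that the metadata
itself overflows a single block. The redundancy ratio and per-reference
size below are assumed round numbers for illustration, not Freenet's actual
parameters:

```java
// Back-of-the-envelope only: the 32 KiB block size and the 500 MiB example
// file come from this discussion; the FEC redundancy (50% check blocks)
// and the size of one block reference (64 bytes) are assumptions.
public class SplitfileMetadataEstimate {
    public static void main(String[] args) {
        final long BLOCK_SIZE = 32 * 1024;          // 32 KiB
        final long FILE_SIZE  = 500L * 1024 * 1024; // 500 MiB
        final double REDUNDANCY = 1.5;              // assumed FEC overhead
        final long REF_SIZE = 64;                   // assumed bytes/reference

        // Blocks of data, rounding up the final partial block.
        long dataBlocks  = (FILE_SIZE + BLOCK_SIZE - 1) / BLOCK_SIZE;
        long totalBlocks = (long) Math.ceil(dataBlocks * REDUNDANCY);
        long metadataSize = totalBlocks * REF_SIZE;

        System.out.println("blocks to reference: " + totalBlocks);
        System.out.println("metadata bytes:      " + metadataSize);
        System.out.println("fits in one block:   "
                + (metadataSize <= BLOCK_SIZE));
    }
}
```

Under these assumptions the metadata alone is around 1.5 MB - itself
dozens of 32 KiB blocks - which is why the metadata must be split, and why
making every client reimplement this logic is so expensive.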
All of this will have cost in lines of code, cost in maintenance, and cost
in run-time RAM footprint. That cost must be justified. And if we accept
that high-level FCP must be implemented, then the majority of the overhead
of fproxy is in fact the things you talk about above - the configuration
servlet, the various status pages and so on. Actually fetching pages is
pretty easy, as we have to provide that functionality for FCP *anyway*.

> > > - Fred can finally become a lean and mean unbloated routing machine.
> >
> > Which is totally meaningless if it requires a bloated user daemon for
> > it to do anything useful.
>
> Do you require the Mosaic browser to do anything useful at all with HTTP,
> or even TCP/IP? You might use it, but there are plenty of other useful
> options.

Who uses IP directly? Almost all protocols are layered, and this model
works well on the internet. Yes, it uses a bit more resources, but it
makes life easy for the programmers, and it means that the different
implementations are more interoperable. Both of those are far more
important than saving some RAM for the occasional people who *only* use IP
directly.

> > > Disadvantages:
> > >
> > > - The project needs to make extra sure that "fad" really is in a
> > > usable state before it can release 0.7
>
> [... Discussion about whether to do splitfile handling in the node or in
> the clients ...]
>
> There might be a third option: Have a third, independent agent somewhere
> which does nothing except splitfile handling. This has the following
> advantages:
> - The node can really be reduced to (32 KiB) key and stream
> handling/forwarding and datastore management.
> - The splitfile handling agent can be extremely simple: It does nothing
> except splitfile encoding/decoding.

No, it can't be that simple.
The reason is that it has to do the requests (otherwise you end up with the
universally hated Freenet 0.5 FCP FEC API, where the client fetches the
blocks, feeds them to the node, asks it to decode them, and then manually
inserts healing blocks). It also has to store the data, for similar
reasons; it is simpler for the agent to store the data than it is to
coordinate a client cache with the client. And if we are storing the data,
we might as well implement a queue. So you end up with what is already
expected for FCPv2.

> - The splitfile handling agent can be run on a different machine.

Indeed, but you still haven't explained why this justifies the cost.

> - The splitfile handling agent needs only be run when the node is
> actually in use. If the user isn't using his node for a while, it can
> fully concentrate on handling/forwarding requests from its network
> neighbors.

So what? The code is insignificant - a meg maybe in bytecode - it's the
data structures, and the JVM, which take up space.

> - Third party apps can still be as simple as they would be if splitfiles
> were done inside the node.
>
> Disadvantages:
> - This would mean we really need to have a low-level FCP and a high-level
> FCP.

Splitting it up as you propose absolutely requires low- and high-level FCP.
This is not, IMHO, on the critical path for 0.7. It does not make life
easier for third party devs. It does mean we have to maintain more code,
more complexity, and more interfaces.

> I see two ways of integrating this external agent with Freenet:
>
> - Put it between the node and the client apps
> This way, clients would talk directly to the splitfile handling agent
> using a high-level FCP, while the agent would use the node as its
> backend, talking to it using a low-level FCP.

This is possible.

> - Have the node delegate splitfile handling to the agent of its own
> accord. This way, clients would still talk directly to the node.
> If the node receives a request for a key, it simply forwards the request
> to the agent and waits. The agent will then request that key from the
> node using low-level FCP (which says "give me the raw 32 KiB block behind
> this key" instead of "give me the file referenced by this key") and
> decide whether it's a splitfile or not. If it isn't, it simply truncates
> the block to the exact file length and gives it back as the final result.
> If it is, the agent requests the remaining blocks, decodes the file and
> hands it back.

This is pointless.

> The last approach has the advantage that it could still be implemented
> much later without stirring up too much mud.

So could the first approach.

> > > Thus far for my proposal. Comments would be appreciated.
> > >
> > > Guido
>
> Guido
>
> [1] mldonkey is interesting for yet another reason: They're using a
> language that's even less popular than Java, and they're still doing
> better than Freenet.

"Doing better than" defined how? They have more users? I would suggest
that is due to performance more than anything else. They have more
developers? That is down to all the CS students who studied ML finding
they can do something useful with it (building warez-sharing clients). But
it is also due to the communications issues I have discussed, which we
must improve on in 0.7. These would not be fixed by adding more
complexity, more code and more bloat for no clear purpose.

-- 
Matthew J Toseland - toad at amphibian.dyndns.org
Freenet Project Official Codemonkey - http://freenetproject.org/
ICTHUS - Nothing is impossible. Our Boss says so.
