Alex Rousskov wrote:
On Tue, 2008-08-26 at 12:15 +1200, Amos Jeffries wrote:
On Tue, 2008-08-26 at 02:23 +1200, Amos Jeffries wrote:
Okay, got the pretty-picture drawn up.

NP: this is drawn up as a high-level flow from my accumulated view of
all our work to date and where it's heading. That includes Adrian's
Squid-2 work and where I see it most efficiently mapping into Squid-3.

It should be very similar to how Squid currently works, with a few major
differences that we have all spoken about and planned around already.
Thank you for working on this picture. I am not quite sure I interpret
it correctly, but I do see a few distinct objects there: Data Pump, HTTP
Parser, and Store. This is more or less clear, at least at this high
level.

It is not clear to me whether the other blobs such as Protocol
Processing and Protocol Handling are flows, objects, or something else.
I am also not sure whether the arrows represent passing message data,
passing processing responsibility, or something else. Do different
colors and blob shapes mean something?

If we want an architecture picture, I think it would be great if we can
formulate it in terms of objects and flows among them. This should make
roles and boundaries much more clear.
Okay. The clouds are where I'm uncertain of the distinct content, not
knowing everything about Squid yet.

 - The protocol processing cloud is modules such as FTP, Gopher, HTTP,
and possibly HTTPS. Each is separate, but performs a 1-to-1 relationship
with the request: a flow handled by a protocol 'manager' object.

Is the HTTP protocol processing module a single class implementing both
client- and server-side processing?

Good question. I think it's probably best not to, at this point. Though with the overall design it does not really matter if they are separate but communicating modules.


Forwarding Logic looks at the request just enough to decide where to shove
it and passes it on to one of these.

Does it stay in the loop to, say, try another forwarding path?

No. If another path is needed, the responsible module needs to explicitly pass the request back into Forwarding Logic, with whatever new state the FL might need to deal with it properly (an error page result being one such case).

Same goes for any module handing off responsibility to a non-specific destination.
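To illustrate that hand-back idea, a minimal sketch, assuming hypothetical names throughout (no such classes exist in Squid; path names and failure states are purely illustrative): the failing module re-enters Forwarding Logic carrying the new state FL needs to pick another path or fall through to an error page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: Forwarding Logic picks a path, hands off
// responsibility, and stays out of the loop. On failure, the
// responsible module passes the request back with the new state
// (tried paths, last error) attached.
struct Request {
    std::string url;
    std::vector<std::string> triedPaths; // state FL needs on re-entry
    std::string lastError;
};

struct ForwardingLogic {
    // Decide a destination; paths already tried are skipped.
    std::string selectPath(const Request &req) {
        static const char *paths[] = {"cache", "peer", "origin"};
        for (const char *p : paths) {
            bool tried = false;
            for (const std::string &t : req.triedPaths)
                if (t == p)
                    tried = true;
            if (!tried)
                return p;
        }
        return "error-page"; // all paths exhausted
    }

    // A failing module calls this to hand responsibility back.
    std::string retry(Request &req, const std::string &failedPath,
                      const std::string &err) {
        req.triedPaths.push_back(failedPath);
        req.lastError = err;
        return selectPath(req);
    }
};
```

The key point of the sketch is that FL itself keeps no per-request loop state; everything it needs to retry arrives with the re-entering request.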

Does forwarding logic know about caching?

Only as the cache is one possible path to completion.


 - The second cloud is the ACL handling and redirectors. We came up with
some ideas at AusMeet that would make all of that a single object flow
manager. The efficiency of that still needs to be checked.

Will those ideas be documented/discussed? Or is the current plan to test
performance first?

Yes, eventually. I'm looking for time to write up a new feature page.


 - Arrows are callback/responsibility flow.

Callbacks and "responsibility" can be rather different things and can
"flow" in different directions. Perhaps the arrows can be removed (for
now) to avoid false implications?

True, but in Squid at present, responsibility for current code operations on state flows down through the callbacks.


1) Data Pump side-band. This is a processing-free 'pump' for any
requests which do not actually need to go through Squid's twisted
logics. With all logics compiled out it should be equivalent to a NOP.
But can be used by things such as adaptation components as an IO
abstraction.
What data does the Data Pump pump? Message bodies? What are the valid
ends of a pump? Can there be many Pumps per HTTP transaction? Does the
Pump communicate any metadata to the other side?
Data pump moves bytes, from A to B. IO level provides all the hooks for it
to do so. A and B could be sockets, buffers, pipes, handles, whatever gets
micro-designed.

A pipe moves something from A to B. Is Data Pump a pipe? Pipes connect
two ends. Pumps have a single end that produces/generates/provides
something. You can put something into a pipe and get it on the other
end. You can only get something from a pump.

If Data Pump is a pipe, please note that the current pipes are slaves
(they are being told what to do). Are you proposing active pipes that
use some kind of unified I/O APIs to suck data from one end and push it
into the other?

Contrary to Adrian's latest pump statements, I'm still envisaging a data pump as one-way: from source to sink, whatever those may be. Yes, my vision of it is a slave, told where the source/sink/buffer is and left to run to completion.

This lends itself to the HTTP model, where one pipe reads headers into a buffer and passes that in; then whatever logics handle the headers ask the pump to read the body from source to a given sink (cache object, adaptation buffer, or the client's TCP socket, for three likely examples).
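A minimal sketch of that slave-pump idea (all names and types here are hypothetical, not existing Squid code; real sources and sinks would be sockets, cache objects, or adaptation buffers, represented below by byte vectors for illustration):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: a one-way slave pump told its source and sink,
// then left to run to completion. Purely stateless apart from size
// accounting, per the description above.
struct DataPump {
    const std::vector<char> *source; // where bytes come from
    std::vector<char> *sink;         // where bytes go
    size_t moved;                    // size accounting only

    DataPump(const std::vector<char> *src, std::vector<char> *dst)
        : source(src), sink(dst), moved(0) {}

    // Move up to 'limit' bytes from source to sink. A governor feature
    // (delay pools) could shrink 'limit' to slow the pump down.
    size_t pump(size_t limit) {
        const size_t avail = source->size() - moved;
        const size_t n = avail < limit ? avail : limit;
        sink->insert(sink->end(),
                     source->begin() + moved,
                     source->begin() + moved + n);
        moved += n;
        return n;
    }

    bool done() const { return moved == source->size(); }
};
```

Note the pump carries no protocol knowledge at all; direction is fixed only by which end was named source and which sink at construction.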


Does Data Pump/Pipe store/buffer the bytes to give the other end a
chance to get ready for consumption?

I hope not; its sinks would ideally go straight into some type of socket, but buffers need to be accommodated.


As for many pumps per transaction: Ideally 1 (zero-copy), realistically 2
(client-side, and server-side).

I do not understand how a transaction can have one pump or even one pipe
(unless the pipe is bi-directional). Is Data Pump a bidirectional pipe
that can shovel bytes in both directions?

Purely stateless. Uni-directional, but agnostic as to which direction that is.

The core function of the pump would be to handle non-adapted tunnel traffic or request bodies, which may involve very large numbers of bytes moving from socket A to socket B with nothing but size accounting or speed delays in between.

Most modules really should be acting on a buffer pre-filled by a pump somewhere, and passed without copying (excepting the adapters, of course) as part of the request state.


I apologize for so many questions, but the picture does not really
define these things, and without knowing what the blobs are, how can one
evaluate the Architecture or one's compliance with it?

No worries. That's why it's still only proposed.


Content-adaptation may need more to pump
bytes out to the ICAP helper and back etc.

If adaptation components can use Data Pump as an I/O abstraction, should
not all other high-level components processing the transaction do the
same so that high-level I/O code could be reused among all the
components?
Yes. The exception being quick forwarding logic, which may handle accept()
before bootstrapping the connection into a protocol manager or a 'tunnel' pump.

The NOP equivalence mentioned above confuses me. Do you mean that the
pump does not copy data if it does not have to?
Yes. As close to zero-copy as reasonably possible.

2) Client Facing IO is unwound from all processing logics. It's simply a
raw input layer to accept connections and interface to the clients.
The "external" side of the Client Facing IO blob is socket API and such,
right? What is the Squid-side interface of the Client Facing IO blob? A
collection of portable socket-level routines? Some kind of a Transaction
object?
Something. I'm not going into implementation details. I'm thinking the TCP
listening sockets themselves.

I am not asking about implementation details. I am asking about
high-level interfaces of the blobs on the picture. Without that
knowledge, it is difficult to understand how the blobs are connected and
what they send to each other.

Okay. The two yellow bars for IO are what's left of the comm layer (and SSL layer) after it's been slimmed down to simply handle the sockets and set up initial state objects on accept(). Everything else, from byte reads to byte writes, lies between them in one place or another (read/write as part of the pump).


Is limiting the number of accepted connections a "processing logic" or
"Client Facing IO" logic?
Limiting accepted connections? Why would we want to do that?

Because we are running out of resources and do not want to accept more
responsibility until we deal with what we already have? But this is not
critical at this point, there are much bigger questions so let's ignore
this one.

Understood. For that it would be part of the client-facing IO. Or possibly the queue (or, later, thread) processing priority code.


delay_pools move to a governor feature slowing the data pump. ACLs stay
as forwarding-logic assists on an if-needed basis.


3) Server Facing IO is likewise unwound from processing logics AND from
client IO logics. Though in reality it may share lowest level socket
code.

4) Processing components are distinct. Each is fully optional and causes
no run-time delay if not enabled.
What decides which processing components are enabled for a given
transaction? Do processing components interact with each other or a
central "authority"? What is their input and output? Can you give a few
examples of components?
Forwarding Logic. Or possibly an ACL/helper flow manager. How it's coded
defines what's done. Presently there is quite a chain of processing.

We talked of a registry object, which would be given by squid.conf a list
and order for ACLs, redirectors, etc. That would make the detailed state
fiddling sit behind a single manager API.

Many processing decisions are not static so I doubt a registry object
driven by squid.conf can handle this. In fact, I suspect no single
object can handle this complexity so the responsibility to enable
processing components would have to be spread around processing
components (and forwarder), which makes things like a "single pipe with
bumps" design difficult to implement.

I think one of us misunderstands. Adrian explained that the current flow of security processing in Squid was something like:
 cachable -> http_access ACLs -> FXFF ACLs -> http_access ACLs -> blah blah

Currently it is a fixed order of operations, most of which can be turned off in squid.conf, but the code runs through it all anyway; on a quick path, but still through it. The aim here was to reduce those mega-blocks down to a minimal sequence of calls as determined by the user, giving them the added benefit of knowing (and, if they liked, changing) the exact order of processing. i.e., is url_rewrite done before the http_access check or after caching? Is caching done before auth? Is icp_access done before or after cache_peer_access?
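A minimal sketch of how that squid.conf-driven ordering might look (the registry class, step names, and the string-based request state below are all hypothetical, chosen only to show the user-defined sequence and short-circuit-on-deny behaviour):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch: squid.conf supplies the order of processing
// steps; the manager runs only the steps the user configured, in
// exactly that order, hiding the detailed state fiddling behind one
// API.
struct ProcessingRegistry {
    using Step = std::function<bool(std::string &requestState)>;

    std::vector<std::pair<std::string, Step>> steps; // user-defined order

    void add(const std::string &name, Step s) {
        steps.push_back(std::make_pair(name, s));
    }

    // Run steps in configured order; a step returning false denies
    // the request and short-circuits the rest of the chain.
    bool run(std::string &requestState) {
        for (auto &s : steps)
            if (!s.second(requestState))
                return false;
        return true;
    }
};
```

Whether url_rewrite comes before http_access, or caching before auth, then becomes purely a matter of the order in which steps were registered from the configuration.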


The text on the picture seems to imply that there can be only one
Processing Component active for a given transaction, which worries me,
but perhaps I just do not understand what kind of Components you are
describing here.
The finer details may run in parallel within a module. But the high level
processing sequence for any single request needs to be linear (or at least
representable in a linear fashion) to be understandable.

I am not sure I agree, but let's wait until there is a processing
sequence on the picture.

5) Stores are an optional extra, if the configuration calls for caching.
But not needed for basic operations.
Is there a single global index of stored responses? If yes, is it
enabled only when caching is enabled?
That would be an implementation detail inside the Store module at the top
left of the picture.

IMO there should be a global API for storage. Whether that API loops
through a single index or a set of per-Cache ones is a detail choice.

Sure, but it is important to decide whether the global index (or
equivalent store interface) exists when caching is disabled. Currently,
you get an index whether Squid (or a given caching scheme) needs it or
not. Do you propose that there is no such index?

With this architecture you could disable caching entirely, to the point of it not being compiled in. It's irrelevant outside the store module. All the other modules need to see is a buffer of data or its absence.
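A minimal sketch of that "buffer of data or its absence" view of the store (the interface and both classes are hypothetical, not existing Squid code; whether a real store keeps one index or many stays hidden behind the API, per the discussion above):

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical sketch: outside the store module, callers only ever ask
// for a buffer of data or learn of its absence.
class Store {
public:
    virtual ~Store() {}
    // Returns true and fills 'buf' on a hit; false means absence.
    virtual bool lookup(const std::string &url, std::string &buf) const = 0;
};

// What "caching disabled" looks like to the rest of Squid: every
// lookup reports absence, and no index exists anywhere.
class NullStore : public Store {
public:
    bool lookup(const std::string &, std::string &) const override {
        return false;
    }
};

// A caching store; its index is an implementation detail invisible
// to the other modules.
class MemStore : public Store {
    std::map<std::string, std::string> index;
public:
    void insert(const std::string &url, const std::string &body) {
        index[url] = body;
    }
    bool lookup(const std::string &url, std::string &buf) const override {
        std::map<std::string, std::string>::const_iterator it = index.find(url);
        if (it == index.end())
            return false;
        buf = it->second;
        return true;
    }
};
```

Swapping NullStore for MemStore changes nothing for the callers, which is the point: caching becomes optional all the way down to not being compiled in.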


Do you consider request merging a form of caching?
I consider it a flow design issue. If forwarding wants to take a request
and point it at an already-filled buffer (from store, from a live stream,
or from /dev/zero) that's its business.

How will it find a "filled buffer" to merge with if there is no index?

I think we are clashing. From what I've been told, the current design of squid depends on in-transit objects being in cache storage. This does not hold true in my proposed architecture. The ForwardingLogic may have an internal hash/cache/index of in-transit URLs if it really needs to, but it won't involve the store. It's in-flow data.

(NP: this model also applies to broadcast streams if we want to go that way eventually).


If we all agree and work towards this type of model, and things are kept
modular and isolated at the highest levels, I don't see the future
integration of either squid branch or CacheBoy as being a big task.
I think we would need a more detailed or precise architecture
description to be able to "work towards it" or, more precisely, to
identify code that does not satisfy the architectural constraints.
Otherwise, everybody will be claiming to conform to the Architecture
principles but there will be no improvement as far as merging Squid2 or
external code into Squid3.
Agreed. That detailing is what we are starting now.
The two orange clouds need to be fleshed out into named components, then
on to slightly finer details.

All current blobs need better description/definition, IMO. A few more
blobs may need to be added. The next step would be to define the flows
(i.e., which blob talks to which and what they send to each other).

Yes.


BTW, the text descriptions you gave above appear much more useful than
the picture itself. Perhaps we can define the main objects and flows
better and then redraw the picture to match the descriptions? Should
this go into a wiki?

If you can't find any flaws with that highest-level flow design, we can
wiki the progress so far and start iterating down to API definitions and
TODO lists.

It is too early to find flaws. I do not understand the current picture
yet. What I am saying is that it may be easier to ignore the picture for
now, define a few blobs, and then try to draw it again.

It would also be nice to agree on how distant is the future that the
picture should reflect. Are we drawing Squid 3.3? Squid 10? The
Architecture picture would be quite different for those two examples...

Really? A good architecture (which is what I am aiming for here) would look the same for both, with possibly different names or larger numbers of blobs, and maybe finer detail the older it gets.

There's firstly a lot of work to get to anything like this end-product. Though we could achieve it by 3.2 if we all agreed and set out to do just that.

Afterwards, a vastly larger array of possible 'pluggable' bits can be integrated, as individual implementations of the blobs.

Amos
--
Please use Squid 2.7.STABLE4 or 3.0.STABLE8
