Re: Where should I store my static content in a clustered environment

Thomas Nybro Bolding Wed, 23 Nov 2005 02:40:35 -0800

Hi Eickvonder,

I would definitely go for solution 5, which resembles an assignment we were 
given in a course in Distributed Computing.


If possible, go for active replication to distribute the load across several 
database servers. To do this you must implement a common frontend 
(FE) that communicates with the replica managers (RM). A read operation simply 
connects to the FE, which (using round-robin or similar) connects to one 
database server. If you're not worried about Byzantine failures, simply take 
the first file returned. A write operation connects to the FE, which 
updates all database servers.
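
The read and write paths described above could be sketched roughly as follows 
in Java. This is only an illustration under assumptions: the names 
ReplicaManager, InMemoryReplica and FrontEnd are hypothetical, the replicas 
are in-memory stand-ins for real database servers, and there is no handling 
of failed replicas or of the ordering of concurrent writes.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical replica manager: stands in for one database server.
interface ReplicaManager {
    byte[] read(int fileId);
    void write(int fileId, byte[] content);
}

// In-memory stand-in, used only to make the sketch runnable.
class InMemoryReplica implements ReplicaManager {
    private final Map<Integer, byte[]> files = new ConcurrentHashMap<>();

    public byte[] read(int fileId) {
        return files.get(fileId);
    }

    public void write(int fileId, byte[] content) {
        files.put(fileId, content);
    }
}

// Front end: a read goes to one replica chosen round-robin,
// a write is sent to every replica.
class FrontEnd {
    private final List<ReplicaManager> replicas;
    private final AtomicInteger next = new AtomicInteger();

    FrontEnd(List<ReplicaManager> replicas) {
        this.replicas = List.copyOf(replicas);
    }

    byte[] read(int fileId) {
        // floorMod keeps the index valid even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i).read(fileId);
    }

    void write(int fileId, byte[] content) {
        for (ReplicaManager rm : replicas) {
            rm.write(fileId, content);
        }
    }
}
```

A real FE would also need to decide what to do when a write reaches only some 
of the replicas; the sketch leaves that out.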

If you settle for this solution but something seems unclear, I 
recommend reading up on distributed computing terms (e.g. "Distributed 
Systems: Concepts and Design", G. Coulouris, J. Dollimore, T. 
Kindberg, Addison-Wesley, 4th edition, 2005, ISBN 0321263545).

As far as invalidation goes, you can basically choose between timestamp cache 
invalidation and callback cache invalidation.

Timestamp cache invalidation checks, on each read request, when the cached 
copy of the file was last updated and, if e.g. 5 minutes have passed, reads a 
new copy from the database. This is rather simple but does not ensure 
consistency. Further, if the html files really are "static" and not changed 
very often, you will probably choose long timeouts to minimize the number of 
unnecessary reads, thus prolonging the time the webservers are out of sync 
after an update has been committed. If memory permits, keep the state 
(fileId as int, time as long) in a hashmap or similar on the webservers to 
avoid having to read from disk before determining whether to fetch from 
the database.
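
A minimal sketch of that in-memory state, assuming Java on the webservers. 
The class name TimestampCache, the maxAgeMillis parameter and the injectable 
clock are my own illustration and not part of any Tomcat API; the clock is 
passed in only so the behaviour can be checked without waiting.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical per-webserver table: fileId -> time the cached copy was fetched.
class TimestampCache {
    private final Map<Integer, Long> fetchedAt = new ConcurrentHashMap<>();
    private final long maxAgeMillis;
    private final LongSupplier clock;

    TimestampCache(long maxAgeMillis, LongSupplier clock) {
        this.maxAgeMillis = maxAgeMillis;
        this.clock = clock;
    }

    // True when the file was never fetched or its copy has reached maxAgeMillis.
    boolean isStale(int fileId) {
        Long t = fetchedAt.get(fileId);
        return t == null || clock.getAsLong() - t >= maxAgeMillis;
    }

    // Call after (re)fetching the file from the database.
    void markFetched(int fileId) {
        fetchedAt.put(fileId, clock.getAsLong());
    }
}
```

In production the clock would typically be System::currentTimeMillis. A 
request handler would call isStale(fileId) first and only go to the database 
(followed by markFetched) when it returns true.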

Callback cache invalidation is better at achieving consistency and 
minimizes reads from the database. The FE/RM should know which webservers 
have requested which files and send an "invalidate" to those webservers 
when a client commits an update (thus ensuring that webservers which have read 
the file will read it again once it is requested by a client). Here too, if 
memory permits, keep the state (webserverId as int, fileId as 
int, time as long) in a hashmap or similar on the FE/RM to avoid 
having to read from disk before determining which webservers to 
invalidate.
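
The bookkeeping for callback invalidation might look something like this. It 
is a sketch under assumptions: InvalidationRegistry is a hypothetical name, 
the timestamp from the state above is left out for brevity, and actually 
delivering the "invalidate" message to each webserver is not shown.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry kept by whoever sends the invalidations.
class InvalidationRegistry {
    // fileId -> ids of the webservers that hold a cached copy
    private final Map<Integer, Set<Integer>> holders = new ConcurrentHashMap<>();

    // Record that a webserver has fetched (and therefore cached) a file.
    void recordRead(int webserverId, int fileId) {
        holders.computeIfAbsent(fileId, k -> ConcurrentHashMap.newKeySet())
               .add(webserverId);
    }

    // Called when an update is committed: returns the webservers to notify
    // and forgets them, since they must re-read the file before caching again.
    Set<Integer> onUpdate(int fileId) {
        Set<Integer> toNotify = holders.remove(fileId);
        return toNotify == null ? Set.<Integer>of() : toNotify;
    }
}
```

The caller would loop over the set returned by onUpdate and send each 
webserver its "invalidate" for that file.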



Good luck, Thomas






"Eickvonder Bjoern" <[EMAIL PROTECTED]>
23-11-2005 10:23
Please reply to "Tomcat Users List"

 
        To:     <[email protected]>
        cc: 
        Re:     Where should I store my static content in a clustered 
environment



Hi,

let me start by describing my current task, on which I would appreciate any
advice from you.
I have to construct a clustered system (with lots of webservers) that
has few dynamic pages but a lot of static ones, whereby all resources
have to be protected by security-constraints of a webapp (so letting
Apache deliver this content won't work). Moreover there should be the
possibility to upload/delete static components via a web form. My main
problem is now where should I store the static data (mainly html pages,
images, ...; but over 4 GB(!) large in total)?

So far I'm considering the following solutions:

1.) Storing the content within the webapp of each webserver. This would
include that the servers know each other as the upload/delete operations
must be propagated from one server to all the others. Moreover the
update of the dynamic parts would not be as easy any more as just
uploading a new war-file as this requires deleting the old webapp
directory (which contains the content in this case as well).

2.) Storing the content in a separate directory but still on each
webserver. This would still include that servers must know each other,
but updating the dynamic part would be easier. The downside is that I
would have to write a servlet that delivers all static content with all
the problems of mime-types, character encoding and so on which I would
have to handle myself.

3.) Storing the content in a database on a separate server. The
advantage would be that webservers only need to know their database
server and updating the webapps would be easy (just uploading new
war-files). The downside here is that I need a servlet too and I think
it's maybe not the fastest solution as all requests of all servers to
each single chunk of static content would require a connection to the
database server.

4.) As 3.) but storing data on a single separate server in the
filesystem. The advantages/disadvantages should be similar to 3.)
whereby I do not know which solution might be faster.

5.) As 3.)/4.) but additionally implementing a caching-mechanism on the
webservers. This means if a webserver gets a request for a specific page
for the first time, it connects to the database server to retrieve that
page, then stores it in its webapp directory and then lets Tomcat deliver
that page. On the second request it is just checked if that page is
already there and if so it is delivered directly. Of course I must
implement some mechanism such that the webservers get to know if their
cached data is outdated but so far this seems to me the best solution.

Anyone ever faced this kind of problem? Any kind of remark to my
possible solutions or any other possibilities you might know of are
appreciated.

Thank you in advance for your help.

Bjoern

 






