I am looking for input on Sedna configuration and deployment.  

 

I apologize for bringing up the hosting issue again, and for the length of
the post.  I have updated details on the problem space, and wonder if anybody
has input on an ideal solution.

 

Problem space:

* 2000 individual companies.

* Each company contains 500-2000 MB spread over 60 files that are inserted
on initialization of the company.

* The files are simple text XML files. The largest XML files can reach about
250 MB.

* No more than 200-400 users are using the system at any time.

* Inserts: about 1-5 MB per XML file per day, per company.

* Updates: about 1-5 MB per XML file per day, per company.
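
As a rough sanity check on the figures above, here is a back-of-the-envelope sketch in Python. The function names and the active_files parameter are mine, for illustration only. The daily figure assumes every one of the 60 files in every company receives inserts each day, which is probably an overestimate; lower active_files to model a quieter system.

```python
# Back-of-the-envelope sizing from the problem-space figures.
# Assumption: 1 TB = 1,000,000 MB and 1 GB = 1,000 MB (decimal units).
COMPANIES = 2000
MB_PER_COMPANY = (500, 2000)   # initial load per company, low and high
FILES_PER_COMPANY = 60
DAILY_MB_PER_FILE = (1, 5)     # inserts per XML file per day, low and high


def total_initial_tb(companies=COMPANIES, mb_range=MB_PER_COMPANY):
    """Return the (low, high) initial data volume in TB."""
    return tuple(companies * mb / 1_000_000 for mb in mb_range)


def daily_insert_gb(companies=COMPANIES,
                    active_files=FILES_PER_COMPANY,
                    mb_range=DAILY_MB_PER_FILE):
    """Return the (low, high) daily insert volume in GB.

    active_files is a hypothetical knob: the number of files per
    company that actually receive inserts on a given day.
    """
    return tuple(companies * active_files * mb / 1000 for mb in mb_range)


if __name__ == "__main__":
    print("initial volume (TB):", total_initial_tb())
    print("daily inserts, all files active (GB):", daily_insert_gb())
    print("daily inserts, 5 files active (GB):", daily_insert_gb(active_files=5))
```

With these inputs the initial load comes out between 1 and 4 TB, depending on how many companies sit at the 2000 MB end of the range.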

 

Options (that I can think of):

1.  Single system: 

2000 catalogs == 2000 individual companies.

 

Pros: Easy to manage.

 

Cons: 

Can the Sedna data file become fragmented?  eXist-db has data file issues
with the catalog approach.

Are there any boundary conditions surrounding a 1-2 TB data file and 2000
catalogs?

 

2.  Multiple databases: 

2000 databases == 2000 individual companies.

 

Pros: 

* Don't have to worry about fragmentation.

* If a company drops off, just delete the individual database.

* Can tell clients that they have their own database, no sharing.

 

Cons: 

* More difficult to manage than a single system.

* Had to modify and recompile Sedna

* Memory limitations.  100 MB of RAM per company, 50 companies running = 5+
GB.  

(Can't easily meet 200-400 users without modifications and a bigger box.)

* Memory fragments over time, though this is easily resolved by rebooting
the server.
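
The memory concern above can be put into numbers with a small sketch. It takes the 100 MB per running database from my own measurement and assumes the worst case where every concurrent user belongs to a different company, so 200-400 users means 200-400 databases running at once. The function name is just for illustration.

```python
# Rough RAM estimate for the one-database-per-company option.
# Assumption: each running database costs about 100 MB, and in the
# worst case each concurrent user works in a different company.
MB_PER_RUNNING_DB = 100


def ram_needed_gb(running_databases, mb_per_db=MB_PER_RUNNING_DB):
    """Return the RAM needed in GB (1 GB = 1,000 MB) for the given
    number of simultaneously running databases."""
    return running_databases * mb_per_db / 1000


if __name__ == "__main__":
    for running in (50, 200, 400):
        print(running, "databases ->", ram_needed_gb(running), "GB")
```

If several users share a company the real figure would be lower, so this is an upper bound rather than a prediction.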

 

3.  Amazon micro instances: 

2000 micro ec2 instances == 2000 individual companies.

Roughly 500 MB of usable memory per-instance.

 

Pros: 

* Only have the instance running when someone is using the system.

* Turn the instance on and off for the specific company.

* If a company drops off, just delete the individual instance.

 

Cons: 

* Very difficult to manage. 

* Amazon has account limits on instances per-region.

FYI: The Amazon micro instance is not a good approach if the client is
accessing the database directly.  Amazon micro instances have very low
bandwidth.  Micro instances work if there is a proxy between the client and
database, and the proxy is in the same region as the micro instance.

 

4.  Hybrid:

* A hybrid approach that combines several techniques.  Possibly 10 databases
with 200 catalogs each.  
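
One way the hybrid layout could be organized is a fixed mapping from a company number to a database and catalog. The sketch below assumes companies are numbered sequentially from 0 and that 200 go into each database; the naming scheme (companies_NN, company_NNNN) is hypothetical and not anything Sedna prescribes.

```python
# Sketch of a company -> (database, catalog) mapping for the hybrid option.
# Assumptions: companies are numbered 0..1999, 200 catalogs per database,
# and the names below are made up for illustration.
CATALOGS_PER_DATABASE = 200


def locate_company(company_no, per_db=CATALOGS_PER_DATABASE):
    """Return (database_name, catalog_name) for a company number."""
    if company_no < 0:
        raise ValueError("company_no must be non-negative")
    db_index = company_no // per_db          # 0..9 for 2000 companies
    return ("companies_%02d" % db_index, "company_%04d" % company_no)


if __name__ == "__main__":
    for n in (0, 199, 200, 1999):
        print(n, "->", locate_company(n))
```

A fixed block of 200 per database keeps the lookup trivial, but a company that drops off leaves a gap; a lookup table would be needed if the databases should stay evenly filled.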

 

 

Any input is appreciated.

Thanks,

Malcolm

 

 

_______________________________________________
Sedna-discussion mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/sedna-discussion
