On Tue, 2003-07-29 at 10:05, lou wrote:
> > You're still storing it on a disk, so the reliability is the same, store
> > your db on a raid5, or store flat files on a raid5.... its the same
> > thing, raid5 loses more than one disk at a time and you're hosed either
> > way, so the only way to be sure is to have archived backups, or hot
> > systems (replicated db's).
> 
> In order to understand why you have to know what are the probabilities of
> failure and etc.
> Talking in general having everything in one database (excluding the case
> of this very database being replicated) you've got yourself a single point
> failure, since with FS is bit different (I'm underling _FS_ rather than
> _disk_, because the database itself is dependable on both of them
> excluding the cases when you run database in the RAM disk or NAS),
> granulating the FS in different partitions gives you more redundancy and
> in some cases more security (not only in point of view of being
> consistent, but in general security issues ACLs and so).
> 

That depends on the database.  MySQL using InnoDB can split the database
into files of a set size located on many different filesystems.  Most
other databases have a similar ability.

> What's wrong with mirroring RAID, having RAID 1 +0 and IMHO this will
> solve your problems.
> there are tons of applications, distributed filesystems and so which you
> can use to provide consistency and redundancy as well as one major thing
> _security_.
> 

I don't have any problems, I have a RAID5 that works just fine.

You seem to be hinting that running a database is inherently insecure,
which I'd consider to be false.  It's no less secure than a unix
machine.  A stock unix install that isn't kept up to date and set up
properly can lead to a root exploit, bypassing all fs ACLs...
So, if you set up a db to be as protected as your unix machine, the
security implications are the same.

> ACL in database suck, you can put permissions on table, database, can you
> put permissions per every record? I guess not in the every day database.
> 

Per-record permissions aren't necessary in a mail application, since the
programs accessing the database will need access to all of the records
eventually anyhow.  What they may not need is access to every column in
a table, and MySQL does have column-level ACLs as well as table- and
database-level ones.

> > As for the access time, that's true, in part, but the second part of
> > your statement is false.  file system mta's with large numbers of users
> > will suffer tremendously compared to a db.  A db is an indexed,
> > organized storage, flat files are sequential only.  So when you have
> > large mailboxes, a database will win over flat file for access time
> > since it doesn't have to seek the whole file to search a message in the
> > middle. 
> 
> Btrees, hash tables, used in FS as well.
Yes, for locating files.  Filesystems don't index or maintain metadata
on the contents of a file.
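To make the difference concrete, here is a rough Python sketch, assuming a
simplified mbox-style spool in which every message starts with a "From "
line (real mbox files also need "From " escaping in bodies, which this
ignores).  The function names are made up for illustration and are not taken
from any MTA or from dbmail.  The first function does what a flat file
forces on you; the other two do the kind of content-level bookkeeping a db
does for you:

```python
import io


def scan_for_message(spool, n):
    """Sequential access: read from the start until message n (0-based)."""
    spool.seek(0)
    count = -1
    lines = []
    for line in spool:
        if line.startswith(b"From "):
            if count == n:
                break  # reached the start of the next message
            count += 1
        if count == n:
            lines.append(line)
    return b"".join(lines)


def build_offset_index(spool):
    """One pass over the spool, recording the byte offset of each message."""
    spool.seek(0)
    offsets = []
    pos = 0
    for line in spool:
        if line.startswith(b"From "):
            offsets.append(pos)
        pos += len(line)
    offsets.append(pos)  # end-of-spool sentinel
    return offsets


def fetch_by_index(spool, offsets, n):
    """Indexed access: seek straight to message n and read only its bytes."""
    spool.seek(offsets[n])
    return spool.read(offsets[n + 1] - offsets[n])


# Hypothetical three-message spool, held in memory for the example.
spool = io.BytesIO(
    b"From a@example.org Tue\nSubject: one\n\nbody1\n"
    b"From b@example.org Tue\nSubject: two\n\nbody2\n"
    b"From c@example.org Tue\nSubject: three\n\nbody3\n"
)
offsets = build_offset_index(spool)
middle = fetch_by_index(spool, offsets, 1)
```

Both paths return the same bytes, but the scan has to read everything in
front of the message, while the indexed fetch reads only the message
itself.  The index still has to be built and kept current by something,
which is the work the filesystem does not do for file contents.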

> 
> > Of course, if you're only running pop3 and your clients are
> > behaving appropriately (downloading all messages every time) a flat file
> > might show a performance benefit (since it is a sequential read and all
> > the other indexing isn't needed).  But some clients will skip messages
> > they've already downloaded, requiring an expensive seek without an index
> > as to where to look, and pop3 is becoming less popular as imap (as a
> > direct protocol, or the backend of a webmail system) becomes more
> > popular, which don't regularly do a sequential read.
> 
> The main strength of the database I'd say is in the cache the way it's
> handled and write ahead logs and so (that's where metadata comes in). Look
> at reiserfs it uses Btrees also it has hash table, not to mention the
> metadata. Now look at reiserfs 4 :-). even the metadata and data
> journaling gives you more secure storage than database.
> XFS, EXT3, REISERFS, JFS, NTFS all these are journals filesystems.
> 
If that's a main strength of a database, then it's the main strength of a
fs as well, since almost all of them contain what you just mentioned.  As
you say below, databases and filesystems share the same philosophy.  And
I don't see how an fs is any more secure than a db...

> One thing I know for sure that most of the features are derived from
> database designs.
> 
> Let's be honest, what do you prefer, being consistent, secure and
> redundant than speed?!
> I'd not think so. Hardware is cheap, horse power is cheap.

I prefer both and both are achievable at the same time.

> Making a cluster of Distributed Filesystems is much more feasible than
> doing it with Database, for 1 most of the database use the Lazy
> Replication approach rather than Eager, and all the updates are done
> _after_ the transaction was done, in Eager you have synchronous
> replication where the transactions are spread across the whole cluster,
> not to mention Eager partial replication, where different bits can reside
> on different servers, but:
> 

As you're so eager to point out, db's in the common case reside on top
of a fs, which could be distributed....  So maybe the solution is a
combination of both.  My point is that in mail access, the indexing and
caching of a fs is insufficient.  An fs at best sorts on the mail
"folder" level (which in the case of pop3 is pointless because you have
1 folder).  A db indexes on the message level, which allows a large
increase in speed.  Even if you have a hierarchy of account->folder->one
message per file, the filesystem still does not provide any metadata on
the contents of the message file other than a timestamp.
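As a rough illustration of message-level indexing, here is a Python sketch
using the sqlite3 module as a stand-in for MySQL.  The table layout, the
index name and the sample data are invented for the example and are not
dbmail's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical message table: one row per message, not one file per folder.
conn.execute(
    """CREATE TABLE messages (
           id      INTEGER PRIMARY KEY,
           account TEXT,
           folder  TEXT,
           subject TEXT,
           body    TEXT)"""
)

# Index on message-level metadata, which a filesystem does not keep.
conn.execute(
    "CREATE INDEX idx_msg_lookup ON messages (account, folder, subject)"
)

conn.executemany(
    "INSERT INTO messages (account, folder, subject, body) VALUES (?, ?, ?, ?)",
    [("alice", "INBOX", f"msg {i}", "...") for i in range(1000)],
)

query = (
    "SELECT id FROM messages "
    "WHERE account = ? AND folder = ? AND subject = ?"
)
params = ("alice", "INBOX", "msg 500")

# Ask the engine how it would run the lookup, then run it.
plan = conn.execute("EXPLAIN QUERY PLAN " + query, params).fetchall()
rows = conn.execute(query, params).fetchall()

print(plan[0][-1])  # human-readable plan step
print(rows)
```

The plan step names idx_msg_lookup, meaning the engine goes straight to the
matching row instead of walking all 1000 messages.  With one folder per
file, or even one message per file, the same lookup by subject means
opening and reading message contents.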


> 1) It gives you more granularity
> 2) More consistency in sense of data writes
> 3) Creates a whole variety of options to add/remove server depending on the 
> load.
> 4) ... and so on.
> 

Addressed above: running your db over a dfs would reap all the benefits
described above, instead of just 1-4.

> > In addition, file access becomes increasingly slow when you have
> > thousands of files in a directory, or thousands of directories in a
> > directory with hundreds of files in those subdirectories...  Flat files
> > do not win in any scenario other than ease of installation on extremely
> > small setups.
> 
> Actually files aint a bad idea, just not accepting it in this case looks
> like a biased opinion. Look at Plan9 it have concepts of which other OS
> can dream of ;-)
> 

It comes from experience with both systems: mbox with sendmail, and
dbmail with MySQL/InnoDB.  As the spool size increased, performance
decreased.  The same happens with dbmail and MySQL; however, the dropoff
is much less steep with dbmail than with mbox.  Now of course, there are
some solutions to relieve this, such as Maildir, etc.  However, they
still don't address the performance benefit gained from indexing
message-level metadata, which an fs doesn't do.

> Also how many files to add in a folder is a user choice .
> since the database can easily start coughing when having too much indices
> and blah blah..
> 

The users' choice drives the system.  Unless you are in a corporate
environment where you are in a position to dictate absolute system
policy, the users dictate to you.  If you don't provide the services
they want, they go elsewhere.



