On Sat, Apr 23, 2016 at 12:50 PM, Keith Medcalf <kmedcalf at dessus.com> wrote:

>
> When a file is a bag-o-bytes rather than a collection-of-records (fixed
> length records at that), there is no longer any need to keep the records
> the same length, hence no need to specify the "reserved length" of things
> (contrary to common mis-belief, varchar(100) does not mean a "variable
> length character field with a length up to 100 characters", it means "a
> character field with a length of 100 into which a smaller amount of data
> may be stored that will be padded to 100 characters with special
> discarded-upon-retrieval padding characters") -- that is, the difference
> between CHAR(100) and VARCHAR(100) is that in the former case data less
> than 100 characters is padded with spaces, and in the latter case it is
> padded with nulls.  The column is still, however, 100 characters wide.


That's not necessarily true in the case of VARCHAR. It could be done that
way, but it could just as easily, and probably far more efficiently, be
done in a truly variable-length fashion. Microsoft SQL Server (as far as I
understand, being the only other SQL engine I've spent much time with)
stores a VARCHAR column as the actual number of bytes in the value plus
two bytes of overhead. So a 100-byte string stored in a VARCHAR(10000)
column needs only 102 bytes of storage, not 10000 or 10002.

My source (since I haven't used MSSQL in a few years and needed to refresh
my memory) is https://msdn.microsoft.com/en-us/library/ms176089.aspx.
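
To make that concrete, here's a rough sketch in Python (my own
illustration, not any engine's actual on-disk format) of the two
strategies: pad out to the declared width versus write a two-byte length
prefix followed by only the bytes actually present:

    import struct

    def store_char(value: bytes, width: int) -> bytes:
        # Fixed-width storage: always `width` bytes, shorter values padded.
        if len(value) > width:
            raise ValueError("value longer than declared width")
        return value.ljust(width, b" ")

    def store_varchar(value: bytes) -> bytes:
        # Variable-length storage: a 2-byte length prefix plus the data itself.
        return struct.pack("<H", len(value)) + value

    data = b"hello"
    print(len(store_char(data, 100)))   # 100 -- padded to the declared width
    print(len(store_varchar(data)))     # 7 -- 5 bytes of data + 2 of overhead

The second shape is all the "actual bytes plus two bytes of overhead"
description amounts to.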

For any SQL datastore, the way the data is stored is entirely an
implementation detail. The SQL engine would be free to serialize every
value to text and store it in 4K pages if it wanted to, then deserialize
it on the way back. I certainly don't know of any that do that, but the
impetus for the creation of VARCHAR fields (I imagine) was specifically to
avoid storing padding for data that doesn't require it.
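
In SQLite, for that matter, the declared size is purely advisory:
VARCHAR(10) just gives the column TEXT affinity and the 10 is ignored, so
values are stored unpadded and untruncated whatever their length. A quick
check of my own, via Python's sqlite3 module:

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE t (name VARCHAR(10))")
    # Much longer than the declared 10 characters -- SQLite stores it anyway,
    # with no padding and no truncation, since the length is not enforced.
    con.execute("INSERT INTO t VALUES (?)",
                ("a value much longer than ten characters",))
    # Prints the full length of the string that was inserted.
    print(con.execute("SELECT length(name) FROM t").fetchone()[0])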

-- 
Scott Robison
