On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote:
> I'm new to SQLite . not a programmer . not a DBA . just an end-user with no
> dev support for a pilot project (single user, no updates, just queries).
> 
>  
> 
> I want to analyze the data contained in a 44GB csv file with 44M rows x 600
> columns (fields all <15 char). Seems like a DBMS will allow me to query it
> in a variety of ways to analyze the data. 

Yes, SQLite is quite capable of doing simple analyzis of such amounts of data,
especially selecting small subsets based on a simple criteria. However before 
trying to do some real work you have to understand the structure of your data,
realize your possible queries and carefully design database schema (tables 
and, equally important, indises). Perhaps, putting all data in a single 
600-column table is not a good idea (though allowed technically), especially
if your columns are equal by their physical nature: it is not so easy to
select arbitrarily calculated columns, only rows.

> I have the data files and SQLite on my laptop: a 64-bit Win7 Intel dual-proc
> with 4GB RAM + 200GB free disk space.

Well-indexed database of small data pieces usually takes up several times 
more disk space than the raw data. Probably 200GB would not be enough, 
dependng mostly on the number of indises. Consider dedicating a separate
disk (or even RAID array) for it.

> End-user tools like Excel & Access failed due to lack of memory. I
> downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager add-on
> but it would not load the csv files - 'csv worker failed'. So I tried
> Database Master from Nucleon but it failed after loading (it took 100
> minutes) ~57,000 rows with error message = 'database or disk is full". I
> tried to create another table in the same db but could not with same error
> message. The DB size shows as 10,000KB (that looks suspiciously like a size
> setting?).

Try bare sqlite shell instead of those external tools. It should take 
at least several hours to fill up your database.

> From what I've read SQLite can handle this size DB.

Surely it can. In one of my projects the database takes up almost 3 terabytes 
of disk space, contains more than 10^10 records and still provides small 
selects of indexed data in real time.

> 1.       Is SQLite the wrong tool for this project? (I don't want the
> overkill and admin overhead of a large MySQL or SQL Server, etc.)

It depends on the data structure, semantics and what you are going to find 
there. SQLite isn't very good for calculation of complex aggregate functions,
but works fine in simple selecting and sorting.

> 2.       If SQLite will work, are there configuration settings in SQLite or
> Win7 that will permit the load . or is there a better tool for this project?

Increasing PAGE_SIZE to match the filesystem block (cluster) size and 
perhaps CACHE_SIZE to fill most of the available RAM would help a bit. 
Also, don't forget to turn off journaling and wrap all in a single 
transaction when creating database for the first time.

Valentin Davydov.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to