I have scaled down the attributes of interest to 46 columns (discarding the other 550). No columns are calculated. No updates to this file ... one user ... only query, sort, etc. type transactions. So I want to load two 22GB csv files into an empty 46 column table. (I intend to test load with 999 records by 46 col file.) initially I only have 1 index on a record # ... am not positive several other fields that I want to index may not be missing data in some records (I assume that will error out if I make those an index). After I get the data loaded and inspect for nulls in prospective index attributes, can I add indices?
I was planning to load using sqlite3 CLI ".import" command. Is there a way I can monitor the progress of the load, with only minimal impact on performance ? I've started several loads only to find out hours later that nothing has been loaded. Thanks for helpful info. peter -----Original Message----- From: sqlite-users-boun...@sqlite.org [mailto:sqlite-users-boun...@sqlite.org] On Behalf Of Valentin Davydov Sent: Friday, May 04, 2012 9:43 AM To: General Discussion of SQLite Database Subject: Re: [sqlite] is SQLite the right tool to analyze a 44GB file On Tue, May 01, 2012 at 04:06:01PM -0400, peter korinis wrote: > I'm new to SQLite . not a programmer . not a DBA . just an end-user > with no dev support for a pilot project (single user, no updates, just queries). > > > > I want to analyze the data contained in a 44GB csv file with 44M rows > x 600 columns (fields all <15 char). Seems like a DBMS will allow me > to query it in a variety of ways to analyze the data. Yes, SQLite is quite capable of doing simple analyzis of such amounts of data, especially selecting small subsets based on a simple criteria. However before trying to do some real work you have to understand the structure of your data, realize your possible queries and carefully design database schema (tables and, equally important, indises). Perhaps, putting all data in a single 600-column table is not a good idea (though allowed technically), especially if your columns are equal by their physical nature: it is not so easy to select arbitrarily calculated columns, only rows. > I have the data files and SQLite on my laptop: a 64-bit Win7 Intel > dual-proc with 4GB RAM + 200GB free disk space. Well-indexed database of small data pieces usually takes up several times more disk space than the raw data. Probably 200GB would not be enough, dependng mostly on the number of indises. Consider dedicating a separate disk (or even RAID array) for it. > End-user tools like Excel & Access failed due to lack of memory. I > downloaded SQLite ver.3.6.23.1. I tried to use Firefox Data Manager > add-on but it would not load the csv files - 'csv worker failed'. So I > tried Database Master from Nucleon but it failed after loading (it > took 100 > minutes) ~57,000 rows with error message = 'database or disk is full". > I tried to create another table in the same db but could not with same > error message. The DB size shows as 10,000KB (that looks suspiciously > like a size setting?). Try bare sqlite shell instead of those external tools. It should take at least several hours to fill up your database. > From what I've read SQLite can handle this size DB. Surely it can. In one of my projects the database takes up almost 3 terabytes of disk space, contains more than 10^10 records and still provides small selects of indexed data in real time. > 1. Is SQLite the wrong tool for this project? (I don't want the > overkill and admin overhead of a large MySQL or SQL Server, etc.) It depends on the data structure, semantics and what you are going to find there. SQLite isn't very good for calculation of complex aggregate functions, but works fine in simple selecting and sorting. > 2. If SQLite will work, are there configuration settings in SQLite or > Win7 that will permit the load . or is there a better tool for this project? Increasing PAGE_SIZE to match the filesystem block (cluster) size and perhaps CACHE_SIZE to fill most of the available RAM would help a bit. Also, don't forget to turn off journaling and wrap all in a single transaction when creating database for the first time. Valentin Davydov. _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users _______________________________________________ sqlite-users mailing list sqlite-users@sqlite.org http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users