Dear Rashad,

     I have been developing a survey system and cross-tab engine. I was 
experimenting with my random data generator, which is part of the suite: 
someone on Hacker News mentioned an app that could handle thousands of 
records, and someone asked whether it could handle millions. So I thought, 
let's check what kind of speeds my software can manage. I had the random 
data generator produce 1 million data files, then another part of the 
software wrote the files out into a single flat file, and I ran my 
experiments.
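
     For reference, the whole setup boils down to a toy loop like the one 
below. This is not my actual generator, just a minimal sketch: the directory 
and file names are made up, and it assumes ./attempt2 already exists.

     // Toy reproduction of the setup (not the real generator):
     // one small file per record, all in a single directory.
     #include <cstdio>

     int main() {
         char name[64];
         for (int i = 0; i < 1000000; ++i) {
             std::snprintf(name, sizeof(name), "attempt2/rec_%07d.dat", i);
             std::FILE* f = std::fopen(name, "w");
             if (!f) { std::perror(name); return 1; }
             std::fprintf(f, "id=%d,some,random,fields\n", i);
             std::fclose(f);
         }
         return 0;
     }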

     Let me tell you that after I generated the 1 million files, the system 
got very slow. Every time I cd'd into the directory containing them, the 
next command took minutes to execute. I suspect this is because the OS has 
to read all the directory entries, and the directory entry itself is about 
40 MB (the listing below was taken one level above it):

     drwxrwxr-x 5 nxd nxd  40255488 Oct  6 11:39 attempt2


  Above is the directory that contained the 1 million files. My laptop is an 
i5 with 8 GB of RAM. So I am not sure your idea of flat files is a good one. 
Another reason I think it got slow is that the buffer cache filled up with 
data I would never need again. Also, zsh could no longer do command 
completion, again because the command line size limit would be exceeded. I 
think you would just move the bottleneck from database search to directory 
entry search. Directory entries also have none of the regular-expression 
matching and indexing that make a database fast: every single entry may have 
to be visited before you can decide that a given file is the one you want.
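
     You can watch that linear scan happen with plain readdir(). The sketch 
below (POSIX; the file name is hypothetical, matching the sketch above) 
counts how many entries get visited before one particular name turns up:

     // Rough sketch of what a name lookup costs without an index:
     // walk the directory entries until the wanted one turns up.
     #include <dirent.h>
     #include <cstdio>
     #include <cstring>

     int main() {
         DIR* d = opendir("attempt2");
         if (!d) { std::perror("attempt2"); return 1; }
         long visited = 0;
         struct dirent* e;
         while ((e = readdir(d)) != nullptr) {
             ++visited;
             if (std::strcmp(e->d_name, "rec_0999999.dat") == 0) break;
         }
         std::printf("visited %ld entries before a hit\n", visited);
         closedir(d);
         return 0;
     }

In the worst case that loop runs a million times for a single lookup, which 
is exactly the work a database index exists to avoid.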


     I think you should try out a few experiments yourself before you rush 
into something like this. My software is open source, so if you want to 
create a few million files, the random data generator can be used and I can 
show you how to use it.
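
     As a first experiment, timing a single fopen() inside the big directory 
already tells you a lot (the path is hypothetical, matching the sketches 
above):

     // Time one open() inside the million-file directory.
     #include <chrono>
     #include <cstdio>

     int main() {
         auto t0 = std::chrono::steady_clock::now();
         std::FILE* f = std::fopen("attempt2/rec_0999999.dat", "r");
         auto t1 = std::chrono::steady_clock::now();
         bool hit = (f != nullptr);
         if (f) std::fclose(f);
         long long us = std::chrono::duration_cast<
             std::chrono::microseconds>(t1 - t0).count();
         std::printf("fopen took %lld us (%s)\n", us, hit ? "hit" : "miss");
         return 0;
     }

Run it once cold and once warm, and compare against the same-sized file in a 
near-empty directory; the difference is the directory-scan cost.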


Regards,
Neil





>________________________________
> From: Mohammed Rashad <[email protected]>
>To: [email protected] 
>Sent: Sunday, October 7, 2012 7:14 PM
>Subject: [Wt-interest] Wt File IO vs Database IO
> 
>
>All,
>
>
>Due to the large amount of data in a crowd-sourced mapping project, I have 
>decided to completely eliminate the use of a database and use flat files 
>for storage, retrieval, and query.
>
>
>I thought of storing each db record as an individual file, so this will 
>help retrieval speed, and no search over the entire db or a single file is 
>needed.
>
>
>But if a table has more than tens of thousands of records, users accessing 
>(the same or different) records from different places will result in N file 
>I/O operations.
>
>
>Will this be a bottleneck in the application? Consider each file to be of 
>size <= 15 KB.
>
>
>The main reason to eliminate the db is a performance bottleneck in database 
>I/O.
>
>
>So will moving to the new model help in any way, given that the number of 
>users and the amount of data will be much more than expected?
>
>
>
>-- 
>
>Regards,
>   Rashad
>
