Dear Rashad,
I have been developing a survey system and cross-tab engine, and I was
experimenting with the random data generator that is part of the suite.
Someone on Hacker News mentioned an app that could handle thousands of
records, and someone else asked whether it could handle millions. So I
thought I would check what kind of speeds my software can manage: I had the
random data generator produce 1 million data files, then another part of the
software wrote them out into a single flat file, and I ran my experiments.
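If you want to reproduce the situation without my generator, a crude sketch
like the one below is enough. This is not my tool, just an illustration; the
directory, file names and payload are made up (the real test wrote roughly
15 KB per file):

    // Sketch only: write N small files into one directory to reproduce
    // the "million files in a directory" scenario. Names are arbitrary.
    #include <fstream>
    #include <iostream>
    #include <string>

    int main()
    {
        const long n = 1000000;              // number of files
        const std::string dir = "attempt2/"; // assumes the directory exists

        for (long i = 0; i < n; ++i) {
            std::ofstream out(dir + "record_" + std::to_string(i) + ".txt");
            out << "dummy record payload " << i << "\n";
            if (i % 100000 == 0)
                std::cout << i << " files written\n"; // coarse progress
        }
    }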
Let me tell you that after I generated the 1 million files, the system got
very slow. Every time I cd'd into the directory containing the 1 million
files, the next command took minutes to execute. I guess this is because the
OS has to read the directory entries, and the directory entry itself is about
40 MB (shown from one level above it):
drwxrwxr-x 5 nxd nxd 40255488 Oct 6 11:39 attempt2
Above is the directory that contained the 1 million files. My laptop is an
i5 with 8 GB of RAM. So I am not sure your idea of flat files is a very good
one. Some of the other reasons I think it got slow: the buffer cache filled up
with data I would never need, and "zsh" could no longer do command completion,
again because the command-line size limit would be exceeded. I think you would
just move the bottleneck from database search to directory-entry search. Also,
directory entries have no concept of regular expressions or of the indexing
that would make lookups fast: each and every entry has to be visited before
you can decide that a given file is the one you want.
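To make the linear-scan point concrete, here is a rough sketch (again, the
names are only examples) that walks the directory entry by entry looking for
one particular file, which is essentially what any wildcard or pattern match
over the entries has to do:

    // Sketch: time a linear walk over a huge directory looking for one name.
    // Every entry is visited; there is no index to consult.
    #include <chrono>
    #include <filesystem>
    #include <iostream>

    int main()
    {
        namespace fs = std::filesystem;
        const std::string wanted = "record_999999.txt"; // example target

        auto start = std::chrono::steady_clock::now();
        bool found = false;
        for (const auto& entry : fs::directory_iterator("attempt2")) {
            if (entry.path().filename() == wanted) {
                found = true;
                break;
            }
        }
        auto stop = std::chrono::steady_clock::now();

        std::cout << (found ? "found" : "not found") << " after "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(
                         stop - start).count()
                  << " ms\n";
    }

A database index (or even a simple in-memory map) would turn that walk into a
single lookup, which is the difference I am talking about.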
I think you should try a few experiments yourself before you rush into
something like this. My software is open source, so if you want to create a
few million files, the random data generator can be used and I can show you
how to use it.
Regards,
Neil
>________________________________
> From: Mohammed Rashad <[email protected]>
>To: [email protected]
>Sent: Sunday, October 7, 2012 7:14 PM
>Subject: [Wt-interest] Wt File IO vs Database IO
>
>
>All,
>
>
>Due to the large amount of data used in a crowd-sourced mapping project, I
>had decided to completely eliminate the use of a database and use flat files
>for storage, retrieval and querying.
>
>
>I thought of storing each record of the db as an individual file, so that
>retrieval is fast and no search is needed across the entire db or a file.
>
>
>But if a table has more than tens of thousands of records, users accessing
>(the same or different) records from different places will result in N file
>I/O operations.
>
>
>Will this be a bottleneck in the application? Consider each file to be of
>size <= 15 KB.
>
>
>The main reason to eliminate the db is the performance bottleneck in
>database I/O.
>
>
>So will moving to the new model help in any way, given that the number of
>users and the amount of data will be much more than expected?
>
>
>
>--
>
>Regards,
> Rashad
>