On Mon, 14 Nov 2011 16:58:02 +0000 David Holland <dholland-t...@netbsd.org> wrote:
> On Mon, Nov 14, 2011 at 04:03:09AM -0500, Matthew Mondor wrote:
> > > I was recently talking to some people who'd been working with some
> > > (physicists, I think) doing data-intensive simulation of some kind,
> > > and that reminded me: for various reasons, many people who are doing
> > > serious data collection or simulation tend to encode vast amounts of
> > > metadata in the names of their data files. Arguably this is a bad way
> > > of doing things, but there are reasons for it and not so many clear
> > > alternatives... anyway, 256 character filenames often aren't enough in
> > > that context.
> >
> > It's only my opinion, but they really should be using multiple files or
> > a database for the metadata with as necessary a "link" to an actual
> > file for data.
>
> Perhaps, but telling people they should be working a different way
> usually doesn't help. (Have you ever done any stuff like this? Even if
> you have only a few settings and only a couple hundred output files,
> there's still no decent way to arrange it but name the output files
> after the settings.)

I agree that if they already started on the wrong path it's hard to
tell them to change their methods, but it was probably not ideal to
expect that file name length was an unlimited resource...

Situations where I had to deal with this included web sites, with media
stored as files and metadata in databases (file names being either
hashes or a serial number); another instance was camera security
software saving stills and archiving videos as files, with the
directory and file names based on a type of time stamp. Another case is
mmmail, where mail is stored in a custom format in files, backed by a
PostgreSQL database. It works well, but it can be tricky not to leak
files. In the case of a web application using PostgreSQL, for instance,
delete trigger functions can insert entries into a table of files to be
deleted, with a scheduled job or daemon cleaning those up.
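The trigger-plus-cleanup pattern above can be sketched roughly as
follows. This is a minimal, self-contained illustration using SQLite in
place of PostgreSQL (the idea is the same; PostgreSQL would use a
plpgsql trigger function instead); the `media` and `pending_unlink`
table and trigger names are hypothetical:

```python
import os
import sqlite3
import tempfile

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE media (id INTEGER PRIMARY KEY, path TEXT NOT NULL);
    CREATE TABLE pending_unlink (path TEXT NOT NULL);
    -- On row deletion, queue the backing file for removal instead of
    -- unlinking it inside the transaction (which could not be rolled back).
    CREATE TRIGGER media_delete AFTER DELETE ON media
    BEGIN
        INSERT INTO pending_unlink (path) VALUES (old.path);
    END;
""")

# Create a data file and register it in the database.
fd, path = tempfile.mkstemp()
os.close(fd)
db.execute("INSERT INTO media (path) VALUES (?)", (path,))
db.commit()

# Deleting the row only queues the unlink; the file still exists here.
db.execute("DELETE FROM media WHERE path = ?", (path,))
db.commit()

# What the scheduled job or daemon would do on each pass:
for (p,) in db.execute("SELECT path FROM pending_unlink"):
    try:
        os.unlink(p)
    except FileNotFoundError:
        pass  # already gone; nothing leaked
db.execute("DELETE FROM pending_unlink")
db.commit()
```

Since the unlink happens outside the delete transaction, a crash
between commit and cleanup leaves at worst a queued entry or an
orphaned file for the next pass to reap, rather than a dangling
database row pointing at a missing file.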
The few instances where I've seen leaked files were after abnormal
crashes/reboots though; some recovery/cleanup software is then useful.
I guess this also gives the answer you expected, however: that it's
more complex to DTRT, as user software must create the link between
two loosely coupled systems :)

> Pathname buffers generally shouldn't be (and in NetBSD, aren't) on the
> stack regardless. Even at only 1K each, it's really easy to blow a 4k
> kernel stack with them. (In practice you can generally get away with
> one; but two, like you need for rename, link, symlink, etc. is too
> many.)
>
> Or I guess you don't mean in the kernel, do you...

Oh yes, I meant userland indeed, as kernel code should minimize stack
use...
-- 
Matt