Re: Max. number of subdirectories dump

2013-08-19 Thread David Holland
On Sun, Aug 18, 2013 at 06:04:55PM +0200, Johnny Billquist wrote: Looking at 2.11BSD, it looks like this: struct direct { [snip] In NetBSD (fairly current): struct dirent { careful, you want struct direct, not struct dirent: struct direct { u_int32_t d_fileno;
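
Below is a minimal C sketch of the on-disk record Holland is pointing at, modeled on struct direct in sys/ufs/ufs/dir.h. It is not the header itself. The type name ufs_direct_sketch and the 255-byte name limit are assumptions for illustration. d_fileno and d_type appear in this thread, while d_reclen and d_namlen come from the traditional FFS layout. Unlike struct dirent, which readdir() fills in, this is the layout stored in directory blocks.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch modeled on the on-disk struct direct; the struct name and the
   255-byte name limit are assumptions, not copied from the header. */
#define SKETCH_MAXNAMLEN 255

struct ufs_direct_sketch {
    uint32_t d_fileno;                     /* inode number, 0 = unused slot */
    uint16_t d_reclen;                     /* bytes from this entry to the next */
    uint8_t  d_type;                       /* file type tag, e.g. 4 for a directory */
    uint8_t  d_namlen;                     /* length of d_name, excluding the NUL */
    char     d_name[SKETCH_MAXNAMLEN + 1]; /* only d_namlen + 1 bytes are stored on disk */
};

/* The fixed header is 8 bytes and the name follows immediately. */
_Static_assert(offsetof(struct ufs_direct_sketch, d_reclen) == 4, "d_reclen at 4");
_Static_assert(offsetof(struct ufs_direct_sketch, d_type) == 6, "d_type at 6");
_Static_assert(offsetof(struct ufs_direct_sketch, d_namlen) == 7, "d_namlen at 7");
_Static_assert(offsetof(struct ufs_direct_sketch, d_name) == 8, "d_name at 8");
```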

Re: Max. number of subdirectories dump

2013-08-19 Thread David Holland
On Sun, Aug 18, 2013 at 12:24:12PM -0400, Mouse wrote: A directory may contain entries other than subdirectories. Since there is no enforced ordering of entries in a directory, the whole directory must be read to find all the subdirectories (unless 32767 subdirs are found first, I
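
Entries have no enforced order, so finding every subdirectory means walking the whole directory. The following is a minimal sketch of such a walk over raw directory chunks. It assumes the traditional FFS entry layout: a 4-byte inode number, a 2-byte record length, a 1-byte type, a 1-byte name length, and then the name. It also assumes host byte order and a type value of 4 for directories. count_subdir_entries and put_entry are hypothetical helpers, not NetBSD functions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed constants mirroring the traditional FFS directory format. */
#define DIRBLKSIZ  512   /* entries never straddle a 512-byte chunk */
#define ENTRY_HDR  8     /* d_fileno(4) + d_reclen(2) + d_type(1) + d_namlen(1) */
#define TYPE_DIR   4     /* assumed d_type value for a directory */

/* Count subdirectory entries in a run of directory chunks, skipping "."
   and ".." and unused slots (inode 0). Every record is visited, because
   nothing orders directories ahead of other entries. */
static size_t count_subdir_entries(const uint8_t *buf, size_t len)
{
    size_t count = 0, off = 0;

    while (off + ENTRY_HDR <= len) {
        uint32_t ino;
        uint16_t reclen;
        memcpy(&ino, buf + off, sizeof ino);           /* host byte order assumed */
        memcpy(&reclen, buf + off + 4, sizeof reclen);
        uint8_t type = buf[off + 6];
        uint8_t namlen = buf[off + 7];

        /* Stop on a malformed record instead of looping or overrunning. */
        if (reclen < ENTRY_HDR || off + reclen > len || ENTRY_HDR + namlen > reclen)
            break;

        const char *name = (const char *)buf + off + ENTRY_HDR;
        int dot = (namlen == 1 && name[0] == '.') ||
                  (namlen == 2 && name[0] == '.' && name[1] == '.');
        if (ino != 0 && type == TYPE_DIR && !dot)
            count++;
        off += reclen;
    }
    return count;
}

/* Hypothetical helper for building test data: writes one entry at off
   with the given record length and returns that length. */
static size_t put_entry(uint8_t *buf, size_t off, uint32_t ino,
                        uint8_t type, const char *name, uint16_t reclen)
{
    uint8_t namlen = (uint8_t)strlen(name);
    memcpy(buf + off, &ino, sizeof ino);
    memcpy(buf + off + 4, &reclen, sizeof reclen);
    buf[off + 6] = type;
    buf[off + 7] = namlen;
    memcpy(buf + off + ENTRY_HDR, name, (size_t)namlen + 1);
    return reclen;
}
```

The last entry in a chunk absorbs the chunk's slack through its record length. Because of that, the loop advances by d_reclen rather than by the size of the name.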

Re: Max. number of subdirectories dump

2013-08-19 Thread Johnny Billquist
On 2013-08-19 08:41, David Holland wrote: On Sun, Aug 18, 2013 at 06:04:55PM +0200, Johnny Billquist wrote: Looking at 2.11BSD, it looks like this: struct direct { [snip] In NetBSD (fairly current): struct dirent { careful, you want struct direct, not struct dirent:

Re: Max. number of subdirectories dump

2013-08-19 Thread David Laight
On Sun, Aug 18, 2013 at 03:08:21PM +0200, Manuel Wiesinger wrote: Hello, I am working on a defrag tool for UFS2/FFSv2 as Google Summer of Code Project. The size of a directory offset is of type int32_t (see src/sys/ufs/ufs/dir.h), which is a signed integer. So the maximum size can be

Re: Max. number of subdirectories dump

2013-08-19 Thread David Holland
On Mon, Aug 19, 2013 at 09:01:35AM +0200, Johnny Billquist wrote: careful, you want struct direct, not struct dirent: Hmm. Probably a good point. I was wondering if NetBSD had just renamed direct to dirent, but that was just me getting confused then. (And lazy, since I didn't really

Re: Max. number of subdirectories dump

2013-08-19 Thread Abhinav Upadhyay
On Mon, Aug 19, 2013 at 1:01 PM, David Holland dholland-t...@netbsd.org wrote: Speed. There's a moderately famous paper A trace-driven analysis of the UNIX 4.2 BSD file system -- you might have heard of it :-) Actually I haven't. I should look it up. Thanks. Yeah, that one's worth a

Re: Max. number of subdirectories dump

2013-08-19 Thread Manuel Wiesinger
On 08/19/13 09:31, David Laight wrote: For defrag I'd have thought you'd work from the inode table and treat directories no different from files. That's what I'm doing. I have an additional optimisation step, which tries to move files in the same directory, so that they are stored

Re: Max. number of subdirectories dump

2013-08-19 Thread Johnny Billquist
On 2013-08-19 15:56, Manuel Wiesinger wrote: On 08/19/13 09:31, David Laight wrote: For defrag I'd have thought you'd work from the inode table and treat directories no different from files. That's what I'm doing. I have an additional optimisation step, which tries to move files in the same

Re: Max. number of subdirectories dump

2013-08-19 Thread Manuel Wiesinger
On 08/19/13 17:52, Johnny Billquist wrote: I confess I have not studied the topic in enough detail. Is there any research on what strategies are better when it comes to locating files in a file system in proximity to each other? Is a common directory the best indicator of where to place files,

Max. number of subdirectories dump

2013-08-18 Thread Manuel Wiesinger
Hello, I am working on a defrag tool for UFS2/FFSv2 as Google Summer of Code Project. The size of a directory offset is of type int32_t (see src/sys/ufs/ufs/dir.h), which is a signed integer. So the maximum size can be (2^31)-1. When testing, the maximum number of subdirectories was
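
The int32_t offset is not what stops the count at 32767. To show this, the sketch below computes a rough upper bound on how many entries a signed 32-bit offset could address. It assumes 512-byte directory chunks and uniform 16-byte records, which corresponds to a 5-character name. Both figures are illustrative assumptions, since real directories mix record sizes.

```c
#include <stdint.h>

/* Illustrative assumptions: 512-byte directory chunks and uniform
   16-byte records (a 5-character name). */
#define DIRBLKSIZ 512
#define RECLEN    16

/* Upper bound on entries addressable by a signed 32-bit directory offset. */
static int64_t max_entries_by_offset(void)
{
    int64_t max_bytes = INT32_MAX;            /* (2^31) - 1 bytes */
    int64_t chunks = max_bytes / DIRBLKSIZ;   /* entries never cross a chunk */
    return chunks * (DIRBLKSIZ / RECLEN);     /* 32 records per chunk */
}
```

Under those assumptions the bound is 134217696 entries, roughly 4096 times 32767. The limit must therefore come from somewhere other than the offset type.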

Re: Max. number of subdirectories dump

2013-08-18 Thread Joerg Sonnenberger
On Sun, Aug 18, 2013 at 03:08:21PM +0200, Manuel Wiesinger wrote: My question is: Is the number of subdirectories really limited by (2^15)-1? Yes. A subdirectory creates a hard link to the parent directory's inode, and the nlink counter is 16-bit. Whether or not that is interpreted as signed in all
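
Joerg's explanation can be expressed as a small model. Each subdirectory's ".." is one more hard link to the parent, so the parent's link count caps the number of subdirectories. This is a hypothetical simulation, not kernel code. The struct, the starting count of 2, and the rule that mkdir refuses once nlink reaches the limit are all assumptions about how mkdir behaves.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical model of a directory inode's link count. */
struct dir_model {
    int32_t nlink;
};

/* Assumed mkdir check: refuse once the parent's count reaches the limit.
   strerror(EMLINK) is "Too many links", the message seen in the test. */
static int model_mkdir(struct dir_model *parent, int32_t link_max)
{
    if (parent->nlink >= link_max)
        return EMLINK;
    parent->nlink++;                /* the child's ".." links to the parent */
    return 0;
}

/* Create subdirectories until the model refuses; return how many fit. */
static int32_t count_until_emlink(int32_t link_max)
{
    struct dir_model d = { 2 };     /* its entry in its parent + its own "." */
    int32_t made = 0;
    while (model_mkdir(&d, link_max) == 0)
        made++;
    return made;
}
```

With a signed 16-bit limit of 32767 the model allows 32765 subdirectories. Counting the directory's own "." and ".." as well gives 32767 directory-type entries. An unsigned 16-bit counter would allow 65533 subdirectories.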

Re: Max. number of subdirectories dump

2013-08-18 Thread Martin Husemann
On Sun, Aug 18, 2013 at 03:08:21PM +0200, Manuel Wiesinger wrote: When testing, the maximum number of subdirectories was 32767, which is (2^15)-1, when trying to add a 32767th directory, I got the error message: Too many links. This one is simple: a subdirectory has a .. entry, which

Re: Max. number of subdirectories dump

2013-08-18 Thread Johnny Billquist
On 2013-08-18 15:08, Manuel Wiesinger wrote: Hello, I am working on a defrag tool for UFS2/FFSv2 as Google Summer of Code Project. The size of a directory offset is of type int32_t (see src/sys/ufs/ufs/dir.h), which is a signed integer. So the maximum size can be (2^31)-1. When testing, the

Re: Max. number of subdirectories dump

2013-08-18 Thread Manuel Wiesinger
On 08/18/13 15:59, Johnny Billquist wrote: As Joerg said, the link count is the limitation here. Thanks for the clarification. I could have thought of this myself. Not sure I understand the question. Are you suggesting that you don't need to scan through all the contents of a directory to

Re: Max. number of subdirectories dump

2013-08-18 Thread Manuel Wiesinger
On 08/18/13 15:58, Martin Husemann wrote: This is assuming there are no regular files in the directory. If there are (first, before the subdirectories), will it still work? Correct, I should have mentioned that I am testing a special case. It still finds all entries but not those in indirect

Re: Max. number of subdirectories dump

2013-08-18 Thread Johnny Billquist
On 2013-08-18 17:33, Manuel Wiesinger wrote: On 08/18/13 15:59, Johnny Billquist wrote: Not sure I understand the question. Are you suggesting that you don't need to scan through all the contents of a directory to find the subdirectories? No, I'm in a step where I just search for

Re: Max. number of subdirectories dump

2013-08-18 Thread Joerg Sonnenberger
On Sun, Aug 18, 2013 at 05:36:47PM +0200, Johnny Billquist wrote: There is nothing in the directory entry that even tells if the entry is a directory or just a plain file, unless I remember wrong. And they are not sorted so that all directories come first... There is a type tag for each

Re: Max. number of subdirectories dump

2013-08-18 Thread Manuel Wiesinger
On 08/18/13 17:38, Joerg Sonnenberger wrote: There is a type tag for each directory entry. Correct. Namely u_int8_t d_type; in sys/ufs/ufs/dir.h

Re: Max. number of subdirectories dump

2013-08-18 Thread Manuel Wiesinger
On 08/18/13 16:18, Johnny Billquist wrote: It might also fail if the names of the subdirectories are rather long, I would suspect... If I understood you correctly, yes, it might fail, unless I iterate over every directory block there is.
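
Johnny's point about long names can be made concrete. The longer the names, the fewer entries fit in the inode's direct blocks before the directory spills into an indirect block. The sketch below assumes 12 direct block pointers (as in UFS2), 512-byte directory chunks, and an 8-byte entry header followed by the NUL-terminated name padded to 4 bytes. It ignores "." and "..". rec_size and entries_in_direct_blocks are hypothetical helpers.

```c
#include <stddef.h>

/* Assumptions: 12 direct block pointers, 512-byte directory chunks,
   8-byte entry header; "." and ".." are ignored. */
#define NDIRECT    12
#define DIRBLKSIZ  512
#define ENTRY_HDR  8

/* Record length for a name: header plus name and NUL, padded to 4 bytes. */
static size_t rec_size(size_t namlen)
{
    return ENTRY_HDR + ((namlen + 1 + 3) & ~(size_t)3);
}

/* Entries with names of length namlen that fit before an indirect block
   is needed, for a filesystem block size of bsize bytes. */
static size_t entries_in_direct_blocks(size_t bsize, size_t namlen)
{
    size_t chunks = NDIRECT * bsize / DIRBLKSIZ;
    return chunks * (DIRBLKSIZ / rec_size(namlen));
}
```

Take a hypothetical 32 KiB block size and 5-character names. Only 24576 entries fit in the direct blocks, so a scan that stops there cannot see all 32765 subdirectories. With 255-character names the figure drops to 768.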

Re: Max. number of subdirectories dump

2013-08-18 Thread Johnny Billquist
On 2013-08-18 17:50, Johnny Billquist wrote: On 2013-08-18 17:38, Joerg Sonnenberger wrote: On Sun, Aug 18, 2013 at 05:36:47PM +0200, Johnny Billquist wrote: There is nothing in the directory entry that even tells if the entry is a directory or just a plain file, unless I remember wrong. And

Re: Max. number of subdirectories dump

2013-08-18 Thread Joerg Sonnenberger
On Sun, Aug 18, 2013 at 06:04:55PM +0200, Johnny Billquist wrote: It's an obvious optimization to keep type already in the directory itself. But is there any other reason why it was added there? It obviously means you have the same information in two places, with all the obvious risks of
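
The duplication Johnny asks about is mechanical: the entry's type tag is just the file-type bits of the inode's mode. The sketch below shows that mapping, mirroring the IFTODT() macro in BSD <sys/dirent.h>. It also includes a hypothetical check a tool could run to catch the two copies disagreeing. Treating DT_UNKNOWN as consistent is a design assumption.

```c
#define _DEFAULT_SOURCE 1
#include <dirent.h>
#include <stdint.h>
#include <sys/stat.h>

/* Same arithmetic as IFTODT(): the S_IFMT bits of st_mode shifted down
   by 12 give the d_type value (S_IFDIR -> DT_DIR, S_IFREG -> DT_REG). */
static uint8_t mode_to_dtype(mode_t mode)
{
    return (uint8_t)((mode & S_IFMT) >> 12);
}

/* Hypothetical consistency check between the inode's mode and the
   directory entry's copy of the type; DT_UNKNOWN is accepted. */
static int dtype_consistent(uint8_t d_type, mode_t mode)
{
    return d_type == DT_UNKNOWN || d_type == mode_to_dtype(mode);
}
```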

Re: Max. number of subdirectories dump

2013-08-18 Thread Mouse
[W]hen trying to add a 32767th directory, I got the error message: Too many links. As others have said, this is because link counts are 16 bits, and someone at some point didn't want to deal with making them unsigned. When my tool reads only the single indirect blocks, it gets all 32767

Re: Max. number of subdirectories dump

2013-08-18 Thread Manuel Wiesinger
On 08/18/13 18:24, Mouse wrote: Your subdirectory names are comparatively short. I've been testing the special case where a directory contains only directories; the names are short for that reason, to get a real maximum of subdirectories. [W]hy does dump iterate the indirect blocks, when looking

re: Max. number of subdirectories dump

2013-08-18 Thread matthew green
It's an obvious optimization to keep type already in the directory itself. But is there any other reason why it was added there? It obviously means you have the same information in two places, with all struct dirent is not stored on the disk, but created during e.g. the readdir() system call via
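
From user space, only struct dirent is visible. The minimal sketch below counts subdirectories through opendir()/readdir(). It uses d_type when the filesystem fills it in and falls back to lstat() otherwise. count_subdirs is a hypothetical name, and d_type and DT_DIR are BSD and Linux extensions rather than base POSIX.

```c
#define _DEFAULT_SOURCE 1
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* Count subdirectories of path via readdir(); returns -1 if path cannot
   be opened. d_type avoids a stat() per entry when it is filled in. */
static long count_subdirs(const char *path)
{
    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    long count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        /* "." and ".." are normally reported as directories too. */
        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        int is_dir = (de->d_type == DT_DIR);
        if (de->d_type == DT_UNKNOWN) {
            /* Fall back to lstat() when the filesystem leaves d_type unset. */
            char full[4096];
            struct stat st;
            snprintf(full, sizeof full, "%s/%s", path, de->d_name);
            is_dir = lstat(full, &st) == 0 && S_ISDIR(st.st_mode);
        }
        count += is_dir;
    }
    closedir(dir);
    return count;
}
```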

Re: Max. number of subdirectories dump

2013-08-18 Thread Mouse
It's an obvious optimization to keep type already in the directory itself. But is there any other reason why it was added there? It obviously means you have the same information in two places, [...] struct dirent is not stored on the disk, but created during e.g. the readdir() system call via