That's possible, but not out of the box. The available plugin protocol-file does the opposite:
- it gets the file's raw content, which is passed to a parser to extract plain-text content and metadata (author, etc.)
- it gets some file-specific metadata (e.g., modified time)

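To illustrate what a metadata-only pass involves, here is a minimal sketch in Python, independent of Nutch (it is not a plugin and does not use any Nutch API). It walks a mounted directory tree and reads only the stat information, never opening a file. The function name `iter_file_metadata`, the `server_id` argument and the record keys are hypothetical. Using `st_dev` as the filesystem id is an assumption, since it is only unique per host.

```python
import os
import stat
from typing import Dict, Iterator


def iter_file_metadata(root: str, server_id: str) -> Iterator[Dict[str, object]]:
    """Yield one metadata record per non-directory entry under root.

    Only stat information is read; file content is never opened.
    server_id is a caller-supplied label (hypothetical), not detected.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: describe symlinks themselves
            except OSError:
                continue  # entry vanished or is inaccessible; skip it
            yield {
                "filename": name,
                "path": path,
                "parent": dirpath,
                "owner": st.st_uid,   # numeric uid, not a user name
                "group": st.st_gid,   # numeric gid, not a group name
                "perms": stat.filemode(st.st_mode),
                "size": st.st_size,
                "atime": st.st_atime,
                "mtime": st.st_mtime,
                "ctime": st.st_ctime,
                "filesystem": st.st_dev,  # assumption: device id as filesystem id
                "server": server_id,
            }
```

For example, `for rec in iter_file_metadata("/mnt/data", "server-01"): print(rec)` would print one record per file (the path and server label are placeholders). Each record could then be sent to whatever index is used.
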
You have to write your own plugin which extracts all file-related metadata (but not the content). Finally, you have to think about how to index this metadata efficiently, e.g., via an adapted Solr schema.

> would nutch be a good candidate for this? i don't need the actual content of
> the file systems.

It depends on your needs and how you want to index the metadata. Nutch should be able to do this, but from C/C++ (possibly via some framework such as Qt, Gtk, etc.) a file system can be traversed more efficiently and with less overhead.

> What i would like to be able to do is allow a user to search for a file by
> name and then tell them which filesystem and server the file exists on.

Isn't this done by UNIX's locate?

> so if a user searches for a file, i'd like to be able provide a history of
> change times and so on.

That needs some effort, simply because you have to store the history of modifications and update it whenever a file has changed.

Sebastian

On 05/11/2013 06:56 PM, anoop wrote:
> Hi,
>
> i have a project in mind where i'd like to crawl files on a filesystem that
> has been mounted to a host but not pull the content of the files. Just the
> metadata around that file.
>
> filename, path, parent, owner, group, perms, size, atime, mtime, ctime,
> filesystem (unique id), server (unique id)
>
> would nutch be a good candidate for this? i don't need the actual content of
> the file systems.
>
> What i would like to be able to do is allow a user to search for a file by
> name and then tell them which filesystem and server the file exists on.
>
> In addition, i would be crawling the same filesystems from the same servers
> over and over to track files that have changed over time.
>
> so if a user searches for a file, i'd like to be able provide a history of
> change times and so on.
>
> thanks!
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-to-index-filesystem-meta-data-tp4062593.html
> Sent from the Nutch - User mailing list archive at Nabble.com.

