That's possible but not out-of-the-box.

The available plugin protocol-file does the opposite:
- it gets the file's raw content, which is passed to a parser
  to extract plain-text content and meta data (author, etc.)
- it gets some file-specific meta data (e.g., modified time)

You have to write your own plugin which extracts all file-related
meta data (but not the content). Finally, you have to think about how to
index this meta data efficiently, e.g., via an adapted Solr schema.
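
To make concrete what "all file-related meta data (but not the content)" could
look like as one record per file, here is a minimal sketch in Python. It is not
Nutch plugin code. The function name file_metadata and the field names are
hypothetical and simply follow the list in the original question. Using the
device id as the filesystem id and the host name as the server id are
assumptions.

```python
import os
import socket
import stat


def file_metadata(path):
    """Return the metadata of one file as a flat dict, without reading its content."""
    st = os.lstat(path)  # lstat describes a symlink itself instead of following it
    abspath = os.path.abspath(path)
    return {
        "filename": os.path.basename(abspath),
        "path": abspath,
        "parent": os.path.dirname(abspath),
        "owner": st.st_uid,  # numeric ids; map to names separately if needed
        "group": st.st_gid,
        "perms": stat.filemode(st.st_mode),  # symbolic form such as "-rw-r--r--"
        "size": st.st_size,
        "atime": st.st_atime,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
        "filesystem": st.st_dev,  # assumption: device id stands in for a filesystem id
        "server": socket.gethostname(),  # assumption: host name stands in for a server id
    }
```

Each key of such a record would map to one field of the index schema.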

> would nutch be a good candidate for this? i don't need the actual content of
> the file systems.
It depends on your needs and how you want to index the meta data.
Nutch should be able to do this, but from C/C++ (possibly via some framework
such as Qt, Gtk, etc.) a file system can be traversed more efficiently and
with less overhead.
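
For comparison, a plain traversal outside Nutch needs very little code. The
sketch below uses Python rather than C/C++ only to keep it short. The generator
name walk_metadata is hypothetical. It never opens a file, it only asks the
operating system for each entry's stat data.

```python
import os


def walk_metadata(root):
    """Yield (path, stat_result) for every entry below root, never opening a file."""
    stack = [root]
    while stack:
        current = stack.pop()
        try:
            with os.scandir(current) as entries:
                for entry in entries:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue  # entry vanished or is not readable
                    yield entry.path, st
                    # Descend into real directories only, never through symlinks.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # skip directories that cannot be listed
```

Symlinks are reported but not followed, which avoids loops and keeps the walk
inside the mounted tree.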

> What i would like to be able to do is allow a user to search for a file by
> name and then tell them which filesystem and server the file exists on.
Isn't this what UNIX's locate already does?

> so if a user searches for a file, i'd like to be able provide a history of
> change times and so on.
That takes some effort, simply because you have to store the history of
modifications and update it whenever a file has changed.
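
One possible way to keep such a history, sketched here with SQLite instead of
Solr purely for illustration: append a row for a file only when its size or
modification time differs from the last row recorded for it. The table name
file_history and the functions open_history, record and history are all
hypothetical.

```python
import os
import sqlite3
import time


def open_history(db_path=":memory:"):
    """Open (or create) the history database."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_history ("
        " server TEXT, path TEXT, filename TEXT,"
        " size INTEGER, mtime REAL, seen_at REAL)"
    )
    return conn


def record(conn, server, path, size, mtime, seen_at=None):
    """Append a row only if size or mtime changed; return True if a row was added."""
    last = conn.execute(
        "SELECT size, mtime FROM file_history"
        " WHERE server = ? AND path = ? ORDER BY rowid DESC LIMIT 1",
        (server, path),
    ).fetchone()
    if last == (size, mtime):
        return False  # unchanged since the previous crawl
    conn.execute(
        "INSERT INTO file_history VALUES (?, ?, ?, ?, ?, ?)",
        (server, path, os.path.basename(path), size, mtime,
         time.time() if seen_at is None else seen_at),
    )
    conn.commit()
    return True


def history(conn, filename):
    """Return (server, path, size, mtime, seen_at) rows for a file name, oldest first."""
    return conn.execute(
        "SELECT server, path, size, mtime, seen_at FROM file_history"
        " WHERE filename = ? ORDER BY rowid",
        (filename,),
    ).fetchall()
```

A repeated crawl would call record once per file. A search by name then
returns every server and path where the file was seen, together with its
recorded changes.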

Sebastian

On 05/11/2013 06:56 PM, anoop wrote:
> Hi,
> 
> i have a project in mind where i'd like to crawl files on a filesystem that
> has been mounted to a host but not pull the content of the files. Just the
> metadata around that file.
> 
> filename, path, parent, owner, group, perms, size, atime, mtime, ctime,
> filesystem (unique id), server (unique id)
> 
> would nutch be a good candidate for this? i don't need the actual content of
> the file systems.
> 
> What i would like to be able to do is allow a user to search for a file by
> name and then tell them which filesystem and server the file exists on.
> 
> In addition, i would be crawling the same filesystems from the same servers
> over and over to track files that have changed over time.
> 
> so if a user searches for a file, i'd like to be able provide a history of
> change times and so on.
> 
> thanks!
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Nutch-to-index-filesystem-meta-data-tp4062593.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
> 
