Hi,

I have a project in mind where I'd like to crawl the files on a filesystem
that is mounted on a host, without pulling the content of the files. I only
need the metadata for each file:

filename, path, parent, owner, group, perms, size, atime, mtime, ctime,
filesystem (unique id), server (unique id)
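
To make that concrete, here is a rough Python sketch of the kind of record I
have in mind. It is not Nutch code, just an illustration. The function name
collect_metadata is made up, and using the device number (st_dev) as the
filesystem id and the hostname as the server id are assumptions on my part;
a real setup would probably need something more stable.

```python
import os
import socket
import stat


def collect_metadata(root, server_id=None):
    """Yield one metadata dict per file under root without opening any file."""
    # Assumption: the hostname is good enough as a unique server id.
    server_id = server_id or socket.gethostname()
    for parent, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(parent, name)
            try:
                # lstat reads inode metadata only; file content is never read.
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is not accessible; skip it
            yield {
                "filename": name,
                "path": path,
                "parent": parent,
                "owner": st.st_uid,   # numeric uid, not resolved to a name
                "group": st.st_gid,   # numeric gid, not resolved to a name
                "perms": stat.filemode(st.st_mode),
                "size": st.st_size,
                "atime": st.st_atime,
                "mtime": st.st_mtime,
                "ctime": st.st_ctime,
                # Assumption: the device number stands in for a filesystem id.
                "filesystem": st.st_dev,
                "server": server_id,
            }
```

Calling list(collect_metadata("/some/mount")) would give one such record per
file under that mount point.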

Would Nutch be a good candidate for this? I don't need the actual content of
the files.

What I would like to do is let a user search for a file by name and then
tell them which filesystem and server the file exists on.

In addition, I would be crawling the same filesystems on the same servers
over and over to track files that have changed over time.

So if a user searches for a file, I'd like to be able to provide a history
of its change times and so on.
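
As a rough illustration of the history part (again not Nutch, only a sketch
of the behaviour I'm after), something like the following Python would do.
The table name file_history and the functions open_store, record_crawl and
history are all hypothetical, and treating a new (mtime, ctime, size)
combination as "the file changed" is an assumption.

```python
import sqlite3


def open_store(db_path=":memory:"):
    """Open (or create) the history store; in-memory by default."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS file_history (
               server TEXT, filesystem TEXT, path TEXT, filename TEXT,
               size INTEGER, mtime REAL, ctime REAL, crawled_at REAL,
               PRIMARY KEY (server, filesystem, path, mtime, ctime, size)
           )"""
    )
    return conn


def record_crawl(conn, rows, crawled_at):
    """Store one crawl; versions of a file already seen are ignored."""
    # The primary key makes INSERT OR IGNORE skip files whose
    # (mtime, ctime, size) matches a version that is already stored.
    conn.executemany(
        """INSERT OR IGNORE INTO file_history
           VALUES (:server, :filesystem, :path, :filename,
                   :size, :mtime, :ctime, :crawled_at)""",
        [dict(r, crawled_at=crawled_at) for r in rows],
    )
    conn.commit()


def history(conn, filename):
    """Return every recorded version of files with this name, oldest first."""
    return conn.execute(
        """SELECT server, filesystem, path, mtime, size, crawled_at
           FROM file_history
           WHERE filename = ?
           ORDER BY server, path, crawled_at""",
        (filename,),
    ).fetchall()
```

Each crawl would call record_crawl with that run's metadata records, and a
search by name would call history to list where the file lives and when it
was seen to change.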

Thanks!




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Nutch-to-index-filesystem-meta-data-tp4062593.html
Sent from the Nutch - User mailing list archive at Nabble.com.
