So you are suggesting that I iterate over the file system, index the tree entities (directory names, file names, file sizes, etc.), and then post them to Solr? I need to index the FS tree, not the file contents.
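To make the idea concrete, here is a minimal sketch in Python of walking a folder and turning the tree metadata (not the file contents) into a Solr XML update message. The function names `walk_tree` and `to_solr_xml` and the field names `id`, `name`, `kind`, `parent` and `size` are assumptions made up for this example; the fields would have to exist in the Solr schema.

```python
import os
from xml.sax.saxutils import escape


def walk_tree(root):
    """Yield one dict per directory and file under root (metadata only)."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in sorted(dirnames):
            path = os.path.join(dirpath, name)
            yield {"id": path, "name": name, "kind": "dir", "parent": dirpath}
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            yield {
                "id": path,
                "name": name,
                "kind": "file",
                "parent": dirpath,
                # lstat does not follow symlinks, so broken links do not raise
                "size": os.lstat(path).st_size,
            }


def to_solr_xml(entries):
    """Build a Solr <add> message; field names are hypothetical."""
    docs = []
    for entry in entries:
        fields = "".join(
            '<field name="%s">%s</field>' % (key, escape(str(value)))
            for key, value in sorted(entry.items())
        )
        docs.append("<doc>%s</doc>" % fields)
    return "<add>%s</add>" % "".join(docs)
```

Writing the result of `to_solr_xml(walk_tree("/some/folder"))` to a file such as `fs-tree.xml` (a hypothetical name) and running `java -jar post.jar fs-tree.xml` would then post only the tree metadata, since post.jar sends files as application/xml by default.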

On Tue, Mar 5, 2013 at 5:54 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote:

> Would Solr's post.jar work for you? It has a directory recurse option.
> The usage/help output is pasted below.
>
> Here's what should work for you: "java -Dauto -Drecursive -jar post.jar /some/folder"
>
> Erik
>
>
> exampledocs java -jar post.jar --help
> SimplePostTool version 1.5
> Usage: java [SystemProperties] -jar post.jar [-h|-] [<file|folder|url|arg> [<file|folder|url|arg>...]]
>
> Supported System Properties and their defaults:
>   -Ddata=files|web|args|stdin (default=files)
>   -Dtype=<content-type> (default=application/xml)
>   -Durl=<solr-update-url> (default=http://localhost:8983/solr/update)
>   -Dauto=yes|no (default=no)
>   -Drecursive=yes|no|<depth> (default=0)
>   -Ddelay=<seconds> (default=0 for files, 10 for web)
>   -Dfiletypes=<type>[,<type>,...] (default=xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
>   -Dparams="<key>=<value>[&<key>=<value>...]" (values must be URL-encoded)
>   -Dcommit=yes|no (default=yes)
>   -Doptimize=yes|no (default=no)
>   -Dout=yes|no (default=no)
>
> This is a simple command line tool for POSTing raw data to a Solr
> port. Data can be read from files specified as commandline args,
> URLs specified as args, as raw commandline arg strings or via STDIN.
> Examples:
>   java -jar post.jar *.xml
>   java -Ddata=args -jar post.jar '<delete><id>42</id></delete>'
>   java -Ddata=stdin -jar post.jar < hd.xml
>   java -Ddata=web -jar post.jar http://example.com/
>   java -Dtype=text/csv -jar post.jar *.csv
>   java -Dtype=application/json -jar post.jar *.json
>   java -Durl=http://localhost:8983/solr/update/extract -Dparams=literal.id=a -Dtype=application/pdf -jar post.jar a.pdf
>   java -Dauto -jar post.jar *
>   java -Dauto -Drecursive -jar post.jar afolder
>   java -Dauto -Dfiletypes=ppt,html -jar post.jar afolder
> The options controlled by System Properties include the Solr
> URL to POST to, the Content-Type of the data, whether a commit
> or optimize should be executed, and whether the response should
> be written to STDOUT. If auto=yes the tool will try to set type
> and url automatically from file name. When posting rich documents
> the file name will be propagated as "resource.name" and also used
> as "literal.id". You may override these or any other request parameter
> through the -Dparams property. To do a commit only, use "-" as argument.
> The web mode is a simple crawler following links within domain, default
> delay=10s.
>
>
> On Mar 5, 2013, at 04:38 , Syao Work wrote:
>
> > Hello,
> >
> > I am trying to index some FS folder tree.
> > Spent 2 days finding what could be the problem - got nothing :) There are not so much examples on indexing File System.
> > In the logs I cant find any exceptions why it does not process the info
> > Data import configuration and debug response are attached
> >
> >
> > Using:
> > 1. solr web admin tool,
> > 2. Java version "1.7.0_09-icedtea"
> > OpenJDK Runtime Environment (fedora-2.3.7.0.fc17-x86_64)
> > OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
> >
> > Thank you for your time,
> > Ro
> >
> > P.S. Excuse my bad English, I am not a native English speaker.
> > <data-config.xml><import-debug-response.json>