I'm unclear on how to associate the metadata I need with a Solr index entry.
Based on what I've read thus far, you can extract data from text files and
store that in a Solr document.

I have hundreds of thousands of documents in a database/svn type system.
When I index a file, it will most likely be on the local filesystem, but I
know the location it will have in the database. So, when I index, I want to
store that path so the file can be found when someone else does a search.

123.xml may look like:

<mydoc>
<title>my title</title>
<para>Every foobar has its day</para>
<figure href="/abc/xxx.gif"><caption>My caption</caption></figure>
</mydoc>

and the proprietary location I want it to be associated with is:

/abc/def/ghi/123.xml

So, when a user does a search for "foobar", it returns some information
about 123.xml but most importantly the location should be available.

I have yet to find (in the schema.xml or otherwise) where you can define
that path to store, and how you would pass along that parameter in the
indexing of that document.

Instead, from the examples I can find, including the book, you store fields
from your data into the index. In the book's examples (a music database),
searching for "Cherub Rock" returns a list of tracks with their duration, track
name, album name, and artist. In other words, the full text data you
retrieve is the only information the search index has to offer.

Just for example, using the exampledocs post.jar, I'm envisioning something
like this:

java -jar post.jar 123.xml -dblocation "/abc/def/ghi/123.xml" -othermeta1
"xxx" -othermeta2 "zzz"

Then the Solr doc would look like:
<doc>
<field name="id">123</field>
<field name="dblocation">/abc/def/ghi/123.xml</field>
<field name="othermeta1">xxx</field>
<field name="othermeta2">zzz</field>
<field name="title">my title</field>
<field name="graphic">/abc/xxx.gif</field>
<field name="text">Every foobar has its day My caption</field>
</doc>
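
To make the idea concrete, here is a rough sketch of the pre-processing step I
imagine, in case post.jar has no such options. It reads the source file, adds
the location and other metadata, and writes out a document in Solr's XML
update format (<add><doc><field>), which could then be posted as usual. The
function name build_solr_add_doc is made up for illustration, the sample
assumes the <figure> element is properly closed, and I am assuming the
schema.xml would declare dblocation and the other fields as stored so they
come back in search results.

```python
import xml.etree.ElementTree as ET

# Sample source document, as in the post (with <figure> closed).
SAMPLE = """<mydoc>
<title>my title</title>
<para>Every foobar has its day</para>
<figure href="/abc/xxx.gif"><caption>My caption</caption></figure>
</mydoc>"""


def build_solr_add_doc(source_xml, doc_id, dblocation, extra_meta=None):
    """Build a Solr <add><doc> update message from a <mydoc> source string.

    Hypothetical helper: the field names mirror the ones in this post and
    would have to match whatever the schema actually defines.
    """
    src = ET.fromstring(source_xml)

    # Metadata that does not come from the file content itself.
    fields = [("id", doc_id), ("dblocation", dblocation)]
    fields.extend(sorted((extra_meta or {}).items()))

    title = src.findtext("title")
    if title:
        fields.append(("title", title))

    # One "graphic" field per figure that carries an href attribute.
    for figure in src.iter("figure"):
        href = figure.get("href")
        if href:
            fields.append(("graphic", href))

    # Searchable text: <para> and <caption> content, in document order.
    text_parts = [
        el.text.strip()
        for el in src.iter()
        if el.tag in ("para", "caption") and el.text and el.text.strip()
    ]
    fields.append(("text", " ".join(text_parts)))

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


solr_xml = build_solr_add_doc(
    SAMPLE,
    doc_id="123",
    dblocation="/abc/def/ghi/123.xml",
    extra_meta={"othermeta1": "xxx", "othermeta2": "zzz"},
)
print(solr_xml)
```

The output could be saved to a file and sent with post.jar like any of the
exampledocs files, if that is indeed the intended way to get extra fields in.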

This way, when a user searches for foobar, they get item 123 back, review
the search result and if they decide that's the data they want, they can use
the dblocation field to retrieve the data for editing purposes (and then
re-index it following their edits).

I'm guessing I just haven't found the right terms yet to look into, as I'm
very new to this. Thanks for any direction you can provide. Also, if Solr
appears to be the wrong tool for what I need, let me know as well!

Thank you,
Walter
