On 5/18/2018 1:47 PM, S.Ashwath wrote:
> I have 2 directories: one with txt files and the other with corresponding
> JSON (metadata) files (around 90,000 of each). There is one JSON file for
> each txt file, and they share the same name (they don't share any other
> fields).
>
> The txt files just have plain text; I mapped each line to a field called
> 'sentence' and included the file name as a field using the Data Import
> Handler. No problems here.
>
> The JSON file has metadata: 3 tags: a URL, author and title (for the
> content in the corresponding txt file).
> When I index the JSON file (I just used the _default schema and posted the
> fields to the schema, as explained in the official Solr tutorial), *I don't
> know how to get the file name into the index as a field.* As far as I know,
> there's no way to use the Data Import Handler for JSON files. I've read that
> I can pass a literal through the bin/post tool, but again, as far as I
> understand, I can't pass in the file name dynamically as a literal.
>
> I NEED to get the file name, it is the only way in which I can associate
> the metadata with each sentence in the txt files in my downstream Python
> code.
>
> So if anybody has a suggestion about how I should index the JSON file name
> along with the JSON content (or even some workaround), I'd be eternally
> grateful.

The indexing tools included with Solr are good for simple use cases, but
they're generic tools with limits.

The bin/post tool calls a class that is literally called
SimplePostTool.  It is never going to have a lot of capability.

The dataimport handler, while certainly capable of far more than the
simple post tool, is somewhat rigid in its operation. 

A sizable percentage of Solr users end up writing their own indexing
software because what's included with Solr isn't capable of adjusting to
their needs.  Your situation sounds like one that is going to require
custom indexing software that you or somebody in your company must write.
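Such a custom indexer doesn't have to be large.  Below is a minimal sketch
in Python (stdlib only), assuming Solr is running at localhost:8983 with a
core named "metadata", that each JSON file holds the url/author/title
fields described above, and that a "filename" field exists in the schema.
The core name, directory name, and field names are all illustrative:

```python
# Sketch of a custom indexer: read each metadata JSON file, attach its
# own file name as a field, and post the documents to Solr's JSON
# update endpoint.  All names here (core, field, directory) are
# assumptions, not part of any Solr default.
import json
import urllib.request
from pathlib import Path

SOLR_UPDATE_URL = "http://localhost:8983/solr/metadata/update?commit=true"

def build_doc(json_path: Path) -> dict:
    """Read one metadata JSON file and add its file name as a field."""
    doc = json.loads(json_path.read_text(encoding="utf-8"))
    # The shared base name is what links the JSON metadata back to the
    # sentences indexed from the matching txt file.
    doc["filename"] = json_path.stem
    return doc

def post_docs(docs: list) -> None:
    """Send a batch of documents to Solr's JSON update handler."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        SOLR_UPDATE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

if __name__ == "__main__":
    docs = [build_doc(p) for p in Path("metadata_dir").glob("*.json")]
    # Batch the uploads so no single request carries all ~90,000 docs.
    for i in range(0, len(docs), 1000):
        post_docs(docs[i:i + 1000])
```

Because the script controls the document before it reaches Solr, the file
name (or anything else derivable from the path) can be added freely --
exactly the flexibility bin/post and the dataimport handler lack.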

Thanks,
Shawn
