I don't know whether DIH can solve your problem, but I would write
a simple custom ETL program in Java and use SolrJ for loading.
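
A rough sketch of what I mean, under some assumptions: it uses only the JDK (11+) and posts to Solr's JSON update endpoint over plain HTTP instead of SolrJ, so that it runs without extra dependencies. The class name, the field name "filename", the command-line arguments and the update URL are all made up for illustration. The string-based insert only works if each metadata file holds a single JSON object. With SolrJ you would instead build a SolrInputDocument, call addField("filename", ...) on it and hand it to the client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical loader: adds each JSON file's base name as a "filename" field.
final class JsonFileNameLoader {

    // File name without its extension, e.g. "doc_0001.json" -> "doc_0001".
    static String baseName(Path file) {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        return dot > 0 ? name.substring(0, dot) : name;
    }

    // Escape backslashes and quotes so the name is a valid JSON string.
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Insert a "filename" member at the start of a single JSON object.
    // Assumption: the input is one object, not an array or a bare value.
    static String withFileName(String json, String fileName) {
        String trimmed = json.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IllegalArgumentException("expected a single JSON object");
        }
        String rest = trimmed.substring(1).trim(); // everything after '{'
        String member = "\"filename\":\"" + jsonEscape(fileName) + "\"";
        return rest.equals("}") ? "{" + member + "}" : "{" + member + "," + rest;
    }

    // POST a JSON body to the given Solr update URL and return the HTTP status.
    static int post(HttpClient http, String updateUrl, String body)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
        return http.send(request, HttpResponse.BodyHandlers.ofString()).statusCode();
    }

    public static void main(String[] args) throws Exception {
        Path jsonDir = Paths.get(args[0]); // directory with the metadata files
        String updateUrl = args[1];        // e.g. http://localhost:8983/solr/<collection>/update
        HttpClient http = HttpClient.newHttpClient();
        try (DirectoryStream<Path> files = Files.newDirectoryStream(jsonDir, "*.json")) {
            for (Path file : files) {
                String doc = withFileName(Files.readString(file), baseName(file));
                int status = post(http, updateUrl, "[" + doc + "]"); // array of one document
                if (status != 200) {
                    System.err.println("Failed (" + status + "): " + file);
                }
            }
        }
        post(http, updateUrl, "{\"commit\":{}}"); // one commit at the end
    }
}
```

The base name is stored without the extension so that it matches the name of the corresponding txt file. For 90000 files you would probably want to send the documents in batches rather than one request per file.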

Best regards,
Bernd


On 18.05.2018 at 21:47, S.Ashwath wrote:
Hello,

I have 2 directories: one with txt files and the other with the corresponding
JSON (metadata) files (around 90000 of each). There is one JSON file for
each txt file, and they share the same name (they don't share any other
fields).

The txt files just have plain text. I mapped each line to a field called
'sentence' and included the file name as a field using the data import
handler. No problems here.

The JSON file has metadata: 3 tags: a URL, author and title (for the
content in the corresponding txt file).
When I index the JSON file (I just used the _default schema, and posted the
fields to the schema, as explained in the official Solr tutorial), *I don't
know how to get the file name into the index as a field.* As far as I know,
there's no way to use the data import handler for JSON files. I've read that
I can pass a literal through the bin/post tool, but again, as far as I
understand, I can't pass in the file name dynamically as a literal.

I NEED to get the file name, it is the only way in which I can associate
the metadata with each sentence in the txt files in my downstream Python
code.

So if anybody has a suggestion about how I should index the JSON file name
along with the JSON content (or even some workaround), I'd be eternally
grateful.

Regards,

Ash
