Re: Solr index lot of pdf, doc, txt

Alexandre Rafalovitch Wed, 17 Jul 2013 09:22:13 -0700

You don't seem to be too creative with your doc_id values, so perhaps you
can use Solr 4's post.jar recursive option:
http://wiki.apache.org/solr/ExtractingRequestHandler#SimplePostTool_.28post.jar.29
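
As a rough sketch of what that could look like, assuming post.jar is in the current directory (it ships in example/exampledocs) and Solr runs on the default localhost:8983. The function name index_dir and the DRY_RUN switch are made up for illustration; -Dauto, -Drecursive and -Dfiletypes are the tool's own system properties.

```shell
# Index every supported file under a directory with Solr 4's post.jar.
# index_dir and DRY_RUN are hypothetical names used only in this sketch.
index_dir() {
    docs_dir=$1
    # -Dauto=yes       guess the content type from each file's extension
    # -Drecursive=yes  descend into subdirectories
    # -Dfiletypes=...  restrict the run to the three types in question
    set -- java -Dauto=yes -Drecursive=yes -Dfiletypes=pdf,docx,txt \
        -jar post.jar "$docs_dir"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"    # print the command instead of running it
    else
        "$@"         # run the tool against the live Solr instance
    fi
}
```

Calling `index_dir /opt/solr/documents` would post the whole tree in one go. As far as I know, auto mode takes the document id from the file path rather than from a doc_id literal, so check that against your schema's unique key before relying on it.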


Otherwise, you need to correlate the ID and the source file somehow, so you
probably need a file with ID and location fields and then use
DataImportHandler with nested entities to do so.
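
If setting up DataImportHandler is more than you want right now, the same ID-to-file correlation can be sketched in plain shell around the curl calls you already have. This is not the DIH approach, just a stand-in for it. The manifest layout (one "doc_id<TAB>path" pair per line), the SOLR_URL variable and both function names are assumptions for illustration.

```shell
# Sketch: index files listed in a manifest of "doc_id<TAB>path" lines.
# SOLR_URL, the manifest layout, and both function names are hypothetical.
SOLR_URL=${SOLR_URL:-http://localhost:8983/solr}

# Build the extract URL for one file. Assumes the id and the path contain
# only URL-safe characters (no spaces, '&', or '#').
extract_url() {
    printf '%s/update/extract?stream.file=%s&literal.doc_id=%s\n' \
        "$SOLR_URL" "$2" "$1"
}

# Post every manifest entry, then commit once instead of once per file.
index_manifest() {
    while IFS="$(printf '\t')" read -r id path; do
        [ -n "$id" ] || continue    # skip blank lines
        curl -sSf "$(extract_url "$id" "$path")" || return 1
    done < "$1"
    curl -sSf "$SOLR_URL/update?commit=true"
}
```

With a manifest such as files.tsv, `index_manifest files.tsv` would send one extract request per line. Committing once at the end, rather than passing commit=true on every request, should matter more as the number of documents grows.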

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Wed, Jul 17, 2013 at 12:15 PM, sodoo <first...@yahoo.com> wrote:

> Hi guys.
>
> I need to index a lot of PDF, DOC, and TXT files.
> Right now I index them manually with the commands below.
>
> ######### PDF INDEX
> curl "http://localhost:8983/solr/update/extract?stream.file=/opt/solr/documents/test.pdf&literal.doc_id=pdf_1&commit=true"
>
> ######### TXT INDEX
> curl "http://localhost:8983/solr/update/extract?stream.file=/opt/solr/documents/test1.txt&literal.doc_id=txt_1&commit=true"
>
> ######### WORD DOC INDEX
> curl "http://localhost:8983/solr/update/extract?stream.file=/opt/solr/documents/test2.docx&literal.doc_id=doc_1&commit=true"
>
> But this is a bad solution, because I have almost 100 PDF, 200 DOCX, and 50
> TXT files, and more documents are added every day.
>
> I need a good solution.
>
> Please assist me with this and advise me.
>
> Thanks.
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-index-lot-of-pdf-doc-txt-tp4078651.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
