Re: ContentTypes supported by Solr to index

2015-04-15 Thread Jack Krupansky
Check to see if there are any errors in the Solr log for jpg and zip files. Solr should do something for them - if not, file a Jira to suggest that it should, as an improvement. Zip should give a list of the enclosed files. Images should at least give the metadata. -- Jack Krupansky On Wed, Apr 1
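
Besides checking the log, one way to see what Solr Cell extracts from a single file is to send it to the extracting handler with extractOnly=true, which returns the Tika output without indexing anything. The following is a minimal sketch, not a tested setup: the host, the core name "collection1", the generic content type, and the helper names build_extract_only_url and extract_only are all assumptions made for illustration.

```python
from urllib.parse import urlencode
import urllib.request


def build_extract_only_url(solr_base, core, extract_format="text"):
    """Build an /update/extract URL that asks Solr Cell to return the
    extracted content and metadata without indexing the document."""
    params = {
        "extractOnly": "true",            # return the Tika output, do not index
        "extractFormat": extract_format,  # "text" or "xml"
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/{core}/update/extract?{urlencode(params)}"


def extract_only(path,
                 solr_base="http://localhost:8983/solr",  # assumed host
                 core="collection1",                      # assumed core name
                 content_type="application/octet-stream"):
    """POST the raw file bytes to Solr and return the response body.
    Requires a running Solr instance with the extracting handler enabled."""
    with open(path, "rb") as fh:
        body = fh.read()
    request = urllib.request.Request(
        build_extract_only_url(solr_base, core),
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling extract_only("sample.zip") or extract_only("photo.jpg") (hypothetical file names) against a running instance should show whether any text or metadata comes back for those types before looking for them in query results.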

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Andrea. For image files and zip files, even metadata is not available. Just to explain further, I have indexed a total of 10 files, including one .jpg file and one .zip file. After the indexing process is complete, no information about either of these files is present in the solr q

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Andrea Gazzarini
Sorry, attachments are not supported here :( Anyway, I believe the misunderstanding lies in what you mean by "image indexing": actually, AFAIK, Tika indexes only a) the textual content of a given resource b) its metadata. So - for a JPG file (or in general, an image) you will

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Vijaya Narayana Reddy Bhoomi Reddy
Thanks Andrea. I can see that Tika 1.5 supports both compressed (ZIP) and image (JPG) formats. If that's the case, why could SolrCell not index the .zip and .jpg documents? Am I missing something here? No error is thrown in the overall process and the Java program completes successfully. But when

Re: ContentTypes supported by Solr to index

2015-04-15 Thread Andrea Gazzarini
Hi Vijay, here you can find all the formats supported by Tika, which is used internally by SolrCell: * https://tika.apache.org/1.4/formats.html * https://tika.apache.org/1.5/formats.html * https://tika.apache.org/1.6/formats.html * https://tika.apache.org/1.7/formats.html Best, Andrea

ContentTypes supported by Solr to index

2015-04-15 Thread Vijaya Narayana Reddy Bhoomi Reddy
Hi, I am trying to index various binary file types into Solr. However, some file types seem to be ignored and are not getting indexed, though the metadata is being extracted successfully for all the types. Specifically, zip files and jpg files are not getting indexed, whereas pdf, MS Office document