1) Being aggressive and insulting is not a way to help people understand such a complex tool, or to help people in general.
2) I read the Solr feature page again, and it states that the interface is REST-like, not RESTful as I first thought and told the devs. As the devs told me, a RESTful interface doesn't use parameters in the URI/URL, so it is my mistake. Hence we have no problem with the interface as it is.

Anyway, I still have a question regarding the /extract interface. It seems that every time a file is updated in Solr, the Lucene document is recreated from scratch, which means that any extra information we want indexed/stored along with the file is erased if the request doesn't contain it. Is there a parameter that allows changing that behaviour?

Regards,

Roland.

On Tue, Jun 11, 2013 at 4:35 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> "is it possible to index the file + metadata with a JSON/XML request?"
>
> You still aren't being clear as to what you are really trying to achieve
> here. I mean, just write a shell script that does the curl command, or
> write a Java program or application layer that uses SolrJ to talk to Solr
> and accepts JSON/XML/REST requests.
>
>
> "It seems that the only way to index a file with some metadata is to build a
> request that would look like the following example that uses curl."
>
> Curl is just a fancy way to do an HTTP request. You can do the same HTTP
> request from Java code (or Python or whatever.)
>
>
> "The developer would like to avoid using parameters in the url to pass
> arguments."
>
> Seriously?! What is THAT all about!! I mean, really, HTTP and URLs and
> URL query parameters are part of the heart of the Internet infrastructure!
>
> If this whole thread is merely that you have an IDIOT who can't cope with
> passing HTTP URL query parameters, all I can say is... Wow!
>
> But use SolrJ and then at least it doesn't LOOK like they are URL Query
> parameters.
>
> Or, maybe this is just a case where the developer WANTS to use SOAP rather
> than a REST style of API.
>
> In any case, please clue us in as to what PROBLEM you are really trying to
> solve. Just use plain English and avoid getting caught up in what the
> solution might be.
>
> The real bottom line is that random application developers should not be
> talking directly to Solr anyway - they should be provided with an
> "application layer" that has a clean, application-oriented REST API and the
> gory details of the Solr API would be hidden inside the application layer.
>
>
> -- Jack Krupansky
>
> -----Original Message----- From: Roland Everaert
> Sent: Tuesday, June 11, 2013 8:48 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Adding pdf/word file using JSON/XML
>
> We are working on an application that allows some users to add files (pdf,
> ms word, odt, etc), located on their local hard disk, to our internal
> system and allows other users to search for them. So we are considering
> Solr for the indexing and search functionalities of the system. Along with
> the file content, we want to index some metadata related to the file.
>
> It seems obvious that Solr couldn't import the file from the local disk of
> the user, so the system will have to import the file into a directory that
> Solr can reach and instruct Solr to index the file with the metadata, but
> is it possible to index the file + metadata with a JSON/XML request?
>
> It seems that the only way to index a file with some metadata is to build a
> request that would look like the following example that uses curl. The
> developer would like to avoid using parameters in the url to pass
> arguments.
>
> curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text"
> --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>
>
> Additionally, it seems that if a subsequent request is sent to the indexer
> to update the file, if the metadata are not passed to Solr with the
> request, they are deleted.
>
> Thanks for your help,
>
>
> Roland.
>
>
> On Mon, Jun 10, 2013 at 4:14 PM, Jack Krupansky <j...@basetechnology.com> wrote:
>
>> Sorry, but you are STILL not being clear!
>>
>> Are you asking if you can pass Solr parameters as XML fields? No.
>>
>> Are you asking if the file name and path can be indexed as metadata? To
>> some degree:
>>
>> curl "http://localhost:8983/solr/update/extract?literal.id=doc-1\
>> &commit=true&uprefix=attr_" -F "HelloWorld.docx=@HelloWorld.docx"
>>
>> Then the stream has a name that is indexed as metadata:
>>
>> <arr name="attr_meta">
>> <str>stream_source_info</str>
>> <str>HelloWorld.docx</str>
>> <str>stream_content_type</str>
>> <str>application/octet-stream</str>
>> <str>stream_size</str>
>> <str>10096</str>
>> <str>stream_name</str>
>> <str>HelloWorld.docx</str>
>> <str>Content-Type</str>
>> <str>application/vnd.openxmlformats-officedocument.wordprocessingml.document</str>
>> </arr>
>>
>> and
>>
>> <arr name="attr_stream_source_info">
>> <str>HelloWorld.docx</str>
>> </arr>
>>
>> <arr name="attr_stream_name">
>> <str>HelloWorld.docx</str>
>> </arr>
>>
>> Or, what is it that you are really trying to do?
>>
>> Simply tell us in plain language what problem you are trying to solve.
>>
>> -- Jack Krupansky
>>
>> -----Original Message----- From: Roland Everaert
>> Sent: Monday, June 10, 2013 9:23 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Adding pdf/word file using JSON/XML
>>
>>
>> Sorry if it was not clear.
>>
>> What I would like is to know how to construct an XML/JSON request that
>> provides any necessary information (supposedly the full path on disk) to
>> solr to retrieve and index a pdf/ms word document.
>>
>> So, an XML request could look like this:
>>
>> <add>
>> <doc>
>> <field name="id">doc10</field>
>> <field name="name">BLAH</field>
>> <field name="path">/path/to/file.pdf</field>
>> </doc>
>> </add>
>>
>>
>> Regards,
>>
>>
>> Roland.
>>
>>
>> On Mon, Jun 10, 2013 at 3:12 PM, Gora Mohanty <g...@mimirtech.com> wrote:
>>
>>> On 10 June 2013 17:47, Roland Everaert <reveatw...@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > Based on the wiki, below is an example of how I am currently adding a pdf
>>> > file with an extra field called name:
>>> > curl "http://localhost:8080/solr/update/extract?literal.id=doc10&literal.name=BLAH&defaultField=text"
>>> > --data-binary @/path/to/file.pdf -H "Content-Type: application/pdf"
>>> >
>>> > Is it possible to add a file + any extra fields using a JSON or XML
>>> > request?
>>>
>>> It is not entirely clear what you are asking. Do you mean
>>> can one do the same as your example above for a PDF
>>> file, but with an XML or JSON file? If so, yes. Please see
>>> the examples in example/exampledocs/ of a Solr source
>>> tree, and
>>> http://wiki.apache.org/solr/ExtractingRequestHandler
>>>
>>> Regards,
>>> Gora
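
As the thread points out, curl is only one way to issue the HTTP request to /update/extract. Below is a minimal Python sketch of the same request, assuming the base URL, id, and field name from the curl example above. The helper name build_extract_request is hypothetical, and the request is only built, not sent. Since the thread reports that metadata left out of an update is lost, the sketch takes the complete metadata set on every call, so an application layer could keep the metadata and re-send it with each file update.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_extract_request(base_url, doc_id, file_bytes, content_type, metadata):
    """Build (but do not send) a POST to Solr's /update/extract handler.

    Hypothetical helper: it mirrors the curl example from the thread,
    passing each extra field as a literal.<name> query parameter.
    """
    params = {"literal.id": doc_id}
    # Assumption: the caller supplies ALL metadata on every update,
    # because the thread reports that omitted fields are lost.
    for name, value in metadata.items():
        params["literal." + name] = value
    url = base_url + "/update/extract?" + urlencode(params)
    return Request(
        url,
        data=file_bytes,
        headers={"Content-Type": content_type},
        method="POST",
    )


# Illustrative values taken from the curl example; the file body is a stub.
req = build_extract_request(
    "http://localhost:8080/solr",
    "doc10",
    b"%PDF-1.4 stub",
    "application/pdf",
    {"name": "BLAH"},
)
print(req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it to a running Solr instance; the URL it produces has the same literal.* parameters as the curl command, just assembled in code.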