Hey,

thanks!  This is good stuff.  I didn't expect you to just make the fix!

If I can find the bandwidth, I'd like to make something which allows
file uploads via the XMLUpdateHandler as well... Do you have any ideas
here?  I was thinking we could just send the XML payload as another
POST field.

Would this work?
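
To make the idea concrete, here is a rough sketch of what the request body
might look like.  It builds a multipart/form-data body by hand, with the
repeated ext.literal.features fields from your form, the file part, and the
XML payload riding along as one more field.  The class name MultipartSketch,
the field name "xml", the boundary string and the sample values are all made
up for illustration; nothing in this thread says the extract handler reads an
"xml" field today.  The file content is a plain string here to keep the
sketch short, while a real upload would write raw bytes.

```java
import java.util.ArrayList;
import java.util.List;

/** Hand-rolled multipart/form-data body; a sketch, not a Solr or SolrJ API. */
class MultipartSketch {

    /** One form field. The same name may be added more than once. */
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    /** Emits every field in order, then the file part, then the closing boundary. */
    static String build(String boundary, List<Field> fields,
                        String fileName, String fileContent) {
        StringBuilder body = new StringBuilder();
        for (Field f : fields) {
            body.append("--").append(boundary).append("\r\n")
                .append("Content-Disposition: form-data; name=\"")
                .append(f.name).append("\"\r\n\r\n")
                .append(f.value).append("\r\n");
        }
        body.append("--").append(boundary).append("\r\n")
            .append("Content-Disposition: form-data; name=\"file\"; filename=\"")
            .append(fileName).append("\"\r\n")
            .append("Content-Type: application/octet-stream\r\n\r\n")
            .append(fileContent).append("\r\n")
            .append("--").append(boundary).append("--\r\n");
        return body.toString();
    }

    public static void main(String[] args) {
        List<Field> fields = new ArrayList<>();
        // Repeating a name is how a multi-valued literal is expressed.
        fields.add(new Field("ext.literal.features", "solr"));
        fields.add(new Field("ext.literal.features", "cool"));
        fields.add(new Field("ext.def.fl", "text"));
        // Hypothetical field carrying the XML payload (an assumption, not an existing parameter).
        fields.add(new Field("xml", "<add><doc><field name=\"id\">1</field></doc></add>"));
        System.out.println(build("sketch-boundary", fields, "tutorial.pdf", "(file bytes)"));
    }
}
```

The handler would still need code on its side to pick that field up and parse
it, which is the part I'm asking about.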

Thanks again,

Jacob

On Sun, Dec 14, 2008 at 9:18 AM, Grant Ingersoll <gsing...@apache.org> wrote:
> Hi Jacob,
>
> I just updated the code such that it should now be possible to send in
> multiple values as literals, as in an HTML form that looks like:
>
> <form enctype="multipart/form-data" action="/solr/update/extract"
> method="POST">
>  <input name="ext.literal.features" value="solr"/>
>  <input name="ext.literal.features" value="cool"/>
>  <input name="ext.def.fl" value="text"/>
> Choose a file to upload: <input name="file" type="file" /><br />
> <input type="submit" value="Upload File" />
> </form>
>
> Cheers,
> Grant
>
> On Dec 12, 2008, at 11:53 PM, Jacob Singh wrote:
>
>> Hi Grant,
>>
>> Thanks for the quick response.  My colleague looked into the code a
>> bit, and so did I.  Here is what I see (my Java sucks):
>>
>>
>> http://svn.apache.org/repos/asf/lucene/solr/trunk/contrib/extraction/src/main/java/org/apache/solr/handler/extraction/SolrContentHandler.java
>> //handle the literals from the params
>>   Iterator<String> paramNames = params.getParameterNamesIterator();
>>   while (paramNames.hasNext()) {
>>     String name = paramNames.next();
>>     if (name.startsWith(LITERALS_PREFIX)) {
>>       String fieldName = name.substring(LITERALS_PREFIX.length());
>>       //no need to map names here, since they are literals from the user
>>       SchemaField schFld = schema.getFieldOrNull(fieldName);
>>       if (schFld != null) {
>>         String value = params.get(name);
>>         boost = getBoost(fieldName);
>>         //no need to transform here, b/c we can assume the user sent it in correctly
>>         document.addField(fieldName, value, boost);
>>       } else {
>>         handleUndeclaredField(fieldName);
>>       }
>>     }
>>   }
>>
>>
>> I don't know the solr source quite well enough to know if
>> document.addField() can take a struct in the form of some serialized
>> string, but how can I pass a multi-valued field via a
>> file-upload/multi-part POST?
>>
>> One idea: as one of the POST fields, I could add an XML payload of the
>> kind the XML handler already parses.  We could then instantiate that
>> handler, pass in the doc by reference, and get its multi-valued fields
>> all populated nicely.  But this perhaps isn't a fantastic solution.  I'm
>> really not much of a Java programmer at all, so I would love to hear your
>> expert opinion on how to solve this.
>>
>> Best,
>> J
>>
>> On Fri, Dec 12, 2008 at 6:40 PM, Grant Ingersoll <gsing...@apache.org>
>> wrote:
>>>
>>> Hmmm, I think I see the disconnect, but I'm not sure.  Sending to the ERH
>>> (ExtractingRequestHandler) is not an XML command at all; it's a
>>> file-upload/multi-part encoding.  I think you will need an API that does
>>> something like:
>>>
>>> (Just making this up, this is not real code)
>>> File file = new File(fileToIndex);
>>> resp = solr.addFile(file, params);
>>> ----
>>>
>>> Where params contains the literals, captures, etc.  Then, in your API you
>>> need to do whatever PHP does to send that file as a multipart file (I
>>> think you can also POST it, too, but that has some downsides as described
>>> on the wiki).
>>>
>>> I'll try to whip up some SolrJ sample code, as I know others have asked
>>> for that.
>>>
>>> -Grant
>>>
>>> On Dec 12, 2008, at 5:34 AM, Jacob Singh wrote:
>>>
>>>> Hi Grant,
>>>>
>>>> Happy to.
>>>>
>>>> Currently we are sending over documents by building a big XML file of
>>>> all of the fields of that document. Something like this:
>>>>
>>>> $document = new Apache_Solr_Document();
>>>>  $document->id = apachesolr_document_id($node->nid);
>>>>  $document->title = $node->title;
>>>>  $document->body = strip_tags($text);
>>>>  $document->type  = $node->type;
>>>>  foreach ($categories as $cat) {
>>>>    $document->setMultiValue('category', $cat);
>>>>  }
>>>>
>>>> The PHP Client library then takes all of this, and builds it into an
>>>> XML payload which we POST over to Solr.
>>>>
>>>> When we implement rich file handling, I see these instructions:
>>>>
>>>> -----------------------------
>>>> Literals
>>>>
>>>> To add in your own metadata, pass in the literal parameter along with
>>>> the file:
>>>>
>>>> curl
>>>>
>>>> http://localhost:8983/solr/update/extract?ext.idx.attr=true\&ext.def.fl=text\&ext.map.div=foo_t\&ext.capture=div\&ext.boost.foo_t=3\&ext.literal.blah_i=1
>>>> -F "tutori...@tutorial.pdf"
>>>>
>>>> -----------------------------
>>>>
>>>> So it seems we can:
>>>>
>>>> a). Refactor the class to not generate XML, but rather to build a POST
>>>> field for each document field.  We would like to avoid this.
>>>> b). Instead, I was hoping we could send the XML payload with all the
>>>> literal fields defined (like id, type, etc.), plus the POST fields
>>>> required for the file content and the field it belongs to, in one
>>>> request.
>>>>
>>>> Since my understanding is that docs in Solr are immutable, there is no:
>>>> c). Send the file contents over, give it an ID, and then send over the
>>>> rest of the fields and merge into that ID.
>>>>
>>>> If the unfortunate answer is a, then how do we deal with multi-value
>>>> fields?  I don't know how to format them given the ext.literal format
>>>> above.
>>>>
>>>> Thanks for your help and awesome contributions!
>>>>
>>>> -Jacob
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Dec 12, 2008 at 4:52 AM, Grant Ingersoll <gsing...@apache.org>
>>>> wrote:
>>>>>
>>>>> On Dec 10, 2008, at 10:21 PM, Jacob Singh wrote:
>>>>>
>>>>>> Hey folks,
>>>>>>
>>>>>> I'm looking at implementing ExtractingRequestHandler in the
>>>>>> Apache_Solr_PHP library, and I'm wondering what we can do about adding
>>>>>> meta-data.
>>>>>>
>>>>>> I saw the docs, which suggest you use different POST fields to pass
>>>>>> field values along with ext.literal.  Is there any way to use the
>>>>>> XmlUpdateHandler instead, along with a document?  I'm not sure how this
>>>>>> would work.  Perhaps it would require 2 trips, or perhaps the XML would
>>>>>> be in the post "content" and the file in something else?  The thing is
>>>>>> we would need to refactor the class pretty heavily in this case when
>>>>>> indexing RichDocs, and we were hoping to avoid it.
>>>>>>
>>>>>
>>>>> I'm not sure I follow how the XmlUpdateHandler plays in.  Can you explain
>>>>> a little more?  My PHP is weak, but maybe some code will help...
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Jacob
>>>>>> --
>>>>>>
>>>>>> +1 510 277-0891 (o)
>>>>>> +91 9999 33 7458 (m)
>>>>>>
>>>>>> web: http://pajamadesign.com
>>>>>>
>>>>>> Skype: pajamadesign
>>>>>> Yahoo: jacobsingh
>>>>>> AIM: jacobsingh
>>>>>> gTalk: jacobsi...@gmail.com
>>>
>>> --------------------------
>>> Grant Ingersoll
>>>
>>> Lucene Helpful Hints:
>>> http://wiki.apache.org/lucene-java/BasicsOfPerformance
>>> http://wiki.apache.org/lucene-java/LuceneFAQ



