Thanks Hoss,

  The issue mentioned describes a similar behavior to what I observed, but not 
quite.  Commons-fileupload creates java.io.File objects for the temp files, and 
when those Files are garbage collected, the temp file is deleted.  I've 
verified this by letting the temp files build up and then forcing a full 
collection which clears all of them.  So I think the reason a percentage of 
temp files built up in my system was that under heavy load, some of the 
java.io.Files made it into old gen in the heap.  I switched to G1, and the 
problem went away.
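
For anyone following along, here is a minimal Java sketch of the general pattern I mean: a temp file whose deletion is tied to the garbage collection of some marker object. It is an illustration using only the JDK, not the actual commons-fileupload code, and the TempFileReaper class and its method names are hypothetical.

```java
import java.io.IOException;
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: deletes a temp file once its marker object has been collected.
final class TempFileReaper {

    // Phantom reference that remembers which file belongs to the marker.
    private static final class Tracked extends PhantomReference<Object> {
        final Path path;

        Tracked(Object marker, Path path, ReferenceQueue<Object> queue) {
            super(marker, queue);
            this.path = path;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();

    // Keeps the reference objects themselves strongly reachable until they are reaped.
    private final Set<Tracked> live = ConcurrentHashMap.newKeySet();

    // Register a file to be deleted after 'marker' becomes unreachable and is collected.
    Reference<Object> track(Object marker, Path path) {
        Tracked tracked = new Tracked(marker, path, queue);
        live.add(tracked);
        return tracked;
    }

    // Delete the files of all markers the collector has enqueued so far.
    // Returns the number of files actually deleted.
    int reap() throws IOException {
        int deleted = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            Tracked tracked = (Tracked) ref;
            live.remove(tracked);
            if (Files.deleteIfExists(tracked.path)) {
                deleted++;
            }
        }
        return deleted;
    }
}
```

With this pattern, a file is only removed after a collection has actually processed its marker. A marker that has been promoted to old gen is not noticed by young collections, so its file lingers until old gen is collected, which would be consistent with the build-up I saw.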

Regarding how the XML files are being sent, I have verified that each XML 
file is sent as a single request, by aligning the access log of my Solr master 
server with the processing log of my SolrJ server.  I didn't test the requests 
to see if the MIME type is multipart, but I suppose it is possible if some 
other form data or instruction needed to be passed with it.  Either way, I 
suppose it would go through fileupload anyway, because somebody's got to make a 
temp file for large files, right?
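
One way to settle the multipart question would be to look at the Content-Type header of the incoming update requests. Below is a small sketch of such a check, assuming the usual convention that multipart requests carry a Content-Type starting with "multipart/". The MultipartCheck class is a hypothetical helper, not part of Solr or commons-fileupload.

```java
import java.util.Locale;

// Hypothetical helper for classifying a request by its Content-Type header value.
final class MultipartCheck {

    private MultipartCheck() {
    }

    // Returns true when the header value denotes a multipart request,
    // e.g. "multipart/form-data; boundary=...". A missing header counts as not multipart.
    static boolean isMultipart(String contentType) {
        if (contentType == null) {
            return false;
        }
        return contentType.trim().toLowerCase(Locale.ROOT).startsWith("multipart/");
    }
}
```

Feeding the Content-Type values from a request log through isMultipart would show whether the updates are arriving as multipart or as a raw body.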

Ryan
________________________________________
From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Wednesday, January 16, 2013 6:06 PM
To: solr-user@lucene.apache.org
Subject: RE: SolrJ DirectXmlRequest

: DirectXmlRequest is part of the SolrJ library, so I guess that means it
: is not commonly used.  My use case is that I'm applying an XSLT to the
: raw XML on the client side, instead of leaving that up to the Solr
: master (although even if I applied the XSLT on the Solr server, I'd

I think Otis's point was that most people don't have Solr XML files lying
around that they send to Solr, nor do they build up XML strings in Java
in the Solr input format (with XSLT or otherwise) ... most people using
SolrJ build up SolrInputDocument objects and pass those to their
SolrServer instance.

: I've done some research and I'm fairly confident that apache
: commons-fileupload library is responsible for the temp files.  There's

I believe you are correct ... searching for "solr fileupload temp files"
led me to this issue which seems to have fallen by the wayside...

        https://issues.apache.org/jira/browse/SOLR-1953

...if you could try that patch out and/or post your comments it would be
helpful.

Something that seems really odd to me however is how/why your basic
updates are even causing multipart/file-upload functionality to be used
... a quick skim of the client code suggests that that should only happen
if you try to send multiple ContentStreams in a single request: I can
understand why that wouldn't typically happen for most users building up
multiple SolrInputDocuments (they would get added to a single stream); and
i can understand why that would typically happen for users sending
multiple binary files to something like ExtractingRequestHandler -- but if
you are using DirectXmlRequest in the way you described each xml file
should be sent as a single stream in a single request and the XML should
be sent in the raw POST body -- the commons-fileupload code shouldn't even
come into play.  (either that, or i'm missing something, or you're using
an older version of solr that used fileupload even if there was only a
single content stream)
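
To illustrate what "sent in the raw POST body" means at the HTTP level, here is a rough sketch that builds such a request with the JDK's java.net.http API (Java 11+). It is not SolrJ code; the RawXmlPost class is hypothetical, and the update URL passed in is an assumption about the deployment.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one XML document, one request, no multipart wrapper.
final class RawXmlPost {

    private RawXmlPost() {
    }

    // Builds (but does not send) a POST whose body is the XML itself.
    // 'updateUrl' is an assumed endpoint, e.g. a Solr /update handler.
    static HttpRequest build(String updateUrl, String xml) {
        return HttpRequest.newBuilder(URI.create(updateUrl))
                .header("Content-Type", "text/xml; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
                .build();
    }
}
```

A request shaped like this has a non-multipart Content-Type, so a server that only routes "multipart/" requests through commons-fileupload would not involve it here.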


-Hoss
