Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
Ok, it seems like ManifoldCF is working correctly here, no? Karl On Mon, Jan 14, 2013 at 5:24 PM, Ahmet Arslan wrote: > Hello, > > I increased these settings of jetty/solr : > > 147483647 > 147483647 > > Now I can index all 130 aspx files (with all metadata) with security disabled. > > Thanks, >

[jira] [Commented] (CONNECTORS-608) Solr connector gets socket timeouts on slow documents

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553205#comment-13553205 ] Karl Wright commented on CONNECTORS-608: Please synch up again. Another fix f

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Ahmet Arslan
Hello, I increased these settings of jetty/solr : 147483647 147483647 Now I can index all 130 aspx files (with all metadata) with security disabled. Thanks, Ahmet --- On Mon, 1/14/13, Ahmet Arslan wrote: > From: Ahmet Arslan > Subject: Re: Repeated service interruptions - failure processin

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Ahmet Arslan
Hi Karl, I tracked the problem down to this: a metadata field is causing it. If I select only the ID metadata (normally I select all of these: Created, FileLeafRef, ID, IKAccessGroup, IKContentType, IKDocuments, IKExpertise, IKExplanation, IKFAQ, IKImportant, Modified, Title) all aspx files are indexed succ

[jira] [Commented] (CONNECTORS-608) Solr connector gets socket timeouts on slow documents

2013-01-14 Thread David Morana (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13553118#comment-13553118 ] David Morana commented on CONNECTORS-608: - I built and ran connectors-608. Un

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
Let's try to figure out why we can't index streamed data from these .aspx files. Can you add enough debugging output to figure out what the connector is actually trying to stream to Solr? In order to do that you may well need to write a class that wraps the input stream that is handed to Solr wit
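The stream-wrapping idea in this message can be sketched roughly as below. This is only an illustration under assumptions: the class name CountingInputStream and its accessors are hypothetical and are not part of ManifoldCF or SolrJ. The wrapper simply counts the bytes the consumer actually pulls from the stream and notes whether end-of-stream was reached, which is one way to see what is really being streamed to Solr.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

/**
 * Hypothetical debugging wrapper (not a ManifoldCF class): counts the bytes
 * that a consumer reads from the wrapped stream and records whether the
 * consumer ever saw end-of-stream.
 */
class CountingInputStream extends FilterInputStream {
    private long bytesRead = 0L;
    private boolean reachedEof = false;

    CountingInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        if (b >= 0) {
            bytesRead++;
        } else {
            reachedEof = true;
        }
        return b;
    }

    // FilterInputStream.read(byte[]) delegates to this method, so bulk reads
    // are counted exactly once.
    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) {
            bytesRead += n;
        } else if (n < 0) {
            reachedEof = true;
        }
        return n;
    }

    /** Total number of bytes handed to the consumer so far (skips are not counted). */
    long getBytesRead() {
        return bytesRead;
    }

    /** True once a read call has returned -1. */
    boolean reachedEof() {
        return reachedEof;
    }
}
```

To use it, wrap the document stream before it is passed on and log getBytesRead() and reachedEof() once the request finishes or fails. A count that stops short of the expected document length would suggest the receiving side closed the connection early.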

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Ahmet Arslan
Hi Karl, I think people may want to index the content of aspx files, so treating them specially may not be a good solution. In our environment, aspx files are used to construct a web site that is used internally. In my understanding this is one of the use cases of SharePoint. In our case content of aspx fi

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
It's also possible that the getACLs() problem has to do with these files. Apparently you can't get the permissions for them. In that case, if security is on, we can't index them, because we can't get valid ACLs. Karl On Mon, Jan 14, 2013 at 11:46 AM, Karl Wright wrote: > Hi Ahmet, > > We could

[jira] [Resolved] (CONNECTORS-611) SharePoint connector throws NPE getting permissions in some cases

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-611. Resolution: Fixed > SharePoint connector throws NPE getting permissions in some cas

[jira] [Commented] (CONNECTORS-611) SharePoint connector throws NPE getting permissions in some cases

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552954#comment-13552954 ] Karl Wright commented on CONNECTORS-611: r1433020, to skip documents that don'

[jira] [Commented] (CONNECTORS-611) SharePoint connector throws NPE getting permissions in some cases

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552952#comment-13552952 ] Karl Wright commented on CONNECTORS-611: It looks like the basic problem is th

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
Hi Ahmet, We could treat .aspx files specially, so that they are considered to never have any content. But are there cases where someone might want to index any content that these URLs might return? Specifically, what do .aspx "files" typically contain, when found in a SharePoint hie

[jira] [Commented] (CONNECTORS-611) SharePoint connector throws NPE getting permissions in some cases

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552857#comment-13552857 ] Karl Wright commented on CONNECTORS-611: The Axis-generated code looks like th

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Ahmet Arslan
Hi Karl, Now 39 aspx files (out of 130) are indexed. The job didn't get killed. No exceptions in the log. I increased the maximum POST size of solr/jetty but the number 39 didn't increase. I will check the sizes of the remaining 130 - 39 *.aspx files. Actually I am mapping extracted content of this

[jira] [Commented] (CONNECTORS-611) SharePoint connector throws NPE getting permissions in some cases

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552841#comment-13552841 ] Karl Wright commented on CONNECTORS-611: Hi Ahmet, Since the exchange in ques

[jira] [Updated] (CONNECTORS-611) SharePoint connector throws NPE getting permissions in some cases

2013-01-14 Thread Ahmet Arslan (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-611?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ahmet Arslan updated CONNECTORS-611: Attachment: Archive.zip I added these two lines to logging.ini file {code} log4j.logge

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
I checked in a fix for this ticket on trunk. Please let me know if it resolves this issue. Karl On Mon, Jan 14, 2013 at 10:20 AM, Karl Wright wrote: > This is because httpclient is retrying three times on error by > default. This has to be disabled in the Solr connector, or the rest > of t

[jira] [Resolved] (CONNECTORS-610) Solr connector should disable httpclient retries, or errors will always cause a job abort

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-610. Resolution: Fixed > Solr connector should disable httpclient retries, or errors wil

[jira] [Commented] (CONNECTORS-610) Solr connector should disable httpclient retries, or errors will always cause a job abort

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552796#comment-13552796 ] Karl Wright commented on CONNECTORS-610: r1432958. > Solr co

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
Hmm, this makes no sense. The code is this: com.microsoft.sharepoint.webpartpages.GetPermissionCollectionResponseGetPermissionCollectionResult aclResult = aclCall.getPermissionCollection( encodedRelativePath, "Item" ); org.apache.axis.message.MessageElement[] aclList = aclResult.get_

[jira] [Created] (CONNECTORS-611) SharePoint connector throws NPE getting permissions in some cases

2013-01-14 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-611: -- Summary: SharePoint connector throws NPE getting permissions in some cases Key: CONNECTORS-611 URL: https://issues.apache.org/jira/browse/CONNECTORS-611 Project:

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
This is because httpclient is retrying three times on error by default. This has to be disabled in the Solr connector, or the rest of the logic won't work right. I've opened a ticket (CONNECTORS-610) for this problem too. Karl On Mon, Jan 14, 2013 at 10:13 AM, Ahmet Arslan wrote: > Hi Karl

[jira] [Created] (CONNECTORS-610) Solr connector should disable httpclient retries, or errors will always cause a job abort

2013-01-14 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-610: -- Summary: Solr connector should disable httpclient retries, or errors will always cause a job abort Key: CONNECTORS-610 URL: https://issues.apache.org/jira/browse/CONNECTORS-61

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Ahmet Arslan
Hi, If I enable security (Active Directory), the job seems to hang and I get this too: FATAL 2013-01-14 17:13:46,871 (Worker thread '15') - Error tossed: null java.lang.NullPointerException at org.apache.manifoldcf.crawler.connectors.sharepoint.SPSProxyHelper.getDocumentACLs(SPSProxyHelper.java

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Ahmet Arslan
Hi Karl, Thanks for the quick fix. I am still seeing the following error after 'svn up' and 'ant build' ERROR 2013-01-14 17:09:41,949 (Worker thread '6') - Exception tossed: Repeated service interruptions - failure processing document: null org.apache.manifoldcf.core.interfaces.ManifoldCFException

[jira] [Commented] (CONNECTORS-609) Solr connector does not handle http code 413 properly

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13552657#comment-13552657 ] Karl Wright commented on CONNECTORS-609: r1432919 > Solr con

[jira] [Resolved] (CONNECTORS-609) Solr connector does not handle http code 413 properly

2013-01-14 Thread Karl Wright (JIRA)
[ https://issues.apache.org/jira/browse/CONNECTORS-609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karl Wright resolved CONNECTORS-609. Resolution: Fixed > Solr connector does not handle http code 413 properly > ---

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
CONNECTORS-609 Karl On Mon, Jan 14, 2013 at 8:30 AM, Karl Wright wrote: > Hi Ahmet, > > The exception that seems to be causing the abort is a socket exception > coming from a socket write: > >> Caused by: java.net.SocketException: Broken pipe > > This makes sense in light of the http code return

[jira] [Created] (CONNECTORS-609) Solr connector does not handle http code 413 properly

2013-01-14 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-609: -- Summary: Solr connector does not handle http code 413 properly Key: CONNECTORS-609 URL: https://issues.apache.org/jira/browse/CONNECTORS-609 Project: ManifoldCF

Re: Repeated service interruptions - failure processing document: null

2013-01-14 Thread Karl Wright
Hi Ahmet, The exception that seems to be causing the abort is a socket exception coming from a socket write: > Caused by: java.net.SocketException: Broken pipe This makes sense in light of the http code returned from Solr, which was 413: http://www.checkupdown.com/status/E413.html . So there i
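The distinction at issue here (a 413 concerns one oversized document, not a failing service) can be illustrated with a small sketch. This is an assumption about the general approach, not the change committed for CONNECTORS-609. The class SolrResponseTriage and its Outcome values are hypothetical names. Only HttpURLConnection.HTTP_ENTITY_TOO_LARGE (413) is a real JDK constant.

```java
import java.net.HttpURLConnection;

/**
 * Hypothetical helper (not ManifoldCF code): decides what an indexing client
 * might do with an HTTP status returned by the search server.
 */
final class SolrResponseTriage {

    enum Outcome {
        ACCEPTED,         // document was indexed
        REJECT_DOCUMENT,  // this document will never succeed; skip it, keep the job running
        RETRY_LATER       // possibly transient; treat as a service interruption
    }

    private SolrResponseTriage() {
    }

    static Outcome classify(int httpStatus) {
        if (httpStatus >= 200 && httpStatus < 300) {
            return Outcome.ACCEPTED;
        }
        // Assumption: 413 (Request Entity Too Large) means the server refused
        // the document because of its size, so retrying the same bytes is pointless.
        if (httpStatus == HttpURLConnection.HTTP_ENTITY_TOO_LARGE) {
            return Outcome.REJECT_DOCUMENT;
        }
        // Everything else is lumped together here for brevity.
        return Outcome.RETRY_LATER;
    }
}
```

With this split, classify(413) yields REJECT_DOCUMENT, so a single oversized document would not be retried until the job aborts with "Repeated service interruptions". A real connector would likely distinguish more status codes than this sketch does.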

Repeated service interruptions - failure processing document: null

2013-01-14 Thread Ahmet Arslan
Hello, I am indexing a SharePoint 2010 instance using mcf-trunk (at revision 1432907). There is no problem with a document library that contains Word, Excel, etc. files. However, I receive the following errors with a document library that has *.aspx files in it. Status of Jobs => Error: Repeated service