[jira] [Assigned] (CONNECTORS-1579) Error when crawling a MSSQL table

2019-02-05 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1579:
---

Assignee: Karl Wright

> Error when crawling a MSSQL table
> -
>
> Key: CONNECTORS-1579
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1579
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: JDBC connector
>Affects Versions: ManifoldCF 2.12
>Reporter: Donald Van den Driessche
>Assignee: Karl Wright
>Priority: Major
> Attachments: 636_bb2.csv
>
>
> When I'm crawling an MSSQL table through the JDBC connector, I get the following 
> error on multiple lines:
>  
> {noformat}
> FATAL 2019-02-05T13:21:58,929 (Worker thread '40') - Error tossed: Multiple 
> document primary component dispositions not allowed: document '636'
> java.lang.IllegalStateException: Multiple document primary component 
> dispositions not allowed: document '636'
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMultipleDispositions(WorkerThread.java:2125)
>  ~[mcf-pull-agent.jar:?]
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1624)
>  ~[mcf-pull-agent.jar:?]
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1605)
>  ~[mcf-pull-agent.jar:?]
> at 
> org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:944)
>  ~[?:?]
> at 
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) 
> [mcf-pull-agent.jar:?]{noformat}
> I looked this error up on the internet, and it may have something to do 
> with using the same key for different rows.
>  I checked, but I couldn't find any duplicate values in any of the selected 
> fields in the JDBC queries.
> Here are my queries:
>  Seeding query:
> {code:java}
> SELECT pk1 as $(IDCOLUMN)
> FROM dbo.bb2
> WHERE search_url IS NOT NULL
> AND mimetype IS NOT NULL AND mimetype NOT IN ('unknown/unknown', 
> 'application/xml', 'application/zip');
> {code}
> Version check query: none
>  Access token query: none
>  Data query: 
>  
>  
> {code:java}
> SELECT 
> pk1 AS $(IDCOLUMN), 
> search_url AS $(URLCOLUMN), 
> ISNULL(content, '') AS $(DATACOLUMN),
> doc_id, 
> search_url AS url, 
> ISNULL(title, '') as title, 
> ISNULL(groups,'') as groups, 
> ISNULL(type,'') as document_type, 
> ISNULL(users, '') as users
> FROM dbo.bb2
> WHERE pk1 IN $(IDLIST);
> {code}
> The attached CSV contains the corresponding row from the table.
> [^636_bb2.csv]
>  
> This problem holds up the whole crawling pipeline: the job keeps 
> retrying this row.
> Could you help me understand this error?
>  
>  
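For context, the connector records one "disposition" per document identifier per processing pass, and the IllegalStateException above fires when two are recorded for the same ID. That is what happens if, for example, the data query returns two rows with the same $(IDCOLUMN) value. A minimal illustration (not connector code; the IDs are hypothetical):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DispositionCheck {
    // Count how many result rows share each document ID; any count > 1
    // would produce multiple dispositions for that document.
    static Map<String, Integer> countRowsPerId(List<String> idColumnValues) {
        Map<String, Integer> counts = new HashMap<>();
        for (String id : idColumnValues)
            counts.merge(id, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical $(IDCOLUMN) values returned by the data query.
        List<String> ids = List.of("635", "636", "636", "637");
        countRowsPerId(ids).forEach((id, n) -> {
            if (n > 1)
                System.out.println("document '" + id + "' appears " + n + " times");
        });
        // → document '636' appears 2 times
    }
}
```

The same kind of per-ID count can be done directly against the table (GROUP BY pk1 HAVING COUNT(*) > 1) to rule out duplicates on the database side.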



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Question about programmatic configuration and connection pool management

2019-02-02 Thread Karl Wright
Hello,

I am contacting you on behalf of the Apache ManifoldCF project.  This
project consists of a crawler framework plus a number of prebuilt
connectors to various repositories and output indexes.

I've recently looked at using Apache CXF as a replacement for Axis for
development of new web-services based connectors.  Everything has been
going well until I realized that CXF seemingly takes away all control over
connection pooling from the user.  This is inconsistent with the ManifoldCF
model, unfortunately, where the connectors have user-configured maximum
numbers of outstanding connections for every kind of connection.

The specific model required by ManifoldCF is as follows: each connection
instance is set up by the framework, and is expected to maintain at most
one connection to the service.  The connections will be used when needed,
by the framework, and explicitly closed when idle long enough (at the
control of the connector writer), or explicitly closed, once again by the
framework.

For all of our projects using Web Services or HTTP in the past, we've used
HttpComponents/HttpClient, and had each connection set up its own
connection pool of size 1.  This has all the right characteristics.  Is
there any way to implement this model programmatically using
the AsyncHTTPConduit?  If not, is there any way I can easily create my own
conduit implementation that would allow such pool management?  (Otherwise,
the Async conduit appears to have everything I need, although it's a bit
clunky to configure.)
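The connection model described above can be sketched in plain Java, independent of any HTTP library. All names here are invented for illustration, not actual ManifoldCF or CXF API:

```java
import java.util.function.Supplier;

// Sketch of the model: one holder per connection instance, at most one
// live connection, opened lazily and closed either when idle too long
// (connector-writer-controlled) or explicitly by the framework.
class SingleConnectionHolder<T> {
    private final Supplier<T> opener;   // how to establish the connection
    private final long idleTimeoutMs;   // idle limit before closing
    private T connection;               // at most one live connection
    private long lastUsedMs;

    SingleConnectionHolder(Supplier<T> opener, long idleTimeoutMs) {
        this.opener = opener;
        this.idleTimeoutMs = idleTimeoutMs;
    }

    synchronized T acquire(long nowMs) {
        if (connection == null)
            connection = opener.get();  // lazy open on first use
        lastUsedMs = nowMs;
        return connection;
    }

    // Called periodically by the framework; closes the connection if idle.
    synchronized void closeIfIdle(long nowMs) {
        if (connection != null && nowMs - lastUsedMs >= idleTimeoutMs)
            close();
    }

    // Explicit close; a real holder would also release the underlying resource.
    synchronized void close() {
        connection = null;
    }
}
```

With HttpComponents/HttpClient, each such holder is simply backed by its own connection pool of size 1, which is the arrangement described above.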

Thanks in advance,
Karl


Re: Apache CXF question

2019-02-01 Thread Karl Wright
Thanks, Kishore -- but I already have the documentation.  What I need is
Apache CXF expertise. ;-)

Karl


On Fri, Feb 1, 2019 at 2:28 PM Kishore Kumar  wrote:

> Hi Karl,
>
> Good morning, I have shared you a Dropbox shared folder with OpenText
> Content Server Web Service Documentation.
>
> If you have not received the link from Dropbox in your inbox, check in
> Spam or let me know.
>
> Thanks,
> Kishore Kumar
>
> -----Original Message-----
> From: Karl Wright 
> Sent: 01 February 2019 05:48
> To: dev ; Rafa Haro 
> Subject: Apache CXF question
>
> I'm still working on the new OpenText connector, now using Apache CXF to
> handle the web services piece.  I've never worked with this package before,
> but I've got the WSDLs generating what looks like usable java classes
> representing the WSDL interfaces.  But the underlying transport is
> mysterious given what is generated.  So, two questions:
>
> (1) It doesn't appear to me like explicit generation of classes from the
> XSD are needed here.  It looks like CXF does that too.  Am I wrong?
> (2) I want the transport to go via an HttpComponents/HttpClient HttpClient
> object that I create and initialize myself.  How can I set that up?  If
> anyone on this list has a few snippets of code they can share it would be
> great.
>
> Thanks in advance,
> Karl
>


Apache CXF question

2019-02-01 Thread Karl Wright
I'm still working on the new OpenText connector, now using Apache CXF to
handle the web services piece.  I've never worked with this package before,
but I've got the WSDLs generating what looks like usable java classes
representing the WSDL interfaces.  But the underlying transport is
mysterious given what is generated.  So, two questions:

(1) It doesn't appear to me like explicit generation of classes from the
XSD are needed here.  It looks like CXF does that too.  Am I wrong?
(2) I want the transport to go via an HttpComponents/HttpClient HttpClient
object that I create and initialize myself.  How can I set that up?  If
anyone on this list has a few snippets of code they can share it would be
great.

Thanks in advance,
Karl


[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-31 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757517#comment-16757517
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~michael-o], we have zero control over whether/when this gets addressed in 
SolrJ.  Previous interactions with the SolrJ developers do not make me confident 
that a fix would be prompt.  But I suggest that [~erlendfg] at 
least take the step of opening a ticket.

We can afford to wait until the next MCF release is imminent before taking any 
action, but if there's no resolution in sight by then, I think we should 
implement the workaround in the meantime.


> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>    Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more efficient than the following exchange:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.
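Preemptive authentication amounts to computing and attaching the {{Authorization: Basic}} header on the first request, instead of waiting for the 401 challenge. A minimal, self-contained sketch of how that header value is built (the credentials are hypothetical):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveAuth {
    // Value of the Authorization header that a preemptive client sends
    // up front, rather than after a 401 challenge (RFC 7617 Basic scheme).
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        // Hypothetical credentials for a Basic-auth-protected Solr instance.
        System.out.println(basicAuthHeader("solr", "secret"));
        // → Basic c29scjpzZWNyZXQ=
    }
}
```

HttpClient can also be made to do this itself by pre-populating its auth cache for the target host, so the header goes out on the first request.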





[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-31 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16757137#comment-16757137
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~erlendfg], if SolrJ is overriding our .setExpectContinue(true), then your 
workaround is pretty reasonable, and I'd be happy to commit that (as long as 
you include enough comment so that we can figure out what we were thinking 
later).







[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-30 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756074#comment-16756074
 ] 

Karl Wright commented on CONNECTORS-1564:
-

The way you tell it is this:

{code}
request.setProtocolVersion(HttpVersion.HTTP_1_1);
{code}

I suspect there's a similar method in the RequestOptions builder.  But I bet 
one of the things we're doing in the builder is convincing it that it's HTTP 
1.0, and that's the problem.  We need to figure out what it is.







[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-30 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756071#comment-16756071
 ] 

Karl Wright commented on CONNECTORS-1564:
-

Oh, and I vaguely recall something -- that since the expect-continue header is 
for HTTP 1.1 (and not HTTP 1.0), there was code in HttpComponents/HttpClient 
that disabled it if the client thought it was working in an HTTP 1.0 
environment.  I wonder if we just need to tell it somehow that it's HTTP 1.1?







[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-30 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16756069#comment-16756069
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~erlendfg], forcing the header would be a last resort, but we can do it if we 
must.  However, there are about a dozen connectors that rely on this 
functionality working properly, so I really want to know what is going wrong.

Can you experiment with changing the order of the builder method invocations 
for HttpClient in HttpPoster?  It's the only thing I can think of that might be 
germane.  Perhaps if toString() isn't helpful, you can still inspect the 
property in question.  Is there a getter method for useExpectContinue?








Re: About publishing in mvn central repository

2019-01-30 Thread Karl Wright
There's a ticket outstanding for this but nobody could figure out how to do
it, since the jars are built with Ant not Maven.

If you want to work out how, please feel free to go ahead.

Karl

On Wed, Jan 30, 2019 at 7:08 AM Cihad Guzel  wrote:

> Hi,
>
> There aren't ManifoldCF jar packages in the Maven central repository. Maybe
> they can be published there? Then we could add mcf-core or other
> mcf jar packages to our projects as dependencies.
>
> What do you think about that?
>
> [1]
> https://maven.apache.org/repository/guide-central-repository-upload.html
>
>
> Regards,
> Cihad Güzel
>


[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-29 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755582#comment-16755582
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~erlendfg], are you in a position to build MCF and experiment with how the 
HttpClient is constructed in HttpPoster.java?  I suspect that what is happening 
is that the expect/continue is indeed being set but something that is later 
done to the builder is turning it back off again.  So I would suggest adding a 
log.debug("httpclientbuilder = "+httpClientBuilder) line in there before we 
actually use the builder to construct the client, to see if this is the case, 
and if so, try to figure out which addition is causing the flag to be flipped 
back.







[jira] [Comment Edited] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-29 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1673#comment-1673
 ] 

Karl Wright edited comment on CONNECTORS-1564 at 1/30/19 1:37 AM:
--

[~michael-o] We're using the standard setup code that was recommended by Oleg.  
If the builders have decent toString() methods, we can dump them to the log 
when we create the HttpClient object to confirm they are set up correctly.  But 
from the beginning we could see nothing wrong with it.

This was the test you said was working:

{code}
HttpClientBuilder builder = HttpClientBuilder.create();
RequestConfig rc = 
RequestConfig.custom().setExpectContinueEnabled(true).build();
builder.setDefaultRequestConfig(rc);
{code}

We will figure out what winds up canceling out the expect/continue flag, if 
that's indeed what is happening.



was (Author: kwri...@metacarta.com):
[~michael-o] We're using the standard setup code that was recommended by Oleg.  
If the builders have decent toString() methods, we can dump them to the log 
when we create the HttpClient object to confirm they are set up correctly.  But 
from the beginning we could see nothing wrong with it.

Can you include the test example here that you used to verify that 
expect-continue was working?  







[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-29 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1673#comment-1673
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~michael-o] We're using the standard setup code that was recommended by Oleg.  
If the builders have decent toString() methods, we can dump them to the log 
when we create the HttpClient object to confirm they are set up correctly.  But 
from the beginning we could see nothing wrong with it.

Can you include the test example here that you used to verify that 
expect-continue was working?  







[jira] [Resolved] (CONNECTORS-1576) Running Multiple Jobs in ManifoldCF

2019-01-29 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1576.
-
Resolution: Not A Problem

> Running Multiple Jobs in ManifoldCF
> ---
>
> Key: CONNECTORS-1576
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1576
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.9.1
>Reporter: Pavithra Dhakshinamurthy
>Priority: Major
>  Labels: features
> Fix For: ManifoldCF 2.9.1
>
>
> Hi,
> We have configured two jobs to index Documentum contents. When running them in 
> parallel, seeding works fine, but only one job processes documents 
> and pushes to ES. After the first job completes, the second job processes 
> its documents. 
> Is this the expected behavior, or are we missing anything?





[jira] [Commented] (CONNECTORS-1576) Running Multiple Jobs in ManifoldCF

2019-01-29 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755215#comment-16755215
 ] 

Karl Wright commented on CONNECTORS-1576:
-

The documents that have been queued at the time the second job is started all 
must be processed before any documents from the second job are picked up.  This 
is because of how documents are assigned priorities in the database.

Once you get past the initial bunch of queued documents then both jobs will run 
simultaneously.








[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-29 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16755212#comment-16755212
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~michael-o], you need to be looking here:

{code}
https://svn.apache.org/repos/asf/manifoldcf/trunk/connectors/solr/connector/src/main/java/org/apache/manifoldcf/agents/output/solr/HttpPoster.java
{code}

ManifoldCF has its own HttpClient construction.







[jira] [Commented] (CONNECTORS-1575) inconsistant use of value-labels

2019-01-28 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753908#comment-16753908
 ] 

Karl Wright commented on CONNECTORS-1575:
-

This is because there are two somewhat different internal representations 
involved.  While it is unfortunate that they appear inconsistent, there is 
nothing that can be done to change them since doing so would be backwards 
incompatible.


> inconsistant use of value-labels 
> -
>
> Key: CONNECTORS-1575
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1575
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Priority: Minor
> Attachments: image-2019-01-28-11-57-46-738.png
>
>
> When retrieving a job using the API, there seem to be inconsistencies in the 
> returned JSON of the job.
> For the schedule values 'hourofday', 'minutesofhour', etc., the label of the 
> value is 'value', while for all other value-labels it is '_value_'.
>  
> !image-2019-01-28-11-57-46-738.png!





[jira] [Commented] (CONNECTORS-1574) Performance tuning of manifold

2019-01-28 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16753913#comment-16753913
 ] 

Karl Wright commented on CONNECTORS-1574:
-

If you look in the ManifoldCF log, all queries that take more than a minute to 
execute are logged, along with an EXPLAIN plan.  Could you look at your logs, 
find those queries, and provide their plans?

The quality of the query plans is usually dependent on the quality of the 
statistics that the database keeps.  When the statistics are out of date, then 
the plan sometimes gets horribly bad.  ManifoldCF *attempts* to keep up with 
this by re-analyzing tables after a fixed number of changes, but necessarily it 
cannot do better than estimate the number of changes and their effects on the 
table statistics.  So if you are experiencing problems with certain queries, 
you can set properties.xml values that increase the frequency of analyze 
operations for that table.  But first we need to know what's going wrong.
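As a sketch, the per-table analyze frequency mentioned above is controlled with properties like the following in properties.xml. The exact property names here are from memory and should be verified against the ManifoldCF performance-tuning documentation before use:

```xml
<!-- Assumed property names: re-analyze (and optionally reindex) the
     jobqueue table after this many changes instead of the default.
     Verify against the ManifoldCF documentation. -->
<property name="org.apache.manifoldcf.db.postgres.analyze.jobqueue" value="2000"/>
<property name="org.apache.manifoldcf.db.postgres.reindex.jobqueue" value="2000"/>
```

Lowering the value makes statistics refresh more often, at the cost of more frequent maintenance operations on that table.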


> Performance tuning of manifold
> --
>
> Key: CONNECTORS-1574
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1574
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: File system connector, JCIFS connector, Solr 6.x 
> component
>Affects Versions: ManifoldCF 2.5
> Environment: Apache manifold installed in Linux machine
> Linux version 3.10.0-327.el7.ppc64le
> Red Hat Enterprise Linux Server release 7.2 (Maipo)
>    Reporter: balaji
>Assignee: Karl Wright
>Priority: Critical
>  Labels: performance
>
> My team is using *Apache ManifoldCF 2.5 with SOLR Cloud* for indexing of 
> data. We currently have 450-500 jobs which need to run simultaneously. 
> We need to index JSON data, and we are using the *file system* connector type 
> along with *postgres* as the backend database. 
> We are facing several issues:
> 1. Scheduling works for some jobs and doesn't work for other jobs. 
> 2. Some jobs get completed, and some jobs hang and don't get completed.
> 3. With one job, earlier 6 documents were getting indexed in 15 minutes, but 
> now even a directory path having 5 documents takes 20 minutes or sometimes 
> doesn't get completed.
> 4. The "list all jobs" and "status and job management" pages sometimes don't 
> load, and in pg_stat_activity we observe that 2 queries are in a waiting 
> state, because of which the page doesn't load; if we kill those 
> queries or restart manifold, the issue gets resolved and the page loads 
> properly.
> queries getting stuck:
> 1. SELECT ID,FAILTIME, FAILCOUNT, SEEDINGVERSION, STATUS FROM JOBS WHERE 
> (STATUS=$1 OR STATUS=$2) FOR UPDATE
> 2. UPDATE JOBS SET ERRORTEXT=NULL, ENDTIME=NULL, WINDOWEND=NULL, STATUS=$1 
> WHERE ID=$2
> Note: we have deployed manifold on *Linux*. Our major requirement is 
> scheduling of jobs that run every 15 minutes.
> Please help us fine-tune manifold so that it runs smoothly and acts as a 
> robust system.





[jira] [Assigned] (CONNECTORS-1574) Performance tuning of manifold

2019-01-28 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1574:
---

Assignee: Karl Wright






[jira] [Resolved] (CONNECTORS-1575) inconsistant use of value-labels

2019-01-28 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1575.
-
Resolution: Won't Fix






Re: Mambo CMS

2019-01-26 Thread Karl Wright
We do not have Mambo connectors in MCF.
I don't know anything about CMIS support in that offering either.

Karl


On Sat, Jan 26, 2019 at 8:17 AM Furkan KAMACI 
wrote:

> Hi All,
>
> Mambo (http://mambo-foundation.org) is an open source CMS system which is
> being used by many companies.
>
> Do we have a Mambo integration via ManifoldCF, or does anybody know whether
> Mambo supports our CMIS connector?
>
> If not, we can suggest it as a GSoC project for 2019.
>
> Kind Regards,
> Furkan KAMACI
>


Re: Axis question

2019-01-26 Thread Karl Wright
I was able to get the wsdl->java compilation working without downloading a
ton of additional dependencies, and with cxf version 2.6.2.  Thanks, Rafa,
for your help in getting this far.

Karl


On Fri, Jan 25, 2019 at 4:11 PM Karl Wright  wrote:

> That's one approach.  I'm not thrilled with it; we cannot guarantee no
> client wsdl changes over time.  But if there's nothing better we'll have to
> live with it.
>
> The real problem, of course, is that code generated with version X of cxf
> requires runtime libraries from version X, and that's still a conflict.  So
> I need to get the WSDL2Java going for 2.6.2.
>
> Karl
>
>
> On Fri, Jan 25, 2019 at 3:54 PM Rafa Haro  wrote:
>
>> I would try to be pragmatic. If those wsdls are not likely to change in the
>> future, I would build the client classes offline. Not sure if the generated
>> classes are going to use further classes of cxf, in which case the problem
>> could end up being the same, but it is worth a try.
>>
>> On Fri, 25 Jan 2019 at 21:14, Karl Wright 
>> wrote:
>>
>> > I downloaded the cxf binary, latest version.
>> > The dependency list is huge and very likely conflicts with existing
>> > connectors which have dependencies on cxf 2.x.  I would estimate that
>> > including all the new jars and dependencies would easily double our
>> > download footprint.
>> >
>> > Surely there must be a list of the minimal jars needed to get
>> WSDLToJava to
>> > function somewhere?
>> >
>> > Karl
>> >
>> >
>> >
>> >
>> > On Fri, Jan 25, 2019 at 2:14 PM Karl Wright  wrote:
>> >
>> > > I'm not getting missing cxf jars.  I'm getting problems with
>> downstream
>> > > dependencies.
>> > >
>> > > We don't usually ship more jars than we need to, is the short answer
>> to
>> > > your second question.
>> > >
>> > > Karl
>> > >
>> > >
>> > > On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro  wrote:
>> > >
>> > >> Which jars are you downloading? Why not get the whole release?
>> > >>
>> > >> On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro  wrote:
>> > >>
>> > >>> Not sure, Karl. I just picked up the last release. I can try to find
>> > >>> the first version offering it, but as long as they have backwards
>> > >>> compatibility we should be fine with the last version, although we
>> > >>> might need to update the affected connectors.
>> > >>>
>> > >>> Rafa
>> > >>>
>> > >>> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright 
>> > wrote:
>> > >>>
>> > >>>> When did it first appear?  We're currently on 2.6.2; this is set by
>> > >>>> various dependencies by our connectors.
>> > >>>>
>> > >>>> Karl
>> > >>>>
>> > >>>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright 
>> > wrote:
>> > >>>>
>> > >>>>> The tools package doesn't seem to have it either.
>> > >>>>> Karl
>> > >>>>>
>> > >>>>>
>> > >>>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright 
>> > >>>>> wrote:
>> > >>>>>
>> > >>>>>> Do you know what jar/maven package this is in? Because I don't
>> > >>>>>> seem to have it in our normal cxf jars...
>> > >>>>>>
>> > >>>>>> Karl
>> > >>>>>>
>> > >>>>>>
>> > >>>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro 
>> wrote:
>> > >>>>>>
>> > >>>>>>> I used a wsdl2java script that comes as a utility of the Apache
>> > >>>>>>> CXF release, but basically it is making use of the
>> > >>>>>>> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
>> > >>>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
>> > >>>>>>>
>> > >>>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright > >
>> > >>>>>>> wrote:
>> > >>>>>

Re: Axis question

2019-01-25 Thread Karl Wright
That's one approach.  I'm not thrilled with it; we cannot guarantee no
client wsdl changes over time.  But if there's nothing better we'll have to
live with it.

The real problem, of course, is that code generated with version X of cxf
requires runtime libraries from version X, and that's still a conflict.  So
I need to get the WSDL2Java going for 2.6.2.

Karl


On Fri, Jan 25, 2019 at 3:54 PM Rafa Haro  wrote:

> I would try to be pragmatic. If those wsdls are not likely to change in the
> future, I would build the client classes offline. Not sure if the generated
> classes are going to use further classes of cxf, in which case the problem
> could end up being the same, but it is worth a try.
>
> On Fri, 25 Jan 2019 at 21:14, Karl Wright 
> wrote:
>
> > I downloaded the cxf binary, latest version.
> > The dependency list is huge and very likely conflicts with existing
> > connectors which have dependencies on cxf 2.x.  I would estimate that
> > including all the new jars and dependencies would easily double our
> > download footprint.
> >
> > Surely there must be a list of the minimal jars needed to get WSDLToJava
> to
> > function somewhere?
> >
> > Karl
> >
> >
> >
> >
> > On Fri, Jan 25, 2019 at 2:14 PM Karl Wright  wrote:
> >
> > > I'm not getting missing cxf jars.  I'm getting problems with downstream
> > > dependencies.
> > >
> > > We don't usually ship more jars than we need to, is the short answer to
> > > your second question.
> > >
> > > Karl
> > >
> > >
> > > On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro  wrote:
> > >
> > >> Which jars are you downloading? Why not get the whole release?
> > >>
> > >> On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro  wrote:
> > >>
> > >>> Not sure, Karl. I just picked up the last release. I can try to find
> > >>> the first version offering it, but as long as they have backwards
> > >>> compatibility we should be fine with the last version, although we
> > >>> might need to update the affected connectors.
> > >>>
> > >>> Rafa
> > >>>
> > >>> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright 
> > wrote:
> > >>>
> > >>>> When did it first appear?  We're currently on 2.6.2; this is set by
> > >>>> various dependencies by our connectors.
> > >>>>
> > >>>> Karl
> > >>>>
> > >>>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright 
> > wrote:
> > >>>>
> > >>>>> The tools package doesn't seem to have it either.
> > >>>>> Karl
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright 
> > >>>>> wrote:
> > >>>>>
> > >>>>>> Do you know what jar/maven package this is in? Because I don't
> > >>>>>> seem to have it in our normal cxf jars...
> > >>>>>>
> > >>>>>> Karl
> > >>>>>>
> > >>>>>>
> > >>>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro 
> wrote:
> > >>>>>>
> > >>>>>>> I used a wsdl2java script that comes as a utility of the Apache
> > >>>>>>> CXF release, but basically it is making use of the
> > >>>>>>> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
> > >>>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
> > >>>>>>>
> > >>>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright 
> > >>>>>>> wrote:
> > >>>>>>>
> > >>>>>>> > I was using ancient Axis 1.4 and none of them were working.
> You
> > >>>>>>> can
> > >>>>>>> > exercise this with "ant classcreate-wsdls" in the csws
> directory.
> > >>>>>>> >
> > >>>>>>> > If you can give instructions for invoking CXF, maybe we can do
> > that
> > >>>>>>> > instead.  What's the main class, and what jars do we need to
> > >>>>>>> include?
> > >>>>>>> >
> > >>>>>>> > 

Re: Axis question

2019-01-25 Thread Karl Wright
I downloaded the cxf binary, latest version.
The dependency list is huge and very likely conflicts with existing
connectors which have dependencies on cxf 2.x.  I would estimate that
including all the new jars and dependencies would easily double our
download footprint.

Surely there must be a list of the minimal jars needed to get WSDLToJava to
function somewhere?

Karl




On Fri, Jan 25, 2019 at 2:14 PM Karl Wright  wrote:

> I'm not getting missing cxf jars.  I'm getting problems with downstream
> dependencies.
>
> We don't usually ship more jars than we need to, is the short answer to
> your second question.
>
> Karl
>
>
> On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro  wrote:
>
>> Which jars are you downloading? Why not get the whole release?
>>
>> On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro  wrote:
>>
>>> Not sure, Karl. I just picked up the last release. I can try to find the
>>> first version offering it, but as long as they have backwards compatibility
>>> we should be fine with the last version, although we might need to update
>>> the affected connectors.
>>>
>>> Rafa
>>>
>>> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright  wrote:
>>>
>>>> When did it first appear?  We're currently on 2.6.2; this is set by
>>>> various dependencies by our connectors.
>>>>
>>>> Karl
>>>>
>>>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright  wrote:
>>>>
>>>>> The tools package doesn't seem to have it either.
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright 
>>>>> wrote:
>>>>>
>>>>>> Do you know what jar/maven package this is in? Because I don't seem
>>>>>> to have it in our normal cxf jars...
>>>>>>
>>>>>> Karl
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro  wrote:
>>>>>>
>>>>>>> I used a wsdl2java script that comes as a utility of the Apache CXF
>>>>>>> release, but basically it is making use of the
>>>>>>> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
>>>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
>>>>>>>
>>>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright 
>>>>>>> wrote:
>>>>>>>
>>>>>>> > I was using ancient Axis 1.4 and none of them were working.  You
>>>>>>> can
>>>>>>> > exercise this with "ant classcreate-wsdls" in the csws directory.
>>>>>>> >
>>>>>>> > If you can give instructions for invoking CXF, maybe we can do that
>>>>>>> > instead.  What's the main class, and what jars do we need to
>>>>>>> include?
>>>>>>> >
>>>>>>> > Karl
>>>>>>> >
>>>>>>> >
>>>>>>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro 
>>>>>>> wrote:
>>>>>>> >
>>>>>>> >> Yes, I did. I have only tested Authentication service with Apache
>>>>>>> CXF and
>>>>>>> >> it was apparently working fine. Which ones were failing for you?
>>>>>>> >>
>>>>>>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright 
>>>>>>> wrote:
>>>>>>> >>
>>>>>>> >>> Were you able to look at this yesterday at all?
>>>>>>> >>> Karl
>>>>>>> >>>
>>>>>>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright 
>>>>>>> wrote:
>>>>>>> >>>
>>>>>>> >>>> They're all checked in.
>>>>>>> >>>>
>>>>>>> >>>> See
>>>>>>> >>>>
>>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>>>>>>> >>>>
>>>>>>> >>>> Karl
>>>>>>> >>>>
>>>>>>> >>>>
>>>>>>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro 
>>>>>>> wrote:
>>>>>>> >>>>
>>>>>>> >>>>> Karl, can you share the WSDL, I can try to take a look later
>>>>>>> today
>>>>>>> >>>>>
>>>>>>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright <
>>>>>>> daddy...@gmail.com>
>>>>>>> >>>>> wrote:
>>>>>>> >>>>>
>>>>>>> >>>>> > I'm redeveloping the Livelink connector because the API code
>>>>>>> has been
>>>>>>> >>>>> > discontinued and the only API is now web services based.
>>>>>>> The WSDLs
>>>>>>> >>>>> and
>>>>>>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool
>>>>>>> >>>>> WSDL2Java to
>>>>>>> >>>>> > convert to Java code.  Unfortunately, I haven't been able to
>>>>>>> make
>>>>>>> >>>>> this work
>>>>>>> >>>>> > -- even though the WSDL references have been made local and
>>>>>>> the
>>>>>>> >>>>> XSDs also
>>>>>>> >>>>> > seem to be getting parsed, it complains about missing
>>>>>>> definitions,
>>>>>>> >>>>> even
>>>>>>> >>>>> > though those definitions are clearly present in the XSD
>>>>>>> files.
>>>>>>> >>>>> >
>>>>>>> >>>>> > Has anyone had enough experience with this tool, and web
>>>>>>> services in
>>>>>>> >>>>> > general, to figure out what's wrong?  I've tried turning on
>>>>>>> as
>>>>>>> >>>>> verbose a
>>>>>>> >>>>> > debugging level for WSDL2Java as I can and it's no help at
>>>>>>> all.  I
>>>>>>> >>>>> suspect
>>>>>>> >>>>> > namespace issues but I can't figure out what they are.
>>>>>>> >>>>> >
>>>>>>> >>>>> > Thanks in advance,
>>>>>>> >>>>> > Karl
>>>>>>> >>>>> >
>>>>>>> >>>>>
>>>>>>> >>>>
>>>>>>>
>>>>>>


Re: Axis question

2019-01-25 Thread Karl Wright
I'm not getting missing cxf jars.  I'm getting problems with downstream
dependencies.

We don't usually ship more jars than we need to, is the short answer to
your second question.

Karl


On Fri, Jan 25, 2019 at 11:38 AM Rafa Haro  wrote:

> Which jars are you downloading? Why not get the whole release?
>
> On Fri, Jan 25, 2019 at 5:31 PM Rafa Haro  wrote:
>
>> Not sure, Karl. I just picked up the last release. I can try to find the
>> first version offering it, but as long as they have backwards compatibility
>> we should be fine with the last version, although we might need to update
>> the affected connectors.
>>
>> Rafa
>>
>> On Fri, Jan 25, 2019 at 3:53 PM Karl Wright  wrote:
>>
>>> When did it first appear?  We're currently on 2.6.2; this is set by
>>> various dependencies by our connectors.
>>>
>>> Karl
>>>
>>> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright  wrote:
>>>
>>>> The tools package doesn't seem to have it either.
>>>> Karl
>>>>
>>>>
>>>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright  wrote:
>>>>
>>>>> Do you know what jar/maven package this is in? Because I don't seem
>>>>> to have it in our normal cxf jars...
>>>>>
>>>>> Karl
>>>>>
>>>>>
>>>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro  wrote:
>>>>>
>>>>>> I used a wsdl2java script that comes as a utility of the Apache CXF
>>>>>> release, but basically it is making use of the
>>>>>> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
>>>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
>>>>>>
>>>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright 
>>>>>> wrote:
>>>>>>
>>>>>> > I was using ancient Axis 1.4 and none of them were working.  You can
>>>>>> > exercise this with "ant classcreate-wsdls" in the csws directory.
>>>>>> >
>>>>>> > If you can give instructions for invoking CXF, maybe we can do that
>>>>>> > instead.  What's the main class, and what jars do we need to
>>>>>> include?
>>>>>> >
>>>>>> > Karl
>>>>>> >
>>>>>> >
>>>>>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro  wrote:
>>>>>> >
>>>>>> >> Yes, I did. I have only tested Authentication service with Apache
>>>>>> CXF and
>>>>>> >> it was apparently working fine. Which ones were failing for you?
>>>>>> >>
>>>>>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright 
>>>>>> wrote:
>>>>>> >>
>>>>>> >>> Were you able to look at this yesterday at all?
>>>>>> >>> Karl
>>>>>> >>>
>>>>>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright 
>>>>>> wrote:
>>>>>> >>>
>>>>>> >>>> They're all checked in.
>>>>>> >>>>
>>>>>> >>>> See
>>>>>> >>>>
>>>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>>>>>> >>>>
>>>>>> >>>> Karl
>>>>>> >>>>
>>>>>> >>>>
>>>>>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro 
>>>>>> wrote:
>>>>>> >>>>
>>>>>> >>>>> Karl, can you share the WSDL, I can try to take a look later
>>>>>> today
>>>>>> >>>>>
>>>>>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright <
>>>>>> daddy...@gmail.com>
>>>>>> >>>>> wrote:
>>>>>> >>>>>
>>>>>> >>>>> > I'm redeveloping the Livelink connector because the API code
>>>>>> has been
>>>>>> >>>>> > discontinued and the only API is now web services based.  The
>>>>>> WSDLs
>>>>>> >>>>> and
>>>>>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool
>>>>>> >>>>> WSDL2Java to
>>>>>> >>>>> > convert to Java code.  Unfortunately, I haven't been able to
>>>>>> make
>>>>>> >>>>> this work
>>>>>> >>>>> > -- even though the WSDL references have been made local and
>>>>>> the
>>>>>> >>>>> XSDs also
>>>>>> >>>>> > seem to be getting parsed, it complains about missing
>>>>>> definitions,
>>>>>> >>>>> even
>>>>>> >>>>> > though those definitions are clearly present in the XSD files.
>>>>>> >>>>> >
>>>>>> >>>>> > Has anyone had enough experience with this tool, and web
>>>>>> services in
>>>>>> >>>>> > general, to figure out what's wrong?  I've tried turning on as
>>>>>> >>>>> verbose a
>>>>>> >>>>> > debugging level for WSDL2Java as I can and it's no help at
>>>>>> all.  I
>>>>>> >>>>> suspect
>>>>>> >>>>> > namespace issues but I can't figure out what they are.
>>>>>> >>>>> >
>>>>>> >>>>> > Thanks in advance,
>>>>>> >>>>> > Karl
>>>>>> >>>>> >
>>>>>> >>>>>
>>>>>> >>>>
>>>>>>
>>>>>


Re: Job slower

2019-01-25 Thread Karl Wright
Did you try 'vacuum full'?

Karl


On Fri, Jan 25, 2019 at 3:47 AM Bisonti Mario 
wrote:

> Hello.
>
> I use MCF 2.12, PostgreSQL 9.3.25, Solr 7.6, and Tika 1.19 on Ubuntu Server
> 18.04.
>
>
>
> Weekly I schedule by crontab, for the user postgres:
>
> 15 8 * * Sun vacuumdb --all --analyze
>
> 20 10 * * Sun reindexdb postgres
>
> 25 10 * * Sun reindexdb dbname
>
>
>
> I see that the job that indexes 70 documents daily runs slower day by
> day.
>
> It ran in 8 hours a few weeks ago, but now it runs in 12 hours and the
> number of documents has not changed much.
>
>
>
> What could I do to speed up the job?
>
>
>
> Thanks a lot
>
> Mario
>
>
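For context on the 'vacuum full' suggestion: the plain `vacuumdb --analyze` already in the crontab above only marks dead rows as reusable, while VACUUM FULL rewrites the tables and reclaims the bloat that accumulates in a busy crawler database. A sketch of the kind of maintenance pass being discussed, with the caveat that VACUUM FULL takes an exclusive lock, so the ManifoldCF agents process should be stopped first:

```sql
-- Run while ManifoldCF is stopped: VACUUM FULL takes an ACCESS EXCLUSIVE
-- lock on each table while it rewrites it.
VACUUM FULL ANALYZE;

-- Afterwards, check which tables still carry dead-row bloat
-- (jobqueue is typically the largest MCF table).
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```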


Re: Axis question

2019-01-25 Thread Karl Wright
I've been fighting with this pretty hard for a couple of hours now.  I did
find the proper cxf tools jar eventually but I'm getting one dependency
problem after another.  Currently I have:

>>>>>>
classcreate-wsdl-cxf:
[mkdir] Created dir:
/mnt/c/wip/mcf/CONNECTORS-1566/connectors/csws/build/wsdljava
 [java] Jan 25, 2019 4:09:02 PM org.apache.cxf.staxutils.StaxUtils
createXMLInputFactory
 [java] WARNING: Could not create a secure Stax XMLInputFactory.  Found
class com.sun.xml.internal.stream.XMLInputFactoryImpl.  Suggest Woodstox
4.2.0 or newer.
 [java] Jan 25, 2019 4:09:03 PM org.apache.cxf.staxutils.StaxUtils
createXMLInputFactory
 [java] WARNING: Could not create a secure Stax XMLInputFactory.  Found
class com.sun.xml.internal.stream.XMLInputFactoryImpl.  Suggest Woodstox
4.2.0 or newer.
 [java]
 [java] WSDLToJava Error: Could not find jaxws frontend within classpath
 [java]
<<<<<<

... even though I have jaxws* in the path and woodstox 5.7 too.

Rafa, can you tell me what classpath you are using and what the full
dependencies are for this tool?

Karl


On Fri, Jan 25, 2019 at 9:53 AM Karl Wright  wrote:

> When did it first appear?  We're currently on 2.6.2; this is set by
> various dependencies by our connectors.
>
> Karl
>
> On Fri, Jan 25, 2019 at 9:52 AM Karl Wright  wrote:
>
>> The tools package doesn't seem to have it either.
>> Karl
>>
>>
>> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright  wrote:
>>
>>> Do you know what jar/maven package this is in? Because I don't seem to
>>> have it in our normal cxf jars...
>>>
>>> Karl
>>>
>>>
>>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro  wrote:
>>>
>>>> I used a wsdl2java script that comes as a utility of the Apache CXF
>>>> release, but basically it is making use of the
>>>> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
>>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
>>>>
>>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright  wrote:
>>>>
>>>> > I was using ancient Axis 1.4 and none of them were working.  You can
>>>> > exercise this with "ant classcreate-wsdls" in the csws directory.
>>>> >
>>>> > If you can give instructions for invoking CXF, maybe we can do that
>>>> > instead.  What's the main class, and what jars do we need to include?
>>>> >
>>>> > Karl
>>>> >
>>>> >
>>>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro  wrote:
>>>> >
>>>> >> Yes, I did. I have only tested Authentication service with Apache
>>>> CXF and
>>>> >> it was apparently working fine. Which ones were failing for you?
>>>> >>
>>>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright 
>>>> wrote:
>>>> >>
>>>> >>> Were you able to look at this yesterday at all?
>>>> >>> Karl
>>>> >>>
>>>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright 
>>>> wrote:
>>>> >>>
>>>> >>>> They're all checked in.
>>>> >>>>
>>>> >>>> See
>>>> >>>>
>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>>>> >>>>
>>>> >>>> Karl
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro 
>>>> wrote:
>>>> >>>>
>>>> >>>>> Karl, can you share the WSDL, I can try to take a look later today
>>>> >>>>>
>>>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright 
>>>> >>>>> wrote:
>>>> >>>>>
>>>> >>>>> > I'm redeveloping the Livelink connector because the API code
>>>> has been
>>>> >>>>> > discontinued and the only API is now web services based.  The
>>>> WSDLs
>>>> >>>>> and
>>>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool
>>>> >>>>> WSDL2Java to
>>>> >>>>> > convert to Java code.  Unfortunately, I haven't been able to
>>>> make
>>>> >>>>> this work
>>>> >>>>> > -- even though the WSDL references have been made local and the
>>>> >>>>> XSDs also
>>>> >>>>> > seem to be getting parsed, it complains about missing
>>>> definitions,
>>>> >>>>> even
>>>> >>>>> > though those definitions are clearly present in the XSD files.
>>>> >>>>> >
>>>> >>>>> > Has anyone had enough experience with this tool, and web
>>>> services in
>>>> >>>>> > general, to figure out what's wrong?  I've tried turning on as
>>>> >>>>> verbose a
>>>> >>>>> > debugging level for WSDL2Java as I can and it's no help at
>>>> all.  I
>>>> >>>>> suspect
>>>> >>>>> > namespace issues but I can't figure out what they are.
>>>> >>>>> >
>>>> >>>>> > Thanks in advance,
>>>> >>>>> > Karl
>>>> >>>>> >
>>>> >>>>>
>>>> >>>>
>>>>
>>>


Re: Axis question

2019-01-25 Thread Karl Wright
When did it first appear?  We're currently on 2.6.2; this is set by various
dependencies by our connectors.

Karl

On Fri, Jan 25, 2019 at 9:52 AM Karl Wright  wrote:

> The tools package doesn't seem to have it either.
> Karl
>
>
> On Fri, Jan 25, 2019 at 9:43 AM Karl Wright  wrote:
>
>> Do you know what jar/maven package this is in? Because I don't seem to
>> have it in our normal cxf jars...
>>
>> Karl
>>
>>
>> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro  wrote:
>>
>>> I used a wsdl2java script that comes as a utility of the Apache CXF
>>> release, but basically it is making use of the
>>> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
>>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
>>>
>>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright  wrote:
>>>
>>> > I was using ancient Axis 1.4 and none of them were working.  You can
>>> > exercise this with "ant classcreate-wsdls" in the csws directory.
>>> >
>>> > If you can give instructions for invoking CXF, maybe we can do that
>>> > instead.  What's the main class, and what jars do we need to include?
>>> >
>>> > Karl
>>> >
>>> >
>>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro  wrote:
>>> >
>>> >> Yes, I did. I have only tested Authentication service with Apache CXF
>>> and
>>> >> it was apparently working fine. Which ones were failing for you?
>>> >>
>>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright 
>>> wrote:
>>> >>
>>> >>> Were you able to look at this yesterday at all?
>>> >>> Karl
>>> >>>
>>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright 
>>> wrote:
>>> >>>
>>> >>>> They're all checked in.
>>> >>>>
>>> >>>> See
>>> >>>>
>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>>> >>>>
>>> >>>> Karl
>>> >>>>
>>> >>>>
>>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro  wrote:
>>> >>>>
>>> >>>>> Karl, can you share the WSDL, I can try to take a look later today
>>> >>>>>
>>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright 
>>> >>>>> wrote:
>>> >>>>>
>>> >>>>> > I'm redeveloping the Livelink connector because the API code has
>>> been
>>> >>>>> > discontinued and the only API is now web services based.  The
>>> WSDLs
>>> >>>>> and
>>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool
>>> >>>>> WSDL2Java to
>>> >>>>> > convert to Java code.  Unfortunately, I haven't been able to make
>>> >>>>> this work
>>> >>>>> > -- even though the WSDL references have been made local and the
>>> >>>>> XSDs also
>>> >>>>> > seem to be getting parsed, it complains about missing
>>> definitions,
>>> >>>>> even
>>> >>>>> > though those definitions are clearly present in the XSD files.
>>> >>>>> >
>>> >>>>> > Has anyone had enough experience with this tool, and web
>>> services in
>>> >>>>> > general, to figure out what's wrong?  I've tried turning on as
>>> >>>>> verbose a
>>> >>>>> > debugging level for WSDL2Java as I can and it's no help at all.
>>> I
>>> >>>>> suspect
>>> >>>>> > namespace issues but I can't figure out what they are.
>>> >>>>> >
>>> >>>>> > Thanks in advance,
>>> >>>>> > Karl
>>> >>>>> >
>>> >>>>>
>>> >>>>
>>>
>>


Re: Axis question

2019-01-25 Thread Karl Wright
The tools package doesn't seem to have it either.
Karl


On Fri, Jan 25, 2019 at 9:43 AM Karl Wright  wrote:

> Do you know what jar/maven package this is in? Because I don't seem to
> have it in our normal cxf jars...
>
> Karl
>
>
> On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro  wrote:
>
>> I used a wsdl2java script that comes as a utility of the Apache CXF
>> release, but basically it is making use of the
>> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
>> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
>>
>> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright  wrote:
>>
>> > I was using ancient Axis 1.4 and none of them were working.  You can
>> > exercise this with "ant classcreate-wsdls" in the csws directory.
>> >
>> > If you can give instructions for invoking CXF, maybe we can do that
>> > instead.  What's the main class, and what jars do we need to include?
>> >
>> > Karl
>> >
>> >
>> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro  wrote:
>> >
>> >> Yes, I did. I have only tested Authentication service with Apache CXF
>> and
>> >> it was apparently working fine. Which ones were failing for you?
>> >>
>> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright 
>> wrote:
>> >>
>> >>> Were you able to look at this yesterday at all?
>> >>> Karl
>> >>>
>> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright 
>> wrote:
>> >>>
>> >>>> They're all checked in.
>> >>>>
>> >>>> See
>> >>>>
>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>> >>>>
>> >>>> Karl
>> >>>>
>> >>>>
>> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro  wrote:
>> >>>>
>> >>>>> Karl, can you share the WSDL, I can try to take a look later today
>> >>>>>
>> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright 
>> >>>>> wrote:
>> >>>>>
>> >>>>> > I'm redeveloping the Livelink connector because the API code has
>> been
>> >>>>> > discontinued and the only API is now web services based.  The
>> WSDLs
>> >>>>> and
>> >>>>> > XSDs have been exported and I'm trying to use the Axis tool
>> >>>>> WSDL2Java to
>> >>>>> > convert to Java code.  Unfortunately, I haven't been able to make
>> >>>>> this work
>> >>>>> > -- even though the WSDL references have been made local and the
>> >>>>> XSDs also
>> >>>>> > seem to be getting parsed, it complains about missing definitions,
>> >>>>> even
>> >>>>> > though those definitions are clearly present in the XSD files.
>> >>>>> >
>> >>>>> > Has anyone had enough experience with this tool, and web services
>> in
>> >>>>> > general, to figure out what's wrong?  I've tried turning on as
>> >>>>> verbose a
>> >>>>> > debugging level for WSDL2Java as I can and it's no help at all.  I
>> >>>>> suspect
>> >>>>> > namespace issues but I can't figure out what they are.
>> >>>>> >
>> >>>>> > Thanks in advance,
>> >>>>> > Karl
>> >>>>> >
>> >>>>>
>> >>>>
>>
>


Re: Axis question

2019-01-25 Thread Karl Wright
Do you know what jar/maven package this is in? Because I don't seem to
have it in our normal cxf jars...

Karl


On Fri, Jan 25, 2019 at 9:08 AM Rafa Haro  wrote:

> I used a wsdl2java script that comes as a utility of the Apache CXF
> release, but basically it is making use of the
> org.apache.cxf.tools.wsdlto.WSDLToJava class. You can find a usage
> example with ant: http://cxf.apache.org/docs/wsdl-to-java.html
>
> On Fri, Jan 25, 2019 at 2:59 PM Karl Wright  wrote:
>
> > I was using ancient Axis 1.4 and none of them were working.  You can
> > exercise this with "ant classcreate-wsdls" in the csws directory.
> >
> > If you can give instructions for invoking CXF, maybe we can do that
> > instead.  What's the main class, and what jars do we need to include?
> >
> > Karl
> >
> >
> > On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro  wrote:
> >
> >> Yes, I did. I have only tested Authentication service with Apache CXF
> and
> >> it was apparently working fine. Which ones were failing for you?
> >>
> >> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright 
> wrote:
> >>
> >>> Were you able to look at this yesterday at all?
> >>> Karl
> >>>
> >>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright 
> wrote:
> >>>
> >>>> They're all checked in.
> >>>>
> >>>> See
> >>>>
> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
> >>>>
> >>>> Karl
> >>>>
> >>>>
> >>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro  wrote:
> >>>>
> >>>>> Karl, can you share the WSDL, I can try to take a look later today
> >>>>>
> >>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright 
> >>>>> wrote:
> >>>>>
> >>>>> > I'm redeveloping the Livelink connector because the API code has
> been
> >>>>> > discontinued and the only API is now web services based.  The WSDLs
> >>>>> and
> >>>>> > XSDs have been exported and I'm trying to use the Axis tool
> >>>>> WSDL2Java to
> >>>>> > convert to Java code.  Unfortunately, I haven't been able to make
> >>>>> this work
> >>>>> > -- even though the WSDL references have been made local and the
> >>>>> XSDs also
> >>>>> > seem to be getting parsed, it complains about missing definitions,
> >>>>> even
> >>>>> > though those definitions are clearly present in the XSD files.
> >>>>> >
> >>>>> > Has anyone had enough experience with this tool, and web services
> in
> >>>>> > general, to figure out what's wrong?  I've tried turning on as
> >>>>> verbose a
> >>>>> > debugging level for WSDL2Java as I can and it's no help at all.  I
> >>>>> suspect
> >>>>> > namespace issues but I can't figure out what they are.
> >>>>> >
> >>>>> > Thanks in advance,
> >>>>> > Karl
> >>>>> >
> >>>>>
> >>>>
>


Re: Axis question

2019-01-25 Thread Karl Wright
The cxf stuff is already present, and is available in connector-common-lib
as well, so all that might be needed is a new ant rule to invoke it:

01/17/2019  05:47 PM 1,400,339 cxf-core-3.2.6.jar
01/17/2019  05:46 PM   181,690 cxf-rt-bindings-soap-3.2.6.jar
01/17/2019  05:46 PM    38,307 cxf-rt-bindings-xml-3.2.6.jar
01/17/2019  05:46 PM   105,048 cxf-rt-databinding-jaxb-3.2.6.jar
01/17/2019  05:47 PM   680,120 cxf-rt-frontend-jaxrs-3.2.6.jar
01/17/2019  05:46 PM   346,308 cxf-rt-frontend-jaxws-3.2.6.jar
01/17/2019  05:46 PM   103,850 cxf-rt-frontend-simple-3.2.6.jar
01/17/2019  05:47 PM   179,790 cxf-rt-rs-client-3.2.6.jar
01/17/2019  05:47 PM   362,532 cxf-rt-transports-http-3.2.6.jar
01/17/2019  05:46 PM    75,478 cxf-rt-ws-addr-3.2.6.jar
01/17/2019  05:46 PM   214,507 cxf-rt-ws-policy-3.2.6.jar
01/17/2019  05:46 PM   173,359 cxf-rt-wsdl-3.2.6.jar

We'd also need XSD code generation, which is currently done by Castor
(haven't even tried it yet), so if this package has that ability too, it
would be fantastic.

Karl




On Fri, Jan 25, 2019 at 8:59 AM Karl Wright  wrote:

> I was using ancient Axis 1.4 and none of them were working.  You can
> exercise this with "ant classcreate-wsdls" in the csws directory.
>
> If you can give instructions for invoking CXF, maybe we can do that
> instead.  What's the main class, and what jars do we need to include?
>
> Karl
>
>
> On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro  wrote:
>
>> Yes, I did. I have only tested Authentication service with Apache CXF and
>> it was apparently working fine. Which ones were failing for you?
>>
>> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright  wrote:
>>
>>> Were you able to look at this yesterday at all?
>>> Karl
>>>
>>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright  wrote:
>>>
>>>> They're all checked in.
>>>>
>>>> See
>>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>>>>
>>>> Karl
>>>>
>>>>
>>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro  wrote:
>>>>
>>>>> Karl, can you share the WSDL, I can try to take a look later today
>>>>>
>>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright 
>>>>> wrote:
>>>>>
>>>>> > I'm redeveloping the Livelink connector because the API code has been
>>>>> > discontinued and the only API is now web services based.  The WSDLs
>>>>> and
>>>>> > XSDs have been exported and I'm trying to use the Axis tool
>>>>> WSDL2Java to
>>>>> > convert to Java code.  Unfortunately, I haven't been able to make
>>>>> this work
>>>>> > -- even though the WSDLs references have been made local and the
>>>>> XSDs also
>>>>> > seem to be getting parsed, it complains about missing definitions,
>>>>> even
>>>>> > though those definitions are clearly present in the XSD files.
>>>>> >
>>>>> > Has anyone had enough experience with this tool, and web services in
>>>>> > general, to figure out what's wrong?  I've tried turning on as
>>>>> verbose a
>>>>> > debugging level for WSDL2Java as I can and it's no help at all.  I
>>>>> suspect
>>>>> > namespace issues but I can't figure out what they are.
>>>>> >
>>>>> > Thanks in advance,
>>>>> > Karl
>>>>> >
>>>>>
>>>>


Re: Axis question

2019-01-25 Thread Karl Wright
I was using ancient Axis 1.4 and none of them were working.  You can
exercise this with "ant classcreate-wsdls" in the csws directory.

If you can give instructions for invoking CXF, maybe we can do that
instead.  What's the main class, and what jars do we need to include?

Karl


On Fri, Jan 25, 2019 at 7:28 AM Rafa Haro  wrote:

> Yes, I did. I have only tested Authentication service with Apache CXF and
> it was apparently working fine. Which ones were failing for you?
>
> On Fri, Jan 25, 2019 at 12:38 PM Karl Wright  wrote:
>
>> Were you able to look at this yesterday at all?
>> Karl
>>
>> On Thu, Jan 24, 2019 at 6:34 AM Karl Wright  wrote:
>>
>>> They're all checked in.
>>>
>>> See
>>> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>>>
>>> Karl
>>>
>>>
>>> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro  wrote:
>>>
>>>> Karl, can you share the WSDL, I can try to take a look later today
>>>>
>>>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright 
>>>> wrote:
>>>>
>>>> > I'm redeveloping the Livelink connector because the API code has been
>>>> > discontinued and the only API is now web services based.  The WSDLs
>>>> and
>>>> > XSDs have been exported and I'm trying to use the Axis tool WSDL2Java
>>>> to
>>>> > convert to Java code.  Unfortunately, I haven't been able to make
>>>> this work
>>>> > -- even though the WSDLs references have been made local and the XSDs
>>>> also
>>>> > seem to be getting parsed, it complains about missing definitions,
>>>> even
>>>> > though those definitions are clearly present in the XSD files.
>>>> >
>>>> > Has anyone had enough experience with this tool, and web services in
>>>> > general, to figure out what's wrong?  I've tried turning on as
>>>> verbose a
>>>> > debugging level for WSDL2Java as I can and it's no help at all.  I
>>>> suspect
>>>> > namespace issues but I can't figure out what they are.
>>>> >
>>>> > Thanks in advance,
>>>> > Karl
>>>> >
>>>>
>>>


Re: Axis question

2019-01-25 Thread Karl Wright
Were you able to look at this yesterday at all?
Karl

On Thu, Jan 24, 2019 at 6:34 AM Karl Wright  wrote:

> They're all checked in.
>
> See
> https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls
>
> Karl
>
>
> On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro  wrote:
>
>> Karl, can you share the WSDL, I can try to take a look later today
>>
>> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright  wrote:
>>
>> > I'm redeveloping the Livelink connector because the API code has been
>> > discontinued and the only API is now web services based.  The WSDLs and
>> > XSDs have been exported and I'm trying to use the Axis tool WSDL2Java to
>> > convert to Java code.  Unfortunately, I haven't been able to make this
>> work
>> > -- even though the WSDLs references have been made local and the XSDs
>> also
>> > seem to be getting parsed, it complains about missing definitions, even
>> > though those definitions are clearly present in the XSD files.
>> >
>> > Has anyone had enough experience with this tool, and web services in
>> > general, to figure out what's wrong?  I've tried turning on as verbose a
>> > debugging level for WSDL2Java as I can and it's no help at all.  I
>> suspect
>> > namespace issues but I can't figure out what they are.
>> >
>> > Thanks in advance,
>> > Karl
>> >
>>
>


[jira] [Resolved] (CONNECTORS-1573) Web Crawler exclude from index matches too much?

2019-01-24 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1573.
-
Resolution: Not A Problem

> Web Crawler exclude from index matches too much?
> 
>
> Key: CONNECTORS-1573
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1573
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.10
>Reporter: Korneel Staelens
>Priority: Major
>
> Hello, 
> I'm not sure this is a bug, or my misinterpretation of the exclusion rules:
> I want to set up a rule so that it does NOT index a parent page, but does 
> index all child pages of that parent:
> I'm setting up a rule: 
> Inclusions: 
> .*
>  
> Exclusions:
> [http://www.website.com/nl/]
> (I've tried also: http://www.website.com/nl/(\s)* )
> No dice. If I'm looking at the logs, I see the pages are crawled, but not 
> indexed due to job restriction. Is my rule wrong? Or is this a small bug?
>  
> Thanks for advice!
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1573) Web Crawler exclude from index matches too much?

2019-01-24 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16751689#comment-16751689
 ] 

Karl Wright commented on CONNECTORS-1573:
-

Questions like this should be asked on the us...@manifoldcf.apache.org list, 
not via a ticket.

The quick answer: if you look at the Simple History, you can tell whether the 
pages are fetched or not.  If they are not fetched at all (that is, they do not 
appear), then your inclusion and exclusion list is wrong.  That doesn't sound 
like the problem here; it sounds like it's being blocked *after* fetching. 
There are a number of reasons for that; the Simple History should give you a 
good idea which one it is.  If it reports "JOBDESCRIPTION", that means that 
the *indexing* inclusion/exclusion rule discarded it.  This is not the same as 
the *fetching* inclusion/exclusion rules, which is what it sounds like you might 
be setting.  They're on the same tabs, just farther down.  The manual does not 
include the indexing rules sections; this should be addressed.

I suspect that, based on the regexps you've given, you're also overlooking the 
fact that if the regexp matches ANYWHERE in the URL it is considered a match.  
So if you want a very specific URL, you need to delimit it with ^ at the 
beginning and $ at the end, to ensure that the entire URL matches and ONLY that 
URL.
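To illustrate the "matches anywhere" behavior described above, here is a minimal sketch using `java.util.regex` find() semantics (the exact matcher MCF uses internally may differ; the URLs are the hypothetical ones from the ticket):

```java
import java.util.regex.Pattern;

public class AnchorDemo {
    public static void main(String[] args) {
        String parent = "http://www.website.com/nl/";
        String child  = "http://www.website.com/nl/page1";

        // Unanchored: the pattern matches if it occurs anywhere in the URL,
        // so the child page is excluded along with the parent.
        Pattern loose = Pattern.compile("http://www\\.website\\.com/nl/");
        System.out.println(loose.matcher(parent).find()); // true
        System.out.println(loose.matcher(child).find());  // true

        // Anchored with ^ and $: only the exact parent URL matches.
        Pattern exact = Pattern.compile("^http://www\\.website\\.com/nl/$");
        System.out.println(exact.matcher(parent).find()); // true
        System.out.println(exact.matcher(child).find());  // false
    }
}
```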




> Web Crawler exclude from index matches too much?
> 
>
> Key: CONNECTORS-1573
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1573
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.10
>Reporter: Korneel Staelens
>Priority: Major
>
> Hello, 
> I'm not sure this is a bug, or my misinterpretation of the exclusion rules:
> I want to set up a rule so that it does NOT index a parent page, but does 
> index all child pages of that parent:
> I'm setting up a rule: 
> Inclusions: 
> .*
>  
> Exclusions:
> [http://www.website.com/nl/]
> (I've tried also: http://www.website.com/nl/(\s)* )
> No dice. If I'm looking at the logs, I see the pages are crawled, but not 
> indexed due to job restriction. Is my rule wrong? Or is this a small bug?
>  
> Thanks for advice!
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Axis question

2019-01-24 Thread Karl Wright
They're all checked in.

See
https://svn.apache.org/repos/asf/manifoldcf/branches/CONNECTORS-1566/connectors/csws/wsdls

Karl


On Thu, Jan 24, 2019 at 6:24 AM Rafa Haro  wrote:

> Karl, can you share the WSDL, I can try to take a look later today
>
> On Thu, Jan 24, 2019 at 12:13 PM Karl Wright  wrote:
>
> > I'm redeveloping the Livelink connector because the API code has been
> > discontinued and the only API is now web services based.  The WSDLs and
> > XSDs have been exported and I'm trying to use the Axis tool WSDL2Java to
> > convert to Java code.  Unfortunately, I haven't been able to make this
> work
> > -- even though the WSDLs references have been made local and the XSDs
> also
> > seem to be getting parsed, it complains about missing definitions, even
> > though those definitions are clearly present in the XSD files.
> >
> > Has anyone had enough experience with this tool, and web services in
> > general, to figure out what's wrong?  I've tried turning on as verbose a
> > debugging level for WSDL2Java as I can and it's no help at all.  I
> suspect
> > namespace issues but I can't figure out what they are.
> >
> > Thanks in advance,
> > Karl
> >
>


Axis question

2019-01-24 Thread Karl Wright
I'm redeveloping the Livelink connector because the API code has been
discontinued and the only API is now web services based.  The WSDLs and
XSDs have been exported and I'm trying to use the Axis tool WSDL2Java to
convert to Java code.  Unfortunately, I haven't been able to make this work
-- even though the WSDLs references have been made local and the XSDs also
seem to be getting parsed, it complains about missing definitions, even
though those definitions are clearly present in the XSD files.

Has anyone had enough experience with this tool, and web services in
general, to figure out what's wrong?  I've tried turning on as verbose a
debugging level for WSDL2Java as I can and it's no help at all.  I suspect
namespace issues but I can't figure out what they are.

Thanks in advance,
Karl


Re: Do we support UTF-16 chars in version strings when using MySQL/MariaDB?

2019-01-23 Thread Karl Wright
It's critical, with Manifold, that the database instance be capable of
handling any characters it's likely to encounter.  For PostgreSQL we tell
people to install it with the utf-8 collation, for instance, and when we
create database instances ourselves we try to specify that as well.  For
MariaDB, have a look at the database implementation we've got, and let me
know if this is something we're missing anywhere.

Thanks,
Karl


On Wed, Jan 23, 2019 at 3:00 AM Markus Schuch  wrote:

> Hi,
>
> while using MySQL/MariaDB for MCF I encountered a "deadlock" kind of
> situation caused by a UTF-16 character (e.g. U+1F3AE) in a String
> inserted in one of the varchar columns.
>
> In my case a connector wrote the title of a parent document into the
> version string of the processed document, which contained the character
> U+1F3AE - a gamepad :)
>
> This led to SQL Error 22001 "Incorrect string value: '\xF0\x9F\x8E\xAE'
> for column 'lastversion' at row 1" in MySQL because the utf8 collation
> encoding does not support that kind of character. (utf8mb4 does)
>
> The cause was hard to find, because it somehow led to a transaction
> abort loop in the incremental ingester and the error was not logged
> properly.
>
> My question:
> - should we create the mysql database with utf8mb4 by default?
> - or should inserted strings be sanitized of UTF-16 chars?
> - or should 22001 be handled better?
>
> Thanks in advance
> Markus
>
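The failure mode reported above can be reproduced without a database at all. As a minimal JDK-only sketch, the snippet below shows why U+1F3AE trips MySQL's legacy "utf8" charset: it encodes to the 4-byte UTF-8 sequence F0 9F 8E AE (matching the error message), while "utf8" stores at most 3 bytes per character; utf8mb4 accepts it.

```java
import java.nio.charset.StandardCharsets;

public class Utf8mb4Demo {
    public static void main(String[] args) {
        // U+1F3AE VIDEO GAME, the "gamepad" character from the report
        String gamepad = new String(Character.toChars(0x1F3AE));

        // In Java's UTF-16 strings it occupies a surrogate pair:
        System.out.println(gamepad.length());                            // 2 char units
        System.out.println(gamepad.codePointCount(0, gamepad.length())); // 1 code point

        // In UTF-8 it needs 4 bytes -- beyond MySQL utf8's 3-byte limit
        byte[] utf8 = gamepad.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 4
    }
}
```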


Re: Is SOLR-12798 still a blocker for deleting documents?

2019-01-21 Thread Karl Wright
The latest (2.12) version of MCF fixes this problem by working around it.

Karl


On Mon, Jan 21, 2019 at 5:12 AM Erlend Garåsen 
wrote:

>
> I have encountered the same problem Karl reported in the following ticket:
> https://issues.apache.org/jira/browse/SOLR-12798
>
> Since the ticket is unresolved, is this still a problem with the latest
> MCF version? I get the following error when the Solr connector tries to
> delete documents:
>
> FATAL 2019-01-21T09:44:15,346 (Worker thread '6') - Error tossed: This
> Should not happen
> java.lang.RuntimeException: This Should not happen
>   at
>
> org.apache.solr.client.solrj.impl.BinaryRequestWriter.getContentStreams(BinaryRequestWriter.java:67)
> ~[?:?]
> […]
>   at
>
> org.apache.manifoldcf.agents.output.solr.HttpPoster$DeleteThread.run(HttpPoster.java:1366)
> ~[?:?]
>
> I'm just asking in case my preemptive patch is causing these problems.
>
> Erlend
>


[jira] [Resolved] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-21 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1563.
-
Resolution: Not A Problem

User has a configuration that makes no sense.

> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked "Use the Extract Update Handler:" param then I am getting an 
> error on Solr i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field on Solr.
> I am using Solr 7.3.1 and manifoldCF 2.8.1
> I am using solr cell and hence not configured external tika extractor in 
> manifoldCF pipeline
> Please help me with this problem
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties

2019-01-18 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1535.
-
Resolution: Fixed

> Documentum Connector cannot find dfc.properties
> ---
>
> Key: CONNECTORS-1535
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1535
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11
> Environment: Manifold 2.11
> CentOS Linux release 7.5.1804 (Core)
> OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
>  
>Reporter: James Thomas
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> I have found that when installing a clean MCF instance I cannot get 
> Documentum repository connectors to connect to Documentum until I have added 
> this line to the processes/documentum-server/run.sh script before the call to 
> Java:
>  
> {code:java}
> CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code}
> Until I do this, attempts to save the connector will result in this output to 
> the console:
>  
> {noformat}
> 4 [RMI TCP Connection(2)-127.0.0.1] ERROR 
> com.documentum.fc.common.impl.preferences.PreferencesManager  - 
> [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null
> java.io.FileNotFoundException: dfc.properties
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.<init>(PreferencesManager.java:37)
>     at 
> com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64)
> ..{noformat}
> and this message in the MCF UI:
>  
> {noformat}
> Connection failed: Documentum error: No DocBrokers are configured{noformat}
>  
>  
> I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done 
> in that ticket. While setting up 2.11 from scratch I encountered it again.
>  
> Once I have edited the run.sh script I get this in the console, showing that 
> (for whatever reason) the change is significant:
>  
> {noformat}
> Reading DFC configuration from 
> "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties"
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties

2019-01-18 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746262#comment-16746262
 ] 

Karl Wright commented on CONNECTORS-1535:
-

[~jamesthomas], the registry process has no dependencies whatsoever on DFC, so 
any changes to this would be unnecessary.

Last question: can the DFC properties location be provided as a -D switch 
parameter to the JVM?  


> Documentum Connector cannot find dfc.properties
> ---
>
> Key: CONNECTORS-1535
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1535
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11
> Environment: Manifold 2.11
> CentOS Linux release 7.5.1804 (Core)
> OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
>  
>Reporter: James Thomas
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> I have found that when installing a clean MCF instance I cannot get 
> Documentum repository connectors to connect to Documentum until I have added 
> this line to the processes/documentum-server/run.sh script before the call to 
> Java:
>  
> {code:java}
> CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code}
> Until I do this, attempts to save the connector will result in this output to 
> the console:
>  
> {noformat}
> 4 [RMI TCP Connection(2)-127.0.0.1] ERROR 
> com.documentum.fc.common.impl.preferences.PreferencesManager  - 
> [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null
> java.io.FileNotFoundException: dfc.properties
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.<init>(PreferencesManager.java:37)
>     at 
> com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64)
> ..{noformat}
> and this message in the MCF UI:
>  
> {noformat}
> Connection failed: Documentum error: No DocBrokers are configured{noformat}
>  
>  
> I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done 
> in that ticket. While setting up 2.11 from scratch I encountered it again.
>  
> Once I have edited the run.sh script I get this in the console, showing that 
> (for whatever reason) the change is significant:
>  
> {noformat}
> Reading DFC configuration from 
> "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties"
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-17 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745422#comment-16745422
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~michael-o] thanks for trying this.

I await Erlend's more precise description of his setup.  We are in fact setting 
up the HttpClientBuilder exactly as you recommend:

{code}
RequestConfig.Builder requestBuilder = RequestConfig.custom()
  .setCircularRedirectsAllowed(true)
  .setSocketTimeout(socketTimeout)
  .setExpectContinueEnabled(true)
  .setConnectTimeout(connectionTimeout)
  .setConnectionRequestTimeout(socketTimeout);

HttpClientBuilder clientBuilder = HttpClients.custom()
  .setConnectionManager(connectionManager)
  .disableAutomaticRetries()
  .setDefaultRequestConfig(requestBuilder.build())
  .setRedirectStrategy(new LaxRedirectStrategy())
  .setRequestExecutor(new HttpRequestExecutor(socketTimeout));
{code}
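For reference, "preemptive" basic authentication simply means attaching the Authorization header to the very first request instead of waiting for a 401 challenge. Independent of the HttpClientBuilder wiring above, the header value itself is just Base64 of "user:password" -- a JDK-only sketch (the credentials are made up for illustration):

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveAuthDemo {
    public static void main(String[] args) {
        // Hypothetical credentials, for illustration only
        String user = "solr-admin";
        String pass = "secret";

        // This is the header a preemptive client sends on its first request
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + pass).getBytes(StandardCharsets.UTF_8));
        System.out.println("Authorization: Basic " + token);
    }
}
```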



> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>    Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more effective instead of the following:
>  * Send a HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request, but with a "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties

2019-01-17 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16745418#comment-16745418
 ] 

Karl Wright commented on CONNECTORS-1535:
-

Can you put dfc.properties in the same directory as the other DFC files and 
have it be found?


> Documentum Connector cannot find dfc.properties
> ---
>
> Key: CONNECTORS-1535
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1535
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11
> Environment: Manifold 2.11
> CentOS Linux release 7.5.1804 (Core)
> OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
>  
>Reporter: James Thomas
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> I have found that when installing a clean MCF instance I cannot get 
> Documentum repository connectors to connect to Documentum until I have added 
> this line to the processes/documentum-server/run.sh script before the call to 
> Java:
>  
> {code:java}
> CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code}
> Until I do this, attempts to save the connector will result in this output to 
> the console:
>  
> {noformat}
> 4 [RMI TCP Connection(2)-127.0.0.1] ERROR 
> com.documentum.fc.common.impl.preferences.PreferencesManager  - 
> [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null
> java.io.FileNotFoundException: dfc.properties
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.<init>(PreferencesManager.java:37)
>     at 
> com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64)
> ..{noformat}
> and this message in the MCF UI:
>  
> {noformat}
> Connection failed: Documentum error: No DocBrokers are configured{noformat}
>  
>  
> I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done 
> in that ticket. While setting up 2.11 from scratch I encountered it again.
>  
> Once I have edited the run.sh script I get this in the console, showing that 
> (for whatever reason) the change is significant:
>  
> {noformat}
> Reading DFC configuration from 
> "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties"
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties

2019-01-17 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1535:

Fix Version/s: ManifoldCF 2.13

> Documentum Connector cannot find dfc.properties
> ---
>
> Key: CONNECTORS-1535
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1535
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11
> Environment: Manifold 2.11
> CentOS Linux release 7.5.1804 (Core)
> OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
>  
>Reporter: James Thomas
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> I have found that when installing a clean MCF instance I cannot get 
> Documentum repository connectors to connect to Documentum until I have added 
> this line to the processes/documentum-server/run.sh script before the call to 
> Java:
>  
> {code:java}
> CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code}
> Until I do this, attempts to save the connector will result in this output to 
> the console:
>  
> {noformat}
> 4 [RMI TCP Connection(2)-127.0.0.1] ERROR 
> com.documentum.fc.common.impl.preferences.PreferencesManager  - 
> [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null
> java.io.FileNotFoundException: dfc.properties
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.<init>(PreferencesManager.java:37)
>     at 
> com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64)
> ..{noformat}
> and this message in the MCF UI:
>  
> {noformat}
> Connection failed: Documentum error: No DocBrokers are configured{noformat}
>  
>  
> I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done 
> in that ticket. While setting up 2.11 from scratch I encountered it again.
>  
> Once I have edited the run.sh script I get this in the console, showing that 
> (for whatever reason) the change is significant:
>  
> {noformat}
> Reading DFC configuration from 
> "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties"
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CONNECTORS-1535) Documentum Connector cannot find dfc.properties

2019-01-17 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1535:
---

Assignee: Karl Wright

> Documentum Connector cannot find dfc.properties
> ---
>
> Key: CONNECTORS-1535
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1535
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.10, ManifoldCF 2.11
> Environment: Manifold 2.11
> CentOS Linux release 7.5.1804 (Core)
> OpenJDK 64-Bit Server VM 18.9 (build 11+28, mixed mode)
>  
>Reporter: James Thomas
>Assignee: Karl Wright
>Priority: Major
>
> I have found that when installing a clean MCF instance I cannot get 
> Documentum repository connectors to connect to Documentum until I have added 
> this line to the processes/documentum-server/run.sh script before the call to 
> Java:
>  
> {code:java}
> CLASSPATH="$CLASSPATH""$PATHSEP""$DOCUMENTUM"{code}
> Until I do this, attempts to save the connector will result in this output to 
> the console:
>  
> {noformat}
> 4 [RMI TCP Connection(2)-127.0.0.1] ERROR 
> com.documentum.fc.common.impl.preferences.PreferencesManager  - 
> [DFC_PREFERENCE_LOAD_FAILED] Failed to load persistent preferences from null
> java.io.FileNotFoundException: dfc.properties
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.locateMainPersistentStore(PreferencesManager.java:378)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.readPersistentProperties(PreferencesManager.java:329)
>     at 
> com.documentum.fc.common.impl.preferences.PreferencesManager.<init>(PreferencesManager.java:37)
>     at 
> com.documentum.fc.common.DfPreferences.initialize(DfPreferences.java:64)
> ..{noformat}
> and this message in the MCF UI:
>  
> {noformat}
> Connection failed: Documentum error: No DocBrokers are configured{noformat}
>  
>  
> I mentioned this in #1512 for MCF 2.10 but it got lost in the other work done 
> in that ticket. While setting up 2.11 from scratch I encountered it again.
>  
> Once I have edited the run.sh script I get this in the console, showing that 
> (for whatever reason) the change is significant:
>  
> {noformat}
> Reading DFC configuration from 
> "file:/opt/manifold/apache-manifoldcf-2.11/processes/documentum-server/dfc.properties"
> {noformat}
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: FW: ManifoldCF Documentum connector slowness

2019-01-17 Thread Karl Wright
Hi,

HSQLDB is actually reasonably fast, but it has other problems, namely that
it stores whole DB tables in memory, so if your crawl is large enough it
will run out of memory.

The reason for Documentum connector slowness is almost always poor
Documentum performance, and has nothing to do with MCF itself.  You can
prove that by looking at the Simple History and seeing how long it takes to
fetch documents (for example).

Karl




On Thu, Jan 17, 2019 at 9:05 AM Gomathi Palanisamy <
gpalanis...@worldbankgroup.org> wrote:

>
>
> Can Documentum connector crawling performance be improved by changing
> the MCF default HSQLDB to PostgreSQL? Any suggestions?
>
>
>
> *From:* Gomathi Palanisamy
> *Sent:* Wednesday, January 16, 2019 2:54 PM
> *To:* 'user-i...@manifoldcf.apache.org' ;
> 'user-...@manifoldcf.apache.org' 
> *Subject:* RE: ManifoldCF Documentum connector slowness
>
>
>
>
>
> Hi Team,
>
>
>
> We are crawling data from a DCTM repository using the ManifoldCF Documentum
> connector and writing the crawled data to MongoDB. Crawling was triggered with
> a throttling value of 500, but crawling speed is very slow: the connector is
> fetching only 170 documents per minute. The server where MCF is installed is
> configured with enough memory and 8 logical cores (CPU). Can someone help
> us here to improve crawling speed?
>
>
>
> Thanks,
>
> Gomathi
>


Re: DidMCF 2.12 change the JSON API response format?

2019-01-17 Thread Karl Wright
The output format did change, and the reason was because the "syntactic
sugar" format would not preserve ordering, so that if you output and
re-input, you'd lose information.

The more complex form is being used only where there is a possibility of
ordering confusion.  It was always accepted as input (so that has not
changed).

Thanks,
Karl


On Thu, Jan 17, 2019 at 1:44 AM James Thomas 
wrote:

> Hi,
>
> After installing MCF 2.12 I am seeing a change in the JSON response format
> from the REST API.
>
> Here's an example, from the URL mcf-api-service/json/jobs. I've isolated a
> job which exists in both my 2.11 and 2.12 instances. 2.11 is on the left:
>
>
>
> The 2.12 format feels "internal" and I see that
> https://manifoldcf.apache.org/release/release-2.12/api/framework/constant-values.html
> shows e.g. "_type_", "_children_" as constants (but that's also true in
> 2.11).
>
> I can't see a release note that talks about this change, although
> "CONNECTORS-1549: Problem with API output JSON: losing order in child
> records." talks about JSON format and the test data for that change (
> https://issues.apache.org/jira/secure/attachment/12944641/CONNECTORS-1549.patch)
> appears to change the JSON format as in my observation.
>
> Is this change expected, and if so is there a way to request the pre-2.12
> format?
>
> Cheers,
> James
>
>
> --
> James Thomas
> Head of Testing
>


Re: SharedDriveConnector - jcifs.smb.SmbException: A device attached to the system is not functioning

2019-01-17 Thread Karl Wright
That error is coming from the server you're trying to index.  Sounds like
some kind of hardware problem is being detected.

Karl


On Thu, Jan 17, 2019 at 2:17 AM  wrote:

>
>
> Hi,
>
> I've got a problem with the Windows Shares connector.
> While indexing data on a fileserver I get the following exception after
> approximately 1 documents have been indexed.
>
> jcifs.smb.SmbException: A device attached to the system is not functioning.
> at jcifs.smb.SmbTransport.checkStatus(SmbTransport.java:563) ~
> [jcifs.jar:?]
> at jcifs.smb.SmbTransport.send(SmbTransport.java:640) ~
> [jcifs.jar:?]
> at jcifs.smb.SmbSession.send(SmbSession.java:238) ~[jcifs.jar:?]
> at jcifs.smb.SmbTree.send(SmbTree.java:119) ~[jcifs.jar:?]
> at jcifs.smb.SmbFile.send(SmbFile.java:775) ~[jcifs.jar:?]
> at jcifs.smb.SmbFile.doFindFirstNext(SmbFile.java:1989) ~
> [jcifs.jar:?]
> at jcifs.smb.SmbFile.doEnum(SmbFile.java:1741) ~[jcifs.jar:?]
> at jcifs.smb.SmbFile.listFiles(SmbFile.java:1718) ~[jcifs.jar:?]
> at jcifs.smb.SmbFile.listFiles(SmbFile.java:1707) ~[jcifs.jar:?]
> at
>
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileListFiles
> (SharedDriveConnector.java:2318) [mcf-jcifs-connector.jar:?]
>
> at
>
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments
> (SharedDriveConnector.java:798) [mcf-jcifs-connector.jar:?]
>
> at org.apache.manifoldcf.crawler.system.WorkerThread.run
> (WorkerThread.java:399) [mcf-pull-agent.jar:?]
> ERROR 2019-01-16T08:48:00,172 (Worker thread '14') - JCIFS: SmbException
> tossed processing smb://??.??.??.???/dir/dir/dir
>
> I'm using ManifoldCF 2.11 and jcifs-1.3.19.jar
>
> Do you have an idea what I could do, or even a solution for it?
>
> Kind regards,
>
> Florjana
>
> 
> The exchange of e-mail messages is for purposes of information only and
> only intended for the recipient.
> This medium may not be used to exchange legal declarations. If you are not
> the intended recipient, please contact us immediately by e-mail or phone
> and delete this message from your system.


[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743186#comment-16743186
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~michael-o], any updates?


> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>    Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more efficient, instead of the following:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request again, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.
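
A minimal sketch of what a preemptive client sends; the credentials and header construction here are illustrative, not the actual Solr connector code:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class PreemptiveAuth {
    // Build the header a preemptive client attaches to its very first
    // request, instead of waiting for Solr's 401 challenge.
    static String basicHeader(String user, String password) {
        String token = Base64.getEncoder().encodeToString(
                (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) {
        // One round trip instead of request / 401 / retry.
        System.out.println(basicHeader("user", "pass")); // Basic dXNlcjpwYXNz
    }
}
```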



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743033#comment-16743033
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please also see this discussion:

https://issues.apache.org/jira/browse/CONNECTORS-1533



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743028#comment-16743028
 ] 

Karl Wright commented on CONNECTORS-1563:
-

First, I asked for the Simple History, not the manifoldcf logs.  What does the 
simple history say about document ingestions for the connection in question 
with the new configuration?

But, from your solr log:

{code}
2019-01-15 11:51:54.211 ERROR (qtp592617454-22) [   x:eesolr_webcrawler] 
o.a.s.s.HttpSolrCall null:org.apache.solr.common.SolrException: 
org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:234)
{code}

Note that the stack trace is from the ExtractingDocumentLoader, which is Tika.  
You did not manage to actually change the output handler to the non-extracting 
one, possibly because you have configured your Solr in a non-default way.  I 
cannot debug that for you, sorry.

Can you do the following:  Download the current 7.x version of Solr, fresh, and 
extract it.  Start it using the standard provided simple scripts.  Point 
ManifoldCF at it and crawl some documents, using the setup for the connection I 
have described.  Does that work?  If it does, and I expect it to because that 
is what works for me here, then it is your job to figure out what you did to 
Solr to make that not work.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: Document simple history.docx, managed-schema, manifold 
> settings.docx, manifoldcf.log, solr.log, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16743006#comment-16743006
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please include [INFO] messages from the Solr log for example indexing requests, 
and also include records from the Simple History for documents indexed with the 
new configuration.  Thanks.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, manifold settings.docx, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-15 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16742949#comment-16742949
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Please view the Solr connection and click the button that tells it to forget 
about everything it has indexed.  That will force reindexing.  That's a 
standard step when you change configuration like this and you want all 
documents to be reindexed.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, manifold settings.docx, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1570) ManifoldCF Documentum connector crawling performance

2019-01-12 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741333#comment-16741333
 ] 

Karl Wright commented on CONNECTORS-1570:
-

Please ask your question on the us...@manifoldcf.apache.org list.
In our experience, the performance of Documentum itself is the bottleneck, and 
nothing can be done without optimizing for that.


> ManifoldCF Documentum connector crawling performance
> ---
>
> Key: CONNECTORS-1570
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1570
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.9.1
>Reporter: Gomahti
>Priority: Major
>
> We are crawling data from a DCTM repository using the ManifoldCF Documentum 
> connector and writing the crawled data to MongoDB. Crawling is triggered with 
> a throttling value of 500, but crawling speed is very slow: the connector is 
> fetching only 170 documents per minute. The server where MCF is installed is 
> configured with enough memory and 8 logical cores (CPUs). Can someone help 
> us here improve the crawling speed?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (CONNECTORS-1570) ManifoldCF Documentum connector crawling performance

2019-01-12 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1570.
-
Resolution: Not A Problem

> ManifoldCF Documentum connector crawling performance
> ---
>
> Key: CONNECTORS-1570
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1570
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Documentum connector
>Affects Versions: ManifoldCF 2.9.1
>Reporter: Gomahti
>Priority: Major
>
> We are crawling data from a DCTM repository using the ManifoldCF Documentum 
> connector and writing the crawled data to MongoDB. Crawling is triggered with 
> a throttling value of 500, but crawling speed is very slow: the connector is 
> fetching only 170 documents per minute. The server where MCF is installed is 
> configured with enough memory and 8 logical cores (CPUs). Can someone help 
> us here improve the crawling speed?
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16741004#comment-16741004
 ] 

Karl Wright commented on CONNECTORS-1563:
-

{quote}
I need to pass one custom field and value from ManifoldCF which I want to see 
in the Solr index. That is the reason why I used the metadata transformer, 
where I can pass the custom field in the job's Metadata Adjuster tab.
{quote}

Yes, people do that all the time.  Just add the Metadata Adjuster any place in 
your pipeline and have it inject the field value you want.  It will be 
faithfully transmitted to Solr.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740587#comment-16740587
 ] 

Karl Wright commented on CONNECTORS-1563:
-

The metadata extractor can go anywhere in your pipeline, after Tika extraction. 
 There is absolutely no point in having *two* Tika extractions though -- and 
that's what you're trying to do with the setup you've got.

What I'd recommend is that you use only the ManifoldCF-side Tika extractor, and 
inject content into Solr using the /update handler, not the /update/extract 
handler.  There's also a checkbox you'd need to uncheck in the Solr connection 
configuration. It's all covered in the ManifoldCF end user documentation.
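
As a hedged sketch of the difference: with MCF-side Tika extraction, the document posted to the standard /update handler already carries its extracted text as an ordinary field. The field names and JSON shape below are illustrative, not the connector's actual request-building code:

```java
public class SolrUpdateBody {
    // Build a JSON body for the standard /update handler: fields,
    // including the Tika-extracted text, are already populated upstream
    // in the MCF pipeline, so Solr Cell (/update/extract) is not needed.
    // Note: content is assumed pre-escaped; real code would JSON-escape it.
    static String updateBody(String id, String content) {
        return "[{\"id\":\"" + id + "\",\"content\":\"" + content + "\"}]";
    }

    public static void main(String[] args) {
        // POSTed to /solr/<core>/update (not /update/extract) with
        // Content-Type: application/json.
        System.out.println(updateBody("doc-1", "extracted text"));
    }
}
```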



> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740435#comment-16740435
 ] 

Karl Wright commented on CONNECTORS-1563:
-

{quote}
Solr cell with standard update handler...
{quote}

This is not Option 2; it's a combination of (1) and (2) and is not a model that 
we support.


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1563) SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 bytes

2019-01-11 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16740330#comment-16740330
 ] 

Karl Wright commented on CONNECTORS-1563:
-

Can you tell me which configuration you are attempting:

(1) Solr Cell + extract update handler + no Tika content extraction in MCF, or
(2) NO Solr Cell + standard update handler + Tika content extraction in MCF

Which is it?


> SolrException: org.apache.tika.exception.ZeroByteFileException: InputStream 
> must have > 0 bytes
> ---
>
> Key: CONNECTORS-1563
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1563
> Project: ManifoldCF
>  Issue Type: Task
>  Components: Lucene/SOLR connector
>Reporter: Sneha
>Assignee: Karl Wright
>Priority: Major
> Attachments: managed-schema, solrconfig.xml
>
>
> I am encountering this problem:
> I have checked the "Use the Extract Update Handler" parameter, and I get an 
> error on Solr, i.e. null:org.apache.solr.common.SolrException: 
> org.apache.tika.exception.ZeroByteFileException: InputStream must have > 0 
> bytes
> If I ignore the Tika exception, my documents get indexed but don't have a 
> content field in Solr.
> I am using Solr 7.3.1 and ManifoldCF 2.8.1.
> I am using Solr Cell and hence have not configured an external Tika extractor 
> in the ManifoldCF pipeline.
> Please help me with this problem.
> Thanks in advance



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1569) IBM WebSEAL authentication

2019-01-10 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16739541#comment-16739541
 ] 

Karl Wright commented on CONNECTORS-1569:
-

I'm not sure what the best approach might be for this since almost everyone 
wants the expect-continue in place.  It's essential, in fact, for 
authenticating properly via POST on many other systems.

Adding a way of disabling this via the UI is plausible but it's significant 
work all around.  Still, I think that would be the best approach to meet your 
needs.  Unfortunately I'm already booked at least until March, so you may do 
best by trying to submit a patch that I can integrate and/or clean up.

> IBM WebSEAL authentication
> --
>
> Key: CONNECTORS-1569
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1569
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifold 2.11
>  IBM WebSEAL
>Reporter: Ferdi Klomp
>Assignee: Karl Wright
>Priority: Major
>  Labels: ManifoldCF
>
> Hi,
> We have stumbled upon a problem with the Web Connector authentication in 
> relation to IBM WebSEAL: we were unable to authenticate successfully 
> against WebSEAL. After some time debugging, we figured out that the web 
> connector sends an "Expect: 100-continue" header, and this is not 
> supported by WebSEAL:
>  [https://www-01.ibm.com/support/docview.wss?uid=swg21626421]
> 1. Disabling the "Expect: 100-continue" functionality by setting 
> setExpectContinueEnabled to false in "ThrottledFetcher.java" eventually 
> solved the problem. The exact line can be found here:
>  
> [https://github.com/apache/manifoldcf/blob/trunk/connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/ThrottledFetcher.java#L508]
> I'm not sure if this option is required for other environments, whether it 
> can be disabled by default, or whether it should be made configurable.
> 2. Another option would be to make the timeout configurable, as the WebSEAL 
> docs state: "The browser needs to have some kind of timeout to send the 
> request body before exceeding intra-connection-timeout." By default, the web 
> connector request timeout exceeded the intra-connection-timeout of WebSEAL.
> What is the best way to proceed and get a fix for this in the web connector?
> Kind regards,
>  Ferdi



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CONNECTORS-1569) IBM WebSEAL authentication

2019-01-10 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1569:
---

Assignee: Karl Wright

> IBM WebSEAL authentication
> --
>
> Key: CONNECTORS-1569
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1569
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifold 2.11
>  IBM WebSEAL
>Reporter: Ferdi Klomp
>Assignee: Karl Wright
>Priority: Major
>  Labels: ManifoldCF
>
> Hi,
> We have stumbled upon a problem with the Web Connector authentication in 
> relation to IBM WebSEAL: we were unable to authenticate successfully 
> against WebSEAL. After some time debugging, we figured out that the web 
> connector sends an "Expect: 100-continue" header, and this is not 
> supported by WebSEAL:
>  [https://www-01.ibm.com/support/docview.wss?uid=swg21626421]
> 1. Disabling the "Expect: 100-continue" functionality by setting 
> setExpectContinueEnabled to false in "ThrottledFetcher.java" eventually 
> solved the problem. The exact line can be found here:
>  
> [https://github.com/apache/manifoldcf/blob/trunk/connectors/webcrawler/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/webcrawler/ThrottledFetcher.java#L508]
> I'm not sure if this option is required for other environments, whether it 
> can be disabled by default, or whether it should be made configurable.
> 2. Another option would be to make the timeout configurable, as the WebSEAL 
> docs state: "The browser needs to have some kind of timeout to send the 
> request body before exceeding intra-connection-timeout." By default, the web 
> connector request timeout exceeded the intra-connection-timeout of WebSEAL.
> What is the best way to proceed and get a fix for this in the web connector?
> Kind regards,
>  Ferdi



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-09 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738271#comment-16738271
 ] 

Karl Wright commented on CONNECTORS-1562:
-

The "Stream has been closed" issue is occurring because it is simply taking too 
long to read all the data from the sitemap page, and the webserver is closing 
the connection before it's complete.  Alternatively, it might be because the 
server is configured to cut pages off after a certain number of bytes.  I don't 
know which one it is.  You will need to do some research to figure out what 
your server's rules look like.  The preferred solution would be to simply relax 
the rules for that one page.

However, if that's not possible, the best alternative would be to break the 
sitemap page up into pieces.  If each piece was, say 1/4 the size, it might be 
small enough to get past your current rules.
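
The splitting step can be sketched as follows. This is a hypothetical helper, not MCF code; real sitemap pieces would also need their XML wrapper and a sitemap index file referencing each piece:

```java
import java.util.ArrayList;
import java.util.List;

public class SitemapSplitter {
    // Partition one oversized URL list into a fixed number of pieces so
    // each resulting sitemap stays under the server's size/time limits.
    static List<List<String>> split(List<String> urls, int pieces) {
        int per = (urls.size() + pieces - 1) / pieces; // ceiling division
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < urls.size(); i += per) {
            out.add(new ArrayList<>(
                    urls.subList(i, Math.min(i + per, urls.size()))));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> urls = new ArrayList<>();
        for (int i = 0; i < 10; i++) urls.add("https://example.com/page" + i);
        // 10 URLs into 4 pieces: sizes 3, 3, 3, 1.
        System.out.println(split(urls, 4).size());
    }
}
```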


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, 
> manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-09 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16738232#comment-16738232
 ] 

Karl Wright commented on CONNECTORS-1562:
-

We already discussed the IOEXCEPTION issue; that's because of throttling and 
the connection closing is likely occurring on the server side.

For the NULLPOINTEREXCEPTION, there is a stack trace dumped to the ManifoldCF 
log.  Can you find it and create a ticket with it?  Thanks!


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> image-2019-01-09-14-20-50-616.png, manifoldcf.log.cleanup, 
> manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Facing Error while executing the job After sometime

2019-01-09 Thread Karl Wright
This is a serious fatal error of some kind, and we need a complete stack
trace to address it.  The JVM stops giving complete stack traces after an
exception repeats a certain number of times, so you will need to go back far
enough in the log to find where a complete trace was dumped.
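
One likely reason for the truncated traces (an assumption about your JVM, worth verifying) is HotSpot's fast-throw optimization, which drops stack traces for frequently thrown implicit exceptions such as ArrayIndexOutOfBoundsException. It can be disabled via the options used to start the MCF process:

```
# HotSpot JVMs only: keep full stack traces for hot implicit exceptions
-XX:-OmitStackTraceInFastThrow
```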

Thanks,
Karl


On Wed, Jan 9, 2019 at 3:18 AM Nikita Ahuja  wrote:

>
> Hi Mates,
>
>
> I am executing a Web connector job with the ManifoldCF version 2.12 on the
> linux server, with given specification:
>
> Repository Connector: Web
> Transformation: Tika Extractor
> Transformation: Metadata Adjuster
> Output: ElasticSearch
>
> But the issue is that after crawling some records, I see the below error
> in the ManifoldCF log, and the service gets deactivated on the server;
> after a restart it works normally.
>
> *FATAL 2019-01-08T15:24:03,103 (Worker thread '1') - Error tossed: null*
> *java.lang.ArrayIndexOutOfBoundsException*
> *FATAL 2019-01-08T15:24:28,285 (Worker thread '14') - Error tossed: null*
> *java.lang.ArrayIndexOutOfBoundsException*
> *FATAL 2019-01-08T15:24:41,702 (Worker thread '30') - Error tossed: null*
> *java.lang.ArrayIndexOutOfBoundsException*
>
> Please suggest any steps that should be followed.
>
> Thanks,
> Nikita
>
> --
> Thanks and Regards,
> Nikita
> Email: nik...@smartshore.nl
> United Sources Service Pvt. Ltd.
> a "Smartshore" Company
> Mobile: +91 99 888 57720
> http://www.smartshore.nl
>


[jira] [Resolved] (CONNECTORS-1568) UI error imported web connection

2019-01-09 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1568.
-
Resolution: Fixed

A minor fix has been committed that makes the UI robust against a missing 
truststore in the connection definition.  Otherwise, it sounds like the user 
found an error in their process and that resolved the issue for them.


> UI error imported web connection
> 
>
> Key: CONNECTORS-1568
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1568
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> Using the ManifoldCF API, we export a web repository connector with basic 
> settings.
>  Then we import the web connector using the ManifoldCF API.
>  The connector gets imported and can be used in a job.
>  When trying to view or edit the connector in the UI, the following error pops up.
> (connected to issue: 
> [CONNECTORS-1567)|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1567]
> *HTTP ERROR 500*
>  Problem accessing /mcf-crawler-ui/editconnection.jsp. Reason:
>      Server Error
> *Caused by:*
> {code:java}
> org.apache.jasper.JasperException: An exception occurred processing JSP page 
> /editconnection.jsp at line 564
> 561:
> 562: if (className.length() > 0)
> 563: {
> 564:   
> RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new
>  
> org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
> 565: }
> 566: %>
> 567:
> Stacktrace:
>     at 
> org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
>     at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
>     at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>     at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>     at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>     at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
>     at 
> org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.(KeystoreManager.java:86)
>     at 
> org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(Web

[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-09 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737950#comment-16737950
 ] 

Karl Wright commented on CONNECTORS-1567:
-

The best way to construct an API request for any connection or job is to create 
it in the UI and then export it.  The documentation is correct but it is hard 
to pick through all the details, and the UI is easier.  So that is what I would 
do if I were trying to verify everything worked.

Unfortunately, because ManifoldCF was forced to remove a JSON jar we depended 
on due to a legal ruling by the Board, we had to retrofit a different (and not 
as good) JSON jar in place a few years back, and that had all sorts of 
downstream effects on the API JSON format.  We did not need to change the 
specification, but we did need to change how we output certain constructs to 
JSON to not use the syntactic sugar we earlier could use.  I fixed a bug in 
this area in either MCF 2.10 or 2.11, so anything output before that time might 
not have reimported faithfully.  Hope that helps with the explanation.


> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
> Attachments: bandwidth.png, bandwidth_test_abc.png
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Then, when importing this connector into a clean ManifoldCF, it creates the 
> connector with the default bandwidth settings.
>  When using the connector in a job, it works properly.
> The issue here is that the connector isn't created with the correct bandwidth 
> throttling.
>  And the connector causes issues in the UI when trying to view or edit it.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}





[jira] [Resolved] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1567.
-
Resolution: Cannot Reproduce

Was already fixed; the JSON reported was old-form and thus not necessarily 
correct.


> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Then, when importing this connector into a clean ManifoldCF, it creates the 
> connector with the default bandwidth settings.
>  When using the connector in a job, it works properly.
> The issue here is that the connector isn't created with the correct bandwidth 
> throttling.
>  And the connector causes issues in the UI when trying to view or edit it.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}





[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737581#comment-16737581
 ] 

Karl Wright commented on CONNECTORS-1567:
-

Reading and writing sides match.

In XML, the format would look like this:

{code}
<repositoryconnection>
  ...
  <throttle>
    <match>match_value</match>
    <matchdescription>description</matchdescription>
    <rate>rate_value</rate>
  </throttle>
</repositoryconnection>
{code}

This gets translated to JSON, which should merge the "throttle" fields into one 
throttle array, like this:

{code}
throttle: [ {... first throttle ... }, {... second throttle ... } ...]
{code}

That's obviously not happening and I need to figure out why.
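The intended behavior can be sketched as follows. This is a hypothetical illustration, not ManifoldCF's actual converter code; the class and method names are invented. The point is that repeated sibling nodes with the same name should collapse into one JSON array rather than a later value overwriting an earlier one:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of XML-to-JSON child merging: a name seen once maps
// to its value; a name seen more than once maps to a List of all values.
public class ThrottleMergeSketch {
    @SuppressWarnings("unchecked")
    static Map<String, Object> merge(List<Map.Entry<String, String>> children) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> child : children) {
            Object existing = out.get(child.getKey());
            if (existing == null) {
                // First occurrence: store the scalar value directly.
                out.put(child.getKey(), child.getValue());
            } else if (existing instanceof List) {
                // Third or later occurrence: append to the existing array.
                ((List<Object>) existing).add(child.getValue());
            } else {
                // Second occurrence: promote the scalar to an array.
                List<Object> mergedValues = new ArrayList<>();
                mergedValues.add(existing);
                mergedValues.add(child.getValue());
                out.put(child.getKey(), mergedValues);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> json = merge(List.of(
            Map.entry("throttle", "first throttle"),
            Map.entry("throttle", "second throttle"),
            Map.entry("max_connections", "10")));
        // Both throttle nodes land in one array; a naive converter that
        // just put each child into a map would keep only the last one.
        System.out.println(json);
    }
}
```

A converter that skips the promotion step in the middle branch would exhibit exactly the reported symptom: only one (or none) of the throttle nodes surviving the round trip.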


> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Then, when importing this connector into a clean ManifoldCF, it creates the 
> connector with the default bandwidth settings.
>  When using the connector in a job, it works properly.
> The issue here is that the connector isn't created with the correct bandwidth 
> throttling.
>  And the connector causes issues in the UI when trying to view or edit it.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}





[jira] [Commented] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737559#comment-16737559
 ] 

Karl Wright commented on CONNECTORS-1567:
-

The output code is common to all connections, and looks correct:

{code}
String[] throttles = connection.getThrottles();
j = 0;
while (j < throttles.length)
{
  String match = throttles[j++];
  String description = connection.getThrottleDescription(match);
  float rate = connection.getThrottleValue(match);
  child = new ConfigurationNode(CONNECTIONNODE_THROTTLE);
  ConfigurationNode throttleChildNode;

  throttleChildNode = new ConfigurationNode(CONNECTIONNODE_MATCH);
  throttleChildNode.setValue(match);
  child.addChild(child.getChildCount(),throttleChildNode);

  if (description != null)
  {
throttleChildNode = new 
ConfigurationNode(CONNECTIONNODE_MATCHDESCRIPTION);
throttleChildNode.setValue(description);
child.addChild(child.getChildCount(),throttleChildNode);
  }

  throttleChildNode = new ConfigurationNode(CONNECTIONNODE_RATE);
  throttleChildNode.setValue(new Float(rate).toString());
  child.addChild(child.getChildCount(),throttleChildNode);

  connectionNode.addChild(connectionNode.getChildCount(),child);
}
{code}

Note that the throttles are an array, so if there are no throttles, one should 
expect null or an empty array to be output.  Checking the reading side now.

> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Then, when importing this connector into a clean ManifoldCF, it creates the 
> connector with the default bandwidth settings.
>  When using the connector in a job, it works properly.
> The issue here is that the connector isn't created with the correct bandwidth 
> throttling.
>  And the connector causes issues in the UI when trying to view or edit it.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}





[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737201#comment-16737201
 ] 

Karl Wright commented on CONNECTORS-1562:
-

You are correct; the hopcount of zero will capture the whitelist, and a 
hopcount of 1 will capture everything the whitelist refers to.  My apologies.


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.





[jira] [Updated] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1567:

Fix Version/s: ManifoldCF 2.13

> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling.
>  Then, when importing this connector into a clean ManifoldCF, it creates the 
> connector with the default bandwidth settings.
>  When using the connector in a job, it works properly.
> The issue here is that the connector isn't created with the correct bandwidth 
> throttling.
>  And the connector causes issues in the UI when trying to view or edit it.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}





[jira] [Commented] (CONNECTORS-1568) UI error imported web connection

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737081#comment-16737081
 ] 

Karl Wright commented on CONNECTORS-1568:
-

The UI error is to be expected when the configuration data is corrupted, 
although I've already committed a fix for this particular brand of corruption.  
The bug is that a web configuration that is exported and then reimported gets 
corrupted.


> UI error imported web connection
> 
>
> Key: CONNECTORS-1568
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1568
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> Using the ManifoldCF API, we export a web repository connector with basic 
> settings.
>  Then we import the web connector using the ManifoldCF API.
>  The connector gets imported and can be used in a job.
>  When trying to view or edit the connector in the UI, the following error pops up.
> (connected to issue: 
> [CONNECTORS-1567)|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1567]
> *HTTP ERROR 500*
>  Problem accessing /mcf-crawler-ui/editconnection.jsp. Reason:
>      Server Error
> *Caused by:*
> {code:java}
> org.apache.jasper.JasperException: An exception occurred processing JSP page 
> /editconnection.jsp at line 564
> 561:
> 562: if (className.length() > 0)
> 563: {
> 564:   
> RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new
>  
> org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
> 565: }
> 566: %>
> 567:
> Stacktrace:
>     at 
> org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
>     at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
>     at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>     at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>     at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>     at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
>     at 
> org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.(KeystoreManager.java:86)
>     at 
> org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody

[jira] [Updated] (CONNECTORS-1568) UI error imported web connection

2019-01-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1568:

Fix Version/s: ManifoldCF 2.13

> UI error imported web connection
> 
>
> Key: CONNECTORS-1568
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1568
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.13
>
>
> Using the ManifoldCF API, we export a web repository connector with basic 
> settings.
>  Then we import the web connector using the ManifoldCF API.
>  The connector gets imported and can be used in a job.
>  When trying to view or edit the connector in the UI, the following error pops up.
> (connected to issue: 
> [CONNECTORS-1567)|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1567]
> *HTTP ERROR 500*
>  Problem accessing /mcf-crawler-ui/editconnection.jsp. Reason:
>      Server Error
> *Caused by:*
> {code:java}
> org.apache.jasper.JasperException: An exception occurred processing JSP page 
> /editconnection.jsp at line 564
> 561:
> 562: if (className.length() > 0)
> 563: {
> 564:   
> RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new
>  
> org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
> 565: }
> 566: %>
> 567:
> Stacktrace:
>     at 
> org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
>     at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
>     at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>     at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>     at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>     at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
>     at 
> org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.(KeystoreManager.java:86)
>     at 
> org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(WebcrawlerConnector.java:1866)
>     at 
> org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationBody(ConnectorFactory.java:83)
>     at 
> org.apache.manifoldcf.crawler.interfaces.RepositoryConnector

[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16737070#comment-16737070
 ] 

Karl Wright commented on CONNECTORS-1564:
-

[~michael-o], Erlend provided the code above, and it supposedly enables the 
Expect header.  Obviously that code is not working for some reason.  Can you 
review the code and tell us what we are doing wrong?


> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>    Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more efficient by avoiding the following exchange:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.
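The idea can be illustrated with plain JDK classes (the actual connector uses Apache HttpClient; the URL and credentials below are placeholders, and the class and method names are invented for this sketch):

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of preemptive basic auth: build the Authorization header ourselves
// and attach it to the first request, instead of waiting for a 401 challenge.
public class PreemptiveAuthSketch {
    // Construct the same header value the server would otherwise demand
    // via a "401 Unauthorized" / "WWW-Authenticate: Basic" round trip.
    static String basicAuthHeader(String user, String password) {
        String token = Base64.getEncoder()
            .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return "Basic " + token;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder Solr endpoint and credentials.
        URL url = new URL("http://localhost:8983/solr/collection1/update");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        // Sending the header up front avoids the extra 401 exchange.
        conn.setRequestProperty("Authorization", basicAuthHeader("solr", "secret"));
    }
}
```

With HttpClient, the equivalent effect is usually achieved by pre-populating an auth cache for the target host so the client does not wait for the challenge.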





[jira] [Assigned] (CONNECTORS-1568) UI error imported web connection

2019-01-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1568:
---

Assignee: Karl Wright

> UI error imported web connection
> 
>
> Key: CONNECTORS-1568
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1568
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
>
> Using the ManifoldCF API, we export a web repository connector with basic 
> settings.
>  Then we import the web connector using the ManifoldCF API.
>  The connector gets imported and can be used in a job.
>  When trying to view or edit the connector in the UI, the following error pops up.
> (connected to issue: 
> [CONNECTORS-1567)|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1567]
> *HTTP ERROR 500*
>  Problem accessing /mcf-crawler-ui/editconnection.jsp. Reason:
>      Server Error
> *Caused by:*
> {code:java}
> org.apache.jasper.JasperException: An exception occurred processing JSP page 
> /editconnection.jsp at line 564
> 561:
> 562: if (className.length() > 0)
> 563: {
> 564:   
> RepositoryConnectorFactory.outputConfigurationBody(threadContext,className,new
>  
> org.apache.manifoldcf.ui.jsp.JspWrapper(out,adminprofile),pageContext.getRequest().getLocale(),parameters,tabName);
> 565: }
> 566: %>
> 567:
> Stacktrace:
>     at 
> org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:521)
>     at 
> org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:430)
>     at 
> org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:313)
>     at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:260)
>     at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
>     at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:769)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>     at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1125)
>     at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
>     at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
>     at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1059)
>     at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
>     at 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)
>     at 
> org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
>     at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)
>     at org.eclipse.jetty.server.Server.handle(Server.java:497)
>     at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:311)
>     at 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:248)
>     at 
> org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:610)
>     at 
> org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:539)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>     at org.apache.manifoldcf.core.common.Base64.decodeString(Base64.java:164)
>     at 
> org.apache.manifoldcf.connectorcommon.keystore.KeystoreManager.(KeystoreManager.java:86)
>     at 
> org.apache.manifoldcf.connectorcommon.interfaces.KeystoreManagerFactory.make(KeystoreManagerFactory.java:47)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.fillInCertificatesTab(WebcrawlerConnector.java:1701)
>     at 
> org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.outputConfigurationBody(WebcrawlerConnector.java:1866)
>     at 
> org.apache.manifoldcf.core.interfaces.ConnectorFactory.outputThisConfigurationBody(ConnectorFactory.java:83)
>     at 
> org.apache.manifoldcf.crawler.interfaces.RepositoryConnectorFactor

[jira] [Comment Edited] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736881#comment-16736881
 ] 

Karl Wright edited comment on CONNECTORS-1562 at 1/8/19 11:39 AM:
--

Good to know that you got beyond the crawling issue.
If you run any MCF job to completion, all no-longer-present documents should be 
removed from the index.  That applies to web jobs too.  So I expect that to 
work as per design.

If you remove a document from the site map, and you want MCF to pick up that 
the document is now unreachable and should be removed, you can do this by 
setting a hopcount maximum that is large but also selecting "delete unreachable 
documents".  The only thing I'd caution you about if you use this approach is 
that links BETWEEN documents will also be traversed, so if you want the sitemap 
to be a whitelist then you want hopcount max = 2.



was (Author: kwri...@metacarta.com):
Good to know that you got beyond the crawling issue.
If you run any MCF job to completion, all no-longer-present documents should be 
removed from the index.  That applies to web jobs too.  So I expect that to 
work as per design.

If you remove a document from the site map, and you want MCF to pick up that 
the document is now unreachable and should be removed, you can do this by 
setting a hopcount maximum that is large but also selecting "delete unreachable 
documents".  The only thing I'd caution you about if you use this approach is 
that links BETWEEN documents will also be traversed, so if you want the sitemap 
to be a whitelist then you want hopcount max = 1.


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.





[jira] [Assigned] (CONNECTORS-1567) export of web connection bandwidth throttling

2019-01-08 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1567:
---

Assignee: Karl Wright

> export of web connection bandwidth throttling
> -
>
> Key: CONNECTORS-1567
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1567
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12
>Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Major
>
> When exporting the web connector using the API, it doesn't export the 
> bandwidth throttling settings.
>  Then, when importing this connector into a clean ManifoldCF, it creates the 
> connector with default bandwidth settings.
>  When the connector is used in a job, it works properly.
> The issue here is that the connector isn't created with the correct bandwidth 
> throttling.
>  The connector also causes errors in the UI when trying to view or edit it.
> (related to issue: 
> [CONNECTORS-1568|https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1568])
> e.g.:
> {code:java}
> {
>   "name": "test_web",
>   "configuration": null,
> "_PARAMETER_": [
>   {
> "_attribute_name": "Email address",
> "_value_": "tim.steenbeke@formica.digital"
>   },
>   {
> "_attribute_name": "Robots usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Meta robots tags usage",
> "_value_": "all"
>   },
>   {
> "_attribute_name": "Proxy host",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy port",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication domain",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication user name",
> "_value_": ""
>   },
>   {
> "_attribute_name": "Proxy authentication password",
> "_value_": ""
>   }
> ]
>   },
>   "description": "Website repository standard settup",
>   "throttle": null,
>   "max_connections": 10,
>   "class_name": 
> "org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector",
>   "acl_authority": null
> }{code}





[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736881#comment-16736881
 ] 

Karl Wright commented on CONNECTORS-1562:
-

Good to know that you got beyond the crawling issue.
If you run any MCF job to completion, all no-longer-present documents should be 
removed from the index.  That applies to web jobs too.  So I expect that to 
work as per design.



> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.





[jira] [Comment Edited] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736881#comment-16736881
 ] 

Karl Wright edited comment on CONNECTORS-1562 at 1/8/19 8:38 AM:
-

Good to know that you got beyond the crawling issue.
If you run any MCF job to completion, all no-longer-present documents should be 
removed from the index.  That applies to web jobs too.  So I expect that to 
work as per design.

If you remove a document from the site map, and you want MCF to pick up that 
the document is now unreachable and should be removed, you can do this by 
setting a hopcount maximum that is large but also selecting "delete unreachable 
documents".  The only thing I'd caution you about if you use this approach is 
that links BETWEEN documents will also be traversed, so if you want the sitemap 
to be a whitelist then you want hopcount max = 1.



was (Author: kwri...@metacarta.com):
Good to know that you got beyond the crawling issue.
If you run any MCF job to completion, all no-longer-present documents should be 
removed from the index.  That applies to web jobs too.  So I expect that to 
work as per design.



> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.





[jira] [Commented] (CONNECTORS-1565) Upgrade commons-collections to 3.2.2 (CVE-2015-6420)

2019-01-08 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736878#comment-16736878
 ] 

Karl Wright commented on CONNECTORS-1565:
-

I'm concerned that we would break something, because it essentially disables 
behavior (you now need to turn the behavior on explicitly if you want it).  
Nevertheless, if all the integration tests we have pass, I'm OK with it.  The 
worst that can happen is that somebody will open a ticket against one of our 
connectors and we'll have to roll it back.


> Upgrade commons-collections to 3.2.2 (CVE-2015-6420)
> 
>
> Key: CONNECTORS-1565
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1565
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 2.12
>Reporter: Markus Schuch
>Assignee: Markus Schuch
>Priority: Critical
> Fix For: ManifoldCF next
>
>
> We should upgrade commons-collections to 3.2.2 due to a known security issue 
> with 3.2.1
> https://commons.apache.org/proper/commons-collections/security-reports.html
> Further reading:
> [http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-andyour-application-have-in-common-this-vulnerability/]
> [https://www.cvedetails.com/cve/CVE-2015-6420/]





[jira] [Created] (CONNECTORS-1566) Develop CSWS connector as a replacement for deprecated LiveLink LAPI connector

2019-01-07 Thread Karl Wright (JIRA)
Karl Wright created CONNECTORS-1566:
---

 Summary: Develop CSWS connector as a replacement for deprecated 
LiveLink LAPI connector
 Key: CONNECTORS-1566
 URL: https://issues.apache.org/jira/browse/CONNECTORS-1566
 Project: ManifoldCF
  Issue Type: Task
  Components: LiveLink connector
Affects Versions: ManifoldCF 2.12
Reporter: Karl Wright
Assignee: Karl Wright
 Fix For: ManifoldCF 2.13


LAPI is being deprecated.  We need to develop a replacement for it using the 
ContentServer Web Services API.







[jira] [Commented] (CONNECTORS-1565) Upgrade commons-collections to 3.2.2 (CVE-2015-6420)

2019-01-07 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736011#comment-16736011
 ] 

Karl Wright commented on CONNECTORS-1565:
-

This CVE applies only to deserialization of collections over the wire.  We 
don't do any of that.  It's possible that some connector's client library does 
this but if so the connector client library would need to be updated as well, 
so we'd have to wait for that to happen anyway.


> Upgrade commons-collections to 3.2.2 (CVE-2015-6420)
> 
>
> Key: CONNECTORS-1565
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1565
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 2.12
>Reporter: Markus Schuch
>Assignee: Markus Schuch
>Priority: Critical
> Fix For: ManifoldCF next
>
>
> We should upgrade commons-collections to 3.2.2 due to a known security issue 
> with 3.2.1
> https://commons.apache.org/proper/commons-collections/security-reports.html
> Further reading:
> [http://foxglovesecurity.com/2015/11/06/what-do-weblogic-websphere-jboss-jenkins-opennms-andyour-application-have-in-common-this-vulnerability/]
> [https://www.cvedetails.com/cve/CVE-2015-6420/]





[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-07 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16736008#comment-16736008
 ] 

Karl Wright commented on CONNECTORS-1562:
-

The API, either on export or reimport, did not write or read the SSL keystore 
properly.  That warrants a new ticket, as does the export of web connection 
bandwidth throttling, if it's indeed not there.  It would be helpful to include 
"steps to reproduce" so I can put together an appropriate unit test as well.


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.





[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-07 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16735596#comment-16735596
 ] 

Karl Wright commented on CONNECTORS-1562:
-

Hmm, never seen that particular error before.  I don't get it here, obviously.  
It looks like there's a configuration parameter (specifically, the keystore 
binary object) that has been lost and that's upsetting the UI.  Did you edit 
the configuration using the API?



> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web inputconnector
> elastic outputconnecotr
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from ElasticSearch index after rerunning the 
> changed seeds
> I update my job to change the seedmap and rerun it or use the schedualer to 
> keep it runneng even after updating it.
> After the rerun the unreachable documents don't get deleted.
> It only adds doucments when they can be reached.





[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-04 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16734041#comment-16734041
 ] 

Karl Wright commented on CONNECTORS-1564:
-

Oleg says:

{quote}
Could you please ask the contributor if he has considered using
AuthCache to implement preemptive BASIC authentication as described
here?

http://hc.apache.org/httpcomponents-client-4.5.x/httpclient/examples/org/apache/http/examples/client/ClientPreemptiveBasicAuthentication.java
{quote}
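To make the discussion concrete: with preemptive Basic authentication the client computes the {{Authorization}} header itself and attaches it to the first request, instead of waiting for a 401 challenge. The following stdlib-only sketch shows what that header contains; the class and method names are hypothetical, and this is an illustration of the mechanism, not the actual patch or the AuthCache approach Oleg mentions.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical illustration (not ManifoldCF code) of the header a client
// sends when it authenticates preemptively, i.e. on the FIRST request.
public class PreemptiveBasicAuth {

    // Builds the value of the "Authorization: Basic <base64(user:pass)>" header.
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        String header = basicAuthHeader("solr", "secret");
        // A real client would attach this before sending the POST, e.g.:
        //   conn.setRequestProperty("Authorization", header);
        System.out.println(header);  // prints: Basic c29scjpzZWNyZXQ=
    }
}
```

Sending this header up front avoids the 401 round trip entirely, which also sidesteps the non-repeatable-request-entity problem that arises when the first POST has to be replayed.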


> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>    Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more efficient than the following exchange:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.





Re: Pre-emptive authorization

2019-01-03 Thread Karl Wright
Well, I don't actually see anything wrong with the idea of sending the auth
header right up front and not requiring a whole extra back-and-forth to
authorize.  NTLM needs that but basic auth doesn't in theory.  What is
wrong with what they are doing?  Do you have a spec I can present to them?

Karl


On Thu, Jan 3, 2019 at 12:21 PM Michael Osipov  wrote:

> Am 2019-01-03 um 17:33 schrieb Karl Wright:
> > Hi Oleg et al,
> >
> > One ManifoldCF user has an unusual requirement for basic auth that
> requires
> > the auth header to be sent pre-emptively, not as a consequence of
> receiving
> > a 401 response.  He proposes the following patch for ManifoldCF, but I
> > wonder whether there's a better way to do this with existing
> > HttpComponents/HttpClient code.
> >
> > Here's the patch link:
> >
> >
> https://issues.apache.org/jira/secure/attachment/12953640/CONNECTORS-1564.patch
> >
> > Any thoughts?
>
> I consider this to be a solution to a symptom, not a problem.
>
> Michael
>


Pre-emptive authorization

2019-01-03 Thread Karl Wright
Hi Oleg et al,

One ManifoldCF user has an unusual requirement for basic auth that requires
the auth header to be sent pre-emptively, not as a consequence of receiving
a 401 response.  He proposes the following patch for ManifoldCF, but I
wonder whether there's a better way to do this with existing
HttpComponents/HttpClient code.

Here's the patch link:

https://issues.apache.org/jira/secure/attachment/12953640/CONNECTORS-1564.patch

Any thoughts?

Karl


[jira] [Commented] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-03 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733198#comment-16733198
 ] 

Karl Wright commented on CONNECTORS-1564:
-

I think we should open a conversation with HttpComponents/HttpClient about 
this.  I'll start an email thread.


> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>    Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more efficient than the following exchange:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.





[jira] [Assigned] (CONNECTORS-1564) Support preemptive authentication to Solr connector

2019-01-03 Thread Karl Wright (JIRA)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1564:
---

Assignee: Karl Wright

> Support preemptive authentication to Solr connector
> ---
>
> Key: CONNECTORS-1564
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1564
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Reporter: Erlend Garåsen
>    Assignee: Karl Wright
>Priority: Major
> Attachments: CONNECTORS-1564.patch
>
>
> We should post preemptively in case the Solr server requires basic 
> authentication. This will make the communication between ManifoldCF and Solr 
> much more efficient than the following exchange:
>  * Send an HTTP POST request to Solr
>  * Solr sends a 401 response
>  * Send the same request, but with an "{{Authorization: Basic}}" header
> With preemptive authentication, we can send the header in the first request.





[jira] [Commented] (CONNECTORS-1557) HTML Tag extractor

2019-01-03 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16733194#comment-16733194
 ] 

Karl Wright commented on CONNECTORS-1557:
-

We really cannot support two slightly different HTML extractors, so I'm 
uncomfortable committing this as-is, unless it's structured as a 
backwards-compatible extension of the existing extractor.  Therefore, can you 
explain in detail what you did, and what specific functional changes you made? 
Thanks.


> HTML Tag extractor
> --
>
> Key: CONNECTORS-1557
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1557
> Project: ManifoldCF
>  Issue Type: New Feature
>Affects Versions: ManifoldCF 2.11
>Reporter: Donald Van den Driessche
>    Assignee: Karl Wright
>Priority: Major
> Attachments: html-tag-extraction-connector.zip
>
>
> I wrote an HTML tag extractor, based on the HTML Extractor.
> I needed to extract specific HTML tags and transfer them to their own field 
> in my output repository.
> Input
>  * Englobing tag (CSS selector)
>  * Blacklist (CSS selector)
>  * Fieldmapping (CSS selector)
>  * Strip HTML
> Process
>  * Retrieve Englobing tag
>  * Remove blacklist
>  * Map selected CSS selectors in Fieldmapping (arrays if multiple finds) + 
> strip HTML (if requested)
>  * Englobing tag minus blacklist: strip HTML (if requested) and return as 
> output (content)
> How can I best deliver the source code?
>  
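The process described above ends with an optional "strip HTML" step once the englobing tag has been selected and the blacklist removed. The CSS-selector steps would normally use an HTML parser library; as a stdlib-only illustration of just the stripping step, here is a minimal sketch (the class name is hypothetical, and a regex like this is not robust against arbitrary real-world HTML):

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the "strip HTML (if requested)" step: remove tags
// and collapse the whitespace they leave behind.
public class StripHtmlSketch {

    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    static String stripHtml(String html) {
        return TAG.matcher(html).replaceAll(" ")
                  .replaceAll("\\s+", " ")
                  .trim();
    }

    public static void main(String[] args) {
        String fragment = "<div id=\"content\"><p>Hello <b>World</b></p></div>";
        System.out.println(stripHtml(fragment));  // prints: Hello World
    }
}
```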





Re: Unexpected job status encountered

2019-01-03 Thread Karl Wright
Please do create a ticket with a patch.  I'm extremely curious.

Depending on what you're proposing, I think a valid approach might need to
be to propose appropriate changes to the HttpComponents/HttpClient library.

Karl


On Thu, Jan 3, 2019 at 7:52 AM Erlend Garåsen 
wrote:

>
> It works now because I have implemented preemptive authentication. I'll
> create a ticket, because this is something I think we should support.
>
> I have analyzed the logs once again. MCF never tries to authenticate.
> Well, it tries, but it cannot repeat the request entity. That's why I
> mentioned that preemptive authentication could be a solution. Then we
> only need to post to Solr once, not doing the unnecessary two-step
> authentication process by:
> 1. Try to post
> 2. Solr server sends a 401 response
> 3. Try to post once again using the header: "Authorization: Basic **"
>
> It's not very efficient if you have to post, say, 100,000 documents.
>
> This is actually what happens:
> 1. http-outgoing-200 >> "POST /solr/uio/update/extract HTTP/1.1[\r][\n]"
> 2. http-outgoing-200 << "HTTP/1.1 401 Unauthorized[\r][\n]"
> 3. IO exception during indexing
> https://www.journals.uio.no/index.php/bioimpedance/article/view/3350: null
> org.apache.http.client.ClientProtocolException
> (Caused by: org.apache.http.client.NonRepeatableRequestException: Cannot
> retry request with a non-repeatable request entity.)
>
> By using preemptive authentication, the following is now being sent to
> Solr in the first request:
> http-outgoing-30 >> "POST /solr/uio/update/extract HTTP/1.1[\r][\n]"
> http-outgoing-30 >> "Authorization: Basic **[\r][\n]"
>
> Preemptive authentication is also suggested as a solution to other
> developers facing the same problem:
>
> https://developer.ibm.com/answers/questions/266117/im-getting-this-exception-trying-to-add-doc-to-wat/
>
> I can create a patch or PR. It's very easy to implement, and we have
> done it for all the other Solr connectors we have developed.
>
> Erlend
>


[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2019-01-02 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16732089#comment-16732089
 ] 

Karl Wright commented on CONNECTORS-1562:
-

What might be happening is that the fetch is throttled in bandwidth, and that 
means that it is taking a very long time and the server is giving up and 
closing the connection because it takes too long.

Since you're only crawling your own site, you might want to disable bandwidth 
throttling entirely in your web connection configuration, and try again.


> Documents unreachable due to hopcount are not considered unreachable on 
> cleanup pass
> 
>
> Key: CONNECTORS-1562
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1562
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector, Web connector
>Affects Versions: ManifoldCF 2.11
> Environment: Manifoldcf 2.11
> Elasticsearch 6.3.2
> Web input connector
> Elastic output connector
> Job crawls website input and outputs content to elastic
>    Reporter: Tim Steenbeke
>Assignee: Karl Wright
>Priority: Critical
>  Labels: starter
> Fix For: ManifoldCF 2.12
>
> Attachments: Screenshot from 2018-12-31 11-17-29.png, 
> manifoldcf.log.cleanup, manifoldcf.log.init, manifoldcf.log.reduced
>
>   Original Estimate: 4h
>  Remaining Estimate: 4h
>
> My documents aren't removed from the ElasticSearch index after rerunning the 
> changed seeds.
> I update my job to change the seed map and rerun it, or use the scheduler to 
> keep it running even after updating it.
> After the rerun, the unreachable documents don't get deleted.
> It only adds documents when they can be reached.





Re: Congratulations to the new Lucene/Solr PMC chair, Cassandra Targett

2018-12-31 Thread Karl Wright
Congratulations!


On Mon, Dec 31, 2018 at 5:20 PM Michael Sokolov  wrote:

> Heavy is the head that wears the crown - congrats and thank you! And
> here's to a peaceful transition of power in the new year :)
>
> On Mon, Dec 31, 2018 at 1:39 PM Dawid Weiss  wrote:
> >
> > Congratulations, Cassandra!
> >
> > On Mon, Dec 31, 2018 at 7:04 PM Gus Heck  wrote:
> > >
> > > Congratulations :)
> > >
> > > On Mon, Dec 31, 2018, 12:48 PM Alexandre Rafalovitch <
> arafa...@gmail.com wrote:
> > >>
> > >> Congratulations.
> > >>
> > >> Regards,
> > >>Alex
> > >>
> > >> On Mon, 31 Dec 2018 at 11:31, David Smiley 
> wrote:
> > >> >
> > >> > Congrats Cassandra!
> > >> >
> > >> > On Mon, Dec 31, 2018 at 11:28 AM Erick Erickson <
> erickerick...@gmail.com> wrote:
> > >> >>
> > >> >> Congrats Cassandra!
> > >> >>
> > >> >> On Sun, Dec 30, 2018 at 11:38 PM Adrien Grand 
> wrote:
> > >> >> >
> > >> >> > Every year, the Lucene PMC rotates the Lucene PMC chair and
> Apache
> > >> >> > Vice President position.
> > >> >> >
> > >> >> > This year we have nominated and elected Cassandra Targett as the
> > >> >> > chair, a decision that the board approved in its December 2018
> > >> >> > meeting.
> > >> >> >
> > >> >> > Congratulations, Cassandra!
> > >> >> >
> > >> >> > --
> > >> >> > Adrien
> > >> >> >
> > >> >> >
> -
> > >> >> > To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
> > >> >> > For additional commands, e-mail: dev-h...@lucene.apache.org
> > >> >> >
> > >> >>
> > >> >>
> > >> >>
> > >> > --
> > >> > Lucene/Solr Search Committer (PMC), Developer, Author, Speaker
> > >> > LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
> http://www.solrenterprisesearchserver.com
> > >>
> > >>
> >
> >
>
>
>


[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2018-12-31 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731338#comment-16731338
 ] 

Karl Wright commented on CONNECTORS-1562:
-

Yes, that's the error.  Specifically:

{code}
Caused by: java.io.IOException: Stream Closed
at java.io.FileInputStream.readBytes(Native Method) ~[?:1.8.0_191]
at java.io.FileInputStream.read(FileInputStream.java:255) ~[?:1.8.0_191]
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) 
~[?:1.8.0_191]
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) 
~[?:1.8.0_191]
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) ~[?:1.8.0_191]
at java.io.InputStreamReader.read(InputStreamReader.java:184) 
~[?:1.8.0_191]
at 
org.apache.manifoldcf.agents.output.elasticsearch.ElasticSearchIndex$IndexRequestEntity.writeTo(ElasticSearchIndex.java:221)
 ~[?:?]
{code}

What's happening is that a document is being streamed to ElasticSearch.  The 
input stream for the document is being read to do that.  But the stream is 
being closed early by the web connector for some reason before it's entirely 
read.  It's not clear why; it could be a difference between the size reported 
by the content type and the actual number of bytes being read, or it could be 
the actual web service closing the stream early at some point.

At any rate, it is *one* specific document doing this.  If you can figure out 
which document it is, I may be able to come up with a solution.  Is it a very 
large document?  When you try to fetch the document using (say) curl, does it 
completely fetch?  etc.
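
The failure mode described above can be reproduced in isolation. The sketch below (plain JDK, not ManifoldCF code; class and method names are illustrative) shows that reading from a FileInputStream after it has been closed raises the same IOException("Stream Closed") seen in the stack trace from IndexRequestEntity.writeTo:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class StreamClosedDemo {

    // Simulates the repository connector closing the document's input
    // stream before the output connector has finished streaming it to
    // Elasticsearch. The subsequent read fails with "Stream Closed".
    public static String readAfterClose() throws IOException {
        File tmp = File.createTempFile("doc", ".txt");
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write("document body".getBytes());
        }
        FileInputStream in = new FileInputStream(tmp);
        in.close();                    // closed early, before the reader is done
        try {
            in.read(new byte[16]);     // what writeTo would attempt next
            return "no error";
        } catch (IOException e) {
            return e.getMessage();     // "Stream Closed" on HotSpot JDKs
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterClose());
    }
}
```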




[jira] [Commented] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2018-12-31 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731290#comment-16731290
 ] 

Karl Wright commented on CONNECTORS-1562:
-

{code}
Error: Repeated service interruptions - failure processing document: Stream 
Closed
{code}

This is not a crash; this just means that the job aborts.  It also comes with 
numerous stack traces, one for each time the document retries.  That stack 
trace would be very helpful to have.  Thanks!




[jira] [Comment Edited] (CONNECTORS-1562) Documents unreachable due to hopcount are not considered unreachable on cleanup pass

2018-12-31 Thread Karl Wright (JIRA)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16731290#comment-16731290
 ] 

Karl Wright edited comment on CONNECTORS-1562 at 12/31/18 12:06 PM:


{code}
Error: Repeated service interruptions - failure processing document: Stream 
Closed
{code}

This is not a crash; this just means that the job aborts.  It also comes with 
numerous stack traces, in the manifoldcf log, one for each time the document 
retries.  That stack trace would be very helpful to have.  Thanks!



was (Author: kwri...@metacarta.com):
{code}
Error: Repeated service interruptions - failure processing document: Stream 
Closed
{code}

This is not a crash; this just means that the job aborts.  It also comes with 
numerous stack traces, one for each time the document retries.  That stack 
trace would be very helpful to have.  Thanks!




Re: Unexpected job status encountered

2018-12-27 Thread Karl Wright
Thanks for looking harder into this!

The credential encoding in httpcomponents/httpclient has been problem free
as far as I have seen, so if you determine that that's the issue I am sure
it will be news to a lot of people.  But by using the wire logging you
should be able to see the headers, including the encoded credentials, and
compare/contrast what's working and what's not pretty easily.

Karl
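
For reference, the wire logging discussed above is usually enabled in ManifoldCF's logging.xml via httpclient's standard wire logger. The fragment below is illustrative, assuming a log4j2-style configuration; the logger name "org.apache.http.wire" is httpclient's, while the placement within your existing appender setup will vary:

```xml
<!-- Illustrative fragment: turns on httpclient wire logging, which
     dumps every byte sent/received, including auth headers. -->
<Loggers>
  <Logger name="org.apache.http.wire" level="DEBUG"/>
  <Logger name="org.apache.http.headers" level="DEBUG"/>
</Loggers>
```

Note that wire logging is extremely verbose and will record credentials in the log, so it should be disabled again once the headers have been inspected.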


On Thu, Dec 27, 2018 at 5:42 AM Erlend Garåsen 
wrote:

>
> It wasn't necessary to deal with tools like tcpdump etc. Adding the
> following to the logging.xml did the trick:
> 
>   
> 
>
> So now I know what's going on. Bad credentials:
>
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "HTTP/1.1 401 Unauthorized[\r][\n]"
>
> Strange, because connection is working according to the Solr Output
> Connector. I'll double-check whether the Solr server has another
> password for index writing (path "/solr/uio/update/extract"). Or maybe
> we have an encoding issue with the password since it's long and contains
> special characters.
>
> --8<--
>
> DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >>
> " [\n]"
> DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >>
> " [\n]"
> DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >>
> "[\n]"
> DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >> "[\n]"
> DEBUG 2018-12-27T11:18:41,591 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
> DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >>
> "2f[\r][\n]"
> DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
> DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >>
> "**[\r][\n]"
> DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
> DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >>
> "0[\r][\n]"
> DEBUG 2018-12-27T11:18:41,592 (Thread-1508) - http-outgoing-2 >> "[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "HTTP/1.1 401 Unauthorized[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 << "Date:
> Thu, 27 Dec 2018 10:18:41 GMT[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "Server: Apache/2.4.6 (Red Hat Enterprise Linux)
> OpenSSL/1.0.2k-fips[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "WWW-Authenticate: Basic realm="Solr"[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "Content-Length: 381[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "Keep-Alive: timeout=10, max=100[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "Connection: Keep-Alive[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 <<
> "Content-Type: text/html; charset=iso-8859-1[\r][\n]"
> DEBUG 2018-12-27T11:18:41,593 (Thread-1508) - http-outgoing-2 << "[\r][\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "401 Unauthorized[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "Unauthorized[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "This server could not verify that you[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 << "are
> authorized to access the document[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "requested.  Either you supplied the wrong[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "credentials (e.g., bad password), or your[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "browser doesn't understand how to supply[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 << "the
> credentials required.[\n]"
> DEBUG 2018-12-27T11:18:41,594 (Thread-1508) - http-outgoing-2 <<
> "[\n]"
>  WARN 2018-12-27T11:18:41,599 (Worker thread '48') - IO exception during
> indexing https://www

[ANNOUNCE] Apache ManifoldCF 2.12 has been released

2018-12-23 Thread Karl Wright
On December 20th, we released Apache ManifoldCF 2.12.  It is available for
download from the Apache ManifoldCF 2.12 site here:
http://manifoldcf.apache.org .  Enjoy!!

Karl


