[jira] [Commented] (SOLR-8636) Incorrect distance returned for indexed polygon shapes

2021-01-20 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268965#comment-17268965
 ] 

Karl Wright commented on SOLR-8636:
---

Sorry, see GeoBaseMembershipShape.outsideDistance().


> Incorrect distance returned for indexed polygon shapes
> --
>
> Key: SOLR-8636
> URL: https://issues.apache.org/jira/browse/SOLR-8636
> Project: Solr
>  Issue Type: Bug
>  Components: spatial
>Affects Versions: 5.2.1
>Reporter: Rahul Jain
>Assignee: David Smiley
>Priority: Major
>
> We have a location_rpt field with multivalued=true and we are indexing 
> multiple shapes of type LINESTRING() in a single spatial field per document. 
> We are using JTS for spatial and polygon indexing and filtering.
> Solr query:
> q={!geofilt sfield=geo pt=-27,153 score=distance d=50}&fl=*,score
> For the above query, we get the results perfectly fine (i.e. documents with at
> least one intersecting shape are returned), but the returned distance has
> the following behavior:
> 1. When only shapes (LINESTRING(), LINESTRING()) are indexed, the
> distance returned is 180 degrees, or 20015.115 km.
> 2. When only points are indexed, the distance to the nearest point is
> returned.
> 3. When both points and shapes are indexed, the distance to the nearest point
> is returned.
> Using the above distance for sorting causes sorting to go haywire.
> Does Solr not return the distance it used during document filtering? Is there
> a workaround, or am I doing something wrong?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-8636) Incorrect distance returned for indexed polygon shapes

2021-01-20 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-8636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17268632#comment-17268632
 ] 

Karl Wright commented on SOLR-8636:
---

[~dsmiley], there is an "outside distance", which is the distance from a point 
to a shape.  There is no optimal way of computing distances from shapes to each 
other, however.  For polygons, especially large polygons, it would be 
computationally prohibitive to do it the naive way.
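To make the "outside distance" idea concrete for the LINESTRING case in this issue, here is a minimal sketch that computes the distance from a query point to the nearest segment of a polyline, and then takes the minimum over several indexed shapes, as a multivalued field would need. It is a flat x/y (planar) approximation for illustration only: Geo3D works on a spherical/ellipsoidal model, and the class and method names below are hypothetical, not part of the Geo3D or Solr API.

```java
import java.util.List;

/** Hypothetical helper: planar point-to-polyline distance, not Geo3D's algorithm. */
final class PolylineDistance {

    /** Distance from point (px, py) to segment (ax, ay)-(bx, by) in a flat plane. */
    static double pointToSegment(double px, double py,
                                 double ax, double ay, double bx, double by) {
        double dx = bx - ax;
        double dy = by - ay;
        double lenSq = dx * dx + dy * dy;
        // Project the point onto the segment's line, clamped to the segment's ends.
        double t = lenSq == 0.0 ? 0.0 : ((px - ax) * dx + (py - ay) * dy) / lenSq;
        t = Math.max(0.0, Math.min(1.0, t));
        double cx = ax + t * dx;
        double cy = ay + t * dy;
        return Math.hypot(px - cx, py - cy);
    }

    /** Minimum distance from the point to any segment of a polyline given as {x, y} vertices. */
    static double pointToPolyline(double px, double py, double[][] vertices) {
        if (vertices.length == 0) {
            throw new IllegalArgumentException("polyline needs at least one vertex");
        }
        if (vertices.length == 1) {
            return Math.hypot(px - vertices[0][0], py - vertices[0][1]);
        }
        double best = Double.POSITIVE_INFINITY;
        for (int i = 0; i + 1 < vertices.length; i++) {
            best = Math.min(best, pointToSegment(px, py,
                    vertices[i][0], vertices[i][1], vertices[i + 1][0], vertices[i + 1][1]));
        }
        return best;
    }

    /** For a multivalued field, the natural score is the minimum over all indexed shapes. */
    static double pointToNearestShape(double px, double py, List<double[][]> shapes) {
        double best = Double.POSITIVE_INFINITY;
        for (double[][] shape : shapes) {
            best = Math.min(best, pointToPolyline(px, py, shape));
        }
        return best;
    }
}
```

Even this naive version loops over every segment of every shape, which hints at why doing it in general (and for large polygons) gets expensive.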

Not all shapes support outside distance, by the way; only a subset do.  This
functionality is described (as usual) in Geo3D by an independent interface;
see GeoDistanceShape for how it is derived.

If you want full generality, therefore, you would need general solutions even
for some of the weirder shapes for which this is not currently implemented.  So
before going down this route, I would ask: what is the use case for such a
feature?


> Incorrect distance returned for indexed polygon shapes
> --
>
> Key: SOLR-8636
> URL: https://issues.apache.org/jira/browse/SOLR-8636
> Project: Solr
>  Issue Type: Bug
>  Components: spatial
>Affects Versions: 5.2.1
>Reporter: Rahul Jain
>Assignee: David Smiley
>Priority: Major
>
> We have a location_rpt field with multivalued=true and we are indexing 
> multiple shapes of type LINESTRING() in a single spatial field per document. 
> We are using JTS for spatial and polygon indexing and filtering.
> Solr query:
> q={!geofilt sfield=geo pt=-27,153 score=distance d=50}&fl=*,score
> For the above query, we get the results perfectly fine (i.e. documents with at
> least one intersecting shape are returned), but the returned distance has
> the following behavior:
> 1. When only shapes (LINESTRING(), LINESTRING()) are indexed, the
> distance returned is 180 degrees, or 20015.115 km.
> 2. When only points are indexed, the distance to the nearest point is
> returned.
> 3. When both points and shapes are indexed, the distance to the nearest point
> is returned.
> Using the above distance for sorting causes sorting to go haywire.
> Does Solr not return the distance it used during document filtering? Is there
> a workaround, or am I doing something wrong?






Re: Retry mechanism authority connectors

2021-01-11 Thread Karl Wright
When I was designing this, I thought long and hard about this question.
Since it is security related, in the end I decided to let the application
that called the authority service have whatever retry logic made sense.

I don't see any objection to retrying certain errors at the connector
level, but for framework handling, remember that ServiceInterruption on the
crawler side is hooked into the framework in a way that allows queue-based
retries over a long period of time.  A thrown ServiceInterruption can
specify a retry schedule, which the framework then carries out.  That works
only because, if a single document fails, there are plenty of other
documents to process.  For the authority service, the back-off would be
just dead wait time, and there's no integration with anything else in the
framework at all.  So it really doesn't matter where retries happen.
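If one does choose to retry inside an authority connector, the logic can be as small as the following sketch: a bounded number of attempts with exponential backoff, retrying only on I/O failures such as network errors. The RetryHelper class and its parameters are hypothetical and not part of the ManifoldCF API; as noted above, the back-off here is plain dead wait time for the caller.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

/** Hypothetical connector-level retry helper; not a ManifoldCF API. */
final class RetryHelper {

    /**
     * Calls {@code call} up to {@code maxAttempts} times, sleeping between attempts
     * with an exponentially growing delay. Only IOExceptions (e.g. network failures)
     * are retried; any other exception propagates immediately.
     */
    static <T> T withRetries(Callable<T> call, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        long delay = initialDelayMillis;
        for (int attempt = 1; ; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                if (attempt >= maxAttempts) {
                    throw e; // give up: the caller maps this to its own failure state
                }
                Thread.sleep(delay); // dead wait: nothing else proceeds meanwhile
                delay *= 2;
            }
        }
    }
}
```

Keeping the attempt count small matters here, since every retry adds directly to the latency seen by whoever is waiting on the authority check.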

Karl


On Mon, Jan 11, 2021 at 11:10 AM  wrote:

> Hi,
>
> I noticed that with authority connectors that perform API calls, occasional
> issues like network failures lead to DEAD_AUTHORITY. Assuming one wants to
> have a retry/fail mechanism, would you suggest that each connector should
> implement its own, or would you rather suggest - as is the case with
> other connectors - throwing a specific exception (like the
> ServiceInterruption) that triggers an already implemented retry/fail
> mechanism?
>
> Regards,
> Julien Massiera
>
>
>
>
>
>


Re: Job status stuck in terminating

2021-01-07 Thread Karl Wright
Hi,

Usually a job doesn't complete because a document is retrying
indefinitely.  You can see what's going on by looking at the Simple History
job report or, if you prefer, by tailing the manifoldcf log.

Other times a job won't complete because somebody shut down the agents
process.  But that is not the answer for the simple example single-process
deployment.
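One way to act on that advice is to look for documents that keep coming back with a non-OK result. The sketch below assumes the history has already been exported into simple (document, result code) pairs and that successful rows carry the result code OK; the HistoryRow type and that export step are hypothetical, not the actual Simple History report format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical analysis of exported history rows; not a ManifoldCF API. */
final class RetryingDocuments {

    /** One history row: which document was processed and the result code recorded. */
    static final class HistoryRow {
        final String documentId;
        final String resultCode;

        HistoryRow(String documentId, String resultCode) {
            this.documentId = documentId;
            this.resultCode = resultCode;
        }
    }

    /** Documents with at least {@code threshold} non-OK results, most failures first. */
    static List<String> suspects(List<HistoryRow> rows, int threshold) {
        Map<String, Integer> failures = new HashMap<>();
        for (HistoryRow row : rows) {
            if (!"OK".equals(row.resultCode)) {
                failures.merge(row.documentId, 1, Integer::sum);
            }
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : failures.entrySet()) {
            if (entry.getValue() >= threshold) {
                result.add(entry.getKey());
            }
        }
        // Sort by failure count descending, then by id for a stable, readable order.
        result.sort((a, b) -> {
            int byCount = Integer.compare(failures.get(b), failures.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return result;
    }
}
```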

Karl


On Thu, Jan 7, 2021 at 5:52 PM Isaac Kunz  wrote:

> I have a job that has been stuck in terminating for 12 hrs. It is a small test
> job, and I am wondering if there is a way to fix this. The job ran once and
> completed 175k documents. I modified the job's query and reseeded. The
> job was modified to process a smaller document set. I assume reseeding will
> allow the same documents to be indexed. I do not need the metadata for this
> job, so if needed I could clear it via the database if I knew how. I am a new user.
>
> Thanks,
>
> Isaac


[jira] [Comment Edited] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-01-05 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259310#comment-17259310
 ] 

Karl Wright edited comment on CONNECTORS-1661 at 1/6/21, 12:30 AM:
---

Hi [~kishorekumar], using the obfuscation utility in a Linux shell requires
proper quoting and/or escaping.  Otherwise the shell treats the & as a command
separator.  Examples:

{code}
kawright@1USDKAWRIGHT:/mnt/c/wip/mcf/trunk/dist/obfuscation-utility$ ./obfuscate.sh "#Sample%"
oxYHd75QgtNbohFHQZz/RY9ZwvkAWOSiOhFj7hby0Eg=
kawright@1USDKAWRIGHT:/mnt/c/wip/mcf/trunk/dist/obfuscation-utility$ ./obfuscate.sh "#Sample%&="
JHaGsB88QKSsF31Zea/8366cKmXUzENU2FDmBgxQjmM=
kawright@1USDKAWRIGHT:/mnt/c/wip/mcf/trunk/dist/obfuscation-utility$
{code}
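The quoting pitfall disappears entirely if the utility is launched without a shell. As a sketch, assuming a Unix-like system: Java's ProcessBuilder hands each argument to the child process as-is, so characters such as #, % and & are never interpreted. The VerbatimArgs class is hypothetical, and using it to drive obfuscate.sh is an assumption about how one might script this, not something the ManifoldCF distribution provides.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

/** Runs a command with arguments passed directly to the process (no shell parsing). */
final class VerbatimArgs {

    static String run(String... command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command)
                .redirectErrorStream(true) // merge stderr into stdout
                .start();
        String output;
        try (InputStream in = process.getInputStream()) {
            output = new String(in.readAllBytes(), StandardCharsets.UTF_8); // Java 9+
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("command exited with " + exitCode + ": " + output);
        }
        return output.trim();
    }
}
```

For example, run("./obfuscate.sh", "#Sample%&=") would pass the whole password as one argument, assuming the script itself forwards its argument intact.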




was (Author: kwri...@metacarta.com):
Hi [~kishorekumar], using the obfuscation utility in a Linux shell requires
proper quoting and/or escaping.  Otherwise the shell treats the & as a command
separator.


> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Kishore Kumar
>Priority: Critical
>
> Setting non-alphanumeric UTF-8 characters in the password for the admin user
> does not work when obfuscating the password and setting it through the 
> org.apache.manifoldcf.login.password.obfuscated parameter of the 
> properties.xml file.
> Alphanumeric characters work well.





[jira] [Commented] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-01-05 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259310#comment-17259310
 ] 

Karl Wright commented on CONNECTORS-1661:
-

Hi [~kishorekumar], using the obfuscation utility in a Linux shell requires
proper quoting and/or escaping.  Otherwise the shell treats the & as a command
separator.


> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Kishore Kumar
>Priority: Critical
>
> Setting non-alphanumeric UTF-8 characters in the password for the admin user
> does not work when obfuscating the password and setting it through the 
> org.apache.manifoldcf.login.password.obfuscated parameter of the 
> properties.xml file.
> Alphanumeric characters work well.





[jira] [Commented] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-01-04 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17258407#comment-17258407
 ] 

Karl Wright commented on CONNECTORS-1661:
-

This sounds like a UI issue to me.  Assigning to Kishore to see if he will look 
at it.


> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Priority: Critical
>
> Setting non-alphanumeric UTF-8 characters in the password for the admin user
> does not work when obfuscating the password and setting it through the 
> org.apache.manifoldcf.login.password.obfuscated parameter of the 
> properties.xml file.
> Alphanumeric characters work well.





[jira] [Assigned] (CONNECTORS-1661) Admin UI does not handle UTF8 passwords

2021-01-04 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1661:
---

Assignee: Kishore Kumar

> Admin UI does not handle UTF8 passwords
> ---
>
> Key: CONNECTORS-1661
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1661
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: API
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Kishore Kumar
>Priority: Critical
>
> Setting non-alphanumeric UTF-8 characters in the password for the admin user
> does not work when obfuscating the password and setting it through the 
> org.apache.manifoldcf.login.password.obfuscated parameter of the 
> properties.xml file.
> Alphanumeric characters work well.





Re: Indexation Not OK

2021-01-01 Thread Karl Wright
Hi,
I don't have the ability to delete mail from mailing lists.  You have to
ask Apache Infra to do that.

Karl


On Thu, Dec 31, 2020 at 11:38 AM Michael Cizmar 
wrote:

> Ritika – We have had some discussions regarding Docker, etc.  The
> public image that is out there builds a single node and does not use an
> RDBMS.  I would not recommend using that to index billions of documents.
> You can turn on debugging in the connector and look at the logs to see if
> that traffic is actually going to Elasticsearch.
>
>
>
> Karl – I believe Ritika said Elastic.
>
>
>
>
>
> --
>
> Michael Cizmar
>
>
>
> *From:* ritika jain 
> *Sent:* Thursday, December 31, 2020 7:33 AM
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Indexation Not OK
>
>
>
> Elasticsearch output connector, with some custom changes for some fields
>
> On Thursday, December 31, 2020, Karl Wright  wrote:
>
> Hi,
> Can you let us know what you are using for the output connector?
>
> Thanks,
>
> Karl
>
>
>
>
>
> On Thu, Dec 31, 2020 at 8:24 AM ritika jain 
> wrote:
>
> Hi,
>
>
>
> I am using ManifoldCF 2.14 and the JCIFS connector to ingest some billions of
> records into Elasticsearch.
>
> I am facing an issue: when the job runs, indexing succeeds for some time,
> but after a while ManifoldCF loops over the records and indexing no longer
> returns OK.
>
> It keeps retrying those specific records. To get it going again, I need to
> restart the Docker container every time, and after the restart indexing
> works fine for those records too.
>
> I also checked that the JSON generated by the Elasticsearch connector is
> fine, which confirms that the files themselves are not the problem.
>
> Can anybody please help me understand the reason for this?
>
>
>
> Thanks
>
> Ritika
>
>
>
>
>
>


Re: Indexation Not OK

2020-12-31 Thread Karl Wright
Sorry, I couldn't quite understand everything in your email, but it sounds
like the problem is in the ES connection.  It is possible that ES expires
your connection and the indexing fails after that happens.  If that is
happening, however, I would expect to see a much more detailed error
message in both the logs and in the simple history.  Can you provide any
error messages from the log that seem to be coming from the output
connection?

Thanks,
Karl


On Thu, Dec 31, 2020 at 8:30 AM Karl Wright  wrote:

> Hi,
> Can you let us know what you are using for the output connector?
> Thanks,
> Karl
>
>
> On Thu, Dec 31, 2020 at 8:24 AM ritika jain 
> wrote:
>
>> Hi,
>>
>> I am using ManifoldCF 2.14 and the JCIFS connector to ingest some billions
>> of records into Elasticsearch.
>> I am facing an issue: when the job runs, indexing succeeds for some time,
>> but after a while ManifoldCF loops over the records and indexing no longer
>> returns OK.
>>
>> [image: image.png]
>>
>> It keeps retrying those specific records. To get it going again, I need to
>> restart the Docker container every time, and after the restart indexing
>> works fine for those records too.
>> I also checked that the JSON generated by the Elasticsearch connector is
>> fine, which confirms that the files themselves are not the problem.
>> Can anybody please help me understand the reason for this?
>>
>> Thanks
>> Ritika
>>
>>
>>


Re: Indexation Not OK

2020-12-31 Thread Karl Wright
Hi,
Can you let us know what you are using for the output connector?
Thanks,
Karl


On Thu, Dec 31, 2020 at 8:24 AM ritika jain 
wrote:

> Hi,
>
> I am using ManifoldCF 2.14 and the JCIFS connector to ingest some billions of
> records into Elasticsearch.
> I am facing an issue: when the job runs, indexing succeeds for some time,
> but after a while ManifoldCF loops over the records and indexing no longer
> returns OK.
>
> [image: image.png]
>
> It keeps retrying those specific records. To get it going again, I need to
> restart the Docker container every time, and after the restart indexing
> works fine for those records too.
> I also checked that the JSON generated by the Elasticsearch connector is
> fine, which confirms that the files themselves are not the problem.
> Can anybody please help me understand the reason for this?
>
> Thanks
> Ritika
>
>
>


[RESULT] [VOTE] Release Apache ManifoldCF 2.18, RC0

2020-12-28 Thread Karl Wright
Three +1's, >72 hrs.  Vote passes!

Karl

On Mon, Dec 28, 2020 at 5:49 PM Furkan KAMACI 
wrote:

> Hi,
>
> + 1 from me (binding)!
>
> Kind Regards,
> Furkan KAMACI
>
> On 29 Dec 2020 Tue at 01:44 Karl Wright  wrote:
>
> > Still need another vote.  Anyone?
> >
> > Karl
> >
> > On Wed, Dec 23, 2020 at 8:06 PM Karl Wright  wrote:
> >
> > > Thanks!
> > >
> > > We just need one more vote.
> > > Karl
> > >
> > > On Wed, Dec 23, 2020 at 2:42 PM Markus Schuch 
> > > wrote:
> > >
> > >> Hi,
> > >>
> > >> Tested the single-process example from the 2.18-RC0 distribution.
> > >>
> > >> Successfully used the crawler-ui and ingested some docs to Solr 8.7.0.
> > >>
> > >> Also used Solr 8.7.0 as a repository with the new Solr ingester connector
> > >> and successfully ingested some Solr docs to a null output connection.
> > >>
> > >> +1 from me
> > >>
> > >> Cheers,
> > >> Markus
> > >>
> > >> On Dec 16, 2020 at 19:44, Karl Wright wrote:
> > >> > Hi,
> > >> >
> > >> > Please vote on whether to release Apache ManifoldCF 2.18, RC0.  The release
> > >> > artifact can be found here:
> > >> > https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.18 .
> > >> > There is also a release tag at
> > >> > https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.18-RC0 .
> > >> >
> > >> > This release has a brand new connector (the solr ingester connector), as
> > >> > well as build support for Java 11.  We still do not guarantee that all
> > >> > connectors operate properly under JDK 11, but hopefully we can get feedback
> > >> > from this release sufficient to close that hole.
> > >> >
> > >> > Thanks,
> > >> > Karl
> > >> >
> > >>
> > >
> >
>


Re: [VOTE] Release Apache ManifoldCF 2.18, RC0

2020-12-23 Thread Karl Wright
Thanks!

We just need one more vote.
Karl

On Wed, Dec 23, 2020 at 2:42 PM Markus Schuch  wrote:

> Hi,
>
> Tested the single-process example from the 2.18-RC0 distribution.
>
> Successfully used the crawler-ui and ingested some docs to Solr 8.7.0.
>
> Also used Solr 8.7.0 as a repository with the new Solr ingester connector
> and successfully ingested some Solr docs to a null output connection.
>
> +1 from me
>
> Cheers,
> Markus
>
> On Dec 16, 2020 at 19:44, Karl Wright wrote:
> > Hi,
> >
> > Please vote on whether to release Apache ManifoldCF 2.18, RC0.  The release
> > artifact can be found here:
> > https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.18 .
> > There is also a release tag at
> > https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.18-RC0 .
> >
> > This release has a brand new connector (the solr ingester connector), as
> > well as build support for Java 11.  We still do not guarantee that all
> > connectors operate properly under JDK 11, but hopefully we can get feedback
> > from this release sufficient to close that hole.
> >
> > Thanks,
> > Karl
> >
>


Re: [VOTE] Release Apache ManifoldCF 2.18, RC0

2020-12-23 Thread Karl Wright
Thank you for the verification!
Karl

On Wed, Dec 23, 2020 at 4:53 AM Cedric Ulmer 
wrote:

> Hi Karl,
>
> We are not committers, but for info we have updated MCF to 2.18RC0 in our
> Datafari 5.0Beta to see what happens; we have tested the following and it
> runs fine:
> 1.  Indexing with the web connector
> 2.  Indexing with the file connector (without security)
> 3.  Using a transformation connector for the file connector job
> 4.  Using the API to backup and restore jobs
> 5.  Using the API to restart MCF
> 6.  Updating password (without accents since it still does not work)
> 7.  Running with Java 11 openjdk
> 8.  Output connector is Solr 8.5.2
>
> Regards,
>
> Cedric and Olivier
>
> France Labs – Makers of Datafari Enterprise Search
> www.datafari.com
>
> -Original Message-
> From: Karl Wright
> Sent: Sunday, December 20, 2020 21:20
> To: dev
> Subject: Re: [VOTE] Release Apache ManifoldCF 2.18, RC0
>
> Tests passed, and I was able to use the UI too.  +1 from me.
>
> Karl
>
>
> On Wed, Dec 16, 2020 at 1:44 PM Karl Wright  wrote:
>
> > Hi,
> >
> > Please vote on whether to release Apache ManifoldCF 2.18, RC0.  The
> > release artifact can be found here:
> > https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.18
> > .  There is also a release tag at
> > https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.18-RC0 .
> >
> > This release has a brand new connector (the solr ingester connector),
> > as well as build support for Java 11.  We still do not guarantee that
> > all connectors operate properly under JDK 11, but hopefully we can get
> > feedback from this release sufficient to close that hole.
> >
> > Thanks,
> > Karl
> >
> >
>


Re: [VOTE] Release Apache ManifoldCF 2.18, RC0

2020-12-20 Thread Karl Wright
Tests passed, and I was able to use the UI too.  +1 from me.

Karl


On Wed, Dec 16, 2020 at 1:44 PM Karl Wright  wrote:

> Hi,
>
> Please vote on whether to release Apache ManifoldCF 2.18, RC0.  The
> release artifact can be found here:
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.18
> .  There is also a release tag at
> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.18-RC0 .
>
> This release has a brand new connector (the solr ingester connector), as
> well as build support for Java 11.  We still do not guarantee that all
> connectors operate properly under JDK 11, but hopefully we can get feedback
> from this release sufficient to close that hole.
>
> Thanks,
> Karl
>
>


Re: Password admin UI

2020-12-17 Thread Karl Wright
There's no issue I know of, provided that the API rework a couple of years
ago didn't break something.

So I would create a ticket and we'll see if someone can look at it.

Karl


On Thu, Dec 17, 2020 at 8:47 AM  wrote:

> I should mention that I used the obfuscation method provided by
> org.apache.manifoldcf.core.system.ManifoldCF.obfuscate(String) and set the
> obfuscated password in the org.apache.manifoldcf.login.password.obfuscated
> and org.apache.manifoldcf.apilogin.password.obfuscated properties of the
> properties.xml file
>
>
>
> I can also confirm that I used UTF-8 encoding to provide the
> password to the obfuscate method, and that testing the deobfuscate method
> returns the right password with UTF-8 chars.
>
>
>
> Julien
>
>
>
> *From:* Karl Wright
> *Sent:* Wednesday, December 16, 2020 19:40
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Password admin UI
>
>
>
> Hi Julien,
> The properties file is read as UTF-8, so as long as you make sure that the
> encoding in your editor is UTF-8, it should work.
>
> Many editors default to a Microsoft code page, so use something like
> SciTE or Emacs.
>
>
> Karl
>
>
>
> On Wed, Dec 16, 2020 at 12:31 PM  wrote:
>
> Hi,
>
>
>
> I tried different types of passwords for the admin UI, and it appears that
> passwords containing accented characters or special characters do not
> work. Is this “normal” or not?
>
>
>
> Regards,
>
> Julien
>
>
>
>


[VOTE] Release Apache ManifoldCF 2.18, RC0

2020-12-16 Thread Karl Wright
Hi,

Please vote on whether to release Apache ManifoldCF 2.18, RC0.  The release
artifact can be found here:
https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.18 .
There is also a release tag at
https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.18-RC0 .

This release has a brand new connector (the solr ingester connector), as
well as build support for Java 11.  We still do not guarantee that all
connectors operate properly under JDK 11, but hopefully we can get feedback
from this release sufficient to close that hole.

Thanks,
Karl


Re: Password admin UI

2020-12-16 Thread Karl Wright
Hi Julien,
The properties file is read as UTF-8, so as long as you make sure that the
encoding in your editor is UTF-8, it should work.

Many editors default to a Microsoft code page, so use something like SciTE
or Emacs.
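To see why the editor's encoding matters, the following sketch encodes an accented character with one charset and decodes the bytes with another. It assumes windows-1252 is the Microsoft code page in question; the EncodingMismatch class is illustrative only and does not reflect how ManifoldCF itself reads properties.xml.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Illustrates what happens when a file is written in one charset and read in another. */
final class EncodingMismatch {

    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    /** Encodes {@code text} with {@code writtenAs}, then decodes the bytes with {@code readAs}. */
    static String roundTrip(String text, Charset writtenAs, Charset readAs) {
        return new String(text.getBytes(writtenAs), readAs);
    }

    public static void main(String[] args) {
        String accented = "\u00e9"; // LATIN SMALL LETTER E WITH ACUTE
        // Saved as UTF-8, read as windows-1252: bytes C3 A9 turn into two characters,
        // U+00C3 (A with tilde) and U+00A9 (copyright sign).
        System.out.println(roundTrip(accented, StandardCharsets.UTF_8, WINDOWS_1252));
        // Saved as windows-1252, read as UTF-8: the lone byte E9 is malformed and
        // is replaced by U+FFFD.
        System.out.println(roundTrip(accented, WINDOWS_1252, StandardCharsets.UTF_8));
        // Saved and read as UTF-8: the character survives unchanged.
        System.out.println(roundTrip(accented, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

Either mismatch silently changes the password before it is ever compared, which is why plain ASCII passwords keep working while accented ones fail.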

Karl

On Wed, Dec 16, 2020 at 12:31 PM  wrote:

> Hi,
>
>
>
> I tried different types of passwords for the admin UI, and it appears that
> passwords containing accented characters or special characters do not
> work. Is this “normal” or not?
>
>
>
> Regards,
>
> Julien
>
>
>


Re: JDK 11

2020-12-12 Thread Karl Wright
Thanks for the comment.

OpenJDK, though, has also changed.  It has always been released from a fork
of the same Java codebase that goes into the Oracle releases.  While it might be
maintained by a different bunch of people, strategic decisions made by
Oracle also impact OpenJDK and, under the license terms, I cannot imagine
the situation changing.

Karl


On Sat, Dec 12, 2020 at 7:20 AM Piergiorgio Lucidi 
wrote:

> Hi Karl,
>
> I think we should consider only the OpenJDK edition of JDK 11, without
> considering the Oracle version.
>
> We might get lucky with OpenJDK; either way, we also have to dig deeper
> into OpenJDK and invest more time in this.
>
> I hope to find some time soon to understand the details of this porting
> task :-P
>
> PJ
>
> On Sat, Dec 12, 2020, 13:01 Karl Wright  wrote:
>
> > Hi,
> >
> > Although I was under the impression that one of our committers had
> > addressed the JDK 11 problem in ManifoldCF, upon detailed inspection and
> > trial I have determined that this is in fact quite false.  The reason is
> > that the relatively new csws connector uses a ton of J2EE classes that are
> > no longer bundled with JDK 11.
> >
> > I've managed to get this to build now by adding a number of these
> > classes to the appropriate classpaths in connector-build.xml, but here's
> > the problem: unless somebody actually tries this connector under JDK 11
> > in a Livelink environment, I have no idea what classes may be missing at
> > runtime.  So I am hoping we have a committer somewhere who might be able
> > to work with me on this experiment.
> >
> > There's more news, and it isn't good: Oracle has deprecated some
> > ubiquitous methods and techniques that many open-source projects we
> > depend on use.  For example, new Long(long) is going away.  I have no idea
> > why they did this, but they must know it will effectively deprecate 100%
> > of the lightly maintained Java codebase, and force massive rework of
> > nearly all open-source code.  That MUST be intentional on Oracle's part,
> > and to me it represents the writing on the wall concerning the lifetime
> > of legacy ManifoldCF connectors.  It will likely not be possible to
> > support our existing connector family when this happens.
> >
> > Meanwhile, we still need to go through the JDK 11 deprecations in our own
> > codebase and fix those, as well as verify proper function of all
> > connectors on JDK 11.  I'll be tackling that project myself in my
> > voluminous spare time.
> >
> > Karl
> >
>


JDK 11

2020-12-12 Thread Karl Wright
Hi,

Although I was under the impression that one of our committers had addressed
the JDK 11 problem in ManifoldCF, upon detailed inspection and trial I have
determined that this is in fact quite false.  The reason is that the
relatively new csws connector uses a ton of J2EE classes that are no longer
bundled with JDK 11.

I've managed to get this to build now by adding a number of these
classes to the appropriate classpaths in connector-build.xml, but here's
the problem: unless somebody actually tries this connector under JDK 11 in
a Livelink environment, I have no idea what classes may be missing at
runtime.  So I am hoping we have a committer somewhere who might be able
to work with me on this experiment.
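One low-tech way to find out which classes are missing at runtime, short of a full Livelink test, is to probe for them by name on the JDK in question. The sketch below does that with Class.forName. The class names in its main method are real JAXB/JAX-WS types that were removed from the JDK in version 11 (JEP 320), but they are only examples, not the csws connector's actual dependency list, and ClasspathProbe is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Reports which of the named classes cannot be loaded by the current class loader. */
final class ClasspathProbe {

    static List<String> missing(List<String> classNames) {
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        List<String> absent = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running its static initializers
                Class.forName(name, false, loader);
            } catch (ClassNotFoundException | LinkageError e) {
                absent.add(name);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Example J2EE types that JDK 11 no longer bundles.
        List<String> candidates = Arrays.asList(
                "javax.xml.bind.JAXBContext",
                "javax.xml.ws.Service",
                "javax.jws.WebService");
        System.out.println("Missing on this JVM: " + missing(candidates));
    }
}
```

Run with the connector's real classpath, an empty result only says the named classes resolve; classes reached through reflection or service loading would still need an actual end-to-end test.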

There's more news, and it isn't good: Oracle has deprecated some ubiquitous
methods and techniques that many open-source projects we depend on use.
For example, new Long(long) is going away.  I have no idea why they did
this, but they must know it will effectively deprecate 100% of the lightly
maintained Java codebase, and force massive rework of nearly all
open-source code.  That MUST be intentional on Oracle's part, and to me it
represents the writing on the wall concerning the lifetime of legacy
ManifoldCF connectors.  It will likely not be possible to support our
existing connector family when this happens.
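For code that ManifoldCF itself controls, the usual fix for the deprecated wrapper constructors is mechanical: replace new Long(x) with the static factory Long.valueOf(x), or with plain autoboxing, which javac compiles to the same call. A minimal sketch; the BoxingMigration class name is illustrative.

```java
/** Illustrative replacement for the deprecated wrapper constructors. */
final class BoxingMigration {

    // Before (deprecated since Java 9):  Long boxed = new Long(value);
    // After: use the static factory, which may return a cached instance.
    static Long box(long value) {
        return Long.valueOf(value);
    }

    // Parsing follows the same pattern: new Long(String) -> Long.valueOf(String),
    // or Long.parseLong(String) when a primitive is enough.
    static long parse(String text) {
        return Long.parseLong(text);
    }
}
```

One caveat when migrating: valueOf is only guaranteed to cache values from -128 to 127, so boxed values should still be compared with equals() rather than ==.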

Meanwhile, we still need to go through the JDK 11 deprecations in our own
codebase and fix those, as well as verify proper function of all connectors
on JDK 11.  I'll be tackling that project myself in my voluminous spare
time.

Karl


[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248078#comment-17248078
 ] 

Karl Wright commented on CONNECTORS-1653:
-

Patches applied, thanks.


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_solr_ingester_connector_02_12_2020.txt, 
> patch_solr_ingester_connector_03_12_2020.txt, 
> patch_solr_ingester_connector_11_12_2020.txt, 
> solr_ingester_connector_patch.txt
>
>
> Hi,
> We developed a new repository connector for crawling data from Solr, and we
> would like to contribute it to MCF by releasing the code under the Apache v2 license.
> The goal of this connector is to crawl Solr instances and manage them in MCF
> rather than using DIH, for instance.
> To do so, we send requests to Solr and handle the large number of
> results with cursorMark. The Solr fields must be stored in order to
> be gathered.
> By the way, we do not use any specific libraries; all the dependencies are
> already in MCF. So far we have tested it with Solr versions 7 and 8.
> The documentation is here : 
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/673742849/Solr+ingester+crawler+connector
> The code is attached.
> Best regards,
> Olivier Tavard
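The cursorMark approach mentioned in this contribution follows a simple loop: start with cursorMark=*, send each response's nextCursorMark back on the next request, and stop when Solr returns the same cursor it was sent. (Solr also requires the sort to include the uniqueKey field for cursors to work.) The sketch below shows only that loop; the Page type and the fetchPage function stand in for a real SolrJ or HTTP call and are hypothetical, not code from the contributed connector.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of Solr cursorMark deep paging; not the connector's actual code. */
final class CursorMarkPager {

    /** One page of results plus the nextCursorMark Solr returned with it. */
    static final class Page {
        final List<String> docIds;
        final String nextCursorMark;

        Page(List<String> docIds, String nextCursorMark) {
            this.docIds = docIds;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /** Follows cursors from "*" until Solr hands back the cursor it was given. */
    static List<String> fetchAll(Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        String cursor = "*"; // initial cursorMark value
        while (true) {
            Page page = fetchPage.apply(cursor);
            all.addAll(page.docIds);
            if (page.nextCursorMark.equals(cursor)) {
                return all; // no further results
            }
            cursor = page.nextCursorMark;
        }
    }
}
```

Unlike start/rows paging, each request only needs the cursor, so the cost per page stays flat even deep into a large result set.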





Re: It's release time again

2020-12-11 Thread Karl Wright
Hi Olivier,

The reason these patches were not committed right away is that there
are problems with both of them.  Both are minor, but I have not had time
to address them myself of late.  If you could update them accordingly I
would appreciate it.

Thanks,
Karl


On Fri, Dec 11, 2020 at 10:19 AM Olivier Tavard <
olivier.tav...@francelabs.com> wrote:

> Hi Karl,
>
> Following your suggestion to remind you of remaining actions, could
> you please take a look at the 2 patches I sent:
> - one patch for Solr ingester connector :
>
> https://issues.apache.org/jira/browse/CONNECTORS-1653?focusedCommentId=17242801=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17242801
>
> Regarding this patch, I also asked whether I could propose a patch to
> integrate the documentation for this connector:
> https://issues.apache.org/jira/browse/CONNECTORS-1653?focusedCommentId=17236664=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17236664
>
> - one patch for MCF HTML connector :
> https://issues.apache.org/jira/projects/CONNECTORS/issues/CONNECTORS-1660
>
> Thanks,
>
> Olivier
>
> > On 8 Dec 2020, at 21:49, Karl Wright wrote:
> >
> > We have a new connector in the family this release, and a number of bug
> > fixes - both major and minor - have been done.  I'm planning on spinning a
> > release candidate in about 2 weeks.
> >
> > I've been extremely busy with my day job this quarter, so if anyone is
> > aware of any issue or proposal or patch that might have been overlooked,
> > please remind me to look at it before then.  Thanks in advance!
> >
> > Karl
>
>


[jira] [Commented] (CONNECTORS-1660) Patch for MCF HTML extractor connector

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17248000#comment-17248000
 ] 

Karl Wright commented on CONNECTORS-1660:
-

Please remove the log statement, since it will dump the entire document and 
will overwhelm the logs.


> Patch for MCF HTML extractor connector
> --
>
> Key: CONNECTORS-1660
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1660
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: HTML extractor
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_html_extractor_connector_02_12_2020.txt
>
>
> Hello,
> Here is a patch for the HTML extractor connector regarding the text 
> extraction with or without HTML stripping : 
> [^patch_html_extractor_connector_02_12_2020.txt]
>  * Extraction of HTML code: I added a whitelist through the Jsoup cleaner to 
> define which HTML elements are allowed, to enforce security. In the code I 
> set it to “relaxed”:
> This whitelist allows a full range of text and structural body HTML: a, b, 
> blockquote, br, caption, cite, code, col, colgroup, dd, div, dl, dt, em, h1, 
> h2, h3, h4, h5, h6, i, img, li, ol, p, pre, q, small, span, strike, strong, 
> sub, sup, table, tbody, td, tfoot, th, thead, tr, u, ul
> (more details here: 
> [https://jsoup.org/apidocs/org/jsoup/safety/Whitelist.html#relaxed()])
> A future improvement would be to add a new parameter to the 
> interface for choosing which whitelist to use.
>  
>  * Extraction of text with HTML stripping activated: we keep only text nodes, 
> so all HTML is stripped (same as before). The change is that the Jsoup 
> pretty-print option is now set to false to keep line breaks.
>  
> Best regards





[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-12-11 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247997#comment-17247997
 ] 

Karl Wright commented on CONNECTORS-1653:
-

This patch has bugs in it.  Specifically:

{code}
-  private final static String defaultAuthorityDenyToken = "__nosecurity__";
+  private final static String defaultAuthorityDenyToken = "DEAD_AUTHORITY";
{code}

There is no token called "DEAD_AUTHORITY".  There is a value available to 
represent this in the superclass.  You should use that.


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>    Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_solr_ingester_connector_02_12_2020.txt, 
> patch_solr_ingester_connector_03_12_2020.txt, 
> solr_ingester_connector_patch.txt
>
>





Re: JDBC authority - Make optional the ID query

2020-12-11 Thread Karl Wright
I think if there is an option for not needing to do the lookup then by all
means we should allow a pass-through.  But I believe there may already be
that option in other existing authority connectors.  It may be best in any
case to have a simple "pass through" authority connector available that can
be used everywhere, rather than make this an option of the JDBC connector.

Karl



On Tue, Dec 8, 2020 at 7:56 AM  wrote:

> Hi Karl,
>
> Currently, the query to retrieve the User ID from the USERNAME in the JDBC
> authority connector configuration is mandatory: an error is triggered if it
> is not filled in or if the query does not work. However, we may have a
> token query that only needs the USERNAME, which makes the User ID query
> useless and a waste of resources.
>
> We could update the code to make the user ID query optional. What do you
> think?
>
> Regards,
> Julien Massiera
>
>
>
>
>
>
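One way to picture the proposal: when no user ID query is configured, the lookup is skipped and the USERNAME is handed straight to the token query, which is effectively the "pass-through" behaviour Karl describes. The sketch below is hypothetical: `resolveTokens` and the lambdas standing in for the two JDBC queries are invented for illustration and are not ManifoldCF APIs.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

public class OptionalIdQuerySketch {

  /**
   * Hypothetical token resolution for an authority connector.
   *
   * @param username   the USERNAME being authorized
   * @param idQuery    optional USERNAME -> user ID lookup; empty means "not configured"
   * @param tokenQuery looks up access tokens for whatever key it is given
   */
  static List<String> resolveTokens(String username,
                                    Optional<Function<String, String>> idQuery,
                                    Function<String, List<String>> tokenQuery) {
    // Only run the ID lookup when one is configured; otherwise pass the USERNAME through.
    String key = idQuery.isPresent() ? idQuery.get().apply(username) : username;
    if (key == null) {
      return List.of(); // the configured ID lookup found no such user
    }
    return tokenQuery.apply(key);
  }

  public static void main(String[] args) {
    Function<String, String> ids = u -> u.equals("jdoe") ? "42" : null;
    Function<String, List<String>> tokens = k -> List.of("token-for-" + k);
    System.out.println(resolveTokens("jdoe", Optional.of(ids), tokens)); // [token-for-42]
    System.out.println(resolveTokens("jdoe", Optional.empty(), tokens)); // [token-for-jdoe]
  }
}
```

Keeping the ID lookup optional, rather than building a separate connector, trades a small amount of configuration complexity for not having to maintain another connector; Karl's reply above argues for the opposite trade-off.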


It's release time again

2020-12-08 Thread Karl Wright
We have a new connector in the family this release, and a number of bug
fixes - both major and minor - have been done.  I'm planning on spinning a
release candidate in about 2 weeks.

I've been extremely busy with my day job this quarter, so if anyone is
aware of any issue or proposal or patch that might have been overlooked,
please remind me to look at it before then.  Thanks in advance!

Karl


[jira] [Updated] (CONNECTORS-1660) Patch for MCF HTML extractor connector

2020-12-02 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright updated CONNECTORS-1660:

Fix Version/s: ManifoldCF 2.18

> Patch for MCF HTML extractor connector
> --
>
> Key: CONNECTORS-1660
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1660
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: HTML extractor
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: patch_html_extractor_connector_02_12_2020.txt
>
>





[jira] [Assigned] (CONNECTORS-1660) Patch for MCF HTML extractor connector

2020-12-02 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1660:
---

Assignee: Karl Wright

> Patch for MCF HTML extractor connector
> --
>
> Key: CONNECTORS-1660
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1660
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: HTML extractor
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Attachments: patch_html_extractor_connector_02_12_2020.txt
>
>





Re: Oauth2 for web connector

2020-11-30 Thread Karl Wright
The web connector would need to know how to obtain an access token.  If the
web site you are crawling directs you to a login page, and that is how you
get your access token, then no work needs to be done because the token is
basically just included as a cookie.

If what you are crawling is an internal web site that is protected by a
combination of SSO access tokens plus browser customizations, then I fear
there is no standard way to do this.  The token might be sent in a header
other than the standard cookie.

Karl


On Mon, Nov 30, 2020 at 11:59 AM  wrote:

> Hi Karl,
>
>
>
> Since OAuth2 is now widely used, would it be challenging to upgrade the web
> connector so that it supports this authentication method?
>
>
>
> Regards
>
> Julien Massiera
>
>
>
>
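To make the distinction in the reply concrete: a token obtained through a login page travels in the `Cookie` header, whereas an OAuth2 access token is normally sent as `Authorization: Bearer <token>` (RFC 6750). The sketch below uses only the JDK's `java.net.http` API (Java 11+) to build, but not send, both kinds of request. The URL, cookie name and token values are placeholders, and obtaining the token in the first place, which is the hard part, is not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class TokenHeaderSketch {

  /** A request that carries a session token as a cookie, as a login-page flow would. */
  static HttpRequest withCookie(String url, String cookieName, String token) {
    return HttpRequest.newBuilder(URI.create(url))
        .header("Cookie", cookieName + "=" + token)
        .GET()
        .build();
  }

  /** A request that carries an OAuth2 access token as a bearer credential (RFC 6750). */
  static HttpRequest withBearerToken(String url, String accessToken) {
    return HttpRequest.newBuilder(URI.create(url))
        .header("Authorization", "Bearer " + accessToken)
        .GET()
        .build();
  }

  public static void main(String[] args) {
    // Placeholder URL and token; nothing is sent over the network.
    HttpRequest viaCookie = withCookie("https://intranet.example.com/", "SESSION", "abc123");
    HttpRequest viaBearer = withBearerToken("https://intranet.example.com/", "abc123");
    System.out.println(viaCookie.headers().map());
    System.out.println(viaBearer.headers().map());
  }
}
```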


[jira] [Commented] (CONNECTORS-1659) Enable SSL/TLS in Jetty server

2020-11-23 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237463#comment-17237463
 ] 

Karl Wright commented on CONNECTORS-1659:
-

JettyRunner has nothing to do with this.
Please describe the exact steps you have taken to enable TLS/SSL for Jetty.


> Enable SSL/TLS in Jetty server
> --
>
> Key: CONNECTORS-1659
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1659
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Framework core
>Affects Versions: ManifoldCF 2.16
>Reporter: Jegan Baskaran
>Priority: Minor
>
> Could you please give some idea of how to enable SSL/TLS in the Jetty 
> server? Currently the Jetty server does not accept the SSL settings. I added 
> some SSL settings as per 
> [https://wiki.eclipse.org/Jetty/Howto/Configure_SSL] 
> but it does not work as expected. Do we need to change the code in 
> ManifoldCFJettyRunner.java? Could you please help with this issue?





[jira] [Resolved] (CONNECTORS-1653) Solr ingester connector contribution

2020-11-21 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1653.
-
Fix Version/s: ManifoldCF 2.18
   Resolution: Fixed

> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Fix For: ManifoldCF 2.18
>
> Attachments: solr_ingester_connector_patch.txt
>
>





[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-11-21 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17236615#comment-17236615
 ] 

Karl Wright commented on CONNECTORS-1653:
-

I've created branches/CONNECTORS-1653 with this work.

I've integrated it with the "solr" connector family, which now has two 
connectors in it: a "Solr" output connector, and a "Solr" repository connector. 
 This cuts down on dependencies and maintenance in the future.

[~olivierfl], if you wish to check out and build this branch, and verify that 
the connector works as expected, I'd appreciate it.  I will be doing the same 
thing as time permits over the next few days.


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>    Reporter: Olivier Tavard
>Assignee: Karl Wright
>Priority: Minor
> Attachments: solr_ingester_connector_patch.txt
>
>





[jira] [Resolved] (CONNECTORS-1658) java.lang.NoClassDefFoundError: javax/activation/DataSource

2020-11-16 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1658.
-
Fix Version/s: ManifoldCF 2.18
 Assignee: Karl Wright
   Resolution: Cannot Reproduce

> java.lang.NoClassDefFoundError: javax/activation/DataSource
> ---
>
> Key: CONNECTORS-1658
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1658
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Build
>Reporter: Nav
>    Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.18
>
> Attachments: csws.log
>
>
> I am trying to compile the project with Ant. For both the main Project & the 
> csws connector, I am facing the below error. It has been a decade since I 
> used Ant. Please help:
> _classcreate-wsdl-cxf:_
>  _[java] Exception in thread "main" java.lang.NoClassDefFoundError: 
> javax/activation/DataSource_
>  _[java] at 
> com.sun.xml.bind.v2.model.impl.RuntimeBuiltinLeafInfoImpl.(RuntimeBuiltinLeafInfoImpl.java:478)_
>  _[java] at 
> com.sun.xml.bind.v2.model.impl.RuntimeTypeInfoSetImpl.(RuntimeTypeInfoSetImpl.java:63)_
>  _[java] at 
> com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.createTypeInfoSet(RuntimeModelBuilder.java:128)_
>  _[java] at 
> com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.createTypeInfoSet(RuntimeModelBuilder.java:84)_
>  _[java] at 
> com.sun.xml.bind.v2.model.impl.ModelBuilder.(ModelBuilder.java:162)_
>  _[java] at 
> com.sun.xml.bind.v2.model.impl.RuntimeModelBuilder.(RuntimeModelBuilder.java:92)_
>  _[java] at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.getTypeInfoSet(JAXBContextImpl.java:455)_
>  _[java] at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:303)_
>  _[java] at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl.(JAXBContextImpl.java:139)_
>  _[java] at 
> com.sun.xml.bind.v2.runtime.JAXBContextImpl$JAXBContextBuilder.build(JAXBContextImpl.java:1156)_
>  _[java] at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:165)_
>  _[java] at 
> com.sun.xml.bind.v2.ContextFactory.createContext(ContextFactory.java:289)_
>  _[java] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method)_
>  _[java] at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)_
>  _[java] at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)_
>  _[java] at java.base/java.lang.reflect.Method.invoke(Method.java:566)_
>  _[java] at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:217)_
>  _[java] at javax.xml.bind.ContextFinder.newInstance(ContextFinder.java:175)_
>  _[java] at javax.xml.bind.ContextFinder.find(ContextFinder.java:353)_
>  _[java] at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:508)_
>  _[java] at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:465)_
>  _[java] at javax.xml.bind.JAXBContext.newInstance(JAXBContext.java:366)_
>  _[java] at 
> org.apache.cxf.tools.wsdlto.core.PluginLoader.init(PluginLoader.java:78)_
>  _[java] at 
> org.apache.cxf.tools.wsdlto.core.PluginLoader.(PluginLoader.java:73)_
>  _[java] at 
> org.apache.cxf.tools.wsdlto.core.PluginLoader.newInstance(PluginLoader.java:106)_
>  _[java] at org.apache.cxf.tools.wsdlto.WSDLToJava.(WSDLToJava.java:48)_
>  _[java] at org.apache.cxf.tools.wsdlto.WSDLToJava.main(WSDLToJava.java:182)_
>  _[java] Caused by: java.lang.ClassNotFoundException: 
> javax.activation.DataSource_
>  _[java] at 
> java.base/jdk.internal.loader.BuiltinClassLoader.loadClass(BuiltinClassLoader.java:581)_
>  _[java] at 
> java.base/jdk.internal.loader.ClassLoaders$AppClassLoader.loadClass(ClassLoaders.java:178)_
>  _[java] at java.base/java.lang.ClassLoader.loadClass(ClassLoader.java:521)_
>  _[java] ... 27 more_
>  _[java] Java Result: 1_
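For context on the error above: the `java.base/` frames show a Java 9+ runtime, where the `javax.activation` package (JavaBeans Activation Framework) is not on the default classpath, and Java 11 removed it from the JDK entirely (JEP 320). On such a runtime, `javax.activation.DataSource` has to come from a jar on the classpath. A quick, hypothetical diagnostic, not part of the ManifoldCF build, is to check whether a class can be loaded at all:

```java
public class ClassCheckSketch {

  /** Returns true if the named class can be loaded by this class's loader. */
  static boolean isClassAvailable(String className) {
    try {
      // initialize=false: only locate and load the class, do not run static initializers.
      Class.forName(className, false, ClassCheckSketch.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    String name = args.length > 0 ? args[0] : "javax.activation.DataSource";
    System.out.println("Java " + System.getProperty("java.specification.version") + ": "
        + name + (isClassAvailable(name) ? " is" : " is NOT") + " on the classpath");
  }
}
```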





[jira] [Commented] (CONNECTORS-1658) java.lang.NoClassDefFoundError: javax/activation/DataSource

2020-11-16 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232708#comment-17232708
 ] 

Karl Wright commented on CONNECTORS-1658:
-

First, you need to set JAVA_HOME appropriately, e.g.:

{code}
C:\wip\mcf\trunk>echo %JAVA_HOME%
c:\Program Files\Java\jdk1.8.0_181

C:\wip\mcf\trunk>
{code}

You can use JDK 11, but JAVA_HOME MUST point at a JDK, not just a runtime.
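A rough way to check the "JDK, not just a runtime" requirement is to look for the `javac` compiler under `%JAVA_HOME%\bin` (or `$JAVA_HOME/bin`). The helper below is a hypothetical convenience based on that heuristic; the ManifoldCF build does not provide it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class JavaHomeCheckSketch {

  /** True if the directory looks like a JDK, i.e. it contains bin/javac or bin\javac.exe. */
  static boolean looksLikeJdk(String javaHome) {
    if (javaHome == null || javaHome.isEmpty()) {
      return false;
    }
    Path bin = Paths.get(javaHome, "bin");
    return Files.isRegularFile(bin.resolve("javac"))
        || Files.isRegularFile(bin.resolve("javac.exe"));
  }

  public static void main(String[] args) {
    String javaHome = System.getenv("JAVA_HOME");
    System.out.println("JAVA_HOME=" + javaHome + " -> "
        + (looksLikeJdk(javaHome) ? "looks like a JDK" : "does NOT look like a JDK"));
  }
}
```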

Second, when I build here from a fresh checkout, this is what I see:

{code}
C:\wip\mcf\trunk>ant make-core-deps
...(lots of download activity)...
download-via-maven:
  [get] Getting: 
https://repo1.maven.org/maven2/com/fasterxml/jackson/core/jackson-annotations/2.9.8/jackson-annotations-2.9.8.jar
  [get] To: C:\wip\mcf\trunk\lib\jackson-annotations-2.9.8.jar

make-core-deps:
 [copy] Copying 3 files to C:\wip\mcf\trunk\lib

BUILD SUCCESSFUL
Total time: 7 minutes 55 seconds
{code}

You only need to do the above ONCE, unless it errors out for some reason.

Then:

{code}
C:\wip\mcf\trunk>ant build
...(lots of build activity)...
general-add-repository-connector-proprietary-commented:

general-add-repository-connector-proprietary-non-commented:

general-add-repository-connector:

build:

BUILD SUCCESSFUL
Total time: 2 minutes 55 seconds
{code}

This builds the framework and all the individual connectors.  After you run 
this step, you can rebuild individual connectors on their own, but that does 
not work until the framework has been built.

If the above steps do not work for you, please let me know at what point they 
fail.


> java.lang.NoClassDefFoundError: javax/activation/DataSource
> ---
>
> Key: CONNECTORS-1658
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1658
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Build
>Reporter: Nav
>Priority: Major
> Attachments: csws.log
>
>

[jira] [Commented] (CONNECTORS-1658) java.lang.NoClassDefFoundError: javax/activation/DataSource

2020-11-13 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17231784#comment-17231784
 ] 

Karl Wright commented on CONNECTORS-1658:
-

Did you do "ant make-core-deps" first?  Without that, the dependencies are not 
fetched.


> java.lang.NoClassDefFoundError: javax/activation/DataSource
> ---
>
> Key: CONNECTORS-1658
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1658
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Build
>Reporter: Nav
>Priority: Major
>





[jira] [Commented] (CONNECTORS-1657) Web connector - Handle sitemap instruction in robot.txt

2020-10-22 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219054#comment-17219054
 ] 

Karl Wright commented on CONNECTORS-1657:
-

This is a warning only:

{code}
Logging.connectors.warn("Web: Unknown robots.txt line from '"+hostName+"': 
'"+problemLine+"'");
{code}

No problems are caused when the robots.txt line is found.


> Web connector - Handle sitemap instruction in robot.txt
> ---
>
> Key: CONNECTORS-1657
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1657
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Priority: Major
>
> Currently the web connector does not understand a robots.txt file that 
> points to a sitemap. For example, for the site 
> [https://www.persee.fr], the Simple History report shows 
> the following error:
> Unknown robots.txt line: 'Sitemap: https://www.persee.fr/sitemap.xml'
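The warning quoted in Karl's comment is logged for any line the parser does not recognize. As a sketch of how a `Sitemap:` line could be recognized instead, here is a small, hypothetical helper. It is not the web connector's actual parser, and treating the field name case-insensitively is an assumption chosen for leniency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class RobotsSitemapSketch {

  /** Returns the URLs of all "Sitemap:" lines in a robots.txt body. */
  static List<String> sitemapUrls(String robotsTxt) {
    List<String> urls = new ArrayList<>();
    for (String rawLine : robotsTxt.split("\\r?\\n")) {
      String line = rawLine.trim();
      // Split on the first colon only, so the "https:" inside the URL is kept intact.
      int colon = line.indexOf(':');
      if (colon < 0) {
        continue;
      }
      String field = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
      String value = line.substring(colon + 1).trim();
      if (field.equals("sitemap") && !value.isEmpty()) {
        urls.add(value);
      }
    }
    return urls;
  }

  public static void main(String[] args) {
    String robots = "User-agent: *\nDisallow: /private\nSitemap: https://www.persee.fr/sitemap.xml\n";
    System.out.println(sitemapUrls(robots)); // [https://www.persee.fr/sitemap.xml]
  }
}
```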
>  





[jira] [Commented] (CONNECTORS-1656) HTML extractor produces invalid XML

2020-10-20 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17217609#comment-17217609
 ] 

Karl Wright commented on CONNECTORS-1656:
-

The issue, in my opinion, is that the document produced identifies itself as 
XML when it is not.  The first line therefore may be all you need to change to 
get Tika to not blow up on badly formed XML that comes from HTML.

If you want to research this, you might be able to find out what Tika accepts 
and what it does not pretty readily with some offline experimentation.
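The kind of offline experiment suggested here can start with the JDK's own XML parser. An HTML-style void element such as `<img ...>` with no closing tag is rejected as XML, while the self-closed form `<img .../>` parses. This sketch only illustrates XML well-formedness with `javax.xml.parsers`. It does not call Tika, and it does not show how Tika decides which parser to use for a given document.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedXmlSketch {

  /** Returns true if the string parses as well-formed XML. */
  static boolean isWellFormedXml(String text) {
    try {
      DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
      // DefaultHandler rethrows fatal errors but does not print "[Fatal Error]" to stderr.
      builder.setErrorHandler(new DefaultHandler());
      builder.parse(new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
      return true;
    } catch (SAXException e) {
      return false; // not well-formed XML
    } catch (Exception e) {
      throw new IllegalStateException(e); // parser configuration or I/O problem
    }
  }

  public static void main(String[] args) {
    String htmlStyle = "<html><body><img src=\"a.png\"></body></html>";
    String xmlStyle = "<html><body><img src=\"a.png\"/></body></html>";
    System.out.println("HTML-style img: " + isWellFormedXml(htmlStyle)); // false
    System.out.println("self-closed img: " + isWellFormedXml(xmlStyle)); // true
  }
}
```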



> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Major
>
> The HTML extractor connector produces a valid HTML document (when the 'Strip HTML' 
> option is disabled) but invalid XML (some tags, such as img, have no closing 
> tag), which is problematic in some cases. For example, when Tika is used 
> downstream, it processes the document as an XML document and most of the time a 
> parse exception is raised.





[jira] [Assigned] (CONNECTORS-1656) HTML extractor produces invalid XML

2020-10-20 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1656:
---

Assignee: Karl Wright

> HTML extractor produces invalid XML
> ---
>
> Key: CONNECTORS-1656
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1656
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: HTML extractor
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Major
>





Re: HTML extractor produces invalid XML

2020-10-20 Thread Karl Wright
I've added the component as requested.  As for the advice, I suggest you
create a ticket and we can discuss there.
Karl


On Tue, Oct 20, 2020 at 6:24 AM  wrote:

> Hi,
>
>
>
> I noticed a problem with the HTML extractor connector. It produces a valid
> HTML document (when the 'Strip HTML' option is disabled) but invalid XML (some
> tags, such as img, have no closing tag), which is problematic in some
> cases.
> For example, when Tika is used downstream, it processes the document as XML,
> and most of the time a parse exception is raised and the document
> content is lost.
>
>
>
> I would like to create a ticket for this issue and I would be glad to
> propose a patch and do the commit myself but I need two things:
>
> 1/ Create the "HTML extractor" component in Jira
>
>
>
> 2/ Your advice on how to resolve the issue: either we configure
> this connector to always output a valid XML document (when the "Strip HTML"
> option is disabled), or we add a new configuration option that enforces
> XML output when enabled?
>
>
>
> Regards,
> Julien
>
>


[jira] [Resolved] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1655.
-
Fix Version/s: ManifoldCF 2.18
   Resolution: Fixed

r1882582


> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
> Fix For: ManifoldCF 2.18
>
>
> When crawling some sites (for instance this one: 
> [http://www.antibes-juanlespins.com/] ) the job manages to index some 
> documents, but then stops with the following error:
> Error: IO error: utf-8; filename=rseventspro_rss20_56.xml
> Here is one of the MCF stack traces: 
> Exception tossed: IO error: utf-8; filename=rseventspro_rss20_56.xml
> org.apache.manifoldcf.core.interfaces.ManifoldCFException: IO error: utf-8; filename=rseventspro_rss20_56.xml
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4203) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.extractLinks(WebcrawlerConnector.java:3855) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.processDocuments(WebcrawlerConnector.java:746) ~[?:?]
> at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> Caused by: java.io.UnsupportedEncodingException: utf-8; filename=rseventspro_rss20_56.xml
> at sun.nio.cs.StreamDecoder.forInputStreamReader(StreamDecoder.java:71) ~[?:1.8.0_212]
> at java.io.InputStreamReader.<init>(InputStreamReader.java:100) ~[?:1.8.0_212]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.DecodingByteReceiver.dealWithBytes(DecodingByteReceiver.java:47) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.BOMEncodingDetector.dealWithRemainder(BOMEncodingDetector.java:250) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.SingleByteReceiver.dealWithBytes(SingleByteReceiver.java:52) ~[?:?]
> at org.apache.manifoldcf.connectorcommon.fuzzyml.Parser.parseWithCharsetDetection(Parser.java:74) ~[?:?]
> at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.handleXML(WebcrawlerConnector.java:4174) ~[?:?]
> ... 3 more





[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215311#comment-17215311
 ] 

Karl Wright commented on CONNECTORS-1655:
-

Ah, but wait a minute: the issue is that the document in question has an 
illegal content-type:

"utf-8; filename=rseventspro_rss20_56.xml"

A patch for that is possible.  
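The JDK treats the whole string `utf-8; filename=rseventspro_rss20_56.xml` as a charset name. That name is not legal, so the `InputStreamReader(InputStream, String)` constructor throws `UnsupportedEncodingException` with the string as its message, which matches the trace. The minimal Java sketch below shows one possible defensive parse. It is not necessarily what the actual fix does, and the class `CharsetSanitizer`, its method and the fallback policy are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: extracts a usable charset from a possibly malformed declared value. */
public class CharsetSanitizer {

    public static Charset sanitize(String declared, Charset fallback) {
        if (declared == null) {
            return fallback;
        }
        // Keep only the text before any ';' (drops stray parameters such as "filename=...").
        int semi = declared.indexOf(';');
        String name = (semi >= 0 ? declared.substring(0, semi) : declared).trim();
        // Tolerate quoted values such as "\"utf-8\"".
        if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
            name = name.substring(1, name.length() - 1).trim();
        }
        if (name.isEmpty()) {
            return fallback;
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name) : fallback;
        } catch (IllegalCharsetNameException e) {
            return fallback; // e.g. names containing spaces or '='
        }
    }

    public static void main(String[] args) {
        String declared = "utf-8; filename=rseventspro_rss20_56.xml";
        byte[] body = "<rss/>".getBytes(StandardCharsets.UTF_8);
        try {
            // Reproduces the failure: the whole value is used as the charset name.
            new InputStreamReader(new ByteArrayInputStream(body), declared);
        } catch (UnsupportedEncodingException e) {
            System.out.println("UnsupportedEncodingException: " + e.getMessage());
        }
        System.out.println(sanitize(declared, StandardCharsets.ISO_8859_1)); // UTF-8
    }
}
```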


> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
>





[jira] [Assigned] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1655:
---

Assignee: Karl Wright

> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Assignee: Karl Wright
>Priority: Critical
>





[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-16 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215309#comment-17215309
 ] 

Karl Wright commented on CONNECTORS-1655:
-

Basically what is failing is using character encoding "utf-8".  As you know 
this is a very standard charset and almost nothing will work without it.  This 
is not on the list of things removed from JDK 11 as far as I am aware.  Perhaps 
its name has changed and we therefore need to add a list of names that map to 
it somewhere.  But usage would be strewn throughout ManifoldCF in any case.

But the official Oracle doc says it should be there, and isn't case sensitive 
either:

https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/nio/charset/Charset.html

I'm afraid it's up to you to do research as to why it's not found in your setup.
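You can check directly on the JVM in question that "utf-8" is a standard charset and that its name is not case-sensitive. The minimal Java sketch below (class name hypothetical) uses only `java.nio.charset`. On a standard JVM, every name it checks resolves to UTF-8. If that holds, the failure lies in the exact string being passed as the charset name, not in the JVM.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/** Hypothetical diagnostic: confirms how this JVM resolves UTF-8 charset names. */
public class Utf8Check {

    public static boolean resolvesToUtf8(String name) {
        // Lookup is case-insensitive and also accepts registered aliases such as "UTF8".
        return Charset.isSupported(name) && Charset.forName(name).equals(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("java.version = " + System.getProperty("java.version"));
        for (String name : new String[] {"UTF-8", "utf-8", "UTF8"}) {
            System.out.println(name + " -> " + resolvesToUtf8(name));
        }
    }
}
```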


> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Priority: Critical
>





[jira] [Commented] (CONNECTORS-1655) Web connector - UnsupportedEncodingException utf-8

2020-10-15 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214873#comment-17214873
 ] 

Karl Wright commented on CONNECTORS-1655:
-

So you are using a non-standard JVM that doesn't understand utf-8 character 
encoding.
Sorry, you don't get a fix for that. o_O  Use a standard JVM please.


> Web connector - UnsupportedEncodingException utf-8
> --
>
> Key: CONNECTORS-1655
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1655
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Web connector
>Affects Versions: ManifoldCF 2.17
>Reporter: Julien Massiera
>Priority: Critical
>





[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-10-15 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214684#comment-17214684
 ] 

Karl Wright commented on CONNECTORS-1653:
-

I looked briefly at the code; it looks good so far from what I see.

However, one question.  The connector build.xml has this in it:

{code}
+
+
+
+
+  
+
+
+
+  
+
+
{code}

These are the ManifoldCF solr security plugins.  Do they apply here?


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Attachments: solr_ingester_connector_patch.txt
>
>
> Hi,
> We developed a new repository connector for crawling data from Solr and we 
> would like to contribute it to MCF by releasing the code under the Apache v2 license.
> The goal of this connector is to crawl Solr instances and manage them in MCF 
> rather than using DIH, for instance.
> To do so, we send requests to Solr and handle large numbers of 
> results using cursorMark. The Solr fields must be stored in order to 
> be retrieved.
> We do not use any specific libraries; all the dependencies are 
> already in MCF. So far we have tested it with Solr 7 and 8.
> The documentation is here: 
> https://datafari.atlassian.net/wiki/spaces/DATAFARI/pages/673742849/Solr+ingester+crawler+connector
> The code is attached.
> Best regards,
> Olivier Tavard
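The cursorMark paging mentioned in the description follows a simple loop. Start from `*`, sort on a total order that includes the uniqueKey field, and pass back each response's `nextCursorMark`. Stop when the returned cursor is unchanged. The minimal, self-contained Java sketch below shows that loop. It uses neither SolrJ nor the contributed connector's code: the `PageSource` interface and the in-memory fake index that stands in for Solr are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch of Solr-style cursorMark paging against a fake in-memory index. */
public class CursorMarkLoop {

    /** One page of results plus the cursor to send with the next request. */
    public static final class Page {
        final List<String> ids;
        final String nextCursorMark;

        Page(List<String> ids, String nextCursorMark) {
            this.ids = ids;
            this.nextCursorMark = nextCursorMark;
        }
    }

    /** Hypothetical stand-in for one Solr request carrying cursorMark and rows. */
    public interface PageSource {
        Page fetch(String cursorMark, int rows);
    }

    /** Requests pages until the returned cursor no longer advances. */
    public static List<String> fetchAll(PageSource source, int rows) {
        List<String> all = new ArrayList<>();
        String cursor = "*"; // Solr's starting cursorMark value
        while (true) {
            Page page = source.fetch(cursor, rows);
            all.addAll(page.ids);
            if (page.nextCursorMark.equals(cursor)) {
                return all; // unchanged cursor: no more results
            }
            cursor = page.nextCursorMark;
        }
    }

    /**
     * Fake index sorted by id. Real cursor marks are opaque strings; here the
     * cursor is simply the last id already returned.
     */
    public static PageSource fakeIndex(List<String> ids) {
        TreeSet<String> sorted = new TreeSet<>(ids);
        return (cursorMark, rows) -> {
            Iterable<String> remaining =
                "*".equals(cursorMark) ? sorted : sorted.tailSet(cursorMark, false);
            List<String> pageIds = new ArrayList<>();
            for (String id : remaining) {
                if (pageIds.size() == rows) {
                    break;
                }
                pageIds.add(id);
            }
            String next = pageIds.isEmpty() ? cursorMark : pageIds.get(pageIds.size() - 1);
            return new Page(pageIds, next);
        };
    }

    public static void main(String[] args) {
        List<String> ids = Arrays.asList("doc3", "doc1", "doc7", "doc2", "doc5", "doc4", "doc6");
        // Three per page: doc1-3, doc4-6, doc7, then an empty page with an unchanged cursor.
        System.out.println(fetchAll(fakeIndex(ids), 3));
    }
}
```

Because the loop stops on an unchanged cursor rather than on an empty page count, it also terminates correctly for an empty index.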





[jira] [Commented] (CONNECTORS-1653) Solr ingester connector contribution

2020-10-14 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17214029#comment-17214029
 ] 

Karl Wright commented on CONNECTORS-1653:
-

Sadly, little time for anything.  Not sure when the crunch will end either.


> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Attachments: solr_ingester_connector_patch.txt
>
>





[jira] [Assigned] (CONNECTORS-1653) Solr ingester connector contribution

2020-10-14 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1653:
---

Assignee: Karl Wright

> Solr ingester connector contribution
> 
>
> Key: CONNECTORS-1653
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1653
> Project: ManifoldCF
>  Issue Type: New Feature
>Reporter: Olivier Tavard
>    Assignee: Karl Wright
>Priority: Minor
> Attachments: solr_ingester_connector_patch.txt
>
>





Re: Memory problem on Agent ?

2020-10-02 Thread Karl Wright
Please check your -Xmx switch.

Memory will not be released because that is not how Java works.  It
allocates the memory it needs and periodically garbage collects within
that.  You have given it too much memory, and you should not expect Java
ever to release it.  The solution is to give it less.  A rule of thumb is to
leave 10 GB free for system usage and divide the remainder among your Java
processes.
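The rule of thumb above is simple arithmetic: subtract about 10 GB for the system, then split the rest across the Java processes. The minimal Java sketch below shows that calculation. The class and method names are hypothetical, and the process count has to come from your own deployment. The result is a starting point for `-Xmx`, not a tuned value. For example, if two Java processes ran on the 70 GB machine in this thread, each would get 30 GB. For scale, the failed allocation in the quoted error (34359738368 bytes) is exactly 32 GiB.

```java
/** Hypothetical helper applying the "leave ~10 GB for the system, split the rest" rule of thumb. */
public class HeapBudget {

    private static final long BYTES_PER_GIB = 1L << 30;

    /** Heap per Java process, in whole gigabytes (rounded down). */
    public static long heapPerProcessGb(long totalGb, long reservedGb, int javaProcesses) {
        if (javaProcesses <= 0) {
            throw new IllegalArgumentException("need at least one Java process");
        }
        if (totalGb <= reservedGb) {
            throw new IllegalArgumentException("nothing left after the system reserve");
        }
        return (totalGb - reservedGb) / javaProcesses;
    }

    /** Formats a heap size as a JVM -Xmx switch (the "g" suffix means GiB). */
    public static String xmxSwitch(long gb) {
        return "-Xmx" + gb + "g";
    }

    /** Converts a byte count, e.g. from a JVM error message, to GiB (rounded down). */
    public static long bytesToGib(long bytes) {
        return bytes / BYTES_PER_GIB;
    }

    public static void main(String[] args) {
        // Example figures: 70 GB machine from this thread, 10 GB reserve, two Java processes (assumed).
        long perProcess = heapPerProcessGb(70, 10, 2);
        System.out.println(xmxSwitch(perProcess));             // -Xmx30g
        System.out.println(bytesToGib(34359738368L) + " GiB"); // 32 GiB
    }
}
```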

Thanks,
Karl


On Fri, Oct 2, 2020 at 11:21 AM Bisonti Mario 
wrote:

> Yes, but it seems that when indexing finishes, the memory is not
> released
>
>
>
>
>
> *From:* Karl Wright 
> *Sent:* Friday, October 2, 2020 17:14
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Memory problem on Agent ?
>
>
>
> Hi Mario,
>
> Java processes only use the memory you hand them.
>
>
> It looks like you are handing Java more memory than your machine has.
>
> This will not work.
>
>
> Karl
>
>
>
>
>
> On Fri, Oct 2, 2020 at 10:45 AM Bisonti Mario 
> wrote:
>
>
>
> Hallo.
>
>
>
> When I scan the content of the repository, I notice that memory usage is very
> high and it isn’t released
>
>
>
> e.g. 60 GB of the 70 GB available
>
>
>
> I tried to free it by shutting down the agent, but I am not able to:
>
>
>
> /opt/manifoldcf/multiprocess-zk-example-proprietary/stop-agents.sh
>
> OpenJDK 64-Bit Server VM warning: INFO:
> os::commit_memory(0x7f4d5800, 34359738368, 0) failed; error='Not
> enough space' (errno=12)
>
> #
>
> # There is insufficient memory for the Java Runtime Environment to
> continue.
>
> # Native memory allocation (mmap) failed to map 34359738368 bytes for
> committing reserved memory.
>
> # An error report file with more information is saved as:
>
> # /opt/manifoldcf/multiprocess-zk-example-proprietary/hs_err_pid2796.log
>
>
>
> So, to free memory, I have to restart the server
>
> How could I solve this?
>
>
>
> Thanks a lot
>
> Mario
>
>
>
>


Re: Memory problem on Agent ?

2020-10-02 Thread Karl Wright
Hi Mario,

Java processes only use the memory you hand them.

It looks like you are handing Java more memory than your machine has.

This will not work.

Karl


On Fri, Oct 2, 2020 at 10:45 AM Bisonti Mario 
wrote:

>
>
> Hallo.
>
>
>
> When I scan the content of the repository, I notice that memory usage is very
> high and it isn’t released
>
>
>
> e.g. 60 GB of the 70 GB available
>
>
>
> I tried to free it by shutting down the agent, but I am not able to:
>
>
>
> /opt/manifoldcf/multiprocess-zk-example-proprietary/stop-agents.sh
>
> OpenJDK 64-Bit Server VM warning: INFO:
> os::commit_memory(0x7f4d5800, 34359738368, 0) failed; error='Not
> enough space' (errno=12)
>
> #
>
> # There is insufficient memory for the Java Runtime Environment to
> continue.
>
> # Native memory allocation (mmap) failed to map 34359738368 bytes for
> committing reserved memory.
>
> # An error report file with more information is saved as:
>
> # /opt/manifoldcf/multiprocess-zk-example-proprietary/hs_err_pid2796.log
>
>
>
> So, to free memory, I have to restart the server
>
> How could I solve this?
>
>
>
> Thanks a lot
>
> Mario
>
>
>


[jira] [Resolved] (CONNECTORS-1654) java.lang.NoClassDefFoundError: org/eclipse/jetty/client/util/SPNEGOAuthentication

2020-09-30 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1654.
-
Fix Version/s: ManifoldCF 2.18
   Resolution: Fixed

This is currently fixed on trunk.


> java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/client/util/SPNEGOAuthentication
> --
>
> Key: CONNECTORS-1654
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1654
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Build, Solr 7.x component
>Affects Versions: ManifoldCF 2.16
>Reporter: Jegan Baskaran
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.18
>
>
> Hi, 
> This issue is related to CONNECTORS-1629. I had the same issue: after enabling 
> Kerberos in Solr and trying to save, I get the  
> java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/client/util/SPNEGOAuthentication  error.
> I am using the ManifoldCF 2.16 binary version (multiprocess-file-example), and I 
> manually copied jetty-client-9.4.19.v20190610.jar into the lib 
> folder and added the jar to the jetty-options-unix file as well. 
> Even then I am getting a class-not-found error. Do we need to upgrade all the 
> jetty jars under the lib folder for this to take effect, or is changing only 
> jetty-client enough? Since this is a production change, the jetty version should not 
> affect the existing behavior. Please advise. 
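If a `NoClassDefFoundError` persists after a jar has been copied into `lib`, check from the same JVM and classpath whether the class resolves at all, and which jar supplies it. Mixed versions of related jars on one classpath can produce this kind of error. The minimal Java sketch below is a generic diagnostic with a hypothetical class name. It knows nothing about how ManifoldCF assembles its classpath, so it has to be run with the same classpath as the failing process.

```java
import java.net.URL;
import java.security.CodeSource;

/** Hypothetical diagnostic: reports whether a class is loadable and which jar provides it. */
public class WhereIsClass {

    public static String locate(String className) {
        try {
            // initialize=false: resolve the class without running its static initializers.
            Class<?> cls = Class.forName(className, false, WhereIsClass.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            if (source == null || source.getLocation() == null) {
                return className + " -> loaded, location unknown (e.g. a JDK class)";
            }
            URL location = source.getLocation();
            return className + " -> " + location;
        } catch (ClassNotFoundException | LinkageError e) {
            return className + " -> NOT FOUND (" + e + ")";
        }
    }

    public static void main(String[] args) {
        // Class names can be passed as arguments; these defaults come from the error above.
        String[] names = args.length > 0 ? args
            : new String[] {"org.eclipse.jetty.client.util.SPNEGOAuthentication",
                            "org.eclipse.jetty.client.HttpClient"};
        for (String name : names) {
            System.out.println(locate(name));
        }
    }
}
```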





[jira] [Assigned] (CONNECTORS-1654) java.lang.NoClassDefFoundError: org/eclipse/jetty/client/util/SPNEGOAuthentication

2020-09-30 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1654:
---

Assignee: Karl Wright

> java.lang.NoClassDefFoundError: 
> org/eclipse/jetty/client/util/SPNEGOAuthentication
> --
>
> Key: CONNECTORS-1654
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1654
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Build, Solr 7.x component
>Affects Versions: ManifoldCF 2.16
>Reporter: Jegan Baskaran
>Assignee: Karl Wright
>Priority: Major
>





[jira] [Commented] (CONNECTORS-1629) Support Solr Kerberos Authentication

2020-09-24 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201833#comment-17201833
 ] 

Karl Wright commented on CONNECTORS-1629:
-

I'm pretty sure that was overlooked.

r1881997 adds it, and upgrades the version of jetty as requested.  You'll have
to build from source, however, since a new release isn't coming until December.



> Support Solr Kerberos Authentication
> 
>
> Key: CONNECTORS-1629
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1629
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Lucene/SOLR connector
>Affects Versions: ManifoldCF 2.14
>Reporter: Jörn Franke
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.16
>
>
> Several enterprise deployments of Solr leverage SolrCloud Kerberos 
> authentication.
> The integration seems to be rather simple, and the goal of this Jira is to 
> evaluate the potential steps needed to eventually contribute the Kerberos 
> integration to the ManifoldCF project.
> The following steps would be needed:
>  * One can pass the JVM parameter java.security.auth.login.config to the 
> ManifoldCF JVM using -Djava.security.auth.login.config=/path/to/jaas.conf, in 
> which the Kerberos authentication details, such as the keytab and the principal that has 
> the right access to Solr, are configured
>  * A small adaptation to the SolrCloudClient that is used within ManifoldCF needs 
> to be made to enable Kerberos authentication: 
> HttpClientUtil.setConfigurer(new Krb5HttpClientConfigurer());
> Should this be integrated into ManifoldCF, one may want to consider an input 
> field in the UI configuration where one can select, per flow, which user 
> defined in the JAAS conf (you can define multiple ones) should be chosen. By 
> default one may simply select "client" or "SolrJClient" if a JAAS conf is 
> present in the system properties. This does not mean the user needs to be 
> named like this, but the configuration entry referencing any user should be 
> named like this.
> Having a configuration allows different users per flow. This might 
> also be needed in case you have multiple Solr clusters. 
> Related discussion 
> [http://mail-archives.apache.org/mod_mbox/manifoldcf-user/201912.mbox/browser]
> SolrJ Kerberos integration: 
> [https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr]
> JAAS conf documentation: 
> [https://docs.oracle.com/javase/8/docs/technotes/guides/security/jgss/tutorials/LoginConfigFile.html]
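The "different users per flow" idea maps naturally onto named JAAS login entries. The minimal Java sketch below assumes one keytab per principal; the entry names, principals and paths are hypothetical. It shows that the standard `javax.security.auth.login.Configuration` API can hold several `Krb5LoginModule` entries side by side, which is also what a jaas.conf file with multiple sections expresses. It does not show how SolrJ is told which entry name to use. That would need to be checked against the SolrJ Kerberos documentation linked above.

```java
import java.util.HashMap;
import java.util.Map;
import javax.security.auth.login.AppConfigurationEntry;
import javax.security.auth.login.AppConfigurationEntry.LoginModuleControlFlag;
import javax.security.auth.login.Configuration;

/** Hypothetical in-memory JAAS configuration: one Kerberos keytab login entry per flow. */
public class PerFlowJaasConfig extends Configuration {

    private static final String KRB5_MODULE = "com.sun.security.auth.module.Krb5LoginModule";

    private final Map<String, AppConfigurationEntry[]> entries = new HashMap<>();

    /** Registers a named entry that logs in as the given principal from a keytab file. */
    public void addKeytabEntry(String entryName, String principal, String keytabPath) {
        Map<String, String> options = new HashMap<>();
        options.put("useKeyTab", "true");
        options.put("keyTab", keytabPath);
        options.put("principal", principal);
        options.put("storeKey", "true");
        options.put("doNotPrompt", "true");
        entries.put(entryName, new AppConfigurationEntry[] {
            new AppConfigurationEntry(KRB5_MODULE, LoginModuleControlFlag.REQUIRED, options)
        });
    }

    @Override
    public AppConfigurationEntry[] getAppConfigurationEntry(String name) {
        return entries.get(name); // null tells JAAS there is no entry with this name
    }

    public static void main(String[] args) {
        PerFlowJaasConfig config = new PerFlowJaasConfig();
        // Hypothetical flows pointing at two different Solr clusters.
        config.addKeytabEntry("SolrJClient", "mcf-default@EXAMPLE.COM", "/etc/security/keytabs/mcf.keytab");
        config.addKeytabEntry("SolrJClient-clusterB", "mcf-b@EXAMPLE.COM", "/etc/security/keytabs/mcf-b.keytab");
        for (String name : new String[] {"SolrJClient", "SolrJClient-clusterB", "unknown"}) {
            AppConfigurationEntry[] found = config.getAppConfigurationEntry(name);
            System.out.println(name + " -> "
                + (found == null ? "no entry" : found[0].getOptions().get("principal")));
        }
        // Configuration.setConfiguration(config) would install this JVM-wide instead of a jaas.conf file.
    }
}
```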





I updated the site with the new release yesterday, but it hasn't gone live

2020-09-18 Thread Karl Wright
Svnpubsub seems to be broken.
I sent email to infrastruct...@apache.org but apparently nobody reads that
anymore.  Stay tuned.

Karl



[jira] [Commented] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-09-17 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198112#comment-17198112
 ] 

Karl Wright commented on CONNECTORS-1648:
-

Site has been updated but the mirror is apparently not working.  I've alerted 
infrastruct...@apache.org.


> PostgreSQL 10,11 and 12 support
> ---
>
> Key: CONNECTORS-1648
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1648
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12, ManifoldCF 2.13, 
> ManifoldCF 2.14, ManifoldCF 2.15, ManifoldCF 2.16
>Reporter: DK
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.17
>
> Attachments: image-2020-09-17-11-42-02-988.png
>
>
> As per the current documentation, "ManifoldCF has been tested against version 
> 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
> 9.1 was released on Sep 12, 2011 and already reached EOL on Oct 27, 2016.
> There is no support for newer versions.
> This is important for deploying ManifoldCF in an enterprise environment where 
> value-added services such as HA, backup and recovery, and monitoring are 
> provided by third-party vendors. These vendors do not support PostgreSQL 
> versions that are reaching end of life.
> Are there any plans to test and certify recent versions such as 10, 11 and 12.3?
>  





[jira] [Commented] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-09-17 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17198018#comment-17198018
 ] 

Karl Wright commented on CONNECTORS-1648:
-

It has already been released.  The site update just hasn't happened yet.  No 
time; I have multiple work-related crises and just have no cycles.

You can download the release with the same URL as the 2.16 release; just change 
the 2.16 to 2.17.


> PostgreSQL 10,11 and 12 support
> ---
>
> Key: CONNECTORS-1648
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1648
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12, ManifoldCF 2.13, 
> ManifoldCF 2.14, ManifoldCF 2.15, ManifoldCF 2.16
>Reporter: DK
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.17
>
> Attachments: image-2020-09-17-11-42-02-988.png
>
>
>  





[RESULT] [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-16 Thread Karl Wright
Three +1s and well over 72 hours.  Vote passes, finally!

Karl
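The tally above follows the ASF release-vote rule as it is usually stated: a release needs at least three binding +1 votes (from PMC members) and more +1 than -1 votes, and the vote stays open for at least 72 hours. A minimal sketch of that check; the class and method names are hypothetical and this is not ManifoldCF code:

```java
import java.time.Duration;

// Hypothetical helper, not part of ManifoldCF: checks the usual ASF
// release-vote conditions. Only binding votes should be counted.
final class ReleaseVote {
  static final Duration MINIMUM_WINDOW = Duration.ofHours(72);

  static boolean passes(int bindingPlusOnes, int bindingMinusOnes, Duration openFor) {
    boolean enoughApprovals = bindingPlusOnes >= 3;              // at least three binding +1s
    boolean majority = bindingPlusOnes > bindingMinusOnes;       // more +1 than -1 (no vetoes on releases)
    boolean longEnough = openFor.compareTo(MINIMUM_WINDOW) >= 0; // open for at least 72 hours
    return enoughApprovals && majority && longEnough;
  }

  public static void main(String[] args) {
    // RC1 was called on Sep 5 and the result was posted on Sep 16, 2020.
    System.out.println(passes(3, 0, Duration.ofDays(11)));
  }
}
```

Counting only binding votes matters: +1s from non-PMC contributors are welcome but do not count toward the three.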

On Wed, Sep 16, 2020 at 1:12 AM Piergiorgio Lucidi 
wrote:

> +1 from me.
>
> PJ
>
>
> On Sat, Sep 5, 2020, 12:32 Karl Wright wrote:
>
> > Please vote on whether to release Apache ManifoldCF 2.17, RC1.  The release
> > artifact can be found here:
> >
> > https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.17
> >
> > There is also a release tag at:
> >
> > https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.17-RC1
> >
> > This release does not contain anything major - just a few bug fixes,
> > summarized in the CHANGES.txt file.  It does include documentation,
> > however, which did not get successfully built for the 2.16 release.
> > Please review carefully with that in mind.
> >
> > The respin was required because the ElasticSearch test did not properly
> > work on the Mac.
> >
> > Thanks!
> > Karl
> >
>


Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-15 Thread Karl Wright
Hi all,
This is a vote thread, and RC1 is still waiting for votes.  Still need 2
more +1s.

Karl


On Tue, Sep 15, 2020 at 10:32 AM Cihad Guzel  wrote:

> Hi,
>
> I think we can vote on the release again.
>
> Kind Regards,
> Cihad Guzel
>
>
> On Tue, Sep 15, 2020, 14:13 Michael Cizmar wrote:
>
> > Karl,
> >
> > This occurs only on the Mac, and even then only with Maven.  Both of these
> > are secondary targets for the build/release process.  I don't know whether
> > a follow-up release candidate makes any difference, because the original
> > build works on the targeted platforms and, as you mentioned, this is just
> > a path issue with a test; no code has been modified in this connector.
> >
> > M
> >
> > On Mon, Sep 14, 2020 at 6:28 AM Karl Wright  wrote:
> >
> > > I have time this week only to spin a new RC, if that's what the community
> > > wants, but not to modify the maven build to download ElasticSearch and
> > > unpack it.  Mind you, there's still no difference in production code
> > > between RC0, RC1, and what's currently on the branch.  We've been fixing
> > > a test only.
> > >
> > > Please let me know what you feel is necessary for a release to succeed.
> > > Karl
> > >
> > >
> > > On Sun, Sep 13, 2020 at 6:36 AM Karl Wright  wrote:
> > >
> > > > Works fine now for Maven (although I had to upgrade the version of
> > > > failsafe plugin to work with my current version of Maven), provided you
> > > > run the ant make-dependencies first.
> > > >
> > > > Karl
> > > >
> > > >
> > > > On Sat, Sep 12, 2020 at 9:28 PM Karl Wright  wrote:
> > > >
> > > >> I used a -D variable that is set differently by maven and ant builds.
> > > >>
> > > >> It works fine for ant.  Bandwidth limitations tonight mean I will try
> > > >> tomorrow morning for maven.
> > > >>
> > > >> Karl
> > > >>
> > > >>
> > > >> On Sat, Sep 12, 2020 at 9:26 PM Michael Cizmar <mich...@michaelcizmar.com>
> > > >> wrote:
> > > >>
> > > >>> Ok.  What about an environmental variable that is used for the download
> > > >>> and then is read in the test case?
> > > >>>
> > > >>> On Sat, Sep 12, 2020 at 7:14 PM Karl Wright  wrote:
> > > >>>
> > > >>> > Ok, the path change will break the Ant test.  The maven test seems
> > > >>> > to have the current directory set connectors/elasticsearch during
> > > >>> > testing; the ant test explicitly sets it to one of the build
> > > >>> > directories below that.  But in any case I will need to consider
> > > >>> > how the test can be changed to use a specific ES source directory;
> > > >>> > maybe a -D can be pushed into it.
> > > >>> >
> > > >>> > Karl
> > > >>> >
> > > >>> > On Sat, Sep 12, 2020 at 3:30 PM Michael Cizmar <mich...@michaelcizmar.com>
> > > >>> > wrote:
> > > >>> >
> > > >>> > > Two in BaseITHSQLDB's setupElasticSearch method
> > > >>> > >
> > > >>> > > First to set the Java_HOME
> > > >>> > > Map envs = pb.environment();
> > > >>> > > if (System.getenv("JAVA_HOME")!= null) {
> > > >>> > > envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
> > > >>> > > } else {
> > > >>> > > throw new Exception("Missing JAVA_HOME as a system environment variable");
> > > >>> > > }
> > > >>> > >
> > > >>> > > The second removing the double dot
> > > >>> > > if (isUnix) {
> > > >>> > > pb.command("bash", "-c",
> > > >>> > > "./test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
> > > >>

Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-13 Thread Karl Wright
Works fine now for Maven (although I had to upgrade the version of failsafe
plugin to work with my current version of Maven), provided you run the ant
make-dependencies first.

Karl


On Sat, Sep 12, 2020 at 9:28 PM Karl Wright  wrote:

> I used a -D variable that is set differently by maven and ant builds.
>
> It works fine for ant.  Bandwidth limitations tonight mean I will try
> tomorrow morning for maven.
>
> Karl
>
>
> On Sat, Sep 12, 2020 at 9:26 PM Michael Cizmar 
> wrote:
>
>> Ok.  What about an environmental variable that is used for the download
>> and then is read in the test case?
>>
>> On Sat, Sep 12, 2020 at 7:14 PM Karl Wright  wrote:
>>
>> > Ok, the path change will break the Ant test.  The maven test seems to
>> > have the current directory set connectors/elasticsearch during testing;
>> > the ant test explicitly sets it to one of the build directories below
>> > that.  But in any case I will need to consider how the test can be
>> > changed to use a specific ES source directory; maybe a -D can be pushed
>> > into it.
>> >
>> > Karl
>> >
>> > On Sat, Sep 12, 2020 at 3:30 PM Michael Cizmar <mich...@michaelcizmar.com>
>> > wrote:
>> >
>> > > Two in BaseITHSQLDB's setupElasticSearch method
>> > >
>> > > First to set the Java_HOME
>> > > Map envs = pb.environment();
>> > > if (System.getenv("JAVA_HOME")!= null) {
>> > > envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
>> > > } else {
>> > > throw new Exception("Missing JAVA_HOME as a system environment variable");
>> > > }
>> > >
>> > > The second removing the double dot
>> > > if (isUnix) {
>> > > pb.command("bash", "-c",
>> > > "./test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
>> > > System.out.println("Unix process");
>> > > } else {
>> > > pb.command("cmd.exe", "/c", "..\\test-materials\\windows\\elasticsearch-7.6.2\\bin\\elasticsearch.bat -q -Expack.ml.enabled=false");
>> > > System.out.println("Windows process");
>> > > }
>> > >
>> > >
>> > > ===
>> > > --- connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (revision 1881665)
>> > > +++ connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (working copy)
>> > > @@ -32,6 +32,7 @@
>> > >  import org.apache.http.util.EntityUtils;
>> > >  import org.apache.http.impl.client.HttpClients;
>> > >  import java.io.IOException;
>> > > +import java.util.Map;
>> > >  import java.io.File;
>> > >
>> > >
>> > > @@ -44,11 +45,13 @@
>> > >  {
>> > >
>> > >  final static boolean isUnix;
>> > > +
>> > >  static {
>> > >  final String os = System.getProperty("os.name").toLowerCase();
>> > >  if (os.contains("win")) {
>> > >  isUnix = false;
>> > >  } else {
>> > > +  //Unix
>> > >  isUnix = true;
>> > >  }
>> > >  }
>> > > @@ -84,9 +87,16 @@
>> > >  final File absFile = new File(".").getAbsoluteFile();
>> > >  System.out.println("ES working directory is '"+absFile+"'");
>> > >  pb.directory(absFile);
>> > > +Map envs = pb.environment();
>> > > +if 

Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-12 Thread Karl Wright
I used a -D variable that is set differently by maven and ant builds.
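One way to picture "a -D variable that is set differently by maven and ant builds": the test reads the Elasticsearch directory from a JVM system property, and each build passes its own value. A minimal sketch, assuming a hypothetical property name `es.test.home` and a hypothetical default path; the property actually used by the ManifoldCF build is not shown in this thread:

```java
import java.io.File;

// Hypothetical helper, not ManifoldCF code. The property name "es.test.home"
// and the default paths are assumptions made for this sketch.
final class EsHomeResolver {
  static final String PROPERTY = "es.test.home";

  // Returns the directory passed as -Des.test.home=..., or the given default
  // (resolved against the current working directory) when it is not set.
  static File resolveEsHome(String defaultRelativePath) {
    String configured = System.getProperty(PROPERTY);
    if (configured != null && !configured.isEmpty()) {
      return new File(configured).getAbsoluteFile();
    }
    return new File(defaultRelativePath).getAbsoluteFile();
  }

  // Builds the OS-specific start script path inside the Elasticsearch directory.
  static File startScript(File esHome, boolean isUnix) {
    return new File(new File(esHome, "bin"), isUnix ? "elasticsearch" : "elasticsearch.bat");
  }

  public static void main(String[] args) {
    // Same OS test as BaseITHSQLDB: any os.name without "win" counts as Unix.
    boolean isUnix = !System.getProperty("os.name").toLowerCase().contains("win");
    File esHome = resolveEsHome(isUnix
        ? "test-materials/unix/elasticsearch-7.6.2"
        : "test-materials/windows/elasticsearch-7.6.2");
    System.out.println("ES start script: " + startScript(esHome, isUnix));
  }
}
```

With Ant, such a value is typically handed to the test JVM through a `<sysproperty>` element inside the `<junit>` task; with Maven, Surefire and Failsafe accept it under `<systemPropertyVariables>`. Using an absolute path removes the dependence on each build's working directory.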

It works fine for ant.  Bandwidth limitations tonight mean I will try
tomorrow morning for maven.

Karl


On Sat, Sep 12, 2020 at 9:26 PM Michael Cizmar 
wrote:

> Ok.  What about an environmental variable that is used for the download and
> then is read in the test case?
>
> On Sat, Sep 12, 2020 at 7:14 PM Karl Wright  wrote:
>
> > Ok, the path change will break the Ant test.  The maven test seems to
> > have the current directory set connectors/elasticsearch during testing;
> > the ant test explicitly sets it to one of the build directories below
> > that.  But in any case I will need to consider how the test can be
> > changed to use a specific ES source directory; maybe a -D can be pushed
> > into it.
> >
> > Karl
> >
> > On Sat, Sep 12, 2020 at 3:30 PM Michael Cizmar <mich...@michaelcizmar.com>
> > wrote:
> >
> > > Two in BaseITHSQLDB's setupElasticSearch method
> > >
> > > First to set the Java_HOME
> > > Map envs = pb.environment();
> > > if (System.getenv("JAVA_HOME")!= null) {
> > > envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
> > > } else {
> > > throw new Exception("Missing JAVA_HOME as a system environment variable");
> > > }
> > >
> > > The second removing the double dot
> > > if (isUnix) {
> > > pb.command("bash", "-c",
> > > "./test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
> > > System.out.println("Unix process");
> > > } else {
> > > pb.command("cmd.exe", "/c", "..\\test-materials\\windows\\elasticsearch-7.6.2\\bin\\elasticsearch.bat -q -Expack.ml.enabled=false");
> > > System.out.println("Windows process");
> > > }
> > >
> > >
> > > ===
> > > --- connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (revision 1881665)
> > > +++ connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (working copy)
> > > @@ -32,6 +32,7 @@
> > >  import org.apache.http.util.EntityUtils;
> > >  import org.apache.http.impl.client.HttpClients;
> > >  import java.io.IOException;
> > > +import java.util.Map;
> > >  import java.io.File;
> > >
> > >
> > > @@ -44,11 +45,13 @@
> > >  {
> > >
> > >  final static boolean isUnix;
> > > +
> > >  static {
> > >  final String os = System.getProperty("os.name").toLowerCase();
> > >  if (os.contains("win")) {
> > >  isUnix = false;
> > >  } else {
> > > +  //Unix
> > >  isUnix = true;
> > >  }
> > >  }
> > > @@ -84,9 +87,16 @@
> > >  final File absFile = new File(".").getAbsoluteFile();
> > >  System.out.println("ES working directory is '"+absFile+"'");
> > >  pb.directory(absFile);
> > > +Map envs = pb.environment();
> > > +if (System.getenv("JAVA_HOME")!= null) {
> > > +  envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
> > > +} else {
> > > +  throw new Exception("Missing JAVA_HOME as a system environment variable");
> > > +}
> > >
> > > +
> > >  if (isUnix) {
> > > -  pb.command("bash", "-c", "../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
> > > +  pb.command("bash", 

Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-12 Thread Karl Wright
Ok, I committed code that should work in the following environment: when
you run ant first to download the ES jar and unpack it.  After that the
maven build should be able to run the test.
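Because the Maven run depends on Ant having already downloaded and unpacked Elasticsearch, the test could check for the start script before launching it and fail with an actionable message, instead of starting a process that cannot run. A minimal sketch; the class name, default directory and message are hypothetical, and `esHome` stands for whatever directory the build configures:

```java
import java.io.File;
import java.io.FileNotFoundException;

// Hypothetical helper, not ManifoldCF code: fail fast when the Elasticsearch
// distribution has not been unpacked yet.
final class EsPrecondition {
  static File requireStartScript(File esHome, boolean isUnix) throws FileNotFoundException {
    File script = new File(new File(esHome, "bin"), isUnix ? "elasticsearch" : "elasticsearch.bat");
    if (!script.isFile()) {
      throw new FileNotFoundException(
          "Elasticsearch start script not found at '" + script.getAbsolutePath()
              + "'; run the Ant build (ant make-dependencies) first to download and unpack it");
    }
    return script;
  }

  public static void main(String[] args) {
    boolean isUnix = !System.getProperty("os.name").toLowerCase().contains("win");
    // Default directory is an assumption for this sketch.
    File esHome = new File(args.length > 0 ? args[0] : "test-materials/unix/elasticsearch-7.6.2");
    try {
      System.out.println("Using " + requireStartScript(esHome, isUnix));
    } catch (FileNotFoundException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

The check costs one file lookup and turns a silent hang into a message naming the missing Ant step.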

r1881672

Karl


On Sat, Sep 12, 2020 at 8:14 PM Karl Wright  wrote:

> Ok, the path change will break the Ant test.  The maven test seems to have
> the current directory set connectors/elasticsearch during testing; the ant
> test explicitly sets it to one of the build directories below that.  But in
> any case I will need to consider how the test can be changed to use a
> specific ES source directory; maybe a -D can be pushed into it.
>
> Karl
>
> On Sat, Sep 12, 2020 at 3:30 PM Michael Cizmar 
> wrote:
>
>> Two in BaseITHSQLDB's setupElasticSearch method
>>
>> First to set the Java_HOME
>> Map envs = pb.environment();
>> if (System.getenv("JAVA_HOME")!= null) {
>> envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
>> } else {
>> throw new Exception("Missing JAVA_HOME as a system environment variable");
>> }
>>
>> The second removing the double dot
>> if (isUnix) {
>> pb.command("bash", "-c",
>> "./test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
>> System.out.println("Unix process");
>> } else {
>> pb.command("cmd.exe", "/c", "..\\test-materials\\windows\\elasticsearch-7.6.2\\bin\\elasticsearch.bat -q -Expack.ml.enabled=false");
>> System.out.println("Windows process");
>> }
>>
>>
>> ===
>> --- connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (revision 1881665)
>> +++ connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (working copy)
>> @@ -32,6 +32,7 @@
>>  import org.apache.http.util.EntityUtils;
>>  import org.apache.http.impl.client.HttpClients;
>>  import java.io.IOException;
>> +import java.util.Map;
>>  import java.io.File;
>>
>>
>> @@ -44,11 +45,13 @@
>>  {
>>
>>  final static boolean isUnix;
>> +
>>  static {
>>  final String os = System.getProperty("os.name").toLowerCase();
>>  if (os.contains("win")) {
>>  isUnix = false;
>>  } else {
>> +  //Unix
>>  isUnix = true;
>>  }
>>  }
>> @@ -84,9 +87,16 @@
>>  final File absFile = new File(".").getAbsoluteFile();
>>  System.out.println("ES working directory is '"+absFile+"'");
>>  pb.directory(absFile);
>> +Map envs = pb.environment();
>> +if (System.getenv("JAVA_HOME")!= null) {
>> +  envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
>> +} else {
>> +  throw new Exception("Missing JAVA_HOME as a system environment variable");
>> +}
>>
>> +
>>  if (isUnix) {
>> -  pb.command("bash", "-c", "../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
>> +  pb.command("bash", "-c", "./test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
>>  System.out.println("Unix process");
>>  } else {
>>  pb.command("cmd.exe", "/c", "..\\test-materials\\windows\\elasticsearch-7.6.2\\bin\\elasticsearch.bat -q -Expack.ml.enabled=false");
>> @@ -93,10 +103,13 @@
>>  System.out.println("Windows process");
>>  }
>>
>> +
>> +
>>  File log = new File("es.log");
>>  pb.redirectErrorStream(true);
>>  pb.redirectOutput(ProcessBuilder.Redirect.appendTo(log));
>>  esTestProcess = pb.start();
>> +
>>  System.out.println("ElasticSearch is starting..."

Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-12 Thread Karl Wright
Ok, the path change will break the Ant test.  The maven test seems to have
the current directory set connectors/elasticsearch during testing; the ant
test explicitly sets it to one of the build directories below that.  But in
any case I will need to consider how the test can be changed to use a
specific ES source directory; maybe a -D can be pushed into it.
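The breakage described above comes from resolving a relative path against the process working directory. A small sketch with `java.nio.file.Path` shows how the same `../test-materials/...` string lands in different places; `/src` and `build-dir` are hypothetical stand-ins for the checkout root and the Ant build directory, not the real layout:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Illustrative only: "/src" and "build-dir" are hypothetical directory names.
final class RelativeScriptPath {
  // Resolves a relative path against a working directory and collapses
  // "." and ".." lexically (symbolic links are not followed).
  static Path resolve(String workingDir, String relativeScript) {
    return Paths.get(workingDir).resolve(relativeScript).normalize();
  }

  public static void main(String[] args) {
    String script = "../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch";
    // Ant-style working directory: a build directory below connectors/elasticsearch.
    System.out.println(resolve("/src/connectors/elasticsearch/build-dir", script));
    // Maven-style working directory: connectors/elasticsearch itself.
    System.out.println(resolve("/src/connectors/elasticsearch", script));
  }
}
```

From the Ant-style directory the script resolves inside `connectors/elasticsearch/test-materials`, while from the Maven-style directory it resolves one level too high, in `connectors/test-materials`. That matches the "No such file or directory" lines in es.log under Maven, and it is why simply dropping one dot fixes Maven but breaks Ant.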

Karl

On Sat, Sep 12, 2020 at 3:30 PM Michael Cizmar 
wrote:

> Two in BaseITHSQLDB's setupElasticSearch method
>
> First to set the Java_HOME
> Map envs = pb.environment();
> if (System.getenv("JAVA_HOME")!= null) {
> envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
> } else {
> throw new Exception("Missing JAVA_HOME as a system environment variable");
> }
>
> The second removing the double dot
> if (isUnix) {
> pb.command("bash", "-c",
> "./test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
> System.out.println("Unix process");
> } else {
> pb.command("cmd.exe", "/c", "..\\test-materials\\windows\\elasticsearch-7.6.2\\bin\\elasticsearch.bat -q -Expack.ml.enabled=false");
> System.out.println("Windows process");
> }
>
>
> ===
> --- connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (revision 1881665)
> +++ connector/src/test/java/org/apache/manifoldcf/agents/output/elasticsearch/tests/BaseITHSQLDB.java (working copy)
> @@ -32,6 +32,7 @@
>  import org.apache.http.util.EntityUtils;
>  import org.apache.http.impl.client.HttpClients;
>  import java.io.IOException;
> +import java.util.Map;
>  import java.io.File;
>
>
> @@ -44,11 +45,13 @@
>  {
>
>  final static boolean isUnix;
> +
>  static {
>  final String os = System.getProperty("os.name").toLowerCase();
>  if (os.contains("win")) {
>  isUnix = false;
>  } else {
> +  //Unix
>  isUnix = true;
>  }
>  }
> @@ -84,9 +87,16 @@
>  final File absFile = new File(".").getAbsoluteFile();
>  System.out.println("ES working directory is '"+absFile+"'");
>  pb.directory(absFile);
> +Map envs = pb.environment();
> +if (System.getenv("JAVA_HOME")!= null) {
> +  envs.put("JAVA_HOME",System.getenv("JAVA_HOME"));
> +} else {
> +  throw new Exception("Missing JAVA_HOME as a system environment variable");
> +}
>
> +
>  if (isUnix) {
> -  pb.command("bash", "-c", "../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
> +  pb.command("bash", "-c", "./test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch -q -Expack.ml.enabled=false");
>  System.out.println("Unix process");
>  } else {
>  pb.command("cmd.exe", "/c", "..\\test-materials\\windows\\elasticsearch-7.6.2\\bin\\elasticsearch.bat -q -Expack.ml.enabled=false");
> @@ -93,10 +103,13 @@
>  System.out.println("Windows process");
>  }
>
> +
> +
>  File log = new File("es.log");
>  pb.redirectErrorStream(true);
>  pb.redirectOutput(ProcessBuilder.Redirect.appendTo(log));
>  esTestProcess = pb.start();
> +
>  System.out.println("ElasticSearch is starting...");
>  //the default port is 9200
>
>
> On Sat, Sep 12, 2020 at 2:19 PM Karl Wright  wrote:
>
> > What changes did you make?
> > Karl
> >
> >
> > On Sat, Sep 12, 2020 at 2:41 PM Michael Cizmar <mich...@michaelcizmar.com>
> > wrote:
> >
> > > Didn't reach ES; waiting...
> > > Didn't reach ES; waiting...
> > > Didn't reach ES; waiting...
> > > Response from ES: HTTP/1.1 200 OK
> > > ES came up!
> > > ElasticSearch is started on port 9200
> > >
> > > @Karl - How would you like this packaged up?
> > >
> > > On Sat, Sep 12, 2020 at 12:52 PM Michael Cizmar <mich...@michaelcizmar.com>
> > > wrote:
> > >
> > > > Agreed.  We want to limit this unpacking becau

Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-12 Thread Karl Wright
What changes did you make?
Karl


On Sat, Sep 12, 2020 at 2:41 PM Michael Cizmar 
wrote:

> Didn't reach ES; waiting...
> Didn't reach ES; waiting...
> Didn't reach ES; waiting...
> Response from ES: HTTP/1.1 200 OK
> ES came up!
> ElasticSearch is started on port 9200
>
> @Karl - How would you like this packaged up?
>
> On Sat, Sep 12, 2020 at 12:52 PM Michael Cizmar
> wrote:
>
> > Agreed.  We want to limit this unpacking because elastic packages them
> > differently.  I started down the rabbit hole of making a macOS download but
> > then got into permission issues and started issuing chmod.
> >
> > When I get back from lunch I am going to just set the JDK of the process
> > to be the system environment variable and I think that will fix the problem.
> >
> > On Sat, Sep 12, 2020 at 12:37 PM Karl Wright  wrote:
> >
> >> Hi Michael,
> >>
> >> JAVA_HOME is not usually a requirement for Maven building but it's not
> >> unreasonable to have it, especially since maven itself looks for it.
> >>
> >> I suspect that, in order for the Maven build to work, you currently need
> >> to do this:
> >> - Set JAVA_HOME
> >> - Run the ant build first
> >>
> >> That's a little pain in the butt but we can fix this going forward - at
> >> least the unpacking part.
> >> Karl
> >>
> >>
> >> On Sat, Sep 12, 2020 at 1:33 PM Michael Cizmar <mich...@michaelcizmar.com>
> >> wrote:
> >>
> >> > I figured it out.  Just working through it.  There was a path issue.  When
> >> > I start the process it's looking for a JAVA_HOME.  I've got that now set to
> >> > the JDK that comes with elastic.   The download of elastic that ant is
> >> > doing is specific for Linux so that's failing.
> >> >
> >> > Does the build require JAVA_HOME to be set?   The machine I'm working on
> >> > does not have that set.
> >> >
> >> > On Sat, Sep 12, 2020 at 12:27 PM Karl Wright  wrote:
> >> >
> >> > > The ant build unpacks the ES binary and puts in the place needed to run.
> >> > > My guess is that we need to add similar unpacking to the maven pom.  I'll
> >> > > see if there is a way to do that.
> >> > > Karl
> >> > >
> >> > >
> >> > > On Sat, Sep 12, 2020 at 12:00 PM Michael Cizmar <mich...@michaelcizmar.com>
> >> > > wrote:
> >> > >
> >> > > > I think you are right.  The es.log file contains the following:
> >> > > >
> >> > > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> >> > > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> >> > > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> >> > > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> >> > > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> >> > > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> >> > > >

Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-12 Thread Karl Wright
Hi Michael,

JAVA_HOME is not usually a requirement for Maven building but it's not
unreasonable to have it, especially since maven itself looks for it.

I suspect that, in order for the Maven build to work, you currently need to
do this:
- Set JAVA_HOME
- Run the ant build first
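For the JAVA_HOME step, a sketch of passing the variable through to the Elasticsearch child process via `ProcessBuilder.environment()`. Unlike the patch discussed in this thread, which throws when `JAVA_HOME` is unset, this variant falls back to the `java.home` system property of the JVM running the tests; that fallback is an assumption for illustration, and on older Java versions it may point at a JRE rather than a full JDK:

```java
import java.util.Map;

// Hypothetical helper, not ManifoldCF code. The java.home fallback is an
// assumption; the patch in this thread throws an exception instead.
final class EsJavaHome {
  // Copies JAVA_HOME from the parent environment into the child process
  // environment, falling back to the JVM that runs the tests.
  static String configure(ProcessBuilder pb, Map<String, String> parentEnv) {
    String javaHome = parentEnv.get("JAVA_HOME");
    if (javaHome == null || javaHome.isEmpty()) {
      javaHome = System.getProperty("java.home");
    }
    pb.environment().put("JAVA_HOME", javaHome);
    return javaHome;
  }

  public static void main(String[] args) {
    ProcessBuilder pb = new ProcessBuilder("bash", "-c", "echo $JAVA_HOME");
    System.out.println("Child JAVA_HOME: " + configure(pb, System.getenv()));
  }
}
```

Taking the parent environment as a parameter (normally `System.getenv()`) keeps the helper testable without changing the real environment.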

That's a little pain in the butt but we can fix this going forward - at
least the unpacking part.
Karl


On Sat, Sep 12, 2020 at 1:33 PM Michael Cizmar 
wrote:

> I figured it out.  Just working through it.  There was a path issue.  When
> I start the process it's looking for a JAVA_HOME.  I've got that now set to
> the JDK that comes with elastic.   The download of elastic that ant is
> doing is specific for Linux so that's failing.
>
> Does the build require JAVA_HOME to be set?   The machine I'm working on
> does not have that set.
>
> On Sat, Sep 12, 2020 at 12:27 PM Karl Wright  wrote:
>
> > The ant build unpacks the ES binary and puts in the place needed to run.
> > My guess is that we need to add similar unpacking to the maven pom.  I'll
> > see if there is a way to do that.
> > Karl
> >
> >
> > On Sat, Sep 12, 2020 at 12:00 PM Michael Cizmar <mich...@michaelcizmar.com>
> > wrote:
> >
> > > I think you are right.  The es.log file contains the following:
> > >
> > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> > > bash: ../test-materials/unix/elasticsearch-7.6.2/bin/elasticsearch: No such file or directory
> > >
> > > On Sat, Sep 12, 2020 at 10:49 AM Karl Wright 
> wrote:
> > >
> > > > Ok, I didn't realize this was from Maven only.  It may be a working
> > > > directory or dependency version issue.
> > > >
> > > >
> > > > On Sat, Sep 12, 2020 at 11:37 AM Michael Cizmar <mich...@michaelcizmar.com>
> > > > wrote:
> > > >
> > > > > I reproduced the infinite Elastic Search Loop when building from maven.
> > > > > I'm investigating it now.
> > > > >
> > > > > On Sat, Sep 12, 2020 at 9:30 AM Michael Cizmar <mich...@michaelcizmar.com>
> > > > > wrote:
> > > > >
> > > > > > I'll take a look at the build this AM.
> > > > > >
> > > > > > On Sat, Sep 12, 2020 at 5:13 AM Karl Wright  wrote:
> > > > > >
> > > > > >> Hi all,
> > > > > >>
> > > > > >> I don't have a Mac, so if we can't figure out how to get ES to start
> > > > > >> properly on a Mac, I have no ability to debug it myself.  I would be
> > > > > >> forced to recommend we just disable the test - or continue with the
> > > > > >> vote, since nothing has changed and we released it this way for the
> > > > > >> last release in April.  We have people waiting for the Postgresql
> > > > > >> updates.
> > > > > >>
> > > > > >> Karl
> > > > > >>
> > > > > >>
> > > > > >> On Wed, Sep 9, 2020 at 11:54 AM Cihad Guzel  wrote:
> > > > > >>
> > > > > >> > Michael,
> > > > > >> >
> > > > > >> > I have log lines repeating like as follow:
> > > > > >> >
> > > > > >> > ---
> > > > > >> >  T E S T S
> > > > > >> > ---
> > > > > >> > Running org.apache.manifoldcf.agents.output.elasticsearch.tests.APISanityHSQLDBIT
> > > > > >> >

Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-12 Thread Karl Wright
Ok, I didn't realize this was from Maven only.  It may be a working
directory or dependency version issue.
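The "Didn't reach ES; waiting..." loop quoted below keeps polling even when the Elasticsearch process has already died because its start script was not found. A sketch of a bounded wait that also checks whether the child process is still alive; the class name and timeout values are hypothetical, while port 9200 is the default mentioned in the test:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper, not ManifoldCF code: polls Elasticsearch with a
// deadline instead of waiting forever.
final class EsWaiter {
  // Returns true once the URL answers HTTP 200; returns false when the
  // deadline passes or the child process (may be null) has already exited.
  static boolean waitForEs(Process es, String url, long timeoutMillis, long pollMillis)
      throws InterruptedException {
    final long deadline = System.currentTimeMillis() + timeoutMillis;
    while (System.currentTimeMillis() < deadline) {
      if (es != null && !es.isAlive()) {
        System.out.println("ES process exited early with code " + es.exitValue() + "; check es.log");
        return false;
      }
      try {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setConnectTimeout(1000);
        conn.setReadTimeout(1000);
        int status = conn.getResponseCode();
        conn.disconnect();
        if (status == 200) {
          return true;
        }
      } catch (IOException e) {
        System.out.println("Didn't reach ES; waiting...");
      }
      Thread.sleep(pollMillis);
    }
    return false;
  }

  public static void main(String[] args) throws InterruptedException {
    System.out.println("ES up: " + waitForEs(null, "http://localhost:9200/", 5_000, 500));
  }
}
```

In the test this would be called as something like `waitForEs(esTestProcess, "http://localhost:9200/", 120_000, 1_000)`, so a missing script shows up as an immediate, explained failure rather than an endless loop.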


On Sat, Sep 12, 2020 at 11:37 AM Michael Cizmar 
wrote:

> I reproduced the infinite Elastic Search Loop when building from maven.
> I'm investigating it now.
>
> On Sat, Sep 12, 2020 at 9:30 AM Michael Cizmar 
> wrote:
>
> > I'll take a look at the build this AM.
> >
> > On Sat, Sep 12, 2020 at 5:13 AM Karl Wright  wrote:
> >
> >> Hi all,
> >>
> >> I don't have a Mac, so if we can't figure out how to get ES to start
> >> properly on a Mac, I have no ability to debug it myself.  I would be
> >> forced to recommend we just disable the test - or continue with the vote,
> >> since nothing has changed and we released it this way for the last
> >> release in April.  We have people waiting for the Postgresql updates.
> >>
> >> Karl
> >>
> >>
> >> On Wed, Sep 9, 2020 at 11:54 AM Cihad Guzel  wrote:
> >>
> >> > Michael,
> >> >
> >> > I have log lines repeating like as follow:
> >> >
> >> > ---
> >> >  T E S T S
> >> > ---
> >> > Running org.apache.manifoldcf.agents.output.elasticsearch.tests.APISanityHSQLDBIT
> >> > Configuration file successfully read
> >> > [main] INFO org.eclipse.jetty.util.log - Logging initialized @7246ms
> >> > [main] INFO org.eclipse.jetty.server.Server - jetty-9.2.3.v20140905
> >> > [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@2ad48653{/mcf-crawler-ui,file:/private/var/folders/gw/4lgs06cd065d09gnm6ythcp8gn/T/jetty-0.0.0.0-8346-mcf-crawler-ui.war-_mcf-crawler-ui-any-5325495261063795321.dir/webapp/,AVAILABLE}{../dependency/mcf-crawler-ui.war}
> >> > [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@6bb4dd34{/mcf-authority-service,file:/private/var/folders/gw/4lgs06cd065d09gnm6ythcp8gn/T/jetty-0.0.0.0-8346-mcf-authority-service.war-_mcf-authority-service-any-1339291969162319913.dir/webapp/,AVAILABLE}{../dependency/mcf-authority-service.war}
> >> > [main] INFO org.eclipse.jetty.server.handler.ContextHandler - Started o.e.j.w.WebAppContext@7d9f158f{/mcf-api-service,file:/private/var/folders/gw/4lgs06cd065d09gnm6ythcp8gn/T/jetty-0.0.0.0-8346-mcf-api-service.war-_mcf-api-service-any-2172701003493912621.dir/webapp/,AVAILABLE}{../dependency/mcf-api-service.war}
> >> > [main] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@2796aeae{HTTP/1.1}{0.0.0.0:8346}
> >> > [main] INFO org.eclipse.jetty.server.Server - Started @10899ms
> >> > ES working directory is '/Users/cguzel/Projects/apache/svn/release-2.17-RC1/connectors/elasticsearch/target/test-output/.'
> >> > Unix process
> >> > ElasticSearch is starting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> > Didn't reach ES; waiting...
> >> >
> >> > Cihad
> >> >
> >> >
> >> > On Wed, Sep 9, 2020, 12:40 Michael Cizmar wrote:
> >> >
> >> > > Cihad,
> >> > >
> >> > > What was the error that you received when compiling in maven?
> >> > >
> >> > > On Wed, Sep 9, 2020 at 3:39 AM Cihad Guzel  wrote:
> >> > >
> >> > > > Hi Karl,
> >> > > >
> >> > > > I have successfully compiled using ant build. I tried to compile the
> >> > > > tag[1] using maven, but the ElasticSearch tests still fail using Mac [2].
> >> > > > <https://issues.apache.org/jira/browse/CONNECTORS-1651>
> >> > > >
> >> > > > 

[jira] [Comment Edited] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-09-08 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192363#comment-17192363
 ] 

Karl Wright edited comment on CONNECTORS-1648 at 9/8/20, 5:59 PM:
--

Ok, we tested through 12.2, but this has not been released yet.  Commit was in 
May of this year.

See:

https://issues.apache.org/jira/browse/CONNECTORS-1642

There is a release candidate waiting and you can download it if you want:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.17





was (Author: kwri...@metacarta.com):
Ok, we tested through 12.2.  If Postgres has chosen to get rid of more stuff 
since then you're on your own to find what they did and find a workaround.  We 
can't keep up with them at the moment.

See:

https://issues.apache.org/jira/browse/CONNECTORS-1642

The date on that ticket is *May* of this year, and they've gone through two 
releases since then and broken things in doing so.


> PostgreSQL 10,11 and 12 support
> ---
>
> Key: CONNECTORS-1648
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1648
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12, ManifoldCF 2.13, 
> ManifoldCF 2.14, ManifoldCF 2.15, ManifoldCF 2.16
>Reporter: DK
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.17
>
>
> As per the current documentation, "ManifoldCF has been tested against version 
> 8.3.7, 8.4.5 and 9.1 of PostgreSQL. "
> 9.1 was released on Sep 12, 2011 and already reached EOL on Oct 27, 2016.
> No newer versions are listed as supported.
> This matters for deploying ManifoldCF in an enterprise environment where 
> value-added services such as HA, backup and recovery, and monitoring are 
> provided by third-party vendors. These vendors do not support PostgreSQL 
> versions that are reaching end of life.
> Are there any plans to test and certify recent versions such as 10, 11 and 12.3?
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-09-08 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17192363#comment-17192363
 ] 

Karl Wright commented on CONNECTORS-1648:
-

Ok, we tested through 12.2.  If Postgres has chosen to get rid of more stuff 
since then you're on your own to find what they did and find a workaround.  We 
can't keep up with them at the moment.

See:

https://issues.apache.org/jira/browse/CONNECTORS-1642

The date on that ticket is *May* of this year, and they've gone through two 
releases since then and broken things in doing so.


> PostgreSQL 10,11 and 12 support
> ---
>
> Key: CONNECTORS-1648
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1648
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12, ManifoldCF 2.13, 
> ManifoldCF 2.14, ManifoldCF 2.15, ManifoldCF 2.16
>Reporter: DK
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.17
>
>
> As per the current documentation, "ManifoldCF has been tested against version 
> 8.3.7, 8.4.5 and 9.1 of PostgreSQL. "
> 9.1 was released on Sep 12, 2011 and already reached EOL on Oct 27, 2016.
> There is no support for newer versions.
> This is important for deploying ManifoldCF in an enterprise environment where
> value-added services such as HA, backup and recovery, and monitoring are
> provided by third-party vendors. These vendors do not support PostgreSQL
> versions that are reaching end of life.
> Is there any plan to test and certify recent versions such as 10, 11 and 12.3?
>  





Re: [VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-08 Thread Karl Wright
Tests pass.  +1 from me.

Looking for a few other voters?

Karl


On Sat, Sep 5, 2020 at 6:32 AM Karl Wright  wrote:

> Please vote on whether to release Apache ManifoldCF 2.17, RC1.  The
> release artifact can be found here:
>
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.17
>
> There is also a release tag at:
>
> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.17-RC1
>
> This release does not contain anything major - just a few bug fixes,
> summarized in the CHANGES.txt file.  It does include documentation,
> however, which did not get successfully built for the 2.16 release.  Please
> review carefully with that in mind.
>
> The respin was required because the ElasticSearch test did not properly
> work on the Mac.
>
> Thanks!
> Karl
>
>
>


[VOTE] Release Apache ManifoldCF 2.17, RC1

2020-09-05 Thread Karl Wright
Please vote on whether to release Apache ManifoldCF 2.17, RC1.  The release
artifact can be found here:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.17

There is also a release tag at:

https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.17-RC1

This release does not contain anything major - just a few bug fixes,
summarized in the CHANGES.txt file.  It does include documentation,
however, which did not get successfully built for the 2.16 release.  Please
review carefully with that in mind.

The respin was required because the ElasticSearch test did not properly
work on the Mac.

Thanks!
Karl


[CANCEL] [VOTE] Release Apache ManifoldCF 2.17, RC0

2020-09-05 Thread Karl Wright
Canceled due to issues running ES tests on the Mac.  Fixed now for RC1.

On Mon, Aug 31, 2020 at 12:08 PM Karl Wright  wrote:

> Thanks!
>
> Now what?  Should we proceed with the voting?
>
> Karl
>
>
> On Mon, Aug 31, 2020 at 8:46 AM Piergiorgio Lucidi 
> wrote:
>
>> Created a new ticket for this to investigate and solve this issue:
>> https://issues.apache.org/jira/browse/CONNECTORS-1651
>>
>> PJ
>>
>> On Sun, Aug 30, 2020 at 18:04 Michael Cizmar <mich...@michaelcizmar.com>
>> wrote:
>>
>> > From what I see, it does not appear that the tests are failing.  The
>> > ElasticSearch container is not starting.  I agree with Karl.
>> >
>> > On Sat, Aug 29, 2020 at 9:24 AM Karl Wright  wrote:
>> >
>> > > Hmm, the way it starts the process is the same on Windows and Linux.  The
>> > > version of ES we download for the test is the Linux distribution, so I am
>> > > surprised that it's not actually working on Linux.  Maybe the environment
>> > > variables are incorrect for that?
>> > >
>> > > Since it passes on Windows, I think this should not be a blocker.  But we
>> > > should open a ticket and investigate the issue.  Would you like to create
>> > > that ticket?
>> > >
>> > > Karl
>> > >
>> > >
>> > > On Sat, Aug 29, 2020 at 10:08 AM Piergiorgio Lucidi <
>> > > piergior...@apache.org>
>> > > wrote:
>> > >
>> > > > Hi,
>> > > >
>> > > > -1 from me, it seems that the ElasticSearch tests are failing on Mac
>> > > > with both JDK 8 and JDK 11:
>> > > >
>> > > > [INFO] -< org.apache.manifoldcf:mcf-elasticsearch-connector >--
>> > > > > [INFO] Building ManifoldCF - Connectors - ElasticSearch 2.17 [39/64]
>> > > > > [INFO] [ jar ]-
>> > > > > [INFO]
>> > > > > [INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ mcf-elasticsearch-connector ---
>> > > > > [INFO]
>> > > > > [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ mcf-elasticsearch-connector ---
>> > > > > [INFO]
>> > > > > [INFO] --- maven-dependency-plugin:2.8:copy (copy-war) @ mcf-elasticsearch-connector ---
>> > > > > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-api-service:2.17:war
>> > > > > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-authority-service:2.17:war
>> > > > > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-crawler-ui:2.17:war
>> > > > > [INFO] Copying mcf-api-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-api-service.war
>> > > > > [INFO] Copying mcf-authority-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-authority-service.war
>> > > > > [INFO] Copying mcf-crawler-ui-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-crawler-ui.war
>> > > > > [INFO]
>> > > > > [INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ mcf-elasticsearch-connector ---
>> > > > > [debug] execute contextualize
>> > > > > [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> > > > > [INFO] Copying 5 resources
>> > > > > [INFO] Copying 4 resources
>> > > > > [INFO] Copying 3 resources
>> > > > > [INFO]
>> > > > > [INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ mcf-elasticsearch-connector ---
>> > > > > [INFO] Compiling 8 source files to
>> > > > 

[jira] [Commented] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-09-04 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17190764#comment-17190764
 ] 

Karl Wright commented on CONNECTORS-1648:
-

Hi,

This was addressed in MCF release 2.16.


> PostgreSQL 10,11 and 12 support
> ---
>
> Key: CONNECTORS-1648
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1648
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12, ManifoldCF 2.13, 
> ManifoldCF 2.14, ManifoldCF 2.15, ManifoldCF 2.16
>Reporter: DK
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.17
>
>
> As per the current documentation, "ManifoldCF has been tested against version 
> 8.3.7, 8.4.5 and 9.1 of PostgreSQL. "
> 9.1 was released on Sep 12, 2011 and already reached EOL on Oct 27, 2016.
> There is no support for newer versions.
> This is important for deploying ManifoldCF in an enterprise environment where
> value-added services such as HA, backup and recovery, and monitoring are
> provided by third-party vendors. These vendors do not support PostgreSQL
> versions that are reaching end of life.
> Is there any plan to test and certify recent versions such as 10, 11 and 12.3?
>  





[jira] [Commented] (CONNECTORS-1651) ElasticSearch server is not starting during integration test

2020-09-02 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189184#comment-17189184
 ] 

Karl Wright commented on CONNECTORS-1651:
-

I committed this to trunk and also pulled it up to the release branch.  Should 
we spin another RC?  Or should the vote continue?


> ElasticSearch server is not starting during integration test
> 
>
> Key: CONNECTORS-1651
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1651
> Project: ManifoldCF
>  Issue Type: Bug
>  Components: Elastic Search connector
>Reporter: Piergiorgio Lucidi
>Assignee: Piergiorgio Lucidi
>Priority: Major
>
> When running the test suite on my Mac (with JDK 1.8 and also with JDK 11),
> the ElasticSearch server does not start properly:
> {noformat}
> [INFO] -< org.apache.manifoldcf:mcf-elasticsearch-connector >--
> [INFO] Building ManifoldCF - Connectors - ElasticSearch 2.17 [39/64]
> [INFO] [ jar ]-
> [INFO]
> [INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ mcf-elasticsearch-connector ---
> [INFO]
> [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ mcf-elasticsearch-connector ---
> [INFO]
> [INFO] --- maven-dependency-plugin:2.8:copy (copy-war) @ mcf-elasticsearch-connector ---
> [INFO] Configured Artifact: org.apache.manifoldcf:mcf-api-service:2.17:war
> [INFO] Configured Artifact: org.apache.manifoldcf:mcf-authority-service:2.17:war
> [INFO] Configured Artifact: org.apache.manifoldcf:mcf-crawler-ui:2.17:war
> [INFO] Copying mcf-api-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-api-service.war
> [INFO] Copying mcf-authority-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-authority-service.war
> [INFO] Copying mcf-crawler-ui-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-crawler-ui.war
> [INFO]
> [INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ mcf-elasticsearch-connector ---
> [debug] execute contextualize
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] Copying 5 resources
> [INFO] Copying 4 resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ mcf-elasticsearch-connector ---
> [INFO] Compiling 8 source files to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes
> [INFO]
> [INFO] --- native2ascii-maven-plugin:1.0-beta-1:native2ascii (native2ascii-utf8) @ mcf-elasticsearch-connector ---
> [INFO] Includes: [**/*.properties]
> [INFO] Excludes: []
> [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_en_US.properties
> [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_ja_JP.properties
> [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_zh_CN.properties
> [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_fr_FR.properties
> [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_es_ES.properties
> [INFO]
> [INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ mcf-elasticsearch-connector ---
> [debug] execute contextualize
> [INFO] Using 'UTF-8' encoding to copy filtered resources.
> [INFO] skip non existing resourceDirectory /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/connector/src/test/resources
> [INFO] Copying 3 resources
> [INFO]
> [INFO] --- maven-compiler-plugin:2.3.2:testCompile (default-testCompile) @ mcf-elasticsearch-connector ---
> [INFO] Compiling 6 source files to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/test-classes
> [INFO]
> [INFO] --- maven-surefi

Re: [VOTE] Release Apache ManifoldCF 2.17, RC0

2020-08-31 Thread Karl Wright
Thanks!

Now what?  Should we proceed with the voting?

Karl


On Mon, Aug 31, 2020 at 8:46 AM Piergiorgio Lucidi 
wrote:

> Created a new ticket for this to investigate and solve this issue:
> https://issues.apache.org/jira/browse/CONNECTORS-1651
>
> PJ
>
> On Sun, Aug 30, 2020 at 18:04 Michael Cizmar <mich...@michaelcizmar.com>
> wrote:
>
> > From what I see, it does not appear that the tests are failing.  The
> > ElasticSearch container is not starting.  I agree with Karl.
> >
> > On Sat, Aug 29, 2020 at 9:24 AM Karl Wright  wrote:
> >
> > > Hmm, the way it starts the process is the same on Windows and Linux.  The
> > > version of ES we download for the test is the Linux distribution, so I am
> > > surprised that it's not actually working on Linux.  Maybe the environment
> > > variables are incorrect for that?
> > >
> > > Since it passes on Windows, I think this should not be a blocker.  But we
> > > should open a ticket and investigate the issue.  Would you like to create
> > > that ticket?
> > >
> > > Karl
> > >
> > >
> > > On Sat, Aug 29, 2020 at 10:08 AM Piergiorgio Lucidi <
> > > piergior...@apache.org>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > -1 from me, it seems that the ElasticSearch tests are failing on Mac
> > > > with both JDK 8 and JDK 11:
> > > >
> > > > [INFO] -< org.apache.manifoldcf:mcf-elasticsearch-connector >--
> > > > > [INFO] Building ManifoldCF - Connectors - ElasticSearch 2.17 [39/64]
> > > > > [INFO] [ jar ]-
> > > > > [INFO]
> > > > > [INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ mcf-elasticsearch-connector ---
> > > > > [INFO]
> > > > > [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ mcf-elasticsearch-connector ---
> > > > > [INFO]
> > > > > [INFO] --- maven-dependency-plugin:2.8:copy (copy-war) @ mcf-elasticsearch-connector ---
> > > > > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-api-service:2.17:war
> > > > > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-authority-service:2.17:war
> > > > > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-crawler-ui:2.17:war
> > > > > [INFO] Copying mcf-api-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-api-service.war
> > > > > [INFO] Copying mcf-authority-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-authority-service.war
> > > > > [INFO] Copying mcf-crawler-ui-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-crawler-ui.war
> > > > > [INFO]
> > > > > [INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ mcf-elasticsearch-connector ---
> > > > > [debug] execute contextualize
> > > > > [INFO] Using 'UTF-8' encoding to copy filtered resources.
> > > > > [INFO] Copying 5 resources
> > > > > [INFO] Copying 4 resources
> > > > > [INFO] Copying 3 resources
> > > > > [INFO]
> > > > > [INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ mcf-elasticsearch-connector ---
> > > > > [INFO] Compiling 8 source files to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes
> > > > > [INFO]
> > > > > [INFO] --- native2ascii-maven-plugin:1.0-beta-1:native2ascii (native2ascii-utf8) @ mcf-elasticsearch-connector ---
> > > > > [INFO] Includes: [**/*.properties]
> > > > > [INFO] Excludes: []
> > > > > [INFO] Processing

Re: [VOTE] Release Apache ManifoldCF 2.17, RC0

2020-08-29 Thread Karl Wright
Looking at the log, it appears that the ES process just didn't start for
some reason on your Mac.  This is going to require somebody with a Mac to
diagnose.
Karl

On Sat, Aug 29, 2020 at 10:24 AM Karl Wright  wrote:

> Hmm, the way it starts the process is the same on Windows and Linux.  The
> version of ES we download for the test is the Linux distribution, so I am
> surprised that it's not actually working on Linux.  Maybe the environment
> variables are incorrect for that?
>
> Since it passes on Windows, I think this should not be a blocker.  But we
> should open a ticket and investigate the issue.  Would you like to create
> that ticket?
>
> Karl
>
>
> On Sat, Aug 29, 2020 at 10:08 AM Piergiorgio Lucidi <
> piergior...@apache.org> wrote:
>
>> Hi,
>>
>> -1 from me, it seems that the ElasticSearch tests are failing on Mac with
>> both JDK 8 and JDK 11:
>>
>> [INFO] -< org.apache.manifoldcf:mcf-elasticsearch-connector >--
>> > [INFO] Building ManifoldCF - Connectors - ElasticSearch 2.17 [39/64]
>> > [INFO] [ jar ]-
>> > [INFO]
>> > [INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ mcf-elasticsearch-connector ---
>> > [INFO]
>> > [INFO] --- maven-remote-resources-plugin:1.5:process (default) @ mcf-elasticsearch-connector ---
>> > [INFO]
>> > [INFO] --- maven-dependency-plugin:2.8:copy (copy-war) @ mcf-elasticsearch-connector ---
>> > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-api-service:2.17:war
>> > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-authority-service:2.17:war
>> > [INFO] Configured Artifact: org.apache.manifoldcf:mcf-crawler-ui:2.17:war
>> > [INFO] Copying mcf-api-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-api-service.war
>> > [INFO] Copying mcf-authority-service-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-authority-service.war
>> > [INFO] Copying mcf-crawler-ui-2.17.war to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/dependency/mcf-crawler-ui.war
>> > [INFO]
>> > [INFO] --- maven-resources-plugin:2.5:resources (default-resources) @ mcf-elasticsearch-connector ---
>> > [debug] execute contextualize
>> > [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> > [INFO] Copying 5 resources
>> > [INFO] Copying 4 resources
>> > [INFO] Copying 3 resources
>> > [INFO]
>> > [INFO] --- maven-compiler-plugin:2.3.2:compile (default-compile) @ mcf-elasticsearch-connector ---
>> > [INFO] Compiling 8 source files to /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes
>> > [INFO]
>> > [INFO] --- native2ascii-maven-plugin:1.0-beta-1:native2ascii (native2ascii-utf8) @ mcf-elasticsearch-connector ---
>> > [INFO] Includes: [**/*.properties]
>> > [INFO] Excludes: []
>> > [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_en_US.properties
>> > [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_ja_JP.properties
>> > [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_zh_CN.properties
>> > [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_fr_FR.properties
>> > [INFO] Processing /Users/piergiorgiolucidi/Downloads/apache-manifoldcf-2.17/connectors/elasticsearch/target/classes/org/apache/manifoldcf/agents/output/elasticsearch/common_es_ES.properties
>> > [INFO]
>> > [INFO] --- maven-resources-plugin:2.5:testResources (default-testResources) @ mcf-elasticsearch-connector ---
>> > [debug] execute contextualize
>> > [INFO] Using 'UTF-8' encoding to copy filtered resources.
>> > [INF

Re: [VOTE] Release Apache ManifoldCF 2.17, RC0

2020-08-28 Thread Karl Wright
Tests pass and documentation seems OK. +1 from me.

Karl


On Tue, Aug 25, 2020 at 8:03 AM Karl Wright  wrote:

> Please vote on whether to release Apache ManifoldCF 2.17, RC0.  The
> release artifact can be found here:
>
> https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.17
>
> There is also a release tag at:
>
> https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.17-RC0
>
> This release does not contain anything major - just a few bug fixes,
> summarized in the CHANGES.txt file.  It does include documentation,
> however, which did not get successfully built for the 2.16 release.  Please
> review carefully with that in mind.
>
> Karl
>
>
>
>


[VOTE] Release Apache ManifoldCF 2.17, RC0

2020-08-25 Thread Karl Wright
Please vote on whether to release Apache ManifoldCF 2.17, RC0.  The release
artifact can be found here:

https://dist.apache.org/repos/dist/dev/manifoldcf/apache-manifoldcf-2.17

There is also a release tag at:

https://svn.apache.org/repos/asf/manifoldcf/tags/release-2.17-RC0

This release does not contain anything major - just a few bug fixes,
summarized in the CHANGES.txt file.  It does include documentation,
however, which did not get successfully built for the 2.16 release.  Please
review carefully with that in mind.

Karl


[jira] [Commented] (CONNECTORS-1650) Convert README.txt to README.md

2020-08-25 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183896#comment-17183896
 ] 

Karl Wright commented on CONNECTORS-1650:
-

The README is usually not the right place to put build status, since then
every build has a new svn revision associated with it, which would trigger a new
build, and so on.  I think there are other places in the manifoldcf svn project
where we can put this status.


> Convert README.txt to README.md
> ---
>
> Key: CONNECTORS-1650
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1650
> Project: ManifoldCF
>  Issue Type: Improvement
>  Components: Documentation
>Reporter: Kishore Kumar
>Priority: Minor
>
> I would like to rename the README.txt to README.md and have the content 
> re-formatted for Markdown.
> Pros: We can embed the Jenkins job build status in our README.md to let users
> know the current status. Below is the live status of our ManifoldCF-ant
> job.
> [!https://ci-builds.apache.org/buildStatus/icon?job=ManifoldCF%2FManifoldCF-ant!|https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-ant/]
> If agreed, I will push a commit this week with the changes.
>  





Re: Job interrupted

2020-08-24 Thread Karl Wright
Ok, I found the 'hard fail' situation.  Here is a patch to fix it:

Index: connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java
===
--- connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java (revision 1881006)
+++ connectors/jcifs/connector/src/main/java/org/apache/manifoldcf/crawler/connectors/sharedrive/SharedDriveConnector.java (working copy)
@@ -1349,7 +1349,7 @@
   Logging.connectors.warn("JCIFS: 'File in Use' response when "+activity+" for "+documentIdentifier+": retrying...",se);
   // 'File in Use' skip the document and keep going
   throw new ServiceInterruption("Timeout or other service interruption: "+se.getMessage(),se,currentTime + 30L,
-currentTime + 3 * 60 * 6L,-1,true);
+currentTime + 3 * 60 * 6L,-1,false);
 }
 else if (se.getMessage().indexOf("cannot find") != -1 || se.getMessage().indexOf("cannot be found") != -1)
 {
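The patch changes only the last argument passed to ServiceInterruption, from true to false. Judging from the code comment "'File in Use' skip the document and keep going", that flag appears to decide whether running out of retries aborts the whole job or only skips the document. The Java below is a hypothetical, self-contained model of that decision; the RetryPolicySketch class, the Outcome enum and the decide method are invented for illustration and are not ManifoldCF's API. It also evaluates the fail-time literal from the patch: 3 * 60 * 6L is 1080, so, assuming the timestamps are in milliseconds, the fail deadline falls only about one second after the interruption.

```java
// Hypothetical model of a retry-or-give-up decision. The class, enum and
// method names are invented for illustration; this is NOT ManifoldCF's API.
public class RetryPolicySketch {

    /** What happens to a document after a service interruption. */
    public enum Outcome { RETRY, SKIP_DOCUMENT, ABORT_JOB }

    /**
     * @param now         current time (milliseconds assumed)
     * @param failTime    deadline after which retrying stops (milliseconds assumed)
     * @param abortOnFail true: abort the whole job once the deadline passes;
     *                    false: skip only this document and keep crawling
     */
    public static Outcome decide(long now, long failTime, boolean abortOnFail) {
        if (now < failTime) {
            return Outcome.RETRY;                   // still inside the retry window
        }
        return abortOnFail ? Outcome.ABORT_JOB      // behavior before the patch
                           : Outcome.SKIP_DOCUMENT; // behavior after the patch
    }

    public static void main(String[] args) {
        long currentTime = 0L;
        long retryTime = currentTime + 30L;         // first retry offset, as in the patch
        long failTime = currentTime + 3 * 60 * 6L;  // same literal as the patch: 1080
        System.out.println("fail deadline offset: " + (failTime - currentTime));
        System.out.println(decide(retryTime, failTime, true));  // RETRY
        System.out.println(decide(failTime, failTime, true));   // ABORT_JOB
        System.out.println(decide(failTime, failTime, false));  // SKIP_DOCUMENT
    }
}
```

In this model the flag only matters once the deadline has passed: with true the job ends with an error like the one Mario reported, while with false the crawl continues without that document.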

I'll commit to trunk as well.
Karl

On Mon, Aug 24, 2020 at 9:19 AM Karl Wright  wrote:

> Ok, then let me examine the code and see why it's not catching it.
> Karl
>
>
> On Mon, Aug 24, 2020 at 8:49 AM Bisonti Mario 
> wrote:
>
>> Yes, I see only that exception in manifoldcf.log, and the job
>> stops with:
>>
>> Error: Repeated service interruptions - failure processing document: The
>> process cannot access the file because it is being used by another process.
>>
>> *From:* Karl Wright
>> *Sent:* Monday, 24 August 2020 12:27
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Job interrupted
>>
>> Well, we look for certain kinds of exceptions from JCIFS and allow the
>> job to continue if we can't succeed.  You have to be sure, though, that the
>> failure was from *that* exception.  I point that out because we already
>> have a check for that, I believe.
>>
>> Karl
>>
>> On Mon, Aug 24, 2020 at 5:55 AM Bisonti Mario 
>> wrote:
>>
>> Yes, but afterwards I get:
>>
>> Error: Repeated service interruptions - failure processing document: The
>> process cannot access the file because it is being used by another process.
>>
>> And the job stops.
>>
>> *From:* Karl Wright
>> *Sent:* Monday, 24 August 2020 11:52
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: Job interrupted
>>
>> Hi,
>> That's a warning.  The job will keep running and the document will be
>> retried later.
>>
>> Karl
>>
>> On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario 
>> wrote:
>>
>> Hello.
>>
>> I have a problem with jobs being interrupted.
>>
>> The job executes a Windows share scan.
>>
>> After many errors, it sometimes stops.
>>
>> I see many errors in manifoldcf.log:
>>
>> at
>> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
>> [mcf-jcifs-connector.jar:?]
>>
>> at
>> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
>> [mcf-pull-agent.jar:?]
>>
>> WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use'
>> response when getting document version for
>> smb://fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
>>
>> jcifs.smb.SmbException: The process cannot access the file because it is
>> being used by another process.
>>
>> at
>> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441)
>> ~[jcifs-ng-2.1.2.jar:?]
>>
>> at
>> jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552)
>> ~[jcifs-ng-2.1.2.jar:?]
>>
>> at
>> jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007)
>> ~[jcifs-ng-2.1.2.jar:?]
>>

Re: Job interrupted

2020-08-24 Thread Karl Wright
Ok, then let me examine the code and see why it's not catching it.
Karl


On Mon, Aug 24, 2020 at 8:49 AM Bisonti Mario 
wrote:

> Yes, I see only that exception in manifoldcf.log, and the job stops
> with:
>
> Error: Repeated service interruptions - failure processing document: The
> process cannot access the file because it is being used by another process.
>
> *From:* Karl Wright
> *Sent:* Monday, 24 August 2020 12:27
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job interrupted
>
> Well, we look for certain kinds of exceptions from JCIFS and allow the job
> to continue if we can't succeed.  You have to be sure, though, that the
> failure was from *that* exception.  I point that out because we already
> have a check for that, I believe.
>
> Karl
>
> On Mon, Aug 24, 2020 at 5:55 AM Bisonti Mario 
> wrote:
>
> Yes, but afterwards I get:
>
> Error: Repeated service interruptions - failure processing document: The
> process cannot access the file because it is being used by another process.
>
> And the job stops.
>
> *From:* Karl Wright
> *Sent:* Monday, 24 August 2020 11:52
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job interrupted
>
> Hi,
> That's a warning.  The job will keep running and the document will be
> retried later.
>
> Karl
>
> On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario 
> wrote:
>
> Hello.
>
> I have a problem with jobs being interrupted.
>
> The job executes a Windows share scan.
>
> After many errors, it sometimes stops.
>
> I see many errors in manifoldcf.log:
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use'
> response when getting document version for
> smb://fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
>
> jcifs.smb.SmbException: The process cannot access the file because it is
> being used by another process.
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1747)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1716)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.queryPath(SmbFile.java:763)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.exists(SmbFile.java:844)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> WARN 2020-08-24T11:17:25,502 (Worker thread '59') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Timeout or other
> service interruption: The process cannot access the file because it is
> being used by another process.
>
>
>
>
>
> What could I check?
>
>
>
> Thanks a lot
>
> Mario
>
>


Re: Job interrupted

2020-08-24 Thread Karl Wright
Well, we look for certain kinds of exceptions from JCIFS and allow the job
to continue if we can't succeed.  You have to be sure, though, that the
failure was from *that* exception.  I point that out because we already
have a check for that, I believe.
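A minimal sketch of the kind of check described above, assuming a fixed per-document retry budget. The class `RetryPolicySketch`, its `Outcome` values, the matched message string and the limit of 3 attempts are hypothetical illustrations, not ManifoldCF's actual `SharedDriveConnector` or service-interruption logic. It shows why the exact exception matters: in this sketch only a recognized error is skipped once retries run out, while anything else aborts the job, which is analogous to the "Repeated service interruptions" error reported in this thread.

```java
import java.util.Set;

/**
 * Hypothetical sketch of a per-document retry decision for transient SMB
 * errors such as "file in use". Not ManifoldCF's actual logic.
 */
public class RetryPolicySketch {

    /** What the crawler does after one failed attempt on one document. */
    enum Outcome { RETRY_LATER, SKIP_DOCUMENT, ABORT_JOB }

    // Hypothetical allow-list: errors considered safe to skip once retries run out.
    static final Set<String> SKIPPABLE_MESSAGES = Set.of(
            "The process cannot access the file because it is being used by another process.");

    /**
     * @param errorMessage  message of the exception that caused this failure
     * @param attemptsSoFar failed attempts already made for this document
     * @param maxAttempts   hypothetical retry budget per document
     */
    static Outcome decide(String errorMessage, int attemptsSoFar, int maxAttempts) {
        if (attemptsSoFar < maxAttempts) {
            // Within budget: log a warning and requeue the document for later.
            return Outcome.RETRY_LATER;
        }
        if (SKIPPABLE_MESSAGES.contains(errorMessage)) {
            // Budget exhausted, but the error is recognized: skip the document, keep the job running.
            return Outcome.SKIP_DOCUMENT;
        }
        // Budget exhausted on an unrecognized error: stop the job.
        return Outcome.ABORT_JOB;
    }

    public static void main(String[] args) {
        String inUse = "The process cannot access the file because it is being used by another process.";
        System.out.println(decide(inUse, 1, 3));               // RETRY_LATER
        System.out.println(decide(inUse, 3, 3));               // SKIP_DOCUMENT
        System.out.println(decide("Access is denied.", 3, 3)); // ABORT_JOB
    }
}
```

Matching on message text keeps the sketch short; a more robust check would compare the NT status code carried by the SMB exception rather than its message text.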

Karl


On Mon, Aug 24, 2020 at 5:55 AM Bisonti Mario 
wrote:

> Yes, but afterwards I get:
>
>
>
> Error: Repeated service interruptions - failure processing document: The
> process cannot access the file because it is being used by another process.
>
>
>
> And the job stops.
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Monday, August 24, 2020 11:52
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: Job interrupted
>
>
>
> Hi,
> That's a warning.  The job will keep running and the document will be
> retried later.
>
>
>
> Karl
>
>
>
>
>
> On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario 
> wrote:
>
> Hello.
>
> I have some problems with jobs being interrupted.
>
> The job performs a Windows share scan.
>
>
>
> After many errors, it sometimes stops.
>
>
>
> I see many errors in manifoldcf.log:
>
>
>
>
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use'
> response when getting document version for smb://
> fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
>
> jcifs.smb.SmbException: The process cannot access the file because it is
> being used by another process.
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1747)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1716)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.queryPath(SmbFile.java:763)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.exists(SmbFile.java:844)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> WARN 2020-08-24T11:17:25,502 (Worker thread '59') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Timeout or other
> service interruption: The process cannot access the file because it is
> being used by another process.
>
>
>
>
>
> What could I check?
>
>
>
> Thanks a lot
>
> Mario
>
>


[jira] [Commented] (CONNECTORS-1649) Move to new CI

2020-08-24 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17183141#comment-17183141
 ] 

Karl Wright commented on CONNECTORS-1649:
-

I don't recall ever requesting or adding plugins.  The original automation was 
not built by me, though.  Probably safe to remove it.


> Move to new CI
> --
>
> Key: CONNECTORS-1649
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1649
> Project: ManifoldCF
>  Issue Type: Task
>Reporter: Kishore Kumar
>Assignee: Kishore Kumar
>Priority: Major
>
> As per instruction 
> [here|https://cwiki.apache.org/confluence/display/INFRA/Migrating+jobs+from+Jenkins+to+Cloudbees],
>  we are migrating ManifoldCF jobs to Cloudbees



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Job interrupted

2020-08-24 Thread Karl Wright
Hi,
That's a warning.  The job will keep running and the document will be
retried later.

Karl


On Mon, Aug 24, 2020 at 5:24 AM Bisonti Mario 
wrote:

> Hello.
>
> I have some problems with jobs being interrupted.
>
> The job performs a Windows share scan.
>
>
>
> After many errors, it sometimes stops.
>
>
>
> I see many errors in manifoldcf.log:
>
>
>
>
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> WARN 2020-08-24T11:17:25,501 (Worker thread '59') - JCIFS: 'File in Use'
> response when getting document version for smb://
> fileserver.net/Workgroups/Dir/Dir2/finename.xlsx: retrying...
>
> jcifs.smb.SmbException: The process cannot access the file because it is
> being used by another process.
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus2(SmbTransportImpl.java:1441)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> jcifs.smb.SmbTransportImpl.checkStatus(SmbTransportImpl.java:1552)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.sendrecv(SmbTransportImpl.java:1007)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTransportImpl.send(SmbTransportImpl.java:1523)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbSessionImpl.send(SmbSessionImpl.java:409)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeImpl.send(SmbTreeImpl.java:472)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send0(SmbTreeConnection.java:399)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:314)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeConnection.send(SmbTreeConnection.java:294)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:130)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbTreeHandleImpl.send(SmbTreeHandleImpl.java:117)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1747)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1716)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.withOpen(SmbFile.java:1710)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.queryPath(SmbFile.java:763)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at jcifs.smb.SmbFile.exists(SmbFile.java:844)
> ~[jcifs-ng-2.1.2.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.fileExists(SharedDriveConnector.java:2188)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector.processDocuments(SharedDriveConnector.java:610)
> [mcf-jcifs-connector.jar:?]
>
> at
> org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399)
> [mcf-pull-agent.jar:?]
>
> WARN 2020-08-24T11:17:25,502 (Worker thread '59') - Service interruption
> reported for job 1533797717712 connection 'WinShare': Timeout or other
> service interruption: The process cannot access the file because it is
> being used by another process.
>
>
>
>
>
> What could I check?
>
>
>
> Thanks a lot
>
> Mario
>
>


Re: Build failed in Jenkins: ManifoldCF » ManifoldCF-Artifacts-Ant-JDK11 #2

2020-08-20 Thread Karl Wright
The cause of the failure is a 403 error when downloading the mongo_db 2.2.0
jar.  It looks like the people who run Sonatype have blocked whatever server
IPs Apache is using for this.

Karl


On Wed, Aug 19, 2020 at 9:39 PM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/2/display/redirect?page=changes
> >
>
> Changes:
>
> [Karl Wright] Prepare trunk for 2.18 development
>
> [Karl Wright] Tie off release 2.17
>
>
> --
> [...truncated 1.29 MB...]
>
> obfuscation-utility:
>
> preclean-obfuscate-processes:
>[delete] Deleting: <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/framework/dist/obfuscation-utility/options.env.win
> >
>[delete] Deleting: <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/framework/dist/obfuscation-utility/options.env.unix
> >
>
> scripts-common:
>
> scripts-obfuscate:
>  [copy] Copying 2 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/framework/dist/obfuscation-utility
> >
>
> compile-core:
>
> jar-core:
>
> compile-ui-core:
>
> jar-ui-core:
>
> compile-agents:
>
> jar-agents:
>
> compile-pull-agent:
>
> jar-pull-agent:
>
> compile-jetty-runner:
>
> jar-jetty-runner:
>
> compile-script-engine:
>
> jar-script-engine:
>
> lib:
>
> obfuscate-lib-classpath:
>
> setup-obfuscate-processes:
>
> general-set-obfuscate-classpath:
>
> file-resources:
>
> buildfiles:
>
> compile-core-tests:
>
> jar-core-tests:
>
> compile-agents-tests:
>
> jar-agents-tests:
>
> compile-pull-agent-tests:
>
> jar-pull-agent-tests:
>
> compile-script-engine-tests:
>
> jar-script-engine-tests:
>
> jar-tests:
>
> test-lib:
>
> build:
>
> deliver-framework:
>  [copy] Copying 2 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/web
> >
>  [copy] Copying 4 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/web-proprietary
> >
>  [copy] Copying 3 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/lib-proprietary
> >
>  [copy] Copying 7 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/multiprocess-file-example
> >
>  [copy] Copying 7 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/multiprocess-file-example-proprietary
> >
>  [copy] Copying 9 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/multiprocess-zk-example
> >
>  [copy] Copying 9 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/multiprocess-zk-example-proprietary
> >
>  [copy] Copying 5 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/example
> >
>  [copy] Copying 6 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/example-proprietary
> >
>  [copy] Copying 2 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/script-engine
> >
>  [copy] Copying 2 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist/obfuscation-utility
> >
>  [copy] Copying 2 files to <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/dist
> >
>
> download-connectors-dependencies:
>
> download-dependencies:
>
> download-dependencies:
>
> setup-maven-url:
>
> download-via-maven:
>   [get] Getting:
> https://repo1.maven.org/maven2/com/github/maoo/indexer/alfresco-indexer-webscripts-war/0.8.1/alfresco-indexer-webscripts-war-0.8.1.war
>   [get] To: <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/connectors/alfresco-webscript/test-materials-proprietary/alfresco-indexer-webscripts-war-0.8.1.war
> >
>
> download-alfresco-ws-client:
>   [get] Getting:
> https://artifacts.alfresco.com/nexus/service/local/repositories/releases/content/org/alfresco/alfresco-web-service-client/4.2.c/alfresco-web-service-client-4.2.c.jar
>   [get] To: <
> https://ci-builds.apache.org/job/ManifoldCF/job/ManifoldCF-Artifacts-Ant-JDK11/ws/connectors/

Re: How to reset job status

2020-08-19 Thread Karl Wright
So Mario,

First it appears that you mysteriously cannot build where everyone else
can.  Now you are having mysterious problems with ManifoldCF being able to
do basic state transitions.  I'm unable to reproduce any of these things.
More worrisome, you seem to have the opinion that rather than fix
underlying deployment or infrastructure issues, the right solution is just
to hack away at the database or the code.

This doesn't work for me.

I'd like to help you out here but there's a basic level of cooperation
needed for that.  The way you do deployments in ManifoldCF that we know
will be successful is by starting with one of the distribution examples and
(if needed) modifying that to meet your individual needs.  If you are
having bizarre things take place, almost always it's because you didn't
start with one of the examples and therefore you wound up configuring
things in a bizarre way.  So if you cannot get past your current problem, I
STRONGLY recommend you start over:

- Checkout a new copy of trunk and build following the instructions I gave
in the other email thread.  Follow them to the letter please.
- Pick your deployment model.
- Point it at your database instance.
- Start it USING THE SCRIPTS PROVIDED.

Your problems should resolve.  If not, you should have logging in
manifoldcf.log telling you what is going wrong.

Karl


On Wed, Aug 19, 2020 at 6:40 AM Karl Wright  wrote:

> You do not see log output.  Therefore I need to ask you some questions.
>
> What deployment model are you using?  Single-process or multi-process?
> What is the synchronization method?
>
>
> On Wed, Aug 19, 2020 at 6:38 AM Karl Wright  wrote:
>
>> Usually, shutting down the agents process (or the whole thing) and
>> restarting it will fix problems like that UNLESS the problem persists because
>> a step in the state flow is failing.  If it is failing, you would see log
>> output.  Do you see log output?
>>
>> Karl
>>
>>
>> On Wed, Aug 19, 2020 at 5:40 AM Bisonti Mario 
>> wrote:
>>
>>> No, I don't have a notification connector, but that isn't the problem.
>>>
>>> manifoldcf.log is empty.
>>>
>>>
>>>
>>> The problem is that the job is hanging and I would like to reset
>>> its state.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> *From:* Karl Wright
>>> *Sent:* Wednesday, August 19, 2020 11:31
>>> *To:* user@manifoldcf.apache.org
>>> *Subject:* Re: How to reset job status
>>>
>>>
>>>
>>> There should be output in your manifoldcf.log file, no?  This may be the
>>> result of you not having a notification connector's code actually
>>> registered, so you get class-not-found errors.  The only solution is to put
>>> the missing jar in place and restart your agents process.  Have a look at
>>> the log to confirm.
>>>
>>>
>>>
>>> Karl
>>>
>>>
>>>
>>>
>>>
>>> On Wed, Aug 19, 2020 at 4:56 AM Bisonti Mario 
>>> wrote:
>>>
>>> Hello
>>>
>>> I have a job in the “End notification” status that hangs in that state.
>>>
>>>
>>>
>>> Is there a way to reset it?
>>>
>>>
>>>
>>> I tried the script lock-clean.sh without effect.
>>>
>>>
>>>
>>> In this state I am not able to manage jobs.
>>>
>>>
>>>
>>>
>>> What could I try, please?
>>>
>>>
>>>
>>>
>>> Thanks a lot
>>>
>>> Mario
>>>
>>>


Re: How to reset job status

2020-08-19 Thread Karl Wright
You do not see log output.  Therefore I need to ask you some questions.

What deployment model are you using?  Single-process or multi-process?
What is the synchronization method?


On Wed, Aug 19, 2020 at 6:38 AM Karl Wright  wrote:

> Usually, shutting down the agents process (or the whole thing) and
> restarting it will fix problems like that UNLESS the problem persists because
> a step in the state flow is failing.  If it is failing, you would see log
> output.  Do you see log output?
>
> Karl
>
>
> On Wed, Aug 19, 2020 at 5:40 AM Bisonti Mario 
> wrote:
>
>> No, I don't have a notification connector, but that isn't the problem.
>>
>> manifoldcf.log is empty.
>>
>>
>>
>> The problem is that the job is hanging and I would like to reset its
>> state.
>>
>>
>>
>>
>>
>>
>>
>> *From:* Karl Wright
>> *Sent:* Wednesday, August 19, 2020 11:31
>> *To:* user@manifoldcf.apache.org
>> *Subject:* Re: How to reset job status
>>
>>
>>
>> There should be output in your manifoldcf.log file, no?  This may be the
>> result of you not having a notification connector's code actually
>> registered, so you get class-not-found errors.  The only solution is to put
>> the missing jar in place and restart your agents process.  Have a look at
>> the log to confirm.
>>
>>
>>
>> Karl
>>
>>
>>
>>
>>
>> On Wed, Aug 19, 2020 at 4:56 AM Bisonti Mario 
>> wrote:
>>
>> Hello
>>
>> I have a job in the “End notification” status that hangs in that state.
>>
>>
>>
>> Is there a way to reset it?
>>
>>
>>
>> I tried the script lock-clean.sh without effect.
>>
>>
>>
>> In this state I am not able to manage jobs.
>>
>>
>>
>>
>> What could I try, please?
>>
>>
>>
>>
>> Thanks a lot
>>
>> Mario
>>
>>


Re: How to reset job status

2020-08-19 Thread Karl Wright
Usually, shutting down the agents process (or the whole thing) and
restarting it will fix problems like that UNLESS the problem persists because
a step in the state flow is failing.  If it is failing, you would see log
output.  Do you see log output?

Karl


On Wed, Aug 19, 2020 at 5:40 AM Bisonti Mario 
wrote:

> No, I don't have a notification connector, but that isn't the problem.
>
> manifoldcf.log is empty.
>
>
>
> The problem is that the job is hanging and I would like to reset its
> state.
>
>
>
>
>
>
>
> *From:* Karl Wright
> *Sent:* Wednesday, August 19, 2020 11:31
> *To:* user@manifoldcf.apache.org
> *Subject:* Re: How to reset job status
>
>
>
> There should be output in your manifoldcf.log file, no?  This may be the
> result of you not having a notification connector's code actually
> registered, so you get class-not-found errors.  The only solution is to put
> the missing jar in place and restart your agents process.  Have a look at
> the log to confirm.
>
>
>
> Karl
>
>
>
>
>
> On Wed, Aug 19, 2020 at 4:56 AM Bisonti Mario 
> wrote:
>
> Hello
>
> I have a job in the “End notification” status that hangs in that state.
>
>
>
> Is there a way to reset it?
>
>
>
> I tried the script lock-clean.sh without effect.
>
>
>
> In this state I am not able to manage jobs.
>
>
>
>
> What could I try, please?
>
>
>
>
> Thanks a lot
>
> Mario
>
>


Re: How to reset job status

2020-08-19 Thread Karl Wright
There should be output in your manifoldcf.log file, no?  This may be the
result of you not having a notification connector's code actually
registered, so you get class-not-found errors.  The only solution is to put
the missing jar in place and restart your agents process.  Have a look at
the log to confirm.

Karl


On Wed, Aug 19, 2020 at 4:56 AM Bisonti Mario 
wrote:

> Hello
>
> I have a job in the “End notification” status that hangs in that state.
>
>
>
> Is there a way to reset it?
>
>
>
> I tried the script lock-clean.sh without effect.
>
>
>
> In this state I am not able to manage jobs.
>
>
>
>
> What could I try, please?
>
>
>
>
> Thanks a lot
>
> Mario
>
>


[jira] [Commented] (CONNECTORS-1649) Move to new CI

2020-08-17 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17179107#comment-17179107
 ] 

Karl Wright commented on CONNECTORS-1649:
-

The MCF dev group just received an email indicating that all jobs failed.
It looks like the issue is an incompatibility between the version of Ant being
used and the version of Java being used.  But we see the exact same thing with
Maven.  Any ideas?





Fwd: Jenkins instance builds.a.o being shut down 22nd August.

2020-08-15 Thread Karl Wright
Volunteers needed - it doesn't look hard, but it sounds like somebody first
needs to contact Infra to request a ManifoldCF folder.  Anyone got time
before next weekend?  I'm tapped out until very late in the coming week.

Karl


-- Forwarded message -
From: Gavin McDonald 
Date: Sat, Aug 15, 2020 at 8:24 AM
Subject: Jenkins instance builds.a.o being shut down 22nd August.
To: Apache Infrastructure 


Hi All PMCs,

As per the many notifications on the builds@a.o mailing list over the last
few weeks, and the big yellow banner notification on the builds.a.o instance
itself, Jenkins is due to be turned off.

The original date for turning off the service was 15th August - *Today!*
However, some projects have yet to migrate to the new server(s)
(either ci-builds.a.o for the majority, or ci-hadoop.a.o for Hadoop and
related projects).

The deadline has - for one time only - been extended to *22nd August*, so
you have only 1 more week to move your jobs.

After that, builds.a.o WILL be turned off for good and the DNS will
redirect to ci-builds.a.o.

Today I will be moving over most of the remaining agents to the new
masters, leaving just a minimal few for those that still need to migrate.

Job config XML files will be saved. However, you can always grab them
yourselves at any time.

See the handy script [1] that can help you migrate all jobs at once; it takes
only 5 minutes! After that, some tweaks may be needed, though in most cases
not.

[1] -
https://cwiki.apache.org/confluence/display/INFRA/Migrating+Jenkins+jobs+from+Jenkins+to+Cloudbees


Please feel free to forward this message to your dev communities. Replies
should go to users@infra.a.o; new questions should go to builds@a.o.


Release time again

2020-08-11 Thread Karl Wright
Hi all,
August 31 is our next scheduled release date.  The release should probably
go out, because the documentation in the last release did not build properly
(since corrected, but no new artifacts have been pushed).  Other than that,
I can think of nothing that calls for a new release - the mailing lists have
been quite slow, and there are only three issues in the release.

Given that, I propose doing a patch release, not a full release.  Any
objections?

Karl


Re: Document Splitter

2020-07-08 Thread Karl Wright
Hi all,
Julien is correct; all documents must originate in the document
repository.  You can create document components this way, but they're all
subsidiaries of the principal document, so really the framework only tracks
the principal document in that case.

So you have a choice: either use the component approach, or have each row
be a full document in its own right.

From what I see, the component approach would be the best one.
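A minimal sketch of the component approach, assuming the JSON rows have already been parsed into strings. Every row stays a component of a single principal document, so only the `documentIdentifier` is tracked; the class `ComponentSplitSketch`, the `row-N` component identifiers and the `#row-N` URI suffix are hypothetical, and handing each component to the framework (the connector's actual ingestion call) is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the "component" approach: one principal document
 * fans out into one component per row. Not ManifoldCF code.
 */
public class ComponentSplitSketch {

    /** One row, addressed as a component of the principal document. */
    static final class Component {
        final String documentIdentifier;  // principal document: the only thing the framework tracks
        final String componentIdentifier; // hypothetical per-row identifier
        final String documentURI;         // hypothetical per-row URI for the search index
        final String body;                // the row's raw content

        Component(String documentIdentifier, String componentIdentifier,
                  String documentURI, String body) {
            this.documentIdentifier = documentIdentifier;
            this.componentIdentifier = componentIdentifier;
            this.documentURI = documentURI;
            this.body = body;
        }
    }

    /** Splits already-parsed rows into components of a single principal document. */
    static List<Component> split(String documentIdentifier, String documentURI, List<String> rows) {
        List<Component> components = new ArrayList<>(rows.size());
        for (int i = 0; i < rows.size(); i++) {
            String componentId = "row-" + i;
            // A fragment suffix keeps each row's URI distinct in the output index.
            components.add(new Component(documentIdentifier, componentId,
                    documentURI + "#" + componentId, rows.get(i)));
        }
        return components;
    }

    public static void main(String[] args) {
        List<Component> parts = split("doc-1", "http://example.com/data.json",
                List.of("{\"id\":1}", "{\"id\":2}"));
        for (Component c : parts) {
            // prints "doc-1 row-0 http://example.com/data.json#row-0", then the row-1 line
            System.out.println(c.documentIdentifier + " " + c.componentIdentifier + " " + c.documentURI);
        }
    }
}
```

The alternative, treating each row as a full document, would instead require the repository connector itself to report each row under its own document identifier.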

Karl


On Wed, Jul 8, 2020 at 1:25 PM Michael Cizmar 
wrote:

> Good point. I was thinking that I could do a
> return activities.sendDocument(documentURI,docCopy);
>
> for each row of the XML or JSON.
>
>
>
> 
> From: julien.massi...@francelabs.com 
> Sent: Wednesday, July 8, 2020 9:45 AM
> To: dev@manifoldcf.apache.org 
> Subject: RE: Document Splitter
>
> Hi Michael,
>
> If I am not wrong (and Karl can confirm), what you want to do is not
> possible in a transformation connector. A transformation connector cannot
> transform 1 incoming document into several ones. The only way to do that is
> in a repository connector, but it would then be bound to the type of the
> repository source.
>
> Regards,
> Julien
>
> -Original Message-
> From: Karl Wright
> Sent: Wednesday, July 8, 2020 16:16
> To: dev
> Subject: Re: Document Splitter
>
> Not that I know of.  But I'll let others answer as to what they may have
> written.
> Karl
>
>
> On Tue, Jul 7, 2020 at 7:38 PM Michael Cizmar 
> wrote:
>
> > I have a JSON file which has an array of objects that I want to index
> > as separate documents.  Before I build a transformer to split it, is
> > there a ready-made transformer to do this?
> >
> > Thanks!
> >
> > Michael
> >
>
>


Re: Document Splitter

2020-07-08 Thread Karl Wright
Not that I know of.  But I'll let others answer as to what they may have
written.
Karl


On Tue, Jul 7, 2020 at 7:38 PM Michael Cizmar 
wrote:

> I have a JSON file which has an array of objects that I want to index as
> separate documents.  Before I build a transformer to split it, is there a
> ready-made transformer to do this?
>
> Thanks!
>
> Michael
>


[jira] [Resolved] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-07-07 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright resolved CONNECTORS-1648.
-
Fix Version/s: ManifoldCF 2.17
   Resolution: Fixed

r1879592 updates the docs


> PostgreSQL 10,11 and 12 support
> ---
>
> Key: CONNECTORS-1648
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1648
> Project: ManifoldCF
>  Issue Type: Improvement
>Affects Versions: ManifoldCF 2.11, ManifoldCF 2.12, ManifoldCF 2.13, 
> ManifoldCF 2.14, ManifoldCF 2.15, ManifoldCF 2.16
>Reporter: DK
>Assignee: Karl Wright
>Priority: Major
> Fix For: ManifoldCF 2.17
>
>
> As per the current documentation, "ManifoldCF has been tested against version
> 8.3.7, 8.4.5 and 9.1 of PostgreSQL."
> Version 9.1 was released on Sep 12, 2011 and already reached EOL on Oct 27, 2016.
> Newer versions are not listed as supported.
> This matters when deploying ManifoldCF in an enterprise environment where
> value-added services such as HA, backup and recovery, and monitoring are
> provided by third-party vendors. These vendors do not support PostgreSQL
> versions that have reached end of life.
> Is there any plan to test and certify recent versions such as 10, 11 and 12.3?
>  





[jira] [Commented] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-07-07 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152789#comment-17152789
 ] 

Karl Wright commented on CONNECTORS-1648:
-

Going back over my notes, we updated the code to work with PostgreSQL 12.1 not
long ago.  So I will update the compatibility list in the doc.




[jira] [Commented] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-07-07 Thread Karl Wright (Jira)


[ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17152786#comment-17152786
 ] 

Karl Wright commented on CONNECTORS-1648:
-

A successful deployment by anyone on a newer version is evidence that the
newer version is OK and that we can update the doc.  Other than running the load
tests, there is no other magic we use to verify a platform.




[jira] [Assigned] (CONNECTORS-1648) PostgreSQL 10,11 and 12 support

2020-07-07 Thread Karl Wright (Jira)


 [ 
https://issues.apache.org/jira/browse/CONNECTORS-1648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karl Wright reassigned CONNECTORS-1648:
---

Assignee: Karl Wright



Re: WebCrawler Connector code

2020-07-07 Thread Karl Wright
Hi Ritika,

For performance reasons, you do not want to load the list of seeds every
time a document is processed.  The connector API does not support accessing
arbitrary job data, in part for this reason.

You should also NEVER call JobManager methods from a connector.  Use the
methods on the *Activity objects you are given instead.
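The "use the *Activity object you are given" rule can be pictured with a toy model. All names here (FetchActivity, StatusFetcher, ActivityOnlyConnector) are hypothetical and far simpler than ManifoldCF's real interfaces; the sketch only shows the shape of the idea: the connector works with what the framework passes into the call and reports problems back through the activity, instead of calling a job manager or reloading the seed list for every document.

```java
import java.util.List;

// Hypothetical, heavily simplified stand-ins for illustration only; these are NOT
// ManifoldCF's IProcessActivity / IRepositoryConnector signatures.
interface FetchActivity {
  // Hands a problem back to the framework, which decides what happens next.
  void reportFetchFailure(String documentId, int httpStatus);
}

interface StatusFetcher {
  // Returns the HTTP status code obtained when fetching the document.
  int fetchStatus(String documentId);
}

final class ActivityOnlyConnector {
  private final StatusFetcher fetcher;

  ActivityOnlyConnector(StatusFetcher fetcher) {
    this.fetcher = fetcher;
  }

  // Called by the framework with the activity object for this batch. The connector
  // works only with what it is handed: no global job manager, and no per-document
  // reload of job-wide data such as the seed list.
  void processDocuments(List<String> documentIds, FetchActivity activity) {
    for (String id : documentIds) {
      int status = fetcher.fetchStatus(id);
      if (status >= 500 && status <= 599) {
        activity.reportFetchFailure(id, status);
      }
    }
  }
}
```

Because the activity is passed in rather than looked up, the connector can also be exercised with a stub activity that simply records the calls it receives.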

Karl


On Tue, Jul 7, 2020 at 4:04 AM ritika jain  wrote:

> Hi Karl,
>
> Many thanks for your response!
>
> The problem I faced was getting the current job ID; that's why I used the
> JobStatus class. The other problem is getting the seeds for the running
> job ID.
>
> The activities object has the job ID set in its constructor, but there is
> no way to get the value in WebCrawlerConnector.java because no getter is
> defined.
>
> Also, getAllSeeds is declared in the interface IJobManager but not defined
> in its implementation class JobManager, so it always returns an empty value.
>
> Thanks


Re: WebCrawler Connector code

2020-07-06 Thread Karl Wright
Hi Ritika,

' My requirement is to abort a job whenever a seed-corresponding site is
down or returning some 5xx response codes. '

(1) Connector methods, like addSeedDocuments(), are called by the
framework.  You do not call them yourself when you write a connector.  So
you are looking in the wrong place here.
(2) All that addSeedDocuments does in the web connector is add seed URLs to
the queue for the job.  You do not want to change this implementation.
(3) The only time the web connector fetches anything is when it is
processing documents, in the processDocuments() method.
(4) You don't get to control the queue.  Documents are processed by the
framework in the order *it* determines they should be processed.  You can
create an "event" which must be satisfied before processing can occur but
that is all the control you get at the connector level.
(5) Similarly, you don't get told which document URLs are seeds.  This
information is in the job, and it is included in the job queue "isSeed"
field for each document, but it is never sent to any connector method.
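Point (4) can be pictured with a toy model of event gating: a queued document may name a prerequisite event, and it is not handed out for processing until that event has been marked complete. This is a conceptual sketch with hypothetical names (EventGatedQueue, a "site-check:..." event); it is not how ManifoldCF's job queue is implemented. In ManifoldCF the framework owns the queue, and a connector can only create the event, as described above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only (not ManifoldCF code): each queued document may name a prerequisite
// event, and it is held back until that event has been marked complete.
final class EventGatedQueue {
  private static final class Entry {
    final String documentId;
    final String prerequisiteEvent; // null means "no prerequisite"

    Entry(String documentId, String prerequisiteEvent) {
      this.documentId = documentId;
      this.prerequisiteEvent = prerequisiteEvent;
    }
  }

  private final Deque<Entry> queue = new ArrayDeque<>();
  private final Set<String> completedEvents = new HashSet<>();

  void add(String documentId, String prerequisiteEvent) {
    queue.addLast(new Entry(documentId, prerequisiteEvent));
  }

  void completeEvent(String eventName) {
    completedEvents.add(eventName);
  }

  // Removes and returns every document whose prerequisite (if any) is satisfied;
  // documents still waiting on an event stay queued in their original order.
  List<String> drainEligible() {
    List<String> ready = new ArrayList<>();
    int pending = queue.size();
    for (int i = 0; i < pending; i++) {
      Entry entry = queue.pollFirst();
      if (entry.prerequisiteEvent == null || completedEvents.contains(entry.prerequisiteEvent)) {
        ready.add(entry.documentId);
      } else {
        queue.addLast(entry); // still waiting on its event
      }
    }
    return ready;
  }
}
```

The connector can delay work this way, but it still cannot reorder the queue or see which documents are seeds.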

To get that information, it would be possible to add "isSeed" to the
IRepositoryConnector processDocuments() method, but that would change the
contract for all connectors.  You might be able to prevent carnage by adding
a method implementation plus an abstract method to BaseRepositoryConnector,
which together would provide a shim for most connectors.
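The shim idea can be sketched with drastically simplified, hypothetical types: SimpleRepositoryConnector and SimpleBaseRepositoryConnector stand in for IRepositoryConnector and BaseRepositoryConnector, whose real processDocuments() takes many more parameters. The base class implements the new isSeed-aware method by delegating to the old abstract one, so existing connectors keep working unchanged. A web-crawler-like connector can instead override the new method and react when a seed returns a 5xx status. Returning a message string is only a placeholder for whatever abort mechanism the framework would actually provide.

```java
import java.util.function.ToIntFunction;

// Hypothetical, heavily simplified stand-ins for IRepositoryConnector and
// BaseRepositoryConnector; the real processDocuments() signature is much richer.
interface SimpleRepositoryConnector {
  // Proposed contract: the framework passes the job queue's isSeed flag along.
  String processDocument(String documentId, boolean isSeed);
}

abstract class SimpleBaseRepositoryConnector implements SimpleRepositoryConnector {
  // Shim: the default implementation drops isSeed and calls the old method,
  // so connectors that never override this keep working unchanged.
  @Override
  public String processDocument(String documentId, boolean isSeed) {
    return processDocument(documentId);
  }

  // The old contract, which existing connectors already implement.
  protected abstract String processDocument(String documentId);
}

// An existing connector: implements only the old method.
final class LegacyConnector extends SimpleBaseRepositoryConnector {
  @Override
  protected String processDocument(String documentId) {
    return "processed " + documentId;
  }
}

// A connector that opts in: it overrides the new method and reacts to a seed
// that returns a 5xx status (here just by returning a message).
final class SeedAwareConnector extends SimpleBaseRepositoryConnector {
  private final ToIntFunction<String> fetchStatus;

  SeedAwareConnector(ToIntFunction<String> fetchStatus) {
    this.fetchStatus = fetchStatus;
  }

  @Override
  public String processDocument(String documentId, boolean isSeed) {
    int status = fetchStatus.applyAsInt(documentId);
    if (isSeed && status >= 500 && status <= 599) {
      return "abort: seed " + documentId + " returned " + status;
    }
    return processDocument(documentId);
  }

  @Override
  protected String processDocument(String documentId) {
    return "processed " + documentId;
  }
}
```

The cost of the contract change falls on the base class rather than on each connector. Connectors that implement the interface directly, without extending the base class, would still need updating.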

Karl






On Mon, Jul 6, 2020 at 8:52 AM ritika jain  wrote:

> Hi All,
>
> I am confused about the WebCrawler connector code. My requirement is to
> abort a job whenever the site corresponding to a seed is down or returns
> 5xx response codes.
> So I used the jobManager errorAbort method for this in the
> addSeedDocuments method of Webcrawlerconnector.java, and the JobStatus
> class to get a job ID.
>
> My confusion is about getting all the seeds for the corresponding job ID.
> So I used the getAllSeeds() method declared in the IJobManager interface.
>
> The problem is that getAllSeeds always returns a zero-length array. I
> suspect this method has no corresponding definition in its implementation
> class. *Why has this method not been implemented in the implementation
> class JobManager?*
>
> *My code:*
> String[] array1 = jobManager.getAllSeeds(Long.parseLong(jsr[k].getJobID()));
> array1 is always an empty array.
>
> *Another question:*
> public String addSeedDocuments(ISeedingActivity activities, Specification spec,
>     String lastSeedVersion, long seedTime, int jobMode)
>     throws ManifoldCFException, ServiceInterruption
>
> The activities object holds the job ID of the job that is calling this
> addSeedDocuments method, but neither the interface nor its implementation
> class has a getter method to retrieve the job ID (it is set only in the
> constructor).
>
>
> Can anybody please guide me on this?
>
> Thanks
> Ritika
>
>
>
>


Re: Unable to connect ADFS user through the ManifoldCF SharePoint connector.

2020-07-06 Thread Karl Wright
Hi Sagar,

You do not appear to be signed up for any ManifoldCF list, so I am still
moderating your posts in.

ADFS is Kerberos.
Because Kerberos access is managed with tickets, and the HTTP libraries we
use integrate Kerberos through a sidecar file containing the tickets, this
cannot be configured via the UI at this time.

However, work has already been done to support Kerberos in other
connectors.  I don't recall whether it was done for SharePoint yet, but I
doubt it; I think only the Solr connector was done.

There is a dev list thread that describes how this works, and the
documentation (how-to-build-and-deploy) was also updated with this
information.  Unfortunately, that doc hasn't gone live because of a
formatting problem we did not detect.  But you can look at the
documentation source for that page under
https://svn.apache.org/repos/asf/manifoldcf/trunk/site/src/documentation/content/en_US
.

I believe you also need to add a single line to the connector to enable
Kerberos.  This is the diff for the Solr connector:

>>>>>>
+    initializeKerberos();
+
     String location = "";
     if (webapp != null)
       location = "/" + webapp;
@@ -292,6 +300,21 @@
     solrServer = new ModifiedHttpSolrClient(httpSolrServerUrl, localClient, new XMLResponseParser(), allowCompression);
   }
 
+  private static void initializeKerberos()
+  {
+
+    final String loginConfig = System.getProperty("java.security.auth.login.config");
+    if (loginConfig != null && loginConfig.trim().length() > 0) {
+      if (Logging.ingest.isInfoEnabled()) {
+        Logging.ingest.info("Using Kerberos for Solr Authentication");
+      }
+      Krb5HttpClientBuilder krbBuild = new Krb5HttpClientBuilder();
+      SolrHttpClientBuilder kb = krbBuild.getBuilder();
+      HttpClientUtil.setHttpClientBuilder(kb);
+    }
+
+  }
+
<<<<<<

... along with appropriate imports.
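The switch in that diff is the standard JAAS system property: Kerberos is enabled only if the JVM was started with a login configuration, for example `-Djava.security.auth.login.config=/path/to/jaas.conf`. The dependency-free sketch below isolates that guard so it could be reused at the equivalent point in another connector. KerberosSwitch is a hypothetical name and is not part of ManifoldCF. Wiring a Kerberos-capable HTTP client into the SharePoint connector is not shown and would still need to be worked out.

```java
// Hypothetical helper, not part of ManifoldCF: it isolates the guard used in the Solr
// diff, which enables Kerberos only when the JVM was started with a JAAS login
// configuration, e.g. -Djava.security.auth.login.config=/path/to/jaas.conf
final class KerberosSwitch {
  static final String JAAS_LOGIN_CONFIG = "java.security.auth.login.config";

  private KerberosSwitch() {}

  // Same condition as the diff: the property is set and not blank.
  static boolean isKerberosConfigured() {
    String loginConfig = System.getProperty(JAAS_LOGIN_CONFIG);
    return loginConfig != null && loginConfig.trim().length() > 0;
  }
}
```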

I'm happy to commit any changes needed to make Sharepoint work this way, as
soon as you prove it works.

Thanks,
Karl


On Mon, Jul 6, 2020 at 7:59 AM Sagar Gole  wrote:

> I have subscribed to both lists, but I'm not sure what I missed. Anyway,
> thanks for replying.
> My query was not clear; my apologies. Let me explain the situation again.
>
> We are using ManifoldCF to index SharePoint content into Solr.
> Earlier we were using Windows-based authentication, i.e. NTLM, which was
> working fine for the authority and repository connections.
> Now our SharePoint authentication has moved from NTLM to ADFS, and I cannot
> figure out which connection type to choose in the ManifoldCF UI for the
> repository and authority connections. I tried SharePoint/Native, but it
> doesn't work.
>
> It appears that there is no ADFS connector that I can use, so we are
> planning to add a connector for ADFS-based SharePoint sites. For this we
> need your guidance. We are new to ManifoldCF, so we need your help and
> input on the design and implementation.
> Please help me.
>


Re: Unable to connect ADFS user through the ManifoldCF SharePoint connector.

2020-07-06 Thread Karl Wright
I saw your question to the dev list earlier and had to moderate both of
these questions through because you haven't signed up for these lists.
403 means your credentials and/or authentication method are incorrect.
It probably means your site uses Kerberos for authentication rather than
NTLM.
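One way to check which method the server expects is to send an unauthenticated request and look at the `WWW-Authenticate` headers on the resulting 401 response. With Java 11's `java.net.http` client, these are available via `response.headers().allValues("WWW-Authenticate")`. A `Negotiate` challenge means SPNEGO, which normally means Kerberos, possibly with NTLM fallback. The helper below is hypothetical (AuthSchemes is not a ManifoldCF class) and only parses those values. It assumes one challenge per header value. A 403 itself usually carries no challenge, and a site that redirects to an ADFS sign-in page instead of answering 401 may show no challenge at all.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical diagnostic helper: given the WWW-Authenticate header values from a
// 401 response, list the authentication schemes the server offers, lower-cased.
// Assumes one challenge per header value; comma-separated challenges packed into a
// single value are not split.
final class AuthSchemes {
  private AuthSchemes() {}

  static Set<String> offered(List<String> wwwAuthenticateValues) {
    Set<String> schemes = new LinkedHashSet<>();
    for (String value : wwwAuthenticateValues) {
      String trimmed = value.trim();
      if (trimmed.isEmpty()) {
        continue;
      }
      // The scheme is the first token; parameters such as realm="..." follow it.
      int space = trimmed.indexOf(' ');
      schemes.add((space < 0 ? trimmed : trimmed.substring(0, space)).toLowerCase(Locale.ROOT));
    }
    return schemes;
  }

  // "negotiate" is SPNEGO, which normally means Kerberos (possibly with NTLM fallback).
  static boolean offersKerberos(Set<String> schemes) {
    return schemes.contains("negotiate");
  }
}
```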

Karl


On Mon, Jul 6, 2020 at 2:30 AM Sagar Gole  wrote:

> I am trying to create an authority connection in ManifoldCF 2.15 to an
> ADFS-based SharePoint (2013) web application, but it gives an error:
> HTTP 403 (Forbidden).
>
> Can someone help me with this issue?
>
> Thanks.
>

