Re: Could an authority connection that is not working clash the search

2015-06-13 Thread Karl Wright
or anybody else have best practices setting the timeout. What's a good default? best -Rüdiger Quoting Karl Wright daddy...@gmail.com: I believe you can set the socket timeout for the Solr plugin via plugin parameters. I suggest you try that to limit the damage you'd get if the request could

Re: Could an authority connection that is not working clash the search

2015-06-11 Thread Karl Wright
I believe you can set the socket timeout for the Solr plugin via plugin parameters. I suggest you try that to limit the damage you'd get if the request could not be completed. Thanks, Karl On Thu, Jun 11, 2015 at 9:13 AM, k...@vnc-online.de wrote: Hi there, we have the following situation

Re: MCF API Services

2015-06-09 Thread Karl Wright
History: repositoryconnectionhistory/ *encoded_connection_name* *Karl* On Tue, Jun 9, 2015 at 3:58 AM, Karl Wright daddy...@gmail.com wrote: Hi Smitha, (1) The passwords need to be obfuscated. Probably the safest thing to do is to obfuscate them by hand using the script provided

Re: MCF API Services

2015-06-09 Thread Karl Wright
/programmatic-operation.html But not sure how to form the URL. Appreciate your help very much. Thanks Regards, Smitha *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Tuesday, June 9, 2015 1:29 PM *To:* Smitha S *Subject:* Re: MCF API Services Hi Smitha, (1

Re: Job definition metadata with multiple path attribute names

2015-06-08 Thread Karl Wright
argument specifies the regular expression group number, with an optional suffix of l or u meaning lower-case or upper-case, respectively.) Karl On Mon, Jun 8, 2015 at 7:15 AM, Karl Wright daddy...@gmail.com wrote: Hi Vigi, bq. I think the easiest would be to be able to define multiple
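To make the group-reference syntax above concrete, here is a minimal Python sketch that emulates it. It is not ManifoldCF code: the helper name expand, the sample path, and the reading of the suffixes (l for lower-case, u for upper-case) are assumptions made for illustration only.

```python
import re

# Matches references such as $(1), $(2l) or $(3u).
_GROUP_REF = re.compile(r"\$\((\d+)([lu]?)\)")


def expand(template, match):
    """Replace each $(n), $(nl) or $(nu) in template with group n of match.

    Hypothetical helper: assumes 'l' lower-cases and 'u' upper-cases the group.
    """
    def substitute(ref):
        value = match.group(int(ref.group(1))) or ""
        if ref.group(2) == "l":
            return value.lower()
        if ref.group(2) == "u":
            return value.upper()
        return value

    return _GROUP_REF.sub(substitute, template)


# Example: derive two metadata values from one (made-up) document path.
m = re.match(r"/Shared/([^/]+)/(.*)", "/Shared/Finance/Report.PDF")
print(expand("dept=$(1u) file=$(2l)", m))
```

A callable is passed to sub() so that backslashes in the matched text are not reinterpreted as escape sequences.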

Re: Job definition metadata with multiple path attribute names

2015-06-05 Thread Karl Wright
Hi Vigi, You get, for free, the file name of the document as metadata, from all repository connectors, including the jcifs connector: rd.setFileName(fileNameString); The problem is that this is not something you can manipulate in MCF via regular expression with the current

RE: Solr does not get ows__ModerationStatus

2015-06-05 Thread Karl Wright
Hi Dale, As I wrote in your ticket, I will need the actual XML back-and-forth with SharePoint in order to figure out what is going on. The best way to get that is to enable httpcomponents/httpclient wire debugging in logging.ini. There are instructions on how to do that online. Then, restart

Re: Tika Extraction to Solr: Field names

2015-06-02 Thread Karl Wright
Hi Will, ManifoldCF will post all fields that make it through the pipeline to Solr. If you don't want them all, you can use the Metadata Adjuster to remove some. So it's up to you. Karl On Tue, Jun 2, 2015 at 11:38 AM, Will Martin wmar...@synthostech.com wrote: We are getting a solr

RE: ManifoldCF upgradation to 2.0.2- DB Issues

2015-05-29 Thread Karl Wright
quickly; we do. Although jobs themselves are somewhat trickier than repos/auths/output …. ?? regards will martin *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Friday, May 29, 2015 1:44 AM *To:* Smitha S; user-h...@manifoldcf.apache.org; user@manifoldcf.apache.org *Subject:* RE

Re: Scheduled Job won't run

2015-05-28 Thread Karl Wright
Here is the problem: start_mode:manual You need to specify either begin at the start of the window, or begin inside the window. Karl On Thu, May 28, 2015 at 4:32 PM, Delapasse, Deanna ddelapa...@oceaneering.com wrote: I have created a job that runs fine manually. I attempted to add logic

RE: ManifoldCF upgradation to 2.0.2- DB Issues

2015-05-28 Thread Karl Wright
Hi Smitha, There is no automatic upgrade from mcf 1.x to mcf 2.x. I suggest that you upgrade instead to 1.9. Karl Sent from my Windows Phone -- From: Smitha S Sent: 5/28/2015 11:21 PM To: user-h...@manifoldcf.apache.org; user@manifoldcf.apache.org Cc: Karl Wright

Re: (Sharepoint 2010) Getting page language from its site

2015-05-27 Thread Karl Wright
If it's not part of the metadata for a document, I don't know where it would be. Usually SharePoint inherits metadata from parent site to children. That's what I would expect to see in this case too. Karl On Wed, May 27, 2015 at 5:10 AM, Salih Sen sa...@dilisim.com wrote: Hi everyone, We

Re: (Sharepoint 2010) Getting page language from its site

2015-05-27 Thread Karl Wright
...@dilisim.com wrote: By children do you mean only subsites or lists, documents and pages as well? Because in this case we need language field on pages and documents as well. On Wed, May 27, 2015 at 12:58 PM, Karl Wright daddy...@gmail.com wrote: If it's not part of the metadata for a document, I

Re: FW: Store file size in Solr

2015-05-27 Thread Karl Wright
Hi Vigi, Are you looking for the document length, or the extracted content length? In any case, the binary length of the document is available for indexing in the output connector, but none of our output connectors deal with it at this time. In addition, if you want the *original* binary

RE: Store file size in Solr

2015-05-27 Thread Karl Wright
What I mean is that this will need to be added as a new feature. If you would like to create a ticket that would be great. Karl Sent from my Windows Phone -- From: Virgiliu R Sent: 5/27/2015 9:02 AM To: user@manifoldcf.apache.org Subject: RE: Store file size in Solr

RE: Health Check of database connections

2015-05-22 Thread Karl Wright
anyways. Very likely I've missed something important, so please correct me, if my assumption is wrong. I'm new to ManifoldCF ;-) -Rüdiger [1] https://github.com/brettwooldridge/HikariCP On Thu, May 21, 2015 at 1:47 PM, Karl Wright daddy...@gmail.com wrote: I should also mention that I've done

Re: Renaming Connector Classes

2015-05-21 Thread Karl Wright
*Sent:* Wednesday, May 20, 2015 at 15:25 *From:* Karl Wright daddy...@gmail.com *To:* user@manifoldcf.apache.org *Subject:* Re: Renaming Connector Classes Hi Marcus, The name

Re: Health Check of database connections

2015-05-21 Thread Karl Wright
The test is: RSSFlakyHSQLDBIT It's one of the RSS connector tests, and simulates database interruption by replacing the HSQLDB database instance with one that generates database errors when told to. Karl On Thu, May 21, 2015 at 6:02 AM, ruediger.k...@deutschebahn.com wrote: Hi

Re: Need to create a stop-webapps.sh

2015-05-21 Thread Karl Wright
/resources-stats-analyze-carrydown On Wed, May 20, 2015 at 8:23 PM, Karl Wright daddy...@gmail.com wrote: Hi Deanna, First of all, file-based synchronization is deprecated at this point, so we'd much prefer you use zookeeper. If you are using file-based synch, getting tomcat working is very

Re: Need to create a stop-webapps.sh

2015-05-21 Thread Karl Wright
at 10:16 AM, Karl Wright daddy...@gmail.com wrote: Hi Deanna, The scary messages are not scary at all; Zookeeper is very noisy and all the errors you are seeing are in fact INFO messages. So you are good. Karl On Thu, May 21, 2015 at 10:08 AM, Delapasse, Deanna ddelapa...@oceaneering.com

Re: Health Check of database connections

2015-05-20 Thread Karl Wright
Hi Markus, This is not a general problem with ManifoldCF, because we even have tests that exercise this functionality. Probably the issue is that some JDBC drivers are more resilient than others. I have not researched what the MySQL driver does in this case, but I wouldn't be surprised if the

Re: Renaming Connector Classes

2015-05-20 Thread Karl Wright
Hi Marcus, The name of the connector class is a key for the connection names that depended on that class. To rename a connection class, therefore, you need to do the following: (1) BEFORE renaming the class, delete all jobs and connections that refer to that connector. (2) UNREGISTER the

Re: Need to create a stop-webapps.sh

2015-05-20 Thread Karl Wright
Hi Deanna, First of all, file-based synchronization is deprecated at this point, so we'd much prefer you use zookeeper. If you are using file-based synch, getting tomcat working is very hard because *all* mcf processes need to be running as the same user. With zookeeper that is not necessary.

Re: dbname.data is huge

2015-05-19 Thread Karl Wright
FWIW, if you want to reset, just delete the hsqldb database files and start over. This happens when you ant clean as well. Karl On Tue, May 19, 2015 at 9:34 PM, Karl Wright daddy...@gmail.com wrote: Hi Deanna, HSQLDB is not great for production use for a number of reasons; it's also

Re: dbname.data is huge

2015-05-19 Thread Karl Wright
Hi Deanna, HSQLDB is not great for production use for a number of reasons; it's also unconstrained in memory consumption. Indexing 30 rows over and over should not create a huge table; I suspect that if you queried it you would find the number of rows to be tiny. The reason it gets big has to do

Re: Timeout and strange url in logfile (CMIS=ES)

2015-05-18 Thread Karl Wright
Hi Deanna, Do you have the curl utility installed? The request that is failing looks fine as far as I can tell. I would try submitting it with curl and debugging the ElasticSearch side, because clearly that's where something is going wrong. PUT

Re: ManifoldCF SharePoint/ActiveDirectory Authority Connection Issue

2015-05-14 Thread Karl Wright
Hi Daniel, We're not Active Directory experts here, but if you looked at the authority connector code and saw what it did, you can readily see what the capabilities and limitations are. There's no magic around that you haven't already found. ;-) The authority *must* look up the user from one of

Re: Question about obtaining metadata values via CMIS connector = ElasticSearch

2015-05-13 Thread Karl Wright
(WebScriptsAlfrescoClient.java:347) at org.apache.manifoldcf.crawler.connectors.alfrescowebscript.AlfrescoConnector.check(AlfrescoConnector.java:124) at org.apache.jsp.viewconnection_jsp._jspService(viewconnection_jsp.java:285) On Tue, May 12, 2015 at 9:04 AM, Karl Wright daddy

Re: Question about obtaining metadata values via CMIS connector = ElasticSearch

2015-05-12 Thread Karl Wright
I created CONNECTORS-1200 for the error handling issue in the check() method. Karl On Tue, May 12, 2015 at 4:32 AM, Karl Wright daddy...@gmail.com wrote: Hi Maurizio, The Jasper exception is due to the connection check throwing a RuntimeException or Error of some kind. Karl On Mon, May

Re: Question about obtaining metadata values via CMIS connector = ElasticSearch

2015-05-12 Thread Karl Wright
:15 AM, Karl Wright daddy...@gmail.com wrote: Hi Deanna, Here's what the ManifoldCF log says it is trying to do: DEBUG 2015-05-11 21:04:22,608 (qtp380224087-322) - http-outgoing-0 GET /alfresco/service/api/node/auth/resolve/admin HTTP/1.1[\r][\n] DEBUG 2015-05-11 21:04:22,608 (qtp380224087

Re: Question about obtaining metadata values via CMIS connector = ElasticSearch

2015-05-12 Thread Karl Wright
then I think we know what the issue is -- although I'm still unsure as to the proper way to fix it. Thanks, Karl On Tue, May 12, 2015 at 5:56 AM, Karl Wright daddy...@gmail.com wrote: I created CONNECTORS-1200 for the error handling issue in the check() method. Karl On Tue, May 12

Re: File system continuous crawl settings

2015-05-10 Thread Karl Wright
Saturday, May 9, 2015, Karl Wright daddy...@gmail.com wrote: Hi Rafa, Two points. First, Allesandro's case is arguably insolvable by any mechanism that doesn't involve a sidecar process, because whether or not a job is running or mcf is even up the credential tokens must

RE: File system continuous crawl settings

2015-05-09 Thread Karl Wright
for the whole job in the getsession method and tried to trick it with class member variables as flags. Where exactly was the problem with the session management? Cheers, Rafa On Saturday, May 9, 2015, Karl Wright daddy...@gmail.com wrote: Hi Timo, I've taken a deep look

Re: File system continuous crawl settings

2015-05-09 Thread Karl Wright
not solve your immediate problem, since I will be making other changes to the connector to bring it in line with ManifoldCF standards. Karl On Fri, May 8, 2015 at 8:01 PM, Karl Wright daddy...@gmail.com wrote: That error is what I was afraid of. We need the complete exception trace. Can

RE: File system continuous crawl settings

2015-05-09 Thread Karl Wright
that possibility. Should we consider including that functionality? Some initializations can be expensive and it is not always possible to use a singleton. Thanks Karl! On Saturday, May 9, 2015, Karl Wright daddy...@gmail.com wrote: Hi Rafa, The problem was twofold. As stated before

Re: File system continuous crawl settings

2015-05-08 Thread Karl Wright
I just tried your configuration here. A deleted document in the file system was indeed picked up as expected. I did notice that your expiration setting is, essentially, cleaning out documents at a rapid clip. With this setting, documents will be expired before they are recrawled. You probably

RE: File system continuous crawl settings

2015-05-08 Thread Karl Wright
thoughts? On May 8, 2015, at 6:18 AM, Karl Wright daddy...@gmail.com wrote: I just tried your configuration here. A deleted document in the file system was indeed picked up as expected. I did notice that your expiration setting is, essentially, cleaning out documents at a rapid clip

Re: File System output connector error

2015-05-07 Thread Karl Wright
Hi Andrea, The file system output connector was intended to emulate wget. Unfortunately, this has two major problems: (1) wget is a unix utility, so it obeys unix file rules, and (2) wget does not have any kind of formal specification, so whenever anyone finds something weird we need to research

[ANNOUNCE] Apache ManifoldCF 1.9 has been released!

2015-05-04 Thread Karl Wright
This release fixes a number of bugs and other problems and upgrades HttpClient and Solr support to version 5.1.0. This may be the last major release of ManifoldCF 1.x. Thanks to all who helped put it together! Karl

[ANNOUNCE] Apache ManifoldCF 2.1 has been released!

2015-05-04 Thread Karl Wright
Apache ManifoldCF 2.1 has been released! This release includes many of the same fixes and improvements found in the 1.9 release, with the addition of many new features, such as support for notification connectors and also a new SearchBlox repository connector. Thanks to all for pulling this

Re: Content filltering/exclusion with MCF

2015-04-30 Thread Karl Wright
I've created a ticket to continue the discussion about whether we want such a feature and if so what it should look like. CONNECTORS-1193. Karl On Wed, Apr 29, 2015 at 7:28 PM, Karl Wright daddy...@gmail.com wrote: Hi Arcadius, The key question is, how big do you expect the dictionary

Re: Content filltering/exclusion with MCF

2015-04-29 Thread Karl Wright
friendly messages such as "The document you are looking for has expired" with a 200 HTTP status instead of 404. How feasible would it be to exclude documents from the index based on the content of the document? Thank you very much. Arcadius. On 28 April 2015 at 12:18, Karl Wright daddy

Re: Content filltering/exclusion with MCF

2015-04-29 Thread Karl Wright
from the index, we could exit on the first match i.e no need to match the whole dictionary. There is a pull-request for dealing with that https://github.com/robert-bor/aho-corasick/pull/14 Thanks. Arcadius. On 29 April 2015 at 22:50, Karl Wright daddy...@gmail.com wrote: Hi Arcadius
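The "exit on the first match" idea can be sketched in a few lines of Python. This is an illustrative Aho-Corasick automaton written from scratch, not the Java library linked above and not anything ManifoldCF ships; the function names build_automaton and contains_any and the sample dictionary are assumptions.

```python
from collections import deque


def build_automaton(terms):
    """Build goto/fail/terminal tables for the given dictionary terms."""
    goto, fail, terminal = [{}], [0], [False]
    for term in terms:
        if not term:
            continue  # ignore empty terms, which would match everything
        state = 0
        for ch in term:
            nxt = goto[state].get(ch)
            if nxt is None:
                goto.append({})
                fail.append(0)
                terminal.append(False)
                nxt = len(goto) - 1
                goto[state][ch] = nxt
            state = nxt
        terminal[state] = True

    # Breadth-first pass: compute failure links level by level.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # A state is terminal if any suffix of it is a dictionary term.
            terminal[nxt] = terminal[nxt] or terminal[fail[nxt]]
            queue.append(nxt)
    return goto, fail, terminal


def contains_any(text, automaton):
    """Return True as soon as any dictionary term is found in text."""
    goto, fail, terminal = automaton
    state = 0
    for ch in text:
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        if terminal[state]:
            return True  # first match is enough; stop scanning
    return False


# Hypothetical exclusion dictionary for "soft 404" pages.
automaton = build_automaton(["has expired", "page not found"])
print(contains_any("The document you are looking for has expired", automaton))
```

The scan is a single pass over the text regardless of dictionary size, which is why the size of the dictionary mostly affects memory rather than per-document time.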

RE: Migrating from ManifoldCF 1.8.2 to 2.0.2

2015-04-19 Thread Karl Wright
Hi Arcadius, Since many duplicate features present in 1.8.2 are not there at all in 2.0.2, a general port is not possible. But if you can be more specific about your needs, it might be possible to write something specific for your case. I would look at using the rest API to dump and read back

Re: agents process ran out of memory

2015-04-15 Thread Karl Wright
Clearly the logs must have rolled then? Either that or you are using a broken jdk. Karl On Wed, Apr 15, 2015 at 7:37 AM, Kamil Żyta kamil.z...@pwr.edu.pl wrote: On Wed, Apr 15, 2015 at 07:27:56AM -0400, Karl Wright wrote: Hi Kamil: kawright@duck76:/data/kawright/analysis$ gzip

Re: agents process ran out of memory

2015-04-15 Thread Karl Wright
1.8.0_45 Java(TM) SE Runtime Environment (build 1.8.0_45-b14) Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode) Is it broken? I don't know. How can I prevent the backtrace from rolling? It looks like an infinite loop to me. K On Wed, Apr 15, 2015 at 07:41:37AM -0400, Karl Wright wrote

Re: agents process ran out of memory

2015-04-15 Thread Karl Wright
, Kamil Żyta kamil.z...@pwr.edu.pl wrote: On Wed, Apr 15, 2015 at 11:16:44AM -0400, Karl Wright wrote: Hi Kamil, I bet that it is one specific file that was causing the problem. By increasing the stack space, you allowed the file to be processed. Now it won't get processed again until

Re: Metadata Adjuster transformer

2015-04-15 Thread Karl Wright
based on the mime type value in the core field? Timo On Apr 15, 2015, at 3:13 PM, Karl Wright daddy...@gmail.com wrote: Hi Timo, The metadata adjuster currently does not give you access to the core document fields, only to the document's general metadata. Basically, anything that ManifoldCF

Re: agents process ran out of memory

2015-04-15 Thread Karl Wright
) at java.util.regex.Pattern$Curly.match0(Pattern.java:4263) (...) ~1k lines for continuous job but agents is not exiting. Probably these two errors below aren't correlated (patterns and agents oom). K On Tue, Apr 14, 2015 at 05:28:18PM -0400, Karl Wright wrote: Without some kind of usable stack trace I can't

Re: agents process ran out of memory

2015-04-15 Thread Karl Wright
On Wed, Apr 15, 2015 at 6:41 AM, Kamil Żyta kamil.z...@pwr.edu.pl wrote: these 1k lines are the same. I attached full manifoldcf.log. K On Wed, Apr 15, 2015 at 06:33:06AM -0400, Karl Wright wrote: Hi Kamil, There is a complete trace in there, believe me. The JVM did not say

Re: agents process ran out of memory

2015-04-15 Thread Karl Wright
:07AM -0400, Karl Wright wrote: Hi Kamil, kawright@duck76:~$ cd /data/kawright/analysis/ kawright@duck76:/data/kawright/analysis$ gunzip manifoldcf.log.gz gzip: manifoldcf.log.gz: invalid compressed data--crc error gzip: manifoldcf.log.gz: invalid compressed data--length error
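Before attaching a compressed log to the list, it can be worth checking that the archive decompresses cleanly, since CRC and length errors like the ones above only show up at read time. A minimal sketch using Python's standard gzip module follows; the function name gzip_is_intact is hypothetical.

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Return True if the gzip file at path decompresses without errors.

    The file is read to the end in chunks so that the trailing CRC and
    length fields are verified without holding the whole log in memory.
    """
    try:
        with gzip.open(path, "rb") as stream:
            while stream.read(chunk_size):
                pass
    except (OSError, EOFError, zlib.error):
        # OSError covers bad headers and CRC failures (BadGzipFile),
        # EOFError covers truncated files, zlib.error corrupt streams.
        return False
    return True
```

On the command line, gzip -t performs the equivalent check.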

Re: Error with CMIS connector for Alfresco

2015-04-07 Thread Karl Wright
Hi Timo, The key message is: org.apache.chemistry.opencmis.commons.exceptions.CmisRuntimeException: 03073325596 Request failed 500 /solr/alfresco/cmis?wt=json&fl=DBID%2Cscore&rows=10&df=TEXT&start=0&locale=en_US&fq=%7B%21afts%7DAUTHORITY_

Re: Error with CMIS connector for Alfresco

2015-04-07 Thread Karl Wright
200 ·GET /alfresco/cmisatom?repositoryId= HTTP/1.1 200 ·POST /alfresco/cmisatom/xx/query HTTP/1.1” 500 The post seems to have ended with a 500 error. We will research further on Alfresco side. Thanks, Timo On Apr 7, 2015, at 1:09 PM, Karl

Re: MCF 2 and Solr Cloud 5

2015-04-02 Thread Karl Wright
Any luck figuring this out? Karl On Wed, Apr 1, 2015 at 1:01 PM, Karl Wright daddy...@gmail.com wrote: The button works fine. So the problem must be on the repository side. Karl On Wed, Apr 1, 2015 at 12:56 PM, Karl Wright daddy...@gmail.com wrote: If your simple history shows

Re: Setting up authentication for the REST interface ?

2015-04-01 Thread Karl Wright
The ticket is completed and will be released as part of both 1.9 and 2.1, due out at the end of the month. Thanks, Karl On Mon, Mar 30, 2015 at 8:51 PM, Karl Wright daddy...@gmail.com wrote: I've created CONNECTORS-1177 to track this issue. Offhand I think it is straightforward to add some

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
Hi Kamil, Solrj 5.0 changed massively from Solrj 4.x. The work to use Solrj 5.0 has been done on trunk. You will need to check out and build trunk in order to use Solr 5. Thanks, Karl On Wed, Apr 1, 2015 at 9:23 AM, Kamil Żyta kamil.z...@pwr.edu.pl wrote: Hi, I set up solr 5 (Cloud) and

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
, Apr 01, 2015 at 09:37:47AM -0400, Karl Wright wrote: Hi Kamil, Solrj 5.0 changed massively from Solrj 4.x. The work to use Solrj 5.0 has been done on trunk. You will need to check out and build trunk in order to use Solr 5. Thanks, Karl On Wed, Apr 1, 2015 at 9:23 AM, Kamil

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
Also, don't forget that MCF is incremental. You should probably click the output connection's Reindex all documents button before trying again. Karl On Wed, Apr 1, 2015 at 10:15 AM, Karl Wright daddy...@gmail.com wrote: Hi Kamil, So you are still seeing a NullPointerException from

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
(CloudSolrClient.java:892) at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:795) ... 3 more K On Wed, Apr 01, 2015 at 10:15:13AM -0400, Karl Wright wrote: Hi Kamil, So you are still seeing a NullPointerException from

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
only start/access/stop activities. Access denied is normal in my setup. So how can I debug the problem? K On Wed, Apr 01, 2015 at 08:32:42AM -0700, Karl Wright wrote: Hi Kamil, Can you look at the simple history report, to verify whether manifoldcf is even attempting to post documents

RE: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
K On Wed, Apr 01, 2015 at 10:53:39AM -0400, Karl Wright wrote: When I put 'esci' as collection name I get an error. When I put 'collection1' I get 'Connection working' and no errors in logs but still no docs in solr. Hi Kamil, Do you get the exception when you use collection1

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
, 2015 at 10:27:50AM -0400, Karl Wright wrote: Hi Kamil, This is happening on the commit. It looks to me like it's because you are specifying a collection that doesn't actually exist: DocCollection col = getDocCollection(clusterState, collection); DocRouter router
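One way to rule out a misspelled or missing collection before pointing an output connection at it is to ask SolrCloud which collections exist. The sketch below assumes the Collections API LIST action is reachable at <solr base URL>/admin/collections and returns a JSON object with a "collections" array; the function names and the base URL in the comment are illustrative, not part of ManifoldCF.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def collection_exists(list_response, name):
    """Check a parsed Collections API LIST response for a collection name."""
    return name in list_response.get("collections", [])


def fetch_collections(solr_base_url):
    """Fetch the LIST response from a running SolrCloud node (network call)."""
    query = urlencode({"action": "LIST", "wt": "json"})
    url = f"{solr_base_url}/admin/collections?{query}"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Usage against a live node (assumed URL, not executed here):
#   response = fetch_collections("http://localhost:8983/solr")
#   print(collection_exists(response, "esci"))
```

Keeping the check separate from the network call makes it easy to test against a saved response.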

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
The button works fine. So the problem must be on the repository side. Karl On Wed, Apr 1, 2015 at 12:56 PM, Karl Wright daddy...@gmail.com wrote: If your simple history shows no documents being processed or indexed, then that's the problem, or at least one of them. I will try to confirm

Re: MCF 2 and Solr Cloud 5

2015-04-01 Thread Karl Wright
at 12:07:47PM -0400, Karl Wright wrote: Hi Kamil, If no attempts are being made to actually index documents, then no documents will be indexed. (1) What repository connection is this? Can you try something simple first, like indexing from the file system? I use cifs, in 'Status

Re: Index DIrectory names

2015-03-30 Thread Karl Wright
as separate docs in index (with m/a/ctime as metadata) or some other way to query only for directory. Empty dirs must be included. K On Mon, Mar 30, 2015 at 10:27:34AM -0400, Karl Wright wrote: Hi Kamil, Directory names are included as path metadata, if you configure your job to include

Re: Setting up authentication for the REST interface ?

2015-03-30 Thread Karl Wright
I've created CONNECTORS-1177 to track this issue. Offhand I think it is straightforward to add some degree of session login support. Karl On Mon, Mar 30, 2015 at 12:00 PM, Karl Wright daddy...@gmail.com wrote: Hi Jan, The reason that the REST interface is a separate web application is so

Re: Setting up authentication for the REST interface ?

2015-03-30 Thread Karl Wright
Hi Jan, The reason that the REST interface is a separate web application is so you can protect it in the manner of your choice, within the context of the application server. It was written before there were any particular standards for authentication of REST web services. If you have an idea

Re: Solr indexing permissions even when disabling authorities for a repository connection

2015-03-26 Thread Karl Wright
Hi Antonio, A repository connector does not build its RepositoryDocument objects based on its ManifoldCF environment. To do that would be a challenge, and probably would introduce dependencies we really don't want. The access tokens themselves get qualified by the authority group name by the

Re: Solr indexing permissions even when disabling authorities for a repository connection

2015-03-26 Thread Karl Wright
of no prepending anything on the specific ACL, right? Regards On Thu, Mar 26, 2015 at 5:36 PM, Karl Wright daddy...@gmail.com wrote: Hi Antonio, A repository connector does not build its RepositoryDocument objects based on its ManifoldCF environment. To do that would be a challenge

Re: Need examples of expressions used to specify multiple folders to index

2015-03-24 Thread Karl Wright
referred to, so MCF was still set to use only 256 Mb despite my thinking otherwise. I've bumped it up to 4 Gb, and the job recovered and is finally again moving along. -Ian Karl Wright daddy...@gmail.com 3/20/2015 10:55 AM Hi Ian, HSQLDB is an interesting database

Re: Need examples of expressions used to specify multiple folders to index

2015-03-24 Thread Karl Wright
(CompositeParser.java:244) snip Caused by: org.apache.poi.EncryptedDocumentException: Cannot process encrypted word file Karl Wright daddy...@gmail.com 3/24/2015 11:22 AM failure processing document Server at http://localhost:8983/solr returned non ok status:500, message:Server Error That's

Re: A hopfully a few simple question about ManifoldCF and SharePoint

2015-03-23 Thread Karl Wright
community (or other communities) that you know of might be available as consultants to create that module? Best , Hank On Thu, Mar 19, 2015 at 3:43 PM, Karl Wright daddy...@gmail.com wrote: If output connectors have access to the access tokens then I am presuming a custom output connector

Re: ManifoldCF Authentication

2015-03-23 Thread Karl Wright
Hi Smitha, As long as you provide some means of authentication in your web application, other than Active Directory, ManifoldCF can certainly do the authorization part. Karl On Mon, Mar 23, 2015 at 1:57 AM, Smitha S smitha_...@infosys.com wrote: Hi Karl, I have a requirement to crawl

Re: Need examples of expressions used to specify multiple folders to index

2015-03-20 Thread Karl Wright
overhead limit exceeded FATAL 2015-03-19 18:32:09,198 (Seeding thread) - SeedingThread initialization error tossed: GC overhead limit exceeded java.lang.OutOfMemoryError: GC overhead limit exceeded Karl Wright daddy...@gmail.com 3/19/2015 3:34 PM Hi Ian, ManifoldCF operates under what

Re: A hopfully a few simple question about ManifoldCF and SharePoint

2015-03-19 Thread Karl Wright
Hi Hank, Our project involves a database that has a private secure user space for each user. Our database is built on Lucene and indexes every object in the database. Each user presumably has some number of SharePoint sites that they have access to. We want to index each sharepoint object (file

Re: A hopfully a few simple question about ManifoldCF and SharePoint

2015-03-19 Thread Karl Wright
the key question is really, can we tell ManifoldCF to limit results to those visible to a specific user and would there be any performance or other unexpected downsides to doing that. Hank On Thu, Mar 19, 2015 at 1:53 PM, Karl Wright daddy...@gmail.com wrote: Hi Hank, Our project

Re: Job specific metadata

2015-03-16 Thread Karl Wright
Hi Paul, This second question is easier to answer. See the Metadata Adjuster transformation connector. As for the first question: what setting have you selected for the access tokens for your connection? Native, or SIDs? Karl On Mon, Mar 16, 2015 at 11:09 AM, Paul Bieles

Re: Job specific metadata

2015-03-16 Thread Karl Wright
Hi Paul, I'd look at Madalina's response with respect to the missing groups. For the Metadata Adjuster, I know for a fact that this works. For any transformation connection, you need to insert it into the pipeline after you add your output connection. If you send me a screen shot of your

Re: SharePoint Path Mapping not working

2015-03-12 Thread Karl Wright
Hi Frank, I am not sure what you mean by nothing happens, but if you turn on connector debugging, you will be able to see what happens in the ManifoldCF log. // Add the path metadata item into the mix, if enabled String pathAttributeName = sDesc.getPathAttributeName(); if

Re: Metadata expressions

2015-03-12 Thread Karl Wright
the 'Move metadata' tab) from the Metadata Adjuster. There is only one checkbox 'Keep all incoming data' at the end of the expressions list together with the 'Remove empty metadata values'. Frank On 12.03.2015 at 11:26, Karl Wright wrote: Hi Frank, Looking at the code, I cannot find

Re: SharePoint Path Mapping not working

2015-03-11 Thread Karl Wright
string: http//host:port/$(1) and it works :-) You say that each match rule is applied repeatedly but I have only one rule. Why is it applied twice? Thank you Frank On 11.03.2015 at 15:26, Karl Wright wrote: Hi Frank, Each match rule is applied repeatedly, which is why you

Re: SharePoint Path Mapping not working

2015-03-11 Thread Karl Wright
: List but no SharePoint: Path... entries as mentioned below. Thanks Frank On 11.03.2015 at 11:09, Karl Wright wrote: Hi Frank, I am not sure what you mean by nothing happens, but if you turn on connector debugging, you will be able to see what happens in the ManifoldCF log

Re: SharePoint Path Mapping not working

2015-03-11 Thread Karl Wright
11.03.2015 at 12:15, Karl Wright wrote: Ah, ok, so basically you are saying that the UI is broken. That should be easy to confirm. FWIW, there is already a metadata field from SharePoint called url. You can manipulate it by adding a Metadata Adjuster to your job pipeline. So that might be a better

Re: SharePoint Path Mapping not working

2015-03-11 Thread Karl Wright
What version of ant are you using? Because it doesn't seem to be the right one. Karl On Wed, Mar 11, 2015 at 8:41 AM, Karl Wright daddy...@gmail.com wrote: Did you run ant make-core-deps first? Or download the lib distribution and unpack according to the instructions in the README? Karl

Re: SharePoint Path Mapping not working

2015-03-11 Thread Karl Wright
on August 24 2010 Frank On 11.03.2015 at 13:43, Karl Wright wrote: I am seeing none of these problems here. Karl On Wed, Mar 11, 2015 at 8:42 AM, Karl Wright daddy...@gmail.com wrote: What version of ant are you using? Because it doesn't seem to be the right

Re: No results from solr with mcf plugin

2015-03-09 Thread Karl Wright
and server/solr/elcore/conf/schema.xml according to the plugin's README. # bin/solr restart Start the ManifoldCF job. Did I understand something wrong? Kind regards Frank On 06.03.2015 at 12:40, Karl Wright wrote: The README in the solr plugin is a pretty good resource for how

Re: connectors.xml is ignored?

2015-03-06 Thread Karl Wright
Hi Jan, jcifs is a proprietary connector because of the LGPL licensing of jcifs.jar. Therefore in order to use it you need to be using the proprietary examples and connectors-proprietary.xml . Nothing has changed here for years, but people do forget about the proprietary side of things. Karl

Re: connectors.xml is ignored?

2015-03-06 Thread Karl Wright
Managing Directors: C. Decker, K. Freese. Company domicile: 28832 Achim. Trade Register: Walsrode HRB 121162 *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent

Re: No results from solr with mcf plugin

2015-03-06 Thread Karl Wright
The README in the solr plugin is a pretty good resource for how to configure Solr, BTW. Karl On Fri, Mar 6, 2015 at 6:39 AM, Karl Wright daddy...@gmail.com wrote: Hi Frank, Yes, you need all SIX attributes, with the proper default values. In fact, you will need to force a reindex if you

Re: No results from solr with mcf plugin

2015-03-06 Thread Karl Wright
Hi Frank, Yes, you need all SIX attributes, with the proper default values. In fact, you will need to force a reindex if you didn't have working definitions, since otherwise the default values in solr don't take effect. Karl On Fri, Mar 6, 2015 at 6:28 AM, Frank Brendel

Re: Move metadata changes content

2015-03-05 Thread Karl Wright
Hi Frank, The OpenSearchServer connector reserves uri to be the document's actual URI, which in ManifoldCF's output connectors means the document's key. So you cannot override that. Nor does it actually come from a metadata field called uri. So, if I assume you are trying to move that around

Re: SharePoint Connector does not work with SharePoint 2013

2015-03-03 Thread Karl Wright
Hi Frank, Can you post a screen shot of your path rules page? Having the wrong rules is the typical way people have the symptoms you are seeing. Karl On Tue, Mar 3, 2015 at 7:34 AM, Frank Brendel frank.bren...@eurolog.com wrote: Hello, I've installed ManifoldCF 2.0.1 and the SharePoint

Re: [ANNOUNCE] Apache ManifoldCF 1.8.2 and 2.0.2 released

2015-03-03 Thread Karl Wright
, Both links seem to be broken. Regards, Rafa On 3 March 2015 at 13:01:43, Karl Wright (daddy...@gmail.com) wrote: Apache ManifoldCF 1.8.2 and 2.0.2 have been released. These bug-fix releases include a number of important fixes since 1.8.1 and 2.0.1 were released. See the complete list

Re: SharePoint Connector does not work with SharePoint 2013

2015-03-03 Thread Karl Wright
xx.xx.xx.xx POST /_vti_bin/MCPermissions.asmx - 2014 0#.w|domain\user yy.yy.yy.yy Axis/1.4 200 0 0 78 Thank you Frank On 03.03.2015 at 13:47, Karl Wright wrote: Hi Frank, Can you post a screen shot of your path rules page? Having the wrong rules is the typical way people have

Re: Tikka content extractor transformation connection

2015-03-03 Thread Karl Wright
Hi Madalina, If you are using MCF 1.7 or greater, you can specify multiple output connections for a job, and different transformations for each output connection. So you should be able to do anything you like, provided the transformations you are attempting are supported as transformation

[ANNOUNCE] Apache ManifoldCF 1.8.2 and 2.0.2 released

2015-03-03 Thread Karl Wright
Apache ManifoldCF 1.8.2 and 2.0.2 have been released. These bug-fix releases include a number of important fixes since 1.8.1 and 2.0.1 were released. See the complete list at: https://svn.apache.org/repos/asf/manifoldcf/release-1.8-branch/CHANGES.txt and

Re: SharePoint Connector does not work with SharePoint 2013

2015-03-03 Thread Karl Wright
that such rules are mandatory. I thought a lazy search all within this site would be enough. Many thanks! Frank On 03.03.2015 at 14:13, Karl Wright wrote: Hi Frank, The three transactions per request is due to NTLM authentication, and is normal. Your rules are missing a library rule

Re: ElasticSearch Security plugin

2015-02-19 Thread Karl Wright
, Karl Wright daddy...@gmail.com wrote: Hi Kamil, The plugin must be integrated at the Java level, because ES does not provide any field specific to security, and thus you will need to integrate the plugin's query modifications into your ES query structure. My suggestion is to read the code

Re: ElasticSearch Security plugin

2015-02-19 Thread Karl Wright
Hi Kamil, The plugin must be integrated at the Java level, because ES does not provide any field specific to security, and thus you will need to integrate the plugin's query modifications into your ES query structure. My suggestion is to read the code and especially the Javadoc. Thanks, Karl

Re: Recursive Site Selection Problem While Adding Path To Crawling Job

2015-02-12 Thread Karl Wright
Hi Salih, I've seen this before on AWS SharePoint instances. There's something that needs to be done to the IIS configuration to make them work properly with subsites. There has been list discussion of this in the past, but I don't recall what the exact fix is offhand. Please let me know if
