Re: Redundancy of ManifoldCF AuthorityServiceBaseURL

2015-02-10 Thread Karl Wright
Niktabar nikta...@yahoo.com wrote: Thanks Karl, I was thinking that there might be an internal solution but since it's not I may look into that load balancing workaround Regards Kambiz -- *From:* Karl Wright daddy...@gmail.com *To:* user@manifoldcf.apache.org user

Re: Redundancy of ManifoldCF AuthorityServiceBaseURL

2015-02-10 Thread Karl Wright
Hi Kambiz, If you want redundancy for document authorization, you would want to specify the URL of a load balancer (not included), that would send requests on to multiple authority service instances. Karl On Tue, Feb 10, 2015 at 6:13 AM, Kambiz Niktabar nikta...@yahoo.com wrote: Hello, By

Re: how web crawler crawls contents after the first crawling

2015-02-06 Thread Karl Wright
Hi Shigeki, The web connector creates a checksum for all content. The checksum has to differ from the previously recorded one for the content to be reindexed; if it matches exactly, the content is skipped. Of course, computing the checksum means that the content must be refetched in any case. Karl
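
The checksum comparison described in this message can be sketched as follows. This is only an illustration, not the web connector's actual code: the class name, the in-memory map, and the choice of SHA-256 are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of checksum-based change detection.
public class ChecksumChangeDetector {
  // Assumed store: URL -> checksum recorded on the previous crawl.
  private final Map<String, String> lastSeen = new HashMap<>();

  // Hex-encoded SHA-256 of the fetched bytes (the algorithm is an assumption).
  static String checksum(byte[] content) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-256");
      StringBuilder sb = new StringBuilder();
      for (byte b : md.digest(content)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      // SHA-256 is required on every Java platform, so this should not happen.
      throw new IllegalStateException(e);
    }
  }

  // The content has to be fetched before this can be called; only the
  // reindexing step is skipped when the checksum is unchanged.
  boolean needsReindex(String url, byte[] fetchedContent) {
    String current = checksum(fetchedContent);
    String previous = lastSeen.put(url, current);
    return !current.equals(previous);
  }

  public static void main(String[] args) {
    ChecksumChangeDetector detector = new ChecksumChangeDetector();
    byte[] page = "<html>v1</html>".getBytes(StandardCharsets.UTF_8);
    System.out.println(detector.needsReindex("http://example.com/", page));
    System.out.println(detector.needsReindex("http://example.com/", page));
  }
}
```

The first call for a URL always reports a change because nothing has been recorded yet; an identical second fetch does not.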

Re: Error: Unexpected jobqueue status

2015-02-05 Thread Karl Wright
[mailto:adrian.con...@arup.com] *Sent:* 04 February 2015 17:24 *To:* user@manifoldcf.apache.org *Subject:* RE: Error: Unexpected jobqueue status Thanks Karl, I’ll build/deploy tomorrow and keep you posted. Adrian *From:* Karl Wright [mailto:daddy...@gmail.com daddy...@gmail.com] *Sent

Re: FileNet Connector

2015-02-04 Thread Karl Wright
CONNECTORS-1157 for the fix you just provided. Karl On Wed, Feb 4, 2015 at 6:47 PM, Karl Wright daddy...@gmail.com wrote: Hi Guy, If you would be willing to create ManifoldCF tickets, and include patches or code snippets, I can add the functionality into the source tree. Thanks, Karl

Re: Error: Unexpected jobqueue status

2015-02-04 Thread Karl Wright
adrian.con...@arup.com wrote: At that point, only a few (three or so) minutes. I left it for another 5 or 6 minutes after I grabbed the stack trace before I finally ‘-9’ed it. HTH, Adrian *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* 04 February 2015 15:21

Re: Error: Unexpected jobqueue status

2015-02-04 Thread Karl Wright
CONNECTORS-1156. Karl On Wed, Feb 4, 2015 at 10:33 AM, Karl Wright daddy...@gmail.com wrote: Hi Adrian, Clearly, the shutdown is happening at a time when ManifoldCF is in the midst of an ANALYZE TABLE operation. The shutdown is attempting to interrupt this operation, and is probably

[ANNOUNCE] Apache ManifoldCF 2.0.1 has been released!

2015-02-03 Thread Karl Wright
This point release/bug fix release includes the fixes for four critical tickets that existed in the 2.0 release. You can read about them by downloading the CHANGES.txt document from the ManifoldCF site. Thanks to all who made this release possible! Karl

[ANNOUNCE] Apache ManifoldCF 1.8.1 has been released!

2015-02-03 Thread Karl Wright
This point release/bug fix release includes the fixes for four critical tickets that existed in the 1.8 release. You can read about them by downloading the CHANGES.txt document from the ManifoldCF site. Thanks to all who made this release possible! Karl

Re: Using HSQLDB for production

2015-02-02 Thread Karl Wright
Hi Bert, HSQLDB is fairly aggressive at using memory, to the extent that I believe it tries to keep entire tables in memory. If you decide to use HSQLDB in embedded mode, you will therefore have to give MCF memory that is consistent with your particular task, and expect to run out of memory

RE: Scheduling Info

2015-01-29 Thread Karl Wright
am wrong. For 1000 jobs, assuming every repository connector is different from the others and the max threads per repository connector is 10, will there be 10,000 threads spawned for repository connections? Thanks, Jitu On Thu, Jan 29, 2015 at 1:42 PM, Karl Wright daddy...@gmail.com wrote: Hi Jitu

Re: FileNet Connector Error

2015-01-25 Thread Karl Wright
Hi Guy, This looks like it was broken during the MCF 1.8 release cycle. I've created CONNECTORS-1151 to track this; hope to have a patch for you shortly. Thanks, Karl On Sun, Jan 25, 2015 at 5:10 AM, Guy Sperry guyspe...@yahoo.com wrote: I am unable to get the FileNet connector to process

Re: FileNet Connector Error

2015-01-25 Thread Karl Wright
AM, Karl Wright daddy...@gmail.com wrote: Hi Guy, This looks like it was broken during the MCF 1.8 release cycle. I've created CONNECTORS-1151 to track this; hope to have a patch for you shortly. Thanks, Karl On Sun, Jan 25, 2015 at 5:10 AM, Guy Sperry guyspe...@yahoo.com wrote: I am

Re: FileNet Connector Error

2015-01-25 Thread Karl Wright
) at org.apache.tools.ant.launch.Launcher.run(Launcher.java:280) at org.apache.tools.ant.launch.Launcher.main(Launcher.java:109) Total time: 15 seconds On Sunday, January 25, 2015 4:19 AM, Karl Wright daddy...@gmail.com wrote: Hi Guy, I've updated the release branches also -- that would

Re: Full reindex for continuous job

2015-01-15 Thread Karl Wright
Hi Colin, There are two buttons on every output connection's view page. These are meant to handle the case where somebody did something to a downstream index, and reindexing is needed. That's probably what you want; read up on them in the manual. Thanks, Karl On Thu, Jan 15, 2015 at 5:58 AM,

Re: ManifoldCF 2.0 - Hangs when saving output connector

2015-01-14 Thread Karl Wright
, Karl On Wed, Jan 14, 2015 at 7:44 PM, Karl Wright daddy...@gmail.com wrote: Hi Michael, I was finally able to reproduce your issue this evening. The possible change was that connection saves use non-ex write locks now, rather than exclusive write locks. I've looked carefully at the file

Re: ManifoldCF 2.0 - Hangs when saving output connector

2015-01-14 Thread Karl Wright
Hi Michael, I was finally able to reproduce your issue this evening. The possible change was that connection saves use non-ex write locks now, rather than exclusive write locks. I've looked carefully at the file-based locking code for this though and found no obvious problems. I'll keep

Re: Error: Unexpected jobqueue status

2015-01-12 Thread Karl Wright
they sort themselves out the next time the job runs? Is there anything I should do to “help” the job along? Adrian *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* 12 January 2015 00:27 *To:* user@manifoldcf.apache.org *Subject:* Re: Error: Unexpected jobqueue status Also

Re: Error: Unexpected jobqueue status

2015-01-11 Thread Karl Wright
Also, if you are having trouble shutting down the agents process, it would be great if you could get a thread dump and post it, before you kill -9 it. Karl On Sun, Jan 11, 2015 at 7:25 PM, Karl Wright daddy...@gmail.com wrote: Hi Adrian, If you noted the comment stream in CONNECTORS-590, I

Re: [ANNOUNCE] Apache ManifoldCF 2.0 has been released!

2015-01-05 Thread Karl Wright
has helped in any sense for releasing this version!! On Monday, December 29, 2014, Karl Wright daddy...@gmail.com wrote: Apache ManifoldCF 2.0 has officially been released. ManifoldCF 2.0 is *not* backwards compatible with any ManifoldCF 1.x release. This allowed

Re: Migrating Manifold configs

2015-01-05 Thread Karl Wright
Have you tried export and import? There's a java command for those. Karl On Mon, Jan 5, 2015 at 5:21 PM, Colin colinjoyce...@gmail.com wrote: Hi Guys, I want to migrate my manifold connector and job config across a number of mcf instances - there will be slight differences between each

Re: MCF 2.0 issue with PostgreSQL

2015-01-02 Thread Karl Wright
Hi Kambiz, As I mentioned before, ManifoldCF 2.0 is not backwards compatible with MCF 1.x. One of the things that changed is the way obfuscated data is stored. You appear to have moved the obfuscated postgresql password from MCF 1.x to 2.0, and that will of course not work. Karl On Fri, Jan

[ANNOUNCE] Apache ManifoldCF 2.0 has been released!

2014-12-28 Thread Karl Wright
Apache ManifoldCF 2.0 has officially been released. ManifoldCF 2.0 is *not* backwards compatible with any ManifoldCF 1.x release. This allowed for the reduction of the MCF 2.0 schema, the reorganization of some APIs, and the removal of much duplicate/redundant functionality. In addition, the

[ANNOUNCE] Apache ManifoldCF 1.8 has been released

2014-12-28 Thread Karl Wright
Apache ManifoldCF 1.8 has officially been released. ManifoldCF 1.8 is backwards compatible with all ManifoldCF 1.x releases. Other than for improvements which required non-backwards-compatible changes, all MCF 2.0 features are also present in MCF 1.8. The high points of this release are as

Re: schedule information

2014-12-23 Thread Karl Wright
); Thanks, Jitu On Mon, Dec 22, 2014 at 6:58 PM, Karl Wright daddy...@gmail.com wrote: Hi Jitu, Your client's needs seem rather unusual, and will potentially be somewhat expensive performance-wise. So unless I hear from others as well that this is a key feature, there's no point in contributing

Re: start minimal option even deletes contents whose links are deleted

2014-12-23 Thread Karl Wright
Hi Shigeki, Minimal crawls do not guarantee that there is no document deletion. Such crawls only do the least amount of work possible based on what model the underlying connector implements. This often just means not doing the cleanup phase at the end of the job run, which typically removes

Re: schedule information

2014-12-22 Thread Karl Wright
it. Thanks, Jitu On Fri, Dec 19, 2014 at 1:09 PM, Karl Wright daddy...@gmail.com wrote: Hi Jitu, You can certainly add a unique string associated with a job to every document using the Metadata Adjuster transformation connector (which of course can be the job name). The time of indexing

Re: schedule information

2014-12-22 Thread Karl Wright
for the same? Thanks, Jitu On Mon, Dec 22, 2014 at 4:12 PM, Karl Wright daddy...@gmail.com wrote: Hi Jitu, I'm sorry for the miscommunication. What I meant is that without any modifications, you can add the job's name as metadata for all documents indexed with the job. If you need

Re: Not possible to add DFS root in windows share repository

2014-12-19 Thread Karl Wright
Hi Kambiz, I presume the "Couldn't connect to server: A duplicate name exists on the network." error occurs on the connection view page? If so, you might be able to just ignore this error, and type in a path on the Paths tab. If that doesn't work for you, I can't help you debug your network.

Re: ElastiSearch missing doc

2014-12-16 Thread Karl Wright
) at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:383) And another question: I use Solr 4.10 with Tika 1.5. MCF 1.8 has Tika 1.6. How does this affect document parsing? K On Mon, Dec 15, 2014 at 08:45:31AM -0500, Karl Wright wrote: If you changed this file, you would need to rerun

Re: ElastiSearch missing doc

2014-12-16 Thread Karl Wright
and no problem. Thx Karl for your time. K On Tue, Dec 16, 2014 at 07:28:34AM -0500, Karl Wright wrote: The commons-compress code just makes a simple reference to the Coder class, with no reflection or anything suspicious going on. If you can send me the binary file that causes this issue

Re: ElastiSearch missing doc

2014-12-15 Thread Karl Wright
does not exist'. How can I upgrade db schema? I tried ./initialize.sh without success. K On Fri, Dec 12, 2014 at 10:40:39AM -0500, Karl Wright wrote: Ok, committed a fix. CONNECTORS-1121. Karl On Fri, Dec 12, 2014 at 10:32 AM, Karl Wright daddy...@gmail.com wrote: Ah, thanks

Re: User permission for crawling windows share

2014-12-15 Thread Karl Wright
Hi Smitha, Can you tell me what version of MCF you are using? There was a fix for this very situation made a while ago, if I recall correctly. Karl On Mon, Dec 15, 2014 at 2:51 AM, Smitha S smitha_...@infosys.com wrote: Hi, We are using ManifoldCF to crawl using window share repository

Re: ElastiSearch missing doc

2014-12-15 Thread Karl Wright
You have to run ./initialize.sh on the MCF 1.8 codebase for the upgrade to take place. Karl On Mon, Dec 15, 2014 at 7:43 AM, Kamil Żyta kamil.z...@pwr.edu.pl wrote: With release-1.8-branch is the same problem. K On Mon, Dec 15, 2014 at 06:47:12AM -0500, Karl Wright wrote: Hi Kamil

Re: User permission for crawling windows share

2014-12-15 Thread Karl Wright
Hi Smitha, The support for folder security was added in MCF 1.6. See CONNECTORS-886. Karl On Mon, Dec 15, 2014 at 6:49 AM, Karl Wright daddy...@gmail.com wrote: Hi Smitha, Can you tell me what version of MCF you are using? There was a fix for this very situation made a while ago, if I

Re: ElastiSearch missing doc

2014-12-15 Thread Karl Wright
, 2014 at 08:00:12AM -0500, Karl Wright wrote: You have to run ./initialize.sh on the MCF 1.8 codebase for the upgrade to take place. Karl On Mon, Dec 15, 2014 at 7:43 AM, Kamil Żyta kamil.z...@pwr.edu.pl wrote: With release-1.8-branch is the same problem. K On Mon

Re: ElastiSearch missing doc

2014-12-15 Thread Karl Wright
=org.apache.manifoldcf.crawler.connectors.sharedrive.SharedDriveConnector/ (...) K On Mon, Dec 15, 2014 at 08:39:07AM -0500, Karl Wright wrote: Hi Kamil, What does connectors-proprietary.xml say about the jcifs connector? Karl On Mon, Dec 15, 2014 at 8:35 AM, Kamil Żyta kamil.z...@pwr.edu.pl wrote: Right

Re: User permission for crawling windows share

2014-12-15 Thread Karl Wright
Regards, Smitha S *From:* Karl Wright [mailto:daddy...@gmail.com] *Sent:* Monday, December 15, 2014 6:40 PM *To:* user@manifoldcf.apache.org *Subject:* Re: User permission for crawling windows share Hi Smitha, The support for folder security was added in MCF 1.6. See CONNECTORS-886

Re: stuffamountfactor and getting more work done

2014-12-12 Thread Karl Wright
Hi Aeham, Before you assume that stuffing is just not happening fast enough, you will want to confirm that you have enough documents that are *eligible* for processing. In a continuous job, documents may well be scheduled to be crawled at some time in the future, and are ineligible for crawling

Re: stuffamountfactor and getting more work done

2014-12-12 Thread Karl Wright
FWIW, you can diagnose a slow stuffer query by getting a thread dump. If there are tons of idle worker threads AND your stuffer thread is waiting on Postgresql, that's a good sign it is not keeping up due to database reasons. Karl On Fri, Dec 12, 2014 at 7:23 AM, Karl Wright daddy...@gmail.com

Re: ElastiSearch missing doc

2014-12-12 Thread Karl Wright
Hi Kamil, You are getting a 404 error when ManifoldCF tries to delete a document from the ElasticSearch index: else if (code == 404) { setResult(IOutputHistoryActivity.HTTP_ERROR,Result.ERROR, "Page not found: " + response); throw new ManifoldCFException("Server/page not

Re: stuffamountfactor and getting more work done

2014-12-12 Thread Karl Wright
Yes, I believe it is. Karl On Fri, Dec 12, 2014 at 1:09 PM, Aeham Abushwashi aeham.abushwa...@exonar.com wrote: Thanks Karl. I see that JobManager#fetchAndProcessDocuments invokes database.beginTransaction soon after acquiring the lock. Is that transaction necessary?

Re: Windows Share/DFS Repository Connection

2014-12-09 Thread Karl Wright
If you share a local folder, you should be able to use the Windows Share/DFS Repository Connector to crawl it, yes. Karl On Tue, Dec 9, 2014 at 5:23 AM, Jitu abj...@gmail.com wrote: Hi, I am trying to share my local folder and crawl using Windows Share/DFS Repository Connection. Is it

Re: Windows Share/DFS Repository Connection

2014-12-09 Thread Karl Wright
is never succeeding. I tried localhost, 127.0.0.1, the IP address, etc., as the server. Thanks, Jitu On Tue, Dec 9, 2014 at 4:50 PM, Karl Wright daddy...@gmail.com wrote: If you share a local folder, you should be able to use the Windows Share/DFS Repository Connector to crawl it, yes. Karl

Re: Windows Share/DFS Repository Connection

2014-12-09 Thread Karl Wright
Hi Jitu, Did you use the fully-qualified domain name of the machine? Karl On Tue, Dec 9, 2014 at 7:56 AM, Jitu abj...@gmail.com wrote: Hi Karl, Even that didn't work. Thanks, Jitu On Tue, Dec 9, 2014 at 6:19 PM, Karl Wright daddy...@gmail.com wrote: Hi Jitu, You would need to use

Re: Windows Share/DFS Repository Connection

2014-12-09 Thread Karl Wright
not able to connect. Please advise. Thanks, Jitu On Tue, Dec 9, 2014 at 6:49 PM, Karl Wright daddy...@gmail.com wrote: Hi Jitu, Did you use the fully-qualified domain name of the machine? Karl On Tue, Dec 9, 2014 at 7:56 AM, Jitu abj...@gmail.com wrote: Hi Karl, Even that didn't work

Re: Continues Job Crawling

2014-12-08 Thread Karl Wright
Hi Babita, How you use continuous crawling depends on what you are trying to accomplish with it. In the continuous crawling model, ManifoldCF requeues documents after it crawls them, and checks them again after an interval that is determined in part by how often they've changed in the past. The

Re: Enable Authority Connections Logging

2014-12-01 Thread Karl Wright
Hi Joel, A SQL exception winds up throwing a ManifoldCFException in the JDBC connectors, as follows: if (thr instanceof java.sql.SQLException) throw new ManifoldCFException("Exception doing connector query '"+query+"': "+thr.getMessage(),thr); This is thrown up the stack through
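
The wrapping pattern quoted in this message can be tried in isolation with the sketch below. To stay self-contained it uses a stand-in `ConnectorException` class in place of `ManifoldCFException`; the stand-in, the class name, and the `rethrowIfSql` helper are hypothetical and exist only for this example.

```java
import java.sql.SQLException;

// Hypothetical, self-contained illustration of wrapping a SQLException
// with the failing query text while preserving the original cause.
public class SqlExceptionWrapping {
  // Stand-in for the framework exception; not the real ManifoldCF class.
  static class ConnectorException extends Exception {
    ConnectorException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Rethrows SQL failures with the query included in the message, so that
  // whatever logs the exception higher up the stack shows which query failed.
  static void rethrowIfSql(String query, Throwable thr) throws ConnectorException {
    if (thr instanceof SQLException) {
      throw new ConnectorException(
          "Exception doing connector query '" + query + "': " + thr.getMessage(), thr);
    }
  }

  public static void main(String[] args) {
    try {
      rethrowIfSql("SELECT 1", new SQLException("connection refused"));
    } catch (ConnectorException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Passing the original throwable as the cause keeps its stack trace available to any logger that prints nested causes.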

Re: Enable Authority Connections Logging

2014-12-01 Thread Karl Wright
joelkempi...@gmail.com wrote: Karl, Thanks for your help and quick response. I see the logs under my Tomcat/bin/logs. Thanks, Joel Joel Kempista (302) 858-0731 On Mon, Dec 1, 2014 at 10:24 AM, Karl Wright daddy...@gmail.com wrote: Hi Joel, A SQL exception winds up throwing

Re: What does JDBC repository connector do?

2014-11-27 Thread Karl Wright
Hi Dan, The JDBC connector pulls content from a column in a database table. It does not fetch the contents of any URLs. The use case you are describing sounds most similar to the RSS connector. Karl On Wed, Nov 26, 2014 at 5:27 PM, Dan Davis dansm...@gmail.com wrote: I've seen the online

Re: Document components

2014-11-25 Thread Karl Wright
Hi Markus, I've created a ticket for the exception. CONNECTORS-1114. As for removal of a primary document that is not mentioned, do you mean that within processDocuments(), if you don't call any disposition method for a primary document, then that document is left around? If so, that behavior

Re: Document components

2014-11-25 Thread Karl Wright
changes. Karl On Tue, Nov 25, 2014 at 4:46 AM, Karl Wright daddy...@gmail.com wrote: Hi Markus, I've created a ticket for the exception. CONNECTORS-1114. As for removal of a primary document that is not mentioned, do you mean that within processDocuments(), if you don't call any

Re: Document components

2014-11-25 Thread Karl Wright
obviously didn't actually use it, and although I'd intended to write a test, that did not get done. Thanks, Karl On Tue, Nov 25, 2014 at 5:03 AM, Karl Wright daddy...@gmail.com wrote: I believe I've found the problem with removeDocument(), and will commit a fix shortly. To clarify your

Re: ManifoldCF to accept and remember cookies in a crawling session

2014-11-25 Thread Karl Wright
Hi Arcadius, Did you determine that that was the problem with CONNECTORS-1113? The issue with cookies that change the experience of a user is very complex, because it usually means that there's some sequence of pages that get fetched which you DON'T want to index, that are involved in setting

Re: Document components

2014-11-25 Thread Karl Wright
Hi Markus, noDocument() removes the document or the specified component from the output but keeps track of the version in the status queue. The decision of not indexing the document/component is considered persistent as long as the version string does not change. deleteDocument() removes the

Re: Document components

2014-11-25 Thread Karl Wright
See CONNECTORS-1115. I looked into this; looked relatively easy to add a method to IProcessActivity that does what you request. Please give it a try and let me know how it works for you. Thanks, Karl On Tue, Nov 25, 2014 at 6:22 PM, Karl Wright daddy...@gmail.com wrote: Hi Markus

Re: Filesystem + Security

2014-11-22 Thread Karl Wright
Hi Alejandro, To do this in ManifoldCF, you could in theory write a transformation connector that would decorate documents with security information and metadata from a database. I haven't heard of anyone who has written such a transformation connector yet, so you may need to develop this

Re: Document components

2014-11-21 Thread Karl Wright
Hi Markus, Your example looks correct. I suspect there may be a bug. I'll open a ticket. CONNECTORS-1110. Karl On Fri, Nov 21, 2014 at 7:59 AM, Markus Schuch markus_sch...@web.de wrote: Hi, is there any example implementation of the new document component feature invented with

Re: Document components

2014-11-21 Thread Karl Wright
Hi Markus, You can mix and match, yes. The non-component version of the method is equivalent to using a component ID of null. I've created CONNECTORS- for the cleanup issue. Thanks, Karl On Fri, Nov 21, 2014 at 9:20 AM, Markus Schuch markus_sch...@web.de wrote: Hi Karl, as i already

Re: Multiple Data Sources + Job's Access Token Query

2014-11-21 Thread Karl Wright
Hi Alejandro, There is no reason why that would not work. Karl On Fri, Nov 21, 2014 at 11:10 AM, Alejandro Calbazana acalbaz...@gmail.com wrote: Hello, Quick question... Is it possible to associate my job's access token query with a different data source? I have data sitting in two data

Re: Solr Plugin

2014-11-20 Thread Karl Wright
TOKEN:authGroup:A127839-1411291 AUTHORIZED:authConn2 TOKEN:authGroup:A127839-1411291 TOKEN:authGroup:A127839-1413366 TOKEN:authGroup:A127839-1413038 It doesn't even matter if the auth connectors are placed in separate groups. Thanks, Alejandro On Fri, Nov 7, 2014 at 12:43 PM, Karl Wright daddy

Re: Solr Plugin

2014-11-20 Thread Karl Wright
TOKEN:authGroup:2 AUTHORIZED:authConn1 TOKEN:authGroup:1 Thanks, Alejandro On Thu, Nov 20, 2014 at 2:21 PM, Karl Wright daddy...@gmail.com wrote: Hi Alejandro, I'm having a bit of trouble from your email figuring out what your authorities are each doing. Within an authority group

Re: Solr Plugin

2014-11-20 Thread Karl Wright
Hi Alejandro, This is a problem with JDBC authority caching. See CONNECTORS-1109. I'll have a fix available shortly. Karl

Re: Solr Plugin

2014-11-20 Thread Karl Wright
Karl On Thu, Nov 20, 2014 at 2:59 PM, Karl Wright daddy...@gmail.com wrote: Hi Alejandro, This is a problem with JDBC authority caching. See CONNECTORS-1109. I'll have a fix available shortly. Karl

Re: Solr Plugin

2014-11-20 Thread Karl Wright
with this. Alejandro On Thu, Nov 20, 2014 at 3:42 PM, Karl Wright daddy...@gmail.com wrote: A fix is available and is attached to the ticket. Alternatively, you can check out the dev_1x branch or trunk. Please be aware that the schema for dev_1x and trunk has changed, so unless you are using

Re: Unexpected value for job status: 37

2014-11-19 Thread Karl Wright
][org.apache.manifoldcf.db ] Requested query: [SELECT id FROM jobs WHERE status IN (?,?,?,?,?,?,?,?,?,?)] Thanks, Jitu On Tue, Nov 18, 2014 at 5:55 PM, Karl Wright daddy...@gmail.com wrote: I looked at this in more detail; the problem is a dangling state in the framework. I should have a patch shortly

Re: proxy settings

2014-11-19 Thread Karl Wright
and there is no guarantee that it can be applied on the 1.7 branch. Thanks, Karl On Wed, Nov 19, 2014 at 6:14 AM, Jitu abj...@gmail.com wrote: Hi Karl, Yes, it's a Windows proxy with authentication. Thanks, Jitu On Mon, Nov 17, 2014 at 4:54 PM, Karl Wright daddy...@gmail.com wrote: I've created

Re: Unexpected value for job status: 37

2014-11-18 Thread Karl Wright
Hi Kamil, I really don't know the failure modes of postgresql when it runs out of disk space. But if transactional integrity is maintained, it should be sufficient to shut down all agents processes and start them back up. Karl On Tue, Nov 18, 2014 at 4:24 AM, Kamil Żyta kamil.z...@pwr.edu.pl

Re: Unexpected value for job status: 37

2014-11-18 Thread Karl Wright
If you are using zookeeper, you may also want to purge the zookeeper disk area as well, before starting any MCF processes. Karl On Tue, Nov 18, 2014 at 6:19 AM, Karl Wright daddy...@gmail.com wrote: Hi Kamil, I really don't know the failure modes of postgresql when it runs out of disk

Re: Unexpected value for job status: 37

2014-11-18 Thread Karl Wright
Please let me know if you were able to get past this. MCF's database states are quite resilient, so I'd be interested to see what kinds of problems you have. Karl On Tue, Nov 18, 2014 at 6:21 AM, Karl Wright daddy...@gmail.com wrote: If you are using zookeeper, you may also want to purge

Re: Date normalization URL mapping

2014-11-18 Thread Karl Wright
Hi Kambiz, What version of MCF are you using? In 1.7, the file system connector sets the RepositoryDocument's modifiedDate field, which the Solr output connector formats as ISO 8601: if ( modifiedDateAttributeName != null ) { Date date = document.getModifiedDate();
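
As a rough illustration of the ISO 8601 formatting mentioned in this message, the sketch below converts a `java.util.Date` to a UTC ISO 8601 string. It uses the `java.time` API rather than whatever the Solr output connector actually uses, and the class and method names are hypothetical.

```java
import java.time.format.DateTimeFormatter;
import java.util.Date;

// Hypothetical helper: formats a modified date as ISO 8601 in UTC.
public class Iso8601Dates {
  // Returns null when no date is available, so a caller can simply
  // omit the field instead of sending an empty value.
  static String toIso8601(Date date) {
    if (date == null) {
      return null;
    }
    return DateTimeFormatter.ISO_INSTANT.format(date.toInstant());
  }

  public static void main(String[] args) {
    // The epoch itself formats as 1970-01-01T00:00:00Z.
    System.out.println(toIso8601(new Date(0L)));
  }
}
```

Solr date fields expect this UTC form with a trailing `Z`, which is why an instant-based formatter is used here rather than a local-time one.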

Re: Date normalization URL mapping

2014-11-18 Thread Karl Wright
/Indexing_test1.doc (1485122413007994880)]} 0 15 I see the URL Mapping tab in the job created based on the Windows share repository. By the way, how should the request be created? Regards Kambiz -- *From:* Karl Wright daddy...@gmail.com *To:* user

Re: Date normalization URL mapping

2014-11-18 Thread Karl Wright
to do declare fields for created date and modified date attributes and which fields contain the value for created date and modified date in addition to createdOn and lastModified fields. Can you advise? Regards Kambiz -- *From:* Karl Wright daddy...@gmail.com

Re: Importing a Manifoldcf config

2014-11-18 Thread Karl Wright
I've attached a patch. The ticket was already resolved in the dev_1x branch. Thanks, Karl On Tue, Nov 18, 2014 at 4:03 PM, Karl Wright daddy...@gmail.com wrote: Hi Michael, You are correct. See CONNECTORS-1107. I will be attaching a patch shortly. Karl On Tue, Nov 18, 2014 at 3:52

Re: proxy settings

2014-11-17 Thread Karl Wright
()) .setDefaultSocketConfig(SocketConfig.custom() .setTcpNoDelay(true) .setSoTimeout(socketTimeout) .build()) .setDefaultCredentialsProvider(credentialsProvider); Thanks, Jitu On Tue, Oct 7, 2014 at 7:56 PM, Karl Wright daddy...@gmail.com wrote: Hi Jitu, There are tons of different ways

Re: proxy settings

2014-11-17 Thread Karl Wright
but that requires a new release of HttpClient to properly be implemented.) Thanks, Karl On Mon, Nov 17, 2014 at 6:09 AM, Karl Wright daddy...@gmail.com wrote: Hi Jitu, Is this a Windows proxy? Is it an authenticated proxy? Karl On Mon, Nov 17, 2014 at 5:07 AM, Jitu abj...@gmail.com wrote: Hi Karl

[ANNOUNCE] Apache ManifoldCF 1.7.2 has been released

2014-11-09 Thread Karl Wright
I am pleased to announce that Apache ManifoldCF 1.7.2 has been released. This bug fix release addresses mainly performance and scaling-related issues. Thanks, Karl

Re: Error while creating Meridio authority connection

2014-11-03 Thread Karl Wright
Hi Kambiz, There are a number of possibilities. I'll list them in order of likelihood. (1) The Meridio API might have changed, so that the WSDLs and XSDs we compile against might need to be updated. What version of Meridio are you working with? (2) It's possible that the plugin would need to

Re: JDBC Repository Connector + Access Token Question

2014-11-01 Thread Karl Wright
AS lcf__id FROM crawltest; arguments = () OK 0 1 What are you doing that differs in any significant way from this? For example, are you supplying a version query, or not? Thanks, Karl On Fri, Oct 31, 2014 at 8:37 PM, Karl Wright daddy...@gmail.com wrote: Hi Alejandro, Is there anything

Re: Two Active directory connections in Authority group

2014-10-31 Thread Karl Wright
-- *From:* Kambiz Niktabar nikta...@yahoo.com *To:* Karl Wright daddy...@gmail.com; user@manifoldcf.apache.org user@manifoldcf.apache.org *Sent:* Tuesday, October 28, 2014 10:24 PM *Subject:* Re: Two Active directory connections in Authority group Hi Karl, Thanks a lot

Re: replace of document

2014-10-31 Thread Karl Wright
Hi Jitu, There is no way in the current API to know this. Karl On Fri, Oct 31, 2014 at 4:09 AM, Jitu abj...@gmail.com wrote: Hi Karl, Thanks for your continued support. Inside the output connector's addOrReplaceDocumentWithException method, is there a way to find out if it's a new document or an update of

Re: JDBC Repository Connector + Access Token Question

2014-10-31 Thread Karl Wright
for access tokens in the database? a joined table, or a delimiter-separated list? A joined table would be preferable. Thanks, Alejandro On Thu, Oct 30, 2014 at 12:55 PM, Karl Wright daddy...@gmail.com wrote: Hi Alejandro, The reason you are confused about document security in JDBC is because

RE: JDBC Repository Connector + Access Token Question

2014-10-31 Thread Karl Wright
On Fri, Oct 31, 2014 at 8:55 AM, Karl Wright daddy...@gmail.com wrote: Hi Alejandro, I've completed development of this feature. Remember that this work is currently based on unreleased MCF 2.0 code, so you will want to try it out in a way that does not impact any existing instances of MCF

Re: Crawling a JSON REST API

2014-10-30 Thread Karl Wright
Hi Arcadius, Yes, if you write an appropriate connector, MCF can crawl JSON and write each entity as a component of the document. You would need a distinct URL for each component, and a component identifier, but that is pretty much all. Karl On Thu, Oct 30, 2014 at 10:47 AM, Arcadius Ahouansou

Re: JDBC Repository Connector + Access Token Question

2014-10-30 Thread Karl Wright
Hi Alejandro, The reason you are confused about document security in JDBC is because, heretofore, nobody has implemented document security using JDBC. We've been waiting for quite a while for a user who needs this functionality, so that we develop it properly. If you are hoping to get security

Re: About the processing of MCF incremental crawling

2014-10-29 Thread Karl Wright
Hi Minhui, ManifoldCF is not designed to somehow know about external changes you have made to an index outside of ManifoldCF. It updates entire documents, using delete/insert. If you want to manage other fields, I suggest that the best way to do it is to write a transformation connector that

Re: Extracting Content from Web Crawler using the new PipeLine

2014-10-28 Thread Karl Wright
Ok, I see now how it's supposed to work. See CONNECTORS-1088. Karl On Tue, Oct 28, 2014 at 3:42 AM, Arcadius Ahouansou arcad...@menelic.com wrote: Hello Karl. On 23 October 2014 17:57, Karl Wright daddy...@gmail.com wrote: Looking at the SOLR patch, I have two concerns. First, here's

Re: Two Active directory connections in Authority group

2014-10-28 Thread Karl Wright
documents from the associated authority group. Thanks, Karl On Tue, Oct 28, 2014 at 12:09 PM, Karl Wright daddy...@gmail.com wrote: Hi Kambiz, The Active Directory authority is not an additive authority, so you cannot use it within the same authorization group with other authorities, and expect

Re: Google Drive processing

2014-10-27 Thread Karl Wright
Hi Ethan, This does not sound like it is related in any way to the google drive connection, unless for some reason the google API is considering some of the documents fetched to have only metadata and no content. In this case, you'd see size of zero in the simple history for indexing activity

Re: Google Drive processing

2014-10-27 Thread Karl Wright
On Oct 27, 2014, at 2:27 PM, Karl Wright daddy...@gmail.com wrote: Hi Ethan, This does not sound like it is related in any way to the google drive connection, unless for some reason the google API is considering some of the documents fetched to have only metadata and no content

Re: Zookeeper configured MCF not working in production mode

2014-10-26 Thread Karl Wright
Hi Aeham, Here are the svn revs involved: r1626105 | kwright | 2014-09-18 19:42:24 -0400 (Thu, 18 Sep 2014) | 1 line Pull up fix for CONNECTORS-1038 from dev_1x branch

Re: Zookeeper configured MCF not working in production mode

2014-10-24 Thread Karl Wright
Hi Aeham, Other fixes in 1.7.1 fixed this particular problem, I believe. Karl On Fri, Oct 24, 2014 at 11:42 AM, Aeham Abushwashi aeham.abushwa...@exonar.com wrote: All relevant patches from 1.7.x have been applied (to our internal 1.6.1 branch), including changes to files bundled in the

Re: MariaDB Support

2014-10-23 Thread Karl Wright
Hi Markus, I would suggest creating a class that extends DBInterfaceMySQL; call it DBInterfaceMariaDB. Change only those methods that select the database driver. You can then see if everything works; just change your properties.xml file to point at your new class. FWIW, LGPL is not compatible

Re: Extracting Content from Web Crawler using the new PipeLine

2014-10-23 Thread Karl Wright
Looking at the SOLR patch, I have two concerns. First, here's the pertinent part of the patch: + boilerpipe = de.l3s.boilerpipe.extractors. + boilerpipe; + try { +ClassLoader loader = BoilerpipeExtractor.class.getClassLoader(); +Class extractorClass =

Re: Zookeeper configured MCF not working in production mode

2014-10-21 Thread Karl Wright
Hi Aeham, You should apply all patches that affect ZooKeeperConnection.java. Karl On Tue, Oct 21, 2014 at 12:45 PM, Aeham Abushwashi aeham.abushwa...@exonar.com wrote: Hi, I wasn't sure whether to continue the conversation on this thread or start a new one in the dev list, so if this is

Re: Internal server error (500) causing a crawl interruption

2014-10-20 Thread Karl Wright
, Karl Wright daddy...@gmail.com wrote: Hi Luca, There is a solr setting which configures Solr Cell to ignore tika errors. I don't remember what it is offhand, but you will want to set it properly to disable tika errors. Thanks, Karl On Mon, Oct 6, 2014 at 7:08 AM, Basso Luca

Re: Internal server error (500) causing a crawl interruption

2014-10-20 Thread Karl Wright
have 'Transformation Connections' where I added the Tika extractor. How does this work? Is it only metadata extraction or mime detection? If ManifoldCF had complete Tika extraction, it would handle Tika errors better. Regards, KŻ On Mon, Oct 20, 2014 at 06:15:52AM -0400, Karl Wright wrote: Hi

Re: Internal server error (500) causing a crawl interruption

2014-10-20 Thread Karl Wright
:16PM -0400, Karl Wright wrote: Well, that's clear enough: ERROR - 2014-10-20 10:54:00.355; org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Requested array size exceeds VM limit OutOfMemoryExceptions are never fully

Re: Migration from mcf 1.1 to 1.7

2014-10-20 Thread Karl Wright
Hi Marcus, The schema changes are handled automatically by ManifoldCF, so you do not need to start with a clean database. All you have to do is run either the new single-process example (or combined war), or the initialize.bat script (for multiprocess deployments) after configuring

Re: Issue with windows share access rights

2014-10-16 Thread Karl Wright
Unless you want to turn security off, you have to use a user with sufficient rights to obtain share security for a file. It's up to you how you manage that, but there is no other way. Thanks, Karl On Thu, Oct 16, 2014 at 9:10 AM, Jan Kossow j.kos...@desma.de wrote: Hi Karl, thanks for
