The code seems correct and many people are using it without encountering
this problem. There may be another SharePoint configuration parameter you
also need to look at somewhere.
Karl
On Fri, Dec 20, 2019 at 6:38 AM Jorge Alonso Garcia
wrote:
>
> Hi Karl,
> On sharepoint the list view
Hi Karl,
On SharePoint the list view threshold is 150,000, but we only receive 20,000
from MCF.
[image: image.png]
Jorge Alonso Garcia
On Thu, Dec 19, 2019 at 19:19, Karl Wright ()
wrote:
> If the job finished without error it implies that the number of documents
> returned from this
Hi Priya,
the container you are trying to interactively execute a command in is no
longer running. It is not possible to execute commands in stopped
containers.
The logger issues might be related to missing file system permissions.
But that's a wild guess. Is there a "Caused by" part in the
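For illustration, a minimal recovery sequence; the container name
"manifoldcf" is an assumption:

  # list all containers, including stopped ones, to check the status
  sudo docker ps -a
  # start the stopped container, then open an interactive shell in it
  sudo docker start manifoldcf
  sudo docker exec -it manifoldcf /bin/bash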
Hi All,
When I try to execute a bash command inside the manifoldcf container, I
get an error.
[image: image.png]
And when checking logs with sudo docker logs:
2019-12-19 18:09:05,848 Job start thread ERROR Unable to write to stream
logs/manifoldcf.log for appender MyFile
2019-12-19 18:09:05,848 Seeding
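One way to check the file-permission guess from the reply above; the
container name "manifoldcf" is an assumption:

  # verify the logs directory inside the container is writable by the JVM user
  sudo docker exec manifoldcf ls -ld logs
  sudo docker exec manifoldcf ls -l logs/manifoldcf.log
  # show which user the commands run as inside the container
  sudo docker exec manifoldcf id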
Hi Markus,
Many thanks for your reply!
I tried this approach to reproduce the scenario in a different environment,
but the case where I listed the error above is when I am crawling INTRANET
sites which are accessible over a remote server. Also I have used
transformation connectors: Allow
Hi Priya,
in my experience, I would focus on the OutOfMemoryError (OOME).
8 gigs can be enough, but they don't have to be.
First I would check whether the JVM is really getting the desired heap
size. The dockered environment makes that a little harder to find out,
since you need to get access to the
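For illustration, two ways to read the effective heap limit from inside the
container; the container name and PID 1 are assumptions, and jcmd requires a
JDK (not just a JRE) in the image:

  # print the running JVM's flags, including -XX:MaxHeapSize
  sudo docker exec manifoldcf jcmd 1 VM.flags
  # alternatively, ask the same java binary for its effective default
  sudo docker exec manifoldcf java -XX:+PrintFlagsFinal -version | grep -i maxheapsize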
Hi Markus,
The heap size defined is 8 GB. In the ManifoldCF start-options-unix file, the
Xmx etc. parameters are defined to have 8192 MB of memory.
It seems to be an issue with memory, and also with ManifoldCF trying to
communicate with the database. Do you explicitly define somewhere a connection
timer for when to
Hi Priya,
your manifoldcf JVM suffers from high garbage collection pressure:
java.lang.OutOfMemoryError: GC overhead limit exceeded
What is your current heap size?
Without knowing that, I suggest increasing the heap size (java -Xmx...).
Cheers,
Markus
On 20.12.2019 at 09:02, Priya wrote
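For illustration, a minimal sketch with an 8 GB heap; the quick-start
start.jar invocation is assumed here, and deployments that read JVM options
from a start-options file set the same flags there instead:

  # run the quick-start example with an 8 GB heap
  java -Xms8192m -Xmx8192m -jar start.jar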
Hi,
The job finishes OK (several times), but always with these same 20,002
documents; for some reason the loop only executes twice.
Jorge Alonso Garcia
On Thu, Dec 19, 2019 at 18:14, Karl Wright ()
wrote:
> If they are all in one document, then you'd be running this code:
>
> >>
> int
If they are all in one document, then you'd be running this code:
>>
int startingIndex = 0;
int amtToRequest = 1;
while (true)
{
  com.microsoft.sharepoint.webpartpages.GetListItemsResponseGetListItemsResult itemsResult =
If you are using the MCF plugin, and selecting the appropriate version of
Sharepoint in the connection configuration, there is no hard limit I'm
aware of for any Sharepoint job. We have lots of other people using
SharePoint and nobody has reported this ever before.
If your SharePoint connection
Hi,
The UI shows 20,002 documents (in a first phase it shows 10,001, and after
some processing time it rises to 20,002).
It looks like a hard limit; there are more files on SharePoint matching the
used criteria.
Jorge Alonso Garcia
On Thu, Dec 19, 2019 at 16:05, Karl Wright ()
wrote:
> Hi Jorge,
>
>
Hi Jorge,
When you run the job, do you see more than 20,000 documents as part of it?
Do you see *exactly* 20,000 documents as part of it?
Unless you are seeing a hard number like that in the UI for that job on the
job status page, I doubt very much that the problem is a numerical
limitation in
Hi Karl,
We had installed the SharePoint plugin, and can properly access http://server/
_vti_bin/MCPermissions.asmx
[image: image.png]
SharePoint has more than 20,000 documents, but when the job executes it only
extracts these 20,000. How can I check where the issue is?
Regards
Jorge Alonso Garcia
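One low-effort way to confirm the plugin service responds from the crawler
host; the NTLM credentials and auth scheme are assumptions, adjust to your
farm:

  # an HTTP 200 returning WSDL suggests MCPermissions.asmx is deployed
  curl --ntlm -u 'DOMAIN\user' 'http://server/_vti_bin/MCPermissions.asmx?WSDL'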
By "stop at 20,000" do you mean that it finds more than 20,000 but stops
crawling at that time? Or what exactly do you mean here?
FWIW, the behavior you describe sounds like you may not have installed the
SharePoint plugin and may have selected a version of SharePoint that is
inappropriate. All
Found the problem: needed to update a pom dependency.
Everything passes now.
Karl
On Tue, Dec 17, 2019 at 8:07 PM Karl Wright wrote:
> I just created a plugin directory at
> https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/trunk .
> Code committed there builds but it doesn't
I just created a plugin directory at
https://svn.apache.org/repos/asf/manifoldcf/integration/solr-8.x/trunk .
Code committed there builds but it doesn't test properly because of the
following exception:
>>
[ERROR] Failed to execute goal
Here you find it: https://issues.apache.org/jira/browse/CONNECTORS-1629
I will try it out this year I hope.
I will try it though with Solr 8.3.1 and will take into account
https://issues.apache.org/jira/browse/CONNECTORS-1586
On Tue, Dec 17, 2019 at 1:09 PM Karl Wright wrote:
> Please do!
>
Please do!
Karl
On Tue, Dec 17, 2019 at 7:06 AM Jörn Franke wrote:
> Thanks a lot Karl for your feedback. Do you mind if I create a Jira where
> I report on the progress?
>
> On 17.12.2019 at 12:22, Karl Wright wrote:
>
>
> Well, you can certainly attempt this simply enough then if you
Thanks a lot Karl for your feedback. Do you mind if I create a Jira where I
report on the progress?
> On 17.12.2019 at 12:22, Karl Wright wrote:
>
>
> Well, you can certainly attempt this simply enough then if you build from
> source. I'd prefer that you validate the approach before we
Well, you can certainly attempt this simply enough then if you build from
source. I'd prefer that you validate the approach before we make permanent
commits.
Please let me know what works and what doesn't.
Karl
On Tue, Dec 17, 2019 at 1:22 AM Jörn Franke wrote:
> I agree.
> The delegation
I agree.
The delegation part is not relevant for me. I also do not believe it makes
sense at the ETL level.
I still think we need to add the one line of code that allows Kerberos to be
used (the second line in the example).
> On 17.12.2019 at 01:35, Karl Wright wrote:
>
>
> Hi Jorn,
>
> The code
Hi Jorn,
The code referenced cannot be set up differently from connection to
connection, so there is no point in having this be anything other than
global. In that case you can point at the config file with
-D<property>=value and it will do the same thing as setting a system
property.
The token delegation
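For illustration, a sketch of pointing the crawler JVM at a JAAS
configuration; the property name is the standard JVM JAAS one, and the file
path is hypothetical:

  # hand the crawler JVM a JAAS login configuration for Kerberos
  java -Djava.security.auth.login.config=/path/to/jaas-client.conf -jar start.jar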
Thanks a lot for the quick reply. Actually it is here:
https://lucene.apache.org/solr/guide/8_3/kerberos-authentication-plugin.html#using-solrj-with-a-kerberized-solr
It is also available in the previous versions of Solr.
I wonder how easy it would be to add a configuration to the Manifold UI to
The Solr Output Connector uses a patched HttpComponents/HttpClient for
communication with the various Solr Cloud replicas, along with custom
versions of some of the SolrJ classes which allow multipart posts to work.
Other than that it's standard SolrJ. Whatever SolrJ needs to work with
Kerberos,
Hello,
does the Solr Output Connector support SolrCloud with Kerberos
authentication and Zookeeper with Kerberos authentication?
If so, how can this be configured?
If it is not supported, is there an "easy" way to integrate this? From a
development perspective the Kerberos Authentication with
Hello, Manifold CF Community Members:
I would like to put documents from MCF to a web API that accepts JSON
documents.
So, for example, a lot of JSONs are stored in a directory.
I want MCF to read the JSON from the directory (with a FileSystem repository
connection) and send the document to the API
Hi Kaya,
The best way to form proper JSON is to create a job with the UI and export
its JSON, and use that as a model.
Thanks,
Karl
On Thu, Nov 28, 2019 at 3:05 AM Kayak28 wrote:
> Hello, Community Members:
>
> I have a question about the form of JSON when I call a job-creation API.
> I would
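For example, once a model job.json has been exported from the UI, the create
call might look like this sketch; the host, port, and path assume the
quick-start API service and should be verified against your deployment:

  # POST the job description to the MCF API service
  curl -X POST -H 'Content-Type: application/json' \
    -d @job.json 'http://localhost:8345/mcf-api-service/json/jobs'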
Hello, Community Members:
I have a question about the form of JSON when I call a job-creation API.
I would like to use the following API.
jobs POST Create a job {"job":**} {"job_id":**}
*OR* {"error":**}
The URL I should send the curl POST command to is:
Thanks Karl.
On Tue, Nov 26, 2019 at 1:39 PM Karl Wright wrote:
> No, just changing the job characteristics will NOT cause the incremental
> behavior to be erased.
>
> Karl
>
>
> On Mon, Nov 25, 2019 at 10:20 PM Sreejith Variyath <
> sreejith.variy...@tarams.com> wrote:
>
>> Yes. I understood.
No, just changing the job characteristics will NOT cause the incremental
behavior to be erased.
Karl
On Mon, Nov 25, 2019 at 10:20 PM Sreejith Variyath <
sreejith.variy...@tarams.com> wrote:
> Yes. I understood. Thanks Karl.
>
> I have another question. If I update job type from
Yes. I understood. Thanks Karl.
I have another question. If I update the job type from TYPE_SPECIFIED to
TYPE_CONTINUOUS, will the document versioning reset and the job pick up
all the documents again?
On Tue, Nov 26, 2019, 05:12 Karl Wright wrote:
> One of the characteristics of continuous
One of the characteristics of continuous jobs is that they call
addSeedDocuments multiple times on a single job run. The job run never
ends, so this is how the job picks up documents for the infinitely-running
job. That's just the way it works. Have you read the book?
Karl
On Mon, Nov 25,
Hi Everyone,
I am trying to set up a job that has a JDBC repository connector, one
transformation connector, and a custom output connector.
I need this job to run in two modes.
- Sample mode: this is a sample migration mode. The job will pick 10
documents and migrate them to the output
I was incorrect. The value comes from one of the properties:
Karl
On Tue, Nov 19, 2019 at 6:16 AM Priya Arora wrote:
> I am using docker commands to install manifoldcf inside docker container.
> So what I understand is that mcf downloads latest crawler-ui.war files in
> the web
I am using docker commands to install ManifoldCF inside a docker container.
So what I understand is that MCF downloads the latest crawler-ui.war files in
the web folder (that is what I checked on the local system). Do I need to
check somewhere else?
[image: image.png]
On Tue, Nov 19, 2019 at 4:40 PM
That version comes directly from the ant build version that was used to
compile the UI. What version of crawler-ui.war do you have?
Karl
On Tue, Nov 19, 2019 at 5:50 AM Priya Arora wrote:
> Hi All,
>
> I have upgraded manifoldcf version on the server to version 2.14, I
> re-confirmed it via
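One way to check which version a given crawler-ui.war actually is, assuming
the build stamps version information into the war's manifest:

  # print the war's manifest; look for implementation/build version entries
  unzip -p crawler-ui.war META-INF/MANIFEST.MF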
Hi All,
I have upgraded manifoldcf version on the server to version 2.14, I
re-confirmed it via docker build command that it is downloading 2.14
version only.
[image: image.png]
But when I start up ManifoldCF, the version shows up as 2.10, as
shown in the screenshot above. Is there
Hello, Mr. Karl, Mr. Issei, and Community members.
I have a similar issue to Mr. Issei's.
Here is a sample website structure that I want to crawl with MCF.
index.html -link to -> sample1.html -link to-> sample2.html
I made this sample website to explore the behavior of "Hop count mode."
The
Can you do the following:
>>
C:\wip\mcf\trunk>dir lib\less*
Volume in drive C is Windows
Volume Serial Number is F4D8-E4E0
Directory of C:\wip\mcf\trunk\lib
09/06/2019 02:52 PM 1,304,630 less4j-1.17.2.jar
1 File(s) 1,304,630 bytes
0 Dir(s)
Hi Karl,
Thank you for a quick response.
It seems that I have completely misunderstood the specifications so it'd be
helpful if you could show specific examples for each Hop count mode.
Is my understanding below correct?
- "keep unreachable documents, for now" and "... forever" is the
(1) Download source distribution and lib distribution
(2) Unpack and follow directions for placing lib folder in place
(3) Run 'ant make-deps' to download the correct version of jcifs
(4) Run "ant build" to make a distribution that includes proprietary
examples
(5) Use the proprietary example you
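In command form, steps (1) through (4) above look roughly like this; the
archive and directory names are illustrative:

  # (1)-(2) unpack the source and lib distributions, put the lib folder in place
  tar xzf apache-manifoldcf-X.Y-src.tar.gz
  tar xzf apache-manifoldcf-X.Y-lib.tar.gz
  cd apache-manifoldcf-X.Y
  # (3) download the correct version of jcifs and other non-redistributables
  ant make-deps
  # (4) build a distribution that includes the proprietary examples
  ant build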
This didn't work either. Does that (ManifoldCF version 2.14) have something to
do with the Java version also? If yes, my JAVA_HOME is set to Java version 8.
Can you suggest something?
On Fri, Nov 8, 2019 at 4:16 PM Sreejith Variyath <
sreejith.variy...@tarams.com> wrote:
> place the jcifs.jar into the
place the jcifs.jar into the *connector-lib-proprietary* directory
On Fri, Nov 8, 2019 at 2:38 PM Priya Arora wrote:
> Hi All
>
> I installed the 2.14 version of ManifoldCF, then uncommented the line in
> the connectors.xml file "", but when I try to start with (java -jar start.jar) it gives an error:
>
Hi All
I installed the 2.14 version of ManifoldCF, then uncommented the line in
the connectors.xml file "", but when I try to start with (java -jar start.jar) it gives an error:
I also checked that mcf-jcifs-connector.jar is present in connector-lib.
Do I need to do something else? Here is the
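For reference, a sketch of the usual sequence; paths assume a stock install,
and the exact connectors.xml line is elided in the archive above:

  # the proprietary jcifs.jar goes in connector-lib-proprietary;
  # mcf-jcifs-connector.jar already ships in connector-lib
  cp jcifs.jar connector-lib-proprietary/
  # after uncommenting the JCIFS line in connectors.xml, restart:
  java -jar start.jar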
Ok thanks Karl.
On Fri, Nov 8, 2019, 02:20 Karl Wright wrote:
> Have you tried deploying the combined war on tomcat instead?
>
> I honestly do not know what is wrong but if the combined war works you
> have something to compare/contrast against.
>
> Karl
>
>
> On Thu, Nov 7, 2019 at 2:45 PM
Have you tried deploying the combined war on tomcat instead?
I honestly do not know what is wrong but if the combined war works you have
something to compare/contrast against.
Karl
On Thu, Nov 7, 2019 at 2:45 PM SREEJITH va wrote:
> Thanks Karl, Here is quick summary on how I embedded
Thanks Karl. Here is a quick summary of how I embedded ManifoldCF in my
application.
- All the required ManifoldCF jar dependencies are in the pom.
- The properties.xml is served through the org.apache.manifoldcf.configfile
setting in catalina.properties.
- There is an application ready Lister
How are you embedding ManifoldCF in your application?
What looks like is happening is that thread contexts are being lost
somehow. ManifoldCF uses thread contexts to keep track of worker
thread-local information, and it appears that you are calling into
ManifoldCF code assuming that (for
Hi All,
I have a Spring-based application in which ManifoldCF is embedded and
running in Tomcat. At some point I am getting the exceptions below. Any lead
on why this is happening would be greatly appreciated.
One scenario in which I can see this in my logs is while shutting down
Tomcat. And if it
Many thanks!!
On Wed, Nov 6, 2019 at 12:14 PM SREEJITH va wrote:
> Yes. Exactly.
>
> On Wed, Nov 6, 2019 at 12:04 PM Priya Arora wrote:
>
>> Hi Sir,
>>
>>
>>
>> This connector code I am looking for , does JCIF connector is the same ?
>> [image: image.png]
>>
>> Thanks
>> Priya
>>
>> On Wed,
Hi Sir,
Is the JCIFS connector the same as the connector code I am looking for?
[image: image.png]
Thanks
Priya
On Wed, Nov 6, 2019 at 11:58 AM SREEJITH va wrote:
> I think you are searching for JCIFS connector. Its the windows repository
> connector. Its in manifoldcf\connectors\jcifs
>
>
I think you are searching for the JCIFS connector. It's the Windows repository
connector. It's in manifoldcf\connectors\jcifs
On Wed, Nov 6, 2019 at 11:19 AM Priya Arora wrote:
> Hi,
>
> I need to implement "Window Shares" type Repository connection type and
> needs to access and understand code
When I created a new job and followed the identifier's lifecycle/execution
process, it didn't start the deletion process. There was no change in the
job configuration or in the database start-up and configuration.
On Sat, Nov 2, 2019 at 12:41 AM Priya Arora wrote:
> No, I am not deleting
Hi,
I need to implement "Window Shares" type Repository connection type and
needs to access and understand code first.
But I am unable to find its code at path:-
No, I am not deleting the job after it runs; its status is updated to
'Done' after the whole process.
The process involves indexing documents, and just before the job ends the
deletion process executes.
The sequence is: fetch etc., indexing, extracting, other processes, deletion, then job
Ok, so pick ONE of these identifiers.
What I want to see is the entire lifecycle of the ONE identifier. That
includes what the Web Connection logs as well as what the indexation logs.
Ideally I'd like to see:
- job start and end
- web connection events
- indexing events
I'd like to see these
Indexation screenshot is as below.
[image: image.png]
On Tue, Oct 29, 2019 at 7:57 PM Karl Wright wrote:
> I need both ingestion and deletion.
> Karl
>
>
> On Tue, Oct 29, 2019 at 8:09 AM Priya Arora wrote:
>
>> The history is shown below; it does not indicate any error.
>> [image: 12.JPG]
I need both ingestion and deletion.
Karl
On Tue, Oct 29, 2019 at 8:09 AM Priya Arora wrote:
> The history is shown below; it does not indicate any error.
> [image: 12.JPG]
>
> Thanks
> Priya
>
> On Tue, Oct 29, 2019 at 5:02 PM Karl Wright wrote:
>
>> What does the history say about these
The history is shown below; it does not indicate any error.
[image: 12.JPG]
Thanks
Priya
On Tue, Oct 29, 2019 at 5:02 PM Karl Wright wrote:
> What does the history say about these documents?
> Karl
>
> On Tue, Oct 29, 2019 at 6:53 AM Priya Arora wrote:
>
>>
>> it may be that (a) they
What does the history say about these documents?
Karl
On Tue, Oct 29, 2019 at 6:53 AM Priya Arora wrote:
>
> it may be that (a) they weren't found, or (b) that the document
> specification in the job changed and they are no longer included in the job.
>
> URLs that were deleted are valid
Hi, JAVA_HOME is set to /usr/lib/jvm/java-8-openjdk-amd64
On Fri, Oct 18, 2019 at 11:26 AM Priya Arora wrote:
> Hi Sreejith,
>
> Can you please let me know the JAVA_HOME variable value you have set.
>
> Thanks
> Priya
>
> On Thu, Oct 17, 2019 at 12:17 PM SREEJITH va
> wrote:
>
>> Hi, We use
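A quick sanity check that the JDK the shell points at is the one actually in
use; paths vary per system:

  # confirm JAVA_HOME and the version of the java binary it resolves to
  echo "$JAVA_HOME"
  "$JAVA_HOME/bin/java" -version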
Hi Sreejith,
Can you please let me know the JAVA_HOME variable value you have set.
Thanks
Priya
On Thu, Oct 17, 2019 at 12:17 PM SREEJITH va wrote:
> Hi, we use ManifoldCF with OpenJDK version "1.8.0_222" on CentOS and did not
> face any issues.
>
> On Thu, Oct 17, 2019 at 12:04 PM Bisonti Mario
Hi, we use ManifoldCF with OpenJDK version "1.8.0_222" on CentOS and did not
face any issues.
On Thu, Oct 17, 2019 at 12:04 PM Bisonti Mario
wrote:
> Hello, I use Ubuntu 18.04.02 LTS with:
> openjdk version "11.0.4" 2019-07-16
>
>
>
> And I have no issue with ManifoldCF
>
>
>
> Mario
>
>
>
>
Hello, I use Ubuntu 18.04.02 LTS with:
openjdk version "11.0.4" 2019-07-16
And I have no issue with ManifoldCF
Mario
From: Markus Schuch
Sent: Thursday, October 17, 2019 07:35
To: user@manifoldcf.apache.org; Praveen Bejji
Subject: Re: Manifold with OpenJDK
Hi Praveen,
we use openjdk 8 in
Hi Praveen,
we have used OpenJDK 8 in dockered Red Hat Linux for 2 years now and didn't
have problems with it.
We had one minor issue when we migrated: the image processing capabilities of
OpenJDK are somewhat different from the Oracle JDK. One of our connectors
creates image thumbnails, and on OpenJDK
I use it this way all the time.
Karl
On Wed, Oct 16, 2019 at 11:32 AM Praveen Bejji
wrote:
> Hi,
>
> We are planning on using ManifoldCF with Open JDK 1.8 on Linux server.
> Can you please let us know if there are any known issues/challenges on
> using ManifoldCF with OpenJDK?
>
>
> Thanks,
>
Hi,
We are planning on using ManifoldCF with OpenJDK 1.8 on a Linux server. Can
you please let us know if there are any known issues/challenges in using
ManifoldCF with OpenJDK?
Thanks,
Praveen
If there is such a connector, I don't know about it. Hopefully we'll find
out soon if somebody has developed one on the outside they're willing to
contribute or make available.
Karl
On Fri, Oct 11, 2019 at 2:17 PM SREEJITH va wrote:
> Hi, I am working on a document migration project, which
Hi, I am working on a document migration project which requires migrating
documents to the Box (https://www.box.com/) system. Does any output
connector exist for the Box system, or is any development in progress?
Thanks for your answer Karl. I was unsure about that concerning the output
connections, but it is still the same pipeline after all.
Original message From: Karl Wright Date:
10/09/2019 20:08 (GMT+01:00) To: user@manifoldcf.apache.org Subject: Re: Job
Multiple Outputs Hi
Hi Julien,
You must understand that a job with a complex pipeline is really not
running N independent jobs; it's running ONE job. Every document is
processed through the pipeline only once. The pipeline may have faster
components and slower components; doesn't matter; the document takes the
sum
OK, so to be sure I understood what you are saying:
suppose a job with two output connections where one of the outputs is
twice as fast as the other at indexing documents. At a given time
t, both of the outputs will have indexed the same number of documents,
no matter whether one output is
The output connection contract is that a request to index is made to the
connector, and the connector returns when it is done.
When there are multiple output connections, these are each handed a copy of
the document, one after the other, and told to index it. This is all done
by one worker
Hi,
I would like an explanation of the behavior of a job when
several outputs are configured. My main question is: for each output,
how is the document ingestion managed? More precisely, are the ingest
processes synchronized or not? (In other words, is the ingestion of the
next
Hi Karl,
yes, this helps.
The webpage is now ingested after Tika extraction, and I only have to
include the mime type text/html in the Solr output connection.
Many thanks.
Cheers
Markus
On 23.08.2019 at 13:45, Karl Wright wrote:
> Created a ticket: CONNECTORS-1621. Added a fix. Please let
Created a ticket: CONNECTORS-1621. Added a fix. Please let me know if it
resolves the problem for you.
Thanks,
Karl
On Fri, Aug 23, 2019 at 7:33 AM Karl Wright wrote:
> Hi Markus,
>
> You are correct.
> This code was added as part of
> https://issues.apache.org/jira/browse/CONNECTORS-1482 .
Hi Markus,
You are correct.
This code was added as part of
https://issues.apache.org/jira/browse/CONNECTORS-1482 . The code that was
added does look at the content mime type.
The reason that the mime type is not modified in the document being passed
to Solr by Tika is because we want Solr to
I already have "update" in the handler field. One can see that in the
gist link i posted and it is not working.
The HttpPoster of the SolrConnector takes
RepositoryDocument.getMimeType() and checks the mime type against the
hardcoded plain text mime type list, if solr cell mode (extracting
There are two possible ways to configure Tika with Solr.
First way: Tika extractor + Solr update handler
Second way: no Tika extractor + Solr update/extract handler
For the first way, the Solr Connector completely ignores any "accepted mime
types" you set for it, and only accepts text/plain. For
Hi Karl,
what do I have to do to make Tika declare the extracted plain text with
mime type text/plain in my setup?
As I said, I have a Tika extractor in place:
Pipeline:
1) Webcrawler Connector (Repository Connection)
2) Tika Extractor (Transformation)
3) Solr Connector (Output
Hi Markus,
If you use the straight update handler, with no Tika filter, then the Solr
Connector by design restricts input to textual documents. We can perhaps
broaden that to web pages but then you will be indexing HTML tags as well
and I rather doubt that's what you want.
If you run Tika
Hi Rafa,
Thank you for your valuable suggestions.
On Tue, Aug 13, 2019 at 5:25 PM Rafa Haro wrote:
> Hi Dileepa,
>
> IMHO, Furkan's approach makes the most sense here. As Olivier pointed out,
> to retrieve the original content from a Lucene based index, all the fields
> you are interested in
Hi All,
Thank you for your replies.
@Furkan, @Olivier, thanks for the pointers. I will check the approach of the
Solr repository connector as per the given references.
@Olivier, if you can contribute the Solr repo-connector you are working on
to MCF, that will be awesome! I will be looking forward to an
Hello,
We are currently working on this kind of repository connector for a customer.
We plan to give the code to the MCF project if the customer lets us do it
legally. We will know it at the end of the month or at the beginning of next
month.
In order to have this working, all the fields of
Hi Dileepa,
Writing a custom repository connector can let you achieve your goal: read
from the source and write directly to an output connector.
You should check your requirements, i.e. which data sources you will
connect. MCF may get rid of huge integration pains compared to many other
ETL tools in your case.
On
I would strongly suggest going directly to the repositories rather than the
Solr index, where possible, as the source for the documents you are
indexing. This is MCF's standard use case. It is meant to handle
disparate repositories all going into a single output. Effort is made in
every
Hi Karl and all,
In my use-case, one of the data-sources is an already populated Solr index
which is an e-commerce web-site data index (customers, products &
services).
Apart from the Solr index, I need to ingest several other heterogeneous
data sources such as PostgreSQL databases, CRM data, etc.
If you are trying to extract data from a Solr index, I know of no way to do
that.
Karl
On Mon, Aug 5, 2019 at 9:08 AM Dileepa Jayakody
wrote:
> Hi All,
>
> Thanks for your replies.
> I'm looking for a repository connector. I've used the Solr output
> connector before. But now what I need is to
Hi All,
Thanks for your replies.
I'm looking for a repository connector. I've used the Solr output connector
before. But now what I need is to connect to a solr index as a repository
and retrieve the documents from there. So I need a Solr repository
connector.
@Karl
I will look at the Solr
Hi Dileepa,
You can check the full list of MCF connectors at
https://manifoldcf.apache.org/release/release-2.13/en_US/included-connectors.html
MCF has a Solr output connector. It is not a repository connector. If you
want to use Solr as a repository, you should write a new repository
connector.
If you use Solr Cloud, ManifoldCF's Solr Connector should work for you.
Karl
On Mon, Aug 5, 2019 at 6:18 AM Dileepa Jayakody
wrote:
> Hi All,
>
> I'm working on a project which needs to implement a federated search
> solution with heterogeneous data repositories. One repository is a Solr
>
Hi All,
I'm working on a project which needs to implement a federated search
solution with heterogeneous data repositories. One repository is a Solr
index. I would like to use ManifoldCF as the data ingestion engine in this
project as I have worked with MCF before.
Does ManifoldCF have a Solr
Hello Karl,
What exactly can I modify in the database to tell ManifoldCF to reindex
certain documents? (Rows and tables.)
I would like it to go back to some documents according to certain scripted
conditions.
Thank you
Hi Praveen,
If there is a broken query plan, it will show up in the ManifoldCF log; any
query that takes more than 60 seconds to run gets dumped and explained. So
it should be possible to rule that out with low effort.
The kind of situation I have seen with very large document jobs is that
Hi,
We are trying to index close to one million documents using the Documentum
connector. Indexing is working fine, but we see a drop in indexing
performance after the first day. The connector is able to index 21k/hr on the
first day but drops to 10k/hr after 24-28 hours. Although we don't see any
errors
Thanks Karl
Yes, I do agree that looking at the Jetty logs should give us some clue. I
will check and get back on this.
On Tue, Jul 16, 2019 at 2:31 PM Karl Wright wrote:
> Hosting on a different app server is something you could easily do. Or,
> since this takes many months before it
Hosting on a different app server is something you could easily do. Or,
since this takes many months before it appears, you might just live with it.
But first, there should be access logs that Jetty writes to. It should be
possible for you to see what's happening from those logs if you can find
@Michael,
There are no errors in the logs. The app just goes down abruptly.
@Karl,
Assuming that the Jetty server has some issue, what do you suggest? Is
hosting ManifoldCF on some other server (say Tomcat) an alternative?
On Mon, Jul 15, 2019 at 9:04 AM Michael Cizmar
wrote:
> Are there
Hello.
Thanks, I didn't read the documentation about the sidecar Documentum process.
Thanks a lot.
From: Karl Wright
Sent: Tuesday, July 16, 2019 13:20
To: user@manifoldcf.apache.org
Subject: Re: Documentum connection not working
Are you running the documentum connector sidecar processes?