Re: Multiprocess file installation of manifold
File synchronization is still supported but is deprecated. We recommend ZooKeeper synchronization unless you have a very good reason not to.

Karl

On Wed, Feb 17, 2021 at 12:26 PM Ananth Peddinti wrote:
> Hello Team,
>
> I would like to know if someone has already done a multi-process model
> installation of ManifoldCF on a Linux machine, and what the process is in
> detail. We are running into issues with the quick-start model.
>
> Regards
> Ananth
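For reference, switching a multiprocess ManifoldCF deployment from file-based to ZooKeeper synchronization is done in properties.xml. A minimal sketch, assuming the property names used by the multiprocess-zk example shipped with ManifoldCF (verify them against the deployment documentation for your release) and a ZooKeeper ensemble already running on localhost:2181:

```xml
<!-- properties.xml fragment: use ZooKeeper instead of file-based locking. -->
<!-- Property names follow the multiprocess-zk example; check them against -->
<!-- the deployment docs for your ManifoldCF release.                      -->
<property name="org.apache.manifoldcf.lockmanagerclass"
          value="org.apache.manifoldcf.core.lockmanager.ZooKeeperLockManager"/>
<!-- Comma-separated host:port list of your ZooKeeper ensemble -->
<property name="org.apache.manifoldcf.zookeeper.connectstring" value="localhost:2181"/>
<property name="org.apache.manifoldcf.zookeeper.sessiontimeout" value="300000"/>
```

All ManifoldCF processes in the cluster must point at the same ensemble for locking and configuration to be shared correctly.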
Multiprocess file installation of manifold
Hello Team,

I would like to know if someone has already done a multi-process model installation of ManifoldCF on a Linux machine, and what the process is in detail. We are running into issues with the quick-start model.

Regards
Ananth

--
-SECURITY/CONFIDENTIALITY WARNING-

This message and any attachments are intended solely for the individual or entity to which they are addressed. This communication may contain information that is privileged, confidential, or exempt from disclosure under applicable law (e.g., personal health information, research data, financial information). Because this e-mail has been sent without encryption, individuals other than the intended recipient may be able to view the information, forward it to others, or tamper with the information without the knowledge or consent of the sender. If you are not the intended recipient, or the employee or person responsible for delivering the message to the intended recipient, any dissemination, distribution, or copying of the communication is strictly prohibited. If you received the communication in error, please notify the sender immediately by replying to this message and deleting the message and any accompanying files from your system. If, due to the security risks, you do not wish to receive further communications via e-mail, please reply to this message and inform the sender that you do not wish to receive further e-mail from the sender. (LCP301)
Re: Job Content Length issue
The internal Tika is not memory bounded; some transformations stream, but others put everything into memory. You can try using the external Tika, with a Tika instance you run separately, and that would likely help. But you may need to give it lots of memory too.

Karl

On Wed, Feb 17, 2021 at 3:50 AM ritika jain wrote:
> Hi Karl,
>
> I am using Elasticsearch as an output connector and, yes, using the internal
> Tika extractor, not a Solr output connection.
>
> Also, the Elasticsearch server is hosted on a different server with a huge
> memory allocation.
>
> On Tue, Feb 16, 2021 at 7:29 PM Karl Wright wrote:
>> Hi, do you mean a content limiter length of 100?
>>
>> I assume you are using the internal Tika transformer? Are you combining
>> this with a Solr output connection that is not using the extract handler?
>>
>> By "manifold crashes" I assume you actually mean it runs out of memory.
>> The "long running query" concern is a red herring because that does not
>> cause a crash under any circumstances.
>>
>> This is quite likely if I described your setup above, because if you do
>> not use the Solr extract handler, the entire content of every document must
>> be loaded into memory. That is why we require you to fill in a Solr field
>> on those kinds of output connections that limits the number of bytes.
>>
>> Karl
>>
>> On Tue, Feb 16, 2021 at 8:45 AM ritika jain wrote:
>>> Hi users,
>>>
>>> I am using the ManifoldCF 2.14 file-share connector to crawl files from an
>>> SMB server which has millions of records to process and crawl.
>>>
>>> Total system memory is 64 GB, of which the ManifoldCF start-options file
>>> allocates 32 GB.
>>>
>>> We have some larger files to crawl, around 30 MB or more.
>>>
>>> When the size in the content limiter tab is 10 (that is, 1 MB), the job
>>> works fine, but when it is changed to 1000 (that is, 10 MB), ManifoldCF
>>> crashes with some logs about a long-running query.
>>>
>>> How can we optimise the job specifications to process large documents
>>> as well?
>>>
>>> Do I need to increase or decrease the number of connections, or the number
>>> of worker threads, or something else?
>>>
>>> Can anybody help me crawl larger files too, at least up to 10 MB?
>>>
>>> Thanks
>>> Ritika
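Karl's distinction between transformations that stream and those that buffer whole documents is the heart of the memory problem. A generic Python sketch (an illustration only, not ManifoldCF code) of why chunked processing keeps peak memory bounded while whole-document buffering grows with file size:

```python
import io

def extract_whole(doc: bytes) -> int:
    """Buffer the entire document at once (what non-streaming
    transformations effectively do): peak usage equals document size."""
    buf = io.BytesIO(doc).read()   # whole document resident in memory
    return len(buf)                # peak bytes held at one time

def extract_streaming(doc: bytes, chunk_size: int = 64 * 1024) -> int:
    """Process the document in fixed-size chunks: peak usage is
    bounded by chunk_size regardless of document size."""
    stream = io.BytesIO(doc)
    peak = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        peak = max(peak, len(chunk))   # never exceeds chunk_size
    return peak

doc = b"x" * (10 * 1024 * 1024)        # a 10 MB "document"
print(extract_whole(doc))              # 10485760 -- grows with the file
print(extract_streaming(doc))          # 65536    -- bounded by chunk size
```

With whole-document buffering, every 10 MB file in flight costs 10 MB of heap, multiplied across worker threads; with streaming the per-document footprint stays near the chunk size. This is also why moving extraction to a separately-run external Tika instance helps: the large transient buffers live in that process's heap rather than the crawler's.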