[ https://issues.apache.org/jira/browse/CONNECTORS-1579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Wright resolved CONNECTORS-1579.
-------------------------------------
       Resolution: Fixed
    Fix Version/s: ManifoldCF 2.13

r1853008

> Error when crawling a MSSQL table
> ---------------------------------
>
>                 Key: CONNECTORS-1579
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1579
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: JDBC connector
>    Affects Versions: ManifoldCF 2.12
>            Reporter: Donald Van den Driessche
>            Assignee: Karl Wright
>            Priority: Major
>             Fix For: ManifoldCF 2.13
>
>         Attachments: 636_bb2.csv, CONNECTORS-1579.patch
>
>
> When I'm crawling an MSSQL table through the JDBC connector, I get the
> following error on multiple lines:
>
> {noformat}
> FATAL 2019-02-05T13:21:58,929 (Worker thread '40') - Error tossed: Multiple document primary component dispositions not allowed: document '636'
> java.lang.IllegalStateException: Multiple document primary component dispositions not allowed: document '636'
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.checkMultipleDispositions(WorkerThread.java:2125) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1624) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread$ProcessActivity.noDocument(WorkerThread.java:1605) ~[mcf-pull-agent.jar:?]
>         at org.apache.manifoldcf.crawler.connectors.jdbc.JDBCConnector.processDocuments(JDBCConnector.java:944) ~[?:?]
>         at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:399) [mcf-pull-agent.jar:?]
> {noformat}
> I looked this error up on the internet, and it might have something to do
> with using the same key for different rows.
> I checked, but I couldn't find any duplicates that match any of the
> selected fields in the JDBC queries.
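
The duplicate-key hypothesis mentioned above can be checked outside the crawler. The following is only a minimal sketch, assuming the ID values returned by the seeding query have been exported as strings; the class `DuplicateIdCheck` and its method `findDuplicates` are hypothetical names and are not part of ManifoldCF, and the sample values are made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of ManifoldCF: reports ID values that occur
// more than once in a list of identifiers.
class DuplicateIdCheck {

    // Returns each duplicated ID once, in the order its first repeat is seen.
    static List<String> findDuplicates(List<String> ids) {
        Set<String> seen = new LinkedHashSet<>();
        Set<String> duplicates = new LinkedHashSet<>();
        for (String id : ids) {
            // Set.add returns false when the value was already present.
            if (!seen.add(id)) {
                duplicates.add(id);
            }
        }
        return new ArrayList<>(duplicates);
    }

    public static void main(String[] args) {
        // Sample values are invented for illustration; they are not from dbo.bb2.
        List<String> ids = List.of("635", "636", "637", "636");
        System.out.println(findDuplicates(ids)); // prints [636]
    }
}
```

An empty result would suggest that repeated keys are not the cause, which is consistent with what the reporter describes finding.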
> Here are my queries:
> Seeding query
> {code:java}
> SELECT pk1 as $(IDCOLUMN)
> FROM dbo.bb2
> WHERE search_url IS NOT NULL
> AND mimetype IS NOT NULL AND mimetype NOT IN ('unknown/unknown', 'application/xml', 'application/zip');
> {code}
> Version check query: none
> Access token query: none
> Data query:
>
>
> {code:java}
> SELECT
> pk1 AS $(IDCOLUMN),
> search_url AS $(URLCOLUMN),
> ISNULL(content, '') AS $(DATACOLUMN),
> doc_id,
> search_url AS url,
> ISNULL(title, '') as title,
> ISNULL(groups,'') as groups,
> ISNULL(type,'') as document_type,
> ISNULL(users, '') as users
> FROM dbo.bb2
> WHERE pk1 IN $(IDLIST);
> {code}
> The attached CSV is the corresponding row from the table.
> [^636_bb2.csv]
>
> Due to this problem, the whole crawling pipeline is being held up. It keeps
> retrying this row.
> Could you help me understand this error?
>
>



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)