[
https://issues.apache.org/jira/browse/NUTCH-1774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
sreemanth pulagam updated NUTCH-1774:
-
Attachment: NUTCH-1774.patch
Patch file to fix this issue.
Resolution:
1. Generate the b
sreemanth pulagam created NUTCH-1774:
Summary: Crawling from REST API giving NullPointerException
Key: NUTCH-1774
URL: https://issues.apache.org/jira/browse/NUTCH-1774
Project: Nutch
Issu
Hi Diaa,
Yes, you can open an issue for these fixes and attach patches if you can.
Cheers,
Markus
Diaa Abdallah wrote: Hi,
I noticed that Nutch doesn't handle cleaning up (removing temp folders) in case
of error.
In the following classes temp directories are created but not removed when
th
Hi,
I noticed that Nutch doesn't handle cleaning up (removing temp folders) in
case of error.
In the following classes temp directories are created but not removed when
there is an error:
1. Injector
2. CrawlDBReader
3. Deduplication
4. SegmentReader
For example in injector you find:
RunningJob ma
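The cleanup concern raised in this message (temp directories left behind when a job fails) can be illustrated with a minimal sketch. This is not Nutch's code and not the patch discussed in the thread: it uses plain java.nio.file instead of Hadoop's FileSystem so that it runs on its own, and the names TempDirCleanup, Job, runWithTempDir and the "inject-temp-" prefix are all hypothetical. The idea is only that the temp directory is removed in a finally block, so it goes away whether the job succeeds or throws.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper, not part of Nutch: runs a job against a temp
// directory and always removes that directory afterwards.
final class TempDirCleanup {

    /** Hypothetical stand-in for a job step that may fail. */
    interface Job {
        void run(Path tempDir) throws IOException;
    }

    /** Deletes dir and everything below it, children before parents. */
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> paths = Files.walk(dir)) {
            // Reverse order puts the deepest paths first.
            paths.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (UncheckedIOException e) {
            // Unwrap so callers only see IOException.
            throw e.getCause();
        }
    }

    /** Creates a temp dir, runs the job, and cleans up even on failure. */
    static Path runWithTempDir(Job job) throws IOException {
        Path tempDir = Files.createTempDirectory("inject-temp-");
        try {
            job.run(tempDir);
        } finally {
            // Runs on success and on error alike.
            deleteRecursively(tempDir);
        }
        return tempDir;
    }
}
```

A caller would pass the failing step as a lambda, for example TempDirCleanup.runWithTempDir(dir -> doWork(dir)), where doWork is whatever the job does. One design caveat: if the cleanup itself throws while an exception from the job is already propagating, the cleanup exception replaces the original one, so real code might prefer to log the cleanup failure instead.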
[
https://issues.apache.org/jira/browse/NUTCH-1772?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1772:
-
Attachment: NUTCH-1772.patch
> Injector does not need merging if no pre-existing crawldb
> --
Julien Nioche created NUTCH-1772:
Summary: Injector does not need merging if no pre-existing crawldb
Key: NUTCH-1772
URL: https://issues.apache.org/jira/browse/NUTCH-1772
Project: Nutch
Issue
[
https://issues.apache.org/jira/browse/NUTCH-1752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995055#comment-13995055
]
Julien Nioche commented on NUTCH-1752:
--
Looks good! +1
> cache robots.txt rules per
[
https://issues.apache.org/jira/browse/NUTCH-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Diaa closed NUTCH-1766.
---
Fixed. Thanks
> Generator to unlock crawldb and remove tempdir if generate job fails
> -
[
https://issues.apache.org/jira/browse/NUTCH-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993766#comment-13993766
]
Ralf commented on NUTCH-1770:
-
I just compiled the 2.x branch, no problems parsing PDFs here.
[
https://issues.apache.org/jira/browse/NUTCH-1669?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995059#comment-13995059
]
Julien Nioche commented on NUTCH-1669:
--
Hi Rafael
Looks like this issue went unnotic
[
https://issues.apache.org/jira/browse/NUTCH-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1613:
-
Fix Version/s: 1.9, 2.3 (was: 2.4)
> Timeouts in protoco
[
https://issues.apache.org/jira/browse/NUTCH-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13992728#comment-13992728
]
Ralf commented on NUTCH-1679:
-
Hi,
I would love to participate, how can I check out the 2.3 c
[
https://issues.apache.org/jira/browse/NUTCH-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche reassigned NUTCH-1766:
Assignee: Julien Nioche
> Generator to unlock crawldb and remove tempdir if generate job fa
[
https://issues.apache.org/jira/browse/NUTCH-1714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994921#comment-13994921
]
Julien Nioche commented on NUTCH-1714:
--
[~shekoufa]
bq. After applying NUTCH-1714 a
Diaa created NUTCH-1771:
---
Summary: Solrindex fails if a segment is corrupted or incomplete
Key: NUTCH-1771
URL: https://issues.apache.org/jira/browse/NUTCH-1771
Project: Nutch
Issue Type: Bug
[
https://issues.apache.org/jira/browse/NUTCH-1770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994908#comment-13994908
]
Julien Nioche commented on NUTCH-1770:
--
[~tilman] There are warnings in the logs + a
[
https://issues.apache.org/jira/browse/NUTCH-1622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13994191#comment-13994191
]
Daniel Kugel commented on NUTCH-1622:
-
I might have done something wrong but reading t
[
https://issues.apache.org/jira/browse/NUTCH-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche resolved NUTCH-1766.
--
Resolution: Fixed
Committed revision 1593901.
Thanks!
> Generator to unlock crawldb and remov
[
https://issues.apache.org/jira/browse/NUTCH-1766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Julien Nioche updated NUTCH-1766:
-
Priority: Minor (was: Major)
> Generator to unlock crawldb and remove tempdir if generate job fa