limit crawler to defined depth
--
Key: NUTCH-1331
URL: https://issues.apache.org/jira/browse/NUTCH-1331
Project: Nutch
Issue Type: New Feature
Components: generator, parser, storage
Affects
parser does not extract outlinks to external web sites
--
Key: NUTCH-1329
URL: https://issues.apache.org/jira/browse/NUTCH-1329
Project: Nutch
Issue Type: Bug
Components: parser
a problem with regex-normalize.xml
--
Key: NUTCH-1328
URL: https://issues.apache.org/jira/browse/NUTCH-1328
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.4
fetch queue management
--
Key: NUTCH-1309
URL: https://issues.apache.org/jira/browse/NUTCH-1309
Project: Nutch
Issue Type: Improvement
Components: fetcher
Affects Versions: 1.4
Reporter: behnam
Fetcher to skip queues for URLs getting repeated exceptions, based on percentage
--
Key: NUTCH-1303
URL: https://issues.apache.org/jira/browse/NUTCH-1303
Project: Nutch
it is better for fetchItemQueues to select items from greater queues first
--
Key: NUTCH-1297
URL: https://issues.apache.org/jira/browse/NUTCH-1297
Project: Nutch
Issue
Generator should not generate filtered, not found, denied, gone, and
permanently moved pages
--
Key: NUTCH-1288
URL:
linkdb scalability
--
Key: NUTCH-1282
URL: https://issues.apache.org/jira/browse/NUTCH-1282
Project: Nutch
Issue Type: Improvement
Components: linkdb
Affects Versions: 1.4
Reporter: behnam nikbakht
tika parser does not work properly with unwanted file types that passed the
filters in nutch
--
Key: NUTCH-1281
URL: https://issues.apache.org/jira/browse/NUTCH-1281
Fetch Improvement in threads per host
--
Key: NUTCH-1278
URL: https://issues.apache.org/jira/browse/NUTCH-1278
Project: Nutch
Issue Type: New Feature
Components: fetcher
Affects Versions: 1.4
Generate main problems
--
Key: NUTCH-1269
URL: https://issues.apache.org/jira/browse/NUTCH-1269
Project: Nutch
Issue Type: Improvement
Components: generator
Affects Versions: 1.4
Environment:
some Deflate-encoded pages are not fetched
--
Key: NUTCH-1270
URL: https://issues.apache.org/jira/browse/NUTCH-1270
Project: Nutch
Issue Type: Bug
Components: fetcher
Affects Versions: 1.4
not all pages are parsed
--
Key: NUTCH-1204
URL: https://issues.apache.org/jira/browse/NUTCH-1204
Project: Nutch
Issue Type: Bug
Components: parser
Affects Versions: 1.3
Reporter: behnam
unfetched URLs problem
--
Key: NUTCH-1199
URL: https://issues.apache.org/jira/browse/NUTCH-1199
Project: Nutch
Issue Type: Improvement
Components: fetcher, generator
Reporter: behnam nikbakht
14 matches