Hi everyone,
I started using Nutch 1.10 for my homework, and I've noticed that every time I
perform a crawl with the same configuration and the same seed URLs, I get a
different number of fetched URLs. This happens even when the old crawl data
is deleted.
This way I would not be able to identify which URLs
[ https://issues.apache.org/jira/browse/NUTCH-1730?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317829#comment-14317829 ]
Markus Jelsma commented on NUTCH-1730:
Anything to add to this modification?
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sebastian Nagel resolved NUTCH-1939.
Resolution: Fixed
Committed to trunk, v1659227. Thanks, [~leoyey]!
Fetcher fails to
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319064#comment-14319064 ]
Markus Jelsma commented on NUTCH-1925:
Yes, I'll check it in tomorrow. Any comments on
I think I may have finished installing.
What you need to do:
0. Run git status and git checkout any files you have modified.
1. patch -p0 < YOUR_PATCH_FILE
2. ant clean jar
3. ant runtime
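The steps above could be scripted roughly as follows. This is a minimal Python sketch, assuming it is run against the root of a Nutch source checkout with git, patch and ant on the PATH; the helper names build_steps and apply_and_build are hypothetical. A single bare filename given to patch is taken as the file to be patched rather than as the patch, so the patch file is passed with -i here. Reverting local changes (step 0) is deliberately left to the user, and by default the script only returns the commands it would run.

```python
from __future__ import annotations

import shlex
import subprocess
from pathlib import Path


def build_steps(patch_file: str) -> list[list[str]]:
    """Commands for the steps above, in order (hypothetical helper)."""
    return [
        # Step 0: inspect local modifications; reverting them
        # (git checkout -- <file>) is left to the user on purpose.
        ["git", "status"],
        # Step 1: -i names the patch file; -p0 keeps paths as written.
        ["patch", "-p0", "-i", patch_file],
        # Step 2: clean, then rebuild the Nutch jar.
        ["ant", "clean", "jar"],
        # Step 3: assemble the runtime/ directory used to run Nutch.
        ["ant", "runtime"],
    ]


def apply_and_build(patch_file: str, source_dir: str = ".",
                    dry_run: bool = True) -> list[str]:
    """Run the build steps, or with dry_run=True only list them.

    Returns the shell-quoted commands. With dry_run=False it stops at
    the first failing command, because check=True raises
    subprocess.CalledProcessError.
    """
    # patch runs with cwd=source_dir, so check the file relative to it.
    if not dry_run and not Path(source_dir, patch_file).is_file():
        raise FileNotFoundError(patch_file)
    commands = []
    for cmd in build_steps(patch_file):
        commands.append(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, cwd=source_dir, check=True)
    return commands


if __name__ == "__main__":
    for line in apply_and_build("YOUR_PATCH_FILE"):
        print(line)
```

Calling apply_and_build("YOUR_PATCH_FILE", dry_run=False) from the checkout root would actually execute the four commands in sequence.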
I will try crawling with Selenium later on. Hope this helps.
On Thu, Feb 12, 2015 at 9:20 AM, Mattmann, Chris A
Hi,
Please send a message to dev-subscr...@nutch.apache.org to subscribe to the
list.
Tyler
On Feb 12, 2015 6:54 PM, Poojan Jhaveri pjhav...@usc.edu wrote:
Cool. Issue resolved now.
Thanks, Sebastian!
On Wed, Feb 11, 2015 at 12:21 PM, Sebastian Nagel
wastl.na...@googlemail.com wrote:
Hi,
the jetty-client-6.1.22.jar is a dependency needed only for testing.
Consequently, it's placed in build/test/lib/, but only if you run the
tests, resp.
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14319277#comment-14319277 ]
Leo Ye commented on NUTCH-1939:
Good to see we fixed it. Thank you, [~wastl-nagel]
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1323:
Fix Version/s: 1.10 (was: 1.11)
AjaxNormalizer
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1323.
Resolution: Fixed
Just in time for 1.10. Committed to trunk in revision 1659167.
[ https://issues.apache.org/jira/browse/NUTCH-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317826#comment-14317826 ]
Markus Jelsma commented on NUTCH-1921:
Anything to add to these optional settings?
[ https://issues.apache.org/jira/browse/NUTCH-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317828#comment-14317828 ]
Markus Jelsma commented on NUTCH-1684:
Anything to add to this? I think this can go
[ https://issues.apache.org/jira/browse/NUTCH-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317874#comment-14317874 ]
Hudson commented on NUTCH-1913:
SUCCESS: Integrated in Nutch-trunk #2971 (See
[ https://issues.apache.org/jira/browse/NUTCH-1323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317873#comment-14317873 ]
Hudson commented on NUTCH-1323:
SUCCESS: Integrated in Nutch-trunk #2971 (See
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317815#comment-14317815 ]
Markus Jelsma commented on NUTCH-1925:
Committed to trunk in revision 1659168.
Hi all,
Does anyone here know where to find a setup tutorial for Selenium on Mac? I
find it difficult to install Xvfb on Mac.
Best,
Jiaxin
On Tue, Feb 10, 2015 at 9:42 PM, Sapnashri Suresh sapna...@usc.edu wrote:
Hi Shuo Li,
We were facing a similar issue. Prof. Mattmann suggested we look
Interestingly, I'm a Mac user, but I don't want to mess up my laptop, so I'm
using Vagrant with Ubuntu Trusty. It doesn't have a GUI, but Xvfb can still be
installed properly. The problem is that I don't know how to integrate
Selenium with Nutch 1.10.
On Thu, Feb 12, 2015 at 12:04 AM, Jiaxin Ye
[ https://issues.apache.org/jira/browse/NUTCH-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14317816#comment-14317816 ]
Markus Jelsma commented on NUTCH-1913:
Thanks Sebastian, committed to trunk in
[ https://issues.apache.org/jira/browse/NUTCH-1913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma resolved NUTCH-1913.
Resolution: Fixed
LinkDB to implement db.ignore.external.links
Hi Li, Shuo. You are so right. I finished installing and successfully ran
Nutch with Selenium and Firefox. I have a question, though: does Firefox
always pop up for every URL you crawl?
Hi Prof. Mattmann. I think this is how we install Selenium on a Mac running
OS X later than 10.6. I
This is great, Jiaxin, can you please make a wiki page on the Nutch
wiki that has this information?
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory
Sure. I will do it once I confirm it works...
On Thursday, February 12, 2015, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:
This is great, Jiaxin, can you please make a wiki page on the Nutch
wiki that has this information?
Julien Nioche created NUTCH-1942:
Summary: Remove TopLevelDomain
Key: NUTCH-1942
URL: https://issues.apache.org/jira/browse/NUTCH-1942
Project: Nutch
Issue Type: Task
Reporter:
You need Selenium, Jiaxin, in order to crawl the dynamic pages in the
polar dataset you have been assigned in my CSCI 572 search engines class.
The instructions for integrating Selenium with Nutch 1.10-trunk
are here:
https://issues.apache.org/jira/browse/NUTCH-1933
Cheers,
Chris
[ https://issues.apache.org/jira/browse/NUTCH-1939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318099#comment-14318099 ]
Hudson commented on NUTCH-1939:
SUCCESS: Integrated in Nutch-trunk #2972 (See
[ https://issues.apache.org/jira/browse/NUTCH-1925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1925:
Attachment: NUTCH-1925-2x.patch
Patch for 2.x; it seems to be working. Please confirm.
Upgrade
Hi professor, but can we use Selenium on Mac?
On Thursday, February 12, 2015, Mattmann, Chris A (3980)
chris.a.mattm...@jpl.nasa.gov wrote:
You need Selenium, Jiaxin, in order to crawl the dynamic pages in the
polar dataset you have been assigned in my CSCI 572 search engines class.
The
[ https://issues.apache.org/jira/browse/NUTCH-1942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14318308#comment-14318308 ]
Chris A. Mattmann commented on NUTCH-1942:
Julien, can you tell me more about
Yes, I believe you need to install X11. Why don't you try it and report back
what you find? Thanks.
On Feb 12, 2015, at 8:28 AM, Jiaxin Ye
jiaxi...@usc.edu wrote:
Hi professor, but can we use Selenium on Mac?
On Thursday, February 12, 2015, Mattmann,
[ https://issues.apache.org/jira/browse/NUTCH-1724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Markus Jelsma updated NUTCH-1724:
Attachment: NUTCH-1724-trunk.patch
Modified to adhere to Lewis' changes. Will commit shortly.