't use anything.
>> >> > Hadoop uses pmd integrate in Hudson.
>> >> >
>> >>
>> >> Does this mean we do not need pmd jars in nutch ( are they provided by
>> hudson)?
>> >>
>> >> > Otis
>>> > Otis
>>> > --
>>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>> >
>>> >
>>> >
>>> > - Original Message
>>> >> From: Doğacan Güney
>>> >> To: nutch-de
pmd-ext contains PMD (http://pmd.sourceforge.net/) libraries. I
committed them a long time ago in an attempt to bring some static
analysis tools to the nutch sources. There was a short discussion around
it and we all thought it was worth doing, but it never gained enough
momentum. There is a pmd ta
Congratulations and welcome,
Piotr
On 3/5/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
OK. I finally figured out how to republish the site. Only took me 3
days. Feeling hazed now! :)
Dennis Kubes
Sami Siren wrote:
> Welcome on board Dennis!
>
> --
> Sami Siren
>
> Dennis Kubes wrote:
>> Hi
Chris,
I have documented the process in the wiki. Doug has sent the links
already. If you have any questions I would be willing to help. I can
even do it myself if you find it difficult - I simply do not want to be
the bottleneck as I am behind my schedule at work and in private life.
I still hope I
Otis,
Some time ago people on the list said that they are willing to at
least maintain Nutch 0.7 branch. As a committer (not very active
recently) I volunteered to commit patches when they appear - I do not
have enough time at the moment to do active coding. I have created a
0.7.3 release in JIRA so
[
https://issues.apache.org/jira/browse/NUTCH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Piotr Kosiorowski closed NUTCH-429.
---
Resolution: Invalid
Please use nutch-user mailing list for such questions and JIRA for
As no objections were raised I created a 0.7.3 version in JIRA so we can
start assigning current JIRA issues to it.
Regards
Piotr
Piotr Kosiorowski wrote:
Hello committers,
Based on a recent discussion on nutch user list - (Strategic Direction
of Nutch) I would like to prepare 0.7.3 release
Hello committers,
Based on a recent discussion on nutch user list - (Strategic Direction
of Nutch) I would like to prepare 0.7.3 release. The idea is to allow
people who still use 0.7.2 to get rid of most important bugs and allow
them to add some small features they would need as the claim is 0.8.
Please read the tutorial on nutch site. I suggest posting such issues
to nutch-user - you will have much higher chance of getting useful
response there.
regards
Piotr
On 11/9/06, kauu <[EMAIL PROTECTED]> wrote:
or it's the same with the version 0.8.x
any idea is appreciated
On 11/9/06, kauu <[EMA
I think it might be a problem with ant version - it seems that you
have pretty old one. Please use latest ant version and try again.
Regards
Piotr
On 11/9/06, kauu <[EMAIL PROTECTED]> wrote:
hi :
I have a problem now: I can't build nutch on linux with ant
and my ant version is
Apa
+1
On 10/16/06, Doug Cutting <[EMAIL PROTECTED]> wrote:
Sami Siren wrote:
> looks like somebody just enabled email-to-jira-comments-feature. I was
> just wondering would it be good to use this feature more widely.
I think it would be good. That way mailing list discussion would be
logged to th
I had a look at it and it seems I do not have enough permissions to
change it. So probably this one goes to Doug...
P.
Chris Mattmann wrote:
Hey Guys,
Speaking of which, I noticed that Sami's issue below is a "Task" in JIRA,
which reminded me of a task that I input a long time ago that would b
I am looking at some easy JIRA issues to get back into Nutch now. I have
not seen any plans for releases on the list (I might have missed
something, but I tried to at least read the nutch lists) - do we have
some plans? Do we want to make a 0.8.2 release or rather go for 0.9 in
near (lets say -
[ http://issues.apache.org/jira/browse/NUTCH-374?page=all ]
Piotr Kosiorowski resolved NUTCH-374.
-
Fix Version/s: 0.9.0
Resolution: Fixed
Committed. Thanks.
> when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip
[ http://issues.apache.org/jira/browse/NUTCH-374?page=all ]
Piotr Kosiorowski reassigned NUTCH-374:
---
Assignee: Piotr Kosiorowski
> when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip
> or x-gzip , it can not fet
I like the Hadoop version of the workflow. I do not think that we would have
problems with reopening, as issues would not be closed immediately
after resolving them. In some extreme situations one can always open a
new bug that references the closed one.
Piotr
On 9/1/06, Chris Mattmann <[EMAIL PROTECTED]> wr
No objections from me. We waited long and we can fix things in a
maintenance release in a few weeks.
Regards
Piotr
On 7/26/06, Sami Siren <[EMAIL PROTECTED]> wrote:
Andrzej Bialecki wrote:
> Sami Siren wrote:
>
>> There is a package available for testing in
>> http://people.apache.org/~siren/nutch-0
I think I would log in both situations, but with a different message.
+1
P.
On 7/21/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
Hi Developers,
another thing in the discussion to be more polite.
I suggest that we log a message in case a requested URL was blocked
by a robots.txt.
Optimal would be if
+1.
P.
Andrzej Bialecki wrote:
Sami Siren wrote:
How would folks feel about releasing 0.8 now, there have been quite a
lot of improvements/new features
since 0.7 series and I strongly feel that we should push the first 0.8
series release (alpha/beta)
out the door now. It would IMO lower the barri
,
is there a reason why this (among other) documentation (for all relevant
versions)
could not be maintained in trunk?
--
Sami Siren
Piotr Kosiorowski wrote:
Andrzej Bialecki wrote:
+1, yes it would be really confusing. Since there are more and more
people trying 0.8, could we perhaps
it so many times that I want to cross check).
Regards
Piotr
Dawid Weiss wrote:
What kind of problems? If you need something, let me know.
D.
Piotr Kosiorowski wrote:
I got some problems while applying Dawid's clustering patch (my linux
environment looks not to be set up correctly) - but I switched
I got some problems while applying Dawid's clustering patch (my linux
environment looks not to be set up correctly) - but I switched to cygwin
and it looks ok. I will try to commit it today/tomorrow.
Regards
Piotr
On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote:
> Hi Guys,
>
> Any progress on th
Anton Potehin wrote:
Where is the mapred branch of nutch located now?
It is developed in trunk now.
P.
Hello,
I was looking at the cross-referenced code generation but it looks
like the package I found mentioned in PMD context is JXR - and this is
part of maven, I suspect. As we are using ant for builds I would
not like to mix these two systems. Do you know any other source
cross-referen
Jérôme Charron wrote:
2) We do have oro 2.0.7 in dependencies (I think urlfilter and similar
things). PMD requires oro 2.0.8. Do you think we can upgrade (as far
as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one
oro jar then.
Piotr, please keep oro-2.0.8 in pmd-ext
I th
.8. Do you think we can upgrade (as far
as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one
oro jar then.
So happy PMD-ing,
Piotr
Doug Cutting wrote:
Piotr Kosiorowski wrote:
I will make it a totally separate target (so tests do not
depend on it).
That was actually Doug
Hello Christopher,
I personally do not like combining logging with severe error handling
but it is one of the features of Nutch for some time and I do not think
it causes infinite loops in normal installations. Changing it as we are
preparing to release a new version is not a good idea in my op
Doug Cutting wrote:
So we start out committing it as an independent target, and then add it
to the "test" target? Is that the plan? If so, +1.
Exactly - I will do it over the weekend.
P.
Doug Cutting wrote:
Piotr, would you like to make this release, or should I?
I would prefer that you do it this time - I am not sure if I can find
some time next week. I would like to do some things before release though:
1) Commit clustering patch from Dawid (I took it over from Andrzej).
2
>
>
> > I will make it a totally separate target (so tests do not
> > depend on it).
>
> That was actually Doug's idea (and I agree with it) to stop the build
> file if PMD complains about something. It's similar to testing -- if
> your tests fail, the entire build file fails.
>
I totally agree with i
I do agree with Jérôme - plugins should be checked too.
I would like to integrate PMD for core and plugins over the weekend based on
Dawid's work - I will make it a totally separate target (so tests do not
depend on it).
The goal is to allow other developers to play with pmd easily but at the
sam
&atid=479921&aid=1465574&group_id=56262
D.
Piotr Kosiorowski wrote:
+1 - I offer my help - we can coordinate it and I can do a part of work. I
will also try to commit your patches quickly.
Piotr
On 4/6/06, Dawid Weiss <[EMAIL PROTECTED]> wrote:
Other options (raised on the
+1 - I offer my help - we can coordinate it and I can do a part of work. I
will also try to commit your patches quickly.
Piotr
On 4/6/06, Dawid Weiss <[EMAIL PROTECTED]> wrote:
>
>
> > Other options (raised on the Hadoop list) are Checkstyle:
>
> PMD seems to be the best choice for an Apache proje
somewhere else, as
certainly the http post functionality might prove useful for other
things.
-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Thursday, March 09, 2006 8:43 PM
To: nutch-dev@lucene.apache.org
Subject: Re: Nutch 0.7.2
Piotr Kosiorowski wrote:
I found an
[ http://issues.apache.org/jira/browse/NUTCH-117?page=all ]
Piotr Kosiorowski closed NUTCH-117:
---
Fix Version: 0.7.2-dev
Resolution: Fixed
Assign To: Piotr Kosiorowski
Applied fix by Mike. Also reported offlist by Michal Karwanski
[ http://issues.apache.org/jira/browse/NUTCH-14?page=all ]
Piotr Kosiorowski closed NUTCH-14:
--
Resolution: Cannot Reproduce
Closed according to Stefan's suggestion
> NullPointerException NutchBean.getSumm
[ http://issues.apache.org/jira/browse/NUTCH-96?page=all ]
Piotr Kosiorowski closed NUTCH-96:
--
Fix Version: 0.7.2-dev
Resolution: Duplicate
Assign To: Piotr Kosiorowski
Duplicate of NUTCH-117.
> MapFile.Writer throws directory exi
[ http://issues.apache.org/jira/browse/NUTCH-94?page=all ]
Piotr Kosiorowski closed NUTCH-94:
--
Fix Version: 0.7.2-dev
Resolution: Duplicate
Assign To: Piotr Kosiorowski
Duplicate of NUTCH-117.
> MapFile.Writer throwing 'Fil
[ http://issues.apache.org/jira/browse/NUTCH-165?page=all ]
Piotr Kosiorowski closed NUTCH-165:
---
Resolution: Won't Fix
NutchBean is cached so I am closing this issue. Please reopen if you feel it
needs further explanation/investig
[ http://issues.apache.org/jira/browse/NUTCH-239?page=all ]
Piotr Kosiorowski closed NUTCH-239:
---
Fix Version: 0.7.2-dev
Resolution: Fixed
Assign To: Piotr Kosiorowski
Applied with JavaDoc changes. Thanks.
> I changed httpclient to
[ http://issues.apache.org/jira/browse/NUTCH-91?page=all ]
Piotr Kosiorowski closed NUTCH-91:
--
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
Committed with a small extension. Thanks.
> empty encoding causes except
Upps, sorry for ignoring this discussion - I was looking for comments in
JIRA and already committed the change before reading your discussion.
My motivation is to have a usable version of the tutorial - as simple as it is
possible to be versioned with the sources - only for historical purposes
- if so
[ http://issues.apache.org/jira/browse/NUTCH-225?page=all ]
Piotr Kosiorowski closed NUTCH-225:
---
Resolution: Won't Fix
I have just updated Nutch Web site. It contains now both tutorials (for 0.7 and
0.8).
I have also added a note to
Hello,
I would like to release nutch 0.7.2 in a week or two. Some serious
bugfixes are already covered and I have a plan to fix one or two more.
I found an email from Doug with title "[Fwd: Crawler submits forms?]"
stating: "This has been fixed in the mapred branch, but that patch is
not in 0.
Hi,
I have updated site in 0.7 branch with latest trunk changes. I have
added both tutorials to the site so people will be aware of differences.
I have also committed DOAP file in 0.7 branch.
Nutch Website uses branch-0.7 now.
Piotr
[
http://issues.apache.org/jira/browse/NUTCH-225?page=comments#action_12369405 ]
Piotr Kosiorowski commented on NUTCH-225:
-
As stated in another thread I prefer to have a simple tutorial kept in version
control with releases.
We already have a
Personally I would like to have a "stable" minimal tutorial kept in version
control and tagged with releases. But feel free to copy the contents to the Wiki
and improve it - we will have extended version there.
regards
Piotr
On 3/7/06, Matthias Jaekle <[EMAIL PROTECTED]> wrote:
>
> > I can add both tut
Andrzej Bialecki wrote:
+1, yes it would be really confusing. Since there are more and more
people trying 0.8, could we perhaps include a short note that 0.8 and
later is NOT compatible with this tutorial, and a reference to the
tutorial for 0.8 (or the trunk/ branch in general)?
I can ad
Hi,
It looks like Nutch web site was updated with site built from latest
trunk - the only problem is it contains tutorial for unreleased (yet)
version 0.8. I think we talked about it and agreed to keep tutorial for
latest release on the Web. I have just updated site in svn (branch-0.7)
with la
[
http://issues.apache.org/jira/browse/NUTCH-79?page=comments#action_12364496 ]
Piotr Kosiorowski commented on NUTCH-79:
I think it should work without changes I suggested in previous comment - they
would be simply useful additions.
I was not using
[ http://issues.apache.org/jira/browse/NUTCH-45?page=all ]
Piotr Kosiorowski closed NUTCH-45:
--
Fix Version: 0.7.2-dev
Resolution: Fixed
Applied. Thanks.
> Log corrupt segments in SegmentMergeT
[ http://issues.apache.org/jira/browse/NUTCH-174?page=all ]
Piotr Kosiorowski closed NUTCH-174:
---
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
Fixed some time ago during preparation of 0.7.2 release. Please use version
It fails on my machine on parse-ext tests. I am not sure what is causing
it yet and I am afraid I do not have time to investigate it today -
maybe in few days. I did a small change to make it compile a few days
ago, but all tests went ok before I committed it.
Regards
Piotr
Stefan Groschupf wro
[ http://issues.apache.org/jira/browse/NUTCH-142?page=all ]
Piotr Kosiorowski closed NUTCH-142:
---
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
> NutchConf should use the thread context classloa
Andrzej,
Do you think it would be a good idea to commit it in 0.7 branch for
0.7.2 release? I personally prefer to use released libraries instead of
RC if possible. It does not require a lot of changes and you have
already tested it with existing code...
Piotr
[EMAIL PROTECTED] wrote:
Author
+1 in general
In fact I like the approach presented by Stefan to pass only required
parameters to objects that have small number of configurable params
instead of NutchConf - it makes it obvious which parameters are required
for such basic objects to run and as they are usually building blocks
[
http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361549 ]
Piotr Kosiorowski commented on NUTCH-138:
-
BTW - just create a user for yourself in nutch Wiki and you should be able to add
a new page with information without problems
[ http://issues.apache.org/jira/browse/NUTCH-138?page=all ]
Piotr Kosiorowski closed NUTCH-138:
---
Resolution: Invalid
Setting URIEncoding in tomcat config file fixes the problem.
> non-Latin-1 characters cannot be submitted for sea
[
http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361520 ]
Piotr Kosiorowski commented on NUTCH-138:
-
I am not sure but I would suspect it is a problem of bad tomcat configuration.
To handle special characters in query urls
[
http://issues.apache.org/jira/browse/NUTCH-142?page=comments#action_12361492 ]
Piotr Kosiorowski commented on NUTCH-142:
-
Thanks. Fixed in 0.7 branch. Left open to fix it in trunk after cleaning trunk
JUnit test problems (in next few days
AJ Chen wrote:
It would be great if I can add some new functions to the nutch code to
accomplish this. But, if it requires to customize lucene code, that's
fine. I have tried to use the most recent release (1.4.3) of lucene
source code, but it did not work. Are the lucene jar files included in
Andrzej Bialecki wrote:
Hi,
I just committed a large patch to cleanup the trunk/ of obsolete and
broken classes remaining from the 0.7.x development line. Please test
that things still work as they should ...
Hi,
I am not sure what is wrong but a lot of JUnit tests simply do not
compile -
[ http://issues.apache.org/jira/browse/NUTCH-42?page=all ]
Piotr Kosiorowski closed NUTCH-42:
--
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
OpenSearch implemented.
> enhance search.jsp such that it can also returns
[ http://issues.apache.org/jira/browse/NUTCH-147?page=all ]
Piotr Kosiorowski closed NUTCH-147:
---
Resolution: Invalid
The cygwin requirement on Windows is listed in the nutch tutorial. Please reopen if
problems persist after using it from cygwin
[ http://issues.apache.org/jira/browse/NUTCH-148?page=all ]
Piotr Kosiorowski closed NUTCH-148:
---
Resolution: Invalid
> org.apache.nutch.tools.CrawlTool throws error while doing deleteduplica
[
http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361206 ]
Piotr Kosiorowski commented on NUTCH-148:
-
'df' command is required for NDFS operation so if you were not using NDFS in
0.7.1 and nutch shell scripts you we
[
http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361128 ]
Piotr Kosiorowski commented on NUTCH-148:
-
Do you have Cygwin installed?
Is 'df' working in your cygwin installation?
Do you run crawl from cygwin she
+1 - especially for amount of support Stefan gives to nutch users.
P.
Andrzej Bialecki wrote:
Hi,
During the past year and more Stefan participated actively in the
development, and contributed many high-quality patches. He's been
spending considerable effort on addressing many issues in JIRA, an
Doug Cutting wrote:
[EMAIL PROTECTED] wrote:
+/*
+ * (non-Javadoc)
+ *
+ * @see org.apache.nutch.io.Writable#write(java.io.DataOutput)
+ */
+public final void write(DataOutput out) throws IOException {
We should either include javadoc or not. In general, all publi
Doug Cutting wrote:
Andrzej Bialecki wrote:
Please also don't forget that the trunk/ will soon be invaded by the
code from mapred, I guess some time around the middle of January (Doug?)
Thinking about this more, perhaps we should do it sooner. There's
already a branch for 0.7.x releases,
Hi,
I have problems with JUnit tests in trunk and mapred branches.
TestFetcher fails in both branches. The same test executes correctly in
0.7 branch.
Is it only my problem (environment setup) or others are having it too?
I would suspect some changes in redirect handling
Regards
Piotr
Marko Bauhardt wrote:
NUTCH-141 jobdetails.jsp doesnt work on webbrowser "safari"
+1
:-)
Marko.
I have just fixed NUTCH-141 in all branches so we do not concentrate on
obvious things.
I have one additional thing - majority of issues people vote for in this
thread are mapred related. I th
[ http://issues.apache.org/jira/browse/NUTCH-141?page=all ]
Piotr Kosiorowski closed NUTCH-141:
---
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
Fixed in all branches. Thanks.
> jobdetails.jsp doesnt work on webbrow
+1 - I wanted to suggest exactly this approach - but we should try to keep
in mind not to introduce new features without serious reason (especially not
backward compatible ones).
Piotr
On 12/14/05, Jérôme Charron <[EMAIL PROTECTED]> wrote:
>
> > What people think if we collect a list of issues and
If we are going to make 0.7.2 release I would like to commit
a patch for http://issues.apache.org/jira/browse/NUTCH-112
and probably for some build problems people are reporting (missing src
folder in nutch-extension plugin).
I will look at them in next few days.
Regards
Piotr
Stefan Groschupf w
Hi,
I started to think about implementing special kind of Lucene Query (if I
remember correctly I would have to write my own Scorer and probably a few
other classes) optimized for Nutch some time ago. I assumed having
specialized query I would be able to avoid accessing some of lucene index
structu
Jérôme Charron wrote:
[...]
build a list of file extensions to include (other ones will be excluded) in
the fetch process.
[...]
I would not like to exclude all others - as for example many extensions
are valid for html - especially dynamically generated pages (jsp,asp,cgi
just to name the easy
Hi Adriano,
I have your previous email on my TODO list. I had no time to commit it
yet -> are there any changes from the previous version?
Regards
Piotr
[EMAIL PROTECTED] wrote:
Hi,
I hope that we publish my Italian translation of Nutch.
It is possible translate also the homepage of the
Hello,
I do agree with Andrzej. I do not see it as a solution for
parse-html. But a generic XML plugin may be of some use for some
people (even if not for me).
Regards
Piotr
Andrzej Bialecki wrote:
Stefan Groschupf wrote:
[...]
Gentlemen, please let's keep a civilized tone to this
, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> Piotr Kosiorowski wrote:
>
> >On 11/22/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> >
> >
> >>Hi,
> >>
> >>I've been profiling a Nutch installation, and to my surprise the largest
On 11/22/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
>
> Hi,
>
> I've been profiling a Nutch installation, and to my surprise the largest
> amount of throwaway allocations and the most time spent was not in Nutch
> specific code, or IPC, but in Lucene ConjunctionScorer.doNext() method.
> This m
[ http://issues.apache.org/jira/browse/NUTCH-99?page=all ]
Piotr Kosiorowski closed NUTCH-99:
--
Resolution: Fixed
Patch committed. Thanks Stefan.
> ports are hardcoded or random
> -
>
> K
[
http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357614 ]
Piotr Kosiorowski commented on NUTCH-99:
I think Doug meant that we should have:
} catch (BindException e) {
instead of generic:
} catch (Exception e) {
And I agree
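The difference between the two catch clauses can be sketched as follows. This is only an illustration, not the code from the NUTCH-99 patch: the class PortProbe and the method firstFreePort are hypothetical names, and the port-scanning loop is an assumption about what the surrounding code might do.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

public class PortProbe {
    /**
     * Hypothetical helper: tries ports from start to end (inclusive) and
     * returns the first one that can be bound, or -1 if all are taken.
     */
    public static int firstFreePort(int start, int end) throws IOException {
        for (int port = start; port <= end; port++) {
            try (ServerSocket socket = new ServerSocket(port)) {
                // Binding succeeded; the socket is closed again on return.
                return socket.getLocalPort();
            } catch (BindException e) {
                // Only a failed bind means "try the next port".
                // Any other IOException propagates to the caller
                // instead of being silently swallowed.
            }
        }
        return -1;
    }
}
```

With a generic catch (Exception e), unrelated failures would be treated as "port busy" as well, which hides real errors.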
EM wrote:
202443 Pages consumed: 13 (at index 13). Links fetched: 233386.
202443 Suspicious outlink count = 30442 for [http://www.dmoz.org/].
202444 Pages consumed: 135000 (at index 135000). Links fetched: 272315.
If there is maxoutlinks already specified in the xml config, why does
nut
[ http://issues.apache.org/jira/browse/NUTCH-107?page=all ]
Piotr Kosiorowski closed NUTCH-107:
---
Fix Version: 0.7.2-dev
0.8-dev
Resolution: Fixed
Assign To: Piotr Kosiorowski
Fixed in trunk and 0.7 branch. url-prefix
Committed in trunk and branch-0.7 (just in case if we decide to make a
0.7.2 release sometime).
Thanks
Piotr
On 10/11/05, Stefan Groschupf <[EMAIL PROTECTED]> wrote:
>
> Hi,
> don't think I'm fuddy-duddy but is it really sensible to do the following
> in the nutchbean?
>
> File [] directories = fs.lis
Hello,
I have prepared Nutch 0.7.1 release today but I had one problem. I was
updating the site in branch but to deploy it one must use the version
from trunk. Currently I simply committed generated site in trunk but
this solution is far from perfect.
Should we have version independent site ->
Have a look at http://issues.apache.org/jira/browse/NUTCH-48. I think an ngram
based approach is appropriate here. I was using it in our search engine.
Regards
Piotr
On 9/29/05, Jack Tang <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> I really like Google's "Did you mean" and I notice that nutch now
> does n
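To give an idea of what an ngram based approach to "Did you mean" can look like, here is a minimal sketch. It is not the implementation attached to NUTCH-48: the class NgramSuggester, the choice of character bigrams and the Dice coefficient as the similarity measure are all assumptions made for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class NgramSuggester {
    /** Character n-grams of a word, padded with '_' so word boundaries count. */
    static Set<String> ngrams(String word, int n) {
        String padded = "_" + word.toLowerCase() + "_";
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    /** Dice coefficient over the two n-gram sets: 2 * |common| / (|A| + |B|). */
    static double similarity(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() || gb.isEmpty()) {
            return 0.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    /** Returns the dictionary term most similar to the query, or null if nothing overlaps. */
    public static String suggest(String query, List<String> dictionary) {
        String best = null;
        double bestScore = 0.0;
        for (String term : dictionary) {
            double score = similarity(query, term, 2);
            if (score > bestScore) {
                bestScore = score;
                best = term;
            }
        }
        return best;
    }
}
```

A real search engine would draw the dictionary from the index terms and probably weight candidates by term frequency; the linear scan here is only meant to show the scoring idea.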
Hi,
I am not sure what you mean by "injecting content into Nutch" but to
create a segment you can use SegmentWriter class. To update WebDB -
IWebDBWriter interface might be useful. The best place to learn about
what kind of data is stored in segment is probably fetcher code.
Regards
Piotr
Gol
[ http://issues.apache.org/jira/browse/NUTCH-89?page=all ]
Piotr Kosiorowski closed NUTCH-89:
--
Fix Version: 0.8-dev
0.7
Resolution: Fixed
Applied in trunk and 0.7 branch. Thanks.
> parse-rss null pointer except
[
http://issues.apache.org/jira/browse/NUTCH-95?page=comments#action_12330113 ]
Piotr Kosiorowski commented on NUTCH-95:
I was renaming segments quite often so I would vote for reading the date from
the segment instead of using dir name
Hello,
As it looks like everything that was planned has been committed to the 0.7 branch, I would
like to prepare a 0.7.1 release in the next few days. I will change the branch name
at the same time to comply with the agreed standard.
Any objections?
Regards
Piotr
Hi Andrzej,
Is anything related to clustering commits left? Or should we proceed
with 0.7.1 release?
Piotr
[EMAIL PROTECTED] wrote:
Author: ab
Date: Mon Sep 19 07:11:07 2005
New Revision: 290163
URL: http://svn.apache.org/viewcvs?rev=290163&view=rev
Log:
Update of the clustering plugin, contri
Daniele Menozzi wrote:
ok, so the depth value is only used to stop the crawling at a certain
point, and proceed with the indexing, right?
Yes - depth in fact means the number of iterations of the
generate/fetch/update cycle.
But, another thing: how can I refresh old pages? What class do I have to
bin/nutch updatedb db $s1
command updates WebDB with links you fetched in segment $s1.
Regards
Piotr
Daniele Menozzi wrote:
Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I have
not really understood what the relationship is between
depth, segments, fetching..
Take for ex
Hello Andrzej,
You can also try http://issues.apache.org/jira/browse/NUTCH-79
- I think it should also help here - it is a bit complicated as it
contains additional functionality but if you have any problems I am
willing to help. I am going to perform some tests of it again and maybe
commit it in
Hello,
I have changed the name of the tag directory according to the naming convention
agreed earlier. I am waiting with the branch name change for the 0.7.1 release
which should happen in a few days if only Andrzej is able to commit
changes to clustering plugin (if not I suggest to wait for these changes
as it
Hello,
You cannot do it. These structures were not designed for it. But you can
copy all the data to another ArrayFile skipping entries you want to delete.
Regards
Piotr
On 9/6/05, Ben <[EMAIL PROTECTED]> wrote:
>
> Hi
>
> How can I delete an entry in the ArrayFile/MapFile if I know the id/key?
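The copy-and-skip idea from the reply above can be sketched like this. The sketch deliberately does not use the Nutch ArrayFile/MapFile classes: a java.util.TreeMap stands in for a sorted key/value file, and the class CopySkipping and the method copyWithout are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

public class CopySkipping {
    /**
     * Copies every entry of source into a new sorted map, except the keys
     * listed in toDelete. The source itself is never modified, mirroring a
     * file format that does not support deleting entries in place.
     */
    public static <K extends Comparable<K>, V> SortedMap<K, V> copyWithout(
            SortedMap<K, V> source, Set<K> toDelete) {
        SortedMap<K, V> target = new TreeMap<>();
        for (Map.Entry<K, V> entry : source.entrySet()) {
            if (!toDelete.contains(entry.getKey())) {
                // Entries are visited in key order, so the copy stays sorted.
                target.put(entry.getKey(), entry.getValue());
            }
        }
        return target;
    }
}
```

With real files the same loop would read from the old file and append to a new one, and the new file would replace the old one once the copy is complete.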
Hello Jerome,
It looks like changes to the language identifier caused the language identifier
test to fail on Windows again. If no charset is given it assumes default
platform encoding but test files are probably "UTF-8" based. I have
changed TestLanguageIdentifier.testIdentify() method to use
Strin
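The exact change made to TestLanguageIdentifier is cut off above, so the following is only a generic sketch of the underlying problem: bytes that were written as UTF-8 must be decoded with an explicitly named charset rather than the platform default. The class ExplicitCharset and its decode method are hypothetical, and ISO-8859-1 merely stands in for a non-UTF-8 default such as the one on a Windows machine.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharset {
    /** Decodes bytes with an explicitly named charset instead of the platform default. */
    public static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // "G\u00fcney" contains one non-ASCII character (u with umlaut).
        byte[] utf8 = "G\u00fcney".getBytes(StandardCharsets.UTF_8);

        // Decoding with the charset the bytes were written in restores the text.
        String right = decode(utf8, StandardCharsets.UTF_8);

        // Decoding with a different single-byte charset splits the two-byte
        // UTF-8 sequence into two separate characters.
        String wrong = decode(utf8, StandardCharsets.ISO_8859_1);

        System.out.println(right.equals("G\u00fcney"));
        System.out.println(right.length() + " vs " + wrong.length());
    }
}
```

A test that relies on new String(bytes) without a charset argument passes or fails depending on the machine it runs on, which matches the Windows-only failure described in the message.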