Re: [jira] Created: (NUTCH-680) Update external jars to latest versions

2009-01-20 Thread Piotr Kosiorowski
't use anything. >> >> > Hadoop uses pmd integrate in Hudson. >> >> > >> >> >> >> Does this mean we do not need pmd jars in nutch (are they provided by >> hudson)? >> >> >> >> > Otis

Re: [jira] Created: (NUTCH-680) Update external jars to latest versions

2009-01-20 Thread Piotr Kosiorowski
>> >>> > Otis >>> > -- >>> > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch >>> > >>> > >>> > >>> > - Original Message >>> >> From: Doğacan Güney >>> >> To: nutch-de

Re: [jira] Created: (NUTCH-680) Update external jars to latest versions

2009-01-20 Thread Piotr Kosiorowski
pmd-ext contains PMD (http://pmd.sourceforge.net/) libraries. I committed them a long time ago in an attempt to bring some static analysis tools to nutch sources. There was a short discussion around it and we all thought it was worth doing but it never gained enough momentum. There is a pmd ta

Re: Welcome Dennis Kubes as Nutch committer

2007-03-06 Thread Piotr Kosiorowski
Congratulations and welcome, Piotr On 3/5/07, Dennis Kubes <[EMAIL PROTECTED]> wrote: OK. I finally figured out how to republish the site. Only took me 3 days. Feeling hazed now! :) Dennis Kubes Sami Siren wrote: > Welcome on board Dennis! > > -- > Sami Siren > > Dennis Kubes wrote: >> Hi

Re: FW: Nutch release process help

2007-03-06 Thread Piotr Kosiorowski
Chris, I have documented the process in the wiki. Doug has sent the links already. If you have any questions I would be willing to help. I can even do it myself if you find it difficult - I simply do not want to be the bottleneck as I am behind my schedule at work and in private life. I still hope I

Re: Reviving Nutch 0.7

2007-01-22 Thread Piotr Kosiorowski
Otis, Some time ago people on the list said that they are willing to at least maintain Nutch 0.7 branch. As a committer (not very active recently) I volunteered to commit patches when they appear - I do not have enough time at the moment to do active coding. I have created a 0.7.3 release in JIRA so

[jira] Closed: (NUTCH-429) Secured Searches

2007-01-11 Thread Piotr Kosiorowski (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Piotr Kosiorowski closed NUTCH-429. --- Resolution: Invalid Please use nutch-user mailing list for such questions and JIRA for

Re: 0.7.3 version

2006-11-23 Thread Piotr Kosiorowski
As no objections were raised I created a 0.7.3 version in JIRA so we can start assigning current JIRA issues to it. Regards Piotr Piotr Kosiorowski wrote: Hello committers, Based on a recent discussion on nutch user list - (Strategic Direction of Nutch) I would like to prepare 0.7.3 release

0.7.3 version

2006-11-16 Thread Piotr Kosiorowski
Hello committers, Based on a recent discussion on nutch user list - (Strategic Direction of Nutch) I would like to prepare 0.7.3 release. The idea is to allow people who still use 0.7.2 to get rid of most important bugs and allow them to add some small features they would need as the claim is 0.8.

Re: How to start working with MapReduce?

2006-11-11 Thread Piotr Kosiorowski
Please read the tutorial on nutch site. I suggest posting such issues to nutch-user - you will have much higher chance of getting useful response there. regards Piotr On 11/9/06, kauu <[EMAIL PROTECTED]> wrote: or it's the same with the version 0.8.x any idea is appreciated On 11/9/06, kauu <[EMA

Re: why can't build in the Linux with ant

2006-11-11 Thread Piotr Kosiorowski
I think it might be a problem with ant version - it seems that you have pretty old one. Please use latest ant version and try again. Regards Piotr On 11/9/06, kauu <[EMAIL PROTECTED]> wrote: hi : i get a problem now ,i can't build the nutch in the linux os with ant and my ant version is Apa

Re: email to jira comments (WAS Re: [jira] Commented: (NUTCH-339) Refactor nutch to allow fetcher improvements)

2006-10-16 Thread Piotr Kosiorowski
+1 On 10/16/06, Doug Cutting <[EMAIL PROTECTED]> wrote: Sami Siren wrote: > looks like somebody just enabled email-to-jira-comments-feature. I was > just wondering would it be good to use this feature more widely. I think it would be good. That way mailing list discussion would be logged to th

Re: Nutch requires JDK 1.5 now?

2006-10-03 Thread Piotr Kosiorowski
I had a look at it and it seems I do not have enough permissions to change it. So probably this one goes to Doug... P. Chris Mattmann wrote: Hey Guys, Speaking of which, I noticed that Sami's issue below is a "Task" in JIRA, which reminded me of a task that I input a long time ago that would b

Re: svn commit: r451649 - /lucene/nutch/trunk/CHANGES.txt

2006-09-30 Thread Piotr Kosiorowski
I am looking at some easy JIRA issues to get back into Nutch now. I have not seen any plans for releases on the list (I might have missed something, but I tried to at least read the nutch lists) - do we have some plans? Do we want to make a 0.8.2 release or rather go for 0.9 in near (let's say -

[jira] Resolved: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing.

2006-09-30 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-374?page=all ] Piotr Kosiorowski resolved NUTCH-374. - Fix Version/s: 0.9.0 Resolution: Fixed Committed. Thanks. > when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip

[jira] Assigned: (NUTCH-374) when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip or x-gzip , it can not fetch any thing.

2006-09-30 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-374?page=all ] Piotr Kosiorowski reassigned NUTCH-374: --- Assignee: Piotr Kosiorowski > when http.content.limit be set to -1 and Response.CONTENT_ENCODING is gzip > or x-gzip , it can not fet

Re: Patch Available status?

2006-09-01 Thread Piotr Kosiorowski
I like Hadoop version of workflow. I do not think that we would have problems with reopening as issues would not be closed immediately after resolving them. In some extreme situations one can always open a new bug that references closed one. Piotr On 9/1/06, Chris Mattmann <[EMAIL PROTECTED]> wr

Re: 0.8 release

2006-07-27 Thread Piotr Kosiorowski
No objections from me. We waited long and we can fix things in maintenance release in few weeks. Regards Piotr On 7/26/06, Sami Siren <[EMAIL PROTECTED]> wrote: Andrzej Bialecki wrote: > Sami Siren wrote: > >> There is a package available for testing in >> http://people.apache.org/~siren/nutch-0

Re: log when blocked by robots.txt

2006-07-20 Thread Piotr Kosiorowski
I think I would log in both situations but with a different message. +1 P. On 7/21/06, Stefan Groschupf <[EMAIL PROTECTED]> wrote: Hi Developers, another thing in the discussion to be more polite. I suggest that we log a message in case a requested URL was blocked by a robots.txt. Optimal would be if

Re: 0.8 release

2006-07-04 Thread Piotr Kosiorowski
+1. P. Andrzej Bialecki wrote: Sami Siren wrote: How would folks feel about releasing 0.8 now, there have been quite a lot of improvements/new features since 0.7 series and I strongly feel that we should push the first 0.8 series release (alpha/beta) out the door now. It would IMO lower the barri

Re: Nutch web site

2006-07-04 Thread Piotr Kosiorowski
, is there a reason why this (among other) documentation (for all relevant versions) could not be maintained in trunk? -- Sami Siren Piotr Kosiorowski wrote: Andrzej Bialecki wrote: +1, yes it would be really confusing. Since there are more and more people trying 0.8, could we perhaps

Re: 0.8 release?

2006-04-13 Thread Piotr Kosiorowski
it so many times that I want to cross check). Regards Piotr Dawid Weiss wrote: What kind of problems? If you need something, let me know. D. Piotr Kosiorowski wrote: I got some problems while applying Dawid clustering patch (my linux environment looks not to be set up correctly) - but I switched

Re: 0.8 release?

2006-04-12 Thread Piotr Kosiorowski
I got some problems while applying Dawid clustering patch (my linux environment looks not to be set up correctly) - but I switched to cygwin and it looks ok. I will try to commit it today/tomorrow. Regards Piotr On 4/12/06, Chris Mattmann <[EMAIL PROTECTED]> wrote: > Hi Guys, > > Any progress on th

Re: mapred branch

2006-04-10 Thread Piotr Kosiorowski
Anton Potehin wrote: Where is the mapred branch of nutch placed now? It is developed in trunk now. P.

Re: PMD integration

2006-04-09 Thread Piotr Kosiorowski
Hello, I was looking at the cross-referenced code generation but it looks like the package I found mentioned in PMD context is JXR - and this is the part of maven as I suspect. As we are using ant for builds I would not like to mix these two systems. Do you know any other source cross-referen

Re: PMD integration

2006-04-09 Thread Piotr Kosiorowski
Jérôme Charron wrote: 2) We do have oro 2.0.7 in dependencies (I think urlfilter and similar things). PMD requires oro - 2.0.8. Do you think we can upgrade (as far as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one oro jar then. Piotr, please keep oro-2.0.8 in pmd-ext I th

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
.8. Do you think we can upgrade (as far as I know 2.0.7 and 2.0.8 should be compatible)? We would have only one oro jar then. So happy PMD-ing, Piotr Doug Cutting wrote: Piotr Kosiorowski wrote: I will make it totally separate target (so test do not depend on it). That was actually Doug

Re: Patch to remove Nutch formating from logs

2006-04-07 Thread Piotr Kosiorowski
Hello Christopher, I personally do not like combining logging with severe error handling but it is one of the features of Nutch for some time and I do not think it causes infinite loops in normal installations. Changing it as we are preparing to release a new version is not a good idea in my op

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
Doug Cutting wrote: So we start out committing it as an independent target, and then add it to the "test" target? Is that the plan? If so, +1. Exactly - I will do it over the weekend. P.

Re: 0.8 release schedule (was Re: latest build throws error - critical)

2006-04-07 Thread Piotr Kosiorowski
Doug Cutting wrote: Piotr, would you like to make this release, or should I? I would prefer you would do it this time - I am not sure if I can find some time next week. I would like to do some things before release though: 1) Commit clustering patch from Dawid (I took it over from Andrzej). 2

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
> > > > I will make it totally separate target (so test do not > > depend on it). > > That was actually Doug's idea (and I agree with it) to stop the build > file if PMD complains about something. It's similar to testing -- if > your tests fail, the entire build file fails. > I totally agree with i

Re: PMD integration

2006-04-07 Thread Piotr Kosiorowski
I do agree with Jérôme - plugins should be checked too. I would like to integrate PMD for core and plugins over the weekend based on Dawid's work - I will make it totally separate target (so tests do not depend on it). The goal is to allow other developers to play with pmd easily but at the sam

PMD integration (was: Re: Add ".settings" to svn:ignore on root Nutch folder?)

2006-04-06 Thread Piotr Kosiorowski
&atid=479921&aid=1465574&group_id=56262 D. Piotr Kosiorowski wrote: +1 - I offer my help - we can coordinate it and I can do a part of work. I will also try to commit your patches quickly. Piotr On 4/6/06, Dawid Weiss <[EMAIL PROTECTED]> wrote: Other options (raised on the

Re: Add ".settings" to svn:ignore on root Nutch folder?

2006-04-06 Thread Piotr Kosiorowski
+1 - I offer my help - we can coordinate it and I can do a part of work. I will also try to commit your patches quickly. Piotr On 4/6/06, Dawid Weiss <[EMAIL PROTECTED]> wrote: > > > > Other options (raised on the Hadoop list) are Checkstyle: > > PMD seems to be the best choice for an Apache proje

Re: Nutch 0.7.2

2006-03-25 Thread Piotr Kosiorowski
somewhere else, as certainly the http post functionality might prove useful for other things. -Original Message- From: Doug Cutting [mailto:[EMAIL PROTECTED] Sent: Thursday, March 09, 2006 8:43 PM To: nutch-dev@lucene.apache.org Subject: Re: Nutch 0.7.2 Piotr Kosiorowski wrote: I found an

[jira] Closed: (NUTCH-117) Crawl crashes with java.io.IOException: already exists: C:\nutch\crawl.intranet\oct18\db\webdb.new\pagesByURL

2006-03-25 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-117?page=all ] Piotr Kosiorowski closed NUTCH-117: --- Fix Version: 0.7.2-dev Resolution: Fixed Assign To: Piotr Kosiorowski Applied fix by Mike. Also reported offlist by Michal Karwanski

[jira] Closed: (NUTCH-14) NullPointerException NutchBean.getSummary

2006-03-25 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-14?page=all ] Piotr Kosiorowski closed NUTCH-14: -- Resolution: Cannot Reproduce Closed according to Stefan's suggestion > NullPointerException NutchBean.getSumm

[jira] Closed: (NUTCH-96) MapFile.Writer throws directory exists exception if run multiple times in the same JVM or server JVM.

2006-03-25 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-96?page=all ] Piotr Kosiorowski closed NUTCH-96: -- Fix Version: 0.7.2-dev Resolution: Duplicate Assign To: Piotr Kosiorowski Duplicate of NUTCH-117. > MapFile.Writer throws directory exi

[jira] Closed: (NUTCH-94) MapFile.Writer throwing 'File exists error'.

2006-03-25 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-94?page=all ] Piotr Kosiorowski closed NUTCH-94: -- Fix Version: 0.7.2-dev Resolution: Duplicate Assign To: Piotr Kosiorowski Duplicate of NUTCH-117. > MapFile.Writer throwing 'Fil

[jira] Closed: (NUTCH-165) object pooling for nutch bean --- to impriove performance

2006-03-25 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-165?page=all ] Piotr Kosiorowski closed NUTCH-165: --- Resolution: Won't Fix NutchBean is cached so I am closing this issue. Please reopen if you feel it needs further explanation/investig

[jira] Closed: (NUTCH-239) I changed httpclient to use javax.net.ssl instead of com.sun.net.ssl

2006-03-25 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-239?page=all ] Piotr Kosiorowski closed NUTCH-239: --- Fix Version: 0.7.2-dev Resolution: Fixed Assign To: Piotr Kosiorowski Applied with JavaDoc changes. Thanks. > I changed httpclient to

[jira] Closed: (NUTCH-91) empty encoding causes exception

2006-03-09 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-91?page=all ] Piotr Kosiorowski closed NUTCH-91: -- Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed Committed with small extension. Thanks. > empty encoding causes except

Re: Tutorial

2006-03-09 Thread Piotr Kosiorowski
Oops, sorry for ignoring this discussion - I was looking for comments in JIRA and already committed the change before reading your discussion. My motivation is to have usable version of tutorial - as simple as it is possible to be versioned with the sources - only for historical purposes - if so

[jira] Closed: (NUTCH-225) Changed the links to the tutorial to point to the wiki

2006-03-09 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-225?page=all ] Piotr Kosiorowski closed NUTCH-225: --- Resolution: Won't Fix I have just updated Nutch Web site. It contains now both tutorials (for 0.7 and 0.8). I have also added a note to

Nutch 0.7.2

2006-03-09 Thread Piotr Kosiorowski
Hello, I would like to release nutch 0.7.2 in a week or two. Some serious bugfixes are already covered and I have a plan to fix one or two more. I found an email from Doug with title "[Fwd: Crawler submits forms?]" stating: "This has been fixed in the mapred branch, but that patch is not in 0.

Site switched to branch-0.7.

2006-03-09 Thread Piotr Kosiorowski
Hi, I have updated site in 0.7 branch with latest trunk changes. I have added both tutorials to the site so people will be aware of differences. I have also committed DOAP file in 0.7 branch. Nutch Website uses branch-0.7 now. Piotr

[jira] Commented: (NUTCH-225) Changed the links to the tutorial to point to the wiki

2006-03-07 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-225?page=comments#action_12369405 ] Piotr Kosiorowski commented on NUTCH-225: - As stated in another thread I prefer to have a simple tutorial kept in version control with releases. We already have a

Re: Nutch web site

2006-03-07 Thread Piotr Kosiorowski
Personally I would like to have a "stable" minimal tutorial kept in version control and tagged with releases. But feel free to copy the contents to Wiki and improve it - we will have extended version there. regards Piotr On 3/7/06, Matthias Jaekle <[EMAIL PROTECTED]> wrote: > > > I can add both tut

Re: Nutch web site

2006-03-06 Thread Piotr Kosiorowski
Andrzej Bialecki wrote: +1, yes it would be really confusing. Since there are more and more people trying 0.8, could we perhaps include a short note that 0.8 and later is NOT compatible with this tutorial, and a reference to the tutorial for 0.8 (or the trunk/ branch in general)? I can ad

Nutch web site

2006-03-06 Thread Piotr Kosiorowski
Hi, It looks like Nutch web site was updated with site built from latest trunk - the only problem is it contains tutorial for unreleased (yet) version 0.8. I think we talked about it and agreed to keep tutorial for latest release on the Web. I have just updated site in svn (branch-0.7) with la

[jira] Commented: (NUTCH-79) Fault tolerant searching.

2006-01-30 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-79?page=comments#action_12364496 ] Piotr Kosiorowski commented on NUTCH-79: I think it should work without changes I suggested in previous comment - they would be simply useful additions. I was not using

[jira] Closed: (NUTCH-45) Log corrupt segments in SegmentMergeTool

2006-01-20 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-45?page=all ] Piotr Kosiorowski closed NUTCH-45: -- Fix Version: 0.7.2-dev Resolution: Fixed Applied. Thanks. > Log corrupt segments in SegmentMergeT

[jira] Closed: (NUTCH-174) Problem encountered with ant during compilation

2006-01-14 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-174?page=all ] Piotr Kosiorowski closed NUTCH-174: --- Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed Fixed some time ago during preparation of 0.7.2 release. Please use version

Re: test suite fails?

2006-01-09 Thread Piotr Kosiorowski
It fails on my machine on parse-ext tests. I am not sure what is causing it yet and I am afraid I do not have time to investigate it today - maybe in few days. I did a small change to make it compile a few days ago, but all tests went ok before I committed it. Regards Piotr Stefan Groschupf wro

[jira] Closed: (NUTCH-142) NutchConf should use the thread context classloader

2006-01-04 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-142?page=all ] Piotr Kosiorowski closed NUTCH-142: --- Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed > NutchConf should use the thread context classloa

Re: svn commit: r365850 - in /lucene/nutch/trunk/src/plugin/protocol-httpclient: ./ lib/ src/java/org/apache/nutch/protocol/httpclient/

2006-01-04 Thread Piotr Kosiorowski
Andrzej, Do you think it would be a good idea to commit it in 0.7 branch for 0.7.2 release? I personally prefer to use released libraries instead of RC if possible. It does not require a lot of changes and you have already tested it with existing code... Piotr [EMAIL PROTECTED] wrote: Author

Re: no static NutchConf

2006-01-04 Thread Piotr Kosiorowski
+1 in general In fact I like the approach presented by Stefan to pass only required parameters to objects that have small number of configurable params instead of NutchConf - it makes it obvious which parameters are required for such basic objects to run and as they are usually building blocks

[jira] Commented: (NUTCH-138) non-Latin-1 characters cannot be submitted for search

2006-01-02 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361549 ] Piotr Kosiorowski commented on NUTCH-138: - BTW - just create user for yourself in nutch Wiki and you should be able to add a new page with information without problems

[jira] Closed: (NUTCH-138) non-Latin-1 characters cannot be submitted for search

2006-01-02 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-138?page=all ] Piotr Kosiorowski closed NUTCH-138: --- Resolution: Invalid Setting URIEncoding in tomcat config file fixes the problem. > non-Latin-1 characters cannot be submitted for sea
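The effect of the decoding charset can be illustrated outside of Tomcat. The following is only a minimal sketch, assuming the browser percent-encoded the query as UTF-8; the class `QueryDecodingDemo` and its `decode` helper are hypothetical names and are not part of Nutch. In Tomcat itself the corresponding setting is the `URIEncoding` attribute of the `<Connector>` element in `server.xml`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class: shows how the same percent-encoded query value
// decodes differently depending on the charset the server assumes.
final class QueryDecodingDemo {

    // Decodes a percent-encoded query value using the named charset.
    static String decode(String encoded, String charset) {
        try {
            return URLDecoder.decode(encoded, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // A four-letter Polish word with three non-Latin-1 letters,
        // percent-encoded as UTF-8 (assumed browser behaviour).
        String encoded = "%C5%82%C3%B3d%C5%BA";

        // Decoded as UTF-8 the value has four characters again.
        System.out.println(decode(encoded, "UTF-8").length());

        // Decoded as ISO-8859-1 every byte becomes its own character,
        // so the search term is corrupted before it reaches the query parser.
        System.out.println(decode(encoded, "ISO-8859-1").length());
    }
}
```

If the container assumes ISO-8859-1, the query is already damaged before any Nutch code sees it, which is consistent with closing the issue as a configuration problem rather than a Nutch bug.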

[jira] Commented: (NUTCH-138) non-Latin-1 characters cannot be submitted for search

2006-01-02 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-138?page=comments#action_12361520 ] Piotr Kosiorowski commented on NUTCH-138: - I am not sure but I would suspect it is a problem of bad tomcat configuration. To handle special characters in query urls

[jira] Commented: (NUTCH-142) NutchConf should use the thread context classloader

2006-01-01 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-142?page=comments#action_12361492 ] Piotr Kosiorowski commented on NUTCH-142: - Thanks. Fixed in 0.7 branch. Left open to fix it in trunk after cleaning trunk JUnit test problems (in next few days

Re: how to add additional factor at search time to ranking score

2006-01-01 Thread Piotr Kosiorowski
AJ Chen wrote: It would be great if I can add some new functions to the nutch code to accomplish this. But, if it requires to customize lucene code, that's fine. I have tried to use the most recent release (1.4.3) of lucene source code, but it did not work. Is the lucene jar files included in

Re: Mega-cleanup in trunk/

2006-01-01 Thread Piotr Kosiorowski
Andrzej Bialecki wrote: Hi, I just committed a large patch to cleanup the trunk/ of obsolete and broken classes remaining from the 0.7.x development line. Please test that things still work as they should ... Hi, I am not sure what is wrong but a lot of JUnit tests simply do not compile -

[jira] Closed: (NUTCH-42) enhance search.jsp such that it can also returns XML

2005-12-31 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-42?page=all ] Piotr Kosiorowski closed NUTCH-42: -- Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed OpenSearch implemented. > enhance search.jsp such that it can also returns

[jira] Closed: (NUTCH-147) nutch map reduce does not work in windows map reduce runs in a loop

2005-12-23 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-147?page=all ] Piotr Kosiorowski closed NUTCH-147: --- Resolution: Invalid cygwin requirement on Windows is listed in nutch tutorial. Please reopen if problems persist after using it from cygwin

[jira] Closed: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates

2005-12-23 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-148?page=all ] Piotr Kosiorowski closed NUTCH-148: --- Resolution: Invalid > org.apache.nutch.tools.CrawlTool throws error while doing deleteduplica

[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates

2005-12-23 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361206 ] Piotr Kosiorowski commented on NUTCH-148: - 'df' command is required for NDFS operation so if you were not using NDFS in 0.7.1 and nutch shell scripts you we

[jira] Commented: (NUTCH-148) org.apache.nutch.tools.CrawlTool throws error while doing deleteduplicates

2005-12-22 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-148?page=comments#action_12361128 ] Piotr Kosiorowski commented on NUTCH-148: - Do you have Cygwin installed? Is 'df' working in your cygwin installation? Do you run crawl from cygwin she

Re: [VOTE] Commiter access for Stefan Groschupf

2005-12-19 Thread Piotr Kosiorowski
+1 - especially for amount of support Stefan gives to nutch users. P. Andrzej Bialecki wrote: Hi, During the past year and more Stefan participated actively in the development, and contributed many high-quality patches. He's been spending considerable effort on addressing many issues in JIRA, an

Re: svn commit: r357334 - in /lucene/nutch/trunk: conf/nutch-default.xml src/java/org/apache/nutch/protocol/Content.java src/java/org/apache/nutch/protocol/ContentProperties.java

2005-12-17 Thread Piotr Kosiorowski
Doug Cutting wrote: [EMAIL PROTECTED] wrote: +/* + * (non-Javadoc) + * + * @see org.apache.nutch.io.Writable#write(java.io.DataOutput) + */ +public final void write(DataOutput out) throws IOException { We should either include javadoc or not. In general, all publi

Re: [Fwd: Crawler submits forms?]

2005-12-15 Thread Piotr Kosiorowski
Doug Cutting wrote: Andrzej Bialecki wrote: Please also don't forget that the trunk/ will soon be invaded by the code from mapred, I guess some time around the middle of January (Doug?) Thinking about this more, perhaps we should do it sooner. There's already a branch for 0.7.x releases,

JUnit test failures

2005-12-15 Thread Piotr Kosiorowski
Hi, I have problems with JUnit tests in trunk and mapred branches. TestFetcher fails in both branches. The same test executes correctly in 0.7 branch. Is it only my problem (environment setup) or others are having it too? I would suspect some changes in redirect handling Regards Piotr

Re: vote for issues to fix in 0.7.2

2005-12-14 Thread Piotr Kosiorowski
Marko Bauhardt wrote: NUTCH-141 jobdetails.jsp doesnt work on webbrowser "safari" +1 :-) Marko. I have just fixed NUTCH-141 in all branches so we do not concentrate on obvious things. I have one additional thing - majority of issues people vote for in this thread are mapred related. I th

[jira] Closed: (NUTCH-141) jobdetails.jsp doesnt work on webbrowser "safari"

2005-12-14 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-141?page=all ] Piotr Kosiorowski closed NUTCH-141: --- Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed Fixed in all branches. Thanks. > jobdetails.jsp doesnt work on webbrow

Re: [Fwd: Crawler submits forms?]

2005-12-14 Thread Piotr Kosiorowski
+1 - I wanted to suggest exactly this approach - but we should try to keep in mind not to introduce new features without serious reason (especially not backward compatible ones). Piotr On 12/14/05, Jérôme Charron <[EMAIL PROTECTED]> wrote: > > > What people think if we collect a list of issues and

Re: [Fwd: Crawler submits forms?]

2005-12-13 Thread Piotr Kosiorowski
If we are going to make 0.7.2 release I would like to commit a patch for http://issues.apache.org/jira/browse/NUTCH-112 and probably for some build problems people are reporting (missing src folder in nutch-extension plugin). I will look at them in next few days. Regards Piotr Stefan Groschupf w

Re: Lucene performance bottlenecks

2005-12-08 Thread Piotr Kosiorowski
Hi, I started to think about implementing special kind of Lucene Query (if I remember correctly I would have to write my own Scorer and probably a few other classes) optimized for Nutch some time ago. I assumed having specialized query I would be able to avoid accessing some of lucene index structu

Re: Urlfilter Patch

2005-12-01 Thread Piotr Kosiorowski
Jérôme Charron wrote: [...] build a list of file extensions to include (other ones will be excluded) in the fetch process. [...] I would not like to exclude all others - as for example many extensions are valid for html - especially dynamically generated pages (jsp,asp,cgi just to name the easy

Re: translation in the Italian language

2005-11-28 Thread Piotr Kosiorowski
Hi Adriano, I have your previous email on my TODO list. I had no time to commit it yet -> are there any changes from previous version? Regards Piotr [EMAIL PROTECTED] wrote: Hi, I hope that we publish my translation in Italian of Nutch. It is possible translate also the homepage of the

Re: [proposal] Generic Markup Language Parser

2005-11-25 Thread Piotr Kosiorowski
Hello, I do agree with Andrzej. I do not see it as a solution for parse-html. But generic XML plugin maybe will have some use for some people (even if not for me). Regards Piotr Andrzej Bialecki wrote: Stefan Groschupf wrote: [...] Gentlemen, please let's keep a civilized tone to this

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Piotr Kosiorowski
, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > > Piotr Kosiorowski wrote: > > >On 11/22/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > > > > > >>Hi, > >> > >>I've been profiling a Nutch installation, and to my surprise the largest

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread Piotr Kosiorowski
On 11/22/05, Andrzej Bialecki <[EMAIL PROTECTED]> wrote: > > Hi, > > I've been profiling a Nutch installation, and to my surprise the largest > amount of throwaway allocations and the most time spent was not in Nutch > specific code, or IPC, but in Lucene ConjunctionScorer.doNext() method. > This m

[jira] Closed: (NUTCH-99) ports are hardcoded or random

2005-11-14 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-99?page=all ] Piotr Kosiorowski closed NUTCH-99: -- Resolution: Fixed Patch committed. Thanks Stefan. > ports are hardcoded or random > - > > K

[jira] Commented: (NUTCH-99) ports are hardcoded or random

2005-11-14 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-99?page=comments#action_12357614 ] Piotr Kosiorowski commented on NUTCH-99: I think Doug meant that we should have: } catch (BindException e) { instead of generic: } catch (Exception e) { And I agree
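The point about catching `BindException` rather than a generic `Exception` can be sketched as follows. This is an illustration only and not the code from the NUTCH-99 patch; the class `PortProbe` and the method `bindFirstFree` are hypothetical names. The idea is that only a failed bind should trigger a retry on the next port, while any other failure should still reach the caller.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

// Hypothetical helper: tries a range of ports and returns the first
// server socket that could be bound.
final class PortProbe {

    static ServerSocket bindFirstFree(int first, int last) throws IOException {
        for (int port = first; port <= last; port++) {
            try {
                return new ServerSocket(port);
            } catch (BindException e) {
                // The port is taken (or may not be bound by this user):
                // fall through and try the next one. Any other IOException
                // is not caught here and propagates to the caller.
            }
        }
        throw new BindException("No free port in range " + first + "-" + last);
    }

    public static void main(String[] args) throws IOException {
        // Port range chosen arbitrarily for the demo.
        try (ServerSocket socket = bindFirstFree(50000, 50010)) {
            System.out.println("Bound to port " + socket.getLocalPort());
        }
    }
}
```

With a generic `catch (Exception e)` the loop would also swallow unrelated errors, for example an invalid port number, and report them as "no free port", which hides the real cause.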

Re: suspicious outlink count

2005-11-13 Thread Piotr Kosiorowski
EM wrote: 202443 Pages consumed: 13 (at index 13). Links fetched: 233386. 202443 Suspicious outlink count = 30442 for [http://www.dmoz.org/]. 202444 Pages consumed: 135000 (at index 135000). Links fetched: 272315. If there is maxoutlinks already specified in the xml config, why does nut

[jira] Closed: (NUTCH-107) Typo in plugin/urlfilter-regex/plugin.xml

2005-10-11 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-107?page=all ] Piotr Kosiorowski closed NUTCH-107: --- Fix Version: 0.7.2-dev 0.8-dev Resolution: Fixed Assign To: Piotr Kosiorowski Fixed in trunk and 0.7 branch. url-prefix

Re: to many hdd reads

2005-10-11 Thread Piotr Kosiorowski
Committed in trunk and branch-0.7 (just in case if we decide to make a 0.7.2 release sometime). Thanks Piotr On 10/11/05, Stefan Groschupf <[EMAIL PROTECTED]> wrote: > > Hi, > don't think I'm fuddy-duddy but is it really sensible to do following > in the nutchbean? > > File [] directories = fs.lis

Nutch 0.7.1 and Nutch web site

2005-10-01 Thread Piotr Kosiorowski
Hello, I have prepared Nutch 0.7.1 release today but I had one problem. I was updating the site in branch but to deploy it one must use the version from trunk. Currently I simply committed generated site in trunk but this solution is far from perfect. Should we have version independent site ->

Re: Nutch Suggestion? (Google like "did you mean")

2005-09-29 Thread Piotr Kosiorowski
Have a look at http://issues.apache.org/jira/browse/NUTCH-48. I think ngram based approach is appropriate here. I was using it in our search engine. Regards Piotr On 9/29/05, Jack Tang <[EMAIL PROTECTED]> wrote: > > Hi > > I am very like Google's "Did you mean" and I notice that nutch now > does n
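To make the n-gram idea concrete, here is a minimal sketch of one common variant: words are split into character trigrams and candidates are ranked by the Dice coefficient of the two trigram sets. This is an assumption about what "ngram based" could look like and is not the code attached to NUTCH-48; the class `NGramSuggester` and all of its methods are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical "did you mean" helper based on character n-gram overlap.
final class NGramSuggester {

    // Character n-grams of a word, padded with '_' so that the first and
    // last letters contribute as many n-grams as the inner ones.
    static Set<String> ngrams(String word, int n) {
        StringBuilder pad = new StringBuilder();
        for (int i = 0; i < n - 1; i++) {
            pad.append('_');
        }
        String padded = pad + word.toLowerCase(Locale.ROOT) + pad;
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= padded.length(); i++) {
            grams.add(padded.substring(i, i + n));
        }
        return grams;
    }

    // Dice coefficient: twice the number of shared n-grams divided by the
    // total number of n-grams in both sets. Ranges from 0.0 to 1.0.
    static double similarity(String a, String b, int n) {
        Set<String> ga = ngrams(a, n);
        Set<String> gb = ngrams(b, n);
        if (ga.isEmpty() && gb.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(ga);
        common.retainAll(gb);
        return 2.0 * common.size() / (ga.size() + gb.size());
    }

    // Returns the dictionary word most similar to the query,
    // or null if no candidate shares any n-gram with it.
    static String suggest(String query, List<String> dictionary, int n) {
        String best = null;
        double bestScore = 0.0;
        for (String candidate : dictionary) {
            double score = similarity(query, candidate, n);
            if (score > bestScore) {
                bestScore = score;
                best = candidate;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("lucene", "nutch", "hadoop");
        System.out.println(suggest("lucine", dictionary, 3));
    }
}
```

In a real search engine the dictionary would be the indexed term list, and the n-grams would normally be stored in an index of their own so that candidates can be looked up without scanning every term.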

Re: API for injecting content into Nutch?

2005-09-26 Thread Piotr Kosiorowski
Hi, I am not sure what you mean by "injecting content into Nutch", but to create a segment you can use the SegmentWriter class. To update the WebDB, the IWebDBWriter interface might be useful. The best place to learn what kind of data is stored in a segment is probably the fetcher code. Regards Piotr Gol

[jira] Closed: (NUTCH-89) parse-rss null pointer exception

2005-09-23 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-89?page=all ] Piotr Kosiorowski closed NUTCH-89: -- Fix Version: 0.8-dev 0.7 Resolution: Fixed Applied in trunk and 0.7 branch. Thanks. > parse-rss null pointer except

[jira] Commented: (NUTCH-95) DeleteDuplicates depends on the order of input segments

2005-09-21 Thread Piotr Kosiorowski (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-95?page=comments#action_12330113 ] Piotr Kosiorowski commented on NUTCH-95: I have been renaming segments quite often, so I would vote for reading the date from the segment instead of using the dir name

0.7.1 release

2005-09-20 Thread Piotr Kosiorowski
Hello, As it looks like everything that was planned has been committed to the 0.7 branch, I would like to prepare a 0.7.1 release in the next few days. I will change the branch name at the same time to comply with the agreed standard. Any objections? Regards Piotr

Re: svn commit: r290163 - in /lucene/nutch/branches/Release-0.7/src/plugin/clustering-carrot2: ./ lib/

2005-09-19 Thread Piotr Kosiorowski
Hi Andrzej, Is anything related to clustering commits left? Or should we proceed with 0.7.1 release? Piotr [EMAIL PROTECTED] wrote: Author: ab Date: Mon Sep 19 07:11:07 2005 New Revision: 290163 URL: http://svn.apache.org/viewcvs?rev=290163&view=rev Log: Update of the clustering plugin, contri

Re: Problems on Crawling

2005-09-17 Thread Piotr Kosiorowski
Daniele Menozzi wrote: ok, so the depth value is only used to stop the crawling at a certain point, and proceed with the indexing, right? Yes, depth is in fact the number of iterations of the generate/fetch/update cycle. But, another thing: how can I refresh old pages? What class do I have to

Re: Problems on Crawling

2005-09-16 Thread Piotr Kosiorowski
The bin/nutch updatedb db $s1 command updates the WebDB with the links you fetched in segment $s1. Regards Piotr Daniele Menozzi wrote: Hi all, I have questions regarding org.apache.nutch.tools.CrawlTool: I have not really understood what the relationship is between depth, segments, fetching.. Take for ex

Re: DistributedSearch$Client.updateSegments() blocking other threads

2005-09-16 Thread Piotr Kosiorowski
Hello Andrzej, You can also try http://issues.apache.org/jira/browse/NUTCH-79 - I think it should also help here. It is a bit complicated as it contains additional functionality, but if you have any problems I am willing to help. I am going to test it again and maybe commit it in

Re: svn commit: r280396 - /lucene/nutch/tags/Release-0.7/

2005-09-12 Thread Piotr Kosiorowski
Hello, I have changed the name of the tag directory according to the naming convention agreed earlier. I am holding off on the branch name change until the 0.7.1 release, which should happen in a few days if Andrzej is able to commit his changes to the clustering plugin (if not I suggest waiting for these changes as it

Re: Delete an entry in ArrayFile/MapFile

2005-09-06 Thread Piotr Kosiorowski
Hello, You cannot do it. These structures were not designed for it. But you can copy all the data to another ArrayFile, skipping the entries you want to delete. Regards Piotr On 9/6/05, Ben <[EMAIL PROTECTED]> wrote: > > Hi > > How can I delete an entry in the ArrayFile/MapFile if I know the id/key?
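The copy-and-skip idea can be sketched as follows. This is an assumption-laden illustration, not the Nutch ArrayFile API: it uses plain java.io streams as a stand-in for an append-only file of records, and the class and method names are hypothetical. Records are read in order, and every record whose index is not in the delete set is written to the new file.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.util.Set;

// Hypothetical stand-in for an append-only record file; NOT the Nutch ArrayFile API.
final class AppendOnlyCopy {
    /** Copies records from in to out, skipping the given indices; returns the number kept. */
    static long copySkipping(DataInputStream in, DataOutputStream out, Set<Long> toDelete)
            throws IOException {
        long index = 0;
        long kept = 0;
        while (true) {
            String record;
            try {
                record = in.readUTF();   // one record per writeUTF call
            } catch (EOFException eof) {
                break;                   // clean end of input
            }
            if (!toDelete.contains(index)) {
                out.writeUTF(record);
                kept++;
            }
            index++;
        }
        out.flush();
        return kept;
    }
}
```

Note that the surviving entries get new positions in the copy, so anything that referred to the old indices has to be remapped.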

Re: svn commit: r265503 - in /lucene/nutch/trunk/src: java/org/apache/nutch/clustering/ java/org/apache/nutch/fs/ java/org/apache/nutch/mapReduce/ java/org/apache/nutch/parse/ java/org/apache/nutch/pr

2005-09-04 Thread Piotr Kosiorowski
Hello Jerome, It looks like the changes to the language identifier caused the language identifier test to fail on Windows again. If no charset is given it assumes the default platform encoding, but the test files are probably "UTF-8" based. I have changed the TestLanguageIdentifier.testIdentify() method to use Strin
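The encoding problem described above can be reproduced with a small sketch (hypothetical class name; this is not the actual test code). It assumes the input bytes are UTF-8 and shows that decoding them with an explicit charset gives the same result on every platform, while decoding with a different single-byte charset, as a Windows default might be, garbles non-ASCII text.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of explicit vs. implicit charset decoding.
final class ExplicitCharset {
    /** Decodes bytes as UTF-8 regardless of the platform default encoding. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** Decodes bytes with the given charset, e.g. to simulate another platform default. */
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }
}
```

Calling new String(bytes) without a charset behaves like decodeAs with whatever the platform default happens to be, which would explain a test that passes on one operating system and fails on another.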
