On Friday 17 July 2009 16:17:50 Grant Ingersoll wrote:
> Also, do we have any tools for setting up training/test sets for
> Wikipedia examples? Seems like a generally useful thing to have.
> Take annotated data and automatically split, no?
That certainly would be useful at tuning/evaluation time.
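A minimal sketch of the kind of split tool asked about above, assuming the annotated examples fit in memory as a list. The names `TrainTestSplit`, `split`, `testFraction`, and `seed` are hypothetical, chosen only for illustration; nothing here is an existing Mahout class or option.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper (not part of Mahout): randomly splits labeled examples into training and test sets. */
public final class TrainTestSplit {

  private TrainTestSplit() {}

  /**
   * Shuffles a copy of the examples with a fixed seed, then cuts it in two.
   * Returns a two-element list: index 0 is the training set, index 1 is the test set.
   */
  public static <T> List<List<T>> split(List<T> examples, double testFraction, long seed) {
    if (testFraction < 0.0 || testFraction > 1.0) {
      throw new IllegalArgumentException("testFraction must be in [0, 1]");
    }
    // Copy first so the caller's list is left untouched.
    List<T> shuffled = new ArrayList<T>(examples);
    Collections.shuffle(shuffled, new Random(seed));

    int testSize = (int) Math.round(shuffled.size() * testFraction);

    List<List<T>> result = new ArrayList<List<T>>();
    result.add(new ArrayList<T>(shuffled.subList(testSize, shuffled.size()))); // training set
    result.add(new ArrayList<T>(shuffled.subList(0, testSize)));               // test set
    return result;
  }
}
```

A fixed seed keeps the split reproducible between runs. A real tool for the Wikipedia data would probably also need to stratify by label and stream from disk, which this sketch does not attempt.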
Thanks. Let me see if I can make cs go along.
I might make a patch for all the copyrights first just to reduce the
noise level.
On Fri, Jul 17, 2009 at 12:26 PM, Sean Owen wrote:
> I think the Lucene conventions are Sun conventions plus these
> indentation rules -- here's the IntelliJ file. It's
I think the Lucene conventions are Sun conventions plus these
indentation rules -- here's the IntelliJ file. It's pretty much human
readable. All four stanzas are really the same thing.
Let me explain where I'm coming from.
At CXF and XmlSchema (at Apache) plus on some projects I cloned from
them, there is:
a checkstyle.xml with checkstyle rules.
a pmd XML file with PMD rules
and a set of Eclipse configuration settings.
They all agree. If you format the code with Eclipse (fo
[
https://issues.apache.org/jira/browse/MAHOUT-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732578#action_12732578
]
Grant Ingersoll commented on MAHOUT-147:
I think, since we are using ',' as the del
[
https://issues.apache.org/jira/browse/MAHOUT-147?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732572#action_12732572
]
Grant Ingersoll commented on MAHOUT-147:
There is a small bug in BayesTFIDFMapper t
Wikipedia Example improvements
--
Key: MAHOUT-147
URL: https://issues.apache.org/jira/browse/MAHOUT-147
Project: Mahout
Issue Type: Improvement
Components: Classification
Reporter: Grant Inge
Also, do we have any tools for setting up training/test sets for
Wikipedia examples? Seems like a generally useful thing to have.
Take annotated data and automatically split, no?
-Grant
On Jul 17, 2009, at 8:32 AM, Grant Ingersoll wrote:
On Jul 17, 2009, at 5:06 AM, Robin Anil wrote:
[
https://issues.apache.org/jira/browse/MAHOUT-146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Grant Ingersoll updated MAHOUT-146:
---
Attachment: MAHOUT-146.patch
Patch to make this happen, plus removes one of the redundant dri
On a related note, I get this error now when building from head:
> mvn install
...
Caused by: org.apache.maven.artifact.resolver.ArtifactNotFoundException:
Unable to download the artifact from any repository
org.apache.mahout:mahout-parent:pom:0.2-SNAPSHOT
from the specified remote repositori
If you run mvn -Psourcecheck you can see the disagreements for yourself.
On Jul 17, 2009, at 8:42 AM, Sean Owen wrote:
Ditto, so Benson, I wonder if you can confirm whether these checkstyle
rules appear to be at odds with the current formatting, since in
theory I already committed a big-bang c
Ditto, so Benson, I wonder if you can confirm whether these checkstyle
rules appear to be at odds with the current formatting, since in
theory I already committed a big-bang change to fix all the
formatting. If there's much disagreement then we should investigate
why.
On Fri, Jul 17, 2009 at 1:40
On Jul 17, 2009, at 8:36 AM, Sean Owen wrote:
Yeah I think the point would be to make sure this happens
automatically. I too am wary of maintaining 2, 3, 4 style
configurations for the project.
But while I use an IDE almost exclusively (IntelliJ), I am not sure I
agree that the project should
Yeah I think the point would be to make sure this happens
automatically. I too am wary of maintaining 2, 3, 4 style
configurations for the project.
But while I use an IDE almost exclusively (IntelliJ), I am not sure I
agree that the project should assume an IDE, and therefore, there is
some point
On Jul 17, 2009, at 5:06 AM, Robin Anil wrote:
The reason I used countries was that I couldn't think of some other larger group
of labels.
Also, Wikipedia has over 100K categories, and a document has multiple categories
too. So finding non-overlapping sets of documents wasn't easy (which makes
it
On Jul 17, 2009, at 7:31 AM, Benson Margulies wrote:
So, would you smile on a patch that whomped all the indents to be
acceptable to this and automated setting Eclipse settings? I don't
know about IntelliJ.
FWIW, I see no point in a tool that can't be replicated in an IDE.
Or is this some
Sounds like you're doing as much as checkstyle can here. Are you
saying the resulting configuration warns about indentation a lot? I
wonder if you can illustrate the nature of the warnings then? Is it
"right" or is it apparently flagging things that look correctly
formatted according to our rules?
See http://checkstyle.sourceforge.net/config_misc.html under
Indentation. Checkstyle does not have a lot of knobs here. I turned
one and reduced the number of complaints. It's not obvious how to turn
the other two and match up with what's out there. Could I interest you
in having a look? Maybe I'm
I am for it. In theory I already whomped the code base to update the
formatting. If your patch produces a lot more formatting changes, then
maybe there is a mismatch between the rules we implemented. Let's
discuss before re-whomping. It's possible I got the rules wrong.
What's it look like, what's
[
https://issues.apache.org/jira/browse/MAHOUT-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732489#action_12732489
]
Grant Ingersoll commented on MAHOUT-146:
Yeah, it's really just a matter of renamin
So, would you smile on a patch that whomped all the indents to be
acceptable to this and automated setting Eclipse settings? I don't
know about IntelliJ.
On Thu, Jul 16, 2009 at 10:05 PM, Ted Dunning wrote:
> Very cool.
>
> I would love to have consistency on this and be able to avoid messing up t
[
https://issues.apache.org/jira/browse/MAHOUT-146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732435#action_12732435
]
Robin Anil commented on MAHOUT-146:
---
It's already generic to some extent. Check out Mahout
>
> The reason I used countries was that I couldn't think of some other larger group
> of labels.
> Also, Wikipedia has over 100K categories, and a document has multiple categories
> too. So finding non-overlapping sets of documents wasn't easy (which makes
> it easy to differentiate them). First thing I coul
[
https://issues.apache.org/jira/browse/MAHOUT-144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732399#action_12732399
]
Sean Owen commented on MAHOUT-144:
--
+1 to the checkstyle config and all those related chan