[Numpy-discussion] Re: Policy on AI-generated code

2024-07-05 Thread Adrin
FWIW, as Loïc already mentioned, we had the same discussions on the scikit-learn side. We noticed every now and then a few issues / PRs would come which were clearly AI generated, and in almost all those cases, the account posting them didn't look like a human / didn't have a history on GH. At th

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-05 Thread Matthew Brett
Hi, On Fri, Jul 5, 2024 at 3:17 PM Adrin wrote: > > FWIW, as Loïc already mentioned, we had the same discussions on the > scikit-learn side. > > We noticed every now and then a few issues / PRs would come which were > clearly AI generated, and in almost all those cases, the account posting them

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Ralf Gommers
On Thu, Jul 4, 2024 at 8:42 PM Matthew Brett wrote: > Hi, > > On Thu, Jul 4, 2024 at 6:44 PM Ralf Gommers > wrote: > > > > > > > > On Thu, Jul 4, 2024 at 5:08 PM Matthew Brett > wrote: > >> > >> Hi, > >> > >> On Thu, Jul 4, 2024 at 3:41 PM Ralf Gommers > wrote: > >> > > >> > > >> > > >> > On T

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, On Thu, Jul 4, 2024 at 6:44 PM Ralf Gommers wrote: > > > > On Thu, Jul 4, 2024 at 5:08 PM Matthew Brett wrote: >> >> Hi, >> >> On Thu, Jul 4, 2024 at 3:41 PM Ralf Gommers wrote: >> > >> > >> > >> > On Thu, Jul 4, 2024 at 1:34 PM Matthew Brett >> > wrote: >> >> >> >> Hi, >> >> >> >> On Thu

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, On Thu, Jul 4, 2024 at 7:09 PM Stefan Krah wrote: > > On Thu, Jul 04, 2024 at 04:18:03PM +0100, Matthew Brett wrote: > > I feel sure we would want to avoid GPL code if the copyright holders > > felt that we were abusing their license - regardless of whether the > > court felt the copyright wa

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Stefan Krah
On Thu, Jul 04, 2024 at 03:46:02PM +, Rohit Goswami wrote: > Doesn't the project adopting wording of this kind "pass the buck" onto the > maintainers? I think it depends. NetBSD's AI policy mentions the responsibility of the committers: https://www.netbsd.org/developers/commit-guidelines.h

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Stefan Krah
On Thu, Jul 04, 2024 at 04:18:03PM +0100, Matthew Brett wrote: > I feel sure we would want to avoid GPL code if the copyright holders > felt that we were abusing their license - regardless of whether the > court felt the copyright was realistically enforceable. Apologies for probably stating the o

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Ralf Gommers
On Thu, Jul 4, 2024 at 5:08 PM Matthew Brett wrote: > Hi, > > On Thu, Jul 4, 2024 at 3:41 PM Ralf Gommers > wrote: > > > > > > > > On Thu, Jul 4, 2024 at 1:34 PM Matthew Brett > wrote: > >> > >> Hi, > >> > >> On Thu, Jul 4, 2024 at 12:20 PM Ralf Gommers > wrote: > >> > > >> > > >> > > >> > On

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, On Thu, Jul 4, 2024 at 4:46 PM Rohit Goswami wrote: > > Doesn't the project adopting wording of this kind "pass the buck" onto the > maintainers? At the end of the day, failure to enforce our stated policy will > be not only the responsibility of the authors but also the reviewers / > main

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Stefan van der Walt via NumPy-Discussion
On Thu, Jul 4, 2024, at 08:18, Daniele Nicolodi wrote: > I wish it to be common sense for contributors to an open source > codebase that they need to own the copyright on their contributions, but > I don't think it can be assumed. Adding something along these lines to the > project policy has also

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Rohit Goswami
Doesn't the project adopting wording of this kind "pass the buck" onto the maintainers? At the end of the day, failure to enforce our stated policy will be not only the responsibility of the authors but also the reviewers / maintainers on whole. In effect (and just speaking personally) wording

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, On Thu, Jul 4, 2024 at 4:04 PM Daniele Nicolodi wrote: > > On 04/07/24 12:49, Matthew Brett wrote: > > Hi, > > > > Sorry to top-post! But - I wanted to bring the discussion back to > > licensing. I have great sympathy for the ecological and code-quality > > concerns, but licensing is a sepa

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Daniele Nicolodi
On 04/07/24 13:29, Matthew Brett wrote: I agree it is hard to enforce, but it seems to me it would be a reasonable defensive move to say - for now - that authors will need to take full responsibility for copyright, and that, as of now, AI-generated code cannot meet that standard, so we require au

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Rohit Goswami
> Personally, I wouldn't (as a maintainer)... Especially since I know that many potential contributors may not have English as their first language so stunted language / odd patterns are not **always** an AI indicator, sometimes it's just inexperience. -- Rohit On 7/4/24 3:03 PM, Daniele Nico

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, On Thu, Jul 4, 2024 at 4:11 PM wrote: > > Personally, I wouldn't (as a maintainer) take a decision to reject code based > on if I feel it is generated by AI. It is much easier to rule on the quality > of the contribution itself, and as noted, at least so far the AI only > contributions are

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread rgoswami
Personally, I wouldn't (as a maintainer) take a decision to reject code based on if I feel it is generated by AI. It is much easier to rule on the quality of the contribution itself, and as noted, at least so far the AI only contributions are very probably not going to clear the barrier of bein

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, On Thu, Jul 4, 2024 at 3:41 PM Ralf Gommers wrote: > > > > On Thu, Jul 4, 2024 at 1:34 PM Matthew Brett wrote: >> >> Hi, >> >> On Thu, Jul 4, 2024 at 12:20 PM Ralf Gommers wrote: >> > >> > >> > >> > On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett >> > wrote: >> >> >> >> Sorry - reposting fr

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Daniele Nicolodi
On 04/07/24 12:49, Matthew Brett wrote: Hi, Sorry to top-post! But - I wanted to bring the discussion back to licensing. I have great sympathy for the ecological and code-quality concerns, but licensing is a separate question, and, it seems to me, an urgent question. The licensing issue is c

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Ralf Gommers
On Thu, Jul 4, 2024 at 1:34 PM Matthew Brett wrote: > Hi, > > On Thu, Jul 4, 2024 at 12:20 PM Ralf Gommers > wrote: > > > > > > > > On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett > wrote: > >> > >> Sorry - reposting from my subscribed address: > >> > >> Hi, > >> > >> Sorry to top-post! But - I

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Bill Ross
> It is perfectly possible that the AI will largely or completely reproduce > some existing GPL code for A, from its training data. There is no way that I > could know that the AI has done that without some substantial research. Even if it did, what if the common code were arrived at independe

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, On Thu, Jul 4, 2024 at 12:20 PM Ralf Gommers wrote: > > > > On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett wrote: >> >> Sorry - reposting from my subscribed address: >> >> Hi, >> >> Sorry to top-post! But - I wanted to bring the discussion back to >> licensing. I have great sympathy for the

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Ralf Gommers
On Thu, Jul 4, 2024 at 12:55 PM Matthew Brett wrote: > Sorry - reposting from my subscribed address: > > Hi, > > Sorry to top-post! But - I wanted to bring the discussion back to > licensing. I have great sympathy for the ecological and code-quality > concerns, but licensing is a separate quest

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Sorry - reposting from my subscribed address: Hi, Sorry to top-post! But - I wanted to bring the discussion back to licensing. I have great sympathy for the ecological and code-quality concerns, but licensing is a separate question, and, it seems to me, an urgent question. Imagine I asked some

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Matthew Brett
Hi, Sorry to top-post! But - I wanted to bring the discussion back to licensing. I have great sympathy for the ecological and code-quality concerns, but licensing is a separate question, and, it seems to me, an urgent question. Imagine I asked some AI to give me code to replicate a particular a

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Marten van Kerkwijk
Hi All, I agree with Dan that the actual contributions to the documentation are of little value: it is not easy to write good documentation, with examples that show not just the mechanics but the purpose of the function, i.e., go well beyond just showing some random inputs and outputs. And poorl

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread Daniele Nicolodi
On 03/07/24 23:40, Matthew Brett wrote: Hi, We recently got a set of well-labeled PRs containing (reviewed) AI-generated code: https://github.com/numpy/numpy/pull/26827 https://github.com/numpy/numpy/pull/26828 https://github.com/numpy/numpy/pull/26829 https://github.com/numpy/numpy/pull/26830

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-04 Thread rgoswami
From a quick look, it seems like some of these (the masked array ones) are trivial enough to not warrant inclusion and the ctypes snippet is obvious enough that copyright claims won't be an issue. In terms of broader policy I don't really have much to say, except that in general it is probably

[Numpy-discussion] Re: Policy on AI-generated code

2024-07-03 Thread Loïc Estève via NumPy-Discussion
Hi, in scikit-learn, more of an FYI than some kind of policy (amongst other things it does not even mention explicitly "AI" and avoids the licence discussion), we recently added a note in our FAQ about "fully automated tools": https://github.com/scikit-learn/scikit-learn/pull/29287 From my persona