Re: Next Steps For the Software Heritage Problem

2024-06-28 Thread Juliana Sims
Hey y'all, I've avoided weighing in on this topic because I'm of two minds about it. Still, when members of the community raise concerns, it's important to take those concerns seriously. We must be careful how we address them because the opinions and concerns of any community member are as

Re: Next Steps For the Software Heritage Problem

2024-06-27 Thread Ludovic Courtès
Hi, Ian Eure wrote: > While this is what their paper claims[1], it doesn’t appear to be > true, since I can see my own GPL’d code in the training set. I’ve > since moved nearly all of my code off GitHub, but if you visit their > "Am I in The Stack?" page[2] and enter my old username

Re: Next Steps For the Software Heritage Problem

2024-06-27 Thread Development of GNU Guix and the GNU System distribution.
Hi Ian, On Thu, Jun 27 2024, Ian Eure wrote: > I’ve [...] moved nearly all of my code off GitHub Me too. I think closed it off from search crawlers. No one should be using Github anymore except for merge requests. I left many years ago. > if you visit their "Am I in The Stack?" page Thank

Re: Next Steps For the Software Heritage Problem

2024-06-27 Thread Ian Eure
Hi Ludo, Ludovic Courtès writes: Ian Eure wrote: Guix sends archive requests to SWH. SWH gives that source code to HuggingFace. HuggingFace demonstrably violates the licenses. Which licenses? As has been said previously, and you can verify for yourself, it does not ingest code

Re: Next Steps For the Software Heritage Problem

2024-06-27 Thread Ludovic Courtès
Ian Eure wrote: > Guix sends archive requests to SWH. SWH gives that source code to > HuggingFace. HuggingFace demonstrably violates the licenses. Which licenses? As has been said previously, and you can verify for yourself, it does not ingest code under copyleft licenses. Ludo’.

Re: Next Steps For the Software Heritage Problem

2024-06-21 Thread Luis Felipe
Hi, On 21/06/24 at 9:19, MSavoritias wrote: On Fri, 21 Jun 2024 09:41:10 +0100 Dale Mellor wrote: `-x archival` does it, but it is too easy to forget and once the cat is out of the bag privacy is lost. I really think this should be default behaviour, or at least there should be

Re: Next Steps For the Software Heritage Problem

2024-06-21 Thread MSavoritias
On Fri, 21 Jun 2024 09:41:10 +0100 Dale Mellor wrote: > On Thu, 2024-06-20 at 22:59 +0200, Ekaitz Zarraga wrote: > > Hi, > > > > On 2024-06-20 22:54, Andreas Enge wrote: > > > On Thu, Jun 20, 2024 at 07:42:44PM +0100, Dale Mellor wrote: > > > > I'm sure guix lint tried to push my code out

Re: Next Steps For the Software Heritage Problem

2024-06-21 Thread MSavoritias
On Thu, 20 Jun 2024 16:40:57 +0200 Simon Tournier wrote: > Being concrete and explicit, could you please share: > > 1. Which part of your code is included in the pretraining dataset? > > It’s easy, you can copy/paste a snippet and it returns the location > it comes from. >

Re: Next Steps For the Software Heritage Problem

2024-06-21 Thread MSavoritias
On Thu, 20 Jun 2024 16:35:10 +0200 Ekaitz Zarraga wrote: > > 2. You seem to imply that Free Software or code is apolitical. (in the > > sense of social or state politics not) Which it is not. Nothing is. > > For example Free Software is explicitly pro-capitalist and > > pro-Google/big companies.

Re: Next Steps For the Software Heritage Problem

2024-06-21 Thread Dale Mellor
On Thu, 2024-06-20 at 22:59 +0200, Ekaitz Zarraga wrote: > Hi, > > On 2024-06-20 22:54, Andreas Enge wrote: > > On Thu, Jun 20, 2024 at 07:42:44PM +0100, Dale Mellor wrote: > > > I'm sure guix lint tried to push my code out to them the last time I > > > tried. > > > > Ah indeed, there is this

Re: Next Steps For the Software Heritage Problem (Dale Mellor)

2024-06-21 Thread MSavoritias
On Thu, 20 Jun 2024 14:43:30 -0700 Andy Tai wrote: > > Date: Wed, 19 Jun 2024 09:36:29 +0100 > > From: Dale Mellor > > I use Guix as a tool to develop my own projects, private and > > personal for reasons I'm keeping to myself. As part of that I write package > > definitions for them, and use

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Simon Tournier
Hi, On Thu, 20 Jun 2024 at 19:42, Dale Mellor wrote: > I'm sure guix lint tried to push my code out to them the last time I > tried. Yes, it’s the checker ’archival’. Therefore, running “guix lint -x archival” does not send any request to SWH. Cheers, simon
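For anyone worried about forgetting the option, a minimal sketch of a shell wrapper follows. The function name `lint_private` is hypothetical and not part of Guix; it relies only on the `-x archival` exclusion described above, and the package name in the usage comment is a placeholder.

```shell
# Hypothetical helper, not part of Guix: run "guix lint" with the
# 'archival' checker always excluded, so that linting does not send
# an archive request to Software Heritage.
lint_private() {
    guix lint -x archival "$@"
}

# Usage (placeholder package name):
#   lint_private my-private-package
```

Putting the function in a shell startup file would make the exclusion the default for that user without changing Guix itself.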

Re: Next Steps For the Software Heritage Problem (Dale Mellor)

2024-06-20 Thread Andy Tai
> Date: Wed, 19 Jun 2024 09:36:29 +0100 > From: Dale Mellor > I use Guix as a tool to develop my own projects, private and > personal for reasons I'm keeping to myself. As part of that I write package > definitions for them, and use the Guix machinery to build and test. I > *cannot* > have

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Andreas Enge
On Thu, Jun 20, 2024 at 10:59:41PM +0200, Ekaitz Zarraga wrote: > For this specific case we could add some flag to the command line like > `--do-not-archive` or something like that. guix lint -x archival if I understand "guix lint --help" correctly. Andreas

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Ekaitz Zarraga
Hi, On 2024-06-20 22:54, Andreas Enge wrote: On Thu, Jun 20, 2024 at 07:42:44PM +0100, Dale Mellor wrote: I'm sure guix lint tried to push my code out to them the last time I tried. Ah indeed, there is this in guix/lint.scm: (define (check-archival package) "Check whether PACKAGE's

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Andreas Enge
On Thu, Jun 20, 2024 at 07:42:44PM +0100, Dale Mellor wrote: > I'm sure guix lint tried to push my code out to them the last time I tried. Ah indeed, there is this in guix/lint.scm: (define (check-archival package) "Check whether PACKAGE's source code is archived on Software Heritage. If

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Dale Mellor
On Thu, 2024-06-20 at 19:00 +0200, Andreas Enge wrote: > On Wed, Jun 19, 2024 at 09:36:29AM +0100, Dale Mellor wrote: > > No, it's not. I use Guix as a tool to develop my own projects, private > > and > > personal for reasons I'm keeping to myself. As part of that I write package > >

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Andreas Enge
On Wed, Jun 19, 2024 at 09:36:29AM +0100, Dale Mellor wrote: > No, it's not. I use Guix as a tool to develop my own projects, private and > personal for reasons I'm keeping to myself. As part of that I write package > definitions for them, and use the Guix machinery to build and test. I >

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Dale Mellor
On Tue, 2024-06-18 at 07:19 -0700, Ian Eure wrote: > Hi MSavoritias, > > Thank you for the email. > > I’m going to lay out this situation as clearly as I can, in the > hope that others will better understand, and hopefully treat it > with the seriousness it deserves. > > 1. Guix requests SWH

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Simon Tournier
Hi MSavoritias, all, On Thu, 20 Jun 2024 at 09:51, MSavoritias wrote: >> Not to avoid the question but from a pragmatic point of view, one >> might ask if the source code you write and do not want to be included >> in the training dataset, if this source code is concretely part of >> that

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread Ekaitz Zarraga
Hi, On 2024-06-20 08:36, MSavoritias wrote: On Wed, 19 Jun 2024 17:46:08 +0200 Ekaitz Zarraga wrote: On 2024-06-19 12:25, raingl...@riseup.net wrote: On 2024-06-19 11:54, Efraim Flashner wrote: On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote: ... One of our packages, dbxfs,

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread MSavoritias
On Wed, 19 Jun 2024 16:41:33 +0200 Simon Tournier wrote: > Hi MSavoritias, all, > > Let me provide more context. > > The concern started a couple of months ago, to my knowledge. And > discussion is still ongoing. So I think it’s incorrect to say “any > result for over 6 months”. Hey Simon,

Re: Next Steps For the Software Heritage Problem

2024-06-20 Thread MSavoritias
On Wed, 19 Jun 2024 17:46:08 +0200 Ekaitz Zarraga wrote: > On 2024-06-19 12:25, raingl...@riseup.net wrote: > > On 2024-06-19 11:54, Efraim Flashner wrote: > >> On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote: > >> ... > >> One of our packages, dbxfs, left Github a while ago and

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread MSavoritias
On Wed, 19 Jun 2024 19:56:26 -0700 Felix Lechner wrote: > Hi MSavoritias, > > On Wed, Jun 19 2024, MSavoritias wrote: > > > I am not interested in what the states or licenses/copyrights allow or > > don't allow in this case. What I care about is what we expect as a > > community when we submit a

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread Development of GNU Guix and the GNU System distribution.
Hi MSavoritias, On Wed, Jun 19 2024, MSavoritias wrote: > I am not interested in what the states or licenses/copyrights allow or > don't allow in this case. What I care about is what we expect as a > community when we submit a package/code to guix and if that violates > our social rules and

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread Ekaitz Zarraga
On 2024-06-19 12:25, raingl...@riseup.net wrote: On 2024-06-19 11:54, Efraim Flashner wrote: On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote: ... One of our packages, dbxfs, left Github a while ago and continued development on a different forge. They adjusted their README to

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread Simon Tournier
Hi MSavoritias, all, Let me provide more context. The concern started a couple of months ago, to my knowledge. And discussion is still ongoing. So I think it’s incorrect to say “any result for over 6 months”. Moreover, I feel you have a misunderstanding about the HuggingFace and SWH partnership.

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread MSavoritias
On Wed, 19 Jun 2024 12:54:30 +0300 Efraim Flashner wrote: > On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote: > > On Wed, 19 Jun 2024 09:52:36 +0200 > > Simon Tournier wrote: > > > > > Hi Ian, all, > > > > > > On Tue, 18 Jun 2024 at 10:57, Ian Eure wrote: > > > > I think that

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread raingloom
On 2024-06-18 20:08, Ian Eure wrote: > Andy Tai writes: > >> What is the role of GNU Guix in this? If Guix is mainly a referral >> mechanism like web page links to the actual contents, the real problem >> is not Guix but the use of free software which can be obtained via >> other mechanisms

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread raingloom
On 2024-06-19 11:54, Efraim Flashner wrote: > On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote: > ... > One of our packages, dbxfs, left Github a while ago and continued > development on a different forge. They adjusted their README to disallow > hosting of their code on Github. Based

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread Efraim Flashner
On Tue, Jun 18, 2024 at 11:37:17AM +0300, MSavoritias wrote: > Hello, > So with that said I urge anybody who has been in contact with them in > an official Guix capacity to come forward, otherwise I can volunteer to > be that. Idk if we have a community outreach thing I need to be in also > for

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread Efraim Flashner
On Wed, Jun 19, 2024 at 10:01:43AM +0300, MSavoritias wrote: > On Tue, 18 Jun 2024 13:31:02 -0400 > Greg Hogan wrote: > > > On Tue, Jun 18, 2024 at 12:33 PM MSavoritias > > wrote: > > > > > > > If you feel that LLMs/AI are violating the terms of a license, then > > feel free to pursue that

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread Efraim Flashner
On Wed, Jun 19, 2024 at 12:13:38PM +0300, MSavoritias wrote: > On Wed, 19 Jun 2024 09:52:36 +0200 > Simon Tournier wrote: > > > Hi Ian, all, > > > > On Tue, 18 Jun 2024 at 10:57, Ian Eure wrote: > > > > > Guix is continuing to partner with SWH in spite of their continued > > > support of

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread MSavoritias
On Wed, 19 Jun 2024 09:52:36 +0200 Simon Tournier wrote: > Hi Ian, all, > > On Tue, 18 Jun 2024 at 10:57, Ian Eure wrote: > > > Guix is continuing to partner with SWH in spite of their continued > > support of these violations. > > Quickly because I am in the middle of a busy day. :-)

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread Simon Tournier
Hi Ian, all, On Tue, 18 Jun 2024 at 10:57, Ian Eure wrote: > Guix is continuing to partner with SWH in spite of their continued > support of these violations. Quickly because I am in the middle of a busy day. :-) I think that LLMs raise ethical and legal questions that even the FSF or EFF or SFC

Re: Next Steps For the Software Heritage Problem

2024-06-19 Thread MSavoritias
On Tue, 18 Jun 2024 13:31:02 -0400 Greg Hogan wrote: > On Tue, Jun 18, 2024 at 12:33 PM MSavoritias > wrote: > > > > Ah it seems I wasn't clear enough. > > I meant write something like: > > > > By packaging a software project for Guix you are exposing said > > software to a code harvesting

Re: Next Steps For the Software Heritage Problem

2024-06-18 Thread Ian Eure
Guix sends archive requests to SWH. SWH gives that source code to HuggingFace. HuggingFace demonstrably violates the licenses. Guix could stop sending archive requests to SWH. This wouldn’t *stop* the bad things from happening, but it would *stop condoning* them. The same as how Guix not

Re: Next Steps For the Software Heritage Problem

2024-06-18 Thread Ian Eure
Hi Greg, Please read my earlier reply in this thread[1]. HuggingFace is demonstrably violating the licenses of the Free Software used to train its StarCoder2 LLM. Software Heritage is continuing to partner with HuggingFace in spite of these violations. Guix is continuing to partner with

Re: Next Steps For the Software Heritage Problem

2024-06-18 Thread Greg Hogan
On Tue, Jun 18, 2024 at 12:33 PM MSavoritias wrote: > > Ah it seems I wasn't clear enough. > I meant write something like: > > By packaging a software project for Guix you are exposing said software > to a code harvesting project (also known as LLMs or "AI") run by > Software Heritage and/or

Re: Next Steps For the Software Heritage Problem

2024-06-18 Thread Andy Tai
What is the role of GNU Guix in this? If Guix is mainly a referral mechanism, like web page links to the actual contents, isn't the real problem not Guix but the use of free software to train LLMs, since that software can be obtained directly via other mechanisms anyway, even if Guix is not in the loop?

Re: Next Steps For the Software Heritage Problem

2024-06-18 Thread MSavoritias
On Tue, 18 Jun 2024 12:21:33 -0400 Greg Hogan wrote: > On Tue, Jun 18, 2024 at 4:37 AM MSavoritias > wrote: > > > > 1. Add a clear disclaimer/requirement that any new package that is > > added in Guix, the person has to give consent or get consent from > > the person that the package is written

Re: Next Steps For the Software Heritage Problem

2024-06-18 Thread Greg Hogan
On Tue, Jun 18, 2024 at 4:37 AM MSavoritias wrote: > > 1. Add a clear disclaimer/requirement that any new package that is added > in Guix, the person has to give consent or get consent from the person > that the package is written by. This needs to be added in the docs and > in the email

Re: Next Steps For the Software Heritage Problem

2024-06-18 Thread Ian Eure
Hi MSavoritias, Thank you for the email. I’m going to lay out this situation as clearly as I can, in the hope that others will better understand, and hopefully treat it with the seriousness it deserves. 1. Guix requests SWH to archive some source code. This is fine. 2. SWH archives the

Next Steps For the Software Heritage Problem

2024-06-18 Thread MSavoritias
Hello, Context: As you may already know, there have been discussions around Software Heritage and the LLM they are collaborating with for a bit now. The model itself was announced at https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/ As I have started writing some