Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-29 Thread Mo Zhou
Hi, On 2019-05-21 23:52, Paul Wise wrote: > Has anyone repeated the training of Mozilla DeepSpeech for example? By chance I found, in a pile of papers that attack AI models, one showing that Berkeley researchers have successfully attacked DeepSpeech: https://arxiv.org/pdf/1801.01944.pdf IMHO

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-25 Thread Don Armstrong
On Thu, 23 May 2019, Andy Simpkins wrote: > Your wording "The model /should/be reproducible with a fixed random > seed." feels correct but wonder if guidance notes along the following > lines should be added? Reproducing exact results from a deep learning model which requires extensive

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-25 Thread Mo Zhou
Hi PICCA, On 2019-05-24 12:01, PICCA Frederic-Emmanuel wrote: > What about ibm power9 with pocl ? > > it seems that this is better than the latest NVIDIA GPU. The typical workload for training neural networks is dominated by linear-algebra operations such as general matrix-matrix multiplication and convolution. I
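
To make that point concrete, here is a minimal Python sketch (not taken from the thread; the layer sizes are arbitrary assumptions) showing that the forward pass of a single dense layer reduces to one general matrix-matrix multiplication:

    import numpy as np

    # Hypothetical sizes: a batch of 64 samples, 1024 input features, 4096 output features.
    batch, d_in, d_out = 64, 1024, 4096
    x = np.random.randn(batch, d_in).astype(np.float32)   # input activations
    w = np.random.randn(d_in, d_out).astype(np.float32)   # layer weights
    b = np.zeros(d_out, dtype=np.float32)                  # bias

    # The forward pass is one GEMM plus a cheap elementwise step.
    y = x @ w + b          # GEMM: (64 x 1024) times (1024 x 4096)
    y = np.maximum(y, 0)   # ReLU; negligible cost next to the GEMM

    # Rough multiply-add count for the GEMM alone.
    print("approx FLOPs:", 2 * batch * d_in * d_out)

Convolutions are commonly lowered to the same kind of GEMM (e.g. via im2col), which is why raw GPU throughput matters so much for training.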

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-25 Thread Mo Zhou
Hi Paul, On 2019-05-24 11:50, Paul Wise wrote: > On Fri, 2019-05-24 at 03:14 -0700, Mo Zhou wrote: > >> Non-free nvidia driver is inevitable. >> AMD GPUs and OpenCL are not sane choices. > > So no model which cannot be CPU-trained is suitable for Debian main. I've already pointed out that 1

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-25 Thread Mo Zhou
Hi Adam, On 2019-05-24 10:19, Adam Borowski wrote: > > I'm not so sure this model would be unacceptable. It's no different than > a game's image being a photo of a tree in your garden -- not reproducible by > anyone but you (or someone you invite). Or, a wordlist frequency produced > by

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Paul Wise
On Fri, 2019-05-24 at 10:43 -0400, Sam Hartman wrote: > I wonder whether we'd accept a developer's assertion that some large pdf > in a source package could be rebuilt without actually rebuilding it on > every upload. As I understand it, ftp-master policy is that things in main be buildable

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Holger Levsen
On Fri, May 24, 2019 at 10:43:34AM -0400, Sam Hartman wrote: > I wonder whether we'd accept a developer's assertion that some large pdf > in a source package could be rebuilt without actually rebuilding it on > every upload. > I think we probably would. I don't think so, actually, and AFAIK we

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Sam Hartman
> "Paul" == Paul Wise writes: Paul> On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote: >> So for deep learning models we would require that they be >> retrainable and typically require that we have retrained them. Paul> I don't think it is currently feasible for Debian to

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Paul Wise
On Fri, 2019-05-24 at 03:14 -0700, Mo Zhou wrote: > Non-free nvidia driver is inevitable. > AMD GPUs and OpenCL are not sane choices. So no model which cannot be CPU-trained is suitable for Debian main. > Don't doubt. Nouveau can never support CUDA well. There is coriander but nouveau doesn't

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Adam Borowski
On Thu, May 23, 2019 at 11:37:41PM -0700, Mo Zhou wrote: > - The datasets used for training a "ToxicCandy" may be > private/non-free and not everybody can access them. (This case is more > likely a result of problematic upstream licensing, but it sometimes > happens). > > One got a free

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
On 2019-05-24 15:59, Paul Wise wrote: > On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote: > >> So for deep learning models we would require that they be retrainable >> and typically require that we have retrained them. > > I don't think it is currently feasible for Debian to retrain the >

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Paul Wise
On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote: > So for deep learning models we would require that they be retrainable > and typically require that we have retrained them. I don't think it is currently feasible for Debian to retrain the models. I don't think we have any buildds with GPUs

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
On 2019-05-23 17:58, Sam Hartman wrote: > So for deep learning models we would require that they be retrainable > and typically require that we have retrained them. The following two difficulties make the above point hard to achieve:

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
Hi Andy, On 2019-05-23 17:52, Andy Simpkins wrote: > Sam. > Whilst i agree that "assets" in some packages may not have sources > with them and the application may still be in main if it pulls in > those assets from contrib or non free. > I am trying to suggest the same thing here. If the data

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
Hi Sam, On 2019-05-23 15:33, Sam Hartman wrote: > I don't think that's entirely true. Yes, that's a bit cruel to upstream. > Reproducibility is still an issue, but is no more or less an issue than > with any other software. Bit-by-bit reproducibility is not quite practical for now. The refined

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-24 Thread Mo Zhou
Hi Andy, Thanks for your comments. On 2019-05-23 09:28, Andy Simpkins wrote: > Your wording "The model /should/ be reproducible with a fixed random seed." > feels > correct but wonder if guidance notes along the following lines should be > added? > > *unless* we can reproduce the same
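
For concreteness, a minimal sketch (not part of the thread; it assumes a PyTorch-based trainer, and the exact knobs vary by framework and version) of what "a fixed random seed" amounts to in practice:

    import random
    import numpy as np
    import torch

    def fix_seeds(seed: int = 0) -> None:
        """Pin every RNG the training run touches; a prerequisite for reproducibility."""
        random.seed(seed)
        np.random.seed(seed)
        torch.manual_seed(seed)          # also seeds CUDA RNGs if a GPU is present
        torch.backends.cudnn.benchmark = False
        # Ask for deterministic kernels; ops without one will warn instead of failing.
        torch.use_deterministic_algorithms(True, warn_only=True)

    fix_seeds(42)

Even with all of this, kernels without a deterministic implementation, or differences in hardware and library versions, can still break bit-for-bit reproducibility, which is the caveat raised elsewhere in the thread.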

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-23 Thread Mo Zhou
Hi, On 2019-05-22 12:43, Sam Hartman wrote: > So, I think it's problematic to apply old assumptions to new areas. The > reproducible builds world has gotten a lot further with bit-for-bit > identical builds than I ever imagined they would. I overhauled the reproducibility section. And lowered

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-23 Thread Andy Simpkins
Sam. Whilst I agree that "assets" in some packages may not have sources with them, and the application may still be in main if it pulls in those assets from contrib or non-free, I am trying to suggest the same thing here. If the data set is unknown this is the *same* as a dependency on a

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-23 Thread Sam Hartman
> "Andy" == Andy Simpkins writes: Andy> wouldn't put that in main. It is my belief that we consider Andy> training data sets as 'source' in much the same way /Andy I agree that we consider training data sets as source. We require the binaries we ship to be buildable from

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-23 Thread Sam Hartman
> "Andy" == Andy Simpkins writes: Andy>     *unless* we can reproduce the same results, from the same Andy> training data,     you cannot classify as group 1, "Free Andy> Model", because verification that     training has been Andy> carried out on the dataset explicitly

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-23 Thread Andy Simpkins
On 22/05/2019 03:53, Mo Zhou wrote: > Hi Tzafrir, On 2019-05-21 19:58, Tzafrir Cohen wrote: >> Is there a way to prove in some way (reproducible build or something similar) that the results were obtained from that set using the specific algorithm? > I wrote a dedicated section about

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-22 Thread Sam Hartman
> "Mo" == Mo Zhou writes: Mo> Hi Holger, Yes, that section is about bit-by-bit Mo> reproducibility, and identical hashsum is expected. Let's call Mo> it "Bit-by-Bit reproducible". Mo> I updated that section to make the definition of "reproducible" Mo> explicit. And the

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-22 Thread Holger Levsen
On Wed, May 22, 2019 at 03:35:20AM -0700, Mo Zhou wrote: > Yes, that section is about bit-by-bit reproducibility, > and identical hashsum is expected. Let's call it > "Bit-by-Bit reproducible". cool! > I updated that section to make the definition > of "reproducible" explicit. thank you! >

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-22 Thread Mo Zhou
Hi Holger, Yes, that section is about bit-by-bit reproducibility, and an identical hashsum is expected. Let's call it "Bit-by-Bit reproducible". I updated that section to make the definition of "reproducible" explicit, and the strongest one is discussed by default. However, I'm not sure whether
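
The "identical hashsum" criterion can be checked mechanically. A minimal sketch (not from the thread; the file names are hypothetical) that compares the artifacts of two independent training runs bit for bit:

    import hashlib

    def sha256_of(path: str) -> str:
        """SHA-256 of a file, read in chunks so large model files fit in memory."""
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    # Hypothetical paths: weights written by two retraining runs with the same seed.
    run_a = sha256_of("model-run-a.bin")
    run_b = sha256_of("model-run-b.bin")
    print("bit-for-bit reproducible" if run_a == run_b else "runs differ")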

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-22 Thread Holger Levsen
On Tue, May 21, 2019 at 07:53:34PM -0700, Mo Zhou wrote: > I wrote a dedicated section about reproducibility: > https://salsa.debian.org/lumin/deeplearning-policy#neural-network-reproducibility nice, very! Though you don't specify what 'reproducible' means. Given your last line in this email (see

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Mo Zhou
Hi Tzafrir, On 2019-05-21 19:58, Tzafrir Cohen wrote: > Is there a way to prove in some way (reproducible build or something > similar) that the results were obtained from that set using the specific > algorithm? I wrote a dedicated section about reproducibility:

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Mo Zhou
Hi Paul, On 2019-05-21 23:52, Paul Wise wrote: > Are there any other case studies we could add? Anybody is welcome to open an issue and add more cases to the document. I can dig into them in the future. > Has anyone repeated the training of Mozilla DeepSpeech for example? Generally speaking,

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Mo Zhou
Hi Ben, Good catch! I'm quite sure that the 3 categories do not overlap with each other. And I've fixed the language to make it logically correct: A **ToxicCandy Model** refers to an explicitly free-software-licensed model trained from an unknown or non-free dataset. A model is

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Paul Wise
On Tue, 2019-05-21 at 03:14 -0700, Mo Zhou wrote: > They are added to the case study section. Are there any other case studies we could add? Has anyone repeated the training of Mozilla DeepSpeech for example? Are deep learning models deterministically and reproducibly trainable? If I re-train

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Ben Hutchings
On Tue, 2019-05-21 at 00:11 -0700, Mo Zhou wrote: [...] > People do lazy execution on this problem. Now that a > related package entered my packaging radar, and I think > I'd better write a draft and shed some light on a safety > area. Then here is the first humble attempt: > >

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Tzafrir Cohen
Hi, On 21/05/2019 12:07, Andreas Tille wrote: > If you ask me bothering buildd with this task is insane. However I'm > positively convinced that we should ship the training data and be able > to train the models from these. > Is there a way to prove in some way (reproducible build or

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Mo Zhou
Hi Paul, They are added to the case study section. And I like that question from ffmpeg-devel: Where is the source for all those numbers? On 2019-05-21 08:02, Paul Wise wrote: > On Tue, May 21, 2019 at 3:11 PM Mo Zhou wrote: > >> I'd better write a draft and shed some light on a safety >>

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Andreas Tille
Hi Mo, thanks again for all your effort on Deep Learning in Debian. Please note that I'm not competent in this field. On Tue, May 21, 2019 at 12:11:14AM -0700, Mo Zhou wrote: > > https://salsa.debian.org/lumin/deeplearning-policy > (issue tracker is enabled) Not sure whether this is

Re: Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Paul Wise
On Tue, May 21, 2019 at 3:11 PM Mo Zhou wrote: > I'd better write a draft and shed some light on a safety > area. Then here is the first humble attempt: > > https://salsa.debian.org/lumin/deeplearning-policy The policy looks good to me. A couple of situations related to this policy:

Bits from /me: A humble draft policy on "deep learning v.s. freedom"

2019-05-21 Thread Mo Zhou
Hi people, A year ago I raised a topic on -devel, pointing out the "deep learning v.s. software freedom" issue. We drew no conclusion at that time, and Linux distros that care about software freedom may still have doubts about some fundamental problems, e.g. "is this piece of deep learning software