Hi,
On 2019-05-21 23:52, Paul Wise wrote:
> Has anyone repeated the training of Mozilla DeepSpeech for example?
By chance I found, in a pile of papers that attack AI models, a paper
showing that Berkeley researchers have successfully attacked DeepSpeech:
https://arxiv.org/pdf/1801.01944.pdf
IMHO
On Thu, 23 May 2019, Andy Simpkins wrote:
> Your wording "The model /should/ be reproducible with a fixed random
> seed." feels correct but wonder if guidance notes along the following
> lines should be added?
Reproducing exact results from a deep learning model which requires
extensive
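The "fixed random seed" idea being discussed can be sketched with a toy, self-contained training loop (NumPy-based; the function and names here are hypothetical illustrations, not from the thread): if every source of randomness is derived from one seed and the update rule is deterministic, two runs on the same machine produce bit-identical weights.

```python
import numpy as np

def train_tiny_model(seed):
    """Toy gradient-descent fit where all randomness flows from one seed."""
    rng = np.random.default_rng(seed)   # single seeded generator
    X = rng.normal(size=(64, 8))        # synthetic inputs
    true_w = rng.normal(size=(8,))
    y = X @ true_w                      # synthetic targets
    w = rng.normal(size=(8,))           # random initialisation
    for _ in range(200):                # deterministic update rule
        w -= 0.1 * (X.T @ (X @ w - y)) / len(X)
    return w

w1 = train_tiny_model(42)
w2 = train_tiny_model(42)
assert np.array_equal(w1, w2)  # bit-identical on the same machine
```

This only demonstrates same-machine determinism; across different hardware or BLAS libraries, floating-point results may still diverge.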
Hi PICCA,
On 2019-05-24 12:01, PICCA Frederic-Emmanuel wrote:
> What about ibm power9 with pocl ?
>
> it seems that this is better than the latest NVIDIA GPU.
The typical workload for training neural networks is linear
operations such as general matrix-matrix multiplication and
convolution.
I
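The claim above, that the training workload reduces to linear operations like GEMM and convolution, can be illustrated with a minimal sketch (NumPy; all dimensions are arbitrary examples): a fully connected layer's forward pass is literally one general matrix-matrix multiplication.

```python
import numpy as np

# Forward pass of a dense layer: activations (batch x in) times
# weights (in x out) is one GEMM, the operation GPUs accelerate.
batch, n_in, n_out = 32, 784, 128
rng = np.random.default_rng(0)
x = rng.normal(size=(batch, n_in))   # a batch of inputs
W = rng.normal(size=(n_in, n_out))   # layer weights
b = np.zeros(n_out)                  # bias

y = x @ W + b                        # the GEMM dominating training time
assert y.shape == (batch, n_out)
```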
Hi Paul,
On 2019-05-24 11:50, Paul Wise wrote:
> On Fri, 2019-05-24 at 03:14 -0700, Mo Zhou wrote:
>
>> Non-free nvidia driver is inevitable.
>> AMD GPUs and OpenCL are not sane choices.
>
> So no model which cannot be CPU-trained is suitable for Debian main.
I've already pointed out that 1
Hi Adam,
On 2019-05-24 10:19, Adam Borowski wrote:
>
> I'm not so sure this model would be unacceptable. It's no different than
> a game's image being a photo of a tree in your garden -- not reproducible by
> anyone but you (or someone you invite). Or, a wordlist frequency produced
> by
On Fri, 2019-05-24 at 10:43 -0400, Sam Hartman wrote:
> I wonder whether we'd accept a developer's assertion that some large pdf
> in a source package could be rebuilt without actually rebuilding it on
> every upload.
As I understand it, ftp-master policy is that things in main be
buildable
On Fri, May 24, 2019 at 10:43:34AM -0400, Sam Hartman wrote:
> I wonder whether we'd accept a developer's assertion that some large pdf
> in a source package could be rebuilt without actually rebuilding it on
> every upload.
> I think we probably would.
I don't think so, actually, and AFAIK, we
> "Paul" == Paul Wise writes:
Paul> On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote:
>> So for deep learning models we would require that they be
>> retrainable and typically require that we have retrained them.
Paul> I don't think it is currently feasible for Debian to
On Fri, 2019-05-24 at 03:14 -0700, Mo Zhou wrote:
> Non-free nvidia driver is inevitable.
> AMD GPUs and OpenCL are not sane choices.
So no model which cannot be CPU-trained is suitable for Debian main.
> Don't doubt. Nouveau can never support CUDA well.
There is coriander but nouveau doesn't
On Thu, May 23, 2019 at 11:37:41PM -0700, Mo Zhou wrote:
> - The datasets used for training a "ToxicCandy" may be
> private/non-free and not everybody can access them. (This case is more
> likely a result of problematic upstream licensing, but it sometimes
> happens).
>
> One got a free
On 2019-05-24 15:59, Paul Wise wrote:
> On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote:
>
>> So for deep learning models we would require that they be retrainable
>> and typically require that we have retrained them.
>
> I don't think it is currently feasible for Debian to retrain the
>
On Fri, May 24, 2019 at 1:58 AM Sam Hartman wrote:
> So for deep learning models we would require that they be retrainable
> and typically require that we have retrained them.
I don't think it is currently feasible for Debian to retrain the
models. I don't think we have any buildds with GPUs
On 2019-05-23 17:58, Sam Hartman wrote:
> So for deep learning models we would require that they be retrainable
> and typically require that we have retrained them.
Two difficulties make the above point hard to achieve:
Hi Andy,
On 2019-05-23 17:52, Andy Simpkins wrote:
> Sam.
> Whilst i agree that "assets" in some packages may not have sources
> with them and the application may still be in main if it pulls in
> those assets from contrib or non free.
> I am trying to suggest the same thing here. If the data
Hi Sam,
On 2019-05-23 15:33, Sam Hartman wrote:
> I don't think that's entirely true.
Yes, that's a bit cruel to upstream.
> Reproducibility is still an issue, but is no more or less an issue than
> with any other software.
Bit-by-bit reproducibility is not quite practical for now. The
refined
Hi Andy,
Thanks for your comments.
On 2019-05-23 09:28, Andy Simpkins wrote:
> Your wording "The model /should/ be reproducible with a fixed random seed."
> feels
> correct but wonder if guidance notes along the following lines should be
> added?
>
> *unless* we can reproduce the same
Hi,
On 2019-05-22 12:43, Sam Hartman wrote:
> So, I think it's problematic to apply old assumptions to new areas. The
> reproducible builds world has gotten a lot further with bit-for-bit
> identical builds than I ever imagined they would.
I overhauled the reproducibility section. And lowered
Sam.
Whilst I agree that "assets" in some packages may not have sources with them
and the application may still be in main if it pulls in those assets from
contrib or non free.
I am trying to suggest the same thing here. If the data set is unknown, this is
the *same* as a dependency on a
> "Andy" == Andy Simpkins writes:
Andy> wouldn't put that in main. It is my belief that we consider
Andy> training data sets as 'source' in much the same way /Andy
I agree that we consider training data sets as source.
We require the binaries we ship to be buildable from
> "Andy" == Andy Simpkins writes:
Andy> *unless* we can reproduce the same results, from the same
Andy> training data, you cannot classify as group 1, "Free
Andy> Model", because verification that training has been
Andy> carried out on the dataset explicitly
On 22/05/2019 03:53, Mo Zhou wrote:
Hi Tzafrir,
On 2019-05-21 19:58, Tzafrir Cohen wrote:
Is there a way to prove in some way (reproducible build or something
similar) that the results were obtained from that set using the specific
algorithm?
I wrote a dedicated section about
> "Mo" == Mo Zhou writes:
Mo> Hi Holger, Yes, that section is about bit-by-bit
Mo> reproducibility, and identical hashsum is expected. Let's call
Mo> it "Bit-by-Bit reproducible".
Mo> I updated that section to make the definition of "reproducible"
Mo> explicit. And the
On Wed, May 22, 2019 at 03:35:20AM -0700, Mo Zhou wrote:
> Yes, that section is about bit-by-bit reproducibility,
> and identical hashsum is expected. Let's call it
> "Bit-by-Bit reproducible".
cool!
> I updated that section to make the definition
> of "reproducible" explicit.
thank you!
>
Hi Holger,
Yes, that section is about bit-by-bit reproducibility,
and identical hashsum is expected. Let's call it
"Bit-by-Bit reproducible".
I updated that section to make the definition
of "reproducible" explicit. And the strongest one
is discussed by default.
However, I'm not sure whether
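The "Bit-by-Bit reproducible" notion defined above can be checked mechanically: hash the serialized weights from two runs and compare digests. A minimal sketch with NumPy and hashlib (the toy training function is a hypothetical stand-in for a real training run):

```python
import hashlib
import numpy as np

def train(seed):
    """Toy deterministic training: seeded init, fixed update rule."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=(16,))
    for _ in range(100):
        w -= 0.01 * w               # deterministic shrinkage step
    return w

def hashsum(w):
    """Bit-by-bit reproducibility: identical weights give identical digests."""
    return hashlib.sha256(np.ascontiguousarray(w).tobytes()).hexdigest()

assert hashsum(train(7)) == hashsum(train(7))
```

Two independent rebuilds would publish and compare such digests, exactly as the reproducible-builds workflow does for binary packages.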
On Tue, May 21, 2019 at 07:53:34PM -0700, Mo Zhou wrote:
> I wrote a dedicated section about reproducibility:
> https://salsa.debian.org/lumin/deeplearning-policy#neural-network-reproducibility
nice, very!
Though you don't specify what 'reproducible' means. Given your last line
in this email (see
Hi Tzafrir,
On 2019-05-21 19:58, Tzafrir Cohen wrote:
> Is there a way to prove in some way (reproducible build or something
> similar) that the results were obtained from that set using the specific
> algorithm?
I wrote a dedicated section about reproducibility:
Hi Paul,
On 2019-05-21 23:52, Paul Wise wrote:
> Are there any other case studies we could add?
Anybody is welcome to open an issue and add more
cases to the document. I can dig into them in the
future.
> Has anyone repeated the training of Mozilla DeepSpeech for example?
Generally speaking,
Hi Ben,
Good catch! I'm quite sure that the 3 categories do not overlap
with each other. And I've fixed the language to make it logically
correct:
A **ToxicCandy Model** refers to an explicitly free-software-licensed
model trained from an unknown or non-free dataset.
A model is
On Tue, 2019-05-21 at 03:14 -0700, Mo Zhou wrote:
> They are added to the case study section.
Are there any other case studies we could add?
Has anyone repeated the training of Mozilla DeepSpeech for example?
Are deep learning models deterministically and reproducibly trainable?
If I re-train
On Tue, 2019-05-21 at 00:11 -0700, Mo Zhou wrote:
[...]
> People do lazy execution on this problem. Now that a
> related package entered my packaging radar, and I think
> I'd better write a draft and shed some light on a safety
> area. Then here is the first humble attempt:
>
>
Hi,
On 21/05/2019 12:07, Andreas Tille wrote:
> If you ask me bothering buildd with this task is insane. However I'm
> positively convinced that we should ship the training data and be able
> to train the models from these.
>
Is there a way to prove in some way (reproducible build or
Hi Paul,
They are added to the case study section. And I like
that question from ffmpeg-devel:
Where is the source for all those numbers?
On 2019-05-21 08:02, Paul Wise wrote:
> On Tue, May 21, 2019 at 3:11 PM Mo Zhou wrote:
>
>> I'd better write a draft and shed some light on a safety
>>
Hi Mo,
thanks again for all your effort for Deep Learning in Debian.
Please note that I'm not competent in this field.
On Tue, May 21, 2019 at 12:11:14AM -0700, Mo Zhou wrote:
>
> https://salsa.debian.org/lumin/deeplearning-policy
> (issue tracker is enabled)
Not sure whether this is
On Tue, May 21, 2019 at 3:11 PM Mo Zhou wrote:
> I'd better write a draft and shed some light on a safety
> area. Then here is the first humble attempt:
>
> https://salsa.debian.org/lumin/deeplearning-policy
The policy looks good to me.
A couple of situations related to this policy:
Hi people,
A year ago I raised a topic on -devel, pointing out the
"deep learning v.s. software freedom" issue. We drew no
conclusion at that time, and Linux distros that care about
software freedom may still have doubts about some fundamental
problems, e.g. "is this piece of deep learning software