FYI, there's an open letter requesting a 6-month pause on AI development
<https://futureoflife.org/open-letter/pause-giant-ai-experiments/>, with
reasonable arguments (in my opinion) and signed by several big names too.
The basic rationale, as I understand it, is that, as with human cloning,
human germline modification, gain-of-function research, and other
world-changing and potentially dangerous technologies, there should be
some kind of procedure to ensure that safety keeps pace with development,
which the current AI race is not allowing.

On Sun, Mar 19, 2023 at 5:20 AM Kimmo Virtanen <kimmo.virta...@wikimedia.fi>
wrote:

> Or, maybe just require an open disclosure of where the bot pulled from and
>> how much, instead of having it be a black box? "Text in this response
>> derived from: 17% Wikipedia article 'Example', 12% Wikipedia article
>> 'SomeOtherThing', 10%...".
>
>
> Current systems (i.e. ChatGPT) don't work that way, as the source of the
> information is lost when it is encoded into the model. The model is just
> a network of probabilities, highly compressed compared to the original
> data. We miss the point if we treat it as a copy of the source data
> rather than as a tool for interacting with information in natural
> language.
>
> Soon, tools will be able to retrieve data from external sources and
> write answers based on them [1]. In the Wikipedia context, for example,
> this would mean using a search engine to find information automatically,
> summarizing the findings, and generating references for the results; or,
> vice versa, retrieving information from Wikipedia or Wikidata. Then we
> would get the source data too, but the LLM's internal reasoning would
> still be fuzzy.
>
> [1] https://interconnected.org/home/2023/03/16/singularity
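>
> To make the idea concrete, a retrieval-augmented answer loop might look
> roughly like the sketch below. This is only an illustration: the
> search_wikipedia and llm_summarize helpers are hypothetical placeholders,
> not any specific product's API.
>
>     def answer_with_sources(question, search_wikipedia, llm_summarize):
>         # Retrieve candidate passages from an external source instead
>         # of relying only on what is encoded in the model's weights.
>         passages = search_wikipedia(question, limit=3)
>
>         # Let the model summarize the retrieved text.
>         context = "\n\n".join(p["extract"] for p in passages)
>         answer = llm_summarize(question=question, context=context)
>
>         # Return explicit references alongside the answer -- the part
>         # a bare language model cannot reconstruct on its own.
>         references = [p["url"] for p in passages]
>         return answer, references
>
> The references come from the retrieval step, not from the model, which
> is why this pattern can supply source data even though the model's
> internal reasoning stays opaque.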
>
> Br,
> -- Kimmo Virtanen
>
>
> On Sun, Mar 19, 2023 at 8:24 AM Todd Allen <toddmal...@gmail.com> wrote:
>
>> Or, maybe just require an open disclosure of where the bot pulled from
>> and how much, instead of having it be a black box? "Text in this response
>> derived from: 17% Wikipedia article 'Example', 12% Wikipedia article
>> 'SomeOtherThing', 10%...".
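>>
>> For illustration only, such a disclosure could be a small structured
>> record attached to each response. The field names below are invented
>> for the sake of the example, not an existing API, and how an operator
>> would actually compute the shares is of course the hard part:
>>
>>     from dataclasses import dataclass
>>
>>     @dataclass
>>     class SourceShare:
>>         title: str   # e.g. the Wikipedia article title "Example"
>>         share: float # estimated fraction of the response derived from it
>>
>>     def format_disclosure(shares):
>>         # Render the "17% Wikipedia article 'Example', ..." line.
>>         parts = [
>>             f"{round(s.share * 100)}% Wikipedia article '{s.title}'"
>>             for s in sorted(shares, key=lambda s: s.share, reverse=True)
>>         ]
>>         return "Text in this response derived from: " + ", ".join(parts)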
>>
>> On Sat, Mar 18, 2023 at 10:17 PM Steven Walling <steven.wall...@gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Sat, Mar 18, 2023 at 3:49 PM Erik Moeller <eloque...@gmail.com>
>>> wrote:
>>>
>>>> On Fri, Mar 17, 2023 at 7:05 PM Steven Walling <
>>>> steven.wall...@gmail.com> wrote:
>>>>
>>>> > IANAL of course, but to me this implies that the *egregious* lack of
>>>> > attribution in models that rely substantially on Wikipedia violates
>>>> > the Attribution requirements of CC licenses.
>>>>
>>>> Morally, I agree that companies like OpenAI would do well to recognize
>>>> and nurture the sources they rely upon in training their models.
>>>> Especially as the web becomes polluted with low quality AI-generated
>>>> content, it would seem in everybody's best interest to sustain the
>>>> communities and services that make and keep high quality information
>>>> available. Not just Wikimedia, but also the Internet Archive, open
>>>> access journals and preprint servers, etc.
>>>>
>>>> Legally, it seems a lot murkier. OpenAI in particular does not
>>>> distribute any of its GPT models. You can feed them prompts by various
>>>> means, and get responses back. Do those responses plagiarize
>>>> Wikipedia?
>>>>
>>>> With image-generating models like Stable Diffusion, it's been found
>>>> that the models sometimes generate output nearly indistinguishable
>>>> from source material [1]. I don't know if similar studies have been
>>>> undertaken for text-generating models yet. You can certainly ask GPT-4
>>>> to generate something that looks like a Wikipedia article -- here are
>>>> example results for generating a random Wikipedia article:
>>>>
>>>> Article: https://en.wikipedia.org/wiki/The_Talented_Mr._Ripley_(film)
>>>> GPT-4 run 1: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/1
>>>> (cut off at the ChatGPT generation limit)
>>>> GPT-4 run 2: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/2
>>>> GPT-4 run 3: https://en.wikipedia.org/wiki/User:Eloquence/GPT4_Example/3
>>>>
>>>> It imitates the form of a Wikipedia article & mixes up / makes up
>>>> assertions, but I don't know that any of its generations would meet
>>>> the standard of infringing on the Wikipedia article's copyright. IANAL
>>>> either, and as you say, the legal landscape is evolving rapidly.
>>>>
>>>> Warmly,
>>>> Erik
>>>
>>>
>>> The whole thing is definitely a hot mess. If the remixing/transformation
>>> by the model is a derivative work, it means OpenAI is potentially
>>> violating the ShareAlike requirement by not distributing the text output
>>> as CC. But on the other hand, the nature of the model means they’re
>>> combining CC and non-free works freely / at random, unless a court were
>>> to interpret whatever percentage of the training data comes from us as
>>> the direct degree to which the model output is derived from Wikipedia.
>>> Either way, it’s going to be up to some legal representation of the
>>> copyright holders to test the boundaries here.
>>>
>>>
>>>> [1]
>>>> https://arstechnica.com/information-technology/2023/02/researchers-extract-training-images-from-stable-diffusion-but-its-difficult/
>>>
>>
>
_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/MNT3ADJBO5QCOUPDF53NDIXZGTFP6W44/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org
