[Wikimedia-l] Re: Attribution of specific Wikipedia articles as sources of an LLM's output (Was: Bing-ChatGPT)

2023-09-16 Thread Maryana Pinchuk
Hi Lauren,

Thanks for attending the Future Audiences community call this week, and 
apologies that we ran out of time before I was able to answer your question 
about the ChatGPT plugin next steps – answering now :)

You wrote:
“As for the Foundation’s ChatGPT plugin, I’m afraid I find it mostly unusable 
because it ignores everything after the first dozen paragraphs of all articles. 
That was listed as needing 3-4 days to fix on 
https://phabricator.wikimedia.org/T343932 a month ago. Do you know whether 
there are any plans to go ahead with that fix?”

First: thank you for enabling the plugin and trying it out! We've just added 
a link to a survey that we hope will give us more feedback on whether and 
how the current plugin is meeting user needs. If you haven't already done 
so, I'd appreciate it if you could take a few minutes to fill it out (you 
should see it in the footer of the plugin response).

On your specific point about the 12-paragraph cutoff: yes, we recognize that 
this may lead to the plugin not finding relevant information in some cases 
(though from our qualitative coding of about 300 query responses in 6 
languages, the plugin was able to return relevant information about 84% of the 
time[1]). However, as I tried to stress in my blog post,[2] this plugin is 
intended to be a quick experiment and not a fully-featured, permanently 
maintained product. We're tracking many possible optimizations: changes to 
the 12-paragraph cutoff, showing references, improving output quality, and 
more. But in order to go deep on optimization, we first need compelling 
evidence that we *should* invest in a product like this, long-term, on an 
external platform that we don't manage – because making that investment 
would require more resources (i.e., an actual feature development team, not 
just part-time R&D), which would be a nontrivial change to our annual plan. 
As the WMF staffer who would be making the case for that investment 
internally and to the community: while there are definitely more things we 
can learn from the plugin, and it's always possible that usage of ChatGPT 
and the plugin may take off wildly, I don't personally feel comfortable 
making that recommendation at this moment. (All that said, we may be able 
to get some more ad hoc R&D resources to make optimization tweaks as we try 
to learn more – that's what I'm currently aiming for, so stay tuned!)

Please let me know if you have more questions or want to talk about plugin 
specifics, on-list, via email, or onwiki (the project is on Meta here: 
https://meta.wikimedia.org/wiki/Talk:Future_Audiences)!

Best,
Maryana

1. https://meta.wikimedia.org/wiki/Future_Audiences/Experiments:_conversational/generative_AI#Preliminary_results
2. https://diff.wikimedia.org/2023/07/13/exploring-paths-for-the-future-of-free-knowledge-new-wikipedia-chatgpt-plugin-leveraging-rich-media-social-apps-and-other-experiments/

[Wikimedia-l] Re: Attribution of specific Wikipedia articles as sources of an LLM's output (Was: Bing-ChatGPT)

2023-09-10 Thread Lauren Worden
Hi Tilman, I appreciate your detailed and thoughtful response.

Are you suggesting that we may never be able to attribute information
from LLMs any better than in the Anthropic paper? What is your opinion
of other approaches such as https://arxiv.org/abs/2303.14186 ?
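
If I have the right paper (TRAK), my understanding is that methods in
that family score each training example by how strongly its loss
gradient aligns with the loss gradient of the output being attributed,
with random projections to keep the computation tractable at scale.
Here is a toy sketch of the core idea in PyTorch, using a tiny linear
model purely for illustration (nothing like the paper's actual
implementation):

    import torch

    torch.manual_seed(0)
    model = torch.nn.Linear(4, 1)
    loss_fn = torch.nn.MSELoss()

    def loss_grad(x, y):
        # Gradient of the loss on a single example, flattened to a vector.
        model.zero_grad()
        loss_fn(model(x), y).backward()
        return torch.cat([p.grad.flatten() for p in model.parameters()])

    train = [(torch.randn(4), torch.randn(1)) for _ in range(8)]
    test_x, test_y = torch.randn(4), torch.randn(1)

    g_test = loss_grad(test_x, test_y)
    # Rank training examples by gradient alignment with the test example;
    # high scores suggest high influence on this particular output.
    scores = [torch.dot(loss_grad(x, y), g_test).item() for x, y in train]
    print(sorted(range(len(scores)), key=lambda i: -scores[i]))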

Relatedly, I wonder whether you agree that the first paragraph of the
Background and Related Work section on page 2 of
https://arxiv.org/pdf/2205.10770.pdf and
https://bair.berkeley.edu/blog/2020/12/20/lmmem/ suggest that LLMs
rote-memorize portions of their training data? If so, do you believe
that has any implications for the feasibility of attribution? And,
perhaps more to your point in reviving this thread, what do you think
it implies about copyright law?
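
That kind of memorization is straightforward to probe informally: give
a model a prefix from a widely reproduced text and check whether greedy
decoding returns the original continuation verbatim. A minimal sketch
using the Hugging Face transformers library, with an arbitrary small
model and passage chosen purely for illustration (not drawn from those
papers):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prefix = "We hold these truths to be self-evident, that all men"
    inputs = tokenizer(prefix, return_tensors="pt")
    # Greedy decoding: a verbatim match with the source text is evidence
    # of rote memorization rather than paraphrase.
    output = model.generate(**inputs, max_new_tokens=20, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))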

> we can't enforce citing sources - [[WP:BURDEN]] - as a legal requirement 
> (like we do for [[WP:COPYPASTE]]). This should be kept in mind by folks who 
> advocate for a moral or even legal obligation for LLMs to "cite their 
> sources" for their output (like earlier in this thread: "just require an 
> open disclosure of where the bot pulled from and how much").

Do you believe that, just because we can't force people to do what we
want, we shouldn't ask them to, or bargain with the options available
to us if they decline?

As for the Foundation's ChatGPT plugin, I'm afraid I find it mostly
unusable because it ignores everything after the first dozen
paragraphs of all articles. That was listed as needing 3-4 days to fix
on https://phabricator.wikimedia.org/T343932 a month ago. Do you know
whether there are any plans to go ahead with that fix?

- LW



On Thu, Sep 7, 2023 at 2:09 AM Tilman Bayer  wrote:
>
> TL;DR: It was previously claimed on this list that it's generally technically 
> possible to attribute information in the output of an LLM-based chatbot (such 
> as ChatGPT) to specific parts of the LLM's training data (such as a Wikipedia 
> article). These claims are dubious and we shouldn't rely on them as we 
> continue to navigate the relations between Wikimedia projects and LLMs.
>
> On Sun, Mar 19, 2023 at 12:12 PM Lauren Worden  
> wrote:
> [...]
>>
>>
>> On Sun, Mar 19, 2023 at 1:20 AM Kimmo Virtanen
>>  wrote:
>> >
>> >> Or, maybe just require an open disclosure of where the bot pulled from 
>> >> and how much, instead of having it be a black box? "Text in this response 
>> >> derived from: 17% Wikipedia article 'Example', 12% Wikipedia article 
>> >> 'SomeOtherThing', 10%...".
>> >
>> > Current (i.e. ChatGPT) systems don't work that way, as the source of 
>> > information is lost when it is encoded into the model
>>
>> In fact, they do work that way, but it takes some effort to elucidate
>> the source of any given output. Anyone discussing these issues needs
>> to become familiar with ROME:
>> https://twitter.com/mengk20/status/1588581237345595394 Please see also
>> https://www.youtube.com/watch?v=_NMQyOu2HTo
>>
> I sense some confusion here. That paper (ROME, http://rome.baulab.info/ ) is 
> about attributing a model's factual claims to specific parts (weights, 
> neurons) of its neural network (and then changing them). It is *not* about 
> attribution to specific parts of its training data (such as Wikipedia 
> articles or other web pages), which is what Wikimedians have been expressing 
> concerns about.
> In other words, it's entirely unclear why this should contradict what Kimmo 
> had said (and, separately in this thread, Galder).
>
> (Trying to understand LLMs with analogies can be treacherous. But for people 
> who automatically assume that neural networks "do work that way" - i.e. 
> preserve this kind of provenance information - and that chatbots can be 
> required to disclose "where [they] pulled from and how much" for a particular 
> answer: Imagine someone accosting you in the street and asking you where you 
> had originally learned that Paris is the capital of France, say. How many of 
> us would be able to come up with a truthful answer like "our geography 
> teacher told us in third grade" or "I read this in Encyclopaedia Britannica 
> when I was 10 years old"?)
>
>> With luck we will all have the chance to discuss these issues in
>> detail on the March 23 Zoom discussion of large language models for
>> Wikimedia projects:
>> https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends#Open_call:_Artificial_Intelligence_in_Wikimedia
>>
> The notes from that meeting (now at 
> https://meta.wikimedia.org/wiki/Wikimedia_Foundation_Annual_Plan/2023-2024/Draft/External_Trends/Community_call_notes
>  ) contain the following statements:
>
>
> "In an ideal world, the Foundation would start internal projects to replicate 
> ROME and RARR."
> "The Foundation should make a public statement in support of increasing the 
> accuracy of attribution and verification systems such as RARR [ 
> https://arxiv.org/abs/2210.08726 ]"
>
>
> These proposals do not seem to have made it into