[Wikimedia-l] Re: Chat GPT

Adam Sobieski Sat, 04 Feb 2023 06:24:23 -0800

Brainstorming on how to drive traffic to Wikimedia content from conversational 
media: UI/UX designers could provide menu items or buttons in chatbot 
applications or webpage components (e.g., to read more about the content, to 
navigate to cited resources, to edit the content, to discuss the content, to 
upvote/downvote the content, to share the content or the recent dialogue 
history on social media, to request review/moderation/curation of the content, 
etc.). Many of these menu items or buttons would operate contextually during 
dialogues, on the most recent (or otherwise selected) chatbot responses or on 
the recent transcript. Some of these features could also be made available to 
end-users via spoken-language commands.
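A minimal sketch of how a contextual button or spoken command might resolve what it acts on. The action names, the `Turn` record, and the `action_target` helper are hypothetical illustrations, not part of any existing chatbot or Wikimedia interface. The assumption is that transcript-level actions (such as sharing) use the last few turns, while other actions use either the selected turn or, by default, the latest chatbot response.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    # Hypothetical action names mirroring some of the buttons listed above.
    READ_MORE = "read_more"
    EDIT = "edit"
    UPVOTE = "upvote"
    SHARE_TRANSCRIPT = "share_transcript"


@dataclass
class Turn:
    speaker: str  # "user" or "bot" (assumed labels)
    text: str


def action_target(history: list[Turn], action: Action,
                  selected: int | None = None, window: int = 4) -> list[Turn]:
    """Return the dialogue turns that an action should operate on."""
    if action is Action.SHARE_TRANSCRIPT:
        # Transcript-level actions use the last `window` turns.
        return history[-window:] if window > 0 else []
    if selected is not None:
        # An explicitly selected turn takes precedence.
        return [history[selected]]
    # Otherwise fall back to the most recent chatbot response, if any.
    for turn in reversed(history):
        if turn.speaker == "bot":
            return [turn]
    return []
```

Keeping target resolution separate from the actions themselves would let buttons and voice commands share the same logic.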

At any point during hypertext-based dialogues, end-users would be able to 
navigate to Wikimedia content. These navigations could pass context using either 
URL query string arguments or HTTP POST. In either case, aggregate usage data, 
e.g., the dialogue contexts from which users navigated, could be useful.
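To illustrate the two transport options, this sketch encodes the same dialogue context either as query-string arguments on an article URL (GET) or as a form-encoded request body (POST). The parameter names (`source`, `turn`) are hypothetical; Wikimedia defines no such interface, and nothing is sent over the network here.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode

# Article base URL; the context parameters passed in are hypothetical.
ARTICLE_BASE = "https://en.wikipedia.org/wiki/"


def navigation_url(title: str, context: dict[str, str]) -> str:
    """GET variant: dialogue context carried as query-string arguments."""
    path = quote(title.replace(" ", "_"))
    return f"{ARTICLE_BASE}{path}?{urlencode(context)}"


def navigation_post_body(context: dict[str, str]) -> bytes:
    """POST variant: the same context, form-encoded as a request body."""
    return urlencode(context).encode("utf-8")


ctx = {"source": "chatbot", "turn": "3"}
print(navigation_url("Made in the Dark", ctx))
# https://en.wikipedia.org/wiki/Made_in_the_Dark?source=chatbot&turn=3
print(navigation_post_body(ctx))
# b'source=chatbot&turn=3'
```

GET keeps the context visible in the URL (easy to log, but also shared when the link is copied), while POST keeps it out of the address bar.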

The capability to perform A/B testing across chatbots’ dialogues, over large 
populations of end-users, could also be useful. In this way, Wikimedia would be 
better able to: (1) measure end-user engagement and satisfaction, (2) measure 
the quality of provided content, (3) perform personalization, (4) retain 
readers and editors. A/B testing could be performed by providing end-users with 
various feedback buttons (as described above). A/B testing data could also be 
obtained through data mining, analyzing end-users’ behaviors, response times, 
responses, and dialogue moves. These data could be made available to the 
community on special pages, per article, possibly by enhancing the “Page 
information” system. One can also envision these kinds of analytics data 
existing at the granularity of article sections or selected passages.
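One common way to run such A/B tests is deterministic hash-based bucketing, with feedback then tallied per article and variant. The sketch below assumes that technique; the experiment names, variant labels, and `(article, variant, vote)` event format are hypothetical, and it is not a description of any existing Wikimedia analytics system.

```python
from __future__ import annotations

import hashlib
from collections import Counter, defaultdict


def assign_variant(user_id: str, experiment: str,
                   variants: tuple[str, ...] = ("A", "B")) -> str:
    """Deterministically map a user to one variant of a given experiment."""
    # Hashing experiment + user keeps assignment stable across sessions
    # while letting different experiments split users independently.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    return variants[int.from_bytes(digest[:8], "big") % len(variants)]


def summarize_feedback(events):
    """Count feedback per (article, variant) from (article, variant, vote) tuples."""
    summary: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for article, variant, vote in events:
        summary[(article, variant)][vote] += 1
    return summary
```

Counts keyed by article could feed a per-article page; keying by section instead would give the finer granularity mentioned above.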

Best regards,

Adam

________________________________
From: Victoria Coleman <vstavridoucole...@gmail.com>
Sent: Saturday, February 4, 2023 8:10 AM
To: Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
Subject: [Wikimedia-l] Re: Chat GPT

Hi Christophe,

I had not thought about the threat to Wikipedia traffic from Chat GPT, but you 
have a good point. The success of the projects is always one step away from the 
next big disruption. So the WMF, as the tech provider for the mission (because, 
first and foremost, that's what the WMF is in my view, as well as the financial 
engine of the movement, of course), needs to pay attention and experiment to 
maintain the long-term viability of the mission. In fact, I think the cluster of 
our projects offers compelling options. For example, to your point below on 
datasets, we have the amazing Wikidata as well as the excellent work on Abstract 
Wikipedia. We have Wikipedia Enterprise, which has built some avenues of 
collaboration with big tech. A bold vision is needed to bring all of it 
together and build an MVP for the community to experiment with.

Best regards,

Victoria Coleman

On Feb 4, 2023, at 4:14 AM, Christophe Henner <christophe.hen...@gmail.com> 
wrote:

Hi,

On the product side, my biggest concern about NLP-based AI is that it would 
drastically decrease traffic to our websites/apps, which means fewer new editors 
and fewer donations.

So first from a strictly positioning perspective, we have here a major change 
that needs to be managed.

And to be honest, it will come faster than we think. We are perfectionists; I 
can assure you that most companies would be happy to launch a search product 
with 80% confidence in answer quality.

From a financial perspective, large industrial investments like this are 
usually a pool of money you can draw from over x years. You can expect they have 
not drawn all of it yet.

Second, GPT-3 and ChatGPT are far from being the most expensive products they 
have. On top of people, you need:
* datasets
* people to tag the dataset
* people to correct the algo
* computing power

I'm simplifying here, but we already have the capacity to muster some of that, 
which drastically lowers our costs :)

I would not so easily discard the option of the movement doing it. That being 
said, it would mean a new project requiring substantial resources.

Sent from my iPhone

On Feb 4, 2023, at 9:30 AM, Adam Sobieski <adamsobie...@hotmail.com> wrote:

With respect to cloud computing costs, these being a significant component of 
the costs to train and operate modern AI systems, as a non-profit organization, 
the Wikimedia Foundation might be interested in the National Research Cloud 
(NRC) policy proposal: https://hai.stanford.edu/policy/national-research-cloud .

"Artificial intelligence requires vast amounts of computing power, data, and 
expertise to train and deploy the massive machine learning models behind the 
most advanced research. But access is increasingly out of reach for most 
colleges and universities. A National Research Cloud (NRC) would provide 
academic and non-profit researchers with the compute power and government 
datasets needed for education and research. By democratizing access and equity 
for all colleges and universities, an NRC has the potential not only to unleash 
a string of advancements in AI, but to help ensure the U.S. maintains its 
leadership and competitiveness on the global stage.

"Throughout 2020, Stanford HAI led efforts with 22 top computer science 
universities along with a bipartisan, bicameral group of lawmakers proposing 
legislation to bring the NRC to fruition. On January 1, 2021, the U.S. Congress 
authorized the National AI Research Resource Task Force Act as part of the 
National Defense Authorization Act for Fiscal Year 2021. This law requires that 
a federal task force be established to study and provide an implementation 
pathway to create world-class computational resources and robust government 
datasets for researchers across the country in the form of a National Research 
Cloud. The task force will issue a final report to the President and Congress 
next year.

"The promise of an NRC is to democratize AI research, education, and 
innovation, making it accessible to all colleges and universities across the 
country. Without a National Research Cloud, all but the most elite universities 
risk losing the ability to conduct meaningful AI research and to adequately 
educate the next generation of AI researchers."

See also: [1][2]

[1] 
https://www.whitehouse.gov/ostp/news-updates/2023/01/24/national-artificial-intelligence-research-resource-task-force-releases-final-report/
[2] https://www.ai.gov/wp-content/uploads/2023/01/NAIRR-TF-Final-Report-2023.pdf

________________________________
From: Steven Walling <steven.wall...@gmail.com>
Sent: Saturday, February 4, 2023 1:59 AM
To: Wikimedia Mailing List <wikimedia-l@lists.wikimedia.org>
Subject: [Wikimedia-l] Re: Chat GPT

On Fri, Feb 3, 2023 at 9:47 PM Gergő Tisza <gti...@gmail.com> wrote:
Just to give a sense of scale: OpenAI started with a $1 billion donation, got 
another $1B as investment, and is now getting a larger investment from 
Microsoft (undisclosed but rumored to be $10B). Assuming they spent most of 
their previous funding, which seems likely, their operational costs are in the 
ballpark of $300 million per year. The idea that the WMF could just choose to 
create conversational software of a similar quality if it wanted seems detached 
from reality to me.
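A rough check of that ballpark, assuming (as the message does) that most of the $2B was spent, and assuming roughly seven years between OpenAI's founding in December 2015 and early 2023:

```latex
\text{annual cost} \approx \frac{\$1\,\text{B (donation)} + \$1\,\text{B (investment)}}{\approx 7\ \text{years}} \approx \$0.29\,\text{B/year} \approx \$290\ \text{million per year}
```

This is consistent with the $300 million figure; the estimate is only as good as the spend and timeline assumptions.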

Without spending billions on LLM development to aim for a conversational 
chatbot trying to pass a Turing test, we could definitely try to catch up to 
the state of the art in search results. Our search currently does a pretty bad 
job (in terms of recall especially). Today's featured article in English is the 
Hot Chip album "Made in the Dark", and if I enter anything but the exact 
article title the typeahead results are woefully incomplete or wrong. If I ask 
an actual question, good luck.
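The MediaWiki Action API at `https://en.wikipedia.org/w/api.php` exposes both prefix lookup (`action=opensearch`) and full-text search (`action=query&list=search`), so one way to compare recall for queries like the ones above is to build both requests side by side. This sketch only constructs the URLs; it does not assume the live search box uses exactly these endpoints, and the helper names are hypothetical.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def typeahead_url(prefix: str, limit: int = 10) -> str:
    """Prefix-completion request (action=opensearch), article namespace only."""
    params = {"action": "opensearch", "search": prefix, "limit": limit,
              "namespace": 0, "format": "json"}
    return f"{API}?{urlencode(params)}"


def fulltext_url(query: str, limit: int = 10) -> str:
    """Full-text search request (action=query, list=search)."""
    params = {"action": "query", "list": "search", "srsearch": query,
              "srlimit": limit, "format": "json"}
    return f"{API}?{urlencode(params)}"


print(typeahead_url("made in the dark", limit=5))
print(fulltext_url("hot chip album dark", limit=5))
```

Fetching both for a set of natural-language queries and checking whether the intended article appears would give a crude recall measurement.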

Google is feeling vulnerable to OpenAI here in part because everyone can see 
that their results are often full of low quality junk created for SEO, while 
ChatGPT just gives a concise answer right there.

https://en.wikipedia.org/wiki/The_Menu_(2022_film) is one of the top viewed 
English articles. If I search "The Menu reviews" the Google results are noisy 
and not so great. ChatGPT actually gives you nothing relevant because it 
doesn't know anything from 2022. If we could just display a three-sentence 
snippet from the article's critical response section, it would be awesome. It's 
too bad that the whole "knowledge engine" debacle poisoned the well when it 
comes to a Wikipedia search engine, because we could definitely learn a lot 
from what people like about ChatGPT and apply it to Wikipedia search.
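A minimal sketch of the snippet step, assuming the plain text of the relevant section has already been retrieved (how it is fetched is left open). The sentence splitter is deliberately naive: it breaks after `.`, `!`, or `?` followed by whitespace, so abbreviations such as "Dr." would cause false splits. The `snippet` helper is hypothetical.

```python
import re

# Split after sentence-ending punctuation followed by whitespace (naive).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def snippet(section_text: str, max_sentences: int = 3) -> str:
    """Return the first `max_sentences` sentences of a section's plain text."""
    sentences = [s for s in _SENTENCE_END.split(section_text.strip()) if s]
    return " ".join(sentences[:max_sentences])
```

A production version would want a proper sentence segmenter for each wiki's language.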

_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, 
guidelines at: https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/6OBPB7WNHKJQXXIBCK73SDXLE3DMGNMY/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org