[Wikimedia-l] Re: Bing-ChatGPT

Anders Wennersten Wed, 22 Feb 2023 08:55:13 -0800

I got the impression from the tech editor whose article I read that there is a big difference in how ChatGPT is used together with Bing. Jimmy Wales here describes my own experience using only ChatGPT: if you ask "who is NN", you get unusable rubbish back.

But when the tech editor asked Bing-ChatGPT "who is Linus Larsson" (his name), he got a very good result, with information that exists only in the article about him on svwp (no article about him exists on enwp). I cannot interpret that in any other way than that this version looked up Wikipedia when asked.


But I am not a tech wizard, so I may be wrong.

Anders

https://www.dn.se/kultur/linus-larsson-microsofts-ai-gjorde-slut-med-mig-pa-alla-hjartans-dag/

(the article is in Swedish; the heading says "Microsoft's AI ended our relationship on Valentine's Day")

I also like that the AI is insulting, giving as an answer "are you a fool or only stupid?" It seems it needs to be trained on our UCoC.



On 2023-02-22 at 17:32, Sage Ross wrote:

Luis,

OpenAI researchers have released some info about data sources that
trained GPT-3 (and hence ChatGPT): https://arxiv.org/abs/2005.14165

See section 2.2, starting on page 8 of the PDF.

The full text of English Wikipedia is one of five sources, the others
being CommonCrawl, a smaller subset of scraped websites based on
upvoted reddit links, and two unrevealed datasets of scanned books.
(I've read speculation that one of these datasets is basically the
Library Genesis archive.) Wikipedia is much smaller than the other
datasets, although they did weight it somewhat more heavily than any
other dataset. With the extra weighting, they say Wikipedia accounts
for 3% of the total training.
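
To make the weighting concrete, here is a rough back-of-the-envelope sketch. It assumes the rounded figures reported in the dataset table in section 2.2 of the paper (about 300 billion training tokens in total, about 3 billion tokens of Wikipedia at a 3% share, and about 410 billion tokens of filtered CommonCrawl at a 60% share). The function and variable names are only illustrative, and because the inputs are rounded, the results will not exactly match the paper's own "epochs" column.

```python
# Rough arithmetic only: sizes and shares are rounded figures taken from the
# dataset table in section 2.2 of the GPT-3 paper; treat them as approximate.
TRAINING_TOKENS = 300e9  # total tokens seen during training (rounded)

# name -> (dataset size in tokens, share of the training mix)
SOURCES = {
    "CommonCrawl (filtered)": (410e9, 0.60),
    "Wikipedia": (3e9, 0.03),
}


def passes_over_dataset(size_tokens, mix_share, training_tokens=TRAINING_TOKENS):
    """Approximate number of times the whole dataset is seen during training."""
    return mix_share * training_tokens / size_tokens


if __name__ == "__main__":
    for name, (size, share) in SOURCES.items():
        print(f"{name}: ~{passes_over_dataset(size, share):.2f} passes")
```

Under these assumptions, Wikipedia is read roughly three times over while CommonCrawl is read less than once, which is what "weighted more heavily" means here even though Wikipedia is only 3% of the mix.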

-Sage

On Wed, Feb 22, 2023 at 8:19 AM Luis (lu.is) <l...@lu.is> wrote:

Anders, do you have a citation for “use Wikipedia content considerably”?

Lots of early-ish ML work was heavily dependent on Wikipedia, but state-of-the-art 
Large Language Models are trained on vast quantities of text, of which Wikipedia 
is only a small part. ChatGPT does not share their data sources (as far as I know) 
but the Eleuther.ai project released their Pile a few years back, and that already 
had Wikipedia as < 5% of the text data; I think it is safe to assume that the 
percentage is smaller for newer models:  https://arxiv.org/abs/2101.00027

Techniques to improve reliability of LLM output may rely more heavily on 
Wikipedia. For example, Facebook uses Wikipedia rather heavily in this 
*research paper*: https://arxiv.org/abs/2208.03299 But I have seen no evidence 
that techniques like that are in use by OpenAI, or that they’re specifically 
trained on Wikipedia. If you’ve seen discussion of that, or evidence from 
output suggesting it, that’d be interesting and important!

Social: @luis_in_br...@social.coop
ML news: openml.fyi
On Feb 20, 2023 at 1:52 AM -0800, Anders Wennersten <m...@anderswennersten.se>, 
wrote:

Bing with ChatGPT has now been released by Microsoft.

And from what I understand they use Wikipedia content considerably. If
you ask "Who is A B" and A B is not widely known, the result is more or
less identical to the content of the Wikipedia article (but worse, as
it "makes up" facts that are incorrect).

In a way I am glad to see Wikipedia is fully relevant even in this
emerging AI-driven search world. But Google search has been careful to
always have a link to Wikipedia beside its made-up summary of facts,
which is missing here (yet?). And as for licences, they are all ignored.

So if this is the future, the number of visits from users to Wikipedia
will collapse, and also their willingness to donate... (but our content
will still be a cornerstone of knowledge)

Anders

(I got a lot of the facts from an article in a major Swedish newspaper by
its tech editor. He started by asking for facts about himself, and when he
received facts from his Wikipedia article, plus being credited with a book
he had nothing to do with, he tried to tell ChatGPT about this error.
ChatGPT only got angry, accusing the tech editor of lying, and in the end
cut off the conversation, as it continued to treat the tech editor as a
liar and vandal.)
_______________________________________________
Wikimedia-l mailing list -- wikimedia-l@lists.wikimedia.org, guidelines at: 
https://meta.wikimedia.org/wiki/Mailing_lists/Guidelines and 
https://meta.wikimedia.org/wiki/Wikimedia-l
Public archives at 
https://lists.wikimedia.org/hyperkitty/list/wikimedia-l@lists.wikimedia.org/message/GJJNX2Y7BX5RZYGAIYTUI6O6CSBN72EH/
To unsubscribe send an email to wikimedia-l-le...@lists.wikimedia.org

