tony-vlcek 3 days ago

If the bottom line is donations - as the article states - why push for AI companies to link people to Wikipedia instead of pushing for those companies to donate?

  • flohofwoe 3 days ago

    Because many small donations from individuals are better for Wikipedia's independence than a few big ones from corporations? Eggs vs baskets etc...

    • noir_lord 3 days ago

      Case in point: Mozilla.

      I love Firefox; I don't love Mozilla - and I have no way to donate specifically to Firefox.

walterbell 3 days ago

Why do AI bots scrape Wikipedia pages instead of downloading the published full database?

  • fzeroracer 3 days ago

    The rationale I've seen elsewhere is that it saves money. It means you don't need to go to the effort of downloading, storing and updating your copy of the database. You can offload all of the externalities onto whatever site you're scraping.

    • danielbln 2 days ago

      Man, these companies have bazillions in funding and they can't keep some $100 DB in a closet for that. Smh

      • solarkraft 2 days ago

        They could. There’s just no upside in doing so.

        • walterbell 13 hours ago

          If they destroy the relatively high-trust internet, the low-trust replacement will require digital ID for every client, with non-neutral traffic price varying by {business digital ID, content}. No more free geese, even to check whether there is a golden goose worthy of payment.

          https://utcc.utoronto.ca/~cks/space/blog/web/WeShouldBlockFo...

  • nness 3 days ago

    My guess is that the scraping tools are specialized for the web, and creating per-application interfaces isn't cost-effective (although you could argue that scraping Wikipedia effectively is worth the effort; then again, since it's all text content with a robust taxonomy/hierarchy, it might be a non-issue).

    My other thought is that you don't want a trail showing you scraped anything... and faking browser traffic might draw less attention.

  • twosdai a day ago

    It's possible that they don't know. I literally didn't know there was a full downloadable db until right now.

  • ectospheno 3 days ago

    Money. One requires you to use your hardware and your developers. The other way doesn’t.

  • NoPicklez 2 days ago

    Because that would probably require extra work. Why do that if the crawler already scrapes it in the first place?

  • jjtheblunt 2 days ago

    I tried doing that in summer 2019, and the downloaded formats at the time were proprietary and depended on decoders that felt like a tail-recursive rabbit hole.

    In contrast, letting their servers render the content with their proprietary tools yields the sought-after data, so scraping might still be the pragmatic choice.
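
    For reference, here's a minimal sketch of streaming the current English Wikipedia pages-articles dump (bz2-compressed XML) with only the Python standard library. The dump URL and the export XML namespace version are assumptions; check the live listing at dumps.wikimedia.org before relying on them, and note the page text is still raw wikitext, which needs a separate parser (e.g. mwparserfromhell) to turn into plain text.

      # Sketch: stream the "pages-articles" dump without loading it all into memory.
      # The URL and namespace version below are assumptions; verify against the
      # current listing at dumps.wikimedia.org.
      import bz2
      import urllib.request
      import xml.etree.ElementTree as ET

      DUMP_URL = ("https://dumps.wikimedia.org/enwiki/latest/"
                  "enwiki-latest-pages-articles.xml.bz2")
      # MediaWiki export XML namespace (the version may differ between dumps).
      NS = "{http://www.mediawiki.org/xml/export-0.10/}"

      def iter_pages(limit=5):
          """Yield (title, wikitext) for the first `limit` pages in the dump."""
          with urllib.request.urlopen(DUMP_URL) as resp:
              stream = bz2.BZ2File(resp)  # decompress on the fly
              count = 0
              for _, elem in ET.iterparse(stream, events=("end",)):
                  if elem.tag == NS + "page":
                      title = elem.findtext(NS + "title")
                      text = elem.findtext(f"{NS}revision/{NS}text") or ""
                      yield title, text
                      elem.clear()  # release the parsed page element
                      count += 1
                      if count >= limit:
                          break

      if __name__ == "__main__":
          for title, text in iter_pages():
              print(title, len(text), "chars of wikitext")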

pflenker 2 days ago

They should be. Articles have gotten longer and longer over time, so getting an AI summary instead is the logical consequence.

  • moritzwarhier 2 days ago

    Wikipedia is not a company.

    They should mainly be worried about their reliability and trustworthiness. They should not worry about article length, as long as the length comes from thoroughness and important content is still accessible.

    Serving perfectly digestible, easy-to-read bits of information must not be the primary goal of an encyclopedia.

    By the way, "AI summaries" routinely contain misrepresentations, misleading sentences or just plain wrong information.

    Wikipedia is (rightly) worried about AI slop.

    The reason is that LLMs cannot "create" reliable information about the factual world, and they can only evaluate information based on what "sounds plausible" (or what matches their training priorities).

    You can get an AI summary with one of the hundred buttons for this built into every consumer-facing product, including common OS GUIs and web browsers.

    Or "ask ChatGPT" for one.