> The guidance says editors can remove Archive.today links when the original source is still online and has identical content; replace the archive link so it points to a different archive site, like the Internet Archive, Ghostarchive, or Megalodon; or “change the original source to something that doesn’t need an archive (e.g., a source that was printed on paper).”
It seems a lot of people haven't heard of it, but I think it's worth plugging https://perma.cc/, which is really the appropriate tool for something like Wikipedia to use to archive pages.
More: https://en.wikipedia.org/wiki/Perma.cc
It costs money beyond 10 links, which means either a paid subscription or institutional affiliation. This is problematic for an encyclopedia anyone can edit, like Wikipedia.
Wikimedia could pay; they have an endowment of ~$144M [1] (as of June 30, 2024). Perma.cc has Archive.org and Cloudflare as supporting partners, and its mission is aligned with Wikimedia's [2]. It is a natural complementary fit in the preservation ecosystem. For comparison, you have to pay for DOIs too [3] (starting at $275/year and $1/identifier [4] [5]).
That said, to the best of my knowledge, the Internet Archive is meeting this need without issue.
[1] https://meta.wikimedia.org/wiki/Wikimedia_Endowment
[2] https://perma.cc/about ("Perma.cc was built by Harvard’s Library Innovation Lab and is backed by the power of libraries. We’re both in the forever business: libraries already look after physical and digital materials — now we can do the same for links.")
[3] https://community.crossref.org/t/how-to-get-doi-for-our-jour...
[4] https://www.crossref.org/fees/#annual-membership-fees
[5] https://www.crossref.org/fees/#content-registration-fees
(no affiliation with any entity in scope for this thread)
There are dozens of commercial/enterprise solutions: https://www.g2.com/products/pagefreezer/competitors/alternat...
Also the oldest of its kind, rarely mentioned, and free: https://www.freezepage.com
Does Wikipedia really need to outsource this? They already do basically everything in-house, even running their own CDN on bare metal. I'm sure they could spin up an archiver that could be implicitly trusted. Bypassing paywalls would be playing with fire, though.
Hypothetically, any document, article, work, or object could be uniquely identified by an appropriate URI or URN, but in practice, HTTP URLs are how editors cite external resources.
The URLs proved to be less permanent than expected, and so "linkrot" was addressed, mostly at the Internet Archive, and then through whatever other services could bypass paywalls and stash the content.
All content hosted by the WMF project wikis is licensed under Creative Commons or compatible licenses, with narrow exceptions for limited, well-documented Fair Use content.
Archive.org is the archiver; rotted links are replaced with Archive.org links by a bot.
https://meta.wikimedia.org/wiki/InternetArchiveBot
https://github.com/internetarchive/internetarchivebot
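As a rough illustration of what "replace rotted links with Archive.org links" involves, here is a sketch against the Internet Archive's public Wayback Availability API (https://archive.org/wayback/available). It is not InternetArchiveBot's actual code, which also has to decide whether a link is dead and then edit the citation. The function names are hypothetical, and the sample response is hand-written in the API's documented shape rather than fetched.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Public Wayback Availability API endpoint.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_query_url(url: str, timestamp: Optional[str] = None) -> str:
    """Build the API query for a cited URL (hypothetical helper).

    `timestamp` is 1 to 14 digits of YYYYMMDDhhmmss and asks for the snapshot
    closest to that time, e.g. the date the citation was added.
    """
    params = {"url": url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(response_body: str) -> Optional[str]:
    """Return the closest snapshot URL captured with HTTP 200, else None."""
    data = json.loads(response_body)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available") and closest.get("status") == "200":
        return closest["url"]
    return None


if __name__ == "__main__":
    # Hand-written sample in the shape the API returns; not a real lookup.
    sample = (
        '{"archived_snapshots": {"closest": {"available": true, "status": "200", '
        '"url": "http://web.archive.org/web/20130919044612/http://example.com/", '
        '"timestamp": "20130919044612"}}}'
    )
    print(availability_query_url("example.com"))
    # https://archive.org/wayback/available?url=example.com
    print(closest_snapshot(sample))
    # http://web.archive.org/web/20130919044612/http://example.com/
```

Passing the citation's access date as `timestamp` matters: the snapshot closest to when the editor read the page is the one most likely to support the cited claim.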
Yeah, for historical links it makes sense to fall back on IA's existing archives, but going forward Wikipedia could take its own snapshots of cited pages and substitute them in if/when the original rots. It would be more reliable than hoping IA grabbed it.
Not opposed. Wikimedia tech folks are very accessible in my experience; ask them to make a call to https://web.archive.org/save whenever a link is added via the wiki editing mechanism. Easy peasy. Example CLI tools are https://github.com/palewire/savepagenow and https://github.com/akamhy/waybackpy.
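A minimal sketch of that hook, assuming the wiki can hand a callback the old and new wikitext of each edit: it diffs the URLs in the two revisions and builds a https://web.archive.org/save/ request URL for each newly added link. The function names and the URL regex are hypothetical, not part of MediaWiki, savepagenow, or waybackpy. Actually sending the requests, and staying within the Internet Archive's rate limits, is left out.

```python
import re

# Save Page Now endpoint from the comment above; the target URL is appended.
SAVE_ENDPOINT = "https://web.archive.org/save/"

# Rough, hypothetical pattern for URLs in wikitext: stops at whitespace and at
# characters that usually end a URL inside a template or an external link.
URL_PATTERN = re.compile(r"https?://[^\s|\]}<>\"]+")


def extract_urls(wikitext: str) -> set[str]:
    """Collect every http(s) URL that appears in a revision's wikitext."""
    return set(URL_PATTERN.findall(wikitext))


def save_requests_for_edit(old_wikitext: str, new_wikitext: str) -> list[str]:
    """Return one Save Page Now URL per link that the edit newly adds."""
    added = extract_urls(new_wikitext) - extract_urls(old_wikitext)
    return [SAVE_ENDPOINT + url for url in sorted(added)]


if __name__ == "__main__":
    old = "Intro. {{cite web |url=https://example.com/a |title=A}}"
    new = old + " More. {{cite web |url=https://example.org/b |title=B}}"
    print(save_requests_for_edit(old, new))
    # ['https://web.archive.org/save/https://example.org/b']
```

Diffing against the previous revision, rather than archiving every URL on every save, keeps the request volume proportional to the links editors actually add.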
I don't see the point in doxing anyone, especially those providing a useful service for the average internet user. Just because you can put some info together, it doesn't mean you should.
That said, I also disagree with turning everyone who uses archive[.]today into a botnet that DDoSes sites. Changing the content of archived pages also raises questions about the authenticity of what we're reading.
The site behaves as if it were infected with malware, and the archived pages can't be trusted. I can see why Wikipedia made this decision.
It's also kind of ironic that a site whose whole premise is preserving sites forever, whether the people involved like it or not, is seeking to take down another site because it doesn't like it. Live by the sword, etc.
Did they actually run the DDoS via a script, or was this a case of inserting a link that many users clicked? They are substantially different IMO.
https://news.ycombinator.com/item?id=46624740 has the earliest writeup that I know of. It was running it via a script and intentionally using cache-busting techniques to try to increase load on the hosted WordPress infrastructure.
> It was running
It still is. uBlock is killing the script now, but if it's allowed to load, it still tries to hammer the other blog.
Ah, good to know. My Pi-hole was actually blocking the blog itself, since the uBlock site list made its way into one of the blocklists I use. But I've just been avoiding links as much as possible because I didn't want to contribute.
Thank you, this is exactly the information I was looking for.
"You found the smoking gun!"
They silently ran the DDoS script on their CAPTCHA page (which is frequently shown to visitors, even when simply viewing and not archiving a new page).
> Changing the content of archived pages also raises questions about the authenticity of what we're reading.
This is absolutely the buried lede of this whole saga, and needs to be the focus of conversation in the coming age.
At this point Archive.today provides a better service (all things considered) compared to Wikipedia, at least when it comes to current affairs.
Previously Related:
Archive.today is directing a DDoS attack against my blog?
https://news.ycombinator.com/item?id=46843805
"Non-paywalled" "ad-free" link to archive: https://en.wikipedia.org/wiki/Wikipedia:Requests_for_comment...
> an analysis of existing links has shown that most of its uses can be replaced.
Oh? Do tell!
I would be surprised if archive.today had something that was not in the Wayback Machine.
The Wayback Machine removes archives upon request, so there's definitely stuff it doesn't make publicly available (though it may still have it).
Accounts to bypass paywalls? The audacity to do it?
Oh yeah, those were a thing. As a public organization, they can't really do that.
I personally just don't use websites that paywall important information.
>> an analysis of existing links has shown that most of its uses can be replaced.
>Oh? Do tell!
They do. In the very next paragraph in fact:
Well, that's an odd idea of "can be replaced".
> editors can remove Archive.today links when the original source is still online and has identical content
Hopeless. Just begs for alteration.
> a different archive site, like the Internet Archive,
Hopeless. It allows archive tampering by the page's own JS and archive deletion by the domain owner.
> Ghostarchive, or Megalodon
Hopeless. Coverage is insignificant.
> archive.today
Hopeless. Caught tampering the archive.
The whole situation is not great.
I just quoted the very next paragraph after the sentence you quoted and asked for clarification.
I did so. You're welcome.
As for the rest, take it up with Jimmy Wales, not me.
>In emails sent to Patokallio after the DDoS began, “Nora” from Archive.today threatened to create a public association between Patokallio’s name and AI porn and to create a gay dating app with Patokallio’s name.
Oh good. That's definitely a reasonable thing to do or think.
The raw sociopathy of some people. Getting doxxed isn't good, but this response is unhinged.
That was private negotiations, btw, not public statements.
That was in response to J.P.'s blog, which had already framed AT as a project that grew out of a carding forum, and to his pushing those speculations onto Ars Technica, whose parent company just destroyed 12ft and is on to a new victim. The story is full of undisclosed conflicts of interest, covered up with a soap opera about the DDoS.
Can you elaborate on your point?
The fight is not about where it is shown and not about what, not about "links in Wikipedia", but about whether News Inc will be able to kill AT, as they did with 12FT.
What is News Inc? Are they a funder of Wikipedia? (I think Wikipedia doesn’t have a parent company, so they’re not owners.)
They are the owner of Ars Technica, which has now written its third (or fourth?) article in a row on AT, painting it in a certain light.
The FBI article that pulled J.P.'s speculations out of the closet was also in Ars Technica, by the same author, and that article explicitly mentioned how happy they are that 12ft is down.
It's a reminder of how fragile and tenuous the connections are between our browser/client outlays, our societal perceptions of online norms, and our laws.
We live at a moment where it's trivially easy to frame another person for possessing an unsavory (or even illegal) number by planting it on their storage media without them even realizing it (and possibly, with some WebRTC craftiness and social engineering, even to get them to pass the taboo payload on to others).
I will no longer donate to Wikipedia as long as this is policy.
Why? The decision seems reasonable at first sight.
Second sight is advisable in such cases. Fact is, archives are essential to WP integrity and there's no credible alternative to this one.
I see WP is not proposing to run its own.
Wouldn't it be precisely because archives are important that using something known to modify the contents would be avoided?
> something known to modify the contents would be avoided?
Like Wikipedia?
Obviously not, since archive.org is encouraged.
The operator(s) of archive.today (and the other domains) are doing shady things and the links are not working, so why keep the site around when, for example, the Internet Archive's Wayback Machine works as an alternative?
> Fact is, archives are essential to WP integrity and there's no credible alternative to this one.
Yes, they are essential, and that was the main reason for not blacklisting Archive.today. But Archive.today has shown that it does not actually provide such a service:
> “If this is true it essentially forces our hand, archive.today would have to go,” another editor replied. “The argument for allowing it has been verifiability, but that of course rests upon the fact the archives are accurate, and the counter to people saying the website cannot be trusted for that has been that there is no record of archived websites themselves being tampered with. If that is no longer the case then the stated reason for the website being reliable for accurate snapshots of sources would no longer be valid.”
How can you trust that the page that Archive.today serves you is an actual archive at this point?
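The verifiability argument quoted above rests on archives being accurate, and the guidance's "identical content" test has the same shape: both come down to comparing a served page against what was originally captured. Here is a sketch of one way to make that checkable, assuming a hypothetical scheme in which a content digest is recorded at capture time and stored separately from the archive (for example, alongside the citation). Nothing in the thread says Wikipedia or any of the named archives does this. The helper names are also hypothetical.

```python
import hashlib
import re


def content_digest(text: str) -> str:
    """SHA-256 of the page text after collapsing runs of whitespace.

    Normalizing whitespace avoids false alarms from reflowed markup, at the
    cost of not detecting whitespace-only edits.
    """
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def matches_record(recorded_digest: str, served_text: str) -> bool:
    """True if the served copy hashes to the digest recorded at capture time."""
    return content_digest(served_text) == recorded_digest


if __name__ == "__main__":
    captured = "<p>Original   claim.</p>\n"
    # Hypothetical: this digest would be stored with the citation, not by the archive.
    recorded = content_digest(captured)
    print(matches_record(recorded, "<p>Original claim.</p>"))  # True
    print(matches_record(recorded, "<p>Edited claim.</p>"))    # False
```

The check only helps if the digest is held by someone other than the archive serving the page. If the same operator records the digest and serves the copy, a tampered page can simply come with a matching digest. Live pages also change for innocent reasons (ads, timestamps), so in practice a mismatch flags a copy for human review rather than proving tampering.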
About how much had you previously donated over the years?
Does anyone have a short summary of who was behind this and why Archive.today launched a DDoS? Isn't that something done by malicious actors? Or did others misuse Archive.today?
If you read the linked article, it's discussed there.
Why not show both? Wikipedia could display archive links alongside original sources, clearly labeled so readers know which is which. This preserves access when originals disappear while keeping the primary source as the main reference.
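For reference, Wikipedia's {{cite web}} template already has `archive-url`, `archive-date`, and `url-status` parameters for exactly this pairing. With `url-status=live`, the title links to the original and the archive link follows; with `dead`, the archived copy is shown first. The sketch below builds such a citation. The function name is hypothetical, and it does no escaping, so a `|` or `}}` inside a URL or title would break the output.

```python
from typing import Optional


def cite_web(url: str, title: str, archive_url: Optional[str] = None,
             archive_date: Optional[str] = None,
             original_is_live: bool = True) -> str:
    """Build {{cite web}} wikitext that keeps the original URL as the primary
    reference and, when available, records an archived copy alongside it."""
    params = [f"url={url}", f"title={title}"]
    if archive_url:
        params.append(f"archive-url={archive_url}")
        if archive_date:
            params.append(f"archive-date={archive_date}")
        # url-status decides whether the original (live) or the archived
        # copy (dead) is presented first.
        params.append(f"url-status={'live' if original_is_live else 'dead'}")
    return "{{cite web |" + " |".join(params) + "}}"


if __name__ == "__main__":
    print(cite_web("https://example.com/a", "A",
                   "https://web.archive.org/web/2024/https://example.com/a",
                   "2024-01-01"))
    # {{cite web |url=https://example.com/a |title=A |archive-url=https://web.archive.org/web/2024/https://example.com/a |archive-date=2024-01-01 |url-status=live}}
```

Keeping `url` as the primary parameter and flipping only `url-status` when the original rots matches the suggestion above: the primary source stays the main reference, and the archive is a labeled fallback.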
The objection is to this specific archive service, not to archiving in general.