Archive.today: inside the web archiving service

When a web page disappears from the internet—deleted by its author, censored by a government or simply lost to time—one service has made it its mission to preserve those digital artefacts permanently. That service is archive.today, and its story reveals as much about the tensions of the modern internet as it does about the fragility of online information.

What is archive.today?

Archive.today (formerly known as archive.is) is an on-demand web archiving service that saves snapshots of web pages. Founded on May 16, 2012, the service captures websites exactly as they appear at a specific moment, preserving them permanently for future reference. Unlike automated crawlers that continuously scan the web, archive.today creates archives only when users explicitly request them.

The service captures two versions of each archived page: a functional web page with live, clickable links and a static screenshot image. By 2021, archive.today had archived about 500 million pages, storing roughly 700 terabytes of data.

Technical capabilities

Archive.today excels at archiving JavaScript-heavy sites that other services struggle with, including Google Maps, X (formerly Twitter) and other dynamic Web 2.0 applications. The service supports pages containing hash-bang fragments (#!), which were once common on single-page applications.
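Hash-bang pages are awkward to archive because everything after the # is handled by the browser and is never sent to the web server, so the page a user sees only exists once its JavaScript has run. The sketch below illustrates that split using Python's standard library; the example URL and the function name are hypothetical and say nothing about how archive.today itself is implemented.

```python
from urllib.parse import urlsplit


def split_hashbang(url: str) -> tuple[str, str, bool]:
    """Separate the part of a URL the server sees from its client-side fragment."""
    parts = urlsplit(url)
    # The fragment is dropped before the HTTP request is made.
    server_part = parts._replace(fragment="").geturl()
    # A hash-bang URL is one whose fragment starts with "!".
    is_hashbang = parts.fragment.startswith("!")
    return server_part, parts.fragment, is_hashbang


server_part, fragment, is_hashbang = split_hashbang("https://example.com/app#!/user/42")
print(server_part)   # what the server is asked for
print(fragment)      # what only the page's JavaScript sees
print(is_hashbang)
```

A plain HTTP fetch of such a URL would retrieve only the application shell, which is why a real browser engine is needed to capture the rendered state.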

The maximum page size that can be archived is 50 megabytes, including images. Archive.today captures text and images but excludes videos (except from certain sites such as X), XML files, RTF documents, spreadsheets and other non-static content. Pages are captured at a fixed browser width of 1,024 pixels.

The service converts external CSS to inline styling and removes responsive design elements. JavaScript-generated content appears in a frozen state. Since Nov. 29, 2019, archive.today has used Chromium (non-headless) for scraping, replacing the previous PhantomJS engine.

The many domains of archive.today

One of archive.today’s most notable features is its use of multiple domain names, primarily to circumvent censorship and internet service provider blocks. The service is accessible through numerous top-level domains:

archive.today (the primary gateway)
archive.is (the original name, deprecated since 2019)
archive.ph
archive.md
archive.li
archive.fo
archive.vn

The service also operates a Tor hidden service (onion address) for users requiring maximum privacy and censorship resistance: archiveiya74codqgiixo33q62qlrqtkgmcitqx5u2oeqnmn5bpcbiyd.onion

In January 2019, the operator announced via Twitter: “Please do not use archive.IS mirror for linking, use others mirrors [.TODAY .FO .LI .VN .MD .PH]. .IS might stop working soon.”

The archive.today domain serves as a gateway that automatically redirects users to one of the other domains based on load balancing and availability. The operator has requested that users always link to archive.today rather than specific mirrors, as this allows flexibility in redirecting traffic as needed.
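For anyone maintaining a list of saved links, the operator's request can be applied mechanically. The following is a minimal sketch that rewrites links pointing at the mirrors listed above so that they use the gateway instead. It assumes that a snapshot's path is the same on every mirror, which the service's redirect behaviour suggests but which is not confirmed here; the snapshot path in the example is made up.

```python
from urllib.parse import urlsplit, urlunsplit

# Mirror hostnames taken from the list in this article.
MIRROR_HOSTS = {
    "archive.is", "archive.ph", "archive.md",
    "archive.li", "archive.fo", "archive.vn",
}
GATEWAY_HOST = "archive.today"


def to_gateway_url(url: str) -> str:
    """Rewrite a link to a known mirror so it points at the gateway instead."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in MIRROR_HOSTS:
        # Not a known mirror (or already the gateway): leave the link untouched.
        return url
    return urlunsplit(
        (parts.scheme, GATEWAY_HOST, parts.path, parts.query, parts.fragment)
    )


print(to_gateway_url("https://archive.ph/abc12"))  # hypothetical snapshot path
```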

The founder’s identity

The identity of archive.today’s creator remains unconfirmed. The domain archive.is was registered in May 2012 to “Denis Petrov” of Prague, Czech Republic. However, Denis Petrov is a common Russian name, and this may be a pseudonym.

The same contact information was used to register several questionable domains, including carding forums and piracy sites (all of which have since disappeared), many containing German keywords, suggesting a possible connection to German-speaking regions.

Investigative work by online sleuths has linked the service to an account called “Masha Rabinovich,” who claimed ownership of archive.is in a 2012 forum post. The LinkedIn profile associated with archive.today’s early operations showed a profile picture linked to a “Masha Rabinovich” in Berlin. “Masha” is a Russian diminutive of Maria (or can be a Hebrew form of Moses), and Rabinovich is an Ashkenazi Jewish surname.

Early GitHub captures on archive.today were linked to an account called “volth,” a fluent Russian speaker who contributed to NixOS (which archive.today uses). This account has since been completely deleted.

The service’s FAQ, unchanged since 2013, states that the operation is located in Europe and requests PayPal donations in euros. Based on the operator’s Tumblr blog, the operator’s English is excellent but shows occasional noun capitalization suggesting a German background. However, the operator also answers questions in Russian on the blog.

A 2023 investigative article concluded: “We have a pretty good idea of how the site is run: it’s a one-person labour of love, operated by a Russian of considerable talent and access to Europe.”

Funding model

The funding of archive.today remains opaque and has been a source of uncertainty throughout its existence. According to the operator’s FAQ, the site is “privately funded” with “no complex finances behind it.”

In October 2016, the site began accepting donations after previously refusing them (redirecting donation attempts to an animal shelter). A weekly crowdfunding target of $800 was set to maintain the site.

According to comments on the operator’s blog, as of 2021, advertisements and donations covered less than 20 per cent of operating expenses. Donations in 2021 totalled approximately €6,000; an earlier comment, from 2017, mentioned receiving “more than $1.50 every day, enough for a bowl of phở.” A further comment suggested that on good days, advertisements “almost cover expenses,” which is difficult to reconcile with the 20 per cent figure.

Operating costs have escalated significantly:

In 2012: approximately €300 per month
In 2014: €2,000 per month
In 2016: $4,000 per month
In 2021: no monthly figure reported, though the archive had grown to about 500 million pages (about 700 terabytes of data)

PayPal donations were discontinued around 2022 because the operator could no longer top up the account, which suggests the operator may be located in Russia and subject to financial restrictions. The operator has complained about the difficulty of making cross-border payments “across the Iron Curtain.”

Current donation methods include Liberapay (a French non-profit) and BuyMeACoffee. Notably, cryptocurrency donations are not supported—the creator has expressed skepticism about crypto payments.

Advertisements appear on archived pages when accessed via mobile devices (but not desktop). Yahoo network ads have been injected into mobile views, though the exact revenue is unclear. The operator noted in 2021 that ads were a “test run” that would likely not stay permanently.

Infrastructure

According to earlier statements from the operator, the service runs on Apache Hadoop and Apache Accumulo, with all data stored on HDFS (Hadoop Distributed File System). Textual content is reportedly replicated three times across servers in two data centres, while images are duplicated twice. Both data centres are stated to be located in Europe, with OVH hosting confirmed for at least one location.

Archive.today faced a significant infrastructure challenge on March 10, 2021, when a major fire destroyed OVH’s SBG2 data centre in Strasbourg, France, and damaged the adjacent SBG1 facility. The fire broke out shortly after midnight (reported as 12:47 a.m.) and completely destroyed the five-storey SBG2 building. The incident caused widespread outages for OVH’s customers and likely affected archive.today’s operations, though the service’s reported use of multiple data centres and replication would have helped protect its data.

As of February 2021, with 500 million archived pages, the service was estimated to manage approximately 700 terabytes of data. For comparison, the Internet Archive manages over 100,000 terabytes.
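The two figures above imply an average snapshot size, which can be checked with a few lines of arithmetic. This is a rough sketch only: it assumes decimal units (1 terabyte = 10^12 bytes), and it is not known whether the 700-terabyte estimate counts replicated copies.

```python
TOTAL_BYTES = 700 * 10**12        # about 700 terabytes, decimal units assumed
PAGES = 500 * 10**6               # about 500 million archived pages
INTERNET_ARCHIVE_TB = 100_000     # lower bound quoted for the Internet Archive

# Average storage per archived page, in megabytes.
avg_mb_per_page = TOTAL_BYTES / PAGES / 10**6

# How many times larger the Internet Archive's holdings are, at minimum.
size_ratio = INTERNET_ARCHIVE_TB / 700

print(f"average per page: {avg_mb_per_page:.1f} MB")
print(f"Internet Archive is at least {size_ratio:.0f} times larger")
```

On these numbers the average page takes roughly 1.4 megabytes, far below the 50-megabyte ceiling, and the Internet Archive is more than 140 times larger.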

The scraping operation uses automated browsers running through a botnet that cycles through numerous IP addresses to avoid detection and blocking by websites. The creator has openly acknowledged that computing power for running these browsers is now the main bottleneck for expanding the service.

Why archive.today exists: purpose and use cases

Archive.today fills several important niches in web archiving.

Bypassing robots.txt

Unlike many archiving services, archive.today does not respect robots.txt exclusion files. The service justifies this by stating it acts “as a direct agent of the human user” rather than as a search engine crawler. This policy ensures that once a page is archived, it cannot be removed by the website owner changing their robots.txt file, which has been a source of frustration with other archives such as the Wayback Machine.
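To make the contrast concrete, here is a sketch of the check that a robots.txt-respecting crawler performs before fetching a page, and that archive.today says it skips. It uses Python's standard urllib.robotparser module; the robots.txt content, the crawler name and the URLs are invented for illustration.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that asks all crawlers to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed_by_robots(ROBOTS_TXT, "ExampleCrawler", "https://example.com/public/page.html"))
print(allowed_by_robots(ROBOTS_TXT, "ExampleCrawler", "https://example.com/private/page.html"))
```

A conventional crawler would skip the second URL. A service acting on a user's direct request, as archive.today describes itself, would fetch both.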

Preserving dynamic content

Archive.today excels at capturing JavaScript-heavy sites that automated crawlers often miss or render incorrectly. This makes it particularly useful for archiving social media posts, interactive maps and modern web applications.

Paywall circumvention

One controversial but widely used feature is archive.today’s ability to bypass paywalls on news sites. When a user archives a paywalled article, the service can often capture the full content. This has made it popular among journalists, researchers and general readers who cannot afford multiple subscriptions. However, this raises legal and ethical questions about copyright and publisher revenue.

The service achieves this through various means, including using dedicated login accounts (which the operator solicits from users) for platforms such as Instagram, X, GitHub and Reddit. Publishers have struggled to block the service because it cycles through multiple IP addresses.

Preventing content disappearance

Unlike the Wayback Machine, which can remove content retroactively at a website owner’s request, archive.today generally does not delete content. Once archived, a page remains available permanently, with limited exceptions for law-enforcement requests, child exploitation material and Digital Millennium Copyright Act (DMCA) requests processed through the operator’s blog.

This permanence makes the service valuable for:

Journalists preserving evidence and sources
Researchers tracking changes in web content over time
Legal professionals documenting online evidence
Individuals preserving content that may be altered or removed

Notable controversies and challenges

The Cloudflare DNS conflict

Since May 2018, archive.today has been largely inaccessible to users of Cloudflare’s 1.1.1.1 Domain Name System (DNS) service. This ongoing technical dispute centres on Extension Mechanisms for DNS (EDNS) Client Subnet (ECS) information.

Archive.today’s DNS servers return invalid responses to queries from Cloudflare because Cloudflare does not include EDNS Client Subnet information in its DNS requests. ECS passes part of the user’s IP address, and with it an approximate location, to the authoritative name server; Cloudflare considers this a privacy risk and declines to send it.
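To show what is actually at stake, the sketch below encodes an ECS option in the wire format defined by RFC 7871: an option code of 8, an address family, a source prefix length, a scope prefix length (zero in queries) and the client address truncated to the prefix. It illustrates the format only and is not taken from either party's software; the function name and the example address are hypothetical.

```python
import ipaddress
import struct

ECS_OPTION_CODE = 8  # EDNS option code assigned to Client Subnet (RFC 7871)


def build_ecs_option(client_ip: str, prefix_len: int) -> bytes:
    """Encode an EDNS Client Subnet option for the given client address."""
    ip = ipaddress.ip_address(client_ip)
    family = 1 if ip.version == 4 else 2  # 1 = IPv4, 2 = IPv6
    # Zero the host bits so only the network prefix is disclosed.
    network = ipaddress.ip_network(f"{client_ip}/{prefix_len}", strict=False)
    # Send only as many address bytes as the prefix needs.
    addr_len = (prefix_len + 7) // 8
    addr = network.network_address.packed[:addr_len]
    # Family (2 bytes), source prefix length, scope prefix length (0 in queries).
    payload = struct.pack("!HBB", family, prefix_len, 0) + addr
    # Option code and option length precede the payload.
    return struct.pack("!HH", ECS_OPTION_CODE, len(payload)) + payload


option = build_ecs_option("203.0.113.77", 24)
print(option.hex())
```

With a /24 prefix, the last byte of the address is withheld, but the first three still identify the user's network. A resolver that attaches this option tells the authoritative server roughly where the user is; a resolver that omits it, as Cloudflare does, reveals only its own address.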

The archive.today operator argues that the absence of EDNS information causes “so many troubles” for load balancing and DDoS protection, particularly because DNS requests and subsequent HTTP requests come from vastly different geographic locations when routed through Cloudflare’s global network.

Cloudflare chief executive Matthew Prince explained that fixing this on their end “would violate the integrity of DNS and the privacy and security promises we made to our users.” He noted that nation-state actors have been observed monitoring EDNS subnet information to track individuals.

This conflict has been intermittent, with temporary resolutions followed by renewed blocking. As of October 2025, the issue remains unresolved, forcing users to either switch DNS providers or use VPNs to access archive.today.

Censorship and blocking

Archive.today has faced numerous censorship attempts worldwide:

China: According to GreatFire.org, archive.today has been blocked since March 2016, archive.li since September 2017, archive.fo since July 2018, and archive.ph since December 2019.

Russia: In 2016, Roskomnadzor began blocking access to archive.is. In Russia, only HTTP access is possible; HTTPS connections are blocked. On Jan. 28, 2016, pages related to the annexation of Crimea were specifically blocked when accessed through non-encrypted traffic.

Finland: On July 21, 2015, the operator blocked access from all Finnish IP addresses, stating on Twitter that this was done to avoid escalating a dispute allegedly under way with the Finnish government. Access has since been restored.

Australia and New Zealand: In March 2019, several internet providers blocked the site for six months following the Christchurch mosque shootings, attempting to limit distribution of footage from the attack. The blocks have since been lifted.

Reliability challenges

Users have reported persistent technical problems with archive.today, particularly since 2023. Common complaints include:

DNS resolution errors affecting various providers beyond Cloudflare
Infinite captcha loops that prevent access
Extended outages lasting days or weeks
Conflicts with virtual private network (VPN) services and antivirus software
Slow page loading times

The operator’s Tumblr blog, which previously provided updates and support, has not been updated for more than a year as of late 2024, leaving users without official communication during outages.

Legal and ethical concerns

The service’s ability to archive paywalled content and its refusal to respect robots.txt or removal requests have drawn criticism from publishers and copyright advocates. However, the operator’s anonymity and the service’s location in Europe (potentially Russia) have made legal action difficult.

The service has also been used to preserve controversial or extremist content, leading to some of the censorship attempts mentioned above. However, the operator maintains a hands-off approach except for child exploitation material and valid legal requests.

Technical features

Search functionality

Archive.today’s search is powered by Google Custom Search. If no results are found, the service falls back to Yandex Search. Users can search archived pages by URL, domain or keywords.

The search toolbar supports advanced operators:

Asterisk (*) as a wildcard character
Quotation marks for exact phrase matching
“insite:” operator to restrict searches to specific domains

Text highlighting

When users select text on an archived page, JavaScript generates a URL fragment that automatically highlights that text when the link is shared and visited again.

ZIP downloads

Users can download archived pages as ZIP files, though this feature was disabled for pages archived after Nov. 29, 2019, when the service switched to the Chromium browser engine.

Memento API

Since July 2013, archive.today has supported the Memento Project application programming interface (API), allowing programmatic access to archived content.
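In the Memento protocol (RFC 7089), a client asks a TimeGate for the snapshot closest to a given moment by sending an Accept-Datetime header. The sketch below builds such a request without sending it. The TimeGate URL prefix is an assumption and should be checked against the service's own documentation; the function name and the target URL are hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.request import Request

# Assumed TimeGate prefix, not confirmed by this article.
TIMEGATE_PREFIX = "https://archive.today/timegate/"


def build_timegate_request(target_url: str, when: datetime) -> Request:
    """Build (but do not send) a Memento TimeGate request for target_url."""
    # Accept-Datetime must be an HTTP date expressed in GMT.
    # Pass a timezone-aware datetime; naive values are treated as local time.
    accept_dt = format_datetime(when.astimezone(timezone.utc), usegmt=True)
    return Request(
        TIMEGATE_PREFIX + target_url,
        headers={"Accept-Datetime": accept_dt},
    )


req = build_timegate_request(
    "https://example.com/",
    datetime(2020, 1, 15, 12, 0, tzinfo=timezone.utc),
)
print(req.full_url)
print(req.header_items())
```

Passing the request to urllib.request.urlopen would perform the lookup; a Memento-compliant TimeGate answers with a redirect to the closest snapshot.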

Current status and future uncertainty

As of October 2025, archive.today remains operational despite ongoing challenges. The service operates in a state of perpetual uncertainty as a one-person operation with opaque funding and potential connections to Russia. The service faces several existential threats:

Financial sustainability: The operator admits that personal funds cover the majority of operating costs, which continue to grow as the archive expands.
Technical challenges: The service experiences frequent domain issues, averaging “one trouble with domains per year and each fifth trouble will result in domain loss.” Recent reliability problems have frustrated users who depend on the service.
Legal pressure: While the operator has largely avoided legal action through anonymity, this could change if jurisdictions become more aggressive in pursuing copyright or content violations.
Political pressure: With suspected Russian connections and increasing sanctions, cross-border payments and international operations become more difficult.
Single point of failure: As the creator acknowledges, “My death can cause interruption of service.” There is no succession plan or organizational structure to ensure continuity.

The operator has described the service as a “weak tool” that is “doomed to die,” showing a realistic awareness of its fragility. However, it has operated continuously since 2012, more than a decade, demonstrating remarkable resilience despite numerous challenges.

Conclusion

Archive.today occupies a unique and controversial niche in web archiving. It provides a valuable service for preserving ephemeral online content, particularly for journalists, researchers and users in censored regions. Its ability to bypass paywalls and robots.txt restrictions makes it both powerful and contentious.

The mysterious nature of its operation, with an anonymous and apparently Russian operator funding it primarily out of pocket, raises questions about long-term sustainability and motivations. Yet this same anonymity has protected it from legal pressure that might have shut it down years ago.

As the web becomes increasingly paywalled, dynamic and censored, services such as archive.today play an important role in preserving the public record. Whether it continues to operate for another decade or disappears tomorrow, it has already archived half a billion pages and demonstrated the ongoing tension between copyright, censorship, preservation and access to information.

For users who depend on the service, the recommendation is clear: archive important content across multiple services (including the Internet Archive and local copies) to hedge against the risk of archive.today’s eventual disappearance—or its next extended outage.

About this post

Sources and verification: This article is based on publicly available information as of October 2025, including Wikipedia articles, investigative journalism, technical documentation, user forums and the operator’s own blog posts. All factual claims have been verified against multiple independent sources where possible.

Operator anonymity: The identity of archive.today’s operator remains unconfirmed. All names mentioned (Denis Petrov, Masha Rabinovich) may be pseudonyms. The operator maintains anonymity and has not been independently verified by journalists or researchers.

Self-reported information: Details about funding, infrastructure and operational costs come primarily from the anonymous operator’s own statements on their FAQ and Tumblr blog. These claims cannot be independently verified and should be evaluated with appropriate skepticism.

Service volatility: Archive.today operates as an unofficial, privately run service with no institutional backing, legal guarantees or service-level agreements. The service experiences frequent technical issues and may become unavailable or cease operations at any time without notice.

Legal considerations: Using archive.today to bypass paywalls or archive copyrighted content may violate copyright law, terms of service or other legal restrictions in your jurisdiction. This article describes the service’s capabilities but does not constitute legal advice or endorsement of any particular use.

Editorial independence: This article is not affiliated with, endorsed by or sponsored by archive.today or its operator. It is an independent analysis intended for educational and informational purposes.

#archiveToday #webarchiving #digitalpreservation #internetarchives #infosec #privacy #opendata #cloudflare #dns #edns #ecs #russia #censorship #internetfreedom #opensource #waybackmachine #journalism #researchtools #legaltech #cybersecurity #datastorage #ovh #hadoop #chromium #webinfrastructure #paywalls #copyright #freespeech #tor #darkweb #onionservices #infotech #digitalethics #compliance #opentech #webhistory