The Phone Factory: Notes from Inside AI's Extraction Layer

At an AI conference in Shanghai recently, I was casually chatting with a founder over pizza, and before I knew it, we went sliding down the rabbit hole of their phone factory. Human performers sit in a studio while their expressions, gestures, and body motions are captured and mapped onto synthetic personas (think The Congress). These fabricated characters, complete with designed faces, voices, and life stories, accumulate followers, brand deals, and parasocial trust across platforms. To their audiences, they are real people. Deepfake. They showed me the inner workings: content output per day at volumes no human creator could match. Idoru, William Gibson’s 1996 novel about a synthetic celebrity in Japan with a cult following, came to mind.

I told the founder that this was dark magic, and that its byproduct is a cyberspace spammed with AI slop. Their reply: the product works. Major global brands are paying customers. Marquee investors are backing it.

The broader market is already at scale. Just six months ago, a hacker breached the backend of Doublespeed, a VC-backed phone-farm-as-a-service operation charging $1,500 to $7,500 a month to run thousands of synthetic social media accounts on physical devices. The leak exposed 400+ TikTok accounts, including one called Chloe Davis, an undisclosed AI persona promoting a massage roller for commission. None of the accounts disclosed that they were ads or that they were synthetic.

Adjacent to this layer is the more mainstream end of the market. Japan’s two largest virtual persona companies, Cover Corp and ANYCOLOR, are publicly traded; both posted FY2025 revenues of roughly ¥43B (Cover, ANYCOLOR). Their model still relies on real human performers behind avatars, with the artifice generally disclosed. Even with the human acknowledged, the economics are notoriously extractive. The phone factory model removes the human entirely; only the extraction remains.

By the metrics growth marketers care about, such as customer acquisition cost (CAC), the phone factory works incredibly well. It is also a textbook case of how the intelligence economy extracts from our informational commons, the shared reservoir of human trust and labor, at multiple layers simultaneously. The field has not yet built the language to describe what is being lost.

The Extraction Spectrum

Synthetic personas are not a single category. They sit on a spectrum defined by two axes: whether the performer (if there is one) has consented to the use of their likeness, and whether the audience has been told they’re interacting with synthetic content. Disclosed fully-synthetic personas with no real human behind them are one legitimate end of the spectrum. A different founder I encountered is pivoting away from operating undisclosed deepfake personas on adult-content platforms toward a fully disclosed AI-native entertainment company. They figure maybe 10% of paying customers on the old model knew they were talking to a deepfake. Disclosed digital replicas of real performers, deployed under licensing agreements where the human captures ongoing upside, are another model. SAG-AFTRA’s contracts and emerging tools for authorized voice and likeness cloning live here.

Figure 1. The Extraction Spectrum: a 2x2 matrix mapping synthetic personas by performer consent and audience disclosure. Only the consented-and-disclosed quadrant is legitimate.

The middle gets murkier. Take the AI chatbot impersonating a real adult content creator under a rev-share deal: the creator consents, but the subscriber is never told. Some of these chatbots generate custom images or video and send them on request. That is legitimate from the labor side and deceptive from the audience side. Performer consent does not fix audience deception; it produces a new kind of laundering, where the performer’s authorization becomes cover for ongoing fabrication of the “personalized” relationship the subscriber thinks they’re in.

The extractive end of the spectrum is where both axes fail at once: no performer consent, no audience disclosure. That is the phone factory model. It is also what the “AI pimping” industry runs on.
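The two axes above can be written down as a small classifier. This is only an illustrative sketch: the quadrant labels and the `classify` function are hypothetical names introduced here, and treating a fully synthetic persona (no human performer) as satisfying the consent axis is an assumption, chosen to match the claim that disclosed fully-synthetic personas are legitimate.

```python
from enum import Enum
from typing import Optional


class Quadrant(Enum):
    # Labels are illustrative, not terms defined in the essay.
    LEGITIMATE = "consented and disclosed"
    AUDIENCE_DECEPTION = "consented, not disclosed"
    PERFORMER_EXTRACTION = "disclosed, not consented"
    FULL_EXTRACTION = "neither consented nor disclosed"


def classify(performer_consent: Optional[bool], audience_disclosed: bool) -> Quadrant:
    """Place a synthetic persona on the consent/disclosure grid."""
    # Assumption: None means there is no human performer at all, so there
    # is no likeness to consent to and the consent axis counts as satisfied.
    consent_ok = performer_consent is None or performer_consent
    if consent_ok and audience_disclosed:
        return Quadrant.LEGITIMATE
    if consent_ok:
        return Quadrant.AUDIENCE_DECEPTION
    if audience_disclosed:
        return Quadrant.PERFORMER_EXTRACTION
    return Quadrant.FULL_EXTRACTION
```

Under these assumptions, the rev-share chatbot is `classify(True, False)` and lands in `AUDIENCE_DECEPTION`, while the phone factory is `classify(False, False)` and lands in `FULL_EXTRACTION`.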

The focus on audience deception is itself a distraction from a deeper structural issue: the enclosure of the informational commons. Performance-capture networks are in fact three-sided extraction engines.

The performers. Real humans do the expressive labor: the smiles, the micro-reactions, the range that brings the synthetic persona to life. They are paid contractors. They do not own the persona, do not accumulate the audience, and do not capture the brand value they animated or any other financial upside. Their likeness has been laundered into someone else’s IP. Think session musicians, with a predatory twist: one recording session, scraped into an infinite, autonomous performance asset.

China is the first jurisdiction litigating this in real cases, and the doctrine is already splitting. In April 2024, the Beijing Internet Court ruled in Yin v. Beijing AI Company, the world’s first AI voice-infringement decision, that a synthetic voice trained on a real voice actor’s recordings without consent violated personality rights, awarding RMB 250K. Eight months later, the Chengdu Railway Transport Court reached the opposite conclusion in the Meng Shuai voice-pack case, finding no infringement because the plaintiff was not famous enough.

The split hinges on whether a performer has rights only once they’re already a celebrity. That is an applied commons question. A voice actor’s reputation is a professional commons enclosed by their industry. To deny protection there is to sanction the wholesale enclosure of professional identity. SAG-AFTRA’s 2025 Interactive Media Agreement landed on a different answer to the same question: consent and compensation when a digital replica is “objectively identifiable.” Meaningful, but it covers one jurisdiction’s union work. Outside that perimeter, the commons gets extracted quietly.

The audience. People form parasocial bonds, one-sided psychological relationships that drive consumer behaviors, with entities that do not exist. Real emotions, artificial counterparty. Last year, Mia Zelu, an AI persona (technically with disclosure, however buried behind a “Read More” button), went viral during Wimbledon, accumulating over 160K Instagram followers. Indian cricket star Rishabh Pant interacted with her account. The parasocial bond functioned as designed; the counterparty was fiction. A network of “sister” accounts operated by the same shop suggested a coordinated synthetic-persona-as-asset operation at the agency level, the same structure I encountered in Shanghai.

404 Media’s investigation, beginning April 2024 and expanding through 2025, documented hundreds of Instagram accounts using stolen video from real adult content creators, face-swapped with AI, and monetized on competing platforms. Cornell Tech researcher Alexios Mantzarlis identified ~900 such accounts. Real creators reported reach drops of 50-90%. The audience layer and the performer layer are both being extracted from.

The information commons. Every undisclosed synthetic creator that earns followers lowers the signal-to-noise ratio of “real human creator” across the internet. A December 2024 HKUST and CISPA study using their OSM-Det classifier found that by October 2024, 37% of new Medium posts and 39% of new Quora posts were AI-generated; Reddit, by contrast, sat at ~2.5%, a useful reminder that platform incentives shape the rate of degradation. Mantzarlis recently identified an AI-generated podcast network publishing 11,000 episodes a day, ripping content off real media outlets at industrial scale. The commercial value the platforms charge for is generated by human attention to human contribution, and increasingly AI is on both sides.

This is Drago and Laine’s Intelligence Curse at the media layer. Concentrated technical capability generates surplus for the operators. The cost is distributed across a public consisting of performers, audiences, and the commons of trust, without a compensatory mechanism. I explored this phenomenon in an essay last month.

Disclosure collapses the parasocial bond, and the conversion rate. I’ve also watched sales teams in San Francisco run voice agents on people without telling them they’re talking to AI. The empathy is engineered. So is the conversion.

If the business model is optimized when the audience is deceived, the business model is the deception.

The Regulatory Clock Is Loud Right Now

EU AI Act Article 50 takes effect August 2, 2026, with fines up to 3% of global turnover. China’s mandatory labeling rules for synthetic content came into force last September. India, South Korea, Brazil, and the UK have either passed or are finalizing parallel frameworks. Disclosure mandates are becoming a global default: August 2026 in the EU, today in China.

Regulation describes what should not be. Infrastructure determines what can be at scale. Extraction is difficult to litigate away if the underlying technology makes it the most profitable path.

The Composition Layer

The necessary technical primitives already exist. They are just not yet integrated.

Provenance at creation. The C2PA standard has hardware adoption from Leica, Samsung, Google Pixel, and Nikon. Adobe, Microsoft, Google, OpenAI, Meta, the AP, the BBC, Reuters, and the NYT sit on the steering committee. LinkedIn, TikTok, and Cloudflare surface content credentials at point of consumption. EU Article 50 and California SB 942 effectively mandate machine-readable provenance that maps to C2PA architecture.

Proof of (unique) personhood. World ID reports nearly 18 million Orb-verified humans across 160 countries, with a growing field of competing ZK identity platforms.

Provenance manifests get stripped at distribution because standard image and video pipelines do not preserve metadata. Proof of personhood is plumbed into account-level verification (dating apps, gaming anti-fraud), not the content layer. Microsoft’s February 2026 Media Integrity and Authentication report, which evaluated 60 combinations of provenance and watermarking methods, concluded that no single method works alone. Layered defense, deployed at scale, requires coordination no single platform is incentivized to lead.
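To make "layered" concrete, here is a minimal sketch of how a consumer of content might combine several provenance signals instead of trusting any single one. It assumes the signals have already been extracted upstream; the `Signals` fields, the `assess` function, and the string labels are all hypothetical, and no real C2PA or watermarking library is called.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical layers, checked together rather than in isolation.
LAYERS = ("manifest", "watermark", "fingerprint")


@dataclass
class Signals:
    # Each field holds an origin label such as "human" or "ai-generated",
    # or None when that layer is missing (e.g. a stripped manifest).
    manifest: Optional[str] = None
    watermark: Optional[str] = None
    fingerprint: Optional[str] = None


def assess(signals: Signals) -> tuple[str, list[str]]:
    """Return (verdict, layers that contributed evidence)."""
    found = {
        name: getattr(signals, name)
        for name in LAYERS
        if getattr(signals, name) is not None
    }
    if not found:
        return "unknown", []
    labels = set(found.values())
    if len(labels) > 1:
        # Layers disagree: surface the conflict instead of picking a winner.
        return "conflict", list(found)
    return labels.pop(), list(found)
```

In this sketch a file whose manifest was stripped in distribution still resolves if a watermark or fingerprint survives, and disagreement between layers is reported rather than silently resolved.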

Motion is happening at the agent layer. In March 2026, World launched AgentKit, which uses World ID and Coinbase’s x402 protocol to bind AI agents to verified humans via zero-knowledge proofs. iProov demonstrated cryptographic human-intent binding at RSA earlier this year. The agent layer is getting attention because there’s commercial pull from enterprise AI deployment. The content layer is not, because the commercial pull at the content layer points the other way.

No production system in 2026 binds a creator’s verified personhood to their published content as a routine workflow. No platform currently uses proof-of-personhood as a labeling input for human versus AI-generated content at scale. The pieces sit next to each other on the table. No incumbent has commercial incentive to assemble them.

That is the classic shape of a public goods coordination problem.

Opportunities & Needs

Public goods builders can treat this as an adversarial design space. Here are some things that can be built to address these provenance and agency issues, replacing industrialized deception with verifiable infrastructure for human-AI collaboration:

Durable provenance at the content layer. C2PA manifests that survive standard distribution pipelines, paired with invisible watermarking and content fingerprinting as fallback: layered defense no single platform is incentivized to lead alone.
Personhood-to-content binding. Cryptographically binding a verified human to the synthetic personas they animate, with audience-verifiable attribution and ZK-preserved performer privacy, a primitive that does not yet exist in production.
Consent and royalty infrastructure for performance capture. A protocol for the humans whose expressions animate synthetic personas to assert rights, claim attribution, and capture ongoing upside. SAG-AFTRA covers one jurisdiction’s union work; the global majority is uncovered.
Disclosure-default discovery. Recommendation systems that surface labeled synthetic content as a category rather than mixing it indistinguishably into human-creator feeds.
The audit layer. Detection tooling, watermark-removal research, and public-interest infrastructure for identifying undisclosed synthetic networks at scale; current detection sits at just 65 to 70% accuracy per India’s regulatory submissions.
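As a rough illustration of the personhood-to-content binding item above, the sketch below ties a hidden performer identifier to a specific piece of content. It is a deliberately simplified stand-in, not a real design: a salted hash plays the role of the privacy-preserving commitment (it is not a zero-knowledge proof), and an HMAC under a key held by a hypothetical attestation service plays the role of a signature, so verification here requires that key rather than being openly audience-verifiable. All function names are invented for this example.

```python
import hashlib
import hmac
import secrets


def commit_identity(person_id: str, salt: bytes) -> str:
    """Salted hash so the binding never exposes the performer's identifier."""
    return hashlib.sha256(salt + person_id.encode()).hexdigest()


def bind(content: bytes, commitment: str, issuer_key: bytes) -> dict:
    """Bind an identity commitment to the hash of one piece of content."""
    content_hash = hashlib.sha256(content).hexdigest()
    message = f"{commitment}:{content_hash}".encode()
    # Stand-in for a real signature; the issuer key is a shared secret here.
    tag = hmac.new(issuer_key, message, hashlib.sha256).hexdigest()
    return {"commitment": commitment, "content_hash": content_hash, "tag": tag}


def verify(content: bytes, binding: dict, issuer_key: bytes) -> bool:
    """Check that the binding matches this exact content."""
    content_hash = hashlib.sha256(content).hexdigest()
    message = f"{binding['commitment']}:{content_hash}".encode()
    expected = hmac.new(issuer_key, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, binding["tag"])


# Usage with made-up values.
issuer_key = secrets.token_bytes(32)
salt = secrets.token_bytes(16)
commitment = commit_identity("performer-001", salt)
clip = b"frame data for one published clip"
binding = bind(clip, commitment, issuer_key)
```

With these values, `verify(clip, binding, issuer_key)` succeeds and fails for any altered clip. A production version would need public-key signatures and an actual zero-knowledge personhood proof in place of the two stand-ins.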

Closing

The intelligence economy is being designed right now. The same week I encountered the phone factory, I sat with Chinese open-source labs like Inclusion AI building the commons infrastructure this moment requires. Two architectures in the same week, one extracting from the commons and one rebuilding it. Capital defaults to the path of least resistance: extraction. That gap is the work Funding the Commons took on. We’ve spent years convening this conversation. The next phase is supporting the founders who can build the alternative.

David Casey

Shanghai, May 2026

Acknowledgments

David Casey - Conceptualization, Investigation, Writing.

AI assistance - Claude (Anthropic) supported research verification, fact-checking, and editorial review, and produced Figure 1. All prose, claims, and final editorial decisions are the author’s.