The Scholar as Data: What AI Scraping Reveals About Authorship, Identity, and the Limits of Consent

I did not design my work to be scraped. I designed it to be read. The distinction matters, because what AI systems do when they encounter structured intellectual work is not reading — it is extraction. They do not follow an argument. They do not track the relationship between a concept introduced in one section and its application three sections later. They parse, weight, and absorb. The output is not understanding. It is pattern.

This is the condition I find myself in as an independent scholar in what I have come to call the Artificial Era: my work is being consumed by systems I did not consent to train, systems that will redistribute what they extract without attribution, without context, and without the architecture that gave those ideas coherence in the first place.

I am not writing about this as a grievance. I am writing about it as a structural condition — one that Psychological Architecture is, in fact, built to describe.

What Scraping Actually Does

There is a common assumption that when an AI system encounters a body of written work, it copies it. This is not accurate, and the inaccuracy matters. Copying preserves a relationship between the text and its origin. A copy can be traced, attributed, contested. What AI training pipelines do is different. They metabolize.

The system consumes the work, extracts its statistical weight, and dissolves it into a representation that is distributed across millions of parameters. No individual idea survives intact. No argument remains coherent as an argument. What remains is influence without source — a contribution that cannot be named, credited, or even identified, because the original has been absorbed rather than reproduced.

For a scholar, this is not simply an intellectual property problem, though it is that too. It is an identity problem. Intellectual work is not merely output. It is, in the terms of Psychological Architecture, an expression of the Meaning domain — the dimension of human experience concerned with purpose, contribution, and the construction of a lasting internal coherence. When that work is extracted without consent and dissolved into a system that has no stake in what the work means, something specific is damaged. Not the work itself. The relationship between the author and the work.

The Individual as Actor and Data Set

The Artificial Era essays have examined how automation reshapes effort, cognition, identity, and meaning at the level of daily life. What the scraping phenomenon reveals is something more specific: in the current period, the individual is simultaneously a producer of intellectual content and an involuntary training resource for the systems that will redistribute that content.

This dual condition is worth naming precisely. When I write an essay, I am acting as an author — making decisions about structure, sequence, emphasis, and argument. I am building something with a particular logic. That logic is inseparable from the form. Remove the sequence and you have lost the argument. Remove the context and you have lost the meaning.

When an AI system scrapes that essay, it treats the form as irrelevant. The structure I built is overhead. What the system extracts is the semantic content — words, phrases, conceptual relationships — stripped of the architecture that gave them force. The author becomes, in this transaction, not a source but a data point. The essay becomes not an argument but a training sample. This is not a metaphor. It is a description of a structural change in what authorship now includes. To produce serious intellectual work in the current period is to produce, simultaneously and without election, training material for systems that will circulate derivatives of that work without reference to its origin. The act of writing has not changed. The conditions surrounding it have. Authorship now operates inside a secondary process that the author did not design, does not control, and cannot opt out of — a process running parallel to the one the author intended, absorbing the same output toward entirely different ends.

This is what I mean by the individual as both actor and data set. The acting does not stop. I continue to write, to build, to develop the framework. But alongside that, without my participation and without my consent, I am also generating training material for systems whose outputs will shape how the ideas in my field are understood, summarized, and circulated — by systems that have absorbed the language of those ideas without the intellectual history behind them.

The Independent Scholar's Exposure

There is a version of this problem that institutional scholars face in attenuated form. A professor at a research university writes within a system that has already established its authority independently of any individual's output. The institution's name travels with the work. The journal's imprimatur travels with the work. The scholar's contribution is real, but it is housed inside a structure that absorbs some of the risk of invisibility and some of the cost of appropriation.

The independent scholar has no such housing.

When my work is scraped and metabolized into an AI system's training data, there is no institutional layer to absorb the loss. There is no department that continues to signal authority when my specific contribution has been dissolved into statistical weight. There is no journal whose reputation persists as a marker of the work's origin. There is only the work, and the name attached to it, and the site where it lives — none of which the scraping system acknowledges, cites, or preserves.

This asymmetry is not incidental. It is structural, and it compounds in specific ways.

Institutional scholars publish inside systems designed to maintain attribution. Peer review, journal databases, citation indexes, university repositories — these are not perfect, but they create multiple redundant records of who produced what and when. When AI systems train on academic literature, the sourcing is at least partially traceable in principle, even when it is not traced in practice. The infrastructure of attribution exists, even if it is being bypassed.

Independent scholars publish on the open web. The infrastructure of attribution is whatever the author builds and maintains. For me, that is a website, a research index, a presence on ResearchGate and Google Scholar — structures I have built deliberately and maintain continuously. They are real. But they are also fragile in comparison to institutional systems, because they depend entirely on the continued functioning of a single author's operation. There is no library acquiring my work. There is no repository maintaining persistent identifiers for it. There is no institution whose continued existence guarantees that the record of my contribution survives.

When a bot hits profrjstarr.com thousands of times in a day and carries the content into a training pipeline, it encounters none of the friction that exists around institutionally published work. There is no paywall, no license agreement, no terms of access that the system must navigate or acknowledge. The openness that makes independent scholarship accessible to readers also makes it maximally available to extraction. The design choices that serve the human reader serve the scraper equally, and the scraper does not distinguish between access and consent.

The psychological consequence of this specific exposure is worth naming. Institutional scholars can, at least partially, separate their identity from the fate of any individual piece of work. The institution continues. The department continues. The journal continues. For the independent scholar, every piece of work is also a piece of the infrastructure. The essays are not just arguments — they are also the evidence that the project exists, that it is serious, that it has been building over time. When that evidence is absorbed into systems that will never surface its origin, the damage is not just to attribution. It is to the visibility of the project itself.

This is not an argument for institutional affiliation as protection. Institutions carry their own distortions, their own pressures toward optimization and visibility, their own forms of appropriation. It is an argument for clarity about the specific vulnerability of independent intellectual work in an era where the primary mechanism of AI training is open-web scraping — and where the scholars most exposed are precisely those whose work is most carefully structured, most openly available, and least protected by the redundant attribution systems that academic publishing, for all its faults, still partially maintains.

Psychological Architecture and the Consent Problem

Psychological Architecture is a structural model organized around four interacting domains: Mind, Emotion, Identity, and Meaning. In normal conditions, these domains operate in relative coherence. The work a person produces is legible as an expression of who they are, what they value, and what they are trying to build. Authorship — the sense that one's intellectual output belongs to oneself, carries one's signature, and reflects one's development — is a function of Identity and Meaning working together.

The scraping dynamic disrupts this coherence in a specific way. It does not attack the work directly. It extracts from it. The author continues to produce. The essays accumulate. The framework develops. From the outside, nothing has changed. But something has shifted in the relationship between the author and the work, because the work is now also functioning as raw material for a system that has no relationship to the author at all.

Consent is the structural issue here, not simply the ethical one. In the Meaning domain, what matters is not just that work has value but that the author has standing in relation to that value. Standing requires acknowledgment — not applause, not visibility, but the structural condition in which the work remains legible as an expression of the person who produced it. When scraping removes that legibility silently, the author's relationship to their own output is altered in a way that has no visible marker. The work continues. The coherence quietly shifts.

This is the consent problem as a psychological condition, not merely a legal one.

Psychological Resistance Is Not Refusal

There is a temptation, when facing this condition, to treat it as a reason to stop. To conclude that if the work will be harvested regardless, the act of careful construction is futile. This conclusion deserves examination, because it is precisely wrong.

The argument for futility rests on a confusion between purpose and reception. The purpose of structured intellectual work is not to prevent extraction. It is to build something coherent — to develop an argument that holds, a framework that explains, a body of work that accumulates meaning over time. None of that purpose is negated by the fact that a system will also scrape it.

What I call psychological resistance is not a refusal to engage with AI systems or an attempt to prevent scraping through technical means. It is something more internal. It is the practice of maintaining the conditions under which serious intellectual work remains possible — specifically, the conditions of depth, patience, and authorial coherence that automated systems cannot generate, because they are not outputs but processes.

Deep, unoptimized focus is not a productivity strategy. It is a structural resistance to the pressure that the Artificial Era exerts on intellectual work — the pressure to produce at the pace of consumption, to optimize for retrieval rather than argument, to write in ways that serve the machine rather than the idea. When I sit in a room with no notifications, no feedback loops, no real-time audience, and work through a problem over hours or days, I am doing something that no scraper will ever do. I am not generating data. I am thinking.

The distinction is not a small one. It is the whole point.

What the Machine Cannot Take

AI systems that train on structured intellectual work can acquire the language of that work. They can reproduce the terminology, approximate the frameworks, and generate text that resembles the arguments. This is not nothing. But it is also not the work.

What they cannot acquire is the developmental sequence that produced the work. They cannot absorb the years of reading that preceded the framework, the false starts that preceded the published essays, the internal coherence that makes the whole system hold together as a single argument rather than a collection of related claims. The machine gets the output. It does not get the author.

This is not a consolation. It is a structural description of what authorship is and what it requires. Authorship is not the production of text. It is the maintenance of a relationship between a mind and its output over time — a relationship that builds, accumulates, and develops in ways that cannot be extracted, because the extraction would have to begin before the first word was written and continue until the last.

The Artificial Era has made this relationship more fragile in some respects. The volume of AI-generated content that resembles serious intellectual work creates real pressure on the signals by which serious work is identified. Attribution becomes harder to maintain. Distinctiveness requires more effort to preserve. These are genuine pressures, not imagined ones.

But the relationship itself — between a scholar and the architecture of their thinking — remains intact for as long as the scholar maintains the conditions that produce it. That is what I mean by psychological resistance. Not defiance. Not protest. The continued practice of building something that takes longer to make than any system will ever take to consume.

The Artificial Era Requires a Different Kind of Authorship

The essays in this series have argued, from different angles, that the Artificial Era alters the psychological conditions of human life at the level of effort, identity, and meaning. The scraping phenomenon makes that argument concrete and personal in a way that abstract analysis cannot.

I am not writing about what AI will do to intellectual work in the future. I am writing about what it is doing now, to my work, on a site that AI bots visit more frequently than human readers on most days. The condition I am describing is not hypothetical. It is current, structural, and largely invisible to the scholars it affects most.

What the Artificial Era requires from a serious author is not simply more output or better protection. It requires a deliberate relationship with the purpose of intellectual work — one that does not derive its validity from reception, recognition, or the prevention of extraction, but from the internal coherence of the project itself. This is harder than it sounds. The psychological pressures of the era run in the opposite direction: toward visibility, speed, optimization, and measurable reach.

Authorship in the Artificial Era means continuing to build something architecturally serious in a context that will absorb the pieces and ignore the structure. It means maintaining the identity of a scholar in a period that increasingly treats intellectual output as data. It means caring about the argument when the system only wants the text.

That is not futility. It is the only available form of integrity.

Next

The Psychology of the Artificial Era: Why the Future of AI Is Actually About Us