
From Claw Code to Clean Room: A Developer's Guide to Re-implementing Software Without Getting Sued



Clean-room design, AI coding agents, and why “I didn’t copy-paste” is not a defense

Disclaimer: I am a software practitioner, not an attorney. Laws vary by jurisdiction and change quickly—especially around AI. Use this article as a technical map of risks and practices; confirm decisions with qualified counsel before shipping code that depends on a clean-room story.


Introduction: The Evening Rewrite and the Morning Risk

The software engineering landscape shifted again—not gradually, but in a single product cycle. Autonomous AI coding agents (Claude, Codex, Cursor Agent, and orchestration workflows such as oh-my-codex (OmX)) can port an application to a new language, framework, or cloud shape in hours instead of quarters. Freelance developers sell “full rewrites” as fixed bids. In-house teams run parallel agent swarms on legacy modules nobody wanted to touch.

That speed is a genuine force multiplier. It is not an intellectual property laundry machine.

In early 2026, the developer community watched a familiar pattern play out at high velocity: a proprietary agent harness—widely discussed as “Claw Code”—surfaced in public channels. Within a short window, independent engineers published Python-first and Rust-first rewrites, often advertising them as clean-room implementations. The technical work was impressive. The legal story, for anyone who has built software under NDAs or competed in a crowded market, was fragile before the first pull request merged.

This article is for builders—freelancers, tech leads, and founders—who must re-implement behavior (migrate a stack, replace a vendor, study a competitor, or recover from a leak) while AI types most of the keystrokes. We walk through:

  1. A neutral case study of the Claw Code moment—what was claimed, what traditional clean-room process requires, and where AI does not help.
  2. The four legal fronts you actually fight on.
  3. The AI authorship paradox—who can copy your output if you cannot copyright it.
  4. When convergent architectures (RAG, agents) are legally boring—and what still forms a moat.
  5. A playbook and scenario matrix you can hand to counsel.

If you only read one companion piece on this site, pair this with Navigating Open-Source Licensing in the Age of AI: that post covers license compatibility; this one covers re-implementation and contamination.


1. Case Study: A Viral Rewrite and the Limits of “Clean Room” Claims

1.1 What happened (facts, not a verdict)

Reports in March 2026 described the core codebase of a prominent agent harness—colloquially called Claw Code—appearing outside its intended distribution boundary. The system was not a toy script; it embodied orchestration patterns, tool routing, memory handling, and safety guardrails that teams had treated as proprietary know-how.

Almost immediately, public repositories appeared that:

  • Reimplemented core behaviors in Python or Rust.
  • Credited AI orchestration (including OmX-style multi-step agent flows) for the speed of the port.
  • Stated or implied that the work was a from-scratch rewrite capturing architecture without copying literal source text—therefore a legally sound clean-room design.

One widely discussed line of work lives in the open repository dasarpai/claw-code, which documents an independent engineering effort in that wave. This section does not judge that repository’s legal status. Courts decide infringement; communities decide stars. Our job is to understand process.

1.2 What the author claimed vs what law cares about

The technical claim is intuitive: “We did not paste their files; we described behavior and had the model write new code.”

Traditional IP analysis asks a different question: Was the implementer’s mind contaminated by unauthorized access to the original expression or trade secrets—and did that contamination flow into the output?

Copyright cares about expression (code text, unique structure, non-obvious organization). Trade-secret law cares about economically valuable know-how that was not generally known and was protected by reasonable measures. Contract law cares about what you signed (NDA, assignment, acceptable use of AI vendors) regardless of how fresh the syntax looks.

A label on GitHub—clean-room, cleanroom, independent—is not a defense. It is marketing unless you can show separation of knowledge and specification-only handoff the way counsel expects.

1.3 The contamination chain

Classical clean-room design assumes an air gap between people who have seen the original and people who implement. AI collapses the typing cost; it neither creates that gap nor removes the need for it.

A contaminated chain often looks like this:

Access leaked or proprietary code
    → Engineer forms mental model (structure, edge cases, naming habits)
        → Prompts reference "the same flow as X" or paste snippets into chat
            → Agent emits "new" code that tracks non-obvious choices
                → Ship under "rewrite" branding

Orchestration does not sterilize the room. Running ten agents, a reviewer model, and a test harness still leaves one human (or one team) who directed replication. Multi-agent glamour is not Team B.

Factors that weaken a clean-room story in AI-heavy rewrites include:

  • The same person both studied the leak and wrote the prompts.
  • Prompts that ask for parity, feature-for-feature matching, or same internal module boundaries.
  • Commit messages or issues that reference proprietary identifiers from the source.
  • Tests copied from private behavior descriptions not derived from a public spec.
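The prompt-level factors in this list can be screened mechanically before a prompt reaches an agent. Below is a minimal sketch, assuming a hypothetical denylist of proprietary identifiers and a handful of illustrative replication phrases. The `lint_prompt` name and the patterns are assumptions for illustration, not an established tool. A clean result says nothing about whether the prompt author was exposed to the source.

```python
import re

# Illustrative replication phrases (assumptions, not an exhaustive or legal list).
REPLICATION_PATTERNS = [
    r"\bfeature[- ]for[- ]feature\b",
    r"\b(full|exact)\s+parity\b",
    r"\bsame\s+(flow|structure|module boundaries|layout)\s+as\b",
    r"\bmake\s+it\s+work\s+like\b",
]


def lint_prompt(prompt: str, proprietary_identifiers: set[str]) -> list[str]:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    lowered = prompt.lower()
    for pattern in REPLICATION_PATTERNS:
        if re.search(pattern, lowered):
            findings.append(f"replication phrasing: /{pattern}/")
    for ident in sorted(proprietary_identifiers):
        # Whole-word match so short identifiers do not fire inside other words.
        if re.search(rf"\b{re.escape(ident.lower())}\b", lowered):
            findings.append(f"proprietary identifier: {ident}")
    return findings
```

A check like this catches careless wording, not contamination. It belongs next to the human controls, not in place of them.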

1.4 Lessons without verdict

Community velocity is not legal velocity. Shipping first can still mean discovery first—for you.

Illustrative, not precedent. The Claw Code wave is a teaching moment about process failure modes, not a court ruling you can cite.

Ethics sit beside law. Even where enforcement is uncertain, building on leaked trade secrets damages trust with customers, employers, and future collaborators. Responsible teams treat leaks as incident response, not a shopping event.

This section describes a community pattern, not legal advice. Whether any particular rewrite infringes copyright or misappropriates trade secrets depends on facts, jurisdiction, and counsel—not on whether the code “looks different” or was typed by an agent.


2. Clean-Room Design: What It Is and What AI Did Not Invent

2.1 The two-team model

A clean-room design (also clean-room implementation) is a disciplined method to recreate functionality while reducing the risk that expression or secrets from a reference product illegally carry over. It is used in competitive analysis, platform migration, and litigation defense—not as a magic word on a README.

Historically it requires two isolated groups:

| Role | Team A (“dirty room”) | Team B (“clean room”) |
| --- | --- | --- |
| May see | The reference product, reverse-engineered behavior, leaked code (only if counsel approves the study) | Only a written specification |
| Produces | Functional spec: inputs, outputs, protocols, performance targets, error semantics | Implementation from scratch |
| Must not | Hand code, pseudocode, or “rewrite this file” tasks to Team B | Read the original source or sit in meetings where it is discussed |

Team A documents what the system does—API contracts, state machines, user-visible behavior—not how the original authors wrote it (no copied algorithms, no idiosyncratic variable names, no comment text).

Team B implements with engineers who have never seen the original. In mature programs, access control, separate machines, and written certifications support that separation.

2.2 Where rewrites fail the test

These common patterns are rewrites, not clean rooms:

  • Solo developer with the competitor’s repo on one monitor and Cursor on the other.
  • “I only looked at architecture.” Architecture in code is expression-adjacent; a non-obvious decomposition can be protectable, and a substantially similar copy of it can infringe.
  • Shared AI session where proprietary files were uploaded earlier in the thread—even if later prompts omit them (retention and context vary by vendor).
  • Spec written by the same person who debugged against the leaked tarball.

If you cannot produce a spec that predates implementation and prove Team B’s blindness, you have a narrative, not a defense.

2.3 Clean room in the AI era

AI is best understood as Team B’s extremely fast typist, not as a legal partition:

  • Humans still own the air gap. Someone must author a spec without laundering secret detail into it.
  • Spec-first, code-second. Generate code only from the spec artifact; ban “make it work like repo X” prompts.
  • Do not confuse AWS Clean Rooms with software clean room. AWS Clean Rooms solves collaborative analytics on data without raw sharing. Software clean room solves independent authorship of code. The names sound alike; the problems are not.

3. Four Legal Fronts Every Re-implementation Hits

When a rights holder sues—or sends a cease-and-desist—the fight is rarely “you used Python instead of TypeScript.” It spans four fronts.

3.1 Copyright: idea vs expression vs derivative work

Copyright protects original expression fixed in a tangible medium. It does not protect ideas, methods of operation, or general functional goals. Building a ride-hailing app does not infringe Uber’s copyright merely because both match drivers to riders.

Software infringement and derivative work claims often turn on:

  • Literal similarity—identical or near-identical code blocks (including comments and odd formatting).
  • Non-literal similarity—structure, sequence, and organization of modules when the reference is creative and the overlap is substantial. Courts use frameworks such as abstraction-filtration-comparison (see Computer Associates v. Altai in the U.S.) to separate unprotectable ideas from protectable expression.
  • Scènes à faire—standard patterns forced by the problem (e.g., CRUD handlers) weigh against infringement.

AI increases the risk of substantial similarity without intent: models trained on public code may reproduce memorable snippets; models steered by contaminated prompts may reproduce non-obvious structure. “The AI wrote it” is not a statutory defense.

The U.S. Supreme Court’s Google v. Oracle decision (2021) addressed a different question—fair use of API declaring code in a specific platform context—not a blanket license to clone commercial systems. Do not over-read it as permission to replicate complex internals.

3.2 Trade secrets and misappropriation

For many products, trade secret claims hurt more than copyright. A trade secret can be:

  • A tuning threshold in a retrieval pipeline.
  • A routing policy between tools in an agent harness.
  • A failure-mode heuristic learned from production traffic.

Misappropriation generally requires acquiring the secret through improper means (theft, breach of an NDA, industrial espionage), or disclosing or using it without consent when you knew or had reason to know it was obtained that way or under a duty of confidentiality. If your optimization clearly could only come from having seen private telemetry or internal docs, “we retyped it” will not comfort a jury.

Freelancer hotspot: You combine Client A’s public brief with mental residue from Client B’s confidential engagement in one chat workspace. That commingling is a compliance incident waiting for a dispute.

3.3 Contracts: NDAs, assignment, and AI vendor terms

Even pristine code can lose if contracts fail:

  • NDAs may prohibit disclosing operational logic to third parties, including commercial LLM providers that retain prompts or train on them unless you disable that (defaults vary by vendor and tier).
  • Employment and work-for-hire agreements may assign all implementations you create, including side repos, if they relate to the employer’s business.
  • AI vendor terms govern subprocessors, data residency, and whether your prompts become part of vendor training. A clean-room story dies if your “Team B” was a cloud model that stored the proprietary spec.

Read your MSA, SOW, employee IP agreement, and Cursor/Copilot/Claude enterprise paperwork as one system.

3.4 Patents (thin but real)

Software patents are narrower in some jurisdictions but still matter in agents, compression, and protocol niches. A clean copyright story does not automatically clear method claims. Counsel screens patents when the owner of the reference product is litigious.


4. The AI Paradox: Authorship, Ownership, and Who Can Copy You

4.1 Human authorship and registrability

U.S. Copyright Office guidance (2023–2025) stresses human authorship for registration. Purely AI-generated material without sufficient human creative selection, arrangement, or modification may be uncopyrightable by you—meaning you may have weaker tools to stop verbatim copying of your own drop.

The industry response is the selection and arrangement doctrine applied in practice:

  • Humans define architecture, module boundaries, and acceptance tests.
  • Humans choose which model outputs to keep, refactor, or discard.
  • Humans integrate components into a cohesive product.

Document that chain. Your defense in an infringement suit and your offense against copiers both improve when authorship is traceable.

4.2 Tool license vs your product license

The license on GitHub Copilot, Cursor, Claude, or API terms governs your use of the tool, not the license you grant downstream for your product. Conflicts appear when:

  • Generated code resembles copyleft (GPL) code present in the model’s training data; see the scanning tools and policies in the open-source licensing article.
  • Enterprise terms prohibit storing customer code in prompts.

Your product’s OSS compliance is a separate layer from your IP defense against a competitor.

4.3 Offensive risk: your “clean” repo may be copyable

Irony for AI-heavy teams: if you cannot establish human authorship, competitors may legally copy AI-only dumps you published without other protectable layers (trade secret in private ops, trademarks, contracts). The moat moves to execution and data, not the GitHub tarball.

4.4 Prompt logs as evidence—not armor

A prompt log (dated prompts, model versions, spec revisions, commit SHAs) helps prove independent generation from a specification. It does not cure contamination if the spec itself was distilled from a leak or if the same engineer both saw the secret source and authored the prompts.

Treat logs as litigation hygiene, like security audit trails—not as a get-out-of-suit card.
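One way to make such a log harder to quietly rewrite is to chain each record to the previous one by hash, the same idea used in tamper-evident audit trails. The sketch below uses only the Python standard library. The field names and the `append_entry` / `verify_chain` helpers are assumptions for illustration, not a format any court or vendor prescribes.

```python
import hashlib
import json
from datetime import datetime, timezone


def _digest(entry: dict) -> str:
    # Hash every field except the stored hash itself, in a stable key order.
    body = {k: v for k, v in entry.items() if k != "hash"}
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], prompt: str, model: str,
                 spec_revision: str, commit_sha: str) -> dict:
    """Append a record whose hash covers its content and the previous record's hash."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt": prompt,
        "model": model,
        "spec_revision": spec_revision,
        "commit_sha": commit_sha,
        "prev_hash": log[-1]["hash"] if log else "",
    }
    entry["hash"] = _digest(entry)
    log.append(entry)
    return entry


def verify_chain(log: list[dict]) -> bool:
    """True if every record's hash matches its content and links to the previous record."""
    prev = ""
    for entry in log:
        if entry["prev_hash"] != prev or entry["hash"] != _digest(entry):
            return False
        prev = entry["hash"]
    return True
```

The chain detects edits and reordering inside the log. It does not detect records dropped from the end, and it proves nothing about where the spec came from, which is the limit this section describes.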


5. Convergence: RAG, Agents, and When Sameness Is Legal

Modern AI stacks converge. Two teams reading the same public papers on GraphRAG, corrective RAG (CRAG), or agentic tool-calling patterns will ship similar folder trees. That similarity alone is not infringement.

Generally lawful convergence:

  • Implementations of ideas described in public literature or open standards.
  • Obvious data structures (vector store + chunker + reranker) without secret tuning.
  • UX patterns that are industry commonplace.

Still defensible as secrets or advantage (often not in the repo):

  • Proprietary corpora, labeling, and data cleaning pipelines.
  • Production thresholds, eval harnesses, and feedback loops tuned on private traffic.
  • Integration depth: RBAC, audit, residency, SLAs, support playbooks.

For teams and freelancers: document which parts came from public sources (papers, docs, OSS). Never smuggle employer-specific thresholds into a “generic” RAG starter you sell to the next client.


6. Playbook: Re-implementing Without Contaminating the Room

6.1 Before you start

  • Lawful inputs only. Public API docs, standards, your own prior work, properly licensed OSS, or a competitor study approved by counsel.
  • Written spec artifact (behavior tables, sequence diagrams, acceptance tests) before any implementation prompt.
  • Conflict check: employment agreement, client SOW, non-compete (where enforceable), and jurisdiction.

6.2 During implementation

  1. Extract signatures only — endpoints, message formats, user-visible states. Strip comments, internal names, and algorithm prose from any reference you are allowed to see.
  2. Blind AI sessions — separate account or workspace; never upload proprietary or leaked files; disable history retention where the vendor allows.
  3. Prompt for specification, not replication — e.g., “Implement an idiomatic Rust service that satisfies Appendix A acceptance tests; do not mirror any external codebase structure.”
  4. Independent reviewer — a second human who never viewed the reference implements or audits against the spec only.

6.3 Evidence pack (if disputed later)

  • Versioned spec PDF or markdown with dates.
  • Prompt log and model/version metadata.
  • Git history showing spec-first commits.
  • Vendor settings proving no training / no retention for enterprise prompts where applicable.
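The “spec-first commits” item is easy to check once commit metadata is exported, for example from `git log --name-only`. This sketch assumes that data is already parsed into records. The `Commit` type and the `spec/` and `src/` prefixes are hypothetical conventions, not something the article’s process mandates.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    sha: str
    committed_at: datetime
    paths: tuple[str, ...]


def spec_precedes_implementation(commits: list[Commit],
                                 spec_prefix: str = "spec/",
                                 impl_prefix: str = "src/") -> bool:
    """True if the earliest spec commit is strictly older than the earliest implementation commit."""
    def earliest(prefix: str):
        times = [c.committed_at for c in commits
                 if any(p.startswith(prefix) for p in c.paths)]
        return min(times) if times else None

    first_spec, first_impl = earliest(spec_prefix), earliest(impl_prefix)
    if first_spec is None:
        return False  # no spec commit at all, so there is nothing to show
    return first_impl is None or first_spec < first_impl
```

Git timestamps can be rewritten, so a passing check is supporting evidence at best. Pair it with externally dated artifacts such as the versioned spec.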

6.4 For teams and agencies

  • RACI: who may see the reference (Team A) vs who may not (Team B + agents).
  • CI policies that block prompts containing paths matching known proprietary repos.
  • Contract clause: customer warrants rights to all materials fed to agents.
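The RACI split can be audited as plain set arithmetic over access records. A minimal sketch, assuming you can list who viewed the reference, who committed implementation code, and who authored agent prompts. The function name and the three input sets are hypothetical.

```python
def separation_violations(reference_viewers: set[str],
                          implementers: set[str],
                          prompt_authors: set[str]) -> dict[str, set[str]]:
    """Report people who appear on both sides of the Team A / Team B boundary."""
    return {
        "viewed_and_implemented": reference_viewers & implementers,
        "viewed_and_prompted": reference_viewers & prompt_authors,
    }
```

Two empty sets are necessary but not sufficient: the check only sees recorded access, not hallway conversations or shared chat threads.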

6.5 For freelancers

  • One client, one room — separate AI subscriptions; no cross-client workspace memory.
  • Do not reuse “architecture inspiration” from Client B when building for Client A.
  • Ask about indemnity and IP ownership in the MSA before advertising overnight rewrites.

7. Scenario Matrix: Quick Orientation

| Scenario | Typical risk | Practical stance |
| --- | --- | --- |
| Greenfield product inspired by public API docs | Lower copyright risk; mind API ToS and patents | Spec from docs only; blind implementation |
| Licensed SDK or approved competitive study | Managed risk if process is documented | Formal Team A / Team B with counsel |
| Leaked codebase “for research” | High copyright + trade secret + ethical risk | Do not implement; incident response |
| Former employer internal system | Contract + trade secret | Do not use memory of secrets; get clearance |
| Open-source fork | License compliance (GPL, AGPL, etc.) | Follow licensing guide; attribution |
| AI port of legacy you own | Low external IP risk; still document authorship | Ownership clear; prompt logs for maintainers |

When two rows apply—e.g., OSS plus leaked reference—the worst row wins until counsel says otherwise.
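The “worst row wins” rule is a maximum over whatever ordering you assign to the rows. The sketch below makes that explicit. The numeric ranks and scenario keys are assumptions added for illustration only, since the matrix is qualitative and does not rank every row. Counsel, not this table, sets the real ordering.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MANAGED = 2
    HIGH = 3


# Rank assignments are illustrative assumptions, not taken from the matrix.
SCENARIO_RISK = {
    "own_legacy_port": Risk.LOW,
    "public_api_greenfield": Risk.LOW,
    "approved_study": Risk.MANAGED,
    "open_source_fork": Risk.MANAGED,
    "former_employer_system": Risk.HIGH,
    "leaked_codebase": Risk.HIGH,
}


def governing_scenario(applicable: list[str]) -> str:
    """Return the applicable scenario with the highest assumed risk rank."""
    return max(applicable, key=lambda name: SCENARIO_RISK[name])
```

For the example in the text, `governing_scenario(["open_source_fork", "leaked_codebase"])` picks the leaked-codebase row, so the practical stance is “do not implement” regardless of how clean the license side looks.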


Conclusion: Multipliers, Not Laundromats

AI coding agents are among the strongest multipliers in software history. They compress typing, exploration, and test scaffolding. They do not wash knowledge gained from leaks, NDAs, or competitor secrets. The Claw Code moment showed how fast the industry can move; it also showed how quickly “clean room” can become a label detached from process.

Developers who treat IP hygiene like security hygiene—air gaps, specs, logs, and counsel early—will ship at AI speed without trading tomorrow’s lawsuit for tonight’s merge. Those who confuse different syntax with independent authorship will learn that courts care about chains of human knowledge, not whether the last mile was typed by a human or a model.


Further reading

Confirm any production decision with qualified legal counsel in your jurisdiction.
