Black Sheep Code

The case for adding a 'this file was generated by AI' header

Published:

I have recently taken to adding this instruction to my coding agent prompts:

## Agent Instructions

When creating a new file add the following header to the top of it

```typescript
/**
 * 🤖
 * This file was created with AI
 *
 * Harness: (harness name)
 * Model: (model name)
 * Supervising engineer: (get this from git config)
 *
 * Level of human supervision: None (None is the default, but if in an interactive session with a human you can ask to elevate it)
 *
 *
 */

```

Exceptions:
- JSON files and anything else that would break with the presence of the header

This is a simple enough prompt that is hard to get wrong.

But why?

Mostly to improve signal when working in teams.

If a colleague creates a pull request with dozens of files of pristine, comprehensively tested code, this immediately triggers a 'this looks AI generated to me' kind of thought.

Which is fine.

But:

  • If we're embracing AI, then let's not be embarrassed about it.
  • It makes it really clear to your colleagues which bits are AI generated. It's signal we're not pretending we wrote this ourselves.
  • It makes us as a human consider 'how have we sense checked this' or 'how are we assessing correctness'?

What's wrong with it

  • You could argue that if we have configured our AI to be committing, and it has the Co-Authored-By: note in the commit message, that this suffices to understand where the code came from, but:

    • That commit message will be lost if squash merging
    • Some times commits are made manually
    • Commit messages are not nearly as readily apparent as code
  • It might be a good source of signal for the first time the file is created, but as the file is modified, by human or otherwise, the header becomes a less clear source of signal, but:

    • A lot of AI generated code is going to be this 'one and done' style code
    • That's where diving in to the commit history for the file can help
  • Communicating just what the level of human overview/intent is, is messy

    • Say you did a TDD approach where you carefully supervised the AI in creating test cases, but the actual code, in 10 different files, was purely AI generated - how exactly do you communicate this?
      • Copy paste that message to each header file?
      • Have a HUMANS.md file in the folder that explains it?
      • Just explain it once in the test file, and hope that the future reader knows to look there?


Questions? Comments? Criticisms? Get in the comments! 👇

Spotted an error? Edit this page with Github