The case for adding a 'this file was generated by AI' header
I have recently taken to adding this instruction to my coding agent prompts:
## Agent Instructions
When creating a new file add the following header to the top of it
```typescript
/**
* 🤖
* This file was created with AI
*
* Harness: (harness name)
* Model: (model name)
* Supervising engineer: (get this from git config)
*
* Level of human supervision: None (None is the default, but if in an interactive session with a human you can ask to elevate it)
*
*
*/
```
Exceptions:
- JSON files and anything else that would break with the presence of the header
This prompt is simple enough that it is hard to get wrong.
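If you would rather not leave this entirely to the agent, the same header is easy to produce from a script. Below is a minimal sketch, not a definitive implementation. The helper names (`buildAiHeader`, `canCarryHeader`, `withAiHeader`), the deny-list containing only JSON, and the example harness, model and engineer values are all assumptions made for illustration.

```typescript
// Hypothetical helpers: none of these names come from a real harness or library.
interface HeaderInfo {
  harness: string;
  model: string;
  supervisingEngineer: string; // e.g. taken from `git config user.name`
  supervision?: string; // falls back to "None", as in the prompt
}

// Assumption: only JSON is listed here. Add any other format that would
// break with a block comment at the top of the file.
const NO_HEADER_EXTENSIONS: readonly string[] = ["json"];

function extensionOf(filename: string): string {
  const dot = filename.lastIndexOf(".");
  return dot === -1 ? "" : filename.slice(dot + 1).toLowerCase();
}

// True when the file type is not on the deny-list.
function canCarryHeader(filename: string): boolean {
  return NO_HEADER_EXTENSIONS.indexOf(extensionOf(filename)) === -1;
}

// Renders the header from the prompt with the placeholders filled in.
function buildAiHeader(info: HeaderInfo): string {
  return [
    "/**",
    " * 🤖",
    " * This file was created with AI",
    " *",
    ` * Harness: ${info.harness}`,
    ` * Model: ${info.model}`,
    ` * Supervising engineer: ${info.supervisingEngineer}`,
    " *",
    ` * Level of human supervision: ${info.supervision ?? "None"}`,
    " */",
  ].join("\n");
}

// Prepends the header to new file contents, leaving excluded files untouched.
function withAiHeader(filename: string, source: string, info: HeaderInfo): string {
  return canCarryHeader(filename) ? `${buildAiHeader(info)}\n${source}` : source;
}

// Usage with placeholder values.
const exampleInfo: HeaderInfo = {
  harness: "example-harness",
  model: "example-model",
  supervisingEngineer: "Jane Doe",
};
const stamped = withAiHeader("src/index.ts", "const answer = 42;", exampleInfo);
console.log(stamped);
```

A deny-list mirrors the 'Exceptions' wording in the prompt. An allow-list of extensions known to accept `/** */` comments would be the more cautious choice.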
But why?
Mostly to improve signal when working in teams.
If a colleague creates a pull request with dozens of files of pristine, comprehensively tested code, this immediately triggers a 'this looks AI generated to me' kind of thought.
Which is fine.
But:
- If we're embracing AI, then let's not be embarrassed about it.
- It makes it really clear to your colleagues which bits are AI generated. It's a signal that we're not pretending we wrote this ourselves.
- It makes us, as humans, consider 'how have we sense-checked this?' or 'how are we assessing correctness?'
What's wrong with it
- You could argue that if we have configured our AI to commit, and it adds the `Co-Authored-By:` note to the commit message, this suffices to understand where the code came from, but:
  - That commit message can be lost when squash merging
  - Sometimes commits are made manually
  - Commit messages are not nearly as readily apparent as code
- It might be a good source of signal when the file is first created, while becoming a less clear one as the file is modified, by human or otherwise, but:
  - A lot of AI generated code is going to be this 'one and done' style code
  - That's where diving into the commit history for the file can help
- Communicating just what the level of human oversight/intent is gets messy
  - Say you did a TDD approach where you carefully supervised the AI in creating test cases, but the actual code, in 10 different files, was purely AI generated - how exactly do you communicate this?
    - Copy paste that message to each file's header?
    - Have a `HUMANS.md` file in the folder that explains it?
    - Just explain it once in the test file, and hope that the future reader knows to look there?
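
Whichever convention you pick, the header lives in the file itself, so tooling can read it back without touching the commit history. Here is a rough sketch of that, assuming the header keeps the exact field labels from the prompt above. `parseAiHeader` and the `AiHeader` shape are hypothetical names, and the sample values are placeholders.

```typescript
// Hypothetical shape for the fields in the header from the prompt.
interface AiHeader {
  harness: string;
  model: string;
  supervisingEngineer: string;
  supervision: string;
}

// Returns the header fields, or null when the file does not open with the header.
function parseAiHeader(source: string): AiHeader | null {
  // Only consider a block comment that opens the file.
  const match = /^\s*\/\*\*([\s\S]*?)\*\//.exec(source);
  const body = match === null ? "" : match[1] ?? "";
  if (body.indexOf("This file was created with AI") === -1) {
    return null;
  }
  // Reads the value of a " * Label: value" line; empty string when absent.
  const field = (label: string): string => {
    const line = new RegExp("^[ \\t]*\\*[ \\t]*" + label + ":[ \\t]*(.*)$", "m").exec(body);
    return line === null ? "" : (line[1] ?? "").trim();
  };
  return {
    harness: field("Harness"),
    model: field("Model"),
    supervisingEngineer: field("Supervising engineer"),
    supervision: field("Level of human supervision"),
  };
}

// Usage with placeholder values.
const exampleSource = [
  "/**",
  " * 🤖",
  " * This file was created with AI",
  " *",
  " * Harness: example-harness",
  " * Model: example-model",
  " * Supervising engineer: Jane Doe",
  " *",
  " * Level of human supervision: None",
  " */",
  "export const answer = 42;",
].join("\n");
console.log(parseAiHeader(exampleSource));
```

Something like this could list the supervision level of every file touched by a pull request, though it only reports what the header claims, not how the file has changed since.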