On 2025-10-03 at 20:48:40, Elijah Newren wrote:
> Would this mean that you wanted to ban contributions like d12166d3c8bb
> (Merge branch 'en/docfixes', 2023-10-23), available on the list over
> at https://lore.kernel.org/git/pull.1595.git.1696747527.gitgitgadget@gmail.com/
> ? We don't need to go theoretical, I've already contributed such a
> patch series before -- 2 years ago -- and it was merged. Granted,
> that was entirely documentation, and I called out the usage of AI in
> the cover letter, and I manually checked every change (discarding many
> of them) and split it into commits on my own, could easily explain any
> change and why it was good, etc. And I was upfront about all of it.

I think the main problem here is that we don't know the copyright status
of LLM outputs. It is not uncommon for them to produce output that
reflects their training input and we see evidence of that in, for
instance, the New York Times lawsuit against OpenAI.

As I said, the situation is very unclear legally, with active litigation
in multiple countries, and we have to comply with pretty much every
country's laws in this situation. Whether something is legal in the
United States, where you're located, is completely irrelevant to whether
it is legal in Canada, where I'm located, or Germany or the UK, where we
have other contributors.

We also have to consider whether it's legal in all of the countries that
Git is distributed in, which includes every country in which Debian has
a mirror[0], even countries under international sanctions, such as Iran,
Russia, and Belarus. It doesn't matter if the person using AI has
indemnification, either, since that only covers civil matters, and at
least in the U.S. and Canada, knowingly violating copyright is also a
criminal offence.

The sign-off process is designed to clearly state that a person has the
ability to contribute code under the license and I don't think, as
things stand, it's possible to make that assertion with code or
documentation generated from an LLM except in very limited
circumstances. I don't allow LLM-generated code in my personal projects
that require sign-off for that reason, and neither does QEMU[1]. I
don't think I could honestly assert either (a) or (b) in the DCO with
LLM-generated code because it's not clear to me whether "I have the
right to submit it under the…license."

To quote the QEMU policy:

  To satisfy the DCO, the patch contributor has to fully understand the
  copyright and license status of content they are contributing to
  QEMU. With AI content generators, the copyright and license status of
  the output is ill-defined with no generally accepted, settled legal
  foundation.

  Where the training material is known, it is common for it to include
  large volumes of material under restrictive licensing/copyright
  terms. Even where the training material is all known to be under open
  source licenses, it is likely to be under a variety of terms, not all
  of which will be compatible with QEMU's licensing requirements.

I remember the SCO situation with Linux and how it really created a lot
of uncertainty with Linux because SCO created FUD around Linux licensing
and how that led to the DCO being created. I am aware of the fact that
many open source contributors are very unhappy that their code has been
used to train LLMs without retaining credits and copyright notices or
honouring the license terms[2]. And I have spent many years working
with non-profits[3], where I have always been taught that we should
avoid even the appearance of impropriety.

It may matter less what the situation actually ends up being legally
(although it could end up being quite bad) and more whether someone can
imply or suggest that Git is not being distributed in compliance with
the license or contains infringing code, which could effectively make it
undistributable because nobody wants to take that risk. And litigation,
even if Git and its contributors are successful, can be extraordinarily
expensive.

So I think, given the circumstances, yes, the right thing to do is to
ban LLM-generated contributions with a policy very similar or identical
to QEMU's. If, in the future, the legal situation changes and it
becomes unambiguously legal to use LLMs across the world, then we can
reconsider that policy then.

[0] https://www.debian.org/mirror/list
[1] https://github.com/qemu/qemu/commit/3d40db0efc22520fa6c399cf73960dced423b048
[2] Regardless of the legal concerns, this implicates professional
    ethics concerns, such as §1.5 of the ACM Code of Ethics[4]. Ethics
    requirements usually go well beyond what the law requires.
[3] Software Freedom Conservancy, which handles legal matters for the
    Git project, is a non-profit.
[4] https://www.acm.org/code-of-ethics
-- 
brian m. carlson (they/them)
Toronto, Ontario, CA