The Responsible AI Ledger Alliance utilizes the following open standard for providing trustworthy provenance for digital content. This standard, originally authored as Arweave Network Standard 112, is a framework and definition for the “Data Format” of provenance proofs. Each proof of provenance lists a unique hash of the content's data, as well as the prompt that was used to generate the AI image. Other fields such as “Uploaded-For” and “Model” help to identify the content and make it searchable.
The RAIL protocol is built on top of Arweave, a decentralized data storage network that enables reliable, permanent storage of data on the internet. Unlike traditional blockchains, Arweave utilizes a structure known as the blockweave, which leverages a proof-of-access consensus mechanism to maintain the network's data integrity. Arweave's design features a 'storage endowment' and replaces the energy-intensive 'work' of blockchain networks with the useful validation of the network's dataset, ensuring that once data is stored on the network, it remains accessible indefinitely.
By recording RAIL's provenance records on the Arweave network, users can create an immutable and verifiable chain of custody for their data, supporting its credibility and trustworthiness in a wide variety of applications. Our specification focuses on the design and implementation of such a data protocol for AI-generated data, taking advantage of the characteristics of the Arweave network to provide a permissionless and immutable ledger of content provenance, without centralized controllers.
In addition, data storage on Arweave is scalable, meaning that RAIL provenance records can be adopted by large-scale creators of AI content well into the future. Through the use of transaction bundles facilitated by RAIL members like Bundlr, content generators can create tens of thousands of new provenance records every second. You can learn more about scalable bundling here.
1. Data format
1.1 Data tags
A provenance proof uses the following tags. Tags marked ✔️ in the Optional column may be omitted; tags marked ❌ are required.
| Tag Name | Tag Value | Description | Optional |
|---|---|---|---|
| Data-Protocol | Provenance-Confirmation | Provides the ability to identify all provenance transactions | ❌ |
| Hashing-Algo | string - Hash algorithm used on the data to generate Data-Hash. Defaults to sha256 | Provides the ability to use different hash algorithms within the standard | ✔️ |
| Data-Hash | string - Hash of the data using the Hashing-Algo algorithm | Provides an easy content integrity check | ❌ |
| Uploaded-For | string - Identifier of the person that the data relates to | Provides an easy attribution method for the uploader | ✔️ |
| Prompt | string - The prompt that led to the generation of the data | Allows the prompt to be published in plain text with the record | ✔️ |
| Prompt-Hash | string - A hash of the prompt that led to the generation of the data | Allows for a private prompt, which can act as a proof if it needs to be revealed | ✔️ |
| Model | string - Identifier of the model used to generate the data | Allows searchability based on the model the data relates to | ✔️ |
The Digital Content Provenance Standard does not hold an opinion on which hashing algorithms to support. Specifying a hashing algorithm is left to the discretion of the users and distributors of the standard.
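As an illustration of how the tags fit together, the following is a minimal sketch that assembles the tag set for one proof with a caller-chosen hashing algorithm. The function name `build_provenance_tags`, the plain-dictionary representation, the hex encoding of hashes, and the UTF-8 encoding of the prompt are all assumptions made for this example; the standard itself does not prescribe them.

```python
import hashlib
from typing import Dict, Optional


def build_provenance_tags(
    data: bytes,
    hashing_algo: str = "sha256",   # default taken from the tag table
    uploaded_for: Optional[str] = None,
    prompt: Optional[str] = None,
    private_prompt: bool = False,   # hypothetical switch: publish only the prompt's hash
    model: Optional[str] = None,
) -> Dict[str, str]:
    """Return the tags for one provenance proof as a name -> value mapping."""

    def digest(payload: bytes) -> str:
        # Assumption: hashes are stored as lowercase hex strings.
        return hashlib.new(hashing_algo, payload).hexdigest()

    tags = {
        "Data-Protocol": "Provenance-Confirmation",
        "Hashing-Algo": hashing_algo,
        "Data-Hash": digest(data),
    }
    if uploaded_for is not None:
        tags["Uploaded-For"] = uploaded_for
    if prompt is not None:
        if private_prompt:
            # Keep the prompt itself off the ledger; it can be revealed later
            # and checked against this hash.
            tags["Prompt-Hash"] = digest(prompt.encode("utf-8"))
        else:
            tags["Prompt"] = prompt
    if model is not None:
        tags["Model"] = model
    return tags
```

Calling `build_provenance_tags(image_bytes, prompt="a red bicycle", private_prompt=True)` would yield a record that carries `Prompt-Hash` but not `Prompt`. Submitting the tags to the network is outside the scope of this sketch.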
Storing the entire data file for a corresponding piece of digital content is optional. The Data-Hash value of the data asset is sufficient to verify provenance.
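Because only the hash needs to be on the ledger, anyone holding a copy of the content can check it against a record locally. The sketch below shows one way to do that; `matches_data_hash` is a hypothetical helper, the tags are assumed to be available as a plain name-to-value dictionary, and hashes are assumed to be hex-encoded.

```python
import hashlib
from typing import Dict


def matches_data_hash(data: bytes, tags: Dict[str, str]) -> bool:
    """Check a local copy of the content against the record's Data-Hash."""
    # Hashing-Algo is optional and defaults to sha256 per the tag table.
    # Assumption: forms such as "SHA-256" are normalised to "sha256".
    algo = tags.get("Hashing-Algo", "sha256").replace("-", "").lower()
    actual = hashlib.new(algo, data).hexdigest()
    # Assumption: Data-Hash is a hex string; compare case-insensitively.
    return actual == tags["Data-Hash"].lower()
```

A single changed byte in the content produces a different digest, so the check fails for any altered copy.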
2. Record Validation
A provenance proof is valid if and only if:
Hashing-Algo is a valid hashing algorithm name (identified by its RFC-6234 form).
If Prompt-Hash and Prompt are both present, then Prompt must hash to the same value as the value stored in Prompt-Hash.
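The validation rules in this section can be sketched as a small function. This is an illustration under stated assumptions, not a reference implementation: `validate_proof` and `RFC_6234_ALGOS` are hypothetical names, the prompt rule is assumed to compare Prompt against Prompt-Hash, the prompt is assumed to be hashed as UTF-8 with the record's Hashing-Algo, and hashes are assumed to be hex-encoded.

```python
import hashlib
from typing import Dict, List

# Assumption: the accepted names are the SHA-2 family covered by RFC 6234,
# written as in the tag table's default ("sha256").
RFC_6234_ALGOS = {"sha224", "sha256", "sha384", "sha512"}


def validate_proof(tags: Dict[str, str]) -> List[str]:
    """Return a list of validation errors; an empty list means the proof passed."""
    errors: List[str] = []

    # Rule 1: Hashing-Algo must name a valid algorithm (defaults to sha256).
    algo = tags.get("Hashing-Algo", "sha256").replace("-", "").lower()
    if algo not in RFC_6234_ALGOS:
        errors.append("Hashing-Algo is not a recognised algorithm name")
        return errors  # the prompt check cannot run without a valid algorithm

    # Rule 2: when both prompt tags are present, they must agree.
    if "Prompt" in tags and "Prompt-Hash" in tags:
        actual = hashlib.new(algo, tags["Prompt"].encode("utf-8")).hexdigest()
        if actual != tags["Prompt-Hash"].lower():
            errors.append("Prompt does not hash to the value in Prompt-Hash")

    return errors
```

Returning a list of messages instead of a boolean lets a caller report every failed rule; a record with only one of the two prompt tags passes the second rule trivially.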
© 2022 AI Provenance Alliance