Immutable Content
Wherever possible, the Archival Units (AUs) that you preserve in ADPNet SHOULD HAVE Immutable Content:
- Network Members SHOULD package digital preservation content into AUs in such a way that files DO NOT need to be added to or deleted from the collection.
- Network Members SHOULD package digital preservation content into AUs in such a way that the content of files in the AU DOES NOT need to be updated over time.
If you intend to preserve a digital collection that grows or changes over time, ADPNet strongly recommends that you adopt a packaging scheme that arranges the items in series, and subdivides the collection into consistent chunks.
- Items within a growing collection MAY be arranged into AUs of fixed size based on identifiers within a series.
- EXAMPLE #1. ADAH Digitization Masters are arranged by their internal unique identifier numbers, and then divided into blocks of 500 Master files each. Master files are large TIFF image files, each of which is assigned an internal "Q-Number" in sequence, for example Q0000150001.tif, Q0000150002.tif, Q0000150003.tif, and so on. We package these into directories of 500 TIFF files each, which are assigned Unique Names based on the range of Q-number identifiers (Digitization-Masters-Q-numbers-Master-Q0000150001_Q0000150500m, Digitization-Masters-Q-numbers-Master-Q0000105501_Q0000106000m, and so on).
- Items within a growing collection MAY be arranged into AUs based on a timestamp if a fixed size doesn't fit your production schedule well.
- EXAMPLE #2. Tuskegee's digitized content has a much more irregular production schedule, so it is impractical to wait until a large number of items accumulates before packaging content for preservation. Instead, items within the digital repository are organized into blocks by a fixed time period, in this case the month in which they were produced or uploaded to the server. AUs have names like repository-2020-11, repository-2020-12, and so on, based on a directory structure that organizes files into date-based directories like 2020/11, 2020/12, and so on.
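The two packaging schemes above can be sketched in a few lines of Python. This is a minimal illustration, not ADPNet, ADAH, or Tuskegee tooling: all function names are hypothetical, Q-numbers are assumed to be positive integers zero-padded to ten digits (as in Q0000150001.tif), and blocks are assumed to start at 1 so that each covers 1-500, 501-1000, and so on. The fixed-size helper computes only the Q-number range portion of an AU name, since the full naming pattern is institution-specific.

```python
from datetime import date
from typing import Tuple

# Hypothetical helpers sketching the two AU packaging schemes described above.
# They are illustrations only, not part of ADPNet, ADAH, or LOCKSS software.

BLOCK_SIZE = 500  # Master files per AU, as in EXAMPLE #1


def q_filename(q: int) -> str:
    """Format a Q-number as a zero-padded TIFF filename, e.g. Q0000150001.tif."""
    return f"Q{q:010d}.tif"


def q_number_block(q: int, block_size: int = BLOCK_SIZE) -> Tuple[int, int]:
    """Return the (first, last) Q-numbers of the fixed-size block containing q.

    Assumes blocks start at 1: 1-500, 501-1000, ..., 150001-150500, ...
    """
    if q < 1:
        raise ValueError("Q-numbers are assumed to start at 1")
    first = ((q - 1) // block_size) * block_size + 1
    return first, first + block_size - 1


def q_range_label(q: int) -> str:
    """Range portion of a fixed-size AU name, e.g. Q0000150001_Q0000150500."""
    first, last = q_number_block(q)
    return f"Q{first:010d}_Q{last:010d}"


def monthly_au_name(produced: date, prefix: str = "repository") -> str:
    """AU name for a month-based AU, e.g. repository-2020-11."""
    return f"{prefix}-{produced.year:04d}-{produced.month:02d}"


def monthly_directory(produced: date) -> str:
    """Date-based directory for an item, e.g. 2020/11."""
    return f"{produced.year:04d}/{produced.month:02d}"


if __name__ == "__main__":
    # Prints: Q0000150237.tif -> Q0000150001_Q0000150500
    print(q_filename(150237), "->", q_range_label(150237))
    # Prints: repository-2020-12 2020/12
    print(monthly_au_name(date(2020, 12, 3)), monthly_directory(date(2020, 12, 3)))
```

Because every identifier or date maps to exactly one AU, an AU can be closed and preserved once its last item exists, and its contents never need to change afterward. The fixed-size scheme gives AUs of equal size; the monthly scheme gives up equal sizes in exchange for predictable closing dates.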
Why?
- Digital preservation is much easier to do correctly — and it is much easier to prove that it has been done correctly — when the units of preservation are immutable.
- Practically, the design of the LOCKSS software makes it easier to achieve, maintain, and prove consensus on the contents of AUs if the contents of AUs do not change over time.
- Practically, other components that we (ADPNet) use in our packaging guidelines (for example, BagIt) also depend on checksums and fixity checks for validation; these work best, and require the least corrective work, when applied to immutable units of data rather than to changing data sets.
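To see why immutability simplifies fixity checking, consider a checksum manifest in the spirit of a BagIt manifest-sha256.txt file. The Python sketch below is a simplified illustration using only the standard library; it is not the BagIt reference tooling or the LOCKSS polling mechanism, and its function names are hypothetical. It records a SHA-256 digest for every file under a directory, then compares a freshly computed manifest against the recorded one.

```python
import hashlib
from pathlib import Path
from typing import Dict, List

# Hypothetical fixity helpers; a simplified stand-in for BagIt-style manifests.


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's POSIX-style path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def manifest_lines(manifest: Dict[str, str]) -> List[str]:
    """Render "<checksum>  <path>" lines, sorted by path."""
    return [f"{digest}  {path}" for path, digest in sorted(manifest.items())]


def fixity_report(recorded: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """List files that are missing, newly added, or changed since recording."""
    return {
        "missing": sorted(set(recorded) - set(current)),
        "added": sorted(set(current) - set(recorded)),
        "changed": sorted(
            p for p in recorded.keys() & current.keys() if recorded[p] != current[p]
        ),
    }
```

For an immutable AU, any non-empty entry in the report signals a real problem such as corruption or accidental modification. For an AU that grows or changes, every legitimate addition or edit also appears as a difference, so each one has to be reviewed and the recorded manifest regenerated.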
What if I absolutely, positively need to change the content of an AU?
Sometimes stuff happens. The LOCKSS daemon does provide the capability to version content. We strongly discourage relying on this capability routinely, and strongly encourage organizing AUs so as to avoid it, but if you end up in a situation where you just have to add or remove content for reasons of correctness or due to institutional obligations, then check out the instructions in our answer for Q: Does ADPNet provide a way to add or update content to an AU after it has been preserved in the network?