Q: Does ADPNet provide a way to add or update content in an AU after it has been preserved in the network?
Q. I'm packaging content that I want preserved into an Archival Unit (AU) for preservation in ADPNet. This content comes from a growing collection that will continue to have new items added. Is there a way to add new or updated content to the AU after it has already been added and crawled by other hosts?
A. The LOCKSS software can do this, but we strongly discourage organizing AUs in this way if it is possible to avoid doing so. Wherever possible, we encourage adopting packaging schemes that avoid or minimize the use of this capability.
LOCKSS does support versioning of content in AUs, so it is always possible to re-stage an altered version of an AU and replace the old version of the AU with an updated version that adds, removes, or modifies the content of files.
If it turns out you do need to add to or update contents of an AU, then here’s what you’d do, under the current drop server configuration:
- Prepare your AU with the new or modified contents and upload it to the same directory location on the drop server. The LOCKSS content harvesters treat content as an update to an existing AU if and only if the content is placed under exactly the same directory name that the AU had when it was originally ingested.
- A collection DOES NOT need to have been stored continuously on the drop server in order to update it. If you previously un-staged the content in the AU that you want to update, and had someone on the TPC mark the AU as "down," then all you need to do is re-upload the AU (all the content, not just the new/modified content), put it in a directory with the same name as last time, and then have someone on the TPC mark it as "up."
- In general, after an AU has been fully crawled by the LOCKSS network, when the content is unstaged from the drop server, the top-level directory it was in should be retained, with something like a small README file to indicate that the directory is a location where an AU used to be stored.
- If you have followed recommended packaging guidelines for the AU content, then after any change to the content of the AU (adding a file, removing a file, or modifying any of the files in the set), the existing BagIt manifest will no longer validate against the modified content. You will need to re-do the BagIt manifest and checksums to reflect the new contents.
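To illustrate the last step above, here is a minimal sketch of regenerating a BagIt payload manifest after the AU contents have changed. It assumes the usual BagIt layout (payload files under a `data/` directory, a manifest named `manifest-sha256.txt`); the function names `build_manifest` and `write_manifest` are hypothetical and are not part of any ADPNet or LOCKSS tooling. It does not update the tag manifest or `bag-info.txt`, and it does not handle file names that need escaping, so an established BagIt tool is probably the safer choice for real packages.

```python
import hashlib
from pathlib import Path


def build_manifest(bag_dir, algorithm="sha256"):
    """Return manifest lines ("<checksum>  <path>") for every file under <bag_dir>/data."""
    bag = Path(bag_dir)
    lines = []
    for path in sorted((bag / "data").rglob("*")):
        if not path.is_file():
            continue
        digest = hashlib.new(algorithm)
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large master files are not loaded into memory at once.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        # Paths in the manifest are relative to the bag root and use forward slashes.
        lines.append(f"{digest.hexdigest()}  {path.relative_to(bag).as_posix()}")
    return lines


def write_manifest(bag_dir, algorithm="sha256"):
    """Overwrite manifest-<algorithm>.txt in the bag root with freshly computed checksums."""
    manifest = Path(bag_dir) / f"manifest-{algorithm}.txt"
    manifest.write_text("\n".join(build_manifest(bag_dir, algorithm)) + "\n", encoding="utf-8")
    return manifest


# Example (hypothetical directory name):
# write_manifest("my-collection-au")
```

Run something like this against the complete, updated AU before re-uploading it, so that the manifest describes everything in the directory and not just the new or modified files.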
All that said, in general, I tend to strongly recommend organizing AUs so that their contents will not have to change over time, if it is at all possible to do so.
- This is partly because the design of LOCKSS makes it easier to maintain consensus if AUs don’t change, and partly because other components that we use in our guidelines (e.g. BagIt) work best when they are applied to immutable units of data rather than to changing datasets.
- The easiest way to do this with large, constantly growing collections is usually to adopt a packaging scheme so that the items will be arranged in series, and then subdivided into consistent chunks, either by number of items or by date of creation.
- EXAMPLE: ADAH Digitization Masters are arranged by their internal unique identifier numbers and then divvied up into blocks of 500 items each, with breath-taking and poetic names like Digitization-Masters-Q-numbers-Master-Q0000105001_Q0000105500m
- EXAMPLE: Tuskegee's digitized content has a production schedule that is much more up and down, so it is easier to organize it into blocks based on a fixed date (day or month of production) than it is to work with blocks based on a fixed count of items. AUs have names like repository-2020-11, etc.
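The two chunking approaches in the examples above can be sketched as a pair of naming helpers. This is only an illustration: the function names, the `Q` prefix with ten-digit zero padding, and the `repository` prefix are assumptions modeled on the example AU names, not the actual scripts used by ADAH or Tuskegee.

```python
from datetime import date


def count_block_name(item_number, block_size=500, prefix="Q", width=10):
    """Name the fixed-size block that contains item_number (blocks start at 1, 501, 1001, ...)."""
    start = ((item_number - 1) // block_size) * block_size + 1
    end = start + block_size - 1
    return f"{prefix}{start:0{width}d}_{prefix}{end:0{width}d}"


def month_block_name(produced, prefix="repository"):
    """Name the monthly block for an item produced on the given date."""
    return f"{prefix}-{produced:%Y-%m}"


print(count_block_name(105250))               # Q0000105001_Q0000105500
print(month_block_name(date(2020, 11, 17)))   # repository-2020-11
```

Either way, an item's block is determined once, from its identifier or its production date, so a block that has filled up (or a month that has ended) never needs to change after it has been staged.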