LOCKSS Software
Source
- Source : http://sourceforge.net/scm/?type=cvs&group_id=47774 (SourceForge)
- ViewVC : http://lockss.cvs.sourceforge.net/viewvc/lockss/ (view it online)
File System
Primary file system locations used by LOCKSS:
- /etc/lockss
- /home/lockss
- /usr/share/lockss
- /var/log/lockss
In the file /etc/lockss/config.dat, the LOCKSS_DISK_PATHS variable enumerates the available disks.
- /cache0/gamma
- /cache1/gamma
- /cache2/gamma
- /cache3/gamma
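To check which disks a node is configured with, the variable can be read directly from config.dat. This is a minimal sketch. It assumes the value is stored as a double-quoted, semicolon-separated list (e.g. LOCKSS_DISK_PATHS="/cache0/gamma;/cache1/gamma"). The helper name list_disk_paths is hypothetical and not part of LOCKSS. It only reads the file and never sources it, so nothing in config.dat is executed.

```shell
#!/usr/bin/env bash
# list_disk_paths is a hypothetical helper, not part of LOCKSS.
# Assumption: config.dat contains a line such as
#   LOCKSS_DISK_PATHS="/cache0/gamma;/cache1/gamma"
list_disk_paths() {
  local config="${1:-/etc/lockss/config.dat}"
  local value
  # Use the last assignment, keep everything after the first '=', drop quotes.
  value=$(grep '^LOCKSS_DISK_PATHS=' "$config" | tail -n 1 | cut -d= -f2- | tr -d '"')
  # Print one disk path per line.
  printf '%s\n' "$value" | tr ';' '\n'
}
```

Called with no argument, it reads /etc/lockss/config.dat and prints one configured disk path per line.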
Starting and Stopping the LOCKSS daemon
- /etc/init.d/lockss start
- /etc/init.d/lockss stop
- /etc/init.d/lockss restart
Log Rotate
Logs are in /var/log/lockss. To run logrotate against the LOCKSS configuration:
logrotate /etc/logrotate.d/lockss
Configuration Files
Primary Configuration files
- http://props.lockss.org:8001/adpn/lockss.xml
- /etc/lockss/config.dat
- /home/lockss/local.txt
- /cache0/gamma/config/expert_config.txt
- /cache0/gamma/config/au.txt
Additional Configuration files
- /cache0/gamma/config/ui_ip_access.txt
- /cache0/gamma/config/proxy_ip_access.txt
- /cache0/gamma/config/content_servers_config.txt
Configuration parameters are documented at:
- http://www.lockss.org/lockssdoc/gamma/daemon/paramdoc.html
- http://www.lockss.org/lockssdoc/gamma/daemon/paramdoc.txt
Debug options from Logger.java
- info -- the general default
- debug -- sparse debugging messages
- debug1 -- debugging messages
- debug2 -- Detailed debugging that would not produce a ridiculous amount of output if it were enabled system-wide.
- debug3 -- Debugging messages that produce more output than would be reasonable if this level were enabled system-wide. (e.g. messages in inner loops, or per-file, per-hash step, etc.)
Examples of setting log levels in the Expert Config:
org.lockss.log.BaseCrawler.level = info
org.lockss.log.CrawlerImpl.level = info
org.lockss.log.BlockTally.level = info
org.lockss.log.V3PollerStatus.level = debug
org.lockss.log.PlatformInfo.level = debug2
Title List
The default title list of available AUs is provided by LOCKSS in http://props.lockss.org:8001/adpn/lockss.xml. The Add/Remove Titles lists in the Web admin are populated from titledb.xml, as set by the titleDbs property:
<property name="titleDbs">
  <list>
    <value>http://props.lockss.org:8001/adpn/titledb/titledb.xml</value>
  </list>
</property>
The default can be overridden with the local parameter org.lockss.titleDbs, for example:
org.lockss.titleDbs = http://bpldb.bplonline.org/etc/adpn/titledb-local.xml
Manipulating the Cache
Exporting Content
The LOCKSS repository manager provides a simple HTTP download of any content type on the LOCKSS server. Link, copy or move the data into /cache0/gamma/tmp/export/. Then visit the ExportContent page of the Web admin at http://bpl-adpnet.jclc.org:8081/ExportContent.
The contents of /cache0/gamma/tmp/export/ are deleted when the daemon restarts. Files whose ownership has been changed from the LOCKSS user are the exception.
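The export step can be wrapped in a small function. export_file is a hypothetical helper, not a LOCKSS tool. It hard-links a single file into the export directory when both are on the same filesystem and copies it otherwise. Moving is deliberately avoided. The export directory is emptied on restart, so a moved file still owned by LOCKSS would lose its only copy. Deleting a hard link leaves the original in place.

```shell
#!/usr/bin/env bash
# export_file is a hypothetical helper; the default export directory is the
# one named in the text above.
export_file() {
  local src="$1" export_dir="${2:-/cache0/gamma/tmp/export}"
  [ -f "$src" ] || { echo "not a file: $src" >&2; return 1; }
  mkdir -p "$export_dir"
  # A hard link costs no extra space but only works within one filesystem;
  # otherwise fall back to a copy that preserves mode and timestamps.
  ln -f "$src" "$export_dir/" 2>/dev/null || cp -p "$src" "$export_dir/"
}
```

For example, export_file /path/to/file places the file where the ExportContent page can list it.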
Deleting Stale AUs
Repositories marked as 'Deleted' in the RepositoryTable can be removed from the cache by deleting the corresponding directory (e.g. /cache0/gamma/cache/f). AUs not marked as deleted should first be removed through the Web UI. Alternatively (untested), remove the corresponding lines from /cache0/gamma/config/au.txt. Then remove the directory from the cache.
Restart the daemon.
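A guard around the rm step can help avoid deleting the wrong path. This is a hypothetical sketch. remove_au_dir accepts only a bare cache directory name such as f or gj under a given cache root, and refuses anything else. It does not consult the RepositoryTable or au.txt. Those checks still have to be made by hand as described above, and the daemon must be restarted afterwards.

```shell
#!/usr/bin/env bash
# remove_au_dir is a hypothetical helper. It deletes one AU directory under a
# cache root (e.g. /cache0/gamma/cache) after basic sanity checks.
remove_au_dir() {
  local cache_dir="$1" name="$2"
  # Cache directory names are lowercase letters only (a-z, aa, ab, ...).
  case "$name" in
    '' | *[!a-z]*)
      echo "refusing: '$name' is not a cache directory name" >&2
      return 1 ;;
  esac
  if [ ! -d "$cache_dir/$name" ]; then
    echo "no such directory: $cache_dir/$name" >&2
    return 1
  fi
  rm -rf -- "${cache_dir:?}/$name"
}
```

For example: remove_au_dir /cache0/gamma/cache f, followed by /etc/init.d/lockss restart.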
Moving AUs, Same Disk
The daemon fills in missing directories when adding AUs, so renaming directories is not strictly necessary. Nevertheless, note that cache directories are named in sequence 'a-z', then 'a[a-z]', 'b[a-z]', etc. An AU is not tied to a specific cache directory. au.txt records only the disk repository (i.e. local\:/cache0/gamma, NOT local\:/cache0/gamma/f). To move an AU to a new location on the same disk, simply rename its directory.
Consider the following scenario: AUs exist at 'a', 'b', 'c', 'd' and 'e'. After AUs 'c' and 'd' are deleted, gamma/cache contains 'a', 'b' and 'e'. I can rename directory 'e' to 'c', and the next AU added will then be named 'd'.
Restart the daemon.
N.B. Wait until active crawls have completed before renaming directories. Otherwise the crawler will keep using the original directories.
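The naming sequence can be made concrete with two small functions. Both are hypothetical illustrations. next_free_dir assumes the daemon picks the first unused name in the sequence. That matches the scenario above but is not confirmed from the daemon source. dir_name_for_index counts like spreadsheet column names (a to z, then aa to az, ba to bz, ...), so a name such as gj corresponds to index 191.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the naming sequence described above:
# a..z, then aa..az, ba..bz, and so on.

# Map a 0-based index to a directory name: 0->a, 25->z, 26->aa, 52->ba.
dir_name_for_index() {
  local n=$(( $1 + 1 )) name="" letters=abcdefghijklmnopqrstuvwxyz
  while [ "$n" -gt 0 ]; do
    n=$(( n - 1 ))
    name="${letters:$(( n % 26 )):1}$name"
    n=$(( n / 26 ))
  done
  printf '%s\n' "$name"
}

# Print the first name in the sequence that does not exist in a cache dir.
# Assumption: this mirrors how the daemon fills in missing directories.
next_free_dir() {
  local cache_dir="$1" i=0 name
  while :; do
    name=$(dir_name_for_index "$i")
    if [ ! -e "$cache_dir/$name" ]; then
      printf '%s\n' "$name"
      return 0
    fi
    i=$(( i + 1 ))
  done
}
```

For the scenario above, with a, b and e present, next_free_dir /cache0/gamma/cache prints c. After e is renamed to c, it prints d.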
Moving AUs, Different Disk
Cache directories are named in sequence "a-z", then "a[a-z]", "b[a-z]", etc. First, identify the next available directory name on the new disk.
Locate and modify the lines defining the AU in /cache0/gamma/config/au.txt (e.g. change repository=local\:/cache1/gamma to repository=local\:/cache2/gamma). Then move the directory from the original disk to the new one (e.g. mv /cache1/gamma/cache/g /cache2/gamma/cache/gj).
Restart the daemon.
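The au.txt edit can also be scripted. This is a minimal sketch with two assumptions, since this page does not show a full line. First, each AU's repository entry has the form org.lockss.au.<AU key>.reserved.repository=local\:<disk>. Second, au_key is a substring unique to the AU being moved. set_au_repository is a hypothetical name, and it keeps au.txt.bak as a backup. Note that awk -v interprets backslash escapes, so keys containing backslashes would need extra care.

```shell
#!/usr/bin/env bash
# set_au_repository is a hypothetical helper. It rewrites the repository line(s)
# of one AU in au.txt and keeps a backup copy as au.txt.bak.
# Assumption: lines look like <AU key>.reserved.repository=local\:<disk>
set_au_repository() {
  local au_txt="$1" au_key="$2" new_disk="$3"
  cp -p "$au_txt" "$au_txt.bak"
  # index() does plain substring matching, so '|', '&' and '~' in AU keys
  # are not treated as regex metacharacters.
  awk -v key="$au_key" -v disk="$new_disk" '
    index($0, key) && index($0, "repository=local\\:") {
      p = index($0, "repository=")
      print substr($0, 1, p - 1) "repository=local\\:" disk
      next
    }
    { print }
  ' "$au_txt.bak" > "$au_txt"
}
```

For the example above, one would run set_au_repository /cache0/gamma/config/au.txt with a unique part of the AU key and /cache2/gamma. Then run mv /cache1/gamma/cache/g /cache2/gamma/cache/gj, and restart the daemon.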
Copying AUs, Different Nodes
The repository manager of a node will not initiate a new content crawl against a peer node. A new content crawl is generally recognized as the only way to populate a node's cache. It is, however, possible to copy the file system from one node and transfer it to another. Once the node is populated with the data and the repository manager is coerced into believing that a new content crawl completed successfully, the repository manager will engage in normal peer polling for the archival unit. Modifying or adding data in the archival unit still requires the publisher's content staging area with the original URL structure.
Understand File System Organization
The directory layout in the cache depends on the local system only to a depth of 4 from the root (e.g. /cache0/gamma/cache/m/). Below that point, the layout follows the original access URL of the archival unit's data.
It is possible to populate a new node using a copy of the file system of a peer node. The peer node should show high reliability for its cached data (100% agreement in a recent poll would be good). (I am skipping some steps on the assumption that the archival unit is in the Add Titles list of the LOCKSS Web UI. Otherwise, one would have to edit au.txt manually and create the directories in the cache.)
/cache0/gamma/cache/m/bpldb.bplonline.org/http/adpn/load/Cartography/000400-000599/000404/tif/000404.tif/
When the archival unit is added to the node through the Web UI, the repository manager creates the next available directory on the selected volume. (The archival unit's base directory is /cache0/gamma/cache/m/; the rest of the path in the example shows how the layout follows the URL structure.) The repository manager then initiates a new content crawl of the start URL defined in the AU. Since the publisher has vacated the content staging area, the crawl fails with a 404 or other HTTP error. The changes to au.txt and the creation of the /m/ directory are not rolled back.
Pack the AU
A compressed tar package of the archival unit should be made on the peer node that holds the content. In this example, the peer node stored the archival unit at /cache2/gamma/cache/bj/.
tar -zpcvf /cache0/gamma/tmp/export/au_to_transfer.tar.gz /cache2/gamma/cache/bj/
There are several ways to export the content from the LOCKSS node, including FTP; this example uses the built-in HTTP export. Once the command has run, a link called "au_to_transfer.tar.gz" appears on the ExportContent page of the LOCKSS Web UI.
Unpack the AU
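The present working directory at this point should be /cache0/gamma/cache/. The command below extracts the package into the new m directory. It strips the four leading path components (cache2/gamma/cache/bj; tar drops the leading '/' when packing) and skips the excluded state files: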
tar -xvzf /path/to/au_to_transfer.tar.gz -C m --strip-components=4 --exclude \#agreement --exclude \#no_au_peers --exclude \#id_agreement.xml --exclude \#node_props --exclude \#au_id_file
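The unpack step can be wrapped so that it does not depend on the current directory. This is a hypothetical sketch, assuming GNU tar and an absolute path to the package. It first lists a few member names so the depth passed to --strip-components can be checked. For a package made from /cache2/gamma/cache/bj/, the members start with cache2/gamma/cache/bj/. It then runs the same extraction as the command above.

```shell
#!/usr/bin/env bash
# unpack_au is a hypothetical wrapper around the extraction command above.
#   $1 absolute path to the .tar.gz package
#   $2 cache directory on the new node, e.g. /cache0/gamma/cache
#   $3 AU directory the daemon created, e.g. m
unpack_au() {
  local archive="$1" cache_dir="$2" au_dir="$3"
  # Show a few member names to confirm how many components to strip.
  tar -tzf "$archive" | head -n 3
  # Run in a subshell so the caller's working directory is unchanged.
  (
    cd "$cache_dir" || exit 1
    tar -xzf "$archive" -C "$au_dir" --strip-components=4 \
      --exclude '#agreement' --exclude '#no_au_peers' \
      --exclude '#id_agreement.xml' --exclude '#node_props' \
      --exclude '#au_id_file'
  )
}
```

Quoting the patterns, as in '#agreement', does the same job as the backslash escapes in the original command: it stops the shell from treating # as the start of a comment.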
Finish Up
Restart the LOCKSS daemon now; the #node_props files are regenerated at restart. Because the files #au_state.xml and #nodestate.xml are preserved, the repository manager believes that a new content crawl has already been completed. A manually initiated V3 poll on the archival unit should then return appropriate URL agreement levels. The new LOCKSS node should now engage in normal content polling with the other peers.
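Before restarting, it may be worth confirming that the preserved state files made it across. check_au_state is a hypothetical check. It assumes #au_state.xml sits directly in the AU directory. Since this page does not say where #nodestate.xml files sit, it simply counts them at any depth. It returns non-zero if either is missing.

```shell
#!/usr/bin/env bash
# check_au_state is a hypothetical helper. It reports whether the state files
# named above survived the transfer into an AU directory.
# Assumption: #au_state.xml is directly inside the AU directory.
check_au_state() {
  local au_dir="$1" ok=0 count
  if [ -f "$au_dir/#au_state.xml" ]; then
    echo "found: #au_state.xml"
  else
    echo "missing: #au_state.xml" >&2
    ok=1
  fi
  # Count #nodestate.xml files at any depth below the AU directory.
  count=$(find "$au_dir" -name '#nodestate.xml' -type f | wc -l | tr -d ' ')
  echo "#nodestate.xml files: $count"
  [ "$count" -gt 0 ] || ok=1
  return "$ok"
}
```

For example: check_au_state /cache0/gamma/cache/m.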
Voting and Polling
Hashed UrlSets
AU released by LOCKSS with current content staging
# Block hashes from bpl-adpnet.jclc.org, 12:00:19 11/07/12
# AU: Birmingham Public Library Cartography Collection: Maps (000400-000599)
# Hash algorithm: SHA-1
# Encoding: Base64
IpncVSUBDZaqglSsjkOp49OS4KE= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/
M7+kc1l4Nl/Jwp3Hp0XRgK8dK94= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/mrc
lMeB8SlN+CGI7+1LxRhE+btzGmo= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/mrc/000404.mrc
266QzyI+r7sZVcRZGId2/roOLNI= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/tif
Tv2dnIhIPXEllk95fqAeA8uB67s= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/tif/000404.tif
6lm8iDE+/gu5ZWcJxTFjU5PP6OI= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/txt
b5Ta9uvYyS0GSZ7srapk2YZGWJM= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/txt/000404.csv
iuzvdBZG0oft3zxao5Sn5r7m+dA= http://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/txt/000404.txt
# end
Manually created AU with limited copy of current content
# Block hashes from bpl-adpnet.jclc.org, 11:41:58 11/07/12
# AU: Birmingham Public Library Base Plugin, Base URL http://bpldb.bplonline.org/adpn/load/, Group test, Collection 000404-000404
# Hash algorithm: SHA-1
# Encoding: Base64
GZzfEqeNHw8WdjuZbX1g/VrJyaI= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/
JkkpA1r7E548EOiiu0KeBuQYfts= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/mrc
lMeB8SlN+CGI7+1LxRhE+btzGmo= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/mrc/000404.mrc
uVD1FzG1WN43ROniD/VfQb/P6pU= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/tif
Tv2dnIhIPXEllk95fqAeA8uB67s= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/tif/000404.tif
0bG54sCMTETu5qKnBA3RXO80v/0= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/txt
b5Ta9uvYyS0GSZ7srapk2YZGWJM= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/txt/000404.csv
iuzvdBZG0oft3zxao5Sn5r7m+dA= http://bpldb.bplonline.org/adpn/load/test/000404-000404/000404/txt/000404.txt
# end
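With each listing saved one entry per line, the two can be compared mechanically. This is a minimal sketch. common_hashes is a hypothetical name, and the saved file names are up to the reader. It prints the hashes that appear in both listings. For the two listings above, it prints four hashes, those of 000404.mrc, 000404.tif, 000404.csv and 000404.txt. None of the directory entries share a hash.

```shell
#!/usr/bin/env bash
# common_hashes is a hypothetical helper. Each input is a saved Hashed UrlSet:
# '#' comment lines plus one "<hash> <url>" entry per line.
common_hashes() {
  local a b
  # Keep only the hash column of non-comment lines, sorted bytewise for comm.
  a=$(awk '!/^#/ && NF >= 2 { print $1 }' "$1" | LC_ALL=C sort -u)
  b=$(awk '!/^#/ && NF >= 2 { print $1 }' "$2" | LC_ALL=C sort -u)
  # comm -12 prints only the lines present in both sorted inputs.
  LC_ALL=C comm -12 <(printf '%s\n' "$a") <(printf '%s\n' "$b")
}
```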
VoteBlock
url=http\://bpldb.bplonline.org/adpn/load/Cartography/000400-000599/000404/tif/000404.tif
vn=I1dlZCBOb3YgMDcgMDM6NTg6NTYgQ1NUIDIwMTIKdW89MApmbz0wCnVsPTQ0NzkxMTEwCm5oPWw1S2NNcnFOdS9McVlobWE3RFNQcm13ZEhab1w9CmVycj1mYWxzZQpmbD00NDc5MTExMApwaD1UdjJkbkloSVBYRWxsazk1ZnFBZUE4dUI2N3NcPQo\=
vt=0
#Wed Nov 07 03:58:56 CST 2012
The VoteBlock version (vn) decoded from Base64 is shown below. Notice that the ph (plain hash) for 000404.tif matches the hash for that file in both Hashed UrlSets above.
#Wed Nov 07 03:58:56 CST 2012
uo=0
fo=0
ul=44791110
nh=l5KcMrqNu/LqYhma7DSPrmwdHZo\=
err=false
fl=44791110
ph=Tv2dnIhIPXEllk95fqAeA8uB67s\=
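The decoding can be reproduced from the shell. This is a minimal sketch, assuming GNU coreutils base64. decode_vote_version and plain_hash are hypothetical names. The vn value is stored properties-style, with '=' escaped as '\='. The sketch removes that escaping before decoding, and again from the decoded ph value.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for reading a VoteBlock version (vn) value.
# The value is stored properties-style, so '=' is escaped as '\='.
decode_vote_version() {
  # Undo the '\=' escaping, then Base64-decode (GNU coreutils base64).
  printf '%s' "$1" | sed 's/\\=/=/g' | base64 -d
}

# Print the plain hash (ph) from a vn value, with its '\=' escaping removed.
plain_hash() {
  decode_vote_version "$1" | sed -n 's/^ph=//p' | sed 's/\\=/=/g'
}

vn='I1dlZCBOb3YgMDcgMDM6NTg6NTYgQ1NUIDIwMTIKdW89MApmbz0wCnVsPTQ0NzkxMTEwCm5oPWw1S2NNcnFOdS9McVlobWE3RFNQcm13ZEhab1w9CmVycj1mYWxzZQpmbD00NDc5MTExMApwaD1UdjJkbkloSVBYRWxsazk1ZnFBZUE4dUI2N3NcPQo\='
plain_hash "$vn"   # prints Tv2dnIhIPXEllk95fqAeA8uB67s=
```

The printed value is the same string that appears against 000404.tif in both Hashed UrlSets.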