Difference between revisions of "LOCKSS Basics"

From Adpnwiki

Revision as of 12:52, 25 February 2016

This page describes the basics of LOCKSS for new members of the preservation network.

Lots of Copies Keeps Stuff Safe

LOCKSS stands for Lots Of Copies Keeps Stuff Safe.

LOCKSS is a preservation network that operates on the premise that the most reliable approach to long-term preservation is to create multiple replicas of data and distribute those replicas across geographic locations and hardware.

While one could make multiple copies of data on various media and mail those media to random corners of the globe, the media themselves are not interconnected. When data is replicated across discrete media, discrepancies between the copies will emerge over time. That is, without checking and monitoring for changes, one will eventually find that two presumably identical sets of data are in fact not identical. At that point it is difficult to determine which copy is correct.
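The problem with disconnected copies can be illustrated with a minimal sketch. It assumes SHA-256 as the hash function (this page does not say which hash LOCKSS uses), and the sample content and variable names are hypothetical. Comparing two copies reveals that they differ, but says nothing about which one is correct.

```python
import hashlib


def digest(data: bytes) -> str:
    """Return a hex digest of the content (SHA-256 is an assumption here)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical content: copy_b has one character altered, as might
# happen through silent media degradation.
copy_a = b"preserved content"
copy_b = b"preserved c0ntent"

# The hashes differ, so the copies are known to be out of sync ...
copies_match = digest(copy_a) == digest(copy_b)

# ... but with only two copies there is no majority to say which is right.
print("copies match:", copies_match)
```

With only two disconnected copies, a mismatch is detectable but not resolvable. That gap is what the polling described in the rest of this page addresses.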

LOCKSS creates a preservation network in which identical replicas of data are distributed across multiple servers, and the servers, while independent, are connected over a network. A full copy of the preservation data resides on each server. To monitor for discrepancies, the preservation servers participate in polls: each server creates a hash of the preserved content and sends that hash as a vote. LOCKSS has elaborate logic to determine whether action is taken, but it boils down to this:

  1. The votes are tallied to see whether a quorum exists.
  2. If a quorum exists and a preservation node finds itself in the minority, the preservation node corrects its own copy of the content to align with the majority.
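
The two steps above can be sketched roughly as follows. This is a simplified illustration, not the actual LOCKSS polling protocol, which is considerably more elaborate. The quorum size, the use of SHA-256, the node names, and the function names `tally` and `needs_repair` are all assumptions made for the example.

```python
import hashlib
from collections import Counter
from typing import Dict, Optional

QUORUM = 3  # hypothetical minimum number of votes for a valid poll


def content_hash(data: bytes) -> str:
    """Hash a node's copy of the content (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def tally(votes: Dict[str, str], quorum: int = QUORUM) -> Optional[str]:
    """Return the majority hash, or None if the poll is inconclusive."""
    if len(votes) < quorum:
        return None  # too few voters: no quorum
    winner, count = Counter(votes.values()).most_common(1)[0]
    if count * 2 <= len(votes):
        return None  # no strict majority agrees on one hash
    return winner


def needs_repair(node: str, votes: Dict[str, str], quorum: int = QUORUM) -> bool:
    """True if the poll is conclusive and this node is in the minority."""
    winner = tally(votes, quorum)
    return winner is not None and votes[node] != winner


# Hypothetical poll: three nodes hold intact content, one holds a damaged copy.
good, damaged = b"journal article", b"journal artic1e"
votes = {
    "node-a": content_hash(good),
    "node-b": content_hash(good),
    "node-c": content_hash(good),
    "node-d": content_hash(damaged),
}

for node in votes:
    print(node, "needs repair:", needs_repair(node, votes))
```

In this sketch only `node-d` is flagged for repair, because it disagrees with a majority in a poll that met the assumed quorum. If fewer than `QUORUM` nodes vote, no node takes action.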