Difference between revisions of "Partitioning Cache Data"

From Adpnwiki
Jump to: navigation, search
m (Comprehending LOCKSS Title List)
m (Comprehending LOCKSS Title List)
Line 173: Line 173:
 
=== Comprehending LOCKSS Title List ===
 
=== Comprehending LOCKSS Title List ===
  
Nested same name elements with different levels of nesting depth causes some difficulty in comprehending using standard tools. Standard deserialization techniques won't work because group 1 property (org.lockss.titleSet) has different depth than group 2 property (org.lockss.title).  
+
Nested same name elements with different levels of nesting depth causes some difficulty in comprehension using standard tools. Standard deserialization techniques won't work because group 1 property (org.lockss.titleSet) has different depth than group 2 property (org.lockss.title).  
  
 
  <nowiki>protected void DeserializeXml() {
 
  <nowiki>protected void DeserializeXml() {

Revision as of 15:15, 10 July 2013

Why Partition

Partitioned Cost Reductions

Quorum in the network is 3. (This is down from the previous value of 4... find out why).

Any sort of partitioning strategy would need to be implemented at the titledb.xml level. Title AUs can be assigned to peers centrally, and each peer should receive a custom titledb.xml file. If LOCKSS is unwilling to support that then there is alternative using local parameters.

org.lockss.titleDbs = http://bpldb.bplonline.org/etc/adpn/titledb-local.xml

Implementing a 1 + 6 partitioning strategy can save 12% on average for each network node. 1 + 6 indicates AU owner + 6 additional network nodes. Adding 2 additional nodes to the network can decrease per node storage by an average of 30%. Adding 4 additional nodes and partitioning cache data can save per node storage on average of 41%. This means we could store up to 18 TB of data on 10.6 TB nodes.

Implementing a 1 + 5 which is 6 discrete nodes in the network (double quorum), base storage decrease is 25% with no additional nodes. 1 + 5 with 4 additional nodes achieves a staggering 50% on average per node storage reduction.

All nodes (default configuration)

			
au_host	AUCount	ContentSize in TB	DiskCost in TB
ADAH	778	5.11	5.13
AUB	778	5.11	5.13
BPL	778	5.11	5.13
SHC	778	5.11	5.13
TROY	778	5.11	5.13
UAB	778	5.11	5.13
UAT	778	5.11	5.13
UNA	778	5.11	5.13
		40.88	41.04

Does not include vacated publisher AUs (which is between 500 and 600 GB).

1 + 6 no additional nodes

au_host	AUCount	ContentSize	DiskCost	count	size	cost
ADAH	667	4.60	4.62		-14.27%	-9.99%	-9.97%
AUB	676	3.83	3.85		-13.11%	-25.00%	-24.94%
BPL	669	4.49	4.51		-14.01%	-12.19%	-12.15%
SHC	666	4.48	4.49		-14.40%	-12.41%	-12.41%
TROY	666	4.29	4.30		-14.40%	-16.13%	-16.09%
UAB	667	4.58	4.60		-14.27%	-10.29%	-10.26%
UAT	767	5.06	5.09		-1.41%	-0.89%	-0.86%
UNA	667	4.43	4.45		-14.27%	-13.27%	-13.23%
		35.76	35.91			-12.52%	-12.49%

1 + 6 with 2 additional nodes

au_host	AUCount	ContentSize	DiskCost	count	size	cost
ADAH	519	3.93	3.95		-33.29%	-23.00%	-22.98%
AUB	540	3.21	3.23		-30.59%	-37.15%	-37.09%
BPL	522	3.09	3.11		-32.90%	-39.49%	-39.44%
SHC	519	3.22	3.23		-33.29%	-36.97%	-36.96%
TROY	519	3.26	3.28		-33.29%	-36.15%	-36.09%
UAB	519	3.02	3.03		-33.29%	-40.93%	-40.89%
UAT	751	4.89	4.91		-3.47%	-4.40%	-4.38%
UNA	519	3.26	3.28		-33.29%	-36.15%	-36.09%
ADAH2	518	3.94	3.95				
BPL2	519	3.94	3.95				
		35.76	35.91		-12.52%	-12.49%

1 + 6 with 4 additional nodes

au_host	AUCount	ContentSize in TB	DiskCost in TB		
ADAH	503	2.60	2.61	-35.35%	-49.11%
AUB	425	3.16	3.17	-45.37%	-38.13%
BPL	405	2.47	2.48	-47.94%	-51.57%
SHC	408	2.83	2.84	-47.56%	-44.66%
TROY	401	3.34	3.35	-48.46%	-34.68%
UAB	413	2.50	2.51	-46.92%	-51.13%
UAT	740	4.79	4.81	-4.88%	-6.23%
UNA	357	2.97	2.98	-54.11%	-41.96%
ADAH2	407	2.52	2.54		
AUB2	501	3.25	3.26		
BPL2	457	2.86	2.87		
UAT2	428	2.48	2.49		
		35.76	35.91	-41.32%	-39.68%

1 + 5 with no additional nodes

							
au_host	AUCount	ContentSize in TB	DiskCost in TB				
ADAH	581	3.83	3.85	-25.32%	-24.87%		
AUB	539	3.97	3.99	-30.72%	-22.12%		
BPL	537	3.72	3.73	-30.98%	-27.13%		
SHC	590	3.36	3.38	-24.16%	-34.07%		
TROY	555	3.79	3.81	-28.66%	-25.71%		
UAB	572	3.68	3.70	-26.48%	-27.87%		
UAT	757	4.92	4.93	-2.70%	-3.68%		
UNA	536	3.34	3.35	-31.11%	-34.56%		
		30.65	30.78	-25.02%	-25.00%

1 + 5 with 4 additional nodes

au_host	AUCount	ContentSize in TB	DiskCost in TB		
ADAH	395	1.95	1.96	-49.23%	-61.80%
AUB	349	2.90	2.91	-55.14%	-43.30%
BPL	463	2.77	2.78	-40.49%	-45.74%
SHC	350	2.94	2.95	-55.01%	-42.46%
TROY	327	1.83	1.84	-57.97%	-64.18%
UAB	331	2.10	2.11	-57.46%	-58.88%
UAT	736	4.85	4.87	-5.40%	-5.02%
UNA	312	2.37	2.38	-59.90%	-53.55%
ADAH2	332	2.06	2.07		
AUB2	328	2.52	2.53		
BPL2	367	2.53	2.54		
UAT2	377	1.83	1.84		
		30.65	30.78	-47.57%	-46.87%

Distribution Algorithms

TBD...

Sample data was run using a static array of nodes, "randomized" with a Fisher-Yates shuffle, and least used node put in first array position.

string[] _nodes = { "ADAH", "AUB", "UAT", "UAB", "BPL", "UNA", "TROY", "SHC" };
Shuffle(_nodes); // Fisher-Yates Shuffle
            
//string _leastUsedNode = GetLeastUsedNodeByAUCount();
//string _leastUsedNode = GetLeastUsedNodeByContentSize();
string _leastUsedNode = GetLeastUsedNodeByDiskCost();

// put at beginning of array
Swap(_nodes, _leastUsedNode);

int _counter = 0;
// insert 6 5
foreach (string _node in _nodes)
{
 if (_counter == 5) break;
 if (_node.Equals(_row["au_owner"].ToString()))  continue; 

 // process
 counter++;
}

Storage Calculator

http://www.ibeast.com/content/tools/RaidCalc/RaidCalc.asp RAID Calculator

8 Disks * 3072 GB + RAID 5 = 20027.16 GB

Partition Implementation

Title List

The title list is the most crucial component for any partitioning. With close to a 1000 AUs management of partition responsibilities would be cumbersome at the node level.

<lockss-config>
 <property name="org.lockss.titleSet">
  <property name="Birmingham Public Library">
   <property name="name" value="All Birmingham Public Library AUs" />
   <property name="class" value="xpath" />
   <property name="xpath" value="[attributes/publisher='Birmingham Public Library']" />
  </property>
 </property> 
 <property name="org.lockss.title">
  <property name="BirminghamPublicLibraryBasePluginBirminghamPublicLibraryCartographyCollectionMaps000400000599">
   <property name="attributes.publisher" value="Birmingham Public Library" />
   <property name="journalTitle" value="Birmingham Public Library Cartography Collection" />
   <property name="type" value="journal" />
   <property name="title" value="Birmingham Public Library Cartography Collection: Maps (000400-000599)" />
   <property name="plugin" value="org.bplonline.adpn.BirminghamPublicLibraryBasePlugin" />
   <property name="param.1">
    <property name="key" value="base_url" />
    <property name="value" value="http://bpldb.bplonline.org/adpn/load/" />
   </property>
   <property name="param.2">
    <property name="key" value="group" />
    <property name="value" value="Cartography" />
   </property>
   <property name="param.3">
    <property name="key" value="collection" />
    <property name="value" value="000400-000599" />
   </property>
  </property>
 </property>
</lockss-config>


Comprehending LOCKSS Title List

Nested same name elements with different levels of nesting depth causes some difficulty in comprehension using standard tools. Standard deserialization techniques won't work because group 1 property (org.lockss.titleSet) has different depth than group 2 property (org.lockss.title).

protected void DeserializeXml() {
 using (FileStream _f = new FileStream(Server.MapPath(@"titledb.xml"), FileMode.Open, FileAccess.Read))
 {
  XmlReaderSettings _sets = new XmlReaderSettings();
  _sets.IgnoreWhitespace = true;
  _sets.ProhibitDtd = false;

  XmlReader _xml = XmlReader.Create(_f, _sets);
  XmlSerializer _xs = new XmlSerializer(typeof(LockssTitleDb));
  LockssTitleDb _titles = (LockssTitleDb)_xs.Deserialize(_xml);
 
  for (int i = 0; i < _titles.LockssTitleSet.Length; i++)
  {
    if (i % 2 == 0)
    {
      // first group property (has attribute name=org.lockss.titleSet)
      // titleSet displays publisher detail
      // this assumes group of 2 for each publisher and AU list
    } 
    else 
    {
      // second group property (has attribute name=org.lockss.title)
      // each iteration is a new AU for group 1 publisher
      for (int j = 0; j < _titles.LockssTitleSet[i].ChildNodes.Count; j++)
      {
        // property  name = normalized AU string
        // outer AU definition
        foreach (XmlNode _node in _titles.LockssTitleSet[i].ChildNodes[j].ChildNodes)
        {
          // loop through property tags          
        }
      }
    }
  }
  _f.close();
 }
...
[XmlRoot("lockss-config")]
public class LockssTitleDb 
{
    // cannot create a single interface for all nesting depths with a single name
    [XmlAnyElement()]
    public XmlElement[] LockssTitleSet;
}

Local Data Store

http://bpldb.bplonline.org/images/adpn/datastore.png