The massive scale of online news calls for collaborative approaches to assessing what has already been archived and identifying which unarchived sites should be preserved, given their significance as a “first draft of history.” Because online news content evolves so quickly, memory institutions interested in preserving it may need to understand not just what has been archived, but also the temporal and structural conditions under which it was archived. Researchers and journalists likewise have an interest in knowing what online news content is available to them, and are well positioned to recommend which content should be preserved, why, and at what pace, so that it remains available in the future. There is therefore potential for highly productive synergies between the news archiving community and the Cobweb initiative, which is building a collaborative collection development platform that supports the creation of comprehensive web archives by coordinating the independent activities of the web archiving community.
The demands of archiving the web in comprehensive breadth or thematic depth easily exceed the capacity of any single institution, and the same could be said of online news content. To ensure that the limited resources of a given archival program are deployed most effectively, its curators need to know something about the collection development priorities and holdings of other, similarly engaged institutions. Cobweb (https://github.com/CobwebOrg/cobweb) will meet this need by supporting three key functions of collaborative collection development for web archives: nomination, claiming, and holdings. The nomination function will let curators and stakeholders (including researchers and journalists) suggest websites pertinent to specific thematic areas and provide seed-level descriptive metadata; the claiming function will allow archival programs to indicate an intention to capture some subset of nominated sites; and the holdings function will allow programs to document captured sites along with their collection-level description, structural and temporal scope, preservation policies, and terms of use. The aggregated descriptions of websites archived by many distributed archival programs, made discoverable through Cobweb, are also intended to make it easier for researchers and journalists to find versions of websites relevant to their work.
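The relationship among the three functions can be illustrated with a minimal Python sketch. This is not Cobweb's actual data model or API: the class names, field names, and the `unclaimed` helper are all hypothetical, chosen only to show how nominations, claims, and holdings might fit together so that a curator can see which nominated sites no program has yet committed to capturing.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Nomination:
    """A suggested site with seed-level metadata (hypothetical fields)."""
    url: str
    theme: str
    nominated_by: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class Holding:
    """A captured site as documented by an archival program (hypothetical fields)."""
    url: str
    program: str
    collection_description: str
    temporal_scope: str
    structural_scope: str
    terms_of_use: str


@dataclass
class Registry:
    """Illustrative in-memory registry; not the Cobweb implementation."""
    nominations: Dict[str, Nomination] = field(default_factory=dict)
    claims: Dict[str, Set[str]] = field(default_factory=dict)
    holdings: Dict[str, List[Holding]] = field(default_factory=dict)

    def nominate(self, nomination: Nomination) -> None:
        # Keep the first nomination for a URL; later duplicates are ignored.
        self.nominations.setdefault(nomination.url, nomination)

    def claim(self, url: str, program: str) -> None:
        # A program signals its intention to capture a nominated site.
        if url not in self.nominations:
            raise KeyError(f"{url} has not been nominated")
        self.claims.setdefault(url, set()).add(program)

    def record_holding(self, holding: Holding) -> None:
        # A program documents a site it has actually captured.
        self.holdings.setdefault(holding.url, []).append(holding)

    def unclaimed(self, theme: str) -> List[str]:
        # Nominated sites in a theme that no program has claimed or holds.
        return sorted(
            url
            for url, nom in self.nominations.items()
            if nom.theme == theme
            and not self.claims.get(url)
            and not self.holdings.get(url)
        )
```

Under these assumptions, a curator deciding where to spend limited crawl capacity would call `unclaimed("local news")` to list nominated sites that no other program has claimed or already holds.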
Cobweb is a collaborative project of the California Digital Library, Harvard University, and UCLA, funded by the Institute of Museum and Library Services. This presentation provides an update on recent project activities, which include finalizing data models and functional requirements, designing the user experience and interface, developing a prototype interface and database, and exploring interactions between the Cobweb system and other web archiving systems.