Virtual Remote Control:
Preservation Risk Management for Web Resources
Nancy Y. McGovern, ECURE 2004
1
VRC Funding
- Part of a 4(5)-year NSF-funded project
- supported by the Digital Libraries Initiative, Phase 2
(Grant No. IIS-9905955, the Prism Project)
- Also partially funded by a grant from The Andrew W. Mellon Foundation
- For updates:
2
Current Team
- Anne R. Kenney, Advisor
- Nancy Y. McGovern, Project Manager
- Richard Entlich, Sr. Researcher
- William R. Kehoe, Technology Coordinator
- Ellie Buckley, Digital Research Specialist
- Erica Olsen (recent)
- Carl Lagoze, CIS PI
3
Research Scope
See "Preservation Risk Management for Web Resources: Virtual Remote Control in Cornell's Project Prism"
by Anne R. Kenney, Nancy Y. McGovern, Peter Botticelli, Richard Entlich, Carl Lagoze, and Sandra Payette
in D-Lib Magazine, January 2002
http://www.dlib.org/dlib/january02/kenney/01kenney.html
4
Virtual...
- because VRC develops models to represent essential features of selected Web sites
- that enable ongoing monitoring over time
- to identify, respond to, and mitigate potential risks to the site integrity and longevity
5
Remote...
- because VRC is intended for use by cultural heritage institutions
- interested in the longevity of Web resources
- residing on remote servers
not owned or managed by the monitoring institution
6
Control...
- because at the most proactive end of the VRC approach
- a monitoring organization may act to protect another organization's resources
- by agreement or implicit consent
- through notification and/or action
7
Purpose
- Develop a model for research libraries (adaptable to other contexts)
- Support spectrum from passive monitoring to active capture
- Lifecycle support: selection to capture
- Understand nature of Web resources
- Promulgate good practice
8
Types of Web Resources
Two types of initiatives for monitoring and/or capture of:
- Web-based publications [Web site as a means]
- All of (or a subset of) a Web site consisting of pages within
a boundary defined by a URL (or a portion of one) [Web site as an end] (VRC)
9
Nature of Risks
Two perspectives on Web-based risk:
- potential liability of an institution based upon the content of
its Web site, or a Web site for which it is responsible
- potential threats to the integrity and longevity of a Web resource (VRC)
10
Types of Risks
Include:
- technological obsolescence
- security weaknesses and breaches
- human error in developing/maintaining sites
- organizational issues; benign neglect
- power and technology failures
- inadequate backup and secondary systems
11
Risk Factors
- Organizational Context
- Combination of indicators
- Monitoring (change/loss over time)
- Triggers (events, organizational, upgrades)
- Degradation of site management indicators
12
VRC Stages
- Identification
- Analysis
- Appraisal
- Strategy
- Detection
- Response
13
Human — Tool Scenario
- Identification
- Human: identify Web resources of interest
- Tool: verify list, expand list
- Analysis
- Tool: crawl sites, generate characterizations
- Human: accept/revise characterizations
- Appraisal
- Human: define/review attributes of value
- Tool: support appraisal, capture results
14
Human — Tool Scenario
- Strategy
- Human: develop/review strategies
- Tool: plot appraisals, compile strategies
- Detection
- Human: define risk parameters
- Tool: identify/assess risks; propose responses
- Response
- Tool: propose risk response based on rules; automatic response for some risk categories
- Human: monitor automated responses; select response based on recommended actions
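The division of labor in the Response stage can be sketched as a small rule table. This is an illustrative sketch only, not the VRC tool: the risk category names, the actions, and the `propose_responses` helper are hypothetical. It shows one way a tool might respond automatically to some risk categories while queuing the rest for a human to select from recommended actions.

```python
from dataclasses import dataclass


@dataclass
class Response:
    action: str
    automatic: bool  # True: tool acts; False: human selects from the recommendation


# Hypothetical rule table: categories and actions are illustrative assumptions.
RULES = {
    "broken_links": Response("notify site maintainer", automatic=True),
    "server_outdated": Response("notify site maintainer", automatic=True),
    "content_loss": Response("capture remaining pages", automatic=False),
}

# Anything without a rule is referred to a person rather than acted on.
DEFAULT = Response("refer to human reviewer", automatic=False)


def propose_responses(detected_risks):
    """Split detected risk categories into automatic and human-review queues."""
    automatic, for_review = [], []
    for risk in detected_risks:
        response = RULES.get(risk, DEFAULT)
        queue = automatic if response.automatic else for_review
        queue.append((risk, response.action))
    return automatic, for_review
```

Unknown categories fall through to human review by default, which keeps the automated path limited to risks that someone has explicitly approved for it.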
15
Contextual Layers
16
Server-level Monitoring
- Potential multi-site impact
- Server vulnerabilities put site content at risk
- Patches and new versions of Microsoft IIS and Apache HTTP Server are released frequently
- Apache HTTP Server 1.3 security updates
- to version 1.3.26 on June 18, 2002
- to version 1.3.27 on October 3, 2002
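One way to turn the server-level indicator into a check is to compare the version a server reports in its HTTP `Server` response header against the newest patched release. The sketch below is an assumption-laden illustration, not part of the VRC toolkit: the function names are hypothetical, the baseline is simply the 1.3.27 release named above, and only the Apache 1.3 line is judged.

```python
import re

# Assumption: the newest release named on the slide serves as the baseline.
MIN_PATCHED_APACHE_1_3 = (1, 3, 27)


def parse_apache_version(server_header):
    """Return (major, minor, patch) from a Server header, or None if absent."""
    match = re.search(r"Apache/(\d+)\.(\d+)\.(\d+)", server_header)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_outdated(server_header, minimum=MIN_PATCHED_APACHE_1_3):
    """True when the header reports an Apache 1.3.x release older than `minimum`."""
    version = parse_apache_version(server_header)
    if version is None or version[:2] != minimum[:2]:
        return False  # unknown server or different release line: no judgment made
    return version < minimum
```

A server administrator can suppress or alter the `Server` header, so a result from a check like this is best treated as one indicator among several rather than as proof of a vulnerability.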
17
Server-level Monitoring
18
VRC Toolkit
- Identify tools for each stage (adopt, adapt, define, devise)
- Leverage existing tools; apply them to longevity
- Analyze steps - automated and manual
- Formalize protocol
- Provide a framework to map existing tools and plug gaps with new development
19
VRC Toolkit
Development steps:
- extensive literature review
- development of tool categories
- definition of categories and test protocols
- survey existing tools for evaluation
- select representative tools for testing
- highlight findings in category summaries
20
Web Crawling
- traversing Web sites via links
- a capability common to most tools, but with different purposes and results
- the VRC toolkit needs more than just Web crawlers
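Traversing a site via links while staying inside a boundary defined by a URL prefix can be sketched with the Python standard library alone. This is a minimal illustration, not one of the surveyed tools: the `LinkCollector` class and `links_within_boundary` function are hypothetical names, and a real crawler would also need fetching, politeness delays, robots.txt handling, and a queue of pages still to visit.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_within_boundary(page_url, html, boundary):
    """Return absolute links found in `html` that start with `boundary`."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(boundary) and absolute not in found:
            found.append(absolute)
    return found
```

Calling `links_within_boundary` on each fetched page and feeding the results back into a to-visit list gives the basic crawl loop; the prefix test is what keeps the crawl inside the monitored site.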
21
Tool Categories
- Link checkers
- Web site monitors
- Web crawlers
- Site management
- Change Management
- Site Mapping (includes visualization)
22
Tool Inventory
23
24
25
OAIS Issues
- Pre-Ingest: Selection options
- Ingest: Capture
- vs. monitoring
- Targets, level and frequency
- Archival Storage: Formats
- Access: Site(s) vs. Page(s)
- AIP: Metadata issues
26
Management Issues
- frequency of capture — determined by
- nature of sites/pages
- events: technological, organizational
- resources
- well-informed crawling
- valuable vs. archival
27
Mandate
- to fully document the site by capturing all changes to the pages/sites
- to capture significant changes to pages/sites
- to record periodic versions of the site
- to capture a one-time copy of pages/sites
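The four mandates imply different capture decisions for the same page. The sketch below is a hypothetical illustration of that difference: the mandate labels, the 0.9 similarity threshold, and the 30-day interval are all assumptions chosen for the example, and "significant change" is approximated with a plain text-similarity ratio rather than any appraisal of content.

```python
import difflib
from datetime import timedelta


def should_capture(mandate, previous, current, since_last, *,
                   threshold=0.9, interval=timedelta(days=30)):
    """Decide whether to capture a page under one of the four mandates.

    previous:   last captured text, or None if never captured
    current:    text just retrieved
    since_last: timedelta since the last capture (unused when previous is None)
    """
    if previous is None:
        return True  # nothing captured yet: every mandate needs a first copy
    if mandate == "one_time":
        return False
    if mandate == "periodic":
        return since_last >= interval
    if mandate == "all_changes":
        return current != previous
    if mandate == "significant_changes":
        similarity = difflib.SequenceMatcher(None, previous, current).ratio()
        return similarity < threshold
    raise ValueError(f"unknown mandate: {mandate}")
```

Under this sketch a one-character edit triggers a capture for "all_changes" but not for "significant_changes", which is the practical difference between the first two mandates.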
28
Current Activities
- VRC Preservation Risk Management Program:
- Map stages to tool requirements
- Apply to potential organizational scenarios
- Enable risk/response scenario development
- Toolkit:
- Revise and populate tool inventory
- VRC Control Site
29
Future Projects
- Develop an approach for building a human sexuality collection: capturing weblogs and other Internet communications
- State Government Web site case study
- Demonstrators for toolkit scenarios
30
For Discussion
What would the VRC approach have to address to be of interest, value, and/or potential
impact for archivists and records managers?
31