From the grant proposal
6.3 Harvard Statement of Work
Harvard Library is a participant in the research grant Linked Data for Libraries: LD4L Labs, which will build on and extend the initial work of the LD4L project. Harvard will undertake the following activities:
1. Deploy a pilot linked data conversion infrastructure. The linked data infrastructure piloted by Harvard is the foundation for Harvard's LD4L Labs tool development and testing. The infrastructure will support the conversion, ingest, hosting, and update of Harvard linked data created under the grant, and the establishment of a Linked Open Data endpoint to make that data accessible as linked data on the Web. It will most likely leverage the open-source, Harvard Catalyst-funded eagle-i platform, an ontology-independent platform for hosting, editing, and searching linked data resources described with any ontology; however, Harvard will also assess the Cornell Vitro platform. Building on the work Cornell, Harvard, and Stanford completed during the LD4L project, Harvard will deploy a triplestore that scales to handle Harvard's BIBFRAME RDF, approximately 1 billion triples. The pilot infrastructure will enable the linked data to be updated both from legacy records in the Harvard ILS and from the Harvard Geospatial Library and Harvard Film Archive linked data created as part of this grant. To link existing MARC metadata to the triplestore, Harvard will integrate the legacy record converters created under LD4L and LD4L Labs by Cornell and Harvard into Harvard's Library Cloud metadata pipeline, enabling Harvard to update the triplestore automatically and regularly with fresh data from the ILS. This work will include deploying revisions of the MARC-to-BIBFRAME converter developed by Cornell and Stanford as a conversion step in Harvard's Library Cloud metadata processing pipeline.
- 25% of Senior Software Engineer (see the Appendix for job description)
- 5% of Michael Vandermillen
- Amazon Web Services (AWS) resources to run the pilot infrastructure for 2 years
2. Pilot a hosting environment for BIBFRAME linked data. Assuming a favorable assessment in the LD4L Phase 1 work, Harvard plans to pilot eagle-i within its infrastructure as a platform that provides catalogers with linked data creation, editing, display, and dissemination. If the LD4L Phase 1 assessment of eagle-i is negative, Harvard will instead deploy the Vitro platform. The environment will support the creation and incorporation of subject- and collection-specific ontologies to describe the unique aspects of the collection in a structured, extensible, and shareable manner. The eagle-i and Vitro linked data platforms both provide an ontology-driven RDF creation and access environment. Because either can be configured for any ontology, Harvard will configure the selected platform with the BIBFRAME ontology, with extensions for geospatial and Harvard Film Archive requirements. Harvard catalogers will evaluate the suitability of the chosen platform as an easily extensible production platform, and will collaborate with Cornell and Stanford in comparing and contrasting this environment with environments that they may deploy, as well as with other BIBFRAME creation and editing environments. Development will include revisions to the native metadata editing functionality and user interface to reflect cataloger feedback and implement efficiency improvements.
- 20% of Senior Software Engineer
- 5% of Metadata Technologies Program Manager (see the Appendix for job description)
- 10% of Marc McGee
- 10% of Christine Eslao
3. Pilot linked data conversion, publication, and visualization of Harvard Geospatial Library metadata. Working with the other project partners, Harvard will develop a BIBFRAME/LD4L profile; develop metadata conversion software to convert into BIBFRAME the existing geospatial metadata records, from the Harvard Geospatial Library and from Stanford (see the Stanford SOW in section 6.5), that describe raster maps and vector map data layers; publish the RDF to the Harvard linked data endpoint; and integrate beta graph visualization software into the Harvard Geospatial Library or an Omeka virtual collection to assess its value to end users. This project will focus on converting a subset of OpenGeoMetadata records from the Harvard Geospatial Library and Stanford, which are currently represented using the geospatial community standards of the Federal Geographic Data Committee (FGDC) schema and ISO 19139, into linked data descriptions using BIBFRAME/LD4L as a base ontology. Deliverables for the project will include: a BIBFRAME/LD4L profile for geospatial datasets; a set of mapping rules from FGDC geospatial metadata standards to the BIBFRAME/LD4L profile; reconciled linked data entities in the source metadata for originators, place and theme keywords, and series works; a linked data triplestore with published descriptions; and a user interface for searching and visualizing geospatial dataset descriptions.
- 25% of Senior Software Engineer
- 5% of Metadata Technologies Program Manager
- 15% of Marc McGee
4. Pilot linked data conversion, publication, and visualization of Harvard Film Archive metadata. The project will explore best practices for creating linked data descriptions for moving image resources held by the Harvard Film Archive (HFA), including a variety of formats (film prints, negatives, DVDs, VHS, Super 8, and others), a variety of content (feature films, trailers, home movies, ethnographic films, propaganda), and related archival materials (including production elements, artwork, film stills, and promotional ephemera). The project will evaluate the effectiveness of BIBFRAME/LD4L as a data model for describing moving image materials, both for research needs and across the lifecycle of those materials, and will identify vocabularies for describing these materials in a linked data environment. The project will create mappings for records from the HFA's film print database, focusing on a subset of moving image materials by women directors. Wherever possible, entities will be reconciled to linked data URIs, including personal and corporate names (ISNI, LCNAF), place names (GeoNames), genres (LC genre/form, Getty AAT), and works. The project deliverables will include: a BIBFRAME/LD4L profile for moving image resources; a set of published descriptions for moving image materials and related archival collections; deployment of descriptions as linked data in the triplestore; a user interface and visualization for film researchers based on an Omeka or Harvard Geospatial Library online collection; and a written evaluation of the project with a set of recommendations for future research and development.
- 25% of Senior Software Engineer
- 5% of Metadata Technologies Program Manager
- 15% of Christine Eslao over two years
5. Collaborate with Cornell and Stanford on LD4L Labs and LD4P projects. Participate in biweekly phone meetings, semi-annual LD4L Labs face-to-face meetings, and discussions of project-related issues as they arise.
- 5% of Senior Software Engineer
- 5% of Marc McGee
- 5% of Christine Eslao