LibraryCloud Overview proposal - May 2013

Design Phase Technical and Requirements documentation (Google Documents Folder)

Projects under consideration


Here are some proposed guidelines for considering which projects to pursue when and with what level of commitment.


(A) Viability

(B) Short-term value

(C) Long-term value

  1. Resources are available for it (A)
  2. Few extrinsic obstacles (copyright, privacy, etc.) (A)
  3. Can be accomplished relatively quickly (or is at least demonstrable) (A)
  4. Illustrates LCloud's potential (C)
  5. Is extensible, reusable, or provides a useful model for other projects (C)
  6. Addresses LCloud use case (B)
  7. Doesn't exist already (B)
  8. There is a customer who will use it (B)
  9. Sustainable/reuseable (C)
  10. Leverages work that is going on now (A)

A brief taxonomy

  • A gimme: So low resource that it'd be silly not to do it, even if relatively low value
  • Low-hanging: Low resource requirements, quick beta, illustrates the value of LCloud or is provocative
  • High value: Central to other efforts and/or of significant value in itself. Needs serious dev and mgt resources.

Suggested projects

These are projects that have been suggested in various brainstorm-y sessions. (This list is incomplete. Add more.)

Each project entry gives: the project; a description (about a paragraph); its current state (any existing capability that attempts to address the need, including ongoing projects); an owner for further exploration; and an owner for providing the description.

1. Interop homepage
Mainly for internal comms and knowledge sharing, but also public.
Current state: sketch.
Owners: David W (exploration); David W (description)

2. API for Aleph
Already available via PRESTO for bib records. Holdings or item data would be new. Also: PRESTO and the Lib. Innov. Lab's LCloud both provide APIs, with approx. 60% overlap. What is the right architecture and approach to providing a broadly useful set of APIs to items, usage, etc.?
Current state: PRESTO and the Lib. Innov. Lab's LibraryCloud API. DPLA's API may also be relevant/useful.
Owners: Bobbi, Corinna, David, Paul (exploration); David W (description)

3. Project to get item-level descriptive metadata from EAD finding aids into Library Cloud to enable discovery and access with descriptive metadata from other sources

Encoded Archival Description (EAD) is the standard for describing archival collections. It mimics the analog practice of “finding aids,” which describe content at the collection level, providing background information and an inventory that typically lists, and briefly describes, boxes and folders (and sometimes items within folders). In the online environment, EADs do not play well with most other standard descriptive metadata for collections, which usually describe materials at the item level (e.g., a MARC record, an FGDC record, or a VRA record for a book, a map, or an image). Within a finding aid, the contextual information required to adequately describe an item is not found in the item-level label alone. Rather, it is found in the information above the item (in sequence and hierarchy), such as the collection title, date, and description, as well as the labels for the series, box, and folder.

To enable discovery and access of these archival collections with items from other sources, we need to free the item-level metadata from EAD finding aids.
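The extraction problem can be sketched in a few lines: walk a finding aid's component hierarchy, carry ancestor titles downward, and emit each item-level component together with its inherited context. This is an illustrative sketch only, run against a tiny invented, un-namespaced EAD fragment; real finding aids are far more varied, and this is not the project's actual code.

```python
import xml.etree.ElementTree as ET

# Tiny invented finding-aid fragment: collection > series > item
EAD = """<ead>
  <archdesc>
    <did><unittitle>Smith Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="item">
          <did>
            <container type="box">1</container>
            <unittitle>Letter to J. Adams</unittitle>
            <unitdate>1776</unitdate>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

def items_with_context(node, context):
    """Walk the component hierarchy, carrying ancestor titles downward,
    and yield (context_titles, did_element) for item-level components."""
    did = node.find("did")
    title = did.findtext("unittitle") if did is not None else None
    path = context + [title] if title else context
    if node.get("level") == "item" and did is not None:
        yield path[:-1], did
    for child in node:
        yield from items_with_context(child, path)

root = ET.fromstring(EAD)
records = [
    {
        "context": " > ".join(ctx),          # collection > series > ...
        "title": did.findtext("unittitle"),
        "date": did.findtext("unitdate"),
        "box": did.findtext("container[@type='box']"),
    }
    for ctx, did in items_with_context(root.find("archdesc"), [])
]
print(records)
```

The point of the sketch is the `context` field: the item record only becomes adequately descriptive once the collection and series titles above it travel with it.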

Current work by Michael Vandermillen to free item-level metadata described in EAD finding aids includes:

  • Automatically assigning item-level identifiers within EAD finding aids (Underway, done in OASIS QA for Link-o-matic)
  • Extracting item-level metadata along with other related metadata in finding aids such as series-level and collection-level descriptions (Underway, done in OASIS QA for harvesting, including Virtual Collections, CNA and CANA)
  • Creating crosswalks for converting EAD metadata to formats more amenable to transfer and display for sharing and aggregation with other collections. (Not yet started, needed for Library Cloud, DPLA sharing, CNA and CANA)

Other related activities include:

  • Andy Silva developed a method to parse EAD finding aids and merge them with DRS load reports to create 3D records with links to DRS files (with manual code tuning for each EAD) for the Law Library’s Suite Spot app.
  • ArchivesSpace, an open-source web application to manage descriptive information for archives, will be a central service to the libraries sometime after the production rollout in Fall 2013. Harvard users of the Archivist’s Toolkit and others will be likely adopters. ArchivesSpace may become the source for component-level archival data once adopted (replacing MV’s interim process described above).

Owners: Michael, Robin, Wendy (exploration); Wendy, Michael (description)


4. API for HOLLIS usage data
The Library Innovation Lab is building an API for usage data using the Library of Congress classification outline (LC call-number taxonomy) and Harvard circulation data: how many works in this or that subject have been checked out, recalled, reserved, etc. over a given period.
Current state: The Innovation Lab will be adapting its current LibraryCloud item API to handle usage data. A new schema will be worked out and several API extensions will be implemented. We are using the Library of Congress classification outline to categorize Aleph items by subject (based on LC call number) and Cognos reporting to harvest usage metrics. The project is funded through Library Lab and is slated for completion in fall 2013.


project overview
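A rough sketch of the categorization step: bucket circulation events by the letter prefix of each item's LC call number and total them. The outline subset, call numbers, and events below are invented for illustration; the real project uses the full LC classification outline and Cognos-harvested metrics.

```python
import re
from collections import Counter

# Illustrative subset of the LC classification outline (top-level classes)
LC_OUTLINE = {"P": "Language and Literature", "Q": "Science"}

def lc_class(call_number):
    """Map a call number to a top-level LC class via its letter prefix."""
    m = re.match(r"([A-Z]{1,3})", call_number)
    return LC_OUTLINE.get(m.group(1)[0]) if m else None

# Invented (call number, event type) pairs standing in for circ data
circ_events = [
    ("PS3515 .E37", "checkout"),
    ("QA76.73 .P98", "checkout"),
    ("PR6019 .O9", "recall"),
]

usage = Counter(lc_class(cn) for cn, _ in circ_events)
print(usage)
```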

5. API for real-time availability data
Ability to query Aleph and obtain availability for items.
Current state: available via PRESTO.

The base syntax is:

Owners: Randy, Michael
6. API for various non-book data, e.g. VIA, Harvard Geospatial Library
Ability to search and obtain detailed metadata records for catalogs with content-specific schemas.

Current state: There is a PRESTO API to retrieve VIA record data in MODS. There is no search API for VIA, although image records can be located through the existing PRESTO HOLLIS search API.

HGL is based on OpenGeoportal, which supports Solr search and retrieval of metadata records. I'm not sure whether an API exists to retrieve the native FGDC schema metadata record.

7. Extend the API for DASH

8. Collection building in Library Cloud

For curators, one of the benefits of the web is the ability to unite physically dispersed material through online digital collections. For web-based presentations, Harvard curators often create “virtual collections” by drawing together related content from multiple catalogs and/or collections. Collaborations can also include non-Harvard content (e.g., the Emily Dickinson Archive and the Colonial Archive of North America). To do this, curators need to be able to identify related content, select it, and mark it as a collection (or “set”). Then, they need to be able to confine a search to retrieve a set and display it in a web-based collection presentation UI.

To support this, the following collections functionality is needed in Library Cloud:

  • a search API to discover and select items for inclusion in a collection
  • the ability to mark records as part of a named set (or “collection”)
  • a search API that can confine results to the named set (i.e., qualify by collection facet or filter)
  • a default UI presentation driven by the Library Cloud API confined to a named set (this allows collection building through a simple, centrally supported presentation system to replace the aging “Virtual Collections” application; see status)
  • an OAI-PMH data provider (DC and MODS) for named sets (this allows harvesting for collection building in systems such as Omeka; see Library Cloud project #10)
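A minimal sketch of the named-set behavior in the list above (marking records as members of a set, and confining search results to it), using in-memory records and a hypothetical query interface rather than the real LibraryCloud index:

```python
# Invented records, each tagged with the named sets it belongs to
records = [
    {"id": "1", "title": "Emily Dickinson letter", "sets": ["dickinson"]},
    {"id": "2", "title": "Colonial deed", "sets": ["cana"]},
    {"id": "3", "title": "Dickinson herbarium page", "sets": ["dickinson", "cana"]},
]

def search(q, set_name=None):
    """Full list scan standing in for an index query; set_name acts as
    the collection facet/filter."""
    hits = [r for r in records if q.lower() in r["title"].lower()]
    if set_name:
        hits = [r for r in hits if set_name in r["sets"]]
    return hits

print(search("dickinson", set_name="cana"))
```

In a production API the set membership would be exposed as a facet or filter parameter on the search endpoint, and the default presentation UI would simply issue set-confined queries.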

Harvard’s web-based collection-building application, Virtual Collections (VC), currently supplies the following functionality but is in need of an upgrade:

  • harvests records from Aleph, VIA, and HGL into defined collections
  • ability to add topics
  • OAI-PMH data provider (DC and MODS, supports sets)
  • customizable discovery UI (LTS staff required), supporting search, browse, and brief and full record display
  • no public search API for Virtual Collections

Owners: Michael, Wendy (exploration); Michael, Wendy (description)


9. Bring in some non-Harvard data
E.g., Colonial NA Digitization, bib data from other libraries, metadata about some high-value Web sites.
10. Package up LCloud version of Omeka

Harvard’s special collections, archives, libraries, and museums are looking for web-based solutions for providing information about, and seamless delivery of, static images, textual works, sound recordings, and moving image materials in one place. The project would aim to offer special collections, libraries, archives, faculty, and students their own installation of Omeka on an LTS server using a “Harvard install” of Omeka. The packaged version would include a plugin (or plugins) for importing metadata from Library Cloud (OAI harvest, the CSV import plugin, or potentially a new import plugin if needed); in addition, a wide variety of plugins available to Omeka users would be bundled. Future development may include a single portal to Harvard’s Omeka content, facilitating a number of interesting projects, such as timelining across repositories or doing geospatial work. The packaged version could be bundled with a set of data entry guidelines/recommended ways to enter content (and in which fields) to promote consistency of data across instances, as well as a plugin to get data back into LibraryCloud.


Omeka is in use by, or has been experimented with by, a number of Harvard repositories (such as the Center for the History of Medicine, which has a robust system, is planning additional development this summer, and could lend expertise). This has required local development for improved functionality, something many repositories, students, and faculty do not have access to.

With promotion, the University could build a substantial amount of content currently hidden from users and encourage the use of a system that could be centrally harvested and disseminated. Because the data is OAI-PMH compliant (Dublin Core, MODS, CDWA Lite), it could be harvested for display in a central system, such as HOLLIS, to bump up collection visibility.

Related Omeka plugins: OAI-PMH Harvester, OAI-PMH Repository, Catalog Search
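A sketch of what central harvesting involves on the consuming side: parse an OAI-PMH ListRecords response (oai_dc) into title/identifier pairs. The XML below is a trimmed, invented example response, not output from an actual Harvard Omeka instance.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Trimmed, invented ListRecords response with one oai_dc record
RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Anatomical drawing</dc:title>
          <dc:identifier>oai:example:med-001</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

root = ET.fromstring(RESPONSE)
harvested = [
    {
        "title": rec.findtext(".//dc:title", namespaces=NS),
        "identifier": rec.findtext(".//dc:identifier", namespaces=NS),
    }
    for rec in root.findall(".//oai:record", NS)
]
print(harvested)
```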

Owners: Emily, Michael, Jonathan (exploration); Emily, Michael, Jonathan (description)
11. Linked Data project/exploration
Owner: Julie
12. Guidelines for how to make collections more interoperable
Identify and analyze metadata in both shared and local systems utilized by Harvard’s special collections and museum communities, and author guidelines for creating and mapping metadata meeting local needs to broader metadata standards (such as Dublin Core) to facilitate data sharing. The objective is not to point user communities to particular systems but rather to consider how metadata could be mapped (and how, perhaps, metadata entry could be standardized) for the purposes of aggregation, and to encourage consistency in defining and populating database fields by providing specific “how to” examples. Additionally, the project would ask the community to consider how content in silos or locked in proprietary systems could otherwise be disseminated.
Current state: Because of the number and richness of systems at play, it would be best to pick one standard (Dublin Core for OAI-PMH harvesting?) and look at a variety of records from collection management systems/databases/etc. to get a sense of the magnitude of mapping and the specific content guidelines informing those fields. A literature review is also needed. Of possible interest: Metadata for Special Collections in CONTENTdm: How to Improve Interoperability of Unique Fields Through OAI-PMH.
Owners: Wendy, Emily
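A toy illustration of the kind of mapping table such guidelines might standardize: local field names from two hypothetical departmental databases, each mapped onto the same Dublin Core elements so their records can be aggregated. All field names and systems here are invented.

```python
# Local field name -> Dublin Core element, per (hypothetical) source system
MAPPINGS = {
    "museum_db": {"Object Title": "dc:title", "Maker": "dc:creator",
                  "Date Made": "dc:date"},
    "archive_db": {"Folder heading": "dc:title", "Creator": "dc:creator",
                   "Date range": "dc:date"},
}

def to_dublin_core(record, source):
    """Re-key a local record into Dublin Core, dropping unmapped fields."""
    mapping = MAPPINGS[source]
    return {mapping[k]: v for k, v in record.items() if k in mapping}

a = to_dublin_core({"Object Title": "Astrolabe", "Maker": "Unknown"}, "museum_db")
b = to_dublin_core({"Folder heading": "Deeds, 1690-1710"}, "archive_db")
print(a, b)
```

Once both sources express titles, creators, and dates in the same elements, records from either system can be searched and displayed together.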
13. Resolution service: feed it bib data and get back standard IDs, etc.
Feed it a URL and it gets back bib data. (This would help with a problem for the CATCH Annotation Hub, as well as having broader utility.)
Owner: Paolo
14. Library collection case study/demonstration project: Harvard CNA (Colonial North America, a collaboration within Harvard repositories) and federation CANA (Colonial Archive of North America, an external collaboration that includes collections from Harvard, BPL, Mass. Hist. Soc., and Bibliothèque et Archives nationales du Québec (BAnQ)).
As case studies/demonstration projects, the CNA and CANA digital projects would address many of the challenges posed by projects in this list:
  • The content and metadata are archival manuscripts described in EAD finding aids that need to be represented at the component level for discoverability, display and delivery (project #3, work underway)
  • The Harvard CNA and the federation CANA each require web-based collection presentations (project #10-ish, but also aligns with collection platform evaluation work)  
  • The federation CANA includes metadata for both Harvard and the external partners (project #9)

In addition, the CNA and CANA project workflows need to support both retrospective and prospective work, which may be a good test of two data transfer workflows.

(If there is interest in this, Wendy just needs to confirm approval from the CNA Planning Committee)

Owners: Jim, Wendy


15. Integrating open Web materials about books
E.g., NPR's records of its on-air stories include a tag if the story concerns a book, and many also include an ISBN. These records could be tied to Aleph records and could be extended via uniform title to editions other than the one with the ISBN. Then, from the Aleph record, the NPR story could be tied to other books with the same LCSHs.
Current state: The Lib. Innov. Lab has already worked with NPR to get their 16,000 book-related records, and is talking with NYT and CBC as well. It has also experimented with other open Web sources, including Wikipedia. The Lib. Innov. Lab is committed to exploring this at least for LibraryMist.
Owners: Paul D. (exploration); Paul D. (description)
16. HBS-HOLLIS integration
Search for an HBS unit and see all the books the members of that unit have published. Search for a prof and see all of her/his books mashed up with other books on the same subjects in HOLLIS. Plus more.
Current state: Contract dev has created wireframes. The development is funded.
Owners: Library Innovation Lab (exploration); David W. (description)


Meeting notes
