Goals
The Harvard University Library maintains circulation and holdings data in its backend systems. This data documents patron usage of the library collections as well as various library acquisitions metrics: checkouts, recalls, placements on reserve, library acquisitions, etc. Much of the circulation data is also segmented by high-level user profile: whether the patron is a Harvard undergraduate student, graduate student, or faculty member, and which school the patron is affiliated with (FAS, GSD, Kennedy, etc.). This data makes it possible to express a usage profile for each Aleph catalog item, which can then be used in a variety of use cases for evaluating collection and item usage by the Harvard community. The Harvard Library Innovation Lab already harvests this data regularly for its LibraryCloud item API, which exposes aggregated usage data (since 2002) for each Aleph item, alongside metadata describing the item itself. That API does not expose the full richness of the usage data, however: it carries no information about patron school affiliation or usage date, and it distinguishes no patron groups other than students and faculty.
The proposed HOLLIS Usage API would expose the full granularity of this data, at the transaction level and with full patron group profiling (along the lines described above; this would not, of course, include any data identifying the patron at the individual level). Such an API would also make it possible to write tools that help answer questions such as: Which parts of the Aleph catalog have seen the most usage since 2002, or in any other time period? Which parts of the catalog are trending up in usage, and which are trending down? What are the items most checked out by the faculty at FAS, compared with those most checked out by the faculty at Divinity? How does undergraduate student usage at Harvard differ from faculty usage? What are the most popular items dealing with late-19th-century German history checked out by FAS faculty?
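As an illustration of the kind of question transaction-level data could answer, the following is a minimal sketch that ranks items by checkout count for one patron group at one school. The record layout, the item identifiers and the sample rows are all made up for the example; the actual schema remains to be defined (see Key Scope Items).

```python
from collections import Counter

# Entirely made-up sample rows, for illustration only.
# Assumed layout: (item_id, event, year, patron_group, school)
TRANSACTIONS = [
    ("item-A", "checkout", 2010, "faculty", "FAS"),
    ("item-A", "checkout", 2011, "faculty", "FAS"),
    ("item-B", "checkout", 2011, "faculty", "FAS"),
    ("item-A", "checkout", 2011, "faculty", "Divinity"),
    ("item-C", "checkout", 2012, "undergraduate", "FAS"),
    ("item-B", "recall", 2012, "faculty", "FAS"),
]


def top_checkouts(rows, patron_group, school, limit=10):
    """Return (item_id, count) pairs for the most checked-out items
    among the given patron group at the given school."""
    counts = Counter(
        item_id
        for item_id, event, _year, group, sch in rows
        if event == "checkout" and group == patron_group and sch == school
    )
    return counts.most_common(limit)


fas_faculty_top = top_checkouts(TRANSACTIONS, "faculty", "FAS")
divinity_faculty_top = top_checkouts(TRANSACTIONS, "faculty", "Divinity")
```

Comparing the two resulting lists side by side corresponds to the FAS versus Divinity question above. No field in the assumed record identifies an individual patron.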
Key Scope Items
The main scope items:
- Set up scheduled, automated harvest routine drawing on the Cognos reporting tool (for circulation data) and on LTS holdings and item data (for acquisitions and item data).
- Set up ingestion routine to select, normalize, enrich and merge the incoming data.
- Set up MySQL backend to handle data processing.
- Define API data schema.
- Set up Solr instance for indexing of data.
- Repurpose pre-existing LibraryCloud API (RESTful UI, JSON output) for use with usage data (primarily involves schema definition).
- Extend LC API with additional features: support for OR-ed boolean queries, output structuring that limits returned results to selected fields, and a sorting parameter.
- Decide on appropriate API throttling, if any.
- Decide what other output formats should be supported.
- Decide whether there should be a bulk-download feature and, if so, what form(s) it should take.
- Locate and set up hosts for production instances of backend and API.
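The three proposed API extensions (OR-ed queries, field limiting and sorting) can be sketched in a few lines against an in-memory list of records. This is only a sketch under stated assumptions: the Transaction fields, the query function and its parameter names are hypothetical, and a production version would presumably translate these options into Solr query parameters rather than filter in Python.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction-level record; the real schema is still to be defined."""
    item_id: str       # assumed Aleph item identifier
    event: str         # e.g. "checkout", "recall", "reserve"
    date: str          # ISO 8601 date, so string order matches date order
    patron_group: str  # e.g. "undergraduate", "graduate", "faculty"
    school: str        # e.g. "FAS", "GSD", "Kennedy"


def query(records, any_of=None, fields=None, sort_by=None, descending=False):
    """Filter with OR semantics, then sort, then project to the requested fields."""
    rows = [asdict(r) for r in records]
    if any_of:
        # Keep a row if it matches at least one (field, value) pair.
        rows = [row for row in rows if any(row[k] == v for k, v in any_of)]
    if sort_by:
        rows.sort(key=lambda row: row[sort_by], reverse=descending)
    if fields:
        # Limit each returned record to the selected fields.
        rows = [{k: row[k] for k in fields} for row in rows]
    return rows


# Made-up sample data for illustration.
SAMPLE = [
    Transaction("000000001", "checkout", "2012-03-01", "faculty", "FAS"),
    Transaction("000000002", "recall", "2011-11-15", "graduate", "GSD"),
    Transaction("000000003", "checkout", "2010-06-30", "undergraduate", "FAS"),
]

result = query(
    SAMPLE,
    any_of=[("school", "GSD"), ("patron_group", "faculty")],
    fields=["item_id", "date"],
    sort_by="date",
)
```

Sorting happens before projection so that a request can sort on a field it does not ask to have returned.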
Resources Required
Design and development will be handled by the Harvard Library Innovation Lab. LIL has already obtained funding from the Library Lab for a project to develop an analytics dashboard for Harvard collection managers, and part of this project is to develop a HOLLIS usage API. The analytics project is to be completed in fall 2013.
Key Challenges
A primary challenge will be to determine API access privileges. Who will be able to access this data: Harvard community members only, or the world? If the data is to be limited to particular communities, how should restricted access best be implemented (API key, firewall rules, etc.)? And who is authorized to make this policy decision?
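Should the API key option be chosen, the check itself is small. The sketch below is one possible shape, not a decided design: the key store, the sample key and the access label are invented for illustration. It stores only a SHA-256 digest of each issued key and compares digests in constant time.

```python
import hashlib
import hmac


def _digest(key: str) -> str:
    """Hash a key so that the store never holds issued keys in plain text."""
    return hashlib.sha256(key.encode("utf-8")).hexdigest()


# Hypothetical key store: digest of each issued key -> access label.
ISSUED_KEYS = {_digest("example-key-for-illustration"): "harvard-community"}


def authorize(api_key):
    """Return the access label for a valid key, or None if the key is missing or unknown."""
    if not api_key:
        return None
    candidate = _digest(api_key)
    for stored, label in ISSUED_KEYS.items():
        # compare_digest avoids leaking match position through timing.
        if hmac.compare_digest(stored, candidate):
            return label
    return None
```

A request handler would call authorize with the key supplied by the client and refuse the request when the result is None. Which communities receive keys remains the policy question raised above.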