

General description


• Allows staff users to easily create digital collections that can be shared 

• Enriches metadata within LibraryCloud with collection information



Staff-facing website with the following functionality:

• Keyword, field-based, and faceted search of content available through the LibraryCloud API

• Create, edit, and delete collections

• Add items to and remove items from collections

• Browse and view collections

• Manage which users can edit collections they’ve created

• Reassign ownership of a collection

• Authentication and authorization through CAS or the Access Management Service (AMS) are required to create or edit collections

• Users can only edit collections that they’ve created, or to which they have been given access
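The editing rules above (owners and editors, managing who can edit, reassigning ownership) can be sketched as a small permission model. This is a minimal sketch under stated assumptions, not the collections API's actual implementation: the `Collection` class and its method names are hypothetical, and it assumes that only the owner may manage editors or transfer ownership, which the requirements do not spell out.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    """Hypothetical collection record: one owner plus any number of editors."""

    name: str
    owner: str
    editors: set[str] = field(default_factory=set)

    def can_edit(self, user: str) -> bool:
        # Users may edit only collections they own or were given access to.
        return user == self.owner or user in self.editors

    def _require_owner(self, acting_user: str) -> None:
        # Assumption: only the owner manages editors and ownership.
        if acting_user != self.owner:
            raise PermissionError(f"{acting_user} does not own {self.name!r}")

    def add_editor(self, acting_user: str, editor: str) -> None:
        self._require_owner(acting_user)
        if editor != self.owner:
            self.editors.add(editor)

    def remove_editor(self, acting_user: str, editor: str) -> None:
        self._require_owner(acting_user)
        self.editors.discard(editor)

    def reassign_owner(self, acting_user: str, new_owner: str) -> None:
        self._require_owner(acting_user)
        # The new owner no longer needs a separate editor entry.
        self.editors.discard(new_owner)
        self.owner = new_owner
```

For example, on a collection owned by `alice`, calling `add_editor("alice", "bob")` makes `can_edit("bob")` return `True`, while `add_editor("bob", "carol")` raises `PermissionError` because `bob` is not the owner.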

Detailed Requirements

Collection Builder Requirements

Collection Builder Links

A library staff tool that allows staff users to create collections based on metadata within LibraryCloud and to browse collections created by others. The metadata represented by each collection is intended to be exportable to exhibit tools such as Omeka and Spotlight, and to search aggregators such as DPLA.

Artifact | Link | LibraryCloud Instance | Status
Prod Collection Builder | https://api.lib.harvard.edu/v2/collectionbuilder/ | PROD | Deployed
QA Collection Builder | http://faulkner.hul.harvard.edu:9024/collectionbuilder/ | QA | Working
AWS Prototype (VPN access) | http://ec2-54-173-243-9.compute-1.amazonaws.com:8080/collectionsviewer/librarycloud_utils/collectionbuilder/ | PROD? | Partially working: viewing and displaying collections and searching work; nothing yet to add, edit, or update collections
LTS internal system development documentation

Sysdev - Collection Builder (Restricted to LTS Staff)



Pod Consulting proposal

Pod Statement of Work


Pod Authentication proposal

Business synopsis (focused on application management)

We believe that this should be the main basis for deciding which approach to take. It outlines the main points of our discussion from our meeting last year.

-        Common aspects

o   In both options, users who want to create, edit, or manage collections would log in using AMS/Harvard Key.

o   The collections are user-created content

-        Option 1: Self-service

o   Our characterization: a common workflow for externally facing, community-oriented applications, e.g. DPLA

o   Workflow

§  Users create collections in Collection Builder using the collections API

§  Users add other users to, or remove them from, their collections in Collection Builder using the collections API

§  Users add items to collections in Collection Builder using the collections API

§  Users can be blocked from create/edit tasks by either

·        Disabling their Harvard Key account, or

·        Disabling their account in the collections API

-        Option 2: LTS-managed

o   Our characterization: a common workflow for internally facing or organization-centric applications, e.g. HOLLIS

o   Workflow

§  Users ask LTS staff to create a collection. LTS staff use the policy server to grant the users permission to create a named collection and notify them that their permissions have been added. Users then access Collection Builder to create the collection.

§  Users ask LTS staff to add users to or remove users from a collection. LTS staff use the policy server to add or remove users for the named collection and notify the affected user(s) that their permissions have been added or removed.

§  Users add items to collections in Collection Builder using the collections API

§  Users can be blocked from create/edit tasks by either

·        Disabling their Harvard Key account, or

·        Removing their access to Collection Builder

We (Pod) favor Option 1. We think it more accurately reflects what we understand to be the business goals of the collections API and Collection Builder: to allow users to easily create collections using Harvard’s digital assets. We don’t think that Option 1 is less secure than Option 2; the business differences concern who administers user-created collections, the creators or LTS. We also think that Option 1 is technically cleaner and less work to build, deploy, and maintain.

In addition, we are not sure that option 2 is technically possible or advisable (see Authorization below).

Technical constraints

Technical considerations are secondary to business considerations except where they impose constraints. Some of these constraints are laid out below:

-        API design

o   APIs are not web applications, in the sense that they have no user interface. It must be possible to make a request to the API without being prompted to log in, since there may be no opportunity to prompt the user. This is usually handled by passing an authentication token in the Authorization header of the request. These requests will come from applications other than web browsers.

§  This is how Collection Builder currently works.

§  Cookies can be used to authenticate with applications, but only as a fallback when the Authorization header is not present

§  Other options such as CAS/SAML cannot be used directly (more on this later)

§  Even though Collection Builder is an interface to the collections API, it will not be the only client.
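The header-first, cookie-fallback rule above can be sketched as follows. This is a hypothetical helper, not existing collections API code: it assumes a `Bearer <token>` scheme and a cookie named `ams_session`, neither of which is specified here. Because AMS cookie decryption depends on custom JARs, the decoder is passed in as an optional callable, so a deployment without AMS still works with header tokens alone.

```python
from typing import Callable, Mapping, Optional

# Hypothetical cookie name; the real AMS cookie name is not given on this page.
AMS_COOKIE_NAME = "ams_session"


def extract_credential(
    headers: Mapping[str, str],
    cookies: Mapping[str, str],
    decode_cookie: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Return the caller's credential, preferring the Authorization header.

    The cookie is consulted only when no Authorization header is present and a
    decoder has been configured, so deployments without AMS keep working.
    """
    # HTTP header names are case-insensitive, so normalize them first.
    lowered = {name.lower(): value for name, value in headers.items()}
    auth = lowered.get("authorization")
    if auth is not None:
        # Assumes a "Bearer <token>" scheme; reject anything else outright
        # instead of silently falling back to the cookie.
        scheme, _, token = auth.strip().partition(" ")
        if scheme.lower() == "bearer" and token.strip():
            return token.strip()
        return None
    if decode_cookie is not None and AMS_COOKIE_NAME in cookies:
        # decode_cookie stands in for the AMS decryption step (custom JARs).
        return decode_cookie(cookies[AMS_COOKIE_NAME])
    return None
```

A malformed Authorization header is rejected rather than falling back to the cookie, which keeps the cookie strictly secondary.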

-        Authentication

o   AMS provides a cookie that must be decrypted using custom JARs

§  The collections API cannot REQUIRE this, since it would then not work as an open source platform; the cookie would be a secondary authentication option

o   SAML and CAS cannot be used directly with an API (see Suggested Approach below)

o   SAML is the HUIT-approved mechanism for applications that external users need to access.

-        Authorization

o   Authorization will have to be handled by the API regardless of whether we integrate with the policy server; otherwise the collections API would cease to be an open source project (and we would break a number of design best practices, most importantly encapsulation of functionality)

o   The Policy Server requires a database connection from the server (in the Amazon cloud) to work. We are not sure that this is technically possible (Grainne is looking into it). Using the Policy Server ties in with Option 2 above. In addition, there would be two sources of authorization “truth”: the Policy Server and the collections API.

Suggested Approach

We would like to take a multi-layered approach that follows our discussion and accommodates (some of) the suggestions from the LTS team.

1)     Add functionality to the collections API to allow multiple users to access multiple collections with owner and editor roles (this is what we presented in Pod’s original proposal)

a.      Collection Builder would provide functionality to add users to and remove users from a collection

2)     Build a parallel application (Harvard Library API Manager?) that can authenticate with SAML/Harvard Key and securely return an API key to the authenticated user

a.      Collection Builder would integrate with the Harvard Library API Manager so that authenticated users can retrieve their API keys for use in Collection Builder. Building a parallel application gets around the problem that APIs do not work well with SAML or cookies.

b.      This application could be extended to support the LibraryCloud API at a later date

c.      The application could also be extended later to work with AMS, or with other authentication mechanisms, if desired

d.      This application could be hosted at Harvard if need be

3)     Review the lifecycle and types of API keys, e.g. user API keys with a limited lifespan vs. application API keys that never expire
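Items 2 and 3 can be sketched as a small key store for the proposed API manager. This is a minimal in-memory sketch, assuming two key types ("user" and "app"), a 12-hour user-key lifetime chosen only for illustration, and hypothetical class and method names; a real service would persist records and call `issue` only after a successful SAML/Harvard Key login. Only a SHA-256 digest of each key is stored, so a leaked key table does not reveal usable keys.

```python
import hashlib
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Illustrative value only; the proposal leaves key lifetimes to be decided.
USER_KEY_LIFETIME = timedelta(hours=12)


@dataclass
class ApiKeyRecord:
    owner: str
    kind: str                       # "user" or "app" (hypothetical key types)
    expires_at: Optional[datetime]  # None means the key never expires


class ApiKeyManager:
    """Hypothetical in-memory key store for the proposed API manager."""

    def __init__(self) -> None:
        # Maps the SHA-256 digest of a key to its record; plaintext keys are never stored.
        self._records: dict[str, ApiKeyRecord] = {}

    @staticmethod
    def _digest(key: str) -> str:
        return hashlib.sha256(key.encode("utf-8")).hexdigest()

    def issue(self, owner: str, kind: str, now: Optional[datetime] = None) -> str:
        """Create a key for an already-authenticated owner; the plaintext is returned once."""
        if now is None:
            now = datetime.now(timezone.utc)
        if kind == "user":
            expires_at: Optional[datetime] = now + USER_KEY_LIFETIME
        elif kind == "app":
            expires_at = None
        else:
            raise ValueError(f"unknown key kind: {kind!r}")
        key = secrets.token_urlsafe(32)
        self._records[self._digest(key)] = ApiKeyRecord(owner, kind, expires_at)
        return key

    def verify(self, key: str, now: Optional[datetime] = None) -> Optional[str]:
        """Return the owner of a known, unexpired key; otherwise None."""
        if now is None:
            now = datetime.now(timezone.utc)
        record = self._records.get(self._digest(key))
        if record is None:
            return None
        if record.expires_at is not None and now >= record.expires_at:
            return None
        return record.owner

    def revoke(self, key: str) -> None:
        # Removing the record blocks the key immediately, whatever its type.
        self._records.pop(self._digest(key), None)
```

Separating key issuance from the collections API in this way lets the API itself accept only keys, while the login mechanism (SAML today, possibly AMS later) stays in the parallel application.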

The tiered approach will allow us to complete each area of work, deliver it, and then assess where we are from a budget perspective. It also gives us time, if needed, to address concerns about the design of the second and third tiers.



  • already handles interaction with PIN/Harvard Key and LDAP
  • LTS can add a new gateway
  • simple redirect-based paradigm
  • returns attributes in an encrypted cookie (including name, PIN, and email address)


  • requires a new registration with IDM; unknown time for IDM to implement the gateway
  • How much custom code is required?
  • With no cookie, how is identity data stored? What are the security requirements for storage?