Harvard Library conducts assessments of file formats prior to accepting new formats for long-term preservation in its digital preservation repository (the DRS). These assessments result in guidelines for the formats preferred and accepted in the DRS.
Typically, an assessment includes several distinct sub-activities:
- Format analysis
  - First, generate a comparative matrix of formats ("the format matrix")
  - Based on the format matrix analysis, identify format candidates as Class A (preferred) or Class B (accepted)
  - Write profiles of the Class A and Class B formats
- Metadata analysis
  - Identify the metadata elements of interest to consider documenting in repository metadata (typically technical, source and process history metadata)
  - Identify any existing schemas that partially or fully support these metadata elements
  - Recommend metadata schemas to use and possibly extend, or design a new schema
  - Design a DRS content model for Harvard Library
- Tool analysis
  - Test FITS to see how well it currently identifies the formats that will be supported in the DRS
  - Identify any format identification/validation or metadata extraction tools to consider adding to FITS
  - Identify any additional metadata elements FITS should support
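
As a rough illustration of the first tool-analysis step, the sketch below tallies how often a tool's reported format matches the format we expect for each test file. It does not call FITS itself: the `identified` mapping stands in for results collected from FITS output by whatever means, and the function name, file names and format labels are all hypothetical.

```python
from collections import Counter


def summarize_identification(expected, identified):
    """Compare expected format names with the names a tool reported.

    expected:   dict mapping file name -> format name believed to be correct
    identified: dict mapping file name -> format name reported by the tool,
                or None when the tool could not identify the file
    Returns a Counter of outcomes and a list of (file, expected, reported)
    tuples for every file that was not identified correctly.
    """
    outcomes = Counter()
    mismatches = []
    for name, want in sorted(expected.items()):
        got = identified.get(name)
        if got is None:
            outcomes["unidentified"] += 1
            mismatches.append((name, want, None))
        elif got.strip().lower() == want.strip().lower():
            # Case-insensitive comparison of the format labels
            outcomes["correct"] += 1
        else:
            outcomes["wrong"] += 1
            mismatches.append((name, want, got))
    return outcomes, mismatches


# Hypothetical sample data, not real FITS output
expected = {
    "a.wav": "Waveform Audio",
    "b.flac": "FLAC",
    "c.mp3": "MPEG Audio",
}
identified = {
    "a.wav": "Waveform Audio",
    "b.flac": None,
    "c.mp3": "Unknown Binary",
}

outcomes, mismatches = summarize_identification(expected, identified)
print(dict(outcomes))
for name, want, got in mismatches:
    print(f"{name}: expected {want!r}, tool reported {got!r}")
```

A summary like this can help show which formats would need additional identification tools before they are supported in the DRS.
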
Any generic tools that we have created for these assessments can be found on this page.
We are making the results of our analysis for different families of formats available; see the links below.