DRAFT (subject to change)

UC1. As an LD4P partner I want to convert a MARC record to the LD4L Ontology as the basis for LD4P cataloging/extension work

  • Take MARCXML file as input, output LD4L RDF. No expectation of deduping across a collection of records, or of reconciliation (with the exception of URI minting strategies based on local data, e.g. URIs/identifiers supplied from the data).
  • Roughly equivalent to the LC BF converter core + postprocessing used in the LD4L1 project, with a new target ontology. However, expect better code, with unit and regression tests.
  • Expect this core structural conversion will be extensible to other input formats and output modifications (e.g. FGDC input, BF2 output).
  • This is what we have committed to produce by end 2017-03, based on availability of the LD4L Ontology by 2017-01-01. Rebecca will work on this with help from Jim, and in coordination with Josh's validation engine.
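The UC1 shape (one MARCXML record in, RDF out, URIs minted from identifiers in the data, no deduplication or reconciliation) can be sketched roughly as follows. This is only an illustration, not the planned converter: `BASE_URI`, the `ONT` namespace, the `Instance` class, the `title` property and the function names are all placeholders, since the LD4L Ontology terms are not given here.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
# Hypothetical namespaces, used only for this sketch.
BASE_URI = "http://example.org/ld4p/"
ONT = "http://example.org/ontology/"

SAMPLE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">12345</controlfield>
  <datafield tag="245" ind1="1" ind2="0">
    <subfield code="a">Moby Dick /</subfield>
    <subfield code="c">Herman Melville.</subfield>
  </datafield>
</record>"""


def mint_uri(record, kind):
    """Mint a URI from local data (the MARC 001 control number)."""
    control = record.find("marc:controlfield[@tag='001']", MARC_NS)
    if control is None or not (control.text or "").strip():
        raise ValueError("record has no 001 control number")
    return f"{BASE_URI}{kind}/{control.text.strip()}"


def convert_record(record):
    """Convert one MARCXML record element to (subject, predicate, object) tuples."""
    instance = mint_uri(record, "instance")
    triples = [(instance, RDF_TYPE, ONT + "Instance")]
    title = record.find(
        "marc:datafield[@tag='245']/marc:subfield[@code='a']", MARC_NS
    )
    if title is not None and title.text:
        # Drop trailing ISBD punctuation such as " /" or " :".
        triples.append((instance, ONT + "title", title.text.strip(" /:")))
    return triples


triples = convert_record(ET.fromstring(SAMPLE))
```

Because the URI depends only on the record's own identifier, converting the same record twice yields the same URI, which is the only form of "reconciliation" this use case assumes.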

UC2. As an LD4P partner I want to convert HFA Data from Filemaker Pro to the LD4L Ontology as the basis for LD4P cataloging/extension work

  • Access HFA metadata directly from Filemaker db.
  • Expect to extend the converter core from use case 1
    • Directly re-use core converter "element converters" for shared elements, such as TitleBuilder, CreatorBuilder, PublisherBuilder, etc., to generate RDF and manage URI minting and entities
    • Re-use or extend lower level converter methods to the greatest extent possible to generate RDF for extension elements
    • Re-use or extend lower level converter methods to the greatest extent possible to manage URI minting
  • Automated access to convert from live database or clone at a frequency TBD
  • Extend the converter core to recognize the new format and model; output using both core and extension modeling
  • Specific vocabulary/entity reconciliation needs
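One way to picture the re-use described above is a core converter that owns a list of element converters and the URI minting, plus a format-specific subclass that adds extension builders. This is a rough sketch under stated assumptions: only the names TitleBuilder and CreatorBuilder come from this page; the `build` method, the dict-shaped record, `RunningTimeBuilder`, `HfaConverter`, and the `ONT`/`EXT` namespaces are hypothetical.

```python
# Hypothetical namespaces for core and extension modeling.
ONT = "http://example.org/ontology/"
EXT = "http://example.org/ontology/film/"


class TitleBuilder:
    """Shared element converter: emits a title triple if the record has one."""

    def build(self, record, subject):
        title = record.get("title")
        return [(subject, ONT + "title", title)] if title else []


class CreatorBuilder:
    """Shared element converter: emits one triple per creator."""

    def build(self, record, subject):
        return [(subject, ONT + "creator", c) for c in record.get("creators", [])]


class Converter:
    """Core converter: owns URI minting and runs the shared builders."""

    builders = [TitleBuilder(), CreatorBuilder()]

    def __init__(self, base_uri):
        self.base_uri = base_uri

    def mint_uri(self, record):
        # URI minting lives in the core so extensions inherit it unchanged.
        return f"{self.base_uri}{record['id']}"

    def convert(self, record):
        subject = self.mint_uri(record)
        triples = []
        for builder in self.builders:
            triples.extend(builder.build(record, subject))
        return triples


class RunningTimeBuilder:
    """Hypothetical extension element for moving-image records."""

    def build(self, record, subject):
        minutes = record.get("running_time")
        return [(subject, EXT + "runningTime", minutes)] if minutes else []


class HfaConverter(Converter):
    """Extension converter: core builders plus extension builders."""

    builders = Converter.builders + [RunningTimeBuilder()]


hfa = HfaConverter("http://example.org/hfa/")
result = hfa.convert({"id": "7", "title": "Example Film", "running_time": "90"})
```

In this arrangement the extension never touches title or creator handling; it only appends builders for elements the core does not know about.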

UC3. As an LD4P partner I want to convert FGDC XML to the LD4L Ontology as the basis for LD4P cataloging/extension work

  • FGDC metadata is in XML, stored by and available to Harvard LTS
  • Expect to extend the converter core from use case 1
    • Directly re-use core converter "element converters" for shared elements, such as TitleBuilder, CreatorBuilder, PublisherBuilder, etc., to generate RDF and manage URI minting and entities
    • Re-use or extend lower level converter methods to the greatest extent possible to generate RDF for extension elements
    • Re-use or extend lower level converter methods to the greatest extent possible to manage URI minting
  • Automated access to convert from live XML at a frequency TBD
  • Extend the converter core to recognize the new model; output using both core and extension modeling
  • Specific vocabulary/entity reconciliation needs
  • Some form of convenient wrapper around core converter (possibly with extensions) that handles a batch of records and provides helpful debugging output to identify records that fail while continuing to process other records.
  • Work on deduplication of local URIs necessary
  • Scope here is modest numbers of records as input for manual work, so performance is likely not an issue
  • *M records, so performance is an issue
  • Manual work to help with deduplication and reconciliation is out-of-scope – expectation is that improvements would come from fixes to the source MARC and/or to authority data
  • Not actually part of LD4L Labs proposal but we would like to do it
  • Builds on 5, expect deduplication, reconciliation, performance.
  • Catalog has daily updates (therefore repeating a complete conversion is not possible)
  • Must deal with updates! Expectation is that doing this in a triplestore will require a named graph for the triples from each record, thus allowing delete and then add in order to update.
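Two of the points above, a batch wrapper that reports failing records while continuing with the rest, and one named graph per record so that an update is a delete followed by an add, can be sketched together. This is a minimal in-memory illustration, not a triplestore client: `QuadStore`, `graph_uri_for`, `convert_batch`, `toy_convert` and the `example.org` URIs are all hypothetical.

```python
def graph_uri_for(record_id):
    """Hypothetical scheme: one named graph per source record."""
    return f"http://example.org/graph/{record_id}"


class QuadStore:
    """In-memory stand-in for a triplestore with named graphs."""

    def __init__(self):
        self.graphs = {}

    def replace_graph(self, graph_uri, triples):
        # Update = delete the record's whole graph, then add the new triples.
        self.graphs.pop(graph_uri, None)
        self.graphs[graph_uri] = set(triples)


def convert_batch(records, convert, store):
    """Convert each record; collect failures instead of stopping the batch."""
    failures = []
    for record in records:
        record_id = record.get("id", "<unknown>")
        try:
            triples = convert(record)
        except Exception as exc:  # report and keep going
            failures.append((record_id, repr(exc)))
            continue
        store.replace_graph(graph_uri_for(record_id), triples)
    return failures


def toy_convert(record):
    """Toy converter for the demo: requires a title."""
    subject = f"http://example.org/instance/{record['id']}"
    return [(subject, "http://example.org/ontology/title", record["title"])]


store = QuadStore()
first_run = convert_batch(
    [{"id": "1", "title": "Old title"}, {"id": "2"}, {"id": "3", "title": "Other"}],
    toy_convert,
    store,
)
# A later daily update touches only record 1's graph.
second_run = convert_batch([{"id": "1", "title": "New title"}], toy_convert, store)
```

Conversion runs before the graph is replaced, so a record that fails to convert leaves its previously loaded triples in place. Against a real triplestore the same delete-then-add step could be issued as a SPARQL 1.1 Update, for example `DROP SILENT GRAPH <g>` followed by `INSERT DATA { GRAPH <g> { ... } }`.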

UC7. As an LD4P partner I want to convert a batch of MARC/XXX records to the BF2 Ontology as the basis for LD4P cataloging/extension work

  • Relies on ability to customize core to output BF2 instead of LD4L Ontology
  • Harvard assumes there will be BF2.0 Converters available from LC and vendors. (With all the work to be done and unanswered questions, LD4L should not specifically strive for this.)