2/18/16

  1. Meeting cancelled; Jonathan unavailable. Jonathan reports by email: "I have not heard back from the HFF team about my technical issue and I am continuing to work on setting up a local triplestore for LCNAF data."

2/11/16

Attendees: Randy, Paul, Robin, Steven, Jonathan unavailable

  1. Discussed HFF matching strategy
  2. On 2/16, Jonathan reports "I spent time looking at the HFF documentation in an attempt to match fields to LCNAF. The HFF UI shows a minimal amount of information, but the documentation claims to support a broader set of VIVO and FOAF relationships. However, I am having technical difficulties with the HFF SPARQL API and can’t query the data to see what data from the API is actually being stored and at what level of completeness. I have an email out to the HFF team requesting technical assistance."

2/5/16

Attendees: Randy, Paul, Robin, Jonathan, Michelle, Steven

  1. Jonathan is still thinking about matching algorithms. Now that Steven has joined Harvard, he and Jonathan were able to have a phone call yesterday to discuss strategy.
  2. It seems like the best approach for HFF matching is to locate FOAF links for persons to LC/NACO authority records, and then to try to identify the same links in HFF data.
    1. Steven will create an example of a triple in our BIBFRAME RDF and the matching person in HFF, including the new triple that would be added as a result of the process.
    2. Steven and/or Jonathan will then translate the matching process into pseudo-code.
  3. If we can pin the matching algorithm down quickly, Jonathan can write code to iterate over a set of BIBFRAME to produce matches and new triples.
  4. In the long run, ITS staff would need to enrich our catalog records with person authority links to LC/NACO, ISNI, etc. to make the matching process most effective.
  5. Randy will set up a meeting with senior library and HFF staff to talk about the future of HFF.
  6. Next checkin meeting is Thursday 2/11 at 9:30.
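
A minimal sketch of the matching idea in item 2, assuming triples are held as plain (subject, predicate, object) tuples and that both data sets link persons to LC/NACO with the same predicate. The choice of link predicate, the example.org URIs, and the function names are illustrative assumptions; nothing here is taken from the HFF data or from our BIBFRAME output.

```python
# Triples are modelled as (subject, predicate, object) tuples so the sketch
# runs without an RDF library.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
LCNAF_PREFIX = "http://id.loc.gov/authorities/names/"


def index_by_authority(triples, link_predicates, authority_prefix=LCNAF_PREFIX):
    """Map each authority URI to the set of subjects that link to it."""
    index = {}
    for subject, predicate, obj in triples:
        if predicate in link_predicates and obj.startswith(authority_prefix):
            index.setdefault(obj, set()).add(subject)
    return index


def match_persons(bibframe_triples, hff_triples, link_predicates):
    """Return new sameAs triples for persons sharing an LC/NACO link.

    Assumption: a shared authority URI is treated as sufficient evidence
    that the BIBFRAME person and the HFF person are the same.
    """
    bib_index = index_by_authority(bibframe_triples, link_predicates)
    hff_index = index_by_authority(hff_triples, link_predicates)
    new_triples = []
    for authority in sorted(bib_index.keys() & hff_index.keys()):
        for bib_person in sorted(bib_index[authority]):
            for hff_person in sorted(hff_index[authority]):
                new_triples.append((bib_person, OWL_SAME_AS, hff_person))
    return new_triples
```

Persons without an authority link on either side simply produce no output, which is why item 4 (enriching catalog records with authority links) matters for recall.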

1/29/16

Attendees: Randy, Paul, Robin, Jonathan, Michelle

  1. Jonathan is still exploring matching algorithms. He has not written anything yet. He needs Harvard triples to work with. Paul will send him the same set of ~600,000 triples he used to process stackscore triples for Cornell.
  2. However, since we have no triple store, searching for data fields in library cloud may be easier. But Jonathan will need the URIs for people to create new triples, so he'll have to get them from the linked data somehow. Maybe Cornell has an API, either Solr or SPARQL?
  3. Paul D is done for now, as is Michelle.
  4. The project will likely receive a 2-month no-cost extension, which will also postpone the need for a final report.
  5. Randy will schedule our next checkin for next Friday at 9:30.

1/22/16

Attendees: Randy, Paul, Robin on-call  Absent: Jonathan, Michelle

  1. Paul has completed his deliverables for the grant, and has sent a full set of stackscore triples to Cornell. Simeon indicated they look good.
  2. We have not had a report of progress from Jonathan this week. We have applied to Mellon for an extension of time for Jonathan to complete his work since it is apparent he will not be done by 1/31. Jonathan’s current goals for HFF entity resolution, within his planned funding, are:
    1. Specification (may be written iteratively and revised based on experimentation): define the mapping for adding faculty finder links.
      1. What are the input elements from our BIBFRAME?
      2. What SPARQL query would you then run against the HFF API to locate matching faculty?
      3. What would the markup be in the updated BIBFRAME RDF?
    2. Develop an entity resolver software implementation as a record-level application that takes in a single BIBFRAME RDF/XML “record”, accesses the HFF API as needed, and outputs an updated BIBFRAME RDF/XML record. This would be working code. It would take in the input elements in the spec, call HFF as needed to obtain records based on those inputs, and, if we find a match (using hand-selected test records), write out the RDF as per the spec.
    3. Run the entity resolver on a subset of the Harvard BIBFRAME data set as a proof of concept.
    4. Document and report on findings and recommendations for future work.

Update: Jonathan reported via email on 1/25 that "I have started writing some of the basic look up code to the FF API and will tackle the a priori probability issue again once there is a functional basic lookup."
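
One possible shape for the record-level resolver described in the goals above, as a sketch only. It assumes the input elements are person names already extracted from a BIBFRAME record, that HFF persons are typed foaf:Person with an exact foaf:name, and that the output markup is an owl:sameAs triple. None of these is confirmed by the spec, which is still to be written. The HFF call is passed in as a hypothetical `lookup` function so that no endpoint details are invented here.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def escape_literal(value):
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_person_query(name):
    """Build a SPARQL query for persons with exactly this name.

    Assumption: exact foaf:name matching. A real spec would likely need
    name normalization and additional evidence beyond the name.
    """
    return (
        "PREFIX foaf: <http://xmlns.com/foaf/0.1/>\n"
        "SELECT ?person WHERE {\n"
        "  ?person a foaf:Person ;\n"
        f'          foaf:name "{escape_literal(name)}" .\n'
        "}"
    )


def resolve_record(persons, lookup):
    """Return new triples for one record.

    persons: dict mapping a local person URI to a name string.
    lookup:  hypothetical callable that takes a SPARQL query string and
             returns a list of matching HFF person URIs.
    Only unambiguous results (exactly one candidate) produce a triple.
    """
    new_triples = []
    for local_uri, name in sorted(persons.items()):
        candidates = lookup(build_person_query(name))
        if len(candidates) == 1:
            new_triples.append((local_uri, OWL_SAME_AS, candidates[0]))
    return new_triples
```

Keeping the lookup injectable means the resolver can be exercised against hand-selected test records before the HFF API access problems are sorted out.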

1/15/16

Attendees:  Michelle, Randy, Jonathan

  1. Jonathan was sick 3 days this week but continues to research linking to HFF. Apparently, HFF requires as input an "a priori" probability that the name being searched for is a valid Harvard name. He asked if we thought using the LC authority file to determine a list of all authors in the world would be a valid approach. Randy suggested just using sites on the web that list probabilities of name occurrences based on census data, assuming that the distribution of author names should be no different than for any other occupation...
  2. Paul is using Simeon's stackscore_annotations.py code to create stackscore triples for bib records, but he was getting fewer triples than expected. From speaking with Rebecca, they believe something was wrong with the number of instance assertions (too few), but she is not seeing that when processing the Harvard data at Cornell. It appears that the matching of local identifiers in Rebecca's code is not quite right: Paul is seeing extra characters in the Aleph numbers (e.g. 123456789-1 or 123456789-A), which Rebecca's code does not match.
  3. Paul's latest report: "As you may have seen from my mail exchange with Rebecca, Simeon's script (which I've slightly modified for Harvard data) seems to be working fine off Rebecca's ld4l output and an input file of our usage data. There's been a bit of a hiccup in the test ld4l data I was using as input, which was not creating enough instance triples (with embedded bib IDs in the local instance URIs) to use in matching up with our bib IDs in the usage-data input file. But it seems that may have been due to an older version of the ld4l converter I was using. Rebecca says that in her latest run of c. half a million Harvard records through the converter, instance triples are in much better shape. She's posted the converted output for Harvard, and I've agreed to handle the usage-data triples generation using that data; and I guess the supplemented ld4l output all goes back to them once everything tests out. And we can of course supplement that data with any URIs we may or may not create at the intersection of aleph and FacultyFinder data. I'll be back in the office on Monday and can answer any questions then."
  4. Jonathan will not have expended all budgeted hours on LD4L by 1/31. Randy will contact Dean and see if reimbursement can continue beyond 1/31 until the budget is exhausted.
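
The Aleph number mismatch in item 2 could be handled by normalizing identifiers before matching. The sketch below assumes an Aleph number is nine digits and that any hyphenated suffix (as in 123456789-1 or 123456789-A) can be ignored for matching purposes. The notes do not say what the suffix means, so that is an assumption, and the function name is illustrative rather than part of Rebecca's or Simeon's code.

```python
import re

# Nine digits, optionally followed by a hyphen and an alphanumeric suffix.
_ALEPH_ID = re.compile(r"(\d{9})(?:-[0-9A-Za-z]+)?")


def normalize_aleph_id(raw):
    """Return the nine-digit base of an Aleph number, or None if malformed."""
    match = _ALEPH_ID.fullmatch(raw.strip())
    return match.group(1) if match else None
```

Applying the same normalization to both the usage-data bib IDs and the IDs embedded in instance URIs would let the two sides match on the base number.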

1/8/16

Attendees:  Michelle Durocher, Randy, Jonathan

  1. Jonathan is just starting to research linking to HFF. We discussed how to measure how well any linking would be succeeding, and Michelle offered that she might be able to gather some information, at least for certain schools, about faculty authors represented in HOLLIS.
  2. Paul has created a new set of Harvard BIBFRAME records from the Cornell specified version of the LC converter, and sent the new BIBFRAME to Rebecca at Cornell.
  3. Paul is starting to look at stackscore.

12/18/15

Attendees: Robin, Paul, Michelle, Randy  Absent: Jonathan

  1. Jonathan reported this week that he can't do eagle-i integration without weeks of help from the eagle-i team. We dropped that project and have shifted his focus for January to HFF entity resolution.
  2. Paul is on the verge of trying Rebecca’s converter, just waiting for operating instructions. Final converter from Rebecca expected on Jan 4.
  3. Paul is contacting Cornell to agree on a spec for usage data. He will create an incremental usage file, similar to what we plan for Harvard Faculty Finder sameAs links.
  4. Michelle is in contact with Griffin Weber, and will get more info on persistence and expected future of HFF. She is also determining any plans to link HFF data to ISNI authority data.
  5. Discussed moderately widespread adoption of VIVO for faculty/researcher linked data.

12/11/15

Attendees:  Robin Wendler, Paul Deschner, Michelle Durocher Absent: Randy, Jonathan

11/20/15

Attendees:  Jonathan Kennedy, Randy Stern, Robin Wendler, Paul Deschner, Michelle Durocher

  1. Status of Eagle-I readiness for full data set with stardog
    1. Jonathan believes he has made progress
    2. But he'll be out on vacation Thanksgiving week and the week after.
  2. Status of Eagle-I readiness for BIBFRAME customization and u/i configuration testing
    1. Jonathan and Michelle will set up a time to meet (now planned for Wed 12/9)
  3. Next BIBFRAME conversion for Cornell
    1. Jeff Licht has no time until January
    2. Paul has volunteered to:
      1. Rerun the Rebecca-specified version of the LC converter on our MARC data next week
      2. Run the Cornell post processor as it now stands on a small subset of that output to make sure it can be done
      3. When Rebecca releases the final version in a week or two, run the Cornell post processor and produce a complete output set to send to Cornell

11/13/15

Attendees:  Jonathan Kennedy, Randy Stern, Robin Wendler, Paul Deschner, Michelle Durocher

  1. Status of Eagle-I readiness (remove non Open RDF dependencies)
    1. Jonathan started working on this project, but he has been out sick a few more days this week.
    2. It turns out that eagle-i does not just use the Sesame API, but also uses some undocumented interfaces to its triple store, so connecting to Stardog may be even harder than expected or impossible in our timeframe...
    3. So we decided that Jonathan should switch his priority and look at the standard out-of-the-box eagle-i installation that LTS provided a few months ago. This uses the default eagle-i triple store.
    4. The near term goal for the next week will be to get that running, load a small subset of our RDF records, and start to work with Michelle and her group on a configuration file for editing
    5. Michelle has her team ready to work with Jonathan now.
  2. Next BIBFRAME conversion for Cornell
    1. Jeff Licht is looking at the possibility of integrating the 2 BIBFRAME converters into library cloud
    2. If that is not possible, then the backup is for Paul to rerun our MARC with a Rebecca-compatible version of the LC converter, and then run the new Cornell converter on that output. We'll wait to hear from Jeff Licht before Paul does anything.

11/6/15

Attendees:  Jonathan Kennedy, Randy Stern

  1. Status of Eagle-I readiness (remove non Open RDF dependencies)
    1. Jonathan started working on this project on 10/26, but he has been out sick most of this week.
    2. He had decided to try to replace the default triple store for eagle-i, TDB, with a purportedly scalable triple store from Stardog.
    3. However, he has run into a number of incompatibility problems. Apparently, eagle-i has more dependencies than expected. In addition to a dependence on Sesame APIs in the triple store, it also uses various other APIs (including the Jena API) for some functions. This will make it hard if not impossible to integrate Stardog within the timeframe of the LD4L project.
    4. As a backup, Jonathan is now starting to look at the standard out-of-the-box eagle-i installation that LTS provided a few months ago. This uses the default eagle-i triple store.
    5. The near term goal will be to get that running, load a small subset of our RDF records, and start to work with Michelle and her group on a configuration file for editing
  2. Next BIBFRAME conversion for Cornell
    1. Rebecca at Cornell has provided a new BIBFRAME converter for testing. It's not done yet but can be used for setting up an integration environment. She plans to provide the next real version on November 23.
    2. Since Jonathan is behind schedule, we need to explore options for doing the next conversion:
      1. Option 1 - Randy will contact Jeff Licht to see if there is any chance he could integrate the 2 BIBFRAME converters with Library Cloud quickly. This may not be possible
      2. Option 2 - Randy will contact Paul about the feasibility of him running the new converter on the output from the prior converter by hand.
      3. Option 3 - Jonathan has to drop his work on eagle-i and switch immediately to BIBFRAME conversion work.

10/21/15

Notes from emails and LD4L checkin

Jonathan has indicated that he has not started work yet, and will start in earnest on LD4L work next Monday. He must miss this Friday’s meeting, so I don’t think we need to meet this Friday. Does anyone else want to meet anyway?

See https://wiki.duraspace.org/x/uxE1B for notes on today’s LD4L systems checkin. The key takeaways are:

  1. Rebecca at Cornell is a month behind schedule on the LC-BIBFRAME to LD4L-BIBFRAME converter. She now expects to have a version for our use by Monday 11/23.
  2. In advance of that date, around 11/4, she will provide an incomplete version for integration development and integration testing.
  3. She would then like it if both Harvard and Cornell could kick off a conversion with the new converter before Thanksgiving, so that the LD4L BIBFRAME triples would be converted and available to send to Cornell right after Thanksgiving.
  4. The 11/23 converter will NOT contain OCLC works entity resolution. That will be available somewhat later (in Dec??) We’d need to run that part on our data after that and send the results back to Cornell.
  5. If at any point we have been able to do faculty finder entity resolution and triple generation, we could provide an incremental file of any new triples that we generate and send it to Cornell.
  6. Last note: Jim at Cornell has apparently succeeded at loading 1B triples into his Virtuoso triple store on his laptop. He has loaded 800M Cornell triples in 10 hrs. He also tried 900M Harvard triples in a separate test, which took 20 hrs (unclear why…).

7/31/15

Attendees: Michelle Durocher, Jonathan Kennedy, Randy Stern

  1. Status of Eagle-I readiness (remove non Open RDF dependencies)
    1. Jonathan has been busy on another project - no new work on LD4L in the last 2 weeks. He is out of the office next week, but expects to be able to focus on LD4L the week after.
    2. Michelle has people who could work on the Eagle-i display mapping. They would like to evaluate the Eagle-i editor as a possible production editor. When Jonathan is back she would like them to talk about the mapping with him.
  2. August 23 meeting
    1. Randy, Robin, and Jonathan are planning to attend. Michelle will talk with Scott about who from tech services will attend.
    2. The agenda is not yet formed. Randy thinks day 1 will be project catch up, but day 2 will include LD4L-Labs discussion.
  3. LD4L-Labs
    1. Version 4 is Dean's latest draft
    2. Jonathan will write up some ideas that he would like to include in the prospectus, specifically relating to exploring the use of Eagle-i as a production linked data creation and management tool. He expects to be able to deliver some text the week of 8/9.
    3. Michelle will follow up with Christine and Marc about getting some better text to describe the Harvard Film Archive and Cartographi

7/10/15

Attendees: Michelle Durocher, Paul Deschner, Jonathan Kennedy, Randy Stern

  1. We discussed the strategy of using the triple store to provide the resolution for URIs for Harvard bib items. The library has put up a persistent metadata linking service, id.lib.harvard.edu. Just as we now have:

http://id.lib.harvard.edu/aleph/008126126/catalog

our linked data URIs will be something like

http://id.lib.harvard.edu/aleph/008126126/rdf

These will initiate a SPARQL query against the Eagle-I triple store, and resolve to the set of RDF that represents the graph for a MARC record.
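
A rough sketch of how the resolver could turn such a URI into a SPARQL query. It assumes each MARC record's triples are stored in their own named graph and that the graph URI is derived from the Aleph number. Both points, along with the example.org graph base and the function names, are assumptions for illustration; how Eagle-I actually organizes the data has not been decided in these notes.

```python
import re
from urllib.parse import urlparse

# Path shape taken from the example URI: /aleph/<nine digits>/rdf
_RDF_PATH = re.compile(r"/aleph/(\d{9})/rdf")


def aleph_id_from_uri(uri):
    """Extract the Aleph number from an id.lib.harvard.edu RDF URI, else None."""
    parsed = urlparse(uri)
    if parsed.netloc != "id.lib.harvard.edu":
        return None
    match = _RDF_PATH.fullmatch(parsed.path)
    return match.group(1) if match else None


def record_graph_query(uri, graph_base="http://example.org/graph/aleph/"):
    """Build a CONSTRUCT query returning every triple in the record's graph.

    graph_base is a placeholder; the real graph naming scheme is unknown.
    """
    aleph_id = aleph_id_from_uri(uri)
    if aleph_id is None:
        raise ValueError(f"not a recognized linked data URI: {uri}")
    return (
        "CONSTRUCT { ?s ?p ?o } "
        f"WHERE {{ GRAPH <{graph_base}{aleph_id}> {{ ?s ?p ?o }} }}"
    )
```

The linking service would then send the query to the triple store and return the resulting RDF to the client.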

  2. Jonathan is working on Eagle-I modifications to support Open RDF. He noted that he could use help with setting up the Eagle-I configuration mapping file to drive the Eagle-I user interface on our BIBFRAME RDF. The near-term work and target dates for our plan are:

  3. Who plans to attend the August 24, 25 meeting at Cornell?

  4. Randy will distribute the LD4L-2 draft summary for input

 

6/5/15

  1. Review deliverables below, especially how to get our BIBFRAME to Cornell (Paul?)
    1. Paul will send dropbox links to Simeon today.
  2. Usage ontology (Paul)
    1. Paul will call in to next ontology call
  3. Status/plan for faculty finder based entity resolution (Jonathan)
  4. Library cloud as a feeder for BIBFRAME conversion
  5. LD4P update, and June 29/30 LD4P meetings
  6. LD4L August 24/25 meetings
  7. LD4L2 grant proposal. Here is one set of ideas we could propose contributing:

4/30/15

Checkin on April status and goals for May.

 

  1. Jonathan to consult with Cornell on scalability and plug in architecture for entity resolvers.
    1. API spec for entity resolvers
    2. Scalability research/experiments
    3. Target: 5/15/15 - EAGLE-I instance set up at Harvard, scalability testing performed
  2. Paul
    1. Owl ontology for usage data
    2. Can stackscore from Library Cloud be exported as triples?
  3. Michelle
    1. LD4P - meeting at ALA in June, putting together grant proposal for Jan 2016
    2. Infrastructure requirements? User requirements?
    3. Update on HFA ontology project
    4. Create RDF from FileMaker metadata and links to other sources on the web
    5. Infrastructure needs that Harvard would have in a BIBFRAME production environment
  4. Jeff Licht/Michael
    1. Enhance library cloud to pass MARC record to publishing target
    2. Create SDK for publishing targets to make it easy to instantiate new library cloud pipeline steps
