4603 BHL url to CitationDocumentation #4605
Open
kleintom wants to merge 4 commits into SpeciesFileGroup:development
Mainly Claude code; the initial prompt is below. It needed some help with:

* correct BokChoy usage
* how to find and access (with project_token) public TW projects via the API
* differentiating/parsing different types of page numbers from the BHL URL string and TW
* using the correct attributes on hash data returned from the APIs

The initial prompt:

Use metadata from any 2 sources to predict and extend the attributes of either of those sources. We're reconciling data against each other, and predicting improvements, or matching elements. Ultimately we'll present these to a human user for confirmation, refinement, or selection such that curatorial decisions improve data.

* The TaxonWorks root API for data is at https://sfg.taxonworks.org/api/v1
* The TaxonWorks root API for documentations is at https://api.taxonwork.org/
* The GlobalNames BHLNames api documentation is at https://bhlnames.globalnames.org/apidoc/index.html
* The Ruby BHL gem wrapping BHLNames is at https://github.com/SpeciesFileGroup/bok_choy
* The Ruby COL gem, useful for more identifiers on names, is at https://github.com/SpeciesFileGroup/colrapi
* The TaxonWorks code base is at https://github.com/SpeciesFileGroup/taxonworks
* The Global Names organization at GH is at https://github.com/gnames/

We want to bootstrap the infrastructure with a basic use case.

* User is navigating BHL and finds a page that contains information.
* We use the URL and a taxon name parameter against several APIs, collectively wrapped in a meta-service
* The meta-service queries the TaxonWorks API, the GlobalNames BHLNames API, and others it might need to resolve the problem
* It seeks to predict the citation/source/reference that the URI refers to.
* It should return or infer the exact page number as physically indicated for the URI
* It should confirm the presence of the name string on that page
* As a proof of concept we'll use Ruby to act as the meta-service
* Use Thor to handle command line parameters
* Take a name param and a url param as input
* Use the referenced APIs as data sources
* Return a list of 5 sources in a ranked order, with the most probable source that the URL comes from at the top
* Return a list of IDs for the TaxonName from at least the TaxonWorks API, and any other IDs you can discover from other APIs (e.g. global names)
* Suggest a diff between the metadata directly tied to the BHL "source" page and the TaxonWorks Source metadata
* A simple, executable Ruby script that ties these together
* When there is no direct path to linking API endpoints, then you should suggest API endpoints that would resolve the problem. When you do this you must NOT add new functionality that is being encoded in this service, i.e. the new endpoints should be RESTful in nature with respect to complexity
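One of the points above, telling apart the different kinds of page identifiers that can appear in a BHL URL string, can be sketched in plain Ruby. This is only an illustration, not the code in this PR: the helper name `parse_bhl_url` and the returned hash keys are hypothetical, and it assumes BHL URLs of the forms `/page/<PageID>` and `/item/<ItemID>#page/<sequence>/mode/1up`, where the fragment number is taken to be a position in the viewer rather than a printed page number.

```ruby
require 'uri'

# Hypothetical helper (not part of the PR): classify a BHL URL and pull out
# the identifiers it carries. Assumed URL shapes:
#   https://www.biodiversitylibrary.org/page/<PageID>
#   https://www.biodiversitylibrary.org/item/<ItemID>#page/<sequence>/mode/1up
# Returns nil for anything it does not recognize.
def parse_bhl_url(url)
  uri = URI.parse(url)
  return nil unless uri.host.to_s.end_with?('biodiversitylibrary.org')

  case uri.path
  when %r{\A/page/(\d+)}
    # A BHL PageID: a database identifier, not the printed page number.
    { type: :page, page_id: Regexp.last_match(1).to_i }
  when %r{\A/item/(\d+)}
    result = { type: :item, item_id: Regexp.last_match(1).to_i }
    # Assumption: the fragment number is the page's sequence in the viewer.
    if (m = uri.fragment.to_s.match(%r{\Apage/(\d+)}))
      result[:sequence] = m[1].to_i
    end
    result
  end
rescue URI::InvalidURIError
  nil
end
```

None of the values returned here is the page number "as physically indicated"; that still has to be resolved against BHL metadata and the TaxonWorks source, which is why the three kinds of number are kept under separate keys.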
This is just a small first start with many TODOs. The main blocker in sight at this point is that Lines 744 to 748 in 47b28d3
Functionality to add (please correct/amend/etc.)
@mjy has a much fuller/clearer picture of the goals here; these are my scattered recollections of what we discussed.
A BHL reference should be more than just a URL: it should include a page mapping (from BHL page numbers to article page numbers) and perhaps an OCR mapping as well.
Relation to #4603: as discussed there, we'll likely want to create a 'virtual document' to represent the BHL document, and a (new) CitationDocumentation object to associate the (TN, TN source, BHL copy of TN source) data, which is where the page mapping and related data will live.
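To make the page-mapping idea concrete, here is a rough plain-Ruby sketch of the kind of data such an object might hold. Everything in it is hypothetical: `PageMapping`, `VirtualBhlDocument`, and their fields are illustrative names, not existing TaxonWorks models, and the real design may well look different.

```ruby
# Hypothetical structures for discussion only; none of these names exist in
# TaxonWorks. One entry per BHL page: its BHL page id, the page number as
# printed in the article (a string, so 'iv' or '57' both fit), and OCR text.
PageMapping = Struct.new(:bhl_page_id, :printed_page, :ocr_text, keyword_init: true)

class VirtualBhlDocument
  def initialize(mappings)
    # Index the mappings by BHL page id for direct lookup.
    @by_bhl_id = mappings.to_h { |m| [m.bhl_page_id, m] }
  end

  # Printed (article) page number for a BHL page id, or nil if unmapped.
  def printed_page_for(bhl_page_id)
    @by_bhl_id[bhl_page_id]&.printed_page
  end

  # Case-insensitive check that a name string occurs in the page's OCR text.
  def name_on_page?(bhl_page_id, name)
    text = @by_bhl_id[bhl_page_id]&.ocr_text
    !text.nil? && text.downcase.include?(name.downcase)
  end
end
```

Usage might look like `doc.printed_page_for(1001)` to get the article page for a BHL page id, and `doc.name_on_page?(1001, 'Aus bus')` to confirm the name string appears there. A plain substring match is a simplification; OCR errors would likely call for fuzzier matching.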