E-ARK SIP parse ignores CONTENTINFORMATIONTYPE and misreads TYPE/OTHERTYPE for contentType #372

@ThomasEdvardsen

Summary

Parsing an E-ARK SIP via EARKSIP.parse(...) has two related issues:

  1. contentInformationType is never populated from METS, so it stays at the default, MIXED.
  2. contentType is derived from CONTENTINFORMATIONTYPE / OTHERCONTENTINFORMATIONTYPE instead of TYPE / OTHERTYPE.

Steps to Reproduce

  1. Create a SIP with root METS attributes:
    • TYPE="Other"
    • csip:OTHERTYPE="Moving images - on tangible media"
    • csip:CONTENTINFORMATIONTYPE="OTHER"
    • csip:OTHERCONTENTINFORMATIONTYPE="MOVINGIMAGES-PROFILE-1.0"
  2. Parse it with new EARKSIP().parse(path).
  3. Inspect results:
    • sip.getContentType().asString() → incorrect (taken from the content information fields)
    • sip.getContentInformationType().asString() → remains MIXED
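
To make the input from step 1 concrete, here is a minimal sketch that reads the four root attributes with the JDK's DOM parser only. It does not use commons-ip. The class name MetsRootAttributes, the helper readRootAttributes, and the inline sample METS element are illustrative assumptions, and the namespace URIs are the ones I would expect for METS and the CSIP extension; check them against the METS file in the actual SIP.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class MetsRootAttributes {
    // Assumed namespace URIs; verify against the real METS file.
    static final String METS_NS = "http://www.loc.gov/METS/";
    static final String CSIP_NS = "https://DILCIS.eu/XML/METS/CSIPExtensionMETS";

    // Hypothetical, stripped-down root element carrying only the attributes from step 1.
    static final String SAMPLE_METS =
        "<mets xmlns=\"" + METS_NS + "\" xmlns:csip=\"" + CSIP_NS + "\""
        + " TYPE=\"Other\""
        + " csip:OTHERTYPE=\"Moving images - on tangible media\""
        + " csip:CONTENTINFORMATIONTYPE=\"OTHER\""
        + " csip:OTHERCONTENTINFORMATIONTYPE=\"MOVINGIMAGES-PROFILE-1.0\"/>";

    /** Reads the four type-related attributes from the root element of a METS document. */
    static Map<String, String> readRootAttributes(String metsXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getAttributeNS to match csip:* attributes
        Element root = factory.newDocumentBuilder()
            .parse(new ByteArrayInputStream(metsXml.getBytes(StandardCharsets.UTF_8)))
            .getDocumentElement();

        Map<String, String> attrs = new LinkedHashMap<>();
        // TYPE is a plain METS attribute (no namespace); the other three are CSIP extensions.
        attrs.put("TYPE", root.getAttribute("TYPE"));
        attrs.put("OTHERTYPE", root.getAttributeNS(CSIP_NS, "OTHERTYPE"));
        attrs.put("CONTENTINFORMATIONTYPE", root.getAttributeNS(CSIP_NS, "CONTENTINFORMATIONTYPE"));
        attrs.put("OTHERCONTENTINFORMATIONTYPE",
            root.getAttributeNS(CSIP_NS, "OTHERCONTENTINFORMATIONTYPE"));
        return attrs;
    }

    public static void main(String[] args) throws Exception {
        readRootAttributes(SAMPLE_METS).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

The point of the sketch is only that the two attribute pairs are distinct in the METS root, so a parser can populate contentType and contentInformationType independently.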

Expected

  • contentType should come from TYPE / OTHERTYPE
  • contentInformationType should come from csip:CONTENTINFORMATIONTYPE / csip:OTHERCONTENTINFORMATIONTYPE

Actual

  • contentType incorrectly uses content information attributes
  • contentInformationType remains default MIXED

Likely Cause

In EARKUtils.setIPContentType(...), the value is taken from CONTENTINFORMATIONTYPE / OTHERCONTENTINFORMATIONTYPE rather than TYPE / OTHERTYPE.
Also, neither processMainMets(...) nor the representation parsing calls a setter for contentInformationType.
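
The mapping I would expect can be sketched in plain Java, independent of the library. ContentTypeResolution and its methods are hypothetical names, not commons-ip code. The sketch assumes that an "Other" value (compared case-insensitively) defers to the matching OTHER* attribute, and that MIXED is the fallback only when no content information type is present at all.

```java
public class ContentTypeResolution {

    /**
     * Returns the "other" value when the main attribute is "Other" and an
     * "other" value is present; otherwise returns the main attribute.
     * Returns null when the main attribute is missing or empty.
     */
    static String effectiveValue(String main, String other) {
        if (main == null || main.isEmpty()) {
            return null;
        }
        if ("OTHER".equalsIgnoreCase(main) && other != null && !other.isEmpty()) {
            return other;
        }
        return main;
    }

    /** contentType is resolved from TYPE / csip:OTHERTYPE only. */
    static String contentType(String type, String otherType) {
        return effectiveValue(type, otherType);
    }

    /** contentInformationType is resolved from csip:CONTENTINFORMATIONTYPE / csip:OTHERCONTENTINFORMATIONTYPE only. */
    static String contentInformationType(String cit, String otherCit) {
        String value = effectiveValue(cit, otherCit);
        return value != null ? value : "MIXED"; // assumed default when the attribute is absent
    }

    public static void main(String[] args) {
        // Attribute values from the Steps to Reproduce section.
        System.out.println(contentType("Other", "Moving images - on tangible media"));
        // prints "Moving images - on tangible media"
        System.out.println(contentInformationType("OTHER", "MOVINGIMAGES-PROFILE-1.0"));
        // prints "MOVINGIMAGES-PROFILE-1.0"
    }
}
```

In the library itself, the equivalent change would presumably be to read TYPE / OTHERTYPE in EARKUtils.setIPContentType(...) and to add a separate call that sets contentInformationType while the main and representation METS are processed.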

Labels: bug, stale
