
POC of Go transformation libraries#62

Merged: epavlova merged 5 commits into main from go-lang-libs on Mar 5, 2025

Conversation

@epavlova
Contributor

This is an incomplete POC that introduces Go libraries for transforming a content tree to plain text and to "external" bodyXML. It is based on PR #51.

The format of the "external" bodyXML is the one described as Body Format IV, which is what the Content and Enriched Content APIs currently return with the INCLUDE_RICH_CONTENT policy.

Since the full implementation has not been finalised at this stage, this PR seeks endorsement specifically for:
A. Including Go libraries in this repository with the proposed structure and developing them alongside changes to the content tree definition.
B. Aligning on a commitment to collaborate on resolving the remaining TODOs. The transformation to “external” bodyXML intentionally includes multiple TODOs, reflecting open questions regarding tags that still require solutions. These TODOs could serve as a useful reference to assess how well the current content tree definition accommodates old content.

Please note that the proposed implementation does not pass the unit tests.

I know that this is a long PR, but C&M needed a POC showing that we can indeed adopt content tree as the primary body format in the platform.

The Go structs allow representation of the full content tree
and conversion of a JSON content tree to Go objects (unmarshalling).
Add a Go module definition in the main directory of the project.

This implementation returns an error during unmarshalling if the
tree does not adhere to the JSON schema, including when the node
type is "undefined".
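
The strict unmarshalling described above could look roughly like the sketch below. It is not the code in this PR: the `Node` struct, the `knownTypes` set, and the `ParseTree` helper are hypothetical and heavily simplified, and the real contenttree package is assumed to model each node type separately.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Node is a hypothetical, simplified stand-in for a content tree node.
type Node struct {
	Type     string  `json:"type"`
	Value    string  `json:"value,omitempty"`
	Body     *Node   `json:"body,omitempty"`
	Children []*Node `json:"children,omitempty"`
}

// knownTypes is an illustrative subset of node types, not the full schema.
var knownTypes = map[string]bool{
	"root": true, "body": true, "paragraph": true, "text": true,
}

// UnmarshalJSON decodes one node and rejects a missing or unknown type.
func (n *Node) UnmarshalJSON(data []byte) error {
	// alias has no methods, so decoding into it does not recurse into this
	// function for the current node; nested *Node values still go through
	// Node.UnmarshalJSON and are validated in turn.
	type alias Node
	var a alias
	if err := json.Unmarshal(data, &a); err != nil {
		return err
	}
	if !knownTypes[a.Type] {
		return fmt.Errorf("unsupported node type %q", a.Type)
	}
	*n = Node(a)
	return nil
}

// ParseTree unmarshals a JSON content tree into Go objects.
func ParseTree(data []byte) (*Node, error) {
	var n Node
	if err := json.Unmarshal(data, &n); err != nil {
		return nil, err
	}
	return &n, nil
}
```

With this shape, a tree containing a node whose type is missing or is the literal string "undefined" fails at unmarshalling time rather than later in a transformer.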
The Go transformer (JSON tree -> plain text) uses the contenttree
package and its base content tree representation. The whole content
tree is unmarshalled into Go objects, so the input of the
transformer needs to comply with the contenttree package
implementation.
This update changes the test input from the content tree body
to the content tree root. This assumes that the transformer should
receive the whole content tree, starting from the root.
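
A minimal sketch of a transformer that takes the whole tree, starting from the root, might look as follows. The `Node` struct, `ToPlainText`, and `collectText` are hypothetical names for illustration and are not taken from this PR; the sketch simply concatenates text values and leaves out any separators the real transformer may add between blocks.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Node is a hypothetical, simplified stand-in for a content tree node.
type Node struct {
	Type     string  `json:"type"`
	Value    string  `json:"value,omitempty"`
	Body     *Node   `json:"body,omitempty"`
	Children []*Node `json:"children,omitempty"`
}

// collectText walks the tree depth-first and appends every text value.
func collectText(n *Node, sb *strings.Builder) {
	if n == nil {
		return
	}
	if n.Type == "text" {
		sb.WriteString(n.Value)
	}
	collectText(n.Body, sb)
	for _, child := range n.Children {
		collectText(child, sb)
	}
}

// ToPlainText expects the JSON of a full content tree whose top-level
// node is the root, and returns the concatenated text values.
func ToPlainText(rootJSON []byte) (string, error) {
	var root Node
	if err := json.Unmarshal(rootJSON, &root); err != nil {
		return "", fmt.Errorf("unmarshal content tree: %w", err)
	}
	if root.Type != "root" {
		return "", fmt.Errorf("expected a root node, got %q", root.Type)
	}
	var sb strings.Builder
	collectText(&root, &sb)
	return sb.String(), nil
}
```

Passing only the body node to `ToPlainText` returns an error in this sketch, which mirrors the assumption that callers hand over the whole tree.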
Remove double spaces from the output of the test cases for
content tree to plain text transformers.
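
If a transformer or test helper ever needs to guarantee output free of double spaces, one possible approach is sketched below. `CollapseSpaces` is a hypothetical helper, not something this PR is known to contain.

```go
package main

import "regexp"

// multiSpace matches runs of two or more ASCII space characters.
var multiSpace = regexp.MustCompile(` {2,}`)

// CollapseSpaces replaces every run of consecutive spaces with one space.
// Other whitespace (tabs, newlines) is left untouched.
func CollapseSpaces(s string) string {
	return multiSpace.ReplaceAllString(s, " ")
}
```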
The transformer relies on the content tree representation in the
main contenttree package.
The implementation is incomplete and does not pass the
unit tests.
@epavlova epavlova requested review from a team and chee as code owners February 26, 2025 13:45
@chee
Contributor

chee commented Mar 4, 2025

this is really exciting!!!

for what it's worth i very 100% approve of the structure of the go libs, and am pumped!

Contributor

@chee chee left a comment

let's get this in, and get started

@epavlova epavlova merged commit 79fdcec into main Mar 5, 2025