Any way to programmatically construct a DataContract from an ODCS string? #805

@struesda

We want to read data into a Spark DataFrame whose schema is predefined by a data contract.

We are defining our data contracts using ODCS and storing them in a DynamoDB database behind an API.

Therefore we would like to create a DataContract() object from an ODCS string.

We have been able to do this with a Data Contract Specification string, but not with an ODCS string.

Is there a way to do this? Right now it seems that the ODCS import only works with files, not strings.

from datacontract.data_contract import DataContract
# These types must be in scope because the exported schema text is evaluated below.
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DateType, NullType, DecimalType

# This works today: the string is a Data Contract Specification YAML document, not ODCS.
# data_contract_for_validate = DataContract(data_contract_str=data_contract_yaml, spark=spark_session)
data_contract_for_validate = DataContract(data_contract_str=data_contract_only_strings_yaml, spark=spark_session)

# Take the schema expression after the first "=" in the Spark export and evaluate it.
# eval() executes arbitrary code, so only use it on contracts from a trusted source.
target_struct = eval(data_contract_for_validate.export(export_format="spark").split("=", 1)[1].strip())

source_df = spark_session.read.csv(
    "s3://datalake/landing-zone-test/operations_vs_requests_order_summary/ORDER_SUMMARY_IN_PROCESS_no_tabs.csv",
    header=True,
    mode="PERMISSIVE",
    inferSchema=False,
    schema=target_struct,
    columnNameOfCorruptRecord="_corrupt_record",
)
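
Until a string-based ODCS entry point exists, one possible workaround is to write the ODCS string to a temporary file and pass that path to the file-based ODCS import. The sketch below covers only the temp-file step and uses nothing beyond the Python standard library. The helper name `write_contract_to_temp_file` and the sample `odcs_yaml` content are hypothetical placeholders, not part of datacontract-cli, and the actual import call is left as a comment because its exact signature is not confirmed here.

```python
import os
import tempfile


def write_contract_to_temp_file(contract_yaml: str, suffix: str = ".yaml") -> str:
    """Write a contract string to a temporary file and return the file path."""
    # delete=False keeps the file after the handle closes; the caller removes it.
    with tempfile.NamedTemporaryFile(
        "w", suffix=suffix, delete=False, encoding="utf-8"
    ) as handle:
        handle.write(contract_yaml)
        return handle.name


# Hypothetical stand-in for the ODCS document returned by the API.
odcs_yaml = "apiVersion: v3.0.0\nkind: DataContract\nid: example-contract\n"

odcs_path = write_contract_to_temp_file(odcs_yaml)
try:
    # Assumption: this is where odcs_path would be handed to the file-based
    # ODCS import. Here the file is only read back to show the round trip.
    with open(odcs_path, encoding="utf-8") as handle:
        round_tripped = handle.read()
finally:
    # Always clean up the temporary file, even if the import step fails.
    os.remove(odcs_path)
```

This keeps the contract in DynamoDB as the source of truth and treats the file as a throwaway adapter, at the cost of a disk write per contract load.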
