Any way to programmatically construct a DataContract from an ODCS string? #805
Closed
Description
We want to read data into a Spark DataFrame whose schema is predefined by a data contract.
We define our data contracts in ODCS and store them in a DynamoDB database behind an API.
We would therefore like to create a DataContract() object from an ODCS string.
We have been able to do this with a Data Contract Specification string, but not with an ODCS string.
Is there a way to do this? Right now it seems that the ODCS import only works with files, not strings.
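One possible stopgap, assuming the ODCS import really does require a file path, is to write the string fetched from the API to a temporary file and hand that path to the file-based import. The sketch below only covers the temp-file part using the standard library; the helper name `yaml_string_as_file` is hypothetical, and the actual import call is left out because its exact signature is not confirmed here.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def yaml_string_as_file(yaml_text, suffix=".yaml"):
    """Write yaml_text to a temporary file and yield its path.

    Hypothetical helper: the file is deleted when the with-block exits,
    so any file-based import has to run inside the block.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        # Take ownership of the descriptor so it is closed before the path is used.
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(yaml_text)
        yield path
    finally:
        os.remove(path)
```

Usage would look like `with yaml_string_as_file(odcs_yaml) as path: ...`, with the file-based ODCS import called on `path` inside the block.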
from datacontract.data_contract import DataContract
# These types must be in scope because eval() below resolves them by name.
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, DateType, NullType, DecimalType

# data_contract_for_validate = DataContract(data_contract_str=data_contract_yaml, spark=spark_session)
data_contract_for_validate = DataContract(data_contract_str=data_contract_only_strings_yaml, spark=spark_session)

# Take everything after the first "=" in the Spark export and evaluate it to get a StructType.
target_struct = eval(data_contract_for_validate.export(export_format="spark").split("=", 1)[1].strip())

# Read the CSV with the contract's schema instead of inferring one.
source_df = spark_session.read.csv(
    "s3://datalake/landing-zone-test/operations_vs_requests_order_summary/ORDER_SUMMARY_IN_PROCESS_no_tabs.csv",
    header=True,
    mode="PERMISSIVE",
    inferSchema=False,
    schema=target_struct,
    columnNameOfCorruptRecord="_corrupt_record",
)
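Since the snippet above runs eval() on exported text, here is a minimal sketch of one way to narrow what that text can reference: evaluate only the right-hand side, with builtins hidden and an explicit set of allowed names. The function name `eval_assignment_rhs` is hypothetical, it assumes the export contains a single `name = expression` assignment, and restricting the namespace reduces the risk without making eval() safe for untrusted input.

```python
def eval_assignment_rhs(source, allowed_names):
    """Evaluate the right-hand side of a 'name = expression' string.

    Only the entries of allowed_names are visible to the expression;
    Python builtins are hidden. Hypothetical helper, not part of any library.
    """
    _, sep, rhs = source.partition("=")
    if not sep:
        raise ValueError("expected text of the form 'name = expression'")
    # An empty __builtins__ mapping hides open(), __import__(), and the rest.
    return eval(rhs.strip(), {"__builtins__": {}}, dict(allowed_names))
```

For the Spark case, `allowed_names` would be a dict such as `{"StructType": StructType, "StructField": StructField, "StringType": StringType}`, extended with whichever types the exported schema uses.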