All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- n/a
- Update `examples/item_pytorch_geo_unet.json` to align with UNet band values defined by `torchgeo==0.8.1` (see torchgeo @ 41411d4).
- n/a
- n/a
- Fix resolution (and restrictions) of `torch`, `torchvision` and `torchgeo` dependencies in `stac-model` package to allow backward-compatibility with older `python<=3.10` versions. Examples using `torchgeo.models.unet` are limited to `python>=3.11` versions, but others remain backward-compatible.
- Fix `MLModelExtension.from_torch` inference of `value_scaling` values from `torch` model definition when the transforms are based on `torchvision.transforms` utilities rather than `kornia.augmentation`.
- Fix `value_scaling` values enforced to `int`. They will now accept `float` values as well, and will apply `int` only when the resulting value is equivalent to an integer (i.e.: decimal `.0`) to align with JSON schema parsers.
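
The `int`/`float` rule stated in the `value_scaling` fix can be illustrated with a minimal sketch. The helper name `normalize_scaling_value` is hypothetical and only mirrors the rule as worded in this entry; it is not taken from the `stac-model` source.

```python
from typing import Union

Number = Union[int, float]


def normalize_scaling_value(value: Number) -> Number:
    """Hypothetical helper: cast to int only when the float has no fractional part."""
    if isinstance(value, float) and value.is_integer():
        # e.g. 255.0 is stored as 255, as an integer-typed JSON schema would expect
        return int(value)
    # Non-integral floats (and values that are already int) are kept unchanged
    return value
```

With this sketch, `255.0` becomes the integer `255`, while `0.5` stays a float.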
- Add optional `description` field to `ProcessingExpression` to allow additional information about pre-processing or post-processing functions.
- n/a
- n/a
- n/a
- Fix case in `stac_model.torch.export` where `metadata` is `None`.
- Fix `stac_model.torch.export` docs to correct `metadata_path` to `metadata` in `README_STAC_MODEL.md`.
- Add explicit JSON schema validation checks for `bands` and `variables` definitions against corresponding `"bands"` and `"variables"` dimensions indicated in `dim_order` to ensure they are coherent. Because of this validation, the `"bands"` and `"variables"` dimension names are now considered reserved for this purpose and cannot be employed as general dimension names without accompanying `bands` or `variables` definitions.
- Add `item_pytorch_geo_unet.json` generated using the example `unet_mlm()` function, which parses a TorchGeo UNet model with `SENTINEL2_2CLASS_NC_FTW` default weights.
- Add `MLModelExtension.from_torch` class method to convert a PyTorch model and optional TorchVision-style weights into a STAC Item with ML Model Extension metadata (fixes #70).
- Add torch export and packaging utilities for combining a model, transforms, and MLM schema compliant metadata into a single `.pt2` archive.
- Add `cpu` device type to `mlm:accelerator` and corresponding `stac_model.runtime.AcceleratorEnum`.
- Add `pre_processing_function` and `post_processing_function` support as JSON array of Processing Expression definitions respectively for the Model Input Object and Model Output Object (fixes #93).
- Add official Python 3.13 support to the CI workflow and package release.
- Add `examples/torch/mlm-metadata.yaml` example that provides a minimal metadata example for a PyTorch model which can be validated using the MLM Schema without the need to be fully compliant with the STAC Specification.
- Add `ModelDataVariable` to `stac_model` for corresponding `mlm:input` and `mlm:output` definitions as the JSON schema.
- Add `variables` properties to Model Input Object to allow specifying the relevant data variables used by the model, with cross-references to the datacube extension (relates to #90).
- Add `bands` and `variables` properties to Model Output Object to allow specifying the relevant bands or variables produced by the model if any applies.
- Add `downscaling` to Tasks as common operation for climate variable models.
- Add ML-Model Legacy document providing migration guidance from the deprecated ML-Model extension (relates to stac-extensions/ml-model#16).
- Move DLM Legacy document.
- Add `embedding` as suggested dimension name (relates to #77).
- Add `huggingface/safetensors` recommendations for `mlm:artifact_type` and corresponding `mlm:framework` values (fixes #68).
- Add `Flax` to the list of `mlm:framework` and the corresponding `mlm:artifact_type` SafeTensors backend in the JSON schema examples.
- Add `Paddle` to the list of `mlm:framework` (fixes #69).
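
The reserved `"bands"` and `"variables"` dimension names from the first entry of this list can be sketched as a plain Python check. This is an illustration only, not the JSON schema logic itself: the function name `find_incoherent_dims` is hypothetical, and the input layout (a `dim_order` list nested under an `input` key next to `bands` and `variables`) is an assumption about how a Model Input Object is structured.

```python
from typing import Any, Dict, List

# Dimension names described as reserved in the entry above
RESERVED_DIMS = ("bands", "variables")


def find_incoherent_dims(model_input: Dict[str, Any]) -> List[str]:
    """Hypothetical check: list reserved dimensions used without a matching definition."""
    dim_order = model_input.get("input", {}).get("dim_order", [])
    # A reserved name in dim_order needs a non-empty list under the same key
    return [name for name in RESERVED_DIMS if name in dim_order and not model_input.get(name)]
```

An empty result means every reserved dimension in `dim_order` has an accompanying definition.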
- Enforce `roles: ["code"]` (minimally) to be included if an Asset specifies `mlm:entrypoint`.
- Update `stac-model==0.4.0` to provide corresponding additions for `variables` reference.
- Refactor `ModelInput` and `ModelOutput` objects to use a new `ModelBandsOrVariablesReferences` definition combining the `ModelBand` and `ModelDataVariable` lists.
- Moved `ModelBand` from `stac_model.input` to `stac_model.base` since it is now required by both `ModelInput` and `ModelOutput` objects.
- Refactor the JSON schema to check for `bands` and `variables` references within both `mlm:input` and `mlm:output`. If either location detects that either `bands` or `variables` is provided, their corresponding sets of extensions providing relevant descriptions are verified.
- Refactor the JSON schema `mlm:output` property to employ a `ModelOutput` object definition rather than directly provided properties nested under the array.
- Refactor the JSON schema to allow the omission of `bands` under `mlm:input` if the `variables` property is provided.
- Make `total_parameters` optional in `stac-model` and enforce greater than 0 to match with JSON-schema (applied in #101).
- Update `stac-model==0.3.0` to provide `ValueScalingObject` from installed package.
- n/a
- n/a
- Fix JSON schema not allowing `mlm:entrypoint` to be defined under a Source Code Asset.
- Fix `stac_model.output.ModelOutput` enforcing the need to specify `classification:classes` or `classes`. The property can now be omitted if the model does not need to indicate that it produces a classification output.
- Fix missing `encoding="utf-8"` parameters in `open` calls leading to failing parsing of example JSON STAC Item when they contain non-ASCII characters.
- Add better descriptions about required and recommended MLM Asset Roles and their implications (fixes #54).
- Add explicit check of `value_scaling` sub-fields `minimum`, `maximum`, `mean`, `stddev`, etc. for corresponding `type` values `min-max` and `z-score` that depend on it.
- Allow different `value_scaling` operations per band/channel/dimension as needed by the model.
- Allow a `processing:expression` for a band/channel/dimension-specific `value_scaling` operation, granting more flexibility in the definition of input preparation in contrast to having it applied for the entire input (but still possible).
- Add optional `mlm:compile_method` field at the Asset level with options `aot` for Ahead of Time Compilation, `jit` for Just-In Time Compilation.
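
To show how the `type` of a `value_scaling` object depends on its sub-fields, and how one object per band could be applied, here is a minimal sketch. It assumes the conventional formulas for `min-max` and `z-score` rescaling; the function names `apply_value_scaling` and `scale_bands` are hypothetical and not part of `stac-model`.

```python
from typing import Any, Dict, List


def apply_value_scaling(values: List[float], scaling: Dict[str, Any]) -> List[float]:
    """Hypothetical helper: rescale values using the sub-fields required by the scaling type."""
    kind = scaling["type"]
    if kind == "min-max":
        # Assumed formula: (x - minimum) / (maximum - minimum)
        low, high = float(scaling["minimum"]), float(scaling["maximum"])
        return [(v - low) / (high - low) for v in values]
    if kind == "z-score":
        # Assumed formula: (x - mean) / stddev
        mean, stddev = float(scaling["mean"]), float(scaling["stddev"])
        return [(v - mean) / stddev for v in values]
    raise ValueError(f"unsupported value_scaling type in this sketch: {kind!r}")


def scale_bands(bands: List[List[float]], scalings: List[Dict[str, Any]]) -> List[List[float]]:
    """Apply one value_scaling object per band, in order."""
    if len(bands) != len(scalings):
        raise ValueError("expected one value_scaling object per band")
    return [apply_value_scaling(band, scaling) for band, scaling in zip(bands, scalings)]
```

A missing sub-field (for example `maximum` with `type: min-max`) raises a `KeyError` in this sketch, which loosely mirrors what the explicit schema check is meant to catch.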
- Explicitly disallow `mlm:name`, `mlm:input`, `mlm:output` and `mlm:hyperparameters` at the Asset level. These fields describe the model as a whole and should therefore be defined in Item properties.
- Moved `norm_type` to `value_scaling` object to better reflect the expected operation, which could be another operation than what is typically known as "normalization" or "standardization" techniques in machine learning.
- Moved `statistics` to `value_scaling` object to better reflect their mutual `type` and additional properties dependencies.
- Moved `mlm:artifact_type` field value descriptions that are framework specific to best-practices section.
- Expanded suggested `mlm:artifact_type` values to include Tensorflow/Keras.
- n/a
- Removed `norm_type` enum values that were ambiguous regarding their expected result. Instead, a `processing:expression` should be employed to explicitly define the calculation they represent.
- Removed `norm_clip` property. It is now represented under `value_scaling` objects with a corresponding `type` definition.
- Removed `norm_by_channel` from `mlm:input` objects. If rescaling (previously normalization in the documentation) is a single value, broadcasting to the relevant bands should be performed implicitly. Otherwise, the number of `value_scaling` objects should match the number of bands or channels involved in the input.
- Fix missing `mlm:artifact_type` property check for a Model Asset definition (fixes #42). The `mlm:artifact_type` is now mutually and exclusively required by the corresponding Asset with `mlm:model` role.
- Fix check of disallowed unknown/undefined `mlm:`-prefixed fields (fixes #41).
- Add `raster:bands` required property `name` for describing `mlm:input` bands (see README - Bands and Statistics for details).
- Add README warnings about new extension `eo` and `raster` versions.
- Split `ModelBands` and `AnyBandsRef` definitions in the JSON schema to allow them to be referenced individually.
- Move `AnyBandsRef` definition explicitly to STAC Item JSON schema, rather than implicitly inferred via `mlm:input`.
- Modified the JSON schema to use an `if` check of the `type` (STAC Item or Collection) prior to validating further properties. This allows some validators (e.g. `pystac`) to better report the real error that causes the schema to fail, rather than reporting the first mismatching `type` case with a poor error description to debug the issue.
- n/a
- Removed `$comment` entries from the JSON schema that are considered as invalid by some parsers.
- When `mlm:input` objects do NOT define band references (i.e.: `bands: []` is used), the JSON schema will not fail if an Asset with the `mlm:model` role contains a band definition. This is to allow MLM model definitions to simultaneously use some inputs with `bands` reference names while others do not.
- Band checks against `eo`, `raster` or STAC Core 1.1 `bands` when a `mlm:input` references names in `bands` are now properly validated.
- Fix the examples using `raster:bands` incorrectly defined in STAC Item properties. The correct use is for them to be defined under the STAC Asset using the `mlm:model` role.
- Fix the EuroSAT ResNet pydantic example that incorrectly referenced some `bands` in its `mlm:input` definition without providing any definition of those bands. The `eo:bands` properties have been added to the corresponding `model` Asset using the `pystac.extensions.eo` utilities.
- Fix various STAC Asset definitions erroneously employing `mlm:model` role instead of the intended `mlm:source_code`.
- Add the missing JSON schema `item_assets` definition under a Collection to ensure compatibility with the Item Assets extension, as mentioned in this specification.
- Add `ModelBand` representation using `name`, `format` and `expression` properties to allow derived band references (fixes crim-ca/mlm-extension#7).
- Adds a job to `.github/workflows/publish.yaml` to publish the `stac-model` package to PyPI.
- n/a
- Field `mlm:name` requirement to be unique. There is no way to guarantee this from a single Item's definition and their JSON schema validation. For uniqueness requirement, users should instead rely on the `id` property of the Item, which is ensured to be unique under the corresponding Collection, since it would not be retrievable otherwise (i.e.: `collections/{collectionID}/items/{itemID}`).
- Fix the validation strategy of the `mlm:model` role required by at least one Asset under a STAC Item. Although the role requirement was validated, the definition did not allow for other Assets without it to exist.
- Correct `stac-model` version in code and publish matching release on PyPI.
- Add pattern for `mlm:framework`, needing at least one alphanumeric character, without leading or trailing non-alphanumeric characters.
- Add `examples/item_eo_and_raster_bands.json` demonstrating the original use case represented by the previous `examples/item_eo_bands.json` contents.
- Add a `description` field for `mlm:input` and `mlm:output` definitions.
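
The `mlm:framework` constraint described in this list (at least one alphanumeric character, no leading or trailing non-alphanumeric characters) can be approximated with a regular expression. The pattern below is an assumption written from that description, not the pattern used in the JSON schema, and `is_valid_framework` is a hypothetical helper.

```python
import re

# Assumed pattern: starts and ends with an ASCII alphanumeric character,
# with anything (spaces, hyphens, dots, ...) allowed in between.
FRAMEWORK_PATTERN = re.compile(r"[A-Za-z0-9](?:.*[A-Za-z0-9])?")


def is_valid_framework(name: str) -> bool:
    """Hypothetical check mirroring the described mlm:framework constraint."""
    return FRAMEWORK_PATTERN.fullmatch(name) is not None
```

Names such as `scikit-learn` or `Hugging Face` pass this check, while an empty string or a name with a leading or trailing hyphen does not.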
- Adjust `scikit-learn` and `Hugging Face` framework names to match the format employed by the official documentation.
- n/a
- Removed combination of `mlm:input` with `bands: null` that could never occur due to pre-requirement of `type: array`.
- Fix `AnyBands` definition and use in the JSON schema to better consider possible use cases with `eo` extension.
- Fix `examples/item_eo_bands.json` that was incorrectly also using `raster` extension. This is not fundamentally wrong, but it did not allow to validate the `eo` extension use case properly, since the `raster:bands` reference caused a bypass for the `mlm:input[*].bands` to succeed validation.
- more Task Enum tasks
- Model Output Object
- `batch_size` and hardware summary
- `mlm:accelerator`, `mlm:accelerator_constrained`, `mlm:accelerator_summary` to specify hardware requirements for the model
- Use common metadata Asset Object to refer to model asset and source code.
- use `classification:classes` in Model Output
- add `scene-classification` to the Enum Tasks to allow disambiguation between pixel-wise and patch-based classification
- `disk_size` replaced by `file:size` (see Best Practices - File Extension)
- `memory_size` under `dlm:architecture` moved directly under Item properties as `mlm:memory_size`
- replaced all hardware/accelerator/runtime definitions into distinct `mlm` fields directly under the STAC Item properties (top-level, not nested) to allow better search support by STAC API.
- reorganized `dlm:architecture` nested fields to exist at the top level of properties as `mlm:name`, `mlm:summary` and so on to provide STAC API search capabilities.
- replaced `normalization:mean`, etc. with statistics from STAC 1.1 common metadata
- added `pydantic` models for internal schema objects in `stac_model` package and published to PyPI
- specified `rel_type` to be `derived_from` and specify how model item or collection json should be named
- replaced all Enum Tasks names to use hyphens instead of spaces
- replaced `dlm:task` by `mlm:tasks` using an array of values instead of a single one, allowing models to represent multiple tasks they support simultaneously or interchangeably depending on context
- replace `pre_processing_function` and `post_processing_function` to use similar definitions to the Processing Extension - Expression Object such that more extended definitions of custom processors can be defined.
- updated JSON schema to reflect changes of MLM fields
- any `dlm`-prefixed field or property
- Data Object, replaced with Model Input Object that uses the `name` field from the common metadata band object which also records `data_type` and `nodata` type
- n/a
- Added example model architecture summary text.
- Modified `$id` of the extension schema to refer to the expected location when eventually released (https://schemas.stacspec.org/v1.0.0-beta.3/extensions/dl-model/json-schema/schema.json).
- Replaced `dtype` field by `data_type` to better align with the corresponding field of `raster:bands`.
- Replaced `nodata_value` field by `nodata` to better align with the corresponding field of `raster:bands`.
- Refactored schema to use distinct definitions and references instead of embedding all objects within `dl-model` properties.
- Allow schema to contain other `dlm:`-prefixed elements using `patternProperties` and explicitly deny other `additionalProperties`.
- Allow `class_name_mapping` to be directly provided as a mapping of index-based properties and class-name values.
- Specifying `class_name_mapping` by array is deprecated. Direct mapping as an object of index to class name should be used. For backward compatibility, mapping as array and using nested objects with `index` and `class_name` properties is still permitted, although overly verbose compared to the direct mapping.
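
For anyone migrating from the deprecated array form, the two `class_name_mapping` shapes described above can be related with a short conversion sketch. The helper `to_direct_mapping` is hypothetical, and using string keys for the indices is an assumption based on JSON object keys always being strings.

```python
from typing import Any, Dict, List, Union

ArrayForm = List[Dict[str, Any]]  # e.g. [{"index": 0, "class_name": "water"}]
DirectForm = Dict[str, str]       # e.g. {"0": "water"}


def to_direct_mapping(mapping: Union[ArrayForm, Dict[Any, str]]) -> DirectForm:
    """Hypothetical helper: convert the deprecated array form into the direct mapping."""
    if isinstance(mapping, dict):
        # Already a direct mapping; only normalize the keys to strings
        return {str(index): name for index, name in mapping.items()}
    return {str(entry["index"]): entry["class_name"] for entry in mapping}
```

Both forms then yield the same object of index to class name.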
- Field `nodata_value`.
- Field `dtype`.
- Fixed references to other STAC extensions to use the official schema links on `https://stac-extensions.github.io/`.
- Fixed examples to refer to local files.
- Fixed formatting of tables and descriptions in README.
- Initial release of the extension description and schema.
- n/a
- n/a
- n/a
- n/a