442 cloud optimise flow refactor #261

Draft
thommodin wants to merge 19 commits into main from
442-cloud-optimise-flow-refactor

Conversation

@thommodin
Contributor

This pull request introduces several improvements and fixes across configuration handling, timestamp processing, dependency management, and testing. The changes enhance robustness for timestamp conversion, improve configuration validation (especially for CSV ingestion), update the dependency list, and add new tests and documentation. Below are the most important changes grouped by theme:

Configuration and Validation Improvements:

  • Added a new csv_config section to dataset configuration files (e.g., animal_acoustic_tracking_delayed_qc.json). Validation ensures that only one of pandas_read_csv_config or polars_read_csv_config is provided, and the documentation now spells out the mutually exclusive options and the validation rules.
  • Updated schema validation in schema_validation_parquet.json so that the year_range and coiled_cluster_options fields accept null values in addition to their original types.
  • Removed the unused spatial_extent block from the mooring_temperature_logger_delayed_qc.json dataset config, likely to avoid redundancy or potential conflicts.
  • Changed force_previous_parquet_deletion to true in aggregated_seabird_nonqc.json, ensuring previous Parquet files are deleted before processing.
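
The mutual-exclusion rule for csv_config can be sketched as follows. This is a minimal illustration using only the standard library, not the model shipped in this PR: the class name CsvConfig, the engine property, and the choice to allow neither option being set are assumptions made for the example. Only the two field names come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CsvConfig:
    """Hypothetical stand-in for the csv_config section of a dataset config."""

    pandas_read_csv_config: Optional[Dict[str, Any]] = None
    polars_read_csv_config: Optional[Dict[str, Any]] = None

    def __post_init__(self) -> None:
        # Reject configs that set both readers at once.
        # Assumption: leaving both unset is allowed; the PR text does not say.
        if (
            self.pandas_read_csv_config is not None
            and self.polars_read_csv_config is not None
        ):
            raise ValueError(
                "csv_config: provide only one of pandas_read_csv_config "
                "or polars_read_csv_config"
            )

    @property
    def engine(self) -> Optional[str]:
        """Name of the selected reader, or None if neither is configured."""
        if self.pandas_read_csv_config is not None:
            return "pandas"
        if self.polars_read_csv_config is not None:
            return "polars"
        return None


# A config that selects the pandas reader passes validation.
ok = CsvConfig(pandas_read_csv_config={"sep": ";"})
print(ok.engine)
```

Constructing CsvConfig with both fields set raises ValueError at construction time, so an invalid dataset config fails before any file is read.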

Timestamp Handling and Code Robustness:

  • Refactored the timestamp conversion logic in GenericParquetHandler.py to normalise timestamps to nanosecond resolution with .as_unit("ns"), ensuring consistent behavior across different h5py versions and platforms.
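
As a rough illustration of what normalising to nanosecond resolution looks like, the sketch below uses the pandas as_unit method (available from pandas 2.0). It is not the code from GenericParquetHandler.py; the sample values and variable names are made up for the example.

```python
import pandas as pd

# A timestamp stored at second resolution, as can happen depending on
# how the value was decoded upstream.
coarse = pd.Timestamp("2024-01-01T12:00:00").as_unit("s")

# Normalise to nanoseconds: same instant, different storage resolution.
fine = coarse.as_unit("ns")
print(coarse.unit, fine.unit)
print(fine == coarse)

# The same method exists on DatetimeIndex, so whole columns can be
# brought to a single resolution before writing.
index = pd.DatetimeIndex(["2024-01-01", "2024-01-02"]).as_unit("s")
index_ns = index.as_unit("ns")
print(index_ns.dtype)
```

Pinning the unit explicitly means downstream code sees one datetime dtype regardless of the resolution the input arrived in.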

Dependency and Compatibility Updates:

  • Added pydantic>=2.12.5 to the project dependencies and updated the Python version constraint to <3.15 in pyproject.toml, improving compatibility and validation capabilities.
  • Minor dependency cleanup and reordering in pyproject.toml for clarity and maintainability.

Testing and Platform Support:

  • Added a new conftest.py to set DYLD_LIBRARY_PATH on macOS, ensuring tests can find Homebrew-installed libraries during collection.
  • Introduced a new unit test module test_orchestrate_generate.py for the generate and optimise functions, improving test coverage for core orchestration logic.
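
A conftest.py hook of the kind described above might look like the sketch below. The actual file in this PR is not shown here, so the function name, its parameters, and the list of directories are assumptions; /opt/homebrew/lib and /usr/local/lib are the usual Homebrew library locations on Apple Silicon and Intel Macs respectively.

```python
import os
import sys

# Usual Homebrew library locations (Apple Silicon first, then Intel).
HOMEBREW_LIB_DIRS = ("/opt/homebrew/lib", "/usr/local/lib")


def extend_dyld_library_path(
    environ,
    platform=sys.platform,
    lib_dirs=HOMEBREW_LIB_DIRS,
    exists=os.path.isdir,
):
    """Prepend existing Homebrew lib dirs to DYLD_LIBRARY_PATH on macOS.

    Hypothetical helper: on any other platform the environment is left
    untouched. Returns the resulting value (or None if unset).
    """
    if platform != "darwin":
        return environ.get("DYLD_LIBRARY_PATH")

    current = [p for p in environ.get("DYLD_LIBRARY_PATH", "").split(":") if p]
    extra = [d for d in lib_dirs if exists(d) and d not in current]
    if extra:
        environ["DYLD_LIBRARY_PATH"] = ":".join(extra + current)
    return environ.get("DYLD_LIBRARY_PATH")


# In a conftest.py this call would sit at module level so that it runs
# before pytest imports the test modules during collection.
extend_dyld_library_path(os.environ)
```

Passing the environment, platform, and existence check as parameters keeps the helper testable on non-macOS CI runners.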

Internal Code Quality:

  • Improved resource handling in config.py by using the as_file context manager to load configuration files, increasing reliability and compatibility with different importlib resource backends.
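
The as_file pattern mentioned above can be sketched with the standard importlib.resources API (Python 3.9+). The helper name load_json_resource and its arguments are hypothetical and are not taken from config.py.

```python
import json
from importlib.resources import as_file, files


def load_json_resource(package: str, resource_name: str) -> dict:
    """Load a JSON file bundled inside an importable package.

    files() returns a Traversable that may not be a real filesystem path
    (for example when the package is imported from a zip archive).
    as_file() yields a concrete path for the duration of the with-block,
    extracting to a temporary file if it has to.
    """
    resource = files(package).joinpath(resource_name)
    with as_file(resource) as path:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
```

Because the path is only guaranteed to exist inside the with-block, the file is read and parsed before the context manager exits.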

@codecov-commenter

Codecov Report

❌ Patch coverage is 82.35294% with 174 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@ccef564). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| .../bin/config/model/parquet_schema_transformation.py | 72.52% | 50 Missing ⚠️ |
| ..._cloud_optimised/bin/orchestrate/file_collector.py | 39.68% | 38 Missing ⚠️ |
| ...dn_cloud_optimised/bin/config/model/path_config.py | 75.51% | 24 Missing ⚠️ |
| ...cloud_optimised/bin/config/model/dataset_config.py | 82.17% | 23 Missing ⚠️ |
| ...sed/bin/config/model/zarr_schema_transformation.py | 80.35% | 22 Missing ⚠️ |
| ...n_cloud_optimised/bin/config/model/run_settings.py | 87.87% | 8 Missing ⚠️ |
| test_aodn_cloud_optimised/conftest.py | 33.33% | 4 Missing ⚠️ |
| ...oud_optimised/bin/config/model/csv_config_model.py | 88.00% | 3 Missing ⚠️ |
| aodn_cloud_optimised/bin/orchestrate/content.py | 94.44% | 1 Missing ⚠️ |
| ..._aodn_cloud_optimised/test_orchestrate_generate.py | 96.55% | 1 Missing ⚠️ |

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #261   +/-   ##
=======================================
  Coverage        ?   68.71%           
=======================================
  Files           ?       47           
  Lines           ?     5786           
  Branches        ?        0           
=======================================
  Hits            ?     3976           
  Misses          ?     1810           
  Partials        ?        0           
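
The figures in this report can be cross-checked with a few lines of arithmetic. The sketch below only reuses numbers quoted in the report; the assumption that the project percentage is truncated to two decimals rather than rounded is mine, made because truncation reproduces the displayed 68.71%.

```python
import math

# Project totals from the coverage diff.
hits, misses, lines = 3976, 1810, 5786

# Per-file missing-line counts from the table of files.
missing_per_file = [50, 38, 24, 23, 22, 8, 4, 3, 1, 1]

# Hits and misses should add up to the total line count.
totals_consistent = hits + misses == lines

# Project coverage as a percentage, truncated to two decimals (assumption).
project_pct = math.floor(100 * 100 * hits / lines) / 100

# The per-file counts should add up to the 174 missing patch lines.
patch_missing = sum(missing_per_file)

# Patch size implied by 174 missing lines at 82.35294% patch coverage.
implied_patch_lines = round(patch_missing / (1 - 82.35294 / 100))

print(totals_consistent, project_pct, patch_missing, implied_patch_lines)
```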

