Remove upper bound for azure-datalake-store dependency #526
hutch3232 wants to merge 2 commits into fsspec:main
|
Hi @hutch3232. Thanks for the PR! I'm hesitant to remove the upper bound cap completely as it protects That being said Azure Data Lake Storage Gen 1 has been retired since February 2024 and @hutch3232 could you also elaborate more on why you are trying to use |
|
Thanks, @kyleknap. To be honest, I don't have an immediate need to use the new version of the dependency. I'm new to Azure and was poking around to try to understand how all these packages fit together and I was surprised when my resolver wasn't getting me the latest version I had seen on GitHub. Found that |
|
We'd love this to be merged and released @kyleknap -> it holds us back in Airflow from upgrading the
This is far better than the current "they are limited by you not to upgrade it even if they need it for something else" - this is precisely the issue we have in Airflow. Even if our users do not use Azure with fsspec, Airflow itself (and particularly the microsoft-azure provider) uses it for other things, so the sheer fact that we use fsspec limits us from upgrading the Azure library. So my recommendation is to accept the "no upper-binding" approach for those libraries, and also likely to add the missing tests for adlfs. It's not your users' fault that there are no tests, and, well, as maintainers of the library you chose to depend on it (as a required dependency) and expose Azure functionality through adlfs. And it's no secret what has changed: https://github.com/Azure/azure-data-lake-store-python/releases
But how it impacts your code, it's likely an assessment maintainers of This is the code comparison: Azure/azure-data-lake-store-python@v0.0.53...v1.0.1 -> it does not seem like a lot, so I guess, knowing the integration points there, it should be easy to assess |
|
@hutch3232 @potiuk Thanks for the feedback here. For the short term, I'd still prefer for now just increasing the ceiling to the major version Long term, I'd actually prefer |
Sounds good. I was about to propose that as an option as well, but I did not know how strong a tie it has with adlfs |
|
As discussed, I've added back the cap but bumped it to 2.0.0. Agreed it'd be great to remove this dependency entirely once ADLS Gen 1 support is dropped. |
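For readers following along, here is a minimal sketch of what an exclusive upper bound of 2.0.0 admits. The helper names `release_tuple` and `allowed_by_cap` are hypothetical and only handle plain numeric versions like `1.0.1`. Real resolvers follow PEP 440 (pre-releases, post-releases, and so on), so treat this as an illustration of the idea, not of how pip evaluates the actual specifier in adlfs.

```python
def release_tuple(version: str) -> tuple[int, ...]:
    """Parse a plain release version such as '1.0.1' into (1, 0, 1).

    Assumption: the version contains only dot-separated integers.
    """
    return tuple(int(part) for part in version.split("."))


def allowed_by_cap(version: str, cap: str = "2.0.0") -> bool:
    """Return True if `version` is strictly below the exclusive upper bound `cap`."""
    return release_tuple(version) < release_tuple(cap)


# Versions mentioned in this thread, plus the cap itself.
for candidate in ("0.0.53", "1.0.1", "2.0.0"):
    print(candidate, allowed_by_cap(candidate))
```

With the cap at 2.0.0, both 0.0.53 and 1.0.1 pass the check, while 2.0.0 itself is excluded because the bound is exclusive.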
|
Should we merge/release it? |
|
The diff looks good, but before merging it, I'd like to pull it down and try to vet whether the upgrade has any negative impact, especially since there are no tests for it. I'm hoping to do that in the next few days. |
Maybe we can help with that - if you have a version of azure-data-lake-store (say alpha/beta/rc) that we can test, we can run it through our (Apache Airflow) testing suite for our provider - and maybe other users can be asked for it as well. We do not have full coverage of it, of course, but your own testing might be limited, so combining input from multiple users could help |
|
@hutch3232 Thanks! That sounds good to me. I'm thinking we just use the latest version (1.0.1). That should suffice. Does your test suite include end to end tests that make API requests to Azure DataLake Gen 1? I have not actually used Gen 1 before so I'm curious with its retirement how much of the API we will be able to use in testing its filesystem class. |
Good question: We do have system tests that test the "real" Azure service - but likely not datalake-store https://github.com/apache/airflow/tree/main/providers/microsoft/azure/tests/system/microsoft/azure - and those rely on someone who would like to run them. We tried to encourage Microsoft to take stewardship of the azure provider, with limited success so far unfortunately, so if the tests are really "end-2-end" we likely cannot help much. Unless of course we can get hold of the azure-datalake-store team, who could be interested in spending time on contributing to and testing the system test suite. |
|
@hutch3232 Got it. Thanks for the context. I still think running it through your test suite would be helpful. Mainly curious on whether you had an end to end set up. I plan to try to get a working setup when I pull down the PR and see how far I get. |
|
@hutch3232 @potiuk Just giving an update here. I pulled down the PR and tried out the upgrade of Furthermore, I reached out within Microsoft and confirmed that there are no customers using Gen 1 and it is not possible to create new Gen 1 stores to test this manually. At this point, I'm leaning toward pivoting away from upgrading the dependency and spending the cycles on just removing Gen 1 support from adlfs. The main reason: while there should not be anyone making API calls to Gen 1, technically someone could be instantiating the Gen 1 filesystem successfully and just not using it. So, if we are causing churn, I'd prefer it to be deliberate (e.g., a deprecation warning in one release and removal in a subsequent one) rather than surfacing as unclear attribute errors or auth issues. Thoughts? |
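As a rough illustration of the "deprecation warning in one release, removal in a subsequent one" approach, here is a minimal sketch using Python's standard `warnings` module. The class name `LegacyGen1FileSystem` and the message text are hypothetical placeholders, not the actual adlfs class or wording. Emitting the warning in `__init__` is meant to reach users who only instantiate the filesystem without ever calling it.

```python
import warnings


class LegacyGen1FileSystem:
    """Hypothetical stand-in for a filesystem class slated for removal."""

    def __init__(self, store_name: str):
        # Warn at construction time so that users who instantiate the class
        # without making any API calls still see the notice.
        warnings.warn(
            "LegacyGen1FileSystem is deprecated and will be removed in a "
            "future release.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        self.store_name = store_name


# Capture the warning explicitly; DeprecationWarning may be hidden by default
# depending on where the call originates.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fs = LegacyGen1FileSystem("example-store")

print(caught[0].category.__name__, "-", caught[0].message)
```

The class keeps working in the warning release, which gives users one release cycle to notice and migrate before the removal lands.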
|
Just sent up a PR to update the deprecation messaging on the plans for removing ADLS Gen 1 functionality: #529 |

I noticed there is a 1.x.x release (June 2025) of this dependency, but this upper bound prevents its use. However, it is already being tested against (the latest.txt requirements are uncapped).