Skip to content

Conversation

@StephenBrown2
Copy link
Contributor

Resolves #150 and adds tests based on deviations from 2003 spec: https://unicode.org/reports/tr46/#Deviations

This also enables Unicode® Technical Standard #46 to normalize capital letters and such, and a test based on the example given in the module documentation.

Since the idna encode() function chokes on colons, I've split and rejoined the authority string and added to the tests to make sure the port stays as expected.

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for opening this @StephenBrown2! I've marked a few issues, I think we should look at how urrlib3 handles this case or explore using rfc3986.IRIReference to handle it for us.

@StephenBrown2
Copy link
Contributor Author

Looks like rfc3986.IRIReference will in fact do exactly what we want, including handling colons in the authority. I've rebased on master to resolve the conflict with hsts... and switched to using rfc3986.api.iri_reference(url).encode() which will handle the IDNA Domain and return a URLReference object as expected.

Ref: convenience function - https://github.com/python-hyper/rfc3986/blob/df21c4b09078580c1cb10c728dc18c3111fcf603/src/rfc3986/api.py#L41
Relevant bit of IRIReference.encode() - https://github.com/python-hyper/rfc3986/blob/df21c4b09078580c1cb10c728dc18c3111fcf603/src/rfc3986/iri.py#L97-L129

httpx/models.py Outdated
) -> None:
if isinstance(url, rfc3986.uri.URIReference):
self.components = url
elif isinstance(url, rfc3986.iri.IRIReference):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be exposing extra types as part of this pull request.

(In fact I don't think we should be exposing rfc3986.uri.URIReference as part of our public interface in the first place - it'd be much better if we could keep that as an internal implementation detail, in order to allow us to completely switch implementations if needed.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #170

@lovelydinosaur
Copy link
Contributor

I guess we might also choose to switch our dependency listing to rfc3986[idna], instead of listing the two packages seperately?

That'd stop devs from having to do a "wait where are you actually using the idna package?" double take.

Copy link
Contributor

@sethmlarson sethmlarson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased and removed a test case that's no longer valid due to interface changes.

@lovelydinosaur
Copy link
Contributor

Yas - good stuff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use IDNA 2008 instead of IDNA 2003

4 participants