Implementation of APIv3#4863
Conversation
humitos
left a comment
There was a problem hiding this comment.
Some thoughts on the overall idea but also where write endpoints make sense.
| ~~~~~~ | ||
|
|
||
| The API can be accessed by using the HTTP header ``Authorization`` | ||
| with a specific ``Token`` that provide access to different resources. |
There was a problem hiding this comment.
We may probably allow a Cookie session also here.
There was a problem hiding this comment.
@agjohnson has a point about not supporting this that we should consider. He wrote something at #5009 (comment)
| and do require authentication even for requesting public information. | ||
|
|
||
| Scopes | ||
| ~~~~~~ |
There was a problem hiding this comment.
I need to keep thinking about the implementation of this, since given a Token the request won't be authenticated under a particular user but as a "token with limited permissions".
So, queryset for API endpoints will need to manage these two cases:
- projects where the user is admin (in case we support cookie sessions)
- projects where that token has specific permissions
There was a problem hiding this comment.
Why don't we want tokens tied to a user? A user might have tokens that have less than full permissions but I think we definitely want tokens tied to a user. Maybe I'm misunderstanding this.
There was a problem hiding this comment.
You are right. I didn't express myself properly. The 1) case is the same for cookie sessions than for a token user (without custom permission scopes).
Summarizing, we will have these authentication methods:
- regular session cookie (authenticated as a user)
- token tied to a user (authenticated as a user)
- token with specific permissions (not tied to a user)
- it could be project specific:
project:readonly indocsproject (multiple projects also?) - it could be all project from a user specific:
project:readfor all the projects belonging touser
- it could be project specific:
| * repo type | ||
| * version active | ||
| * version built | ||
| * all database? |
There was a problem hiding this comment.
I may want to perform a query like
/api/v3/project/?repo_url=https://github.com/pypa/pip
and get all the projects that are using that URL as source.
Or, all the Spanish projects under Read the Docs,
/api/v3/project/?language=es
but I don't want to allow listing all the projects without any kind of filtering.
There was a problem hiding this comment.
We need to be very careful on this. For projects (where we have ~100k) this might be ok. For builds or versions, we should probably only allow filtering on indexed fields (eg. slug, project__slug, etc.). Otherwise, this can result in API calls that are very heavy from a performance and load perspective.
| Project list | ||
| ++++++++++++ | ||
|
|
||
| .. http:get:: /api/v3/project/ |
There was a problem hiding this comment.
This should probably be called /api/v3/project/me/ or similar so we can allow some global filtering now or in the future but leaving the feature open.
| { | ||
| "href": "/api/v3/project/pip/users/", | ||
| "rel": "users", | ||
| "type": "GET" |
There was a problem hiding this comment.
I find this really verbose but very useful to discover the API and navigate over it.
I suppose that if we really do care about the amount of data returned here, we should re-consider GraphQL ;)
There was a problem hiding this comment.
I mean, you know the slug of your project, you hit this endpoint and you know everything you can do with the API and your project.
There was a problem hiding this comment.
Shouldn't type be method to be more precisely named? However, I question whether it is even needed. If you are just listing, it is always a GET.
There was a problem hiding this comment.
href, rel and type are the "standards" (or what most of the docs I read use).
I think we could remove it completely as you say, but I think that POST would also be allowed in this particular endpoint to add a Maintainer/Owner for this particular project for example.
|
|
||
| .. TODO: | ||
|
|
||
| Add query string filters to narrow the query: |
There was a problem hiding this comment.
?featured=true could be useful for the home page also.
| "type": "GET" | ||
| }, | ||
| { | ||
| "href": "/api/v3/project/pip/builds/", |
There was a problem hiding this comment.
I'm thinking that you can do a POST to this URL with something like:
{
"version": "latest"
}
and trigger a build for that project and version.
There was a problem hiding this comment.
and probably similar POST request for the rest of the endpoints listed here as links and GET.
/domains could add a new domain, /notifications a new webhook or email notification, etc
| Version detail | ||
| ++++++++++++++ | ||
|
|
||
| .. http:get:: /api/v2/project/(string:project-slug)/version/(string:version-slug)/ |
There was a problem hiding this comment.
By POSTing here, we could activate or disable the version, change the privacy level, etc
| "default_version": "stable", | ||
| "default_branch": "master", | ||
| "documentation_type": "sphinx_htmldir", | ||
| "canonical_url": "http://pip.pypa.io/en/stable/", |
| Project details | ||
| +++++++++++++++ | ||
|
|
||
| .. http:get:: /api/v3/project/(string:slug)/ |
There was a problem hiding this comment.
Everything should be accessed by string:slug or id:int.
There was a problem hiding this comment.
Do we really want/need this at all?
We are currently accessing by id to the different endpoint from projects/tasks.py from builders, but I think that this was built like this because there wasn't another way.
I'm not sure that accessing by ID is useful in our case.
There was a problem hiding this comment.
I don't think we need to expose the ID.
There was a problem hiding this comment.
👍 to not exposing ID. slug is the public facing unique identifier in my mind. Though it is possible that it could change over time (by an admin), but I think that's an edge case.
| "htmlzip": "//readthedocs.org/projects/pip/downloads/htmlzip/stable/", | ||
| "epub": "//readthedocs.org/projects/pip/downloads/epub/stable/" | ||
| }, | ||
| "links": [ |
There was a problem hiding this comment.
To build this structure, we could use something similar to https://www.django-rest-framework.org/tutorial/5-relationships-and-hyperlinked-apis/#hyperlinking-our-api
There was a problem hiding this comment.
I personally like the idea of links to show related endpoints to the current resource (and to get more detailed info about it) but the list structure does not convince me since if you want the builds url, you need to loop over the whole list. Maybe a dict is better.
On the other hand, using a dict will also require another nested dict for the method type, like:
"links": {
"builds": {
"GET": {
"href": URL,
...
}I'm not 100% convinced on the structure, but I think it's a good adoption to have these HATEOAS links here.
There was a problem hiding this comment.
Well... as there is not only one solution for this, I kept googling it and I found that what I'm suggesting here is something already adopted: https://slides.com/jcassee/pycon-sette-hypermedia#/2/6
This is the structure:
links:
rel:
href: URI
| { | ||
| "id": 7367364, | ||
| "date": "2018-06-19T15:15:59.135894", | ||
| "length": 59, |
There was a problem hiding this comment.
I think this field should be called duration
| "type": "GET" | ||
| }, | ||
| { | ||
| "href": "/api/v3/project/pip/notifications/", |
There was a problem hiding this comment.
All of these URLs can be expressed in the other way:
/api/v3/notifications/pip/
but for some endpoints doesn't look great. Example,
/api/v3/versions/pip/stable/
I'm not 100% on either, though. In this way it's clear that you will be returned a Version resource and /api/v3/project/pip/versions/stable/ maybe it's not that clear.
This document says something about this: https://geemus.gitbooks.io/http-api-design/content/en/requests/minimize-path-nesting.html
There was a problem hiding this comment.
Another comment here is that we are using a similar approach of nested URLs for the URLs site: https://readthedocs.org/dashboard/(project-slug)/version/(version-slug)/
There was a problem hiding this comment.
I would try hard to not have multiple endpoints that return essentially the same thing. I think that point is not that all API nesting is bad but more that it can go too far.
I prefer /api/v3/project/pip/notifications/ as an endpoint to /api/v3/notifications/pip/ (I would prefer /api/v3/notifications/?project__slug=pip to the 2nd one although not the 1st). Since I think /api/v3/notifications/ -- notifications for all projects -- is not particularly useful I think it's fine to only expose the endpoint that is specific to a project.
| Build detail | ||
| ++++++++++++ | ||
|
|
||
| .. http:get:: /api/v3/build/(int:id)/ |
There was a problem hiding this comment.
In this case, it makes sense to access by id since it's only unique identifier that we have for a build.
There was a problem hiding this comment.
This probably shoulnd't be allowed (since doesn't make to much sense as a whole) and instead we should make it like /api/v3/project/pip/builds/?commit=a1b2c3 or ?latest=true or ?running=true.
| Project list | ||
| ++++++++++++ | ||
|
|
||
| .. http:get:: /api/v3/project/ |
There was a problem hiding this comment.
I think we should start using the resource name in plural for v3, /projects/
There was a problem hiding this comment.
Yes.
I made a mistake and then copy&paste, but it should be plural here.
| "slug": "stable", | ||
| "verbose_name": "stable", | ||
| "identifier": "3a6b3995c141c0888af6591a59240ba5db7d9914", | ||
| "built": true, |
There was a problem hiding this comment.
I'd like to have a calculated field here: build_date or last_success_build_date or something with that meaning but a better name ;)
davidfischer
left a comment
There was a problem hiding this comment.
I like where this is going.
| and do require authentication even for requesting public information. | ||
|
|
||
| Scopes | ||
| ~~~~~~ |
There was a problem hiding this comment.
Why don't we want tokens tied to a user? A user might have tokens that have less than full permissions but I think we definitely want tokens tied to a user. Maybe I'm misunderstanding this.
| * repo type | ||
| * version active | ||
| * version built | ||
| * all database? |
There was a problem hiding this comment.
We need to be very careful on this. For projects (where we have ~100k) this might be ok. For builds or versions, we should probably only allow filtering on indexed fields (eg. slug, project__slug, etc.). Otherwise, this can result in API calls that are very heavy from a performance and load perspective.
| Project details | ||
| +++++++++++++++ | ||
|
|
||
| .. http:get:: /api/v3/project/(string:slug)/ |
There was a problem hiding this comment.
I don't think we need to expose the ID.
| "type": "GET" | ||
| }, | ||
| { | ||
| "href": "/api/v3/project/pip/notifications/", |
There was a problem hiding this comment.
I would try hard to not have multiple endpoints that return essentially the same thing. I think that point is not that all API nesting is bad but more that it can go too far.
I prefer /api/v3/project/pip/notifications/ as an endpoint to /api/v3/notifications/pip/ (I would prefer /api/v3/notifications/?project__slug=pip to the 2nd one although not the 1st). Since I think /api/v3/notifications/ -- notifications for all projects -- is not particularly useful I think it's fine to only expose the endpoint that is specific to a project.
| { | ||
| "href": "/api/v3/project/pip/users/", | ||
| "rel": "users", | ||
| "type": "GET" |
There was a problem hiding this comment.
Shouldn't type be method to be more precisely named? However, I question whether it is even needed. If you are just listing, it is always a GET.
|
|
||
| :statuscode 201: Created sucessfully | ||
| :statuscode 400: Some field is invalid | ||
| :statuscode 401: Not valid permissions |
There was a problem hiding this comment.
I don't think this is necessary to list on every endpoint. This should probably just be in some section on authentication.
| :statuscode 401: Not valid permissions | ||
|
|
||
|
|
||
| Footer |
There was a problem hiding this comment.
I think we should come up with a better name for this API endpoint than footer. Maybe displaymetadata. Maybe it can even go away because all the necessary data is in other endpoints. Not critical but something to think about.
There was a problem hiding this comment.
I thought on removing it in favor of all the other endpoints but in that case to build the flyout you may need to make 2 or 3 or maybe more requests. Is that affordable regarding request time?
ericholscher
left a comment
There was a problem hiding this comment.
I like the direction that this is going. I think it could use a design document explaining some of the design decisions that went into it. I also think it would be useful to have some explicit use cases that we care about, and want to support and aren't supporting. It feels like they are implied here, but having them be explicit would be very useful. For example, this API feels very focused on automating dashboard operations, but not doing anything with documentation page features, which is arguably ~90% of what our users want to be doing w/ RTD.
| -------------------------------- | ||
|
|
||
| Requests to the Read the Docs public API are for public and private information | ||
| and do require authentication even for requesting public information. |
There was a problem hiding this comment.
I'm -0 on this. I think having a public API that isn't authed fits a lot of use cases nicely, and I haven't heard an argument for why we should exclude it in this design. I'm fine to heavily rate limit anon access so it isn't built into processes, but I think for learning etc. having it open is good.
There was a problem hiding this comment.
Also, is it possible to use this API on a documentation page as designed? eg. I'm a user who wants to get the version listing of my project in my own JS, can I do that?
There was a problem hiding this comment.
To me, not requiring authentication duplicates a mistake we made with API v1 and v2. Now that we are deprecating API v1 and v2, there's no way to know who is making outdated API calls because the calls aren't authenticated. Other than broadcast messages in our docs and on the blog, we can't warn users specifically that their API calls are going to break after a certain date.
Secondly, I question the usefulness of an API that allows browsing all projects, versions, and builds rather than "my" projects, versions, and builds. For example, our current v2 endpoints would be a lot more useful if they returned the few projects where you are a maintainer (at least by default) rather than /api/v2/projects/ returning all projects and requiring you to page through them or filter by a single slug.
Still, maybe the right approach is to heavily rate limit the API when used anonymously such that almost everyone will get a token.
Also, is it possible to use this API on a documentation page as designed? eg. I'm a user who wants to get the version listing of my project in my own JS, can I do that?
With authentication, probably not. However, how does one know which projects are "my" projects without authentication?
There was a problem hiding this comment.
With authentication, probably not. However, how does one know which projects are "my" projects without authentication?
GET /api/v3/projects/<slug>/versions/
There was a problem hiding this comment.
That gets a project but there's no concept of my project without auth.
There was a problem hiding this comment.
well sure, but presumably the JS embedded in a project's docs know what the project is :)
There was a problem hiding this comment.
I agree in general auth is useful, but I think we need to support unauth'd endpoints for a lot of pretty important use cases, or have some kind of publishable token.
There was a problem hiding this comment.
Wait a second... I think Eric has a point. How are we going to use our own API (for the footer for example) if we do not allow anonymous access? Will we allow anonymous access only on the endpoints that we know that are going to be used for any doc page?
I'm not sure if we should raise this topic again or not(*), but I tend to agree with Eric that it's good to have the API open. If you are designing a theme for Read the Docs, you won't be able to query anything via the API. Let's say that you want to create your custom version menu flyout:
/api/v3/projects/pip/versions/?active=true
won't be available. What we would do in these cases?
(*) is all about communication with users?
There was a problem hiding this comment.
There are some endpoints that need to be unauthed -- footer api, ads api, etc. But I'm 👎 on a public endpoints like /api/v3/projects listing. @davidfischer sums up the problems nicely.
I think requiring auth is pretty accepted API design at this point from a user perspective, and I don't feel that a public API provides enough value that we need to complicate our design for this use case. There is however a lot of value for us to keep things simple. If there is anything in particular that requires a public API, we should enumerate these though.
What we would do in these cases?
Use a API token :) Part of the intersphinx token work was to eventually pass in a generated token to the build process for this particular case, using our API from the build process.
| * ``project`` | ||
| * ``project:admin``: access to change admin settings (similar to Admin tab) | ||
| * ``project:read``: access to read all the information about the projects | ||
| * ``project:write``: access to anything related to perform write actions (triggering a Build for this project, activate/deactivate versions, etc) |
There was a problem hiding this comment.
How is this different than admin?
There was a problem hiding this comment.
I'm thinking on a scope that allows you to trigger a build, but not to change the name of the project. From my point of view, admin will allow you to change anything under the "Admin" tab of your project dashboard.
| Project details | ||
| +++++++++++++++ | ||
|
|
||
| .. http:get:: /api/v3/project/(string:slug)/ |
There was a problem hiding this comment.
👍 to not exposing ID. slug is the public facing unique identifier in my mind. Though it is possible that it could change over time (by an admin), but I think that's an edge case.
I wrote a document that describes all the things that we have been talking, mentions the pros/cons and raise some questions. It's probably not 100% complete but we can add more things, discuss about the topics in this PR and we can write the decisions we take after that if we consider or leave them in the comments of PR.
@eric could you expand on that? What I understand from there is basically "having access to the documentation pages via the API". If I'm right on that, I'm basing that on "token scopes and serving any output file via the API" (as first approach at least) like /api/v3/projects/pip/docs/?filename=/objects.inv or similar. I know it's not the best, though, and I'm not sure if I can just use nginx serve file from that endpoint or not... but in any case, I think it's a good first start. Let me know. |
|
|
||
|
|
||
| We do NOT want to cover | ||
| +++++++++++++++++++++++ |
There was a problem hiding this comment.
Probably, we don't want to create users.
| "repo_type": "git", | ||
| "default_version": "stable", | ||
| "default_branch": "master", | ||
| "documentation_type": "sphinx_htmldir", |
There was a problem hiding this comment.
This field needs to be removed because it's deprecated.
stsewd
left a comment
There was a problem hiding this comment.
I left some comments, didn't read the previous discussions, sorry if I'm repeating something
| We could extend the scopes to be per-project or per-user. I'm not | ||
| sure if this is possible yet, but we should consider it. | ||
|
|
||
| * per-project: the user could go to the Project Admin tab and create a Token for this specific project |
There was a problem hiding this comment.
I like this idea, similar to what we have for webhoooks (just access to /projects/)
| :>json array links: Array with HEATEOAS_ links to retrieve related information | ||
|
|
||
| :statuscode 200: Success | ||
| :statuscode 404: There is no ``Project`` with this slug |
There was a problem hiding this comment.
We could add a top level section with all the status code and what they return (at the beginning it says we always return a json)
| Project import | ||
| ++++++++++++++ | ||
|
|
||
| .. http:post:: /api/v3/project/import/ |
There was a problem hiding this comment.
why not a post to projects/ endpoint (not import)?
| Project edit | ||
| ++++++++++++ | ||
|
|
||
| .. http:patch:: /api/v3/project/(string:slug)/ |
There was a problem hiding this comment.
I believe is more common to put <string:slug>
There was a problem hiding this comment.
The syntax that I used comes from the .. http: directive and it uses the (string:slug) to show it nicer.
|
|
||
| This endpoint may be under Project section | ||
|
|
||
| .. http:post:: /api/v3/project/(string:slug)/builds/ |
There was a problem hiding this comment.
or under /projects/versions/builds?
There was a problem hiding this comment.
Without the slug?
If you missed the slug by mistake, I think it's not necessary to add /versions/ to the URL since "the project has builds" and each "build is tied to a version"
| "API_HOST": "readthedocs.org", | ||
| "MEDIA_URL": "https://media.readthedocs.org", | ||
| "PRODUCTION_DOMAIN": "readthedocs.io", | ||
| "READTHEDOCS": true |
There was a problem hiding this comment.
Those are the only fields with uppercase keys
There was a problem hiding this comment.
Yes, these come from the template context and I'm keeping compatibility/nomenclature from there.
| :query string theme: Theme used in the documentation | ||
| :query string docroot: Docroot used to generate external links | ||
| :query string source_suffix: Source suffix used to generate external links | ||
| :query string format: Output format (``jsonp``, ``json``) |
There was a problem hiding this comment.
I guess jsonp is a workaround for CORS on custom domains? I think we want a better way to handle that.
|
|
||
| * Rate limit | ||
| * ``Request-ID`` header | ||
| * `JSON minified by default`_ (maybe with ``?pretty=true``) |
There was a problem hiding this comment.
I guess users can just use an API client that does the job to prettified the response
| * Rate limit | ||
| * ``Request-ID`` header | ||
| * `JSON minified by default`_ (maybe with ``?pretty=true``) | ||
| * `JSON schema and validation`_ with docs_ |
There was a problem hiding this comment.
Wonder if there is a sphinx extension to generate docs from the schema
|
|
||
|
|
||
| .. _JSON minified by default: https://geemus.gitbooks.io/http-api-design/content/en/responses/keep-json-minified-in-all-responses.html | ||
| .. _JSON schema and validation: https://geemus.gitbooks.io/http-api-design/content/en/responses/keep-json-minified-in-all-responses.html |
|
|
||
| *See Build details* | ||
|
|
||
| :statuscode 201: Created sucessfully |
There was a problem hiding this comment.
Very much looking forward to this new API, thanks!
I suggest this returns 202 rather than 201 as "The request has been accepted for processing, but the processing has not been completed." (assuming the API is non-blocking of course)
Codecov Report
@@ Coverage Diff @@
## master #4863 +/- ##
==========================================
+ Coverage 76.41% 76.47% +0.05%
==========================================
Files 158 158
Lines 9990 9993 +3
Branches 1262 1262
==========================================
+ Hits 7634 7642 +8
+ Misses 2016 2009 -7
- Partials 340 342 +2
|
Codecov Report
@@ Coverage Diff @@
## master #4863 +/- ##
=========================================
Coverage ? 76.41%
=========================================
Files ? 158
Lines ? 9990
Branches ? 1262
=========================================
Hits ? 7634
Misses ? 2016
Partials ? 340 |
|
but even installing |
|
Looks like we can ignore it: |
|
We have a configuration file for this https://github.com/rtfd/readthedocs.org/blob/master/docs/.rstcheck.cfg |
stsewd
left a comment
There was a problem hiding this comment.
Isn't clear for me why we have the docs for the api and the design docs in the same PR, but the schema looks good.
|
|
||
|
|
||
| Build command details | ||
| +++++++++++++++++++++ |
There was a problem hiding this comment.
Not sure if we really need this one, I can't imagine why will want to list only one command, this info should be retrieved in /api/v3/projects/(str:project_slug)/builds/(int:build_id)/commands/
I'm not sure what to do here. We have On @ericholscher should I remove |
|
Perhaps we just move the actual user-facing docs into another PR where we can iterate on them with the code, like what Santos did for the YAML config. |
|
Don't we have a way of |
|
Looks like we can use this http://www.sphinx-doc.org/en/master/usage/configuration.html#confval-exclude_patterns |
BTW, that was kind of hard to maintain at the end :/, I would have preferred to put/update little chunks with the proper code. |
| .. sourcecode:: json | ||
|
|
||
| { | ||
| "version": "latest", |
There was a problem hiding this comment.
I'm doubting about this now.
Instead of sending an argument here, we could do:
/api/v3/projects/(string:project_slug)/builds/for triggering the default version configured (usuallylatest)/api/v3/projects/(string:project_slug)/versions/(string:version_slug)/builds/to trigger the specific version
This approach seems more clear and follow the pattern we are using, I guess.
There was a problem hiding this comment.
Build is a weird one, the global pk communicates that this is a top-level object, but it effectively is not in any application. I see a use case for the first endpoint in the case of querying a build, but in the case of triggering a build, i'd definitely vote for the second option. In the case of querying a build, we don't necessarily care what the version is, but it might still help to be explicit about that relationship in our API modeling -- at least just for consistency?
There was a problem hiding this comment.
Let's move this to a new issue though, the conversation here has dragged this PR out.
There was a problem hiding this comment.
In the case of querying a build, we don't necessarily care what the version is, but it might still help to be explicit about that relationship in our API modeling -- at least just for consistency?
This was where I was doubting also. I like consistency but maybe that purity complicates the usage from a user perspective. Although, if we provide good links on the responses with the related object this shouldn't be a problem.
There was a problem hiding this comment.
That's true too, yeah. Maybe also worth considering, long term, we might want to change the Build model pk to a compound key, in which case these two would be different builds:
/api/v3/projects/foo/versions/latest/build/1/api/v3/projects/foo/versions/bar/build/1
The fact that we expose build pk as a global pk has always been sort of odd.
There was a problem hiding this comment.
I like this idea.
Do we need to "modify the Build model pk"? Can't we make this an API identifier only for now?
1would be the first (sorted by date) Build for this version3would be the third one-1would be the last one
Internally, we just sort the queryset (by date or -date) and pick the N element.
What do you think?
|
I removed the APIv3 user documentation from the toctree so it's not linked anywhere, but it's still available (with a warning message saying that's it not implemented yet) by hitting it the right URL manually. I think we can merge it as is (this PR is insane with the amount of comments), and update the missing pieces/updates on #5356 (most of this PR is based on |
|
This has enough reviews to merge, feel free to merge if you think it's ready and we're tracking unanswered questions in another PR or issue. |
|
@humitos Thanks for the work on this. Were you aware there was a bounty for the feature to enable project creation by API? Please feel free to claim the bounty. |
|
@jaraco nice! (I wasn't aware) I was paid by Read the Docs itself to do this work, so it may make more sense to donate this back to them 😄 On the other hand, I would appreciate a lot if you have some feedback to share with us about APIv3 and if it's has been being useful to achieve your needs. |
The way BountySource works, an individual needs to claim the bounty, at which point I'll approve. I don't have the ability to withdraw a bounty and thereby contribute it elsewhere. I would very much like for you or an RTD owner to claim the bounty. Otherwise, it will just linger there indefinitely. Would you be willing to take the lead in finding someone to claim it?
It has been useful. It's been useful in updating the default branch, a routine I've used to facilitate the renaming of default branch across 100+ repos without having to click through a web UI. I still have plans to utilize the project creation API to automatically enroll projects mechanically, but I haven't actually implemented that yet. |
Initial ideas to discuss the implementation of APIv3.
This is a work in progress PR that only has some pieces of the documentation with the idea to start a discussion around the general proposal.
Closes #4286