Skip to content

feat: send files directly from bytes#260

Draft
EdJoPaTo wants to merge 21 commits intomasterfrom
edjopato/filehandling
Draft

feat: send files directly from bytes#260
EdJoPaTo wants to merge 21 commits intomasterfrom
edjopato/filehandling

Conversation

@EdJoPaTo
Copy link
Collaborator

No description provided.

Comment on lines 94 to 95
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While this PR seems awesome at first glance it turns out as quite a performance mess with bigger files. The params are parsed with serde_json which involves parsing megabytes of data (for InputFile::bytes) only to throw them away as they are handled via the files Vec rather than the params.

Possible ideas: on every method that sends files the InputFile needs to be replaced with an (empty?) String to ensure there isn't much data touched for the params. As that are references currently, it will require a clone, so the InputFile will be at least twice in memory. With self-hosted bot API servers that can be quite a lot of memory usage.

Alternatively, the params of API methods which handle files need lifetimes and only a reference to the actual data. Then the clones are way cheaper. Also, the params could be moved rather than referenced into the methods which allows for modifying them instead of cloning. If the developer using this library requires the params multiple times, they can still clone it themselves which would still be pretty cheap.
I tinkered with std::borrow::Cow but that doesn't change the fact that it requires the same: lifetimes for all the Param structs. ('static is pretty useless in this case as most stuff would require Cow::Owned then again → nothing won.)

An alternative might be proc macros knowing which variable in a struct is what. So InputFile wouldn't end up in the params in the first place. But I think this is another rabbit hole and which some more complex file handling methods, probably very messy. → probably not a good idea.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quick Idea: Does skip serializing work with a certain enum type? (Option::None works) Then bytes could be skipped to be serialized?

Conflicts:
	examples/async_file_upload.rs
	src/client_reqwest.rs
	src/client_ureq.rs
Conflicts:
	src/trait_async.rs
	src/trait_sync.rs
removing that is part of another PR
Conflicts:
	src/api_params.rs
	src/client_ureq.rs
	src/trait_async.rs
	src/trait_sync.rs
Conflicts:
	src/client_ureq.rs
	src/macros.rs
	src/trait_async.rs
	src/trait_sync.rs
@EdJoPaTo EdJoPaTo linked an issue Mar 5, 2025 that may be closed by this pull request
Comment on lines 118 to 122
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This bytes.clone() is kinda horrible. Even with the rest of this approach being refactored into references, reqwest still requires a clone here.
And for 1.5 GB file data, this is definitely a lot.

The best we could do would require the ownership and pass it over to reqwest so it's never cloned by frankenstein and only explicitly by the user of frankenstein when needed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ureq would work with references while reqwest doesn't. But frankenstein requires either both with references or none.

@EdJoPaTo EdJoPaTo force-pushed the edjopato/filehandling branch from 95ecdc8 to 1ff7f49 Compare March 11, 2025 01:45
When someone wants to upload a file the FileUpload type is required.
They can read about it in the autogenerated docs themselves.
No need to manually add that to the README again.
@EdJoPaTo
Copy link
Collaborator Author

EdJoPaTo commented Mar 13, 2025

The current state is definitely useable but kinda annoying due to the clones and the implicit serialization as parameters. Its not a big deal with PathBuf but with binary data it has at least 2 instances of it in memory, some methods even more. Sending some MB is totally fine but sending 1.5 GB files is not.

I am thinking about putting the binary stuff behind a feature flag. That way it's useable but not an easy foot gun on bigger files.

The alternative is to take the ownership of all parameters instead of taking their references. That's breaking but would prevent such huge data duplications in memory. And when it should be duplicated, the user actively has to call clone themselves.

A lite step of that would be to take only the ownership for methods that involve files. But its probably a weird interface then as it differs between methods.

Not sure which is the better way forward with this PR.

EdJoPaTo added 6 commits April 8, 2025 13:06
Conflicts:
	examples/api_trait_implementation.rs
	examples/file_upload.rs
They might not be perfectly accurate now (might not accept URLs or
file_id) but they are far easier to handle with macros.

Fixes setWebhook which never correctly uploaded certificate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

send file using bytes

1 participant

Comments