Fix: persist environments with long IDs safely#948
Open
omkarsureshs wants to merge 2 commits intofossasia:masterfrom
Open
Fix: persist environments with long IDs safely#948omkarsureshs wants to merge 2 commits intofossasia:masterfrom
omkarsureshs wants to merge 2 commits intofossasia:masterfrom
Conversation
- Truncate environment names longer than 200 characters - Append MD5 hash for uniqueness - Update load_env to handle truncated filenames - Fix incorrect path in load_env (eid/.json -> eid.json) Fixes fossasia#648
Reviewer's GuideUpdates Visdom environment serialization to safely persist and reload environments whose IDs exceed typical filesystem filename limits by truncating long IDs, appending a hash-based suffix for uniqueness, and fixing the load path logic. Sequence diagram for environment save and load with safe filenamessequenceDiagram
actor User
participant VisdomServer
participant serialize_env
participant load_env
participant Filesystem
User->>VisdomServer: request_save_env(eids)
VisdomServer->>serialize_env: serialize_env(state, eids, env_path)
loop for_each_env_id
serialize_env->>serialize_env: check_length(env_id)
alt env_id_length <= 200
serialize_env->>serialize_env: env_path_file = env_path/env_id.json
else env_id_length > 200
serialize_env->>serialize_env: hash_part = md5(env_id)[:10]
serialize_env->>serialize_env: safe_env_id = env_id[:190] + _ + hash_part
serialize_env->>serialize_env: env_path_file = env_path/safe_env_id.json
end
serialize_env->>Filesystem: open(env_path_file, write)
serialize_env->>Filesystem: write(env_state_json)
Filesystem-->>serialize_env: file_written
end
serialize_env-->>VisdomServer: save_complete
User->>VisdomServer: request_load_env(eid)
VisdomServer->>load_env: load_env(state, eid, socket, env_path)
alt eid_in_memory
load_env-->>VisdomServer: env_from_state
else eid_not_in_memory
load_env->>load_env: p = env_path/eid.json
alt file_exists(p)
load_env->>Filesystem: open(p, read)
Filesystem-->>load_env: env_json
load_env-->>VisdomServer: env_from_file
else not_exists(p) and len(eid) > 200
load_env->>load_env: hash_part = md5(eid)[:10]
load_env->>load_env: safe_eid = eid[:190] + _ + hash_part
load_env->>load_env: p = env_path/safe_eid.json
alt file_exists(p)
load_env->>Filesystem: open(p, read)
Filesystem-->>load_env: env_json
load_env-->>VisdomServer: env_from_truncated_file
else file_not_found
load_env-->>VisdomServer: env_not_found
end
end
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- The filename truncation and hashing logic (including the 200/190 limits and MD5 suffix) is duplicated between
serialize_envandload_env; consider extracting a shared helper (e.g.,get_env_filename(eid)) so behavior stays consistent if you ever adjust the strategy. - You import
hashlibinside bothserialize_envandload_env; moving this to a single top-level import would simplify the functions and avoid repeated imports. - In
serialize_envyou introduceMAX_FILENAME_LENGTH = 200butload_envuses the literal200and190; using the same named constant (or helper) in both places would reduce the risk of mismatches if the limit needs to change later.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The filename truncation and hashing logic (including the 200/190 limits and MD5 suffix) is duplicated between `serialize_env` and `load_env`; consider extracting a shared helper (e.g., `get_env_filename(eid)`) so behavior stays consistent if you ever adjust the strategy.
- You import `hashlib` inside both `serialize_env` and `load_env`; moving this to a single top-level import would simplify the functions and avoid repeated imports.
- In `serialize_env` you introduce `MAX_FILENAME_LENGTH = 200` but `load_env` uses the literal `200` and `190`; using the same named constant (or helper) in both places would reduce the risk of mismatches if the limit needs to change later.
## Individual Comments
### Comment 1
<location> `py/visdom/utils/server_utils.py:121-122` </location>
<code_context>
if env_path is not None:
for env_id in env_ids:
- env_path_file = os.path.join(env_path, "{0}.json".format(env_id))
+ # Truncate long environment IDs to prevent filesystem errors
+ MAX_FILENAME_LENGTH = 200 # Leave room for .json extension
+ if len(env_id) > MAX_FILENAME_LENGTH:
+ # Keep first 190 chars + 10 char hash for uniqueness
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid re-defining truncation constants inline and make the filename-length logic self-consistent.
The current math doesn’t match the comment: `190 + 1 (underscore) + 10 (hash) + 5 (.json) = 206`, so `MAX_FILENAME_LENGTH = 200` isn’t actually respected. Also, the truncation values (`200`, `190`, `10`) are hard-coded here and in `load_env`. Consider centralizing this into a helper that, given an `env_id`, returns a safe filename using a single `MAX_FILENAME_LENGTH`, and derives the slice length from that constant instead of duplicating magic numbers.
Suggested implementation:
```python
def _safe_env_filename(env_id, max_length=200, extension=".json"):
"""
Return a filesystem-safe filename for the given env_id, enforcing max_length.
The resulting filename (including extension) will not exceed max_length
characters. If truncation is required, a hash suffix is appended to
preserve uniqueness.
"""
base_length = max_length - len(extension)
if len(env_id) <= base_length:
return f"{env_id}{extension}"
# Reserve space for "_{hash}"
HASH_LEN = 10
SEP_LEN = 1 # "_"
prefix_length = base_length - (HASH_LEN + SEP_LEN)
if prefix_length <= 0:
# Fallback: if configuration is inconsistent, just hash the whole id
import hashlib
hash_part = hashlib.md5(env_id.encode()).hexdigest()[:HASH_LEN]
return f"{hash_part}{extension}"
import hashlib
hash_part = hashlib.md5(env_id.encode()).hexdigest()[:HASH_LEN]
safe_env_id = f"{env_id[:prefix_length]}_{hash_part}"
return f"{safe_env_id}{extension}"
def serialize_env(state, eids, env_path=DEFAULT_ENV_PATH):
env_ids = [i for i in eids if i in state]
if env_path is not None:
for env_id in env_ids:
safe_filename = _safe_env_filename(env_id)
env_path_file = os.path.join(env_path, safe_filename)
with open(env_path_file, "w") as fn:
if isinstance(state[env_id], LazyEnvData):
fn.write(json.dumps(state[env_id]._raw_dict))
```
There is similar truncation logic in `load_env` that should also use `_safe_env_filename` for consistency. Update that code to:
1. Call `_safe_env_filename(env_id)` (or derive the safe env id from the returned filename if needed) instead of duplicating the truncation logic and magic numbers.
2. Ensure any tests that assume a specific truncation pattern are updated to use the centralized helper so they remain aligned with the new behavior.
</issue_to_address>
### Comment 2
<location> `py/visdom/utils/server_utils.py:396-401` </location>
<code_context>
- p = os.path.join(env_path, eid.strip(), ".json")
+ # Try to find the environment file
+ # First try exact filename
+ p = os.path.join(env_path, "{0}.json".format(eid.strip()))
+
+ # If not found and env_id is long, try truncated version
+ if not os.path.exists(p) and len(eid) > 200:
+ import hashlib
+ hash_part = hashlib.md5(eid.encode()).hexdigest()[:10]
+ safe_eid = eid[:190] + "_" + hash_part
+ p = os.path.join(env_path, "{0}.json".format(safe_eid))
</code_context>
<issue_to_address>
**issue:** Align whitespace handling between the saved filenames and the load-time lookup.
The first lookup strips whitespace (`eid.strip()`), but the hashed fallback and `serialize_env` both use the raw `env_id`. This leads to inconsistent filenames when IDs have leading/trailing spaces: plain lookups use the stripped name, while saved and hashed filenames use the unstripped one. Normalize the ID once (strip or not) and use that consistently for both save and load paths, including the hashed fallback.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
- Extract shared _safe_env_filename helper function - Fix math error in filename length calculation (190+1+10+5=206 > 200) - Use consistent whitespace stripping - Move hashlib import to top of file - Remove duplicate truncation logic Maintains same functionality with cleaner implementation.
Author
|
@fossasia/visdom-maintainers @maintainers Could someone familiar with The changes:
This fixes the silent failure when saving environments with long names. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
This PR fixes issue #648: "Envs with >255 chars do not get saved".
When environment names (eid) exceed filesystem filename length limits (typically 255 characters), the
serialize_envfunction fails silently because it cannot create the corresponding.jsonfile. This fix ensures environments with arbitrarily long names are persisted safely.Technical Changes:
load_envfunction is updated to locate truncated filesload_env(changed fromeid/.jsontoeid.json)Implementation Details:
.jsonextension and directory paths){first_190_chars}_{10_char_hash}.jsonpy/visdom/utils/server_utils.pyMotivation and Context
Filesystems impose filename length limitations (255 bytes on Linux ext4, 260 characters total path length on Windows). Without this fix:
This addresses a real-world usability issue where environment names naturally exceed safe limits.
How Has This Been Tested?
Test Environment:
Test Cases:
"test_env"- Saves astest_env.json(unchanged behavior)"a" * 300- Saves as truncated file with hash suffixTest Results:
Example: