This document tracks technical debt, refactoring opportunities, and optimization tasks identified in the MCR codebase.
- Not started
- Completed
- [~] In progress
- Extract shared validation module (
src/validation.rs) - Extract ProtocolState to
src/supervisor/protocol_state.rs - Add safe socket creation wrappers (
create_raw_ip_socket,create_packet_socket,create_ioctl_socket) - Add safe setsockopt wrappers (
set_ip_hdrincl,set_ip_pktinfo,set_packet_auxdata,join_multicast_group) - Add eventfd wrapper (
create_eventfd) - Add multicast socket option wrappers (
set_multicast_if_by_index,set_multicast_if_by_addr,set_multicast_ttl) - Refactor protocol_state.rs to use safe wrappers
- Refactor adaptive_wakeup.rs tests to use eventfd wrapper
- Refactor unified_loop.rs to use multicast wrappers
Risk: Protocol failures, route timeouts, silent failures
-
H4.1 Missing timer when adding downstream to existing (*,G) route - BUG
- File:
src/supervisor/protocol_state.rslines 561-576 - Issue: New route creation scheduled PimJoinPrune timer, but adding downstream to EXISTING route did NOT schedule timer
- Fix: Added timer scheduling to existing route update path (mirrors new route path)
- File:
-
H4.2 Incomplete passive IGMP route handling - BUG
- File:
src/supervisor/protocol_state.rslines 752-791 - Issue: Passive IGMP path lacked downstream update logic for existing routes
- Fix: Added else branch mirroring active IGMP path
- File:
-
H4.3 Critical error handling - silent failures on critical paths
- File:
src/supervisor/protocol_state.rs - Issues fixed:
- Line 2046: MSDP Connect command - added warning log on failure
- Line 2094: MSDP Keepalive command - added warning log on failure
- Line 2129: MSDP Disconnect command - added warning log on failure
- Notes:
- Lines 295-302: PIM Hello timer already logs warning (acceptable)
- Line 2628: Already uses
?for proper error propagation (not a bug)
- File:
Risk: Memory exhaustion under attack or misconfiguration
- H1.1 Add max capacity to
rulesHashMap in unified_loop.rs- Added
max_rulesconfig (default 10k), rejects when limit reached
- Added
- H1.2 Add max capacity to
flow_counterswith LRU eviction- Added
max_flow_countersconfig (default 100k), evicts lowest packet count
- Added
- H1.3 Add max capacity to
egress_socketsHashMap- Added
max_egress_socketsconfig (default 10k), evicts at capacity
- Added
- H1.4 Add metrics for collection sizes
- Added
rules_rejected,flow_counters_evicted,egress_sockets_evictedstats
- Added
Risk: Unsafe code in socket helpers is untested
- H2.1 Add unit tests for socket creation functions (3 tests + 4 ignored)
- H2.2 Add unit tests for socket option functions (2 tests + 7 ignored)
- H2.3 Add unit tests for eventfd wrapper (6 tests)
- H2.4 Add unit tests for interface lookup functions (2 tests)
- Ignored tests require CAP_NET_RAW:
cargo test -- --ignored
- Ignored tests require CAP_NET_RAW:
Risk: O(n) interface scan on every socket operation
- H3.1 Create InterfaceCache struct with TTL-based refresh (30s default)
- H3.2 Provide O(1) lookup methods:
get_index,get_capability,get_name_by_index - H3.3 Add global singleton via
global_interface_cache() - H3.4 Add netlink-based cache invalidation on interface changes ✓ COMPLETED (Jan 2025)
- Added
netlink_monitormodule using rtnetlink crate - Monitors RTM_NEWLINK and RTM_DELLINK events via RTMGRP_LINK multicast group
- Automatically refreshes interface cache when interfaces are added/removed/changed
- TTL-based refresh remains as fallback
- Added
Impact: 30+ lines of repetitive error handling
- M1.1 Create
check_libc_result()helper function - M1.2 Refactor 9 socket option functions to use helper
set_ip_hdrincl,set_ip_pktinfo,set_packet_auxdatajoin_multicast_group,set_multicast_if_by_index,set_multicast_if_by_addrset_multicast_ttl,set_bind_to_device,set_tcp_nodelay- Note:
set_recv_buffer_sizehas different return type, kept as-is
Impact: ~20 lines duplicated
- M2.1 Extract
extract_interface_capability()helper - M2.2 Refactor
get_interface_capability()to use helper (14 → 4 lines) - M2.3 Refactor
get_multicast_capable_interfaces()to use helper (14 → 6 lines)
Impact: Mixed error types across codebase
- M3.1 Document error handling strategy
- Decision:
io::Errorfor I/O operations,anyhowfor configuration/setup - MSDP TCP uses
io::Resultappropriately for async I/O
- Decision:
- M3.2 MSDP TCP error handling - KEPT AS-IS
io::Error::new()with specific ErrorKinds is correct for network I/O
- M3.3 Bare error returns in
pre_exec- KEPT AS-IS- API constraint:
pre_execclosures must returnio::Result
- API constraint:
Impact: Maintainability
- M4.1 Client command handler - ALREADY EXTRACTED
handle_client()function exists at line 599
- M4.2 Rule sync logic - ALREADY EXTRACTED
sync_rules_to_workers()function exists at line 252
- M4.3 Extract protocol event processing (optional)
ProtocolCoordinator::process_pending_events()exists but inline in run()- Low priority: further extraction adds complexity without clear benefit
- M4.4 TODO at line 1723-1726 - DOCUMENTED
- Issue: Worker count vs lazy socket creation
- Plan: Implement lazy AF_PACKET socket creation per interface
Impact: Reliability
- M5.1 Add protocol socket creation tests (requires CAP_NET_RAW)
- Covered by integration tests in
tests/integration/
- Covered by integration tests in
- M5.2 Buffer pool tests - ALREADY COMPREHENSIVE
test_pool_exhaustion,test_concurrent_acquire_release_simulationtest_zero_capacity_pool,test_arc_cloning
- M5.3 Command reader tests - ALREADY EXIST
test_frame_too_large,test_invalid_jsontest_partial_frame_data,test_partial_frame_length
Impact: Unnecessary allocations eliminated in command broadcast paths
-
M6.1 Audit clone operations in supervisor/mod.rs
- Identified hot-path clones: cmd_bytes.clone() in broadcast loops
- Identified unnecessary clone: interface_rules.clone() when ownership could be moved
-
M6.2 Refactor hot-path clones to use Bytes (Arc-based)
- Added
bytescrate dependency - Converted
Vec<u8>toBytesupfront in 4 broadcast paths Bytes::clone()is O(1) - just increments Arc refcount- Removed unnecessary
interface_rules.clone()(moved ownership instead) - Removed unnecessary
all_rules.clone()(moved out of RelayCommand)
- Added
-
M6.3 Files modified:
Cargo.toml: Addedbytes = "1"dependencysrc/supervisor/mod.rs: 10 clone sites optimized
Impact: ~160 lines of duplicated code reduced
-
M7.1 Add
impl From<&StarGState> for StarGRoute- File:
src/mroute.rs - Eliminated 5 duplicate StarGRoute creation sites
- File:
-
M7.2 Add
impl From<&SGState> for SGRoute- File:
src/mroute.rs - Eliminated 1 duplicate SGRoute creation site
- File:
Impact: 7 levels of nesting, hard to read/maintain, 95% code duplication
-
M8.1 Extract helper:
create_star_g_route_state()- Creates StarGState with group, rp, upstream, downstream
-
M8.2 Extract helper:
send_pim_star_g_join()- Determines upstream neighbor, builds Join packet, schedules timer
-
M8.3 Extract helper:
determine_pim_upstream_neighbor()- RP direct neighbor check with fallback logic (lines 483-495, 686-697)
-
M8.4 Refactor IGMP handler to use early returns
- Reduce nesting from 7 levels to 2-3 levels
- Consolidate active (407-565) and passive (623-735) paths
Impact: Reduced 326-line function with 13 timer types to ~80 lines + 7 helper methods
-
M9.1 Extract IGMP timer handlers
handle_igmp_general_query_timer()- ~40 lineshandle_igmp_group_query_timer()- ~40 lines
-
M9.2 Extract PIM timer handlers
handle_pim_join_prune_timer()- ~55 lines (highest priority)handle_pim_hello_timer()- ~35 lines
-
M9.3 Extract MSDP timer handlers
handle_msdp_connect_retry_timer()- ~40 lineshandle_msdp_keepalive_timer()- ~45 lineshandle_msdp_hold_timer()- ~30 lines
-
Result:
handle_timer_expired()now delegates to focused helper methods, improving readability and testability. Each timer type has clear separation of concerns.
Impact: Maintenance burden
-
L1.0 Remove dead adaptive_wakeup module ✓ COMPLETED (Jan 2025)
- Deleted
src/worker/adaptive_wakeup.rs(435 lines) - Orphaned code from legacy two-thread data plane (ingress.rs/egress.rs removed Nov 2025)
- Current unified_loop.rs uses io_uring submit_and_wait() directly
- Deleted
-
L1.1 Remove unused socket helper functions ✓ COMPLETED (Jan 2025)
- Removed
set_bind_to_device()andset_tcp_nodelay()(~35 lines) - Kept
get_multicast_capable_interfaces()- used in tests, useful public API
- Removed
-
L1.2 Remove duplicate constant ✓ COMPLETED (Jan 2025)
- Removed duplicate
ALL_PIM_ROUTERSfrom protocol_state.rs - All usages already import from
crate::protocols::pim
- Removed duplicate
-
L1.3 Evaluate unused diagnostic field ✓ REVIEWED (Jan 2025)
interface_namefield in unified_loop.rs - KEPT- Intentionally reserved for future diagnostic/logging use (documented)
-
L1.4 Remove legacy msdp_tcp function ✓ COMPLETED (Jan 2025)
- Removed
process_connection_messages()(~40 lines) - Replaced by
process_reader_messages()which is actively used
- Removed
Impact: Minor performance improvement
-
L2.1 Create string constants for common responses
- File:
src/supervisor/command_handler.rs - Replace
.to_string()calls with constants where possible
- File:
-
L2.2 Consider
Cow<'static, str>for Response type- Allows both static and dynamic strings without allocation
Impact: Build size and complexity
-
L3.1 Remove tokio-uring dependency ✓ COMPLETED (Jan 2025)
- Removed
tokio-uring- worker only used it as runtime wrapper - Worker async code uses standard tokio features (UnixStream)
- Data plane uses
io_uringcrate directly for performance - Eliminated duplicate io-uring versions (v0.6.4 via tokio-uring, v0.7.11 direct)
- Removed
-
L3.3 Remove unused/replaceable dependencies ✓ COMPLETED (Jan 2025)
- Removed
sysinfo(completely unused) - Removed
logcrate (replaced 4error!()calls with customlog_error!macro) - Removed
num_cpus(replaced withstd::thread::available_parallelism()) - Reduced direct dependencies from 21 to 18
- Reduced dependency tree from 109 to 103 crates
- Removed
-
L3.4 Replace pnet with nix::ifaddrs ✓ COMPLETED (Jan 2025)
- pnet contributed 28 unique crates to dependency tree (including regex, proc-macros)
- Only used for
pnet::datalink::interfaces()to enumerate network interfaces - Created
get_interfaces()usingnix::ifaddrs::getifaddrs()(nix already a dependency) - Added
NetworkInterfaceandIpNetworktypes as pnet replacements - Reduced direct dependencies from 18 to 17
- Reduced dependency tree from 239 to 211 crates
-
L3.2 Document error crate usage
- Clarify when to use anyhow vs thiserror
- Add to contributing guidelines
Impact: Onboarding and maintenance
-
L4.1 Document unsafe code rationale ✓ COMPLETED (Jan 2025)
- Reduced unsafe blocks from 72 to 63 (9 removed)
- Removed unused
create_eventfd()wrapper (production usesnix::sys::eventfd::EventFd) - Replaced
libc::if_indextonamewithnix::net::if_::if_indextonamein protocol_state.rs - Added SAFETY comments to ringbuffer.rs lock-free operations
- Added SAFETY comments to socket_helpers.rs socket creation and bind operations
-
L4.2 Document socket_helpers API
- Add module-level documentation
- Document when to use each function
| Metric | Current | Target | Status |
|---|---|---|---|
| Unsafe blocks | 63 | <50 | In progress (was 72) |
| Test coverage (socket_helpers) | 31 tests | >80% | ✓ Done |
| Duplicate error patterns | 5 | 0 | ✓ Reduced (was 14) |
| Dead code functions | 0 | 0 | ✓ Done |
| Metric | Current | Target | Status |
|---|---|---|---|
| Interface lookup | O(1) cached | O(1) cached | ✓ Done |
| Max rules per worker | 10k (configurable) | Configurable limit | ✓ Done |
| Max flow counters | 100k (configurable) | Configurable limit | ✓ Done |
| Max egress sockets | 10k (configurable) | Configurable limit | ✓ Done |
- Validation module - Centralized address/interface/port validation
- ProtocolState extraction - Reduced supervisor/mod.rs from 6,798 to 3,037 lines
- Socket wrappers - Eliminated ~210 lines of unsafe code duplication
- Eventfd wrapper - Simplified eventfd creation in socket_helpers.rs
- Multicast wrappers - Consolidated IP_MULTICAST_IF/TTL handling
- Bounded collections (H1) - Memory-safe limits with eviction for rules, flows, sockets
- Socket_helpers tests (H2) - 37 new tests (22 passing, 11 ignored requiring CAP_NET_RAW)
- Interface cache (H3) - O(1) lookups via InterfaceCache with 30s TTL
- Error handling helpers (M1) -
check_libc_result()eliminated ~50 lines duplication - Interface capability helper (M2) -
extract_interface_capability()reduced ~20 lines - Dead code removal (L1.0) - Deleted adaptive_wakeup.rs (435 lines of orphaned code)
- Dead code removal (L1.1-L1.4) - Removed unused socket helpers, duplicate constant, legacy function (~80 lines)
- Unsafe code audit (L4.1) - Reduced unsafe blocks 72→63, added SAFETY comments, replaced libc calls with nix wrappers
- Netlink monitor (H3.4) - Real-time interface change detection via RTMGRP_LINK multicast group
- Dependency reduction (L3.3) - Removed sysinfo, log, num_cpus; reduced deps 21→18, tree 109→103
- pnet removal (L3.4) - Replaced pnet with nix::ifaddrs for interface enumeration; reduced tree by 28 crates (239→211)
- Prefer safe wrappers over raw unsafe blocks
- Add tests before refactoring critical code
- Document safety invariants for remaining unsafe code
- Use
anyhow::Contextfor error chain context