feat: Bump igraph format version to 1.5.0 #835
@szhorvat @ntamas @GroteGnoom @vtraag @Antonov548: I believe that, with this PR, the current storage format in main satisfies all requirements. To keep things simpler moving forward, we could indeed switch to integer format versions. I checked, and this format also seems robust against future changes. Do you see substantially better alternatives? The description of this PR also contains the format description; we should think about how to consolidate it. As a side effect, we now have an image in which we can run all old igraph versions. I used Ubuntu 18.04; more work will be needed to base it on 22.04. But we can also freeze it, create a new image based on 22.04, and go back in time as far as is possible without insulting the compiler.
For completeness, here is code that displays what changed between the individual versions. It appears that we had only three meaningful changes over the entire history:
options(conflicts.policy = list(warn = FALSE))
library(tidyverse)
source("R/versions.R")
source("R/old-0_1_1.R")
source("R/old-0_2.R")
source("R/old-0_5.R")
source("R/old-0_6.R")
source("R/old-1_0_0.R")
source("R/old-1_5_0.R")
samples <- oldsamples()
# Unclass the graph and mask the `myid` and `me` entries of the
# environment component so that they do not show up as differences.
scrub <- function(g) {
  cl <- class(g)
  g <- unclass(g)
  if (length(g) >= 10) {
    g[[10]] <- as.list(g[[10]], all.names = TRUE)
    if (!is.null(g[[10]]$myid)) {
      g[[10]]$myid <- "<myid>"
    }
    if (!is.null(g[[10]]$me)) {
      g[[10]]$me <- "<me>"
    }
  }
  attr(g, "klass") <- cl
  g
}
# Compare each sample with the one in the next row, which holds the
# previous version, then reverse so that the oldest transition comes first.
transitions <-
  samples |>
  enframe("version", "result") |>
  mutate(result_scrubbed = map(result, scrub), .keep = "unused") |>
  mutate(
    prev_version = lead(version),
    prev_result_scrubbed = lead(result_scrubbed),
  ) |>
  mutate(compare = map2(prev_result_scrubbed, result_scrubbed, waldo::compare)) |>
  transmute(transition = paste0(prev_version, " -> ", version), compare) |>
  head(-1) |>
  tibble::deframe() |>
  rev()
names(transitions)
#> [1] "0.1.1 -> 0.2" "0.2 -> 0.5" "0.5 -> 0.6" "0.6 -> 1.0.0"
#> [5] "1.0.0 -> 1.5.0"
transitions
#> $`0.1.1 -> 0.2`
#> `old` is length 11
#> `new` is length 9
#>
#> `old[[3]]`: 0 0 1 1 2 2
#> `new[[3]]`: 0 1 2
#>
#> `old[[4]]`: 1 2 2 0 0 1
#> `new[[4]]`: 1 2 0
#>
#> `old[[5]]`: 1 0 3 2 5 4
#> `new[[5]]`: 0 1 2
#>
#> `old[[6]]`: 4 3 5 0 2 1
#> `new[[6]]`: 2 0 1
#>
#> `old[[7]]`: 0 2 4 6
#> `new[[7]]`: 0 1 2 3
#>
#> `old[[8]]`: 0 2 4 6
#> `new[[8]]`: 0 1 2 3
#>
#> `old[[9]]` is length 3
#> `new[[9]]` is length 4
#>
#> `old[[9]][[1]]`: 1
#> `new[[9]][[1]]`: 1 0
#>
#> `old[[9]][[2]]` is a character vector ()
#> `new[[9]][[2]]` is a list
#>
#> And 4 more differences ...
#>
#> $`0.2 -> 0.5`
#> `names(old[[9]][[2]])` is absent
#> `names(new[[9]][[2]])` is a character vector ()
#>
#> $`0.5 -> 0.6`
#> `old[[9]][[1]]`: 1 0
#> `new[[9]][[1]]`: 1 0 1
#>
#> `old[[9]][[2]]` is length 0
#> `new[[9]][[2]]` is length 3
#>
#> `names(old[[9]][[2]])`:
#> `names(new[[9]][[2]])`: "name" "mutual" "circular"
#>
#> `old[[9]][[2]]$name` is absent
#> `new[[9]][[2]]$name` is a character vector ('Ring graph')
#>
#> `old[[9]][[2]]$mutual` is absent
#> `new[[9]][[2]]$mutual` is a logical vector (FALSE)
#>
#> `old[[9]][[2]]$circular` is absent
#> `new[[9]][[2]]$circular` is a logical vector (TRUE)
#>
#> $`0.6 -> 1.0.0`
#> `old` is length 9
#> `new` is length 10
#>
#> `old[[10]]` is absent
#> `new[[10]]` is a list
#>
#> $`1.0.0 -> 1.5.0`
#> `old[[5]]` is a double vector (0, 1, 2)
#> `new[[5]]` is NULL
#>
#> `old[[6]]` is a double vector (2, 0, 1)
#> `new[[6]]` is NULL
#>
#> `old[[7]]` is a double vector (0, 1, 2, 3)
#> `new[[7]]` is NULL
#>
#> `old[[8]]` is a double vector (0, 1, 2, 3)
#> `new[[8]]` is NULL
#>
#> `old[[10]]$.__igraph_version__.`: "0.8.0"
#> `new[[10]]$.__igraph_version__.`: "1.5.0"
Created on 2023-06-11 with reprex v2.0.2
Checked with https://github.com/krlmlr/rigraph-forensics.
The igraph data format
List of length 10 with a "class" attribute:
1. igraph_t_idx_n: numeric(1), number of vertices
2. igraph_t_idx_directed: logical(1), is the graph directed?
3. igraph_t_idx_from: numeric(), "from" vertex ID of each edge in order
4. igraph_t_idx_to: numeric(), "to" vertex ID of each edge in order
5. igraph_t_idx_oi: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0
6. igraph_t_idx_ii: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0
7. igraph_t_idx_os: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0
8. igraph_t_idx_is: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0
9. igraph_t_idx_attr: attribute structure
10. igraph_t_idx_env: environment, gains a new entry "igraph" in 1.5.0

Requirements
Old versions should not misbehave with the new format. Examples of bad behaviour: silent wrong result, crash, memory leak. Examples of acceptable behaviour: immediate error (even if unintuitive), correct operation. This applies to each function in old versions; different functions may behave differently.
- Tested with all old versions with the 1.5.0 format: unintuitive error (due to components 5-8 being NULL), see, e.g., https://github.com/krlmlr/rigraph-forensics/actions/runs/5229054061/jobs/9441807366#step:6:2724

Future versions should support all older format versions, either by doing an automatic upgrade, or by instructing the user to upgrade manually.
- upgrade_graph() if possible, supported for all inputs from igraph >= 0.2. The in-place upgrade is possible because igraph_t_idx_env is an environment that can be updated without changing the object that contains it.

Proper format versioning and checks: Starting with the next version, igraph should check the format of an object, and should be able to decide whether that format is compatible and act accordingly (accept or reject the object, and advise the user with a clear message).
- graph_version() returns an object of class "package_version" #832

Support future changes: Future format modifications should be feasible and easy. It should be very clear to a future maintainer how they can execute a format change without breaking these requirements.
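
The in-place upgrade and the version check described above both rely on the environment in component 10. The following is a minimal base R sketch of that idea, not igraph's implementation: `mock_graph()`, `mock_graph_version()` and `mock_check_format()` are hypothetical helpers, and the mock object is a bare list rather than a real graph. Only the entry name `.__igraph_version__.` is taken from the comparison output earlier in this conversation.

```r
# Hypothetical stand-in for an igraph object: a list of length 10 whose
# last component is an environment. This is NOT igraph's constructor.
mock_graph <- function(version) {
  env <- new.env(parent = emptyenv())
  assign(".__igraph_version__.", version, envir = env)
  g <- vector("list", 10)
  g[[10]] <- env
  g
}

# Read the stored format version as a "package_version" object.
mock_graph_version <- function(g) {
  package_version(get(".__igraph_version__.", envir = g[[10]]))
}

# Accept current objects, bump older ones in place, reject newer ones.
# The default for `current` is an assumption made for this sketch.
mock_check_format <- function(g, current = "1.5.0") {
  current <- package_version(current)
  stored <- mock_graph_version(g)
  if (stored > current) {
    stop(
      "Graph format ", as.character(stored),
      " is newer than the supported format ", as.character(current), "."
    )
  }
  if (stored < current) {
    # Environments have reference semantics: this update is visible through
    # every copy of `g`, and `g` itself never has to be reassigned.
    assign(".__igraph_version__.", as.character(current), envir = g[[10]])
  }
  invisible(g)
}

old <- mock_graph("0.8.0")
alias <- old              # copies the list, but shares the same environment
mock_check_format(old)    # return value deliberately ignored
upgraded <- as.character(mock_graph_version(alias))
```

After the call, `alias` reports the new version even though neither `old` nor `alias` was reassigned, which is the property that makes an upgrade without changing the containing object possible. A real upgrade would also have to convert the stored data; the sketch only updates the version marker.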

Long-term suitability: Let’s try to get it right so we can go as long as possible without another format change, even if the C-side format is modified.
- unclass(g)[[10]]$igraph). If this changes, e.g., to igraph2 after an ABI update, users won't crash even in that rare case.

Platform independent: An igraph object saved on one computer should work on another, as much as feasible. This includes 32/64-bit interop.

Minimize storage redundancy: Try not to double storage requirements if we can get away with less.

Minimize conversion overhead: Since the plan is to have an associated C-side igraph_t, minimize the conversion overhead (both in time and in code complexity) between the R-side and C-side objects.
- restore_pointer() function, it requires a single pass, is cache efficient and small and simple enough.

Clearly documented format versioning: Any proposal should explain how formats are versioned and how versions are meant to be interpreted (is 0.10 greater than 0.8?)
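
To illustrate why the interpretation of versions needs to be spelled out, here is a small base R sketch that compares "0.10" and "0.8" as strings, as decimal numbers, and as "package_version" objects. The variable names are only for illustration.

```r
# Compare "0.10" and "0.8" under three interpretations.
as_string <- "0.10" > "0.8"    # character comparison: "1" sorts before "8"
as_number <- 0.10 > 0.8        # decimal comparison: 0.1 is less than 0.8
as_version <- package_version("0.10") > package_version("0.8")  # 10 > 8

c(string = as_string, number = as_number, version = as_version)
```

Only the component-wise "package_version" comparison treats 0.10 as newer than 0.8; the string and decimal readings both say the opposite. Integer format versions, as suggested earlier in this conversation, would avoid the ambiguity altogether.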