feat: Bump igraph format version to 1.5.0 #835

Merged: krlmlr merged 11 commits into main from f-upgrade-version on Jun 15, 2023

krlmlr commented Jun 10, 2023

Checked with https://github.com/krlmlr/rigraph-forensics.

The igraph data format

  • List of length 10 with a "class" attribute

    1. igraph_t_idx_n: numeric(1), number of vertices

    2. igraph_t_idx_directed: logical(1), is the graph directed?

    3. igraph_t_idx_from: numeric(), "from" vertex ID of each edge in order

    4. igraph_t_idx_to: numeric(), "to" vertex ID of each edge in order

    5. igraph_t_idx_oi: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0

    6. igraph_t_idx_ii: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0

    7. igraph_t_idx_os: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0

    8. igraph_t_idx_is: auxiliary indexes, numeric() in <= 1.4.3, NULL (ignored if present) in 1.5.0

    9. igraph_t_idx_attr: attribute structure

    10. igraph_t_idx_env: environment, gains a new entry "igraph" in 1.5.0
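
As a rough illustration of the layout above, the following base R sketch builds a mock list with the ten slots. It does not call igraph: the helper `mock_graph()`, the class name `mock_igraph`, and the sample values are hypothetical, and only the slot names and positions are taken from the description above.

```r
# Slot names as listed in the format description above.
slots <- c(
  "igraph_t_idx_n", "igraph_t_idx_directed", "igraph_t_idx_from",
  "igraph_t_idx_to", "igraph_t_idx_oi", "igraph_t_idx_ii",
  "igraph_t_idx_os", "igraph_t_idx_is", "igraph_t_idx_attr",
  "igraph_t_idx_env"
)
idx <- setNames(seq_along(slots), slots)

# Hypothetical helper: a mock object mimicking the 1.5.0 layout for a
# directed ring on three vertices. This is not a real igraph object.
mock_graph <- function() {
  g <- vector("list", 10)  # all ten slots start out as NULL
  g[[idx[["igraph_t_idx_n"]]]] <- 3
  g[[idx[["igraph_t_idx_directed"]]]] <- TRUE
  g[[idx[["igraph_t_idx_from"]]]] <- c(0, 1, 2)
  g[[idx[["igraph_t_idx_to"]]]] <- c(1, 2, 0)
  # Slots 5 to 8 (oi, ii, os, is) stay NULL, as described for 1.5.0.
  g[[idx[["igraph_t_idx_attr"]]]] <- list()
  g[[idx[["igraph_t_idx_env"]]]] <- new.env(parent = emptyenv())
  class(g) <- "mock_igraph"
  g
}

g <- mock_graph()
n_slots <- length(unclass(g))
index_slots_null <- vapply(unclass(g)[5:8], is.null, logical(1))
```

One R detail matters when clearing slots in such a list: `g[[5]] <- NULL` removes the element and shortens the list, whereas `g[5] <- list(NULL)` keeps the slot and stores NULL in it.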

Requirements

  • Old versions should not misbehave with the new format. Examples of bad behaviour: silent wrong result, crash, memory leak. Examples of acceptable behaviour: immediate error (even if unintuitive), correct operation. This applies to each function in old versions; different functions may behave differently.

  • Future versions should support all older format versions, either by doing an automatic upgrade, or by instructing the user to upgrade manually.

    • Now fully tested. Where possible, the user is asked to call upgrade_graph(); this is supported for all inputs from igraph >= 0.2. The in-place upgrade is possible because igraph_t_idx_env is an environment that can be updated without changing the object that contains it.
  • Proper format versioning and checks: Starting with the next version, igraph should check the format of an object, and should be able to decide whether that format is compatible and act accordingly (accept or reject the object, and advise the user with a clear message).

  • Support future changes: Future format modifications should be feasible and easy. It should be very clear to a future maintainer how they can execute a format change without breaking these requirements.

    • Search for R_IGRAPH_TYPE_VERSION
    • Search for the string it is defined to (currently 1.5.0)
    • Adapt every occurrence as needed
  • Long-term suitability: let’s try to get it right so we can go as long as possible without another format change, even if the C-side format is modified.

    • The C side is never serialized to disk. Exotic risk: unload the igraph R package and load an older/newer version of that package with igraph objects in memory. To mitigate, ABI changes could come with a change of the element name our external pointer is stored in (currently unclass(g)[[10]]$igraph). If this changes, e.g., to igraph2 after an ABI update, users won't crash even in that rare case.
  • Platform independent: An igraph object saved on one computer should work on another, as much as feasible. This includes 32/64-bit interop.

    • Handled by R as long as we rely on natively supported data types, no action needed.
  • Minimize storage redundancy: Try not to double storage requirements if we can get away with less.

    • The storage requirements are reduced compared to 1.4.3 because the auxiliary index vectors are no longer stored, and they are actively discarded when upgrading to 1.5.0.
  • Minimize conversion overhead: Since the plan is to have an associated C-side igraph_t, minimize the conversion overhead (both in time and in code complexity) between the R-side and C-side objects.

    • The complexity is contained in the static restore_pointer() function, which requires a single pass, is cache-efficient, and is small and simple enough.
  • Clearly documented format versioning: Any proposal should explain how formats are versioned and how versions are meant to be interpreted (is 0.10 greater than 0.8?)

    • Unclear: do we switch to integer versions?
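
On the question of how versions such as 0.10 and 0.8 compare, base R gives three different answers depending on whether the values are treated as strings, numbers, or versions. The sketch below uses only `numeric_version()` and `utils::compareVersion()` from base R. It is not igraph code, the variable names are illustrative, and it does not settle whether igraph should switch to integer versions.

```r
# Compared as plain strings, "0.10" sorts before "0.8" (FALSE).
as_string <- "0.10" > "0.8"

# Compared as numbers, 0.10 is smaller than 0.8 (FALSE).
as_number <- 0.10 > 0.8

# Compared component-wise as versions, 0.10 is newer than 0.8 (TRUE).
as_version <- numeric_version("0.10") > numeric_version("0.8")

# compareVersion() returns 1 when the first version is the later one.
cmp <- utils::compareVersion("0.10", "0.8")

# Integer format versions would sidestep the ambiguity (TRUE).
as_integer <- 10L > 8L
```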

krlmlr force-pushed the f-upgrade-version branch from f90bdaf to e241fb5 on June 11, 2023 21:11

krlmlr commented Jun 11, 2023

@szhorvat @ntamas @GroteGnoom @vtraag @Antonov548: I believe the current storage format in main satisfies all requirements, with this PR. To keep things simpler moving forward, we could indeed switch to integer format versions. I checked, graph_version() isn't used anywhere on CRAN except for the Leiden packages. What do you think?

This format also seems robust against future changes. Do you see substantially better alternatives?

The description of this PR also contains the format description. We should think about how to consolidate it.

As a side effect, we now have an image where we can run all old igraph versions. I used Ubuntu 18.04, more work will be needed to base it on 22.04. But we can also freeze it and create a new image based on 22.04 and go back in time as far as is possible without insulting the compiler.

krlmlr commented Jun 11, 2023

For completeness, here is code that shows what changed between the individual versions. It appears that we had only three meaningful changes over the entire history:

  • To 0.2: shorter list (length 11 to 9) with condensed contents
  • To 1.0.0: added environment component
  • To 1.5.0: this PR

options(conflicts.policy = list(warn = FALSE))

library(tidyverse)

source("R/versions.R")
source("R/old-0_1_1.R")
source("R/old-0_2.R")
source("R/old-0_5.R")
source("R/old-0_6.R")
source("R/old-1_0_0.R")
source("R/old-1_5_0.R")

samples <- oldsamples()

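# Make objects from different versions comparable: drop the class,
# turn slot 10 (the environment in newer formats) into a plain list,
# and mask the myid and me entries so they do not show up as differences.
scrub <- function(g) {
  cl <- class(g)
  g <- unclass(g)
  if (length(g) >= 10) {
    g[[10]] <- as.list(g[[10]], all.names = TRUE)
    if (!is.null(g[[10]]$myid)) {
      g[[10]]$myid <- "<myid>"
    }
    if (!is.null(g[[10]]$me)) {
      g[[10]]$me <- "<me>"
    }
  }
  # Keep the original class as a plain attribute so it is still compared.
  attr(g, "klass") <- cl
  g
}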

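# Pair each version with its predecessor and diff the scrubbed objects.
# lead() picks the predecessor here, which assumes that oldsamples()
# returns the newest version first; rev() then puts the oldest
# transition first.
transitions <-
  samples |>
  enframe("version", "result") |>
  mutate(result_scrubbed = map(result, scrub), .keep = "unused") |>
  mutate(
    prev_version = lead(version),
    prev_result_scrubbed = lead(result_scrubbed),
  ) |>
  mutate(compare = map2(prev_result_scrubbed, result_scrubbed, waldo::compare)) |>
  transmute(transition = paste0(prev_version, " -> ", version), compare) |>
  head(-1) |>
  tibble::deframe() |>
  rev()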

names(transitions)
#> [1] "0.1.1 -> 0.2"   "0.2 -> 0.5"     "0.5 -> 0.6"     "0.6 -> 1.0.0"  
#> [5] "1.0.0 -> 1.5.0"

transitions
#> $`0.1.1 -> 0.2`
#> `old` is length 11
#> `new` is length 9
#> 
#> `old[[3]]`: 0 0 1 1 2 2
#> `new[[3]]`: 0 1 2      
#> 
#> `old[[4]]`: 1 2 2 0 0 1
#> `new[[4]]`: 1 2 0      
#> 
#> `old[[5]]`: 1 0 3 2 5 4
#> `new[[5]]`: 0 1 2      
#> 
#> `old[[6]]`: 4 3 5 0 2   1
#> `new[[6]]`:         2 0 1
#> 
#> `old[[7]]`: 0 2 4 6
#> `new[[7]]`: 0 1 2 3
#> 
#> `old[[8]]`: 0 2 4 6
#> `new[[8]]`: 0 1 2 3
#> 
#> `old[[9]]` is length 3
#> `new[[9]]` is length 4
#> 
#> `old[[9]][[1]]`: 1  
#> `new[[9]][[1]]`: 1 0
#> 
#> `old[[9]][[2]]` is a character vector ()
#> `new[[9]][[2]]` is a list
#> 
#> And 4 more differences ...
#> 
#> $`0.2 -> 0.5`
#> `names(old[[9]][[2]])` is absent
#> `names(new[[9]][[2]])` is a character vector ()
#> 
#> $`0.5 -> 0.6`
#> `old[[9]][[1]]`: 1 0  
#> `new[[9]][[1]]`: 1 0 1
#> 
#> `old[[9]][[2]]` is length 0
#> `new[[9]][[2]]` is length 3
#> 
#> `names(old[[9]][[2]])`:                           
#> `names(new[[9]][[2]])`: "name" "mutual" "circular"
#> 
#> `old[[9]][[2]]$name` is absent
#> `new[[9]][[2]]$name` is a character vector ('Ring graph')
#> 
#> `old[[9]][[2]]$mutual` is absent
#> `new[[9]][[2]]$mutual` is a logical vector (FALSE)
#> 
#> `old[[9]][[2]]$circular` is absent
#> `new[[9]][[2]]$circular` is a logical vector (TRUE)
#> 
#> $`0.6 -> 1.0.0`
#> `old` is length 9
#> `new` is length 10
#> 
#> `old[[10]]` is absent
#> `new[[10]]` is a list
#> 
#> $`1.0.0 -> 1.5.0`
#> `old[[5]]` is a double vector (0, 1, 2)
#> `new[[5]]` is NULL
#> 
#> `old[[6]]` is a double vector (2, 0, 1)
#> `new[[6]]` is NULL
#> 
#> `old[[7]]` is a double vector (0, 1, 2, 3)
#> `new[[7]]` is NULL
#> 
#> `old[[8]]` is a double vector (0, 1, 2, 3)
#> `new[[8]]` is NULL
#> 
#> `old[[10]]$.__igraph_version__.`: "0.8.0"
#> `new[[10]]$.__igraph_version__.`: "1.5.0"

Created on 2023-06-11 with reprex v2.0.2
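
The last difference above shows the version marker living inside the environment in slot 10. Environments have reference semantics in R, so a marker stored there can be updated without modifying the list that holds it, which is the property an in-place upgrade relies on. The sketch below illustrates this with base R only. `mock`, `alias`, and `bump_version()` are hypothetical stand-ins, and this is not the implementation of upgrade_graph().

```r
# Hypothetical stand-in for an old object: slot 10 holds an environment
# carrying the version marker seen in the diff above.
env <- new.env(parent = emptyenv())
assign(".__igraph_version__.", "0.8.0", envir = env)
mock <- vector("list", 10)
mock[[10]] <- env

# Hypothetical helper: updates the marker through the environment.
# The list passed in is neither modified nor returned.
bump_version <- function(g, version = "1.5.0") {
  assign(".__igraph_version__.", version, envir = g[[10]])
  invisible(NULL)
}

alias <- mock  # a second list that refers to the same environment
bump_version(mock)

# Both lists see the new marker because they share one environment.
marker_mock <- get(".__igraph_version__.", envir = mock[[10]])
marker_alias <- get(".__igraph_version__.", envir = alias[[10]])
same_env <- identical(mock[[10]], alias[[10]])
```

Because the update happens through the environment, callers do not need to reassign the result for the new marker to take effect.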

krlmlr merged commit 2af499a into main on Jun 15, 2023
krlmlr deleted the f-upgrade-version branch on June 15, 2023 05:15
github-actions bot locked as resolved and limited conversation to collaborators on Jun 15, 2024