Subissues are now edges in an issue network, agents can be filtered

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
For now, this allows us to use the sub-issue data for network construction and to filter agents analogously to bots. I will check whether there are other places where bots get special treatment and decide whether we also want to include agents there.
Pull request overview
This PR extends coronet’s issue/network and filtering capabilities by (1) treating sub-issues as edges in the issue artifact network and (2) adding first-class handling for “agents” alongside bots, including a new configuration flag to filter them from datasets.
Changes:
- Add parsing/handling intended to turn sub-issue relationships into issue-network edges.
- Introduce `filter.agents` configuration and propagate it through commits/issues/mails filtering.
- Extend bot/author metadata to include `is.agent`.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| util-read.R | Updates issue parsing and expands bot/author metadata with `is.agent`. |
| util-networks.R | Adds construction of issue-network edges for sub-issue events. |
| util-data.R | Wires the new `filter.agents` flag through ProjectData filtering and adds a `filter.agents()` implementation. |
| util-conf.R | Adds the new `filter.agents` ProjectConf attribute. |
Comments suppressed due to low confidence (1)
util-read.R:476
- The roxygen for `read.bot.info()` says it returns a boolean for whether an author is a bot, but the data now includes `is.agent` as well. Please update the function documentation to reflect the new column so users know agents are supported.
```r
#' Read the bot classification from the 'bots.list' file.
#'
#' @param data.path the path to the commit-messages list
#'
#' @return a data frame with author.name, author.email, and a (potentially NA) boolean whether this is a bot,
#'         or \code{NULL} if the above file is not present.
read.bot.info = function(data.path) {
```
util-networks.R
```r
    artifacts.net.data.raw$event.info.2 == "issue", ]

components = artifacts.net.data.raw[artifacts.net.data.raw$event.name == "sub_issue_added", ]
unique(components, by = 1)
```
util-networks.R
```r
    }))
    return(edges)
}))
edge.list = rbind(edge.list, component.edges)
```

```r
edges = plyr::rbind.fill(lapply(unlist(from$issue.components[[1]]), function(target.issue) {
    edge = list("from" = from[["issue.id"]], "to" = target.issue)
    edge = cbind(edge, edge.attrs, row.names = NULL)
    return(edge)
}))
```
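The construction above walks over each issue's `issue.components` and emits one edge per sub-issue. A minimal base-R sketch of the same idea, assuming a plain data frame with a character column `issue.id` and a list column `issue.components`; `build.sub.issue.edges()` and the `relation` column are hypothetical and do not reproduce coronet's edge-attribute handling:

```r
## Hypothetical helper (not coronet's API): build one edge per sub-issue relation.
## `issues` is assumed to have a character column `issue.id` and a list column
## `issue.components` holding the sub-issue IDs of each issue.
build.sub.issue.edges = function(issues) {
    edges = lapply(seq_len(nrow(issues)), function(i) {
        targets = unlist(issues[["issue.components"]][[i]])
        ## issues without sub-issues contribute no edges
        if (length(targets) == 0) {
            return(NULL)
        }
        data.frame(from = issues[["issue.id"]][i], to = targets,
                   relation = "sub_issue", stringsAsFactors = FALSE)
    })
    edges = Filter(Negate(is.null), edges)
    ## return an empty, correctly typed edge list if there are no sub-issues at all
    if (length(edges) == 0) {
        return(data.frame(from = character(0), to = character(0),
                          relation = character(0), stringsAsFactors = FALSE))
    }
    do.call(rbind, edges)
}

## Usage with illustrative issue IDs; I() keeps the nested lists as one list column
issues = data.frame(issue.id = c("<issue-github-1>", "<issue-github-2>", "<issue-github-3>"),
                    issue.components = I(list(list("<issue-github-2>", "<issue-github-3>"),
                                              list(), list())),
                    stringsAsFactors = FALSE)
edges = build.sub.issue.edges(issues)
```

Returning an empty but typed data frame keeps a later `rbind()` with the other edge types from failing when no issue has sub-issues.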
I am unsure whether this is necessary: in my mind, this case would mean that the issue data is faulty, in which case we want to throw the error. It could, however, happen during network splitting if vertices are removed there.
I'll add this fix for now, since it ensures that you can always build these networks.
Can you check whether we do this check for the previously existing "add_link" or "referenced_by" events? If we don't do it there, it would be inconsistent to do it here.
During the construction of the edge list using add_link and referenced_by events, all events not targeting an existing issue are implicitly discarded. So while there is no separate check, it should also not be possible for the network construction to throw an error because of faulty data.
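If the sub-issue edges should behave like the `add_link` and `referenced_by` edges described above, the edges pointing at unknown issues could be dropped explicitly before the graph is built (igraph's `graph_from_data_frame()`, for example, raises an error when an edge names a vertex missing from the supplied vertex data frame). A minimal sketch, assuming a plain edge data frame with `from`/`to` columns and a vector of known issue IDs; `drop.dangling.edges()` is a hypothetical helper, not part of coronet:

```r
## Hypothetical helper (not part of coronet): keep only edges whose endpoints
## are both known issue IDs, so no edge refers to a missing vertex
drop.dangling.edges = function(edges, issue.ids) {
    keep = edges[["from"]] %in% issue.ids & edges[["to"]] %in% issue.ids
    edges[keep, , drop = FALSE]
}

## Usage: the second edge targets an issue that is not in the data
edges = data.frame(from = c("<issue-github-1>", "<issue-github-1>"),
                   to = c("<issue-github-2>", "<issue-github-9>"),
                   stringsAsFactors = FALSE)
known.issues = c("<issue-github-1>", "<issue-github-2>")
kept = drop.dangling.edges(edges, known.issues)
```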
```r
#' Filter agents from given data.
#'
#' @param data.to.filter A data frame, with the standard author columns,
#'                       from which all rows with agent authors are removed
#'
#' @return the filtered data
filter.agents = function(data.to.filter) {
    authors = self$get.authors()
    ## authors are uniquely identified by their email, so checking this here is sufficient
    agent.indices = authors[match(data.to.filter[["author.email"]],
                                  authors[["author.email"]]), "is.agent"]
    ## retain if entry is FALSE or NA
    agent.indices = !agent.indices | is.na(agent.indices)
    return(data.to.filter[agent.indices, ])
},
```
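To see what `filter.agents()` keeps and drops, here is a standalone sketch of the same lookup outside the R6 class. It assumes `authors` is a plain `data.frame` standing in for `self$get.authors()`; `filter.agents.sketch()` and the example data are hypothetical:

```r
## Standalone sketch of the filter.agents() logic; `authors` stands in for
## self$get.authors() and is assumed to be a plain data.frame
filter.agents.sketch = function(data.to.filter, authors) {
    ## look up each row's author by email (NA if the author is unknown)
    is.agent = authors[match(data.to.filter[["author.email"]], authors[["author.email"]]),
                       "is.agent"]
    ## keep rows whose author is not an agent or whose agent status is unknown
    keep = !is.agent | is.na(is.agent)
    data.to.filter[keep, , drop = FALSE]
}

authors = data.frame(author.email = c("dev@example.org", "bot@example.org", "agent@example.org"),
                     is.agent = c(FALSE, NA, TRUE), stringsAsFactors = FALSE)
commits = data.frame(hash = c("a1", "b2", "c3", "d4"),
                     author.email = c("dev@example.org", "agent@example.org",
                                      "bot@example.org", "unknown@example.org"),
                     stringsAsFactors = FALSE)
filtered = filter.agents.sketch(commits, authors)
```

Only the commit by `agent@example.org` is removed: an `NA` flag (unclassified author) and an author missing from the author list are both retained, matching the "retain if entry is FALSE or NA" comment in the method.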
```r
issue.data[["issue.components"]] = lapply(issue.data[["issue.components"]], function(numbers) {
    num.vector = unlist(numbers)
    ids = lapply(num.vector, function(id) {
        return(sprintf(ISSUE.ID.FORMAT, "github", id))
    })
    return(ids)
})
```
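Because `sprintf()` is vectorized, the inner `lapply()` in the parsing code could be collapsed into a single call per issue. A sketch under the assumption of a made-up format string `ISSUE.ID.FORMAT.EXAMPLE` (coronet defines its own `ISSUE.ID.FORMAT`, whose value is not shown here); `format.issue.components()` is hypothetical:

```r
## Hypothetical format string for illustration; coronet defines its own ISSUE.ID.FORMAT
ISSUE.ID.FORMAT.EXAMPLE = "<issue-%s-%s>"

## Vectorized variant: one character vector of formatted IDs per issue
format.issue.components = function(components) {
    lapply(components, function(numbers) {
        ## sprintf() formats all sub-issue numbers at once; an empty
        ## input (no sub-issues) yields character(0)
        sprintf(ISSUE.ID.FORMAT.EXAMPLE, "github", unlist(numbers))
    })
}

components = list(list(12, 15), list(), list(7))
formatted = format.issue.components(components)
```

This returns character vectors instead of nested lists; since `unlist()` yields the same IDs for either shape, code that calls `unlist(from$issue.components[[1]])` should see the same values.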
```diff
 ## column names of a dataframe containing authors (see file 'authors.list' and function \code{read.authors})
 AUTHORS.LIST.COLUMNS = c(
-    "author.id", "author.name", "author.email", "is.bot"
+    "author.id", "author.name", "author.email", "is.bot", "is.agent"
 )

 ## column names of a dataframe containing authors, before adding bot data.
 AUTHORS.LIST.COLUMNS.WITHOUT.BOTS = AUTHORS.LIST.COLUMNS[1:3]

 ## declare the datatype for each column in the constant 'AUTHORS.LIST.COLUMNS'
 AUTHORS.LIST.DATA.TYPES = c(
-    "character", "character", "character", "logical"
+    "character", "character", "character", "logical", "logical"
 )
```
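With `is.agent` appended after `is.bot`, `AUTHORS.LIST.COLUMNS[1:3]` still selects exactly the columns that exist before bot data is added. A sketch of how the bot and agent flags could be joined onto the author list by email, assuming plain data frames; `add.bot.and.agent.info()` is hypothetical and not coronet's actual reading code:

```r
## Column constants as in the diff
AUTHORS.LIST.COLUMNS = c("author.id", "author.name", "author.email", "is.bot", "is.agent")
AUTHORS.LIST.COLUMNS.WITHOUT.BOTS = AUTHORS.LIST.COLUMNS[1:3]

## Hypothetical helper: left-join bot/agent flags onto the author list by email;
## authors without a classification get NA in both flag columns
add.bot.and.agent.info = function(authors, bots) {
    merged = merge(authors[, AUTHORS.LIST.COLUMNS.WITHOUT.BOTS],
                   bots[, c("author.email", "is.bot", "is.agent")],
                   by = "author.email", all.x = TRUE, sort = FALSE)
    ## merge() puts the key column first, so restore the canonical column order
    merged[, AUTHORS.LIST.COLUMNS]
}

authors = data.frame(author.id = c("1", "2"), author.name = c("Alice", "Build Bot"),
                     author.email = c("alice@example.org", "bot@example.org"),
                     stringsAsFactors = FALSE)
bots = data.frame(author.email = "bot@example.org", is.bot = TRUE, is.agent = FALSE,
                  stringsAsFactors = FALSE)
merged.authors = add.bot.and.agent.info(authors, bots)
## merge() does not guarantee row order, so sort by ID for display
merged.authors = merged.authors[order(merged.authors$author.id), ]
```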
Also fix issue with empty bots.list and add documentation to README

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
Subissues are now edges in an issue network, agents can be filtered
Prerequisites
showcase.R with respect to my changes.
dev.

Description
Changelog