
Integrate subissues and agent handling #292

Open
Leo-Send wants to merge 2 commits into se-sic:dev from Leo-Send:dev


@Leo-Send
Contributor

Subissues are now edges in an issue network, agents can be filtered

Prerequisites

  • I adhere to the coding conventions (described here) in my code.
  • I have updated the copyright headers of the files I have modified.
  • I have written appropriate commit messages, i.e., I have recorded the goal, the need, the needed changes, and the location of my code modifications for each commit. This also includes, e.g., referencing relevant issues.
  • I have put signed-off tags in all commits.
  • I have updated the changelog file NEWS.md appropriately.
  • I have checked whether I need to adjust the showcase file showcase.R with respect to my changes.
  • The pull request is opened against the branch dev.

Description

Changelog

Subissues are now edges in an issue network, agents can be filtered

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>
@Leo-Send
Contributor Author

For now, this allows us to use the subissue data for network construction and to filter agents analogously to bots. I will check whether there are other places where bots get special treatment and see whether we also want to include agents there.


Copilot AI left a comment


Pull request overview

This PR extends coronet’s issue/network and filtering capabilities by (1) treating sub-issues as edges in the issue artifact network and (2) adding first-class handling for “agents” alongside bots, including a new configuration flag to filter them from datasets.

Changes:

  • Add parsing/handling intended to turn sub-issue relationships into issue-network edges.
  • Introduce filter.agents configuration and propagate it through commits/issues/mails filtering.
  • Extend bot/author metadata to include is.agent.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| util-read.R | Updates issue parsing and expands bot/author metadata with is.agent. |
| util-networks.R | Adds construction of issue-network edges for sub-issue events. |
| util-data.R | Wires new filter.agents flag through ProjectData filtering and adds filter.agents() implementation. |
| util-conf.R | Adds new filter.agents ProjectConf attribute. |
Comments suppressed due to low confidence (1)

util-read.R:476

  • The roxygen for read.bot.info() says it returns a boolean for whether an author is a bot, but the data now includes is.agent as well. Please update the function documentation to reflect the new column so users know agents are supported.
#' Read the bot classification from the 'bots.list' file.
#'
#' @param data.path the path to the commit-messages list
#'
#' @return a data frame with author.name, author.email, and a (potentially NA) boolean whether this is a bot,
#'         or \code{NULL} if the above file is not present.
read.bot.info = function(data.path) {


util-networks.R Outdated
artifacts.net.data.raw$event.info.2 == "issue", ]

components = artifacts.net.data.raw[artifacts.net.data.raw$event.name == "sub_issue_added",]
unique(components, by = 1)
util-networks.R Outdated
}))
return(edges)
}))
edge.list = rbind(edge.list, component.edges)
Comment on lines +828 to +832
edges = plyr::rbind.fill(lapply(unlist(from$issue.components[[1]]), function(target.issue) {
edge = list("from" = from[["issue.id"]], "to" = target.issue)
edge = cbind(edge, edge.attrs, row.names = NULL)
return(edge)
}))
Contributor Author


I am unsure whether this is necessary - in my mind, this case would mean that the issue data is faulty and we want to throw the error. Except maybe this could happen during network splitting, if vertices are removed there.
I'll add this fix for now, since it ensures that you can always build these networks.

Collaborator


Can you check whether we do this check for the previously existing "add_link" or "referenced_by"? If we don't do that there, it would be inconsistent to do that here.

Contributor Author


During the construction of the edge list using add_link and referenced_by events, all events not targeting an existing issue are implicitly discarded. So while there is no separate check, it should also not be possible for the network construction to throw an error because of faulty data.
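To illustrate the point about implicit discarding, here is a minimal sketch with toy data. It is not the actual code in util-networks.R: the data frames, the column names `from` and `to`, and the issue-ID strings are assumptions made for this example. It only shows how restricting edges to known issues keeps dangling references out of the edge list without a dedicated check.

```r
## Toy data: three known issues and three link events (all values hypothetical).
issues = data.frame(
    issue.id = c("<issue-github-1>", "<issue-github-2>", "<issue-github-3>"),
    stringsAsFactors = FALSE
)
events = data.frame(
    from = c("<issue-github-1>", "<issue-github-2>", "<issue-github-3>"),
    to   = c("<issue-github-2>", "<issue-github-99>", "<issue-github-1>"),
    stringsAsFactors = FALSE
)

## Keep only edges whose endpoints are both known issues. The event pointing to
## the unknown issue 99 is silently dropped, so it can never cause an error when
## the network is built from 'edge.list' afterwards.
known = events[["from"]] %in% issues[["issue.id"]] &
    events[["to"]] %in% issues[["issue.id"]]
edge.list = events[known, , drop = FALSE]
```

If sub-issue edges went through a comparable restriction, faulty or split data would be handled the same way for all three event types.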

Comment on lines +1771 to +1785
#' Filter agents from given data.
#'
#' @param data.to.filter A data frame, with the standard author columns,
#' from which all rows with agent authors are removed
#'
#' @return the filtered data
filter.agents = function(data.to.filter) {
authors = self$get.authors()
## authors are uniquely identified by their email, so checking this here is sufficient
agent.indices = authors[match(data.to.filter[["author.email"]],
authors[["author.email"]]), "is.agent"]
## retain if entry is FALSE or NA
agent.indices = !agent.indices | is.na(agent.indices)
return(data.to.filter[agent.indices,])
},
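The matching logic of filter.agents() can be tried outside the class with a small standalone sketch. The author and commit data below are invented for illustration, and `authors` stands in for the result of `self$get.authors()`. The point is that rows are kept when `is.agent` is FALSE or NA, including rows whose email address does not appear in the author list at all.

```r
## Hypothetical author list; 'is.agent' may be NA when no classification exists.
authors = data.frame(
    author.name  = c("Alice", "review-agent", "Bob"),
    author.email = c("alice@example.org", "agent@example.org", "bob@example.org"),
    is.agent     = c(FALSE, TRUE, NA),
    stringsAsFactors = FALSE
)

## Hypothetical data to filter; the last email address is not in 'authors'.
commits = data.frame(
    hash         = c("a1", "b2", "c3", "d4"),
    author.email = c("alice@example.org", "agent@example.org",
                     "bob@example.org", "unknown@example.org"),
    stringsAsFactors = FALSE
)

## Look up the agent flag per row via the email address (NA if there is no match).
is.agent = authors[match(commits[["author.email"]], authors[["author.email"]]), "is.agent"]

## Retain rows whose flag is FALSE or NA, i.e., drop only confirmed agents.
keep = !is.agent | is.na(is.agent)
filtered = commits[keep, , drop = FALSE]
```

In this toy data, only the commit authored by `agent@example.org` is removed; unclassified and unknown authors are retained.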
Comment on lines +400 to +406
issue.data[["issue.components"]] = lapply(issue.data[["issue.components"]], function(numbers) {
num.vector = unlist(numbers)
ids = lapply(num.vector, function(id) {
return(sprintf(ISSUE.ID.FORMAT, "github", id))
})
return(ids)
})
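The conversion above can be checked in isolation with a short sketch. The value assigned to `ISSUE.ID.FORMAT` here is an assumption for illustration only (the real constant is defined elsewhere in the code base), and the input list is made up. It shows that each list of sub-issue numbers becomes a list of formatted issue IDs and that an empty entry stays empty.

```r
## Assumed format string for this example only; not taken from the pull request.
ISSUE.ID.FORMAT = "<issue-%s-%s>"

## Hypothetical 'issue.components' column: one list of sub-issue numbers per issue.
issue.components = list(list(12, 15), list(), list(7))

converted = lapply(issue.components, function(numbers) {
    ## flatten the nested list into a plain vector (NULL for an empty entry)
    num.vector = unlist(numbers)
    ## format every sub-issue number as a GitHub issue ID
    ids = lapply(num.vector, function(id) {
        return(sprintf(ISSUE.ID.FORMAT, "github", id))
    })
    return(ids)
})
```

Issues without sub-issues end up with an empty list, so later code that iterates over the components simply produces no edges for them.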
Comment on lines 509 to 520
  ## column names of a dataframe containing authors (see file 'authors.list' and function \code{read.authors})
  AUTHORS.LIST.COLUMNS = c(
-     "author.id", "author.name", "author.email", "is.bot"
+     "author.id", "author.name", "author.email", "is.bot", "is.agent"
  )

  ## column names of a dataframe containing authors, before adding bot data.
  AUTHORS.LIST.COLUMNS.WITHOUT.BOTS = AUTHORS.LIST.COLUMNS[1:3]

  ## declare the datatype for each column in the constant 'AUTHORS.LIST.COLUMNS'
  AUTHORS.LIST.DATA.TYPES = c(
-     "character", "character", "character", "logical"
+     "character", "character", "character", "logical", "logical"
  )
Also fix issue with empty bots.list and add documentation to readme

Signed-off-by: Leo Sendelbach <s8lesend@stud.uni-saarland.de>