Update Shapley.R#150

Open
TomasZdrazil wants to merge 1 commit into giuseppec:main from TomasZdrazil:patch-1

Conversation

@TomasZdrazil

When y.hat.diff$feature.value is created, the code takes the colnames of x.interest and simply appends them as another column. However, the order of the colnames of x.interest may differ from the order of the same features in y.hat.diff$feature, so a lookup by feature name (like a spreadsheet VLOOKUP) is needed instead of a positional append. For this, auxiliaryTab is created from the feature names in x.interest, and the merge function is then used to assign the correct feature.value to the corresponding feature.
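
A minimal sketch of the difference between a positional append and a lookup by name. Only the names y.hat.diff, x.interest, auxiliaryTab, feature and feature.value come from the description above; the feature names, the numbers and the "name=value" label format are assumptions made for illustration, and this is not the code from Shapley.R.

```r
# Hypothetical stand-in for y.hat.diff: one row per feature, in training-data order.
y.hat.diff <- data.frame(
  feature = c("age", "income", "height"),
  phi = c(0.5, -1.2, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical instance to explain, with its columns in a different order.
x.interest <- data.frame(height = 180, age = 42, income = 55000)

# "name=value" labels, built in the column order of x.interest (assumed format).
labels <- paste(colnames(x.interest), unlist(x.interest[1, ]), sep = "=")

# Positional append: labels land next to the wrong feature names.
naive <- y.hat.diff
naive$feature.value <- labels

# Lookup by name: build a small table keyed by feature, then merge on that key.
auxiliaryTab <- data.frame(
  feature = colnames(x.interest),
  feature.value = labels,
  stringsAsFactors = FALSE
)
fixed <- merge(y.hat.diff, auxiliaryTab, by = "feature")

print(naive)
print(fixed)
```

Note that merge() may return the rows in a different order than y.hat.diff. If the original row order matters, indexing with match(y.hat.diff$feature, auxiliaryTab$feature) is an alternative that keeps it.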

@christophM
Collaborator

Thanks for this pull request.
The tests don't pass: it seems that the Shapley values now don't add up to the difference in the test, as they should.

@TomasZdrazil
Author

Hi, thank you for your comment. I checked, and it seems that the Shapley values don't add up to the difference even by default when running iml_0.10.1. Might that be an issue in your package? I'm not sure. The code I used to test this is attached. The R session info:
R version 3.6.1 (2019-07-05), Platform: x86_64-w64-mingw32/x64 (64-bit), Running under: Windows 10 x64 (build 18362)

ShapleySumTest.txt

@christophM
Collaborator

It does add up, but only in expectation, meaning that when you increase the sample.size in Shapley$new, you will get closer to the difference.

The test for Shapley to add up can be found here: https://github.com/christophM/iml/blob/master/tests/testthat/test-Shapley.R
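
The "adds up only in expectation" behaviour can be sketched with a small permutation-sampling estimator in base R. This is an illustrative toy, not iml's implementation: the data X, the model f, the instance x and the helper shapley_mc are all hypothetical. Each feature is estimated from its own random draws, so the sum of the estimates only approximates the difference between the prediction for x and the average prediction.

```r
set.seed(1)

# Toy background data and a toy nonlinear "model" (both hypothetical).
n <- 500
X <- cbind(a = rnorm(n), b = rnorm(n), c = rnorm(n))
f <- function(v) unname(v[1] * v[2] + 2 * v[3])
x <- c(a = 1, b = 2, c = 1)  # instance to explain

# Monte Carlo Shapley estimate: for each feature j, average the change in
# prediction when j is switched from a random background row to x, given
# the features that precede j in a random permutation.
shapley_mc <- function(f, X, x, sample.size) {
  p <- ncol(X)
  sapply(seq_len(p), function(j) {
    contrib <- numeric(sample.size)
    for (m in seq_len(sample.size)) {
      z <- X[sample.int(nrow(X), 1), ]               # random background row
      perm <- sample.int(p)                          # random feature order
      before <- perm[seq_len(which(perm == j) - 1)]  # features ahead of j
      with_j <- z
      with_j[c(before, j)] <- x[c(before, j)]
      without_j <- z
      without_j[before] <- x[before]
      contrib[m] <- f(with_j) - f(without_j)
    }
    mean(contrib)
  })
}

# Quantity the Shapley values should sum to: prediction minus average prediction.
target <- f(x) - mean(apply(X, 1, f))

gap_small <- abs(sum(shapley_mc(f, X, x, sample.size = 10)) - target)
gap_large <- abs(sum(shapley_mc(f, X, x, sample.size = 5000)) - target)
print(c(target = target, gap_small = gap_small, gap_large = gap_large))
```

A single small-sample run can land close to the target by chance, so the comparison is about typical behaviour rather than a guarantee for every seed.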

@TomasZdrazil
Author

Thanks, I will have to look into that more deeply: for my data they do not add up and the gap is quite big. The actual difference is more than twice the sum of the Shapley values, with sample.size = 3000.

Anyway, this request aimed to tackle a different issue: when the order of columns in the training data (predictor$data$X) is not the same as in the record to explain (x.interest), the result is misleading. The columns feature and feature.value in the table shapley$results then disagree, i.e. within a single row the feature named in feature is not the one named in feature.value. This leads, for example, to a wrong plot: shapley$plot uses feature.value as the label, so the phi values are shown for the wrong feature. Attached is a script demonstrating this issue.

ShapleyColsOrderTest.txt

Can you confirm this behaviour? I guess the workaround is to order the columns manually for both datasets before running the Shapley values analysis, but I thought it would be more elegant to have this implemented in the function directly, as a user may not know about this requirement.
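
A minimal sketch of the manual workaround, assuming both tables contain exactly the same feature names. Here X is a hypothetical stand-in for predictor$data$X, and the feature names and values are made up.

```r
# Hypothetical training data (stand-in for predictor$data$X).
X <- data.frame(
  age = c(30, 50),
  income = c(40000, 60000),
  height = c(170, 185)
)

# Hypothetical record to explain, with the same columns in a different order.
x.interest <- data.frame(height = 180, age = 42, income = 55000)

# Fail early if the two tables do not share the same set of feature names.
stopifnot(setequal(colnames(x.interest), colnames(X)))

# Reorder the columns of x.interest by name to match the training data.
x.interest <- x.interest[, colnames(X), drop = FALSE]

print(colnames(x.interest))
```

Selecting by name rather than by position means the values travel with their column names, so the reordered x.interest can then be passed on in place of the original.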
