-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement splitting of simplified networks #278
base: dev
Are you sure you want to change the base?
Conversation
Hey, I implemented splitting of simplified networks / edges as discussed in our meeting(s). However, there is one case that we should discuss, and I figured that this is a good place to do so. I would be interested in all of your opinions so we can find the best solution. How should we deal with numeric attributes on simplified edges?Lets say we have the Here is the catch though, how do we remove the attribute exactly? If we split and have a bin that has some edges that did not have to be separated (i.e., because they were not simplified in the beginning or because all partial edges belong to the same bin) then these edges all have a valid value of the If we remove it entirely for the sake of not keeping a potentially really useless attribute around, do we also remove it from all other bins? Discussion with @hechtlC revealed that in practice it seems unlikely that even a single bin would only have complete (non-partial) edges, so when we go the step of removing it from all edges in a bin, its not so far fetched to say, that we remove it from all bins altogether. And finally: What do we do with the |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## dev #278 +/- ##
==========================================
+ Coverage 81.69% 81.89% +0.20%
==========================================
Files 16 16
Lines 5146 5226 +80
==========================================
+ Hits 4204 4280 +76
- Misses 942 946 +4 ☔ View full report in Codecov by Sentry. |
Having partial information is as valuable as having no information at all. Thus, we should remove it from all edges. Consistency should take precedence over partial information. (Especially as we don't lose any information here, since the entire network still is present unless the user deliberately deletes the global network from memory.)
Yes, same reason as above: Consistency should take precedence over partial information. If we remove it from one of the networks that result from splitting, then I (as a user) would expect that all splits contain the same attributes, that is, that it is removed from all networks. I totally agree with @hechtlC here.
Could you come up with an example where we would get bullshit values? I can't think of such an example right now. But if you can come up with any, we can discuss it.
According to our corresponding section in our README.md, |
Okay, im fine with removing the attribute altogether 👍. Regarding the If |
@maxloeffler Would you like us to review this PR already, or do you plan to add additional commits or make changes to the existing commits first? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this challenging PR @maxloeffler.
While I basically agree with all your implementation, I have a couple of questions here and there (mostly regarding comments, but also regarding improving code readability). See further details in the individual comments below.
In addition, I would like you double-check whether the function get.data.sources.from.relations
still works in every conceivable situation, after the changes that we have made to the handling of the relation
attribute.
## Track whether to remove numeric edge attributes after splitting. | ||
## Numeric attributes cannot be correctly separated when they | ||
## appear in multi-partial edges removing their semantic meaning | ||
numeric.attrs = edge.attr.names[EDGE.ATTR.HANDLING[edge.attr.names] == "sum" & edge.attr.names != "weight"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I wonder whether there could also be other handling functions for numeric attributes than "sum"
.
Imagine that, in distant future, we would add a new numeric attribute that uses a different handling function. Nobody would think about updating this code here. Of course, we cannot anticipate any weird customized handling function one could come up with in the future, but we should make sure that this works, at least, for the default options provided by igraph
. If we cannot come up with a general solution, it might be helpful to add a subtle comment to the definition of EDGE.ATTR.HANDLING
that reminds one to check this condition when adding new weird stuff...
The attributes assigned to edges in simplified networks (potentially) contain multiple values for a single edge. Previously, splitting such networks was not supported as it is not obvious how to deal with such edges. Now, we support splitting simplified networks. An edge that comprises of multiple source-edges before simplification should be partitioned again and its partials distributed over the corresponding bins. Singular edges can be handled as prior. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
When separating simplified edges during network splitting, it is not possible to determine which portion of the a numeric edge attribute belongs to which partial edge. Therefore, they lose their semantics in this case and should be removed. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Keeping all values of the 'relation' edge attribute when simplifying enables a proper separation of the attribute when splitting simplified networks. This change is not in contradiction with se-sic#251 (in which the previous simplification strategy was proposed). Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
As discussed in PR#278, removing numeric attributes from some splits while keeping them for others is not benefitial. Hence, we remove them from all splits as soon as one contains non-complete edges. Furthermore, the 'weight' attribute is mandatory and can therefore not be removed. Now, we assume 'weight' to always be equal to the amount of partials that comprise an edge. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
When splifying a network without the 'simplify.multiple.relations' options set, we must still consider that input networks may already contain multi-relational edges (as shown in 'showcase.R' when the sample network is simpified again). In our recent meetings we concluded that edges with a 'mail' relation can be simplified together with edges that have a list('mail', 'mail') relation but not with edges that have a list('mail', 'cochange') relation. Further, we must ensure that any multi-relational edges can still be simplified together with multi-relational edges that have the same relation mix. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
The previously used "concat" strategy does not work flawlessly with list edge attributes. When simplifying already simplified edges, i.e., edges that have list attributes longer than one, using the "concat" strategy, the result is a top-level list containing sublists from the merged attribute values. Instead, we want a single top-level list that contains all attribute values from the edges that comprise it. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
date = I(as.list(get.date.from.string(c("2016-07-12 15:58:59 UTC", "2016-07-12 16:06:10 UTC", | ||
"2016-07-12 16:00:45 UTC", "2016-07-12 16:05:41 UTC", | ||
"2016-07-12 16:06:32 UTC", "2016-07-12 16:06:32 UTC", | ||
"2013-05-05 21:46:30 UTC", "2013-05-05 21:49:21 UTC", | ||
"2013-05-05 21:49:34 UTC", "2013-05-06 01:04:34 UTC", | ||
"2013-05-25 03:48:41 UTC", "2013-05-25 04:08:07 UTC", | ||
"2016-07-12 14:59:25 UTC", "2016-07-12 16:02:30 UTC", | ||
"2016-07-12 16:06:01 UTC", "2016-07-15 19:55:39 UTC", | ||
"2017-05-23 12:32:39 UTC", "2016-07-12 15:59:59 UTC", | ||
"2016-07-15 20:07:47 UTC", "2016-07-27 20:12:08 UTC", | ||
"2016-07-28 06:27:52 UTC", "2013-05-25 03:25:06 UTC", | ||
"2013-05-25 06:06:53 UTC", "2013-05-25 06:22:23 UTC", | ||
"2013-06-01 06:50:26 UTC", "2016-07-12 16:01:01 UTC", | ||
"2016-07-12 16:02:02 UTC", "2013-04-21 23:52:09 UTC", | ||
"2016-07-12 15:59:25 UTC", "2016-07-12 16:03:59 UTC")))), | ||
artifact.type = I(c(as.list(rep("Feature", 6)), as.list(rep("IssueEvent", 24)))), | ||
date = get.date.from.string(c("2016-07-12 15:58:59", "2016-07-12 16:06:10", | ||
"2016-07-12 16:00:45", "2016-07-12 16:05:41", | ||
"2016-07-12 16:06:32", "2016-07-12 16:06:32", | ||
"2013-05-05 21:46:30", "2013-05-05 21:49:21", | ||
"2013-05-05 21:49:34", "2013-05-06 01:04:34", | ||
"2013-05-25 03:48:41", "2013-05-25 04:08:07", | ||
"2016-07-12 14:59:25", "2016-07-12 16:02:30", | ||
"2016-07-12 16:06:01", "2016-07-15 19:55:39", | ||
"2017-05-23 12:32:39", "2016-07-12 15:59:59", | ||
"2016-07-15 20:07:47", "2016-07-27 20:12:08", | ||
"2016-07-28 06:27:52", "2013-05-25 03:25:06", | ||
"2013-05-25 06:06:53", "2013-05-25 06:22:23", | ||
"2013-06-01 06:50:26", "2016-07-12 16:01:01", | ||
"2016-07-12 16:02:02", "2013-04-21 23:52:09", | ||
"2016-07-12 15:59:25", "2016-07-12 16:03:59")), | ||
artifact.type = c(rep("Feature", 6), rep("IssueEvent", 24)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have an idea why the timezone was included before and why you did change that? I am just curious...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don‘t know why it was there. It was just a check for my curiosity to see whether it makes a difference (it doesn’t), and then I left it in the commit because why not. I can revert it though 😅
When splitting multi-partial edges, we only need to remove numeric attributes (from all edges in all bins) when there is at least one edge that has partials that belong to different bins. When for all edges all partials belong to the same bin, we can keep the numeric attributes. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Nomenclature: An "edge attribute" refers to the values of an attribute over all edges of a network, while an "edge attribute value" refers to the value of an edge attribute of a single edge. Previously we decided that edge attributes by default are of list type, yet we did not specify of which type edge attribute values should be. This is especially relevant when an edge attribute value consists of multiple values, i.e., in simplified edges, and the interplay of simplified edges and non-simplified edges when simplifying repeatedly. By default, simplification with the "concat" strategy wraps the values of all source edges into a list. Therefore, we decided that not only edge attributes should be lists, but also edge attribute values. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8mloef@stud.uni-saarland.de>
Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
The 'date' edge attribute is of list type by default an additional check is therefore not necessary. Signed-off-by: Maximilian Löffler <s8maloef@stud.uni-saarland.de>
Regarding the last open points for discussion:
|
Prerequisites
showcase.R
with respect to my changes.dev
.Description
Changed:
Changelog