-
Notifications
You must be signed in to change notification settings - Fork 147
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
optimize goal_count (callgrind ir/call : 816 -> 126) #237
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the pull request. I'm still unclear about where the inefficiency is coming from.
- Is it that the state is packed (I think it shouldn't be)?
- Or is creating the
FactProxy
objects an issue (should be inlined)?
We often recommend people new to the planner to look into the implementation of this heuristic for an example, so it would be good if we could keep this code as simple and straight-forward as possible. I see three main changes here (unpacking the state, precomputing a list of goals, and storing goal facts as pair<int, int>
rather than using FactProxies). Did you look at the performance impact of these changes individually?
const int& var = it.first; | ||
const int& val = it.second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const int& var = it.first; | |
const int& val = it.second; | |
int var = it.first; | |
int val = it.second; |
We'd generally copy the values here instead of taking a reference. Should be equally fast as creating a reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, now that we have C++20, this should work as well:
for (auto[var, val] : goals) {
...
}
@@ -5,6 +5,7 @@ | |||
|
|||
namespace goal_count_heuristic { | |||
class GoalCountHeuristic : public Heuristic { | |||
std::vector<std::pair<int,int>> goals; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line could use a comment explaining that the goals are stored here for performance reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we run an experiment to have numbers on performance? callgrind data is useful, but usually not conclusive. The VM has its own issues when comparing to actual performance on hardware, and due to the vagaries of inlining, runtime can shift from one place to another in a way that makes it hard to draw conclusions from the profile data alone.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used the code for implementing BFWS which uses goalcount. The difference was significant. I also observed similar bottlenecks from ***Proxy while implementing other parts of BFWS code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I measured the difference in the release build on ipc instances. I lost the data though
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line could use a comment explaining that the goals are stored here for performance reasons.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could re-run the experiment on our servers but it might take me some time to get around to it. It would help if you prepare a revision of the code that is in between this one and the main branch, where you only do the precomputation/storing of the goals, still using fact proxies in the heuristic computation. This way, we can separate the impact of this and the change avoiding the fact proxies.
Here is the disassembly comparison ( goal_count_heuristic.cc.o-c7d6a4d87.txt |
g++ --version |
I can run experiments on our grid but it will take a while as we are currently having technical issues with it. I have it on my to-do list though. |
By the way, we have https://issues.fast-downward.org/issue997 which might tackle the same problem on a more global level. Not sure it the reduction in complexity there is enough to allow inlining here, but it's worth a try. |
No description provided.