optimize goal_count (callgrind ir/call : 816 -> 126) #237

guicho271828 · 2025-01-12T19:11:15Z

No description provided.

FlorianPommerening

Thank you for the pull request. I'm still unclear about where the inefficiency is coming from.

Is it that the state is packed (I think it shouldn't be)?
Or is creating the FactProxy objects an issue (should be inlined)?

We often recommend that people new to the planner look at the implementation of this heuristic as an example, so it would be good to keep this code as simple and straightforward as possible. I see three main changes here (unpacking the state, precomputing a list of goals, and storing goal facts as pair<int, int> rather than as FactProxy objects). Did you look at the performance impact of each of these changes individually?
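The question about packed states can be illustrated with a minimal sketch of the extra work a packed read involves. It assumes a hypothetical layout with a fixed number of bits per variable inside 64-bit words; the planner's real state packer is different, and the PackedState type and its methods are made up for illustration. Each packed read needs an index computation, a shift and a mask, whereas unpacking once turns every later lookup into a plain vector access.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical packed state: every variable uses `bits` consecutive bits
// (1 <= bits <= 64) and never straddles two 64-bit words.
struct PackedState {
    std::vector<std::uint64_t> words;
    int bits;           // bits per variable (uniform, for simplicity)
    int vars_per_word;  // how many variables fit into one word

    PackedState(const std::vector<int> &values, int bits_per_var)
        : bits(bits_per_var), vars_per_word(64 / bits_per_var) {
        words.assign((values.size() + vars_per_word - 1) / vars_per_word, 0);
        for (std::size_t var = 0; var < values.size(); ++var)
            set(static_cast<int>(var), values[var]);
    }

    std::uint64_t mask() const {
        return bits == 64 ? ~std::uint64_t(0) : (std::uint64_t(1) << bits) - 1;
    }

    void set(int var, int value) {
        std::uint64_t &word = words[var / vars_per_word];
        int shift = (var % vars_per_word) * bits;
        word &= ~(mask() << shift);
        word |= (static_cast<std::uint64_t>(value) & mask()) << shift;
    }

    // Every packed read pays for a division, a modulo, a shift and a mask.
    int get(int var) const {
        std::uint64_t word = words[var / vars_per_word];
        int shift = (var % vars_per_word) * bits;
        return static_cast<int>((word >> shift) & mask());
    }

    // Unpacking pays the decoding cost once; later reads are plain loads.
    std::vector<int> unpack(int num_vars) const {
        std::vector<int> values(num_vars);
        for (int var = 0; var < num_vars; ++var)
            values[var] = get(var);
        return values;
    }
};
```

A heuristic that inspects many variables per state would typically call unpack once and then index the resulting vector.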


FlorianPommerening · 2025-01-13T12:12:43Z

src/search/heuristics/goal_count_heuristic.cc

+	const int& var = it.first;
+	const int& val = it.second;


Suggested change

-	const int& var = it.first;
-	const int& val = it.second;
+	int var = it.first;
+	int val = it.second;

We would generally copy the values here instead of taking a reference; for an int, a copy should be just as fast as creating a reference.

Actually, now that we have C++20, this should work as well:

for (auto [var, val] : goals) { ... }
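A minimal, self-contained sketch of that loop, assuming the state is available as an unpacked std::vector<int> indexed by variable and the goals are precomputed as (variable, value) pairs. The function name count_unsatisfied_goals is hypothetical and this is not the heuristic's actual code; it only shows the structured-binding form (available since C++17, so also under C++20) copying both ints on each iteration.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for the heuristic computation: counts the goal
// facts (var, val) that do not hold in an unpacked state.
int count_unsatisfied_goals(const std::vector<std::pair<int, int>> &goals,
                            const std::vector<int> &state_values) {
    int unsatisfied = 0;
    // The structured binding copies the two ints; for plain ints this is
    // as cheap as binding references.
    for (auto [var, val] : goals) {
        if (state_values[var] != val)
            ++unsatisfied;
    }
    return unsatisfied;
}
```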

FlorianPommerening · 2025-01-13T12:22:38Z

src/search/heuristics/goal_count_heuristic.h

@@ -5,6 +5,7 @@

 namespace goal_count_heuristic {
 class GoalCountHeuristic : public Heuristic {
+    std::vector<std::pair<int,int>> goals;


This line could use a comment explaining that the goals are stored here for performance reasons.

Can we run an experiment to get performance numbers? Callgrind data is useful, but usually not conclusive. The VM has its own issues when compared to actual performance on hardware, and due to the vagaries of inlining, runtime can shift from one place to another in a way that makes it hard to draw conclusions from the profile data alone.

I used this code when implementing BFWS, which uses goal count, and the difference was significant. I also observed similar bottlenecks from the *Proxy classes while implementing other parts of the BFWS code.

I measured the difference with a release build on IPC instances, but I have since lost the data.

> This line could use a comment explaining that the goals are stored here for performance reasons.

Done.

I could re-run the experiment on our servers, but it might take me some time to get around to it. It would help if you prepared an intermediate revision between this one and the main branch that only precomputes and stores the goals, still using fact proxies in the heuristic computation. That way, we can separate the impact of this change from that of avoiding the fact proxies.
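The requested intermediate revision could look roughly like the following sketch: the goal list is precomputed once, but the per-state loop still goes through a proxy interface. Everything here is hypothetical. ToyTask, ToyFactProxy and ToyGoalCount are made-up stand-ins, and the real FactProxy is richer and goes through the task interface, so this only illustrates the shape of the change, not its cost.

```cpp
#include <utility>
#include <vector>

// Hypothetical stand-in for the task that owns the goal facts.
struct ToyTask {
    std::vector<std::pair<int, int>> goal_facts;
};

// Hypothetical proxy: a light handle that looks the fact up through the
// task on every access, similar in spirit to (but simpler than) FactProxy.
class ToyFactProxy {
    const ToyTask *task;
    int index;
public:
    ToyFactProxy(const ToyTask &owner, int goal_index)
        : task(&owner), index(goal_index) {}
    int get_variable() const { return task->goal_facts[index].first; }
    int get_value() const { return task->goal_facts[index].second; }
};

// Intermediate variant: goals are precomputed (as in the PR), but they are
// still accessed through the proxy interface during evaluation.
class ToyGoalCount {
    std::vector<ToyFactProxy> goals;  // precomputed for performance
public:
    explicit ToyGoalCount(const ToyTask &task) {
        for (int i = 0; i < static_cast<int>(task.goal_facts.size()); ++i)
            goals.emplace_back(task, i);
    }

    int compute(const std::vector<int> &state_values) const {
        int unsatisfied = 0;
        for (const ToyFactProxy &goal : goals) {
            if (state_values[goal.get_variable()] != goal.get_value())
                ++unsatisfied;
        }
        return unsatisfied;
    }
};
```

Benchmarking such a variant against both main and the pair-based version would attribute the speedup to one change or the other. In a toy like this the compiler will likely inline everything, so it cannot reproduce the real proxies' overhead on its own.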

guicho271828 · 2025-01-29T19:40:40Z

Here is a disassembly comparison (objdump -d -S -C) of the two versions, taken from release builds.
The old version (c7d6a4d87) clearly contains many more call, jmp and je instructions around the calls into the proxies (54 vs. 21).

goal_count_heuristic.cc.o-c7d6a4d87.txt
goal_count_heuristic.cc.o-eb48e6815.txt
dump.tar.gz (contains the binary and the script used to obtain this information)
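Counts like these can be reproduced by tallying mnemonics in the disassembly. The following rough sketch is not the script in dump.tar.gz. The function count_mnemonics is hypothetical, and it assumes GNU objdump's usual line layout (an address ending in a colon, a tab, the raw bytes, a tab, then the mnemonic and operands). Interleaved source lines from -S are skipped because they do not start with an address, and the older callq/jmpq spellings are folded into call/jmp. Prefixed forms such as "notrack jmp" are not counted.

```cpp
#include <map>
#include <sstream>
#include <string>

// Counts call, jmp and je instructions in `objdump -d` text output.
std::map<std::string, int> count_mnemonics(const std::string &dump) {
    std::map<std::string, int> counts{{"call", 0}, {"jmp", 0}, {"je", 0}};
    std::istringstream in(dump);
    std::string line;
    while (std::getline(in, line)) {
        auto first_tab = line.find('\t');
        if (first_tab == std::string::npos)
            continue;  // headers and labels have no tabs
        std::string address = line.substr(0, first_tab);
        if (address.empty() || address.back() != ':')
            continue;  // not an instruction line (e.g. source from -S)
        auto second_tab = line.find('\t', first_tab + 1);
        if (second_tab == std::string::npos)
            continue;  // continuation line holding only raw bytes
        std::istringstream fields(line.substr(second_tab + 1));
        std::string mnemonic;
        fields >> mnemonic;
        if (mnemonic == "callq")
            mnemonic = "call";  // older binutils spelling
        else if (mnemonic == "jmpq")
            mnemonic = "jmp";
        auto it = counts.find(mnemonic);
        if (it != counts.end())
            ++it->second;
    }
    return counts;
}
```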

guicho271828 · 2025-01-29T19:41:12Z

g++ --version
g++ (GCC) 14.2.1 20240912 (Red Hat 14.2.1-3)

FlorianPommerening · 2025-01-31T11:03:08Z

I can run experiments on our grid but it will take a while as we are currently having technical issues with it. I have it on my to-do list though.

FlorianPommerening · 2025-02-03T08:22:27Z

By the way, we have https://issues.fast-downward.org/issue997, which might tackle the same problem on a more global level. I'm not sure if the reduction in complexity there is enough to allow inlining here, but it's worth a try.

8008590 optimize goal_count (callgrind ir/call : 816 -> 126)

FlorianPommerening reviewed Jan 13, 2025


eb48e68 fixup! optimize goal_count (callgrind ir/call : 816 -> 126)


guicho271828 commented Jan 12, 2025

FlorianPommerening left a comment

FlorianPommerening Jan 13, 2025

FlorianPommerening Jan 13, 2025

FlorianPommerening Jan 13, 2025

maltehelmert Jan 13, 2025

guicho271828 Jan 13, 2025

guicho271828 Jan 13, 2025

guicho271828 Jan 13, 2025

FlorianPommerening Jan 14, 2025

guicho271828 commented Jan 29, 2025

guicho271828 commented Jan 29, 2025

FlorianPommerening commented Jan 31, 2025

FlorianPommerening commented Feb 3, 2025
