Add a Unique Node Table #444
Labels: 📁 internal, ✨ optimisation, 🎓 student project
With Adiar v1.2 we have been able to decrease the threshold at which Adiar is viable in comparison to conventional depth-first BDD packages. Yet, this does not remove the major relative performance decrease for tiny BDDs. The upshot is that Adiar still has very undesirable characteristics in contexts where one manipulates a lot of tiny decision diagrams (e.g. our Picotrav benchmarks).
Interestingly, the threshold we derived in our experiments is somewhat below a threshold in the CAL BDD package [Sanghavi96]: below this size, CAL switches from its breadth-first algorithms to the conventional depth-first ones. Hence, it is likely that we have hit a lower bound on when time-forward processing is a viable option.
For an end-user, it is of interest that Adiar be efficient across the board. That is, we also need to be competitive on the tiniest of instances. To this end, we will (similar to CAL) have to switch to the conventional depth-first approach.
Task: Implement BuDDy's Unique Node Table
From our experiments, the BDD package BuDDy is the fastest implementation. My hypothesis is that this is due to its extremely simple design in terms of code, but also due to the way it lays out its unique node table. Hence, this seems like a great place to start.
The hash table stores values of some node type `T`. It is laid out as a single array of elements `entry<T>`, each of which wraps a `T` and adds two indices: `hash` and `next`. To look up an element of type `T`, its hash value is used as an index into this array; the `hash` member of the entry at said index is the index of the first entry in that bucket. The bucket is then traversed via the `next` pointers until the desired node is found.

To this end, all nodes are initialised with the default value `T{}` and the `next` pointer of element i is set to i+1. This creates one long chain of free nodes. A variable `free` is initially set to i = 0. When a new node is added to the table, the next free node i = `free` is used; `free` is updated to `table[i].next` before `table[i].next` is overwritten (with null or, as described next, the rest of its bucket).

Bucket Ordering
When adding a node to a bucket, it could be placed at many different positions within the chain of `next` pointers. Which position actually is best should be determined based on experiments.
BuDDy adds new nodes at the front of the bucket, as is also done in the sketch below.
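To make the layout concrete, here is a minimal C++ sketch of such a table: the `entry<T>` layout, the free-list initialisation, and insertion at the front of a bucket. All names are hypothetical and details are simplified (e.g. BuDDy packs some of these fields into bit fields, and incrementing the children's reference counts on allocation is elided here):

```cpp
#include <cstdint>
#include <vector>

// Sentinel for "no entry".
constexpr uint32_t NIL = UINT32_MAX;

// An entry wraps a node T and adds the two indices 'hash' and 'next',
// plus a reference count for garbage collection (see below).
template <typename T>
struct entry {
  T data;             // the node itself; T{} while the entry is free
  uint32_t ref_count; // number of references to this node
  uint32_t hash;      // index of the first entry in this slot's bucket
  uint32_t next;      // next entry in the bucket resp. on the free list
};

template <typename T, typename Hash>
class unique_table {
  std::vector<entry<T>> _table;
  uint32_t _free; // index of the first free entry

public:
  unique_table(uint32_t size) : _table(size), _free(0) {
    // Chain all entries into one long free list: next of i is i+1.
    for (uint32_t i = 0; i < size; ++i) {
      _table[i] = { T{}, 0u, NIL, i + 1 };
    }
    _table[size - 1].next = NIL;
  }

  const T& at(uint32_t i) const { return _table[i].data; }

  // Find the index (uid) of 'node', inserting it if not yet present.
  uint32_t find_or_insert(const T &node) {
    const uint32_t bucket = Hash{}(node) % _table.size();

    // Walk the bucket via the 'next' pointers.
    for (uint32_t i = _table[bucket].hash; i != NIL; i = _table[i].next) {
      if (_table[i].data == node) { return i; }
    }

    // Not found: pop the next node off the free list ...
    const uint32_t i = _free;
    _free = _table[i].next;

    // ... and insert it at the front of its bucket (as BuDDy does).
    _table[i].data = node;
    _table[i].next = _table[bucket].hash;
    _table[bucket].hash = i;
    return i;
  }
};
```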
Garbage Collection

For garbage collection, each `entry<T>` also includes a `ref_count` variable. When allocating a node, the reference counts of its children are incremented by one. Here, the `dd` class also updates the reference count of the root in its constructors, assignments, and destructor.

The basic idea behind collection is to free all nodes that have a reference count of zero (recursively decrementing their children's counts). Here, the freed entry's `next` pointer is updated to `free` before the entry is placed at the front of the `free` list (its contents do not need to be reset to `T{}`, as that will be fixed on the next allocation).

BuDDy's garbage collection essentially is a mark-and-sweep in two simple linear scans (which is good for CPU cache performance):

1. In a forward iteration, all non-dead nodes are marked and every hash index is reset back to null.
2. In a backwards iteration, a to-be-kept node is re-inserted at the front of its bucket, whereas a to-be-freed node is prepended to the `free` chain as described above. Since the iteration is backwards, this ensures that all allocated nodes stay clumped together towards the beginning of the array.
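Continuing the hypothetical sketch above, the two scans could look as follows. This assumes the recursive decrementing of dead nodes' children has already happened, so that `ref_count > 0` characterises exactly the nodes to keep:

```cpp
// Inside the hypothetical unique_table<T, Hash> above.
void gc() {
  // Forward scan: reset every bucket head. (BuDDy also marks the live
  // nodes here; with up-to-date reference counts, ref_count > 0 serves
  // as the mark in this sketch.)
  for (uint32_t i = 0; i < _table.size(); ++i) {
    _table[i].hash = NIL;
  }

  // Backward scan: re-insert live nodes into their buckets, free the rest.
  _free = NIL;
  for (uint32_t i = (uint32_t) _table.size(); i-- > 0;) {
    if (_table[i].ref_count > 0) {
      const uint32_t bucket = Hash{}(_table[i].data) % _table.size();
      _table[i].next = _table[bucket].hash; // prepend to its bucket
      _table[bucket].hash = i;
    } else {
      _table[i].next = _free;               // prepend to the free list
      _free = i;
    }
  }
}
```

Since the backward scan prepends dead entries, the free list ends up sorted by increasing index, which is what keeps allocated nodes clumped at the start of the array.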
Task: Sweeping through the Unique Node Table

How do we best switch from the internal memory (depth-first) paradigm to the external memory (time-forward processing) one? And how do we go back to the (depth-first) unique node table when the result of some time-forward processing algorithm turns out to be small enough to fit into it?
We could do so by converting the data from one paradigm's representation to the other's. But this has a computational overhead that, in fact, is not needed: we can run the time-forward processing algorithms directly on inputs from the depth-first algorithms and also feed their results back into the unique node table.
Unique Node Table in Top-down Sweeps

Adiar's top-down algorithms exploit a levelized ordering of the nodes' unique identifiers (uids). Currently, they use a `node_stream` to `seek(...)` (#401) a uid within a file on disk.

For the algorithm to be correct, this node does not have to be read from disk: the uid of a node from the depth-first algorithms is merely its index in the unique node table. Hence, we can replace the `node_stream` within the algorithm with another class that merely does an O(1) lookup in the unique node table on each `seek(...)`.

This can be done merely by templating the algorithm, as was done for v1.2. In fact, the work on v2.0.0-beta.3 already implements an algorithm that changes the order of Apply to exploit random access.
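As a sketch (with hypothetical names, continuing the `unique_table` above), such a replacement class only needs to provide the `seek(...)` interface the templated algorithms expect:

```cpp
// Hypothetical drop-in replacement for the node_stream in the templated
// top-down algorithms: seek(...) becomes an O(1) array lookup rather than
// a (possibly disk-backed) scan.
template <typename T, typename Hash>
class node_random_access {
  const unique_table<T, Hash> &_table;

public:
  node_random_access(const unique_table<T, Hash> &table) : _table(table) {}

  // The uid of a depth-first node is merely its index in the table.
  const T& seek(uint32_t uid) const {
    return _table.at(uid); // O(1); no disk access
  }
};
```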
Unique Node Table in Reduce Sweeps
These changes can, again, be achieved with templates:

The size of the unreduced BDD is an upper bound on the output size of Reduce. We can check whether the unreduced input would fit into the (depth-first) unique node table, assuming none of its current nodes can be reused. If so, we can run the Reduce algorithm with the `node_writer` replaced by the unique node table.

Since we are outputting nodes into the unique node table, we may as well use its much faster hashing to identify duplicate nodes. This entirely skips two sorting steps of the current algorithm, which should (roughly) double its performance.
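A minimal sketch of what such an output policy could look like (hypothetical names, reusing the `unique_table` sketch from above):

```cpp
// Hypothetical node_writer-like output policy for a templated Reduce:
// pushing a node deduplicates via the table's hashing and returns the uid
// (table index) with which the parents should reference it.
template <typename T, typename Hash>
class table_writer {
  unique_table<T, Hash> &_table;

public:
  table_writer(unique_table<T, Hash> &table) : _table(table) {}

  uint32_t push(const T &node) {
    // find_or_insert already identifies duplicate nodes through hashing,
    // so the two sorting steps of the file-based Reduce can be skipped.
    return _table.find_or_insert(node);
  }
};
```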
Level Information
The levelized priority queue needs to know the levels a priori to generate its buckets. Hence, Reduce also needs to create the support of the BDD as a separate And-chain of its variables. Furthermore, the levelized priority queue needs to abstract its way of obtaining the levels, such that it can freely mix and match between files and the table.
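For illustration, such an abstraction could be as simple as the following (hypothetical; in practice, Adiar would presumably use a template parameter rather than virtual functions to avoid overhead):

```cpp
// Hypothetical abstraction over where the level information comes from:
// one implementation reads it from a file on disk (as today), another
// enumerates the support stored in the unique node table. The levelized
// priority queue only sees this interface when generating its buckets.
class level_info_source {
public:
  virtual bool has_next() = 0;       // more levels left?
  virtual uint32_t next_level() = 0; // obtain the next level's label
  virtual ~level_info_source() = default;
};
```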
Garbage Collection
Furthermore, we need to consider when to trigger garbage collection. For now, it might be best to do so at the start of Reduce, if there is not enough space in the table (and some BDDs in it have died since the last collection).
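For example, the decision at the start of Reduce could boil down to a check along these lines (a sketch with hypothetical names and statistics):

```cpp
#include <cstdint>

// Hypothetical statistics exposed by the unique node table.
struct table_stats {
  uint64_t free_entries; // entries currently on the free list
  uint64_t dead_entries; // entries with ref_count == 0, reclaimable by gc()
};

// Returns whether Reduce can output into the unique node table, possibly
// after a garbage collection; otherwise, fall back to the external memory
// (file-based) output. 'unreduced_size' is the node count of the input,
// i.e. an upper bound on the output size of Reduce.
bool fits_in_table(const table_stats &s, uint64_t unreduced_size) {
  if (unreduced_size <= s.free_entries) { return true; }
  // Not enough free space as-is, but collecting dead entries may suffice.
  return unreduced_size <= s.free_entries + s.dead_entries;
}
```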
References
[Sanghavi96] Jagesh V. Sanghavi, Rajeev K. Ranjan, Robert K. Brayton, and Alberto Sangiovanni-Vincentelli. “High performance BDD package by exploiting memory hierarchy”. In: Proceedings of the 33rd Annual Design Automation Conference (1996)
[Pastva2023] Samuel Pastva and Thomas Henzinger. “Binary Decision Diagrams on Modern Hardware”. In: Proceedings of the 23rd Conference on Formal Methods in Computer-Aided Design (2023)