Better exclusions
Neil Ferguson committed Mar 27, 2024
1 parent e653664 commit 8d48859
Showing 2 changed files with 65 additions and 21 deletions.
59 changes: 47 additions & 12 deletions README.md
@@ -47,7 +47,7 @@ for multiple objects.

## Basic Example

In this example we find all referrers for an instance of `list`:
In this example we find all referrers for an instance of a `list`:

```python
import dataclasses
@@ -77,8 +77,7 @@ This will output something like:
Although the precise output will vary according to the Python version used.

In this case the list instance is referenced by a member variable of `ParentClass`, which
is in turn referenced by a local variable in the `my_func` function. For the code to produce
this graph see "Basic Example" below.
is in turn referenced by a local variable in the `my_func` function.

## Integration with memory analysis tools

@@ -103,7 +102,8 @@ The graph produced by `get_referrer_graph` can be converted to a NetworkX graph
its `to_networkx` method. This can be useful for visualizing the graph, or for
performing more complex analysis.

The resulting NetworkX graph consists of nodes of type `ReferrerGraphNode`.
The resulting NetworkX graph consists of nodes of type `ReferrerGraphNode`, with edges
directed from objects to their referrers.

For example, to visualize a graph of references to an object using [NetworkX](https://networkx.org/)
and [Matplotlib](https://matplotlib.org/):
@@ -135,28 +135,63 @@ my_function()

## Untracked Objects

By default, `get_referrer_graph` will only include objects that are tracked by the garbage
collector. However, the `search_for_untracked_objects` flag can be set to `True` to also
include objects that are not tracked by the garbage collector. This option is experimental
and may not work well in all cases.
By default, `get_referrer_graph` will raise an error if the object passed to it is not
tracked by the garbage collector. In CPython, for example, immutable objects and some
containers that contain only immutable objects (like dicts and tuples) are not tracked
by the garbage collector.
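
As a rough illustration of what "tracked" means, the standard-library `gc.is_tracked` function can be called directly. This is a sketch that does not use `referrers`; the `samples` list is made up for the example. Results for dicts and tuples can depend on the Python version and on whether a collection has already run, so no expected output is shown.

```python
import gc

# Illustrative samples only: each pair is (label, object to inspect).
samples = [
    ("int", 0),
    ("str", "a"),
    ("list", []),
    ("dict of immutables", {"a": 1}),
    ("dict holding a list", {"a": []}),
]

# gc.is_tracked reports whether the collector currently tracks the object.
tracking = {label: gc.is_tracked(obj) for label, obj in samples}

for label, is_tracked in tracking.items():
    print(f"{label}: tracked={is_tracked}")
```

Atomic values such as ints and strings are never tracked, while lists always are; the dict results are the ones most likely to differ between interpreter versions.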

However, the `search_for_untracked_objects` flag can be set to `True` when calling
`get_referrer_graph` to try to find referrers for objects that are not tracked by the garbage
collector. This option is experimental and may not work well in all cases.

For example, here we find the referrers of an untracked object (a `dict` containing only
immutable objects):

```python
import dataclasses
import gc
from typing import Dict

import referrers

@dataclasses.dataclass
class ParentClass:
    member_variable: Dict

def my_func():
    local_variable = ParentClass({"a": 1})
    assert not gc.is_tracked(local_variable.member_variable)
    print(referrers.get_referrer_graph(local_variable.member_variable, search_for_untracked_objects=True))

my_func()
```

This will output something like:

```plaintext
╙── dict instance (id=4483928640)
    └─╼ ParentClass.member_variable (instance attribute) (id=4482048576)
        └─╼ my_func.local_variable (local) (id=4482048576)
```

### Known limitations with untracked objects

* The depth of the search for untracked objects is limited by the `max_untracked_search_depth`
parameter. If this is set too low, some untracked objects may be missing from the graph.
Try setting this to a higher value if you think this is happening.
* Sometimes internal references (from within `referrers`) may be included in the graph when
finding untracked objects. It should be possible to get rid of these, but I haven't
managed to track them all down yet.
finding untracked objects. It should be possible to get rid of these, but I'm not sure if I've
found them all yet.
* Finding untracked objects may be slow.
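
To see why a depth limit can hide referrers, here is a minimal standard-library sketch of a depth-limited search. It is not the implementation used by `referrers`: the function name `referrers_within_depth` and the nested lists are hypothetical, and it simply walks `gc.get_referents` breadth-first from a set of roots.

```python
import collections
import gc
from typing import Any, List


def referrers_within_depth(roots: List[Any], target: Any, max_depth: int) -> List[Any]:
    """Return objects within max_depth hops of the roots that refer directly to target."""
    found = []
    seen = set()
    # Breadth-first, so each object is first visited at its minimum depth.
    queue = collections.deque((root, 0) for root in roots)
    while queue:
        obj, depth = queue.popleft()
        if id(obj) in seen:
            continue
        seen.add(id(obj))
        referents = gc.get_referents(obj)
        if any(referent is target for referent in referents):
            found.append(obj)
        # Stop descending once the depth limit is reached.
        if depth < max_depth:
            queue.extend((referent, depth + 1) for referent in referents)
    return found


target = "needle"
inner = [target]   # two hops below the root
middle = [inner]
root = [middle]

print(len(referrers_within_depth([root], target, max_depth=1)))  # 0
print(len(referrers_within_depth([root], target, max_depth=2)))  # 1
```

With a limit of 1 the search never reaches `inner`, so the referrer is missed; raising the limit to 2 finds it, at the cost of visiting more objects.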

## Multi-threading

Referrers works well with multiple threads. For example, you can have a separate thread that
prints the referrers of objects that have references in other threads.

In the following example, there is a thread that prints the referrers of instances of `ChildClass`
every second:
In the following example, there is a thread that prints the referrers of all instances of
`ChildClass` every second (using [Pympler](https://pympler.readthedocs.io/en/latest/) to find
the instances):

```python
import dataclasses
27 changes: 18 additions & 9 deletions src/referrers/impl.py
@@ -576,7 +576,10 @@ def __init__(
exclude_object_ids
)
self._target_objects = target_objects
self._untracked_objects_referrers = self._get_untracked_object_referrers(
(
self._untracked_objects_referrers,
extra_exclusions,
) = self._get_untracked_object_referrers(
target_objects,
excluded_id_set=excluded_id_set,
max_depth=max_untracked_search_depth,
@@ -587,12 +590,8 @@ def __init__(
self._name_finders = _get_name_finders(module_prefixes)
# Exclude the builder and its attributes from the referrer name finders, since we
# store a reference to the target objects. Also exclude the target objects container.
# Also exclude the untracked object referrers dict and the lists it contains.
untracked_exclusions = {id(self._untracked_objects_referrers)}
for referrers in self._untracked_objects_referrers.values():
untracked_exclusions.add(id(referrers))
self._referrer_name_finders = _get_referrer_name_finders(
excluded_id_set | untracked_exclusions
excluded_id_set | extra_exclusions
)

def build(self, max_depth: Optional[int]) -> ReferrerGraph:
@@ -698,13 +697,16 @@ def _get_untracked_object_referrers(
excluded_id_set: Set[int],
max_depth: int,
module_prefixes: Collection[str],
) -> Mapping[id, List[Any]]:
) -> Tuple[Mapping[int, List[Any]], Set[int]]:
"""
Builds a mapping of object IDs to referrers for objects that are not tracked by the
garbage collector.
garbage collector, and returns this along with extra IDs to exclude.
"""
return_dict: Dict[int, List[Any]] = collections.defaultdict(list)

extra_exclusions = set()
excluded_id_set.add(id(return_dict))

do_not_visit = copy(excluded_id_set)
do_not_visit.add(id(return_dict))

@@ -722,10 +724,14 @@ def _get_untracked_object_referrers(
)
# Make sure we don't visit the roots list, or very strange things will happen!
do_not_visit.add(id(roots))
# Also add the roots to the excluded set. It's not clear why this is necessary,
# but it seems to be.
extra_exclusions.add(id(roots))

for root in roots:
untracked_stack = collections.deque()
do_not_visit.add(id(untracked_stack))
extra_exclusions.add(id(untracked_stack))
self._populate_untracked_object_referrers(
obj=root,
do_not_visit=do_not_visit,
@@ -736,7 +742,10 @@ def _get_untracked_object_referrers(
max_depth=max_depth,
)

return return_dict
for value in return_dict.values():
extra_exclusions.add(id(value))

return return_dict, extra_exclusions

def _populate_untracked_object_referrers(
self,
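
The reason for collecting extra IDs to exclude can be illustrated with a small standard-library sketch. It is not taken from `impl.py`: the `Target` class, the `referrers_excluding` helper and the `internal_container` list are hypothetical stand-ins. Any container the search itself uses to hold a target (such as `roots` or `untracked_stack` above) is reported by `gc.get_referrers` as a referrer unless its `id` is filtered out.

```python
import gc
from typing import Any, List, Set


class Target:
    """Hypothetical object whose referrers we want to inspect."""


def referrers_excluding(target: Any, excluded_ids: Set[int]) -> List[Any]:
    """Return the GC-visible referrers of target, skipping objects with excluded ids."""
    return [r for r in gc.get_referrers(target) if id(r) not in excluded_ids]


target = Target()
# Stand-in for a bookkeeping container that happens to hold the target.
internal_container = [target]

unfiltered = gc.get_referrers(target)
filtered = referrers_excluding(target, {id(internal_container)})

print(any(r is internal_container for r in unfiltered))  # True
print(any(r is internal_container for r in filtered))    # False
```

Returning the extra IDs alongside the mapping, as this commit does, keeps the bookkeeping next to the code that creates those containers rather than reconstructing it in the caller.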
