Multiple Instances of Simulation Objects #20

krosenfeld-IDM · 2024-07-24T03:54:54Z

krosenfeld-IDM
Jul 24, 2024
Collaborator

From @jonathanhhb's write-up about Consciously-Choosing-OOP‐ness:

One of the key topics which has emerged is the question of whether we need to create multiple instances of objects such as Simulation, Demographics, Settings, etc. in a single run of a simulation.
Jonathan has been deliberately not doing that. Back in the early days of EMOD (DTK), some code was added to enable exactly that kind of thing: a single run of the executable could run multiple simulations. Over time, as we were running almost everything on COMPS where a single run was a single sim, we ended up deciding that the extra code to enable multiple non-overlapping Simulation instances in a single process was all cost and no benefit.
But researchers have suggested there might be a genuine demand for doing that. One might do that when one is running locally and wanting to do sweeps in a for loop and analyzing and consolidating results in a single process. In that case, we would not want singletons but rather actual instances of class types. But would we ever do that in our batch processing environment (COMPS, K8S)? Does that matter?
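To make the local-sweep scenario concrete, here is a minimal sketch of a parameter sweep in a for loop where each iteration builds its own instances. The `Settings` and `Simulation` classes, their fields, and the toy SI transmission step are all hypothetical stand-ins for illustration, not LASER's or EMOD's actual API.

```python
import random
from dataclasses import dataclass


@dataclass
class Settings:
    """Hypothetical per-run settings; field names are illustrative only."""
    beta: float
    population: int = 1000
    timesteps: int = 50
    seed: int = 0


class Simulation:
    """Toy SI model that owns all of its state (no module-level globals)."""

    def __init__(self, settings):
        self.settings = settings
        self.rng = random.Random(settings.seed)  # per-instance RNG
        self.susceptible = settings.population - 1
        self.infected = 1

    def step(self):
        # Per-susceptible infection probability for this timestep.
        p = min(1.0, self.settings.beta * self.infected / self.settings.population)
        new = sum(1 for _ in range(self.susceptible) if self.rng.random() < p)
        self.susceptible -= new
        self.infected += new

    def run(self):
        for _ in range(self.settings.timesteps):
            self.step()
        return self.infected


# Sweep over beta in a single process; each run gets fresh, independent objects.
results = {}
for beta in (0.0, 0.2, 0.5):
    sim = Simulation(Settings(beta=beta))
    results[beta] = sim.run()
```

Because nothing is shared between iterations, results can be collected and compared in the same process without resetting any global state between runs.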

It's helpful to hear about the experience with EMOD/DTK, but I'm not sure that using standard Python modules represents all cost and no benefit (where the implied benefit is being able to have multiple simulation instances). I think there is potentially a lot of benefit when it comes to someone being able to understand, use, and expand the code.

It may require some effort to organize around the class instance rather than rely on the global instantiations allowed by, e.g., singletons, but this organization can provide key structure that makes code more readable, comprehensible, and ultimately usable. I have an example of how having a single instance can enable really surprising behavior (see this repo). But I'm also not sure that organizing around instances will actually be harder! Python is most commonly structured around instantiated classes, so my prior is that it would, in fact, be easier.
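As a rough illustration of the kind of surprise a singleton can cause (this is a hypothetical sketch, not the example from the linked repo), consider a settings object that is shared by every caller. The class and function names below are made up for this sketch.

```python
class GlobalSettings:
    """Hypothetical singleton: every GlobalSettings() call returns the same object."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance.beta = 0.3  # default set once, on first creation
        return cls._instance


class InstanceSettings:
    """Plain class: each construction yields an independent object."""

    def __init__(self, beta=0.3):
        self.beta = beta


def run_with_singleton(beta=None):
    settings = GlobalSettings()
    if beta is not None:
        settings.beta = beta  # mutates state shared with every other caller
    return settings.beta


def run_with_instance(beta=0.3):
    return InstanceSettings(beta).beta


# First run overrides beta; the second run asks for "defaults".
first = run_with_singleton(beta=0.9)
second = run_with_singleton()  # silently inherits 0.9 from the earlier run

a = run_with_instance(beta=0.9)
b = run_with_instance()  # really does get the default
```

With the singleton, the result of the second run depends on what ran before it; with instances, each run's inputs are visible at the call site.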

I agree it's important to think about potential use cases and environments (e.g., COMPS, K8s), but in doing so we have to take care not to unnecessarily limit flexibility. It's important to make sure that things will be able to run now, but if we conclude that the easier thing works for cases A, B, and C, and so we don't need to do the harder thing that allows for more flexibility, we might very well rule out case G down the road. @KevinMcCarthyAtIDM and I came up with examples like the ones listed, but what about branching "trajectories", simultaneous "island" populations, maybe even multi-pathogen stuff? This is just off the top of my head. These could be hard to do with instances, but they will be impossible with a singleton.
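One way the branching-trajectories idea could look with instances is sketched below: run a baseline to a branch point, then deep-copy the whole simulation object and continue each copy under a different scenario. The `Simulation` class and its toy transmission step are assumptions made for this sketch, not LASER code.

```python
import copy
import random


class Simulation:
    """Hypothetical toy SI model whose full state lives on the instance."""

    def __init__(self, beta, population=500, seed=1):
        self.beta = beta
        self.population = population
        self.infected = 1
        self.rng = random.Random(seed)
        self.history = [self.infected]

    def step(self):
        susceptible = self.population - self.infected
        p = min(1.0, self.beta * self.infected / self.population)
        new = sum(1 for _ in range(susceptible) if self.rng.random() < p)
        self.infected += new
        self.history.append(self.infected)

    def run(self, steps):
        for _ in range(steps):
            self.step()
        return self


# Shared prefix of the trajectory, up to the branch point.
base = Simulation(beta=0.3).run(20)
branch_point = base.infected

# Each branch is an independent copy (including RNG state) that continues
# sequentially under its own scenario; the base object is left untouched.
branches = {}
for name, beta in (("no_change", 0.3), ("intervention", 0.0)):
    branch = copy.deepcopy(base)
    branch.beta = beta
    branches[name] = branch.run(20)
```

The branches here run one after another in the same process, so peak memory is roughly one copy per live branch rather than one per sweep point.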

I think it would be helpful to hear the case for and understand the benefits of singletons. What is there to add to the experience from EMOD (which, by design, should be very different)? Can there be substantial performance gains from e.g., instantiation control (i.e., laziness)? Are there answers to my concerns about hidden dependencies and the ability to track the state?

krosenfeld-IDM · 2024-07-24T03:56:04Z

krosenfeld-IDM
Jul 24, 2024
Collaborator Author

I'll add that my experience working with @clorton's code, which uses instantiated classes, was pretty good.

jonathanhhb · 2024-07-26T04:34:06Z

jonathanhhb
Jul 26, 2024
Maintainer

It sounds like we do foresee a need for running multiple simulations within a single process, and thus would need multiple instances of simulation, demographics, settings and other objects.

I was noting that we thought we would do this in EMOD but in practice never really did, and so I was wondering if that learning could apply here. @krosenfeld-IDM notes that LASER isn't EMOD, so not all lessons can or should apply. But we'd be remiss if we didn't try to apply as much learning as possible from past experience, I think.

And I suspect that in LASER we will not be doing this when running at scale, remotely on a cluster. I imagine we'll be running one sim per process there (indeed, I think that's built into the nature of COMPS), so this would be for running locally. But we do want to be able to do a lot locally. I still have some hesitation: with LASER we are often running very large populations by design, with big memory impacts, so I will be surprised if we end up running many simulations (e.g., sweeps) in a single process. The branching-trajectories use case, though, would be sequential.

As for the concerns about hidden dependencies, that was interesting to read about. In my career I've never actually encountered such issues so it feels a bit theoretical to me, I must confess.

I would still love us to stick with a functional programming approach to make the codebase more accessible to more people. I will continue to argue that our ever-growing R-inclined community will appreciate this. But it sounds like, on net, that "level 3 OOP" (see wiki) is what we need for what we are building for LASER.
