The ExecutorService indexes executors based on cardinalities.
If an executor handles a cardinality with an integer value greater than 1, e.g. CLIENTS:5, does this mean that the remote executor represents 5 distinct clients? So if a computation requires a cohort of 10 clients, can the work of 5 of those clients be performed by that single executor?
What piece of the TFF stack is responsible for dividing up a cohort into invocations to different executors based on the cardinalities assigned to those executors?
I’m implementing a new Executor with the goal of being able to reverse the direction of communication in TFF to better handle production scenarios (see the design doc).
I’m modeling my code on ExecutorService and RemoteExecutor. I’m trying to figure out whether it’s necessary to support cardinalities > 1. In the event that a single physical silo actually contains K silos, I imagine I could just start K instances of the executor, each with a cardinality of 1.
Yes, you can configure a distributed runtime in such a way that (A) each of the processes that perform client-side processing hosts some number of logical clients. You can also configure a runtime such that (B) there is a 1:1 correspondence (one worker, one logical client). I don’t remember which API you are using, but it’s likely that you’re defaulting to the former.
The two components of interest are FederatingExecutor and ComposingExecutor. In scenario (A), the runtime stack on the server side will include the ComposingExecutor, which will forward federated operators for subsets of clients down the stack (to the individual processes that perform client-side work, or as you call them here, “silos”). Each of those processes will contain a FederatingExecutor to translate federated operators for its subset of clients into individual per-client operations. In scenario (B), there is no ComposingExecutor; it’s replaced by a FederatingExecutor that lives on the server side. That FederatingExecutor doesn’t forward federated operators; it forwards operations on individual clients.
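To make the difference between (A) and (B) concrete, here is a toy model in plain Python. It does not use any TFF API: Worker and partition_cohort are hypothetical names invented for illustration, and the contiguous slicing is an assumption, not a description of how ComposingExecutor actually assigns clients. It only shows how a cohort of 10 maps onto workers according to the CLIENTS cardinality each one declares.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Worker:
    """Hypothetical stand-in for a worker process (not a TFF class)."""
    name: str
    num_clients: int  # the CLIENTS cardinality this worker was configured with


def partition_cohort(cohort: List[str], workers: List[Worker]) -> Dict[str, List[str]]:
    """Assigns each worker a slice of the cohort sized by its cardinality.

    Contiguous slicing is an illustrative assumption only.
    """
    total = sum(w.num_clients for w in workers)
    if total != len(cohort):
        raise ValueError(
            f"cohort has {len(cohort)} clients, but workers cover {total}")
    assignment: Dict[str, List[str]] = {}
    start = 0
    for w in workers:
        assignment[w.name] = cohort[start:start + w.num_clients]
        start += w.num_clients
    return assignment


cohort = [f"client_{i}" for i in range(10)]

# Scenario (A): two workers, each hosting 5 logical clients (CLIENTS:5).
scenario_a = partition_cohort(cohort, [Worker("silo_0", 5), Worker("silo_1", 5)])

# Scenario (B): ten workers, one logical client each (CLIENTS:1).
scenario_b = partition_cohort(cohort, [Worker(f"silo_{i}", 1) for i in range(10)])
```

In this model, a silo that physically contains K logical silos can be expressed either as one worker with num_clients=K or as K workers with num_clients=1; the second form is what scenario (B) amounts to.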
It sounds like you want (B). I think we should have a code snippet with an example, or you can just take a look at the execution contexts under core / backends - it should be pretty straightforward. For more interactive help, may I suggest that we follow up on Discord? There you can get a faster response from the team (but we’re also happy to continue here if you prefer - LMK).
Hope this helps!