Understanding Simulation Infrastructure in SST
The Simulation Infrastructure
Simulation Object
The simulator’s global data structure is defined in simulation.h as the base class class Simulation.
The derived class that is actually instantiated is defined in simulation_impl.h as class Simulation_impl,
and the implementation is in simulation.cpp.
The simulator maintains a per-thread singleton simulation object, which can be accessed by calling
Simulation_impl::getSimulation().
The simulation body is defined in Simulation::run(), which consists of a simple loop that fetches the next
activity from the event queue and then executes it by calling execute() on the activity object.
This is just the standard implementation of Discrete Event Simulation (DES).
The simulation is driven forward by two mechanisms. First, events handled at time T can generate new events
at time T + t. Second, components can register periodic clocks that fire at a regular interval, calling the
component’s handler function. The handler function may in turn generate events that need to be handled in the
future.
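The following self-contained sketch illustrates the fetch-and-execute loop described above. The types and the
time-only ordering are simplified stand-ins for SST’s Activity and TimeVortex, not the actual implementation.

```cpp
#include <cstdint>
#include <cstdio>
#include <queue>
#include <vector>

// Simplified stand-ins for SST's Activity / TimeVortex pair, only to
// illustrate the fetch-and-execute loop of Simulation::run().
using SimTime_t = uint64_t;

struct Activity {
    SimTime_t deliveryTime = 0;
    virtual void execute() = 0;
    virtual ~Activity() = default;
};

struct Earliest {   // invert the comparison so the earliest activity pops first
    bool operator()(const Activity* a, const Activity* b) const {
        return a->deliveryTime > b->deliveryTime;
    }
};

struct PrintActivity : Activity {
    void execute() override {
        std::printf("fired at cycle %llu\n", (unsigned long long)deliveryTime);
    }
};

int main() {
    std::priority_queue<Activity*, std::vector<Activity*>, Earliest> timeVortex;
    PrintActivity a, b;
    a.deliveryTime = 10;
    b.deliveryTime = 5;
    timeVortex.push(&a);
    timeVortex.push(&b);

    // The DES loop: pop the earliest activity and execute it; in SST,
    // executing an activity may push new activities with later delivery times.
    while (!timeVortex.empty()) {
        Activity* act = timeVortex.top();
        timeVortex.pop();
        act->execute();   // prints cycle 5, then cycle 10
    }
    return 0;
}
```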
Activity and Event
class Activity is defined in activity.h, and it is the fundamental building block of Discrete Event Simulation.
class Activity is an abstract class that cannot be directly instantiated; it stores event timing information,
such as delivery time, queue order, and priority, and relies on derived classes to implement the actual action
in the virtual interface function execute().
The class also overrides operator< to compute the ordering between objects in the global event queue.
Objects derived from this class can be inserted into the global event queue at a specified time, at which point
they will be executed.
class Event is a non-abstract class derived from class Activity. It is used for passing events across links
that connect two ports (which are defined in the configuration file).
The event object keeps track of the receiving end of the link when it is sent, and will invoke the call back
function on the receiving side when executed.
Sending an event over a non-polling link results in the event being inserted into the global event queue.
Event objects and their derivatives are identified by unique IDs of type id_type, defined as
std::pair<uint64_t, int> in the class body, where the first component is the per-process unique ID and the
second component is the rank number.
The per-process ID is dispensed by incrementing an atomic, static data member of the class, id_counter, which
is then combined with the rank (i.e., the partition ID for multi-process distributed simulation) of the
simulation process to form the event object’s ID.
Note that although the ID generation function generateUniqueId() is defined in class Event, the event object
itself has no ID field. Further derived classes, such as class MemEventBase, store an ID for the object and
initialize it using this function every time a new object is created.
Events can use this ID as a unique identifier (i.e., a key for mapping objects) during memory hierarchy
simulation by calling the getID() method of class MemEventBase.
class Event also defines the handler type, class Event::Handler, which is just a standard functor (function
object) that stores the object instance and the call back function on the receiving end of the event.
The handler object is registered with the receiving link at initialization time using configureLink() of
class BaseComponent, and is called when the execute() method of the event is invoked.
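To make the functor idea concrete, here is a minimal, self-contained sketch of a handler in the style of
Event::Handler: it captures an object instance and a member function and forwards calls to it. All names are
illustrative; SST’s real class is templated similarly but carries more machinery.

```cpp
#include <cstdio>

// Illustrative handler functor in the spirit of Event::Handler.
template <typename ClassT, typename EventT>
class HandlerSketch {
    ClassT* obj;
    void (ClassT::*fn)(EventT*);
public:
    HandlerSketch(ClassT* o, void (ClassT::*f)(EventT*)) : obj(o), fn(f) {}
    void operator()(EventT* ev) { (obj->*fn)(ev); }   // forward to the member
};

struct MyEvent { int payload; };

struct MyComponent {
    void handleEvent(MyEvent* ev) { std::printf("got %d\n", ev->payload); }
};

int main() {
    MyComponent comp;
    HandlerSketch<MyComponent, MyEvent> handler(&comp, &MyComponent::handleEvent);
    MyEvent ev{42};
    handler(&ev);   // what event delivery does, conceptually, on execute()
    return 0;
}
```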
Global Event Queue
The global event queue is a data member named timeVortex of type class TimeVortex contained in
class Simulation_impl. The class itself is abstract and cannot be instantiated; its derived class
class TimeVortexPQ under impl/timevortex is what the simulation actually instantiates.
The implementation uses std::priority_queue for ordering. In Simulation::run(), the closest activities are
popped using the pop() method of the time vortex and then executed.
If the activity is an event object, calling execute() invokes the handler function registered with the
receiving end of the link object.
All events sent over all links are sent to and serialized by the global event queue.
Link objects (of non-polling type) store references to the global event queue, and for every event object sent,
the insert() method of the global queue is called to put the event into the queue at the specified cycle.
Simulation and Component Time
The global simulation time, also known as the “core time”, is the absolute unit of time in the simulation
object. It is of type SimTime_t (defined as uint64_t), stored in the data member currentSimCycle, and accessed
via getCurrentSimCycle().
This is the highest frequency cycle counter in the system, hence the name “core time”, since the core typically
runs at the highest frequency in a real system. Components may have their own slower cycle counters, but these
counters must count at an integer division of the core time, e.g., 1/2, 1/3, or 1/7 of the core frequency.
Components’ local clocks are of type Cycle_t, which is also defined as uint64_t, and components are implemented
against their local clocks rather than the global clock.
To convert between core time and component time, components can optionally create a time converter object,
defined as class TimeConverter. The object is very simple: it only contains a clock divisor, the data member
factor, which indicates that the component time runs at 1/factor of the speed of the core time.
Converting from core to component time (or from component to core time) only involves dividing the core time by
factor (or multiplying the component time by factor).
The conversion object defines a component’s local time, and is hence used in all cases where a component
interacts with the simulation object (e.g., when registering a clock).
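A minimal sketch of this factor-based conversion, with names simplified relative to the SST class:

```cpp
#include <cstdint>

// Sketch of the factor-based conversion performed by a time converter.
// "factor" is the number of core cycles per component cycle.
using SimTime_t = uint64_t;
using Cycle_t   = uint64_t;

struct TimeConverterSketch {
    SimTime_t factor;   // component runs at 1/factor of the core frequency

    Cycle_t   toLocal(SimTime_t core) const { return core / factor; }
    SimTime_t toCore(Cycle_t local)   const { return local * factor; }
};

int main() {
    TimeConverterSketch tc{3};          // component at 1/3 of the core frequency
    SimTime_t core = tc.toCore(7);      // 7 local cycles -> 21 core cycles
    Cycle_t local  = tc.toLocal(core);  // back to 7 local cycles
    return (core == 21 && local == 7) ? 0 : 1;
}
```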
New time converter objects can be created with getTimeConverter() in class TimeLord. This function implements
several conversions between different time units, and is merely a helper for convenience.
The time lord object is a static member of class Simulation, and can be accessed by calling
Simulation_impl::getTimeLord() anywhere in the source code.
The time base, which is also used as the core time, is initialized by calling init() on the time lord object.
This is performed early in the initialization phase in function main(), using the time base read from the
configuration file. Component frequencies must be an integral division of the base frequency derived from the
core time, which is consistent with the fact that time converter objects can only use integer factors.
Clocks
class Clock implements the analogy of a clock that fires periodically and invokes a clock handler function.
This corresponds to an actual clock in a real system, which fires regularly and drives state transitions of
storage components.
The clock object is a direct derivation of class Action, which is itself a derived class of class Activity
acting as a thin abstraction layer with no critical functionality.
Clock objects can also be inserted into the global priority queue and invoked by calling execute(), just like
an event object.
The clock object contains the current component time, currentCycle (of type Cycle_t), the next global time of
invocation, next (of type SimTime_t), and a time converter object that defines the component’s local time.
The clock class also defines a handler class, Clock::Handler, which is just a standard functor class that
stores the instance of the component and the call back member function.
The functor object overrides the function call operator which, when invoked, calls the call back member
function with the instance pointer, the component’s local time, and an optional argument.
A clock object can be registered with multiple handlers, potentially driving different components, by calling
registerHandler(). Registered handlers are stored in a vector, staticHandlerMap, and they are called exactly
once on every clock event.
Handler functions return a boolean value indicating whether they wish to be removed from the clock. If the
return value is true, the handler will not be called on any future clock event.
Clocks are triggered by calling the execute() method, just like an event object.
Each clock invocation advances currentCycle by one, invokes all handler functions (removing those whose return
value requests it from the registered handler list), computes the next global cycle at which the clock should
be invoked, and inserts the same clock object back into the global priority queue by calling insertActivity().
Note that period->getFactor() is just the number of global cycles between two consecutive local cycles.
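The following self-contained sketch captures this per-tick logic (advance, call handlers, drop finished
handlers, compute the next core cycle). Names and the reschedule step are simplified stand-ins for the SST
clock.

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// Sketch of one clock tick in the spirit of Clock::execute().
using SimTime_t = uint64_t;
using Cycle_t   = uint64_t;

struct ClockSketch {
    Cycle_t   currentCycle = 0;
    SimTime_t next         = 0;
    SimTime_t factor       = 1;   // core cycles between two local cycles
    std::vector<std::function<bool(Cycle_t)>> handlers;   // stand-in for staticHandlerMap

    void execute() {
        ++currentCycle;
        // Call each handler once; a handler returning true asks to be removed.
        for (auto it = handlers.begin(); it != handlers.end();) {
            if ((*it)(currentCycle)) it = handlers.erase(it);
            else ++it;
        }
        next += factor;   // next core cycle for this clock; SST would now
                          // re-insert itself via insertActivity()
    }
};

int main() {
    ClockSketch clk;
    clk.factor = 4;
    int calls = 0;
    clk.handlers.push_back([&](Cycle_t c) { ++calls; return c >= 2; });  // done after cycle 2
    clk.execute();   // cycle 1: handler called and kept
    clk.execute();   // cycle 2: handler called and removed
    clk.execute();   // cycle 3: no handlers left
    return (calls == 2 && clk.next == 12) ? 0 : 1;
}
```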
An optimization allows the clock to be de-scheduled from the global queue when there is currently no registered
handler, in order to avoid empty clock invocations, and rescheduled when handlers are inserted.
This is controlled by the boolean data member scheduled, which, if set to false, indicates that the clock is
not scheduled in the global queue. schedule() implements rescheduling by first synchronizing the local clock
with the global clock and then inserting the clock object into the global queue.
Components that require a clock typically implement a function tick() and register it with the clock by calling
registerClock() with the desired component clock frequency, which is typically read from the configuration.
The function is implemented in class BaseComponent, and forwards the call to the function with the same name in
class Simulation_impl.
The simulator object maintains a global pool of clocks in clockMap, each with a different frequency. A new
clock is created if the requested frequency does not exist. The handler is then registered with one of the
clocks, and the clock is scheduled.
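For reference, a typical component registers its tick() roughly as in the hedged sketch below. It assumes the
classic SST element API (Clock::Handler, Params::find), omits the SST_ELI_* registration macros, and uses
illustrative names; header paths and handler types can differ between SST versions.

```cpp
#include <sst/core/component.h>
#include <sst/core/params.h>
#include <string>

// Hedged sketch, not SST source: a component implementing tick() and
// registering it with a clock in its constructor.
class MyComponent : public SST::Component {
public:
    MyComponent(SST::ComponentId_t id, SST::Params& params) : SST::Component(id) {
        // Frequency is typically read from the Python configuration.
        std::string freq = params.find<std::string>("clock", "1GHz");
        // registerClock creates (or reuses) a clock of this frequency,
        // attaches the handler, and returns the clock's time converter.
        registerClock(freq,
            new SST::Clock::Handler<MyComponent>(this, &MyComponent::tick));
    }

    bool tick(SST::Cycle_t cycle) {
        // Per-cycle work goes here; returning true would deregister the handler.
        return false;
    }
};
```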
Links and Ports
class Link defines the communication end points between components or within the same component.
It enables components to insert events into the global event queue which, when executed, invoke a call back
function registered with the link object. This is roughly the equivalent of links in real systems, over which
messages can be sent, buffered, and processed by a functional unit.
In the source code, link objects are also referred to as “ports” because they simulate message endpoints.
Links are always attached to components and subcomponents, and they must be specified in the configuration
file. To connect two components, a Python Link object must be created, and connect() is called on the Python
link object with the Python component object, the name of the link (i.e., the port name), and the link
latencies. Components connected this way can send event objects to each other over the corresponding C++ link
objects by calling the send() method of class Link.
Note that the naming here is a little confusing: in the simulator implementation, class Link is closer to the
end point of a link (i.e., the notion of a port in SST), while in the Python configuration, the sst.Link object
represents a connection, and users explicitly call connect() on the link object with component objects and port
names as arguments.
During the initialization stage, the simulator maps Python sst.Link objects onto class LinkPair objects and,
for each port name that is explicitly connected in the configuration, creates a class Link object.
The simulator does not implement any connection object; instead, it lets the two link objects keep a reference
to each other and uses that reference directly for event delivery.
Link Initialization and Configuration
Link configuration is processed during initialization, in function prepareLinks() of class Simulation_impl.
This function creates link objects for each component as instructed by the configuration file (which is parsed
into a configuration graph), inserts them into the per-component map linkMap (contained in
class ComponentInfo), and connects link objects using a class LinkPair object. The link pair object sets
pair_link, a data member of class Link, to the other link object constituting the connection, essentially
connecting the two link objects.
After link initialization, link objects can be retrieved by calling getLink() on the per-component
class LinkMap object with the port name appearing in the Python configuration file.
Note that only ports connected in the Python configuration file are initialized as link objects.
Components and subcomponents can retrieve the initialized link objects by calling the member function
configureLink() (which has several different flavors), passing the port name that occurs in the configuration
file as the first argument. Link objects can thus be further configured (e.g., bound to a call back function)
by the component’s member functions after they have been initialized.
After prepareLinks(), connected link objects have been created and hold a reference to each other, but
configuration has not finished yet, because the call back functions have not been registered.
This is indicated by the boolean data member configured of class Link still being set to false.
Registering the call backs cannot be done in the Python configuration file; it must be performed by the
component itself.
During component initialization, which is performed in performWireUp() of class Simulation_impl, the
constructor needs to call configureLink(), defined in class BaseComponent, with the name of the port (which
should match the name specified in the configuration file), a time converter object (or anything that can be
converted to a time converter), and a functor object of type class Event::Handler wrapping the call back
function and the object instance.
The call back function is registered with the specified link object by calling setFunctor(), and configuration
concludes by calling setAsConfigured() to set configured to true.
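In element code, this step typically looks like the hedged fragment below (classic SST API; the port name
"in_port", the member link, and the handler name are illustrative and must match the Python configuration).

```cpp
// Hedged fragment (classic SST element API): binding a call back to the link
// created for port "in_port".
MyComponent::MyComponent(SST::ComponentId_t id, SST::Params& params)
    : SST::Component(id) {
    link = configureLink("in_port",
        new SST::Event::Handler<MyComponent>(this, &MyComponent::handleEvent));
}

void MyComponent::handleEvent(SST::Event* ev) {
    // Invoked when the event's execute() delivers it over the link.
    delete ev;   // the receiver owns delivered events
}
```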
To summarize: In SST, link objects are initialized before their components are. Link initialization creates
link objects, assigns them a port name, stores them in a per-component map structure, and connects those links
as indicated by the Python configuration file. Later, at component initialization time, link objects are
retrieved given the component ID and the port name, and are eventually bound to a call back function. This
process is somewhat counter-intuitive, as one might expect the components to be initialized first and to
create the links themselves.
Event Delivery
During simulation, components can send an event object via a link object to another component (possibly
itself, if the link is of type class SelfLink) by calling send() on the link object on their own side.
The send() function simply computes the time of delivery using the component latency value (which is in units
of local cycles) and the time converter, stores the delivery time in the event object by calling
setDeliveryTime(), sets the delivery_link of the event object to a reference to the other link (i.e., the one
stored in pair_link of the sending link object) by calling setDeliveryLink(), and inserts the event object into
the global event queue.
The event is delivered at the exact cycle for which it is scheduled by calling execute() on the event object,
which in turn calls deliverEvent() of the target link object through delivery_link, and eventually invokes the
call back function registered at the other end of the connection.
Note that the link object actually holds a reference to the global event queue in configuredQueue and
recvQueue; it calls getTimeVortex() to obtain the reference and assigns it to these data members during
initialization. In the send() function, the event object is inserted into the global event queue simply by
calling insert() on the recvQueue of the pair_link.
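The self-contained sketch below condenses this send path (stamp the delivery time, record the receiving link,
push into the shared queue). All types are simplified stand-ins for the SST classes described above.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

using SimTime_t = uint64_t;

struct LinkSketch;

struct EventSketch {
    SimTime_t   deliveryTime = 0;
    LinkSketch* deliveryLink = nullptr;
};

struct Later {
    bool operator()(const EventSketch* a, const EventSketch* b) const {
        return a->deliveryTime > b->deliveryTime;
    }
};
using QueueSketch = std::priority_queue<EventSketch*, std::vector<EventSketch*>, Later>;

struct LinkSketch {
    LinkSketch*  pair_link;        // the other end of the connection
    SimTime_t    latency;          // already converted to core cycles
    QueueSketch* recvQueue;        // the global event queue for non-polling links

    void send(SimTime_t now, EventSketch* ev) {
        ev->deliveryTime = now + latency;        // setDeliveryTime()
        ev->deliveryLink = pair_link;            // setDeliveryLink()
        pair_link->recvQueue->push(ev);          // insert() on the time vortex
    }
};

int main() {
    QueueSketch vortex;
    LinkSketch a{nullptr, 10, &vortex}, b{nullptr, 10, &vortex};
    a.pair_link = &b;
    b.pair_link = &a;

    EventSketch ev;
    a.send(/*now=*/100, &ev);   // delivered at core cycle 110 on b's side
    return (ev.deliveryTime == 110 && ev.deliveryLink == &b) ? 0 : 1;
}
```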
Polling Links
What we have discussed above covers one of the two types of link, the non-polling link, which has a fixed
latency and delivers the event using the Discrete Event Simulation mechanism at the exact specified cycle,
eliminating the need for explicit receive calls.
There is a second type of link, the polling link, which does not involve a call back function and requires the
receiver to explicitly call recv() to acquire the event object.
This type of link is easier to configure, since it does not need a call back function.
If setPolling() is called during initialization, the link is a polling link.
The recvQueue and configuredQueue of a polling link are not the global event queue, but a
class PollingLinkQueue object local to the link object (the object is no more than a thin wrapper around
std::multiset).
On send() calls, events are inserted into the polling queue of the other end of the connection.
On explicit recv() calls, which must be made on the receiving end, event objects in the polling queue are
checked against the current global cycle and returned to the caller if the delivery cycle of the event is not
in the future. In other words, the event appears to be received only at the cycle in which recv() is called,
which can differ from the canonical delivery cycle.
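A minimal sketch of such a polling-style recv(), assuming a multiset-backed local queue as described above
(names simplified):

```cpp
#include <cstdint>
#include <set>

using SimTime_t = uint64_t;

struct Ev {
    SimTime_t deliveryTime;
    bool operator<(const Ev& other) const { return deliveryTime < other.deliveryTime; }
};

struct PollingQueueSketch {
    std::multiset<Ev> events;

    // Pops the earliest event if it is due at or before 'now'.
    bool recv(SimTime_t now, Ev* out) {
        if (events.empty()) return false;
        auto it = events.begin();
        if (it->deliveryTime > now) return false;   // still in the future
        *out = *it;
        events.erase(it);
        return true;
    }
};

int main() {
    PollingQueueSketch q;
    q.events.insert({15});
    Ev ev;
    bool early = q.recv(10, &ev);   // false: not due yet
    bool later = q.recv(20, &ev);   // true: received at cycle 20, after its delivery cycle
    return (!early && later) ? 0 : 1;
}
```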
Both types of links can also be configured as self links, defined in class SelfLink as a derived class of
class Link. A self link is no more than a simple modification of a normal link in which the pair_link data
member points to the object itself, so that the component receives the event objects it sends via the self
link. Self links do not need to be declared in the configuration file; they can simply be added by calling
configureSelfLink() in class BaseComponent, and the returned link object is used as usual.
Initialization and Teardown
Before simulation starts, components and links built from the configuration file need to go through an
initialization phase. Similarly, before the simulation shuts down, they need to terminate properly and perform
certain tasks such as printing statistics.
Four interface functions are defined in class BaseComponent as dummy functions, namely init(), setup(),
complete(), and finish(). Derived components can override these interface functions to change their behavior
and implement actual initialization and teardown.
The high-level initialization sequence is defined in main.c, function start_simulation(), which calls into the
simulation object (of type class Simulation_impl). The sequence is as follows: initialize(), setup(), run(),
complete(), and finish(), which also loosely correspond to the four interface functions of
class BaseComponent, except run().
During the first stage, initialize(), components can send essential information to other connected components
using the link objects, and can perform multi-step initialization in a tick-by-tick manner (e.g., warming up a
state machine). This stage is similar to the simulation body in that it also has a loop in which a clock
(tracked by the variable untimed_phase) is maintained and incremented on every loop iteration.
In each iteration, the init() method is called on all components with the value of the clock as an argument,
and components can send event objects across links as in the simulation.
Links behave differently during the init() stage. First, link latency is ignored, and event objects are always
delivered in the next tick (i.e., untimed_phase + 1), or in the same tick if the event is sent synchronously.
Second, link objects must be polled like polling links in the normal simulation; there is no automatic delivery
of event objects via the registered call back handler.
Correspondingly, a component’s init() method must call the untimed versions of send and receive, namely
sendUntimedData() (or sendUntimedData_sync() for same-tick delivery) and recvUntimedData(), to communicate with
other components during initialization.
These two functions are also aliased as sendInitData() and recvInitData(), which just forward the call to the
untimed versions.
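A hedged fragment of what a component’s init() typically looks like (classic SST API; MyInitEvent and the
member link are illustrative; older SST releases expose the same calls as sendInitData()/recvInitData()):

```cpp
// Hedged fragment of a component's untimed init phase.
void MyComponent::init(unsigned int phase) {
    if (phase == 0) {
        // Delivered to the neighbor in the next untimed phase.
        link->sendUntimedData(new MyInitEvent());
    }
    // Links must be polled here; recvUntimedData() returns nullptr when empty.
    while (SST::Event* ev = link->recvUntimedData()) {
        // ... read configuration data sent by the neighbor ...
        delete ev;
    }
}
```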
To implement this special initialization behavior, link objects use an initialization-specific queue, the
untimedQueue, to send and receive event objects for untimed operations. This data member is of type
class InitQueue, which is just a thin wrapper around std::deque. Events are enqueued into and dequeued from the
queue (plus a time check on receive) on send and receive, respectively.
The recvQueue, which is used during normal simulation, is disabled at this stage by setting it to an object of
type class UninitializedQueue. An error will be reported if component initialization routines attempt to use
the regular link functions.
During the first initialization stage, the simulation object also tracks the number of messages sent via all
links between all components using the variable untimed_msg_count, which is incremented by sendUntimedData()
of the link objects. The first stage concludes when the number of events sent on a tick is zero, indicating
that all components have finished exchanging messages.
Links revert to their normal behavior after finalizeLinkConfiguration() of class ComponentInfo is called.
This function iteratively calls finalizeConfiguration() on all links in the component and recursively calls the
same function in child subcomponents, enabling the recvQueue of all links in the simulation.
The untimed queue is disabled by assigning afterInitQueue to it, which will report an error if used.
The second-stage initialization is simple: just iterate over all components in the system and call setup() on
each of them. This is the last notification before simulation begins.
complete() and finish() mirror initialize() and setup(), except that they call complete() and finish() on the
components, respectively.
complete() also runs tick-by-tick, with components communicating post-simulation state over the untimed
channel of the links.
During complete(), link behavior is altered as in initialize() by calling prepareForComplete() on each
ComponentInfo, which forwards the call to the function with the same name in class Link.
This function disables the recvQueue and re-enables the untimedQueue for untimed communication.
The complete() phase concludes when no message is sent during a tick.
finish() is the last notification before objects are destroyed. According to the source code, statistics should
be printed in this stage.
Shared Memory IPC
Inter-process communication in SST can be performed using Linux shared memory support. IPC provides a message-based communication channel between two independent processes, most likely supplying external information to support the simulation, such as the flow of instructions (potentially from multiple threads) fed to the core pipeline simulator. In the abstract model, the IPC channel consists of one or more buffers allocated from the shared memory region, with each buffer being a circular queue that serves messages in FIFO order. The writing end inserts messages into the queue, and may block on this operation if the queue is full to avoid overflow. The reading end retrieves the oldest message from the queue, and can be either blocking or non-blocking. A blocking read waits for new messages to be inserted if the queue is empty. A non-blocking read returns a boolean value indicating whether the operation has succeeded; an empty or locked queue will cause a non-blocking read to fail.
Circular Queue
The circular queue (class CircularBuffer) implements a single message buffer as a FIFO queue. To avoid
reader-writer contention, the queue is protected by a spin lock, bufferMutex (class SSTMutex), which grants
access to only one exclusive reader or writer at a time.
The queue is parameterized on the message type T, given as the template argument.
The actual storage of the queue is at the end of the object, defined as T buffer[0], and must be allocated as
extra storage after the object itself.
The queue maintains two pointers: the read pointer readIndex, which points to the next message to read, and
the write pointer writeIndex, which points to the next free slot for insertion.
The queue is considered empty when the two pointers have the same value, and full when the write pointer is
one slot before the read pointer (i.e., the usable capacity is one less than the number of physically allocated
slots).
Calling read() on the queue performs a blocking read, which does not return until the queue can be locked and
is non-empty. readNB() performs a non-blocking read, which simply returns false if the read fails because the
queue is locked or empty. Write operations, performed by calling write(), always block on a full queue or on
the spin lock.
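The index arithmetic is the interesting part; the self-contained sketch below shows it with a fixed-size array
and non-blocking operations only (the real class uses a trailing buffer[0], a spin lock, and blocking
read()/write()).

```cpp
#include <cstddef>

// Index arithmetic of a CircularBuffer-style FIFO: empty when the indices are
// equal, full when the write index is one slot behind the read index, so the
// usable capacity is (Slots - 1).
template <typename T, size_t Slots>
struct RingSketch {
    T      buffer[Slots];
    size_t readIndex  = 0;
    size_t writeIndex = 0;

    bool empty() const { return readIndex == writeIndex; }
    bool full()  const { return (writeIndex + 1) % Slots == readIndex; }

    bool writeNB(const T& msg) {
        if (full()) return false;
        buffer[writeIndex] = msg;
        writeIndex = (writeIndex + 1) % Slots;
        return true;
    }
    bool readNB(T* out) {            // analogous to readNB()
        if (empty()) return false;
        *out = buffer[readIndex];
        readIndex = (readIndex + 1) % Slots;
        return true;
    }
};

int main() {
    RingSketch<int, 4> q;            // 4 slots, capacity 3
    q.writeNB(1); q.writeNB(2); q.writeNB(3);
    bool rejected = !q.writeNB(4);   // full: write index one slot behind read
    int v = 0;
    q.readNB(&v);                    // v == 1 (FIFO order)
    return (rejected && v == 1) ? 0 : 1;
}
```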
Tunnel
The tunnel object defines the layout of the shared memory, given a pointer to the shared region.
The region consists of three parts. The first part is the metadata for the shared region, including the region
size and the layout of the rest of the region. The layout of this part is described by
struct InternalSharedData. Note that this struct is variable-sized: there is an array of offset values at the
end of the struct (defined as size_t offsets[0]), which must be allocated as extra space after the struct.
The first element of the offsets array is the offset to the second part, while the rest of the array points to
individual queue objects.
The second part is a user-defined shared data structure that is opaque to the tunnel object; its type is given
as a template parameter. Storage is allocated for this data structure right after the tunnel metadata. The
layout of this part is unknown to the tunnel, and users should initialize it properly.
The last part is an array of buffers, whose message type is also given as a template parameter.
There can be multiple buffers of the same size in the shared memory region, and these buffers are allocated in
the remaining storage of the region after the first two parts.
Buffers are referred to by the offsets array in the first part, as mentioned earlier.
Note that although the tunnel object also contains pointers to data structures in the shared region, for
example the data member isd, which points to the first part, and circBuffs, which contains pointers to all the
buffer objects, the tunnel object itself is not shared across processes.
The data member shmPtr tracks the beginning of the shared memory region, which is allocated outside of this
class and passed as a constructor argument. The data member shmSize tracks the size of the region in bytes.
The data member nextAllocPtr is the allocation pointer that points to the next unallocated byte within the
region, and is advanced by the memory allocation function reserveSpace() to reserve memory for the previously
mentioned shared data structures.
The data member numBuffs stores the number of queues in the third part, and buffSize stores the number of
elements in each queue.
The boolean data member master tracks whether the tunnel object is on the SST side (set to true) or on some
third-party process side (set to false). The master tunnel object is responsible for initializing the shared
data structures, while the non-master object just reads them from the shared region and uses them for local
configuration.
The member function reserveSpace() allocates a given number of bytes from the shared region by advancing the
nextAllocPtr pointer. It takes a type as a template argument and a function argument extraSpace.
The number of bytes allocated is the size of the template type plus extraSpace. The extraSpace argument adjusts
the size of the actual allocation for variable-sized data structures, such as the payload part of the circular
queue (defined as T buffer[0]) and the offsets array of struct InternalSharedData.
The function returns a pair consisting of the allocated size and a pointer to the allocated memory address.
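A self-contained sketch of such a bump-pointer allocator is shown below. The (offset, pointer) return
convention and the Header type are illustrative assumptions, not the exact SST interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>

// Sketch of a bump-pointer reserveSpace(): advance a cursor in a pre-mapped
// region by sizeof(T) plus extra bytes for variable-sized tails.
// Alignment handling is omitted.
struct RegionSketch {
    uint8_t* base;          // start of the (shared) region
    size_t   size;          // total bytes in the region
    size_t   nextAllocPtr;  // offset of the next free byte

    template <typename T>
    std::pair<size_t, T*> reserveSpace(size_t extraSpace) {
        size_t bytes = sizeof(T) + extraSpace;
        if (nextAllocPtr + bytes > size) return {0, nullptr};   // out of room
        size_t offset = nextAllocPtr;
        nextAllocPtr += bytes;
        return {offset, reinterpret_cast<T*>(base + offset)};
    }
};

struct Header {             // stand-in for struct InternalSharedData; the real
    size_t shmSize;         // struct ends with a zero-length offsets[] tail,
    size_t numBuffs;        // which is exactly why extraSpace is needed
};

int main() {
    uint8_t storage[4096];
    RegionSketch region{storage, sizeof(storage), 0};

    // Reserve the header plus room for four trailing offset entries.
    auto hdr = region.reserveSpace<Header>(4 * sizeof(size_t));
    return (hdr.second != nullptr && region.nextAllocPtr > sizeof(Header)) ? 0 : 1;
}
```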
Messages in the queues can be accessed by calling writeMessage(), readMessage(), and readMessageNB() with the
index of the buffer as the argument; the semantics of these functions are straightforward.
Data structures in the shared region are initialized by calling initialize() with a pointer to the beginning of
the region. Initialization only takes place when master equals true.
This function first calls reserveSpace() to allocate storage for struct InternalSharedData as well as the
trailing offsets array (which needs (1 + numBuffs) * sizeof(size_t) bytes) at the beginning of the region.
Important system parameters, such as the number of queue objects and the size of the shared region, are stored
in the shared region so that the other end of the IPC channel (where master equals false) can access this
information as well.
It then allocates storage for the template argument type ShareDataType right after the previous part, and
stores the location of this part in offsets[0].
Finally, the queues are initialized in a loop, which allocates storage for each queue object plus the message
storage at the end of the queue, taking an extra sizeof(MsgType) * buffSize bytes.
The queue objects are recorded both in the shared part (isd->offsets) and in the data member circBuffs.
On the other hand, if master equals false, initialization merely pulls information from the shared region
(which was written by the master tunnel object) and stores it in the local, non-master tunnel object.
The data member shmSize stores the total number of bytes (aligned to page boundaries) required for the given
shared region layout. External classes can obtain this number by calling getTunnelSize().
SHMParent
class SHMParent encapsulates the OS interface for creating a shared memory region, and contains a master
tunnel object for managing the layout of the shared region. The object is supposed to be instantiated on the
master side of the IPC; it allocates a shared memory region based on the message class, the number of buffers,
and the buffer size. The region name (a string object) is then passed to the non-master side of the IPC (via a
command line argument of a child process started by the master, for example), so that the other side can map
the same shared region, instantiate the non-master tunnel object, and pull the configuration information to
complete the construction of the IPC channel.
On construction, the class SHMParent constructor generates a key of the format "/sst_shmem_%u-%u-%d", with the
three integer components being the PID, the component ID of the owner (given as a construction argument), and
a random number.
This key is then used as the path argument for shm_open(), which creates a shared region with the given path as
the key. Other processes can map the same region into their own address spaces using the same key.
The tunnel object tunnel is then constructed with arguments defining the shared region’s layout.
shm_open() returns a file descriptor that represents the shared region, which is then passed to ftruncate()
with the desired size of the shared region, obtained from the tunnel object by calling getTunnelSize().
Finally, the shared region is mapped into the virtual address space of the master process by calling mmap()
with the file descriptor and region size.
The layout is initialized by calling initialize() on the tunnel object with the mapped address of the shared
region as the argument, which concludes the shared memory IPC initialization process on the master side.
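The master-side sequence boils down to the standard POSIX calls sketched below (self-contained; the name
components and the size are placeholders, error handling is minimal, and older glibc versions need -lrt to
link shm_open/shm_unlink).

```cpp
#include <cstdio>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

// Sketch of the master side: shm_open() a named region, ftruncate() it to the
// tunnel size, and mmap() it into this process.
int main() {
    char name[64];
    std::snprintf(name, sizeof(name), "/sst_shmem_%u-%u-%d",
                  (unsigned)getpid(), 0u, 42);   // pid, owner id, "random" number

    int fd = shm_open(name, O_RDWR | O_CREAT | O_EXCL, 0600);
    if (fd < 0) { perror("shm_open"); return 1; }

    size_t tunnelSize = 4096;                    // getTunnelSize() in SST
    if (ftruncate(fd, tunnelSize) != 0) { perror("ftruncate"); return 1; }

    void* shmPtr = mmap(nullptr, tunnelSize, PROT_READ | PROT_WRITE,
                        MAP_SHARED, fd, 0);
    if (shmPtr == MAP_FAILED) { perror("mmap"); return 1; }

    // ... tunnel->initialize(shmPtr) would lay out the region here ...

    munmap(shmPtr, tunnelSize);
    close(fd);
    shm_unlink(name);                            // cleanup for this sketch only
    return 0;
}
```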
SHMChild
class SHMChild implements the shared memory IPC endpoint on the non-master side. The class is constructed with
the name of the shared region. It calls shm_open() to obtain the file descriptor corresponding to the name, and
then maps a smaller region of size sizeof(InternalSharedData). This step is necessary because the child does
not yet know the size of the shared region. It then initializes the non-master tunnel object, which pulls
essential information from the mapped region, including the shared region size, shmSize (note that the
constructor being called is not the same as the one used for the master tunnel object).
The small region is then unmapped, and the full region is mapped into the address space of the process by
calling mmap() with the file descriptor and shmSize.
The tunnel object is initialized again with the full shared region by calling initialize() in non-master mode,
which pulls the layout information from the struct InternalSharedData part.
Note that the object offsets recorded in struct InternalSharedData are offsets relative to the beginning of the
shared region. The non-master tunnel needs to rebuild the virtual address pointers by adding these offsets to
the base address of the shared region, since the OS does not guarantee that the region is mapped at the same
virtual address in different processes.
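In code, this pointer rebuild amounts to something like the hedged fragment below (isd, shmPtr, circBuffs, and
the CircularBuffer type mirror the names used in the text; the exact expression in SST may differ).

```cpp
// Hedged fragment: rebuilding a local pointer from a stored offset.
// offsets[0] is the shared data area, so queue i lives at offsets[i + 1].
uint8_t* base = static_cast<uint8_t*>(shmPtr);
auto* buf = reinterpret_cast<CircularBuffer<MsgType>*>(base + isd->offsets[i + 1]);
circBuffs.push_back(buf);
```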
Each initialize() call from a non-master tunnel decrements the isd->expectedChildren counter stored in the
shared region. The counter’s initial value is provided as a construction argument to the master tunnel and
upper-bounds the number of non-master tunnels that are allowed to connect to this shared region.
Once the counter reaches zero after a non-master tunnel completes initialization, the shared memory file
descriptor is removed by calling shm_unlink() on the non-master side, which removes the region name from the
shared memory name space and prevents it from being mapped by other non-master IPC endpoints.
Variants
There are also variants of the shared memory endpoint implementation to fit different non-master environments.
The most notable is the shared memory child endpoint designed specifically for Pin3: common operating system
calls do not work properly in Pin3 and must be replaced with their PinCRT versions.
class MMAPChild_Pin3 implements this non-master endpoint for use in the Pin environment. The high-level logic
is identical to that of the standard implementation in class SHMChild.
Using mmap()
Instead of relying on shm_open() to return a file descriptor representing the shared region, another way of
sharing memory between processes is to directly mmap() a disk file backed by external storage.
The high-level logic is similar to the shared memory approach, except that the file descriptor is obtained by
creating a file under the /tmp directory, which is then passed to mmap().
This is implemented in class MMAPParent.
IPC Tunnel
The combination of tunnel objects and shared memory objects is unnecessarily complicated, since shared memory
allocation and layout management are divided across two classes.
class IPCTunnel addresses this problem by implementing both allocation and management of the shared region in a
single class.
Like tunnel objects, the class operates in master or non-master mode, when running on SST or on an external
process, respectively.
The constructor that takes the component ID and buffer parameters initializes the object in master mode; just
like a class SHMParent object, it first generates a key for the shared region, allocates shared memory using
the shm_open() system call, and then initializes the layout into three parts like a tunnel object.
The constructor that takes the key initializes the object in non-master mode; it simply opens the shared
region, maps it into its own address space, and then pulls information from the shared information area into
local data members.
Processes communicate with each other using the same message-based model by calling writeMessage(),
readMessage(), and readMessageNB(), just as with a tunnel object.