Ruby Cache Part 23: In port code blocks

After declaring all of the structures we need in the state machine file, the first “functional” part of the file are the “in ports”. This section specifies what events to trigger on different incoming messages.

However, before we get to the in ports, we must declare our out ports.

out_port(request_out, RequestMsg, requestToDir);
out_port(response_out, ResponseMsg, responseToDirOrSibling);

This code essentially just renames requestToDir and responseToDirOrSibling to request_out and response_out. Later in the file, when we want to enqueue messages to these message buffers we will use the new names request_out and response_out. This also specifies the exact implementation of the messages that we will send across these ports. We will look at the exact definition of these types below in the file MSI-msg.sm.

Next, we create an in port code block. In SLICC, there are many cases where there are code blocks that look similar to if blocks, but they encode specific information. For instance, the code inside an in_port() block is put in a special generated file: L1Cache_Wakeup.cc.

All of the in_port code blocks are executed in order (or based on the priority if it is specified). On each active cycle for the controller, the first in_port code is executed. If it is successful, it is re-executed to see if there are other messages that can be consumed on the port. If there are no messages or no events are triggered, then the next in_port code block is executed.

There are three different kinds of stalls that can be generated when executing in_port code blocks. First, there is a parameterized limit for the number of transitions per cycle at each controller. If this limit is reached (i.e., there are more messages on the message buffers than the transition per cycle limit), then all in_port will stop processing and wait to continue until the next cycle. Second, there could be a resource stall. This happens if some needed resource is unavailable. For instance, if using the BankedArray bandwidth model, the needed bank of the cache may be currently occupied. Third, there could be a protocol stall. This is a special kind of action that causes the state machine to stall until the next cycle.

It is important to note that protocol stalls and resource stalls prevent all in_port blocks from executing. For instance, if the first in_port block generates a protocol stall, none of the other ports will be executed, blocking all messages. This is why it is important to use the correct number and ordering of virtual networks.

Below, is the full code for the in_port block for the highest priority messages to our L1 cache controller, the response from directory or other caches. Next we will break the code block down to explain each section.

in_port(response_in, ResponseMsg, responseFromDirOrSibling) {
    if (response_in.isReady(clockEdge())) {
        peek(response_in, ResponseMsg) {
            Entry cache_entry := getCacheEntry(in_msg.addr);
            TBE tbe := TBEs[in_msg.addr];
            assert(is_valid(tbe));

            if (machineIDToMachineType(in_msg.Sender) ==
                        MachineType:Directory) {
                if (in_msg.Type != CoherenceResponseType:Data) {
                    error("Directory should only reply with data");
                }
                assert(in_msg.Acks + tbe.AcksOutstanding >= 0);
                if (in_msg.Acks + tbe.AcksOutstanding == 0) {
                    trigger(Event:DataDirNoAcks, in_msg.addr, cache_entry,
                            tbe);
                } else {
                    trigger(Event:DataDirAcks, in_msg.addr, cache_entry,
                            tbe);
                }
            } else {
                if (in_msg.Type == CoherenceResponseType:Data) {
                    trigger(Event:DataOwner, in_msg.addr, cache_entry,
                            tbe);
                } else if (in_msg.Type == CoherenceResponseType:InvAck) {
                    DPRINTF(RubySlicc, "Got inv ack. %d left\n",
                            tbe.AcksOutstanding);
                    if (tbe.AcksOutstanding == 1) {
                        trigger(Event:LastInvAck, in_msg.addr, cache_entry,
                                tbe);
                    } else {
                        trigger(Event:InvAck, in_msg.addr, cache_entry,
                                tbe);
                    }
                } else {
                    error("Unexpected response from other cache");
                }
            }
        }
    }
}

First, like the out_port above “response_in” is the name we’ll use later when we refer to this port, and “ResponseMsg” is the type of message we expect on this port (since this port processes responses to our requests). The first step in all in_port code blocks is to check the message buffer to see if there are any messages to be processed. If not, then this in_port code block is skipped and the next one is executed.

in_port(response_in, ResponseMsg, responseFromDirOrSibling) {
    if (response_in.isReady(clockEdge())) {
        . . .
    }
}

Assuming there is a valid message in the message buffer, next, we grab that message by using the special code block peek. Peek is a special function. Any code inside a peek statement has a special variable declared and populated: in_msg. This contains the message (of type ResponseMsg in this case as specified by the second parameter of the peek call) at the head of the port. Here, response_in is the port we want to peek into.

Then, we need to grab the cache entry and the TBE for the incoming address. (We will look at the other parameters in response message below.) Above, we implemented getCacheEntry. It will return either the valid matching entry for the address, or an invalid entry if there is not a matching cache block.

For the TBE, since this is a response to a request this cache controller initiated, there must be a valid TBE in the TBE table. Hence, we see our first debug statement, an assert. This is one of the ways to ease debugging of cache coherence protocols. It is encouraged to use asserts liberally to make debugging easier.

peek(response_in, ResponseMsg) {
    Entry cache_entry := getCacheEntry(in_msg.addr);
    TBE tbe := TBEs[in_msg.addr];
    assert(is_valid(tbe));

    . . .
}

Next, we need to decide what event to trigger based on the message. For this, we first need to discuss what data response messages are carrying.

To declare a new message type, first create a new file for all of the message types: MSI-msg.sm. In this file, you can declare any structures that will be globally used across all of the SLICC files for your protocol. We will include this file in all of the state machine definitions via the MSI.slicc file later. This is similar to including global definitions in header files in C/C++.

In the MSI-msg.sm file, add the following code block:

structure(ResponseMsg, desc="Used for Dir->Cache and Fwd message responses",
          interface="Message") {
    Addr addr,                   desc="Physical address for this response";
    CoherenceResponseType Type,  desc="Type of response";
    MachineID Sender,            desc="Node who is responding to the request";
    NetDest Destination,         desc="Multicast destination mask";
    DataBlock DataBlk,           desc="data for the cache line";
    MessageSizeType MessageSize, desc="size category of the message";
    int Acks,                    desc="Number of acks required from others";

    // This must be overridden here to support functional accesses
    bool functionalRead(Packet *pkt) {
        if (Type == CoherenceResponseType:Data) {
            return testAndRead(addr, DataBlk, pkt);
        }
        return false;
    }

    bool functionalWrite(Packet *pkt) {
        // No check on message type required since the protocol should read
        // data block from only those messages that contain valid data
        return testAndWrite(addr, DataBlk, pkt);
    }
}

The message is just another SLICC structure similar to the structures we’ve defined before. However, this time, we have a specific interface that it is implementing: Message. Within this message, we can add any members that we need for our protocol. In this case, we first have the address. Note, a common “gotcha” is that you cannot use “Addr” with a capitol “A” for the name of the member since it is the same name as the type!

Next, we have the type of response. In our case, there are two types of response data and invalidation acks from other caches after they have invalidated their copy. Thus, we need to define an enumeration, the CoherenceResponseType, to use it in this message. Add the following code before the ResponseMsg declaration in the same file.

enumeration(CoherenceResponseType, desc="Types of response messages") {
    Data,       desc="Contains the most up-to-date data";
    InvAck,     desc="Message from another cache that they have inv. the blk";
}

Next, in the response message type, we have the MachineID which sent the response. MachineID is the specific machine that sent the response. For instance, it might be directory 0 or cache 12. The MachineID contains both the MachineType (e.g., we have been creating an L1Cache as declared in the first machine()) and the specific version of that machine type. We will come back to machine version numbers when configuring the system.

Next, all messages need a destination, and a size. The destination is specified as a NetDest, which is a bitmap of all the MachineID in the system. This allows messages to be broadcast to a flexible set of receivers. The message also has a size. You can find the possible message sizes in src/mem/ruby/protocol/RubySlicc_Exports.sm.

This message may also contain a data block and the number acks that are expected. Thus, we can include these in the message definition as well.

Finally, we also have to define functional read and write functions. These are used by Ruby to inspect in-flight messages on function reads and writes. Note: This functionality currently is very brittle and if there are messages in-flight for an address that is functionally read or written the functional access may fail.

You can download the complete MSI-msg.sm file here.

Now that we have defined the data in the response message, we can look at how we choose which action to trigger in the in_port for response to the cache.

// If it's from the directory...
if (machineIDToMachineType(in_msg.Sender) ==
            MachineType:Directory) {
    if (in_msg.Type != CoherenceResponseType:Data) {
        error("Directory should only reply with data");
    }
    assert(in_msg.Acks + tbe.AcksOutstanding >= 0);
    if (in_msg.Acks + tbe.AcksOutstanding == 0) {
        trigger(Event:DataDirNoAcks, in_msg.addr, cache_entry,
                tbe);
    } else {
        trigger(Event:DataDirAcks, in_msg.addr, cache_entry,
                tbe);
    }
} else {
    // This is from another cache.
    if (in_msg.Type == CoherenceResponseType:Data) {
        trigger(Event:DataOwner, in_msg.addr, cache_entry,
                tbe);
    } else if (in_msg.Type == CoherenceResponseType:InvAck) {
        DPRINTF(RubySlicc, "Got inv ack. %d left\n",
                tbe.AcksOutstanding);
        if (tbe.AcksOutstanding == 1) {
            // If there is exactly one ack remaining then we
            // know it is the last ack.
            trigger(Event:LastInvAck, in_msg.addr, cache_entry,
                    tbe);
        } else {
            trigger(Event:InvAck, in_msg.addr, cache_entry,
                    tbe);
        }
    } else {
        error("Unexpected response from other cache");
    }
}

First, we check to see if the message comes from the directory or another cache. If it comes from the directory, we know that it must be a data response (the directory will never respond with an ack).

Here, we meet our second way to add debug information to protocols: the error function. This function breaks simulation and prints out the string parameter similar to panic.

Next, when we receive data from the directory, we expect that the number of acks we are waiting for will never be less than 0. The number of acks we’re waiting for is the current acks we have received (tbe.AcksOutstanding) and the number of acks the directory has told us to be waiting for. We need to check it this way because it is possible that we have received acks from other caches before we get the message from the directory that we need to wait for acks.

There are two possibilities for the acks, either we have already received all of the acks and now we are getting the data (data from dir acks==0 in Table 8.3), or we need to wait for more acks. Thus, we check this condition and trigger two different events, one for each possibility.

When triggering transitions, you need to pass four parameters. The first parameter is the event to trigger. These events were specified earlier in the Event declaration. The next parameter is the (physical memory) address of the cache block to operate on. Usually this is the same as the address of the in_msg, but it may be different, for instance, on a replacement the address is for the block being replaced. Next is the cache entry and the TBE for the block. These may be invalid if there are no valid entries for the address in the cache or there is not a valid TBE in the TBE table.

When we implement actions below, we will see how these last three parameters are used. They are passed into the actions as implicit variables: address, cache_entry, and tbe.

If the trigger function is executed, after the transition is complete, the in_port logic is executed again, assuming there have been fewer transitions than that maximum transitions per cycle. If there are other messages in the message buffer more transitions can be triggered.

If the response is from another cache instead of the directory, then other events are triggered, as shown in the code above. These events come directly from Table 8.3 in Sorin et al.

Importantly, you should use the in_port logic to check all conditions. After an event is triggered, it should only have a single code path. I.e., there should be no if statements in any action blocks. If you want to conditionally execute actions, you should use different states or different events in the in_port logic.

The reason for this constraint is the way Ruby checks resources before executing a transition. In the generated code from the in_port blocks before the transition is actually executed all of the resources are checked. In other words, transitions are atomic and either execute all of the actions or none. Conditional statements inside the actions prevents the SLICC compiler from correctly tracking the resource usage and can lead to strange performance, deadlocks, and other bugs.

After specifying the in_port logic for the highest priority network, the response network, we need to add the in_port logic for the forward request network. However, before specifying this logic, we need to define the RequestMsg type and the CoherenceRequestType which contains the types of requests. These two definitions go in the MSI-msg.sm file not in MSI-cache.sm since they are global definitions.

It is possible to implement this as two different messages and request type enumerations, one for forward and one for normal requests, but it simplifies the code to use a single message and type.

enumeration(CoherenceRequestType, desc="Types of request messages") {
    GetS,       desc="Request from cache for a block with read permission";
    GetM,       desc="Request from cache for a block with write permission";
    PutS,       desc="Sent to directory when evicting a block in S (clean WB)";
    PutM,       desc="Sent to directory when evicting a block in M";

    // "Requests" from the directory to the caches on the fwd network
    Inv,        desc="Probe the cache and invalidate any matching blocks";
    PutAck,     desc="The put request has been processed.";
}
structure(RequestMsg, desc="Used for Cache->Dir and Fwd messages",  interface="Message") {
    Addr addr,                   desc="Physical address for this request";
    CoherenceRequestType Type,   desc="Type of request";
    MachineID Requestor,         desc="Node who initiated the request";
    NetDest Destination,         desc="Multicast destination mask";
    DataBlock DataBlk,           desc="data for the cache line";
    MessageSizeType MessageSize, desc="size category of the message";

    bool functionalRead(Packet *pkt) {
        // Requests should never have the only copy of the most up-to-date data
        return false;
    }

    bool functionalWrite(Packet *pkt) {
        // No check on message type required since the protocol should read
        // data block from only those messages that contain valid data
        return testAndWrite(addr, DataBlk, pkt);
    }
}

Now, we can specify the logic for the forward network in_port. This logic is straightforward and triggers a different event for each request type.

in_port(forward_in, RequestMsg, forwardFromDir) {
    if (forward_in.isReady(clockEdge())) {
        peek(forward_in, RequestMsg) {
            // Grab the entry and tbe if they exist.
            Entry cache_entry := getCacheEntry(in_msg.addr);
            TBE tbe := TBEs[in_msg.addr];

            if (in_msg.Type == CoherenceRequestType:GetS) {
                trigger(Event:FwdGetS, in_msg.addr, cache_entry, tbe);
            } else if (in_msg.Type == CoherenceRequestType:GetM) {
                trigger(Event:FwdGetM, in_msg.addr, cache_entry, tbe);
            } else if (in_msg.Type == CoherenceRequestType:Inv) {
                trigger(Event:Inv, in_msg.addr, cache_entry, tbe);
            } else if (in_msg.Type == CoherenceRequestType:PutAck) {
                trigger(Event:PutAck, in_msg.addr, cache_entry, tbe);
            } else {
                error("Unexpected forward message!");
            }
        }
    }
}

The final in_port is for the mandatory queue. This is the lowest priority queue, so it must be lowest in the state machine file. The mandatory queue has a special message type: RubyRequest. This type is specified in src/mem/protocol/RubySlicc_Types.sm It contains two different addresses, the LineAddress which is cache-block aligned and the PhysicalAddress which holds the original request’s address and may not be cache-block aligned. It also has other members that may be useful in some protocols. However, for this simple protocol we only need the LineAddress.

in_port(mandatory_in, RubyRequest, mandatoryQueue) {
    if (mandatory_in.isReady(clockEdge())) {
        peek(mandatory_in, RubyRequest, block_on="LineAddress") {
            Entry cache_entry := getCacheEntry(in_msg.LineAddress);
            TBE tbe := TBEs[in_msg.LineAddress];

            if (is_invalid(cache_entry) &&
                    cacheMemory.cacheAvail(in_msg.LineAddress) == false ) {
                Addr addr := cacheMemory.cacheProbe(in_msg.LineAddress);
                Entry victim_entry := getCacheEntry(addr);
                TBE victim_tbe := TBEs[addr];
                trigger(Event:Replacement, addr, victim_entry, victim_tbe);
            } else {
                if (in_msg.Type == RubyRequestType:LD ||
                        in_msg.Type == RubyRequestType:IFETCH) {
                    trigger(Event:Load, in_msg.LineAddress, cache_entry,
                            tbe);
                } else if (in_msg.Type == RubyRequestType:ST) {
                    trigger(Event:Store, in_msg.LineAddress, cache_entry,
                            tbe);
                } else {
                    error("Unexpected type from processor");
                }
            }
        }
    }
}

There are a couple of new concepts shown in this code block. First, we use block_on="LineAddress" in the peek function. What this does is ensure that any other requests to the same cache line will be blocked until the current request is complete.

Next, we check if the cache entry for this line is valid. If not, and there are no more entries available in the set, then we need to evict another entry. To get the victim address, we can use the cacheProbe function on the CacheMemory object. This function uses the parameterized replacement policy and returns the physical (line) address of the victim.

Importantly, when we trigger the Replacement event we use the address of the victim block and the victim cache entry and tbe. Thus, when we take actions in the replacement transitions we will be acting on the victim block, not the requesting block. Additionally, we need to remember to not remove the requesting message from the mandatory queue (pop) until it has been satisfied. The message should not be popped after the replacement is complete.

If the cache block was found to be valid, then we simply trigger the Load or Store event.