
Saturday, February 9, 2008

Memory

Tyrell: If we give them a past, we create a cushion for their emotions and, consequently, we can control them better.
Deckard: Memories. You're talking about memories.
—the movie Blade Runner
In this chapter, you will learn everything you need to know about memory in embedded systems. In particular, you will learn about the types of memory you are likely to encounter, how to test memory devices to see if they are working properly, and how to use Flash memory.
6.1. Types of Memory
Many types of memory devices are available for use in modern computer systems. As an embedded software engineer, you must be aware of the differences between them and understand how to use each type effectively. In our discussion, we will approach these devices from a software viewpoint. As you are reading, try to keep in mind that the development of these devices took several decades and that there are significant physical differences in the underlying hardware. The names of the memory types frequently reflect the historical nature of the development process and are often more confusing than insightful.
Most software developers think of memory as being either random-access (RAM) or read-only (ROM). But, in fact, there are subtypes of each and even a third class of hybrid memories. In a RAM device, the data stored at each memory location can be read or written, as desired. In a ROM device, the data stored at each memory location can be read at will, but never written. In some cases, it is possible to overwrite the data in a ROM-like device. Such devices are called hybrid memories because they exhibit some of the characteristics of both RAM and ROM. Figure 6-1 provides a classification system for the memory devices that are commonly found in embedded systems.
6.2. Memory Testing
One of the first pieces of serious embedded software you are likely to write is a memory test. Once the prototype hardware is ready, the designer would like some reassurance that she has wired the address and data lines correctly and that the memory chips are working properly. At first this might seem like a fairly simple assignment, but as you look at the problem more closely you will realize that it can be difficult to detect subtle memory problems with a simple test. In fact, as a result of programmer naiveté, many embedded systems include memory tests that would detect only the most catastrophic memory failures. Some of these might not even notice that the memory chips have been removed from the board!
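To make the point concrete, the following is a minimal sketch (my own, not a listing from the book) of a walking-1s data-bus test. Each pattern has exactly one bit set, so a stuck, open, or shorted data line produces a read-back mismatch that a naive single-value test can easily miss.

#include <stdint.h>

/* Walking-1s data-bus test for an 8-bit-wide memory device.
 * 'address' may be any location in the device under test.
 * Returns 0 on success, or the first pattern that failed.
 */
uint8_t memtest_databus(volatile uint8_t *address)
{
    uint8_t pattern;

    for (pattern = 1; pattern != 0; pattern = (uint8_t)(pattern << 1)) {
        *address = pattern;          /* drive a single data line high */
        if (*address != pattern)     /* read it back immediately      */
            return pattern;          /* that line is stuck or shorted */
    }

    return 0;                        /* all eight patterns passed     */
}

A full suite would add an address-bus test (walking 1s across the address lines) and an exhaustive device test that touches every location; this sketch only demonstrates the style.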

A Few Words about Hardware

It is the nature of programming that books about the subject must include examples. Typically, these examples are selected so that they can be easily experimented with by interested readers. That means readers must have access to the very same software development tools and hardware platforms used by the author. Unfortunately, in the case of embedded programming, this is unrealistic. It simply does not make sense to run any of the example programs on the platforms available to most readers—PCs, Macs, and Unix workstations. Even selecting a standard embedded platform is difficult. As you have already learned, there is no such thing as a "typical" embedded system. Whatever hardware is selected, the majority of readers will not have access to it. But despite this rather significant problem, I do feel it is important to select a reference hardware platform for use in the examples. In so doing, I hope to make the examples consistent and, thus, the entire discussion more clear.

Variations on the Theme

Unlike software designed for general-purpose computers, embedded software cannot usually be run on other embedded systems without significant modification. This is mainly because of the incredible variety in the underlying hardware. The hardware in each embedded system is tailored specifically to the application, in order to keep system costs low. As a result, unnecessary circuitry is eliminated and hardware resources are shared wherever possible. In this section you will learn what hardware features are common across all embedded systems and why there is so much variation with respect to just about everything else.
By definition all embedded systems contain a processor and software, but what other features do they have in common? Certainly, in order to have software, there must be a place to store the executable code and temporary storage for runtime data manipulation. These take the form of ROM and RAM, respectively; any embedded system will have some of each. If only a small amount of memory is required, it might be contained within the same chip as the processor. Otherwise, one or both types of memory will reside in external memory chips.

What Is an Embedded System?

Tough question, really. There is no one answer. I asked at least a half dozen industry experts and got as many answers. In fact, it was so hard to pin down a definition that I almost started thinking "embedded system" was just another term for "software." Nevertheless, there's one fact about embedded systems that all the experts do seem to agree on:
An embedded system is any software system that must be designed on a platform different from the platform on which the system is intended to be deployed.
What is meant by platform? On the development side, platform typically refers to an operating system capable of hosting software development, such as Windows, Solaris, HP-UX, etc. On the target side, the word platform refers to the devices on which the embedded system will be deployed.
OK then, why the design constraint? Why aren't embedded targets capable of hosting software development? Because these targets are optimized for performance and/or simplicity, they lack the equipment necessary for development (such as keyboards, monitors, networking, etc.). In general, development for the embedded environment is referred to as "cross-platform development."

Friday, February 8, 2008

ISP vs. IAP

When it comes to re-programming Flash memory that is soldered down to a PCB (either integrated into the microcontroller or external), there are two programming methods: ISP and IAP.
ISP: In-System Programming
ISP allows re-programming of a Flash memory device while it is soldered into the target hardware, but the application must be stopped during the re-programming process. Usually, ISP requires that a service technician manually start the re-programming procedure by halting the application and putting the device into a special boot and/or programming mode. Only after programming is completed can the application be restarted.
In the Philips 89C51Rx2 series, ISP is implemented with the boot loader. The chip is set to ISP mode either by pulling pin PSEN low externally right after a hardware reset or by software. When in ISP mode, the 89C51Rx2 accepts Flash-programming commands via the serial interface.
IAP: In-Application Programming
IAP allows re-programming of a Flash memory device while it is soldered into the target hardware and while the application code is running. With IAP, it is possible to implement applications that can be re-programmed remotely, without requiring a service technician to be present.
In general, IAP can always be realized with external Flash memory, where the microcontroller and memory are separate components. This is true as long as there is some additional code memory available from which the microcontroller can execute code while the Flash memory is re-programmed.
With on-chip Flash, IAP is only possible if supported by the microcontroller. The Philips 89C51Rx2 parts also support IAP via the boot loader. The application code can call functions in the boot loader area by loading parameters into the registers R0, R1, and DPTR and then calling a specific address in the boot loader. To make these functions easier to use, the Embedded Systems Academy provides a C library supporting all boot loader functions. This allows erasing and programming directly from the C level, simply by calling C functions. For details, check ESAcademy's technical library at www.esacademy.com/faq/progs/.
The following is an example application that uses the C library to perform various IAP operations.

#include "rx2iaplib.h" // IAP library header file

void isr0(void) interrupt 0 { ; }

void main(void)
{
    unsigned int intvector;
    unsigned char foo = 0x55;

    iap_init(16);                             // initialize the IAP library with the
                                              // crystal frequency in MHz, rounded
                                              // down to the nearest integer

    iap_erase_block(BLOCK_0x4000_0x7FFF);     // erase Flash memory from
                                              // 0x4000 to 0x7FFF

    iap_program_data_byte(foo, 0x4000);       // program the value of a
                                              // variable at 0x4000

    intvector = iap_read_data_byte(0x0004);   // read the address of the
    intvector <<= 8;                          // interrupt 0 service routine:
    intvector |= iap_read_data_byte(0x0005);  // high byte first, then low byte

    // set security bits 1 and 2
    iap_program_security_bits(SECURITY_BIT_1 | SECURITY_BIT_2);

    while (1);                                // done; loop forever
}
The C library, combined with the features of the boot loader provided in ROM, makes it easy to re-program the start address of the boot loader currently in use. This allows installing a customized boot loader in place of the one included in ROM. Such an application-specific boot loader could ensure that essential initialization code gets executed, even if the chip starts in boot mode.
void callnewbootloader(void)
{
    iap_erase_boot_vector_status_byte();
    iap_program_boot_vector(0xE8);    // new boot loader at 0xE800
    iap_program_status_byte(0xFF);    // execute boot loader on reset
    reset_device();
}

Thursday, February 7, 2008

Embedded Systems Design

This paper addresses the design of reactive real-time embedded systems. Such systems are often heterogeneous in implementation technologies and design styles, for example by combining hardware ASICs with embedded software. The concurrent design process for such embedded systems involves solving the specification, validation, and synthesis problems. We review the variety of approaches to these problems that have been taken.

I. INTRODUCTION
Reactive real-time embedded systems are pervasive in the electronics system industry. Applications include vehicle control, consumer electronics, communication systems, remote sensing, and household appliances. In such applications, specifications may change continuously, and time-to-market strongly affects success. This calls for the use of software programmable components with behavior that can be fairly easily changed. Such systems, which use a computer to perform a specific function, but are neither used nor perceived as a computer, are generically known as embedded systems. More specifically, we are interested in reactive embedded systems. Reactive systems are those that react continuously to their environment at the speed of the environment. They can be contrasted with interactive systems, which react with the environment at their own speed, and transformational systems, which take a body of input data and transform it into a body of output data.
A large percentage of the worldwide market for microprocessors is filled by microcontrollers that are the programmable core of embedded systems. In addition to microcontrollers, embedded systems may consist of ASICs and/or field-programmable gate arrays as well as other programmable computing units such as Digital Signal Processors (DSPs). Since embedded systems interact continuously with an environment that is analog in nature, there must typically be components that perform A/D and D/A conversions. A significant part of the design problem consists of deciding the software and hardware architecture for the system, as well as deciding which parts should be implemented in software running on the programmable components and which should be implemented in more specialized hardware.
Embedded systems often are used in life critical situations, where reliability and safety are more important criteria than performance. Today, embedded systems are designed with an ad hoc approach that is heavily based on earlier experience with similar products and on manual design. Use of higher level languages such as C helps somewhat, but with increasing complexity, it is not sufficient. Formal verification and automatic synthesis of implementations are the surest ways to guarantee safety. However, both formal verification and synthesis from high levels of abstraction have been demonstrated only for small, specialized languages with restricted semantics. This is at odds with the complexity and heterogeneity found in typical embedded systems.
We believe that the design approach should be based on the use of one or more formal models to describe the behavior of the system at a high level of abstraction, before a decision on its decomposition into hardware and software components is taken. The final implementation of the system should be made as much as possible using automatic synthesis from this high level of abstraction, to ensure implementations that are “correct by construction.” Validation (through simulation or verification) should be done as much as possible at the higher levels of abstraction.
A typical hardware architecture for an embedded system is illustrated in Figure 1. This type of architecture combines custom hardware with embedded software, lending a certain measure of complexity and heterogeneity to the design. Even within the software or hardware portions themselves, however, there is often heterogeneity. In software, control-oriented processes might be mixed under the supervision of a multitasking real-time kernel running on a microcontroller. In addition, hard real-time tasks may run cooperatively on one or more programmable DSPs. The design styles used for these two software subsystems are likely to be quite different from one another, and testing the interaction between them is unlikely to be trivial. The hardware side of the design will frequently contain one or more ASICs, perhaps designed using logic or behavioral synthesis tools. On the other hand, a significant part of the hardware design most likely consists of interconnections of commodity components, such as processors and memories. Again, this time on the hardware side, we find heterogeneity. The design styles used to specify and simulate the ASICs and the interconnected commodity components are likely to be quite different. A typical system, therefore, not only mixes hardware design with software design, but also mixes design styles within each of these categories.
Most often the set of tasks that the system implements is not specified in a rigorous and unambiguous fashion, so the design process requires several iterations to obtain convergence. Moreover, during the design process, the level of abstraction, detail, and specificity in different parts of the design varies. To complicate matters further, the skill sets and design styles used by different engineers on the project are likely to be different. The net result is that during the design process, many different specification and modeling techniques will be used. Managing the design complexity and heterogeneity is the key problem. We believe that the use of formal models and high-level synthesis for ensuring safe and correct designs depends on understanding the interaction between diverse formal models. Only then can the simplicity of modeling required by verification and synthesis be reconciled with the complexity and heterogeneity of real-world design.
The concurrent design process for mixed hardware/software embedded systems involves solving the following subproblems: specification, validation, and synthesis. Although these problems cannot be entirely separated, we deal with them below in three successive sections.
II. SPECIFICATION AND MODELING
The design process is often viewed as a sequence of steps that transforms a set of specifications described informally into a detailed specification that can be used for manufacturing. All the intermediate steps are characterized by a transformation from a more abstract description to a more detailed one. A designer can perform one or more steps in this process. For the designer, the “input” description is a specification; the final description of the design is an implementation. For example, a software designer may see a set of routines written in C as an implementation of her/his design, even though several other steps may be taken before the design is ready for manufacturing. During this process, verification of the quality of the design with respect to the demands placed on its performance and functionality has to be carried out. Unfortunately, the descriptions of the design at its various stages are often informal and not logically connected by a set of precise relationships.
We advocate a design process that is based on representations with precise mathematical meaning, so that both the verification and the map from the initial description to the various intermediate steps can be carried out with tools of guaranteed performance. Such an approach is standard in certain communities, where languages with strong formal properties are used to ensure robust design. Examples include ML [2], dataflow languages (e.g., Lucid [3], Haskell [4]), and synchronous languages (e.g., Lustre, Signal, Esterel [5]). There is a broad range of potential formalizations of a design, but most tools and designers describe the behavior of a design as a relation between a set of inputs and a set of outputs. This relation may be informal, even expressed in natural language. It is easy to find examples where informal specifications resulted in unnecessary redesigns. In our opinion, a formal model of a design should consist of the following components:
1. A functional specification, given as a set of explicit or implicit relations which involve inputs, outputs, and possibly internal (state) information.
2. A set of properties that the design must satisfy, given as a set of relations over inputs, outputs, and states, that can be checked against the functional specification.
3. A set of performance indices that evaluate the quality of the design in terms of cost, reliability, speed, size, etc., given as a set of equations involving, among other things, inputs and outputs.
4. A set of constraints on performance indices, specified as a set of inequalities.
The functional specification fully characterizes the operation of a system, while the performance constraints bound the cost (in a broad sense). The set of properties is redundant, in that in a properly constructed design, the functional specification satisfies these properties. However, the properties are listed separately because they are simpler and more abstract (and also incomplete) compared to the functional specification. A property is an assertion about the behavior, rather than a description of the behavior. It is an abstraction of the behavior along a particular axis. For example, when designing a network protocol, we may require that the design never deadlock (this is also called a liveness property). Note that liveness does not completely specify the behavior of the protocol; it is instead a property we require our protocol to have. For the same protocol, we may require that any request will eventually be satisfied (this is also called fairness). Again this does not completely specify the behavior of the protocol, but it is a required property. Given a formal model of the functional specifications and of the properties, we can classify properties in three groups:
1. Properties that are inherent in the model of computation (i.e., they can be shown formally to hold for all specifications described using that model).
2. Properties that can be verified syntactically for a given specification (i.e., they can be shown to hold with a simple, usually polynomial-time, analysis of the specification).
3. Properties that must be verified semantically for a given specification (i.e., they can be shown to hold by executing, at least implicitly, the specification for all inputs that can occur).
For example, consider the property of determinate behavior, i.e., the fact that the output of a system depends only on its inputs and not on some internal, hidden choice. Any design described by a dataflow network (a formal model to be described later) is determinate, and hence this property need not be checked. If the design is represented by a network of FSMs, determinacy can be assessed by inspection of the state transition function. In some discrete-event models (for example, those embodied in Verilog and VHDL) determinacy is difficult to prove: it must be checked by exhaustive simulation.
The design process takes a model of the design at one level of abstraction and refines it to a lower one. In doing so, the designer must ensure that the properties at that level of abstraction are verified, that the constraints are satisfied, and that the performance indices are satisfactory. The refinement process also involves mapping constraints, performance indices, and properties to the lower level so that they can be computed for the next level down. Figure 2 shows a key refinement stage in embedded system design. The more abstract specification in this case is an executable functional model that is closer to the problem level. The specification undergoes a synthesis process (which may be partly manual) that generates a model of an implementation in hardware. That model itself may still be fairly abstract, capturing for example only timing properties. In this example the model is presumably used for hardware/software partitioning. While Figure 2 suggests a purely top-down process, any real design needs more interaction between specification and implementation. Nonetheless, when a design is complete, the best way to present and document it is top-down. This is enough to require that the methodology support top-down design.
A. Elements of a Model of Computation
A language is a set of symbols, rules for combining them (its syntax), and rules for interpreting combinations of symbols (its semantics). Two approaches to semantics have evolved: denotational and operational. A language can have both (ideally they are consistent with one another, although in practice this can be difficult to achieve). Operational semantics, which dates back to Turing machines, gives the meaning of a language in terms of actions taken by some abstract machine, and is typically closer to the implementation. Denotational semantics, first developed by Scott and Strachey [7], gives the meaning of the language in terms of relations.
How the abstract machine in an operational semantics can behave is a feature of what we call the model of computation underlying the language. The kinds of relations that are possible in a denotational semantics is also a feature of the model of computation. Other features include communication style, how individual behavior is aggregated to make more complex compositions, and how hierarchy abstracts such compositions. A design (at all levels of the abstraction hierarchy from functional specification to final implementation) is generally represented as a set of components, which can be considered as isolated monolithic blocks, interacting with each other and with an environment that is not part of the design. The model of computation defines the behavior and interaction of these blocks.
In the sections that follow, we present a framework for comparing elements of different models of computation, called the tagged-signal model, and use it to contrast different styles of sequential behavior, concurrency, and communication. We will give precise definitions for a number of terms, but these definitions will inevitably conflict with standard usage in some communities. We have discovered that, short of abandoning the use of most common terms, no terminology can be consistent with standard usage in all related communities. Thus we attempt to avoid confusion by being precise, even at the risk of being pedantic.
A.1 The Tagged-Signal Model
Two of the authors (Lee and Sangiovanni-Vincentelli) have proposed the tagged-signal model [8], a formalism for describing aspects of models of computation for embedded system specification. It is denotational in the Scott and Strachey [7] sense, and it defines a semantic framework (of signals and processes) within which models of computation can be studied and compared. It is very abstract; describing a particular model of computation involves imposing further constraints that make it more concrete.
The fundamental entity in the tagged-signal model is an event: a value/tag pair. Tags are often used to denote temporal behavior. A set of events (an abstract aggregation) is a signal. Processes are relations on signals, expressed as sets of n-tuples of signals. A particular model of computation is distinguished by the order it imposes on tags and the character of processes in the model.
Given a set of values V and a set of tags T, an event is a member of T × V, i.e., an event has a tag and a value. A signal s is a set of events, so a signal can be viewed as a subset of T × V. A functional signal is a (possibly partial) function from T to V. The set of all signals is denoted S. A tuple of n signals is denoted s, and the set of all such tuples is denoted S^n.
The different models of time that have been used to model embedded systems can be translated into different order relations on the set of tags T in the tagged-signal model. In particular, in a timed system T is totally ordered, i.e., there is a binary relation < on members of T such that if t1, t2 ∈ T and t1 ≠ t2, then either t1 < t2 or t2 < t1. In an untimed system, T is only partially ordered.
A process P with n signals is a subset of the set of all n-tuples of signals, S^n, for some n. A particular s ∈ S^n is said to satisfy the process if s ∈ P. An s that satisfies a process is called a behavior of the process. Thus a process is a set of possible behaviors, or a relation between signals.
For many (but not all) applications, it is natural to partition the signals associated with a process into inputs and outputs. Intuitively, the process does not determine the values of the inputs, and does determine the values of the outputs. If n = i + o, then (S^i, S^o) is a partition of S^n. A process with i inputs and o outputs is a subset of S^i × S^o. In other words, a process defines a relation between input signals and output signals. An (i + o)-tuple s ∈ S^(i+o) is said to satisfy P if s ∈ P. It can be written s = (s1, s2), where s1 ∈ S^i is an i-tuple of input signals for process P and s2 ∈ S^o is an o-tuple of output signals for process P. If the input signals are given by s1 ∈ S^i, then the set I = {(s1, s2) | s2 ∈ S^o} describes the inputs, and I ∩ P is the set of behaviors consistent with the input s1.
A process F is functional with respect to a partition if it is a single-valued, possibly partial, mapping from S^i to S^o. That is, if (s1, s2) ∈ F and (s1, s3) ∈ F, then s2 = s3. In this case, we can write s2 = F(s1), where F: S^i → S^o is a (possibly partial) function. Given the input signals, the output signals are determined (or there is unambiguously no behavior).
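To make these definitions concrete, here is a small worked instance, my own illustration rather than an example from the paper, written as LaTeX: a unit-delay process with one input and one output over totally ordered tags.

% Illustrative instance (not from the paper): a unit-delay process D
% with i = o = 1, tag set T = N (totally ordered, hence a timed system),
% an arbitrary value set V, and an initial value v_0. All signals here
% are functional, s : N -> V.
\[
  D = \bigl\{\, (s_1, s_2) \in S^1 \times S^1 \;\bigm|\;
      s_2(0) = v_0 \ \text{and}\ s_2(n+1) = s_1(n)\ \forall n \in \mathbb{N} \,\bigr\}
\]
% D is functional with respect to the partition (S^1, S^1): fixing the
% input s_1 determines the output s_2 uniquely, so we may write
% s_2 = D(s_1).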
Consider, as a motivating example introducing these several mechanisms for denoting temporal behavior, the problem of modeling a time-invariant dynamical system on a computer. The underlying mathematical model, a set of differential equations over continuous time, is not directly implementable on a digital computer, due to the double quantization of real numbers into finite bit strings and of time into clock cycles. Hence a first translation is required, by means of an integration rule, from the differential equations to a set of difference equations, which are used to compute the values of each signal with a given tag from the values of other signals with previous and/or current tags.
If it is possible to identify several strongly connected components in the dependency graph, then the system is decoupled. It then becomes possible to go from the total order of tags implicit in physical time to a partial order imposed by the depth-first ordering of the components. This partial ordering gives us some freedom in implementing the integration rule on a computer. We could, for example, play with scheduling by embedding the partial order into the total order among clock cycles. It is often convenient, for example, to evaluate a component completely, for all tags, before evaluating components that depend on it. Or it is possible to spread the computation among multiple processors.
In the end, time comes back into the picture, but the double mapping, from total to partial order and back to total order again, is essential to
1. prove properties about the implementation (e.g., stability of the integration method, a bound on the maximum execution time, . . . ), and
2. optimize the implementation with respect to a given cost function (e.g., size of the buffers required to hold intermediate signals versus execution time, satisfaction of a constraint on the maximum execution time, . . . ).
A.2 State
Most models of computation include components with state, where behavior is given as a sequence of state transitions. In order to formalize this notion, let us consider a process F that is functional with respect to a partition (S^i, S^o). Let us assume for the moment that F belongs to a timed system, in which tags are totally ordered. Then for any tuple of signals s, we can define s|>t to be the tuple of the (possibly empty) subsets of the events in s with tags greater than t. Two input signal tuples r, s ∈ S^i are in relation E_F,t (denoted (r, s) ∈ E_F,t) if r|>t = s|>t implies F(r)|>t = F(s)|>t.
This definition intuitively means that process F cannot distinguish between the “histories” of r and s prior to time t. Thus, if the inputs are identical after time t, then the outputs will also be identical. E_F,t is an equivalence relation, partitioning the set of input signal tuples into equivalence classes for each t. Following a long tradition, we call these equivalence classes the states of F.
In the hardware community, components with only one state for each t are called combinational, while components with more than one state for some t are called sequential. Note however that the term “sequential” is used in very different ways in other communities.

A.3 Decidability
Components with a finite number of states differ significantly from those with an infinite number of states. For certain infinite-state models (those that are Turing-complete), many desirable properties are undecidable: they cannot be determined in a finite amount of time for all systems. These properties include whether a system will need more memory than is available, whether a system will halt, and how fast a system will run.
Hopcroft and Ullman [9] discuss these issues at length. Undecidability is not an insurmountable barrier, and decidability is not sufficient to answer all questions in practice (e.g., because the required run-time may be prohibitive). Many successful systems have been designed using undecidable languages (i.e., those in which questions about some programs are undecidable). Although no algorithm can solve an undecidable problem for all systems, algorithms exist that can solve it for most systems. Buck’s Boolean Dataflow scheduler [10], for example, can answer the halting and bounded-memory questions for many systems specified in a Turing-complete dataflow model, although it does, necessarily, fail to reach a conclusion for some systems.
The non-terminating nature of embedded systems opens the possibility of using infinite time to solve certain undecidable problems. Parks’ [11] scheduler, for example, will execute a potentially infinite-state system forever in bounded memory if it is possible to do so. However, it does not answer the question of how much memory is needed or whether the program will eventually halt.
The classical von Neumann model of computation is a familiar model of sequential behavior. A memory stores the state and a processor advances the state through a sequence of memory operations. Most commonly used programming languages (e.g., C, C++, Lisp, Pascal, FORTRAN) use this model of computation. Often, the memory is viewed as having an unbounded number of finite-valued words, which, when coupled with an appropriate choice of processor instructions, makes the model Turing-complete. Modern computer systems make this model practical by simulating unbounded memory with an elaborate hierarchy (registers, cache, RAM, hard disk). Few embedded systems, however, can currently afford such a scheme.

A.4 Concurrency and Communication
While sequential or combinational behavior is related to individual processes, embedded systems will typically contain several coordinated concurrent processes. At the very least, such systems interact with an environment that evolves independently, at its own speed. But it is also common to partition the overall model into tasks that also evolve more or less independently, occasionally (or frequently) interacting with one another.
Communication between processes can be explicit or implicit. In explicit communication, a sender process informs one or more receiver processes about some part of its state. In implicit communication, two or more processes share a common notion of state.
Time plays a larger role in embedded systems than in classical computation. In classical transformational systems, the correct result is the primary concern—when it arrives is less important (although whether it arrives, the termination question, is important).
By contrast, embedded systems are usually real-time systems, where the time at which a computation takes place can be more important than the computation itself. As we discussed above, different models of time become different order relations on the set of tags T in the tagged-signal model. Recall that in a timed system T is totally ordered, while in an untimed system T is only partially ordered. Implicit communication generally requires totally ordered tags, usually identified with physical time.
The tags in a metric-time system have a notion of “distance” between them, much like physical time. Formally, there exists a partial function d: T × T → ℝ mapping pairs of tags to real numbers such that d(t1, t2) = 0 if and only if t1 = t2, d(t1, t2) = d(t2, t1), and d(t1, t2) + d(t2, t3) ≥ d(t1, t3).
A discrete-event system is a timed system where the tags in each signal are order-isomorphic with the integers (for a two-sided system) or the natural numbers (for a one-sided system) [8]. Intuitively, this means that any pair of ordered tags has a finite number of intervening tags.
Two events are synchronous if they have the same tag. Two signals are synchronous if each event in one signal is synchronous with an event in the other signal and vice versa. A system is synchronous if every signal in the system is synchronous with every other signal in the system. A discrete-time system is a synchronous discrete-event system. Synchronous/reactive languages (see e.g. [5]) are synchronous in exactly this sense. The set of tags in a behavior of the system denotes a global “clock” for the system. Every signal conceptually has an event at every tag, although in some models this event could have a value denoting the absence of an event (called bottom). At each clock tick, each process maps input values to output values. If cyclic communication is allowed, then some mechanism must be provided to resolve or prevent circular dependencies. One possibility is to constrain the output values to have tags corresponding to the next tick. Another possibility (all too common) is to leave the result unspecified, resulting in nondeterminacy (or worse, infinite computation within one tick). A third possibility is to use fixed-point semantics, where the behavior of the system is defined as a set of events that satisfy all processes.
Concurrency in physical implementations of systems occurs through some combination of parallelism, having physically distinct computational resources, and interleaving, sharing of a common physical resource. Mechanisms for achieving interleaving vary widely, ranging from operating systems that manage context switches to fully static interleaving in which concurrent processes are converted (compiled) into a single nonconcurrent process. We focus here on the mechanisms used to manage communication between concurrent processes.
Parallel physical systems naturally share a common notion of time, according to the laws of physics. The time at which an event in one subsystem occurs has a natural ordering relationship with the time at which an event occurs in another subsystem. Physically interleaved systems also share a natural common notion of time. Logical systems, on the other hand, need a mechanism to explicitly share a notion of time. Consider two imperative programs interleaved on a single processor under the control of a time-sharing operating system. Interleaving creates a natural ordering between events in the two processes, but this ordering is generally unreliable, because it depends heavily on scheduling policy, system load, and so on. Some synchronization mechanism is required if those two programs need to cooperate. More generally, in logically concurrent systems, maintaining a coherent global notion of time as a total order on events can be extremely expensive. Hence in practice this is replaced whenever possible with explicit synchronization, in which the total order is replaced by a partial order. Returning to the example of two processes running under a time-sharing operating system, we take precautions to ensure an ordering of two events only if the ordering of those two events matters.
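As a concrete illustration of such a synchronization mechanism (the paper prescribes none; POSIX threads are my choice here), a mutex enforces an ordering only around the shared accesses whose order matters, leaving all other interleavings free:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long shared_counter = 0;

static void *worker(void *arg)
{
    int i;
    (void) arg;
    for (i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);     /* order only these accesses */
        shared_counter++;              /* read-modify-write is now atomic */
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, NULL);
    pthread_create(&t2, NULL, worker, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("%ld\n", shared_counter);   /* always 200000 with the mutex */
    return 0;
}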
A variety of mechanisms for managing the order of events, and hence for communicating information between processes, has arisen. Some of the most common ones are:
- Unsynchronized: In an unsynchronized communication, a producer of information and a consumer of the information are not coordinated. There is no guarantee that the consumer reads valid information produced by the producer, and there is no guarantee that the producer will not overwrite previously produced data before the consumer reads the data. In the tagged-signal model, the repository for the data is modeled as a process, and the reading and writing events have no enforced ordering relationship between their tags.
- Read-modify-write: Commonly used for accessing shared data structures, this strategy locks a data structure between a read and a write from a process, preventing any other accesses. In other words, the actions of reading, modifying, and writing are atomic (indivisible). In the tagged-signal model, the repository for the data is modeled as a process where events associated with this process are totally ordered (resulting in a globally partially ordered model). The read-modify-write is modeled as a single event.
- Unbounded FIFO buffered: This is a point-to-point communication strategy, where a producer generates a sequence of data tokens and a consumer consumes these tokens, but only after they have been generated. In the tagged-signal model, this is a simple connection where the signal on the connection is constrained to have totally ordered tags. The tags model the ordering imposed by the FIFO. If the consumer implements blocking reads, then it imposes a total order on events at all its input signals. This model captures essential properties of both Kahn process networks and dataflow [13].
- Bounded FIFO buffered: In this case, the data repository is modeled as a process that imposes ordering constraints on its inputs (which come from the producer) and the outputs (which go to the consumer). Each of the input and output signals is internally totally ordered. The simplest case is where the size of the buffer is one, in which case the input and output events must be interleaved so that each output event lies between two input events. Larger buffers impose a maximum difference (often called synchronic distance) between the number of input and output events. Note that some implementations of this communication mechanism may not actually block the writing process when the buffer is full, thus requiring some higher level of flow control to ensure that this never happens, or that it does not cause any harm. (A ring-buffer sketch of this mechanism appears after this list.)
- Rendezvous: In the simplest form of rendezvous, implemented for example in Occam and Lotos, a single writing process and a single reading process must simultaneously be at the point in their control flow where the write and the read occur. It is a convenient communication mechanism, because it has the semantics of a single assignment, in which the writer provides the right-hand side and the reader provides the left-hand side. In the tagged-signal model, this is imposed by events with identical tags [8]. Lotos offers, in addition, multiple rendezvous, in which one among multiple possible communications is non-deterministically selected. Multiple rendezvous is more flexible than single rendezvous, because it allows the designer to specify more easily several “expected” communication ports at any given time, but it is very difficult and expensive to implement correctly.
Of course, various combinations of the above models are possible. For example, in a partially unsynchronized model, a consumer of data may be required to wait until the first time a producer produces data, after which the communication is unsynchronized.
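The following is the ring-buffer sketch promised above (my own illustration, not code from the paper): a single-producer, single-consumer bounded FIFO whose non-blocking return codes reflect the caveat that a writer may not actually block on a full buffer, leaving flow control to a higher level.

#include <stdio.h>

#define FIFO_SIZE 8   /* synchronic distance: at most 8 writes ahead of reads */

static int buf[FIFO_SIZE];
static unsigned head = 0, tail = 0, count = 0;

int fifo_put(int token)               /* returns 0 if the buffer is full  */
{
    if (count == FIFO_SIZE)
        return 0;                     /* caller must apply flow control   */
    buf[tail] = token;
    tail = (tail + 1) % FIFO_SIZE;
    count++;
    return 1;
}

int fifo_get(int *token)              /* returns 0 if the buffer is empty */
{
    if (count == 0)
        return 0;                     /* consumer only reads after writes */
    *token = buf[head];
    head = (head + 1) % FIFO_SIZE;
    count--;
    return 1;
}

int main(void)                        /* tiny usage demonstration */
{
    int v;
    fifo_put(1);
    fifo_put(2);
    while (fifo_get(&v))
        printf("%d\n", v);            /* tokens emerge in FIFO order */
    return 0;
}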
The essential features of the concurrency and communication styles described above are presented in Table I. These are distinguished by the number of transmitters and receivers (e.g., broadcast versus point-to-point communication), the size of the communication buffer, whether the transmitting or receiving process may continue after an unsuccessful communication attempt (blocking reads and writes), and whether the result of each write can be read at most once (single reads).

B. Common Models of Computation
We are now ready to use the scheme developed in the previous section to classify and analyze several models of computation that have been used to describe embedded systems. We will consider issues such as ease of modeling, efficiency of analysis (simulation or formal verification), automated synthesizability, optimization space versus over-specification, and so on.
B.1 Discrete-Event
Time is an integral part of a discrete-event model of computation. Events usually carry a totally ordered time stamp indicating the time at which the event occurs. A DE simulator usually maintains a global event queue that sorts events by time stamp. Digital hardware is often simulated using a discrete-event approach. The Verilog language, for example, was designed as an input language for a discrete-event simulator. The VHDL language also has an underlying discrete-event model of computation. Discrete-event modeling can be expensive, since sorting time stamps can be time-consuming. Moreover, ironically, although discrete-event is ideally suited to modeling distributed systems, it is very challenging to build a distributed discrete-event simulator.
The global ordering of events requires tight coordination between parts of the simulation, rendering distributed execution difficult. Discrete-event simulation is most efficient for large systems with large, frequently idle or autonomously operating sections. Under discrete-event simulation, only the changes in the system need to be processed, rather than the whole system. As the activity of a system increases, the discrete-event paradigm becomes less efficient because of the overhead inherent in processing time stamps.
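To illustrate the mechanism described above (a sketch of my own, not code from any particular simulator), a DE simulator's global event queue can be kept sorted by time stamp, with ties preserved in insertion order:

#include <stdio.h>
#include <stdlib.h>

typedef struct event {
    double        time;      /* time stamp of the event         */
    int           target;    /* id of the process to be invoked */
    struct event *next;
} event_t;

static event_t *queue = NULL;   /* singly linked, sorted by time */

void schedule(double time, int target)
{
    event_t  *e = malloc(sizeof *e);
    event_t **p = &queue;

    e->time = time;
    e->target = target;
    while (*p && (*p)->time <= e->time)   /* walk past earlier and    */
        p = &(*p)->next;                  /* equal stamps (FIFO ties) */
    e->next = *p;
    *p = e;
}

event_t *dequeue(void)                    /* earliest event, or NULL  */
{
    event_t *e = queue;
    if (e)
        queue = e->next;
    return e;
}

int main(void)                            /* tiny usage demonstration */
{
    event_t *e;
    schedule(2.0, 1);
    schedule(1.0, 2);
    schedule(1.0, 3);
    while ((e = dequeue()) != NULL) {     /* processed in time order  */
        printf("t=%.1f: invoke process %d\n", e->time, e->target);
        free(e);
    }
    return 0;
}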
Simultaneous events, especially those arising from zero-delay feedback loops, present a challenge for discrete-event models of computation. In such a situation, events may need to be ordered, but are not.
Consider the discrete-event system shown in Figure 3. Process B has zero delay, meaning that its output has the same time stamp as its input. If process A produces events with the same time stamp on each output, there is ambiguity about whether B or C should be invoked first, as shown in Figure 3(a). Suppose B is invoked first, as shown in Figure 3(b). Now, depending on the simulator, C might be invoked once, observing both input events in one invocation, or it might be invoked twice, processing the events one at a time. In the latter case, there is no clear way to determine which event should be processed first. The addition of delta delay makes such nondeterminacy easier to prevent, but does not avoid it completely. It introduces a two-level model of time in which each instant of time is broken into (a potentially infinite number of) totally ordered delta steps. The simulated time reported to the user, however, does not include delta information. A “zero-delay” process in this model actually has delta delay. For example, Process B would have delta delay, so firing A followed by B would result in the situation in Figure 3(c). The next firing of C will see the event from A only; the firing after that will see the (delta-delayed) event from B.
Other simulators, including the DE simulator in Ptolemy [14], attempt to statically analyze data precedences within a single time instant. Such precedence analysis is similar to that done in synchronous languages (Esterel, Lustre, and Signal) to ensure that simultaneous events are processed deterministically. It determines a partial ordering of events with the same time stamp by examining data precedences. Adding a feedback loop from process C to A in Figure 3 would create a problem if events circulate through the loop without any increment in time stamp. The same problem occurs in synchronous languages, where such loops are called causality loops. No precedence analysis can resolve the ambiguity. In synchronous languages, the compiler may simply fail to compile such a program. Some discrete-event simulators will execute the program nondeterministically, while others support tighter control over the sequencing through graph annotations.

B.2 Communicating Finite State Machines

Finite State Machines (FSMs) are an attractive model for embedded systems. The amount of memory required by such a model is always decidable, and is often an explicit part of its specification. Halting and performance questions are always decidable since each state can, in theory, be examined in finite time. In practice, however, this may be prohibitively expensive. A traditional FSM consists of:
- a set of input symbols (the Cartesian product of the sets of values of the input signals),
- a set of output symbols (the Cartesian product of the sets of values of the output signals),
- a finite set of states with a distinguished initial state,
- an output function mapping inputs and states to outputs, and
- a next-state function mapping inputs and states to (next) states.
The input to such a machine is a sequence of input symbols, and the output is a sequence of output symbols.
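To make the quintuple concrete, the following sketch (my own illustration; the example machine is arbitrary, not from the paper) encodes the output and next-state functions as lookup tables indexed by state and input symbol. This machine emits 1 exactly when the current input is a 1 immediately preceded by another 1:

#include <stdio.h>

enum { S_INIT, S_SAW1, NUM_STATES };     /* S_INIT is the initial state */

static const int next_state[NUM_STATES][2] = {
    /* input:  0       1     */
    {       S_INIT, S_SAW1 },            /* from S_INIT */
    {       S_INIT, S_SAW1 },            /* from S_SAW1 */
};

static const int output[NUM_STATES][2] = {
    /* input:  0  1 */
    {          0, 0 },                   /* in S_INIT: never emit 1     */
    {          0, 1 },                   /* in S_SAW1: emit 1 on a 1    */
};

int main(void)
{
    const int input[] = { 1, 1, 0, 1, 1, 1 };   /* input symbol sequence */
    int state = S_INIT;
    size_t i;

    for (i = 0; i < sizeof input / sizeof input[0]; i++) {
        printf("%d", output[state][input[i]]);  /* output function       */
        state = next_state[state][input[i]];    /* next-state function   */
    }
    putchar('\n');                              /* prints 010011         */
    return 0;
}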
Traditional FSMs are good for modeling sequential behavior, but are impractical for modeling concurrency or memory because of the so-called state explosion problem. A single machine mimicking the concurrent execution of a group of machines has a number of states equal to the product of the number of states of each machine. A memory has as many states as the number of values that can be stored at each location raised to the power of the number of locations. The number of states alone is not always a good indication of complexity, but it often has a strong correlation.
Harel advocated the use of three major mechanisms that reduce the size (and hence the visual complexity) of finite automata for modeling practical systems [15]. The first one is hierarchy, in which a state can represent an enclosed state machine. That is, being in a particular state a has the interpretation that the state machine enclosed by a is active. Equivalently, being in state a means that the machine is in one of the states enclosed by a. Under the latter interpretation, the states of a are called “or states.” Or states can exponentially reduce the complexity (the number of states) required to represent a system. They compactly describe the notion of preemption (a high-priority event suspending or “killing” a lower-priority task), which is fundamental in embedded control applications.
The second mechanism is concurrency. Two or more state machines are viewed as being simultaneously active. Since the system is in one state of each parallel state machine simultaneously, these are sometimes called “and states.” They also provide a potential exponential reduction in the size of the system representation.
The third mechanism is non-determinism. While non-determinism is often simply the result of an imprecise (maybe erroneous) specification, it can be an extremely powerful mechanism to reduce the complexity of a system model by abstraction. This abstraction can be due either to the fact that the exact functionality must still be defined, or to the fact that it is irrelevant to the properties currently considered of interest. For example, during verification of a given system component, other components can be modeled as non-deterministic entities to compactly constrain the overall behavior. A system component can also be described non-deterministically to permit some optimization during the implementation phase. Non-determinism can also provide an exponential reduction in complexity. These three mechanisms have been shown in [16] to cooperate synergistically and orthogonally, to provide a potential triple exponential reduction in the size of the representation with respect to a single, flat deterministic FSM.
Harel’s Statecharts model uses a synchronous concurrency model (also called synchronous composition). The set of tags is a totally ordered countable set that denotes a global “clock” for the system. The events on signals are either produced by state transitions or inputs. Events at a tick of the clock can trigger state transitions in other parallel state machines at the same clock. Unfortunately, Harel left open some questions about the semantics of causality loops and chains of instantaneous (same-tick) events, triggering a flurry of activity in the community that has resulted in at least twenty variants of Statecharts [17]. Most of these twenty variants use the synchronous concurrency model. However, for many applications, the tight coordination implied by the synchronous model is inappropriate. In response to this, a number of more loosely coupled asynchronous FSM models have evolved, including behavioral FSMs [18], SDL process networks [18], and codesign FSMs [19].
A model that is closely related to FSMs is finite automata. FAs emphasize the acceptance or rejection of a sequence of inputs rather than the sequence of output symbols produced in response to a sequence of input symbols. Most notions, such as composition and so on, can be naturally extended from one model to the other. In fact, any of the concurrency models described in this paper can be usefully combined with FSMs. In the Ptolemy project [14], FSMs are hierarchically nested with dataflow, discrete-event, or synchronous/reactive models [20]. The nesting is arbitrarily deep and can mix concurrency models at different levels of the hierarchy. This very flexible model is called “*charts,” pronounced “star charts,” where the asterisk is meant to suggest a wildcard. Control Flow Expressions (CFEs, [21]) have recently been proposed to represent the control flow of a set of operations in a cycle-based specification language. CFEs are an algebraic model extending regular expressions [9] and can be compiled into FSMs that can be used in the synthesis of a control unit.
B.3 Synchronous/Reactive
In a synchronous model of computation, all events are synchronous, i.e., all signals have events with identical tags. The tags are totally ordered and globally available. Simultaneous events (those in the same clock tick) may be totally ordered, partially ordered, or unordered, depending on the model of computation. Unlike the discrete-event model, all signals have events at all clock ticks, simplifying the simulator by requiring no sorting. Simulators that exploit this simplification are called cycle-based or cycle-driven simulators. Processing all events at a given clock tick constitutes a cycle. Within a cycle, the order in which events are processed may be determined by data precedences, which define microsteps. These precedences are not allowed to be cyclic, and typically impose a partial order (leaving some arbitrary ordering decisions to the scheduler). Cycle-based models are excellent for clocked synchronous circuits, and have also been applied successfully at the system level in certain signal processing applications.
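As a minimal illustration of the cycle-driven style (my own sketch, not code from the paper), the loop below evaluates every signal exactly once per tick, in an order fixed in advance by the data precedences, so no run-time event sorting is needed:

#include <stdio.h>

int main(void)
{
    int reg = 0;    /* state held by a sequential "module"              */
    int tick;

    for (tick = 0; tick < 4; tick++) {
        /* one cycle: compute each signal once, in precedence order
         * (inv depends on reg, so it is evaluated as a microstep here) */
        int inv = !reg;                       /* combinational module   */
        printf("tick %d: reg=%d inv=%d\n", tick, reg, inv);
        reg = inv;                            /* latch for the next tick */
    }
    return 0;
}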
A cycle-based model is inefficient for modeling systems where events do not occur at the same rate in all signals. While conceptually such systems can be modeled (using, for example, special tokens to indicate the absence of an event), the cost of processing such tokens is considerable. Fortunately, the cycle-based model is easily generalized to multirate systems. In this case, every nth event in one signal aligns with the events in another. A multirate cycle-based model is still somewhat limited. It is an excellent model for synchronous signal processing systems where sample rates are related by constant rational multiples, but in situations where the alignment of events in different signals is irregular, it can be inefficient.
The more general synchronous/reactive model is embodied in the so-called synchronous languages [22]. Esterel [23] is a textual imperative language with sequential and concurrent statements that describe hierarchically-arranged processes. Lustre [24] is a textual declarative language with a dataflow flavor and a mechanism for multirate clocking. Signal [25] is a textual relational language, also with a dataflow flavor and a more powerful clocking system. Argos [26], a derivative of Harel’s Statecharts [27], is a graphical language for describing hierarchical finite state machines. Halbwachs [5] gives a good summary of this group of languages.
The synchronous/reactive languages describe systems as a set of concurrently-executing synchronized modules. These modules communicate through signals that are either present or absent in each clock tick. The presence of a signal is called an event, and often carries a value, such as an integer. The modules are reactive in the sense that they only perform computation and produce output events in instants with at least one input event. Every signal in these languages is conceptually (or explicitly) accompanied by a clock signal, which has meaning relative to other clock signals and defines the global ordering of events. Thus, when comparing two signals, the associated clock signals indicate which events are simultaneous and which precede or follow others. In the case of Signal and Lustre, clocks have complex interrelationships, and a clock calculus allows a compiler to reason about these ordering relationships and to detect inconsistencies in the definition. Esterel and Argos have simpler clocking schemes and focus instead on finite-state control. Most of these languages are static in the sense that they cannot request additional storage nor create additional processes while running. This makes them well-suited for bounded and speed-critical embedded applications, since their behavior can be extensively analyzed at compile time.
This static property makes a synchronous program finite-state, greatly facilitating formal verification. Verifying that a synchronous program is causal (noncontradictory and deterministic) is a fundamental challenge with these languages. Since computation in these languages is delay-free and arbitrary interconnection of processes is possible, it is possible to specify a program that has either no interpretation (a contradiction where there is no consistent value for some signal) or multiple interpretations (some signal has more than one consistent value). Both situations are undesirable, and usually indicate a design error. A conservative approach that checks for causality problems structurally flags an unacceptably large number of programs as incorrect, because most such problems manifest themselves only in unreachable program states. The alternative, checking for a causality problem in any reachable state, can be expensive, since it requires an exhaustive check of the state space of the program. In addition to the ability to translate these languages into finite-state descriptions, it is possible to compile them directly into hardware. Techniques for translating both Esterel [28] and Lustre [29] into hardware have been proposed. The result is a logic network consisting of gates and flip-flops that can be optimized using traditional logic synthesis tools.
To execute such a system in software, the resulting network is simply simulated. The technique is also the basis for performing causality checks more efficiently, by means of implicit state-space traversal techniques [30].