Glossary

Some Useful Terminology

Author:

Alan G. Isaac

Organization:

Department of Economics, American University

Contact:

aisaac@american.edu

Date:

2023-07-14

Activity

A simple activity is a sequential flow of actions conceived as a single unit. (Abstractly, an activity may be to execute action \(x\), then execute action \(y\), and finally execute action \(z\).) When an activity is naturally conceptualized as belonging to an agent, this course calls it a behavior.

In a programming language, an activity might be implemented as a code block, a subroutine, or a procedure. An activity diagram is a graphical representation of an activity.

Activity Diagram

This course often illustrates action flows in a simulation by means of simplified activity diagrams. These are roughly based on the UML activity-diagram notation, which in turn closely resembles traditional flow-chart notation. In these dagrams, a rectangle with rounded corners represents an action, an arrow from an action points to the next action, and a diamond shape represents the beginning or the end of a conditional branch. (An initial diamond is a decision node; a terminal diamond is a merge node.) See the UML appendix for more details.

Algorithm

An algorithm is a procedure that reliably performs a well-specified task in a finite number of steps. An algorithm that produces and output can be compared to a function. Whereas a function specifies the relationship between inputs and output, and algorithm specifies the steps involved in transforming inputs into outputs. Many different algorithms may be able to implement a single function. (For example, the factorial function may be implemented by means of recursion or iteration.)

Assertions and Exceptions

A runtime assertion requires that a program pass a test to continue running. A runtime exception signals that a program has encountered a situation that needs special handling. If special handling is not present or fails, the program will stop running.

The test for a runtime assertion is a boolean expression; it always returns either true or false. An assertion states that the programmer believes that the expression (where it is placed) will always evaluate to true. If the test fails, this indicates that the programmer’s logic has failed, and that the code needs debugging. A common practice is to use assertions during code development but to disable them in released code.

Associative Array

An associative array is a collection of key-value pairs. Each key occurs once. The values are associated with the keys. Effectively, a key serves as an index for its value. Since this is similar to how actual dictionaries use words to index definitions, an associative array is often called a dictionary.

As an abstract data type, an associative array must support certain operations. Most fundamental is the lookup operation: accessing a value by means of a key. Associative arrays are typically mutable: key-value pairs can be added or removed, and the value associated with a key can be changed (reassigned).

For example, an associative array might use artist names as keys and associate to each name a list of known works. A new artist may be added to the array, perhaps with an initially empty list of works. An artist’s creation of a new work requires that the same name be associated with a longer list of works.

The term hash table is often used synonymously with the term associative array. However, the term hash table more correctly denotes a particular implementation of the associative array abstract data type, which is designed for a particularly speedy lookup operation.

Cobweb Plot

We will use the term ‘cobweb plot’ for a line plot that traces a simple Manhattan path between adjacent points: a single vertical segment, and then a single horizontal segment. This is sometimes called a “Verhulst diagram”, (For more details, see the Wikipedia article on the Cobweb Plot.)

Code Block

A code block is a unit of execution. Roughly, an entire code block can abstractly be considered a single statement. An entire program may be considered to be a large code block that contains smaller code blocks. The prime example is the body of a function. Other examples may include the body of a looping construct and the body of an if statement. In many languages, variables introduced in an inner code block cannot be seen in outer code blocks. Such languages give variables block scope.

Command Shell

When using a computer, users typically use peripherals (such as a keyboard, mouse, or touchscreen) to provide instructions to a software application. The application communicates with the operating system, which communicates with the computer and its peripherals. Sometimes the user wants fairly direct access to the operating system’s services, in order to perform a low-level task like renaming or deleting a file. A shell is a software application the provides such low-level services. A command shell is a software application the provides this low-level access by means of a command-line interface (CLI). Use of a CLI requires learning a collection of commands that can be entered, typically with a keyboard, into the command line of the shell. Examples include OSX Terminal, GNU bash, and Windows Powershell.

Conditional Branching

A condition is an expression that always evaluates to true or false. (Such expressions are binary or boolean.) A branch is a code block that may be executed or skipped. The term conditional branching indicates that the execution of a branch depends on the value of a condition. Programming languages typically provide branching constructs that allow us to implement conditional branching.

Programmers often want to determine whether to execute a block of code based on a condition. For example, whether an absolute-value function changes the value of its input depends on whether that value is negative. As another example, we may wish for the iteration phase of a simulation to continue only if a stopping criterion has not been met. In such cases, we may use some kind of conditional branching.

Conditional Expression

We evaluate a computational expression it order to produce a useful value. A conditional expression has a value that depends on a condition. This value is produced by evaluating one expression if the condition is true and another if the condition is false. Crucially, languages that support conditional expressions do not evaluate the bypassed expression. Outside of the context of any particular language, this course uses the tradition question-mark/colon syntax to represent a conditional expression. For example, \((x \neq 0) ? \sin[x] / x : 1\) evaluates to \(\sin[x] / x\) if \((x \neq 0)\) but otherwise evaluates to \(1\). This introduces a ternary operator: it consumes three inputs. This first input is a boolean value (produced by a condition), and the other two are the values produced if the condition is satisfied or not. Of these last two inputs, most programming languages will evaluate only the one needed to evaluate the conditional expression.

if statement

The most familiar conditional construct is probably the if statement, which renders the execution of a code block dependent on the value of a boolean condition. The dependent code block will be executed if the condition evaluates to true, but otherwise it will not be executed. Often we also supply an alternative code block which is executed if the condition evaluates to false. This alternative is often called the else clause.

More complex variants are possible. See https://en.wikipedia.org/wiki/Conditional_%28computer_programming%29

Command–Query Separation

Command–query separation (CQS) is the principle that a subroutine that returns a value should not produce a change in state. This principle is sometimes called command–query responsibility separation (CQRS). A query returns a value and does not have any side effects. A pure function is thus a query where all inputs are explict. A command causes a change in state; correspondingly, it should not return a value.

CQS is most commonly implemented by convention, with varying degrees of language support. Even languages that largely honor CQS, typically make a few exceptions. For example, a mutable list make have a pop operation that returns the last item of the list and removes it from the list.

Comment

A well-written program includes comments, which are human-readable bits text that explain what the surrounding code does and how. A comment is not part of the source code, even though it is in the same file.

Programming languages adopt different syntaxes to indicate text to ignore (i.e., comments). Most languages allow for line comments, which most often start with a language-specific comment delimiter and continue to the end of a line. Some languages also provide for block comments, which may span several lines.

A good program comment:

  • Does not simply restate the code.

  • Should dispel possible misunderstanding of the code.

  • May link to a source of copied code or an explanation of an algorithm.

  • May indicate a need for additional work or refactoring.

Data Type

A data type provides a way for a user to make use of data by means of well-defined operations on the data. An abstract data type (ADT) is an abstract (e.g., mathematical) description of these behaviors. An abstract data type is pure if it provides no operations to mutate its data. Given a pure data type, a user can retrieve information about the data but cannot change the data. An abstract data type defines

  • a collection of data

  • a set of operations on this collection

  • a set of rules governing the effects of the operations

While an abstract data type provides an abstract (e.g., mathematical) characterization of the behaviors of a data structure, a data structure defines a concrete implementation an ADT. It must include details about data storage and data operations that are hidden behind the abstract data type. For example, whereas an ADT might specify the concept of an indexed collection of items and the behavior of retrieving an item by index, the data structure would specify how the items are laid out in memory and how the retrieval behavior is correspondingly to be implemented. Roughly, a data structure specifies how an ADT can be implemented as a concrete data type. Or we might say that an ADT is an abstraction of a data structure that focuses on the behaviors seen through a simple interface.

Different data structures may implement the same ADT. A data structure for a pure ADT should be stateless.

A data type is a concrete implementation of a data structure in a programming language. Unfortunately, the naming of data types varies widely across languages, as does the variety of data types provided by the language. In addition to simple data types (e.g., number types), most languages provide some structured data types (which hold collections of other data types). Most languages supply at least one structured data type where elements can be retrieved by supplying an integer index.

In computer programming, a data type determines how a value is handled during program execution. Most programming languages support a simple type typically used to represent truth values values, a simple type typically used to represent integers, and a simple type typically used to represent real numbers. For convenience, we will refer to these data types as Boolean, Integer, and Real. (For more detail, see the Wikipedia article on data types.

Instances of data type are identified only by their value. If two data types have the same value, the instances are considered identical. A data type can contain attributes, but only to support the modeling of structured data types. Instances of structured data types are considered the same if the following conditions are true:

The structure of the data types are identical The values of the corresponding attributes are identical

As the following figure illustrates, a data type artifact is displayed as a rectangle that contains the name of the data type. The rectangle also contains the stereotype «data type» and the data type icon.

A Boolean data type takes on one of two values, often called true or false. An Integer data type represents only integer values. A Real data type can additionally represent fractional values. Some languages (like Javascript or NetLogo) do not have a separate integer type.

A simple data type and a structured data type each have an associated data structure and thereby an associated ADT. The ADT for integers may describe an infinite set of integers and the operations they support (e.g., addition or multiplication). The data structure for an integer type will have to deal pragmatically with the constraints of implementation. Most data structures for integers support a (large but) finite number of integer values. However it is relatively uncommon to refer to the ADT or the data structure of a simple data type, such as an Integer or a Boolean.

Languages have very different rules for handling data types. In some languages, each identifier (i.e., variable name) has an asscociated type. The assignment of different type of value to that variable is forbidden. Such languages are statically typed. In contrast, languages like NetLogo and Python are dynamically typed: we can assign any type of value that we wish to any variable at any time. In dynamically typed languages, constancy of the type of value assigned to a variable is not required by the language.

Debugging

It is difficult to write bug-free code of any complexity. Debugging is therefore part of any programmer’s life. Unfortunately, there is no set process for writing code without bugs or for finding bugs when they occur. However, we can suggest a few useful practices.

Scaffolding

See the entry on scaffolding.

Logging

Logging is similar to print-based scaffolding: results and error checks can be “logged” by printing them to a log file. Unlike scaffolding, loggers often remain even in the completed code.

Testing

It is such a good idea to test your scripts as you develop them that some approaches to development (such as test-driven development) require that implementations and tests be developed together. As you remove from your code scaffolding that provides useful tests, consider turning them into tests rather than discarding them. Tests often reside in a separate module.

Bisection

No matter how many assertions, loggers, tests, and contract enforcements you include in your code, occasionally there is no substitute for a brute-force bug search. At this point is is important to try to isolate the bug in a systematic way. If you know there is a bug in your script but you are not sure where, try to divide the script roughly in half and place a bug test at this midpoint. If you do this correctly it will determine which half of the script contains the bug, and you can then repeat this “bisection” process. This is a very effective way to find bugs quickly even in complex scripts.

Assertions

If you believe that as a matter of program logic a certain condition should never arise in your script, you may wish to assert this by raising a runtime error if it occurs. This is a simple and effective way to check for a bug in your program logic. Assertions may be present throughout your code, especially when a language provides a good mechanism for turning them off. (See the entry on Assertions and Exceptions.)

Runtime Errors as Contract Enforcement

It is quite proper to insist that the arguments provided to a command procedure or a function match those permitted in the documentation. This is not a matter of program logic but rather of proper use. Programmers should be reluctant to raise runtime errors, which are an annoyance and an inconvenience to users of our code, but sometimes there is no good alternative. That is, we need to inform users that our code does not accommodate the incorrect uses they make of it. (These users include ourselves.) Be sure to provide a helpful error message in this case.

Decomposition

Composition is the combination of simpler parts into a complex whole. Decomposition is the breaking up of a complex whole into simpler component parts. (Decomposition is sometimes called factoring.)

A program is a complex whole. Many programs are exceedingly complex. As an aid to understanding and to ease maintenance, programmers decompose complex programs into simpler to understand parts.

For example, we may restate a complex algorithm as a sequence of easily understood subroutines. We may decompose a complex subroutine into a sequence of calls to much simpler subroutines. When an existing subroutine proves too unweildy or hard to understand, we look for new ways to decompose it. This one type of refactoring.

Good programmers anticipate complexity at the design stage of a program, which faciliates the design a complex program as the interaction between useful, simple components. A good program design will generally be based on reusable components that have limited and transparent dependencies on each other. (They are loosely coupled.) Appropriate decomposition is essential to good programming.

Decoupling

Two pieces of code are “coupled” to the extent that one piece must know something about the other. The deeper this required knowledge, the tighter the coupling.

Flexible code is only loosely coupled. Coupling is loose if the amount of knowledge is so little that we can make substantial changes in the first piece of code without changing the second piece. We achieve loose coupling by enforcing ignorance about the inner workings of a piece of code, so that we must only on a limited public interface. For example, if our code accesses a function only by means of its name and the function arguments, then we can refactor the function body as needed without changing any of the code that depends on it. (For example, we may have discovered a more efficient way to implement the function’s algorithm.)

In software engineering, the term decoupling refers to refactoring code so that it becomes more loosely coupled. One approach to achieving loose coupling is to depend only on pure functions when this is feasible.

Delegation

Delegation is a programming strategy for reusing code. Delegation uses the “has-a” reference relationship to produce behavior by passing messages to another object. The idea is that one type of object delegates behavior to another type of object. For example, consider creating an agent that needs an alert every ten minutes. One might build that behavior into the agent, or the agent might delegate the task of generating that alert to a timer object. In this case, the agent is the delegator, and the timer is the delegate. The second approach requires creating a timer object. But it also allows other (perhaps unforeseen) types of objects to also delegate timing behavior to a timer object.

Discrete-Time Modeling and Simulation

In discrete-time simulation, conceptual time advances by a fixed-size discrete amount, often called a tick. A tick is an abstract measure, which may correspond conceptually to any real-world period (such as a second, a day, or a year). Discrete-time simulations typically include an activity that runs a single update of the entire system each tick. This system update is the simulation step, which encompasses the simulation schedule.

DRY

DRY is an initialism for Don't Repeat Yourself. This is commonly an exhortation to programmers to replace repetitive code with subroutine calls, whenever possible. This reduces code duplication and promotes code reuse. Following the DRY principle has code-maintenance advantages, ensuring that a single change in design or implementation does not need to be made in multiple locations (some of which might be overlooked). For more information, see the Wikipedia entry on DRY.

Evolution Rule

A dynamic model is governed by an evolution rule, which determines how the modeled system moves from its present state to its next state. For example, in a population model, the evolution rule might determine next year’s population from this year’s population. In system-dynamics modeling, evolution rules are sometimes called equations of motion.

Execution

To execute computer code is to cause a computer to follow the instructions in the code. To run the code is synonymous.

Execution Context

We run a program in a particular execution context, which includes the operating system and any libraries provided by the programming language in use. As a program runs, it typically makes subroutine calls, which cause subroutines to execute. Subroutines run in an environment that is determined by the overall program. This course considers all aspects of the runtime environment that affect the subroutine to constitute execution context (or more briefly, the context) of the subroutine calls. For example, the subroutine may depend on the value of a global variable, which is determined as the program runs.

Roughly, the execution context of a subroutine call is the environment in which the subroutine runs. In agent-based simulations, this environment is often provided by the agent whose behavior it is. The agent who makes the subroutine call determines its context. For example, the willingness or ability of an agent to gamble might depend on its current level of wealth.

Feature

An agent may have structural features (attributes) and behavior features (behaviors). In various computational settings, attributes may be called data attributes, structural features, or properties, and behaviors may be called operations, methods, or behavioral features.

Attributes

At any stage of a simulation, the values of the attributes of an agent constitute the current state of the agent. This is a very general notion. A wealth attribute might store an agent’s wealth as a real number; a friends attribute might store an agent’s social contacts as a list of agents. An attribute is mutable if it can be changed. A mutable attribute need not remain constant during a simulation.

Attribute Access

Some programming languages include objects with attributes that can be accessed by name. The syntax for accessing the value of an attribute varies by language.

Behaviors

The behavioral features of an agent characterize its possible behaviors. This too is a very general notion, but usually the possible behaviors depend on the type of agent. A supply behavior might only be available to entrepreneurial agents, while a demand behavior might only be available to consumer agents. In an object-oriented programming language, a behavioral attribute of an agent may be implemented as a methods of the agent’s class.

In an procedurally-oriented programming language, a behavioral feature of an agent may be implemented as a procedure specific to a type of agent. Our notion of a behavioral feature is indepedent of such considerations. Behavioral features are determined by the conceptual model, not by the software implementation.

File Format

In computing, the term file format refers to the way information is stored in a computer file. This course is primarily concerned with human-readable plain text formats, but images are typically stored in binary formats.

Plain Text

In computing, the term plain text refers to computer files that store information in a particularly simple and potentially human-readable form. A plain text file is essentially a sequence of character codes (e.g., encodings of the characacters used in written languages). Most of the bytes of data stored in plain text files correspond to characters that might be used when producing printed texts. Computer source code files are generally in plain text. Some data exchange formats are plain text, including the popular CSV format.

File Encoding

The source code for computer programs is generally stored as plain text. However, a computer file is fundamentally a collection of bits (“zeros and ones”) that need to be decoded to be useful. A file encoding is a specification of how information is stored in the file. This specification allows use to extract the information from the file (e.g., to be able to read it as a collection of characters).

The meaning of plain text has changed over time. By the early 21st century, plain text came to refer to files with a Unicode-based encoding (such as UTF-8 or UTF-16). Unicode supports all of the characters used in European languages, most of the needs of most other languages, and adds many specialized characters (such as mathematical characters). For more details, see the Wikipedia entry on plain text.

For source code files, the Unicode standard UTF-8 encoding is in widest use. It is often just called Unicode. All modern programming editors and languages provide UTF-8 support, and beginning programmers should not have to worry about file encodings.

CSV Format

Data is often exchanged using the comma-separated values (CSV) format, which is an internationally recognized data-exchange format (specified by RFC4180). It is supported by spreadsheets, statistical packages, and many programming languages.

Each row of a CSV file is a record composed of comma separated values. To see the commas, you must open the file in a text editor, not in a spreadsheet. CSV is a plain text format, which any text editor can open and display. However, spreadsheets can open CSV files and give them a special display. If you open the file in a spreadsheet, the comma-separated values appear in separate cells.

Each record shares the same fields, so a column of values represents the values of a single field across all the records. Typically, the first line of the file is a header line, which lists the names of the fields. Variations on this basic description are often nevertheless called CSV files. For more information, see the Wikipedia entry for comma-separated values.

Filename Extension

Filenames often include a part known as a filename extension. (Extensions may be necessary on some operating systems.) Typically the extension is the last 3 or 4 alphanumeric characters of the name, following a period. The extension should be a kind of metadata: it should provide information about how information is stored in the file. For example, a filename that ends in .txt should signal that the file contains plain text, while a filename that ends in .csv should signal that the file contains plain text that is in CSV format.

Although the use of filename extensions to provide metadata is usually optional, some operating systems handle files differently based on their extension. Furthermore, in some versions of the Windows operating system, the default behavior of the GUI file manager is not to display known filename extensions. Aside from the most casual computer users, most users gain information by choosing to view these extensions. (The web is full of advice of how to enable this behavior. In Windows 10, go to the View tab of the File Explorer and check the File name extensions box.)

Fixed Point

A fixed point of a function \(f\) is a value \(x\) such that \(x = f[x]\). That is, the output value is the same as the input value. (For a brief, elementary overview of the concept of a function, see Chapter 3 of [Feldman-2012-OxfordUP].)

FRED

The Federal Reserve Economic Data (FRED) provides a rich online database that is a particularly good source for U.S. macroeconomic data.

Function

This course uses the term function for any callable subroutine that explicitly returns a value. (Contrast with a command procedure, which does not return a value.) A function is much like a spreadsheet formula: it uses existing values to produce a new value. Think of a functions as transforming valid inputs into outputs. [1]

Mathematics provides the basic notion of a function as a mapping of input values to output values. Permissible input values are called the domain of the function; the associated output values are calle the range of the function. The terms map and transformation are often used as synonyms for the term function.

In order to define a function, we need to be able to represent the relationship between the inputs and the outputs. To abstractly represent any possible input, we often introduce a named function parameter. For example, the function expression \(x \mapsto x * x\) represents the squaring function. (The special arrow is pronounced maps to.) In this function expression, the name x is the function parameter. It has no meaning outside the function expression; you can replace it with any other name (as long as you do so everywhere).

This squaring function is typically thought of as a mapping from real numbers to real numbers. In this case, the domain of the function is the real numbers. The range of this function (i.e., the associated output values) is also a collection of real numbers. A function that maps real numbers to real numbers is a real function.

Giving a function a name makes it easier to discuss. One common mathematical notation to bind the name \(f\) to the function \(x \mapsto x * x\) is the following: \(f = x \mapsto x * x\). Another common notation is \(f(x) = x * x\), and yet another is \(f[x] = x * x\). The notation (syntax) for defining computational functions has even more variety. Nevertheless, all of these are just different ways of associating the name \(f\) with the squaring function.

Each programming language provides its own syntax for function definition. Like mathematical functions, computational functions transform inputs values into output values. In a computational setting, an input value is typically called an argument, and the output value is called the return value. The type of the arguments that are permissible is called the input type of the function. (For example, an isPrime function may only accept integer arguments.) The type of the values that the function produces as output is called the return type of the function. (For example, an isPrime function may always return a boolean value—true or false.) We apply a function to an argument in order to produce a return value. The syntax for function application varies substantially across programming languages.

Pure Function

In principle, replacing all applications of a pure function with precomputed output values would not change the meaning of the program. A pure function is closed and side-effect free. (This combination of properties ensures that our use of the function satisfies referential transparency: the behavior of a program should not change if we replace each application of the function with its value.)

Closed:

The return value (i.e., the value that the function outputs) depends only on the values of the explicit inputs, and not on any other program state.

Side-effect free:

The function does not modify the environment in which it executes. The function has no side effects, not even printing.

Examples of side effects include changing the value of a nonlocal variable, writing data to a file, or communicating with a computer peripheral. If the execution of a function changes a global variable, the function is not side-effect free, so it is not pure.

If the execution of a function depends on a global variable, the function is not closed, so it is not pure. A pure function communicates with the rest of the program only by accepting arguments (which it does not modify) and returning a value (that depends only on the arguments). In this sense, a function whose output is determined entirely by its arguments is closed to influences from the context in which it runs. [2]

Function Literal

Programmers use the term literal for any source-code syntax for expressing a fixed value. We may roughly say that the value of a literal always has a fixed meaning in the language. A function literal is syntax for representing a function as a fixed value. A function literal expresses a function without naming it, and its value (i.e., the function) may or may not subsequently be assigned to a name. The term immediately invoked function expression (IIFE) refers to a function literal immediately followed by application to an argument.

Higher-Order Function

Functions are considered first-class citizens of a programming language if they are values that can be treated like any other values: assigned to names, passed as arguments, or returned as values from functions. A function is called higher-order if it expects a function as an argument or produces a return value that is a function. Languages that treat functions as first-class citizens usually also provide a syntax for function literals.

Function Closure

A higher-order function may return a function closure, which is a new function that carries along part of the environment in which it was created. Typically the new function definition includes a free variable, whose value is set by the higher-order function. For more information, see the Wikipedia entry on function closures.

Functional Programming

Programming is called declarative when it consists of instructions that tell a computer what outputs to produce rather than how to produce them. Declarative programming is called functional when these instructions are typically bundled together into useful subroutines, are known as functions. Functions produce values that can serve as inputs to other functions. Functional programs most essentially chain together functions in order to produce a value of interest. Functions are the core objects manipulated by a functional program. A purely functional program would have no notion of changes in the state of the computer, not even in the peripherals (such as a monitor or printer). Since such changes in state are often desirable, functional programming language accommodate them (in limited ways). Many languages support some functional programming features, including Python and NetLogo.

Function Iteration

To iterate is to repeat an action. A function iteration repeatedly applies a function to produce a sequence of values, each time using the function’s output (from the previous iteration) as the new input (for the current iteration). Given a function \(f\) and an initial value \(x_0\), we can repeatedly apply the function to get

\begin{align*} x_1 &= f[x_0] \\ x_2 &= f[x_1] = f[f[x_0]] = f^{\circ 2}[x_0] \\ x_3 &= f[x_2] = f[f[f[x_0]]] = f^{\circ 3}[x_0] \\ &\vdots \end{align*}

Write \(f^{\circ n}\) to denote the the function producing the \(n-\text{th}\) iterate. That is, \(f^{\circ n}[x_0]\) returns the result of iteratively (\(n\) times) applying the function \(f\), starting from an initial value \(x_0\).

Garbage Collection

Lower-level languages often require the programmer to explicitly allocate and deallocate memory for any objects created. Memory leaks are a common problem in such languages, even for experienced programmers. Other languages build in automatic memory management, called automatic garbage collection. This ensures that the memory allocated to an object will be freed up after the object is no longer needed. Automatic garbage collection may involve some sacrifice of speed and memory usage, but the gain in convenience is considerable. Most social-science researchers will value this convenience, and most social-science students will find automatic garbage collection to be indispensable.

Examples of languages that do not build in automatic garbage collection include C and C++. Examples of languages that build in automatic garbage collection include Python, Java, and Haskell. NetLogo runs on the Java virtual machine, which provides automatic garbage collection.

Graphical User Interface

A user interface for a computer application is the point of interaction between a user and the application. The command-line user interface (CLI) typified early computer applications, but CLIs require often require arcane knowledge of the commands that control the application. Nowadays, a typical consumer-oriented application will rely on a voice user interface (VUI) or a graphical user interface (GUI) or both, which are generally designed to be more intuitive and to deman less arcane knowledge from the user. Typically, a user of a GUI application interacts the application by manipulating graphical elements, called widgets.

GUI Widget

A widget (window gadget) is a user-interface element in a graphical user interface (GUI). Creating a collection of useful widgets is called building a GUI. Widgets may be control elements that allow the user to interact with the GUI or display elements that simply display information in graphical or textual formats. We may loosely categorize these as control widgets and display widgets. (For more details and background,

control widgets

Action controls (buttons) and parameter controls (sliders, switches, choosers, and input boxes).

display widgets

Graphical displays (plots) and textual displays (monitors).

slider

set a numeric variable to value in a range

switch

set a boolean variable to true or false

chooser

set a variable to a value in a fixed list

input box

set a variable to a value input by user

IDE

An integrated development environment (IDE) facilitates code development by combining key features needed by programmers. An IDE will include a code editor, which enables easy creation and modification of source code. Usually the editor of an IDE includes syntax highlighting, which displays syntactic information about that source code based on the particular language of the program. (For example, keywords of the language will be specially colored.) The synatx highlighting is provide on the fly by the editor and is not part of the source code.

An IDE usually provides some kind of debugging facilities. Modern IDEs provide a graphical user interface. Some languages are distributed with an IDE; examples include NetLogo and Python. (See the Wikipedia article on IDEs for more details.)

Indexing

Most programming language provide one of more numerically indexed data types. There are two common indexing conventions: zero-based indexing (which starts from 0) and unit-based indexing (which starts from 1).

Zero-based indexing can initially feel odd to those with no programming experience (even though we often use it, e.g., when referring to someone's age or buidling floors). In practice, it proves extremely convenient.

Initialization

Variable initialization is the setting of an initial value for a variable. Object initialization is the setting of an initial state for an object. The particulars of initialization are language dependent. This course uses the term initialization rather loosely to refer to all components of the model setup. For more details,

Iteration Construct

Programming languages differ in their approaches to iteration.

Iterate

The term iterate means to repeat. An iteration is a repeated action. After figuring out how to instruct a computer to perform an action once, it is easy to instruct it to perform the action repeatedly. A common way to accomplish this is with a looping construct.

One common use of iteration is to access the members of a collection. In traditional imperative programming, it was very common to access each member of sequential data structure by repeatedly incrementing an index and then accessing the member designated by that index. This is called iterating over (or iterating through) the collection.

Looping Construct

In most common programming languages, imperative `looping constructs`_ are the most familiar iteration constructs. In a `definite loop`_, the stopping criterion is a fixed number of iterations. (E.g., run the simulation schedule ten times.) Otherwise, the loop is indefinite. With an `indefinite loop`_, the final number of iterations depends on what happens in the body of the loop. (E.g., repeat a fishing-simulation schedule until no fisher is hungry.)

Imperative looping constructs traditionally include for loops and while loops. In such constructs, the code that will be repeatedly executed is called the loop body. Functional looping constructs traditionally include maps and and reductions (also called folds).

A useful looping construct must also allow us to instruct the computer when to stop repeating the action. An oft-repeated joke describes a programmer who cannot stop shampooing, because the instructions are “lather; rinse; repeat.” We may distinguish between definite and indefinite loops. Definite loops set the number of iterations before executing the loop body. Indefinite loops run conditionally, with the final number of iterations depending on what happens in the body of the loop.

Online resource:

https://en.wikipedia.org/wiki/Control_flow#Loops

Declarative Iteration

In imperative programming, the programmer provides an explicit step-by-step description of how to perform an iteration. In declarative programming, the programmer achieves the same result by providing a simple statement of the desired work.

To see the difference, consider the task of constructing a new array from an existing array by applying a function to each array element. (For example, we may wish to square each element of a numerical array.) This is called a mapping of the function over the array. An imperative program would provide explicit instructions for how to construct a new array of the correct length and then fill it, element-by-element, with the correct values. Declarative code for the same problem would simply restate the task description: map this function over the array. The usual name of the declarative construct that does this is, unsurprisingly, map.

Keyword

A reserved word of a programming language cannot be assigned a new value by a user of the language. Roughly speaking, a keyword of a programming language is a builtin part of the behavioral definition of the core language. (So if true and false are boolean literals, we do not consider them to be keywwords, because they specify literal values and not core language behavior.) A keyword looks like an ordinary user-defined identifier, but its meaning is primitive in the core language. We will assume that any keyword is also a reserved word, so that its meaning cannot be redefined.

Numerical Comparison

Numerical comparisons (of equality or relative size) are usually implemented as infix binary boolean relational operators. That is, they make a comparison of two numerical quantitites, and produce a boolen value (e.g., true or false). For example, \(1 < 2\) evaluates to true.

Object Type

When programming, we manipulate different types of objects (e.g., numbers and strings). For program safety, we should only operate of objects in ways that are appropriate to their type. Two ways of ensuring this are static typing and dynamic typing.

Some languages only check the appropriateness of object types while a program is running. This is called dynamic typing. In such languages, the creation of objects includes a type specification, an this is checked when the program manipulates the object. Dynamically type languages often allow a single variable name to be reassigned to objects of different types, since the object’s type is checked at run time.

Other languages impose checks at compile time (i.e., before the program runs). This is called dynamic typing, and it is accomplished by analyzing the code before compiling it. In a statically typed language, roughly speaking, a variable name has a type associated with it, and only objects of that type can be assigned to the name. In some languages, the object type must be specified by the programmer, while other languages rely on code analysis to infer the object type.

In many statically typed languages, type inference is missing or limited, so programs may appear cluttered with type declarations and explicit casts. Some researchers justify this cost with a claim that static typing provides a crucial check for programming errors. Others claim that type errors are rare and dynamic typing is justified by its conveniences. Attempts to resolve this cost-benefit tradeoff have been at the center of many contentious arguments, which we do not address here.

Examples of dynamically typed languages include NetLogo and Python. Examples of statically typed languages include C and C++. Haskell is an example of a statically typed language where type declartions are generally optional due to the use of type inference.

Operator

An operator provided by a programming language behaves like a function but often has a special syntax. The arithmetic operators +, -, *, and / are most familiar. We use these to construct arithmetic expressions. In most programming languages, these arithmetic operators are `infix operators`_, which means that syntactically occur between their arguments just as in classroom arithmetic. For example, in the expression 1 + 2 the plus operator is between its two input arguments. This expression naturally evaluates to 3.

The logical-comparison operators are also typically infix. For example, in the expression 2 < 3 the less-than operator is between its two input arguments. In most languages, logical comparisons produce a special type of value representing truth or falsehood, which is known as a boolean value. For this reason, the logical-comparison operators are often called boolean operators.

Refactor

Recall that decomposition is the breaking up of a complex whole into simpler component parts. Decomposition is sometimes called factoring. We refactor code when we develop a new decomposition of the code.

Code that works can often still be improved. It may be redesigned to run faster, to use less memory, to eliminate code duplication, to have better separation of concerns or to be easier to understand. Such redesign is called refactoring. A common example is the decomposition of a complicated function into simple, easy to understand component functions.

Matrix Plot

A matrix plot displays a matrix of colors in order to represent a matrix of numbers. When the colors convey the magnitudes of the numbers, a matrix plot is often called a heat map. Spreadsheets can create crude heat maps with conditional formatting of spreadsheet cells. Applications supporting statistical graphics often include good support for heat maps. See [Wilkinson.Friendly-2009-AmStat] for some precursors and extensions of the heat-map approach to data visualization.

Meta-model

A meta-model is a simplified summary of expected simulation outputs. A good meta-model supports prediction of the simulation outcomes from knowledge of the values of the model parameters. It summarizes the scenario-to-output relationship. (For example, a regression model fitted to a set of alternative model parameterizations may support predictions of the outcomes with other parameterizations.)

Module

This course uses the term module for any source-code file can somehow be imported by another source-code file. A module is a separate file containing code for use by a program. Modules are a way to break a larger program into component parts, consisting of separate files. This is called modular programming.

We use a very weak notion of a module: it does not require any separation of concerns, interface specification, or information hiding. In this course, the primary interest in modules is that they enable code reuse across models.

Neighborhood

Spatial models often use the concept of a neighborhood of an agent, which is a set of locations relative to the current position of the agent. Usually the neighborhood is meant to embody some concept of relevant proximity. Concepts of proximity can be extended to non-spatial relationships, such as networks.

Two-dimensional models with a rectangular terrain made up patches or cells are particularly common in agent-based modeling and simulation. In this setting, the two most common types of neighborhoods are box (Moore) neighborhoods and cross neighborhoods, although diamond (von Neumann) neighborhoods are also common. See the chapter on cellular automata for a discussion.

Parameter

In computing, the term parameter has multiple uses and meanings. This glossary distinguishes between model parameters and subroutine parameters (e.g., function parameters).

Model Parameter

In simulation modeling, a model parameter identifies a key influence on the model behavior. To allow for experimentation, focal parameters are typcially variables in the model. Other parameters may be handled by hardcoding, particularly in the early stages of model development. A model parameterization, or scenario, assigns a value to each model parameter. A baseline parameterization serves as a point of reference for discussion of a model; a simulation experiment may compare alternative parameterizations to this baseline.

Model parameters often influence the initial model setup. For example, the initial number of agents is typically a model parameter. Model parameters are constant during a simulation run. Some languages allow declaring this constancy, so that it is enforced by the language. In other languages, it is up to the researcher to ensure this constancy.

A set of values for all the parameters of a simulation model constitutes a scenario for the simulation. A change in the scenario typically leads to a change in simulation outputs. The set of all possible scenarios is called the parameter space. In most models, it is impossible to consider the entire parameter space. Researchers use extreme bounds, parameter sweeps, or other techniques to explore the parameter space. An enumerated sweep explicitly enumerates the values for a parameter (e.g., as a list). A range sweep specifies a range of values for a parameter, typically providing start, stop, and increment values for the range.

Subroutine Parameter

A subroutine parameter is a name used in the subroutine definition to refer to the possible inputs to the subroutine. The acual values supplied to the subroutine are called its arguments. [3]

A subroutine is applied to its arguments; equivalently, it is called with these arguments. The parameters are the names used when describing how the arguments will be manipulated by the subroutine. For example, if a function with parameter x will return the square of its argument, we can say the function will return x * x even though we do not know the actual argument ahead of time.

Most programming languages effectively isolate subroutine parameters from the surrounding code, so a subroutine parameter has meaning only within the subroutine definition. To capture this isolation, programmers often say that a parameter is bound by the definition where it is introduced. Equivalently, we say the scope of the name is restricted to its subroutine.

We may make a comparaison to function defintions in mathematics. Suppose we define a function \(x \mapsto x^2\). If we instead state that \(z \mapsto z^2\), this defines exactly the same function. It does not matter what we call the formal parameter (\(x\) or \(z\)), because this parameter has no meaning outside the function definition. We could even write this same function as \(\square \mapsto \square^2\), except that this makes it a bit harder to mention the parameter by name. We say that a function parameter is bound to the definition where it is introduced.

Because a parameter of a procedure or function is local to its subroutine, we can reuse the same names in diverse subroutines without fear of conflict between the names we are using. This is enormously important for the creation of understandable code. Unfortunately, this apparently simple concept becomes complicated as we delve into the details, and different languages handle these complications differently. As one important example, we may as what happens if a global variable has the name we want to introduce as a function parameter. The answer varies by language.

Do not conflate a function parameter with a model parameter. The values of the model parameters are determined during the setup of your simulation, and they should remain unchanged during the simulation. A function parameter is a name we associate with an input argument, so it can represent a different value each time we invoke the function. We may invoke a single function many times during a single simulation.

Also, pay careful attention to the difference between function parameters and function arguments. Function parameters are variables used during function definition to represent any possible argument that might be provided to the function. Given that a function is defined, we can apply it to actual arguments in order to report an output. For example, we define the identity function \(x \mapsto x\) in terms of a parameter \(x\). If we apply this function to an argument of 2, it will return the same value (2).

Partial Function Application

A function with one parameter is called univariate or unary. A function with two parameters is called bivariate or binary. A function with three parameters is called ternary. The number of parameters is the arity of the function. It can be useful to apply a function with multiple parameters to a smaller number or arguments, in order to produce a function of lower arity. Suppose for example that we have an add function, which simply adds two arguments. This is a bivariate function. We may produce a univariate add1 function, which adds 1 to its argument, by partially applying the add function. For more information, see the Wikipedia entry on partial application.

Refactoring

Programmers break large complex problems into manageable pieces—a process known as decomposition or factoring. Often the initial implementation of a program reveals the benefits of an alternative decomposition of the problem. The possible benefits are diverse, including improved readability, easier maintainability, or better execution speed. Changing the code to implement an alternative decomposition is called code refactoring.

State

A system that changes over time has a particular configuration at each moment. For example, a light switch can be in the on position or in the off position. We summarize the complete configuration at a point in time as the state of the system at that time. For example, we do not repeatedly describe the entire structure of a light switch; we just say which position it is in. In this simple example, a light switch has only two states: on and off. Correspondingly, it has only two possible transitions: from off to on, and from on to off.

As a computer program runs, it produces changes in the computer’s state. For example, executing a piece of simulation code may change the values of model variables, which corresonds to a change the state of the computer’s memory. The program state summarizes the current configuration of the computer state produced by the program execution. Furthermore, in an agent-based model, each agent may be considered a small dynamic system. The current values of its data attributes constitute the state of the agent. Each agent interacts with a larger system and constitutes part of the overall system state.

State Machine

A simple finite state machine may be characterized by a finite set of states (\(S\)), a finite set of inputs (\(I\)), and a state transition function \((S \times I) \to S\). We start the machine initial state (\(s_0\)), and subsequent states are responses to inputs. So \(s_1 = n(s_0, i_0)\), \(s_2 = n(s_1, i_1)\), and so on. For example, in a simple SI model of virus infection there are two state: susceptible, and infected. Each period there is a boolean input, which is true if there has been a disease exposure and false otherwise. A susceptible agent transitions to the infected state when the input is true.

A finite state machine can additionally include an output function, \(S \times I \to O\), where \(O\) is a finite set of possible outputs. Outputs are typically labels as \(o_t = \omega(s_t, i_t)\). Here \(t\) does not denote time; it is just a counter for the input sequence. When a state machine produces outputs, it can be used as a transducer: a converter of an input sequence to a sequence of ouputs.

Hardcoding

Hardcoding involves including a literal value directly in a program’s source code. Modifying a hardcoded value requires editing the source code.

Hardcoding becomes particularly problematic when a frequently changed value occurs in multiple location in the source code. For example, if an important model parameter is hardcoded in several locations, then each time the parameter is changed, it must be changed in all of those locations. This is a violation of DRY and is correspondingly error prone. It is better to introduce a configuration section (or a separate configuration file) that binds the model parameter to a value and then uses the parameter name rather than a hardcoded value throughout the source code.

On the other hand, values that occur once in a program and are expected to change seldom if ever may reasonably be hardcoded. It can be infeasibly messy to introduce a named model parameter for every possible value that governs the behavior of a model, especially during the early stages of model development. Finding the right balance between hardcoding and configuartion is part of the programmer’s art. Attention to the DRY principle provides good guidance.

Logical Connective

Recall that a boolean expression is either True or False. Programming languages often allow boolean expressions to be joined with logical connectives to form new boolean expressions. These are also called boolean operators. Most programming languages provide binary boolean operators for at least logical conjunction (and) and logical disjunction (or). An additional unary operator provides logical negation (not). For a brief introduction, see the Wikipedia article on logical connectives. For more details, see Chapter 3 of [Lehman.Leighton.Meyer-2018-MathCompSci].

Probability Distribution

A probability distribution characterizes the likelihood of possible outcomes. The simplest example is a discrete uniform distribution, where each of a finite set of outcomes is equally likely to occur. Simple examples include the outcome from flipping a fair coin or the outcome from rolling a fair die. Generating an actual value from a probability distribution is called drawing from the distribution. When the possible outcomes are numerical, the actual result is called a random variate.

Roughly speaking, a probability distribution function (PDF) tells us how likely we are to draw particular values from the distribution. For example, suppose we flip a fair coin and write down a \(0\) if it comes up tails and a \(1\) if it comes up heads. The PDF for this discrete uniform distribution is defined as follows. (This is alternatively called the probability mass function of the distribution.)

\begin{equation*} P[0] = 0.5 \qquad P[1] = 0.5 \end{equation*}

Similarly, suppose we roll a die coin and then count the number of pips showing on the top face. The PDF for this discrete uniform distribution is defined as follows.

\begin{align*} P[1] = 1/6 \qquad P[2] = 1/6 \qquad P[3] = 1/6 \\ P[4] = 1/6 \qquad P[5] = 1/6 \qquad P[6] = 1/6 \end{align*}

Procedural Programming

Programming is called imperative when the programmer writes instructions that tell a computer how to make changes step-by-step, in a particular sequence. Imperative programming specifies exactly how a computer should accomplish tasks, rather than just specifying which tasks to accomplish. These tasks may be as simple as assigning a new value to a variable or printing a variable value to an output terminal.

Imperative programming is called procedural when blocks of programming instructions are bundled together into named subroutines, which are known as procedures. Many languages support procedural programming, including Python and NetLogo. As a general rule, procedural programs should strive for command–query separation. Simply put, it is generally desirable to separate commands from functions. A command procedure changes the state of the computer or attached peripherals but does not return a value for use by the program. A function produces a value to be used elsewhere in the program.

Program

A modern computer program uses a programming language to provide instructions to a computer. Programming languages can be read and written by computer programmers. They are turned into instructions that a computer can understand by a compiler or an interpreter. For historical reasons, short programs in some programming languages are sometimes referred to as scripts. This is particularly true in languages that provide an interactive interpreter. The term script originally meant a short program developed to perform a simple task, usually by controling the functioning of other applications. This is particularly true of shell scripts: collections of instructions recognized by an operating system’s command shell.

Pseudo-Random Numbers

Simulation experiments often include simulate randomness. This is the (simulated) random (or stochastic) aspect of the model—its stochasticity. Modern programming languages often provide pseudo-random number generators (PRNGs), which provide streams of numbers that behave like real random numbers. The most common PRNG facility generates number uninformly on the unit interval. This is a computational version of the standard uniform distribution. This facility is adequate for a skilled programmer to generate random variates from diverse distributions. Nevertheless, many modern languages provide much more extensive PRNG facilities.

As a technical aside, it is not really possible to generate all numbers in the unit interval. Numerical computing is usually done with number that use a limited amount of storage, and correspondingly only a finite number of values can be precisely represented. In this course, the finite precision of numerical data types will not usually concern us.

Random Seed

Before computers became widely available, scientists obtained random numbers for experiments from a book of random numbers. They would open the book arbitrarily to pick a starting page, column, and row, and then we would read off the numbers sequentially from that point. Nowadays, this entire process is implemented on a computer, using a pseudo-random number generator (PRNG). Setting a random seed for a PRNG is like picking the page column and row.

Print

Most languages have one or more ways to display values in a command-line setting. A print command or function is a common way to provide this functionality.

Short-Circuit Evaluation

Logically, we know that it is not true that all swans are white as soon as we encounter a black swan. So if we want to evaluate the claim that all swans are white, we can evaluate as false the very first time we encounter a black swan. Rather than continuing to examine all swans, we can “short circuit” our evaluation. Many programming languages take advantage of this observation to provide shortcut evaluation.

Source Code

Computer programmers use programming languages to create computer programs. The programmers write source code, which is stored in one or more files -- usually as plain text. The code is a human-readable collection of computer instructions (in some programming language). Source code cannot be directly executed by a computer. The human-readable instructions must be transformed (by a compiler or interpreter) into instructions that can be understood by a computer.

String Concatenation

String concatenation builds up a larger string from smaller strings.

List

In this course, the term list may generally denote any ordered data type that allows integer indexing for element access. In the discussion context of a particular language or data type, the concept is correspondingly restricted.

Pseudocode

Pseudocode describes an actual implementation in code in a way that is (at least somewhat) programming-language independent. It mixes natural language and mathematics with coding structures in order to better communicate the purpose and structure of and actual implementation. A key goal when writing pseudocode is to communicate algorithmically without assuming a shared programming language.

Rate of Change

Consider a variable \(x_t\), measured at fixed increments of time. For the period from \(t\) to \(t+1\), its rate of change is \(x_{t+1}-x_{t}\). A rate of change is always a flow variable: its measure must reference a period of time. The growth rate from \(t\) to \(t+1\) is \((x_{t+1}-x_{t})/x_{t}\). This is also called the proportional rate of change over the period, and it is sometimes called the proportional growth rate. The related quantity \(x_{t+1}/x_{t}\) is sometimes called the gross growth rate.

Recursion

A function is said to recurse (or equivalently, to be recursive) if it is defined in terms of itself. For example, consider defining a function on the natural numbers by \(f[n]=n f[n-1]\). This is sensible only if something brings a stop to the recursive calls. Recursive functions therefore generally define a base case, which returns a value, and a recursive case, wherein the function calls itself with a new argument.

To continue the example above, if we only define the recursive case \(f[n]=n f[n-1]\), then a call this function has no specified stopping point. But if we add to this function definition a base case that \(f[0]=1\), then the function recurses only until it encounters the base case. Functional programming languages often use recursion where more imperative languages would use a looping construct.

Scaffolding

The term scaffolding has multiple meanings in computer science. It this course, it refers to code that is present during code development but does not run in the final version of a program.

During code development, progammers often temporarily add code in order to aid in the “construction” of a program [Bentley-1985-ACM]. By analogy with construction, progammers use the term scaffolding for these temporary structures. Scaffolding should assist program development by making it easier to detect bugs. For example, during initial work on a program, liberal printing of intermediate results and error checks can be helpful.

Once a program is complete and correct, the output from scaffolding becomes an annoyance and should be suppressed. Sometimes we remove the scaffolding altogether; other times in makes sense to turn it into comments that document past scaffolding.

Separation of Concerns

In computer programming, separation of concerns has two components: programs should be decomposed into parts, and each part should handle one concern well. A concern is just a goal to be achieved by the program. In the context of this course, separation of concerns means that the implementations of functions and procedures will generally be short and will not seek generality at the expense of complexity. It also means that the functionality of agents in a particular model will usually reflect only the needs of that model, discarding unneeded attributes and behaviors. This enhance readability, simplifies debugging, and facilitates code reuse.

It is worth noting that concerns are not always easily separated, especially overarching or cross-cutting concerns. For example, concern with computational efficiency may not be easily separated from other program goals. (See the Wikipedia entry on separation of concerns for more details and background.) Nevertheless, attention to separation of concerns is a helpful computer program design principle.

Side Effects

A subroutine has a side effect if it changes the state of the program or causes an observable event. Printing to a console (or to a file) or plotting to a graphical window are examples of observable events. A change in the value of a global variable is an example of a change in program state. A pure function is a function with no side effects.

When we say a function has no side effects, we mean that evaluating the function does not change the program state in any way. This includes its input-output state, so a function that prints to the console is not pure. For more details, see the Wikipedia entry on side effects.

Signature

The signature of a named subroutine includes its name and its type signature. The type signature characterizes the type (and order) of the arguments that the subroutine consumes and the type of output that the subroutine produces.

Stopping Criterion

A stopping criterion specifies the circumstances under which the iteration of the simulation schedule should stop. This is also known as the stopping condition, termination criterion, or stop rule.

Stub

A stub is a function or procedure that acts purely as a placeholder during code development. A stub is substitute for a yet-to-be-developed subroutine (e.g., function or procedure). The subroutine body is empty, contains only comments, or is otherwise radically incomplete. A good stub will allow your program to compile and run. (Of course your program may not run correctly until you replace the stub with a full implementation.)

Although a subroutine stub is essentially empty or incomplete, it must provide a correct interface with the rest of the program. A stub should execute without a problem, even though the body is not yet implemented. To achieve this, the stub must produce a syntactically correct program before its body is implemented.

The purpose of a stub is to allow a program to run without error despite the fact that it is still underdevelopment. Early in the development of a program, it is common to provide a stub for one or many procedures or functions. Providing such stubs is often called “stubbing out”, and it is an important code development strategy.

Each stub must be usable without error wherever the fully implemented subroutine will ultimately be used. Therefore, each stub must have the same signature as the ultimate implementation. That is, in order to play its role properly, a stub must accept the the same types of arguments and return the same type of output (if any) as the ultimate implementation.

Subroutine

A subroutine is a block of code that we can execute whenever needed. Programmers create subroutines from blocks of code that are usefully reusable. Most programming languages allow the creation of subroutines that produce a return value, which this course calls functions, and those that do not, which this course calls procedures.

Subroutine definition and terminology varies considerably by language. A subroutine may be called a procedure, a function, a routine, a method, or a subprogram. This course uses the term function for any callable subroutine that explicitly returns a value, and it uses the term procedure for a subroutine that does not explicitly return a value. (Procedures can be useful for their side effects.) A function call or procedure call causes the execution of the subroutine (e.g., by means of its name). If the function or procedure accepts input arguments, this course says that we apply it to its arguments.

A subroutine is a collection program instructions that are intended to run as a group. A subroutine is callable if there is a sytax for to call (i.e., branch to) the subroutine and then return back to continue with the next instruction after the call. This means that the same subroutine may be easily used multiple times. Reuse of code by means of subroutines saves time and makes errors less likely. (For example, it makes it much easier to satisfy DRY.)

A computer program will typically define a collection of subroutines. However, it is also very common for a program to depend on subroutines defined in separate libraries. Useful libraries are designed so that they can be easily used by multiple programs. (This again makes it much easier to satisfy DRY.)

When a program has access to a callable subroutine that performs a specific computing task, it can simply call that subroutine to perform that task. A common way to provide access is to give the subroutine a name that is recognized throughout the program. Most programming languages allow the creation of named subroutines, because they are so useful. Named subroutines are important for

semantics

A good name communicates the intent of the code.

maintainability (DRY)

The DRY rule says we should rely on a single implementation of any one task.

Command Procedure

We will use the term command procedure, or just procedure, for a subroutine that does not explicitly return a value. (We use the term function for a subroutine that does explicitly return a value.) Some languages support functions but not procedures, but this course assumes procedures are supported. Since procedures do not produce a return value, we create for their other effects (called side effects). When we create a procedure solely to encapsulate some functionality needed by a calling procedure, we may refer to it as a subprocedure. (For example, we may create a pay subprocedure to implement part of the behavior of a giveGift procedure.)

Topology

In agent-based modeling and simulation, the term topology is often used to describe then handling of boundaries in a spatial model. For example, in a two-dimensional rectangular space, a torus topology is one that wraps at the edges (top-bottom and left-right), while a box topology is one that does not wrap. With a box topology, agents cannot move to some coordinates, as these lie outside the spatial extent of the model.

Trajectory

A dynamic model changes state over time. In a discrete-time model, a trajectory is the sequence of states resulting from some initial state. (This is sometimes called the itinerary of the initial state.) Researchers also speak of the corresponding trajectory of a single state variable. A time-series plot is a common visualization of the trajectory of a single state variable.

Type Annotation

A type annotation provides information about the data type associated with a name. There are many approaches to type annotation, and languages differ widely in their approaches to annotation. This course uses a simplified custom type annotation, very loosely influenced by UML. This annotation is used only in the language-agnostic code outlines provided in the text, not in the actual code examples.

In the annotations of this course, the basic numeric types are either Real or Integer. Collections of a single type are allowed, and the multiplicity of a collection may be indicated by brackets after the underlying type. For example, a list of real numbers may be annotated as Real[0..*] to indicate that there are zero or more real numbers in the collection. In this case, Real[*] and Real[] are also accepted notation.

In the context of type annotations, the colon (:) may be pronounced “has type” or may be given any equivalent phrasing. For example, (x: Real) may be read as “x is real”. (White space before or after the colon is optional.)

In this course, the type annotation of a function differs from UML. An arrow points from the arguments to the type of the returned value. For example, (Real,Real)->Real is the type of a bivarate real function. When it we wish to specify names for the function parameters, the same function may be annotated as e.g. (x:Real,y:Real)->Real.

In another departure from UML, this course occasionally uses the ° character to represent the absence of an explict or usable return value. (This may be pronounced as “nothing” if necessary.) So for example, the type annotation (Real) -> ° describes a procedure that takes a single real argument but returns nothing. If a procedure has no parameters, then it will not receive a type annotation. Implicitly, the language in use supports command procedure definition.

UML

Unified Modeling Language (UML) is an object modeling language that primarily targets the design object-oriented computer software. UML facilitates the documentation, design, and implementation of software and business systems. This course somewhat loosely adapts core UML diagrams, using modified UML activity diagrams and UML class diagrams for model documentation. See the UML appendix for details.

Unit Test

A unit test is a code-based test of a program component, which is known as a unit. This course introduces very simple subroutine tests, where the units are functions or procedures. Many languages provide extensive support for sophisticated unit testing.

Variable

Mathematics and computer science have related and often subtle notions of what it means to be a variable. This course treats this concept rather superficially, thinking of a variable as a name that can represent a value.

Social scientists often distinguish stock variables from flow variables. A stock variable denotes a quantity that can be specified without reference to the passage of time. For example, the variable \(p\) may represent the current world population. No time period is involved in its definition. A flow variable denotes a quantity that cannot be specified without reference to the passage of time. For example, the variable \(g\) may represent the annual growth rate of the world population. This is very different than the monthly growth rate. The definition of \(g\) inherently involves a specified period of time.

In a computer program, the value of a variable may be unambiguously known, as when a programmer makes a single direct assignment to a variable that never changes. (In this case, this course loosely call it a constant, meaning not that it cannot be changed but that it is not changed.) It may be unknown but determinable, as when the variable occurs in an equation that has not yet been solved. Two somewhat related distinctions matter most in this course: the distinction between global and local variables, and the distinction between free and bound variables,

Scope

Programmers often speak of the scope of a variable, which very roughly refers to the portion of the program where the variable can be refered to by its name. This course will generally consider only two types of scope: local and global. A variable name has global scope if it is visible everywhere in a computer program, so that any portion of the program can refer to it. A variable with global scope is a global variable. In contrast, a variable name has local scope if it has meaning only within its code block. A variable with local scope is a local variable.

In modern programming languages, local variables are almost always more useful than global variables. For example, most programming languages ensure that the variables introduced in a function definition can have meaning only within the function definition. They are visible inside the definition (i.e., locally) but they are not visible to the rest of the program. These variables have meaning only inside the function definition. We cannot refer to them outside of that definition context. This implies that two different functions can use the same variable name without creating any confusion or conflict.

There are many language-specific variations on this theme, and discussions of scope can become technical. For additional but still very informal details, see the Wikipedia discussion of variable scope.

Free and Bound Variables

The difference between a free variable and a bound variable is most easily understood in the context of a function definition. Suppose we define the logistic-growth function as \(x \mapsto x + g x (1 - x)\). The variable \(x\) is bound in the function definition, which means that its meaning is completely exhausted by the function definition. For example, \(y \mapsto y + g y (1 - y)\) defines the same function. Substituting \(y\) for \(x\) does not change the function definition in any way.

In the context of this function definition, the variable \(g\) is free: its meaning is not exhausted by the function definition. As far as our function function definition is concerned, \(g\) has a value that is freely determined elsewhere. Any change in this outside value of \(g\) will change the function, so we often say that the function is parameterized by \(g\). We cannot substitute \(y\) for \(g\) in our function definition, unless we also may changes elsewhere in our program.

Verification

Code verification ensures that the code runs as the programmer intends (e.g., as demanded by the conceptual model). Good tests are the primary component of code verification. This course considers two kinds of verification tests: runtime tests, and unit tests. Runtime tests are executed as the entire program runs. Unit tests are run separately from normal program execution.

In writing a unit test, it is natural to wonder what a unit is. In principle, a unit is the smallest piece of the program for which isolated testing makes sense. When a program defines a pure function, this is clearly a unit. In fact, one of the key attractions of pure functions is that they are easy to test effectively and unambiguously.

Unit tests ideally depend only on the unit that is tested. The goal of unit testing is to test isolated chunks of code in order to unambiguosly pinpoint the source of any test failures. If a unit has dependencies in the program, a the origin of a test failure becomes ambgious. Sometimes a test can substitute a stub for a dependency. If necessary, code may be refactored just to make it easier to test independent units.

In practice units for testing are often larger. In object-oriented programming, a unit for testing is often an entire class, even though in principle it could be an individual method.

White Space

A whitespace character traditionally renders just empty space, with no corresponding glyph. This includes spaces, tabs, and end-of-line characters. Languages differ radically on their interpretation of white space. For example, white space is generally ignored in NetLogo but delimits code blocks in Python. When programming in languages that ignore white space, it is nevertheless helpful to develop consistent conventions for indentation and spacing.

References

[Bentley-1985-ACM]

Bentley, Jon. (1985) Programming Pearls: Confessions of a Coder. Communications of the ACM 28, 671--679.

[Feldman-2012-OxfordUP]

Feldman, David P. (2012) Chaos and Fractals: An Elementary Introduction. : Oxford University Press.

[Lehman.Leighton.Meyer-2018-MathCompSci]

Lehman, Eric, F. Thomson Leighton, and Albert R. Meyer. (2018) Mathematics for Computer Science. Cambridge, MA: self published. https://courses.csail.mit.edu/6.042/spring18/mcs.pdf

[matsumoto.kurita-1998-acmt]

Matsumoto, M., and Y. Kurita. (1998) Mersene Twister: A 623-Dimensionally Equidistributed Uniform Pseudo-Random Number Generator. ACM Transactions on Modeling and Computer Simulation 8, 3--30.

[Wilkinson.Friendly-2009-AmStat]

Wilkinson, Leland, and Michael Friendly. (2009) The History of the Cluster Heat Map. The American Statistician 63, 179--184. https://doi.org/10.1198/tas.2009.0033