Basic Introduction to Spreadsheets

Spreadsheet Basics

Cells, Sheets, and Workbooks

Spreadsheet software allows for interactive data entry and manipulation. Launch a spreadsheet application, and create a new blank workbook. This will contain a single spreadsheet, but you can add many spreadsheets to a single workbook. A workbook collects together several spreadsheets and provides facilities for these sheets to interact.

Visually, a spreadsheet is a rectangular collection of cells, which may hold data or formulas (or be empty). Each cell in a spreadsheet has an address, which combines its column letter with its row number. For example, A1 is the address of the left-most, top-most cell. By means of a formula that uses these addresses, the value in a cell may depend on the values in other cells. For example, a cell may contain a formula that computes the mean of the values in some other cells.

Entering Values

To enter a value in a cell, select the cell and type in the value. The information you enter will be stored as a number, a string, or a boolean value (True or False). A cell may display a number in various ways, e.g. formatted as a date, but numbers are always stored in the same way. (Specifically, as double-precision floating point values.)

Entering a Formula

Instead of a value, a cell may hold a formula. To begin entering a formula in an empty cell, select it and enter an equals sign ($=$). The calculation may be entirely numeric—for example, =2+2. However, a formula typically refers to the values of other cells. Refer to a cell by using the row and column of that cell. Use standard operators for addition, subtraction, multiplication, and division. Use ^ for exponentiation.

Operator	Operation	Example
`+`	addition	=1+2
`-`	subtraction	=A1-2
`*`	multiplication	=A1*B2
`/`	division	=1/A2
`^`	exponentiation	=1^2

Local Variables with LET

In Office 365, the LET command can give names to specified values. This can make formulas easier to read by using these names to produce the results. To use LET, introduce a sequence of variable names and values, and finally enter the expression using the names. The variable names are local: they have meaning only inside the LET command. They do not have meaning elsewhere in the spreadsheet.

Example: =LET(x,5,3+2*x+x^2)

Example: =LET(x,A1,3+2*x+x^2)

Example: =LET(x,A1,y,A2,3*y^2+2*x*y+x^2)

Ranges

A rectangular group of adjacent cells is known as a range of cells. Two cell addresses separated by a colon define a range of cells (e.g. A1:D4). The colon separates the top left cell and the bottom right cell of the range.

Range Intersection and Union

Use space separated ranges to specify the intersection of two ranges (e.g. A1:D5 B2:E3). (An empty intersection is an error.) Use comma-separated ranges to specify the union of two ranges (e.g. A1:B5,D1:E5). For example, it is possible to sum over the union of non-adjacent cells.

Arithmetic Sequences

The SEQUENCE command produces arithmetic sequences of values. The simplest use of this command produces a column of consecutive positive integers.

Exercise: In cell A1 enter a name for your sequence, say myseq. Select cell A2 and enter =SEQUENCE(10) in order to create 10 sequence values in the column.

Copying Formula

Exercise: In cell B1 enter a name for a transformed sequence, say myseq2. Select cell B2 and enter =LET(x,A2,x*x). Copy cell B2, select cells B2 through B1, and paste. This creates squared values for all of the original sequence values.

Names in Formulas

Select the myseq column, including the label. From the menus, pick Formulas » Create from Selection. In the dialog box that opens, designate the location that contains the label by selecting the Top row check box. Click OK. This creates a named range. You can now do elementwise computations by using the name.

For example, select cell C2. Enter mseq*myseq to create the same sequence of squared values.

Exercise: Annualizing Data

Background

The rate of change of a variable is its change per unit of time. The proportional rate of change of a variable is its relative change per unit of time. A proportional rate of change is also called a growth rate. Multiply a proportional rate of change by $100$ to produce a percentage growth rate per period.

Rate of Change per period:: $x_t - x_{t-1}$
Proportional Rate of Change per period:: $(x_t / x_{t-1}) - 1$
Continuous growth rate per period:: $\log[x_t] - \log[x_{t-1}]$

To annualize is to adjust a change or growth rate over a period to produce the implied change or growth rate over a year. Data series are observed at different frequencies. To improve comparability, we often annualize the rate of change or the growth rate. Suppose there are $p$ periods per year. (E.g., $p=4$ for quarterly data, or $p=12$ for monthly data.)

Annualized Rate of Change:: $p(x_t - x_{t-1})$
Annualized Growth Rate:: $(x_t / x_{t-1})^p - 1$

An annualized rate of change will provide a better prediction for a quantity that tends to change at a constant rate. Such a quantity is a linear function of time. An annualized growth rate (or percentage growth rate) will provide a better prediction for a quantity that tends to grow at a constant rate. Such a quantity is an exponential function of time.

The covid19 pandemic of 2020 led to radical fluctuations in the growth rate of US GDP. As of January 2021, the BEA estimated that the quarterly GDP estimates (at an annual rate) were $21561.1$, $19520.1$, $21170.3$, $21,494.7$. Enter this data in a spreadsheet and then compute the following.

Find the quarter-to-quarter rates of change of US GDP.
Annualize the quarter-to-quarter rates of change of US GDP.
Find the quarter-to-quarter growth rates of US GDP.
Annualize the quarter-to-quarter growth rates of US GDP.

Entering a LAMBDA Formula

Video: How to use LAMBDA to make Custom Functions in Excel: https://www.youtube.com/watch?v=yDNX7V0eZ8U
Documentation:: https://www.microsoft.com/en-us/research/blog/lambda-the-ultimatae-excel-worksheet-function/

Line Charts

A simple two-dimensional line chart connects a sequence of points with line segments. Given a sequence of univariate data, the first coordinate of the data may be provided by the index of the observation. This is a run-sequence line chart. Line charts of one of the simplest displays of univariate data.

Run-Sequence Line Chart

Run-sequence charts are used with ordered univariate data. The data is considered to be a sequence: order matters. The values are therefore plotted in order.

Dimensionality:: 2D
Points:: $(i, x_i)$ where $i$ is the index and $x_i$ is the i-th data value
Horizontal axis: index values
Vertical axis: data units

If the points are plotted but not joined, we often call the chart a run-sequence plot. If the points are joined, we often call the chart a run-sequence line chart.

A run-sequence chart may allow easy detection of behavior regularities or changes in the behavior of the sequence

cyclicity
autocorrelation
shifts in level
shifts in dispersion

Resource: http://www.itl.nist.gov/div898/handbook/eda/section3/runseqpl.htm

Time-Series Line Chart

Time-series charts are also used with ordered univariate data, when they are indexed by time. Time-series data is inherently sequenced: order matters and is determined by the time index. The values are therefore typically plotted in order.

Dimensionality:: 2D
Points:: $(t, x[t])$ where $t$ is the time and $x_t$ is the data value at time $t$.
Horizontal axis: time units (e.g., date-time values)
Vertical axis: data units

If the points are plotted but not joined, we often call the chart a time-series plot. If the points are joined, we may call the chart a time-series line chart. Since the data for a time-series chart typically includes time as well the values of the variable of interest, we use a spreadsheet scatter plot for the chart.

Exercise: Population Time-Series

Start with a blank workbook.
Enter the world population data from the table below. (Data Source: US Census Bureau.)
Select the cells containing the data.
From the Insert tab, pick Scatter Chart, and pick on with lines.

World Population 10000BCE-1950CE
year	population(millions)
-10000	1
-8000	5
-6500	5
-5000	5
-4000	7
-3000	14
-2000	27
-1000	50
-500	100
-400	162
-200	150
1	170
200	190
400	190
500	190
600	200
700	207
800	220
900	226
1000	254
1100	301
1200	360
1250	400
1300	360
1340	443
1400	350
1500	425
1600	545
1650	470
1700	600
1750	629
1800	813
1850	1128
1900	1550
1910	1750
1920	1860
1930	2070
1940	2300
1950	2400

The default axis scale in most spreadsheets is linear. This means that a given distance along the axis represents a fixed change in the level of the variable. Series that grow or shrink over time are often better represented by a ratio scale. This means that a given distance along the axis represents a fixed percentage change in the level of the variable. Change the scale along an axis as follows.

In your population chart, click on the graph axis.
From the menu, pick Format » Axis…. This displays the Format Axis window.
Under Axis Options tab, check the Logarithmic Scale check box.

Real GDP Time-Series Chart

Getting the Data

Get time-series data for real GDP from the Federal Reserve Economic Database. Here is one way:
1. Go to https://research.stlouisfed.org/fred2/series/GDPC1
2. Click Download » CSV Data and download the data as GDPC1.csv

Plotting the Data (Excel)

Load the data in your favorite spreadsheet.
Adjust the width of the DATE column if needed, so that all dates display correctly.
Select all the data (including the headers).
Pick Insert » Charts » 2D Line Chart
Change size and axes ranges to better display the data. (This may take a little fiddling.)

Edit for Improved Information Display

Reformat the vertical axes to a logarithmic scale.
Change the title to Real GDP

Questions

What is real GDP?
How does real GDP in the US tend to behave over time?
Explain why the logarithmic chart is a more useful way to visualize the GDP data.

Basic Statistics

Univariate Descriptive Statistics

Excel includes a tool for descriptive statistics in the Analysis ToolPak add-in. Pick Data » Data Analysis » Descriptive Statistics. Input the range for your data and whether there are labels in the first row. Check Summary Statistics and click OK. For each data column, you will see the following values.

Mean (=AVERAGE(range))

A measure of central tendency; the arithmetic average of the values.

\begin{equation*} \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n \end{equation*}

Sample Variance (=VAR.S(range))

A measure of dispersion around the mean; includes the Bessel correction.

\begin{equation*} s^2=\frac{1}{N-1}\sum_{n=1}^{N} (x_n - \bar{x})^2 \end{equation*}

Sample Standard Deviation (=STDEV.S(range))

Another measure of dispersion around the mean; the square root of the sample variance.

$s = \sqrt{s^2}$

Standard Error (of the mean)

=STDEV.S(range) / SQRT(COUNT(range)) A measure of the dispersion of the sampling distribution of the mean.

Median (=MEDIAN(range))

The middle-most value after sorting the data. (Calculated as the average of the two middle-most values, if there are en even number of data points.)

Mode (MODE.SNGL(range))

The most frequently occuring value, if one exists.

Kurtosis (=KURT(range))

A measure of the likelihood of extreme values. Excel reports excess kurtosis (relative to the normal distribution). Positive kurtosis means the data contain has more extreme values than a normal distribution. Negative kurtosis means the data contain fewer extreme values than a normal distribution.

Skewness (=SKEW(range))

Measures asymmetry around the mean. Positive skewness means that tail is longer and/or fatter to the right. Negative skewness means that tail is longer and/or fatter to the left.

Minimum (=MIN(range))

The minimum value of the numbers in the range.

Maximum (=MAX(range))

The maximum value of the numbers in the range.

Quartile (=QUARTILE(range, n))

The $n$-th quartile of the numbers in the range.

Range (=MAX(range) - MIN(range))

The range of values, found by subtracting the minimum value from the maximum value.

Sum (=SUM(range))

The sum of the values.

Count (=COUNT(range))

Count of the number of observations in this dataset.

Not reported:

Sample Variance (without Bessel correction): =VAR.P(range) A measure of dispersion around the mean:

\begin{equation*} {\hat\sigma}^2=\frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})^2 \end{equation*}
Sample Standard Deviation (without Bessel correction): =STDEV.P(range) A measure of dispersion around the mean: ${\hat\sigma} = \sqrt{{\hat\sigma}^2}$

Five-Number Summary

On first encounter with a univariate data set, researchers commonly consider its five-number summary. The five numbers are the minimum, first quartile, median, third quartile, and maximum. Excel does not build in a five-number summary, but the functions listed above can easily produce one. In fact, since these five numbers are quartiles 0, 1, 2, 3, 4 of the data, the QUARTILE function is adequate on its own.

Box and Whisker Charts

A box and whisker chart provides a useful summary of the distribution of a variable. The box shows bounds the interquartile range, with an interior line for the median, while the whiskers show variation beyond the interquaritle range. (See the Basic Statistics supplement for details.) to produce and box and whisker chart in Excel, select a column of data and then click Insert » Insert Statistic Chart » Box and Whisker. See the Microsoft documentation for more details.

Multivariate Descriptive Statistics

Covariance and Correlation

See the Basic Statistics supplement for details.

Regression Analysis: Basic Concepts

Regression analysis is a type statistical modeling. We estimate the response of a dependent variable (say, $y$) to one or more independent variables (say, $x$). Suppose in the real world the two are related as follows:

\begin{equation*} y = \beta_{0} x \beta_{1} + \epsilon \end{equation*}

The dependent variable is the variable explained by the regression model. We hope to use the independent variable to understand and predict the dependent variable. The problem is that the relationship is noisy: the $\epsilon$ term is random. This is why we have to estimate the sensitivity of $y$ to $x$.

Independent variables are explanatory variables. They should be the factors that our model indicates can influence the dependent variable. We estimate a model by finding parameters that minimize the distance of our fitted model from the observed data. The measure of distance is the sum of the squared differences between the observed values of $y$ and the fitted values of $y$. The regression routine of a spreadsheet will find an estimated value for $\beta$ that makes this distance as small as possible. Call that value $\hat{\beta}$.

Preliminaries in Excel

Excel includes a regression tool in the Analysis ToolPak, which is an Add-In that must first be enabled in the Excel Options. Once enabled, we can pick Regression from the Data Analysis group of the Data tab. Fill in the range for the dependent variable and the independent variables. Picke the New Worsheet Ply option for output, and click OK to see the regression results.

The estimated sensitivities are listed in the Coefficients column. When you select a single independent variable there are stil two estimated coefficients because Excel automatically includes a constant. A small P-value (say, below $0.10$) means that a coefficient estimate so far from $0$ would be very unlikely if the true value were $0$.

Estimating the Sustainable Population

The logistic growth model of the Logistic Growth lecture implies a linear relation between the population growth rate and its level: $y_t = m x_{t-1} + g$, where $x_t$ is the population level in period $t$ and $y_t = (x_t - x_{t-1})/x_{t-1}$ is the population growth rate. Together the intercept and slope imply $\bar{P}=g/m$. Using a half century of U.S. population data from the FRED, estimate the $\bar{P}$ for the U.S. (Use the annualized series B230RC0A052NBEA.)

Bar Charts

Bar charts can be horizontal bar charts (which Excel calls bar charts) or vertical bar charts (which Excel calls column charts).

Video: http://www.gcflearnfree.org/excel2016/charts/1/

Categorical Bar Chart: String vs Numerical Categories

Suppose you have the following data in a spreadsheet.

cat1	10
cat2	15
cat3	5

In the spreadsheet, select this data and then insert a 2d column chart, which is also called a vertical-bar chart. The first column will be treated as the categories and you will get a nice vertical-bar chart for you categorical data. Note that the order of the categories is determined by their order in our data set. They are not considered to have a separate sort order.

Numerical Categories

If you try to use numbers for your category names, it may create extra work. Suppose you have the following data:

1	10
2	15
3	5

If you select this data and insert a 2d column (vertical bar) chart, a spreadsheet will generally treat the first column as numerical data, rather than as category names. In this case, you may get a “clustered” vertical bar chart. If the first-column numbers are actually your categories (e.g., bin edges), you will need to choose a different graph style.

Excel:: Choose "more column charts" and find the right type.
Google Sheets:: Check "Use column A as labels"

Data-Based Bar Chart

Create a bar chart for the components of GDP. Then remake the chart in percentage terms.

Getting the U.S. Data

Find the BEA news release for GDP.
In the “Text Full Release and Tables”, find Table 3: Gross Domestic Product.
Find values for C, I, G, and X-M.

Creating the Basic Column Chart

In a new spreadsheet, enter headers C, I, G, and NX in a rows A1:D1.
In the next row (immediately below the headers) fill in the data (from the news release).
Select the headers and data.
Insert a 2D column chart.

Creating the Percentages Column Chart

In cell F2, enter =SUM(A2:D2).
Select cell F2 and change the name box to GDP. (When you define a name, Excel defaults to using absolute cell reference, which is what we want.)
In cell A3 enter =A2/GDP`, and then copy this formula to the next 3 cells.
Select the new cells, right click, and format the values to percentages.
Select the headers and your values in row 3, and insert a 2D column chart.

Function Iteration

Basic Concepts

Recall that to iterate is to repeat an action. Function iteration is the repeated application of a function to produce a sequence of values. (See the Glossary for details.)

Exercise: Exponential Population Projection via Function Iteration

Look up the world population. Assuming a growth rate of 1% per year, produce annual projections for the next 50 years. Create a simple line chart of your projections.

Solution: Step-by-Step Exponential Population Projection

Iterative approach:

Open a new spreadsheet.
Enter the current population in A1.
In A2 enter the formula =1.01*A1.
Copy this formula and paste it into A2:A51.
Create a line chart of your population projections.

Hint: In Excel you can select a range of cells by entering the range specification into the name box. In Google Sheets, you can select a range of cells by creating a named range.

Closed-form approach:

In B1, enter the formula =7.75*POWER(1.01,ROW()-1).
Copy this formula and paste it into A2:A51.
Compare with the results of your iterative approach.

Exercise: Logistic Population Projection via Function Iteration

Look up the world population. Assume population growth depend on the current population, according to:

\begin{equation*} p_{t+1} = p_{t} + 0.03 * p_{t} * (1 - p_{t}) \end{equation*}

where $p$ is the current population relative to the carrying capacity of the planet. Assume a carrying capcity of 11.5 billion. Produce annual projections for the next 50 years. Create a simple line chart of your projections.

Solution: Step-by-Step Logistic Population Projection

Iterative approach:

Open a new spreadsheet.
In A1. enter the current population divided by the carrying capacity.
In A2 enter the formula =A1+0.3*A1*(1-A1).
Copy this formula and paste it into A2:A51.
In column B, scale the values from column A by the carrying capacity.
Create a line chart of your population projections.

Hint: See the discussion of exponential growth (above) for useful hints. Note that we do not have a closed-form for the population.

Tables and Arrays

Structured Reference Tables: Some Basics

Since Excel 2007, Excel provides a structured data type within a sheet, known as a structured reference table. This is sometimes just called a “table”. (In ancient times, is was called a “list”.) A table is rectangular and generally has column labels.

Tables provide convenient data handling. As a simple example, a formula entered in a column is treated as an array formula and immediately autofilled. Tables can also be named, allowing structured references to their data. For example, formulas can simply refer to column names (with a bracket notation).

Documentation: https://support.office.com/en-us/article/Create-or-delete-an-Excel-table-in-a-worksheet-867f3add-0522-493c-94ab-b1b9933f46ca
Helpful Video:: https://www.youtube.com/watch?v=tTYQrk3hSYo

Creating a Table

Select a rectangular array of cells that has a header row of labels and then the data below.
From the menu, pick Insert » Table, check the “My table has headers” box, and click OK.

Name a Table

Select any cell in the table.
From the menus pick Design » Table Name.
Enter a new table name.

Append a Column to a Table

Approach 1:

Select the cell beside your table’s column headers.
Enter a name for your new column header.
The new column is automatically added to your table.

Approach 2:

Select any cell in the last column of your table.
Pick from the menus, Home » Cells » Insert » Insert Table Columns to Right

After you create a new column, you will usually want to change its name.

Table Example: Refer to Column by Name

Table columns can be accessed by name. E.g., if the table is named data and the column is names c1, you can refer to the column by name.

data[c1]

(When you are working within a single table, you can omit the table name.)

This is called a structured reference.

Table Example: Function Plot

We will illustrate the use of tables by doing a function plot of $\sin(x)$. Note that the SIN function takes one argument: an angle measure in radians. We will create points in $[0..2\pi]$, and evaluate the function at each of these points.

Open a blank workbook.
Use the Name Box to select A1:B102
Pick Insert » Table or click ctrl+t, check “My table has headers”, and click OK.
Change the headers to x and sinx.
In A2 enter the formula =2*PI()*(ROW()-2)/100; note how the rest of the column is filled in, creating the domain values for our function.
In B2, enter the formula =SIN([@x]); note how the rest of the column is filled in, creating the range values for our function.

Note how we were able to refer to another column in the table by placing its header in brackets. We are now ready to create a chart from the table data.

Click anywhere in your table.
Insert a scatter chart (Insert » Charts » Insert Scatter).
Be careful: in early versions of Excel 2016, moving this chart to a new sheet can cause Excel to hang.

Documentation: https://support.office.com/en-us/article/Using-structured-references-with-Excel-tables-f5ed2452-2337-4f71-bed3-c8ae6d2b276e

Edit the Chart

Make your charts look good.

Select the graph and move it up to the top of the worksheet near the list of values. (Or move it to its own sheet.)
Drag the boundaries to get a pleasing aspect ratio.
Get rid of the grid unless your have a need for it.
Choose a background color. (Do not be dramatic.)

Table Example: Rectangular Data

Import rectangular data into a spreadsheet.
Select the data
press ctrl-t to open the table dialogue; create the table
rename the table (Design » Table Name)

Table Alternatives

Excel tables are great, and they are missing from many other spreadsheets. But they are not irreplaceable. Think of a table as a of mini sheet with automatic named ranges via structured references. The handling of formulas is also just a convenience; it is largely equivalent to array formulas.

Sheets: named ranges: https://support.google.com/docs/answer/63175

Named Ranges

Most spreadsheets support named ranges, which allow easy array-oriented computations. A named range is just what is sounds like: a range of cells that can be referenced by name.

Example:

choose from the menus Formulas > Define Name and fill out the dialog
tap the OK button to make a named range from the selected cells

Note: The menu location for named-range creation is application specific. (E.g., Data » Named Ranges ... in Sheets.)

Named Ranges Example: Function Plot

Example:

We will create 101 evenly-spaced values in $[-1,1]$ for our domain ($x$ values) and then plot a function over this domain.

Open a blank workbook and enter x in cell A1 and y in B1.
Create the $x$-values.
1. Tap in the Name Box (to the left of the Formula Bar) and enter A2:A102; the range of cells that you want to fill.
2. Type =(ROW()-52)/50 in the Formula Bar, and then ctrl-shift-enter This copies the formula to all cells; and gives you the domain values for your function.
Make a named range for your 101 new cells; name it x.
Select B2:B102, type the formula =x*x, and then press ctrl+shift+enter; this gives you the range values for your function
select a location for your scatter plot, and insert a scatter plot. Right click on the plot body, and choose Select Data, and set the range to x,y. (European users may need a semicolon rather than comma for the union operator.)
Edit the chart as before.

Array Formulas

We can use array formulas to fill spreadsheet ranges with an entire array at one go. An array formula that fills multiple cells is called a multi-cell formula; all the filled cells depend on the single formula.

Documentation:: https://support.office.com/en-us/article/Guidelines-and-examples-of-array-formulas-7d94a64e-3ff3-4686-9372-ecfd5caa57c7
Examples:: http://www.cpearson.com/excel/arrayformulas.aspx

Simplest Array Formula Example

In a new sheet, select A1:A3 and type the formula:

={1;2;3}

To enter this as an array formula, use Ctrl+Shift+Enter. (For this reason, array formulas are sometimes called CSE formulas.) Note that we needed to select the proper output area for the result.

The value in braces is called an array constant. We can do elementwise arithmetic operations with array constants. In a new sheet, select A1:A3 and type the following formula, entering it with Ctrl+Shift+Enter.

={1;2;3}*{4;5;6}

Note: Google sheets has the ARRAYFORMULA command: (It also supports Ctrl+Shift+Enter, like Excel.)

ARRAYFORMULA: https://support.google.com/docs/answer/3093275

Array Formulas with Functions

Many functions can be use in array formulas to produce elementwise results. Just select the proper output range, and enter the formula with Ctrl+Shift+Enter.

Example:

In a new sheet, select A1:A3. Then enter the following array formula. (Remember: Ctrl+Shift+Enter!)

=ROW()

You will see the row numbers entered in column A. Next select B1:B3 and enter the following array formula. (Remember: Ctrl+Shift+Enter!)

=SQRT(A1:A3)

You will see the square roots entered in column B.

Random Numbers

Core Random Number Generators

The core random number generators are RAND and RANDBETWEEN.

function:: RAND()
ouput:: a random decimal number in $[0,1)$.
documentation:: https://support.office.com/en-us/article/RAND-function-4cbfa695-8869-4788-8d90-021ea9f5be73
function:: RANDBETWEEN(min,max)
ouput:: a random integer between min and max, inclusive.
documentation:: https://support.office.com/en-us/article/RANDBETWEEN-function-4cc7f0d1-87dc-4eb7-987f-a469ab381685

Additional options are usually available (e.g., in Excel, through the Analysis ToolPak).

Random Decimal in $[x_\min,x_\max)$

=xmin + (xmax-xmin)*RAND()

Recalculation

The random number functions are “volatile”: they are called each time the spreadsheet is recalculated.

Example:

in a new worksheet, enter =RAND() in A1
enter =RAND() in A2; note that the value in A1 changes
press shift+F9 for force recalculation of the sheet; note that the values in A1 and A2 change

We can replace formulae with values.

select A1:A2
copy
pick from menus Home » Paste » Values

Random Integer Data

Example:

Start with a new worksheet.
In A1, enter RANDBETWEEN(0,10), then select and copy (ctrl-c) the cell.
Enter A2:A101 in the Name Box.
Paste (ctrl-v) the copied formula into the cell.

You now have random data, but there is an underlying formula. This means that it will be recalculated every time your spreadsheet is recalculated. We usually do not want this dynamic recalculation. So select all the data, copy it (with ctrl+c), and then paste the values (using Paste Special).

Random Integer Data (Table Version)

Example:

Start with a new worksheet.
Enter data in the A1 cell.
Enter A1:A101 in the Name Box.
From the menus, pick Insert > Table, check the “My table has headers” box, and click OK.
In A2, enter :code`=RANDBETWEEN(0,10)`.

Random Variates: Analysis ToolPak

The Analysis ToolPak (ATP) provides several additional distributions for sampling.

Example: sampling from the standard uniform distribution.

Preliminary: install the Analysis ToolPak: https://support.office.com/en-us/article/load-the-Analysis-ToolPak-305c260e-224f-4739-9777-2d86f1a5bd89

Example:

Choose Data Analysis from the Data tab
Pick Random Number Generation
Choose Binomial with paramters $p=0.1$ and $n=5$
Set an Output Range determined by the number of variates you want
Click OK

Pivot Tables

Pivot tables provide ways to summarize worksheet data.

Basic videos about pivot tables:

Pivot Table Exercise

Pivot tables provide power functionality to regroup and aggregate your data. For example, enter the following lifetable data into a spreadsheet. (More data is available from the Social Security Administration.)

Sex	Age	YearsLeft
Male	15	61.70
Female	15	66.58
Male	20	56.91
Female	20	61.68
Male	25	52.30
Female	25	56.83

Select the entire table, pick Insert»PivotTable, and select the New worksheet option. Using the dialog that opens, drag Age to Rows and Sex to Columns. The totals are meaningless, so remove them as follows: right-click the table, select PivotTable Options, and in the Totals & Filters tab, uncheck the Grand Totals. The result is a quick and easy two-way table.

Pivot tables provide the basis for pivot charts. For example, select the table and pick Insert»PivotChart. Pick a line chart to get a nice plot, for each sex, of life expectancy for each age.

Simple video introduction:: https://www.youtube.com/watch?v=9NUjHBNWe9M

Value Counts

Create random integer data as above, except this time you must include a column header. Suppose the header is mydata. We want to determine the number of times each value occurs in the mydata column. Pivot tables make this easy, as long as our data has a column header.

Select the data, including the header.
Pick Insert » Pivot Table » OK; this should create a new worksheet with a pivot table.
In the Fields dialog, check the mydata box so that this item appears in the Values area.
Drag mydata to the Rows area, of the Fields dialog, so that it appears there too. Now your pivot table should show each value in the mydata column along with its sum.
In the Value area, click on the arrow beside mydata and change the Value Field Settings to Count.

You now have a table of value counts.

Simple-Minded Frequency Chart

Make a table of value counts, as above. We are going to make a frequency chart from this table. This will be a vertical bar chart, which spreadsheets often call a “column chart”. There will be one column for each unique value, with that value's frequency as the column height.

Approach 1: Pivot Chart

click anywhere in your pivot table
pick Insert » Pivot Chart » Column to make what Excel calls a Clustered Column chart.

Approach 1: Column Chart

Select the data in your pivot table.
Pick Insert » Pivot Chart » Column to make what Excel calls a Clustered Column chart.
Change the vertical axis bounds as needed.

We call this approach “simple-minded” because we do not worry about values that have zero frequency.

Simple-Minded Relative-Frequency Chart

Make a frequency chart as above. In the PivotTable Fields dialog box, go back to the Value Fields Settings box, and click the Show Values As tab. Pick % of Column Total, and set the Number Format to General.

Frequency Chart Redux

A table of value counts supports a simple-minded frequency chart. This approach “simple-minded” because it does not worry about values that have zero frequency. To do a little better, display cells with zero counts. To illustrate, make a pivot table of counts for the data 1,2,3,5,3,2. Note that the value $4$ does not occur in this data.

In your pivot table of counts, right-click on one of the the row labels and choose Grouping.
Enter 1,7,1 as the start, stop, and increment values for grouping, and click OK.
Right-click on one of the the row labels and choose Field Settings » Layout and Print. Check the Show items with no data box.
Right-click a pivot table cell and choose PivotTable Options. Fill in For empty cells show: with a $0$.
Make a column chart (i.e., vertical bar chart) from the pivot table.
Click on the data labels on the horizontal axis, and uncheck out the useless categories.

The overextended range allows us to work around undesirable features of the default Excel display. (That is why we picked an end value of $7$ instead of the more obvious value of $5$.) Using the filter to manually unselect the out of range values produces a better display.

Pivot Tables and Ordinary Charts

Fortunately, you can make a regular chart from your pivot table.

Select an empty cell that is not immediately adjacent to your pivot table and insert a blank chart.
Right click on the chart and pick Select Data from the context menu.
Click the Add button, to add a new series. For a vertical bar (column) chart, select the y-axis values.
Once you select the data, you can edit the axis labels in the Select Data Source dialog. For a horizontal bar chart, set the data range to the range containing your labels.

Surface Chart

3D Plots

Excel 2016 includes very weak 3D plotting capabilities, but it can do simple surface plots.

Grided Data

To create a surface plot, Excel 2016 needs a filled rectangular grid of data. In real data, we face the following problems:

we usually have data records $(x,y,z)$ rather than gridded data
there may be missing grid points

We can use a pivot table to turn a complete set of data into grid data.

Select the $(x,y,z)$ data
Insert a pivot table
Drag the $z$ data to the values
Drag the $x$ data to the rows
Drag the $y$ data to the columns

This gives us gridded data. Now we can:

Select the data, including the row and column labels
Insert a surface chart

Lorenz Curve

Cumulative Sum

Consider a sequence of numbers. A cumulative sum is a running total of the sequence elements. For example, the values 1,2,3,4 have a running total of 1,3,6,10.

The cumulative sum sequence will have the same length as the original sequence.

Cumulative Sum (Running Total): Simplest

Oddly, spreadsheets often lack a function to produce a cumulative sum (also called a running total).

Generate random integer data as described above. We will assume this is in A1:A100 and that column B is empty.
Select cell B1 and enter the formula =SUM(A$1:A1). Note the use of both relative and absolute coordinates.
Select and copy cell B1. Select B1:B100 and paste in the formula.

Exercise: make a run-sequence line chart of your cumulative sum.

Cumulative Sum (Running Total): Better

The first method is the simplest approach and with a small data set it is fine. However, note that we repeatedly sum over the column of data, repeating a lot of the computational work already done. If you have a lot of data, the can cost time.

Here is a better approach, using the same random integer data. We assume column C is empty.

in C1 enter =$A$1
in C2 enter =$A2+C1
copy the formula from B2 to the remaining rows of `B`

Comment: Pay careful attention to the distinction between absolute and relative cell references.

Add a Column to a Table

Select a data cell in your table. Right click to get the context menu. Pick Insert » Table columns to the Right.

Cumulative Sum: Table

Assume a table with column x. Add a column to the table and enter the formula:

=SUM(INDEX([x],1):[@x])

We use INDEX to get the row and column in the spreadsheet. The at-sign (@) provides a shorthand for [[#THIS ROW],[x]]. Without this shorthand, we would have written:

=SUM(INDEX([x],1):[[#This Row],[x]])

The redundant summing is unfortunate, but there does not seem to be a better way to do partial sums with structured references.

Cumulative Moving Average (Running Average)

Using the same data as for our running total, just change the formula slightly: divide each cell result by Row().

Or if you have a header cell, so you data starts in row 2, divide each cell result by (Row() - 1).

Cumulative Proportion

To produce the cumulative proportion of data, first produce a cumulative sum. Then deflate the cumulative sum by the total.

This produces an increasing sequence of fractions with the same length as the original sequence and a final value of 1.0.

A cumulative proportion usually has most interest if the data is first sorted.

Sorting

Excel does not provide sorting functions. Instead, we sort a column in place. Often it is best not to manipulate the original data, so first copy it to a new column, which you will sort.

Note: the sorted column will not automatically update when the data column changes! If you want to do dynamic sorting, consider the LARGE function. http://excelbyjoe.com/using-excel-functions-to-dynamically-sort-data/

To sort a column:

Select the data, and then choose Data » Sort and sort smallest to largest

Lorenz Curve

We now think of our data as representing income or wealth. Each observation represents one population unit. (E.g., a household.) A Lorenz curve plots the cumulative proportion of the data against the cumulative proportion of the population.

sort the data
produce a cumulative proportion of the sorted data
plot against $i/N$, where $i \in [1 .. N]$ indexes the population

Refinements:

a Lorenz curve diagram looks best with an aspect ratio of 1
a nice Lorenz curve includes the point $(0,0)$.
usually we provide a 45 degree line as a reference line for the Lorenz curve; one way to produce this is with a second plot of either series against itself

Basic Charts

There is inconsistent usage of the terms chart, graph, and plot. We can use the term chart for almost any visual display of data (including maps, which are the original use of the term), aside from data tables. Such visual displays of data are called graphical even when they are not called graphs.

Many chart types specify axes, each representing either incommensurate categories or commensurate magnitudes. As long as the data display is at least abstractly determined by points related to axes, people freely substitute the term graph for chart. For example, it is reasonably common to find bar charts referred to as a bar graphs, but we always refer to pie charts (and never to pie graphs). Related to this, when data are visually related to axes, it is much more common to refer to graphing the data than to charting the data.

When all axes represent ordered commensurate magnitudes (e.g., a number line), the chart will often be called a plot or a graph. When individual points are drawn, the chart will most often be referred to as a plot, even when additional information is conveyed by the size or color of the points. Examples include the scatter plot, the line plot (not the line graph), and the bubble plot (more often called a bubble chart).

Roughly then, plots are graphs, which are charts. However, you will find many exceptions to this usage. For example, the GraphPlot command in Mathematica draws a diagram of a binary relation. Better names might have been RelationDiagram or RelationChart.

Histogram Statistical Chart

We now define a related data-display concept: the frequency histogram. This categorizes data into mutually exclusive bins, and then displays the counts of the data points in each bin. The bins are usually equal-sized, non-overlapping intervals.

In earlier versions on Excel, a basic histogram bar chart had to be created from a categorical bar chart, where the categories are bins and the values are frequency counts. However, Excel 2016 added Histogram as a Statistical Chart. For produce a simple histogram as follows.

Select a column of data.
Pick Insert » Insert Statistical Chart » Histogram.
Right click on the horizontal axis and adjust the bins as desired.

Histogram: The Old Way

The following discussion of how to produce a histogram prior to Excel 2016 is included only for its use in illustrating other spreadsheet features.

Unique Values: Remove Duplicates

The easiest way to remove duplicate values from a column is to use the Remove Duplicates facility of Data Tools. (See Analysis ToolPak

Select the data
Pick Data » Data Tools » Remove Duplicates
Keep the current selection.
Click OK.

Notice that you are not given an option to copy to another column. If you want to keep the original column, copy it first, and remove duplicates from the copy.

Documentation:: https://support.office.com/en-us/article/Filter-for-unique-values-or-remove-duplicate-values-ccf664b0-81d6-449b-bbe1-8daaec1e83c2

Unique Values: Filtering

We will determine the unique values in a column by using a data filter.

Select the data.
Go to Data » Sort & Filter and click Advanced.
Check Filter the list, in place or Copy to another location
Click Copy to another location. In the Copy to box, enter a cell reference (the range will be expanded as needed).
Select the Unique records only check box, and click OK.

The unique values from the selected range are copied to the new location. Next, we will sort them.

Value Counts (with COUNTIF)

Suppose your original data is in A1:A1000, and your filtered unique values are in B1:B50. In C1, enter the forumula:

=COUNTIF($A$1:$A$1000,$B1)

Copy this formula into C1:C50. Now you have the values in B and their counts in C. We can use these to make a frequency chart.

Binning

Binning is a partition of the data set. Every data point is allocated to a single bin, based on its satsfaction of the criteria for that bin.

Here we focus on the binning of univariate numerical data. Suppose we have N observations, $X_1 ,\dots,X_N$. We create K bins by specifying a strictly increasing sequence of boundary points, $a_{0},\dots,a_{K}$, where $a_{k} < a_{k+1}$. These K + 1 points determine K bins, which correspond to the intervals $[a_{k}, a_{k+1}]$. The inclusive interval $[a_{0}, a_{K}]$ is called the binning range. Each point within the binning range is allocated to the bin that contains it.

These leave two questions: what do we do with points that do not fall in any bin, and what do we do with points that fall exactly on a boundary. Unfortunately, there is no unique standard for behavior in these cases.

Boundaries between bins:: Most spreadsheets treat bins as excluding lower bounds and including upper bounds, so we end up binning in intervals of the form $(b_{k} .. b_{k+1}]$. [#pivotbin]_ This is the include-right convention. "A data point is included in a particular bin if the number is greater than the lowest bound and equal to or less than the greatest bound for the data bin." This is the opposite of the convention used by much scientific software, which adopts the include-left convention, producing intervals of the form $[b_{k} .. b_{k+1})$. (However, the default in R is include-right.)
Values outside the binning range:: Spreadsheets usually bin all values. Values in $(-\infty,a_0]$ are collected in an initial bin. Values in $(a_K,\infty)$ are collected in a final bin. Scientific software is more likely to discard data outide the binning range, but observations on the binning-range endpoints are usually included (in the first or last bin).

Basic Histogram Binning

Mark off the data axis into a number K equal-length intervals. $[a_{k}, a_{k+1}]$. (Usually these cover the whole range of the data points.) The number K should vary with N but not be too small. (Use at least 5 intervals.): A common choice is the square-root of N. (This is widely used in spreadsheets.) Another theoretically based proposal is to let K = Floor[N^(2/5)]. (I.e., rounded down to the nearest integer.) Let h=(Xmax - Xmin)/K be the width of each interval.
For each k=1,...,K, tally the number $n_k$ of data points falling in the interval $(a_{k}, a_{k+1})$ plus any boundary points (handled as discussed above)
Draw a column-bar chart of adjacent rectangles. Each of the N rectangles aligns with an interval and so has width h. Each rectangle has height $n_k$. over the interval [a_{k}, a_{k+1}].

Histogram Table: Excel

Before creating a histogram in Excel, you need to install the Analysis Toolpak: https://support.office.com/en-us/article/load-the-Analysis-ToolPak-305c260e-224f-4739-9777-2d86f1a5bd89

Excel bin boundaries are inclusive upper bounds, of the form (bl..bh]. “A data point is included in a particular bin if the number is greater than the lowest bound and equal to or less than the greatest bound for the data bin.”

Data values less than or equal to the smallest specified bin value are binned together. Data values greater than the largest specified bin value are binned together.

"If you omit the bin range, Excel creates a set of evenly distributed bins between the minimum and maximum values of the input data."

http://www.math.kent.edu/~honli/teaching/statistics/Chapter2/Excell_Histogram.html

Histogram Chart

Histogrammed data appears as a column chart, mapping bins to frequencies. Detailed instructions: https://support.office.com/en-us/article/Present-your-data-in-a-histogram-ccc0fc1a-2ccd-490f-a346-87f3084cc26e

To turn a `histogram table`_ into a histogram chart, select the table, and from the Insert tab pick a column-bar chart.

Excel and Calc defaults produce a very bad histogram chart. (Google Sheets has a much better default histogram, but is currently less customizable: e.g., you cannot separately set the edge color on the column bars.)

Titles:: Add meaningful chart and axis titles, (In older Excel versions, you may also need to remove the legend.)
Column bars:: make bar columns touch, and remove the fill color.
Tick labels:: specify the bin range as the tick label

Bar improvment details:

Right click in a column to open the Format Data Series dialog. Use the Series Options Tab to set the Gap Width to 0%. Use the Fill and Line Tab to select No fill. and a solid black line border.

Tick-label improvment details:

Tick labels constitute the worst problem and are the most painful to fix. The histogram bars are incorrectly centered on the right boundary of the bin. (The reason is that Excel is repurposing its Column Chart.) You need to go to your histogram table and enter correct labels under “Bins”. For example, for a bin holding value from 5 to 10, you might ender (5..10]. (Remember, upper bounds are inclusive.)

Even after all this work, you will produce a decent histogram only if you have used a single bin width. This is because a true histogram should have column width proportional to the bin widths, but Excel will produce common column widths irrespective of bin width. Fortunately, it is often convenient to use common bin widths, as this allows easy visualization of density of observations.

Analysis ToolPak Example:

Here are two approaches to making a histogram chart in Excel:

pivot table histogram:: https://www.youtube.com/watch?v=mQBS_C9frRI
Analysis ToolPak histogram: https://support.microsoft.com/en-us/kb/214269

Begin by creating some data. Copy the following formula into A2:A101:

=NORM.INV(RAND(),0.0,0.25)

In column B, put some a few (sequential) bin boundaries. Pick DataAnalysis > Histogram, put your data range in the Input Range, and put your bin boundaries in the Bin Range. Be sure to check Chart Output, then click ok. Voila!

Relative-Frequency Histogram

We now define a related data-display concept: the relative frequency histogram. To do this, we scale each bar of our frequency histogram by 1/N.

Create a frequency histogram, as above.
Change the height of each bar to :math`n_{k}/N`.

Note that the area of the k-th bar of a relative-frequency histogram is h * n_{k}/N, so the total area of the bars is h.

Scaled Relative-Frequency Histogram

We now define another histogram display concept: the scaled relative-frequency histogram.

This is a frequency histogram with rescaled heights, so that the total area of the rectangles is 1.

We accomplish this by scaling the height of each bar of our frequency histogram by 1/(hN).

Create a frequency histogram, as above.
Change the height of each bar to :math`n_{k}/(N h)`.

The total area of the bars is 1. This scaling of the histogram makes it easy to compare to a plot of a theoretical density function.

To illustrate, make a scaled relative-frequency histogram for 100 draws from a standard normal, and then do the same thing for 100 draws from a mean 1 exponential.

Empirical Distribution

The empirical distribution function based on the data is a distribition function which has jumps at observed values $X_n$ of size equal to 1/n multiplied by the number of sample points equal to that value.

\begin{equation*} \hat{F}(t) = (1/N) \ \#\{X_n \le t\} \end{equation*}

Given $t$, we divide the $X_n$ into two groups: those to the left of t (or equal), and those to the right of t. We count up those in the first group to get $\hat{F}(t)$. Equivalently, we assign a 1 to observations in the first group, and a 0 to observations in the second group, and then add up these values. (We can formalize this with an indicator function, I[X_n ≤ t].)

Let us construct an empirical distribution function based some univariate data, $X_1 ,\dots,X_N$.

Suppose our data sample is independently drawn from a single distribution, F. Then our data is a sequence of independent random variables which are all ‘identically distributed’. Given t, the probability of falling in the first group is each time

\begin{equation*} F(t) = P\{X \le t\} \end{equation*}

The value $N \hat{F}(t)$ is the sum of N iid binary (‘coin-toss’) random variables, all with the same probability p = P(X i ≤ t) = F(t) of being 1.

Thus $N \hat{F}(t) ~ Binom(N, F(t))$.

The Law of Large Numbers implies that $\hat{F}(t) \rightarrow F(t)$, with probability close to 1 when n gets large.

Plots of empirical CDFs from data overlaid with theoretical d.f.’s are among the best simple ways of checking visually that the data really fit the theoretical CDF well.

Cobweb Chart

Cobweb charts are also called step charts. Spreadsheets are not the easiest places to produce cobweb charts. You may find this description helpful:

http://archives.math.utk.edu/ICTCM/VOL17/C005/paper.pdf

Tornado Charts

Tornado charts are commonly produced as horizontal bar charts. See https://excelchamps.com/blog/tornado-chart/ for an example. Also see https://vimeo.com/34389151

Heat Map

Unfortunately, Excel does not include a heat-map chart (as of 2016). A heat map uses color to enable visualization of a retangular array of data. However, data in a range can be color coded, giving an approximation to the desired visualization.

Heat Map Exercise

1 1 0 0 0 0 0 0 0 0
1 1 0 0 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0
0 0 1 1 0 0 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 1 1 0 0 0 0
0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 1 1 0 0
0 0 0 0 0 0 0 0 1 1
0 0 0 0 0 0 0 0 1 1

As a simple example, consider the following binary data. Enter this data in a spreadsheet. Select the data and choose Condition Formatting, and then from Color Scales pick the color map of your choice. Colors will be assigned based on the value in the cell, which in this case makes it easier to see patterns in the data. (To remove the numbers, right-click on the selected data and pick Format Cells » Number » Custom and enter and empty string in the dialog.)

This crude heat map is volatile: it immediately reflects any changes in the data. To see this, try changing a few of the values.

Using a Spreadsheet to Graph a Parmeterized Function

Set Up

Start a new spreadsheet.

Put your amplitude parameter (say, 2.75) in cell A1.

Create the Data

Decide on the range and the number of values. We will use [0,1] with 101 evenly-spaced points.
Select cell B1 and enter the formula =ROW(A1).
Select and copy cell B1.
Select down to B101.
- In Excel, you can use the GoTo command:
- press ctrl-g (or function key f5) and enter destination B101
- hold the shift key while clicking OK.
Paste your fomula into your selected range (e.g., ctrl-v).
Make a named range for your 101 new cells; name it idx.
- If not already selected, select by selecting B1, then press holding crtl+shift+down
- Make a named range from the selected cells by choosing from the menus Formula > Define Name.
Select cells C1:C101.

You should know how now, but here's another way. Go to cell B1 and press ctrl+down. This takes you to B101. Press the right arrow. This takes you to C101. Press ctrl+shift+up. This selects up to C1.
type the formula =(idx-1)/100 and then press ctrl+shift+enter; this gives you the input values for your function
make a named range for these new cells; call it domain
select cells D1:D101
type the formula =A1*domain*(1-domain) and then press ctrl+shift+enter; this gives you your function values over your domain

Create Your Graph

Select the cells C1 through D101
From the “Insert” menu, choose a Scatter chart (with lines but no markers)
Experiment with changing the amplitude parameter

Refine Your Graph

Change the x-axis and y-axis minimum and maximum to 0.0 and 1.0
Select the “Titles” tab and use x for the x-label and y for y-label.
Change the chart title to Logistic Map
Change the aspect ratio of the plot to be more square.
Select the “Legend” tab and un-check “Show legend”
Get rid of the gridlines

Error Bar Chart

Excel: https://www.youtube.com/watch?v=2Etko-slowk
Sheets: https://www.youtube.com/watch?v=8LuKPw0_pMI
gnuplot: http://gnuplot.sourceforge.net/docs_4.2/node140.html

Array and Matrix Operations

Array and matrix operations use normal functions but are entered with ctrl+shift+enter. Also, you need to select an appropriately shaped range for the result.

A nice feature is that the result will be dynamically updated if you change the underlying data.

Scalar Operations

Let mA be a rectangular array. Let's add a constant to every element of mA. Select an area the same shape as mA. Then simply include mA in a formula, as in:

= mA + 100

and then press CTRL-SHIFT-ENTER. Subtraction, multiplication, and division are handled in the same way.

Elementwise Operations

Assume that mA and mB are two ranges of the same size (say, {20*1}). To create the vector sum, select an empty {20*1} range on the sheet, type in the formula:

= mA + mB

and then press CTRL-SHIFT-ENTER.

You might wish to name the resultant range (e.g. mC) for future reference.

Any two arrays of the same size can be added in this manner, with the result placed in a range of the same size.

Elementwise subtraction, multiplication, and division are handled in the same way.

Matrix Operations: Matrix Multiplication, Inverse, and Transpose

To multiply two matrices, use the MMULT function. Thus, if mA and mB are compatible for multiplication, you could compute the value of a portfolio with the formula:

= MMULT(mA, mB)

If mA is an invertible matrix, produce the inverse of a matrix with MINVERSE:

= MINVERSE(mA)

TRANSPOSE(mA) produces the matrix transpose.

Example:

Suppose mA and mB are both {20*1}. The matrix product is not defined. But you produce the scalar product as

= MMULT(TRANSPOSE(mA), mB)

with ctrl+shift+enter or, more simply, as

= SUMPRODUCT(mA, mB)

with just enter.

Spreadsheet Exercises

Exercise: Frequency Chart

For this exercise, you need a file temp.csv containing one column of integers with some duplicates.

Open temp.csv as a spreadsheet; it should have one column. Copy these to column B.
Reduce column B to unique values

https://subversion.american.edu/aisaac/notes/spreadsheet-intro.xhtml#unique-values-remove-duplicates
Sort column B.
In column C, create a count for each unique value (using COUNTIF)
Make a frequency chart for your values.

https://subversion.american.edu/aisaac/notes/spreadsheet-intro.xhtml#frequency-chart

Additional Resources

Choice of Software

Major spreadsheet applications include Microsoft Excel (the current industry leader), Google Sheets, LibreOffice Calc, and Gnumeric. All but the first are gratis, and the last two are free and open source software (FOSS).

Spreadsheets share many features, but menu layout and keyboard shortcuts vary by application. The keyboard or menu details used below assume a Windows operating system and Excel 2016. Often the variations required on other operating systems or with other spreadsheet implementations are minor.

Comparison Resources:

Windows vs Mac keystroke comparisons:: http://www.mtholyoke.edu/lits/labs/docs/MacWinKBeq.htm
Calc vs Excel: https://wiki.openoffice.org/wiki/Documentation/UserGuide/Migration_Guide/Calc_and_Excel

Analysis ToolPak

Some material in this appendix relies on Excel’s Analysis ToolPak. Up to Excel 2016, the Data Analysis ToolPak is not installed by default. The installation procedure has changed across versions, so do a websearch for you version. In Excel 2016 for Windows, you can install it by going to File » Options » Add-Ins » Manage Add-Ins.

Download HTML Data from Web Pages

Excel can extract data from the tables in web pages. You just need the web address (URL) of the page.

Open the spreadsheet for this new data.
In the Data tab, click From Web.
Paste the web address (URL) into the popup box.
Connect to the page anonymously.
One connected, locate the desired table data under Display options.
Check that the data is displayed correctly and then Load it into the spreadsheet.

Data from FRED

The Federal Reserve Economic Database allows programmatic access, but you need to

register for an API key.
install the FRED add in (video)
use the add in to get data (1, 2)
explore the use of 1date

Excel Add-In

Download the FRED Excel add-in from https://research.stlouisfed.org/fred-addin/ and install by browsing to it in the Add-Ins menu. (You can use the 2013 add-in with Excel 2016.)
Add M1NS and CPIAUCSL as headers in the first row and click "Get FRED data".
Change to a common data range by starting in Jan 1959, following the instructions on https://www.youtube.com/watch?v=OAH-wGtP_k8
Create a multiseries graph of the two series with the build graph button.
Switch to a ratio scale on the y-axis to compare the two series.

Introductory Videos

Many good videos discuss spreadsheet use at all levels. I consider the GCFLearnFree Excel tutorials to offer accessible coverage of introductory and intermediate topics. Microsoft also produces some helpful introductory videos.

Formulas:: Microsoft’s Getting Started with Formulas tutorial
Basic usage:: GCFLearnFree’s Getting Started with Excel
Relative vs absolute cell references:: GCFLearnFree’s Relative and Absolute Cell References

Line-Chart Introduction:

The GCFGlobal introduction to Excel Charts is very good and is supported by a helpful video.

Correlation and Regression

Video: Correlation and Regression

Glossary of Spreadsheet Terms

Absolute Cell Reference

A reference that does not change when it is copied. This produced by putting a $ before the reference letter and number (ex: $A$1). (Also see: Relative Reference.)

Active Cell

The cell listed in the Name box, which is currently selected for data entry. It is typically highlighted by a distinct border.

Anchor Cell

The first cell that is selected in a range. If only one cell is selected in the sheet, it is the anchor cell. It is typically highlighted by a distinct (white) background color.

Automatic recalculation

This is the typical spreadsheet default, which eliminates the need to explicitly request value recalculation after a change to a cell. May be turned off in a very large spreadsheet.

Bar or Column Chart

Used to summarize and compare categorical data. The length of each bar represents the value assigned to that particular category. Bars run horizontally; columns run vertically.

Cell

The rectangular area formed by the intersection of a column and a row. A cell may a numerical values, a strings, or a formula. Identify a cell with a letter-number cell reference, giving its row and column. (Cells may also be named.)

Cell Reference

Typically the row and column of the cell, but a cell may also be reference by an assigned name.

Column

An Excel spreadsheet can have 1,048,576 rows and 16,384 columns. A column displays vertically and is labeled with letters of the alphabet A–Z, then AA–AZ, etc.

Column Chart

See Bar Chart.: Back To Top

Fill

Quick data copy from the anchor cell to an adjoining range, updating the data as appropriate. For example, if the anchor cell contains a formula with relative cell references, those references will automatically update relative to their position when copied to a new location.

Fill Handle

he small bold square in the bottom right corner of a cell, be used to copy (fill) data to adjacent cells in the same row or column. Hover over the fill handle box until the mouse pointer changes to a plus sign. Then hold the left mouse button while selecting the adjacent cells to copy to. Releasing the mouse button will fill the content.

Filter

Show only data that meet a criterion.

Formula

An expression in a cell, beginning with an equal sign "=" and using cell references and Excel functions. For example, in cell C1 enter =A1+B1 to display that sum in C1.

Formula Bar

The display line above the column headings of a spreadsheet. It displays the contents of the active cell.

Freezing

Force some cells to display. For example, if a spreadsheet contains many rows, freezing the row containing the heading labels allows scrolling down while still viewing the headings.

Gridlines

The horizontal and vertical lines displayed to separate cells in a spreadsheet.

Label

Non-numeric text typed into a spreadsheet cell.

Locked cell

A cell (or many) can optionally be given a read-only status, in order to prevent accidental overwriting. Uses this for cells containing important formulas or data that should never change (e.g., model parameters). This feature must be activated in the file preferences.

Name Box

The box to the left of the formula bar, which displays the name of the current cell. This is typically the letter-number cell reference, unless the cell is assigned a special name.

Pie Chart

A circular chart that visually represent proportions of a whole by the relative size of sections (“slices”).

Adjacent Range

A rectangular group of cells. identified by two colon-separated cell references that are opposite corners of the rectangle (e.g., A1:B3). A range may be assigned a name.

Relative Reference

A cell reference that changes when copied. Relative references do not contain dollar signs ($).

Rows

An Excel spreadsheet can have 1,048,576 rows and 16,384 columns. A row displays horizontally and is numbered.

Sheet Tabs

Tabs below the worksheet grid, used to switch from one worksheet to another in a workbook.

Sort

Impose an ordering on existing data. Sorts can be ascending or descending. Ties can be resolved by nested criteria.

Value

The actual value contained by the cell. This may differ from the displayed value (e.g., due to rounding).

Workbook

A collection of worksheets that bundled together a a single file.

Worksheet

A single grid of columns and rows. A workbook may contain multiple worksheets, which can reference each other by name.

References

version:: 2022-12-05

Basic Introduction to Spreadsheets

Spreadsheet Basics

Cells, Sheets, and Workbooks

Entering Values

Entering a Formula

Local Variables with LET

Ranges

Range Intersection and Union

Arithmetic Sequences

Copying Formula

Names in Formulas

Exercise: Annualizing Data

Background

Entering a LAMBDA Formula

Line Charts

Run-Sequence Line Chart

Time-Series Line Chart

Exercise: Population Time-Series

Real GDP Time-Series Chart

Getting the Data

Plotting the Data (Excel)

Edit for Improved Information Display

Questions

Basic Statistics

Univariate Descriptive Statistics

Five-Number Summary

Box and Whisker Charts

Multivariate Descriptive Statistics

Covariance and Correlation

Regression Analysis: Basic Concepts

Preliminaries in Excel

Estimating the Sustainable Population

Bar Charts

Categorical Bar Chart: String vs Numerical Categories

Numerical Categories

Data-Based Bar Chart

Getting the U.S. Data

Creating the Basic Column Chart

Creating the Percentages Column Chart

Function Iteration

Basic Concepts

Exercise: Exponential Population Projection via Function Iteration

Solution: Step-by-Step Exponential Population Projection

Exercise: Logistic Population Projection via Function Iteration

Solution: Step-by-Step Logistic Population Projection

Tables and Arrays

Structured Reference Tables: Some Basics

Creating a Table

Name a Table

Append a Column to a Table

Table Example: Refer to Column by Name

Table Example: Function Plot

Edit the Chart

Table Example: Rectangular Data

Table Alternatives

Named Ranges

Named Ranges Example: Function Plot

Array Formulas

Simplest Array Formula Example

Array Formulas with Functions

Random Numbers

Core Random Number Generators

Random Decimal in \([x_\min,x_\max)\)

Recalculation

Random Integer Data

Random Integer Data (Table Version)

Random Variates: Analysis ToolPak

Pivot Tables

Pivot Table Exercise

Value Counts

Simple-Minded Frequency Chart

Simple-Minded Relative-Frequency Chart

Frequency Chart Redux

Pivot Tables and Ordinary Charts

Surface Chart

3D Plots

Grided Data

Lorenz Curve

Cumulative Sum

Cumulative Sum (Running Total): Simplest