Basic Introduction to Spreadsheets
Spreadsheet Basics
Cells, Sheets, and Workbooks
Spreadsheet software allows for interactive data entry and manipulation. Launch a spreadsheet application, and create a new blank workbook. This will contain a single spreadsheet, but you can add many spreadsheets to a single workbook. A workbook collects together several spreadsheets and provides facilities for these sheets to interact.
Visually,
a spreadsheet is a rectangular collection of cells,
which may hold data or formulas (or be empty).
Each cell in a spreadsheet has an address,
which combines its column letter with its row number.
For example, A1
is the address of the left-most, top-most cell.
By means of a formula that uses these addresses,
the value in a cell may depend on the values in other cells.
For example, a cell may contain a formula
that computes the mean of the values in some other cells.
Entering Values
To enter a value in a cell, select the cell and type in the value. The information you enter will be stored as a number, a string, or a boolean value (True or False). A cell may display a number in various ways, e.g. formatted as a date, but numbers are always stored in the same way. (Specifically, as double-precision floating point values.)
Entering a Formula
Instead of a value, a cell may hold a formula.
To begin entering a formula in an empty cell,
select it and enter an equals sign (\(=\)).
The calculation may be entirely numeric—for example, =2+2
.
However, a formula typically refers to the values of other cells.
Refer to a cell by using the row and column of that cell.
Use standard operators for addition, subtraction, multiplication, and division.
Use ^
for exponentiation.
Operator |
Operation |
Example |
---|---|---|
|
addition |
=1+2 |
|
subtraction |
=A1-2 |
|
multiplication |
=A1*B2 |
|
division |
=1/A2 |
|
exponentiation |
=1^2 |
Local Variables with LET
In Office 365,
the LET
command can give names to specified values.
This can make formulas easier to read
by using these names to produce the results.
To use LET
,
introduce a sequence of variable names and values,
and finally enter the expression using the names.
The variable names are local:
they have meaning only inside the LET
command.
They do not have meaning elsewhere in the spreadsheet.
Example: =LET(x,5,3+2*x+x^2)
Example: =LET(x,A1,3+2*x+x^2)
Example: =LET(x,A1,y,A2,3*y^2+2*x*y+x^2)
Ranges
A rectangular group of adjacent cells is known as a range of cells.
Two cell addresses separated by a colon define a range of cells (e.g. A1:D4
).
The colon separates the top left cell and the bottom right cell of the range.
Range Intersection and Union
Use space separated ranges to specify the intersection of two ranges (e.g. A1:D5 B2:E3
).
(An empty intersection is an error.)
Use comma-separated ranges to specify the union of two ranges (e.g. A1:B5,D1:E5
).
For example, it is possible to sum over the union of non-adjacent cells.
Arithmetic Sequences
The SEQUENCE
command produces arithmetic sequences of values.
The simplest use of this command produces a column of consecutive positive integers.
Exercise:
In cell A1
enter a name for your sequence, say myseq
.
Select cell A2
and enter =SEQUENCE(10)
in order to create 10 sequence values in the column.
Copying Formula
Exercise:
In cell B1
enter a name for a transformed sequence, say myseq2
.
Select cell B2
and enter =LET(x,A2,x*x)
.
Copy cell B2
, select cells B2
through B1
, and paste.
This creates squared values for all of the original sequence values.
Names in Formulas
Select the myseq
column,
including the label.
From the menus, pick Formulas » Create from Selection
.
In the dialog box that opens,
designate the location that contains the label by selecting the Top row check box.
Click OK.
This creates a named range.
You can now do elementwise computations by using the name.
For example, select cell C2
.
Enter mseq*myseq
to create the same sequence of squared values.
Exercise: Annualizing Data
Background
The rate of change of a variable is its change per unit of time. The proportional rate of change of a variable is its relative change per unit of time. A proportional rate of change is also called a growth rate. Multiply a proportional rate of change by \(100\) to produce a percentage growth rate per period.
- Rate of Change per period:
\(x_t - x_{t-1}\)
- Proportional Rate of Change per period:
\((x_t / x_{t-1}) - 1\)
- Continuous growth rate per period:
\(\log[x_t] - \log[x_{t-1}]\)
To annualize is to adjust a change or growth rate over a period to produce the implied change or growth rate over a year. Data series are observed at different frequencies. To improve comparability, we often annualize the rate of change or the growth rate. Suppose there are \(p\) periods per year. (E.g., \(p=4\) for quarterly data, or \(p=12\) for monthly data.)
- Annualized Rate of Change:
\(p(x_t - x_{t-1})\)
- Annualized Growth Rate:
\((x_t / x_{t-1})^p - 1\)
An annualized rate of change will provide a better prediction for a quantity that tends to change at a constant rate. Such a quantity is a linear function of time. An annualized growth rate (or percentage growth rate) will provide a better prediction for a quantity that tends to grow at a constant rate. Such a quantity is an exponential function of time.
The covid19 pandemic of 2020 led to radical fluctuations in the growth rate of US GDP. As of January 2021, the BEA estimated that the quarterly GDP estimates (at an annual rate) were \(21561.1\), \(19520.1\), \(21170.3\), \(21,494.7\). Enter this data in a spreadsheet and then compute the following.
Find the quarter-to-quarter rates of change of US GDP.
Annualize the quarter-to-quarter rates of change of US GDP.
Find the quarter-to-quarter growth rates of US GDP.
Annualize the quarter-to-quarter growth rates of US GDP.
Entering a LAMBDA Formula
- Video: How to use LAMBDA to make Custom Functions in Excel
- Documentation:
https://www.microsoft.com/en-us/research/blog/lambda-the-ultimatae-excel-worksheet-function/
Line Charts
A simple two-dimensional line chart connects a sequence of points with line segments. Given a sequence of univariate data, the first coordinate of the data may be provided by the index of the observation. This is a run-sequence line chart. Line charts of one of the simplest displays of univariate data.
Run-Sequence Line Chart
Run-sequence charts are used with ordered univariate data. The data is considered to be a sequence: order matters. The values are therefore plotted in order.
- Dimensionality:
2D
- Points:
\((i, x_i)\) where \(i\) is the index and \(x_i\) is the i-th data value
- Horizontal axis
index values
- Vertical axis
data units
If the points are plotted but not joined, we often call the chart a run-sequence plot. If the points are joined, we often call the chart a run-sequence line chart.
A run-sequence chart may allow easy detection of behavior regularities or changes in the behavior of the sequence
cyclicity
autocorrelation
shifts in level
shifts in dispersion
Resource: http://www.itl.nist.gov/div898/handbook/eda/section3/runseqpl.htm
Time-Series Line Chart
Time-series charts are also used with ordered univariate data, when they are indexed by time. Time-series data is inherently sequenced: order matters and is determined by the time index. The values are therefore typically plotted in order.
- Dimensionality:
2D
- Points:
\((t, x[t])\) where \(t\) is the time and \(x_t\) is the data value at time \(t\).
- Horizontal axis
time units (e.g., date-time values)
- Vertical axis
data units
If the points are plotted but not joined, we often call the chart a time-series plot. If the points are joined, we may call the chart a time-series line chart. Since the data for a time-series chart typically includes time as well the values of the variable of interest, we use a spreadsheet scatter plot for the chart.
Exercise: Population Time-Series
Start with a blank workbook.
Enter the world population data from the table below. (Data Source: US Census Bureau.)
Select the cells containing the data.
From the
Insert
tab, pickScatter Chart
, and pick on with lines.
year |
population(millions) |
---|---|
-10000 |
1 |
-8000 |
5 |
-6500 |
5 |
-5000 |
5 |
-4000 |
7 |
-3000 |
14 |
-2000 |
27 |
-1000 |
50 |
-500 |
100 |
-400 |
162 |
-200 |
150 |
1 |
170 |
200 |
190 |
400 |
190 |
500 |
190 |
600 |
200 |
700 |
207 |
800 |
220 |
900 |
226 |
1000 |
254 |
1100 |
301 |
1200 |
360 |
1250 |
400 |
1300 |
360 |
1340 |
443 |
1400 |
350 |
1500 |
425 |
1600 |
545 |
1650 |
470 |
1700 |
600 |
1750 |
629 |
1800 |
813 |
1850 |
1128 |
1900 |
1550 |
1910 |
1750 |
1920 |
1860 |
1930 |
2070 |
1940 |
2300 |
1950 |
2400 |
The default axis scale in most spreadsheets is linear. This means that a given distance along the axis represents a fixed change in the level of the variable. Series that grow or shrink over time are often better represented by a ratio scale. This means that a given distance along the axis represents a fixed percentage change in the level of the variable. Change the scale along an axis as follows.
In your population chart, click on the graph axis.
From the menu, pick
Format » Axis…
. This displays theFormat Axis
window.Under
Axis Options
tab, check theLogarithmic Scale
check box.
Real GDP Time-Series Chart
Getting the Data
Get time-series data for real GDP from the Federal Reserve Economic Database. Here is one way:
Click Download » CSV Data and download the data as GDPC1.csv
Plotting the Data (Excel)
Load the data in your favorite spreadsheet.
Adjust the width of the DATE column if needed, so that all dates display correctly.
Select all the data (including the headers).
Pick Insert » Charts » 2D Line Chart
Change size and axes ranges to better display the data. (This may take a little fiddling.)
Edit for Improved Information Display
Reformat the vertical axes to a logarithmic scale.
Change the title to Real GDP
Questions
What is real GDP?
How does real GDP in the US tend to behave over time?
Explain why the logarithmic chart is a more useful way to visualize the GDP data.
Basic Statistics
Univariate Descriptive Statistics
Excel includes a tool for descriptive statistics in the Analysis ToolPak add-in.
Pick Data » Data Analysis » Descriptive Statistics
.
Input the range for your data and whether there are labels in the first row.
Check Summary Statistics
and click OK
.
For each data column, you will see the following values.
- Mean (
=AVERAGE(range)
) A measure of central tendency; the arithmetic average of the values.
\begin{equation*} \bar{x} = \frac{1}{N} \sum_{n=1}^{N} x_n \end{equation*}- Sample Variance (
=VAR.S(range)
) A measure of dispersion around the mean; includes the Bessel correction.
\begin{equation*} s^2=\frac{1}{N-1}\sum_{n=1}^{N} (x_n - \bar{x})^2 \end{equation*}- Sample Standard Deviation (
=STDEV.S(range)
) Another measure of dispersion around the mean; the square root of the sample variance.
\(s = \sqrt{s^2}\)
- Standard Error (of the mean)
=STDEV.S(range) / SQRT(COUNT(range))
A measure of the dispersion of the sampling distribution of the mean.- Median (
=MEDIAN(range)
) The middle-most value after sorting the data. (Calculated as the average of the two middle-most values, if there are en even number of data points.)
- Mode (
MODE.SNGL(range)
) The most frequently occuring value, if one exists.
- Kurtosis (
=KURT(range)
) A measure of the likelihood of extreme values. Excel reports excess kurtosis (relative to the normal distribution). Positive kurtosis means the data contain has more extreme values than a normal distribution. Negative kurtosis means the data contain fewer extreme values than a normal distribution.
- Skewness (
=SKEW(range)
) Measures asymmetry around the mean. Positive skewness means that tail is longer and/or fatter to the right. Negative skewness means that tail is longer and/or fatter to the left.
- Minimum (
=MIN(range)
) The minimum value of the numbers in the range.
- Maximum (
=MAX(range)
) The maximum value of the numbers in the range.
- Quartile (
=QUARTILE(range, n)
) The \(n\)-th quartile of the numbers in the range.
- Range (
=MAX(range) - MIN(range)
) The range of values, found by subtracting the minimum value from the maximum value.
- Sum (
=SUM(range)
) The sum of the values.
- Count (
=COUNT(range)
) Count of the number of observations in this dataset.
Not reported:
- Sample Variance (without Bessel correction)
=VAR.P(range)
A measure of dispersion around the mean:\begin{equation*} {\hat\sigma}^2=\frac{1}{N}\sum_{n=1}^{N} (x_n - \bar{x})^2 \end{equation*}- Sample Standard Deviation (without Bessel correction)
=STDEV.P(range)
A measure of dispersion around the mean: \({\hat\sigma} = \sqrt{{\hat\sigma}^2}\)
Five-Number Summary
On first encounter with a univariate data set,
researchers commonly consider its five-number summary.
The five numbers are the minimum, first quartile,
median, third quartile, and maximum.
Excel does not build in a five-number summary,
but the functions listed above can easily produce one.
In fact, since these five numbers are quartiles
0, 1, 2, 3, 4 of the data,
the QUARTILE
function is adequate on its own.
Box and Whisker Charts
A box and whisker chart provides a useful summary
of the distribution of a variable.
The box shows bounds the interquartile range,
with an interior line for the median,
while the whiskers show variation beyond the interquaritle range.
(See the Basic Statistics supplement for details.)
to produce and box and whisker chart in Excel,
select a column of data and then click
Insert » Insert Statistic Chart » Box and Whisker
.
See the Microsoft documentation for more details.
Multivariate Descriptive Statistics
Covariance and Correlation
See the Basic Statistics supplement for details.
Regression Analysis: Basic Concepts
Regression analysis is a type statistical modeling. We estimate the response of a dependent variable (say, \(y\)) to one or more independent variables (say, \(x\)). Suppose in the real world the two are related as follows:
The dependent variable is the variable explained by the regression model. We hope to use the independent variable to understand and predict the dependent variable. The problem is that the relationship is noisy: the \(\epsilon\) term is random. This is why we have to estimate the sensitivity of \(y\) to \(x\).
Independent variables are explanatory variables. They should be the factors that our model indicates can influence the dependent variable. We estimate a model by finding parameters that minimize the distance of our fitted model from the observed data. The measure of distance is the sum of the squared differences between the observed values of \(y\) and the fitted values of \(y\). The regression routine of a spreadsheet will find an estimated value for \(\beta\) that makes this distance as small as possible. Call that value \(\hat{\beta}\).
Preliminaries in Excel
Excel includes a regression tool in the Analysis ToolPak,
which is an Add-In that must first be enabled in the Excel Options.
Once enabled, we can pick Regression
from the Data Analysis
group of the Data
tab.
Fill in the range for the dependent variable and the independent variables.
Picke the New Worsheet Ply
option for output,
and click OK
to see the regression results.
The estimated sensitivities are listed in the Coefficients
column.
When you select a single independent variable there are stil two estimated coefficients
because Excel automatically includes a constant.
A small P-value
(say, below \(0.10\)) means that
a coefficient estimate so far from \(0\)
would be very unlikely if the true value were \(0\).
Estimating the Sustainable Population
The logistic growth model of the Logistic Growth lecture
implies a linear relation
between the population growth rate and its level:
\(y_t = m x_{t-1} + g\),
where \(x_t\) is the population level in period \(t\)
and \(y_t = (x_t - x_{t-1})/x_{t-1}\) is the population growth rate.
Together the intercept and slope imply \(\bar{P}=g/m\).
Using a half century of U.S. population data from the FRED,
estimate the \(\bar{P}\) for the U.S.
(Use the annualized series B230RC0A052NBEA
.)
Bar Charts
Bar charts can be horizontal bar charts (which Excel calls bar charts) or vertical bar charts (which Excel calls column charts).
Video: http://www.gcflearnfree.org/excel2016/charts/1/
Categorical Bar Chart: String vs Numerical Categories
Suppose you have the following data in a spreadsheet.
cat1 |
10 |
cat2 |
15 |
cat3 |
5 |
In the spreadsheet, select this data and then insert a 2d column chart, which is also called a vertical-bar chart. The first column will be treated as the categories and you will get a nice vertical-bar chart for you categorical data. Note that the order of the categories is determined by their order in our data set. They are not considered to have a separate sort order.
Numerical Categories
If you try to use numbers for your category names, it may create extra work. Suppose you have the following data:
1 |
10 |
2 |
15 |
3 |
5 |
If you select this data and insert a 2d column (vertical bar) chart, a spreadsheet will generally treat the first column as numerical data, rather than as category names. In this case, you may get a “clustered” vertical bar chart. If the first-column numbers are actually your categories (e.g., bin edges), you will need to choose a different graph style.
- Excel:
Choose "more column charts" and find the right type.
- Google Sheets:
Check "Use column A as labels"
Data-Based Bar Chart
Create a bar chart for the components of GDP. Then remake the chart in percentage terms.
Getting the U.S. Data
Find the BEA news release for GDP.
In the “Text Full Release and Tables”, find Table 3: Gross Domestic Product.
Find values for C, I, G, and X-M.
Creating the Basic Column Chart
In a new spreadsheet, enter headers C, I, G, and NX in a rows
A1:D1
.In the next row (immediately below the headers) fill in the data (from the news release).
Select the headers and data.
Insert a 2D column chart.
Creating the Percentages Column Chart
In cell
F2
, enter=SUM(A2:D2)
.Select cell
F2
and change the name box to GDP. (When you define a name, Excel defaults to using absolute cell reference, which is what we want.)In cell
A3
enter=A2/GDP`
, and then copy this formula to the next 3 cells.Select the new cells, right click, and format the values to percentages.
Select the headers and your values in row 3, and insert a 2D column chart.
Function Iteration
Basic Concepts
Recall that to iterate is to repeat an action. Function iteration is the repeated application of a function to produce a sequence of values. (See the Glossary for details.)
Exercise: Exponential Population Projection via Function Iteration
Look up the world population. Assuming a growth rate of 1% per year, produce annual projections for the next 50 years. Create a simple line chart of your projections.
Solution: Step-by-Step Exponential Population Projection
Iterative approach:
Open a new spreadsheet.
Enter the current population in
A1
.In
A2
enter the formula=1.01*A1
.Copy this formula and paste it into
A2:A51
.Create a line chart of your population projections.
Hint: In Excel you can select a range of cells by entering the range specification into the name box. In Google Sheets, you can select a range of cells by creating a named range.
Closed-form approach:
In
B1
, enter the formula=7.75*POWER(1.01,ROW()-1)
.Copy this formula and paste it into
A2:A51
.Compare with the results of your iterative approach.
Exercise: Logistic Population Projection via Function Iteration
Look up the world population. Assume population growth depend on the current population, according to:
where \(p\) is the current population relative to the carrying capacity of the planet. Assume a carrying capcity of 11.5 billion. Produce annual projections for the next 50 years. Create a simple line chart of your projections.
Solution: Step-by-Step Logistic Population Projection
Iterative approach:
Open a new spreadsheet.
In
A1
. enter the current population divided by the carrying capacity.In
A2
enter the formula=A1+0.3*A1*(1-A1)
.Copy this formula and paste it into
A2:A51
.In column
B
, scale the values from columnA
by the carrying capacity.Create a line chart of your population projections.
Hint: See the discussion of exponential growth (above) for useful hints. Note that we do not have a closed-form for the population.
Tables and Arrays
Structured Reference Tables: Some Basics
Since Excel 2007, Excel provides a structured data type within a sheet, known as a structured reference table. This is sometimes just called a “table”. (In ancient times, is was called a “list”.) A table is rectangular and generally has column labels.
Tables provide convenient data handling. As a simple example, a formula entered in a column is treated as an array formula and immediately autofilled. Tables can also be named, allowing structured references to their data. For example, formulas can simply refer to column names (with a bracket notation).
Creating a Table
Select a rectangular array of cells that has a header row of labels and then the data below.
From the menu, pick Insert » Table, check the “My table has headers” box, and click OK.
Name a Table
Select any cell in the table.
From the menus pick Design » Table Name.
Enter a new table name.
Append a Column to a Table
Approach 1:
Select the cell beside your table’s column headers.
Enter a name for your new column header.
The new column is automatically added to your table.
Approach 2:
Select any cell in the last column of your table.
Pick from the menus, Home » Cells » Insert » Insert Table Columns to Right
After you create a new column, you will usually want to change its name.
Table Example: Refer to Column by Name
Table columns can be accessed by name.
E.g., if the table is named data
and the column is names c1
,
you can refer to the column by name.
data[c1]
(When you are working within a single table, you can omit the table name.)
This is called a structured reference.
Table Example: Function Plot
We will illustrate the use of tables by doing a function plot of \(\sin(x)\).
Note that the SIN
function takes one argument:
an angle measure in radians.
We will create points in \([0..2\pi]\),
and evaluate the function at each of these points.
Open a blank workbook.
Use the
Name Box
to selectA1:B102
Pick
Insert » Table
or clickctrl+t
, check “My table has headers”, and clickOK
.Change the headers to
x
andsinx
.In
A2
enter the formula=2*PI()*(ROW()-2)/100
; note how the rest of the column is filled in, creating the domain values for our function.In
B2
, enter the formula=SIN([@x])
; note how the rest of the column is filled in, creating the range values for our function.
Note how we were able to refer to another column in the table by placing its header in brackets. We are now ready to create a chart from the table data.
Click anywhere in your table.
Insert a scatter chart (
Insert » Charts » Insert Scatter
).Be careful: in early versions of Excel 2016, moving this chart to a new sheet can cause Excel to hang.
Edit the Chart
Make your charts look good.
Select the graph and move it up to the top of the worksheet near the list of values. (Or move it to its own sheet.)
Drag the boundaries to get a pleasing aspect ratio.
Get rid of the grid unless your have a need for it.
Choose a background color. (Do not be dramatic.)
Table Example: Rectangular Data
Import rectangular data into a spreadsheet.
Select the data
press
ctrl-t
to open the table dialogue; create the tablerename the table (
Design » Table Name
)
Table Alternatives
Excel tables are great, and they are missing from many other spreadsheets. But they are not irreplaceable. Think of a table as a of mini sheet with automatic named ranges via structured references. The handling of formulas is also just a convenience; it is largely equivalent to array formulas.
- Sheets: named ranges
Named Ranges
Most spreadsheets support named ranges, which allow easy array-oriented computations. A named range is just what is sounds like: a range of cells that can be referenced by name.
Example:
choose from the menus
Formulas > Define Name
and fill out the dialogtap the
OK
button to make a named range from the selected cells
Note:
The menu location for named-range creation is application specific.
(E.g., Data » Named Ranges ...
in Sheets.)
Named Ranges Example: Function Plot
Example:
We will create 101 evenly-spaced values in \([-1,1]\) for our domain (\(x\) values) and then plot a function over this domain.
Open a blank workbook and enter
x
in cellA1
andy
inB1
.Create the \(x\)-values.
Tap in the
Name Box
(to the left of the Formula Bar) and enterA2:A102
; the range of cells that you want to fill.Type
=(ROW()-52)/50
in the Formula Bar, and thenctrl-shift-enter
This copies the formula to all cells; and gives you the domain values for your function.
Make a named range for your 101 new cells; name it
x
.Select
B2:B102
, type the formula=x*x
, and then pressctrl+shift+enter
; this gives you the range values for your functionselect a location for your scatter plot, and insert a scatter plot. Right click on the plot body, and choose
Select Data
, and set the range tox,y
. (European users may need a semicolon rather than comma for theunion
operator.)Edit the chart as before.
Array Formulas
We can use array formulas to fill spreadsheet ranges with an entire array at one go. An array formula that fills multiple cells is called a multi-cell formula; all the filled cells depend on the single formula.
Simplest Array Formula Example
In a new sheet, select A1:A3
and type the formula:
={1;2;3}
To enter this as an array formula, use Ctrl+Shift+Enter
.
(For this reason, array formulas are sometimes called CSE formulas.)
Note that we needed to select the proper output area for the result.
The value in braces is called an array constant.
We can do elementwise arithmetic operations with array constants.
In a new sheet, select A1:A3
and type the following formula,
entering it with Ctrl+Shift+Enter
.
={1;2;3}*{4;5;6}
Note:
Google sheets has the ARRAYFORMULA
command:
(It also supports Ctrl+Shift+Enter
, like Excel.)
- ARRAYFORMULA
Array Formulas with Functions
Many functions can be use in array formulas
to produce elementwise results.
Just select the proper output range,
and enter the formula with Ctrl+Shift+Enter
.
Example:
In a new sheet, select A1:A3
.
Then enter the following array formula.
(Remember: Ctrl+Shift+Enter
!)
=ROW()
You will see the row numbers entered in column A.
Next select B1:B3
and enter the following array formula.
(Remember: Ctrl+Shift+Enter
!)
=SQRT(A1:A3)
You will see the square roots entered in column B.
Random Numbers
Core Random Number Generators
The core random number generators are
RAND
and RANDBETWEEN
.
- function:
RAND()
- ouput:
a random decimal number in \([0,1)\).
- documentation:
https://support.office.com/en-us/article/RAND-function-4cbfa695-8869-4788-8d90-021ea9f5be73
- function:
RANDBETWEEN(min,max)
- ouput:
a random integer between
min
andmax
, inclusive.- documentation:
https://support.office.com/en-us/article/RANDBETWEEN-function-4cc7f0d1-87dc-4eb7-987f-a469ab381685
Additional options are usually available (e.g., in Excel, through the Analysis ToolPak).
Random Decimal in \([x_\min,x_\max)\)
=xmin + (xmax-xmin)*RAND()
Recalculation
The random number functions are “volatile”: they are called each time the spreadsheet is recalculated.
Example:
in a new worksheet, enter
=RAND()
inA1
enter
=RAND()
inA2
; note that the value inA1
changespress
shift+F9
for force recalculation of the sheet; note that the values inA1
andA2
change
We can replace formulae with values.
select
A1:A2
copy
pick from menus Home » Paste » Values
Random Integer Data
Example:
Start with a new worksheet.
In A1, enter
RANDBETWEEN(0,10)
, then select and copy (ctrl-c) the cell.Enter
A2:A101
in the Name Box.Paste (ctrl-v) the copied formula into the cell.
You now have random data, but there is an underlying formula. This means that it will be recalculated every time your spreadsheet is recalculated. We usually do not want this dynamic recalculation. So select all the data, copy it (with ctrl+c), and then paste the values (using Paste Special).
Random Integer Data (Table Version)
Example:
Start with a new worksheet.
Enter
data
in theA1
cell.Enter
A1:A101
in the Name Box.From the menus, pick Insert > Table, check the “My table has headers” box, and click OK.
In A2, enter :code`=RANDBETWEEN(0,10)`.
Random Variates: Analysis ToolPak
The Analysis ToolPak (ATP) provides several additional distributions for sampling.
Example: sampling from the standard uniform distribution.
Preliminary: install the Analysis ToolPak: https://support.office.com/en-us/article/load-the-Analysis-ToolPak-305c260e-224f-4739-9777-2d86f1a5bd89
Example:
Choose Data Analysis from the Data tab
Pick Random Number Generation
Choose Binomial with paramters \(p=0.1\) and \(n=5\)
Set an Output Range determined by the number of variates you want
Click OK
Pivot Tables
Pivot tables provide ways to summarize worksheet data.
Basic videos about pivot tables:
http://www.gcflearnfree.org/excel2016/intro-to-pivottables/1/
http://www.gcflearnfree.org/excel2016/doing-more-with-pivottables/1/
Pivot Table Exercise
Pivot tables provide power functionality to regroup and aggregate your data. For example, enter the following lifetable data into a spreadsheet. (More data is available from the Social Security Administration.)
Sex |
Age |
YearsLeft |
---|---|---|
Male |
15 |
61.70 |
Female |
15 |
66.58 |
Male |
20 |
56.91 |
Female |
20 |
61.68 |
Male |
25 |
52.30 |
Female |
25 |
56.83 |
Select the entire table, pick Insert»PivotTable
,
and select the New worksheet
option.
Using the dialog that opens,
drag Age
to Rows
and Sex
to Columns
.
The totals are meaningless, so remove them as follows:
right-click the table, select PivotTable Options
,
and in the Totals & Filters
tab, uncheck the Grand Totals
.
The result is a quick and easy two-way table.
Pivot tables provide the basis for pivot charts.
For example, select the table and pick Insert»PivotChart
.
Pick a line chart to get a nice plot, for each sex,
of life expectancy for each age.
- Simple video introduction:
Value Counts
Create random integer data as above,
except this time you must include a column header.
Suppose the header is mydata
.
We want to determine the number of times each value occurs in the mydata
column.
Pivot tables make this easy,
as long as our data has a column header.
Select the data, including the header.
Pick Insert » Pivot Table » OK; this should create a new worksheet with a pivot table.
In the Fields dialog, check the
mydata
box so that this item appears in the Values area.Drag
mydata
to the Rows area, of the Fields dialog, so that it appears there too. Now your pivot table should show each value in themydata
column along with its sum.In the Value area, click on the arrow beside
mydata
and change the Value Field Settings to Count.
You now have a table of value counts.
Simple-Minded Frequency Chart
Make a table of value counts, as above. We are going to make a frequency chart from this table. This will be a vertical bar chart, which spreadsheets often call a “column chart”. There will be one column for each unique value, with that value's frequency as the column height.
Approach 1: Pivot Chart
click anywhere in your pivot table
pick
Insert » Pivot Chart » Column
to make what Excel calls a Clustered Column chart.
Approach 1: Column Chart
Select the data in your pivot table.
Pick
Insert » Pivot Chart » Column
to make what Excel calls a Clustered Column chart.Change the vertical axis bounds as needed.
We call this approach “simple-minded” because we do not worry about values that have zero frequency.
Simple-Minded Relative-Frequency Chart
Make a frequency chart as above. In the PivotTable Fields dialog box, go back to the Value Fields Settings box, and click the Show Values As tab. Pick % of Column Total, and set the Number Format to General.
Frequency Chart Redux
A table of value counts supports a simple-minded frequency chart.
This approach “simple-minded” because it does not
worry about values that have zero frequency.
To do a little better,
display cells with zero counts.
To illustrate, make a pivot table of counts
for the data 1,2,3,5,3,2
.
Note that the value \(4\) does not occur in this data.
In your pivot table of counts, right-click on one of the the row labels and choose Grouping.
Enter 1,7,1 as the start, stop, and increment values for grouping, and click OK.
Right-click on one of the the row labels and choose Field Settings » Layout and Print. Check the Show items with no data box.
Right-click a pivot table cell and choose PivotTable Options. Fill in For empty cells show: with a \(0\).
Make a column chart (i.e., vertical bar chart) from the pivot table.
Click on the data labels on the horizontal axis, and uncheck out the useless categories.
The overextended range allows us to work around undesirable features of the default Excel display. (That is why we picked an end value of \(7\) instead of the more obvious value of \(5\).) Using the filter to manually unselect the out of range values produces a better display.
Pivot Tables and Ordinary Charts
Fortunately, you can make a regular chart from your pivot table.
Select an empty cell that is not immediately adjacent to your pivot table and insert a blank chart.
Right click on the chart and pick Select Data from the context menu.
Click the Add button, to add a new series. For a vertical bar (column) chart, select the y-axis values.
Once you select the data, you can edit the axis labels in the Select Data Source dialog. For a horizontal bar chart, set the data range to the range containing your labels.
Surface Chart
3D Plots
Excel 2016 includes very weak 3D plotting capabilities, but it can do simple surface plots.
Grided Data
To create a surface plot, Excel 2016 needs a filled rectangular grid of data. In real data, we face the following problems:
we usually have data records \((x,y,z)\) rather than gridded data
there may be missing grid points
We can use a pivot table to turn a complete set of data into grid data.
Select the \((x,y,z)\) data
Insert a pivot table
Drag the \(z\) data to the values
Drag the \(x\) data to the rows
Drag the \(y\) data to the columns
This gives us gridded data. Now we can:
Select the data, including the row and column labels
Insert a surface chart
Lorenz Curve
Cumulative Sum
Consider a sequence of numbers.
A cumulative sum is a running total
of the sequence elements.
For example, the values 1,2,3,4
have a running total of 1,3,6,10
.
The cumulative sum sequence will have the same length as the original sequence.
Cumulative Sum (Running Total): Simplest
Oddly, spreadsheets often lack a function to produce a cumulative sum (also called a running total).
Generate random integer data as described above. We will assume this is in
A1:A100
and that columnB
is empty.Select cell
B1
and enter the formula=SUM(A$1:A1)
. Note the use of both relative and absolute coordinates.Select and copy cell
B1
. SelectB1:B100
and paste in the formula.
Exercise: make a run-sequence line chart of your cumulative sum.
Related video: https://www.youtube.com/watch?v=iWBl--aIhig
Cumulative Sum (Running Total): Better
The first method is the simplest approach and with a small data set it is fine. However, note that we repeatedly sum over the column of data, repeating a lot of the computational work already done. If you have a lot of data, the can cost time.
Here is a better approach,
using the same random integer data.
We assume column C
is empty.
in
C1
enter=$A$1
in
C2
enter=$A2+C1
copy the formula from
B2 to the remaining rows of `B`
Comment: Pay careful attention to the distinction between absolute and relative cell references.
Related video: https://www.youtube.com/watch?v=iWBl--aIhig
Add a Column to a Table
Select a data cell in your table.
Right click to get the context menu.
Pick Insert » Table columns to the Right
.
Cumulative Sum: Table
Assume a table with column x
.
Add a column to the table and enter the formula:
=SUM(INDEX([x],1):[@x])
We use INDEX
to get the row and column in the spreadsheet.
The at-sign (@
) provides a shorthand for [[#THIS ROW],[x]]
.
Without this shorthand, we would have written:
=SUM(INDEX([x],1):[[#This Row],[x]])
The redundant summing is unfortunate, but there does not seem to be a better way to do partial sums with structured references.
Cumulative Moving Average (Running Average)
Using the same data as for our running total,
just change the formula slightly:
divide each cell result by Row()
.
Or if you have a header cell,
so you data starts in row 2,
divide each cell result by (Row() - 1)
.
Cumulative Proportion
To produce the cumulative proportion of data, first produce a cumulative sum. Then deflate the cumulative sum by the total.
This produces an increasing sequence of fractions
with the same length as the original sequence
and a final value of 1.0
.
A cumulative proportion usually has most interest if the data is first sorted.
Sorting
Excel does not provide sorting functions. Instead, we sort a column in place. Often it is best not to manipulate the original data, so first copy it to a new column, which you will sort.
Note: the sorted column will not automatically update when the data column changes!
If you want to do dynamic sorting, consider the LARGE
function.
http://excelbyjoe.com/using-excel-functions-to-dynamically-sort-data/
To sort a column:
Select the data, and then choose
Data » Sort
and sort smallest to largest
Lorenz Curve
We now think of our data as representing income or wealth. Each observation represents one population unit. (E.g., a household.) A Lorenz curve plots the cumulative proportion of the data against the cumulative proportion of the population.
sort the data
produce a cumulative proportion of the sorted data
plot against \(i/N\), where \(i \in [1 .. N]\) indexes the population
Refinements:
a Lorenz curve diagram looks best with an aspect ratio of 1
a nice Lorenz curve includes the point \((0,0)\).
usually we provide a 45 degree line as a reference line for the Lorenz curve; one way to produce this is with a second plot of either series against itself
Basic Charts
There is inconsistent usage of the terms chart, graph, and plot. We can use the term chart for almost any visual display of data (including maps, which are the original use of the term), aside from data tables. Such visual displays of data are called graphical even when they are not called graphs.
Many chart types specify axes, each representing either incommensurate categories or commensurate magnitudes. As long as the data display is at least abstractly determined by points related to axes, people freely substitute the term graph for chart. For example, it is reasonably common to find bar charts referred to as a bar graphs, but we always refer to pie charts (and never to pie graphs). Related to this, when data are visually related to axes, it is much more common to refer to graphing the data than to charting the data.
When all axes represent ordered commensurate magnitudes (e.g., a number line), the chart will often be called a plot or a graph. When individual points are drawn, the chart will most often be referred to as a plot, even when additional information is conveyed by the size or color of the points. Examples include the scatter plot, the line plot (not the line graph), and the bubble plot (more often called a bubble chart).
Roughly then, plots are graphs, which are charts.
However, you will find many exceptions to this usage.
For example, the GraphPlot
command in Mathematica
draws a diagram of a binary relation.
Better names might have been RelationDiagram
or RelationChart
.
Histogram Statistical Chart
We now define a related data-display concept: the frequency histogram. This categorizes data into mutually exclusive bins, and then displays the counts of the data points in each bin. The bins are usually equal-sized, non-overlapping intervals.
In earlier versions on Excel, a basic histogram bar chart had to be created from a categorical bar chart, where the categories are bins and the values are frequency counts. However, Excel 2016 added Histogram as a Statistical Chart. For produce a simple histogram as follows.
Select a column of data.
Pick
Insert » Insert Statistical Chart » Histogram
.Right click on the horizontal axis and adjust the bins as desired.
Histogram: The Old Way
The following discussion of how to produce a histogram prior to Excel 2016 is included only for its use in illustrating other spreadsheet features.
Unique Values: Remove Duplicates
The easiest way to remove duplicate values from a column is to use the Remove Duplicates facility of Data Tools. (See Analysis ToolPak
Select the data
Pick
Data » Data Tools » Remove Duplicates
Keep the current selection.
Click OK.
Notice that you are not given an option to copy to another column. If you want to keep the original column, copy it first, and remove duplicates from the copy.
Unique Values: Filtering
We will determine the unique values in a column by using a data filter.
Select the data.
Go to
Data » Sort & Filter
and click Advanced.Check
Filter the list, in place
orCopy to another location
Click Copy to another location. In the Copy to box, enter a cell reference (the range will be expanded as needed).
Select the Unique records only check box, and click OK.
The unique values from the selected range are copied to the new location. Next, we will sort them.
Value Counts (with COUNTIF)
Suppose your original data is in A1:A1000, and your filtered unique values are in B1:B50. In C1, enter the forumula:
=COUNTIF($A$1:$A$1000,$B1)
Copy this formula into C1:C50. Now you have the values in B and their counts in C. We can use these to make a frequency chart.
Binning
Binning is a partition of the data set. Every data point is allocated to a single bin, based on its satsfaction of the criteria for that bin.
Here we focus on the binning of univariate numerical data. Suppose we have N observations, \(X_1 ,\dots,X_N\). We create K bins by specifying a strictly increasing sequence of boundary points, \(a_{0},\dots,a_{K}\), where \(a_{k} < a_{k+1}\). These K + 1 points determine K bins, which correspond to the intervals \([a_{k}, a_{k+1}]\). The inclusive interval \([a_{0}, a_{K}]\) is called the binning range. Each point within the binning range is allocated to the bin that contains it.
These leave two questions: what do we do with points that do not fall in any bin, and what do we do with points that fall exactly on a boundary. Unfortunately, there is no unique standard for behavior in these cases.
- Boundaries between bins:
Most spreadsheets treat bins as excluding lower bounds and including upper bounds, so we end up binning in intervals of the form \((b_{k} .. b_{k+1}]\). [#pivotbin]_ This is the include-right convention. "A data point is included in a particular bin if the number is greater than the lowest bound and equal to or less than the greatest bound for the data bin." This is the opposite of the convention used by much scientific software, which adopts the include-left convention, producing intervals of the form \([b_{k} .. b_{k+1})\). (However, the default in R is include-right.)
- Values outside the binning range:
Spreadsheets usually bin all values. Values in \((-\infty,a_0]\) are collected in an initial bin. Values in \((a_K,\infty)\) are collected in a final bin. Scientific software is more likely to discard data outide the binning range, but observations on the binning-range endpoints are usually included (in the first or last bin).
Basic Histogram Binning
Mark off the data axis into a number K equal-length intervals. \([a_{k}, a_{k+1}]\). (Usually these cover the whole range of the data points.) The number K should vary with N but not be too small. (Use at least 5 intervals.): A common choice is the square-root of N. (This is widely used in spreadsheets.) Another theoretically based proposal is to let K = Floor[N^(2/5)]. (I.e., rounded down to the nearest integer.) Let h=(Xmax - Xmin)/K be the width of each interval.
For each k=1,...,K, tally the number \(n_k\) of data points falling in the interval \((a_{k}, a_{k+1})\) plus any boundary points (handled as discussed above)
Draw a column-bar chart of adjacent rectangles. Each of the N rectangles aligns with an interval and so has width h. Each rectangle has height \(n_k\). over the interval [a_{k}, a_{k+1}].
Histogram Table: Excel
Before creating a histogram in Excel, you need to install the Analysis Toolpak: https://support.office.com/en-us/article/load-the-Analysis-ToolPak-305c260e-224f-4739-9777-2d86f1a5bd89
Excel bin boundaries are inclusive upper bounds, of the form (bl..bh]
.
“A data point is included in a particular bin if the number is greater than the
lowest bound and equal to or less than the greatest bound for the data bin.”
Data values less than or equal to the smallest specified bin value are binned together. Data values greater than the largest specified bin value are binned together.
"If you omit the bin range, Excel creates a set of evenly distributed bins between the minimum and maximum values of the input data."
http://www.math.kent.edu/~honli/teaching/statistics/Chapter2/Excell_Histogram.html
Histogram Chart
Histogrammed data appears as a column chart, mapping bins to frequencies. Detailed instructions: https://support.office.com/en-us/article/Present-your-data-in-a-histogram-ccc0fc1a-2ccd-490f-a346-87f3084cc26e
To turn a `histogram table`_ into a histogram chart,
select the table,
and from the Insert
tab pick a column-bar chart.
Excel and Calc defaults produce a very bad histogram chart. (Google Sheets has a much better default histogram, but is currently less customizable: e.g., you cannot separately set the edge color on the column bars.)
- Titles:
Add meaningful chart and axis titles, (In older Excel versions, you may also need to remove the legend.)
- Column bars:
make bar columns touch, and remove the fill color.
- Tick labels:
specify the bin range as the tick label
Bar improvment details:
Right click in a column to open the Format Data Series dialog.
Use the Series Options Tab to set the Gap Width to 0%.
Use the Fill and Line Tab to select No fill
.
and a solid black line border.
Tick-label improvment details:
Tick labels constitute the worst problem and are the most painful to fix.
The histogram bars are incorrectly centered on the right boundary of the bin.
(The reason is that Excel is repurposing its Column Chart.)
You need to go to your histogram table and enter correct labels under “Bins”.
For example, for a bin holding value from 5 to 10, you might ender (5..10]
.
(Remember, upper bounds are inclusive.)
Even after all this work, you will produce a decent histogram only if you have used a single bin width. This is because a true histogram should have column width proportional to the bin widths, but Excel will produce common column widths irrespective of bin width. Fortunately, it is often convenient to use common bin widths, as this allows easy visualization of density of observations.
Analysis ToolPak Example:
Here are two approaches to making a histogram chart in Excel:
- pivot table histogram:
- Analysis ToolPak histogram
Begin by creating some data. Copy the following formula into A2:A101:
=NORM.INV(RAND(),0.0,0.25)
In column B, put some a few (sequential) bin boundaries.
Pick DataAnalysis > Histogram
,
put your data range in the Input Range
,
and put your bin boundaries in the Bin Range
.
Be sure to check Chart Output
, then click ok.
Voila!
Relative-Frequency Histogram
We now define a related data-display concept: the relative frequency histogram. To do this, we scale each bar of our frequency histogram by 1/N.
Create a frequency histogram, as above.
Change the height of each bar to :math`n_{k}/N`.
Note that the area of the k-th bar of a relative-frequency histogram is h * n_{k}/N, so the total area of the bars is h.
Scaled Relative-Frequency Histogram
We now define another histogram display concept: the scaled relative-frequency histogram.
This is a frequency histogram with rescaled heights, so that the total area of the rectangles is 1.
We accomplish this by scaling the height of each bar of our frequency histogram by 1/(hN).
Create a frequency histogram, as above.
Change the height of each bar to :math`n_{k}/(N h)`.
The total area of the bars is 1. This scaling of the histogram makes it easy to compare to a plot of a theoretical density function.
To illustrate, make a scaled relative-frequency histogram for 100 draws from a standard normal, and then do the same thing for 100 draws from a mean 1 exponential.
Empirical Distribution
The empirical distribution function based on the data is a distribition function which has jumps at observed values \(X_n\) of size equal to 1/n multiplied by the number of sample points equal to that value.
Given \(t\), we divide the \(X_n\) into two groups: those to the left of t (or equal), and those to the right of t. We count up those in the first group to get \(\hat{F}(t)\). Equivalently, we assign a 1 to observations in the first group, and a 0 to observations in the second group, and then add up these values. (We can formalize this with an indicator function, I[X_n ≤ t].)
Let us construct an empirical distribution function based some univariate data, \(X_1 ,\dots,X_N\).
Suppose our data sample is independently drawn from a single distribution, F. Then our data is a sequence of independent random variables which are all ‘identically distributed’. Given t, the probability of falling in the first group is each time
The value \(N \hat{F}(t)\) is the sum of N iid binary (‘coin-toss’) random variables, all with the same probability p = P(X i ≤ t) = F(t) of being 1.
Thus \(N \hat{F}(t) ~ Binom(N, F(t))\).
The Law of Large Numbers implies that \(\hat{F}(t) \rightarrow F(t)\), with probability close to 1 when n gets large.
Plots of empirical CDFs from data overlaid with theoretical d.f.’s are among the best simple ways of checking visually that the data really fit the theoretical CDF well.
Cobweb Chart
Cobweb charts are also called step charts. Spreadsheets are not the easiest places to produce cobweb charts. You may find this description helpful:
Tornado Charts
Tornado charts are commonly produced as horizontal bar charts. See https://excelchamps.com/blog/tornado-chart/ for an example. Also see https://vimeo.com/34389151
Heat Map
Unfortunately, Excel does not include a heat-map chart (as of 2016). A heat map uses color to enable visualization of a retangular array of data. However, data in a range can be color coded, giving an approximation to the desired visualization.
Heat Map Exercise
1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 1 1
As a simple example, consider the following binary data. Enter this data in a spreadsheet. Select the data and choose Condition Formatting, and then from Color Scales pick the color map of your choice. Colors will be assigned based on the value in the cell, which in this case makes it easier to see patterns in the data. (To remove the numbers, right-click on the selected data and pick Format Cells » Number » Custom and enter and empty string in the dialog.)
This crude heat map is volatile: it immediately reflects any changes in the data. To see this, try changing a few of the values.
Using a Spreadsheet to Graph a Parmeterized Function
Set Up
Start a new spreadsheet.
Put your amplitude parameter (say, 2.75) in cell A1.
Create the Data
Decide on the range and the number of values. We will use [0,1] with 101 evenly-spaced points.
Select cell B1 and enter the formula
=ROW(A1)
.Select and copy cell B1.
Select down to B101.
In Excel, you can use the
GoTo
command:press ctrl-g (or function key f5) and enter destination B101
hold the shift key while clicking
OK
.
Paste your fomula into your selected range (e.g., ctrl-v).
Make a named range for your 101 new cells; name it idx.
If not already selected, select by selecting B1, then press holding crtl+shift+down
Make a named range from the selected cells by choosing from the menus
Formula > Define Name
.
Select cells C1:C101.
You should know how now, but here's another way. Go to cell B1 and press ctrl+down. This takes you to B101. Press the right arrow. This takes you to C101. Press ctrl+shift+up. This selects up to C1.
type the formula
=(idx-1)/100
and then press ctrl+shift+enter; this gives you the input values for your functionmake a named range for these new cells; call it
domain
select cells D1:D101
type the formula
=A1*domain*(1-domain)
and then press ctrl+shift+enter; this gives you your function values over your domain
Create Your Graph
Select the cells C1 through D101
From the “Insert” menu, choose a Scatter chart (with lines but no markers)
Experiment with changing the amplitude parameter
Refine Your Graph
Change the x-axis and y-axis minimum and maximum to 0.0 and 1.0
Select the “Titles” tab and use
x
for the x-label andy
for y-label.Change the chart title to Logistic Map
Change the aspect ratio of the plot to be more square.
Select the “Legend” tab and un-check “Show legend”
Get rid of the gridlines
Error Bar Chart
Array and Matrix Operations
Array and matrix operations use normal functions but are entered with ctrl+shift+enter. Also, you need to select an appropriately shaped range for the result.
A nice feature is that the result will be dynamically updated if you change the underlying data.
Scalar Operations
Let mA be a rectangular array. Let's add a constant to every element of mA. Select an area the same shape as mA. Then simply include mA in a formula, as in:
= mA + 100
and then press CTRL-SHIFT-ENTER. Subtraction, multiplication, and division are handled in the same way.
Elementwise Operations
Assume that mA and mB are two ranges of the same size (say, {20*1}). To create the vector sum, select an empty {20*1} range on the sheet, type in the formula:
= mA + mB
and then press CTRL-SHIFT-ENTER.
You might wish to name the resultant range (e.g. mC) for future reference.
Any two arrays of the same size can be added in this manner, with the result placed in a range of the same size.
Elementwise subtraction, multiplication, and division are handled in the same way.
Matrix Operations: Matrix Multiplication, Inverse, and Transpose
To multiply two matrices, use the MMULT function. Thus, if mA and mB are compatible for multiplication, you could compute the value of a portfolio with the formula:
= MMULT(mA, mB)
If mA
is an invertible matrix,
produce the inverse of a matrix with MINVERSE
:
= MINVERSE(mA)
TRANSPOSE(mA) produces the matrix transpose.
Example:
Suppose mA and mB are both {20*1}. The matrix product is not defined. But you produce the scalar product as
= MMULT(TRANSPOSE(mA), mB)
with ctrl+shift+enter
or, more simply, as
= SUMPRODUCT(mA, mB)
with just enter
.
Spreadsheet Exercises
Exercise: Frequency Chart
For this exercise,
you need a file temp.csv
containing one column of integers
with some duplicates.
Open temp.csv as a spreadsheet; it should have one column. Copy these to column B.
Reduce column B to unique values
https://subversion.american.edu/aisaac/notes/spreadsheet-intro.xhtml#unique-values-remove-duplicates
Sort column B.
In column C, create a count for each unique value (using COUNTIF)
Make a frequency chart for your values.
https://subversion.american.edu/aisaac/notes/spreadsheet-intro.xhtml#frequency-chart
Additional Resources
Choice of Software
Major spreadsheet applications include Microsoft Excel (the current industry leader), Google Sheets, LibreOffice Calc, and Gnumeric. All but the first are gratis, and the last two are free and open source software (FOSS).
Spreadsheets share many features, but menu layout and keyboard shortcuts vary by application. The keyboard or menu details used below assume a Windows operating system and Excel 2016. Often the variations required on other operating systems or with other spreadsheet implementations are minor.
Comparison Resources:
- Windows vs Mac keystroke comparisons:
- Calc vs Excel
https://wiki.openoffice.org/wiki/Documentation/UserGuide/Migration_Guide/Calc_and_Excel
Analysis ToolPak
Some material in this appendix relies on Excel’s Analysis ToolPak.
Up to Excel 2016, the Data Analysis ToolPak is not installed by default.
The installation procedure has changed across versions,
so do a websearch for you version.
In Excel 2016 for Windows,
you can install it by going to File » Options » Add-Ins » Manage Add-Ins
.
Download HTML Data from Web Pages
Excel can extract data from the tables in web pages. You just need the web address (URL) of the page.
Open the spreadsheet for this new data.
In the
Data
tab, clickFrom Web
.Paste the web address (URL) into the popup box.
Connect to the page anonymously.
One connected, locate the desired table data under
Display options
.Check that the data is displayed correctly and then
Load
it into the spreadsheet.
Data from FRED
The Federal Reserve Economic Database allows programmatic access, but you need to
Excel Add-In
Download the FRED Excel add-in from https://research.stlouisfed.org/fred-addin/ and install by browsing to it in the Add-Ins menu. (You can use the 2013 add-in with Excel 2016.)
Add M1NS and CPIAUCSL as headers in the first row and click "Get FRED data".
Change to a common data range by starting in Jan 1959, following the instructions on https://www.youtube.com/watch?v=OAH-wGtP_k8
Create a multiseries graph of the two series with the build graph button.
Switch to a ratio scale on the y-axis to compare the two series.
Introductory Videos
Many good videos discuss spreadsheet use at all levels. I consider the GCFLearnFree Excel tutorials to offer accessible coverage of introductory and intermediate topics. Microsoft also produces some helpful introductory videos.
- Formulas:
Microsoft’s Getting Started with Formulas tutorial
- Basic usage:
GCFLearnFree’s Getting Started with Excel
- Relative vs absolute cell references:
GCFLearnFree’s Relative and Absolute Cell References
Line-Chart Introduction:
The GCFGlobal introduction to Excel Charts is very good and is supported by a helpful video.
Correlation and Regression
Video: Correlation and Regression
Glossary of Spreadsheet Terms
- Absolute Cell Reference
A reference that does not change when it is copied. This produced by putting a $ before the reference letter and number (ex: $A$1). (Also see: Relative Reference.)
- Active Cell
The cell listed in the Name box, which is currently selected for data entry. It is typically highlighted by a distinct border.
- Anchor Cell
The first cell that is selected in a range. If only one cell is selected in the sheet, it is the anchor cell. It is typically highlighted by a distinct (white) background color.
- Automatic recalculation
This is the typical spreadsheet default, which eliminates the need to explicitly request value recalculation after a change to a cell. May be turned off in a very large spreadsheet.
- Bar or Column Chart
Used to summarize and compare categorical data. The length of each bar represents the value assigned to that particular category. Bars run horizontally; columns run vertically.
- Cell
The rectangular area formed by the intersection of a column and a row. A cell may a numerical values, a strings, or a formula. Identify a cell with a letter-number cell reference, giving its row and column. (Cells may also be named.)
- Cell Reference
Typically the row and column of the cell, but a cell may also be reference by an assigned name.
- Column
An Excel spreadsheet can have 1,048,576 rows and 16,384 columns. A column displays vertically and is labeled with letters of the alphabet A–Z, then AA–AZ, etc.
- Column Chart
- See Bar Chart.
Back To Top
- Fill
Quick data copy from the anchor cell to an adjoining range, updating the data as appropriate. For example, if the anchor cell contains a formula with relative cell references, those references will automatically update relative to their position when copied to a new location.
- Fill Handle
he small bold square in the bottom right corner of a cell, be used to copy (fill) data to adjacent cells in the same row or column. Hover over the fill handle box until the mouse pointer changes to a plus sign. Then hold the left mouse button while selecting the adjacent cells to copy to. Releasing the mouse button will fill the content.
- Filter
Show only data that meet a criterion.
- Formula
An expression in a cell, beginning with an equal sign "=" and using cell references and Excel functions. For example, in cell
C1
enter=A1+B1
to display that sum inC1
.- Formula Bar
The display line above the column headings of a spreadsheet. It displays the contents of the active cell.
- Freezing
Force some cells to display. For example, if a spreadsheet contains many rows, freezing the row containing the heading labels allows scrolling down while still viewing the headings.
- Gridlines
The horizontal and vertical lines displayed to separate cells in a spreadsheet.
- Label
Non-numeric text typed into a spreadsheet cell.
- Locked cell
A cell (or many) can optionally be given a read-only status, in order to prevent accidental overwriting. Uses this for cells containing important formulas or data that should never change (e.g., model parameters). This feature must be activated in the file preferences.
- Name Box
The box to the left of the formula bar, which displays the name of the current cell. This is typically the letter-number cell reference, unless the cell is assigned a special name.
- Pie Chart
A circular chart that visually represent proportions of a whole by the relative size of sections (“slices”).
- Adjacent Range
A rectangular group of cells. identified by two colon-separated cell references that are opposite corners of the rectangle (e.g.,
A1:B3
). A range may be assigned a name.- Relative Reference
A cell reference that changes when copied. Relative references do not contain dollar signs ($).
- Rows
An Excel spreadsheet can have 1,048,576 rows and 16,384 columns. A row displays horizontally and is numbered.
- Sheet Tabs
Tabs below the worksheet grid, used to switch from one worksheet to another in a workbook.
- Sort
Impose an ordering on existing data. Sorts can be ascending or descending. Ties can be resolved by nested criteria.
- Value
The actual value contained by the cell. This may differ from the displayed value (e.g., due to rounding).
- Workbook
A collection of worksheets that bundled together a a single file.
- Worksheet
A single grid of columns and rows. A workbook may contain multiple worksheets, which can reference each other by name.
References
Copyright © 2016–2022 Alan G. Isaac. All rights reserved.
- version:
2022-12-05