Tree Ordinance Guidelines
Sampling from populations
Many of the evaluation techniques we describe involve collecting information from
or about discrete units, such as trees, streets, blocks, or residents. In many
cases, it may not be practical to perform a complete census of every unit in the
overall population. However, it is still possible to obtain reliable information
about the overall population by collecting data from a representative subset or
sample. Sampling is simply the technique used to choose representative units for
study from a larger population. Sampling is a prerequisite of several of the assessment
methods discussed in section 3, including photogrammetry,
ground survey, and public polling.
Statistical bias
The reason for using statistically sound sampling methods is to avoid bias
in the estimates of the parameter(s) you are measuring. Although the value of
any single estimate (biased or not) is unlikely to equal the true population
value, the mean of a large number of unbiased estimates will approximate the
true value. In contrast, the mean of a large number of biased estimates will
either be higher or lower than the true population value, depending on the direction
of the bias. Hence, if you are interested in knowing the actual value of a parameter
from the population (e.g., actual percent tree canopy cover), you generally
want to use an unbiased estimator of that parameter. In some situations, a small
bias (e.g., a tendency to slightly over- or underestimate cover) can be tolerated
if the bias is small relative to the standard deviation of the estimation errors
(perhaps 10% to 15% or less).
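The difference between biased and unbiased estimators can be illustrated with a small simulation. The Python sketch below is hypothetical throughout. It builds an artificial population of dot-grid points with about 30% canopy cover, repeatedly draws random samples of 100 dots, and estimates cover two ways: once correctly, and once as a rater would who also counts about 5% of non-canopy dots (for instance, shadows) as canopy. The cover value, sample size, and misclassification rate are assumptions chosen only for illustration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is repeatable

# Hypothetical population: 10,000 dot-grid points, each True if it falls on canopy.
TRUE_COVER = 0.30
population = [random.random() < TRUE_COVER for _ in range(10_000)]
true_value = sum(population) / len(population)

def correct_estimate(sample):
    """Unbiased: proportion of sampled dots that really fall on canopy."""
    return sum(sample) / len(sample)

def shadow_biased_estimate(sample, shadow_rate=0.05):
    """Biased (hypothetical rater): also counts some non-canopy dots, e.g. shadows, as canopy."""
    hits = sum(1 for dot in sample if dot or random.random() < shadow_rate)
    return hits / len(sample)

unbiased, biased = [], []
for _ in range(2_000):
    sample = random.sample(population, 100)  # simple random sample of 100 dots
    unbiased.append(correct_estimate(sample))
    biased.append(shadow_biased_estimate(sample))

print(f"true cover:                 {true_value:.3f}")
print(f"mean of unbiased estimates: {statistics.mean(unbiased):.3f}")
print(f"mean of biased estimates:   {statistics.mean(biased):.3f}")
print(f"SD of unbiased estimates:   {statistics.stdev(unbiased):.3f}")
```

With these settings the mean of the correct estimates should fall very close to the true value, while the mean of the biased estimates should sit roughly 3.5 percentage points higher. That bias is not small compared with the spread of individual estimates (about 4 to 5 percentage points), so by the rule of thumb above it would not be tolerable.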
Bias in estimates can come from various sources. For instance, if tree shadows
are counted as canopy in aerial photo interpretation (misclassification bias),
the canopy cover estimate will be biased upward. In public polling, people who
fail to respond to a survey may constitute a source of sampling bias. If some
segment(s) of the population (e.g., retirees, working couples, low-income households)
are either more or less likely to respond than other population segments, responses
may not be representative of the population as a whole. Many types of bias can
be avoided through good sampling design and the careful implementation of appropriate
evaluation techniques.
Random sampling and random numbers
Most statistical methods are based on the assumption of random sampling.
This simply means that every unit in the population has an equal chance of being
chosen for the sample. Furthermore, the selection of each unit should be
independent of which other units have already been sampled. If you reject a sample
unit because you think it is too close to one already chosen, your sample will
not be random and independent. A relatively simple and reliable method for randomization
is to use random numbers. Most spreadsheet, database, and statistical programs
that run on personal computers have functions that generate random numbers.
Although these random number generators may not be optimal, they will generally
suffice. You can also download random number generators (e.g., http://www.buffalo.edu/~raulin/random.html
or http://nhse.npac.syr.edu/projects/random/)
or look up random numbers from printed tables.
Several techniques can be used to draw a random sample from a population that
consists of individual objects or records (e.g., street addresses or tree numbers).
Many spreadsheet programs, including Microsoft Excel® and Corel Quattro®
Pro, include tools that can produce a random sample of a specified size from
a range of cells. Alternatively, you can assign a unique random number to each
unit or record, sort on the random number, and pick the required number of units
from the top of the sorted database.
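Both approaches can be sketched in Python, assuming the population is available as a simple list of identifiers. The tree numbers below are hypothetical placeholders. `random.sample` draws without replacement, and the second method mirrors the spreadsheet approach of attaching a random number to every record and sorting on it.

```python
import random

# Hypothetical inventory: tree numbers T0001 ... T0500 stand in for real records.
tree_ids = [f"T{n:04d}" for n in range(1, 501)]
SAMPLE_SIZE = 25

rng = random.Random(2024)  # recording the seed lets the same draw be reproduced later

# Method 1: draw the sample directly, without replacement.
sample_direct = rng.sample(tree_ids, SAMPLE_SIZE)

# Method 2: give every record a random number, sort on it, and take the top of the list.
keyed = sorted((rng.random(), tree_id) for tree_id in tree_ids)
sample_sorted = [tree_id for _, tree_id in keyed[:SAMPLE_SIZE]]

print(sorted(sample_direct))
print(sorted(sample_sorted))
```

Recording the seed does not make the draw any less random; it simply lets you, or someone reviewing the study, regenerate exactly the same list of units.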
In some cases, it is necessary to take random samples across a geographic area,
such as part or all of a city or forested area. In such a situation, random
sample points can be assigned by randomly sampling from a coordinate grid that
has been established for the area in question. This may either be an existing
set of map-based coordinates, such as UTM or State Plane grids, or an arbitrary
grid based on units measured on a map or aerial photograph (e.g., distances
measured from the bottom and left edges of the map or photo). After you have
determined the range of X and Y coordinates within the area to be sampled, X
and Y coordinates can be selected randomly to generate random sample points.
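A minimal Python sketch of this procedure, assuming a rectangular range of X and Y coordinates. The coordinate extent below is hypothetical (it merely looks like UTM eastings and northings in metres); substitute the range for your own study area.

```python
import random

def random_points(n, x_min, x_max, y_min, y_max, seed=None):
    """Return n sample points whose X and Y coordinates are drawn independently and uniformly."""
    rng = random.Random(seed)
    return [(rng.uniform(x_min, x_max), rng.uniform(y_min, y_max)) for _ in range(n)]

# Hypothetical coordinate range in metres; use your own area's extent.
X_RANGE = (630_000, 640_000)
Y_RANGE = (4_260_000, 4_270_000)
points = random_points(30, *X_RANGE, *Y_RANGE, seed=7)

for x, y in points[:5]:
    print(f"X = {x:.1f}, Y = {y:.1f}")
```

If the study area is irregular, a point that falls outside its boundary can be discarded and redrawn. Unlike rejecting points for being too close to others, this keeps every location inside the area equally likely to be chosen.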
Stratified sampling
In many urban forestry applications, it is desirable to have samples distributed
throughout the population. For instance, you may want to ensure that trees
from each of several different maintenance districts are included in the
sample. In such situations, stratified random sampling is often the most
efficient and meaningful method for selecting samples. In this method,
the population to be sampled is first divided into meaningful subunits
or strata. These may be large subdivisions, planning sectors, maintenance
districts, or any other convenient management or planning unit.
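A minimal Python sketch of stratified random selection, assuming each record already carries a stratum label. The district names, tree IDs, and the sample size of 10 per stratum are hypothetical; each stratum is sampled independently with `random.sample`.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Independently draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    return {name: rng.sample(members, n_per_stratum[name]) for name, members in strata.items()}

# Hypothetical street-tree records: (tree ID, maintenance district).
districts = ["North", "South", "East"]
trees = [(f"T{i:03d}", districts[i % 3]) for i in range(300)]

picked = stratified_sample(trees, stratum_of=lambda t: t[1],
                           n_per_stratum={"North": 10, "South": 10, "East": 10}, seed=3)
for district, chosen in picked.items():
    print(district, [tree_id for tree_id, _ in chosen])
```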
If strata are assigned so that each is more or less homogeneous with respect
to the characteristics being measured, fewer samples will be needed to adequately
characterize each stratum. For instance, if tree cover is to be assessed in
different portions of a city, visual estimates of tree canopy cover could be
used to help demarcate zones where canopy cover is relatively uniform. A sample of street trees might be
stratified by tree species, size, and/or age, depending on the purpose of the
evaluation. If these trees were classified in a municipal street tree database,
stratification might be accomplished relatively simply from existing tree data.
However, if such data are lacking, it may be necessary to conduct a preliminary
survey to delineate the population before sampling occurs. For example, in a
study we conducted on utility pruning, we needed to sample from a population
of matched pairs of London plane (Platanus × acerifolia) street
trees that were both directly under conductors and had clearances within a certain
range. Because existing tree inventories did not contain all of the necessary
information, we surveyed the study area to identify a population of trees that
met these criteria. These trees constituted a particular stratum of the street
tree population.
Once strata are assigned and delineated, samples are drawn at random from within
each stratum. If the number of samples selected from each stratum is not proportional
to the size of the stratum, the stratum averages must be weighted by stratum size
to obtain an overall population average.
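The weighting can be written as: overall mean = sum over strata of (N_h / N) × (mean of stratum h), where N_h is the number of units in stratum h and N is the total population size. The Python sketch below applies this to hypothetical trunk-diameter data in which the smaller district was deliberately oversampled.

```python
def stratified_mean(strata):
    """Weight each stratum's sample mean by its share of the population (N_h / N) and sum."""
    population_size = sum(size for size, _ in strata.values())
    return sum((size / population_size) * (sum(values) / len(values))
               for size, values in strata.values())

# Hypothetical data: stratum -> (number of trees in the stratum, sampled trunk diameters in cm).
strata = {
    "District A": (1200, [30.0, 34.0, 26.0]),        # sample mean 30 cm
    "District B": (300, [50.0, 54.0, 46.0, 50.0]),   # sample mean 50 cm
}

pooled = [d for _, values in strata.values() for d in values]
print(f"weighted (stratified) mean: {stratified_mean(strata):.1f} cm")
print(f"unweighted pooled mean:     {sum(pooled) / len(pooled):.1f} cm")
```

Because District B supplies 4 of the 7 sampled trees but only 300 of the 1,500 trees in the population, simply pooling the samples gives about 41.4 cm, while the weighted estimate is 34.0 cm.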
Sample size
Optimal sample size will vary somewhat with the characteristics being rated or
tallied.
In general:
- up to a point, the reliability of estimates will increase as sample size
increases;
- the more variable the population is with respect to the characteristic(s)
being rated, the larger the sample should be;
- a large sample is required to accurately estimate the frequencies of relatively
rare events or characteristics;
- larger sample sizes are needed in order to detect relatively small differences
between means or proportions; smaller sample sizes may suffice if the differences
are relatively large.
The optimum sample size represents a compromise between cost and accuracy,
since both generally increase with increasing sample size. You can determine
an optimum sample size by identifying the point of diminishing returns beyond
which further increases in accuracy are not worth the additional costs of data
collection. Optimum sample size will vary with the type of data being collected,
so it is not possible to set a single number for all applications.
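One way to see the point of diminishing returns is to tabulate how the margin of error shrinks as the sample grows. The sketch below assumes a simple random sample from a large population (no finite population correction), a proportion near 50% (the most conservative case), and the usual normal approximation with z = 1.96 for 95% confidence. Because the margin shrinks with the square root of n, quadrupling the sample only halves the margin of error.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# Each doubling of n shrinks the margin by a factor of only about 1.41 (the square root of 2).
for n in (50, 100, 200, 400, 800, 1600):
    print(f"n = {n:5d}   margin of error = +/-{100 * margin_of_error(n):.1f} percentage points")
```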
However, you can use certain statistical formulas to estimate the minimum
sample size needed for a specific purpose. A number of statistics web sites
include on-line interactive calculators that allow you to estimate required
sample sizes. Before you can use these sample size calculators, you will need
to know several things about the data you are collecting and how it will be
analyzed:
Type of data. The main data types include:
- continuous: variables can take any value within a range, e.g., tree diameters;
- discrete: variables can take only certain distinct values. Types of discrete data include:
  - ranks: ordered ratings, e.g., low, moderate, high;
  - counts: e.g., number of trees by species;
  - binary: variables have only two outcomes, e.g., present/absent. Binary data are typically expressed as proportions or percents, such as the percent canopy cover determined from dot grid counts (canopy is rated as present or absent for each dot).
Type of analysis. Continuous data are
typically analyzed using linear models, including linear regression and analysis
of variance techniques. Discrete data may be analyzed in various ways, including
contingency table analysis, logistic regression, and survival analysis. Different
formulas are used to estimate sample sizes for various analysis methods.
Expected values. To estimate sample sizes
for analyses of continuous data you will have to specify estimates of expected
population means (the Greek letter mu may be used for this term) and standard
deviations or variances (the Greek letter sigma symbolizes the population
standard deviation; variance is the square of the standard deviation). For
proportions, estimates of the expected proportions are needed; margins of
error (as percents) may also be needed.
Data structure. If data are paired or
arranged in blocks or other more complex designs, the structure of the statistical
model should be specified.
Significance level and confidence level.
The significance level, symbolized by the Greek letter alpha, is the probability
of Type I error, the chance that you will say that a difference is significant
when it really isn't (i.e., the probability of rejecting the null hypothesis
when it is true). This is typically set at a low level, often 5% (alpha=0.05),
meaning that there would only be a 5% (1 in 20) chance of deciding that a
spurious difference is real. The corresponding confidence level is 1-alpha,
here 95%, which is your chance of avoiding Type I error.
Power.
This parameter complements alpha. It is expressed as (1-beta), where
beta is the probability of Type II error, the chance of missing a difference that really exists. Power is therefore the probability of
detecting a real difference (i.e., the probability of rejecting the null hypothesis
when it is false). If you are interested in detecting real differences, the
power of a test should be high, generally 80% (0.8) or greater.
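The standard large-sample formulas behind many of these calculators can be sketched in Python. This is a normal-approximation sketch, assuming simple random sampling and, for comparisons, two equal-sized groups with a common standard deviation. It uses n = (z × sigma / E)² to estimate a mean within ±E, n = z² × p(1 − p) / E² for a proportion, the finite population correction n / (1 + (n − 1) / N), and n per group = 2 × ((z for alpha + z for power) × sigma / delta)² to detect a difference delta between two means. All inputs shown are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, SD 1)

def z_two_sided(alpha):
    """Critical z value for a two-sided test or confidence interval at significance level alpha."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def n_for_mean(sigma, margin, alpha=0.05):
    """Sample size to estimate a mean within +/- margin: n = (z * sigma / E)^2."""
    return math.ceil((z_two_sided(alpha) * sigma / margin) ** 2)

def n_for_proportion(p, margin, alpha=0.05):
    """Sample size to estimate a proportion within +/- margin: n = z^2 p (1 - p) / E^2."""
    return math.ceil(z_two_sided(alpha) ** 2 * p * (1 - p) / margin ** 2)

def finite_population(n, population_size):
    """Finite population correction: n / (1 + (n - 1) / N), rounded up."""
    return math.ceil(n / (1 + (n - 1) / population_size))

def n_per_group(sigma, difference, alpha=0.05, power=0.80):
    """Sample size per group to detect a difference between two means (equal groups, two-sided)."""
    z_beta = STD_NORMAL.inv_cdf(power)  # power = 1 - beta
    return math.ceil(2 * ((z_two_sided(alpha) + z_beta) * sigma / difference) ** 2)

# Hypothetical inputs for illustration only.
print(n_for_mean(sigma=15, margin=2))          # trunk diameters (cm), SD guessed from a pilot sample
print(n_for_proportion(p=0.30, margin=0.05))   # e.g., share of residents favoring a policy
print(finite_population(n_for_proportion(0.30, 0.05), population_size=2000))
print(n_per_group(sigma=10, difference=5))     # detect a 5 cm difference in mean diameter
print(n_per_group(sigma=10, difference=2.5))   # halving the difference
```

With these inputs the sketch calls for 217 trees to estimate mean diameter within ±2 cm; 323 responses to estimate a 30% proportion within ±5 percentage points, or 279 if the population contains only 2,000 households; and 63 trees per group to detect a 5 cm difference at alpha = 0.05 and 80% power, rising to 252 per group when the difference is halved. Calculators based on the t distribution will typically report slightly larger numbers for small samples.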
Links to sample size calculators
Some useful web sites with sample size calculators are listed below. Additional
sites can be found by following links on some of these pages or by searching
on the term "sample size" on various web search engines.
http://www.stat.uiowa.edu/~rlenth/Power/
: Russ Lenth's Java applets for power and sample size - This site provides
a variety of powerful but easy-to-use applets that allow you to calculate sample
size and interactively see how sample size, power, alpha, and other study design
factors are interrelated.
http://home.clara.net/sisa/index.htm
: SISA: Simple Interactive Statistical Analysis - This site includes
a number of statistical analysis applications that can be run interactively
online. It includes sample size calculators for both continuous and binary (proportion)
data.
http://www.meduniwien.ac.at/medstat/research/samplesize/ssize.html
: Four basic, easy-to-use JavaScript-based calculators for sample size or
power.
http://www.answersresearch.com/response.php
: One of various basic sample size estimators used for public polling surveys.
This provides sample sizes based on the margin of error desired in a survey.
Several other survey-related calculators are also provided here.
http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/PowerSampleSize
: Power and Sample Size Estimation - A downloadable application (PS)
for calculating sample size and power.