
Univariate analysis of variance. Parametric model
of univariate analysis of variance. Planning the
experiment, formulating hypotheses, and
testing them statistically.
Correlation analysis. Construction of correlation
fields. Construction of the empirical regression
line. Calculation of the correlation coefficient;
estimation and analysis of the significance of the
linear correlation.

Hypothesis Testing
1 INTRODUCTION
Statistics plays an important role in decision making. In statistics, one utilizes random samples to
make inferences about the population from which the samples were obtained. Statistical inference
regarding population parameters takes two forms: estimation and hypothesis testing, although both
hypothesis testing and estimation may be viewed as different aspects of the same general problem of
arriving at decisions on the basis of observed data. We already saw several estimation procedures in
earlier chapters. Hypothesis testing is the subject of this chapter. Hypothesis testing has an important
role in the application of statistics to real-life problems. Here we utilize the sampled data to make
decisions concerning the unknown distribution of a population or its parameters. Pioneering work
on the explicit formulation as well as the fundamental concepts of the theory of hypothesis testing
is due to J. Neyman and E. S. Pearson.
A statistical hypothesis is a statement concerning the probability distribution of a random variable
or population parameters that are inherent in a probability distribution. The following example
illustrates the concept of hypothesis testing. An important industrial problem is that of accepting or
rejecting lots of manufactured products. Before releasing each lot for the consumer, the manufacturer
usually performs some tests to determine whether the lot conforms to acceptable standards. Let us
say that both the manufacturer and the consumer agree that if the proportion of defectives in a lot is
less than or equal to a certain number p, the lot will be released. Very often, instead of testing every
item in the lot, we may test only a few items chosen at random from the lot and make decisions
about the proportion of defectives in the lot; that is, we make the decisions about the population
on the basis of sample information. Such decisions are called statistical decisions. In attempting to
reach decisions, it is useful to make some initial conjectures about the population involved. Such
conjectures are called statistical hypotheses. Sometimes the results from the sample may be markedly
different from those expected under the hypothesis. Then we can say that the observed differences
are significant and we would be inclined to reject the initial hypothesis. These procedures that enable
us to decide whether to accept or reject hypotheses or to determine whether observed samples differ
significantly from expected results are called tests of hypotheses, tests of significance, or rules of decision.
In any hypothesis testing problem, we formulate a null hypothesis and an alternative hypothesis such that
if we reject the null, then we have to accept the alternative. The null hypothesis usually is a statement
of either the “status quo” or “no effect.” A guideline for selecting a null hypothesis is that when the
objective of an experiment is to establish a claim, the nullification of the claim should be taken as
the null hypothesis. The experiment is often performed to determine whether the null hypothesis is
false. For example, suppose the prosecution wants to establish that a certain person is guilty. The null
hypothesis would be that the person is innocent and the alternative would be that the person is guilty.
Thus, the claim itself becomes the alternative hypothesis. Customarily, the alternative hypothesis is
the statement that the experimenter believes to be true. For example, the alternative hypothesis is
the reason a person is arrested (police suspect the person is not innocent). Once the hypotheses
have been stated, appropriate statistical procedures are used to determine whether to reject the null
hypothesis. For the testing procedure, one begins with the assumption that the null hypothesis is true.
If the information furnished by the sampled data strongly contradicts (beyond a reasonable doubt)
the null hypothesis, then we reject it in favor of the alternative hypothesis. If we do not reject the
null, then we automatically reject the alternative. Note that we always make a decision with respect
to the null hypothesis. Note that the failure to reject the null hypothesis does not necessarily mean
that the null hypothesis is true. For example, a person being judged “not guilty” does not mean the
person is innocent. This basically means that there is not enough evidence to reject the null hypothesis
(presumption of innocence) beyond “a reasonable doubt.”
We summarize the elements of a statistical hypothesis in the following.
THE ELEMENTS OF A STATISTICAL HYPOTHESIS
1. The null hypothesis, denoted by H0, is usually the nullification of a claim. Unless evidence from the
data indicates otherwise, the null hypothesis is assumed to be true.
2. The alternate hypothesis, denoted by Ha (or sometimes denoted by H1), is customarily the claim
itself.
3. The test statistic, denoted by TS, is a function of the sample measurements upon which the
statistical decision, to reject or not reject the null hypothesis, will be based.
4. A rejection region (or a critical region) is the region (denoted by RR) that specifies the values
of the observed test statistic for which the null hypothesis will be rejected. This is the range of
values of the test statistic that corresponds to the rejection of H0 at some fixed level of significance,
α, which will be explained later.
5. Conclusion: If the value of the observed test statistic falls in the rejection region, the null hypothesis
is rejected and we will conclude that there is enough evidence to decide that the alternative
hypothesis is true. If the TS does not fall in the rejection region, we conclude that we cannot reject
the null hypothesis.
In practice one may have hypotheses such as H0 : μ = μ0 against one of the following alternatives:
Ha : μ ≠ μ0, called a two-tailed alternative,
or Ha : μ < μ0, called a lower (or left) tailed alternative,
or Ha : μ > μ0, called an upper (or right) tailed alternative.
A test with a lower or upper tailed alternative is called a one-tailed test. In an applied hypothesis testing
problem, we can use the following general steps.
GENERAL METHOD FOR HYPOTHESIS TESTING
1. From the (word) problem, determine the appropriate null hypothesis, H0, and the alternative, Ha.
2. Identify the appropriate test statistic and calculate the observed test statistic from the data.
3. Find the rejection region by looking up the critical value in the appropriate table.
4. Draw the conclusion: Reject or fail to reject the null hypothesis, H0.
5. Interpret the results: State in words what the conclusion means to the problem we started with.

It is always necessary to state a null and an alternate hypothesis for every statistical test performed.
All possible outcomes should be accounted for by the two hypotheses.
Example 1.1
In a coin-tossing experiment, let p be the probability of heads. We start with the claim that the coin is fair,
that is, H0 : p = 1/2. We test this against one of the following alternatives:
(a) Ha: The coin is not fair (p ≠ 1/2). This is a two-tailed alternative.
(b) Ha: The coin is biased in favor of heads (p > 1/2). This is an upper tailed alternative.
(c) Ha: The coin is biased in favor of tails (p < 1/2). This is a lower tailed alternative.
It is important to observe that the test statistic is a function of a random sample. Thus, the test statistic
itself is a random variable whose distribution is known under the null hypothesis. The value of a test
statistic when specific sample values are substituted is called the observed test statistic or simply test
statistic.
For example, consider the hypothesis H0 : μ = μ0 versus Ha : μ ≠ μ0, where μ0 is known. Assume
that the population is normal with a known variance σ². Consider X̄, an unbiased estimator of μ
based on the random sample X1, . . . , Xn. Then Z = (X̄ − μ0)/(σ/√n) is a function of the random
sample X1, . . . , Xn, and has a known distribution, a standard normal, under H0. If x1, x2, . . . , xn are
specific sample values, then z = (x̄ − μ0)/(σ/√n) is called the observed test statistic or simply test
statistic.
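As a minimal sketch of how the observed statistic is computed, the following Python snippet evaluates z for a two-tailed test with known σ. The sample values, μ0 = 10, σ = 0.5, and the function name z_test_two_tailed are hypothetical, chosen only for illustration; the level α = 0.05 used for the critical value is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_two_tailed(sample, mu0, sigma, alpha=0.05):
    """Observed z statistic and decision for H0: mu = mu0 vs Ha: mu != mu0.

    Assumes a normal population with known standard deviation sigma.
    """
    n = len(sample)
    z_obs = (mean(sample) - mu0) / (sigma / sqrt(n))
    # Two-tailed rejection region: |z| > z_{alpha/2}
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return z_obs, z_crit, abs(z_obs) > z_crit

# Hypothetical data, for illustration only
sample = [10.2, 9.7, 10.9, 10.4, 9.9, 10.6, 10.1, 10.8, 10.3]
z_obs, z_crit, reject = z_test_two_tailed(sample, mu0=10.0, sigma=0.5)
print(f"z = {z_obs:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```

The function returns the statistic, the critical value, and the decision separately so that each step of the general method can be inspected on its own.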
Definition 1.1 A hypothesis is said to be a simple hypothesis if that hypothesis uniquely specifies
the distribution from which the sample is taken. Any hypothesis that is not simple is called a composite
hypothesis.
Example 1.2
Refer to Example 1.1. The null hypothesis p =1/2 is simple, because the hypothesis completely specifies
the distribution, which in this case will be a binomial with p = 1/2 and with n being the number of tosses.
The alternative hypothesis p ≠ 1/2 is composite because the distribution now is not completely specified
(we do not know the exact value of p).
Because the decision is based on the sample information, we are prone to commit errors. In a statistical
test, it is impossible to establish the truth of a hypothesis with 100% certainty. There are two possible
types of errors. On the one hand, one can make an error by rejecting H0 when in fact it is true. On
the other hand, one can also make an error by failing to reject the null hypothesis when in fact it is
false. Because the errors arise as a result of wrong decisions, and the decisions themselves are based
on random samples, it follows that the errors have probabilities associated with them. We now have
the following definitions.

Table 1 Statistical Decision and Error Probabilities

                        True state of null hypothesis
Statistical decision    H0 true             H0 false
Do not reject H0        Correct decision    Type II error (β)
Reject H0               Type I error (α)    Correct decision

The decision and the errors are represented in Table 1.
Definition 1.2 (a) A type I error is made if H0 is rejected when in fact H0 is true. The probability of
type I error is denoted by α. That is,
P(rejecting H0 | H0 is true) = α.
The probability of type I error, α, is called the level of significance.
(b) A type II error is made if H0 is accepted when in fact Ha is true. The probability of a type II error is
denoted by β. That is,
P(not rejecting H0 | H0 is false) = β.
It is desirable that a test should have α = β = 0 (this can be achieved only in trivial cases), or at least
we prefer to use a test that minimizes both types of errors. Unfortunately, it so happens that for a
fixed sample size, as α decreases, β tends to increase and vice versa. There are no hard and fast rules
that can be used to make the choice of α and β. This decision must be made for each problem based
on quality and economic considerations. However, in many situations it is possible to determine
which of the two errors is more serious. It should be noted that a type II error is only an error in
the sense that a chance to correctly reject the null hypothesis was lost. It is not an error in the sense
that an incorrect conclusion was drawn, because no conclusion is made when the null hypothesis is
not rejected. In the case of type I error, a conclusion is drawn that the null hypothesis is false when,
in fact, it is true. Therefore, type I errors are generally considered more serious than type II errors.
For example, it is mostly agreed that finding an innocent person guilty is a more serious error than
finding a guilty person innocent. Here, the null hypothesis is that the person is innocent, and the
alternate hypothesis is that the person is guilty. “Not rejecting the null hypothesis” is equivalent to
acquitting a defendant. It does not prove that the null hypothesis is true, or that the defendant is
innocent. In statistical testing, the significance level α is the probability of wrongly rejecting the null
hypothesis when it is true (that is, the risk of finding an innocent person guilty). Here the type II risk
is acquitting a guilty defendant. The usual approach to hypothesis testing is to find a test procedure
that limits α, the probability of type I error, to an acceptable level while trying to lower β as much as
possible.
[Figure: sampling distributions of the test statistic under H0 and under Ha, with the critical value marked and the areas Prob(type I error) = α and Prob(type II error) = β indicated.]
The consequences of different types of errors are, in general, very different. For example, if a doctor
tests for the presence of a certain illness, incorrectly diagnosing the presence of the disease (type I
error) will cause a waste of resources, not to mention the mental agony to the patient. On the other
hand, failure to determine the presence of the disease (type II error) can lead to a serious health risk.
To formulate a hypothesis testing problem, consider the following situation. Suppose a toy store
chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys. We feel
that this claim is inflated. In an attempt to dispose of this claim, we observe the buying pattern of 20
randomly selected girls under 8 years old, and we observe X, the number of girls under 8 years old
who buy stuffed toys or dolls. Now the question is, how can we use X to confirm or reject the store’s
claim? Let p be the probability that a girl under 8 chosen at random prefers stuffed toys or dolls. The
question can now be reformulated as a hypothesis testing problem: is p ≥ 0.8 or p < 0.8? Because we
would like to reject the store’s claim only if we are highly certain of our decision, we should choose
the null hypothesis to be H0 : p ≥ 0.8, the wrongful rejection of which is considered the more serious error,
and the alternative to be Ha : p < 0.8. In order to make the null
hypothesis simple, we will use H0 : p = 0.8, which is the boundary value with the understanding that
it really represents H0 : p ≥ 0.8. We note that X, the number of girls under 8 years old who prefer
stuffed toys or dolls, is a binomial random variable. Clearly a large sample value of X would favor
H0. Suppose we arbitrarily choose to accept the null hypothesis if X > 12. Because our decision is
based on only a sample of 20 girls under 8, there is always a possibility of making errors whether
we accept or reject the store chain’s claim. In the following example, we will now formally state this
problem and calculate the error probabilities based on our decision rule.
Example 1.3
A toy store chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys.
After observing the buying pattern of many girls under 8 years old, we feel that this claim is inflated. In an
attempt to dispose of this claim, we observe the buying pattern of 20 randomly selected girls under 8 years
old, and we observe X, the number of girls who buy stuffed toys or dolls. We wish to test the hypothesis
H0 : p = 0.8 against Ha : p < 0.8. Suppose we decide to accept H0 if X > 12 (that is, X ≥ 13). This
means that if {X ≤ 12} (that is X < 13) we will reject H0.
(a) Find α.
(b) Find β for p = 0.6.
(c) Find β for p = 0.4.
(d) Find the rejection region of the form {X ≤ K} so that (i) α = 0.01; (ii) α = 0.05.
(e) For the alternative Ha :p = 0.6, find β for the values of α in part (d).

Solution
The TS X is the number of girls under 8 years old who buy dolls. X follows the binomial distribution with
n = 20 and p, the unknown population proportion of girls under 8 who prefer dolls. We now calculate α
and β.
(a)
For p = 0.8, the probability of type I error is
\[
\alpha = P\{\text{reject } H_0 \mid H_0 \text{ is true}\}
= P\{X \le 12 \mid p = 0.8\}
= \sum_{x=0}^{12} \binom{20}{x} (0.8)^{x} (0.2)^{20-x}
= 0.0321.
\]
If we calculate α for any other value of p > 0.8, then we will find that it is smaller than 0.0321.
Hence, there is at most a 3.21% chance of rejecting a true null hypothesis. That is, if the store’s claim
is in fact true, then the chance that our test will erroneously reject that claim is at most 3.21%.
(b)
Here p = 0.6. The probability of type II error is
β = P{accept H0 | H0 false}
= P{X > 12 | p = 0.6}
= 1 − P{X ≤ 12 | p = 0.6}
= 1 − 0.584
= 0.416,
so there is a 41.6% chance of accepting a false null hypothesis. Thus, in case the store’s claim is not
true, and the truth is that only 60% of girls under 8 years old prefer dolls over other types of toys,
then there is a 41.6% chance that our test will erroneously conclude that the store’s claim is true.
(c)
If p = 0.4, then
β = P{accept H0 | H0 false}
= P{X > 12 | p = 0.4}
= 1 − P{X ≤ 12 | p = 0.4}
= 1 − 0.979
= 0.021.
That is, there is a 2.1% chance of accepting a false null hypothesis.
(d)
(i) To find K such that
α = P{X ≤ K | p = 0.8} = 0.01
from the binomial table, K = 11. Hence, the rejection region is: Reject H0 if {X ≤ 11}.
(ii) To find K such that
α = P{X ≤ K | p = 0.8} = 0.05
from the binomial table, α = 0.05 falls between K = 12 and K = 13. However, for K = 13, the
value for α is 0.087, exceeding 0.05. If we want to limit α to be no more than 0.05, we will
have to take K = 12. That is, we reject the null hypothesis if X ≤ 12, yielding an α = 0.0321
as shown in (a).
(e)
(i) When α = 0.01, from (d), the rejection region is of the form {X ≤ 11}. For p = 0.6,
β = P{accept H0 | H0 false}
= P{X > 11 | p = 0.6}
= 1 − P{X ≤ 11 | p = 0.6}
= 1 − 0.404
= 0.596.
(ii) From (a) and (b), for testing the hypothesis H0 : p = 0.8 against Ha : p < 0.8 with n = 20,
we see that when α is 0.0321, β is 0.416. From (d)(i) and (e)(i), for the same hypotheses, we
see that when α is 0.01, β is 0.596. This holds in general. Thus, we observe that for fixed n, as
α decreases, β increases and vice versa.
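The table lookups in Example 1.3 can be reproduced directly from the binomial distribution. The following is a minimal sketch using only the Python standard library; the helper names binom_cdf and largest_cutoff are illustrative and not part of the text.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

def largest_cutoff(alpha_max, n, p0):
    """Largest K with P(X <= K | p0) <= alpha_max, or None if no K qualifies."""
    best = None
    for k in range(n + 1):
        if binom_cdf(k, n, p0) <= alpha_max:
            best = k
    return best

n = 20
# Decision rule of Example 1.3: reject H0 when X <= 12.
alpha = binom_cdf(12, n, 0.8)          # part (a): type I error at p = 0.8
beta_06 = 1 - binom_cdf(12, n, 0.6)    # part (b): type II error when p = 0.6
beta_04 = 1 - binom_cdf(12, n, 0.4)    # part (c): type II error when p = 0.4

# Part (d): rejection regions {X <= K} with alpha no larger than the target.
k_01 = largest_cutoff(0.01, n, 0.8)
k_05 = largest_cutoff(0.05, n, 0.8)

# Part (e)(i): type II error at p = 0.6 for the cutoff found with alpha <= 0.01.
beta_k01 = 1 - binom_cdf(k_01, n, 0.6)

print(f"alpha = {alpha:.4f}, beta(0.6) = {beta_06:.3f}, beta(0.4) = {beta_04:.3f}")
print(f"K for 0.01: {k_01}, K for 0.05: {k_05}, beta(0.6) at K = {k_01}: {beta_k01:.3f}")
```

Because the binomial distribution is discrete, the cutoff search takes the largest K whose type I error does not exceed the target level, rather than matching the target exactly.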
In the next example, we explore what happens to β as the sample size increases, with α fixed.
Example 1.4
Let X be a binomial random variable. We wish to test the hypothesis H0 : p = 0.8 against Ha : p = 0.6. Let
α = 0.03 be fixed. Find β for n = 10 and n = 20.
Solution
For n = 10, using the binomial tables, we obtain P{X ≤ 5 | p = 0.8} ≈ 0.03. Hence the rejection region for
the hypothesis H0 : p = 0.8 vs. Ha : p = 0.6 is given by: reject H0 if X ≤ 5. The probability of type II error is
β = P{accept H0 | p = 0.6}
= P{X > 5 | p = 0.6} = 1 − P{X ≤ 5 | p = 0.6} = 1 − 0.367 = 0.633.
For n = 20, as shown in Example 1.3, if we reject H0 for X ≤ 12, we obtain
P(X ≤ 12 | p = 0.8) ≈ 0.03
and
β = P(X > 12 | p = 0.6) = 1 − P{X ≤ 12 | p = 0.6} = 0.416.
We see that for a fixed α, as n increases β decreases and vice versa. It can be shown that this result holds in
general.
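The effect of the sample size in Example 1.4 can be checked numerically. The sketch below takes the two cutoffs used in the example (X ≤ 5 for n = 10 and X ≤ 12 for n = 20) as given and computes the exact α and β for each; the helper binom_cdf and the dictionary names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """Return P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

p0, pa = 0.8, 0.6            # H0: p = 0.8 versus Ha: p = 0.6
cutoffs = {10: 5, 20: 12}    # sample size -> K in the rule "reject H0 if X <= K"

results = {}
for n, k in cutoffs.items():
    alpha = binom_cdf(k, n, p0)       # P(reject H0 | p = 0.8)
    beta = 1 - binom_cdf(k, n, pa)    # P(do not reject H0 | p = 0.6)
    results[n] = (alpha, beta)
    print(f"n = {n}: alpha = {alpha:.4f}, beta = {beta:.4f}")
```

Both cutoffs give a type I error close to, but not exactly, 0.03, which is the best a discrete test statistic allows here.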

In order for us to compute the value of β, it is necessary that the alternate hypothesis is simple. Now
we will discuss a three-step procedure to calculate β.
STEPS TO CALCULATE β
1. Decide an appropriate test statistic (usually this is a sufficient statistic or an estimator for the
unknown parameter, whose distribution is known under H0).
2. Determine the rejection region using a given α, and the distribution of the test statistic (TS).
3. Find the probability that the observed test statistic does not fall in the rejection region assuming
Ha is true. This gives β. That is,
β = P(TS falls in the complement of the rejection region | Ha is true).
Example 1.5
A random sample of size 36 from a population with known variance, σ² = 9, yields a sample mean of
x̄ = 17. Find β for testing the hypothesis H0 : μ = 15 versus Ha : μ = 16. Assume α = 0.05.
Solution
Here n = 36, x̄ = 17, and σ² = 9. In general, to test H0 : μ = μ0 versus Ha : μ > μ0, we proceed as
follows. An unbiased estimator of μ is X̄. Intuitively we would reject H0 if X̄ is large, say X̄ > c. Now using
α = 0.05, we will determine the rejection region. By the definition of α, we have
\[
P(\bar{X} > c \mid \mu = \mu_0) = 0.05
\]
or
\[
P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{c - \mu_0}{\sigma/\sqrt{n}} \;\middle|\; \mu = \mu_0 \right) = 0.05.
\]
But if μ = μ0, because the sample size n ≥ 30, (X̄ − μ0)/(σ/√n) ∼ N(0, 1). Therefore, the preceding
condition is equivalent to
\[
P\left( Z > \frac{c - \mu_0}{\sigma/\sqrt{n}} \right) = 0.05.
\]
From standard normal tables, we obtain P(Z > 1.645) = 0.05. Hence (c − μ0)/(σ/√n) = 1.645, or
c = μ0 + 1.645(σ/√n).
Therefore, the rejection region is the set of all sample means x̄ such that
\[
\bar{x} > \mu_0 + 1.645 \left( \frac{\sigma}{\sqrt{n}} \right).
\]
Substituting μ0 = 15 and σ = 3, we obtain
\[
\mu_0 + 1.645 \left( \frac{\sigma}{\sqrt{n}} \right) = 15 + 1.645 \left( \frac{3}{\sqrt{36}} \right) = 15.8225.
\]
The rejection region is the set of x̄ such that x̄ > 15.8225.
Then by definition,
β = P(X̄ ≤ 15.8225 when μ = 16).
Consequently, for μ = 16,
\[
\beta = P\left( \frac{\bar{X} - 16}{\sigma/\sqrt{n}} \le \frac{15.8225 - 16}{3/\sqrt{36}} \right)
= P(Z \le -0.36)
= 0.3594.
\]
That is, under the given information, there is a 35.94% chance of not rejecting a false null hypothesis.
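The three steps for calculating β can be sketched in Python for the upper-tailed z-test of Example 1.5. This is a minimal sketch assuming a known σ and a normally distributed sample mean; the function name upper_tail_beta is illustrative. It uses the exact normal quantile rather than the table value 1.645 and does not round z to two decimals, so the result differs slightly from the hand calculation.

```python
from math import sqrt
from statistics import NormalDist

def upper_tail_beta(mu0, mua, sigma, n, alpha):
    """Critical value c and type II error for H0: mu = mu0 vs Ha: mu = mua > mu0.

    The test rejects H0 when the sample mean exceeds c.
    """
    z = NormalDist()                        # standard normal distribution
    se = sigma / sqrt(n)                    # standard error of the sample mean
    c = mu0 + z.inv_cdf(1 - alpha) * se     # step 2: rejection region {xbar > c}
    beta = z.cdf((c - mua) / se)            # step 3: P(xbar <= c | mu = mua)
    return c, beta

c, beta = upper_tail_beta(mu0=15, mua=16, sigma=3, n=36, alpha=0.05)
print(f"critical value c = {c:.4f}, beta = {beta:.4f}")
```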
1.1 Sample Size
It is clear from the preceding example that once we are given the sample size n, an α, a simple
alternative Ha, and a test statistic, we have no control over β and it is exactly determined. Hence, for
a given sample size and test statistic, any effort to lower β will lead to an increase in α and vice versa.
This means that for a test with fixed sample size it is not possible to simultaneously reduce both α
and β. We also notice from Example 1.4 that by increasing the sample size n, we can decrease β
(for the same α) to an acceptable level. The following discussion illustrates that it may be possible to
determine the sample size for a given α and β.
Suppose we want to test H0 : μ = μ0 versus Ha : μ > μ0. Given α and β, we want to find n, the
sample size, and K, the point at which the rejection begins. We know that
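\[
\alpha = P(\bar{X} > K \mid \mu = \mu_0)
= P\left( \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} > \frac{K - \mu_0}{\sigma/\sqrt{n}} \;\middle|\; \mu = \mu_0 \right)
= P(Z > z_\alpha) \tag{1}
\]
and
\[
\beta = P(\bar{X} \le K \mid \mu = \mu_a)
= P\left( \frac{\bar{X} - \mu_a}{\sigma/\sqrt{n}} \le \frac{K - \mu_a}{\sigma/\sqrt{n}} \;\middle|\; \mu = \mu_a \right)
= P(Z \le -z_\beta). \tag{2}
\]
From Equations (1) and (2),
\[
z_\alpha = \frac{K - \mu_0}{\sigma/\sqrt{n}}
\quad \text{and} \quad
-z_\beta = \frac{K - \mu_a}{\sigma/\sqrt{n}}.
\]
This gives us two equations with two unknowns (K and n), and we can proceed to solve them.
Eliminating K, we get
\[
\mu_0 + z_\alpha \left( \frac{\sigma}{\sqrt{n}} \right) = \mu_a - z_\beta \left( \frac{\sigma}{\sqrt{n}} \right).
\]
From this we can derive
\[
\sqrt{n} = \frac{(z_\alpha + z_\beta)\,\sigma}{\mu_a - \mu_0}.
\]
Thus, the sample size for an upper tail alternative hypothesis is
\[
n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}.
\]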
The sample size increases with the square of the standard deviation and decreases with the square of
the difference between mean value of the alternative hypothesis and the mean value under the null
hypothesis. Note that in real-world problems, care should be taken in the choice of the value of μa
for the alternative hypothesis. It may be tempting for a researcher to take a large value of μa in order
to reduce the required sample size. This will seriously affect the accuracy (power) of the test. This
alternative value must be realistic within the experiment under study. Care should also be taken in
the choice of the standard deviation σ. Using an underestimated value of the standard deviation to
reduce the sample size will result in inaccurate conclusions similar to overestimating the difference
of means. Usually, the value of σ is estimated using a similar study conducted earlier. The problem
could be that the previous study may be old and may not represent the new reality. When accuracy is
important, it may be necessary to conduct a pilot study only to get some idea on the estimate of σ.
Once we determine the necessary sample size, we must devise a procedure by which the appropriate
data can be randomly obtained. This aspect of the design of experiments is discussed in Chapter 9.
Example 1.6
Let σ = 3.1 be the true standard deviation of the population from which a random sample is chosen. How
large should the sample size be for testing H0 : μ = 5 versus Ha : μ = 5.5, in order that α = 0.01 and
β = 0.05?
Solution
We are given μ0 = 5 and μa = 5.5. Also, zα = z0.01 = 2.33 and zβ = z0.05 = 1.645. Hence, the
sample size is
\[
n = \frac{(z_\alpha + z_\beta)^2 \sigma^2}{(\mu_a - \mu_0)^2}
= \frac{(2.33 + 1.645)^2 (3.1)^2}{(0.5)^2}
\approx 607.4.
\]
Rounding up, n = 608 will provide the desired levels. That is, in order for us to test the foregoing hypothesis, we must
randomly select 608 observations from the given population.
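The sample size formula is easy to evaluate in code. The sketch below is a minimal implementation of n = (zα + zβ)²σ²/(μa − μ0)²; the function name sample_size is illustrative. It uses exact normal quantiles (about 2.326 rather than the rounded table value 2.33), so the required n comes out marginally smaller than the hand calculation.

```python
from math import ceil
from statistics import NormalDist

def sample_size(mu0, mua, sigma, alpha, beta):
    """Sample size from n = (z_alpha + z_beta)^2 sigma^2 / (mua - mu0)^2.

    Assumes a one-tailed z-test with known sigma; returns the unrounded value.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)   # upper alpha quantile of N(0, 1)
    z_beta = z.inv_cdf(1 - beta)     # upper beta quantile of N(0, 1)
    return ((z_alpha + z_beta) * sigma / (mua - mu0)) ** 2

n_exact = sample_size(mu0=5, mua=5.5, sigma=3.1, alpha=0.01, beta=0.05)
n = ceil(n_exact)                    # round up to the next whole observation
print(f"unrounded n = {n_exact:.2f}, required sample size = {n}")
```

Rounding up rather than to the nearest integer guarantees that both error probabilities stay at or below their targets.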
From a practical standpoint, the researcher typically chooses α and the sample size, and β is ignored.
Because a trade-off exists between α and β, choosing a very small value of α will tend to increase β in
a serious way. A general rule of thumb is to pick reasonable values of α, possibly in the 0.05 to 0.10
range so that β will remain reasonably small.
EXERCISES 1
1.1.
An appliance manufacturer is considering the purchase of a new machine for stamping out
sheet metal parts. If μ0 (given) is the true average of the number of good parts stamped out
per hour by their old machine and μ is the corresponding true unknown average for the
new machine, the manufacturer wants to test the null hypothesis μ = μ0 versus a suitable
alternative. What should the alternative be if he does not want to buy the new machine
unless it is (a) more productive than the old one? (b) At least 20% more productive than the
old one?
1.2.
Formulate an alternative hypothesis for each of the following null hypotheses.
(a) H0: Support for a presidential candidate is unchanged after the start of the use of TV
commercials.
(b) H0: The proportion of viewers watching a particular local news channel is less
than 30%.
(c) H0: The median grade point average of undergraduate mathematics majors is 2.9.
1.3.
It is suspected that a coin is not balanced (not fair). Let p be the probability of tossing a head.
To test H0 : p = 0.5 against the alternative hypothesis Ha : p > 0.5, a coin is tossed 15 times.
Let Y equal the number of times a head is observed in the 15 tosses of this coin. Assume the
rejection region to be {Y ≥ 10}.
(a) Find α.
(b) Find β for p = 0.
(c) Find β for p = 0.6.
(d) Find the rejection region for {Y ≥K} for α = 0.01, and α = 0.03.
(e) For the alternative Ha : p = 0.7, find β for the values of α given in (d).
1.4.
In Exercise 1.3:
(a) Assume that the rejection region is {Y ≥ 8}. Calculate α and β if p = 0.6. Compare the
results with the corresponding values obtained in Exercise 1.3. (This gives the effect of
enlarging the rejection region on α and β.)
(b) Assume that the rejection region is {Y ≥ 8}. Calculate α and β if p = 0.6 and (i) the coin
is tossed 20 times, or (ii) the coin is tossed 25 times. (This shows the effect of increasing
the sample size on α and β for a fixed rejection region.)
1.5.
Suppose we have a random sample of size 25 from a normal population with an unknown
mean μ and a standard deviation of 4. We wish to test the hypothesis H0 : μ = 10 vs.
Ha : μ > 10. Let the rejection region be defined by: reject H0 if the sample mean
X̄ > 11.2.
(a) Find α.
(b) Find β for Ha : μ = 11.
(c) What should the sample size be if α = 0.01 and β = 0.8?
1.6.
A process for making steel pipe is under control if the diameter of the pipe has mean 3.0 in.
with standard deviation of no more than 0.0250 in. To check whether the process is under
control, a random sample of size n = 30 is taken each day and the null hypothesis μ = 3.0
is rejected if X̄ is less than 2.9960 or greater than 3.0040. Find (a) the probability of type I
error; (b) the probability of type II error when μ = 3.0050 in. Assume σ = 0.0250 in.
1.7.
A bowl contains 20 balls, of which x are green and the remainder red. To test H0 : x = 10
versus Ha : x = 15, three balls are selected at random without replacement, and H0 is rejected
if all three balls are green. Calculate α and β for this test.
1.8.
Suppose we have a sample of size 6 from a population with pdf f(x) = (1/θ)e^{−x/θ}, x > 0, θ > 0.
We wish to test H0 : θ = 1 vs. Ha : θ > 1. Let the rejection region be defined by: reject H0 if
Σ_{i=1}^{6} Xi > 8. (a) Find α. (b) Find β for Ha : θ = 2.
1.9.
Let σ2 = 16 be the variance of a normal population from which a random sample is chosen.
How large should the sample size be for testing H0 : μ = 25 versus Ha : μ = 24, in order that
α=0.05 and β = 0.05?
2 THE NEYMAN-PEARSON LEMMA
In practical hypothesis testing situations, there are typically many tests possible with significance level
α for a null hypothesis versus alternative hypothesis (see Project 7A). This leads to some important
questions, such as (1) how to decide on the test statistic and (2) how to know that we selected the best
rejection region. In this section, we study the answer to these questions using the Neyman-Pearson
approach.
Definition 2.1 Suppose that W is the test statistic and RR is the rejection region for a test of hypothesis
concerning the value of a parameter θ. Then the power of the test is the probability that the test rejects H0
when the alternative is true. That is,
π = Power(θ)
= P(W in RR when the parameter value is an alternative θ).
If H0 : θ = θ0 and Ha : θ ≠ θ0, then the power of the test at some θ = θ1 ≠ θ0 is
Power(θ1) = P(reject H0 | θ = θ1).
But, β(θ1) = P(accept H0 | θ = θ1). Therefore,
Power(θ1) = 1 − β(θ1).
A good test will have high power.

Note that the power of a test H0 cannot be found until some true situation Ha is specified. That is,
the sampling distribution of the test statistic when Ha is true must be known or assumed. Because
β depends on the alternative hypothesis, which being composite most of the time does not specify
the distribution of the test statistic, it is important to observe that the experimenter cannot control
β. For example, the alternative Ha : μ < μ0 does not specify the value of μ, as in the case of the null
hypothesis, H0 : μ = μ0.
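To make the dependence of power on the assumed alternative concrete, the following sketch evaluates Power(μ) = 1 − β(μ) for an upper-tailed z-test at several candidate values of μ. The setting (μ0 = 15, σ = 3, n = 36, α = 0.05) is borrowed from Example 1.5; the grid of alternative values and the function name power_upper_tail are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_upper_tail(mu, mu0, sigma, n, alpha):
    """Power of the test 'reject H0: mu = mu0 if xbar > c' when the true mean is mu."""
    z = NormalDist()
    se = sigma / sqrt(n)
    c = mu0 + z.inv_cdf(1 - alpha) * se   # critical value fixed by alpha under H0
    return 1 - z.cdf((c - mu) / se)       # P(xbar > c | true mean = mu)

# Power can only be computed once a specific alternative value is chosen.
powers = {mu: power_upper_tail(mu, mu0=15, sigma=3, n=36, alpha=0.05)
          for mu in (15, 15.5, 16, 17)}
for mu, pw in powers.items():
    print(f"mu = {mu}: power = {pw:.4f}")
```

At μ = μ0 the power equals α, and it grows as the assumed alternative moves farther from the null value.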
Example 2.1
Let X1, . . . , Xn be a random sample from a Poisson distribution with parameter λ, that is, the pdf is
given by f(x) = e^{−λ}λ^{x}/(x!). Then the hypothesis H0 : λ = 1 uniquely specifies the distribution, because
f(x) = e^{−1}/(x!), and hence is a simple hypothesis. The hypothesis Ha : λ > 1 is composite, because f(x) is
not uniquely determined.
Definition 2.2 A test at a given α of a simple hypothesis H0 versus the simple alternative Ha that has
the largest power among tests with the probability of type I error no larger than the given α is called a most
powerful test.
Consider the test of hypothesis H0 : θ = θ0 versus Ha : θ = θ1. If α is fixed, then our interest is to
make β as small as possible. Because β = 1 − Power(θ1), by minimizing β we would obtain a most
powerful test. The following result says that among all tests with given probability of type I error, the
likelihood ratio test given later minimizes the probability of a type II error, in other words, it is most
powerful.
Theorem 2.1 (Neyman-Pearson Lemma) Suppose that one wants to test a simple hypothesis H0 :
θ = θ0 versus the simple alternative hypothesis Ha :θ =θ1 based on a random sample X1,…,Xn from a
distribution with parameter θ. Let L(θ) ≡ L(θ; X1, . . . , Xn) > 0 denote the likelihood of the sample when
the value of the parameter is θ. If there exist a positive constant K and a subset C of the sample space Rn (the
Euclidean n-space) such that
1. L(θ0)/L(θ1) ≤ K for (x1, x2, …, xn) ∈ C,
2. L(θ0)/L(θ1) ≥ K for (x1, x2, …, xn) ∈ C′, where C′ is the complement of C, and
3. P[(X1, . . . , Xn) ∈ C; θ0] = α,
then the test with critical region C will be the most powerful test for H0 versus Ha. We call α the size of the
test and C the best critical region of size α.
Proof. We prove this theorem for continuous random variables. For discrete random variables, the
proof is identical with sums replacing the integral. Let S be some region in Rn, an n-dimensional
Euclidean space. For simplicity we will use the following notation:
∫
∫
∫
L(θ) = . . . L(θ; x1, x2, . . . , xn)dx1dx2, . . . , dxn
S
S
S

2 The Neyman-Pearson Lemma
Note that
∫
P ((X1, . . . , Xn) ∈ C; θ0) = f (x1, . . . , xn; θ0)dx1, . . . , dxn
C
∫
= L(θ0; x1, . . . , xn)dx1, . . . , dxn.
C
Suppose that there
is another critical region, say B, of size less than or equal
to
α,
that
is
∫
B L(θ0) ≤ α. Then
∫
∫
∫
0
≤ L(θ0) − L(θ0), because L(θ0) = α by assumption 3.
C
B
C
Therefore,
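\[
\begin{aligned}
0 &\le \int_C L(\theta_0) - \int_B L(\theta_0) \\
  &= \int_{C \cap B} L(\theta_0) + \int_{C \cap B'} L(\theta_0) - \int_{C \cap B} L(\theta_0) - \int_{C' \cap B} L(\theta_0) \\
  &= \int_{C \cap B'} L(\theta_0) - \int_{C' \cap B} L(\theta_0).
\end{aligned}
\]
Using assumption 1 of Theorem 2.1, KL(θ1) ≥ L(θ0) at each point in the region C and hence in C ∩ B′. Thus
\[ \int_{C \cap B'} L(\theta_0) \le K \int_{C \cap B'} L(\theta_1). \]
By assumption 2 of the theorem, KL(θ1) ≤ L(θ0) at each point in C′, and hence in C′ ∩ B. Thus,
\[ \int_{C' \cap B} L(\theta_0) \ge K \int_{C' \cap B} L(\theta_1). \]
Therefore,
\[ 0 \le \int_{C \cap B'} L(\theta_0) - \int_{C' \cap B} L(\theta_0) \le K \left\{ \int_{C \cap B'} L(\theta_1) - \int_{C' \cap B} L(\theta_1) \right\}. \]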

That is,
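\[
\begin{aligned}
0 &\le K \left\{ \int_{C \cap B} L(\theta_1) + \int_{C \cap B'} L(\theta_1) - \int_{C \cap B} L(\theta_1) - \int_{C' \cap B} L(\theta_1) \right\} \\
  &= K \left\{ \int_C L(\theta_1) - \int_B L(\theta_1) \right\}.
\end{aligned}
\]
As a result,
\[ \int_C L(\theta_1) \ge \int_B L(\theta_1). \]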
Because this is true for every critical region B of size ≤ α, C is the best critical region of size α, and
the test with critical region C is the most powerful test of size α.
When testing two simple hypotheses, the existence of a best critical region is guaranteed by the
Neyman-Pearson lemma. In addition, the foregoing theorem provides a means for determining
what the best critical region is. However, it is important to note that Theorem 2.1 gives only the
form of the rejection region; the actual rejection region depends on the specific value of α.
In real-world situations, we are seldom presented with the problem of testing two simple hypotheses.
There is no general result in the form of Theorem 2.1 for composite hypotheses. However, for
hypotheses of the form H0 : θ = θ0 versus Ha : θ > θ0, we can take a particular value θ1 > θ0 and
then find a most powerful test for H0 : θ = θ0 versus Ha : θ = θ1. If this test (that is, the rejection
region of the test) does not depend on the particular value θ1, then this test is said to be a uniformly
most powerful test for H0 : θ = θ0 versus Ha : θ > θ0.
The following example illustrates the use of the Neyman-Pearson lemma.
Example 2.2
Let X1, . . . , Xn denote an independent random sample from a population with a Poisson distribution with
mean λ. Derive the most powerful test for testing H0 : λ = 2 versus Ha : λ = 1/2.
Solution
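Recall that the probability mass function of a Poisson variable is
\[ p(x) = \begin{cases} \dfrac{e^{-\lambda} \lambda^{x}}{x!}, & \lambda > 0,\ x = 0, 1, 2, \ldots \\ 0, & \text{otherwise.} \end{cases} \]
Thus, the likelihood function is
\[ L = \frac{\lambda^{\sum_{i=1}^{n} x_i}\, e^{-\lambda n}}{\prod_{i=1}^{n} (x_i!)}. \]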

For λ = 2,
\[ L(\theta_0) = L(\lambda = 2) = \frac{2^{\sum_{i=1}^{n} x_i}\, e^{-2n}}{\prod_{i=1}^{n} (x_i!)} \]
and for λ = 1/2,
\[ L(\theta_1) = L(\lambda = 1/2) = \frac{(1/2)^{\sum_{i=1}^{n} x_i}\, e^{-n/2}}{\prod_{i=1}^{n} (x_i!)}. \]
Thus,
\[ \frac{L(\theta_0)}{L(\theta_1)} = \frac{2^{\sum x_i}\, e^{-2n}}{(1/2)^{\sum x_i}\, e^{-n/2}} < K, \]
which implies
\[ 4^{\sum x_i}\, e^{-3n/2} < K \]
or, taking natural logarithm,
\[ \left( \sum x_i \right) \ln 4 - \frac{3n}{2} < \ln K. \]
Solving for ∑xi and letting [ln K + (3n/2)]/ln 4 = K′, we will reject H0 whenever ∑xi < K′.
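To make the cutoff K′ in Example 2.2 concrete, the following is a minimal sketch that uses the standard fact that a sum of n independent Poisson(λ) variables is Poisson(nλ). The sample size n = 5, the level α = 0.05, and the function names are illustrative assumptions, not part of the example. Because ∑xi is integer valued, the region ∑xi < K′ is written as ∑xi ≤ c, and the attained size is generally below α.

```python
import math

def poisson_cdf(k, mean):
    """P(Y <= k) for Y ~ Poisson(mean); returns 0.0 when k < 0."""
    if k < 0:
        return 0.0
    term = math.exp(-mean)          # P(Y = 0)
    total = term
    for j in range(1, k + 1):
        term *= mean / j            # P(Y = j) from P(Y = j - 1)
        total += term
    return total

def mp_poisson_cutoff(n, lam0, alpha):
    """Largest integer c with P(sum X_i <= c; lam0) <= alpha, or -1 if none exists."""
    c = -1
    while poisson_cdf(c + 1, n * lam0) <= alpha:
        c += 1
    return c

# Hypothetical setting: n = 5 observations, H0: lambda = 2 versus Ha: lambda = 1/2.
n, lam0, lam1, alpha = 5, 2.0, 0.5, 0.05
c = mp_poisson_cutoff(n, lam0, alpha)
size = poisson_cdf(c, n * lam0)     # attained type I error probability
power = poisson_cdf(c, n * lam1)    # probability of rejecting when lambda = 1/2
print(f"reject H0 when sum(x) <= {c}; size = {size:.4f}, power = {power:.4f}")
```

The test rejects for small values of ∑xi because the alternative mean is smaller than the null mean, which matches the direction of the inequality derived above.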
A step-by-step procedure in applying the Neyman-Pearson lemma is now given.
PROCEDURE FOR APPLYING THE NEYMAN-PEARSON LEMMA
1. Determine the likelihood functions under both null and alternative hypotheses.
2. Take the ratio of the two likelihood functions to be less than a constant K .
3. Simplify the inequality in step 2 to obtain a rejection region.
Example 2.3
Suppose X1, . . . , Xn is a random sample from a normal distribution with a known mean of μ and an
unknown variance of σ². Find the most powerful α-level test for testing H0 : σ² = σ0² versus
Ha : σ² = σ1² (σ1² > σ0²). Show that this test is equivalent to the χ²-test. Is the test uniformly most
powerful for Ha : σ² > σ0²?

Solution
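To test H0 : σ² = σ0² versus Ha : σ² = σ1², where σ1² > σ0², we have
\[ L(\sigma_0^2) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\, \sigma_0}\, e^{-\frac{(x_i - \mu)^2}{2\sigma_0^2}} = \frac{1}{(\sqrt{2\pi})^n \sigma_0^n}\, e^{-\frac{\sum (x_i - \mu)^2}{2\sigma_0^2}}. \]
Similarly,
\[ L(\sigma_1^2) = \frac{1}{(\sqrt{2\pi})^n \sigma_1^n}\, e^{-\frac{\sum (x_i - \mu)^2}{2\sigma_1^2}}. \]
Therefore, the most powerful test is, reject H0 if,
\[ \frac{L(\sigma_0^2)}{L(\sigma_1^2)} = \left( \frac{\sigma_1}{\sigma_0} \right)^{n} \exp\left[ -\frac{\sigma_1^2 - \sigma_0^2}{2\sigma_1^2 \sigma_0^2} \sum (x_i - \mu)^2 \right] \le K \]
for some K.
Taking the natural logarithms, we have
\[ n \ln\left( \frac{\sigma_1}{\sigma_0} \right) - \frac{\sigma_1^2 - \sigma_0^2}{2\sigma_1^2 \sigma_0^2} \sum (x_i - \mu)^2 \le \ln K \]
or
\[ \sum (x_i - \mu)^2 \ge \left[ n \ln\left( \frac{\sigma_1}{\sigma_0} \right) - \ln K \right] \left( \frac{2\sigma_1^2 \sigma_0^2}{\sigma_1^2 - \sigma_0^2} \right) = C. \]
To find the rejection region for a fixed value of α, write the region as
\[ \frac{\sum (x_i - \mu)^2}{\sigma_0^2} \ge \frac{C}{\sigma_0^2} = C'. \]
Note that ∑(xi − μ)²/σ0² has a χ²-distribution with n degrees of freedom under H0. Because the same
rejection region (it does not depend upon the specific value of σ1² in the alternative) would be used for any
σ1² > σ0², the test is uniformly most powerful.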
The foregoing example shows that, in order to test for variance using a sample from a normal
distribution, we could use the chi-square table to obtain the critical value for the rejection region
given α.
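As an alternative to the chi-square table, the critical value C′ of Example 2.3 can be computed numerically. The sketch below evaluates the chi-square distribution function through the power series of the regularized lower incomplete gamma function and then finds the upper critical value by bisection. The function names and the sample in the usage lines are hypothetical; the method is one reasonable choice, not the only one.

```python
import math

def chi2_cdf(x, df):
    """P(chi-square with df degrees of freedom <= x), via the series
    gamma(a, z) = z^a e^(-z) * sum_k z^k / (a (a+1) ... (a+k)), a = df/2, z = x/2."""
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-16 * total:
        term *= z / (a + k)
        total += term
        k += 1
    return total * math.exp(-z + a * math.log(z) - math.lgamma(a))

def chi2_upper_critical(alpha, df):
    """Value c with P(chi-square_df > c) = alpha, found by bisection."""
    lo, hi = 0.0, df + 10.0
    while 1.0 - chi2_cdf(hi, df) > alpha:   # widen the bracket if needed
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if 1.0 - chi2_cdf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def mp_variance_test(xs, mu, sigma0_sq, alpha):
    """Known mean mu: reject H0: sigma^2 = sigma0_sq when sum((x - mu)^2)/sigma0_sq >= C'."""
    stat = sum((x - mu) ** 2 for x in xs) / sigma0_sq
    crit = chi2_upper_critical(alpha, len(xs))   # n degrees of freedom, since mu is known
    return stat, crit, stat >= crit

# Hypothetical data: 10 observations, known mean 0, H0: sigma^2 = 4.
stat, crit, reject = mp_variance_test([3.0] * 10, 0.0, 4.0, 0.05)
print(f"statistic = {stat:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```

Note that the degrees of freedom equal n here, not n − 1, because the mean is assumed known.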

EXERCISES 2
2.1.
Suppose X1, . . . , Xn is a random sample from a normal distribution with a known variance
of σ2 and an unknown mean of μ. Find the most powerful α-level test of H0 : μ = μ0 versus
Ha : μ = μa if (a) μ0 > μa, and (b) μa > μ0.
2.2.
Show that the most powerful test obtained in Example 2.1 is uniformly most powerful for
testing H0 : μ ≤ μ0 versus Ha : μ > μ0, but not uniformly most powerful for testing H0 : μ = μ0
versus Ha : μ ≠ μ0.
2.3.
Suppose X1, . . . , Xn is a random sample from a U(0, θ) distribution. Find the most powerful
α-level test for testing H0 : θ = θ0 versus Ha : θ = θ1, where θ0 < θ1.
2.4.
Let X1, . . . , Xn be a random sample from a geometric distribution with parameter p. Find the
most powerful test of H0 : p = p0 versus Ha : p = pa(> p0). Is this uniformly most powerful
test for H0 : p = p0 versus Ha : p > p0?
2.5.
Let X1, . . . , Xn be a random sample from a distribution having a pdf of
\[ f(y) = \begin{cases} \dfrac{2y}{\eta^2}\, e^{-y^2/\eta^2}, & \text{if } y > 0 \\ 0, & \text{otherwise.} \end{cases} \]
Find a uniformly most powerful test for testing H0 : η = η0 versus Ha : η < η0.
2.6.
Let X be a single observation from the pdf
\[ f(x) = \begin{cases} \theta x^{\theta - 1}, & 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases} \]
Find the most powerful test with a level of significance α = 0.01 to test H0 : θ = 3 versus
Ha : θ = 4.
2.7.
Let X1, . . . , Xn be a random sample from a Bernoulli distribution with parameter p. Find the
most powerful test of H0 : p = p0 versus Ha : p = pa, where pa > p0.
2.8.
Let X1, . . . , Xn be a random sample from a Poisson distribution with mean λ. Find a best
critical region for testing H0 : λ = 3 against Ha : λ = 6.
3 LIKELIHOOD RATIO TESTS
The Neyman-Pearson lemma provides a method for constructing most powerful tests for simple
hypotheses. We also have seen that in some instances when a hypothesis is not simple, it is possible
to find uniformly most powerful tests. In general, uniformly most powerful (UMP) tests do
not exist for composite hypotheses. As an example, consider the two-sided hypothesis, at level α,
given by
H0 : μ = μ0 vs. Ha : μ ≠ μ0
where μ is the mean of a normal population with known variance σ². If X̄ is the sample mean of a
random sample of size n, then as shown earlier, we can use the test statistic
\[ Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}. \]
For Ha : μ = μ1 > μ0, the rejection region for the most powerful test would be
Reject H0 if z > zα.
On the other hand for Ha : μ = μ2 < μ0, the rejection region for the most powerful test would be
Reject H0 if z < −zα.
Thus, the rejection region depends on the specific alternative. Consequently, the two-sided hypothesis
just given has no UMP test.
In this section, we shall study a general procedure that is applicable when one or both H0 and Ha are
composite. In fact, this procedure works for simple hypotheses as well. This method is based on the
maximum likelihood estimation and the ratio of likelihood functions used in the Neyman-Pearson
lemma. We assume that the pdf or pmf of the random variable X is f (x, θ), where θ can be one or
more unknown parameters. Let Θ represent the total parameter space, that is, the set of all possible
values of the parameter θ given by either H0 or Ha.
Consider the hypotheses
H0 : θ ∈ Θ0 vs. Ha : θ ∈ Θa = Θ − Θ0,
where θ is the unknown population parameter (or parameters) with values in Θ, and Θ0 is a subset
of Θ.
Let L(θ) be the likelihood function based on the sample X1, . . . , Xn. Now we define the likelihood
ratio corresponding to the hypotheses H0 and Ha. This ratio will be used as a test statistic for the
testing procedure that we develop in this section. This is a natural generalization of the ratio test used
in the Neyman-Pearson lemma when both hypotheses were simple.
Definition 3.1 The likelihood ratio λ is the ratio
\[ \lambda = \frac{\max_{\theta \in \Theta_0} L(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n)} = \frac{L_0^*}{L^*}. \]
We note that 0 ≤ λ ≤ 1. Because λ is the ratio of nonnegative functions, λ ≥ 0. Because Θ0 is a subset
of Θ, we know that max_{θ∈Θ0} L(θ) ≤ max_{θ∈Θ} L(θ). Hence, λ ≤ 1.
If the maximum of L in Θ0 is much smaller than the maximum of L in Θ, that is, if
λ is small, it would appear that the data X1, . . . , Xn do not support the null hypothesis θ ∈ Θ0. On
the other hand, if λ is close to 1, one could conclude that the data support the null hypothesis, H0.
Therefore, small values of λ would result in rejection of the null hypothesis, and large values nearer
to 1 will result in a decision in support of the null hypothesis.

For the evaluation of λ, it is important to note that max_{θ∈Θ} L(θ) = L(θ̂ml), where θ̂ml is the maximum
likelihood estimator of θ ∈ Θ, and max_{θ∈Θ0} L(θ) is the likelihood function with unknown parameters
replaced by their maximum likelihood estimators subject to the condition that θ ∈ Θ0. We can
summarize the likelihood ratio test as follows.
LIKELIHOOD RATIO TESTS (LRTs)
To test
H0 : θ ∈ Θ0 vs. Ha : θ ∈ Θa,
\[ \lambda = \frac{\max_{\theta \in \Theta_0} L(\theta; x_1, \ldots, x_n)}{\max_{\theta \in \Theta} L(\theta; x_1, \ldots, x_n)} = \frac{L_0^*}{L^*} \]
will be used as the test statistic.
The rejection region for the likelihood ratio test is given by
Reject H0 if λ ≤ K .
K is selected such that the test has the given significance level α.
Example 3.1
Let X1, . . . , Xn be a random sample from an N(μ, σ2). Assume that σ2 is known. We wish to test, at level
α, H0 : μ = μ0 vs. Ha : μ ≠ μ0. Find an appropriate likelihood ratio test.
Solution
We have seen that to test
H0 : μ = μ0 vs. Ha : μ ≠ μ0
there is no uniformly most powerful test for this case. The likelihood function is
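\[ L(\mu) = \left( \frac{1}{\sqrt{2\pi}\, \sigma} \right)^{n} e^{-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}}. \]
Here, Θ0 = {μ0} and Θa = R − {μ0}.
Hence,
\[ L_0^* = \max_{\mu = \mu_0} \left( \frac{1}{\sqrt{2\pi}\, \sigma} \right)^{n} e^{-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}} = \left( \frac{1}{\sqrt{2\pi}\, \sigma} \right)^{n} e^{-\frac{\sum_{i=1}^{n} (x_i - \mu_0)^2}{2\sigma^2}}. \]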

Similarly,
\[ L^* = \max_{-\infty < \mu < \infty} \left( \frac{1}{\sqrt{2\pi}\, \sigma} \right)^{n} e^{-\frac{\sum_{i=1}^{n} (x_i - \mu)^2}{2\sigma^2}}. \]
Because the only unknown parameter in the parameter space is μ, −∞ < μ < ∞, the maximum of the
likelihood function is achieved when μ equals its maximum likelihood estimator, that is,
μ̂ml = X̄.
Therefore, with a simple calculation we have
\[ \lambda = \frac{e^{-\sum_{i=1}^{n} (x_i - \mu_0)^2 / 2\sigma^2}}{e^{-\sum_{i=1}^{n} (x_i - \bar{x})^2 / 2\sigma^2}} = e^{-n(\bar{x} - \mu_0)^2 / 2\sigma^2}. \]
Thus, the likelihood ratio test has the rejection region
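Reject H0 if λ ≤ K,
which is equivalent to
\[ -\frac{n}{2\sigma^2} (\bar{X} - \mu_0)^2 \le \ln K \iff \frac{(\bar{X} - \mu_0)^2}{\sigma^2 / n} \ge -2 \ln K \iff \left| \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \right| \ge \sqrt{-2 \ln K} = c_1, \text{ say.} \]
Here −2 ln K is positive because K < 1.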
Note that we use the symbol ⇔ to mean "if and only if." We now compute c1. Under H0,
(X̄ − μ0)/(σ/√n) ∼ N(0, 1).
Observe that
\[ \alpha = P\left\{ \left| \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}} \right| \ge c_1 \right\} \]
gives a possible value of c1 as c1 = zα/2. Hence, the LRT for the given hypothesis is
Reject H0 if |X̄ − μ0|/(σ/√n) ≥ zα/2.
Thus, in this case, the likelihood ratio test is equivalent to the z-test for large random samples.
In fact, when both the hypotheses are simple, the likelihood ratio test is identical to the Neyman-
Pearson test. We can now summarize the procedure for the likelihood ratio test, LRT.

PROCEDURE FOR THE LIKELIHOOD RATIO TEST (LRT)
1. Find the largest value of the likelihood L(θ) for any θ ∈ Θ0 by finding the maximum likelihood
estimate within Θ0 and substituting back into the likelihood function.
2. Find the largest value of the likelihood L(θ) for any θ ∈ Θ by finding the maximum likelihood
estimate within Θ and substituting back into the likelihood function.
3. Form the ratio
λ = λ(x1, x2, …, xn) = [max of L(θ) in Θ0] / [max of L(θ) in Θ].
4. Determine a K so that the test has the desired probability of type I error, α.
5. Reject H0 if λ ≤ K .
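The five steps can be traced numerically for the normal-mean problem of Example 3.1. The sketch below is an illustration under stated assumptions: the sample, μ0 = 5, σ = 0.5, and the function names are hypothetical. It computes λ directly from the two maximized likelihoods and compares it with K = exp(−c1²/2), where c1 = zα/2, so that λ ≤ K holds exactly when |z| ≥ zα/2.

```python
import math
from statistics import NormalDist, fmean

def log_likelihood(xs, mu, sigma):
    """Log of the N(mu, sigma^2) likelihood for the sample xs."""
    n = len(xs)
    return (-n * math.log(math.sqrt(2 * math.pi) * sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))

def lrt_normal_mean(xs, mu0, sigma, alpha):
    """LRT of H0: mu = mu0 versus Ha: mu != mu0 with known sigma."""
    n = len(xs)
    xbar = fmean(xs)                                  # MLE of mu over the whole real line
    log_l0 = log_likelihood(xs, mu0, sigma)           # step 1: maximum over {mu0}
    log_l = log_likelihood(xs, xbar, sigma)           # step 2: unrestricted maximum
    lam = math.exp(log_l0 - log_l)                    # step 3: likelihood ratio
    c1 = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    k = math.exp(-c1 ** 2 / 2)                        # step 4: K giving size alpha
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return lam, k, z, c1, lam <= k                    # step 5: reject if lam <= K

# Hypothetical sample.
sample = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
lam, k, z, c1, reject = lrt_normal_mean(sample, mu0=5.0, sigma=0.5, alpha=0.05)
print(f"lambda = {lam:.4f}, K = {k:.4f}, z = {z:.3f}, reject H0: {reject}")
```

Working with log-likelihoods avoids underflow when n is large; the ratio is exponentiated only at the end.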
In the next example, we find a LRT for a testing problem when both H0 and Ha are simple.
Example 3.2
Machine 1 produces 5% defectives. Machine 2 produces 10% defectives. Ten items produced by each of
the machines are sampled randomly; X = number of defectives. Let θ be the true proportion of defectives.
Test H0 : θ = 0.05 versus Ha : θ = 0.1. Use α = 0.05.
Solution
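We need to test H0 : θ = 0.05 vs. Ha : θ = 0.1. Let
\[ L(\theta) = \begin{cases} \dbinom{10}{x} (0.05)^x (0.95)^{10 - x}, & \text{if } \theta = 0.05 \\ \dbinom{10}{x} (0.1)^x (0.90)^{10 - x}, & \text{if } \theta = 0.10, \end{cases} \]
and
\[ L_1 = L(0.05) = \binom{10}{x} (0.05)^x (0.95)^{10 - x} \quad \text{and} \quad L_2 = L(0.1) = \binom{10}{x} (0.1)^x (0.90)^{10 - x}. \]
Thus, we have
\[ \frac{L_1}{L_2} = \frac{(0.05)^x (0.95)^{10 - x}}{(0.1)^x (0.9)^{10 - x}} = \left( \frac{1}{2} \right)^{x} \left( \frac{19}{18} \right)^{10 - x}. \]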

The ratio
\[ \lambda = \frac{L_1}{\max(L_1, L_2)}. \]
Note that if max(L1, L2) = L1, then λ = 1. Because we want to reject for small values of λ, we take max(L1, L2) =
L2, and we reject H0 if (L1/L2) ≤ K or, equivalently, if (L2/L1) exceeds a (different) constant, again denoted K
(note that L2/L1 = 2^x (18/19)^(10−x)).
That is, reject H0 if
\[ 2^{x} \left( \frac{18}{19} \right)^{10 - x} > K \iff \left( \frac{2 \cdot 19}{18} \right)^{x} > K_1 \iff \left( \frac{19}{9} \right)^{x} > K_1. \]
Hence, reject H0 if X > C, where C is chosen so that P (X > C | H0 : θ = 0.05) ≤ 0.05.
Using the binomial tables, we have
P (X > 2 | θ = 0.05) = 0.0116
and
P (X ≥ 2 | θ = 0.05) = 0.0862.
Reject H0 if X > 2. If we want α to be exactly 0.05, we have to use a randomized test: in addition, reject with
probability (0.05 − 0.0116)/(0.0862 − 0.0116) = 0.0384/0.0746 ≈ 0.5147 if X = 2.
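The cutoff and the randomization probability in Example 3.2 can be checked with exact binomial arithmetic. This is a minimal sketch; the helper names are illustrative, and because it works with unrounded probabilities its output may differ in the third or fourth decimal from values read off a rounded binomial table.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def upper_tail(c, n, p):
    """P(X > c) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(k, n, p) for k in range(c + 1, n + 1))

n, p0, alpha = 10, 0.05, 0.05

# Smallest C with P(X > C | theta = 0.05) <= alpha.
c = 0
while upper_tail(c, n, p0) > alpha:
    c += 1

size = upper_tail(c, n, p0)                   # size of the nonrandomized test "reject if X > C"
gamma = (alpha - size) / binom_pmf(c, n, p0)  # probability of rejecting when X = C

print(f"C = {c}, P(X > C) = {size:.4f}, reject with probability {gamma:.4f} when X = {c}")
```

With this choice the overall rejection probability under H0 is P(X > C) + γ P(X = C) = α exactly.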
The likelihood ratio tests do not always produce a test statistic with a known probability distribution
such as the z-statistic of Example 3.1. If we have a large sample size, then we can obtain an
approximation to the distribution of the statistic λ, which is beyond the level of this book.
EXERCISES 3
3.1. Let X1, . . . , Xn be a random sample from an N(μ, σ2). Assume that σ2 is unknown. We wish
to test, at level α, H0 : μ = μ0 vs Ha : μ < μ0. Find an appropriate likelihood ratio test.
3.2.
Let X1, . . . , Xn be a random sample from an N(μ, σ²). Assume that both μ and σ² are
unknown. We wish to test, at level α, H0 : σ² = σ0² vs. Ha : σ² > σ0². Find an appropriate
likelihood ratio test.
3.3.
Let X1, . . . , Xn be a random sample from an N(μ1, σ²) and let Y1, Y2, . . . , Yn be an independent
sample from an N(μ2, σ²), where σ² is unknown. We wish to test, at level α, H0 : μ1 =
μ2 vs. Ha : μ1 ≠ μ2. Find an appropriate likelihood ratio test.
3.4.
Let X1, . . . , Xn be a sample from a Poisson distribution with parameter λ. Show that a likelihood
ratio test of H0 : λ = λ0 vs. Ha : λ ≠ λ0 rejects the null hypothesis if X̄ ≥ m1 or
X̄ ≤ m2.

3.5.
Let X1, . . . , Xn be a sample from an exponential distribution with parameter θ. Show that a
likelihood ratio test of H0 : θ = θ0 vs. Ha : θ ≠ θ0 rejects the null hypothesis if
∑_{i=1}^{n} Xi ≥ m1 or ∑_{i=1}^{n} Xi ≤ m2.
3.6.
A clinical oncology program developed a set of guidelines for their cancer patients to follow.
It is believed that the proportion of patients who are still living after 24 months is greater
for those who follow the guidelines. Of the 40 patients who followed the guidelines, 30 are
still living after 24 months, whereas of 32 patients who did not follow the guidelines, 21 are
living after 24 months. Find a likelihood ratio test at α = 0.01 to decide whether the program
is effective.
4 HYPOTHESES FOR A SINGLE PARAMETER
In this section, we first introduce the concept of p-value. After that, we study hypothesis testing
concerning a single parameter.
4.1 The p-Value
In hypothesis testing, the choice of the value of α is somewhat arbitrary. For the same data, if the test
is based on two different values of α, the conclusions could be different. Many statisticians prefer to
compute the so-called p-value, which is calculated based on the observed test statistic. For computing
the p-value, it is not necessary to specify a value of α. We can use the given data to obtain the
p-value.
Definition 4.1 Corresponding to an observed value of a test statistic, the p-value (or attained
significance level) is the lowest level of significance at which the null hypothesis would have been
rejected.
For example, if we are testing a given hypothesis with α = 0.05 and we make a decision to reject H0
and we proceeded to calculate the p-value equal to 0.03, this means that we could have used an α as
low as 0.03 and still maintain the same decision, rejecting H0.
Based on the alternative hypothesis, one can use the following steps to compute the p-value.
STEPS TO FIND THE p-VALUE
1. Let TS be the test statistic.
2. Compute the value of TS using the sample X1, . . . , Xn . Say it is a.
3. The p-value is given by
\[ p\text{-value} = \begin{cases} P(TS < a \mid H_0), & \text{if lower tail test} \\ P(TS > a \mid H_0), & \text{if upper tail test} \\ P(|TS| > |a| \mid H_0), & \text{if two tail test.} \end{cases} \]

Example 4.1
To test H0 : μ = 0 vs. Ha : μ ≠ 0, suppose that the test statistic Z results in a computed value of 1.58.
Then, the p-value = P (|Z| > 1.58) = 2(0.0571) = 0.1142. That is, we must allow a type I error probability of
at least 0.1142 in order to reject H0. Also, if Ha : μ > 0, then the p-value would be P (Z > 1.58) = 0.0571.
In this case we must have an α of at least 0.0571 in order to reject H0.
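The three cases in the p-value steps can be evaluated without a normal table when the test statistic is standard normal under H0. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name and the tail labels are illustrative choices.

```python
from statistics import NormalDist

def z_p_value(z_obs, tail):
    """p-value for an observed z statistic; tail is 'lower', 'upper', or 'two'."""
    phi = NormalDist().cdf                  # standard normal distribution function
    if tail == "lower":
        return phi(z_obs)                   # P(Z < a)
    if tail == "upper":
        return 1.0 - phi(z_obs)             # P(Z > a)
    if tail == "two":
        return 2.0 * (1.0 - phi(abs(z_obs)))  # P(|Z| > |a|)
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Observed value from Example 4.1.
p_two = z_p_value(1.58, "two")
p_upper = z_p_value(1.58, "upper")
print(f"two-tailed p-value = {p_two:.4f}, upper-tailed p-value = {p_upper:.4f}")
```

The two-tailed value is exactly twice the upper-tailed value because the standard normal density is symmetric about zero.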
The p-value can be thought of as a measure of support for the null hypothesis: The lower its value,
the lower the support. Typically one decides that the support for H0 is insufficient when the p-value
drops below a particular threshold, which is the significance level of the test.
REPORTING TEST RESULT AS p-VALUES
1. Choose the maximum value of α that you are willing to tolerate.
2. If the p-value of the test is less than the maximum value of α, reject H0.
If the exact p-value cannot be found, one can give an interval in which the p-value can lie. For example,
if the test is significant at α = 0.05 but not significant for α = 0.025, report that 0.025 ≤ p-value ≤
0.05. So for α > 0.05, reject H0, and for α < 0.025, do not reject H0.
In another interpretation, 1−(p-value) is considered as an index of the strength of the evidence against
the null hypothesis provided by the data. It is clear that the value of this index lies in the interval
[0, 1]. If the p-value is 0.02, the value of index is 0.98, supporting the rejection of the null hypothesis.
Not only do p-values provide us with a yes or no answer, they provide a sense of the strength of the
evidence against the null hypothesis. The lower the p-value, the stronger the evidence. Thus, in any
test, reporting the p-value of the test is a good practice.
Because most of the outputs from statistical software used for hypothesis testing include the p-value,
the p-value approach to hypothesis testing is becoming more and more popular. In this approach,
the decision of the test is made in the following way. If the value of α is given, and if the p-value of the
test is less than the value of α, we will reject H0. If the value of α is not given and the p-value associated
with the test is small (usually set at p-value < 0.05), there is evidence to reject the null hypothesis in
favor of the alternative. In other words, there is evidence that the value of the true parameter (such as
the population mean) is significantly different (greater, or lesser) than the hypothesized value. If the
p-value associated with the test is not small (p > 0.05), we conclude that there is not enough evidence
to reject the null hypothesis. In most of the examples in this chapter, we give both the rejection region
and p-value approaches.
Example 4.2
The management of a local health club claims that its members lose on the average 15 pounds or more
within the first 3 months after joining the club. To check this claim, a consumer agency took a random
sample of 45 members of this health club and found that they lost an average of 13.8 pounds within the
first 3 months of membership, with a sample standard deviation of 4.2 pounds.

(a) Find the p-value for this test.
(b) Based on the p-value in (a), would you reject the null hypothesis at α = 0.01?
Solution
(a) Let μ be the true mean weight loss in pounds within the first 3 months of membership in this club.
Then we have to test the hypothesis
H0 : μ = 15 versus Ha : μ < 15
Here n = 45, x̄ = 13.8, and s = 4.2. Because n = 45 > 30, we can use normal approximation.
Hence, the test statistic is
\[ z = \frac{13.8 - 15}{4.2 / \sqrt{45}} = -1.9166 \]
and
p-value = P (Z < −1.9166) ≃ P (Z < −1.92) = 0.0274.
Thus, we can use an α as small as 0.0274 and still reject H0.
(b) No. Because the p-value = 0.0274 is greater than α = 0.01, one cannot reject H0.
In any hypothesis testing, after an experimenter determines the objective of an experiment and decides
on the type of data to be collected, we recommend the following step-by-step procedure for hypothesis
testing.
STEPS IN ANY HYPOTHESIS TESTING PROBLEM
1. State the alternative hypothesis, Ha (what is believed to be true).
2. State the null hypothesis, H0 (what is doubted to be true).
3. Decide on a level of significance α.
4. Choose an appropriate TS and compute the observed test statistic.
5. Using the distribution of TS and α, determine the rejection region(s) (RR).
6. Conclusion: If the observed test statistic falls in the RR, reject H0 and conclude that based on the
sample information, we are (1 − α)100% confident that Ha is true. Otherwise, conclude that there is
not sufficient evidence to reject H0. In all the applied problems, interpret the meaning of your
decision.
7. State any assumptions you made in testing the given hypothesis.
8. Compute the p–value from the null distribution of the test statistic and interpret it.
4.2 Hypothesis Testing for a Single Parameter
Now we study the testing of a hypothesis concerning a single parameter, θ, based on a random sample
X1,…,Xn. Let θ̂ be the sample statistic. First, we deal with tests for the population mean μ for large
and small samples. Next, we study procedures for testing the population variance σ2. We conclude
the section by studying a test procedure for the true proportion p.

To test the hypothesis H0 : μ = μ0 concerning the true population mean μ, when we have a large
sample (n ≥ 30) we use the test statistic Z given by
\[ Z = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}, \]
where S is the sample standard deviation and μ0 is the claimed mean under H0 (if the population
variance is known, we replace S with σ).
For a small random sample (n < 30), the test statistic is
\[ T = \frac{\bar{X} - \mu_0}{S / \sqrt{n}}, \]
where μ0 is the claimed value of the true mean, and X̄ and S are the sample mean and standard
deviation, respectively. Note that we are using the lowercase letters, such as z and t, to represent the
observed values of the test statistics Z and T , respectively.
In practice, with raw data, it is important to verify the assumptions. For example, in the small sample
case, it is important to check for normality by using normal plots. If this assumption is not satisfied,
the nonparametric methods described in Chapter 12 may be more appropriate. In addition, because
sample statistics such as X̄ and S will be greatly affected by the presence of outliers, drawing a box
plot to check for outliers is a basic practice we should incorporate in our analysis.
We now summarize the typical test of hypothesis for tests concerning population (true) mean.
In order to compute the observed test statistic, z in the large sample case and t in the small sample
case, calculate the values of z = (x̄ − μ0)/(s/√n) and t = (x̄ − μ0)/(s/√n), respectively.
SUMMARY OF HYPOTHESIS TESTS FOR μ
Large Sample (n ≥ 30)
To test H0 : μ = μ0 versus
Ha : μ > μ0, upper tail test
Ha : μ < μ0, lower tail test
Ha : μ ≠ μ0, two-tailed test
Test statistic: Z = (X̄ − μ0)/(σ/√n). Replace σ by S, if σ is unknown.
Rejection region:
z > zα, upper tail RR
z < −zα, lower tail RR
|z| > zα/2, two tail RR
Assumption: n ≥ 30
Small Sample (n < 30)
To test H0 : μ = μ0 versus
Ha : μ > μ0, upper tail test
Ha : μ < μ0, lower tail test
Ha : μ ≠ μ0, two-tailed test
Test statistic: T = (X̄ − μ0)/(S/√n)
Rejection region:
t > tα,n−1, upper tail RR
t < −tα,n−1, lower tail RR
|t| > tα/2,n−1, two tail RR
Assumption: Random sample comes from a normal population
Decision: Reject H0, if the observed test statistic falls in the RR and conclude that Ha is true with
(1 − α)100% confidence. Otherwise, do not reject H0, because there is not enough evidence to conclude that
Ha is true for the given α, and more experiments may be needed.
Example 4.3
It is claimed that sports-car owners drive on the average 18,000 miles per year. A consumer firm believes that
the average mileage is probably lower. To check, the consumer firm obtained information from 40 randomly
selected sports-car owners that resulted in a sample mean of 17,463 miles with a sample standard deviation
of 1348 miles. What can we conclude about this claim? Use α = 0.01.
Solution
Let μ be the true population mean. We can formulate the hypotheses as H0 : μ = 18,000 versus
Ha : μ < 18,000.
The observed test statistic (for n ≥ 30) is
\[ z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{17{,}463 - 18{,}000}{1348 / \sqrt{40}} = -2.52. \]
Rejection region is {z < −z0.01} = {z < −2.33}.
Decision: Because z = −2.52 is less than −2.33, the null hypothesis is rejected at α = 0.01. There is
sufficient evidence to conclude that the mean mileage on sports cars is less than 18,000 miles per year.
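The rejection-region calculation of Example 4.3 can be reproduced from the summary statistics alone. In this sketch the critical value comes from `NormalDist().inv_cdf` instead of a table; the function name and the tail labels are illustrative assumptions.

```python
import math
from statistics import NormalDist

def large_sample_z_test(xbar, mu0, s, n, alpha, tail):
    """Large-sample z test from summary statistics.
    Returns (observed z, critical value, reject H0?)."""
    z = (xbar - mu0) / (s / math.sqrt(n))
    std_normal = NormalDist()
    if tail == "lower":
        crit = std_normal.inv_cdf(alpha)            # -z_alpha
        return z, crit, z < crit
    if tail == "upper":
        crit = std_normal.inv_cdf(1 - alpha)        # z_alpha
        return z, crit, z > crit
    if tail == "two":
        crit = std_normal.inv_cdf(1 - alpha / 2)    # z_{alpha/2}
        return z, crit, abs(z) > crit
    raise ValueError("tail must be 'lower', 'upper', or 'two'")

# Summary statistics from Example 4.3 (lower tail test at alpha = 0.01).
z, crit, reject = large_sample_z_test(17463, 18000, 1348, 40, 0.01, "lower")
print(f"z = {z:.2f}, critical value = {crit:.2f}, reject H0: {reject}")
```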
Example 4.4
In a frequently traveled stretch of the I-75 highway, where the posted speed is 70 mph, it is thought that
people travel on the average of at least 75 mph. To check this claim, the following radar measurements of
the speeds (in mph) are obtained for 10 vehicles traveling on this stretch of the interstate highway.
66 74 79 80 69 77 78 65 79 81
Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this
stretch of highway is at most 75 mph? Test the appropriate hypothesis using α = 0.01. Draw a box plot and
normal plot for this data, and comment.
Solution
We need to test
H0 : μ = 75 vs. Ha : μ > 75

■ FIGURE 1 Box plot of speed data.
For this sample, the sample mean is x̄ = 74.8 mph and the sample standard deviation is s = 5.9963 mph. Hence,
the observed test statistic is
\[ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{74.8 - 75}{5.9963 / \sqrt{10}} = -0.1055. \]
From the t-table, t0.01,9 = 2.821. Hence, the rejection region is {t > 2.821}.
Because t = −0.1055 does not fall in the rejection region, we do not reject the null hypothesis at α = 0.01.
Note that we assumed that the vehicles were randomly selected and that the collected data follow the normal
distribution. Because of the small sample size (n < 30), we use the t-test.
Figures 1 and 2 are the box plot and the normal plot of the data, respectively.
■ FIGURE 2 Normal probability plot for speed (ML estimates: mean 74.8, standard deviation 5.68858).
The box plot suggests that there are no outliers present. However, the normal plot indicates that the normality
assumption for this data set is not justified. Hence, it may be more appropriate to do a nonparametric test.
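The summary statistics and the observed t value in Example 4.4 can be recomputed from the raw speeds. This minimal sketch uses only the Python standard library; the critical value 2.821 is taken from the t-table as quoted in the example, not computed.

```python
import math
from statistics import mean, stdev

speeds = [66, 74, 79, 80, 69, 77, 78, 65, 79, 81]   # radar measurements (mph)
mu0 = 75
t_crit = 2.821            # t_{0.01, 9}, as read from the t-table in the example

n = len(speeds)
xbar = mean(speeds)       # sample mean
s = stdev(speeds)         # sample standard deviation (divisor n - 1)
t_obs = (xbar - mu0) / (s / math.sqrt(n))
reject = t_obs > t_crit   # upper tail test, Ha: mu > 75

print(f"mean = {xbar:.1f}, s = {s:.4f}, t = {t_obs:.4f}, reject H0: {reject}")
```

Note that `stdev` uses the divisor n − 1, whereas the "ML estimate" shown with the normal probability plot uses the divisor n, which is why that figure reports a smaller standard deviation.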

Example 4.5
In attempting to control the strength of the wastes discharged into a nearby river, an industrial firm has
taken a number of restorative measures. The firm believes that they have lowered the oxygen consuming
power of their wastes from a previous mean of 450 manganate in parts per million. To test this belief,
readings are taken on n = 20 successive days. A sample mean of 312.5 and the sample standard deviation
106.23 are obtained. Assume that these 20 values can be treated as a random sample from a normal
population. Test the appropriate hypothesis. Use α = 0.05.
Solution
Here we need to test the following hypothesis:
H0 : μ = 450 vs. Ha : μ < 450
Given n = 20, x̄ = 312.5, and s = 106.23, the observed test statistic is
\[ t = \frac{312.5 - 450}{106.23 / \sqrt{20}} = -5.79. \]
The rejection region for α = 0.05 and with 19 degrees of freedom is the set of t-values such that
{t < −t0.05,19} = {t < −1.729}.
Decision: Because t = −5.79 is less than −1.729, reject H0. There is sufficient evidence to confirm the
firm’s belief.
For large random samples, the following procedure is used to perform tests of hypotheses about the
population proportion, p.
Example 4.6
A machine is considered to be unsatisfactory if it produces more than 8% defectives. It is suspected that the
machine is unsatisfactory. A random sample of 120 items produced by the machine contains 14 defectives.
Does the sample evidence support the claim that the machine is unsatisfactory? Use α = 0.01.
Solution
Let Y be the number of observed defectives. This follows a binomial distribution. However, because np0 and
nq0 are greater than 5, we can use a normal approximation to the binomial to test the hypothesis. So we
need to test H0 : p = 0.08 versus Ha : p > 0.08. Let the point estimate of p be p̂ = (Y /n) = 14/120 ≈ 0.117, the
sample proportion. Then the value of the TS is
\[ z = \frac{\hat{p} - p_0}{\sqrt{p_0 q_0 / n}} = \frac{0.117 - 0.08}{\sqrt{(0.08)(0.92)/120}} \approx 1.49. \]
For α = 0.01, z0.01 = 2.33. Hence, the rejection region is {z > 2.33}.
Decision: Because 1.49 is not greater than 2.33, we do not reject H0. We conclude that the evidence does
not support the claim that the machine is unsatisfactory.
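The test statistic of Example 4.6 is easy to get wrong by hand, because the sample size n sits inside the square root of the standard error. The following sketch recomputes it with the unrounded p̂ = 14/120, so the result can differ slightly in the second decimal from a hand calculation based on a rounded p̂. The function name is an illustrative choice.

```python
import math
from statistics import NormalDist

def proportion_z(y, n, p0):
    """Large-sample z statistic for H0: p = p0, given y successes in n trials."""
    p_hat = y / n
    std_err = math.sqrt(p0 * (1 - p0) / n)   # standard error under H0
    return (p_hat - p0) / std_err

y, n, p0, alpha = 14, 120, 0.08, 0.01

# Rule of thumb from the text: n*p0 and n*(1 - p0) should both exceed 5.
approximation_ok = n * p0 > 5 and n * (1 - p0) > 5

z = proportion_z(y, n, p0)
z_crit = NormalDist().inv_cdf(1 - alpha)     # z_{0.01}
reject = z > z_crit                          # upper tail test, Ha: p > 0.08

print(f"z = {z:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```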
SUMMARY OF LARGE SAMPLE HYPOTHESIS TEST FOR p
To test H0 : p = p0 versus
Ha : p > p0, upper tail test
Ha : p < p0, lower tail test
Ha : p ≠ p0, two-tailed test.
Test statistic:
\[ Z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}}, \quad \text{where } \sigma_{\hat{p}} = \sqrt{\frac{p_0 q_0}{n}} \text{ and } q_0 = 1 - p_0. \]
Rejection region:
z > zα, upper tail RR
z < −zα, lower tail RR
|z| > zα/2, two tail RR,
where z is the observed test statistic.
Assumption: n is large. A good rule of thumb is to use the normal approximation to the binomial
distribution only when np0 and n(1 − p0) are both greater than 5.
Decision: Reject H0, if the observed test statistic falls in the RR and conclude that Ha is true with
(1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to
conclude that Ha is true for given α and more data are needed.
Note that this is an approximate test, and the approximation improves as the sample size increases.
Now we give the procedure for testing the population variance when the samples come from a normal
population.
SUMMARY OF HYPOTHESIS TEST FOR THE VARIANCE σ2
To test H0 : σ² = σ0² versus
Ha : σ² > σ0², upper tail test
Ha : σ² < σ0², lower tail test
Ha : σ² ≠ σ0², two-tailed test.
Test statistic:
\[ \chi^2 = \frac{(n - 1) S^2}{\sigma_0^2}, \]
where S² is the sample variance.
Observed value of test statistic: (n − 1)s²/σ0².
Rejection region:
χ² > χ²α,n−1, upper tail RR
χ² < χ²1−α,n−1, lower tail RR
χ² > χ²α/2,n−1 or χ² < χ²1−α/2,n−1, two tail RR,
where χ²α,n−1 is such that the area under the chi-square distribution with (n − 1) degrees of freedom to its
right is equal to α.
Assumption: Sample comes from a normal population.
Decision: Reject H0, if the observed test statistic falls in the RR and conclude that Ha is true with
(1 − α)100% confidence. Otherwise, do not reject H0 because there is not enough evidence to conclude
that Ha is true for given α and more data are needed.
Because the chi-square distribution is not symmetric, the “equal tails” used for the two-sided alternative
may not be the best procedure. However, in real-world problems we seldom use a two tail test
for the population variance.
Example 4.7
A physician claims that the variance in cholesterol levels of adult men in a certain laboratory is at least 100.
A random sample of 25 adult males from this laboratory produced a sample standard deviation of
cholesterol levels as 12. Test the physician’s claim at 5% level of significance.
Solution
To test
H0 : σ² = 100 versus Ha : σ² < 100
for α = 0.05, and 24 degrees of freedom, the rejection region is
RR = {χ² < χ²1−α,n−1} = {χ² < 13.848}.
The observed value of the TS is
\[ \chi^2 = \frac{(n - 1) s^2}{\sigma_0^2} = \frac{(24)(144)}{100} = 34.56. \]

Because the value of the test statistic does not fall in the rejection region, we cannot reject H0 at 5% level
of significance. Here, we assumed that the 25 cholesterol measurements follow the normal distribution.
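For Example 4.7 the degrees of freedom (24) are even, and in that case the chi-square distribution function has a closed form: for df = 2m, P(χ² > x) equals the probability that a Poisson variable with mean x/2 is at most m − 1. The sketch below relies on that identity to obtain the lower-tail p-value; the function name is illustrative, and the shortcut does not apply to odd degrees of freedom.

```python
import math

def chi2_cdf_even_df(x, df):
    """P(chi-square_df <= x) for even df, using
    P(chi-square_{2m} > x) = sum_{k=0}^{m-1} e^{-x/2} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    if x <= 0:
        return 0.0
    half = x / 2.0
    term = 1.0                      # k = 0 term of the Poisson sum
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return 1.0 - math.exp(-half) * total

# Summary statistics from Example 4.7.
n, s, sigma0_sq, alpha = 25, 12.0, 100.0, 0.05
chi2_obs = (n - 1) * s ** 2 / sigma0_sq           # observed test statistic
p_lower = chi2_cdf_even_df(chi2_obs, n - 1)       # lower-tail p-value, Ha: sigma^2 < 100
reject = p_lower < alpha

print(f"chi-square = {chi2_obs:.2f}, lower-tail p-value = {p_lower:.4f}, reject H0: {reject}")
```

Rejecting when the lower-tail p-value is below α is equivalent to rejecting when the observed statistic falls below the lower critical value χ²1−α,n−1.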
EXERCISES 4
4.1.
A random sample of 50 measurements resulted in a sample mean of 62 with a sample
standard deviation 8. It is claimed that the true population mean is at least 64.
(a) Is there sufficient evidence to refute the claim at the 2% level of significance?
(b) What is the p-value?
(c) What is the smallest value of α for which the claim will be rejected?
4.2.
A machine in a certain factory must be repaired if it produces more than 12% defectives
among the large lot of items it produces in a week. A random sample of 175 items from
a week’s production contains 45 defectives, and it is decided that the machine must be
repaired.
(a) Does the sample evidence support this decision? Use α = 0.02.
(b) Compute the p-value.
4.3.
A random sample of 78 observations produced the following sums:
∑
∑
xi = 22.8,
(xi − x)2 = 2.05.
i=1
i=1
(a) Test the null hypothesis that μ = 0.45 against the alternative hypothesis that μ < 0.45
using α = 0.01. Also find the p-value.
(b) Test the null hypothesis that μ = 0.45 against the alternative hypothesis that μ ≠ 0.45
using α = 0.01. Also find the p-value.
(c) What assumptions did you make for solving (a) and (b)?
4.4.
Consider the test H0 : μ = 35 vs. Ha : μ > 35 for a population that is normally distributed.
(a) A random sample of 18 observations taken from this population produced a sample
mean of 40 and a sample standard deviation of 5. Using α = 0.025, would you reject
the null hypothesis?
(b) Another random sample of 18 observations produced a sample mean of 36.8 and
a sample standard deviation of 6.9. Using α = 0.025, would you reject the null
hypothesis?
(c) Compare and discuss the decisions of parts (a) and (b).
4.5.
According to the information obtained from a large university, professors there earned an
average annual salary of $55,648 in 1998. A recent random sample of 15 professors from
this university showed that they earn an average annual salary of $58,800 with a sample
standard deviation of $8300. Assume that the annual salaries of all the professors in this
university are normally distributed.

(a) Suppose the probability of making a type I error is chosen to be zero. Without perform-
ing all the steps of test of hypothesis, would you accept or reject the null hypothesis
that the current mean annual salary of all professors at this university is $55,648?
(b) Using the 1% significance level, can you conclude that the current mean annual salary
of professors at this university is more than $55,648?
4.6.
A check-cashing service company found that approximately 7% of all checks submitted to the
service were without sufficient funds. After instituting a random check verification system to
reduce its losses, the service company found that only 70 were rejected in a random sample of
1125 that were cashed. Is there sufficient evidence that the check verification system reduced
the proportion of bad checks at α = 0.01? What is the p-value associated with the test? What
would you conclude at the α = 0.05 level?
4.7.
A manufacturer of washers provides a particular model in one of three colors, white, black,
or ivory. Of the first 1500 washers sold, it is noticed that 550 were of ivory color. Would
you conclude that customers have a preference for the ivory color? Justify your answer. Use
α = 0.01.
4.8.
A test of the breaking strength of six ropes manufactured by a company showed a mean
breaking strength of 6425 lb and a standard deviation of 120 lb. However, the manufacturer
claimed a mean breaking strength of 7500 lb.
(a) Can we support the manufacturer’s claim at a level of significance of 0.10?
(b) Compute the p-value. What assumptions did you make for this problem?
4.9.
A sample of 10 observations taken from a normally distributed population produced the
following data:
44  31  52  48  46  39  43  36  41  49
(a) Test the hypothesis H0 : μ = 44 vs. Ha : μ ≠ 44 using α = 0.10. Draw a box plot
and normal plot for this data, and comment.
(b) Find a 90% confidence interval for the population mean μ.
(c) Discuss the meanings of (a) and (b). What can we conclude?
4.10.
The principal of a charter school in Tampa believes that the IQs of its students are above
the national average of 100. From the past experience, IQ is normally distributed with a
standard deviation of 10. A random sample of 20 students is selected from this school and
their IQs are observed. The following are the observed values.
 95   91  110   93  133  119  113  107  110   89
113  100  100  124  116  113  110  106  115  113
(a) Test for the normality of the data.
(b) Do the IQs of students at the school run above the national average at α = 0.01?
4.11.
In order to find out whether children with chronic diarrhea have the same average hemoglobin
level (Hb) that is normally seen in healthy children in the same area, a random
sample of 10 children with chronic diarrhea are selected and their Hb levels (g/dL) are
obtained as follows.
12.3  11.4  14.2  15.3  14.8  13.8  11.1  15.1  15.8  13.2
Do the data provide sufficient evidence to indicate that the mean Hb level for children with
chronic diarrhea is less than that of the normal value of 14.6 g/dL? Test the appropriate
hypothesis using α = 0.01. Draw a box plot and normal plot for this data, and comment.
4.12.
A company that manufactures precision special-alloy steel shafts claims that the variance in
the diameters of shafts is no more than 0.0003. A random sample of 10 shafts gave a sample
variance of 0.0002 At the 5% level of significance, test whether the company’s claim can
be substantiated.
4.13.
It was claimed that the average annual expenditures per consumer unit had continued to
rise, as measured by the Consumer Price Index annual averages (Bureau of Labor Statistics
report, 1995). To test this claim, 100 consumer units were randomly selected in 1995 and
found to have an average annual expenditure of $32,277 with a standard deviation of $1200.
Assuming that the average annual expenditure of all consumer units was $30,692 in 1994,
test at the 5% significance level whether the annual expenditure per consumer unit had
really increased from 1994 to 1995.
4.14.
It is claimed that two of three Americans say that the chances of world peace are seriously
threatened by the nuclear capabilities of other countries. If in a random sample of 400
Americans, it is found that only 252 hold this view, do you think the claim is correct? Use
α = 0.05. State any assumptions you make in solving this problem.
4.15.
According to the Bureau of Labor Statistics (1996), the average price of a gallon of gasoline
in all U.S. cities in January 1996 was $1.129. A later random sample in
24 cities found the mean price to be $1.24 with a standard deviation of 0.01. Test at α = 0.05
to see whether the average price of a gallon of gas in the cities had recently changed.
4.16.
A manufacturer claims that the mean life of batteries manufactured by his company is at
least 44 months. A random sample of 40 of these batteries was tested, resulting in a sample
mean life of 41 months with a sample standard deviation of 16 months. Test at α = 0.01
whether the manufacturer’s claim is correct.
5 TESTING OF HYPOTHESES FOR TWO SAMPLES
In this section we study the hypothesis testing procedures for comparing the means and variances
of two populations. For example, suppose that we want to determine whether a particular drug is
effective for a certain illness. The sample subjects will be randomly selected from a large pool of
people with that particular illness and will be assigned randomly to the two groups. To one group
we will administer a placebo; to the other we will administer the drug of interest. After a period of
time, we measure a physical characteristic, say the blood pressure, of each subject that is an indicator
of the severity of the illness. The question is whether the drug can be considered effective on the
population from which our samples have been selected. We will consider the cases of independent
and dependent samples.

5.1 Independent Samples
Two random samples are drawn independently of each other from two populations, and the sample
information is obtained. We are interested in testing a hypothesis about the difference of the true
means. Let $X_{11}, \ldots, X_{1n_1}$ be a random sample from population 1 with mean $\mu_1$ and
variance $\sigma_1^2$, and $X_{21}, \ldots, X_{2n_2}$ be a random sample from population 2 with mean
$\mu_2$ and variance $\sigma_2^2$. Let $\bar{X}_i$, $i = 1, 2$, represent the respective sample means
and $S_i^2$, $i = 1, 2$, represent the sample variances. We shall consider the following three cases
in testing hypotheses about $\mu_1$ and $\mu_2$: (i) when $\sigma_1^2$ and $\sigma_2^2$ are known,
(ii) when $\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 \ge 30$ and $n_2 \ge 30$, and (iii) when
$\sigma_1^2$ and $\sigma_2^2$ are unknown and $n_1 < 30$ and $n_2 < 30$. In case (iii) we have the
following two possibilities: (a) $\sigma_1^2 = \sigma_2^2$, and (b) $\sigma_1^2 \ne \sigma_2^2$.
In the large sample case, knowledge of the population variances $\sigma_1^2$ and $\sigma_2^2$ does not
make much difference. If the population variances are unknown, we could replace them with the sample
variances as an approximation. If both n1 ≥ 30 and n2 ≥ 30 (large sample case), we can use the normal
approximation. The following box sums up the hypothesis testing procedure for the difference of means
in the large sample case.
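SUMMARY OF HYPOTHESIS TEST FOR μ1 − μ2 FOR LARGE SAMPLES (n1, n2 ≥ 30)
To test
$$H_0: \mu_1 - \mu_2 = D_0$$
versus
$$H_a: \begin{cases} \mu_1 - \mu_2 > D_0, & \text{upper tailed test} \\ \mu_1 - \mu_2 < D_0, & \text{lower tailed test} \\ \mu_1 - \mu_2 \ne D_0, & \text{two-tailed test.} \end{cases}$$
The test statistic is
$$Z = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$
Replace σi by Si if σi, i = 1, 2, are not known.
Rejection region is
$$RR: \begin{cases} z > z_{\alpha}, & \text{upper tail RR} \\ z < -z_{\alpha}, & \text{lower tail RR} \\ |z| > z_{\alpha/2}, & \text{two tail RR,} \end{cases}$$
where z is the observed test statistic given by
$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}.$$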
Assumption: The samples are independent and n1 and n2 ≥ 30.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence.
Otherwise, do not reject H0 because there is not enough evidence to conclude that Ha is true for given α
and more experiments are needed.
Example 5.1
In a salary equity study of faculty at a certain university, sample salaries of 50 male assistant professors and
50 female assistant professors yielded the following basic statistics.
                              Sample mean salary    Sample standard deviation
Male assistant professor      $36,400               360
Female assistant professor    $34,200               220
Test the hypothesis that the mean salary of male assistant professors is more than the mean salary of female
assistant professors at this university. Use α = 0.05.
Solution
Let μ1 be the true mean salary for male assistant professors and μ2 be the true mean salary for female
assistant professors at this university. To test
H0 : μ1 − μ2 = 0 vs. Ha : μ1 − μ2 > 0
the test statistic is
$$z = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{36{,}400 - 34{,}200}{\sqrt{\dfrac{(360)^2}{50} + \dfrac{(220)^2}{50}}} = 36.872.$$
The rejection region for α = 0.05 is {z > 1.645}.
Because z = 36.872 > 1.645, we reject the null hypothesis at α = 0.05. We conclude that the salary of
male assistant professors at this university is higher than that of female assistant professors for α = 0.05.
Note that even though $\sigma_1^2$ and $\sigma_2^2$ are unknown, because n1 ≥ 30 and n2 ≥ 30, we could
replace $\sigma_1^2$ and $\sigma_2^2$ by the respective sample variances. We are assuming that the
salaries of male and female assistant professors are sampled independently of each other.
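The large-sample procedure is easy to script. The following is a minimal sketch using only the Python standard library; the function name `two_sample_z` is an illustrative choice, not something defined in the text, and the sample standard deviations stand in for the unknown σ's as described above.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """Large-sample z statistic for H0: mu1 - mu2 = d0 (hypothetical helper)."""
    standard_error = sqrt(s1**2 / n1 + s2**2 / n2)
    return (x1_bar - x2_bar - d0) / standard_error

# Data of Example 5.1: male vs. female assistant professor salaries.
z = two_sample_z(36_400, 34_200, 360, 220, 50, 50)

# Upper-tail critical value z_alpha for alpha = 0.05.
alpha = 0.05
z_crit = NormalDist().inv_cdf(1 - alpha)

reject_h0 = z > z_crit
print(f"z = {z:.3f}, z_crit = {z_crit:.3f}, reject H0: {reject_h0}")
```

For a lower-tailed or two-tailed alternative only the comparison changes: `z < -z_crit`, or `abs(z)` compared against the critical value computed with `alpha / 2`.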
Given next is the procedure we follow to compare the true means from two independent normal
populations when n1 and n2 are small (n1 < 30 or n2 < 30) and we can assume homogeneity in the
population variances, that is, $\sigma_1^2 = \sigma_2^2$. In this case, we pool the sample variances
to obtain a point estimate of the common variance.
COMPARISON OF TWO POPULATION MEANS, SMALL SAMPLE CASE (POOLED t-TEST)
To test
$$H_0: \mu_1 - \mu_2 = D_0$$
versus
$$H_a: \begin{cases} \mu_1 - \mu_2 > D_0, & \text{upper tailed test} \\ \mu_1 - \mu_2 < D_0, & \text{lower tailed test} \\ \mu_1 - \mu_2 \ne D_0, & \text{two-tailed test.} \end{cases}$$
The test statistic is
$$T = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{S_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}.$$
Here the pooled sample variance is
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}.$$
Then the rejection region is
$$RR: \begin{cases} t > t_{\alpha}, & \text{upper tailed test} \\ t < -t_{\alpha}, & \text{lower tailed test} \\ |t| > t_{\alpha/2}, & \text{two-tailed test} \end{cases}$$
where t is the observed test statistic and tα is based on (n1 + n2 − 2) degrees of freedom, and such that
P(T > tα) = α.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100% confidence.
Otherwise, do not reject H0 because there is not enough evidence to conclude that Ha is true for given α.
Assumptions: The samples are independent and come from normal populations with means μ1 and μ2,
and with unknown but equal variances, that is, $\sigma_1^2 = \sigma_2^2$.
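Now we shall consider the case where $\sigma_1^2$ and $\sigma_2^2$ are unknown and cannot be assumed to be equal.
In such a case the following test is often used. For the hypothesis
$$H_0: \mu_1 - \mu_2 = D_0 \quad \text{vs.} \quad H_a: \begin{cases} \mu_1 - \mu_2 > D_0 \\ \mu_1 - \mu_2 < D_0 \\ \mu_1 - \mu_2 \ne D_0 \end{cases}$$
define the test statistic Tν as
$$T_\nu = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$
where Tν has approximately a t-distribution with ν degrees of freedom, and
$$\nu = \frac{\left[(s_1^2/n_1) + (s_2^2/n_2)\right]^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$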
The value of ν will not necessarily be an integer. In that case, we will round it down to the nearest
integer. This method of hypothesis testing with unequal variances is called the Smith-Satterthwaite
procedure. Even though this procedure is not widely used, some simulation studies have shown that
the Smith-Satterthwaite procedure performs well when the variances are unequal, and it gives results that
are more or less equivalent to those obtained with the pooled t-test when the variances are equal.
However, when the sample sizes are approximately equal, the pooled t-test may still be used. Note
that in addressing the question of which of the cases (iii)(a) or (iii)(b) to use in a given problem, we
suggest that if the point estimates $S_1^2$ of $\sigma_1^2$ and $S_2^2$ of $\sigma_2^2$ are approximately
the same, then it is logical to assume homogeneity, $\sigma_1^2 = \sigma_2^2$, and use (iii)(a), whereas
if $S_1^2$ and $S_2^2$ are significantly different we use (iii)(b). More appropriately, we have a test
that can be used to test hypotheses concerning $\sigma_1^2 = \sigma_2^2$ or $\sigma_1^2 \ne \sigma_2^2$,
known as the F-test, which we discuss at the end of this subsection.
Example 5.2
The intelligence quotients (IQs) of 17 students from one area of a city showed a sample mean of 106 with a
sample standard deviation of 10, whereas the IQs of 14 students from another area chosen independently
showed a sample mean of 109 with a sample standard deviation of 7. Is there a significant difference
between the IQs of the two groups at α = 0.02? Assume that the population variances are equal.
Solution
We test
$$H_0: \mu_1 - \mu_2 = 0 \quad \text{vs.} \quad H_a: \mu_1 - \mu_2 \ne 0.$$
Here $n_1 = 17$, $\bar{x}_1 = 106$, and $s_1 = 10$. Also, $n_2 = 14$, $\bar{x}_2 = 109$, and $s_2 = 7$.
We have
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(16)(10)^2 + (13)(7)^2}{29} = 77.138.$$
The test statistic is
$$T = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{106 - 109}{\sqrt{77.138}\,\sqrt{\dfrac{1}{17} + \dfrac{1}{14}}} = -0.94644.$$
For α = 0.02, t0.01,29 = 2.462. Hence, the rejection region is t < − 2.462 or t > 2.462.
Because the observed value of the test statistic, T = −0.94644, does not fall in the rejection region, there is
not enough evidence to conclude that the mean IQs are different for the two groups. Here we assume that
the two samples are independent and taken from normal populations.
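The pooled t-test computation in Example 5.2 can be reproduced as follows. This is a sketch under stated assumptions: the helper name `pooled_t` is hypothetical, the second sample standard deviation is taken as 7 (the value used in the solution's arithmetic), and the critical value is read from a t-table because the Python standard library provides no t quantile function.

```python
from math import sqrt

def pooled_t(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """Return (t, pooled variance, degrees of freedom) for the pooled t-test."""
    df = n1 + n2 - 2
    sp_sq = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    t = (x1_bar - x2_bar - d0) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
    return t, sp_sq, df

# Data of Example 5.2 (IQs from two areas of a city).
t, sp_sq, df = pooled_t(106, 109, 10, 7, 17, 14)

# Two-tailed test at alpha = 0.02: t_{0.01, 29} from a t-table (assumed lookup).
t_crit = 2.462
reject_h0 = abs(t) > t_crit
print(f"s_p^2 = {sp_sq:.3f}, t = {t:.5f}, df = {df}, reject H0: {reject_h0}")
```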
Example 5.3
Assume that two populations are normally distributed with unknown and unequal variances. Two inde-
pendent samples were drawn from these populations and the data obtained resulted in the following basic
statistics:
$n_1 = 18$, $\bar{x}_1 = 20.17$, $s_1 = 4.3$
$n_2 = 12$, $\bar{x}_2 = 19.23$, $s_2 = 3.8$
Test at the 5% significance level whether the two population means are different.
Solution
We need to test the hypothesis
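$$H_0: \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_a: \mu_1 - \mu_2 \ne 0.$$
Here $n_1 = 18$, $\bar{x}_1 = 20.17$, and $s_1 = 4.3$. Also, $n_2 = 12$, $\bar{x}_2 = 19.23$, and $s_2 = 3.8$.
The degrees of freedom for the t-distribution are given by
$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{(4.3)^2}{18} + \dfrac{(3.8)^2}{12}\right)^2}{\dfrac{\left((4.3)^2/18\right)^2}{17} + \dfrac{\left((3.8)^2/12\right)^2}{11}} = 25.685.$$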
Hence, we have ν = 25 degrees of freedom. For α = 0.05, t0.025,25 = 2.060. Thus, the rejection region is
t < −2.060 or t > 2.060.
The test statistic is given by
$$T_\nu = \frac{\bar{x}_1 - \bar{x}_2 - D_0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{20.17 - 19.23}{\sqrt{\dfrac{(4.3)^2}{18} + \dfrac{(3.8)^2}{12}}} = 0.62939.$$
Because the observed value of the test statistic, Tν = 0.62939, does not fall in the rejection region, we do not
reject the null hypothesis. At α = 0.05 there is not enough evidence to conclude that the population means
are different. Note that the assumptions we made are that the samples are independent and came from two
normal populations. No homogeneity assumption is made.
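The Smith-Satterthwaite statistic and its degrees of freedom can be computed together. The sketch below uses only the standard library; `satterthwaite_t` is an illustrative name, and the degrees of freedom are rounded down as the text prescribes.

```python
from math import floor, sqrt

def satterthwaite_t(x1_bar, x2_bar, s1, s2, n1, n2, d0=0.0):
    """Return (t, nu) for the unequal-variance two-sample t-test."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    t = (x1_bar - x2_bar - d0) / sqrt(v1 + v2)
    nu = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, nu

# Data of Example 5.3.
t, nu = satterthwaite_t(20.17, 19.23, 4.3, 3.8, 18, 12)
df = floor(nu)    # round down to the nearest integer

# Two-tailed test at alpha = 0.05: t_{0.025, 25} from a t-table (assumed lookup).
t_crit = 2.060
reject_h0 = abs(t) > t_crit
print(f"t = {t:.5f}, nu = {nu:.3f}, df = {df}, reject H0: {reject_h0}")
```

Rounding ν down rather than to the nearest integer makes the critical value slightly larger, which errs on the conservative side.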
Example 5.4
Infrequent or suspended menstruation can be a symptom of serious metabolic disorders in women. In a
study to compare the effect of jogging and running on the number of menses, two independent subgroups
were chosen from a large group of women, who were similar in physical activity (aside from running),
heights, occupations, distribution of ages, and type of birth control methods being used. The first group
consisted of a random sample of 26 women joggers who jogged “slow and easy” 5 to 30 miles per week,
and the second group consisted of a random sample of 26 women runners who ran more than 30 miles per
week and combined long, slow distance with speed work. The following summary statistics were obtained
(E. Dale, D. H. Gerlach, and A. L. Wilhite, “Menstrual Dysfunction in Distance Runners,” Obstet. Gynecol. 54,
47-53, 1979).
Joggers: $\bar{x}_1 = 10.1$, $s_1 = 2.1$
Runners: $\bar{x}_2 = 9.1$, $s_2 = 2.4$
Using α = 0.05, (a) test for differences in mean number of menses for each group assuming equality of
population variances, and (b) test for differences in mean number of menses for each group assuming
inequality of population variances.
Solution
Here we need to test
$$H_0: \mu_1 - \mu_2 = 0 \quad \text{versus} \quad H_a: \mu_1 - \mu_2 \ne 0.$$
Here, $n_1 = 26$, $\bar{x}_1 = 10.1$, and $s_1 = 2.1$. Also, $n_2 = 26$, $\bar{x}_2 = 9.1$, and $s_2 = 2.4$.
(a) Under the assumption $\sigma_1^2 = \sigma_2^2$, we have
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} = \frac{(25)(2.1)^2 + (25)(2.4)^2}{50} = 5.085.$$
The test statistic is
$$T = \frac{\bar{X}_1 - \bar{X}_2 - D_0}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{10.1 - 9.1}{\sqrt{5.085}\,\sqrt{\dfrac{1}{26} + \dfrac{1}{26}}} = 1.5989.$$
For α = 0.05, t0.025,50 ≈ 1.96. Hence, the rejection region is t < −1.96 or t > 1.96. Because
T = 1.5989 does not fall in the rejection region, we do not reject the null hypothesis. At α = 0.05
there is not enough evidence to conclude that the population mean number of menses for joggers
and runners are different.
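(b) Under the assumption $\sigma_1^2 \ne \sigma_2^2$, we have
$$\nu = \frac{\left(s_1^2/n_1 + s_2^2/n_2\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{(2.1)^2}{26} + \dfrac{(2.4)^2}{26}\right)^2}{\dfrac{\left((2.1)^2/26\right)^2}{25} + \dfrac{\left((2.4)^2/26\right)^2}{25}} = 49.134.$$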
Hence, we have ν = 49 degrees of freedom. Because this value is large, the rejection region is still
approximately t < −1.96 or t > 1.96. Hence, the conclusion is the same as that of part (a). In
both parts (a) and (b), we assumed that the samples are independent and came from two normal
populations.
Now we present the summary of the test procedure for testing the difference of two proportions,
inherent in two binomial populations. Here, again we assume that the binomial distribution is
approximated by the normal distribution and thus it is an approximate test.
SUMMARY OF HYPOTHESIS TEST FOR (p1 − p2) FOR LARGE SAMPLES (nipi > 5 AND niqi > 5,
FOR i = 1, 2)
To test
$$H_0: p_1 - p_2 = D_0$$
versus
$$H_a: \begin{cases} p_1 - p_2 > D_0, & \text{upper tailed test} \\ p_1 - p_2 < D_0, & \text{lower tailed test} \\ p_1 - p_2 \ne D_0, & \text{two-tailed test} \end{cases}$$
at significance level α, the test statistic is
$$Z = \frac{\hat{p}_1 - \hat{p}_2 - D_0}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}}, \qquad \hat{q}_i = 1 - \hat{p}_i.$$
The rejection region is
$$RR: \begin{cases} z > z_{\alpha}, & \text{upper tailed RR} \\ z < -z_{\alpha}, & \text{lower tailed RR} \\ |z| > z_{\alpha/2}, & \text{two-tailed RR} \end{cases}$$
where z is the observed value of Z.
Assumption: The samples are independent and
nipi > 5 and niqi > 5, for i = 1, 2.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100%
confidence. Otherwise, do not reject H0, because there is not enough evidence to conclude that Ha is true
for given α and more experiments are needed.
Example 5.5
Because of the impact of the global economy on a high-wage country such as the United States, it is claimed
that the domestic content in manufacturing industries fell between 1977 and 1997. A survey of 36 randomly
picked U.S. companies gave the proportion of domestic content in total manufacturing in 1977 as 0.37 and in
1997 as 0.36. At the 1% level of significance, test the claim that the domestic content really fell during the
period 1977-1997.
Solution
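Let p1 be the domestic content in 1977 and p2 be the domestic content in 1997.
We are given $n_1 = n_2 = 36$, $\hat{p}_1 = 0.37$, and $\hat{p}_2 = 0.36$. We need to test
$$H_0: p_1 - p_2 = 0 \quad \text{vs.} \quad H_a: p_1 - p_2 > 0.$$
The test statistic is
$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\dfrac{\hat{p}_1 \hat{q}_1}{n_1} + \dfrac{\hat{p}_2 \hat{q}_2}{n_2}}} = \frac{0.37 - 0.36}{\sqrt{\dfrac{(0.37)(0.63)}{36} + \dfrac{(0.36)(0.64)}{36}}} = 0.08813.$$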
For α = 0.01, z0.01 = 2.325. Hence, the rejection region is z > 2.325.
Because the observed value of the test statistic does not fall in the rejection region, at α = 0.01, there is not
enough evidence to conclude that the domestic content in manufacturing industries fell between 1977 and
1997.
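As a check on Example 5.5, the two-proportion statistic and its upper-tail p-value can be computed with the standard library alone. The function name `two_proportion_z` is an illustrative assumption; the p-value is not reported in the example and is included here only to show how small the evidence against H0 is.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(p1_hat, p2_hat, n1, n2, d0=0.0):
    """Large-sample z statistic for H0: p1 - p2 = d0 (unpooled standard error)."""
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return (p1_hat - p2_hat - d0) / se

# Data of Example 5.5: domestic content proportions in 1977 and 1997.
z = two_proportion_z(0.37, 0.36, 36, 36)

# Upper-tailed test, so the p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.01
reject_h0 = p_value < alpha
print(f"z = {z:.5f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```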
Let $X_1, \ldots, X_{n_1}$ and $Y_1, \ldots, Y_{n_2}$ be two independent random samples from two normal
populations with sample variances $s_1^2$ and $s_2^2$, respectively. The problem here is of testing for
the equality of the variances, $H_0: \sigma_1^2 = \sigma_2^2$. We have already seen in Chapter 4 that
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}$$
follows the F-distribution with ν1 = n1 − 1 numerator and ν2 = n2 − 1 denominator degrees of freedom.
Under the assumption $H_0: \sigma_1^2 = \sigma_2^2$, we have
$$F = \frac{S_1^2}{S_2^2}$$
which has an F-distribution with (ν1, ν2) degrees of freedom. We summarize the test procedure for
the equality of variances.
TESTING FOR THE EQUALITY OF VARIANCES
To test
$$H_0: \sigma_1^2 = \sigma_2^2$$
versus
$$H_a: \begin{cases} \sigma_1^2 > \sigma_2^2, & \text{upper tailed test} \\ \sigma_1^2 < \sigma_2^2, & \text{lower tailed test} \\ \sigma_1^2 \ne \sigma_2^2, & \text{two-tailed test} \end{cases}$$
at significance level α, the test statistic is
$$F = \frac{S_1^2}{S_2^2}.$$
The rejection region is
$$RR: \begin{cases} f > F_{\alpha}(\nu_1, \nu_2), & \text{upper tailed RR} \\ f < F_{1-\alpha}(\nu_1, \nu_2), & \text{lower tailed RR} \\ f > F_{\alpha/2}(\nu_1, \nu_2) \text{ or } f < F_{1-\alpha/2}(\nu_1, \nu_2), & \text{two-tailed RR} \end{cases}$$
where f is the observed test statistic given by $f = s_1^2 / s_2^2$.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100%
confidence. Otherwise, keep H0, because there is not enough evidence to conclude that Ha is true for
a given α and more experiments are needed.
Assumption:
(i) The two random samples are independent.
(ii) Both populations are normal.
Recall from Section 4.2 that in order to find $F_{1-\alpha}(\nu_1, \nu_2)$, we use the identity
$$F_{1-\alpha}(\nu_1, \nu_2) = \frac{1}{F_{\alpha}(\nu_2, \nu_1)}.$$
Example 5.6
Consider two independent random samples $X_1, \ldots, X_{n_1}$ from an $N(\mu_1, \sigma_1^2)$ distribution
and $Y_1, \ldots, Y_{n_2}$ from an $N(\mu_2, \sigma_2^2)$ distribution. Test $H_0: \sigma_1^2 = \sigma_2^2$
versus $H_a: \sigma_1^2 \ne \sigma_2^2$ for the following basic statistics:
$n_1 = 25$, $\bar{x}_1 = 410$, $s_1^2 = 95$, and $n_2 = 16$, $\bar{x}_2 = 390$, $s_2^2 = 300$.
Use α = 0.20.
Solution
Test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \ne \sigma_2^2$. This is a two-tailed test.
Here the degrees of freedom are ν1 = 24 and ν2 = 15. The test statistic is
$$F = \frac{s_1^2}{s_2^2} = \frac{95}{300} = 0.317.$$
From the F-table, F0.10(24, 15) = 1.90 and F0.90(24, 15) = 1/F0.10(15, 24) = 0.56.
Hence, the rejection region is F > 1.90 or F < 0.56. Because the observed value of the test statistic, 0.317,
is less than 0.56, we reject the null hypothesis. There is evidence that the population variances are not equal.
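The two-tailed F-test of Example 5.6 is sketched below with the standard library only. Both tabulated values are entered by hand as assumed table lookups: F0.10(24, 15) = 1.90 comes from the example, and F0.10(15, 24) = 1.78 is the tabulated value that yields the lower critical value 0.56 through the reciprocal identity.

```python
# Two-tailed F-test for equality of variances (Example 5.6), alpha = 0.20.
s1_sq, s2_sq = 95.0, 300.0   # sample variances
n1, n2 = 25, 16              # sample sizes
nu1, nu2 = n1 - 1, n2 - 1    # numerator and denominator degrees of freedom

f = s1_sq / s2_sq            # observed test statistic

# Upper critical value F_{0.10}(24, 15), from an F-table (assumed lookup).
upper_crit = 1.90

# Lower critical value via F_{1-a}(nu1, nu2) = 1 / F_a(nu2, nu1),
# using F_{0.10}(15, 24) = 1.78 from an F-table (assumed lookup).
f_010_15_24 = 1.78
lower_crit = 1 / f_010_15_24

reject_h0 = f > upper_crit or f < lower_crit
print(f"f = {f:.3f}, RR: f > {upper_crit} or f < {lower_crit:.2f}, reject H0: {reject_h0}")
```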
5.2 Dependent Samples
We now consider the case where the two random samples are not independent. When two samples
are dependent (the samples are dependent if one sample is related to the other), then each data
point in one sample can be coupled in some natural, nonrandom fashion with each data point in
the second sample. This situation occurs when each individual data point within a sample is paired
(matched) to an individual data point in the second sample. The pairing may be the result of the
individual observations in the two samples: (1) representing before and after a program (such as
weight before and after following a certain diet program), (2) sharing the same characteristic, (3)
being matched by location, (4) being matched by time, (5) control and experimental, and so forth.
Let $(X_{1i}, X_{2i})$, for $i = 1, 2, \ldots, n$, be a random sample. $X_{1i}$ and $X_{2j}$ ($i \ne j$) are
independent. To test the significance of the difference between two population means when the samples
are dependent, we first calculate for each pair of scores the difference, $D_i = X_{1i} - X_{2i}$,
$i = 1, 2, \ldots, n$, between the two scores. Let $\mu_D = E(D_i)$. Because the pairs of observations form
a random sample, $D_1, \ldots, D_n$ are independent and identically distributed random variables. If
$d_1, \ldots, d_n$ are the observed values of $D_1, \ldots, D_n$, then we define
$$\bar{d} = \frac{1}{n}\sum_{i=1}^{n} d_i \quad \text{and} \quad s_d^2 = \frac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2 = \frac{\sum_{i=1}^{n} d_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} d_i\right)^2}{n-1}.$$
Now the testing for these n observed differences will proceed as in the case of a single sample. If the
number of differences is large (n ≥ 30), large sample inferential methods for one sample case can
be used for the paired differences. We now summarize the hypothesis testing procedure for small
samples.

SUMMARY OF TESTING FOR MATCHED PAIRS EXPERIMENT
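To test
$$H_0: \mu_D = d_0 \quad \text{versus} \quad H_a: \begin{cases} \mu_D > d_0, & \text{upper tail test} \\ \mu_D < d_0, & \text{lower tail test} \\ \mu_D \ne d_0, & \text{two-tailed test} \end{cases}$$
the test statistic is
$$T = \frac{\bar{D} - d_0}{S_D/\sqrt{n}}$$
(this approximately follows a Student t-distribution with (n − 1) degrees of freedom).
The rejection region is
$$RR: \begin{cases} t > t_{\alpha, n-1}, & \text{upper tail RR} \\ t < -t_{\alpha, n-1}, & \text{lower tail RR} \\ |t| > t_{\alpha/2, n-1}, & \text{two-tailed RR} \end{cases}$$
where t is the observed test statistic.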
Assumptions: The differences are approximately normally distributed.
Decision: Reject H0 if the test statistic falls in the RR and conclude that Ha is true with (1 − α)100%
confidence. Otherwise, do not reject H0, because there is not enough evidence to conclude that Ha is true
for a given α and more data are needed.
Example 5.7
A new diet and exercise program has been advertised as a remarkable way to reduce blood glucose levels in
diabetic patients. Ten randomly selected diabetic patients are put on the program, and the results after 1
month are given by the following table:
Before   268  225  252  192  307  228  246  298  231  185
After    106  186  223  110  203  101  211  176  194  203
Do the data provide sufficient evidence to support the claim that the new program reduces blood glucose
level in diabetic patients? Use α = 0.05.
Solution
We need to test the hypothesis
H0 : μD = 0 vs. Ha : μD < 0.
First we calculate the difference of each pair given in the following table.
Before                       268   225   252   192   307   228   246   298   231   185
After                        106   186   223   110   203   101   211   176   194   203
Difference (after−before)   −162   −39   −29   −82  −104  −127   −35  −122   −37    18
From the table, the mean of the differences is $\bar{d} = -71.9$ and the standard deviation is $s_d = 56.2$.
The test statistic is
$$t = \frac{\bar{d} - d_0}{s_d/\sqrt{n}} = \frac{-71.9}{56.2/\sqrt{10}} = -4.0457 \approx -4.05.$$
From the t-table, $t_{0.05,9} = 1.833$. Because the observed value of $t = -4.05 < -t_{0.05,9} = -1.833$, we reject
the null hypothesis and conclude that the sample evidence suggests that the new diet and exercise program
is effective.
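The matched pairs computation of Example 5.7 can be carried out directly from the raw data. This is a minimal sketch with the standard library; the critical value is an assumed t-table lookup. Because the script keeps the unrounded standard deviation instead of 56.2, its statistic agrees with the value in the example only to about two decimal places.

```python
from math import sqrt
from statistics import mean, stdev

before = [268, 225, 252, 192, 307, 228, 246, 298, 231, 185]
after = [106, 186, 223, 110, 203, 101, 211, 176, 194, 203]

# Paired differences, taken as after - before to match the table.
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)

d_bar = mean(diffs)    # mean of the differences
s_d = stdev(diffs)     # sample standard deviation (divisor n - 1)

d0 = 0
t = (d_bar - d0) / (s_d / sqrt(n))

# Lower-tailed test at alpha = 0.05: t_{0.05, 9} from a t-table (assumed lookup).
t_crit = 1.833
reject_h0 = t < -t_crit
print(f"d_bar = {d_bar:.1f}, s_d = {s_d:.1f}, t = {t:.2f}, reject H0: {reject_h0}")
```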
We can also obtain a (1 − α)100% confidence interval for μD using the formula
$$\left(\bar{D} - t_{\alpha/2}\frac{S_d}{\sqrt{n}},\ \bar{D} + t_{\alpha/2}\frac{S_d}{\sqrt{n}}\right)$$
where tα/2 is obtained from the t-table with (n − 1) degrees of freedom. The interpretation of the
confidence interval is identical to the earlier interpretation.
Example 5.8
For the data in Example 5.7, obtain a 95% confidence interval for μD and interpret its meaning.
Solution
We have already calculated $\bar{d} = -71.9$ and $s_d = 56.2$. From the t-table, t0.025,9 = 2.262. Hence, a 95%
confidence interval for μD is (−112.1, −31.7). That is, we are 95% confident that −112.1 ≤ μD ≤ −31.7. Note that
μD = μ1 − μ2, and because the entire interval lies below zero we can conclude with 95% confidence that μ2 is
greater than μ1, that is, μ2 > μ1.
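The interval in Example 5.8 follows from the summary statistics in a few lines. The sketch below uses the rounded values from the example and an assumed t-table lookup for the critical value.

```python
from math import sqrt

# Summary statistics from Example 5.7 (rounded as in the text).
d_bar, s_d, n = -71.9, 56.2, 10

# t_{0.025, 9} from a t-table (assumed lookup) for a 95% interval.
t_crit = 2.262

half_width = t_crit * s_d / sqrt(n)
ci = (d_bar - half_width, d_bar + half_width)
print(f"95% CI for mu_D: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Since zero lies outside the interval, the two-sided test of H0 : μD = 0 at α = 0.05 would also reject, which is consistent with the one-sided conclusion of Example 5.7.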
It is interesting to compare the matched pairs test with the corresponding two independent sample
test. One of the natural questions is, why must we take paired differences and then calculate the mean
and standard deviation for the differences—why can’t we just take the difference of means of each
sample, as we did for independent samples? The answer lies in the fact that $\sigma_{\bar{D}}^2$ need not be
equal to $\sigma^2_{(\bar{X}_1 - \bar{X}_2)}$. Assume that
$$E(X_{ji}) = \mu_j, \quad \mathrm{Var}(X_{ji}) = \sigma_j^2, \quad \text{for } j = 1, 2,$$
and
$$\mathrm{Cov}(X_{1i}, X_{2i}) = \rho\sigma_1\sigma_2$$
where ρ denotes the assumed common correlation coefficient of the pair (X1i, X2i) for i = 1, 2, . . . , n.
Because the values of Di, i = 1, 2, . . . , n, are independent and identically distributed,
$$\mu_D = E(D_i) = E(X_{1i}) - E(X_{2i}) = \mu_1 - \mu_2$$
and
$$\sigma_D^2 = \mathrm{Var}(D_i) = \mathrm{Var}(X_{1i}) + \mathrm{Var}(X_{2i}) - 2\,\mathrm{Cov}(X_{1i}, X_{2i}) = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2.$$
From these calculations,
$$E(\bar{D}) = \mu_D = \mu_1 - \mu_2$$
and
$$\sigma_{\bar{D}}^2 = \mathrm{Var}(\bar{D}) = \frac{\sigma_D^2}{n} = \frac{1}{n}\left(\sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2\right).$$
Now, if the samples were independent with n1 = n2 = n,
$$E(\bar{X}_1 - \bar{X}_2) = \mu_1 - \mu_2$$
and
$$\sigma^2_{(\bar{X}_1 - \bar{X}_2)} = \frac{1}{n}\left(\sigma_1^2 + \sigma_2^2\right).$$
Hence, if ρ > 0, then $\sigma_{\bar{D}}^2 < \sigma^2_{(\bar{X}_1 - \bar{X}_2)}$. As a result, we can see that the
matched pairs test reduces any variability introduced by differences in physical factors in comparison to
the independent samples test when ρ > 0. It is also important to observe that the normality assumption for
the differences does not imply that the individual samples themselves are normal. Also, in a matched pairs
experiment, there is no need to assume the equality of variances for the two populations. Matching also
reduces degrees of freedom, because in the case of two independent samples, the degrees of freedom is
(n1 + n2 − 2), whereas for the case of two dependent samples it is only (n − 1).
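To make the comparison concrete, the two variance formulas above can be evaluated side by side. The numbers used here (σ1 = σ2 = 10, ρ = 0.8, n = 10) are purely illustrative assumptions and do not come from any example in the text; the function names are likewise hypothetical.

```python
def var_mean_difference_paired(sigma1, sigma2, rho, n):
    """Variance of D-bar for matched pairs with correlation rho."""
    return (sigma1**2 + sigma2**2 - 2 * rho * sigma1 * sigma2) / n

def var_mean_difference_independent(sigma1, sigma2, n):
    """Variance of X1-bar minus X2-bar for independent samples of equal size n."""
    return (sigma1**2 + sigma2**2) / n

# Illustrative (assumed) values: equal spreads and strong positive correlation.
sigma1, sigma2, rho, n = 10.0, 10.0, 0.8, 10

paired = var_mean_difference_paired(sigma1, sigma2, rho, n)
independent = var_mean_difference_independent(sigma1, sigma2, n)
print(f"paired: {paired:.2f}, independent: {independent:.2f}")
```

With a positive ρ the paired variance is the smaller of the two, and with ρ = 0 the two functions return the same value, which matches the inequality derived above.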
EXERCISES 5
5.1. Two sets of elementary school children were taught to read by different methods, 50 by each
method. At the conclusion of the instructional period, a reading test gave results $\bar{y}_1 = 74$,
$\bar{y}_2 = 71$, $s_1 = 9$, and $s_2 = 10$. What is the attained significance level if you wish to see if
there is evidence of a real difference between the two population means? What would you
conclude if you desired an α-value of 0.05?
5.2. The following information was obtained from two independent samples selected from two
normally distributed populations with unknown but equal variances.
Sample 1   14  15  11  14  10   8  13  10  12  16  15
Sample 2   17  16  21  12  20  18  16  14  21  20  13  20  13
Test at the 2% significance level whether μ1 is lower than μ2.

5.3.
In the academic year 1997-1998, two random samples of 25 male professors and 23 female
professors from a large university produced a mean salary for male professors of $58,550
with a standard deviation of $4000 and an average for female professors of $53,700 with a
standard deviation of $3200. At the 5% significance level, can you conclude that the mean
salary of all male professors for 1997-1998 was higher than that of all female professors?
Assume that the salaries of male and female professors are both normally distributed with
equal standard deviations.
5.4.
It is believed that the effects of smoking differ depending on race. The following table gives
the results of a statistical study for this question.
                     Number in    Average number of     Number of lung
                     the study    cigarettes per day    cancer cases
Whites               400          15                    78
African Americans    280          15                    70
Do the data indicate that African Americans are more likely to develop lung cancer due to
smoking? Use α = 0.05.
5.5.
A supermarket chain is considering two sources A and B for the purchase of 50-pound bags
of onions. The following table gives the results of a study.
                          Source A    Source B
Number of bags weighed    80          100
Mean weight               105.9       100.5
Sample variance           0.21        0.19
Test at α = 0.05 whether there is a difference in the mean weights.
5.6.
In order to compare the mean Hemoglobin (Hb) levels of well-nourished and undernour-
ished groups of children, random samples from each of these groups yielded the following
summary.
                  Number of    Sample    Sample standard
                  children     mean      deviation
Well nourished    95           11.2      0.9
Undernourished    75           9.8       1.2
Test at α = 0.01 whether the mean Hb levels of well-nourished children were higher than
those of undernourished children.
5.7.
An aquaculture farm takes water from a stream and returns it after it has circulated through
the fish tanks. In order to find out how much organic matter is left in the waste water after
the circulation, some samples of the water are taken at the intake and other samples are
taken at the downstream outlet and tested for biochemical oxygen demand (BOD). BOD is
a common environmental measure of the quantity of oxygen consumed by microorganisms
during the decomposition of organic matter. If BOD increases, it can be said that the waste
matter contains more organic matter than the stream can handle. The following table gives
data for this problem.
Upstream      9.0   6.8   6.5   8.0   7     8.6   6.8   8.9   2     0
Downstream    10.2  10.2  9.9   11.1  9.6   8.7   9.6   9.7   10.4  8.1
Assuming that the samples come from a normal distribution,
(a) Test that the mean BOD for the downstream samples is less than for the samples
upstream at α = 0.05. Assume that the variances are equal.
(b) Test for the equality of the variances at α = 0.05.
(c) In parts (a) and (b), we assumed samples are independent. Now, we feel this assump-
tion is not reasonable. Assuming that the difference of each pair is approximately
normal, test that the mean BOD for the downstream samples is less than for the
upstream samples at α = 0.05.
5.8.
Suppose we want to know the effect on driving of a drug for cold and allergy, in a study
in which the same people were tested twice, once after 1 hour of taking the drug and once
when no drug is taken. Suppose we obtain the following data, which represent the number
of cones (placed in a certain pattern) knocked down by each of the nine individuals before
taking the drug and after an hour of taking the drug.
No drug      0   0   3   2   0   0   3   3   1
After drug   1   5   6   5   5   5   6   1   6
Assuming that the difference of each pair is coming from an approximately normal distribu-
tion, test if there is any difference in the individuals’ driving ability under the two conditions.
Use α = 0.05.
5.9.
Suppose that we want to evaluate the role of intravenous pulse cyclophosphamide (IVCP)
infusion in the management of nephrotic syndrome in children with steroid resistance.
Children were given a monthly infusion of IVCP in a dose of 500 to 750 mg/m². The
following data (source: S. Gulati and V. Kher, “Intravenous pulse cyclophosphamide—A new
regime for steroid resistant focal segmental glomerulosclerosis,” Indian Pediatr. 37, 2000)
represent levels of serum albumin (g/dL) before and after IVCP in 14 randomly selected
children with nephrotic syndrome.
Pre-IVCP    2.0  2.5  1.5  2.0  2.3  2.1  2.3  1.0  2.2  1.8  2.0  2.0  1.5  3.4
Post-IVCP   3.5  4.3  4.0  4.0  3.8  2.4  3.5  1.7  3.8  3.6  3.8  3.8  4.1  3.4
Assuming that the samples come from a normal distribution:
(a) Test whether the mean Pre-IVCP is less than the mean Post-IVCP at α = 0.05. Assume
that the variances are equal.
(b) Test for the equality of the variances at α = 0.05.
(c) In parts (a) and (b), we assumed that the samples are independent. Now, we feel
this assumption is not reasonable. Assuming that the difference of each pair is
approximately normal, test that the mean Pre-IVCP is less than the Post-IVCP at
α = 0.05.
5.10.
Show that $S_D^2$ is an unbiased estimator of $\sigma_D^2$.
5.11.
Test $H_0: \sigma_1^2 = \sigma_2^2$ versus $H_a: \sigma_1^2 \neq \sigma_2^2$ for the following data.
$n_1 = 10$, $\bar{x}_1 = 71$, $s_1^2 = 64$ and $n_2 = 25$, $\bar{x}_2 = 131$, $s_2^2 = 96$.
Use α = 0.10.
5.12.
The IQs of 17 students from one area of a city showed a mean of 106 with a standard
deviation of 10, whereas the IQs of 14 students from another area showed a mean of 109 with
a standard deviation of Test for equality of variances between the IQs of the two groups at
α = 0.02.
5.13.
The following data give SAT mean scores for math by state for 1989 and 1999 for 20 randomly
selected states (source: The World Almanac and Book of Facts 2000).
State             1989    1999
Arizona            523     525
Connecticut        498     509
Alabama            539     555
Indiana            487     498
Kansas             561     576
Oregon             509     525
Nebraska           560     571
New York           496     502
Virginia           507     499
Washington         515     526
Illinois           539     585
North Carolina     469     493
Georgia            475     482
Nevada             512     517
Ohio               520     568
New Hampshire      510     518
Assuming that the samples come from a normal distribution:
(a) Test that the mean SAT score for math in 1999 is greater than that in 1989 at α =
0.05.
Assume the variances are equal.
(b) Test for the equality of the variances at α = 0.05.
6 CHI-SQUARE TESTS FOR COUNT DATA
In this section, we study several commonly used tests for count data. These are basically large sample
tests based on a χ2-approximation. Suppose that we have outcomes of a multinomial experiment that
consists of k mutually exclusive and exhaustive events A1, . . . , Ak . Let P (Ai) = pi, i = 1, 2, . . . , k.
Then $\sum_{i=1}^{k} p_i = 1$. Let the experiment be repeated n times, and let Xi (i = 1, 2, . . . , k) represent
the number of times the event Ai occurs. Then (X1, . . . , Xk ) have a multinomial distribution with
parameters n, p1, . . . , pk .

Let
$$Q^2 = \sum_{i=1}^{k} \frac{(X_i - np_i)^2}{np_i}.$$
It can be shown that for large n, the random variable Q2 is approximately χ2-distributed with (k − 1)
degrees of freedom. It is usual to demand npi ≥ 5 (i = 1, 2, . . . , k) for the approximation to be valid,
although the approximation generally works well if for only a few values of i (about 20%), npi ≥ 1
and the rest (about 80%) satisfy the condition npi ≥ 5. This statistic was proposed by Karl Pearson
in 1900.
It should be noted that the χ2-tests that we discuss in this section are approximate tests valid for
large samples. Often Xi is called the observed frequency and is denoted by Oi (this is the observed
value in class i), and npi is called the expected frequency and is denoted by Ei (this is the theoretical
distribution frequency under the null hypothesis). Thus, with these notations, we get
$$Q^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$
Example 6.1
A plant geneticist grows 200 progeny from a cross that is hypothesized to result in a 3 : 1 phenotypic
ratio of red-flowered to white-flowered plants. Suppose the cross produces 170 red- to 30 white-flowered
plants. Calculate the value of Q2 for this experiment.
Solution
There are two categories of data totaling n = 200. Hence, k = 2. Let i = 1 represent red-flowered and i = 2
represent white-flowered plants. Then O1 = 170, and O2 = 30.
Here, H0 : The flower color population ratio is not different from 3 : 1, and the alternate is Ha : The flower
color population sampled has a flower color ratio that is not 3 red : 1 white.
Under the null hypothesis, the expected frequencies are E1 = (200)(3/4) = 150, and E2 = (200)(1/4) = 50.
Hence,
$$Q^2 = \sum_{i=1}^{2} \frac{(O_i - E_i)^2}{E_i} = \frac{(170-150)^2}{150} + \frac{(30-50)^2}{50} = 10.67.$$
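The arithmetic of Example 6.1 can be checked with a few lines of Python. This is only a sketch: the helper name `pearson_q2` is chosen here for illustration and does not appear in the text.

```python
def pearson_q2(observed, expected):
    """Pearson's statistic: the sum of (O_i - E_i)^2 / E_i over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


n = 200                             # number of progeny
observed = [170, 30]                # red-flowered, white-flowered
expected = [n * 3 / 4, n * 1 / 4]   # counts under the hypothesized 3 : 1 ratio

q2 = pearson_q2(observed, expected)
print(round(q2, 3))  # 10.667
```

The same helper applies to any number of classes, provided the expected counts are computed under the null hypothesis.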
The type of calculation in Example 6.1 gives a measure of how close our observed frequencies come
to the expected frequencies and is referred to as a measure of goodness of fit. Smaller values of Q2
indicate better fit.
One of the most frequent uses of the χ2-test is in comparison of observed frequencies. Unless the
sample size is exactly 100, percentages cannot be used. These are approximate tests. Let the random
variables (X1, . . . , Xk ) have a multinomial distribution with parameters n, p1, . . . , pk . Let n be known.
We will now present some important tests based on the chi-square statistic.
6.1 Testing the Parameters of Multinomial Distribution: Goodness-of-Fit
Test
Let an experiment have k mutually exclusive and exhaustive outcomes A1, A2, . . . , Ak . We would
like to test the null hypothesis that all the pi = P(Ai), i = 1, 2, . . . , k, are equal to known numbers
pi0, i = 1, 2, . . . , k. We now summarize the test procedure.
TESTING THE PARAMETERS OF A MULTINOMIAL DISTRIBUTION (SUMMARY)
To test
H0 : p1 = p10,…,pk = pk0
versus
Ha : At least one of the probabilities is different from the hypothesized value.
The test is always a one-sided upper tail test.
Let Oi be the observed frequency, Ei = npi0 be the expected frequency (frequency under the null
hypothesis), and k be the number of classes. The test statistic is
$$Q^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}.$$
The test statistic Q2 has an approximate chi-square distribution with k − 1 degrees of freedom.
The rejection region is
$$Q^2 \ge \chi^2_{\alpha,\,k-1}.$$
Assumption: Ei ≥ 5. Exact methods are available. Computing the power of this test is difficult.
This test is known as the goodness-of-fit test. It implies that if the observed data are very close to the
expected data, we have a very good fit and we accept the null hypothesis. That is, for small Q2 values,
we accept H0.
Example 6.2
A TV station broadcasts a series of programs on the ill effects of smoking marijuana. After the series, the
station wants to know whether people have changed their opinion about legalizing marijuana. Given in the
following tables are the data based on a survey of 500 randomly chosen people:

Before the Series Was Shown
For legalization    Decriminalization    Existing law               No opinion
                                         (fine or imprisonment)
       7%                  18%                  65%                     10%

After the Series Was Shown
For legalization    Decriminalization    Existing law               No opinion
                                         (fine or imprisonment)
      39%                   9%                  36%                     16%
Here, k = 4 and n = 500, and we wish to test
H0 : p1 = 0.07; p2 = 0.18; p3 = 0.65; p4 = 0.1
versus
Ha : At least one of the probabilities is different from the hypothesized value.
The test is always an upper tail test. Test this hypothesis using α = 0.01.
Solution
We have
E1 = (500)(0.07) = 35;E2 = 90;E3 = 325;E4 = 50.
The observed frequencies are
O1 = (500)(0.39) = 195;O2 = 45;O3 = 180;O4 = 80.
The test statistic is
$$Q^2 = \sum_{i=1}^{4} \frac{(O_i - E_i)^2}{E_i} = \frac{(195-35)^2}{35} + \frac{(45-90)^2}{90} + \frac{(180-325)^2}{325} + \frac{(80-50)^2}{50} = 836.62.$$
From the χ2-table, $\chi^2_{0.01,3} = 11.3449$. Because the test statistic Q2 = 836.62 > 11.3449, we reject H0 at
α = 0.01. Hence, the data suggest that people have changed their opinion after the series on the ill effects
of smoking marijuana was shown.
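As a rough check of Example 6.2, the following Python sketch converts the reported percentages to counts, computes Q2, and compares it with the table value 11.3449 quoted in the example. The variable names are illustrative only.

```python
n = 500                                   # people surveyed
p_null = [0.07, 0.18, 0.65, 0.10]         # proportions before the series (H0)
p_after = [0.39, 0.09, 0.36, 0.16]        # proportions observed after the series

expected = [n * p for p in p_null]        # E_i = n * p_i0
observed = [n * p for p in p_after]       # O_i, recovered from the percentages

q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

critical = 11.3449                        # chi-square table value, alpha = 0.01, 3 d.f.
reject_h0 = q2 >= critical                # upper-tail rejection region

print(f"Q2 = {q2:.2f}, reject H0: {reject_h0}")
```

Note that the statistic is built from counts, not from the percentages themselves; using the raw percentages would give a different (and wrong) value unless n were exactly 100.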

Example 6.3
A die is rolled 60 times and the face values are recorded. The results are as follows.
Up face      1    2    3    4    5    6
Frequency    8   11    5   12   15    9
Is the die balanced? Test using α = 0.05.
Solution
If the die is balanced, we must have
$$p_1 = p_2 = \cdots = p_6 = \frac{1}{6},$$
where pi = P (face value on the die is i), i = 1, 2, . . . , 6. This is the discrete uniform distribution.
Hence, we test
$$H_0: p_1 = p_2 = \cdots = p_6 = \frac{1}{6}$$
versus
Ha : At least one of the probabilities is different from the hypothesized value of 1/6.
Under H0, the expected frequencies are E1 = np1 = (60)(1/6) = 10, . . . , E6 = 10.
We summarize the calculations in the following table:
Face value            1    2    3    4    5    6
Frequency, Oi         8   11    5   12   15    9
Expected value, Ei   10   10   10   10   10   10
The test statistic value is given by
$$Q^2 = \sum_{i=1}^{6} \frac{(O_i - E_i)^2}{E_i} = 6.$$
From the chi-square table with 5 d.f., $\chi^2_{0.05,5} = 11.070$.
Because the value of the test statistic does not fall in the rejection region, we do not reject H0. Therefore, the
data do not provide sufficient evidence that the die is unbalanced.
6.2 Contingency Table: Test for Independence
One of the uses of the χ2-statistic is in contingency (dependence) testing where n randomly selected
items are classified according to two different criteria, such as when data are classified on the basis of
two factors (row factor and column factor) where the row factor has r levels and the column factor
has c levels. The obtained data are displayed as shown in the following table, where nij represents
the number of data values under row i and column j. Our interest here is to test for independence of
two methods of classification of observed events. For example, we might classify a sample of students
by sex and by their grade on a statistics course in order to test the hypothesis that the grades are
dependent on sex. More generally the problem is to investigate a dependency (or contingency) between
two classification criteria.
                      Levels of column factor
                   1       2      ...     c      Row total
Row         1     n11     n12     ...    n1c       n1.
levels      2     n21     n22     ...    n2c       n2.
           ...    ...     ...     ...    ...       ...
            r     nr1     nr2     ...    nrc       nr.
Column total      n.1     n.2     ...    n.c        N
where $N = \sum_{j=1}^{c} n_{\cdot j} = \sum_{i=1}^{r} n_{i\cdot} = \sum_{i=1}^{r}\sum_{j=1}^{c} n_{ij}$ is the grand total.
We wish to test the hypothesis that the two factors are independent. We summarize the procedure
in the following table for testing that the factor represented by the rows is independent of the factor
represented by the columns.
TESTING FOR THE INDEPENDENCE OF TWO FACTORS
To test
H0 : The factors are independent
versus
Ha : The factors are dependent
the test statistic is,
$$Q^2 = \sum_{i=1}^{r}\sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},$$
where
$$O_{ij} = n_{ij} \quad\text{and}\quad E_{ij} = \frac{n_{i\cdot}\, n_{\cdot j}}{N}.$$
Then under the null hypothesis the test statistic Q2 has an approximate chi-square distribution with
(r − 1)(c − 1) degrees of freedom.
Hence, the rejection region is $Q^2 > \chi^2_{\alpha,(r-1)(c-1)}$.
Assumption: Eij ≥ 5.

Example 6.4
The following table gives a classification according to religious affiliation and marital status
for 500 randomly selected individuals.
                        Religious affiliation
Marital status      A      B      C      D     None    Total
Single             39     19     12     28      18      116
With spouse       172     61     44     70      37      384
Total             211     80     56     98      55      500
For α = 0.01, test the null hypothesis that marital status and religious affiliation are independent.
Solution
We need to test the hypothesis
H0 : Marital status and religious affiliation are independent
versus
Ha : Marital status and religious affiliation are dependent.
Here, c = 5, and r = 2. For α = 0.01, and for (c − 1)(r − 1) = 4 degrees of freedom, we have
$\chi^2_{0.01,4} = 13.2767$.
Hence, the rejection region is Q2 > 13.2767.
We have $E_{ij} = n_{i\cdot}\, n_{\cdot j}/N$. Thus,
$$E_{11} = \frac{(116)(211)}{500} = 48.952; \quad E_{12} = \frac{(116)(80)}{500} = 18.56;$$
$$E_{13} = \frac{(116)(56)}{500} = 12.992; \quad E_{14} = \frac{(116)(98)}{500} = 22.736;$$
$$E_{15} = \frac{(116)(55)}{500} = 12.76; \quad E_{21} = \frac{(384)(211)}{500} = 162.05;$$
$$E_{22} = \frac{(384)(80)}{500} = 61.44; \quad E_{23} = \frac{(384)(56)}{500} = 43.008;$$
and
$$E_{24} = \frac{(384)(98)}{500} = 75.264; \quad E_{25} = \frac{(384)(55)}{500} = 42.24.$$
The value of the test statistic is
$$Q^2 = \sum_{i=1}^{2}\sum_{j=1}^{5} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
$$= \frac{(39-48.952)^2}{48.952} + \frac{(19-18.56)^2}{18.56} + \frac{(12-12.992)^2}{12.992} + \frac{(28-22.736)^2}{22.736} + \frac{(18-12.76)^2}{12.76}$$
$$+ \frac{(172-162.05)^2}{162.05} + \frac{(61-61.44)^2}{61.44} + \frac{(44-43.008)^2}{43.008} + \frac{(70-75.264)^2}{75.264} + \frac{(37-42.24)^2}{42.24}$$
$$= 7.135.$$
Because the observed value of Q2 does not fall in the rejection region, we do not reject the null hypothesis
at α = 0.01. Therefore, the observed data do not provide sufficient evidence that marital status and religious
affiliation are dependent.
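The expected counts and the statistic of Example 6.4 can be reproduced with the short Python sketch below. The function names are illustrative, and the critical value 13.2767 is simply the table value quoted in the example.

```python
def expected_counts(table):
    """E_ij = (row total i) * (column total j) / N for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def independence_q2(table):
    """Return (Q2, degrees of freedom) for the chi-square test of independence."""
    expected = expected_counts(table)
    q2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return q2, df


# Rows: single, with spouse; columns: affiliations A, B, C, D, None
table = [
    [39, 19, 12, 28, 18],
    [172, 61, 44, 70, 37],
]

q2, df = independence_q2(table)
critical = 13.2767                  # chi-square table value, alpha = 0.01, 4 d.f.
print(f"Q2 = {q2:.3f}, d.f. = {df}, reject H0: {q2 > critical}")
```

Computing the expected counts from the margins, rather than typing them in, avoids rounding slips in the hand calculation.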
6.3 Testing to Identify the Probability Distribution: Goodness-of-Fit
Chi-Square Test
Another application of the chi-square statistic is using it for goodness-of-fit tests in a different context.
In hypothesis testing problems we often assume that the form of the population distribution is known.
For example, in a χ2-test for variance, we assume that the population is normal. The goodness-of-fit
tests examine the validity of such an assumption if we have a large enough sample. We now describe
the goodness-of-fit test procedure for such applications.
GOODNESS-OF-FIT TEST PROCEDURES FOR PROBABILITY DISTRIBUTIONS
Let X1, . . . ,Xn be a sample from a population with cdf F (x ), which may depend on the set of unknown
parameters θ. We wish to test H0 : F (x ) = F0(x ), where F0(x ) is completely specified.
1. Divide the range of values of the random variable X into K nonoverlapping intervals I1, I2, . . . , IK .
Let Oj be the number of sample values that fall in the interval Ij (j = 1, 2, . . . , K ).
2. Assuming the distribution of X to be F0(x ), find P(X ∈ Ij ). Let P(X ∈ Ij ) = πj . Let Ej = nπj be the
expected frequency.
3. Compute the test statistic Q2 given by
$$Q^2 = \sum_{j=1}^{K} \frac{(O_j - E_j)^2}{E_j}.$$
The test statistic Q2 has an approximate χ2-distribution with (K − 1) degrees of freedom.
4. Reject H0 if $Q^2 \ge \chi^2_{\alpha,\,K-1}$.
5. Assumptions: Ej ≥ 5, j = 1, 2, . . . , K .
If the null hypothesis does not specify F0(x) completely, that is, if F0(x) contains some unknown
parameters θ1, θ2, . . . , θp, we estimate these parameters by the method of maximum likelihood. Using
these estimated values we specify F0(x) completely. Denote the estimated F0(x) by $\hat{F}_0(x)$. Let
$$\hat{\pi}_i = P\{X \in I_i \mid \hat{F}_0(x)\} \quad\text{and}\quad \hat{E}_i = n\hat{\pi}_i.$$
The test statistic is
$$Q^2 = \sum_{i=1}^{K} \frac{(O_i - \hat{E}_i)^2}{\hat{E}_i}.$$
The statistic Q2 has an approximate chi-square distribution with (K − 1 − p) degrees of freedom. We
reject H0 if $Q^2 \ge \chi^2_{\alpha,\,K-1-p}$.
We now illustrate the method of goodness-of-fit with an example.
Example 6.5
The grades of students in a class of 200 are given in the following table. Test the hypothesis
that the grades are normally distributed with a mean of 75 and a standard deviation of 8.
Use α = 0.05.
Range                 0-59    60-69    70-79    80-89    90-100
Number of students     12       36       90       44       18
Solution
We have O1 = 12, O2 = 36, O3 = 90, O4 = 44, O5 = 18.
We now compute πi(i = 1, 2, . . . , 5), using the continuity correction factor,
$$\pi_1 = P\{X \le 59.5 \mid H_0\} = P\left\{z \le \frac{59.5 - 75}{8}\right\} = 0.0262,$$
π2 = 0.2189, π3 = 0.4722, π4 = 0.2476, π5 = 0.0351,
and
E1 = 5.24, E2 = 43.78, E3 = 94.44, E4 = 49.52, E5 = 7.02.
The test statistic results in
$$Q^2 = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = \frac{(12-5.24)^2}{5.24} + \frac{(36-43.78)^2}{43.78} + \frac{(90-94.44)^2}{94.44} + \frac{(44-49.52)^2}{49.52} + \frac{(18-7.02)^2}{7.02} = 28.10.$$
Q2 has an approximate chi-square distribution with (5 − 1) = 4 degrees of freedom. The critical value is
$\chi^2_{0.05,4} = 9.488$.
Hence, the rejection region is Q2 > 9.488. Because the observed value of Q2 = 28.10 > 9.488, we reject H0
at α = 0.05. Thus, we conclude that the grades are not normally distributed with mean 75 and standard deviation 8.
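The class probabilities in Example 6.5 can also be obtained from the normal cdf directly instead of a z-table. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name `class_probabilities` is illustrative, and 9.488 is the chi-square table value for 4 degrees of freedom at the 0.05 level. Because the cdf is evaluated without rounding z to two decimals, the probabilities, and therefore Q2, differ slightly from the hand calculation, but the decision is the same.

```python
from statistics import NormalDist


def class_probabilities(cut_points, mu, sigma):
    """Probabilities of (-inf, c1], (c1, c2], ..., (c_last, +inf) under N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    cdf_values = [0.0] + [dist.cdf(c) for c in cut_points] + [1.0]
    return [upper - lower for lower, upper in zip(cdf_values, cdf_values[1:])]


observed = [12, 36, 90, 44, 18]           # grade classes 0-59, 60-69, ..., 90-100
n = sum(observed)                         # 200 students
cut_points = [59.5, 69.5, 79.5, 89.5]     # class boundaries with continuity correction

probs = class_probabilities(cut_points, mu=75, sigma=8)
expected = [n * p for p in probs]

q2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical = 9.488                          # chi-square table value, alpha = 0.05, 4 d.f.

print([round(e, 2) for e in expected])
print(f"Q2 = {q2:.2f}, reject H0: {q2 > critical}")
```

If the mean and standard deviation had been estimated from the data rather than specified, the degrees of freedom would drop to K − 1 − p = 2 and a different table value would be needed.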
EXERCISES 6
6.1.
The following table gives the opinion on collective bargaining by a random sample of 200
employees of a school system, belonging to a teachers’ union.
Opinion on Collective Bargaining by Teachers’ Union
                  For    Against    Undecided    Total
Staff              30       15         15          60
Faculty            50       10         40         100
Administration     10       25          5          40
Column totals      90       50         60         200
Test the hypotheses
H0 : Opinion on collective bargaining is independent of employee classification
versus
Ha : Opinion on collective bargaining is dependent on employee classification
using α = 0.05.
6.2.
A random sample was taken of 300 undergraduate students from a university. The students
in the sample were classified according to their gender and according to the choice of their
major. The result is given in the following table.
                            College
Gender    Arts and sciences    Engineering    Business    Other    Total
Male              75                40            24        66      205
Female            45                12            15        23       95
Total            120                52            39        89      300
Test the hypothesis that the choice of the major by undergraduate students in this university
is independent of their gender. Use α = 0.01.
6.3.
The speeds of vehicles (in mph) passing through a section of Highway 75 are recorded for a
random sample of 150 vehicles and are given below. Test the hypothesis that the speeds are
normally distributed with a mean of 70 and a standard deviation of 4. Use α = 0.01.
Range     40-55    56-65    66-75    76-85    > 85
Number      12       14       78       40       6
6.4.
Based on the sample data of 50 days contained in the following table, test the hypothesis that
the daily mean temperatures in the city are normally distributed with mean 77 and variance
6. Use α = 0.05.

Temperature       46-55    56-65    66-75    76-85    86-95
Number of days      4        6        13       23       4
6.5.
A presidential candidate advertises on TV by comparing his positions on some important
issues with those of his opponent. After a series of advertisements, a pollster wants to know
whether people have changed their opinion about the candidate. The following are the data
based on a survey of 950 randomly chosen people:
Before the Advertisement Was Shown
Support the    Oppose the    Need to know more      Undecided
candidate      candidate     about the candidate
   40%            20%               5%                 35%

After the Advertisement Was Shown
Support the    Oppose the    Need to know more      Undecided
candidate      candidate     about the candidate
   45%            25%               2%                 28%
Let pi, i = 1, 2, 3, 4, represent the respective true proportions.
Test
H0 : p1 = 0.35;p2 = 0.20;p3 = 0.15;p4 = 0.3
versus
Ha : At least one of the probabilities is different from the hypothesized value.
Test this hypothesis using α = 0.05.
6.6.
A survey of footwear preferences of a random sample of 100 undergraduate students (50
females and 50 males) from a large university resulted in the following data.
          Boots    Leather shoes    Sneakers    Sandals    Other
Female     12            9             12          10        7
Male       10           12             17           7        4
(a) Let pi, i = 1, 2, 3, 4, 5, represent the respective true proportions of students with a
particular footwear preference, and let
H0 : p1 = 0.20;p2 = 0.20;p3 = 0.30;p4 = 0.20;p5 = 0.10
versus
Ha : At least one of the probabilities is different from the hypothesized value.
Test this hypothesis using α = 0.05.
(b) Test the hypothesis that the choice of footwear by undergraduate students in this
university is independent of their gender, using α = 0.05.

7 CHAPTER SUMMARY
In this chapter, we have learned various aspects of hypothesis testing. First, we dealt with hypothesis
testing for one sample where we used test procedures for testing hypotheses about true mean, true
variance, and true proportion. Then we discussed the comparison of two populations through their
true means, true variances, and true proportions. We also introduced the Neyman-Pearson lemma
and discussed likelihood ratio tests and chi-square tests for categorical data.
We now list some of the key definitions in this chapter.
■ Statistical hypotheses
■ Tests of hypotheses, tests of significance, or rules of decision
■ Simple hypothesis
■ Composite hypothesis
■ Type I error
■ Type II error
■ The level of significance
■ The p-value or attained significance level
■ The Smith-Satterthwaite procedure
■ Power of the test
■ Most powerful test
■ Likelihood ratio
In this chapter, we also learned the following important concepts and procedures:
■ General method for hypothesis testing
■ Steps to calculate β
■ Steps to find the p-value
■ Steps in any hypothesis testing problem
■ Summary of hypothesis tests for μ
■ Summary of large sample hypothesis tests for p
■ Summary of hypothesis tests for the variance σ2
■ Summary of hypothesis tests for μ1 − μ2 for large samples (n1 & n2 ≥ 30)
■ Summary of hypothesis tests for p1 − p2 for large samples
■ Testing for the equality of variances
■ Summary of testing for a matched pairs experiment
■ Procedure for applying the Neyman-Pearson lemma
■ Procedure for the likelihood ratio test
■ Testing the parameters of a multinomial distribution (summary)
■ Testing the independence of two factors
■ Goodness-of-fit test procedures for probability distributions
8 COMPUTER EXAMPLES
In the following examples, if the value of α is not specified, we will always take it as 0.05.

8.1 Minitab Examples
Example 8.1
(t-Test): Consider the data
66   74   79   80   69   77   78   65   79   81
Using Minitab, test H0 : μ = 75 vs. H1 : μ > 75.
Solution
Enter the data in C1. Then
Stat > Basic Statistics > 1-sample t. . .
> In Variables: enter C1 > choose Test Mean > enter 75 >
in Alternative: choose greater than and click OK
We obtain the following output.
T-Test of the Mean
Test of mu = 75.00 vs mu > 75.00
Variable    N     Mean    StDev    SE Mean      T       P
C1         10    74.80     6.00      1.90     −0.11    0.54
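For readers without Minitab, the T value in this output can be reproduced from the raw data. The following Python sketch uses only the standard library, so it computes the statistic but not the p-value (the t distribution is not available there); the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

data = [66, 74, 79, 80, 69, 77, 78, 65, 79, 81]
mu0 = 75                                   # hypothesized mean under H0

n = len(data)
x_bar = mean(data)                         # sample mean
s = stdev(data)                            # sample standard deviation (n - 1 divisor)
se = s / sqrt(n)                           # standard error of the mean
t = (x_bar - mu0) / se                     # one-sample t statistic, n - 1 d.f.

print(f"mean = {x_bar:.2f}, s = {s:.2f}, SE = {se:.2f}, t = {t:.2f}")
```

Because the alternative is μ > 75 and the statistic is negative, the data point away from the alternative, which is why the reported one-sided p-value exceeds 0.5.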
Example 8.2
For the following data:
Sample 1:   16   18   21   13   19   16   18   15   20   19   14   21   14
Sample 2:   14   15   10   13   11    7   12   11   12   15   14
Test H0 : μ1 = μ2 vs. H1 : μ1 < μ2. Use α = 0.02.
Solution
Enter sample 1 data in C1 and sample 2 data in C2. Then
Stat > Basic Statistics > 2-sample t. . .
> Choose Samples in different columns > in Alternative:
choose less than > in Confidence level: enter 98 > click Assumed equal variances and click OK
We obtain the following output.
Two Sample T-test and Confidence Interval
Two sample T for C1 vs C2
       N     Mean    StDev    SE Mean
C1    13    17.23     2.74      0.76
C2    11    12.18     2.40      0.72
98% CI for mu C1 − mu C2: (2.38, 7.71)
T-Test mu C1 = mu C2 (vs <): T = 4.75 P = 1.0 DF = 22
Both use Pooled StDev = 2.59
If we did not select Assumed equal variances, we will obtain the following output.
Two Sample T-Test and Confidence Interval
Two sample T for C1 vs C2
       N     Mean    StDev    SE Mean
C1    13    17.23     2.74      0.76
C2    11    12.18     2.40      0.72
98% CI for mu C1 − mu C2: (2.40, 7.69)
T-Test mu C1 = mu C2 (vs <): T = 4.81 P = 1.0 DF = 21
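The pooled-variance statistic reported for Example 8.2 can be recomputed from the two samples. The Python sketch below is a minimal illustration under the equal-variance assumption; `pooled_t` is a name chosen here, not a library function.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample1, sample2):
    """Two-sample t statistic with pooled variance; returns (t, degrees of freedom)."""
    n1, n2 = len(sample1), len(sample2)
    # Pooled variance: weighted average of the two sample variances
    sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))     # standard error of the difference in means
    return (mean(sample1) - mean(sample2)) / se, n1 + n2 - 2


sample1 = [16, 18, 21, 13, 19, 16, 18, 15, 20, 19, 14, 21, 14]
sample2 = [14, 15, 10, 13, 11, 7, 12, 11, 12, 15, 14]

t, df = pooled_t(sample1, sample2)
print(f"t = {t:.3f}, df = {df}")
```

A large positive t with the alternative μ1 < μ2 gives a one-sided p-value close to 1, which matches the P = 1.0 in the output.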
Example 8.3
For the following data:
 6.8    5.6    8.5    8.5    8.4    5      9.3    9.4    8      1
 9.9    9.6    9.0    9.4   13.7   16.6    9.1   10.1   10.6   11.1
 8.9   11.7   12.8   11.5   12.0   10.6   11.1    6.4   12.3   12.3
11.4    9.9   14.3   11.5   11.8   13.3   12.8   13.7   13.9   12.9
14.2   14.0   15.5   16.9   18.0   19     21.8   18.4   34.3
Test H0 : μ = 12 versus H1 : μ ≠ 12. Use α = 0.05.
Solution
Enter the data in C1. Then
Stat > Basic Statistics > 1-sample z. . . > in Variables: Type C1 > choose Test Mean and enter
12 >
choose not equal in Alternative, and Type 4.7 for sigma > Click OK
We obtain the following output.
Z-Test
Test of mu = 12.000 vs mu not = 12.000
The assumed sigma = 4.70
Variable    N      Mean     StDev    SE Mean      Z       P
C1         49    12.124    4.700     0.671     0.19    0.85
Here the test statistic is 0.19 and the p-value is 0.85, which is larger than 0.05. Hence, we cannot reject the
null hypothesis.

Example 8.4
(Contingency Table): Consider the following data with five levels and two factors. Test for dependence
of the factors.
                   Levels
Factors      1      2      3      4      5
   1        39     19     12     28     18
   2       172     61     44     70     37
Solution
In C1 enter the data in column 1 (39 and 172), and continue to C5. Then
Stat > Tables > Chi-Square-Test. . . > in Columns containing the table: Type C1 C2 C3 C4 C5 >
click OK
We will obtain the following output.
Chi-Square Test
Expected counts are printed below observed counts
            C1        C2        C3        C4        C5     Total
1           39        19        12        28        18       116
         48.95     18.56     12.99     22.74     12.76
2          172        61        44        70        37       384
        162.05     61.44     43.01     75.26     42.24
Total      211        80        56        98        55       500
Chi-Sq = 2.023 + 0.010 + 0.076 + 1.219 + 2.152 +
         0.611 + 0.003 + 0.023 + 0.368 + 0.650 = 7.135
DF = 4, p-value = 0.129
Example 8.5
(Paired t-Test): Consider the data of Example 5.7. Using Minitab, perform a paired t-test.
Solution
Enter sample 1 in column C1 and sample 2 in column C2. Then:
Stat > Basic Statistics > Paired t. . . > in First Sample: Type C2, and in the Second sample: Type
C1 > click options > and click less than (if α is other than 0.05, enter appropriate percentage in
Confidence level: and enter appropriate number if it is not zero in Test mean:) > click OK > OK

We obtain the following output.
Paired T-test and Confidence Interval
Paired T for C2 − C1
               N      Mean    StDev    SE Mean
C2            10     171.3     47.1      14.9
C1            10     243.2     40.1      12.7
Difference    10     −71.9     56.2      17.8
95% CI for mean difference: (−112.1, −31.7)
T-Test of mean difference = 0 (vs < 0): T-Value = −4.05
p-value = 0.001
We reject the null hypothesis because the p-value 0.001 < 0.05 = α.
8.2 SPSS Examples
Example 8.6
Consider the data
66   74   79   80   69   77   78   65   79   81
Using SPSS, test H0 : μ = 75 vs. H1 : μ > 75.
Solution
Use the following procedure:
1. Enter the data in column 1.
2. Click Analyze > Compare Means > One-sample t Test. . . , Move var00001
to Test Variable(s),
and change Test Value: 0 to 75. Click OK
We obtain the following output.
One-Sample Statistics
             N      Mean      Std. Deviation    Std. Error Mean
VAR00001    10    74.8000        5.99630            1.89620
One-Sample Test
Test Value = 75
                                                                  95% Confidence Interval
                                                                  of the Difference
               t      df    Sig. (2-tailed)    Mean Difference     Lower       Upper
VAR00001    −.105     9         .918              −.2000          −4.4895      4.0895
For the one sample t-test H0 : μ = 75 vs. H1 : μ > 75, the t-statistic is −0.105 with 9 degrees of freedom.
Because the alternative is upper-tailed and the t-statistic is negative, the p-value is 1 − 0.918/2 = 0.54 > 0.05.
Hence, we will not reject the null hypothesis.
If we want the computer to calculate the tail probability in the previous example, use the following procedure.
1. Enter the test statistic (−0.105) in the data editor using ‘teststat’.
2. Click Transform > compute. . .
3. Type ‘p-value’ in the box called Target Variable. In the box called Functions: scroll and click on
CDF.T(q,df) and move to Numeric Expressions.
4. The CDF.T(q,df) will appear as CDF.T(?,?) in the Numeric Expressions box. Replace q with teststat and df
with 9 (the degree of freedom in this example is 9). Click OK
We obtain the lower-tail probability 0.46. For the alternative H1 : μ > 75, the p-value is therefore 1 − 0.46 = 0.54.
Example 8.7
For the following data
Sample 1:   16   18   21   13   19   16   18   15   20   19   14   21   14
Sample 2:   14   15   10   13   11    7   12   11   12   15   14
Test H0 : μ1 = μ2 vs. H1 : μ1 < μ2. Use α = 0.02.
Solution
In column 1, under the title ‘‘group’’ enter 1s to identify the sample 1 data and 2s to identify sample 2 data.
In column C2, under the title ‘‘data’’ enter the data corresponding to samples 1 and 2. Then:
Analyze > Compare Means > Independent Samples t–test. . . > bring Data to Test Variable(s): and
group to Grouping Variable:, click Define Groups. . . , and enter 1 for sample 1, 2 for sample 2 >
click continue > click Options
Enter 98 in Confidence interval: > click continue > OK
We obtain the following output.
Group Statistics
        GROUP     N      Mean      Std. Deviation    Std. Error Mean
DATA    1.00     13    17.2308        2.74329            .76085
        2.00     11    12.1818        2.40076            .72386
Independent Samples Test
Levene’s Test for Equality of Variances (F, Sig.) and t-test for Equality of Means (remaining columns)
                                                               Sig.         Mean         Std. Error    98% Confidence Interval
                                                                                                        of the Difference
                                  F      Sig.     t       df      (2-tailed)   Difference   Difference     Lower       Upper
DATA  Equal variances assumed    .975    .334    4.753    22        .000        5.0490       1.06237      2.38419     7.71372
      Equal variances not assumed                4.808    21.963    .000        5.0490       1.05017      2.41443     7.68347
The two-tailed significance values are .000, but the alternative here is H1 : μ1 < μ2 and the t-statistic is
positive, so the one-sided p-value is essentially 1 (as in Example 8.2), which is greater than 0.02. Hence, we do
not reject the null hypothesis.
Example 8.8
(Paired t-Test) For the data of Example 5.7, use SPSS to test whether the data provide sufficient
evidence for the claim that the new program reduces blood glucose level in diabetic patients. Use α = 0.05.
Solution
Enter after data in column C1 and before data in column C2. Then:
Analyze > Compare Means > Paired-Sample T-Test > bring after and before to Paired Variables:
so that it will look after-before > click OK
We obtain the following output.
Paired Samples Statistics
                    Mean       N     Std. Deviation    Std. Error Mean
Pair 1   AFTER    171.3000    10       47.11228           14.89821
         BEFORE   243.2000    10       40.12979           12.69015
Paired Samples Correlations
                            N     Correlation    Sig.
Pair 1   AFTER & BEFORE    10        .179        .621
Paired Samples Test
                                     Paired Differences
                                                              95% Confidence Interval
                                                              of the Difference
                           Mean      Std.         Std. Error    Lower        Upper        t       df    Sig.
                                     Deviation    Mean                                                  (2-tailed)
Pair 1   AFTER − BEFORE   −71.9000   56.15544     17.75791    −112.0712    −31.7288    −4.049     9      .003
Because the two-tailed significance is 0.003 (so the one-sided p-value is about 0.0015), which is less than
α = 0.05, we reject the null hypothesis.
8.3 SAS Examples
To conduct a hypothesis test using SAS, we could use proc ttest, or proc means with option of
computing the t-value and corresponding probability. However, to use this, we need a hypothesis
of the form H0 : μ = 0. For testing nonzero values, H0 : μ = μ0, we must create a new variable
by subtracting μ0 from each observation, and then use the test procedure for this new variable. The
following example illustrates this concept.
Example 8.9
(t-Test): The following radar measurements of speed (in miles per hour) are obtained for 10 vehicles
traveling on a stretch of interstate highway.
66   74   79   80   69   77   78   65   79   81
Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this
stretch of highway is at least 75 mph? Test using α = 0.01. Use an SAS procedure to do the analysis.
Solution
In the SAS editor, type in the following commands.
data speed;
title 'Test on highway speed';
input X @@;
Y=X-75;
datalines;
66 74 79 80 69 77 78 65 79
81
;
PROC TTEST data=speed;
run;
We obtain the following output.
Test on highway speed
The TTEST Procedure
Statistics
                   Lower CL              Upper CL    Lower CL                Upper CL
Variable     N       Mean       Mean       Mean      Std Dev     Std Dev     Std Dev     Std Err
X           10      70.511      74.8      79.089      4.1245      5.9963      10.947      1.8962
Y           10      −4.489      −0.2      4.0895      4.1245      5.9963      10.947      1.8962
T-Tests
Variable     DF    t Value    Pr > |t|
X             9     39.45      <.0001
Y             9     −0.11      0.9183
To test H0 : μ = 75, we need to look at the Y-values. The corresponding t-value is −0.11, with a two-sided
probability of 0.9183. Because this is a one-sided (upper-tailed) test and the t-value is negative, the p-value is
1 − 0.9183/2 = 0.54085. Because the p-value is larger than 0.01 = α, we cannot reject the null hypothesis.
One of the easier ways to conduct large sample hypothesis testing using SAS procedures is through
the computation of the p-value. The following example illustrates the procedure.
Example 8.10
(z-Test): It is claimed that the average miles driven per year for sports cars is at least 18,000 miles. To check
the claim, a consumer firm tests 40 of these cars randomly and obtains a mean of 17,463 miles with standard
deviation of 1348 miles. What can it conclude if α = 0.01?
Solution
Here we will find the p-value and compare that with α to test the hypothesis. We use the following SAS
procedure:
Data ex888;
z=(17463-18000)/(1348/(SQRT(40)));
pval=probnorm(z);
run;
proc print data=ex888;
title 'Test of mean, large sample';
run;
We obtain the following output.
Test of mean, large sample
Obs       z          pval
1      −2.51950    .005876079
Because the p-value of 0.005876079 is less than α = 0.01, we reject the null hypothesis. There is sufficient
evidence to conclude that the mean miles driven per year for sport cars is less than 18,000.
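Note that in the previous example the alternative was lower-tailed (Ha : μ < 18,000) and the value of z was
negative, so the p-value is the lower-tail area, pval=probnorm(z);. For an upper-tailed alternative, use
pval=probnorm(-z);. For a two-sided hypothesis, double the smaller tail area, that is, use
pval=2*probnorm(-abs(z)); to obtain the p-value.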
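As a compact illustration of these tail conventions, the following DATA step computes the p-value under each of the three alternatives for the statistic of Example 8.10. It is only a sketch: the data set name zpvals and the variable names p_lower, p_upper, and p_two are chosen here for illustration and do not come from the example.

```sas
/* Sketch: p-values for the three alternatives from one z statistic.  */
/* The names zpvals, p_lower, p_upper, p_two are illustrative only.   */
data zpvals;
   z = (17463 - 18000) / (1348 / sqrt(40));  /* statistic from Example 8.10 */
   p_lower = probnorm(z);                    /* Ha: mu < mu0,  P(Z <= z)    */
   p_upper = 1 - probnorm(z);                /* Ha: mu > mu0,  P(Z >= z)    */
   p_two   = 2 * probnorm(-abs(z));          /* Ha: mu ne mu0, 2*P(Z >= |z|) */
run;
proc print data=zpvals;
   title 'z-test p-values for the three alternatives';
run;
```

Here p_lower should reproduce the value .005876079 obtained above. Only the p-value that matches the stated alternative is relevant to the test; the other two are shown for comparison.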
Example 8.11
(Paired t-Test): For the data of Example 5.7, use SAS to test whether the data provide sufficient evidence
for the claim that the new program reduces blood glucose level in diabetic patients. Use α = 0.05.
Solution
We can use the following commands.

data dietexr;
input before after;
diff = after - before;
datalines;
268 106
225 186
252 223
192 110
307 203
228 101
246 211
298 176
231 194
185 203
;
run;
proc means data=dietexr t prt;
var diff;
title 'Test of mean, Paired difference';
run;
We obtain the following output.
Test of mean, Paired difference
The MEANS Procedure
Analysis Variable : diff
t Value    Pr > |t|
  −4.05      0.0029
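The reported value 0.0029 is the two-sided p-value. The claim is one-sided (a reduction means that the mean
of diff = after − before is negative) and the t-value is negative, so the one-sided p-value is half of this,
about 0.0015. Because the p-value is less than α = 0.05, we reject the null hypothesis and conclude that the
new program reduces blood glucose level.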
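The same one-sided paired test can also be requested directly from PROC TTEST. The following is a sketch, assuming the data set dietexr created above and a SAS release that supports the SIDES= option (SAS 9.2 or later).

```sas
/* Sketch: one-sided paired t-test with PROC TTEST.                   */
/* PAIRED after*before analyzes the differences after - before, so    */
/* SIDES=L corresponds to Ha: mean difference < 0 (a reduction).      */
proc ttest data=dietexr sides=L alpha=0.05;
   paired after*before;
run;
```

Because the PAIRED statement forms the differences itself, the variable diff is not needed for this call. The t-value should agree with the −4.05 reported by PROC MEANS.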
PROJECTS FOR
7A. Testing on Computer-Generated Samples
(a) Small sample test:
Generate a sample of size 20 from a normal population with μ = 10 and σ² = 4.
(i) Perform a t-test of H0 : μ = 10 versus Ha : μ ≠ 10 at level α = 0.05.
(ii) Perform the test of H0 : σ² = 4 versus Ha : σ² ≠ 4 at level α = 0.05.
Repeat the procedure 10 times, and comment on the results.
(b) Large sample test:

Generate a sample of size 50 from a normal population with μ = 10 and σ² = 4. Perform a z-test
of H0 : μ = 10 versus Ha : μ ≠ 10 at level α = 0.05. Repeat the procedure 10 times and
comment on the results.
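Part (a)(i) of Project 7A might be set up in SAS along the following lines. This is only a sketch: the data set name sim, the variables rep, i, and x, and the seed 12345 are arbitrary choices made here, and PROC TTEST is assumed to support the H0= option (SAS 9.2 or later).

```sas
/* Sketch for Project 7A(a)(i): 10 samples of size 20 from N(10, 4). */
data sim;
   call streaminit(12345);            /* arbitrary seed, for reproducibility */
   do rep = 1 to 10;                  /* 10 repetitions                      */
      do i = 1 to 20;                 /* sample size 20                      */
         x = rand('NORMAL', 10, 2);   /* third argument is sigma = sqrt(4)   */
         output;
      end;
   end;
run;

proc ttest data=sim h0=10 alpha=0.05;
   by rep;                            /* one two-sided t-test per sample */
   var x;
run;
```

Because H0 is true for the generated data, each test rejects with probability 0.05, so none or one rejection among the 10 repetitions would be typical. Changing the inner loop to run from 1 to 50 produces the samples for part (b).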
7B. Conducting a Statistical Test with Confidence Interval
Let θ be any population parameter. Consider the three tests of hypotheses
H0 : θ = θ0 vs. Ha : θ > θ0        (1)
H0 : θ = θ0 vs. Ha : θ < θ0        (2)
H0 : θ = θ0 vs. Ha : θ ≠ θ0        (3)
The following procedure can be exploited to test a statistical hypothesis utilizing the confidence
intervals.
Procedure to Use Confidence Interval for Hypothesis Testing
Let θ be any population parameter.
(a) For test (1), that is,
H0 : θ = θ0 vs. Ha : θ > θ0
choose a value for α. From a random sample, compute a confidence interval for θ using
a confidence coefficient equal to 1 − 2α. Let L be the lower end point of this confidence
interval.
Reject H0 if θ0 < L.
That is, we will reject the null hypothesis if the confidence interval is completely to the right
of θ0.
(b) For test (2), that is,
H0 : θ = θ0 vs. Ha : θ < θ0
choose a value for α. From a random sample, compute a confidence interval for θ using
a confidence coefficient equal to 1 − 2α. Let U be the upper end point of this confidence
interval.
Reject H0 if U < θ0.
That is, we will reject the null hypothesis if the confidence interval is completely to the
left of θ0.
(c) For test (3), that is,
H0 : θ = θ0 vs. Ha : θ ≠ θ0

choose a value for α. From a random sample, compute a confidence interval for θ using a
confidence coefficient equal to 1 − α. Let L be the lower end point and U be the upper end
point of this confidence interval.
Reject H0 if θ0 < L or U < θ0.
That is, we will reject the null hypothesis if the confidence interval does not contain θ0.
(i) For any large data set, conduct all three of these hypothesis tests using a confidence
interval for the population mean.
(ii) For any small data set, conduct all three of these hypothesis tests using a confidence
interval for the population mean.
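As an illustration of part (b) of the procedure on a small data set, the highway speed data from the t-test example can be reused. The sketch below assumes that data set speed is still available, takes α = 0.05 and the test H0 : μ = 75 vs. Ha : μ < 75 as illustrative choices, and relies on the ALPHA= option of PROC TTEST to set the confidence coefficient.

```sas
/* Sketch of part (b) for the 'speed' data: test H0: mu = 75 vs       */
/* Ha: mu < 75 at alpha = 0.05 through a confidence interval with     */
/* coefficient 1 - 2(0.05) = 0.90.                                    */
proc ttest data=speed alpha=0.10;   /* ALPHA=0.10 requests 90% limits */
   var X;
run;
/* Decision rule: reject H0 only if the upper limit U for the mean    */
/* is less than 75.                                                   */
```

For these data the upper 90% confidence limit should be about 78.28, which is not less than 75, so H0 is not rejected. For the two-sided test (3) at the same level, ALPHA=0.05 would be used instead, and H0 would be rejected only if 75 fell outside the interval.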