Statistical hypothesis testing. Formulation of hypotheses.
Validation of tests. Errors of the first and second kind.
Formulation of statistical inference. General considerations
for testing hypotheses about the equality of parameters of
independent normal populations.
7.1 INTRODUCTION
Statistics plays an important role in decision making. In statistics, one utilizes random samples to
make inferences about the population from which the samples were obtained. Statistical inference
regarding population parameters takes two forms: estimation and hypothesis testing, although both
hypothesis testing and estimation may be viewed as different aspects of the same general problem of
arriving at decisions on the basis of observed data. We already saw several estimation procedures in
earlier chapters. Hypothesis testing is the subject of this chapter. Hypothesis testing has an important
role in the application of statistics to real-life problems. Here we utilize the sampled data to make
decisions concerning the unknown distribution of a population or its parameters. Pioneering work
on the explicit formulation as well as the fundamental concepts of the theory of hypothesis testing
are due to J. Neyman and E. S. Pearson.
A statistical hypothesis is a statement concerning the probability distribution of a random variable
or population parameters that are inherent in a probability distribution. The following example
illustrates the concept of hypothesis testing. An important industrial problem is that of accepting or
rejecting lots of manufactured products. Before releasing each lot for the consumer, the manufacturer
usually performs some tests to determine whether the lot conforms to acceptable standards. Let us
say that both the manufacturer and the consumer agree that if the proportion of defectives in a lot is
less than or equal to a certain number p, the lot will be released. Very often, instead of testing every
item in the lot, we may test only a few items chosen at random from the lot and make decisions
about the proportion of defectives in the lot; that is, we make the decisions about the population
on the basis of sample information. Such decisions are called statistical decisions. In attempting to
reach decisions, it is useful to make some initial conjectures about the population involved. Such
conjectures are called statistical hypotheses. Sometimes the results from the sample may be markedly
different from those expected under the hypothesis. Then we can say that the observed differences
are significant and we would be inclined to reject the initial hypothesis. These procedures that enable
us to decide whether to accept or reject hypotheses or to determine whether observed samples differ
significantly from expected results are called tests of hypotheses, tests of significance, or rules of decision.
In any hypothesis testing problem, we formulate a null hypothesis and an alternative hypothesis such that
if we reject the null, then we have to accept the alternative. The null hypothesis usually is a statement
of either the “status quo” or “no effect.” A guideline for selecting a null hypothesis is that when the
objective of an experiment is to establish a claim, the nullification of the claim should be taken as
the null hypothesis. The experiment is often performed to determine whether the null hypothesis is
false. For example, suppose the prosecution wants to establish that a certain person is guilty. The null
hypothesis would be that the person is innocent and the alternative would be that the person is guilty.
Thus, the claim itself becomes the alternative hypothesis. Customarily, the alternative hypothesis is
the statement that the experimenter believes to be true. For example, the alternative hypothesis is
the reason a person is arrested (police suspect the person is not innocent). Once the hypotheses
have been stated, appropriate statistical procedures are used to determine whether to reject the null
hypothesis. For the testing procedure, one begins with the assumption that the null hypothesis is true.
If the information furnished by the sampled data strongly contradicts (beyond a reasonable doubt)
the null hypothesis, then we reject it in favor of the alternative hypothesis. If we do not reject the
null, then we automatically reject the alternative. Note that we always make a decision with respect
to the null hypothesis. Note that the failure to reject the null hypothesis does not necessarily mean
that the null hypothesis is true. For example, a person being judged “not guilty” does not mean the
person is innocent. This basically means that there is not enough evidence to reject the null hypothesis
(presumption of innocence) beyond “a reasonable doubt.”
We summarize the elements of a statistical hypothesis in the following.
THE ELEMENTS OF A STATISTICAL HYPOTHESIS
1. The null hypothesis, denoted by H0, is usually the nullification of a claim. Unless evidence from the
data indicates otherwise, the null hypothesis is assumed to be true.
2. The alternate hypothesis, denoted by Ha (or sometimes denoted by H1), is customarily the claim
itself.
3. The test statistic, denoted by TS, is a function of the sample measurements upon which the
statistical decision, to reject or not reject the null hypothesis, will be based.
4. A rejection region (or a critical region) is the region (denoted by RR) that specifies the values
of the observed test statistic for which the null hypothesis will be rejected. This is the range of
values of the test statistic that corresponds to the rejection of H0 at some fixed level of significance,
α, which will be explained later.
5. Conclusion: If the value of the observed test statistic falls in the rejection region, the null hypothesis
is rejected and we will conclude that there is enough evidence to decide that the alternative
hypothesis is true. If the TS does not fall in the rejection region, we conclude that we cannot reject
the null hypothesis.
In practice one may have hypotheses such as H0 : μ = μ0 against one of the following alternatives:

Ha : μ ≠ μ0, called a two-tailed alternative,
or Ha : μ < μ0, called a lower (or left) tailed alternative,
or Ha : μ > μ0, called an upper (or right) tailed alternative.
A test with a lower or upper tailed alternative is called a one-tailed test. In an applied hypothesis testing
problem, we can use the following general steps.
GENERAL METHOD FOR HYPOTHESIS TESTING
1. From the (word) problem, determine the appropriate null hypothesis, H0, and the alternative, Ha.
2. Identify the appropriate test statistics and calculate the observed test statistic from the data.
3. Find the rejection region by looking up the critical value in the appropriate table.
4. Draw the conclusion: Reject or fail to reject the null hypothesis, H0.
5. Interpret the results: State in words what the conclusion means to the problem we started with.
It is always necessary to state a null and an alternate hypothesis for every statistical test performed.
All possible outcomes should be accounted for by the two hypotheses.
Example 7.1.1
In a coin-tossing experiment, let p be the probability of heads. We start with the claim that the coin is fair,
that is, H0 : p = 1/2. We test this against one of the following alternatives:
(a) Ha: The coin is not fair (p ≠ 1/2). This is a two-tailed alternative.
(b) Ha: The coin is biased in favor of heads (p > 1/2). This is an upper tailed alternative.
(c) Ha: The coin is biased in favor of tails (p < 1/2). This is a lower tailed alternative.
It is important to observe that the test statistic is a function of a random sample. Thus, the test statistic
itself is a random variable whose distribution is known under the null hypothesis. The value of a test
statistic when specific sample values are substituted is called the observed test statistic or simply test
statistic.
For example, consider the hypothesis H0 : μ = μ0 versus Ha : μ ≠ μ0, where μ0 is known. Assume
that the population is normal with a known variance σ². Consider X̄, an unbiased estimator of μ
based on the random sample X1, . . . , Xn. Then Z = (X̄ − μ0)/(σ/√n) is a function of the random
sample X1, . . . , Xn, and has a known distribution, a standard normal, under H0. If x1, x2, . . . , xn are
specific sample values, then z = (x̄ − μ0)/(σ/√n) is called the observed test statistic or simply test
statistic.
Definition 7.1.1 A hypothesis is said to be a simple hypothesis if that hypothesis uniquely specifies
the distribution from which the sample is taken. Any hypothesis that is not simple is called a composite
hypothesis.
Example 7.1.2
Refer to Example 7.1.1. The null hypothesis p = 1/2 is simple, because the hypothesis completely specifies
the distribution, which in this case will be a binomial with p = 1/2 and with n being the number of tosses.
The alternative hypothesis p ≠ 1/2 is composite because the distribution now is not completely specified
(we do not know the exact value of p).
Because the decision is based on the sample information, we are prone to commit errors. In a statistical
test, it is impossible to establish the truth of a hypothesis with 100% certainty. There are two possible
types of errors. On the one hand, one can make an error by rejecting H0 when in fact it is true. On
the other hand, one can also make an error by failing to reject the null hypothesis when in fact it is
false. Because the errors arise as a result of wrong decisions, and the decisions themselves are based
on random samples, it follows that the errors have probabilities associated with them. We now have
the following definitions.
Table 7.1 Statistical Decision and Error Probabilities

                     | True state of null hypothesis
Statistical decision | H0 true           | H0 false
Do not reject H0     | Correct decision  | Type II error (β)
Reject H0            | Type I error (α)  | Correct decision

The decision and the errors are represented in Table 7.1.
Definition 7.1.2 (a) A type I error is made if H0 is rejected when in fact H0 is true. The probability of
type I error is denoted by α. That is,
P (rejecting H0|H0 is true) = α.
The probability of type I error, α, is called the level of significance.
(b) A type II error is made if H0 is accepted when in fact Ha is true. The probability of a type II error is
denoted by β. That is,
P (not rejecting H0|H0 is false) = β.
It is desirable that a test should have α = β = 0 (this can be achieved only in trivial cases), or at least
we prefer to use a test that minimizes both types of errors. Unfortunately, it so happens that for a
fixed sample size, as α decreases, β tends to increase and vice versa. There are no hard and fast rules
that can be used to make the choice of α and β. This decision must be made for each problem based
on quality and economic considerations. However, in many situations it is possible to determine
which of the two errors is more serious. It should be noted that a type II error is only an error in
the sense that a chance to correctly reject the null hypothesis was lost. It is not an error in the sense
that an incorrect conclusion was drawn, because no conclusion is made when the null hypothesis is
not rejected. In the case of type I error, a conclusion is drawn that the null hypothesis is false when,
in fact, it is true. Therefore, type I errors are generally considered more serious than type II errors.
For example, it is mostly agreed that finding an innocent person guilty is a more serious error than
finding a guilty person innocent. Here, the null hypothesis is that the person is innocent, and the
alternate hypothesis is that the person is guilty. “Not rejecting the null hypothesis” is equivalent to
acquitting a defendant. It does not prove that the null hypothesis is true, or that the defendant is
innocent. In statistical testing, the significance level α is the probability of wrongly rejecting the null
hypothesis when it is true (that is, the risk of finding an innocent person guilty). Here the type II risk
is acquitting a guilty defendant. The usual approach to hypothesis testing is to find a test procedure
that limits α, the probability of type I error, to an acceptable level while trying to lower β as much as
possible.
The consequences of different types of errors are, in general, very different. For example, if a doctor
tests for the presence of a certain illness, incorrectly diagnosing the presence of the disease (type I
error) will cause a waste of resources, not to mention the mental agony to the patient. On the other
hand, failure to determine the presence of the disease (type II error) can lead to a serious health risk.
To formulate a hypothesis testing problem, consider the following situation. Suppose a toy store
chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys. We feel
that this claim is inflated. In an attempt to dispose of this claim, we observe the buying pattern of 20
randomly selected girls under 8 years old, and we observe X, the number of girls under 8 years old
who buy stuffed toys or dolls. Now the question is, how can we use X to confirm or reject the store’s
claim? Let p be the probability that a girl under 8 chosen at random prefers stuffed toys or dolls. The
question now can be reformulated as a hypothesis testing problem. Is p ≥ 0.8 or p < 0.8? Because we
would like to reject the store’s claim only if we are highly certain of our decision, we should choose
the null hypothesis to be H0 : p ≥ 0.8, the rejection of which is considered to be more serious. The
null hypothesis should be H0 : p ≥ 0.8, and the alternative Ha : p < 0.8. In order to make the null
hypothesis simple, we will use H0 : p = 0.8, which is the boundary value with the understanding that
it really represents H0 : p ≥ 0.8. We note that X, the number of girls under 8 years old who prefer
stuffed toys or dolls, is a binomial random variable. Clearly a large sample value of X would favor
H0. Suppose we arbitrarily choose to accept the null hypothesis if X > 12. Because our decision is
based on only a sample of 20 girls under 8, there is always a possibility of making errors whether
we accept or reject the store chain’s claim. In the following example, we will now formally state this
problem and calculate the error probabilities based on our decision rule.
Example 7.1.3
A toy store chain claims that at least 80% of girls under 8 years old prefer dolls over other types of toys.
After observing the buying pattern of many girls under 8 years old, we feel that this claim is inflated. In an
attempt to dispose of this claim, we observe the buying pattern of 20 randomly selected girls under 8 years
old, and we observe X, the number of girls who buy stuffed toys or dolls. We wish to test the hypothesis
H0 : p = 0.8 against Ha : p < 0.8. Suppose we decide to accept H0 if X > 12 (that is, X ≥ 13). This
means that if X ≤ 12 (that is, X < 13) we will reject H0.
(a) Find α.
(b) Find β for p = 0.6.
(c) Find β for p = 0.4.
(d) Find the rejection region of the form {X ≤ K} so that (i) α = 0.01; (ii) α = 0.05.
(e) For the alternative Ha :p = 0.6, find β for the values of α in part (d).
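The error probabilities in parts (a) through (c) can be checked directly from the binomial distribution. The following sketch, using only the Python standard library, evaluates the stated decision rule (accept H0 when X > 12); the helper name `binom_cdf` is ours, not from the text.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

n = 20  # number of girls sampled
# Decision rule: reject H0: p = 0.8 when X <= 12.
alpha = binom_cdf(12, n, 0.8)        # (a) P(reject H0 | H0 true)
beta_06 = 1 - binom_cdf(12, n, 0.6)  # (b) P(accept H0 | p = 0.6)
beta_04 = 1 - binom_cdf(12, n, 0.4)  # (c) P(accept H0 | p = 0.4)

print(round(alpha, 4), round(beta_06, 4), round(beta_04, 4))
```

Note how β shrinks as the true p moves farther from the hypothesized 0.8: the farther the alternative is from the null, the easier it is to detect.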
Then by definition,

β = P(X̄ ≤ 15.8225 when μ = 16).

Consequently, for μ = 16,

β = P( (X̄ − 16)/(σ/√n) ≤ (15.8225 − 16)/(3/√36) )
  = P(Z ≤ −0.36)
  = 0.3594.
That is, under the given information, there is a 35.94% chance of not rejecting a false null hypothesis.
7.1.1 Sample Size
It is clear from the preceding example that once we are given the sample size n, an α, a simple
alternative Ha, and a test statistic, we have no control over β and it is exactly determined. Hence, for
a given sample size and test statistic, any effort to lower β will lead to an increase in α and vice versa.
This means that for a test with fixed sample size it is not possible to simultaneously reduce both α
and β. We also notice from Example 7.1.4 that by increasing the sample size n, we can decrease β
(for the same α) to an acceptable level. The following discussion illustrates that it may be possible to
determine the sample size for a given α and β.
Suppose we want to test H0 : μ = μ0 versus Ha : μ > μ0. Given α and β, we want to find n, the
sample size, and K, the point at which the rejection begins. We know that
α = P(X̄ > K when μ = μ0)
  = P( (X̄ − μ0)/(σ/√n) > (K − μ0)/(σ/√n) when μ = μ0 )          (7.1)
  = P(Z > zα)

and

β = P(X̄ ≤ K when μ = μa)
  = P( (X̄ − μa)/(σ/√n) ≤ (K − μa)/(σ/√n) when μ = μa )          (7.2)
  = P(Z ≤ −zβ).

From Equations (7.1) and (7.2),

zα = (K − μ0)/(σ/√n)   and   −zβ = (K − μa)/(σ/√n).
This gives us two equations with two unknowns (K and n), and we can proceed to solve them.
Eliminating K, we get
μ0 + zα(σ/√n) = μa − zβ(σ/√n).

From this we can derive

√n = (zα + zβ)σ/(μa − μ0).

Thus, the sample size for an upper tail alternative hypothesis is

n = (zα + zβ)²σ²/(μa − μ0)².
The sample size increases with the square of the standard deviation and decreases with the square of the difference
between mean value of the alternative hypothesis and the mean value under the null hypothesis. Note that in
real-world problems, care should be taken in the choice of the value of μa for the alternative hypothesis. It may
be tempting for a researcher to take a large value of μa in order to reduce the required sample size. This will
seriously affect the accuracy (power) of the test. This alternative value must be realistic within the experiment
under study. Care should also be taken in the choice of the standard deviation σ. Using an underestimated
value of the standard deviation to reduce the sample size will result in inaccurate conclusions similar to
overestimating the difference of means. Usually, the value of σ is estimated using a similar study conducted
earlier. The problem could be that the previous study may be old and may not represent the new reality. When
accuracy is important, it may be necessary to conduct a pilot study only to get some idea on the estimate of σ.
Once we determine the necessary sample size, we must devise a procedure by which the appropriate data can
be randomly obtained. This aspect of the design of experiments is discussed in Chapter 9.
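The sample-size formula above is easy to evaluate with the standard library's NormalDist. The function name and the illustrative numbers (μ0 = 16, μa = 17, σ = 3) below are our own choices for the sketch, not values from the text; results are rounded up since n must be an integer.

```python
from math import ceil
from statistics import NormalDist

def sample_size(alpha, beta, sigma, mu0, mua):
    """n = (z_alpha + z_beta)^2 * sigma^2 / (mua - mu0)^2, rounded up."""
    z_a = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z_alpha
    z_b = NormalDist().inv_cdf(1 - beta)   # z_beta
    return ceil((z_a + z_b) ** 2 * sigma ** 2 / (mua - mu0) ** 2)

# Detect a shift from mu0 = 16 to mua = 17 with sigma = 3,
# alpha = 0.05 and beta = 0.10 (power 0.90).
print(sample_size(0.05, 0.10, 3, 16, 17))  # -> 78
```

Halving the difference μa − μ0 quadruples the required n, which is the sensitivity to μa that the paragraph above warns about.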
7.2 THE NEYMAN-PEARSON LEMMA
In practical hypothesis testing situations, there are typically many tests possible with significance level α for a null
hypothesis versus alternative hypothesis (see Project 7A). This leads to some important questions, such as (1)
how to decide on the test statistic and (2) how to know that we selected the best rejection region. In this section,
we study the answer to these questions using the Neyman-Pearson approach.
Definition 7.2.1 Suppose that W is the test statistic and RR is the rejection region for a test of hypothesis concerning
the value of a parameter θ. Then the power of the test is the probability that the test rejects H0 when the alternative is
true. That is,
π = Power(θ) = P(W in RR when the parameter value is an alternative θ).

If H0 : θ = θ0 and Ha : θ ≠ θ0, then the power of the test at some θ = θ1 ≠ θ0 is

Power(θ1) = P(reject H0 | θ = θ1).

But β(θ1) = P(accept H0 | θ = θ1). Therefore,

Power(θ1) = 1 − β(θ1).
A good test will have high power.
Note that the power of a test of H0 cannot be found until some true situation under Ha is specified. That is,
the sampling distribution of the test statistic when Ha is true must be known or assumed. Because
β depends on the alternative hypothesis, which being composite most of the time does not specify
the distribution of the test statistic, it is important to observe that the experimenter cannot control
β. For example, the alternative Ha : μ < μ0 does not specify the value of μ, as in the case of the null
hypothesis, H0 : μ = μ0.
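Because β, and hence the power, depends on which alternative value is true, it is natural to tabulate the power as a function of the true mean. A minimal sketch for the upper-tailed z-test, assuming a normal population with known σ (the function name and the numbers μ0 = 16, σ = 3, n = 36 are our illustrative choices):

```python
from math import sqrt
from statistics import NormalDist

def power_upper_z(mu1, mu0, sigma, n, alpha=0.05):
    """Power of the upper-tailed z-test of H0: mu = mu0 at the alternative mu1."""
    z_a = NormalDist().inv_cdf(1 - alpha)      # critical value z_alpha
    shift = (mu1 - mu0) / (sigma / sqrt(n))    # standardized distance from H0
    return 1 - NormalDist().cdf(z_a - shift)   # P(reject H0 | mu = mu1)

# At mu1 = mu0 the "power" is just alpha; it climbs toward 1 as mu1 grows.
for mu1 in (16, 16.5, 17):
    print(mu1, round(power_upper_z(mu1, 16, 3, 36), 4))
```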
Example 7.2.1
Let X1, . . . , Xn be a random sample from a Poisson distribution with parameter λ, that is, the pmf is
given by f(x) = e^(−λ)λ^x/x!. Then the hypothesis H0 : λ = 1 uniquely specifies the distribution, because
f(x) = e^(−1)/x!, and hence is a simple hypothesis. The hypothesis Ha : λ > 1 is composite, because f(x) is
not uniquely determined.
Definition 7.2.2 A test at a given α of a simple hypothesis H0 versus the simple alternative Ha that has
the largest power among tests with the probability of type I error no larger than the given α is called a most
powerful test.
Consider the test of hypothesis H0 : θ = θ0 versus Ha : θ = θ1. If α is fixed, then our interest is to
make β as small as possible. Because β = 1 − Power(θ1), by minimizing β we would obtain a most
powerful test. The following result says that among all tests with given probability of type I error, the
likelihood ratio test given later minimizes the probability of a type II error, in other words, it is most
powerful.
Theorem 7.2.1 (Neyman-Pearson Lemma) Suppose that one wants to test a simple hypothesis H0 :
θ = θ0 versus the simple alternative hypothesis Ha : θ = θ1 based on a random sample X1, . . . , Xn from a
distribution with parameter θ. Let L(θ) ≡ L(θ; X1, . . . , Xn) > 0 denote the likelihood of the sample when
the value of the parameter is θ. If there exist a positive constant K and a subset C of the sample space R^n (the
Euclidean n-space) such that

1. L(θ0)/L(θ1) ≤ K for (x1, x2, . . . , xn) ∈ C,
2. L(θ0)/L(θ1) ≥ K for (x1, x2, . . . , xn) ∈ C′, where C′ is the complement of C, and
3. P[(X1, . . . , Xn) ∈ C; θ0] = α,

then the test with critical region C will be the most powerful test for H0 versus Ha. We call α the size of the
test and C the best critical region of size α.
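The content of conditions 1 and 2 can be seen numerically: for a normal sample with μ1 > μ0, the ratio L(θ0)/L(θ1) depends on the data only through x̄ and decreases as x̄ grows, so a region of the form {L(θ0)/L(θ1) ≤ K} is exactly an upper tail in x̄. A small sketch (the sample values are made up for illustration):

```python
from math import exp, sqrt, pi

def likelihood(xs, mu, sigma):
    """Joint normal likelihood of the sample xs at mean mu."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return (1 / (sqrt(2 * pi) * sigma)) ** n * exp(-ss / (2 * sigma ** 2))

def lr(xs, mu0, mu1, sigma):
    """Neyman-Pearson ratio L(theta0)/L(theta1)."""
    return likelihood(xs, mu0, sigma) / likelihood(xs, mu1, sigma)

# Three samples with increasing xbar: the ratio strictly decreases,
# so {lr <= K} corresponds to {xbar >= c} -- an upper-tailed test.
samples = [[0.0, 0.2, 0.1], [0.5, 0.7, 0.6], [1.0, 1.2, 1.1]]
ratios = [lr(xs, mu0=0.0, mu1=1.0, sigma=1.0) for xs in samples]
print(ratios)
```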
Proof. We prove this theorem for continuous random variables. For discrete random variables, the
proof is identical with sums replacing the integrals. Let S be some region in R^n, an n-dimensional
Euclidean space. For simplicity we will use the following notation:

∫_S L(θ) = ∫ · · · ∫_S L(θ; x1, x2, . . . , xn) dx1 dx2 · · · dxn.
Note that

P((X1, . . . , Xn) ∈ C; θ0) = ∫_C f(x1, . . . , xn; θ0) dx1 · · · dxn = ∫_C L(θ0; x1, . . . , xn) dx1 · · · dxn.
Suppose that there is another critical region, say B, of size less than or equal to α, that is,
∫_B L(θ0) ≤ α. Then

0 ≤ ∫_C L(θ0) − ∫_B L(θ0),

because ∫_C L(θ0) = α by assumption 3.
Therefore,

0 ≤ ∫_C L(θ0) − ∫_B L(θ0)
  = ∫_{C∩B} L(θ0) + ∫_{C∩B′} L(θ0) − ∫_{C∩B} L(θ0) − ∫_{C′∩B} L(θ0)
  = ∫_{C∩B′} L(θ0) − ∫_{C′∩B} L(θ0).
Using assumption 1 of Theorem 7.2.1, KL(θ1) ≥ L(θ0) at each point in the region C and hence in
C ∩ B′. Thus

∫_{C∩B′} L(θ0) ≤ K ∫_{C∩B′} L(θ1).
By assumption 2 of the theorem, KL(θ1) ≤ L(θ0) at each point in C′, and hence in C′ ∩ B. Thus,

∫_{C′∩B} L(θ0) ≥ K ∫_{C′∩B} L(θ1).
Therefore,

0 ≤ ∫_{C∩B′} L(θ0) − ∫_{C′∩B} L(θ0) ≤ K { ∫_{C∩B′} L(θ1) − ∫_{C′∩B} L(θ1) }.

That is,

0 ≤ K { ∫_{C∩B} L(θ1) + ∫_{C∩B′} L(θ1) − ∫_{C∩B} L(θ1) − ∫_{C′∩B} L(θ1) }
  = K { ∫_C L(θ1) − ∫_B L(θ1) }.
As a result,

∫_C L(θ1) ≥ ∫_B L(θ1).
Because this is true for every critical region B of size ≤ α, C is the best critical region of size α, and
the test with critical region C is the most powerful test of size α.
When testing two simple hypotheses, the existence of a best critical region is guaranteed by the
Neyman-Pearson lemma. In addition, the foregoing theorem provides a means for determining
what the best critical region is. However, it is important to note that Theorem 7.2.1 gives only the
form of the rejection region; the actual rejection region depends on the specific value of α.
In real-world situations, we are seldom presented with the problem of testing two simple hypotheses.
There is no general result in the form of Theorem 7.2.1 for composite hypotheses. However, for
hypotheses of the form H0 : θ = θ0 versus Ha : θ > θ0, we can take a particular value θ1 > θ0 and
then find a most powerful test for H0 : θ = θ0 versus Ha : θ = θ1. If this test (that is, the rejection
region of the test) does not depend on the particular value θ1, then this test is said to be a uniformly
most powerful test for H0 : θ = θ0 versus Ha : θ > θ0.
The following example illustrates the use of the Neyman-Pearson lemma.
Consider testing H0 : μ = μ0 for a normal population with known variance σ², using the test statistic

Z = (X̄ − μ0)/(σ/√n).
For Ha : μ = μ1 > μ0, the rejection region for the most powerful test would be
Reject H0 if z > zα.
On the other hand for Ha : μ = μ2 < μ0, the rejection region for the most powerful test would be
Reject H0 if z < −zα.
Thus, the rejection region depends on the specific alternative. Consequently, the two-sided hypothesis
just given has no UMP test.
7.3 LIKELIHOOD RATIO TESTS
In this section, we shall study a general procedure that is applicable when one or both of H0 and Ha are
composite. In fact, this procedure works for simple hypotheses as well. This method is based on the
maximum likelihood estimation and the ratio of likelihood functions used in the Neyman-Pearson
lemma. We assume that the pdf or pmf of the random variable X is f(x, θ), where θ can be one or
more unknown parameters. Let Θ represent the total parameter space, that is, the set of all possible
values of the parameter θ given by either H0 or Ha.
Consider the hypotheses

H0 : θ ∈ Θ0 vs. Ha : θ ∈ Θa = Θ − Θ0,

where θ is the unknown population parameter (or parameters) with values in Θ, and Θ0 is a subset of Θ.
Let L(θ) be the likelihood function based on the sample X1, . . . , Xn. Now we define the likelihood
ratio corresponding to the hypotheses H0 and Ha. This ratio will be used as a test statistic for the
testing procedure that we develop in this section. This is a natural generalization of the ratio test used
in the Neyman-Pearson lemma when both hypotheses were simple.
Definition 7.3.1 The likelihood ratio λ is the ratio

λ = [ max_{θ∈Θ0} L(θ; x1, . . . , xn) ] / [ max_{θ∈Θ} L(θ; x1, . . . , xn) ] = L*₀/L*.

We note that 0 ≤ λ ≤ 1. Because λ is the ratio of nonnegative functions, λ ≥ 0. Because Θ0 is a subset
of Θ, we know that max_{θ∈Θ0} L(θ) ≤ max_{θ∈Θ} L(θ). Hence, λ ≤ 1.
If the maximum of L in Θ0 is much smaller than the maximum of L in Θ, that is, if
λ is small, it would appear that the data X1, . . . , Xn do not support the null hypothesis θ ∈ Θ0. On
the other hand, if λ is close to 1, one could conclude that the data support the null hypothesis, H0.
Therefore, small values of λ result in rejection of the null hypothesis, and values near 1 result in a
decision in support of the null hypothesis.
For the evaluation of λ, it is important to note that max_{θ∈Θ} L(θ) = L(θ̂_ml), where θ̂_ml is the maximum
likelihood estimator of θ ∈ Θ, and max_{θ∈Θ0} L(θ) is the likelihood function with unknown parameters
replaced by their maximum likelihood estimators subject to the condition that θ ∈ Θ0. We can
summarize the likelihood ratio test as follows.
LIKELIHOOD RATIO TESTS (LRTs)
To test

H0 : θ ∈ Θ0 vs. Ha : θ ∈ Θa,

the likelihood ratio

λ = [ max_{θ∈Θ0} L(θ; x1, . . . , xn) ] / [ max_{θ∈Θ} L(θ; x1, . . . , xn) ] = L*₀/L*

will be used as the test statistic.
The rejection region for the likelihood ratio test is given by

Reject H0 if λ ≤ K.

K is selected such that the test has the given significance level α.
Example 7.3.1
Let X1, . . . , Xn be a random sample from an N(μ, σ²) population. Assume that σ² is known. We wish to
test, at level α, H0 : μ = μ0 vs. Ha : μ ≠ μ0. Find an appropriate likelihood ratio test.
Solution
We have seen that for testing

H0 : μ = μ0 vs. Ha : μ ≠ μ0

there is no uniformly most powerful test. The likelihood function is

L(μ) = (1/(√(2π)σ))^n e^( −Σᵢ(xi − μ)²/(2σ²) ).

Here, Θ0 = {μ0} and Θa = R − {μ0}.
Hence,

L*₀ = max_{μ=μ0} (1/(√(2π)σ))^n e^( −Σᵢ(xi − μ)²/(2σ²) )
    = (1/(√(2π)σ))^n e^( −Σᵢ(xi − μ0)²/(2σ²) ).
Similarly,

L* = max_{−∞<μ<∞} (1/(√(2π)σ))^n e^( −Σᵢ(xi − μ)²/(2σ²) ).

Because the only unknown parameter in the parameter space is μ, −∞ < μ < ∞, the maximum of the
likelihood function is achieved when μ equals its maximum likelihood estimator, that is,

μ̂_ml = X̄.
Therefore, with a simple calculation we have

λ = e^( −Σᵢ(xi − μ0)²/(2σ²) ) / e^( −Σᵢ(xi − x̄)²/(2σ²) ) = e^( −n(x̄ − μ0)²/(2σ²) ).
Thus, the likelihood ratio test has the rejection region

Reject H0 if λ ≤ K,

which is equivalent to

−(n/(2σ²))(X̄ − μ0)² ≤ ln K ⇔ (X̄ − μ0)²/(σ²/n) ≥ −2 ln K ⇔ |X̄ − μ0|/(σ/√n) ≥ √(−2 ln K) = c1, say.
Note that we use the symbol ⇔ to mean "if and only if." We now compute c1. Under H0,
(X̄ − μ0)/(σ/√n) ∼ N(0, 1). Observe that

α = P( |X̄ − μ0|/(σ/√n) ≥ c1 )

gives the value of c1 as c1 = zα/2. Hence, the LRT for the given hypothesis is

Reject H0 if |X̄ − μ0|/(σ/√n) ≥ zα/2.
Thus, in this case, the likelihood ratio test is equivalent to the two-tailed z-test.
In fact, when both the hypotheses are simple, the likelihood ratio test is identical to the Neyman-
Pearson test. We can now summarize the procedure for the likelihood ratio test, LRT.
PROCEDURE FOR THE LIKELIHOOD RATIO TEST (LRT)
1. Find the largest value of the likelihood L(θ) for any θ ∈ Θ0 by finding the maximum likelihood
estimate within Θ0 and substituting back into the likelihood function.
2. Find the largest value of the likelihood L(θ) for any θ ∈ Θ by finding the maximum likelihood
estimate within Θ and substituting back into the likelihood function.
3. Form the ratio

λ = λ(x1, x2, . . . , xn) = [ max of L(θ) in Θ0 ] / [ max of L(θ) in Θ ].

4. Determine a K so that the test has the desired probability of type I error, α.
5. Reject H0 if λ ≤ K.
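The steps above can be checked numerically for the normal-mean example: computing λ as a ratio of maximized likelihoods (steps 1 through 3) reproduces the closed form e^(−n(x̄ − μ0)²/(2σ²)). The data values below are hypothetical, chosen only for the sketch.

```python
from math import exp, sqrt, pi

def norm_likelihood(xs, mu, sigma):
    """Joint N(mu, sigma^2) likelihood of the sample xs."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return (1 / (sqrt(2 * pi) * sigma)) ** n * exp(-ss / (2 * sigma ** 2))

xs = [4.8, 5.6, 5.1, 4.9, 5.4]   # hypothetical observations
mu0, sigma = 5.0, 0.5
xbar = sum(xs) / len(xs)         # MLE of mu over the full parameter space

# Steps 1-3: numerator maximized over {mu0}, denominator over all mu.
lam = norm_likelihood(xs, mu0, sigma) / norm_likelihood(xs, xbar, sigma)
closed_form = exp(-len(xs) * (xbar - mu0) ** 2 / (2 * sigma ** 2))

print(round(lam, 6), round(closed_form, 6))  # the two agree
```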
In the next example, we find an LRT for a testing problem when both H0 and Ha are simple.
7.4 HYPOTHESES FOR A SINGLE PARAMETER
In this section, we first introduce the concept of p-value. After that, we study hypothesis testing
concerning a single parameter.
7.4.1 The p-Value
In hypothesis testing, the choice of the value of α is somewhat arbitrary. For the same data, if the test
is based on two different values of α, the conclusions could be different. Many statisticians prefer to
compute the so-called p-value, which is calculated based on the observed test statistic. For computing
the p-value, it is not necessary to specify a value of α. We can use the given data to obtain the
p-value.
Definition 7.4.1 Corresponding to an observed value of a test statistic, the p-value
(or attained
significance level) is the lowest level of significance at which the null hypothesis would have been
rejected.
For example, suppose we are testing a given hypothesis with α = 0.05 and we decide to reject H0.
If the calculated p-value equals 0.03, this means that we could have used an α as low as 0.03 and still
maintained the same decision, rejecting H0.
Based on the alternative hypothesis, one can use the following steps to compute the p-value.
STEPS TO FIND THE p-VALUE
1. Let TS be the test statistic.
2. Compute the value of TS using the sample X1, . . . , Xn . Say it is a.
3. The p-value is given by
p-value = P(TS < a | H0),     if lower tail test;
p-value = P(TS > a | H0),     if upper tail test;
p-value = P(|TS| > |a| | H0), if two tail test.
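For a z statistic, these three cases are mechanical to compute with the standard library's NormalDist; the function below is a sketch, and its name is ours.

```python
from statistics import NormalDist

def z_p_value(a, tail):
    """p-value for an observed z statistic a; tail in {'lower', 'upper', 'two'}."""
    Z = NormalDist()
    if tail == "lower":
        return Z.cdf(a)                 # P(Z < a)
    if tail == "upper":
        return 1 - Z.cdf(a)             # P(Z > a)
    return 2 * (1 - Z.cdf(abs(a)))      # P(|Z| > |a|)

# The two-tailed and upper-tailed p-values for z = 1.58,
# matching the table values 0.1142 and 0.0571 up to rounding.
print(round(z_p_value(1.58, "two"), 4))
print(round(z_p_value(1.58, "upper"), 4))
```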
Example 7.4.1
To test H0 : μ = 0 vs. Ha : μ ≠ 0, suppose that the test statistic Z results in a computed value of 1.58.
Then the p-value = P(|Z| > 1.58) = 2(0.0571) = 0.1142. That is, we must allow a type I error probability
of 0.1142 in order to reject H0. Also, if Ha : μ > 0, then the p-value would be P(Z > 1.58) = 0.0571. In
this case we must have an α of at least 0.0571 in order to reject H0.
The p-value can be thought of as a measure of support for the null hypothesis: The lower its value,
the lower the support. Typically one decides that the support for H0 is insufficient when the p-value
drops below a particular threshold, which is the significance level of the test.
REPORTING TEST RESULT AS p-VALUES
1. Choose the maximum value of α that you are willing to tolerate.
2. If the p-value of the test is less than the maximum value of α, reject H0.
If the exact p-value cannot be found, one can give an interval in which the p-value can lie. For example,
if the test is significant at α = 0.05 but not significant for α = 0.025, report that 0.025 ≤ p-value ≤
0.05. So for α > 0.05, reject H0, and for α < 0.025, do not reject H0.
In another interpretation, 1−(p-value) is considered as an index of the strength of the evidence against
the null hypothesis provided by the data. It is clear that the value of this index lies in the interval
[0, 1]. If the p-value is 0.02, the value of index is 0.98, supporting the rejection of the null hypothesis.
p-values thus provide not only a yes or no answer but also a sense of the strength of the
evidence against the null hypothesis: the lower the p-value, the stronger the evidence. In any
test, reporting the p-value is therefore good practice.
Because most of the outputs from statistical software used for hypothesis testing include the p-value,
the p-value approach to hypothesis testing is becoming more and more popular. In this approach,
the decision of the test is made in the following way. If the value of α is given, and if the p-value of the
test is less than the value of α, we will reject H0. If the value of α is not given and the p-value associated
with the test is small (usually set at p-value < 0.05), there is evidence to reject the null hypothesis in
favor of the alternative. In other words, there is evidence that the value of the true parameter (such as
the population mean) is significantly different from (greater or less than) the hypothesized value. If the
p-value associated with the test is not small (p > 0.05), we conclude that there is not enough evidence
to reject the null hypothesis. In most of the examples in this chapter, we give both the rejection region
and p-value approaches.
Example 4.2
The management of a local health club claims that its members lose on the average 15 pounds or more
within the first 3 months after joining the club. To check this claim, a consumer agency took a random
sample of 45 members of this health club and found that they lost an average of 13.8 pounds within the
first 3 months of membership, with a sample standard deviation of 4.2 pounds.
(a) Find the p-value for this test.
(b) Based on the p-value in (a), would you reject the null hypothesis at α = 0.01?
Solution
(a) Let μ be the true mean weight loss in pounds within the first 3 months of membership in this club.
Then we have to test the hypothesis
H0 : μ = 15 versus Ha : μ < 15
Here n = 45, x̄ = 13.8, and s = 4.2. Because n = 45 > 30, we can use the normal approximation.
Hence, the observed test statistic is

z = (x̄ − μ0)/(s/√n) = (13.8 − 15)/(4.2/√45) = −1.9166
and
p-value = P (Z < −1.9166) ≃ P (Z < −1.92) = 0.0274.
Thus, we can use an α as small as 0.0274 and still reject H0.
(b) No. Because the p-value = 0.0274 is greater than α = 0.01, one cannot reject H0.
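The computations in this example can be reproduced with a short script (a sketch, assuming SciPy is available):

```python
import math
from scipy.stats import norm

n, xbar, s, mu0 = 45, 13.8, 4.2, 15.0

# Observed large-sample test statistic
z = (xbar - mu0) / (s / math.sqrt(n))

# Lower-tailed test: p-value = P(Z < z)
p_value = norm.cdf(z)

print(round(z, 4), round(p_value, 4))
```

The computed p-value differs slightly from 0.0274 because the text rounds z to −1.92 before consulting the normal table.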
In any hypothesis testing, after an experimenter determines the objective of an experiment and decides
on the type of data to be collected, we recommend the following step-by-step procedure for hypothesis
testing.
STEPS IN ANY HYPOTHESIS TESTING PROBLEM
1. State the alternative hypothesis, Ha (what is believed to be true).
2. State the null hypothesis, H0 (what is doubted to be true).
3. Decide on a level of significance α.
4. Choose an appropriate TS and compute the observed test statistic.
5. Using the distribution of TS and α, determine the rejection region(s) (RR).
6. Conclusion: If the observed test statistic falls in the RR, reject H0 and conclude that based on the
sample information, we are (1 − α)100% confident that Ha is true. Otherwise, conclude that there is
not sufficient evidence to reject H0. In all the applied problems, interpret the meaning of your
decision.
7. State any assumptions you made in testing the given hypothesis.
8. Compute the p-value from the null distribution of the test statistic and interpret it.
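The steps above, specialized to the large-sample test for a mean, can be sketched as a small helper function (the name z_test and the SciPy dependency are choices of this sketch, not from the text):

```python
import math
from scipy.stats import norm

def z_test(xbar, s, n, mu0, alpha=0.05, tail="two"):
    """Large-sample (n >= 30) test of H0: mu = mu0, following the steps above.

    tail is "upper" (Ha: mu > mu0), "lower" (Ha: mu < mu0),
    or "two" (Ha: mu != mu0). Returns (z, p-value, reject decision).
    """
    z = (xbar - mu0) / (s / math.sqrt(n))
    if tail == "upper":
        p = norm.sf(z)            # P(Z > z)
    elif tail == "lower":
        p = norm.cdf(z)           # P(Z < z)
    else:
        p = 2 * norm.sf(abs(z))   # P(|Z| > |z|)
    return z, p, p < alpha

# Example 4.2 revisited: at alpha = 0.01 the test fails to reject H0
z, p, reject = z_test(13.8, 4.2, 45, 15.0, alpha=0.01, tail="lower")
```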
4.2 Hypothesis Testing for a Single Parameter
Now we study the testing of a hypothesis concerning a single parameter, θ, based on a random sample
X1, . . . , Xn. Let θ̂ denote the corresponding sample statistic (the point estimator of θ). First, we deal with tests for the population mean μ for large
and small samples. Next, we study procedures for testing the population variance σ2. We conclude
the section by studying a test procedure for the true proportion p.
To test the hypothesis H0 : μ = μ0 concerning the true population mean μ, when we have a large
sample (n ≥ 30) we use the test statistic

Z = (X̄ − μ0)/(S/√n)

where S is the sample standard deviation and μ0 is the claimed mean under H0 (if the population
variance is known, we replace S with σ).
For a small random sample (n < 30), the test statistic is

T = (X̄ − μ0)/(S/√n)
where μ0 is the claimed value of the true mean, and X and S are the sample mean and standard
deviation, respectively. Note that we are using the lowercase letters, such as z and t, to represent the
observed values of the test statistics Z and T , respectively.
In practice, with raw data, it is important to verify the assumptions. For example, in the small sample
case, it is important to check for normality by using normal plots. If this assumption is not satisfied,
the nonparametric methods described in Chapter 12 may be more appropriate. In addition, because
sample statistics such as X̄ and S are greatly affected by the presence of outliers, drawing a box
plot to check for outliers is a basic practice we should incorporate in our analysis.
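These diagnostic checks can be carried out numerically as well as graphically. The snippet below (hypothetical data, assuming SciPy and NumPy are available) computes the correlation coefficient of the normal probability plot and applies the usual 1.5 × IQR box-plot rule for flagging outliers:

```python
import numpy as np
from scipy import stats

# Hypothetical small sample, for illustration only
x = np.array([12.1, 11.4, 13.0, 12.6, 11.9, 12.3, 13.4, 12.0])

# Normal probability plot: probplot also returns the correlation coefficient r
# of the plotted points; r close to 1 suggests normality is plausible.
(osm, osr), (slope, intercept, r) = stats.probplot(x)

# Box-plot style outlier screen: flag observations beyond the 1.5 * IQR fences.
q1, q3 = np.percentile(x, [25, 75])
iqr = q3 - q1
outliers = x[(x < q1 - 1.5 * iqr) | (x > q3 + 1.5 * iqr)]
```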
We now summarize the typical test of hypothesis for tests concerning population (true) mean.
In order to compute the observed test statistic, z in the large sample case and t in the small sample
case, calculate the value of (x̄ − μ0)/(s/√n); it is referred to the standard normal distribution in the large sample case and to the t distribution with n − 1 degrees of freedom in the small sample case.
SUMMARY OF HYPOTHESIS TESTS FOR μ

Large Sample (n ≥ 30)
To test H0 : μ = μ0 versus
    Ha : μ > μ0 (upper tail test), μ < μ0 (lower tail test), or μ ≠ μ0 (two-tailed test)
Test statistic: Z = (X̄ − μ0)/(σ/√n). Replace σ by S, if σ is unknown.
Rejection region:
    z > zα,        upper tail RR
    z < −zα,       lower tail RR
    |z| > zα/2,    two tail RR
Assumption: n ≥ 30.

Small Sample (n < 30)
To test H0 : μ = μ0 versus
    Ha : μ > μ0 (upper tail test), μ < μ0 (lower tail test), or μ ≠ μ0 (two-tailed test)
Test statistic: T = (X̄ − μ0)/(S/√n)
Rejection region:
    t > tα,n−1,        upper tail RR
    t < −tα,n−1,       lower tail RR
    |t| > tα/2,n−1,    two tail RR
Assumption: the random sample comes from a normal population.

Decision: Reject H0 if the observed test statistic falls in the RR, and conclude that Ha is true with
(1 − α)100% confidence. Otherwise, keep H0; there is not enough evidence to conclude that
Ha is true for the given α, and more experiments may be needed.
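The small-sample case of this summary can be sketched as a function that applies the rejection-region rule directly (the function name, the illustrative numbers, and the SciPy dependency are assumptions of this sketch):

```python
import math
from scipy.stats import t as t_dist

def t_test_rr(xbar, s, n, mu0, alpha, tail="two"):
    """Small-sample t-test for H0: mu = mu0 via the rejection-region rule above."""
    tobs = (xbar - mu0) / (s / math.sqrt(n))
    df = n - 1
    if tail == "upper":
        reject = tobs > t_dist.ppf(1 - alpha, df)           # t > t_{alpha, n-1}
    elif tail == "lower":
        reject = tobs < -t_dist.ppf(1 - alpha, df)          # t < -t_{alpha, n-1}
    else:
        reject = abs(tobs) > t_dist.ppf(1 - alpha / 2, df)  # |t| > t_{alpha/2, n-1}
    return tobs, reject

# Hypothetical numbers: xbar = 2.1, s = 0.4, n = 16, testing H0: mu = 2.0
tobs, reject = t_test_rr(2.1, 0.4, 16, 2.0, alpha=0.05, tail="upper")
```

Here tobs = 1.0, which is below the critical value t0.05,15 ≈ 1.753, so H0 is not rejected.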
Example 4.3
It is claimed that sports-car owners drive on the average 18,000 miles per year. A consumer firm believes that
the average mileage is probably lower. To check, the consumer firm obtained information from 40 randomly
selected sports-car owners that resulted in a sample mean of 17,463 miles with a sample standard deviation
of 1348 miles. What can we conclude about this claim? Use α = 0.01.
Solution
Let μ be the true population mean. We can formulate the hypotheses as

H0 : μ = 18,000 versus Ha : μ < 18,000.
The observed test statistic (for n ≥ 30, with σ unknown and replaced by s) is

z = (x̄ − μ0)/(s/√n) = (17,463 − 18,000)/(1348/√40) = −2.52.
Rejection region is {z < −z0.01} = {z < −2.33}.
Decision: Because z = −2.52 is less than −2.33, the null hypothesis is rejected at α = 0.01. There is
sufficient evidence to conclude that the mean mileage on sports cars is less than 18,000 miles per year.
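As a check on the arithmetic, this example can be reproduced in a few lines (a sketch, assuming SciPy is available):

```python
import math
from scipy.stats import norm

n, xbar, s, mu0, alpha = 40, 17463, 1348, 18000, 0.01

z = (xbar - mu0) / (s / math.sqrt(n))   # observed test statistic, about -2.52
z_crit = norm.ppf(alpha)                # lower-tail critical value -z_0.01, about -2.33
reject = z < z_crit                     # True: H0 is rejected at alpha = 0.01
```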
Example 4.4
In a frequently traveled stretch of the I-75 highway, where the posted speed is 70 mph, it is thought that
people travel on average at least 75 mph. To check this claim, the following radar measurements of
the speeds (in mph) are obtained for 10 vehicles traveling on this stretch of the interstate highway:

66  74  79  80  69  77  78  65  79  81
Do the data provide sufficient evidence to indicate that the mean speed at which people travel on this
stretch of highway exceeds 75 mph? Test the appropriate hypothesis using α = 0.01. Draw a box plot and
a normal plot for these data, and comment.
Solution
We need to test
H0 : μ = 75 vs. Ha : μ > 75
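Before carrying out the test, the sample mean, standard deviation, and observed t statistic for these data can be computed as follows (a plain-Python sketch):

```python
import math

speeds = [66, 74, 79, 80, 69, 77, 78, 65, 79, 81]
n = len(speeds)
xbar = sum(speeds) / n                                         # sample mean, 74.8
s = math.sqrt(sum((v - xbar) ** 2 for v in speeds) / (n - 1))  # sample std. deviation
t_obs = (xbar - 75) / (s / math.sqrt(n))                       # observed t statistic
# t_obs is negative, so it cannot fall in the upper-tail rejection region.
```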