RELATIVE VALUES
Statistical research and the processing of statistical data on morbidity,
mortality, lethality, etc. yield absolute numbers, which give the number of
occurrences of a phenomenon. Although absolute numbers have a certain
informative value, their use is limited. To determine the level of a
phenomenon, or to compare a parameter over time or with the parameter of
another territory, it is necessary to calculate relative values (parameters,
factors), which are the result of relating statistical numbers to one
another. The basic arithmetic operation in calculating relative values is
division.
In medical statistics the following kinds of relative parameters are used:
- Extensive;
- Intensive;
- Relative intensity;
- Visualization;
- Correlation.
The extensive parameter is used to determine the structure of morbidity
(mortality, lethality, etc.).
The extensive parameter, or parameter of distribution, characterizes the
parts of a phenomenon (its structure); that is, it shows what part of the
total number of diseases (deaths) is made up by a particular disease
included in that total.
Using this parameter, it is possible to determine the structure of patients
by age, social status, etc. This parameter is usually expressed as a
percentage, but it can also be calculated in parts per thousand when the
share of the given disease is so small that the percentage would be a
decimal fraction rather than a whole number.
The general formula for its calculation is the following:
Extensive parameter = (part × 100) / total number
The technique of calculating an extensive parameter is shown in the
following example.
Determine the age structure of the people who visited a polyclinic, given
the following data. The total number of visitors, 1500, is taken as 100 %,
and the number of patients in each age group as X. The percentage of
visitors aged 15-19 years out of the total number is then:
1500 – 100 %
150 – X
X = (150 × 100) / 1500 = 10.0 %
Table 2.5
Age groups of people who visited the polyclinic
| Age group | Absolute number | % of the total number |
| 15–19 | 150 | 10.0 |
| 20–29 | 375 | 25.0 |
| 30–39 | 300 | 20.0 |
| 40–49 | 345 | 23.0 |
| 50–59 | 150 | 10.0 |
| 60 and older | 180 | 12.0 |
| Total | 1500 | 100.0 |
Conclusion: the largest groups of people who visited the polyclinic were
aged 20-29 and 40-49 years (25.0 % and 23.0 % respectively).
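The calculation above can be repeated for every age group at once. The following is a minimal sketch in Python, using the counts from Table 2.5; the function name `extensive_parameters` and the `base` argument are illustrative assumptions, not part of the original text.

```python
# Age structure of polyclinic visitors (absolute numbers from Table 2.5).
visits_by_age = {
    "15-19": 150,
    "20-29": 375,
    "30-39": 300,
    "40-49": 345,
    "50-59": 150,
    "60 and older": 180,
}


def extensive_parameters(counts, base=100):
    """Return each part as a share of the total, scaled by `base` (100 = percent)."""
    total = sum(counts.values())
    return {group: count * base / total for group, count in counts.items()}


shares = extensive_parameters(visits_by_age)
for group, share in shares.items():
    print(f"{group:>12}: {share:5.1f} %")
```

Passing `base=1000` would give the same structure in parts per thousand, which the text suggests for very small shares.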
The extensive parameter must be used carefully in analysis: it characterizes
only the structure of a phenomenon in a given place at a given time.
Comparing structures shows only how the rank of a given disease within the
structure of diseases has changed.
If it is necessary to determine how widespread a phenomenon is, intensive
parameters are used.
The intensive parameter characterizes frequency, or prevalence.
It shows how frequently the given phenomenon occurs in the given
environment.
For example, it shows how frequently a particular disease occurs among the
population, or how frequently people die from a particular disease.
To calculate the intensive parameter, it is necessary to know the size of
the population or of the contingent.
The general formula for its calculation is the following:
Intensive parameter = (phenomenon × 100 (1000; 10 000; 100 000)) / environment
Intensive parameters are usually calculated per 1000 persons; these are the
rates of birth, morbidity, mortality, etc. Rates for individual diseases are
calculated per 10 000 persons, and rates for diseases that occur rarely per
100 000 persons.
Let us consider the technique of its calculation using an example.
Example. The number of deaths in the area is 175; the population is 24 000
at the beginning of the year and 26 000 at the end of the year. Determine
the mortality rate:
General mortality rate = (number of deaths during the year × 1000) / average population
We determine the average population by adding the population at the
beginning of the year to the population at the end of the year and dividing
the sum by 2:
Average population = (24 000 + 26 000) / 2 = 25 000.
We set up a proportion: 175 deaths correspond to 25 000 people; how many
deaths correspond to 1000 people?
175 – 25 000
X – 1000
X = (175 × 1000) / 25 000 = 7.0 ‰
The rates of birth, morbidity, etc. are calculated similarly.
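The mortality example can be written as a short Python sketch. The function name `intensive_parameter` and its `base` argument are assumptions made for illustration; the numbers are those of the example.

```python
def intensive_parameter(events, population, base=1000):
    """Frequency of a phenomenon per `base` persons of the environment."""
    return events * base / population


deaths = 175
population_start = 24_000
population_end = 26_000

# Average population: mean of the start-of-year and end-of-year counts.
average_population = (population_start + population_end) / 2

mortality_rate = intensive_parameter(deaths, average_population)
print(f"General mortality rate: {mortality_rate:.1f} per 1000")
```

For a rarer phenomenon the same function could be called with `base=10_000` or `base=100_000`.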
Table 2.6
Structure of morbidity, disability and causes of death (%), with indices of relative intensity
| Disease | Structure of morbidity | Structure of disability | Structure of the causes of death | Index of relative intensity: disability | Index of relative intensity: causes of death |
| Traumas | 12.0 | 8.0 | 30.0 | 0.35 | 2.0 |
| Heart and vessel diseases | 4.0 | 27.0 | 19.0 | 6.76 | 4.75 |
| Diseases of nervous system | 6.0 | 8.0 | - | 1.33 | - |
| Poisonings | 0.3 | - | 0.4 | - | 13.3 |
| Tuberculosis | 0.5 | 5.0 | 5.5 | 10.0 | 11.0 |
| Other | 74.2 | 52.0 | 41.5 | 0.7 | 0.56 |
| Total | 100.0 | 100.0 | 100.0 | - | - |
Parameters of relative intensity are the numerical ratio of two or more
structures of the same elements of the set being studied.
They make it possible to determine the degree of correspondence (excess or
deficit) of similar attributes and are used as an auxiliary technique when
direct intensive parameters cannot be obtained, or when it is necessary to
measure the degree of disproportion in the structure of two or more related
processes.
For example, suppose that only data on the structure of general morbidity,
physical disability and mortality are available.
Comparing these structures and calculating parameters of relative intensity
makes it possible to find out the relative importance of particular diseases
for the health parameters of the population.
So, for example, comparing the shares of cardiovascular diseases in physical
disability and in mortality with their share in morbidity shows that
cardiovascular diseases occupy an almost 7 times larger share in physical
disability, and an almost 5 times larger share in mortality, than in the
structure of morbidity.
The procedure for calculating these parameters is the following.
For example, the shares of cardiovascular diseases in the structures are:
- General morbidity: 4.0 %;
- Disability: 27.0 %;
- Causes of death: 19.0 %.
The parameter of relative intensity of disability is obtained by dividing
the share of cardiovascular diseases in the structure of disability by the
share of these diseases in the structure of general morbidity:
27.0 / 4.0 ≈ 6.8.
The parameter of relative intensity of mortality is obtained in a similar
way:
19.0 / 4.0 = 4.75.
Thus, parameters of relative intensity are measures of the disproportion
between the shares of the same elements in the structures of the processes
being studied.
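The cardiovascular example can be reproduced with a few lines of Python. This is only a sketch: the function name `relative_intensity` is an illustrative assumption, and the shares are the ones quoted in the text.

```python
# Shares of cardiovascular diseases in each structure, in percent (from the text).
share_in_morbidity = 4.0
share_in_disability = 27.0
share_in_mortality = 19.0


def relative_intensity(share_in_structure, share_in_reference):
    """Ratio of a share in one structure to the share of the same element
    in the reference structure (here, general morbidity)."""
    return share_in_structure / share_in_reference


disability_index = relative_intensity(share_in_disability, share_in_morbidity)
mortality_index = relative_intensity(share_in_mortality, share_in_morbidity)
print(f"disability: {disability_index:.2f}, mortality: {mortality_index:.2f}")
```

An index above 1 means the disease takes a larger share in disability or mortality than in morbidity; an index below 1 means a smaller share.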
The parameter of correlation characterizes the relation between two
different, independent sets.
Examples are the provision of the population with hospital beds, nurses,
etc.
The technique of calculating the correlation parameter is the same as for
the intensive parameter. The difference is that in an intensive parameter
the phenomenon in the numerator is part of the environment in the
denominator, whereas in a correlation parameter the numerator and the
denominator are different sets.
Example. The number of beds is 280 and the average population is 260 000.
What is the bed occupancy (BO) rate?
BO rate = (280 × 10 000) / 260 000 ≈ 10.8 beds per 10 000 persons.
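The same arithmetic in Python, as a minimal sketch with illustrative variable names:

```python
beds = 280
average_population = 260_000

# Beds and population are different sets: the beds are not a subset of the people,
# which is what distinguishes this ratio from an intensive parameter.
beds_per_10_000 = beds * 10_000 / average_population
print(f"{beds_per_10_000:.1f} beds per 10 000 persons")
```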
The parameter of visualization characterizes the relation of each of the
compared values to an initial level taken as 100. This parameter is used for
convenience of comparison; it shows the direction of a process (increase or
reduction) without showing the level or the absolute size of the phenomenon.
It can be used to characterize the dynamics of phenomena, to compare
separate territories or different groups of the population, and to construct
graphs.
Table 2.7
Example: visualization parameters for visits to polyclinics
| Polyclinic | Number of visits | Parameter of visualization (polyclinic № 1 = 100 %) |
| № 1 | 850 | 100.0 |
| № 2 | 920 | 108.2 |
| № 3 | 990 | 116.5 |
| № 4 | 1200 | 141.2 |
| № 5 | 1290 | 151.8 |
Taking the number of visits to polyclinic № 1 as 100 %, the visualization
parameter for polyclinic № 2 is:
850 – 100 %
920 – X
X = (920 × 100) / 850 ≈ 108.2 %
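The proportion can be applied to all five polyclinics at once. The following Python sketch uses the visit counts from Table 2.7; the function name `visualization_parameters` and the dictionary keys are illustrative assumptions.

```python
# Number of visits per polyclinic (from Table 2.7).
visits = {"No. 1": 850, "No. 2": 920, "No. 3": 990, "No. 4": 1200, "No. 5": 1290}


def visualization_parameters(values, base_key):
    """Express every value as a percentage of the value chosen as the base (100 %)."""
    base_value = values[base_key]
    return {key: value * 100 / base_value for key, value in values.items()}


indices = visualization_parameters(visits, base_key="No. 1")
for clinic, index in indices.items():
    print(f"Polyclinic {clinic}: {index:.1f} %")
```

Choosing a different `base_key` changes every index but not the ordering of the polyclinics, which is why the parameter shows direction rather than level.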
Visualization parameters can be calculated from absolute numbers, intensive
parameters, parameters of correlation, and average values, but not from
extensive parameters, given the limitations of the extensive parameter noted
above.
For practical purposes it is sufficient to calculate parameters to within
one tenth.
To determine the tenths digit, the calculation must be carried out to the
second digit after the decimal point.
The first digit after the point is then determined by the second digit: if
the second digit is five or more, the first digit is increased by one; if it
is less than five, the first digit remains the same.
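This rounding rule can be sketched in Python with the standard `decimal` module. Python's built-in `round` rounds halves to the nearest even digit and works on binary floating-point values, so it does not always follow the rule as stated; the helper `round_to_tenth` below is a hypothetical name introduced for illustration.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def round_to_tenth(value):
    """Round to one decimal place, looking only at the second digit after the point."""
    # Step 1: carry the calculation to the second digit after the point.
    two_places = Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_DOWN)
    # Step 2: a second digit of five or more raises the first digit by one;
    # a smaller second digit leaves it unchanged.
    return two_places.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(round_to_tenth(6.75))      # second digit is 5, so the tenths digit goes up
print(round_to_tenth(10.7692))   # second digit is 6, so the tenths digit goes up
print(round_to_tenth(108.2352))  # second digit is 3, so the tenths digit stays
```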
A well-designed study, poorly analysed, can be rescued by a reanalysis
but a poorly designed study is beyond the redemption of even sophisticated
statistical manipulation. Many experimenters consult the medical statistician
only at the end of the study when the data have been collected. They believe
that the job of the statistician is simply to analyse the data, and with
powerful computers available, even complex studies with many variables can be
easily processed. However, analysis is only part of a statistician’s job, and
calculation of the final ‘p-value’ a minor one at that!
A far more important task for the medical statistician is to ensure that
results are comparable and generalisable.
For example, the types of individuals exposed to water fluoridation depend
on their age, gender and ethnic mix, and these same factors are also known to
influence cancer mortality rates. It was established that over the 20
years of the study, fluoridated towns were more likely to be ones where
young, white people moved away and these are the people with lower cancer
mortality, and so they left behind a higher risk population.
Any observational study that compares populations distinguished by a
particular variable (such as a comparison of smokers and non-smokers) and
ascribes the differences found in other variables (such as lung cancer rates)
to the first variable is open to the charge that the observed differences
are in fact due to some other, confounding, variables. Thus, the difference in
lung cancer rates between smokers and non-smokers has been ascribed to genetic factors;
that is, some factor that makes people want to smoke also makes them more
susceptible to lung cancer. The difficulty with observational studies is
that there is an infinite source of confounding variables. An
investigator can measure all the variables that seem reasonable to him but a
critic can always think of another, unmeasured, variable that just might
explain the result. It is only in prospective randomised studies that this
logical difficulty is avoided. In randomised studies, where exposure
variables (such as alternative treatments) are assigned purely by a chance
mechanism, it can be assumed that unmeasured confounding variables are
comparable, on average, in the two groups. Unfortunately, in many circumstances
it is not possible to randomise the exposure variable as part of the
experimental design, as in the case of smoking and lung cancer, and so
alternative interpretations are always possible.
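The claim that random assignment balances unmeasured confounders on average can be illustrated with a small simulation. This is a sketch under invented assumptions: the population, the confounder values and the self-selection rule are all hypothetical and do not come from any study mentioned in the text.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: each person has an unmeasured confounder in [0, 1).
confounder = [random.random() for _ in range(10_000)]


def mean(values):
    return sum(values) / len(values)


def imbalance(exposed_flags):
    """Absolute difference in the mean confounder between exposed and unexposed."""
    exposed = [c for c, e in zip(confounder, exposed_flags) if e]
    unexposed = [c for c, e in zip(confounder, exposed_flags) if not e]
    return abs(mean(exposed) - mean(unexposed))


# Observational setting: people with a high confounder value tend to choose exposure.
self_selected = [random.random() < c for c in confounder]

# Randomised setting: exposure is assigned by a fair coin, ignoring the confounder.
randomised = [random.random() < 0.5 for _ in confounder]

print(f"self-selected imbalance: {imbalance(self_selected):.3f}")
print(f"randomised imbalance:    {imbalance(randomised):.3f}")
```

In the self-selected groups the confounder differs systematically, so any difference in outcome could be due to it; in the randomised groups the difference is only chance variation and shrinks as the groups grow.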
Categorical or qualitative data
Nominal categorical data
Nominal or categorical data are data that one can name and put into
categories. They are not measured but simply counted. They often consist of
unordered ‘either/or’ type observations which have two categories and are often
known as binary. For example: Dead or Alive; Male or Female; Cured or Not Cured;
Pregnant or Not Pregnant.
Ordinal data
If there are more than two categories of classification it may be
possible to order them in some way. For example, after treatment a patient may
be either improved, the same or worse; a woman may never have conceived,
conceived but spontaneously aborted, or given birth to a live infant.
Ranks
In some studies it may be appropriate to assign ranks. For example,
patients with rheumatoid arthritis may be asked to order their preference for
four dressing aids. Here, although numerical values from 1 to 4 may be assigned
to each aid, one cannot treat them as numerical values. They are in fact only
codes for best, second best, third choice and worst.
Interval and ratio scales
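One can distinguish between interval and ratio scales. In an interval
scale, such as body temperature or calendar dates, a difference between two
measurements has meaning, but their ratio does not. If we measure
temperature in degrees centigrade, we cannot say that a temperature of
20 °C is twice as hot as a temperature of 10 °C, because the zero of the
scale is arbitrary. In a ratio scale, such as body weight, zero has a real
meaning, and so the ratio of two measurements is meaningful.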
One difficulty with giving ranks to ordered categorical data is
that one cannot assume that the scale is interval. Thus, as we have indicated
when discussing ordinal data, one cannot assume that risk of cancer for an
individual educated to middle school level, relative to one educated only to
primary school level is the same as the risk for someone educated to college
level, relative to someone educated to middle school level.
Sample size and power considerations
One of the commonest questions asked of a consulting statistician is: How
large should my study be? If the investigator has a reasonable amount of
knowledge as to the likely outcome of a study, and potentially large resources
of finance and time, then the statistician has tools available to enable
a scientific answer to be made to the question. However, the usual
scenario is that the investigator has either a grant of a limited size, or
limited time, or a limited pool of patients. Nevertheless, given certain
assumptions the medical statistician is still able to help. For a given number
of patients the probability of obtaining effects of a certain size can be calculated.
If the outcome variable is simply success or failure, the statistician will
need to know the anticipated percentage of successes in each group so that the
difference between them can be judged of potential clinical relevance. If the
outcome variable is a quantitative measurement, he will need to know the size
of the difference between the two groups, and the expected variability of the
measurement.
For example, in a survey to see if patients with diabetes have raised
blood pressure the medical statistician might say, ‘with 100 diabetics and 100
healthy subjects in this survey and a possible difference in blood pressure of
5 mmHg, with standard deviation of 10 mmHg, you have a 20% chance of obtaining
a statistically significant result at the 5% level’. This statement means
that one would anticipate that in only one study in five of the proposed
size would a statistically significant result be obtained. The
investigator would then have to decide whether it was sensible or ethical to
conduct a trial with such a small probability of success. One option would be
to increase the size of the survey until success (defined as a
statistically significant result if a difference of 5 mmHg or more does
truly exist) becomes more probable.
A characteristic that varies from one person or thing to another is called a
variable. Examples of variables for humans are height, weight, number of
siblings, sex, marital status, and eye color. The first three of these
variables yield numerical information and are examples of quantitative
variables; the last three yield nonnumerical information and are examples
of qualitative variables, also called categorical variables.
Quantitative variables can be classified as either discrete or continuous. A discrete
variable is a variable whose possible values can be listed, even though the
list may continue indefinitely. This property holds, for instance, if either
the variable has only a finite number of possible values or its possible values
are some collection of whole numbers. A discrete variable usually involves a
count of something, such as the number of siblings a person has, the number of
cars owned by a family, or the number of students in an introductory statistics
class.
A continuous variable is a variable whose possible values form some
interval of numbers. Typically, a continuous variable involves a measurement of
something, such as the height of a person, the weight of a newborn baby, or the
length of time a car battery lasts.
The values of a variable for one or more people or things yield data. Thus
the information collected, organized, and analyzed by statisticians is data.
Data, like variables, can be classified as qualitative data, quantitative
data, discrete data, and continuous data.
Organizing Qualitative Data
Some situations generate an overwhelming amount of data. We can
often make a large or complicated set of data more compact and easier to
understand by organizing it in a table, chart, or graph. In this section, we
examine some of the most important ways to organize qualitative data. In the
next section, we do that for quantitative data.
Frequency Distributions
Recall that qualitative data are values of a qualitative
(nonnumerically valued) variable.
One way of organizing qualitative data is to construct a table that
gives the number of times each distinct value occurs. The number of times a
particular distinct value occurs is called its frequency (or count).
A frequency distribution of
qualitative data is a listing of the distinct values and their frequencies.
To Construct a Frequency Distribution of Qualitative Data
Step 1 List the distinct values of the observations in the data set in the
first column of a table.
Step 2 For each observation, place a tally mark in the second column of the
table in the row of the appropriate distinct value.
Step 3 Count the tallies for each distinct value and record the totals in the
third column of the table.
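The three steps amount to tallying the observations, which can be sketched in Python with `collections.Counter`. The observations below are hypothetical and serve only to illustrate the procedure.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable (outcome after treatment).
observations = ["improved", "same", "improved", "worse", "improved", "same"]

# Counter lists the distinct values and tallies how many times each occurs.
frequency_distribution = Counter(observations)

for value, frequency in frequency_distribution.items():
    print(f"{value:<10}{frequency}")
```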
Relative-Frequency Distributions
In addition to the frequency with which a particular distinct value
occurs, we are often interested in the relative frequency, which is the
ratio of the frequency to the total number of observations:
Relative frequency = Frequency / Number of observations
As you might expect, a relative-frequency distribution of qualitative
data is similar to a frequency distribution, except that we use relative
frequencies instead of frequencies.
A relative-frequency distribution of qualitative data is a listing of the
distinct values and their relative frequencies.
To Construct a Relative-Frequency Distribution of Qualitative Data
Step 1 Obtain a frequency distribution of the data.
Step 2 Divide each frequency by the total number of observations.
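The two steps can be sketched in Python as follows; the observations are hypothetical.

```python
from collections import Counter

# Hypothetical observations of a qualitative variable.
observations = ["improved", "same", "improved", "worse", "improved", "same"]

# Step 1: obtain a frequency distribution of the data.
frequencies = Counter(observations)

# Step 2: divide each frequency by the total number of observations.
n = len(observations)
relative_frequencies = {value: count / n for value, count in frequencies.items()}

for value, rel_freq in relative_frequencies.items():
    print(f"{value:<10}{rel_freq:.3f}")
```

The relative frequencies always sum to 1 (up to rounding), which is a quick check on the table.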
Pie Charts
Another method for organizing and summarizing data is to draw a
picture of some kind. The old saying “a picture is worth a thousand words” has
particular relevance in statistics — a graph or chart of a data set often
provides the simplest and most efficient display.
Two common methods for graphically displaying qualitative data are pie
charts and bar charts. We begin with pie charts.
A pie chart is a disk divided
into wedge-shaped pieces proportional to the relative frequencies of the
qualitative data.
To Construct a Pie Chart
Step 1 Obtain a relative-frequency distribution of the data by applying the
procedure given above.
Step 2 Divide a disk into wedge-shaped pieces proportional to the relative
frequencies.
Step 3 Label the slices with the distinct values and their relative
frequencies.
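Dividing the disk in proportion to the relative frequencies means giving each wedge a central angle equal to its relative frequency times 360 degrees. The Python sketch below computes these angles and labels for hypothetical relative frequencies; it does not draw the chart.

```python
# Hypothetical relative-frequency distribution of a qualitative variable.
relative_frequencies = {"improved": 0.5, "same": 0.3, "worse": 0.2}

# Each wedge's central angle is proportional to its relative frequency.
wedge_angles = {value: rf * 360 for value, rf in relative_frequencies.items()}

for value, angle in wedge_angles.items():
    label = f"{value} ({relative_frequencies[value]:.1%})"
    print(f"{label:<20}{angle:6.1f} degrees")
```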
Bar Charts
Another graphical display for qualitative data is the bar chart.
Frequencies, relative frequencies, or percents can be used to label a bar
chart. Although we primarily use relative frequencies, some of our applications
employ frequencies or percents.
A bar chart displays the
distinct values of the qualitative data on a horizontal axis and the relative
frequencies (or frequencies or percents) of those values on a vertical axis.
The relative frequency of each distinct value is represented by a vertical bar
whose height is equal to the relative frequency of that value.
The bars should be positioned so that they do not touch each other.
To Construct a Bar Chart
Step 1 Obtain a relative-frequency distribution of the data by applying the
procedure given above.
Step 2 Draw a horizontal axis on which to place the bars and a vertical axis
on which to display the relative frequencies.
Step 3 For each distinct value, construct a vertical bar whose height equals
the relative frequency of that value.
Step 4 Label the bars with the distinct values, the horizontal axis with the
name of the variable, and the vertical axis with “Relative frequency.”
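As a rough illustration, a bar chart can be rendered as plain text. The sketch below departs from the procedure in one respect: for simplicity the bars are drawn horizontally, one per line, with length proportional to relative frequency. The data and the function name `text_bar_chart` are hypothetical.

```python
# Hypothetical relative-frequency distribution of a qualitative variable.
relative_frequencies = {"improved": 0.5, "same": 0.3, "worse": 0.2}


def text_bar_chart(rel_freqs, width=40):
    """Return one line per distinct value; bar length is proportional
    to the relative frequency (a relative frequency of 1 fills `width`)."""
    lines = []
    for value, rf in rel_freqs.items():
        bar = "#" * round(rf * width)
        lines.append(f"{value:<10} {bar} {rf:.2f}")
    return lines


chart = text_bar_chart(relative_frequencies)
print("\n".join(chart))
```

The bars are separate lines and so never touch, which matches the requirement that bars for qualitative data be kept apart.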