RELATIVE VALUES


Statistical research on morbidity, mortality, lethality and similar data yields absolute numbers, which give the number of occurrences of a phenomenon. Absolute numbers have some informative value, but their use is limited. To determine the level of a phenomenon, or to compare a parameter over time or with that of another territory, it is necessary to calculate relative values (parameters, coefficients), which are the result of a ratio between statistical numbers. The basic arithmetic operation in calculating relative values is division.

In medical statistics the following kinds of relative parameters are used:

— Extensive;

— Intensive;

— Relative intensity;

— Visualization;

— Correlation.

The extensive parameter is used to determine the structure of morbidity (mortality, lethality, etc.).

The extensive parameter, or parameter of distribution, characterizes the parts of a phenomenon (its structure); that is, it shows what share of the total number of diseases (or deaths) is made up by a particular disease included in that total.

Using this parameter, it is possible to determine the structure of patients by age, social status, etc. The parameter is usually expressed as a percentage, but it can also be calculated per thousand when the share of a given disease is so small that as a percentage it would be a decimal fraction rather than a whole number.

The general formula for its calculation is the following:

Extensive parameter = (part × 100) / total number

The technique of calculating an extensive parameter is shown in the following example.

Determine the age structure of those who visited a polyclinic, given the following data.

The total number of visitors, 1500, is taken as 100 %, and the number of patients in each age group corresponds to X. The percentage of visitors aged 15-19 years out of the total number is then:

1500 – 100

150 – X,

X = (150 × 100) / 1500 = 10.0 %

Table 2.5

Age groups of people who visited the polyclinic

Age group      | Absolute number | % of the total number
15 – 19        | 150             | 10.0
20 – 29        | 375             | 25.0
30 – 39        | 300             | 20.0
40 – 49        | 345             | 23.0
50 – 59        | 150             | 10.0
60 and older   | 180             | 12.0
In total       | 1500            | 100.0

Conclusion: the largest shares of those who visited the polyclinic were aged 20-29 years (25.0 %) and 40-49 years (23.0 %).
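The calculation of the age structure can be sketched in a few lines of Python. This is only an illustrative sketch using the absolute numbers from Table 2.5; the names `visitors_by_age` and `extensive_parameters` are hypothetical and chosen for this example.

```python
# Absolute numbers of polyclinic visitors by age group (Table 2.5).
visitors_by_age = {
    "15-19": 150,
    "20-29": 375,
    "30-39": 300,
    "40-49": 345,
    "50-59": 150,
    "60 and older": 180,
}


def extensive_parameters(counts):
    """Return each part as a percentage of the total (part * 100 / total)."""
    total = sum(counts.values())
    return {group: count * 100 / total for group, count in counts.items()}


shares = extensive_parameters(visitors_by_age)
for group, share in shares.items():
    print(f"{group}: {share:.1f} %")
```

Because every share is computed against the same total, the shares always add up to 100 %, which is a convenient check on the arithmetic.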

The extensive parameter must be used carefully in analysis: it characterizes the structure of a phenomenon only in the given place and at the given time. Comparing structures makes it possible to speak only about a change in the rank of a given disease within the structure of diseases; it says nothing about how frequently the disease occurs.

If it is necessary to determine how widespread a phenomenon is, intensive parameters are used.

The intensive parameter characterizes frequency or prevalence.

It shows how frequently the given phenomenon occurs in the given environment.

For example, it shows how frequently a particular disease occurs among the population, or how frequently people die from a particular disease.

To calculate the intensive parameter, it is necessary to know the population or the contingent.

The general formula for the calculation is the following:

Intensive parameter = (phenomenon × 100 (1000; 10 000; 100 000)) / environment

Intensive parameters are usually calculated per 1000 persons; these include the parameters of birth, morbidity, mortality, etc. For separate diseases they are calculated per 10 000 persons, and for diseases that occur seldom, per 100 000 persons.

Let us consider the technique of its calculation in an example.

Example. The number of deaths in the area is 175; the population at the beginning of the year is 24 000 and at the end of the year 26 000. Determine the general mortality rate:

General mortality rate = (number of deaths during the year × 1000) / average number of the population

We determine the average value of the population: we add the population at the beginning of the year to the population at the end of the year and divide the sum by 2:

Average value of the population = (24 000 + 26 000) / 2 = 25 000.

We make a proportion: 175 deaths correspond to 25 000 people; how many deaths correspond to 1000?

175 - 25 000

X - 1000

X = (175 × 1000) / 25 000 = 7.0 per 1000 persons

The parameters of birth, morbidity, etc. are calculated similarly.
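The mortality example can be expressed as a short Python sketch. The function names `average_population` and `intensive_parameter` are hypothetical; the figures are those of the example above.

```python
def average_population(start_of_year, end_of_year):
    """Average of the population at the beginning and at the end of the year."""
    return (start_of_year + end_of_year) / 2


def intensive_parameter(phenomenon, environment, base=1000):
    """Frequency of a phenomenon per `base` units of the environment."""
    return phenomenon * base / environment


mid_year = average_population(24_000, 26_000)
mortality_rate = intensive_parameter(175, mid_year, base=1000)
print(f"Average population: {mid_year:.0f}")
print(f"General mortality rate: {mortality_rate:.1f} per 1000 persons")
```

The `base` argument plays the role of the multiplier in the general formula: 1000 for general rates, and 10 000 or 100 000 for separate or rare diseases.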

Table 2.6

Structure of morbidity, invalidity and the reasons of mortality

Disease                    | Structure of morbidity, % | Structure of invalidity, % | Structure of the reasons of death, % | Index of relative intensity of invalidity | Index of relative intensity of reasons of death
Traumas                    | 12.0                      | 8.0                        | 30.0                                 | 0.35                                      | 2.0
Heart and vessel diseases  | 4.0                       | 27.0                       | 19.0                                 | 6.76                                      | 4.75
Diseases of nervous system | 6.0                       | 8.0                        | -                                    | 1.33                                      | -
Poisonings                 | 0.3                       | -                          | 0.4                                  | -                                         | 13.3
Tuberculosis               | 0.5                       | 5.0                        | 5.5                                  | 10.0                                      | 11.0
Other                      | 74.2                      | 52.0                       | 41.5                                 | 0.7                                       | 0.56
Total                      | 100.0                     | 100.0                      | 100.0                                | -                                         | -

Parameters of relative intensity represent the numerical ratio of two or more structures of the same elements of the set under study.

They make it possible to determine the degree of correspondence (excess or deficit) of similar attributes and are used as an auxiliary technique in cases where it is not possible to obtain direct intensive parameters, or where it is necessary to measure the degree of disproportion in the structure of two or more related processes.

For example, suppose data are available only on the structure of general morbidity, physical disability and mortality.

Comparing these structures and calculating parameters of relative intensity makes it possible to find out the relative importance of particular diseases for the health of the population.

So, for example, comparing the shares of cardiovascular diseases in physical disability and in mortality with their share in morbidity shows that cardiovascular diseases occupy an almost 7 times larger share in physical disability, and an almost 5 times larger share in mortality, than in the structure of morbidity.

The procedure for calculating these parameters is the following.

For example, the shares of cardiovascular diseases in the structures are:

— General morbidity - 4.0 %;

— Disability - 27.0 %;

— Reasons of mortality - 19.0 %.

The parameter of relative intensity of disability is obtained by dividing the share of cardiovascular diseases in the structure of disability by the share of these diseases in the structure of general morbidity:

27.0 / 4.0 = 6.75.

The parameter of relative intensity of mortality is obtained in a similar way: 19.0 / 4.0 = 4.75.

Thus, parameters of relative intensity are measures of the disproportion between the shares of the same elements in the structures of the processes under study.
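The same division can be sketched in Python for the cardiovascular shares quoted above (4.0 %, 27.0 % and 19.0 %). The function name `relative_intensity` is a hypothetical helper introduced only for this illustration.

```python
def relative_intensity(share_in_process, share_in_morbidity):
    """Ratio of a disease's share in one structure to its share in morbidity."""
    return share_in_process / share_in_morbidity


# Shares of cardiovascular diseases, in percent, as given in the text.
morbidity_share = 4.0
disability_share = 27.0
mortality_share = 19.0

disability_index = relative_intensity(disability_share, morbidity_share)
mortality_index = relative_intensity(mortality_share, morbidity_share)

print(f"Relative intensity of disability: {disability_index:.2f}")
print(f"Relative intensity of mortality:  {mortality_index:.2f}")
```

A value above 1 means the disease takes a larger share in the process (disability or mortality) than in morbidity; a value below 1 means a smaller share.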

The parameter of correlation characterizes the relation between diverse values.

For example, the number of hospital beds, nurses, etc. per given number of the population.

The technique of calculating the parameter of correlation is the same as for the intensive parameter. The difference is that in the intensive parameter the phenomenon in the numerator is part of the environment in the denominator, whereas in the parameter of correlation the numerator and the denominator are different, independent sets.

Example. The number of beds is 280; the average number of the population is 260 000. What is the bed occupancy (BO) rate?

BO rate = (280 × 10 000) / 260 000 = 10.8 per 10 000 persons.
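As a rough sketch, the bed example can be computed in Python as follows. The function name `correlation_parameter` is hypothetical; the base of 10 000 persons follows the example.

```python
def correlation_parameter(resource_count, population, base=10_000):
    """Number of resource units (beds, nurses, ...) per `base` persons."""
    return resource_count * base / population


beds_per_10000 = correlation_parameter(280, 260_000)
print(f"{beds_per_10000:.1f} beds per 10 000 persons")
```

The arithmetic is the same as for an intensive parameter; what differs is that beds are not a part of the population, so the numerator and denominator describe different sets.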

The parameter of visualization characterizes the relation of any of the compared values to an initial level taken as 100. This parameter is used for convenience of comparison, and also to show the direction of a process (increase or reduction) without showing the level or the absolute numbers of the phenomenon.

It can be used to characterize the dynamics of phenomena, to compare separate territories or different groups of the population, and to construct graphs.

Table 2.7

Example. Parameters of visualization for visits to polyclinics

Polyclinic | Number of visits | Parameter of visualization (Polyclinic № 1 = 100 %)
№ 1        | 850              | 100.0
№ 2        | 920              | 108.2
№ 3        | 990              | 116.5
№ 4        | 1200             | 141.2
№ 5        | 1290             | 151.8

The number of visits to polyclinic № 1 is taken as 100 %; the parameter of visualization for polyclinic № 2 is then:

850 - 100

920 – X,

X = (920 × 100) / 850 = 108.2 %.
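A minimal Python sketch of the same proportion, applied to all five polyclinics, might look like this. The names `visits` and `visualization_parameters` are hypothetical; the visit counts are those of Table 2.7.

```python
# Number of visits per polyclinic (Table 2.7); keys are polyclinic numbers.
visits = {1: 850, 2: 920, 3: 990, 4: 1200, 5: 1290}


def visualization_parameters(values, base_key):
    """Express every value as a percentage of the value chosen as 100 %."""
    base = values[base_key]
    return {key: value * 100 / base for key, value in values.items()}


indices = visualization_parameters(visits, base_key=1)
for polyclinic, index in indices.items():
    print(f"Polyclinic No. {polyclinic}: {index:.1f} %")
```

Changing `base_key` changes the reference level, so the same data can be compared against any polyclinic, or against the first year of a time series.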

Visualization parameters can be calculated from absolute numbers, intensive parameters, parameters of correlation and average values, but not from extensive parameters, for the reasons given above concerning that parameter.

For practical purposes it is enough to calculate parameters to within one tenth.

To determine the tenths digit, the calculation must be carried to the second decimal place.

The first decimal place is then determined by the second: if the second digit is five or more, the first digit is increased by one; if it is less than five, the first digit remains the same.
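The rounding rule can be sketched in Python with the standard `decimal` module. This sketch assumes that a second digit of exactly five rounds upward ("round half up"); the helper name `round_to_tenth` is hypothetical. Python's built-in `round` uses a different convention for exact ties (round half to even), which is why it is not used here.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_to_tenth(value):
    """Round to one decimal place; a second digit of 5 or more rounds up."""
    return Decimal(str(value)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


print(round_to_tenth(10.769))  # second digit 6: first digit goes up
print(round_to_tenth(108.24))  # second digit 4: first digit stays
print(round_to_tenth(0.25))    # exact tie: rounds up under this rule
print(round(0.25, 1))          # built-in round: tie goes to the even digit
```

Passing the value through `str` before building the `Decimal` keeps the decimal digits as they are printed, instead of the binary approximation stored in the float.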

A well-designed study, poorly analysed, can be rescued by a reanalysis but a poorly designed study is beyond the redemption of even sophisticated statistical manipulation. Many experimenters consult the medical statistician only at the end of the study when the data have been collected. They believe that the job of the statistician is simply to analyse the data, and with powerful computers available, even complex studies with many variables can be easily processed. However, analysis is only part of a statistician’s job, and calculation of the final ‘p-value’ a minor one at that!

A far more important task for the medical statistician is to ensure that results are comparable and generalisable.

In one example, the types of individuals exposed to fluoridation depend on their age, gender and ethnic mix, and these same factors are also known to influence cancer mortality rates. It was established that over the 20 years of the study, fluoridated towns were more likely to be ones where young, white people moved away. These are the people with lower cancer mortality, and so they left behind a higher risk population.

Any observational study that compares populations distinguished by a particular variable (such as a comparison of smokers and non-smokers) and ascribes the differences found in other variables (such as lung cancer rates) to the first variable is open to the charge that the observed differences are in fact due to some other, confounding, variables. Thus, the difference in lung cancer rates between smokers and non-smokers has been ascribed to genetic factors; that is, some factor that makes people want to smoke also makes them more susceptible to lung cancer. The difficulty with observational studies is that there is an infinite source of confounding variables. An investigator can measure all the variables that seem reasonable to him but a critic can always think of another, unmeasured, variable that just might explain the result. It is only in prospective randomised studies that this logical difficulty is avoided. In randomised studies, where exposure variables (such as alternative treatments) are assigned purely by a chance mechanism, it can be assumed that unmeasured confounding variables are comparable, on average, in the two groups. Unfortunately, in many circumstances it is not possible to randomise the exposure variable as part of the experimental design, as in the case of smoking and lung cancer, and so alternative interpretations are always possible.

Categorical or qualitative data

Nominal categorical data 

Nominal or categorical data are data that one can name and put into categories. They are not measured but simply counted. They often consist of unordered 'either/or' type observations which have two categories and are often known as binary. For example: Dead or Alive; Male or Female; Cured or Not Cured; Pregnant or Not Pregnant.

Ordinal data 

If there are more than two categories of classification it may be possible to order them in some way. For example, after treatment a patient may be either improved, the same or worse; a woman may never have conceived, conceived but spontaneously aborted, or given birth to a live infant.

Ranks 

In some studies it may be appropriate to assign ranks. For example, patients with rheumatoid arthritis may be asked to order their preference for four dressing aids. Here, although numerical values from 1 to 4 may be assigned to each aid, one cannot treat them as numerical values. They are in fact only codes for best, second best, third choice and worst.

Interval and ratio scales

One can distinguish between interval and ratio scales. In an interval scale, such as body temperature or calendar dates, a difference between two measurements has meaning, but their ratio does not. If temperature is measured in degrees centigrade, we cannot say that a temperature of 20°C is twice as hot as a temperature of 10°C. In a ratio scale, however, such as body weight, a 10 % increase implies the same weight increase whether expressed in kilograms or pounds. The crucial difference is that in a ratio scale, the value of zero has real meaning, whereas in an interval scale, the position of zero is arbitrary.
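A small Python sketch can make the distinction concrete. It compares the ratio of two temperatures in Celsius and in Fahrenheit, and the ratio of two body weights in kilograms and in pounds. The weights (80 kg and a 10 % increase) are hypothetical values chosen for illustration, and the kilogram-to-pound factor is approximate.

```python
def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


# Interval scale: the ratio depends on where zero happens to be placed.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: zero means "no weight" in every unit, so ratios are preserved.
KG_TO_LB = 2.20462  # approximate conversion factor
weight_before_kg = 80.0                    # hypothetical patient weight
weight_after_kg = weight_before_kg * 1.10  # a 10 % increase
ratio_kg = weight_after_kg / weight_before_kg
ratio_lb = (weight_after_kg * KG_TO_LB) / (weight_before_kg * KG_TO_LB)

print(f"Temperature ratio: {ratio_celsius:.2f} (C) vs {ratio_fahrenheit:.2f} (F)")
print(f"Weight ratio:      {ratio_kg:.2f} (kg) vs {ratio_lb:.2f} (lb)")
```

The temperature ratio changes with the unit because the two scales place zero at different points, while the weight ratio is the same in both units.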

One difficulty with giving ranks to ordered categorical data is that one cannot assume that the scale is interval. Thus, as we have indicated when discussing ordinal data, one cannot assume that risk of cancer for an individual educated to middle school level, relative to one educated only to primary school level is the same as the risk for someone educated to college level, relative to someone educated to middle school level.

Sample size and power considerations

One of the commonest questions asked of a consulting statistician is: How large should my study be? If the investigator has a reasonable amount of knowledge as to the likely outcome of a study, and potentially large resources of finance and time, then the statistician has tools available to enable a scientific answer to be made to the question. However, the usual scenario is that the investigator has either a grant of a limited size, or limited time, or a limited pool of patients. Nevertheless, given certain assumptions the medical statistician is still able to help. For a given number of patients the probability of obtaining effects of a certain size can be calculated. If the outcome variable is simply success or failure, the statistician will need to know the anticipated percentage of successes in each group so that the difference between them can be judged of potential clinical relevance. If the outcome variable is a quantitative measurement, he will need to know the size of the difference between the two groups, and the expected variability of the measurement.

For example, in a survey to see if patients with diabetes have raised blood pressure the medical statistician might say, ‘with 100 diabetics and 100 healthy subjects in this survey and a possible difference in blood pressure of 5 mmHg, with standard deviation of 10 mmHg, you have a 20% chance of obtaining a statistically significant result at the 5% level’. This statement means that one would anticipate that in only one study in five of the proposed size would a statistically significant result be obtained. The investigator would then have to decide whether it was sensible or ethical to conduct a trial with such a small probability of success. One option would be to increase the size of the survey until success (defined as a statistically significant result if a difference of 5 mmHg or more does truly exist) becomes more probable.

A characteristic that varies from one person or thing to another is called a variable.

Examples of variables for humans are height, weight, number of siblings, sex, marital status, and eye color. The first three of these variables yield numerical information and are examples of quantitative variables; the last three yield nonnumerical information and are examples of qualitative variables, also called categorical variables.

Quantitative variables can be classified as either discrete or continuous. A discrete variable is a variable whose possible values can be listed, even though the list may continue indefinitely. This property holds, for instance, if either the variable has only a finite number of possible values or its possible values are some collection of whole numbers. A discrete variable usually involves a count of something, such as the number of siblings a person has, the number of cars owned by a family, or the number of students in an introductory statistics class.

A continuous variable is a variable whose possible values form some interval of numbers. Typically, a continuous variable involves a measurement of something, such as the height of a person, the weight of a newborn baby, or the length of time a car battery lasts.

The values of a variable for one or more people or things yield data. Thus the information collected, organized, and analyzed by statisticians is data. Data, like variables, can be classified as qualitative data, quantitative data, discrete data, and continuous data.

Organizing Qualitative Data

Some situations generate an overwhelming amount of data. We can often make a large or complicated set of data more compact and easier to understand by organizing it in a table, chart, or graph. In this section, we examine some of the most important ways to organize qualitative data. In the next section, we do that for quantitative data.

Frequency Distributions

Recall that qualitative data are values of a qualitative (nonnumerically valued) variable.

One way of organizing qualitative data is to construct a table that gives the number of times each distinct value occurs. The number of times a particular distinct value occurs is called its frequency (or count).

A frequency distribution of qualitative data is a listing of the distinct values and their frequencies.

To Construct a Frequency Distribution of Qualitative Data

Step 1 List the distinct values of the observations in the data set in the first column of a table.

Step 2 For each observation, place a tally mark in the second column of the table in the row of the appropriate distinct value.

Step 3 Count the tallies for each distinct value and record the totals in the third column of the table.
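The three steps above can be sketched in Python with `collections.Counter`, which does the tallying and counting in one pass. The list of treatment outcomes below is hypothetical data invented for the illustration.

```python
from collections import Counter

# Hypothetical qualitative observations: outcome after treatment.
observations = [
    "Improved", "Same", "Improved", "Worse", "Improved",
    "Same", "Improved", "Worse", "Same", "Improved",
]

# Counter lists each distinct value once and counts its occurrences.
frequency = Counter(observations)

print(f"{'Outcome':<10} Frequency")
for value, count in frequency.most_common():
    print(f"{value:<10} {count}")
```

The frequencies must add up to the number of observations, which gives a quick check that no observation was missed.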

Relative-Frequency Distributions

In addition to the frequency that a particular distinct value occurs, we are often interested in the relative frequency, which is the ratio of the frequency to the total number of observations:

Relative frequency = Frequency / Number of observations

As you might expect, a relative-frequency distribution of qualitative data is similar to a frequency distribution, except that we use relative frequencies instead of frequencies.

A relative-frequency distribution of qualitative data is a listing of the distinct values and their relative frequencies.

To Construct a Relative-Frequency Distribution of Qualitative Data

Step 1 Obtain a frequency distribution of the data.

Step 2 Divide each frequency by the total number of observations.
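The two steps can be illustrated with the following Python sketch. The function name `relative_frequency_distribution` and the list of blood groups are hypothetical and serve only as an example.

```python
from collections import Counter


def relative_frequency_distribution(observations):
    """Divide each frequency by the total number of observations."""
    counts = Counter(observations)      # Step 1: frequency distribution
    n = len(observations)
    return {value: count / n for value, count in counts.items()}  # Step 2


# Hypothetical qualitative data: blood groups of eight patients.
blood_groups = ["A", "B", "A", "O", "A", "B", "O", "A"]
relative = relative_frequency_distribution(blood_groups)

for value, rel_freq in relative.items():
    print(f"{value}: {rel_freq:.3f}")
```

Relative frequencies always sum to 1, and multiplying them by 100 gives percentages.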

Pie Charts

Another method for organizing and summarizing data is to draw a picture of some kind. The old saying “a picture is worth a thousand words” has particular relevance in statistics — a graph or chart of a data set often provides the simplest and most efficient display.

Two common methods for graphically displaying qualitative data are pie charts and bar charts. We begin with pie charts.

A pie chart is a disk divided into wedge-shaped pieces proportional to the relative frequencies of the qualitative data.

To Construct a Pie Chart

Step 1 Obtain a relative-frequency distribution of the data by applying Procedure 2.2.

Step 2 Divide a disk into wedge-shaped pieces proportional to the relative frequencies.

Step 3 Label the slices with the distinct values and their relative frequencies.
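Dividing the disk in Step 2 amounts to giving each value a wedge whose central angle is its relative frequency times 360 degrees. The sketch below computes those angles only and does not draw the chart; the relative frequencies and the function name `wedge_angles` are hypothetical.

```python
def wedge_angles(relative_frequencies):
    """Central angle, in degrees, of each wedge of a pie chart."""
    return {value: rel_freq * 360
            for value, rel_freq in relative_frequencies.items()}


# Hypothetical relative-frequency distribution.
relative_frequencies = {"A": 0.5, "B": 0.25, "O": 0.25}
angles = wedge_angles(relative_frequencies)

for value, angle in angles.items():
    share = relative_frequencies[value]
    print(f"{value}: {angle:.0f} degrees ({share:.2f})")
```

The angles add up to 360 degrees whenever the relative frequencies add up to 1.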

Bar Charts

Another graphical display for qualitative data is the bar chart. Frequencies, relative frequencies, or percents can be used to label a bar chart. Although we primarily use relative frequencies, some of our applications employ frequencies or percents.

A bar chart displays the distinct values of the qualitative data on a horizontal axis and the relative frequencies (or frequencies or percents) of those values on a vertical axis. The relative frequency of each distinct value is represented by a vertical bar whose height is equal to the relative frequency of that value.

The bars should be positioned so that they do not touch each other.

To Construct a Bar Chart

Step 1 Obtain a relative-frequency distribution of the data by applying Procedure 2.2.

Step 2 Draw a horizontal axis on which to place the bars and a vertical axis on which to display the relative frequencies.

Step 3 For each distinct value, construct a vertical bar whose height equals the relative frequency of that value.

Step 4 Label the bars with the distinct values, the horizontal axis with the name of the variable, and the vertical axis with “Relative frequency.”
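As a rough illustration of the idea that bar length is proportional to relative frequency, the following Python sketch prints a text bar chart. It is a simplification: the bars are drawn horizontally with `#` characters rather than as vertical bars on axes, and the data and the function name `text_bar_chart` are hypothetical.

```python
def text_bar_chart(relative_frequencies, width=40):
    """Return one text line per distinct value; bar length ~ relative frequency."""
    lines = []
    for value, rel_freq in relative_frequencies.items():
        bar = "#" * round(rel_freq * width)
        lines.append(f"{value:<10} {bar} {rel_freq:.2f}")
    return lines


# Hypothetical relative-frequency distribution.
relative_frequencies = {"A": 0.5, "B": 0.25, "O": 0.25}
chart = text_bar_chart(relative_frequencies)
print("\n".join(chart))
```

Each value gets its own separate bar, in keeping with the rule that bars for qualitative data should not touch each other.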