Diagrammatic and graphic representation of data
The data in statistics are
generally based on individual observations. Some basic terms that are necessary
for an understanding of biological and agricultural data are:
Sample and population
The selection of a part of
a population to represent the whole population is known as sampling and the
part selected is known as sample. The object of sampling is to get information
regarding the population from which the sample is obtained. Suppose there are
500 girls in a college and we want to know their average weight. We could
weigh every girl and divide the total weight by the number of girls. Here the
population is the set of weights of the entire group of 500 girls.
However, we can save time and labour by weighing only 50 of the 500 girls and
taking the average of this part of the population. If the 50 girls are chosen
at random, their average weight is a reasonable estimate of the average weight
of all 500 girls. In this case the weights of the 50 girls are the sample.
Other examples: we look at a handful of grain from a bag to evaluate the
quality of the rice; a drop of blood is tested for diseases such as malaria or typhoid.
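The college example can be sketched in a few lines of Python. The weights below are simulated (the mean of 52 kg and spread of 6 kg are assumptions made only for illustration), so the numbers show the idea of estimating a population mean from a random sample rather than real measurements.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated weights (kg) of all 500 girls
population = [round(random.gauss(52, 6), 1) for _ in range(500)]

# Simple random sample of 50 girls, drawn without replacement
sample = random.sample(population, 50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean (N = 500): {population_mean:.2f} kg")
print(f"sample mean     (n = 50):  {sample_mean:.2f} kg")
```

The two means will normally be close but not identical; the difference is the sampling error that inference has to account for.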
In statistics, population always means the totality of individual observations about which inferences are to be made at a particular time. A sample is the small part of the population that has actually been observed. For example, all plants in a wheat field represent a population, whereas individual observations on ten high-yielding plants from this field form a sample. Populations may be finite or infinite. If a population consists of a fixed number of values, it is said to be finite: for example, the number of plants in a quadrat, the number of wheat plants in a plot or the number of patients in a hospital. On the other hand, a population in which it is theoretically impossible to observe all the values is an infinite one. An infinite population is unlimited in size; populations that are too large to enumerate in practice, such as the phytoplankton in a pond or the RBCs in the human body, are usually treated as infinite.
If we have collected data on only one variable, we have a “univariate” population. However, if two variables are measured (say the price of a certain commodity and the quantity demanded), we have a “bivariate” population. When more than two variables are measured on every individual, we have a “multivariate” population.
Variable
The data collected for statistical analysis do not consist of identical observations, since there would be little reason to study a characteristic that never varies. The data counted or measured for analysis represent the varying values of a variable, i.e. a characteristic that shows variation. Any quantity or quality liable to vary from one individual to the next in the same population is known as a variable. An individual observation of any variable is known as a variate.
Examples: plant height, the heights of adult males, the weights of preschool children and the ages of patients seen in a dental clinic.
Parameters
Measures describing the characteristics of a population are called parameters. For example, suppose the mean pod length of all mungbean varieties growing in a field is 60 mm. In this case, 60 mm is a characteristic of the population of all mungbean varieties and can be called a population parameter. Greek letters are usually used to denote parameters; the population mean is represented by µ.
Types of data
Primary data are original, first-hand information. They can be collected through direct personal interviews, mailed questionnaires or information from correspondents.
Secondary data are data that have already been collected by someone else. They can be obtained from Government publications, reports of Committees and Commissions, newspapers, journals, magazines, annual reports, etc.
Summarization of data through frequency distribution
One of the most important aspects of any data presentation or data analysis is summarizing the available data in a condensed and meaningful form. This can be done with the help of frequency distribution tables and graphical representation. Let’s look at summarizing the data with the help of frequency distribution tables.
Summarizing of data: Typically, a problem or a sample gives us sets of numbers thrown together, which tell us little until they are sorted into groups. The method of classifying the available data so as to give a meaningful result is called “summarizing”. It forms part of Descriptive Statistics, which deals with arranging and displaying numerical information from which conclusions can be drawn.
Frequency Distribution: The technique of arranging and sorting data by grouping them into non-overlapping classes and showing the number of observations in each class is called a frequency distribution. For example, consider the recently published report by the WHO on the number of AIDS-related deaths in our country: its table lists the age groups of people and the number in each of these age groups.
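As a minimal sketch of how such a table is built, the Python below counts observations in non-overlapping age classes. The ages, the starting point of 15 years and the class width of 10 years are hypothetical values chosen for illustration; they are not taken from any report.

```python
from collections import Counter

# Hypothetical ages in years (illustrative only)
ages = [23, 35, 41, 19, 52, 38, 27, 45, 31, 64, 29, 33, 48, 56, 22]

def frequency_distribution(values, start, width, n_classes):
    """Count values in non-overlapping classes [lower, lower + width)."""
    counts = Counter()
    for v in values:
        index = (v - start) // width      # which class the value falls into
        if 0 <= index < n_classes:
            counts[index] += 1
    table = []
    for i in range(n_classes):
        lower = start + i * width
        table.append((lower, lower + width, counts[i]))
    return table

table = frequency_distribution(ages, start=15, width=10, n_classes=5)
for lower, upper, freq in table:
    print(f"{lower}-{upper - 1} years: {freq}")
```

Because the classes do not overlap and together cover all the data, the class frequencies add up to the number of observations.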
Graphic representation and the graphic analysis
Graphic representations are used to visualize statistical quantities and allow them to be analyzed more deeply.
A graphic representation can be built from both absolute and relative quantities.
When using the graphic method, it is important to know that the type of graphic representation must strictly correspond to the content of each index.
For construction of graphic representations the following quantities are used:
Relative quantities:
- intensive indices
- extensive indices
- indices of correlation
- indices of evidence
Absolute quantities
Intensive quantities are shown with 4 types of diagrams:
· column
· linear
· mapgram
· mapdiagram
Extensive quantities (they characterize the structure) are shown with a sector or inwardly-column diagram.
Indices of correlation: the same diagrams as for intensive quantities (column and linear diagrams, mapgram, mapdiagram).
Indices of evidence: the principles of graphic representation are the same as for intensive quantities.
Column diagrams: for illustration of homogeneous, but not interconnected indices. They represent the statics of the phenomena.
Linear diagrams: for representation of the dynamics of a phenomenon (typical examples are a temperature curve or changes in the birth rate or death rate).
Radial diagram: is built on a system of polar co-ordinates and represents a phenomenon over a closed cycle of time (day, week, year).
Sector diagram: represents structure, for example the structure of morbidity or of causes of mortality, where every cause of mortality occupies a sector of the circle according to its percentage.
Mapgram is the representation of statistical quantities on a geographical map (or schematic map). Absolute and other indices can be marked.
Mapdiagram is the representation of different types of diagrams on a geographical map.
Common rules of construction of graphic representations:
· every graphic representation must have a title that states its content, time and place;
· it must be built to a definite scale;
· for every graphic representation an explanation of the colors or shading used must be given (a legend).
When choosing the type of graphic representation, it is necessary to know that it must strictly correspond to the essence of the represented index.
Principles of construction and application of square diagrams (linear, column, rectangular, sector, radial).
Linear diagram is used for illustration of the frequency of phenomena which change with time, that is, for representation of the dynamics of the phenomena.
The base of this diagram is the rectangular system of co-ordinates. For example: on the abscissa (X axis) time segments are marked off to scale, and on the ordinate (Y axis) the indices of morbidity (x : y = 4 : 3).
Column diagram (rectangular) is used for illustration of homogeneous intensive indices that are not connected with each other. It represents the dynamics or statics of the phenomena.
To construct this kind of diagram, columns are drawn whose height corresponds to the size of the represented indices, taking the scale into account. The width of the columns and the distance between them may be chosen arbitrarily but must be identical for all columns. Columns on a diagram can be vertical or horizontal. For example: growth of the number of beds in in-patient establishments from 1990 to 2003.
Sector diagram is used for illustration of extensive indices, which characterize the structure of the phenomenon; they give an idea of the share (specific weight) of each part in the whole.
The circle is taken as 100 % (if indices are shown in %), so 1 % corresponds to 3.6° of the circle.
For example: among all infectious diseases measles accounted for 28.6 % (28.6 × 3.6° ≈ 103°) and other infections for 71.4 % (71.4 × 3.6° ≈ 257°).
With the help of a protractor, the arcs which correspond to the size of every index are marked off on the circle. The points found on the circumference are connected with the center of the circle. The separate sectors of the circle are the parts of the phenomenon which we are describing.
In place of a sector diagram it is possible to use an inwardly-column diagram. Then the whole height of the column is taken as 100 % and the extensive indices are marked off in the proper scale units; together they make up the whole.
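The conversion from percentages to sector angles is simple arithmetic, sketched below in Python with the measles figures quoted above. The function name `sector_angles` is introduced only for this illustration.

```python
def sector_angles(percentages):
    """Convert shares of a whole (in %) into sector angles in degrees.

    The full circle of 360 degrees is taken as 100 %, so 1 % = 3.6 degrees.
    """
    return {name: share * 3.6 for name, share in percentages.items()}

shares = {"measles": 28.6, "other infections": 71.4}
angles = sector_angles(shares)

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

A useful check when drawing by hand is that the angles of all sectors add up to 360°.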
Radial diagram is a type of linear diagram built on polar co-ordinates.
In the construction of a radial diagram, the role of the abscissa (X axis) is played by the circle, divided into equal parts according to the time intervals of the cycle studied.
The ordinate (Y axis) is the radius of the circle or its continuation.
The radius of the circle is taken to be the mean value of the analyzed phenomenon over the time cycle. The number of radii is equal to the number of time intervals in the cycle which we study:
· 12 radii for the study of phenomena during a year
· 7 radii for the study of phenomena during a week.
The radii are numbered starting from the radius which corresponds to 12 o'clock and continuing clockwise.
The results of examinations, after statistical processing, can be given as graphic representations, in which numerical values are presented as a drawing. Graphs give a general characteristic of the phenomenon and define its general laws, and make it possible to analyze the research data more deeply.
They facilitate comparison of parameters, give an idea of the structure and character of the connection between phenomena, and show their tendencies.
Therefore graphic demonstration is often connected with graphic analysis, in which the graphic representation serves not only as a means of demonstrating the results and conclusions of research, but also as a means of analyzing the material obtained and revealing internal connections and laws.
In constructing graphs, the following are taken into account: the character of the data to be represented, the use of the graph (demonstration at a conference or lecture, reproduction in a scientific work, etc.), the purpose of the graph (to show the results clearly or only to emphasize a particular law or fact), and the level of the audience to which the graph is shown.
The choice of the kind of graphic representation, color, the number of graphs, the proportions of the print, etc. depends on all of these. In all cases graphs should be clear, convenient and easy to read.
In medical statistical research, linear (coordinate) diagrams, plane diagrams and cartograms are used.
LINEAR DIAGRAMS are graphs on which numerical values are displayed by curves, which make it possible to trace the dynamics of a phenomenon in time or to find the dependence of one attribute on another (Fig. 2.1).
Fig. 2.1 Age mortality rate of the population in Ukraine (Ukrainian Center of medical statistics, Kyiv, 1999)
On linear diagrams with two or more curves it is also possible to compare the numbers in two or more dynamic series, and to establish how changes in one series depend on fluctuations in another.
Linear diagrams are constructed in a system of rectangular coordinates, where the horizontal scale is laid off from left to right on the axis of abscissas (X), and the vertical scale from the bottom upwards on the axis of ordinates (Y). An obligatory requirement in constructing any graph is scale: the image in the drawing must be reduced in proportion to the corresponding numbers.
In contrast to linear diagrams, which describe the dynamics of a process, plane diagrams are used when it is necessary to represent statistical phenomena or facts that are independent of one another.
The simplest example of a plane diagram is a diagram made of rectangles or other figures. Numerical values on plane diagrams are represented by geometrical figures: rectangles, squares. These diagrams are used for demonstration and popularization of the data, and also when it is necessary to represent the structure of a phenomenon at one moment of observation, for example the age composition of those who fell ill or the structure of disease in a settlement.
Fig. 2.2 Age structure of the population in Ukraine, 1990, % (the share of each age group in the whole population).
In column (long-pillar) diagrams numerical values are represented by rectangular columns with an identical base and different heights. The height of a rectangle corresponds to the relative value of the phenomenon studied. To construct a column diagram we use a scale according to which the height of each column can be determined.
Column diagrams serve for comparison of several quantities. The rectangles which represent the quantities can also be placed horizontally rather than vertically, and then we obtain a strip (tape) diagram (Fig.4). In some cases it is more convenient to show quantities as strips than as columns, because it is easier to label each strip with a horizontal inscription.
With the aid of column and strip diagrams it is possible not only to compare different quantities, but also to display the structure of these quantities and to compare their parts at the same time. For example, column or strip diagrams which show the distribution of diseases by the basic nosological forms can also show the percentage of each disease among men and women. For this purpose each rectangle (column or strip) is divided into two parts, which correspond to the number of cases among men and among women.
Circular diagrams are used to display the ratio of homogeneous absolute quantities. They use the area of a circle instead of the area of a rectangle. But it is necessary to remember that the areas of circles are related to one another as the squares of their radii; therefore, in constructing circular diagrams, we must take the square root of each quantity to be displayed, construct the radius on this basis and, having the radius, draw the circle.
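The square-root rule can be illustrated with a short Python sketch. The three quantities and the largest radius of 3 units are hypothetical; the function name `circle_radii` is introduced only for this example.

```python
import math

def circle_radii(values, max_radius=1.0):
    """Radii for circles whose AREAS are proportional to the given values.

    Area grows with the square of the radius, so each radius is scaled
    by the square root of the value, not by the value itself.
    """
    largest = max(values)
    return [max_radius * math.sqrt(v / largest) for v in values]

values = [100, 400, 900]                 # hypothetical absolute quantities
radii = circle_radii(values, max_radius=3.0)

for v, r in zip(values, radii):
    print(f"value {v}: radius {r:.2f}, area {math.pi * r * r:.2f}")
```

Here a value nine times larger gets a radius only three times larger, which keeps the visual impression of the areas honest.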
If a circular diagram displays parts of a whole, the circles should not be drawn separately from one another but superimposed on each other. The whole and its parts can also be shown as a circle divided into sectors: the sector diagram. In constructing a sector diagram the whole area of the circle is taken as 100 %, and each sector occupies the part of the area which corresponds to the necessary percentage.
In practice, sector diagrams can be based not only on the area of a circle, but also on the area of a square or a rectangle. However, these figures are harder to divide than a circle, and consequently they are rather seldom used as a basis for sector diagrams.
Radial or linear-circular diagrams (Fig. 2.3) are constructed on the basis of polar coordinates, in which the radius replaces the vertical scale of diagrams based on a system of rectangular coordinates.
An example of a radial diagram is the wind rose, with the aid of which the change of wind direction during a calendar period (month, year) is represented on maps.
Radial diagrams are used to illustrate seasonal fluctuations of any numbers, for example morbidity or mortality rates.
These diagrams are constructed on a circle from whose center 12 radii are drawn. Each radius cuts off an arc of 30° (360°/12 = 30°) and represents the ordinate of one of the calendar months: January, February, March, etc.
The center of the circle is taken as the initial zero point, and then on the radii, according to the scale chosen beforehand, the numbers are plotted which show the intensity of the phenomenon in each calendar month.
Having connected the marked points, we obtain a closed line which shows the seasonal fluctuations.
When building radial diagrams, it is necessary to remember the rule of counting the radii from the top of the diagram, proceeding clockwise.
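The geometry of a radial diagram can be sketched in Python as follows. The monthly values are hypothetical, and the sketch assumes the convention stated earlier in the text that the first radius points to 12 o'clock and the others follow clockwise; it only computes the points of the closed line and leaves the drawing itself to any plotting tool.

```python
import math

def radial_points(values):
    """Return (x, y) points for a radial diagram with one radius per value.

    The first radius points straight up (12 o'clock); the remaining radii
    follow clockwise at equal angular steps (30 degrees for 12 months).
    """
    step = 360 / len(values)
    points = []
    for i, value in enumerate(values):
        # 90 degrees is "up" in the usual x-y plane; subtracting goes clockwise
        angle = math.radians(90 - i * step)
        points.append((value * math.cos(angle), value * math.sin(angle)))
    return points

# Hypothetical monthly rates, January to December
monthly = [12, 11, 10, 9, 8, 7, 7, 8, 9, 10, 11, 12]
points = radial_points(monthly)
closed_line = points + points[:1]   # repeat the first point to close the line

for month, (x, y) in enumerate(points, start=1):
    print(f"month {month:2d}: x = {x:6.2f}, y = {y:6.2f}")
```

With these values the closed line bulges towards the winter months, which is how a seasonal peak shows up on such a diagram.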
Fig. 2.3 The radial diagram. Seasonal prevalence of mortality rate of the population of Kalinovsky district of Vinnitsya region (1984-1998, Ukraine).
Cartograms are built when the phenomena need to be compared by territory. They are geographical maps on which graphic symbols show the intensity of distribution and the grouping of a phenomenon (morbidity, mortality, etc.) for a period of time (Fig. 2.4).
They are therefore best built on simplified maps on which only administrative borders and some big settlements are shown. In constructing a cartogram, the grouping of the displayed phenomena is of great importance.
The simplest grouping divides the parameters into a group below the average and a group above the average. According to this division, districts with parameters above the average are shaded on the cartogram and those below the average are not shaded.
Fig. 2.4 Regional features of mortality from cancer in Ukraine.
Displays and Appraisals of Patterns
The stem-leaf
plots discussed earlier in Section 3.3.1.2. have been replacing the histograms that were used for many
years to display the shape and contents of dimensional data. Histograms are
gradually becoming obsolete because they show results for arbitrary intervals
of data rather than the actual values, and because the construction requires an artistry beyond the simple digits that can easily be
shown with a typewriter or computer for a stem-leaf plot.
1. One-Way Graph
Data can always be displayed,
particularly if the stem-leaf plot is too large, with a “one-way graph.” The
vertical array of points on the left of Figure
5.2 is a one-way graph for
the data of Table 3.1.
In this type of graph, multiple points at the same vertical level of location
are spread horizontally close to one another around the central axis. A
horizontal line is often drawn through the vertical array to indicate the
location of the mean. The horizontal line is sometimes adorned with vertical
flanges so that it appears as |-| rather than -. The adornment is merely an artist’s aesthetic caprice and should be
omitted because it can be confusing. The apparently demarcated length of the
flange may suggest that something possibly useful is being displayed, such as a
standard deviation or standard error.
2. Box Plot
If you want to see the pattern of
dimensional data, the best thing to examine is a stem-leaf plot. To summarize
the data, however, the best display is another mechanism: the box plot,
which Tukey called a “box-and-whiskers plot.” Based
on quantiles rather than “parametric” indexes, the
box plot is an excellent way to show the central index, interquartile (50%)
zone, symmetry, spread, and outliers of a distribution. The “invention” of the
box plot is regularly attributed to John Tukey, who proposed1 it in
1977, but a similar device, called a range
bar, was described in
Construction of Box — In the customary box plots today, data are displayed vertically with horizontal
lines drawn at the level of the median, and at the upper and lower quartiles
(which Tukey calls “hinges”). The three horizontal
lines are then connected with vertical lines to form the box. The interquartile
spread of the box shows the segment containing 50% of the data, or the
“H-spread.” The box plot for the data of Table
3.1 is shown on the right
side of Figure 5.2,
using the same vertical units as the corresponding one-way graph on the left.
For the boundaries of the box, the lower quartile is at 17 between the
14th and 15th rank, and the upper quartile is at 28 between the 41st and
42nd rank. The box thus has 28 as its top value, 21 as the
median, and 17 at the bottom. The mean of 22.7 is shown with a +
sign above the median bar.
The width of the horizontal lines in
a box plot is an aesthetic choice. They should be wide enough to show things
clearly but the basic shape should be a vertical rectangle, rather than a
square box. When box plots are compared for two or more groups, the horizontal
widths are usually similar, but McGill et al.9 have
proposed that the widths vary according to the square root of each group’s
size.
Construction of
“Whiskers” — Two single lines are
drawn above and below the box
to form the “whiskers” that summarize
the rest of the distribution beyond the quartiles. The length of the whiskers
will vary according to the goal at which they are aimed.
If intended to show the ipr95, the whiskers will extend up and down to the values
at the 97.5 and 2.5 percentiles. For the 2.5 and 97.5 percentiles, calculated
with the proportional method, the data of Table
3.1 have the respective
values of 12 and 41. Because the upper and lower quadragintile boundaries may not be located symmetrically
around either the median or the corresponding quartiles, the whiskers may have
unequal lengths. In another approach, the whiskers extend to the smallest and
largest observations that are within an H-spread (i.e., interquartile distance)
below and above the box.
For many analysts, however, the main
goal is to let the whiskers include everything but the outliers.
The egregious outliers can often be
noted by eye, during examination of either the raw data or the stem-leaf plot.
The box-plot summary, however, relies on a demarcating boundary that varies with
different statisticians and computer programs. Tukey
originally proposed that outliers be demarcated with an inner and outer set of
boundaries that he called fences. For the inner fences, the ends
of the whiskers are placed at 1.5 H-spreads (i.e., one and a half interquartile
distances) above the upper hinge (i.e., upper quartile) and below the lower
hinge (i.e., lower quartile). The outer fences are placed
correspondingly at 3.0 H-spreads above and below the hinges. With this
convention, the “mild” or “inner” outliers are between the inner and outer
fences; the “extreme” or “outer” outliers are beyond the outer fence.
Tukey’s boundaries are used for box-plot displays in the SAS data management system,10 where the inner outliers, marked 0, are located
between 1.5 and 3 H-spreads; the more extreme outliers, marked *, occur beyond
3 H-spreads. In the SPSS system,11 however, an “inner
outlier” is in the zone between 1.0 and 1.5 H-spreads and is marked with an X;
the “outer” (or extreme) outliers are beyond 1.5 H-spreads and are marked E.
In the data of Table 3.1 and Figure 5.2, the spread between the hinges at the upper and lower quartiles is 28 − 17 = 11. Using the 1.5 H-spread rule, the whiskers would each have a length of 1.5 × 11 = 16.5 units, extending from 0.5 (= 17 − 16.5) to 44.5 (= 28 + 16.5). Because the data in Table 3.1 have no major outliers, the whiskers can be shortened to show the entire range of data from 11 to 43. This shortening gives unequal lengths to the whiskers in Figure 5.2.
On the other hand, if boundaries are determined with the 1 H-spread rule, each whisker would be 11 units long, extending from a low of 6 (= 17 − 11) to a high of 39 (= 28 + 11). The lower value could be reduced, because the whisker need only reach the smallest value of data (11), but the upper whisker would not encompass the data values of 41 and 42, which would then appear to be outliers.
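The fence arithmetic can be checked with a small Python sketch. It uses the hinges of 17 and 28 quoted in the text and Tukey's 1.5 and 3.0 H-spread multipliers; the function names and the test values 50 and 70 are hypothetical and are not taken from Table 3.1.

```python
def tukey_fences(lower_hinge, upper_hinge):
    """Inner (1.5 H-spread) and outer (3.0 H-spread) fences around the hinges."""
    h_spread = upper_hinge - lower_hinge
    inner = (lower_hinge - 1.5 * h_spread, upper_hinge + 1.5 * h_spread)
    outer = (lower_hinge - 3.0 * h_spread, upper_hinge + 3.0 * h_spread)
    return h_spread, inner, outer

def classify(value, inner, outer):
    """Label a value using Tukey's inner and outer fences."""
    if value < outer[0] or value > outer[1]:
        return "extreme outlier"
    if value < inner[0] or value > inner[1]:
        return "mild outlier"
    return "within fences"

h_spread, inner, outer = tukey_fences(17, 28)
print("H-spread:", h_spread)
print("inner fences:", inner)
print("outer fences:", outer)

# 11 and 43 are the extremes quoted in the text; 50 and 70 are hypothetical
for value in (11, 43, 50, 70):
    print(value, "->", classify(value, inner, outer))
```

Changing the multipliers to 1.0 and 1.5 would reproduce the alternative demarcation the text attributes to SPSS, which is why the same data can show different outliers in different programs.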
Immediate Interpretations
— The horizontal line of the median between
the two quartiles divides the box plot into an upper and lower half. If the two
halves have about the same size, the distribution is symmetrical around the
central index. A substantial asymmetry in the box will promptly indicate that
the distribution is not Gaussian.
The “top-heavy” asymmetry of the box
in Figure 5.2 immediately shows that the distribution is skewed right (toward high
values). Because the mean and median have the same location in a Gaussian
distribution, the higher value of the mean here is consistent with the right
skew. In most distributions, the mean is inside the box; and a location beyond
the box will denote a particularly egregious skew.
Figure 1-1 Degree of emphasis on the nurse as a sex object in motion pictures, 1930-1980 (N = 211). (From Kalisch, B. J., Kalisch, P. A., & McHugh, M. L. [1982]. Research in Nursing and Health, 5, 150.)
Figure 1-2 Claimed region of ethnic origin (N = 645). (From Clinton, J. [1982]. The development of an empirical construct for cross-cultural health research. Western Journal of Nursing Research, 4, 281.)
Figure 1-3 Subject and interviewer ratings of physical health of the older bereaved. (From Valanis, B. G., & Yeaworth, R. [1982]. Ratings of physical and mental health in the older bereaved. Research in Nursing and Health, 5, 142.)
Figure 1-4 Illustration of the importance of the number of bars in designing a histogram for a set of data.
Figure 1-5 Number of previous births in a sample of women having cesarean deliveries (N = 123). Data collected in program grant funded by
Figure 1-6 Histogram of age in a sample of hysterectomy patients from an urban medical center (N = 112). Data collected in program grant funded by
Figure 1-7 Illustration of the importance of the graph's height in designing a histogram for a set of data. Data collected in program grant funded by
Figure 1-8 Polygon superimposed on histogram shown in Figure 1-6. The two shaded triangles are congruent. Data collected in program grant funded by
Figure 1-9 Comparison of depression scores for patients having surgical procedure X (N = 104) and surgical procedure Y (N = 61). (Hypothetical data)
Figure 1-10 Example of histogram produced by SPSS: Denial scores from a sample of 152 heart attack patients. (Jacobsen, B. S., & Lowery, B. J. [1992]. Further analysis of the psychometric properties of the Levine Denial of Illness Scale. Psychosomatic Medicine, 54, 372-381.)
Figure 1-11 Histogram for psychological adjustment to illness by 195 patients with breast cancer (higher scores indicate poorer adjustment). (Lowery, B. J., Jacobsen, B. S., & Ducette, J. [1992]. Causal attribution, control, and adjustment to breast cancer. Psychosocial Oncology.)
Figure 1-12 Responses by 195 breast cancer patients to the question of whether attitude will help to prevent a recurrence of cancer. Data collected in grant funded by
First-order degradation of pesticide residues can be described by the regression equation log y = log a - (log b) x (y = residue, x = time). Conversion of the regression equation into antilogarithms yields the exponential function y = a / b^x. As the results of residue analyses are subject to errors, the straight line that best fits the measured values must be computed by regression analysis. These computations, and the graphic representation of the straight line and of the degradation curve back-transformed into a linear coordinate system, are performed with a desktop computer with plotter. Besides the constants log a and log b with their confidence intervals, the following parameters are computed:
1. Coefficient of determination (r²) as a measure of the dependence of the (logarithm of) residue on time;
2. Test quantity (D) which indicates whether or not such dependence exists;
3. Degradation times (T/X) with confidence interval;
4. Residue at a certain point of time (e.g. safety interval) with confidence interval;
5. Point of time at which a certain residue level is reached;
6. Outlier test, if necessary.
The mathematical principles for the computation of these parameters are described.
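A minimal sketch of the straight-line fit, assuming ordinary least squares on log10 of the residue, is given below in Python. The data are invented so that the residue halves every 5 days; the half-life computed at the end is offered as one example of a degradation time and is an assumption, since the source does not define (T/X). Confidence intervals, the test quantity D and the outlier test are not reproduced here.

```python
import math

def fit_first_order(times, residues):
    """Least-squares fit of log10(y) = log_a - log_b * x.

    Returns (log_a, log_b, r_squared).
    """
    n = len(times)
    logs = [math.log10(y) for y in residues]
    mean_x = sum(times) / n
    mean_l = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in times)
    sxy = sum((x - mean_x) * (l - mean_l) for x, l in zip(times, logs))
    syy = sum((l - mean_l) ** 2 for l in logs)
    slope = sxy / sxx                  # slope of the fitted line (negative for decay)
    log_b = -slope                     # the model writes the slope as -(log b)
    log_a = mean_l - slope * mean_x    # intercept
    r_squared = sxy * sxy / (sxx * syy)
    return log_a, log_b, r_squared

# Hypothetical data: residue (mg/kg) halves every 5 days
times = [0, 5, 10, 15, 20]
residues = [8.0, 4.0, 2.0, 1.0, 0.5]

log_a, log_b, r_squared = fit_first_order(times, residues)
a, b = 10 ** log_a, 10 ** log_b

def predicted(x):
    """Back-transformed degradation curve y = a / b**x."""
    return a / b ** x

half_life = math.log10(2) / log_b      # time at which y falls to a / 2

print(f"a = {a:.3f}, b = {b:.4f}, r^2 = {r_squared:.4f}")
print(f"half-life = {half_life:.2f} days, residue at day 7 = {predicted(7):.3f}")
```

Real residue data scatter around the line, so r² falls below 1 and the confidence intervals mentioned in the text become the important part of the report.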
Graphical Representation of Data
The graphical representation of data makes the reading more interesting,
less time-consuming and easily understandable. The disadvantage of graphical
presentation is that it lacks details and is less accurate. In our study, we
have the following graphs: 1. Bar Graphs 2. Pie Charts 3. Frequency Polygon 4.
Histogram.
Bar Graphs
This is the simplest type of graphical presentation of data. The
following types of bar graphs are possible: (a) Simple bar graph (b) Double bar
graph (c) Divided bar graph.
Pie Graph or Pie Chart.
Sometimes a circle is used to represent a given set of data. The various parts of the whole are represented by sectors of the circle in proportion to their size. The graph is then called a Pie Graph or Pie Chart.
Frequency Polygon
In a frequency distribution, the
mid-value of each class is obtained. Then on the graph paper, the frequency is
plotted against the corresponding mid-value. These points are joined by
straight lines. These straight lines may be extended in both directions to meet
the X - axis to form a polygon.
Relative frequencies of
class intervals also can be shown in a frequency polygon. In this chart, the
frequency of each class is indicated by points or dots drawn at the midpoints of each class
interval. Those points are then connected by straight lines. Comparing the frequency
polygon (shown in Figure 1) to the frequency histogram (refer to Figure 1 in
"Frequency Histogram"), you see that the major difference is that
points replace the bars.
Whether to use bar charts or
histograms depends on the data. For example, you may have categorical (or qualitative)
data—numerical information about categories that vary significantly in
kind. Gender (male or female), types of automobile owned (sedan, sports car, pickup truck,
van, and so forth), and religious affiliations (Christian, Jewish, Muslim, and so forth) are
all qualitative data. On the other hand, quantitative data can be measured in amounts: age
in years, annual salaries, inches of rainfall. Typically, qualitative data are better displayed
in bar charts; quantitative data, in histograms.
Histogram
A two-dimensional frequency density diagram is called a histogram. A histogram is a diagram which represents each class interval and its frequency in the form of a rectangle.
In a simple bar graph, the height of each bar represents the frequency. The thickness has no significance, but all bars should have the same thickness.
We use a double bar graph when we want to compare two things.
In the frequency polygon, the frequency is plotted against the mid-value of each class. These points are joined by line segments.
The scientific method of collecting data, classifying them and applying them to commerce and everyday life is called statistics. Some important terms are: ungrouped data; tabulation of data; range; frequency; frequency distribution; tally; inclusive type of grouped frequency distribution; exclusive type of grouped frequency distribution; lower limit and actual lower limit; upper limit and actual upper limit; class size or class width; class mark or class mid-interval; variables; continuous variables; discrete variables.
Graphical Representation
There are various methods of graphical representation of statistical data. In our study, we learn two types: the histogram and the ogive (cumulative frequency curve).
Cumulative Frequency
Cumulative
frequency is obtained by adding the frequency of a class interval and the
frequencies of the preceding intervals up to that class interval.
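The running total described above can be sketched in a few lines of Python. The class frequencies are hypothetical values chosen only for illustration.

```python
from itertools import accumulate

# Hypothetical class frequencies, listed from the lowest class upward.
frequencies = [2, 5, 9, 6, 3]

# Each cumulative frequency is the class frequency plus the
# frequencies of all preceding class intervals.
cumulative = list(accumulate(frequencies))
print(cumulative)
```

The last cumulative frequency always equals the total number of observations, which is a quick check on the arithmetic.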
Frequency Curves
A histogram is a graphical
representation of a frequency distribution in the form of rectangles with the class
intervals as the bases and the corresponding frequencies as heights, without any gap
between two successive rectangles. A polygon obtained by joining the mid-points of the
tops of the consecutive rectangles of a histogram is called a frequency polygon. The
formula to find the mid-value of a class interval is:
\[ \text{mid-value} = \frac{\text{lower limit} + \text{upper limit}}{2} \]
Thus, given a histogram, a frequency
polygon can be drawn. A frequency polygon can be drawn even without
drawing a histogram if the class intervals and their frequencies are known.
When the class intervals in a
frequency distribution are made narrower, the points of the frequency polygon come
closer and closer and the frequency polygon tends to a curve. If the class interval in a
frequency distribution is very small, the frequency polygon is approximated
by drawing it freehand in such a way that no corners remain in it and the area under it is
approximately equal to the area under the frequency polygon. The curve so formed is called a frequency curve.
If the points corresponding to the ordered pairs are plotted on a graph sheet and joined by
line segments, a frequency polygon is obtained. On the other hand, if the points are
joined by a smooth curve, a frequency curve is obtained.
Cumulative
Frequency Curve
A cumulative frequency curve is a plot of the
cumulative frequency against the upper class boundary, with the points joined by
line segments. Any continuous cumulative frequency curve, including a
cumulative frequency polygon, is called an ogive.
There are two ways of constructing an ogive or
cumulative frequency curve: the 'less than' method and the 'more than' method.
The curve is usually S-shaped. Points on the
cumulative frequency curve have as abscissas the actual upper limits (for the
'less than' curve) or the actual lower limits (for the 'more than' curve), and
as ordinates the cumulative frequencies.
GRAPHICAL REPRESENTATION OF DATA
Graphical representation
of the available data is a very important step of statistical
analysis. We first discuss the organization of data. The word 'data' is
the plural of 'datum', which means a fact. In statistics the term is used for
numerical facts such as measures of height, weight and scores on achievement
and intelligence tests.
Tests, experiments and surveys
in education and psychology provide us with valuable data, mostly in the shape of
numerical scores. To understand the available data and derive meaningful and
useful conclusions, the data have to be organized or arranged in some systematic
way. This can be done in the following ways:
1. Statistical tables
2. Rank order
3. Frequency distribution
Statistical
tables
The data are tabulated or
arranged into rows and columns under different headings. Such tables can list
original raw scores as well as percentages, means, standard deviations and
so on.
Rules for
constructing tables:
1. Title of the table
should be simple, concise and unambiguous. As a rule, it should appear on the
table.
2. The table should be
suitably divided into columns and rows according to the nature of data and
purpose. These columns and rows should be arranged in a logical order to
facilitate comparison.
3. The heading of each
column or row should be as brief as possible. Two or more columns or rows with
similar headings may be grouped under a common heading to avoid repetition, and
we may have subheadings or captions.
4. Sub
total for each separate classification and a general total for all
combined classes are to be given. These totals should be given at the bottom or
right of the concerned items.
5. The units in which the
data are given must invariably be mentioned.
6. Necessary footnotes
providing essential explanation of any ambiguous points in the representation
of the tabulated data must be given at the bottom of the table.
7. The sources from where
the data have been received should be given at the end of the table.
9. If the numbers
tabulated have more than three significant figures, the digits should be grouped
in threes. For example, 4394756 is written as 4 394 756.
10. For all purposes and
by all means, the table should be as simple as possible so that it may be
studied by the readers with minimum possible strain and create a clear picture
and interpretations of the data.
Rank order
The original raw scores
can be arranged in an ascending or a descending series exhibiting an order with
respect to the rank or merit position of the individual. Example:
Sixteen students of the BA
final psychology class obtained the following scores on an achievement test:
5 8 4 12 15 17 18 12 20 7
8 19 6 9 10 11
Tabulating the given data in rank order:
S. No. | Scores | S. No. | Scores | S. No. | Scores | S. No. | Scores
1 | 20 | 5 | 15 | 9 | 10 | 13 | 7
2 | 19 | 6 | 12 | 10 | 9 | 14 | 6
3 | 18 | 7 | 12 | 11 | 8 | 15 | 5
4 | 17 | 8 | 11 | 12 | 8 | 16 | 4
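The rank-order arrangement of the sixteen scores can be reproduced with a short Python sketch. It sorts the raw scores in descending order and numbers them serially; tied scores simply occupy consecutive serial numbers, as in the table.

```python
# Raw scores of the sixteen students, as listed in the example.
scores = [5, 8, 4, 12, 15, 17, 18, 12, 20, 7, 8, 19, 6, 9, 10, 11]

# Descending order puts the highest score at serial number 1.
ranked = sorted(scores, reverse=True)

for serial_no, score in enumerate(ranked, start=1):
    print(serial_no, score)
```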
Frequency
Distribution
The organization of the
data according to rank order does not help us to summarize a series of raw
scores. It also does not tell us the frequency of the raw scores. In a frequency
distribution we group the data into arbitrarily chosen groups or classes and
count how many times a particular score or group of scores occurs
in the given data. This is known as the frequency distribution of numerical
data.
Construction of
Frequency distribution table
Finding the range:
First of all the range of
the series to be grouped is found. It is done by subtracting the lowest score from
the highest. In the present problem the range of the distribution is 46 - 12 = 34.
Determining class
interval:
After finding range we
find class interval represented by Y. The formula for this is:
Writing the contents of
the frequency distribution table:
Writing the classes of the
distribution.
In the first column we
write the classes of the distribution. First of all the lowest class is settled and
afterwards the subsequent classes are written down. In this case we take 10-14
as the lowest class; then we have the higher classes
15-19, 20-24, and so on up to 45-49.
Tallying the
scores into proper classes.
The scores given are
tallied into proper classes in the second column then the tallies are counted
against each class to obtain the frequency of the class.
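The steps above (range, classes, tallying) can be sketched in Python. Because the scores of the "present problem" are not listed in the text, the sketch reuses the sixteen scores from the rank-order example; the class width of 5 and the lowest class 0-4 are assumptions made only for illustration.

```python
from collections import Counter

scores = [5, 8, 4, 12, 15, 17, 18, 12, 20, 7, 8, 19, 6, 9, 10, 11]
class_width = 5  # assumed class size, not taken from the text

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

# Tally each score under the lower limit of its class (4 -> 0, 12 -> 10, ...).
tally = Counter((s // class_width) * class_width for s in scores)

# Build the table of written class limits and frequencies, lowest class first.
table = []
for lower in range(0, max(scores) + 1, class_width):
    upper = lower + class_width - 1
    table.append((f"{lower}-{upper}", tally[lower]))

for label, frequency in table:
    print(label, frequency)
```

The frequencies should add up to the number of scores; if they do not, a score has been tallied twice or missed.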
GRAPHICAL
REPRESENTATION OF DATA
The statistical data may
be presented in a more attractive form, appealing to the eye, with the help of
some graphic aids, i.e. pictures and graphs. Such presentation carries a lot of
communication power. A mere glimpse of the pictures
and graphs may enable the viewer to have an immediate and meaningful grasp of
a large amount of data.
Ungrouped data
may be represented through a bar diagram, pie diagram, pictograph and line
graph.
· A bar graph represents the
data on the graph paper in the form of vertical or horizontal bars.
· In a pie diagram, the
data are represented by a circle of 360 degrees divided into
parts, each representing the amount of data converted into angles. The total
frequency value is equated to 360 degrees and then the angles corresponding to
the component parts are calculated.
· In pictograms, the data
is represented by means of picture figures appropriately designed in proportion
to the numerical data.
· Line graphs represent
the data concerning one variable on the horizontal and other variable on the
vertical axis of the graph paper.
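The angle calculation for a pie diagram can be sketched as follows. The category names and frequencies are hypothetical; the only rule taken from the text is that the total frequency corresponds to 360 degrees.

```python
# Hypothetical frequencies for three categories.
frequencies = {"A": 10, "B": 25, "C": 5}

total = sum(frequencies.values())

# Each component gets the share of 360 degrees equal to its share of the total.
angles = {name: f / total * 360 for name, f in frequencies.items()}

for name, angle in angles.items():
    print(name, angle)
```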
Grouped data may be
represented graphically by histogram, frequency polygon, cumulative frequency
graph and cumulative frequency percentage curve or ogive.
· A histogram is
essentially a bar graph of a frequency distribution. The actual class limits
plotted on the x axis represent the widths of the bars, and the respective
frequencies of these class intervals represent the heights of the bars.
· A frequency polygon is a
line graph for the graphical representation of frequency distribution.
· A cumulative frequency
graph represents the cumulative frequency distribution by plotting actual upper
limits of the class intervals on the x axis and the respective cumulative
frequencies of these class intervals on the y axis.
· Cumulative frequency
percentage curve or ogive represents cumulative
percentage frequency distribution by plotting upper limits of the class
intervals on the x axis and the respective cumulative percentage frequencies of
these class intervals on the y axis.
METHOD FOR CONSTRUCTING A HISTOGRAM
1. The scores are taken in the form
of actual class limits, such as 19.5-24.5, 24.5-29.5 and so on,
rather than the written class limits 20-24,
25-29.
2. It is customary to take
two extra class intervals, one below and one above the grouped intervals.
3. Now we take the actual lower
limits of all the class intervals and plot them on the x axis. The lower
limit of the lowest class interval is taken at the intersecting point of the x axis
and the y axis.
4. Frequencies of the
distribution are plotted on the y axis.
5. Each class interval
with its specific frequency is represented by a separate rectangle. The base of
each rectangle is the width of the class interval, and the height is
representative of the frequency of that class or interval.
6. Care should be taken to
select the appropriate units of representation along the x and y axes. Both the
x axis and the y axis must not be too short or too long.
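The conversion from written to actual class limits, and the rectangle for each class, can be sketched in Python. The classes and frequencies are hypothetical, and the half-unit adjustment assumes scores recorded to the nearest whole number, as in the 20-24 to 19.5-24.5 example.

```python
# Hypothetical written class limits and their frequencies.
written_classes = [(20, 24), (25, 29), (30, 34)]
frequencies = [3, 7, 4]

def actual_limits(lower, upper, gap=1):
    """Extend a written class by half the gap between classes on each side."""
    half = gap / 2
    return lower - half, upper + half

# One rectangle per class: base = class width, height = frequency.
rectangles = []
for (lower, upper), frequency in zip(written_classes, frequencies):
    left, right = actual_limits(lower, upper)
    rectangles.append({"left": left, "width": right - left, "height": frequency})

for rect in rectangles:
    print(rect)
```

Because each rectangle ends where the next begins, the bars touch, which is what distinguishes a histogram from an ordinary bar graph.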
METHOD FOR CONSTRUCTING A FREQUENCY POLYGON
1. As in the histogram, two
extra class intervals are taken, one above and the other below the given class
intervals.
2. The mid-points of the
class intervals are calculated.
3. The mid-points
are plotted along the x axis and the corresponding frequencies
are plotted along the y axis.
4. The various points
given by the plotting are joined by lines to give the frequency polygon.
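The points of a frequency polygon can be computed as sketched below. The classes and frequencies are hypothetical; the two extra classes are given frequency zero so that the polygon closes on the x axis.

```python
# Hypothetical written class limits and their frequencies.
written_classes = [(20, 24), (25, 29), (30, 34)]
frequencies = [3, 7, 4]
width = 5  # class size of the hypothetical classes

# Step 1: one extra class below and one above, each with frequency zero.
first_lower, first_upper = written_classes[0]
last_lower, last_upper = written_classes[-1]
padded_classes = (
    [(first_lower - width, first_upper - width)]
    + written_classes
    + [(last_lower + width, last_upper + width)]
)
padded_frequencies = [0] + frequencies + [0]

# Steps 2 and 3: mid-point on the x axis, frequency on the y axis.
points = [
    ((lower + upper) / 2, frequency)
    for (lower, upper), frequency in zip(padded_classes, padded_frequencies)
]
print(points)
```

Joining these points in order with straight lines (step 4) gives the polygon.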
DIFFERENCE BETWEEN HISTOGRAM AND FREQUENCY POLYGON
A histogram is a bar graph
while a frequency polygon is a line graph. The frequency polygon is more useful and
practical. In a frequency polygon it is easy to see the trends of the distribution,
which is harder to do in a histogram. The histogram, however, gives a very clear and accurate
picture of the relative proportion of the frequency from interval to interval.
METHOD FOR CONSTRUCTING A CUMULATIVE FREQUENCY GRAPH
1. First of all we
calculate the actual upper and lower limits of the class intervals, i.e. if the
class interval is 20-24 then the actual upper limit is 24.5 and the actual lower limit is 19.5.
2. We must now select a
suitable scale as per the range of the class intervals and plot the actual upper
limits on the x axis and the respective cumulative frequencies on the y axis.
3. All the plotted points
are then joined by successive straight lines, resulting in a line graph.
4. To start the graph on
the x axis, an extra class interval with cumulative frequency zero is
taken.
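The points of a 'less than' cumulative frequency graph can be sketched as follows. The classes and frequencies are hypothetical, and the half-unit correction assumes whole-number scores, as in the 20-24 example above.

```python
from itertools import accumulate

# Hypothetical written class limits and their frequencies.
written_classes = [(20, 24), (25, 29), (30, 34)]
frequencies = [3, 7, 4]

# Step 1: actual upper limit of each class (24 -> 24.5, and so on).
upper_limits = [upper + 0.5 for _, upper in written_classes]

# Step 2: cumulative frequency up to and including each class.
cumulative = list(accumulate(frequencies))

# Step 4: the extra class below the lowest one ends at the actual lower
# limit of the lowest class and has cumulative frequency zero.
start_point = (written_classes[0][0] - 0.5, 0)

# Step 3: these points are joined by straight lines.
points = [start_point] + list(zip(upper_limits, cumulative))
print(points)
```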
Statistics is that branch of mathematics devoted to the
collection, compilation, display, and interpretation of numerical data. In
general, the field can be divided into two major subgroups, descriptive
statistics and inferential statistics. The former subject deals primarily with
the accumulation and presentation of numerical data, while the latter focuses
on predictions that can be made based on those data.
Perhaps the
simplest way to report the results of the study described above is to make a
table. The advantage of constructing a table of data is that a reader can get a
general idea about the findings of the study in a brief glance.
Two fundamental concepts used in statistical analysis are population and sample.
The term population refers to a complete set of individuals, objects, or events
that belong to some category. For example, all of the players who are employed
by Major League Baseball teams make up the population of professional major
league baseball players. The term sample refers to some subset of a population.
Statistics
- Collecting Data
Statistics -
Graphical Representation
The table shown above is
one way of representing the frequency distribution of a sample or population. A
frequency distribution is any method for summarizing data that shows the number
of individuals or individual cases present in each given interval of
measurement. In the table above, there are 5,382,025 female African-Americans
in the age group 0-19;
Statistics - Distribution Curves
Finally, think
of a histogram in which the vertical bars are very narrow...and then very, very
narrow. As one connects the midpoints of these bars, the frequency polygon
begins to look like a smooth curve, perhaps like a high, smoothly shaped hill.
A curve of this kind is known as a distribution curve. Probably the most
familiar kind of distribution curve is one with a peak in the middle.
Statistics
Other
Kinds Of Frequency Distributions
Bar graphs look very much like histograms except that gaps are left
between adjacent bars. This difference is based on the fact that bar graphs are
usually used to represent discrete data and the space between bars is a
reminder of the discrete character of the data represented. Line graphs can
also be used to represent continuous data. If one were to record the
temperature once an hour all day to week.
Statistics - Measures Of Central Tendency
Both
statisticians and non-statisticians talk about "averages" all the time.
But the term average can have a number of different meanings. In the field of
statistics, therefore, workers prefer to use the term "measure of central
tendency" for the concept of an "average." One way to understand
how various measures of central tendency.
Measures
Of Variability
Suppose that a
teacher gave the same test to two different classes and obtained the following
results: Class 1: 80%, 80%, 80%, 80%, 80% Class 2: 60%, 70%, 80%, 90%, 100% If
you calculate the mean for both sets of scores, you get the same answer: 80%.
But the collection of scores from which this mean was obtained was very
different in the two cases. The way that statisticians have of distinguishing…
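The two classes above can be compared numerically. The sketch below uses the population standard deviation as the measure of spread; that choice is an assumption, since the text breaks off before naming a specific measure.

```python
from statistics import mean, pstdev

class_1 = [80, 80, 80, 80, 80]
class_2 = [60, 70, 80, 90, 100]

# Both classes have the same mean score.
print(mean(class_1), mean(class_2))

# The spread differs: no variation in Class 1, wide variation in Class 2.
print(pstdev(class_1), pstdev(class_2))
```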
Statistics -
Inferential Statistics
Expressing a collection of data in some useful form, as described above,
is often only the first step in a statistician's work. The next step will be to
decide what conclusions, predictions, and other statements, if any, can be made
based on those data. A number of sophisticated mathematical techniques have now
been developed to make these judgments. An important fundamental concept used
in biostatistics.
Computer forensics is the preservation,
analysis, and interpretation of computer data. There is a need for software
that aids investigators in locating data on hard drives left by persons
committing illegal activities. These software tools should reduce the tedious
efforts of forensic examiners, especially when searching large hard drives. A
method is proposed here that uses visualization techniques to represent file
statistics, such as file size, last access date, creation date, last
modification date, owner, and file type. The user interface to this software
allows file searching, pattern matching, and display of file contents. By
viewing file information graphically, the developed software will reduce the
examiner’s analysis time and greatly increase the probability of locating
criminal evidence.
Computer forensics is the preservation,
analysis, and interpretation of computer data. In a world wherein the number of
crimes committed using computers is increasing rapidly, a definite need exists
for forensic software tools. These tools allow investigators to follow digital
tracks left by persons committing illegal activities. Traces of evidence may be
found in plain text documents, log files, or even system files, yet more
technologically advanced criminals may conceal information by deleting it,
encrypting it, or embedding it inside another file. With the large amount of
storage space available on modern hard drives, searching for a single file
becomes quite tedious without the help of special forensic tools. Using
visualization techniques to display information about computer data can help
forensic specialists direct their search to suspicious files.
A great deal of time is wasted trying to
interpret mass amounts of data that is not correlated or meaningful without
high levels of patience and tolerance for error. A well quoted phrase, “a
picture is worth a thousand words,” is what we’re trying to accomplish here.
Human brains have the ability to interpret and comprehend pictures, video, and
charts much faster than reading a description of the same. This is because the
human mind is able to examine graphics in parallel but only examine text in
serial. Imagine a friend trying to describe in an email the beauty of the
One single picture not only presented an
accurate representation of the
Requests for more information about a suspect
file can be filled by clicking on the display and walking through various
menus. Viewing information about multiple files or understanding the
relationship between them is also helpful. The user interface to this software
allows file searching, pattern matching, and display of file contents. Each of
these options allows a deeper analysis of the data stored on the hard drive and
results in a flexible and customizable tool for locating criminal evidence.
The software tool we have developed will greatly
aid the computer forensic process by reducing the time to identify suspicious
files and increasing the probability of locating criminal evidence. This is
done by using a graphical representation of the file rather than traditional
text.
Our contributions to computer science include
the use of enhanced tree-maps, applied visualization techniques for computer
forensics, and a software framework on which to build future enhancements.
Enhanced tree-maps help represent temporal information about files, such as
access time. Traditional tree-maps only have the capability of representing
spatial information, such as size. We are the first to apply visualization techniques
to computer forensics and will show it to be a promising method for identifying
hidden or altered files. Lastly, our software allows for additional
visualization techniques not yet developed.
Documentation
During
the analysis process, detailed information must be recorded if there is to be
any hope of a successful court appearance. This information includes forensic tools
used, actions taken, and chain of custody. Some forensic tools have more
credibility in court than others because they have been proven. Thus, it is
important to use a proven forensic tool. Actions taken include opening files
and hashing. Time of day should be recorded whenever a file is opened, hashed,
or scanned, along with the directory it was discovered in. Every examiner
involved in the case needs to be recorded in the chain of custody. At any time
in the investigation, it should be clear and possible to identify the
individual who carried out an analysis task.
Court Appearance
Once the evidence has been analyzed,
authenticated, and documented, it may go to court. It is important to present
the case in a simple and clear manner because judges and juries may not have
technical knowledge of computer systems. Investigators who have followed the
forensic process will have a higher probability of winning the case. However,
if there are holes in the chain of custody or any step of the forensic process,
the defence will exploit them and usually succeed at convincing the jury the
investigation was handled improperly.
The prosecution, thus, would not be able to
rebuild their case and would lose. An understanding
of the computer forensic process leads to the development of improved software
that aids investigators in locating evidence. Any software used to collect or
analyze evidence must follow the computer forensic
guidelines; otherwise, its use becomes a hindrance rather than a benefit.
Visualization of Data
Tree-Maps
In
our method, we use visualization techniques to help represent file attributes.
One method of displaying the relationship of files visually in two dimensions
is called a tree-map. Shneiderman describes
tree-maps as a 2D space-filling algorithm for complex
tree structures.
They are designed to display the entire tree structure in one screen. Each file
is represented by a shaded box that adheres to a chosen colouring scheme that
highlights file and directory boundaries. Box size is determined by two
parameters: the size of the user selected display region and percentage of the
selected directory the file occupies. Other file directory representations like
that of Windows Explorer use nodes and edges rotated on their side and always
require scrolling up and down to view the complex structure.
The tree-map facilitates easy recognition of the
largest files because they take up the most space in the 2D
display. The method of using tree-maps to visualize data storage and directory
structure greatly reduces the time it takes to locate large files in a tree
structure that is nine levels deep and contains many thousands of files.
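The box-size rule described above (display region size multiplied by the share of the directory a file occupies) can be sketched in Python. This is only an illustration of the proportionality, not the layout algorithm of any particular tool; the function name, file names and sizes are hypothetical.

```python
def box_areas(file_sizes, display_width, display_height):
    """Give each file a share of the display area equal to its share
    of the total size of the selected directory."""
    total_size = sum(file_sizes.values())
    display_area = display_width * display_height
    return {
        name: display_area * size / total_size
        for name, size in file_sizes.items()
    }

# Hypothetical directory contents (sizes in bytes) and display region.
files = {"a.log": 600, "b.doc": 300, "c.txt": 100}
areas = box_areas(files, display_width=100, display_height=50)
print(areas)
```

The sketch also shows why large files dominate the display: a file holding 60 % of the directory receives 60 % of the screen area, while very small files receive almost none.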
Tree-maps are primarily designed to emphasize large files. However, Shneiderman does point out that a user can drag a mouse over
the display and click on a shaded box to query the system for the file name or
other information. Such additions may enhance the usefulness of tree-maps, but
stand-alone tree-maps for computer forensics contain many weaknesses. Small
files and directories are hidden among larger files and may not even show
up
on the display. We may be looking for a simple file on a massive hard drive. If
the file is small or if the disk contains numerous files, our file will hardly
stand out.
For our purposes, stand-alone tree-maps require
enhancement that provides the user with advanced filtering and display
techniques. In this way, tree-maps are interesting and provide groundwork for
opportunities in computer forensics.
Graphic representation of statistics Questions
& Answers
Question: Name the different graphical
representations of data used in statistics.
Answer: Graphical representation of statistical
data is for the sole purpose of easier interpretation. In modern manufacturing
it has been converted to 'statistical process control', which gave rise to the 'seven
QC tools' and was recently upgraded. The seven QC tools: flow charts, run charts,
Pareto diagram, histogram, cause-and-effect diagrams, scatter diagrams, control chart
(the most famous and widely used). The new version: affinity diagrams, relations
diagrams, tree diagram, matrix diagram, arrow diagram, process decision program
charts, matrix data analysis. Just type any of the key words I put in here in
your search engine and you'll have better explanations about them. Good luck.
Question: Based on your observation list out 4
points on the characteristics of logarithmic or exponential functions and their
graphical representation.
Answer: they are mirror images of each other.
That’s one.
Question: Working with Numbers Number Operations
and Number Sense Simple Algebra Algebra, Functions,
and Patterns Geometry and Graphing Measurement, Geometry and Coordinate
Geometry or lead me to a site. 4. Statistical Math Data analysis, reading
graphical representations of data Statistics and probability
Answer: I'm not trying to just get points, but
no one can help you with this. you have to have real problems, because these
subjects are so broad that it would be impossible to cover these even simply
without talking an hour.
In medical
statistics the following kinds of relative parameters are used:
– Extensive;
– Intensive;
– Relative intensity;
– Visualization;
– Correlation.
For the
determination of the structure of disease (mortality rate, lethality, etc.) the
extensive parameter is used.
The extensive parameter, or parameter of distribution,
characterizes the parts of a phenomenon (its structure); that is, it shows what part
of the total number of all diseases (or deaths) is made up by a particular
disease included in the total.
Using this
parameter, it is possible to determine the structure of patients according to
age, social status, etc. This parameter is usually expressed as a
percentage, but it can also be calculated in parts per thousand when the
share of the given disease is small and, expressed as a percentage, would be
a decimal fraction instead of an integer.
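The general
formula for its calculation is the following:
\[ \text{extensive parameter} = \frac{\text{absolute size of the part}}{\text{absolute size of the whole}} \times 100\,\% \]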
The technique of
calculating an extensive parameter is shown in the following example.
Determine
the age structure of those who visited a polyclinic if the following
data are known:
The number of
visitors, 1500, is taken as 100 %, and the
number of patients of each age as X. Hence the percentage of all visitors
to the polyclinic who were aged 15-19 years
is found from the proportion:
1500 – 100 %
150 – X,
\[ X = \frac{150 \times 100}{1500} = 10\,\% \]
Table 2.5. Age groups of people who visited the polyclinic
Age group | Absolute number | % of the total number
15 – 19 | 150 | 10.0
20 – 29 | 375 | 25.0
30 – 39 | 300 | 20.0
40 – 49 | 345 | 23.0
50 – 59 | 150 | 10.0
60 and older | 180 | 12.0
In total | 1500 | 100.0
Conclusion:
the largest groups of people who visited the polyclinic were aged 20-29
and 40-49 years.
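The percentage column of Table 2.5 can be recomputed with a short Python sketch. The absolute numbers are those of the table; rounding to one decimal place is an assumption matching the table's layout.

```python
# Absolute numbers of visitors by age group, from Table 2.5.
visits = {
    "15-19": 150,
    "20-29": 375,
    "30-39": 300,
    "40-49": 345,
    "50-59": 150,
    "60 and older": 180,
}

total = sum(visits.values())

# Extensive parameter: the part divided by the whole, times 100 %.
shares = {group: round(count / total * 100, 1) for group, count in visits.items()}

for group, share in shares.items():
    print(group, share)
```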
The extensive parameter
must be used carefully in analysis: we must remember that it
characterizes the structure of a phenomenon only in the given
place and at the given time. Comparison of structures makes it possible to speak
only about a change in the rank of the given diseases within the structure of
diseases.
If it is
necessary to determine the frequency of a phenomenon, intensive parameters are
used.
The intensive parameter characterizes frequency or
prevalence.
It shows how
frequently the given phenomenon occurs in the given environment:
for example,
how frequently a particular disease occurs among the population, or how
frequently people die from a particular disease.
To calculate
the intensive parameter, it is necessary to know the size of the population or the
contingent.
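The general
formula for the calculation is the following:
\[ \text{intensive parameter} = \frac{\text{number of cases of the phenomenon}}{\text{size of the population}} \times 1000 \]
Intensive parameters
are usually calculated per 1000 persons. These are parameters of birth, morbidity,
mortality, etc.; for individual diseases they are calculated per 10,000, and
for diseases which occur seldom, per 100,000 persons.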
Let us
consider the technique of its calculation with an example.
Example.
The number of deaths in the area is 175; the population at the beginning of
the year is 24000 and at the end of the year 26000. Determine the parameter of
mortality.
We determine
the average value of the population; for this purpose we take the
population at the beginning of the year plus the population at the end of
the year and divide the sum by 2:
\[ \frac{24000 + 26000}{2} = 25000 \]
We make a
proportion: 175 deaths correspond to 25000 people; how many
deaths correspond to 1000?
175 – 25000
X – 1000
\[ X = \frac{175 \times 1000}{25000} = 7 \]
that is, 7 deaths per 1000 persons.
Parameters of
birth, morbidity, etc. are calculated similarly.
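The mortality example can be checked with a small Python sketch. The function name and its `per` argument are hypothetical; the figures are those of the example.

```python
def intensive_parameter(cases, population, per=1000):
    """Number of cases per `per` persons of the given population."""
    return cases * per / population

deaths = 175
population_start = 24000
population_end = 26000

# Average population over the year.
average_population = (population_start + population_end) / 2

mortality = intensive_parameter(deaths, average_population)
print(average_population, mortality)
```

Passing `per=10000` or `per=100000` gives the rate on the bases used for individual or rare diseases.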
Table 2.6. Structure of morbidity, invalidity and the reasons of mortality
Disease | Structure of morbidity | Structure of invalidity | Structure of the reasons of death | Index of relative intensity of invalidity | Index of relative intensity of reasons of death
Traumas | 12.0 | 8.0 | 30.0 | 0.35 | 2.0
Heart and vessel diseases | 4.0 | 27.0 | 19.0 | 6.76 | 4.75
Diseases of nervous system | 6.0 | 8.0 | - | 1.33 | -
Poisonings | 0.3 | - | 0.4 | - | 13.3
Tuberculosis | 0.5 | 5.0 | 5.5 | 10.0 | 11.0
Other | 74.2 | 52.0 | 41.5 | 0.7 | 0.56
Total | 100.0 | 100.0 | 100.0 | - | -
Parameters of relative intensity represent a
numerical ratio of the shares of the same elements in two or more structures of the set
which is studied.
They allow
determining the degree of conformity (excess or deficit) of similar
attributes and are used as an auxiliary technique in those cases where it is not
possible to obtain direct intensive parameters, or where it is necessary to
measure the degree of disproportion in the structure of two or more related
processes.
For example,
there may be data only about the structure of general morbidity, physical
disability and mortality.
Comparison of
these structures and calculation of parameters of relative intensity allows
finding out the relative importance of particular diseases in the health
parameters of the population.
So, for
example, comparing the shares of cardiovascular diseases in physical disability and mortality
with their share in morbidity shows
that cardiovascular diseases occupy a share almost 7 times greater in physical
disability, and almost 5 times greater in mortality, than in the structure of
morbidity.
The procedure for
calculating these parameters is the following.
For example,
the shares of cardiovascular diseases in the structures are:
– General morbidity – 4.0 %;
– Disability – 27.0 %;
– Reasons of mortality – 19.0 %.
The parameter
of relative intensity of disability is obtained by dividing the share of
cardiovascular diseases in the structure of disability by the share of these
diseases in the structure of general morbidity:
\[ \frac{27.0}{4.0} = 6.75 \]
The parameter
of relative intensity of mortality is obtained in a similar way:
\[ \frac{19.0}{4.0} = 4.75 \]
Thus,
parameters of relative intensity are measures of the disproportion between the
shares of the same elements in the structures of the processes which are
studied.
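The two ratios for cardiovascular diseases can be reproduced with a minimal Python sketch. The function name is hypothetical; the shares are the percentages quoted in the example.

```python
def relative_intensity(share_in_structure, share_in_morbidity):
    """Ratio of a disease's share in one structure to its share in morbidity."""
    return share_in_structure / share_in_morbidity

# Shares of cardiovascular diseases, in percent.
morbidity_share = 4.0
disability_share = 27.0
mortality_share = 19.0

print(relative_intensity(disability_share, morbidity_share))
print(relative_intensity(mortality_share, morbidity_share))
```

A value above 1 means the disease weighs more heavily in disability or mortality than in morbidity; a value below 1 means the opposite.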
The parameter of visualization characterizes the relation
between diverse values.
For example,
the parameter of average bed occupancy, of nurses, etc.
The technique
of calculating the visualization parameter is the same as for the intensive
parameter. However, in an intensive parameter the quantity in the
numerator is included in the denominator, whereas in a parameter of
visualization the numerator and denominator are different quantities.
Example.
Number of beds – 280, average population – 260000. What is the
bed occupancy (BO) rate?
\[ \text{BO rate} = \frac{280 \times 10000}{260000} \approx 10.8 \text{ per 10,000 persons} \]
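The bed example can be checked with a short Python sketch. The numbers are those of the example, and the base of 10,000 persons is taken from its last line.

```python
beds = 280
average_population = 260000
base = 10000  # the rate is expressed per 10,000 persons

# Numerator (beds) and denominator (people) are different kinds of quantity,
# which is what distinguishes this parameter from an intensive one.
bo_rate = beds * base / average_population
print(round(bo_rate, 2))
```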
The parameter of correlation characterizes the relation
of each of the compared values to an initial level taken as 100. This
parameter is used for convenience of comparison, and also where it is necessary to show the
direction of a process (increase or reduction) without showing the level or the absolute numbers
of the phenomenon.
It can be used
to characterize the dynamics of phenomena, for comparison between separate
territories or different groups of the population, and for the construction of
graphs.
Results of examinations, after
their statistical processing, can be given as graphic
representations, in which numerical values are presented as drawings. Graphs
give a general characteristic of the phenomenon, reveal its general patterns,
and make it possible to analyze the research data more deeply.
They facilitate comparison of parameters, give
an idea of the structure and character of the connection between phenomena,
and indicate their tendencies.
Therefore, graphic demonstration is often combined
with graphic analysis, for which the graphic representation serves not only as a
means of demonstrating the results and conclusions of research, but also as a means of
analyzing the material obtained and revealing internal connections and
patterns.
In constructing graphs, the following are taken into account: the character of the data
to be represented, the intended use of the graph
(demonstration at a conference or lecture, reproduction in a scientific work,
etc.), the purpose of the graph (to show the results obtained, or
only to emphasize a particular pattern or fact), and the level of the audience to
which the graph is shown.
The choice of the kind of graphic representation, the colours,
the number of graphs, the size of print, etc. depends on all of this.
In all cases graphs should be clear, convenient and easy to read.
In medical statistical researches linear diagrams,
plane diagrams, cartograms and linear or coordinate are used.
LINEAR DIAGRAMS are graphs on which numerical values are displayed by curves
that make it possible to trace the dynamics of a phenomenon over time or to
establish the dependence of one attribute on another.
On linear diagrams with two or more
curves it is also possible to compare the values of two or more dynamic
series, and to establish how fluctuations in one series depend on changes
that occur in another.
Linear diagrams are constructed in a
system of rectangular coordinates, where the horizontal scale is plotted from
left to right along the axis of abscissas (X), and the vertical scale from
bottom to top along the axis of ordinates (Y). An obligatory requirement
for the construction of any graph is scale: the image in the drawing must
be reduced in proportion to the corresponding figures.
In contrast to linear
diagrams, which describe the dynamics of a process, plane diagrams are used
when it is necessary to represent statistical phenomena or facts that are
independent of one another.
The simplest example of a plane
diagram is a diagram made up of rectangles or other figures. Numerical values
on plane diagrams are represented by geometrical figures: rectangles, squares.
These diagrams are used for demonstration and popularization of the
data, and also when it is necessary to represent the structure of a
phenomenon at one moment of observation.
Examples are the age distribution of patients or
the structure of disease in a settlement.
Fig. 2.2 Age structure of the population (the share of each age group in the
total population).
In column diagrams numerical values are
represented by rectangular columns with identical bases and different
heights.
The height of a rectangle corresponds to the
relative value of the phenomenon being studied. To construct a
column diagram, a scale is used by which the height of each column can be
determined.
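As a rough sketch of how such a scale can be chosen, the snippet below makes
the largest value fill a given maximum height and sizes every other column
proportionally. The function name column_heights, the 100 mm maximum and the
sample values are assumptions made for illustration.

```python
def column_heights(values, max_height_mm=100.0):
    """Return (scale, heights): the largest value gets max_height_mm."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    scale = max_height_mm / largest  # millimetres per unit of the value
    return scale, [v * scale for v in values]


# Hypothetical relative values (for example, rates per 1000), illustration only.
scale, heights = column_heights([25, 50, 40])
print(scale, heights)
```

All columns share the same base width; only the height carries information.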
Column diagrams serve for the comparison of
several values. The rectangles that represent the values can also be
placed on the plane diagram horizontally rather than vertically, and
the result is a tape diagram (Fig. 4). In some
cases representing values as tapes (strips) is more convenient than as columns,
because it is easier to label each tape with a horizontal inscription.
With the aid of column and
tape diagrams it is possible not only to compare different values, but also
to display the structure of these values at the same time and to compare their
parts. For example, column or tape diagrams that show the distribution of
diseases by the main nosological forms can also
show the percentage of diseases among men and women.
For this purpose each rectangle (a column or a
tape) must be divided into two parts, which correspond to the
number of cases of disease among men and among women.
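The division of one column (or tape) into a part for men and a part for women
can be sketched as below. The function name split_column and the case counts
are hypothetical, and the sketch assumes the two parts are made proportional
to the numbers of cases.

```python
def split_column(total_height, cases_men, cases_women):
    """Divide one column into two parts proportional to the case counts."""
    total_cases = cases_men + cases_women
    if total_cases <= 0:
        raise ValueError("the total number of cases must be positive")
    men_part = total_height * cases_men / total_cases
    women_part = total_height - men_part
    return men_part, women_part


# Hypothetical column of height 80 for a disease with 30 male and 90 female cases.
print(split_column(80.0, 30, 90))
```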
Circular diagrams are used to display the ratio of
homogeneous absolute values.
They use the area of a circle rather than the
area of a rectangle.
It must be remembered, however, that the areas of
circles are related to one another as the squares of their radii. Therefore, when
constructing circular diagrams, we must extract the square root of the values
to be displayed and construct the radius on this basis; having the radius, it is
easy to draw the circle.
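Because area grows with the square of the radius, the radius of each circle
has to be proportional to the square root of the value it represents. A
minimal sketch follows; the function name circle_radii, the maximum radius of
50 units and the sample values are assumptions for illustration.

```python
import math


def circle_radii(values, max_radius=50.0):
    """Radii such that circle areas are proportional to the values."""
    largest = max(values)
    if largest <= 0:
        raise ValueError("at least one value must be positive")
    # Area ~ r**2, so r must be proportional to sqrt(value).
    return [max_radius * math.sqrt(v / largest) for v in values]


# Hypothetical absolute values: the second is a quarter of the first,
# so its radius is half as long, not a quarter.
print(circle_radii([100, 25]))
```

Scaling the radius directly by the value would exaggerate the differences,
since a value twice as large would then get a circle four times the area.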
If a circular diagram displays parts of
a whole, the circles should be drawn not separately from one another
but superimposed on one another. The whole and its parts can also be
shown as a circle divided into sectors: the sector diagram. When constructing
a sector diagram, the whole area of the circle is taken as 100 %, and each
sector occupies the part of the area that corresponds to the
required percentage.
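Since the whole circle (360 degrees) is taken as 100 %, each percent
corresponds to 3.6 degrees of the central angle. The sketch below converts
percentages into sector angles; the function name sector_angles and the sample
percentages are hypothetical.

```python
def sector_angles(percentages):
    """Convert percentages that sum to 100 into central angles in degrees."""
    total = sum(percentages)
    if abs(total - 100) > 1e-9:
        raise ValueError("percentages must sum to 100")
    return [p * 360 / 100 for p in percentages]


# Hypothetical structure of a phenomenon, illustration only.
print(sector_angles([50, 25, 25]))
```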
In practice, sector diagrams can be constructed
using not only the area of a circle, but also the area of a square
or a rectangle.
However, dividing such figures is often
harder than dividing a circle, and consequently they are rather
seldom used as a basis for sector diagrams.
Radial or linear-circular diagrams (Fig. 2.3) are constructed on the basis of polar coordinates,
in which the radius replaces the vertical scale of diagrams that are based on
the system of rectangular coordinates.
An example of the radial diagram is the
wind rose, with the aid of which changes in wind direction during a calendar
period (month, year) are represented on maps.
Radial diagrams are used to
illustrate seasonal fluctuations of values, for example morbidity or
mortality rates.
These diagrams are constructed on a circle from the
centre of which 12 radii are drawn. Adjacent radii cut off from the circle
an arc of 30° (360/12 = 30), and each radius represents the ordinate of one of
the calendar months: January, February, March, etc.
The centre of the circle is taken as
the initial zero point, and then, according to the scale chosen beforehand,
the values that reflect the intensity of seasonal fluctuations of the
phenomenon in each calendar month are marked off along the radii.
Connecting the marked points gives a
closed line that shows the seasonal fluctuations.
When building radial diagrams, it is necessary to
remember the rule that the radii are counted from the top of the diagram,
clockwise.
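The construction of the closed line can be sketched by computing the point on
each of the 12 radii. The sketch assumes that the first month lies on the
radius pointing straight up and that the following months are placed clockwise
at 30 degree steps; the function name radial_points and the sample values are
hypothetical.

```python
import math


def radial_points(monthly_values, scale=1.0):
    """Return the (x, y) vertices of the closed line for 12 monthly values.

    The centre of the circle is the zero point. Radius 0 points to the top
    and later radii follow clockwise (an assumption of this sketch).
    """
    if len(monthly_values) != 12:
        raise ValueError("exactly 12 monthly values are required")
    points = []
    for k, value in enumerate(monthly_values):
        angle = math.radians(30 * k)  # 360 / 12 = 30 degrees between radii
        r = value * scale             # distance from the centre along the radius
        # Angle is measured clockwise from the vertical axis.
        points.append((r * math.sin(angle), r * math.cos(angle)))
    return points


# Hypothetical monthly intensities, illustration only.
values = [12, 11, 9, 7, 5, 4, 4, 5, 7, 9, 11, 12]
polygon = radial_points(values)
polygon.append(polygon[0])  # repeat the first point to close the line
```

The list of points can then be passed to any plotting tool that draws a
polyline.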
Fig. 2.3 The radial diagram.
Where it is necessary to compare different phenomena by
territory, cartograms are built. These are
geographical maps on which graphic symbols show the intensity
of distribution and grouping of a phenomenon (morbidity, mortality, etc.)
for a given period of time (Fig. 2.4).
Therefore they are best built on
simplified maps on which only administrative borders and some large settlements
are shown. In constructing a cartogram, the grouping of the phenomena
displayed is of great importance.
The simplest grouping is the division of the
parameters into a group with values below average and a group with values
above average. According to this division, districts with
above-average parameters are shaded on the cartogram, and those below average are not shaded.
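This simplest two-group division can be sketched as follows. The function name
shade_groups, the district names and the rates are hypothetical; treating a
district exactly at the average as not shaded is an assumption of the sketch,
since the text does not cover that case.

```python
def shade_groups(rates):
    """Map each district to 'shaded' (above average) or 'not shaded'."""
    if not rates:
        raise ValueError("at least one district is required")
    average = sum(rates.values()) / len(rates)
    return {
        district: "shaded" if rate > average else "not shaded"
        for district, rate in rates.items()
    }


# Hypothetical morbidity rates by district, illustration only.
rates = {"District A": 10, "District B": 20, "District C": 30}
print(shade_groups(rates))
```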