Key:
√ (also printed as v) - Square Root
Σ (also printed as ?) - Sigma
Define Statistics.
By statistics we mean aggregates of facts affected to a marked extent by multiplicity
of causes, numerically expressed, enumerated or estimated according to reasonable
standards of accuracy, collected in a systematic manner for a pre-determined
purpose and placed in relation to each other.
Statistics may also be defined
as the science of collection, organization, presentation, analysis and
interpretation of numerical data.
Functions of statistics?
1. It presents facts in a definite form.
2. It simplifies a mass of figures.
3. It facilitates comparison.
4. It helps in formulating and testing hypotheses.
5. It helps in prediction.
6. It helps in the formulation of suitable policies.
What are the main reasons for distrust in statistics?
1. Figures are convincing and, therefore, people are easily led to believe them.
2. They can be manipulated in such a manner as to establish foregone conclusions.
3. Even if correct figures are used, they may be presented in such a manner that the reader is misled.
What is a unit?
The unit in terms of which the investigator counts or measures the variables or attributes selected for enumeration, analysis and interpretation is known as a ‘statistical unit.’ For example, in a population census the statistical unit is a person. Similarly, if the number of houses in a particular area is counted, then the unit is the house.
What are the units of collection?
These are the units in terms of which data are collected. They involve either counting or measurement. In the process of collection, therefore, one may deal either with discrete entities and events relating to them, which are counted (persons, houses, livestock, number of accidents, number of deaths), or with measurable quantities and value units, such as tonnes, kilograms and litres.
What is a frame?
The term ‘frame’ or population frame refers to a listing of all units
in the population under study. The identification of the units in a population
under study is often a difficult task. If we want to find out the capital invested
and number of workers working in small-scale industries in Delhi, we must have
a complete list of names and addresses of all the small-scale firms. The list
of names and addresses will be called a frame. The whole structure of enquiry
is to a considerable extent determined by the frame.
What is primary data? What are the sources of primary data?
Primary data are obtained by a study specifically designed to fulfil the data
needs of the problem at hand. Such data are original in character and are generated
in a large number of surveys conducted mostly by government and also by some individuals,
institutions and research bodies. For example, data obtained in a population
census by the office of the Registrar General and Census Commissioner, Ministry
of Home Affairs, are primary data.
What is secondary data? What are the sources of secondary data?
Data, which are not originally collected but rather obtained from published
or unpublished sources, are known as secondary data. For example, for the office
of the Registrar General and Census Commissioner the census data are primary
whereas for all others who use such data, they are secondary. The secondary
data constitute the chief material on the basis of which statistical work is
carried out in many investigations.
When should we choose primary or secondary data?
The investigator must decide at the outset whether he will use primary data
or secondary data in an investigation. The choice between the two depends mainly
on the following considerations:
1. Nature and scope of the enquiry
2. Availability of financial resources
3. Availability of time
4. Degree of accuracy desired, and
5. The collecting agency, i.e., whether an individual, an institution or a government body.
What is a census?
Under the census or complete enumeration survey method, data are collected for
each and every unit (person, household, field, shop, factory, etc., as the case
may be) of the population or universe, which is the complete set of items, which
are of interest in any particular situation.
What does sampling mean?
Sampling is simply the process of learning about the population on the basis
of a sample drawn from it. Thus, in the sampling technique, instead of every unit
of the universe only a part of the universe is studied and the conclusions are
drawn on that basis for the entire universe. A sample is a subset of population
units.
Define the law of statistical regularity.
The law of statistical regularity lays down that a moderately large number of
items chosen at random from a large group are almost sure on the average to
possess the characteristics of the large group.
Define the law of inertia of large numbers.
It states that, other things being equal, the larger the size of the sample, the more
accurate the results are likely to be. This is because large numbers are more
stable than small ones.
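The effect of sample size can be checked exactly on a toy population, without any randomness. The sketch below is illustrative only: the population (the six faces of a die) and the helper name `spread_of_sample_means` are assumptions, not part of these notes. It enumerates every possible sample of a given size and measures how widely the sample means scatter.

```python
from itertools import combinations
from statistics import mean, pstdev

# Hypothetical toy population: the six faces of a die.
population = [1, 2, 3, 4, 5, 6]

def spread_of_sample_means(population, size):
    """Standard deviation of the means of all possible samples of `size`."""
    sample_means = [mean(s) for s in combinations(population, size)]
    return pstdev(sample_means)

# The spread of the sample means shrinks as the sample size grows.
spreads = {size: spread_of_sample_means(population, size) for size in (2, 3, 4, 5)}
for size, spread in spreads.items():
    print(size, round(spread, 3))
```

The printed spread falls steadily from size 2 to size 5, which is the sense in which larger samples are more stable.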
How do you classify data?
The data can be classified on the following four bases:
1. Geographical, i.e., area-wise, e.g., cities, districts, etc.
2. Chronological, i.e., on the basis of time.
3. Qualitative, i.e., according to some attributes.
4. Quantitative, i.e., in terms of magnitudes.
What are class limits?
The class limits are the lowest and the highest values that can be included
in the class. For example, take the class 20-40. The lowest value of the class
is 20 and the highest 40. The two boundaries of the class are known as the lower
limit and the upper limit of the class.
Mention the parts of the table.
Table number, title of the table, caption, stub, body of the table, head-note,
footnote.
Give a brief note on rectangle diagrams.
Since the area of a rectangle is equal to the product of its length and width,
both the length and the breadth are taken into account while constructing a rectangle diagram.
What is a false base line (broken line)?
In order to show small variations among very large values on the same graph,
we break the axis (usually the y-axis) between zero and the lowest value plotted. This broken line is known as the false base line.
What is frequency polygon and frequency curve?
A frequency polygon is a graph of a frequency distribution in which the mid-points
of the tops of the histogram bars are joined by straight lines drawn with a scale (ruler).
If the same mid-points are joined by a smooth freehand curve, without using a scale,
the result is known as a frequency curve.
What do you mean by Ogive?
An ogive is a graph of cumulative frequencies, used to locate medians, quartiles,
deciles and percentiles graphically. It is also known as a cumulative frequency curve. It has two types:
1. Less than cumulative frequency curve, and
2. More than cumulative frequency curve.
It is also useful for comparing two or more frequency distributions.
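The two kinds of cumulative frequency can be sketched as follows. The frequency distribution is made up for illustration; the plotted points would be (upper limit, less-than total) and (lower limit, more-than total).

```python
from itertools import accumulate

# Hypothetical frequency distribution: (lower limit, upper limit, frequency).
classes = [(0, 20, 5), (20, 40, 12), (40, 60, 20), (60, 80, 9), (80, 100, 4)]
frequencies = [f for _, _, f in classes]

# "Less than" ogive: cumulate from the first class, plot against upper limits.
less_than = list(zip((upper for _, upper, _ in classes), accumulate(frequencies)))

# "More than" ogive: cumulate from the last class, plot against lower limits.
more_than_counts = list(accumulate(reversed(frequencies)))[::-1]
more_than = list(zip((lower for lower, _, _ in classes), more_than_counts))

print(less_than)   # [(20, 5), (40, 17), (60, 37), (80, 46), (100, 50)]
print(more_than)   # [(0, 50), (20, 45), (40, 33), (60, 13), (80, 4)]
```

When both curves are drawn on the same axes they cross at the median, which is why the ogive is used to read off positional values.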
What are the requisites of a good average?
Easy to understand, simple to compute, based on all the items, not unduly affected
by extreme values, rigidly defined, capable of further algebraic treatment,
and sampling stability.
What are the advantages & disadvantages of median?
Advantages:
1. Median is not affected by extreme values.
2. For open-ended classes we can calculate median.
3. It is the most appropriate average in dealing with qualitative data.
Disadvantages:
1. It does not consider all the values because it is a positional average.
2. It is not capable of further algebraic treatment.
3. The value of median is affected more by sampling fluctuations.
Give the merits and demerits of mode.
Merits:
1. Mode is the most representative value of a distribution; it is useful, for example, to calculate the modal wage.
2. For open-ended distributions we can calculate mode.
3. We can also calculate mode by using a graph.
Limitations:
1. Mode cannot be calculated when the frequency distribution is ill-defined.
2. It is not based on all observations.
3. It is not a rigidly defined measure because several formulae are used to calculate mode.
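What is the relationship between mean, median and mode? Give the formula.
In a symmetrical distribution (e.g., a normal distribution) Mean = Median = Mode.
In a moderately asymmetrical distribution the median lies in the middle, while the mean and
the mode interchange their positions depending on the direction of skewness. The following empirical relationship then holds approximately:
Mode = 3 Median - 2 Mean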
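As a quick arithmetic check of the formula, here is a minimal sketch. The function name `empirical_mode` and the figures are hypothetical, and the result is only an estimate that is reasonable for moderately skewed data.

```python
def empirical_mode(mean, median):
    """Mode estimated from the empirical relation Mode = 3 Median - 2 Mean."""
    return 3 * median - 2 * mean

# Hypothetical figures: mean 25, median 24.
print(empirical_mode(mean=25.0, median=24.0))  # 3*24 - 2*25 = 22.0
```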
Define geometric mean and mention its disadvantages and uses.
It is defined as the nth root of the product of n items or values,
i.e., G.M. = antilog of [(log x1 + log x2 + log x3 + … + log xn) / n]
Disadvantages: it is difficult to understand and difficult to compute, and it cannot be computed when one of the values is 0 or negative.
Uses: it is used to find average growth rates and in the
construction of index numbers. When large weight has to be given to small items
and small weight to large items, the best average is the geometric mean.
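The log formula translates directly into code. This is a minimal sketch; it uses natural logarithms instead of common logarithms, which gives the same result because the antilog is taken in the same base. The function name is an illustrative choice.

```python
import math

def geometric_mean(values):
    """nth root of the product, computed as the antilog of the mean log."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

print(round(geometric_mean([9, 16]), 6))  # 12.0
```

The explicit check mirrors the limitation stated above: a zero or negative value has no real logarithm.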
Define harmonic mean.
It is defined as the reciprocal of the arithmetic mean of the reciprocals of
the individual observations.
H.M. = N / (1/x1 + 1/x2 + 1/x3 + … + 1/xN)
Uses of harmonic mean?
If a variable is expressed in terms of two measurements taken together, i.e., as a rate, we use the harmonic
mean. Examples: tonne mileage, speed per hour, passenger kilometre. In
tonne mileage, for instance, tonne is one measurement and mileage is another.
We use this average to calculate average speed.
Limitations of harmonic mean?
It gives the largest weight to the smallest items. Its value cannot be computed when
there are both positive and negative items in a series or when one or more items
are zero.
What is the relation between arithmetic mean, geometric mean and harmonic
mean?
A.M. >= G.M. >= H.M. (for positive values)
G.M. = √(A.M. × H.M.) (this holds exactly for two observations)
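Both relations can be verified numerically. The sketch below uses the pair 9 and 16 from Problem 1 of these notes; the variable names are illustrative.

```python
import math

values = [9, 16]  # the pair used in Problem 1 of these notes

am = sum(values) / len(values)                  # arithmetic mean
gm = math.prod(values) ** (1 / len(values))     # geometric mean
hm = len(values) / sum(1 / v for v in values)   # harmonic mean

print(am, gm, hm)
print(am >= gm >= hm)                            # ordering of the three means
print(math.isclose(gm, math.sqrt(am * hm)))      # exact only for two observations
```

With more than two observations the ordering still holds for positive values, but the square-root identity is generally only approximate.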
What is dispersion?
Dispersion is the measure of variation of items; it measures the extent to which
the items vary from a central value. It is also known as an average of the second
order. Its measures include range, mean deviation, quartile deviation, and standard deviation.
Limitations of range?
It is not based on each and every item of the distribution. It is subject to
fluctuations of considerable magnitude from sample to sample. Range cannot be
computed in open-ended distributions.
What are the uses of range?
Useful in weather forecasting, i.e., maximum and minimum temperature, quality
control and fluctuation in share prices.
Define standard deviation.
It is one of the measures of dispersion and is defined as the square root
of the mean of the squared deviations from the arithmetic mean.
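The definition can be followed step by step in code. This is a minimal sketch with made-up data; it divides by the number of items N, as the definition above implies, not by N - 1.

```python
import math

def standard_deviation(values):
    """Root of the mean of squared deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Hypothetical data with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_deviation(data))  # 2.0
```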
Distinguish between mean deviation and standard deviation.
1. Algebraic signs are ignored while calculating mean deviations whereas standard deviation takes into account algebraic sign also.
2. Mean deviation can be computed either from median or mean whereas standard deviation is computed always from arithmetic mean.
What do you mean by co-efficient of variation and mention the uses
of co-efficient of variation?
The co-efficient of variation is a relative measure of dispersion, whereas the standard
deviation is an absolute measure of dispersion: C.V. = (S.D. / Mean) × 100. It is used to compare
the variability, consistency and reliability of two or more series.
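A comparison of consistency might look like the sketch below. The two series of scores are invented for illustration; the series with the smaller co-efficient of variation is the more consistent one.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = S.D. x 100 / mean, expressed as a percentage."""
    return pstdev(values) * 100 / mean(values)

# Hypothetical scores of two batsmen over five innings (both average 50).
batsman_a = [48, 50, 52, 49, 51]
batsman_b = [10, 90, 35, 75, 40]

cv_a = coefficient_of_variation(batsman_a)
cv_b = coefficient_of_variation(batsman_b)
print(round(cv_a, 2), round(cv_b, 2))
print("more consistent:", "A" if cv_a < cv_b else "B")
```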
What do you mean by skewness?
Skewness refers to asymmetry or lack of symmetry in the shape of the frequency
distributions. The measure of skewness tells us the direction and the extent
of skewness.
Distinguish between dispersion and skewness.
Dispersion (scalar quantity) is concerned with the amount of variation rather
than with its direction whereas skewness tells us about the direction of the
variation or departure from symmetry (vector quantity).
Variation is the most important characteristic
of a distribution, whereas skewness is rarely calculated in economic and business
series.
What are the tests of skewness?
1. Mean, median and mode do not coincide.
2. The frequencies, when plotted, do not produce a normal bell-shaped curve.
3. The distance from the median to the first quartile is not equal to the distance from the third quartile to the median.
4. The sum of the positive deviations from the median is not equal to the sum of the negative deviations.
Define correlation.
Correlation is the degree of the relationship between two or more variables.
It does not explain the cause behind the relationship.
Indicate the type of correlation in the following situations.
1. -1 (Ans) Perfect negative correlation
2. +1 (Ans) Perfect positive correlation
3. +0.49 (Ans) Low degree positive correlation
4. -0.78 (Ans) Moderate degree negative correlation
Method of Measuring Correlation?
±1 : Perfect correlation
±0.8 to ±0.99 : High degree correlation
±0.5 to ±0.79 : Moderate degree correlation
Below ±0.5 : Low degree correlation
Exactly 0 : No correlation
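The bands above can be turned into a small lookup. This is a sketch under one stated assumption: values that fall between the listed bands (for example 0.795) are assigned to the lower band. The function name is illustrative.

```python
def degree_of_correlation(r):
    """Label r using the bands listed above."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "No correlation"
    size = abs(r)
    if size == 1:
        degree = "Perfect"
    elif size >= 0.8:
        degree = "High degree"
    elif size >= 0.5:
        degree = "Moderate degree"
    else:
        degree = "Low degree"
    direction = "positive" if r > 0 else "negative"
    return f"{degree} {direction} correlation"

# The four values from the exercise above.
for r in (-1, 1, 0.49, -0.78):
    print(r, "->", degree_of_correlation(r))
```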
What is scatter diagram?
It is a graphical method of studying the relationship between two
variables. Here we take the independent variable on the x-axis and the dependent variable
on the y-axis, plot the pairs of values of x and y on the graph and see the
direction in which the points move. If the points move upwards from left to right
we say there is positive correlation; if they move downwards we say there is negative correlation.
What is spurious correlation or nonsense correlation?
Mathematically we may find a high, or even perfect, correlation between two variables
even though in reality there is no relationship between them. Such correlation is known
as spurious correlation. E.g., size of shoe and intelligence of a person.
What is probable error?
The probable error indicates the reliability of a co-efficient of correlation
computed from a sample, i.e., how far the result can be expected to hold for other samples
taken from the same population. If the value of r is less than the probable
error, there is no evidence of correlation. If r is more than 6 times the probable
error, the correlation is certain. r ± probable error gives the upper limit
and lower limit of the likely correlation of other samples taken from the same population.
P.E. = 0.6745 × (1 - r^2) / √n
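The formula and the two rules of interpretation can be sketched as follows. The sample figures (r = 0.8 from 64 pairs) are hypothetical.

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n) for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

# Hypothetical sample: r = 0.8 computed from n = 64 pairs.
r, n = 0.8, 64
pe = probable_error(r, n)
print(round(pe, 4))                          # 0.0304
print(r > 6 * pe)                            # True: r exceeds six times the P.E.
print(round(r - pe, 4), round(r + pe, 4))    # likely limits for other samples
```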
What is regression analysis?
Regression is the measure of the average relationship between two or more variables
in terms of the original unit of the data. It is one of the forecasting techniques
to predict the dependent variable due to change in the independent variable.
Distinguish between regression and correlation.
1. Correlation explains the degree of relationship, whereas regression explains the nature of the relationship.
2. Correlation does not explain the cause behind the relationship whereas regression studies the cause and effect relationship.
3. Correlation co-efficient is independent of change of scale and origin, whereas regression co-efficients are independent of change of origin but not of scale.
What are the characteristics of regression co-efficients?
1. Both regression co-efficients will have the same sign.
2. If one regression co-efficient is above unity, then the other regression co-efficient should be below unity.
3. If both the regression co-efficients are negative, the correlation co-efficient should be negative.
4. Regression co-efficients are independent of change of origin but not of scale.
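The sign rule matters when the correlation co-efficient is recovered from the two regression co-efficients, as in Problem 3 of these notes. A minimal sketch (the function name is illustrative) takes the square root of their product and attaches the common sign:

```python
import math

def correlation_from_regression(bxy, byx):
    """r = +/- sqrt(bxy * byx), taking the common sign of the co-efficients."""
    if bxy * byx < 0:
        raise ValueError("regression co-efficients must have the same sign")
    magnitude = math.sqrt(bxy * byx)
    return -magnitude if bxy < 0 else magnitude

print(round(correlation_from_regression(-1.2, -0.075), 4))  # -0.3
```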
What do you mean by index numbers?
Index number is an indicator of changes in prices and quantities. It is a measure
designed to show changes in one variable or in the group of related variables
over time or with respect to geographical location or other characteristics.
It is also an indicator of inflationary or deflationary tendencies.
Mention few uses of index numbers.
1. They reveal trends and tendencies.
2. They are important in forecasting future economic activities.
3. Index numbers are used by the government to calculate dearness allowance, which helps to avoid industrial disputes.
4. The real worth of money can be identified by deflating actual money values.
What are the problems in the constructions of index numbers?
1. Deciding the purpose of the index.
2. Deciding the base year.
3. Selection of number of items.
4. Selection of price (wholesale/retail).
5. Choice of average (arithmetic/geometric).
6. Selection of appropriate weights.
7. Selection of appropriate formula (Fisher’s/Laspeyres’).
What is time reversal test?
The test is that the formula for calculating the index number should be such
that it will give the same ratio between one point of comparison and the other,
no matter which of the two is taken as base. Symbolically, P01 × P10 = 1.
What is factor reversal test?
Just as each formula should permit the interchange of the two items without
giving inconsistent results, so it ought to permit interchanging prices and
quantities without giving inconsistent results, i.e. the two results multiplied
together should give the true value ratio. In other words, the change in price
multiplied by change in quantity should be equal to the total change in value.
P01 × Q01 = Σp1q1 / Σp0q0
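Both reversal tests can be checked numerically. The sketch below rests on assumptions not spelled out in these notes: it uses the standard Laspeyres form (base-period weights) and the Fisher form (geometric mean of the Laspeyres and Paasche forms), expresses indices as ratios without the factor of 100, and uses invented prices and quantities.

```python
import math

# Hypothetical prices (p) and quantities (q) in base year 0 and current year 1.
p0, q0 = [4, 6, 10], [10, 8, 5]
p1, q1 = [5, 9, 12], [12, 7, 6]

def total(prices, quantities):
    """Sum of price x quantity over all items."""
    return sum(p * q for p, q in zip(prices, quantities))

def laspeyres(pa, qa, pb, qb):
    """Index for period b on base a, weighted by base-period quantities."""
    return total(pb, qa) / total(pa, qa)

def fisher(pa, qa, pb, qb):
    """Geometric mean of the Laspeyres and Paasche forms."""
    paasche = total(pb, qb) / total(pa, qb)
    return math.sqrt(laspeyres(pa, qa, pb, qb) * paasche)

value_ratio = total(p1, q1) / total(p0, q0)

# Time reversal: P01 x P10 should equal 1.
print(fisher(p0, q0, p1, q1) * fisher(p1, q1, p0, q0))        # Fisher passes
print(laspeyres(p0, q0, p1, q1) * laspeyres(p1, q1, p0, q0))  # Laspeyres does not

# Factor reversal: P01 x Q01 should equal the value ratio.
# The quantity index is obtained by swapping the roles of prices and quantities.
print(fisher(p0, q0, p1, q1) * fisher(q0, p0, q1, p1), value_ratio)
```

With these figures the Laspeyres product differs from 1, while the Fisher form satisfies both tests up to rounding.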
Define time series.
A time series may be defined as a collection of magnitudes, belonging to different
time periods, of some variable or composite of variables such as production of
steel, per capita income, GNP or industrial production.
What are the utilities of time series analysis?
1. It helps in understanding past behaviour and is useful for prediction of future.
2. It facilitates comparison.
3. The various components of time series are useful to study the effective change under each component.
4. The reasons for variation can be studied by comparing actual with expected results.
What are the components of time series?
1. Secular trend
2. Seasonal variation
3. Cyclical variation
4. Irregular variation or erratic movement
What is secular trend?
Secular trend is a long term trend which has the basic tendency to grow or decline
over a period of time. It may be due to population change, technological progress,
large scale shifts in consumer tastes, discovery of new things, etc.
What is seasonal variation?
Seasonal variations are those periodic movements in business activity, which
occur regularly every year and have their origin in the nature of the year itself.
It may be due to climate, weather conditions, customs, traditions and habits,
festivals, etc.
What is cyclical variation?
The term cycle refers to the recurrent variations in time series that usually
last longer than a year and are regular neither in amplitude nor in length.
Cyclical fluctuations are long-term movements that represent consistently recurring
rises and declines in activity. A cycle passes through four phases:
1. Prosperity
2. Decline
3. Depression
4. Improvement
What is irregular variation or erratic movement?
It is the variation in business activity that does not repeat in a definite
pattern. It is caused by events such as floods, earthquakes, strikes and wars.
What is the least squares method of fitting a trend?
Under this method a trend line is fitted so that the sum of the deviations of the actual
values (Y) from the computed trend values (Yc) is zero and the sum of the squares of these
deviations is the least. Such a line is known as the least squares trend line.
That is, Σ(Y - Yc) = 0 and Σ(Y - Yc)^2 is least.
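A straight-line trend Yc = a + bX can be fitted in a few lines. This is a sketch with invented data; measuring X as deviations from the middle year is a common shortcut assumed here, since it makes the sum of X zero and simplifies the formulas for a and b.

```python
# Hypothetical annual production figures.
years = [2001, 2002, 2003, 2004, 2005]
production = [12, 15, 19, 22, 27]

n = len(years)
mid_year = sum(years) / n
x = [year - mid_year for year in years]  # deviations from the middle year

# With sum(x) = 0 the normal equations reduce to:
a = sum(production) / n
b = sum(xi * yi for xi, yi in zip(x, production)) / sum(xi ** 2 for xi in x)

trend = [a + b * xi for xi in x]  # computed trend values Yc
residuals = [y - yc for y, yc in zip(production, trend)]

print(a, b)
print(round(sum(residuals), 10))                  # sum of (Y - Yc), zero
print(round(sum(e ** 2 for e in residuals), 4))   # sum of (Y - Yc)^2
```

Any other straight line through the same data gives a larger sum of squared deviations, which is the sense in which this line is "least".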
What is interpolation?
Interpolation is the method of finding a value within the given scope of the
series. Example:
| Year       | 1993 | 1994 | 1995 | 1996 | 1997 |
| Production | 10   | 20   | ?    | 35   | 50   |
The value for 1995 can be estimated by using certain methods. Such a calculation
is known as interpolation because 1995 lies within the scope of the series, 1993-1997.
What is extrapolation?
The method of finding the value outside the scope of the given series is known
as extrapolation. Suppose in the previous example if we calculate the value
for 1999 or 1992, which goes beyond the scope of the series of 1993-1997, such
calculation is known as extrapolation.
Distinguish between interpolation and extrapolation.
Refer to the answers above
What are the assumptions of interpolation and extrapolation?
There are no sudden jumps in the series from one period to another, i.e., the series shows
some sort of continuity, and there will be no sudden change in the future either.
The rate of change of figures from one period to another is uniform.
Expand (y - 1)^n = y^n - n y^(n-1) + [n (n - 1) / 2!] y^(n-2) - [n (n - 1)(n - 2) / 3!] y^(n-3) + … = 0
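The expansion can be applied to the production table given earlier. The sketch below assumes the usual reading of the binomial expansion method, which these notes do not spell out: n is the number of known values, and each power of y is read as a position in the series (y^n is the last value, y^0 the first). The function name is illustrative.

```python
from math import comb

def interpolate_missing(values):
    """Fill the single None in equally spaced `values` by setting the
    expansion of (y - 1)^n to zero, where n is the number of known values."""
    missing = values.index(None)
    n = len(values) - 1
    # Coefficient attached to the value at position k in the expansion.
    coeffs = [(-1) ** (n - k) * comb(n, k) for k in range(n + 1)]
    known_sum = sum(c * v for c, v in zip(coeffs, values) if v is not None)
    return -known_sum / coeffs[missing]

production = [10, 20, None, 35, 50]   # 1993 to 1997, with 1995 missing
print(round(interpolate_missing(production), 2))  # 26.67
```

Here n = 4, so the equation is y4 - 4 y3 + 6 y2 - 4 y1 + y0 = 0, which gives 6 y2 = 160 for the missing 1995 figure.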
Problem 1: Calculate G.M. for 9 and 16
G.M. = √(9 × 16) = √144 = 12.
Problem 2: Calculate H.M. if arithmetic mean = 25, G.M.=20.
Ans. G.M. = √(A.M. × H.M.)
H.M. = (G.M.)^2 / A.M.
H.M. = (20)^2 / 25
H.M. = 400 / 25
H.M. = 16.
Problem 3: Calculate correlation co-efficient, if bxy = -1.2, byx = -0.075
Ans. r = ±√(bxy × byx) = √(-1.2 × -0.075) = √0.09 = 0.3
Note: Since both the regression co-efficients are negative, correlation co-efficient should be negative. Therefore
r = -0.3
Problem 4: If C.V. = 20% and mean = 10, calculate standard
deviation and variance.
Ans. C.V. = S.D. × 100 / Mean
S.D. = C.V. × Mean / 100 = 20 × 10 / 100 = 2
Variance = (S.D.)^2 = 2^2 = 4
Problem 5: Calculate quartile deviation and co-efficient of quartile
deviation if Q1=20, Q3 = 40
Ans. Q.D. = (Q3 - Q1) / 2 = (40 - 20) / 2 = 10
Co-efficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1)
= (40 - 20) / (40 + 20) = 20/60 = 0.3333
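The two calculations in Problem 5 can be written as small functions, which also makes the placement of the brackets explicit. The function names are illustrative.

```python
def quartile_deviation(q1, q3):
    """Half the inter-quartile range."""
    return (q3 - q1) / 2

def coefficient_of_quartile_deviation(q1, q3):
    """Relative measure: (Q3 - Q1) / (Q3 + Q1)."""
    return (q3 - q1) / (q3 + q1)

print(quartile_deviation(20, 40))                            # 10.0
print(round(coefficient_of_quartile_deviation(20, 40), 4))   # 0.3333
```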