Business Statistics

Key:
v - Square Root
? - Sigma


Define Statistics.
By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a pre-determined purpose and placed in relation to each other.

Statistics may also be defined as the science of collection, organization, presentation, analysis and interpretation of numerical data.


Functions of statistics?
1. It presents facts in a definite form.
2. It simplifies masses of figures.
3. It facilitates comparison.
4. It helps in formulating and testing hypotheses.
5. It helps in prediction.
6. It helps in the formulation of suitable policies.


What are the main reasons for distrust in statistics?
1. Figures are convincing and, therefore, people are easily led to believe them.
2. They can be manipulated in such a manner as to establish foregone conclusions.
3. Even if correct figures are used they may be presented in such a manner that the reader is misled.


What is a unit?

The unit in terms of which the investigator counts or measures the variables or attributes selected for enumeration, analysis and interpretation is known as a ‘statistical unit.’ For example, in a population census the statistical unit is a person. Similarly, if the number of houses in a particular area is counted, then the unit is the house.


What are the units of collection?
These are the units in terms of which data are collected. They involve either counting or measurement, the former being used for discrete items and the latter for continuous quantities. In the process of collection, therefore, one may deal either with discrete entities and events relating to them, as in the case of persons, houses, livestock, number of accidents and number of deaths, or with measurable quantities and value units, such as tonnes, kilograms and litres.
 


What is a frame?
The term ‘frame’ or population frame refers to a listing of all the units in the population under study. Identifying the units in a population is often a difficult task. If we want to find out the capital invested and the number of workers employed in small-scale industries in Delhi, we must have a complete list of the names and addresses of all the small-scale firms. This list of names and addresses is called a frame. The whole structure of the enquiry is to a considerable extent determined by the frame.


What is primary data? What are the sources of primary data?
Primary data are obtained by a study specifically designed to fulfil the data needs of the problem at hand. Such data are original in character and are generated in a large number of surveys conducted mostly by government and also by some individuals, institutions and research bodies. For example, data obtained in a population census by the office of the Registrar General and Census Commissioner, Ministry of Home Affairs, are primary data.


What is secondary data? What are the sources of secondary data?
Data which are not originally collected but rather obtained from published or unpublished sources are known as secondary data. For example, for the office of the Registrar General and Census Commissioner the census data are primary, whereas for all others who use such data they are secondary. Secondary data constitute the chief material on the basis of which statistical work is carried out in many investigations.


When should we choose primary or secondary data?
The investigator must decide at the outset whether he will use primary data or secondary data in an investigation. The choice between the two depends mainly on the following considerations:
1. Nature and scope of the enquiry
2. Availability of financial resources
3. Availability of time
4. Degree of accuracy desired, and
5. The collecting agency, i.e., whether an individual, an institution or a government body.


What is census?
Under the census or complete enumeration survey method, data are collected for each and every unit (person, household, field, shop, factory, etc., as the case may be) of the population or universe, which is the complete set of items, which are of interest in any particular situation.


What does sampling mean?
Sampling is simply the process of learning about the population on the basis of a sample drawn from it. Thus, in the sampling technique, only a part of the universe is studied instead of every unit, and conclusions are drawn on that basis for the entire universe. A sample is a subset of the population units.


Define the law of statistical regularity.
The law of statistical regularity lays down that a moderately large number of items chosen at random from a large group are almost sure on the average to possess the characteristics of the large group.


Define the law of inertia of large numbers.
It states that, other things being equal, the larger the size of the sample, the more accurate the results are likely to be. This is because large numbers are more stable than small ones.


How do you classify data?
The data can be classified on the following four bases:
1. Geographical, i.e., area-wise, e.g., cities, districts, etc.
2. Chronological, i.e., on the basis of time.
3. Qualitative, i.e., according to some attributes.
4. Quantitative, i.e., in terms of magnitudes.


What are class limits?
The class limits are the lowest and the highest values that can be included in the class. For example, take the class 20-40. The lowest value of the class is 20 and the highest 40. The two boundaries of class are known as the lower limit and the upper limit of the class.


Mention the parts of the table.
Table number, title of the table, caption, stub, body of the table, head-note, footnote.


Give a brief note on rectangle.
Since the area of a rectangle is equal to the product of its length and breadth, both dimensions are taken into account while constructing a rectangle diagram.


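What is a false base line (broken line)?
When the values to be plotted are large but the differences among them are small, the scale is broken between zero and the smallest value so that the fluctuations can be shown clearly on the same graph. The break is usually made on the y-axis. This broken line is known as a false base line.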


What is frequency polygon and frequency curve?
A frequency polygon is a graph of a frequency distribution obtained by joining the mid-points of the tops of the bars of a histogram with straight lines (drawn with a scale, i.e. a ruler). If the same mid-points are joined by a smooth freehand curve instead of straight lines, the graph is known as a frequency curve.


What do you mean by Ogive?
An ogive is a graph of cumulative frequencies, used to locate medians, quartiles, deciles and percentiles graphically. It is also known as a cumulative frequency curve. It has two types:
1. Less than, and
2. More than cumulative frequency curves.
It is also useful for comparing two or more frequency distributions.
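
The two kinds of cumulative frequency that an ogive plots can be sketched as follows. The class intervals and frequencies are hypothetical and chosen only for illustration; they do not come from these notes.

```python
from itertools import accumulate

# Hypothetical grouped data: (lower limit, upper limit) and class frequencies.
classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
frequencies = [4, 6, 10, 7, 3]

# "Less than" ogive: running total of frequencies up to each upper limit.
less_than = list(accumulate(frequencies))

# "More than" ogive: observations at or above each lower limit,
# i.e. the total minus everything accumulated before the class.
total = sum(frequencies)
more_than = [total - before for before in [0] + less_than[:-1]]

for (lower, upper), lt, mt in zip(classes, less_than, more_than):
    print(f"less than {upper}: {lt:2d}    more than {lower}: {mt:2d}")
```

Plotting the "less than" totals against the upper limits and the "more than" totals against the lower limits gives the two curves; they cross at the median.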


What are the requisites of a good average?
Easy to understand, simple to compute, based on all the items, not unduly affected by extreme values, rigidly defined, capable of further algebraic treatment, and sampling stability.


What are the advantages & disadvantages of median?
Advantages:
1. Median is not affected by extreme values.
2. For open-ended classes we can calculate median.
3. It is the most appropriate average in dealing with qualitative data.

Disadvantages:
1. It does not consider all the observations because it is a positional average.
2. It is not capable of further algebraic treatment.
3. The value of median is affected more by sampling fluctuations than the value of the arithmetic mean.


Give the merits and demerits of mode.
Merits:
1. Mode is the most representative value of a distribution; it is useful, for example, for calculating the modal wage.
2. For open-ended distributions we can calculate mode.
3. We can also locate mode graphically.

Limitations:
1. Mode cannot be calculated when the frequency distribution is ill-defined.
2. It is not based on all observations.
3. It is not a rigidly defined measure because several formulae are used to calculate it.


What is the relationship between mean, median and mode? Give the formula?

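In a symmetrical distribution (such as the normal distribution), Mean = Median = Mode.
In an asymmetrical distribution the median generally lies between the mean and the mode, while the mean and the mode change places according to the direction of skewness.
For a moderately skewed distribution the empirical relationship is:
Mode = 3 Median - 2 Mean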


Define geometric mean and mention its disadvantages and uses.
It is defined as the nth root of the product of n items or values,
i.e., G.M. = antilog [ (1/n) (log x1 + log x2 + log x3 + ... + log xn) ]

Disadvantages: it is difficult to understand and difficult to compute, and it cannot be computed when any of the values is zero or negative.

Uses: it is used to find average growth rates and in the construction of index numbers. It is the best average when large weight has to be given to small items and small weight to large items.
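
The logarithmic form of the definition can be sketched in a few lines. This is a minimal illustration: the function name is hypothetical, and natural logarithms with `math.exp` as the antilog are used instead of common logarithms, which gives the same result.

```python
import math

def geometric_mean(values):
    """G.M. = antilog of the mean of the logs; defined only for positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    mean_log = sum(math.log(v) for v in values) / len(values)
    return math.exp(mean_log)  # exp is the antilog of the natural log

# The two values used in Problem 1 of these notes.
gm = geometric_mean([9, 16])
print(round(gm, 6))
```

The explicit check mirrors the limitation stated above: a zero or negative value makes the logarithm undefined.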


Define harmonic mean.
It is defined as the reciprocal of the arithmetic mean of the reciprocals of the individual observations.

H.M. = N / (1/x1 + 1/x2 + 1/x3 + ... + 1/xn)


Uses of harmonic mean?
The harmonic mean is used when a variable is expressed as a combination of two measurements, for example tonne mileage, speed (kilometres per hour) or passenger kilometres. In tonne mileage, the tonne is one measurement and the mileage is another. The harmonic mean is commonly used to calculate average speed.
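
A short sketch of the average-speed case follows. The speeds are hypothetical figures for a journey covered at one speed on the way out and another on the way back over the same distance; the function name is also an assumption made for illustration.

```python
def harmonic_mean(values):
    """H.M. = N divided by the sum of the reciprocals of the values."""
    if any(v <= 0 for v in values):
        raise ValueError("harmonic mean needs strictly positive values")
    return len(values) / sum(1 / v for v in values)

# Hypothetical trip: 60 km/h going out, 40 km/h coming back, same distance.
speeds = [60, 40]
average_speed = harmonic_mean(speeds)
arithmetic_mean = sum(speeds) / len(speeds)

print(round(average_speed, 2), arithmetic_mean)
```

The harmonic mean is lower than the arithmetic mean of the two speeds because more time is spent travelling at the slower speed.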


Limitations of harmonic mean?
It gives the largest weight to the smallest items. Its value cannot be computed when there are both positive and negative items in a series or when one or more items are zero.


What is the relation between arithmetic mean, geometric mean and harmonic mean?
A.M. ≥ G.M. ≥ H.M.
G.M. = √(A.M. × H.M.) (this holds exactly for two positive values)


What is dispersion?
Dispersion is the measure of variation of items; it measures the extent to which the items vary from central value. It is also known as average of the second order. It includes range, mean deviation, quartile deviation, and standard deviation.


Limitations of range?
It is not based on each and every item of the distribution. It is subject to fluctuations of considerable magnitude from sample to sample. Range cannot be computed in open-ended distributions.


What are the uses of range?
Range is useful in weather forecasting (maximum and minimum temperature), in quality control, and in studying fluctuations in share prices.


Define standard deviation.
Standard deviation is one of the measures of dispersion. It is defined as the square root of the mean of the squared deviations of the items from their arithmetic mean.
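
The definition translates directly into code. This is a minimal sketch with a hypothetical data set; it divides by N, as in the definition above, and not by N - 1 as is often done for samples.

```python
import math

def standard_deviation(values):
    """Square root of the mean of squared deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    squared_deviations = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Hypothetical observations with mean 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
sd = standard_deviation(data)
print(sd)
```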


Distinguish between mean deviation and standard deviation.
1. Algebraic signs are ignored while calculating mean deviations whereas standard deviation takes into account algebraic sign also.
2. Mean deviation can be computed either from median or mean whereas standard deviation is computed always from arithmetic mean.


What do you mean by co-efficient of variation and mention the uses of co-efficient of variation?
The co-efficient of variation is a relative measure of dispersion, whereas the standard deviation is an absolute measure of dispersion. It is calculated as C.V. = (S.D. / Mean) × 100. It is useful for comparing two or more series in terms of their variability, consistency and reliability.
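
A small sketch of such a comparison is given below. The two series and their means and standard deviations are hypothetical; the series with the lower co-efficient of variation is treated as the more consistent one.

```python
def coefficient_of_variation(sd, mean):
    """C.V. = (S.D. / Mean) x 100, a unit-free percentage."""
    return 100 * sd / mean

# Hypothetical series: name -> (mean, standard deviation).
series = {"A": (50.0, 5.0), "B": (100.0, 20.0)}

cv = {name: coefficient_of_variation(sd, mean)
      for name, (mean, sd) in series.items()}
most_consistent = min(cv, key=cv.get)

print(cv, most_consistent)
```

Series B has the larger mean, yet A is the more consistent of the two because its variation is smaller relative to its mean.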


What do you mean by skewness?
Skewness refers to asymmetry or lack of symmetry in the shape of the frequency distributions. The measure of skewness tells us the direction and the extent of skewness.


Distinguish between dispersion and skewness.
Dispersion (scalar quantity) is concerned with the amount of variation rather than with its direction whereas skewness tells us about the direction of the variation or departure from symmetry (vector quantity).

Variation is one of the most important characteristics of a distribution, whereas skewness is rarely calculated in economic and business series.


What are the tests of skewness?
1. Mean, median and mode do not coincide.
2. The frequencies, when plotted, do not produce a normal bell-shaped curve.
3. The distance from the median to the first quartile is not equal to the distance from the third quartile to the median.
4. The sum of the positive deviations from the median is not equal to the sum of the negative deviations.


Define correlation.
Correlation is the degree of the relationship between two or more variables. It does not explain the cause behind the relationship.


Indicate the type of correlation in the following situations.
1. -1 (Ans) Perfect negative correlation
2. +1 (Ans) Perfect positive correlation
3. +0.49 (Ans) Low degree positive correlation
4. -0.78 (Ans) Moderate degree negative correlation


How is the degree of correlation interpreted?
±1 : Perfect correlation
±0.8 to ±0.99 : High degree correlation
±0.5 to ±0.79 : Moderate degree correlation
Below ±0.5 : Low degree correlation
Exactly 0 : No correlation
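
The scale above can be expressed as a small lookup function. This is a sketch only: the function name is hypothetical, and the treatment of values that fall between the listed bands (for example 0.795) is an assumption, since the scale does not cover them.

```python
def describe_correlation(r):
    """Label a correlation co-efficient using the scale in these notes."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no correlation"
    size = abs(r)
    if size == 1:
        degree = "perfect"
    elif size >= 0.8:   # assumption: bands are treated as continuous
        degree = "high degree"
    elif size >= 0.5:
        degree = "moderate degree"
    else:
        degree = "low degree"
    direction = "positive" if r > 0 else "negative"
    return f"{degree} {direction} correlation"

for value in (-1, 1, 0.49, -0.78):
    print(value, "->", describe_correlation(value))
```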


What is scatter diagram?
It is a graphical method of studying the relationship between two variables. The independent variable is taken on the x-axis and the dependent variable on the y-axis, each pair of values of x and y is plotted as a point, and the direction in which the points move is observed. If the points move upwards from left to right the correlation is positive; if they move downwards it is negative.


What is spurious correlation or nonsense correlation?
Mathematically we may find a high, even perfect, correlation between two variables although in reality there is no relationship between them. Such correlation is known as spurious correlation, e.g. the size of the shoe and the intelligence of a person.


What is probable error?
The probable error measures the reliability of a co-efficient of correlation computed from a sample, i.e. how far the result can be expected to hold for other samples taken from the same population. If the value of r is less than the probable error, there is no evidence of correlation. If r is more than 6 times the probable error, the correlation is practically certain. r ± probable error gives the upper and lower limits within which the correlation of other samples taken from the same population is likely to lie.

P.E. = 0.6745 × (1 - r²) / √n, where n is the number of pairs of observations.
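
The formula and the two rules of thumb can be sketched as follows. The values of r and n are hypothetical, the function names are assumptions, and the label "inconclusive" for the range between the two rules is not from these notes.

```python
import math

def probable_error(r, n):
    """P.E. = 0.6745 * (1 - r^2) / sqrt(n), for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / math.sqrt(n)

def interpret(r, n):
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "no evidence of correlation"
    if abs(r) > 6 * pe:
        return "correlation practically certain"
    return "inconclusive"  # assumption: the notes leave this range unnamed

# Hypothetical sample: r = 0.8 from 64 pairs of observations.
r, n = 0.8, 64
pe = probable_error(r, n)
limits = (r - pe, r + pe)
print(round(pe, 4), interpret(r, n), tuple(round(x, 4) for x in limits))
```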


What is regression analysis?
Regression is the measure of the average relationship between two or more variables in terms of the original unit of the data. It is one of the forecasting techniques to predict the dependent variable due to change in the independent variable.


Distinguish between regression and correlation.
1. Correlation explains the degree of relationship, whereas regression explains the nature of the relationship.
2. Correlation does not explain the cause behind the relationship whereas regression studies the cause and effect relationship.
3. Correlation co-efficient is independent of change of scale and origin, whereas regression is independent of change of origin but not of scale.


What are the characteristics of regression co-efficients?
1. Both regression co-efficients will have the same sign.
2. If one regression co-efficient is above unity, then the other regression co-efficient should be below unity.
3. If both the regression co-efficients are negative, the correlation co-efficient is also negative.
4. Regression co-efficients are independent of change of origin but not of scale.
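
These properties follow from the fact that the correlation co-efficient is the geometric mean of the two regression co-efficients and carries their common sign. A minimal sketch, using the figures from Problem 3 of these notes and a hypothetical function name:

```python
import math

def correlation_from_regression(bxy, byx):
    """r = +/- sqrt(bxy * byx), with the common sign of the two co-efficients."""
    product = bxy * byx
    if product < 0:
        raise ValueError("regression co-efficients must have the same sign")
    if product > 1:
        raise ValueError("bxy * byx cannot exceed 1")
    sign = -1 if bxy < 0 else 1
    return sign * math.sqrt(product)

r = correlation_from_regression(-1.2, -0.075)
print(round(r, 4))
```

Because the product cannot exceed 1, both co-efficients cannot be above unity in absolute value at the same time, which is property 2 in the list.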


What do you mean by index numbers?
Index number is an indicator of changes in prices and quantities. It is a measure designed to show changes in one variable or in the group of related variables over time or with respect to geographical location or other characteristics. It is also an indicator of inflationary or deflationary tendencies.


Mention few uses of index numbers.
1. They reveal trends and tendencies.
2. They are important in forecasting future economic activities.
3. Index numbers are used by government to calculate dearness allowance, which helps to avoid industrial disputes.
4. The real worth of money can be found by deflating money values with a price index.


What are the problems in the constructions of index numbers?
1. Deciding the purpose of the index.
2. Deciding the base year.
3. Selection of number of items.
4. Selection of price (wholesale/ retail).
5. Choice of average (simple/ geometric average).
6. Selection of appropriate weights.
7. Selection of appropriate formula (Fisher’s/ Laspeyres’).


What is time reversal test?
The test is that the formula for calculating the index number should be such that it will give the same ratio between one point of comparison and the other, no matter which of the two is taken as base. Symbolically P01 * P10 = 1


What is factor reversal test?
Just as each formula should permit the interchange of the two time periods without giving inconsistent results, so it ought to permit interchanging prices and quantities without giving inconsistent results, i.e. the two results multiplied together should give the true value ratio. In other words, the change in price multiplied by the change in quantity should be equal to the total change in value. Symbolically P01 × Q01 = Σp1q1 / Σp0q0
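
Both tests can be checked numerically. The sketch below uses Fisher's ideal index, taken here in its standard textbook form as the geometric mean of the Laspeyres and Paasche indices; the formula itself is not written out in these notes, and the prices and quantities are hypothetical. Indices are expressed as ratios, not multiplied by 100.

```python
import math

def weighted_sum(a, b):
    return sum(x * y for x, y in zip(a, b))

def fisher_index(a0, b0, a1, b1):
    """Fisher index of 'a' from period 0 to period 1, with 'b' as weights."""
    laspeyres = weighted_sum(a1, b0) / weighted_sum(a0, b0)  # base-period weights
    paasche = weighted_sum(a1, b1) / weighted_sum(a0, b1)    # current-period weights
    return math.sqrt(laspeyres * paasche)

# Hypothetical prices (p) and quantities (q) for three items in two periods.
p0, q0 = [2.0, 5.0, 4.0], [10.0, 4.0, 6.0]
p1, q1 = [3.0, 6.0, 5.0], [8.0, 5.0, 7.0]

P01 = fisher_index(p0, q0, p1, q1)   # price index, period 0 as base
P10 = fisher_index(p1, q1, p0, q0)   # price index, period 1 as base
Q01 = fisher_index(q0, p0, q1, p1)   # quantity index, period 0 as base
value_ratio = weighted_sum(p1, q1) / weighted_sum(p0, q0)

print("time reversal:  ", round(P01 * P10, 10))
print("factor reversal:", round(P01 * Q01, 10), round(value_ratio, 10))
```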


Define time series.
A time series may be defined as a collection of magnitudes, belonging to different time periods, of some variable or composite of variables such as production of steel, per capita income, GNP or industrial production.


What are the utilities of time series analysis?
1. It helps in understanding past behaviour and is useful for prediction of future.
2. It facilitates comparison.
3. The various components of time series are useful to study the effective change under each component.
4. The reasons for variation can be studied by comparing actual with expected results.


What are the components of time series?
1. Secular trend
2. Seasonal variation
3. Cyclical variation
4. Irregular variation or erratic movement


What is secular trend?
Secular trend is a long term trend which has the basic tendency to grow or decline over a period of time. It may be due to population change, technological progress, large scale shifts in consumer tastes, discovery of new things, etc.


What is seasonal variation?
Seasonal variations are those periodic movements in business activity, which occur regularly every year and have their origin in the nature of the year itself. It may be due to climate, weather conditions, customs, traditions and habits, festivals, etc.


What is cyclical variation?
The term cycle refers to the recurrent variations in a time series that usually last longer than a year and are regular neither in amplitude nor in length. Cyclical fluctuations are long-term movements that represent consistently recurring rises and declines in activity. A cycle has four phases:
1. Prosperity
2. Decline
3. Depression
4. Improvement


What is irregular variation or erratic movement?
It is the variation in business activity that does not repeat in a definite pattern. It is caused by events such as floods, earthquakes, strikes and wars.


What is least square method of trend?
The sum of the squares of the deviations taken from the arithmetic mean is the least. Based on this principle we fit a trend line such that the sum of the squared deviations of the actual values (Y) from the computed trend values (Yc) is the least. Such a line is known as the least square trend line. That is, Σ(Y - Yc) = 0 and Σ(Y - Yc)² is least.
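
A straight-line trend Yc = a + bX fitted on this principle can be sketched as follows. The years and values are hypothetical, and measuring X as deviations from the middle year is one common way of simplifying the calculation, not the only one.

```python
def fit_linear_trend(years, values):
    """Least-squares line Yc = a + b*X, with X measured from the mean year."""
    n = len(years)
    mean_year = sum(years) / n
    x = [year - mean_year for year in years]        # deviations, so sum(x) = 0
    a = sum(values) / n                             # a = mean of Y
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi ** 2 for xi in x)
    trend = [a + b * xi for xi in x]
    return a, b, trend

# Hypothetical annual production figures.
years = [2001, 2002, 2003, 2004, 2005]
production = [12, 15, 14, 18, 21]

a, b, trend = fit_linear_trend(years, production)
residuals = [y - yc for y, yc in zip(production, trend)]

print("a =", a, "b =", round(b, 4))
print("sum of deviations:", round(sum(residuals), 10))
```

The deviations of the actual values from the trend values sum to zero, which is the first of the two conditions stated above.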


What is interpolation?
Interpolation is the method of finding a value within the given scope of the series. Example:

Year:        1993  1994  1995  1996  1997
Production:    10    20     ?    35    50

The value for 1995 can be calculated by using certain methods. Such a calculation is known as interpolation because 1995 lies within the scope of 1993-1997.


What is extrapolation?
The method of finding the value outside the scope of the given series is known as extrapolation. Suppose in the previous example if we calculate the value for 1999 or 1992, which goes beyond the scope of the series of 1993-1997, such calculation is known as extrapolation.


Distinguish between interpolation and extrapolation.
Refer to the answers above


What are the assumptions of interpolation and extrapolation?
It is assumed that there are no sudden jumps in the series from one period to another, that the series shows some sort of continuity, and that there will be no sudden change in the future either. The rate of change of the figures from one period to another is assumed to be uniform.


Expand (y - 1)^n = y^n - n y^(n-1) + [n(n - 1) / 2!] y^(n-2) - [n(n - 1)(n - 2) / 3!] y^(n-3) + ... = 0
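
In the binomial expansion method the powers of y are read as positions in the series (y^k stands for the k-th term, y0 being the first) and n is taken as the number of known values. Applied to the production series in the interpolation example, n = 4 and the expansion becomes y4 - 4 y3 + 6 y2 - 4 y1 + y0 = 0. The sketch below solves this for the single missing term; the function name is hypothetical and the code assumes equally spaced periods with exactly one missing value.

```python
from math import comb

def interpolate_missing(series):
    """Solve the binomial expansion (y - 1)^n = 0 for the one missing term (None)."""
    missing = series.index(None)
    n = len(series) - 1            # number of known values
    known_sum = 0
    for k in range(n + 1):
        position = n - k           # the term y_(n-k)
        coefficient = (-1) ** k * comb(n, k)
        if position == missing:
            missing_coefficient = coefficient
        else:
            known_sum += coefficient * series[position]
    return -known_sum / missing_coefficient

# Production for 1993-1997, with 1995 unknown.
production = [10, 20, None, 35, 50]
estimate = interpolate_missing(production)
print(round(estimate, 2))
```

Here 50 - 4(35) + 6 y2 - 4(20) + 10 = 0, so 6 y2 = 160 and the estimate for 1995 is about 26.67.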


Problem 1: Calculate G.M. for 9 and 16
Ans. G.M. = √(9 × 16) = √144 = 12.


Problem 2: Calculate H.M. if arithmetic mean = 25, G.M.=20.
Ans. G.M. = √(A.M. × H.M.)

H.M. = (G.M.)² / A.M.
H.M. = (20)² / 25
H.M. = 400 / 25
H.M. = 16.


Problem 3: Calculate correlation co-efficient, if bxy = -1.2, byx = -0.075
Ans. r = ±√(bxy × byx) = ±√(-1.2 × -0.075) = ±√0.09 = ±0.3

Note: r takes the sign of the regression co-efficients. Since both regression co-efficients are negative, the correlation co-efficient is negative. Therefore r = -0.3


Problem 4: If C.V. = 20% and mean = 10, calculate standard deviation and variance.
Ans. C.V. = (S.D. / Mean) × 100
S.D. = (C.V. × Mean) / 100 = (20 × 10) / 100 = 2

Variance = S.D.² = 2² = 4


Problem 5: Calculate quartile deviation and co-efficient of quartile deviation if Q1 = 20, Q3 = 40
Ans. Q.D. = (Q3 - Q1) / 2 = (40 - 20) / 2 = 10

Co-efficient of quartile deviation = (Q3 - Q1) / (Q3 + Q1) = (40 - 20) / (40 + 20) = 20/60 = 0.3333