This article explains how to approach a statistical problem and work toward a solution using statistical methods.
Before reading this article, I would advise readers to read this article, which covers basic statistics concepts.
Every business faces challenges in its operations. All the operations are interrelated and dependent on each other. To solve the problems and challenges of a business, researchers first understand the problem and then gather data, because every statistical problem starts with data. Let's understand a few terms:
- Population: All the cases or items under study. For example, suppose we want to study how interested the people of a particular city are in a product. Here, all the people of the city form the population.
- Sample: A part of the population that is examined in a statistical investigation. Example: a group of people selected from the city.
In statistics, the population is the aggregate of objects, animate or inanimate, under study in any statistical investigation. In sampling, the population means the larger group from which the samples are drawn.
Statistical measures such as the mean, variance, skewness, kurtosis, and correlation are known as parameters when they are computed for the population, and as statistics when they are computed for a sample.
Parametric tests assume that the data follow a particular statistical distribution, whereas non-parametric tests make no such assumption.
The set of values of a statistic, computed for each sample drawn from the population, constitutes its sampling distribution. Like any distribution, it has its own constants, such as a mean and a variance.
The standard deviation of the sampling distribution of a statistic is known as its standard error.
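To make the idea of a sampling distribution concrete, here is a minimal simulation sketch using only the Python standard library. The population, the sample size of 25, and the number of samples are all assumptions chosen for illustration. It compares the standard deviation of many sample means with the textbook value σ/√n for the standard error of the mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 values drawn from a normal distribution.
population = [random.gauss(50, 10) for _ in range(10_000)]
sigma = statistics.pstdev(population)  # population standard deviation (a parameter)

n = 25               # sample size (an assumption)
num_samples = 2_000  # how many samples we draw

# Draw many samples with replacement and record each sample mean (a statistic).
sample_means = [
    statistics.fmean(random.choices(population, k=n))
    for _ in range(num_samples)
]

# The standard deviation of the sample means is the empirical standard error.
empirical_se = statistics.stdev(sample_means)
theoretical_se = sigma / n ** 0.5

print(f"empirical SE   = {empirical_se:.3f}")
print(f"theoretical SE = {theoretical_se:.3f}")
```

The two printed values should be close to each other, and they get closer as the number of samples grows.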
There are different types of sampling:
- Simple random sampling
- Stratified random sampling
- Multistage Sampling
- Area Sampling
- Simple cluster sampling
- Quota sampling
- Quasi Random Sampling
- Systematic Sampling
Sampling in which units are selected on the basis of probability is known as random sampling.
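Three of the sampling schemes listed above can be sketched in a few lines of Python. The sampling frame of 100 people, the "group" field, and the function names are all hypothetical and exist only for this illustration.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame: 100 people, each tagged with an age group.
people = [{"id": i, "group": "young" if i < 60 else "old"} for i in range(100)]

def simple_random_sample(frame, k):
    """Every unit has the same chance of selection (without replacement)."""
    return random.sample(frame, k)

def systematic_sample(frame, k):
    """Pick a random start, then take every step-th unit."""
    step = len(frame) // k
    start = random.randrange(step)
    return frame[start::step][:k]

def stratified_sample(frame, k, key):
    """Sample from each stratum in proportion to its size."""
    strata = {}
    for unit in frame:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for units in strata.values():
        share = round(k * len(units) / len(frame))
        sample.extend(random.sample(units, share))
    return sample

srs = simple_random_sample(people, 10)
sys_sample = systematic_sample(people, 10)
strat = stratified_sample(people, 10, "group")

print([p["id"] for p in sys_sample])
print(sum(p["group"] == "young" for p in strat), "young in the stratified sample")
```

With a 60/40 split in the frame, the stratified sample of 10 always contains 6 "young" and 4 "old" units, while the simple random sample only matches that split on average.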
Central Limit theorem: The central limit theorem states that if you have a population with mean μ and standard deviation σ and take sufficiently large random samples from the population with replacement, then the distribution of the sample means will be approximately normally distributed.
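The theorem can be checked with a small simulation. The sketch below assumes an exponential population (strongly right-skewed, with μ = 1 and σ = 1), a sample size of 50, and 5,000 samples; these are illustrative choices, not part of the theorem. If the sample means are approximately normal, about 95% of them should fall within 1.96 standard errors of μ.

```python
import random
import statistics
from math import sqrt

random.seed(42)  # fixed seed so the sketch is repeatable

# Assumed population: exponential with rate 1, which is strongly right-skewed.
# Its mean (mu) and standard deviation (sigma) are both 1.
mu, sigma = 1.0, 1.0
n = 50               # size of each random sample (an assumption)
num_samples = 5_000  # number of samples drawn

sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(num_samples)
]

# Share of sample means lying within 1.96 standard errors of mu.
standard_error = sigma / sqrt(n)
within = sum(abs(m - mu) <= 1.96 * standard_error for m in sample_means)
coverage = within / num_samples

print(f"mean of sample means = {statistics.fmean(sample_means):.3f}")
print(f"share within 1.96 SE = {coverage:.3f}")
```

Even though the individual observations are far from normal, the share printed on the last line should be close to 0.95.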
After drawing a sample from the population, our main statistical activity is setting up the hypotheses. A hypothesis is an assumption about the population.
The maximum probability of a Type I error (rejecting a null hypothesis that is actually true) that we are prepared to risk is the level of significance.
We usually want to find evidence for the alternate hypothesis, and we never accept the null hypothesis; we only fail to reject it.
You can use the following rule to formulate the null and alternate hypotheses:
- The null hypothesis always has one of the following signs: = OR ≤ OR ≥
- The alternate hypothesis always has one of the following signs: ≠ OR > OR <
Types of Tests:
- Two-tailed test: A test in which the critical area of the distribution is two-sided, so it checks whether the sample statistic is either greater than or less than a certain range of values. We use a two-tailed test whenever the null hypothesis uses '=' and the alternate hypothesis uses '≠'.
- One-tailed test: A test in which the critical area of the distribution is one-sided, so it checks whether the sample statistic is either greater than or less than a certain value, but not both. We use a one-tailed test whenever the alternate hypothesis uses '>' or '<'.
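The difference between the two kinds of test shows up in the critical values. The short sketch below assumes a standard normal test statistic and a 5% level of significance, and uses `statistics.NormalDist` from the Python standard library; the variable names are only illustrative.

```python
from statistics import NormalDist

alpha = 0.05          # level of significance (an assumption)
z = NormalDist()      # standard normal distribution

# Two-tailed: alpha is split between both tails.
two_tailed_critical = z.inv_cdf(1 - alpha / 2)   # reject H0 if |Z| exceeds this

# One-tailed: all of alpha sits in a single tail.
right_tailed_critical = z.inv_cdf(1 - alpha)     # reject H0 if Z exceeds this
left_tailed_critical = z.inv_cdf(alpha)          # reject H0 if Z is below this

print(f"two-tailed:   +/-{two_tailed_critical:.3f}")
print(f"right-tailed: {right_tailed_critical:.3f}")
print(f"left-tailed:  {left_tailed_critical:.3f}")
```

Because a one-tailed test puts the whole 5% in one tail, its critical value (about 1.645) is smaller in magnitude than the two-tailed one (about 1.960).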
PROCEDURE FOR TESTING OF HYPOTHESIS
Z-Test: If the sample size is more than 30, we use the Z-test.
We can use either of the following two methods for testing a statistical hypothesis.
We summarize below, in a systematic way, the steps involved in each of these methods.
- Rejection Region Method
- P-value Testing Method
Method 1: Rejection Region Method
Step 1: Set up the null hypothesis.
Step 2: Set up the alternative hypothesis. The alternative hypothesis enables us to decide whether to use a one-tailed or a two-tailed test.
Step 3: Level of significance. Choose the appropriate level of significance, depending on the permissible risk, before drawing the sample.
Step 4: Identify the sample statistic to be used and its sampling distribution.
Step 5: Test statistic. Define and compute the test statistic under H0. The normal distribution is one of the distributions commonly used to obtain the test statistic or test criterion.
Step 6: Obtain the critical values and the critical region of the test statistic.
Step 7: If the computed value of the test statistic lies in the rejection region, we reject H0 at the chosen level of significance. If it lies outside the rejection region, we fail to reject H0.
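The steps of the rejection region method can be sketched for a one-sample Z-test as follows. This is a minimal illustration, not a full implementation: the function name, its parameters, and the numbers in the example (a sample of 64 with mean 52, tested against a hypothesized mean of 50 with known σ = 8) are all assumptions.

```python
from math import sqrt
from statistics import NormalDist

def z_test_rejection_region(sample_mean, mu0, sigma, n, alpha=0.05, tail="two"):
    """One-sample Z-test using the rejection region method.

    Returns (z statistic, critical value, reject H0?).
    """
    # Step 5: compute the test statistic under H0.
    z = (sample_mean - mu0) / (sigma / sqrt(n))

    # Step 6: obtain the critical value for the chosen tail and alpha.
    std_normal = NormalDist()
    if tail == "two":
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif tail == "right":
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "left":
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")

    # Step 7: the decision is whether z falls in the rejection region.
    return z, critical, reject

# Hypothetical example: H0: mu = 50 against H1: mu != 50.
z_stat, z_crit, reject_h0 = z_test_rejection_region(
    sample_mean=52, mu0=50, sigma=8, n=64
)
print(f"z = {z_stat:.2f}, critical = {z_crit:.2f}, reject H0: {reject_h0}")
```

Here z works out to 2.0, which lies beyond the two-tailed critical value of about 1.96, so H0 is rejected at the 5% level.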
Method 2: P-value Testing Method
Steps 1 to 5 are the same as in Method 1, the Rejection Region Method. Steps 6 and 7 vary slightly, as given below.
Step 6: Find the P-value of the test statistic computed under H0 in Step 5.
Step 7: If P-value < α, we reject H0 at the α level of significance. If P-value > α, we fail to reject H0 at the α level of significance.
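For a Z-test, the P-value in Step 6 comes straight from the standard normal distribution. The sketch below assumes a one-sample Z-test with known σ; the function name and the example numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_test_p_value(sample_mean, mu0, sigma, n, tail="two"):
    """One-sample Z-test using the P-value method. Returns (z, p_value)."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)  # P(Z <= z) under H0
    if tail == "two":
        p_value = 2 * min(cdf, 1 - cdf)
    elif tail == "right":
        p_value = 1 - cdf
    elif tail == "left":
        p_value = cdf
    else:
        raise ValueError("tail must be 'two', 'right' or 'left'")
    return z, p_value

alpha = 0.05  # level of significance (an assumption)

# Hypothetical example: H0: mu = 50 against H1: mu != 50.
z_stat, p_value = z_test_p_value(sample_mean=52, mu0=50, sigma=8, n=64)
reject_h0 = p_value < alpha
print(f"z = {z_stat:.2f}, P-value = {p_value:.4f}, reject H0: {reject_h0}")
```

With z = 2.0 the two-tailed P-value is roughly 0.0455, which is below 0.05, so the decision matches what the rejection region method gives for the same data.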
Chi-Square Test: We use the Chi-square test of independence to determine whether or not there is a significant relationship between two categorical variables.
APPLICATIONS OF THE χ²-DISTRIBUTION
The Chi-square distribution has a number of applications, some of which are enumerated below:
(i) Chi-square test of goodness of fit.
(ii) χ²-test for independence of attributes.
(iii) To test if the population has a specified value of the variance σ².
(iv) To test the equality of several population proportions.
CONDITIONS FOR THE VALIDITY OF CHI-SQUARE TEST
The Chi-square test can be used only if the following conditions are satisfied:
1. N, the total frequency, should be reasonably large, say greater than 50.
2. The sample observations should be independent. This implies that no individual item should be included twice or more in the sample.
3. The constraints on the cell frequencies, if any, should be linear (i.e., they should not involve squares and higher powers of the frequencies), such as ΣO = ΣE = N.
4. No theoretical frequency should be small. Small is a relative term. Preferably, each theoretical frequency should be larger than 10, but in any case not less than 5. If any theoretical frequency is less than 5, then we cannot apply the χ²-test as such. In that case we use the technique of 'pooling', which consists of adding the frequencies that are less than 5 to the preceding or succeeding frequency (or frequencies) so that the resulting sum is greater than 5, and adjusting the degrees of freedom accordingly.
5. The given distribution should not be replaced by relative frequencies or proportions; the data should be given in original units.
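As an illustration, the Chi-square test of independence can be sketched by hand for a small contingency table. The 2x2 table below is made up, the function name is hypothetical, and no continuity correction is applied. The critical value 3.841 is the standard 5% table value for 1 degree of freedom.

```python
def chi_square_independence(observed):
    """Return (chi2, degrees_of_freedom, expected) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)

    # Expected frequency of a cell = row total * column total / N.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Chi-square statistic: sum of (O - E)^2 / E over all cells.
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df, expected

# Hypothetical 2x2 table: rows = two customer groups, columns = prefers A / B.
observed = [[30, 20], [20, 30]]
chi2, df, expected = chi_square_independence(observed)

# Validity checks based on the conditions listed above.
total = sum(map(sum, observed))
large_enough = total > 50
no_small_cells = all(e >= 5 for row in expected for e in row)

CRITICAL_5PCT_DF1 = 3.841  # table value for alpha = 0.05 and 1 d.f.
reject_h0 = chi2 > CRITICAL_5PCT_DF1

print(f"chi2 = {chi2:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

Every expected frequency in this table is 25, so the statistic is 4 × (5² / 25) = 4.0, which exceeds 3.841, and the hypothesis of independence is rejected at the 5% level.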
T-Test: We use the t-test when we don't know the population variance and the sample size is less than 30.
APPLICATIONS OF t-DISTRIBUTION
The t-distribution has a number of applications in statistics, including the following:
(i) t-test for the significance of a single mean, population variance being unknown.
(ii) t-test for the significance of the difference between two sample means, the population variances being equal but unknown.
(iii) t-test for the significance of an observed sample correlation coefficient.
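The first application, the t-test for a single mean, can be sketched as follows. The data and the hypothesized mean of 12.0 are invented for illustration and the function name is hypothetical. The critical value 2.262 is the standard two-tailed 5% table value for 9 degrees of freedom.

```python
from math import sqrt
from statistics import fmean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean = mu0."""
    n = len(sample)
    # stdev() uses the n - 1 divisor, as required when sigma is unknown.
    standard_error = stdev(sample) / sqrt(n)
    return (fmean(sample) - mu0) / standard_error, n - 1

# Hypothetical sample of 10 package weights; H0: mu = 12.0, H1: mu != 12.0.
weights = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.5, 12.1]
t_stat, df = one_sample_t(weights, mu0=12.0)

T_CRITICAL = 2.262  # table value: two-tailed, alpha = 0.05, 9 d.f.
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, d.f. = {df}, reject H0: {reject_h0}")
```

The sample mean is 12.1 and t is about 1.22, which is inside the acceptance region, so we fail to reject H0 for this made-up sample.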
F-Test: The F-statistic is the ratio of two independent chi-square variates, each divided by its respective degrees of freedom.
Under the null hypothesis, the F-statistic follows an F distribution. F distributions take only non-negative values and are skewed to the right.
Assumptions for the F-Test:
1. The samples are simple random samples.
2. The samples are independent of each other.
3. The parent populations from which the samples are drawn are normal.
Uses of F-distribution:
F-distribution has a number of other applications in Statistics, some of which are enumerated below
(i) F-test for testing the significance of an observed sample multiple correlation.
(ii) F-test for testing the significance of an observed sample correlation ratio.
(iii) F-test for testing the equality of several population means, i.e., for testing H0: μ₁ = μ₂ = … = μₖ (= μ, say), for k normal populations. This is by far the most important application of the F-statistic and is done through the technique of Analysis of Variance, pioneered by Prof. R.A. Fisher.
Relation Between t, F and χ² Distributions. We state below, without proof, how these distributions are related.
Relation between t and F Distributions. If a statistic t follows Student's t-distribution with n d.f., then its square (t²) follows Snedecor's F-distribution with (1, n) d.f. Symbolically:
t ~ tₙ ⇒ t² ~ F(1, n)
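One way to see this relation at work on data is a closely related identity: for two groups, the square of the pooled two-sample t-statistic equals the one-way ANOVA F-statistic computed on the same data. The sketch below checks that numerically; the two groups and the function names are made up for illustration, and this is a numerical check on one data set, not a proof.

```python
from math import sqrt
from statistics import fmean, variance

def pooled_t(a, b):
    """Two-sample t statistic assuming equal but unknown variances."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (fmean(a) - fmean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

def one_way_f(groups):
    """One-way ANOVA F statistic: between-group over within-group mean square."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [6.2, 6.8, 6.1, 7.0, 6.4]

t_stat = pooled_t(group_a, group_b)
f_stat = one_way_f([group_a, group_b])

print(f"t^2 = {t_stat ** 2:.6f}")
print(f"F   = {f_stat:.6f}")
```

Here t has 8 d.f. and F has (1, 8) d.f., and the two printed numbers agree up to floating-point rounding.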
ANOVA: Analysis of variance (ANOVA) is an analysis tool used in statistics that splits the observed aggregate variability found inside a data set into two parts: systematic factors and random factors.
The analysis of variance is a powerful statistical tool for tests of significance. The test of significance based on the t-distribution is an adequate procedure only for testing the significance of the difference between two sample means.
The ANOVA technique enables us to compare several population means simultaneously and thus results in a lot of savings in terms of time and money, compared to the several experiments required for comparing two population means at a time.
Its objective is to test the equality of several population means or homogeneity of several independent sample means.
Assumptions:
- The populations from which the samples are drawn are normally distributed.
- The samples are independent and random.
- Each of the populations has the same variance.
- The effects are additive.
H0: The means of all the populations are equal.
HA: At least one population mean is different from the others.
F = Variation between samples / Variation within samples
Variation between samples: The variation among the means of the different sample groups.
Variation within samples: The variation among the observations inside each individual sample.
Types of ANOVA:
- One-way ANOVA: One independent variable (factor) influences all the sample observations.
- Two-way ANOVA: Two independent variables (factors) influence all the sample observations. One variable acts row-wise and the other acts column-wise.
Calculation of Anova:
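A minimal sketch of the one-way ANOVA calculation is shown below, using only the Python standard library. The three groups are made-up numbers chosen so that the arithmetic is easy to follow by hand, and the function name is hypothetical. The critical value 3.89 is the standard 5% table value of F for (2, 12) degrees of freedom.

```python
from statistics import fmean

def one_way_anova(groups):
    """Return the main entries of a one-way ANOVA table as a dict."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)

    # Variation between samples: group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Variation within samples: observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return {
        "ss_between": ss_between, "df_between": k - 1,
        "ss_within": ss_within, "df_within": n - k,
        "F": ms_between / ms_within,
    }

# Hypothetical data: three samples of five observations each.
groups = [
    [4, 5, 6, 5, 5],    # mean 5
    [6, 7, 8, 7, 7],    # mean 7
    [8, 9, 10, 9, 9],   # mean 9
]
table = one_way_anova(groups)

F_CRITICAL = 3.89  # table value: alpha = 0.05, (2, 12) d.f.
reject_h0 = table["F"] > F_CRITICAL

for name, value in table.items():
    print(f"{name:>10}: {value}")
print(f"reject H0: {reject_h0}")
```

For this data the between-sample sum of squares is 40 on 2 d.f. and the within-sample sum of squares is 6 on 12 d.f., so F = 20 / 0.5 = 40, far above the critical value, and H0 is rejected.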
Differences between the types of hypothesis tests
A researcher needs to select one of the test statistics based on various factors; the table below helps in choosing the test statistic.
We can choose the best statistical test based on the question being asked and the requirements.
Thank you
Reference: Fundamentals of Statistics (S.C. Gupta)