 # Sample size

## Sample size definition

Sample size is a term used in market research for the number of subjects included in a sample. By sample, we understand a group of subjects selected from the general population and considered representative of the real population for that specific study.

For example, if we want to predict how the population in a specific age group will react to a new product, we can first test it on a sample that is representative of the targeted population. The sample size, in this case, is the number of people in that age group who will be surveyed.

## Calculation of sample size

Using statistical formulas to determine the sample size implies, first of all, choosing a meaningful benchmark for the measurements to be made, based on the results provided by the qualitative research to be performed. Usually, the researcher has two alternatives here:

The researcher can measure variables and determine specific indicators that express their evolution. For example, the researcher can study how often a commercial unit is visited, with the weekly average frequency of visits by the group in question as the indicator describing this variable. In the specialized literature, this alternative is called sampling in relation to the variables investigated.

The researcher can evaluate specific attributes of the investigated marketing phenomenon. For example, the researcher may seek to identify consumers' preferences for the interior arrangement of a commercial unit by evaluating a set of attributes representative of the interior design. In the specialized literature, this alternative is called sampling in relation to the characteristics investigated.

## Sample size formula

The terms used in the formula are:

• N = population size
• e = margin of error (percentage in decimal form)
• z = z-score

Another sample size formula is:

$$n = \frac{N \cdot X}{X + N - 1},$$

where

$$X = \frac{Z_{\alpha/2}^{2} \, p \, (1 - p)}{\mathrm{MOE}^{2}},$$

and $Z_{\alpha/2}$ is the critical value of the Normal distribution at $\alpha/2$ (for a confidence level of 95%, $\alpha$ is 0.05 and the critical value is 1.96), MOE is the margin of error, $p$ is the sample proportion, and $N$ is the population size. Note that a Finite Population Correction has been applied to the sample size formula.
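The formula with the finite population correction can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `sample_size` and its parameter names are assumptions made for this example, and the result is rounded up because a sample cannot contain a fraction of a person.

```python
import math
from statistics import NormalDist


def sample_size(population, margin_of_error, proportion=0.5, confidence=0.95):
    """Sample size for estimating a proportion, with finite population correction."""
    alpha = 1 - confidence
    # Critical value of the standard Normal distribution at alpha/2 (about 1.96 for 95%).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # X = Z^2 * p * (1 - p) / MOE^2
    x = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    # n = N * X / (X + N - 1)
    n = population * x / (x + population - 1)
    return math.ceil(n)


# Population of 10,000, margin of error of 5 percentage points, p = 50%.
print(sample_size(population=10_000, margin_of_error=0.05))
```

Passing a smaller margin of error or a higher confidence level increases the required sample size, as the formula suggests.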

## Sampling process

The sampling process involves several specific activities, namely:

* defining the population that is the object of the research;

* choosing the sampling frame;

* choosing the sampling method;

* establishing how the sampling units will be selected;

* determining the size of the sample;

* choosing the actual units of the sample;

* conducting the field activity.

The target population must be defined with great care, avoiding both the tendency to choose an unjustifiably large population and the inclination to select an unjustifiably narrow one. For example, for companies that produce cars, the total population can be represented by the people of the whole country, including children of all ages.

However, the relevant population, which will be the subject of the research, consists only of people over 18 years old. Nor can an unjustifiably restricted population be accepted, such as the male population between the ages of 25 and 50. That group may cover a large part of the car market, but it excludes some essential segments.

In practice, in the case of random sampling, the sample will be chosen from a list of the population that often differs, to some extent, from the population that is the subject of the research. This list represents the sampling frame or the sampling base because it contains the elements from which the sample is to be constituted.

Constructing the sample requires defining the sampling unit. The sampling unit is a distinct element, or a group of elements, within the investigated population that can be selected to form the sample. The sampling unit may be a person, a family, a household, a company, a locality, etc. Note that the sampling unit is not always identical to the unit of analysis. For example, in a study of family expenses, the sampling unit may be the home or the household, and the unit of analysis may be a person or a family.

## Important Definitions in research

• Margin of error

The margin of error is the level of precision you require. It is the plus-or-minus figure that is often reported with an estimated percentage and is also referred to as the confidence interval. It is the range in which the true population proportion is estimated to lie and is frequently expressed in percentage points (e.g., ±2 percent). Be aware that the precision actually achieved after you collect your data will be somewhat more or less than this target, because it will depend on the proportion observed in the data rather than on your expected sample proportion.

• Confidence Level

The confidence level is the probability that the margin of error contains the true proportion. If the study were repeated and the range calculated each time, you would expect the true value to lie inside these ranges on 95 percent of occasions. The higher the confidence level, the more certain you can be that the interval includes the true proportion.

• Population size

This is the total number of individuals in your population. In this formula, we use a finite population correction to account for sampling from populations that are small. If your population is large but you do not know how large, you can use 100,000. The sample size does not change considerably for larger populations.

• Sample proportion

The sample proportion is what you expect the results to be. It can often be set using the results of a previous survey or by running a small pilot study. If you are uncertain, use 50%, which is conservative and gives the largest sample size. Note that this sample size calculation uses the Normal approximation to the Binomial distribution. If the sample proportion is close to 0 or 1, this approximation is not valid, and you need to consider an alternative sample size calculation method.

• Sample size

This is the minimum sample size you need to estimate the true population proportion. Note that people who choose not to respond cannot be included in your sample, so if non-response is a possibility, your sample size will need to be increased. Generally, the higher the response rate, the better the estimate, because non-response often leads to bias in your estimate.
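The definitions above can be checked numerically. The sketch below reuses the finite-population formula with the 1.96 critical value quoted earlier; `required_sample` and `invitations_needed` are hypothetical helper names, and dividing by an expected response rate is a common rule of thumb assumed here, not a method stated in the text.

```python
import math

Z_95 = 1.96  # critical value for a 95% confidence level, as quoted in the text


def required_sample(population, proportion, margin_of_error, z=Z_95):
    """Minimum sample size with finite population correction."""
    x = z ** 2 * proportion * (1 - proportion) / margin_of_error ** 2
    return math.ceil(population * x / (x + population - 1))


def invitations_needed(sample_size, response_rate):
    """Hypothetical helper: people to contact so that about `sample_size` respond."""
    return math.ceil(sample_size / response_rate)


# A sample proportion of 50% gives the largest (most conservative) sample size.
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(f"p={p:.1f}: n={required_sample(100_000, p, 0.05)}")

# Beyond roughly 100,000, the population size barely changes the result.
for size in (1_000, 10_000, 100_000, 1_000_000):
    print(f"N={size}: n={required_sample(size, 0.5, 0.05)}")

# With an assumed 40% response rate, more people must be contacted.
n = required_sample(100_000, 0.5, 0.05)
print(invitations_needed(n, 0.40))
```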

## What Is Standard Deviation?

The standard deviation is a statistic that measures the dispersion of a dataset relative to its mean. It is calculated as the square root of the variance, which is obtained from the deviation of each data point from the mean. If the data points are farther from the mean, there is a higher deviation within the data set; consequently, the more spread out the data, the greater the standard deviation.
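As a small worked example, the calculation can be written out step by step. The data values are made up for illustration, and the population form of the variance (dividing by the number of points) is assumed; the result is cross-checked against `statistics.pstdev` from the Python standard library.

```python
import math
from statistics import mean, pstdev

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration

mu = mean(data)
# Population variance: average squared deviation from the mean.
variance = sum((x - mu) ** 2 for x in data) / len(data)
# Standard deviation: square root of the variance.
std_dev = math.sqrt(variance)

print(mu, variance, std_dev)
# The standard library gives the same result.
assert math.isclose(std_dev, pstdev(data))
```

For a sample rather than a whole population, the variance is usually divided by one less than the number of points, which `statistics.stdev` does.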

## How to determine the sample size?

We cannot test the entire population. The sample size is based on confidence intervals: we are interested in estimating a population parameter by measuring a sample. Therefore, we establish confidence intervals so that a specified proportion of the sample values lies inside that range. Sampling answers the questions of how and how many. By population, we understand all the members of a specific group who share a certain characteristic (e.g., young people aged 18-25, students).

A sample is a subset, an extract: a number of people drawn from that population. The population is treated as infinite; in practice, we cannot study an endless number of cases.

The behaviors or scores obtained by measuring the sample are used to estimate, by statistical inference, the scores or behaviors we would collect if we tested the entire population.

## Determining the sample size (how many we select)

Fundamental principle: the number of participants considered acceptable to form a representative sample depends on the type of research. Thus, for correlational studies, 30 participants are sufficient to create a representative sample (it is accepted that from 30 subjects upward, the distribution is normal). The same minimum applies to experimental and quasi-experimental research (a quasi-experiment is similar to an experiment, except that the participants are not randomly divided into groups; the groups are found already formed).

For descriptive research (ex: aviators), 20% of the respective population is sufficient. The larger the population, the smaller the percentage. Ex: 20% of 1000 people = 200 people; 10% of 5000 people = 500 people. For small populations (under 100 people), the sample size is approximately equal to the population. For medium populations (around 500 people), it is approximately 20%. For larger populations (around 5000 people), about 400 people are enough, and even a sample of 1% can be meaningful.


## SAMPLING ALGORITHMS

• Random sampling

(1) Identification and definition of the population.

Ex. The population is made up of all 5000 school directors in a given country.

(2) Determining the sample size (descriptive research).

Ex. The sample will consist of 10% of the 5000 directors, that is, 500 people.

If the research is correlational or experimental, N = min 30.

(3) We make a list of all the members of the population.

Ex. All school directors are on the list.

(4) A number is assigned to each person on the list. If we have a thousand people, the numbers start at 000 and the last person on the list has 999; if we have 100 people, the numbers run from 00 to 99.

Ex. On the list of directors, each person receives a number: the first has 0000 and the last 4999.

(5) Using a table of random numbers, a number is selected at random from the table.

Ex. The number 53634 was chosen from the table (the leading 5 is not considered, because we have only 5000 people and need four digits).

(6) From the extracted number, we use all the digits or only as many digits as the size of the population requires.

Ex. We have only 5000 people, so we keep four digits: 3634.

(7) If a member of the population has the resulting number, that member is entered on the sample list.

Ex. Because there is a director with the number 3634, that director goes into the sample.

(8) Go to the next number in the column and repeat until the sample is complete.

Variant: If we do not want to use this procedure, we can choose the ballot box method: the order numbers or names of all members of the population are placed in a ballot box, and we draw as many as are needed to make up the sample.
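In software, a pseudo-random number generator usually replaces the printed table of random numbers or the ballot box. The sketch below assumes the 5000 directors are numbered 0 to 4999 and draws 500 of them without repetition; the function name `simple_random_sample` and the fixed seed are choices made only for this illustration.

```python
import random

POPULATION_SIZE = 5000  # school directors, numbered 0000-4999
SAMPLE_SIZE = 500       # 10% of the population


def simple_random_sample(population_size, sample_size, seed=None):
    """Draw `sample_size` distinct member numbers, each equally likely."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return sorted(rng.sample(range(population_size), sample_size))


sample = simple_random_sample(POPULATION_SIZE, SAMPLE_SIZE, seed=42)
print(len(sample), sample[:5])
```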

• Systematic sampling

The sample size is established according to the type of research: descriptive or correlational.

(1) Identification and definition of the population.

Ex. The population is made up of all 5000 teachers from a given region of a country.

(2) Determining the sample size (descriptive research).

Ex. Suppose the research is descriptive; the sample is then 10% of the population = 500 people.

(3) We make a list of all the members of the population.

Ex. The 5000 teachers are arranged in alphabetical order; the list is therefore not in random order, but the procedure is still valid.

(4) Determine the parameter, or step, K = population size / sample size.

Ex. K = 5000/500 = 10

(5) A starting position is chosen near the beginning of the list.

Ex. Suppose I put my finger on the 3rd name (using the list directly).

(6) Starting with the chosen position, every K-th name is chosen.

Ex. Our sample contains positions 3, 13, 23, 33, etc.

(7) If the sample is not complete by the end of the list, selection continues from the beginning of the list.
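The systematic procedure can be sketched as follows. Positions are counted from 1, as in the example with the 3rd name, and the wrap-around in step (7) is handled with modular arithmetic; the function name `systematic_sample` is an assumption made for this illustration.

```python
def systematic_sample(population_size, sample_size, start):
    """Return 1-based list positions: every K-th name, starting at `start`."""
    step = population_size // sample_size  # K = population size / sample size
    return [
        # Wrap around to the beginning of the list if the end is reached.
        (start - 1 + i * step) % population_size + 1
        for i in range(sample_size)
    ]


positions = systematic_sample(population_size=5000, sample_size=500, start=3)
print(positions[:4])  # 3, 13, 23, 33
```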

• Stratified sampling

(1) Identification and definition of the population.

Ex. To compare the efficiency of two methods of training psychosocial competence in management according to the level of self-esteem, the population consists of the 300 top managers from a given city.

(2) Determining the sample size (calculating the sample size).

Ex. The sample will contain 45 managers for each of methods A and B.

(3) The stratification variable and the subgroups (layers) are established so as to ensure representativeness (an equal number or a proportional number in each subgroup).

Ex. The desired subgroups are established based on three levels of self-esteem: high, average, low (other possible variables: age, level of training, male-female).

(4) Each member of the population is assigned to one of the established subgroups.

Ex. The 300 managers are classified according to the level of self-esteem: 45 high self-esteem, 225 average self-esteem, 40 low self-esteem.

(5) The established number of participants (equal or proportional) is drawn from each subgroup by simple random sampling (using the table of random numbers or drawing lots).

Ex. We decide to draw 30 managers from each layer. Using the table of random numbers or drawing lots, we extract 30 managers with high self-esteem, 30 with average self-esteem, and 30 with low self-esteem. The 30 participants in each layer are then randomly distributed between the two methods (half to method A and half to method B).
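A minimal sketch of the stratified procedure, drawing an equal number from each layer and then splitting each draw at random between the two methods. The stratum sizes are copied from the example above, while the member labels, the function name `stratified_sample`, and the seed are assumptions made for this illustration.

```python
import random


def stratified_sample(strata, per_stratum, seed=None):
    """Draw `per_stratum` members from each layer and split them between A and B."""
    rng = random.Random(seed)
    assignment = {}
    for name, members in strata.items():
        # rng.sample returns the chosen members in random order,
        # so cutting the list in half is itself a random split.
        chosen = rng.sample(members, per_stratum)
        half = per_stratum // 2
        assignment[name] = {"A": chosen[:half], "B": chosen[half:]}
    return assignment


# Hypothetical member labels; only the stratum sizes come from the example.
strata = {
    "high": [f"high-{i}" for i in range(45)],
    "average": [f"average-{i}" for i in range(225)],
    "low": [f"low-{i}" for i in range(40)],
}

groups = stratified_sample(strata, per_stratum=30, seed=1)
for name, split in groups.items():
    print(name, len(split["A"]), len(split["B"]))
```

Each method ends up with 15 managers per layer, that is, 45 managers per method.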

• Multistage (cluster) sampling

The participants who make up the sample are selected indirectly, through the selection of the groups to which they belong.

(1) Identification and definition of the population.

Ex. The population is made up of all 5000 teachers from the schools of a given region of a country.

(2) Determining the sample size (descriptive research).

Ex. Sample size = 10% = 500.

(3) Establish the logical grouping unit (cluster).

Ex. The cluster is the school.

(4) A list of the groups that make up the population is drawn up.

Ex. The list is made up of the 100 schools in the region.

(5) The number of members in each group (cluster) is estimated.

Ex. Although the schools differ in the number of teachers, we estimate 50 teachers for each school.

(6) The number of groups is determined by dividing the sample size by the estimated size of the groups.

Ex. 500 / 50 = 10.

(7) The required number of groups is randomly selected using the table of random numbers or the ballot box.

Ex. We select 10 schools from the 100 schools in the region.

(8) All members of the selected groups are part of the sample.

Ex. All teachers in the 10 schools are part of the sample.
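The cluster procedure can be sketched in the same way. For simplicity the made-up data below gives every school exactly 50 teachers, whereas real schools would differ in size; the school and teacher labels and the function name `cluster_sample` are assumptions made for this illustration.

```python
import random


def cluster_sample(clusters, total_sample_size, estimated_cluster_size, seed=None):
    """Pick whole clusters at random and include all of their members."""
    rng = random.Random(seed)
    # Number of groups = sample size / estimated group size.
    n_clusters = total_sample_size // estimated_cluster_size
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample


# Made-up data: 100 schools with 50 teachers each (5000 teachers in total).
schools = {
    f"school-{s:03d}": [f"teacher-{s:03d}-{t:02d}" for t in range(50)]
    for s in range(100)
}

chosen_schools, teachers = cluster_sample(schools, 500, 50, seed=7)
print(len(chosen_schools), len(teachers))
```

When cluster sizes vary, the final sample will only approximately match the target of 500.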

Let us conclude.

The best way to obtain a representative sample is random sampling.

## Sample size and sample type

The sample size depends on the kind of research. For correlational and experimental research, 30 subjects are sufficient; for descriptive research, 1-10% of the population is needed, depending on the population size.

Regardless of the specific technique used, the main sampling steps are:

• identification of the population
• determining the required sample size
• selection of participants
• data collection

Simple random sampling is the best way to obtain a representative sample, or stratified sampling if we have a variable of interest (e.g., self-esteem).

The primary source of bias in sampling is the use of non-probabilistic methods.

When non-probabilistic techniques are used, it is usually difficult, if not impossible, to describe the population from which the sample was drawn and to generalize the results from the sample to that population.

## Dangers of small sample size

For example, we might be tempted to say that a mean obtained from a larger sample is always more accurate than a mean obtained from a smaller sample, which is not true.

The correct statement is: a mean obtained from a larger sample is more likely to be accurate than one obtained from a smaller sample. It is possible that, by chance, a mean obtained from a larger sample is farther from the true mean than one obtained from a smaller sample. This situation is simply less likely, and it becomes less likely the larger the difference in size between the two samples.
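A short simulation illustrates the point. It assumes a normally distributed population with an arbitrary mean and standard deviation, repeatedly draws one small and one large sample, and counts how often the larger sample's mean lands closer to the true mean. The function name and all parameter values are assumptions made for this sketch.

```python
import random
from statistics import fmean


def share_large_sample_closer(small_n, large_n, trials=2000, seed=0):
    """Share of trials in which the larger sample's mean is closer to the true mean."""
    rng = random.Random(seed)
    true_mean, sd = 100.0, 15.0  # arbitrary population parameters
    wins = 0
    for _ in range(trials):
        small = fmean(rng.gauss(true_mean, sd) for _ in range(small_n))
        large = fmean(rng.gauss(true_mean, sd) for _ in range(large_n))
        if abs(large - true_mean) < abs(small - true_mean):
            wins += 1
    return wins / trials


share = share_large_sample_closer(small_n=10, large_n=100)
print(f"Larger sample closer in {share:.0%} of trials")
```

The share is well above one half but below 100%: the larger sample is usually, not always, the more accurate one.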

Taken to the extreme, this means that the significance level of a test can be reached both with a small sample and a large effect size, and with a sufficiently large sample when the effect size is small. In other words, a small effect size can be compensated for by increasing the number of subjects, which raises the question of how relevant the research conclusion is.

Systematic error results from factors that are not related to the sample size. The factors that generate systematic error are related to imperfections of the sampling process, such as errors in the selection of the sample units, errors in the sampling frame, measurement errors, non-responses, answers that do not correspond to reality, refusal to participate during the investigation, etc.

## Customer Satisfaction Survey and Market research

Customer satisfaction surveys do not depend on a statistically significant sample size. What matters is that the answers are accurate and precise. It is vital to analyze carefully every response a customer has given in a customer satisfaction survey. All feedback, positive or negative, is important.

When it comes to market research, a statistically significant sample size helps a lot. Market surveys help you discover new information about customers and the market you want to enter. With such a survey, you will receive the latest information about the target market and about the customers who would buy your services or products.

## What is a sample size in research?

An appropriate sample size in research helps you find out as much information as possible about a specific target market or a certain type of customer.

## Calculating Sample Size For An AB Test

Any experiment that involves statistical inference requires a sample size calculation before the experiment begins. A/B tests (split tests) are no exception. Calculating the minimum number of visitors required for an A/B test before it begins prevents us from running the test on too small a sample and ending up with an "underpowered" test.

We establish three criteria before we start running the experiment:

1. The significance level for your experiment: A 5% significance level means that, if there is in fact no difference between the control and the variant, there is only a 5% chance of wrongly declaring a winner in your A/B test. This is often described as 95% "confidence." The threshold is an arbitrary one, chosen when the experiment is designed.
2. Minimum detectable effect: The smallest relevant difference between the conversion rates that you would like to detect.
3. The power of the test: The likelihood of detecting that difference between the control and the variant conversion rates, if it exists.
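The three criteria can be combined into a per-variant sample size. The sketch below uses a common textbook formula for comparing two proportions with a two-sided test under the normal approximation; it is one possible approach, not necessarily the one used by any particular A/B testing tool. The function name `ab_sample_size` and the example rates are assumptions, and the minimum detectable effect is taken as an absolute difference in conversion rate.

```python
import math
from statistics import NormalDist


def ab_sample_size(baseline_rate, min_detectable_effect, alpha=0.05, power=0.80):
    """Visitors needed per variant to detect an absolute lift in conversion rate."""
    p1 = baseline_rate
    p2 = baseline_rate + min_detectable_effect
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance level (two-sided)
    z_beta = NormalDist().inv_cdf(power)           # power of the test
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


# Baseline conversion of 10%, aiming to detect a lift to 12%.
print(ab_sample_size(baseline_rate=0.10, min_detectable_effect=0.02))
```

Halving the minimum detectable effect roughly quadruples the required number of visitors, which is why this value should be fixed before the test starts.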