Sample Notes: Data And Its Collection
1. Data and Its Collection
1.1 General Ideas of Sampling
- Population:
The complete group of items or people being studied.
Example: All students in a school.
- Sample:
A subset of the population selected for the study.
Used to make conclusions about the entire population without investigating every individual.
- Census:
A data collection method where every member of the population is surveyed.
  - Pros: Highly accurate
  - Cons: Time-consuming, expensive, often impractical
- Sampling:
The process of selecting a sample from the population.
- Representative Sample:
A sample that accurately reflects the population’s characteristics (e.g. age, gender, region).
- Why sampling is used:
  - Saves time and cost
  - Can still provide reliable results
  - Useful when population is too large to survey completely
1.2 Types of Sampling
1. Simple Random Sampling (SRS)
- Every member of the population has an equal chance of being selected.
- Method: Use of random number tables or random number generator.
- Example: Choosing 10 students randomly from a list of 200 students using a random number generator.
- Pros: Unbiased if truly random
- Cons: May not reflect subgroups in the population
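The SRS example above can be sketched in a few lines of Python. This is only an illustration: the student IDs 1 to 200 are a hypothetical stand-in for the real list, and the fixed seed is there only so the output is repeatable.

```python
import random

# Hypothetical roll list: IDs 1..200 stand in for the list of 200 students.
students = list(range(1, 201))

# A fixed seed is used only to make the example repeatable.
rng = random.Random(42)

# sample() draws 10 distinct students; every student is equally likely to be picked.
srs_sample = rng.sample(students, k=10)

print(sorted(srs_sample))
```

Because `sample()` draws without replacement, no student can be chosen twice.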
2. Systematic Sampling
- Select every nᵗʰ member from a list after choosing a random starting point.
- Example: If population = 100 and sample size = 10, the sampling interval is 100 ÷ 10 = 10, so select every 10ᵗʰ person starting from a random number between 1 and 10.
- Pros: Easy to implement
- Cons: May introduce bias if list is ordered in a pattern
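A minimal sketch of the systematic sampling example, assuming the population is simply a numbered list of 100 people (the list and the seed are hypothetical):

```python
import random

population = list(range(1, 101))            # hypothetical list of 100 people, numbered 1..100
sample_size = 10
interval = len(population) // sample_size   # sampling interval: 100 // 10 = 10

rng = random.Random(7)                      # fixed seed only for repeatability
start = rng.randint(1, interval)            # random start between 1 and 10 inclusive

# Take the person at position `start`, then every 10th person after that.
systematic_sample = population[start - 1::interval]

print(start, systematic_sample)
```

Only the starting point is random; once it is fixed, the rest of the sample is determined, which is why a patterned list can introduce bias.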
3. Stratified Sampling
- Population is divided into strata (groups), e.g., age, gender, region. Then, a proportional sample is taken from each stratum.
- Example: In a school of 60% girls and 40% boys, a sample of 100 would include 60 girls and 40 boys.
- Pros: More representative
- Cons: Requires knowledge of population structure
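The proportional allocation in the example can be sketched as follows. The school size of 500 and the ID labels are assumptions made for illustration; only the 60%/40% split and the sample of 100 come from the example above.

```python
import random

# Hypothetical school of 500 students: 300 girls (60%) and 200 boys (40%).
strata = {
    "girls": [f"G{i}" for i in range(1, 301)],
    "boys": [f"B{i}" for i in range(1, 201)],
}
total = sum(len(members) for members in strata.values())
sample_size = 100

rng = random.Random(1)  # fixed seed only for repeatability
stratified_sample = {}
for name, members in strata.items():
    # Proportional allocation: the stratum's share of the population times the sample size.
    n = round(sample_size * len(members) / total)
    # Within each stratum, selection is still random.
    stratified_sample[name] = rng.sample(members, n)

print({name: len(chosen) for name, chosen in stratified_sample.items()})
# → {'girls': 60, 'boys': 40}
```

With less convenient proportions, rounding each stratum separately may not add up exactly to the intended sample size, so the allocation would need a small adjustment.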
4. Quota Sampling
- Interviewers are told to select people who fit into specific categories (e.g., 10 boys, 5 girls). No random selection within groups.
- Example: Interviewing 20 people: 10 aged under 30 and 10 aged 30 or over.
- Pros: Faster and easier to conduct
- Cons: Can introduce selection bias
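To contrast with stratified sampling, here is a rough sketch of quota sampling. The names, ages, and the small quotas of 3 per group are all hypothetical, chosen to keep the example short; the point is that people are taken in the order they turn up, with no random selection.

```python
# Hypothetical passers-by as (name, age), in the order the interviewer meets them.
passers_by = [
    ("Ana", 24), ("Ben", 41), ("Cal", 35), ("Dee", 19), ("Eli", 52),
    ("Fay", 28), ("Gus", 30), ("Hal", 22), ("Ivy", 67), ("Jo", 26),
]
quotas = {"under_30": 3, "30_and_over": 3}  # small quotas to keep the example short

quota_sample = {group: [] for group in quotas}
for name, age in passers_by:
    group = "under_30" if age < 30 else "30_and_over"
    # No random selection: accept whoever comes along until the group's quota is full.
    if len(quota_sample[group]) < quotas[group]:
        quota_sample[group].append(name)

print(quota_sample)
# → {'under_30': ['Ana', 'Dee', 'Fay'], '30_and_over': ['Ben', 'Cal', 'Eli']}
```

Gus, Hal, Ivy, and Jo never had a chance of being included once the quotas were full, which is one way selection bias can creep in.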
1.3 Bias: How It Arises and Is Avoided
Bias:
A systematic error that skews results in a particular direction, leading to inaccurate conclusions.
Sources of Bias:
- Non-random sampling: Choosing only convenient people (e.g. friends, classmates).
- Leading questions: “How amazing was your experience?” vs “How would you rate your experience?”
- Non-response bias: When certain groups are less likely to respond.
- Small or unrepresentative sample: Sample doesn’t reflect the population.
- Interviewer influence: Body language or tone may influence answers.
How to Reduce Bias:
- Use random methods for selection.
- Ensure questions are neutral.
- Use larger sample sizes when possible (this reduces chance variation, but does not fix a biased selection method).
- Ensure the sample represents all key subgroups.
- Avoid self-selection bias (e.g., online polls where only some types of people respond).
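A small numerical illustration of how non-response can skew a result. The ratings are made up, and the rule that only satisfied customers respond is an assumption for the sake of the example.

```python
from statistics import mean

# Hypothetical satisfaction ratings (1 to 5) for a population of 10 customers.
ratings = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]

# Assumption for illustration: only customers who rated 4 or 5 bother to respond.
responders = [r for r in ratings if r >= 4]

population_mean = mean(ratings)      # true average across everyone
responder_mean = mean(responders)    # average seen by the survey

print(population_mean, responder_mean)
# → 3.2 4.5
```

The survey would report 4.5 when the true average is 3.2. Collecting more responses of the same kind would not close that gap, because the error comes from who responds, not from how many.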
1.4 General Ideas of Surveys
What is a Survey?
A method for collecting information from a sample using a set of questions.
Designing a Survey:
- Define objective – What information is needed?
- Target population – Who is being surveyed?
- Choose sample method – SRS, stratified, etc.
- Design the questionnaire
- Pilot test the survey – Try it on a small group first
- Collect responses
- Analyze and interpret data
Question Types:
- Closed questions: Have fixed options (e.g., Yes/No, Multiple Choice)
  - Easy to analyze but less detailed
- Open questions: Allow respondents to answer in their own words
  - More detail but harder to analyze
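The difference in ease of analysis can be seen in a short sketch. The questions and answers below are invented for illustration.

```python
from collections import Counter

# Hypothetical responses to a closed question: "Do you walk to school?"
closed_answers = ["Yes", "No", "Yes", "Yes", "No", "Yes"]

# Fixed options can be tallied directly.
tally = Counter(closed_answers)
print(tally["Yes"], tally["No"])
# → 4 2

# Hypothetical responses to an open question: "How do you get to school?"
open_answers = ["I walk", "bus, usually", "My dad drives me", "walk unless it rains"]

# Free text has no fixed categories: every answer here is different,
# so each one must be read and grouped by hand before it can be counted.
distinct_open = len(set(open_answers))
print(distinct_open)
# → 4
```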
Best Practices:
- Avoid leading questions
- Keep wording clear and simple
- Avoid double-barrelled questions (e.g. “Do you like sports and music?”)
1.5 Types of Data and Variables
1. Types of Data:
Quantitative Data:
Numerical data that can be measured or counted.
- Discrete: Takes specific values only (e.g., number of children: 0, 1, 2…)
- Continuous: Can take any value within a range (e.g., height: 161.2 cm, 163.8 cm…)
Qualitative Data:
Descriptive data that deals with categories or labels.
- Examples: Colors, brands, gender, opinions
2. Types of Variables:
- Independent variable: The one being changed or controlled in an experiment.
  - Example: Type of fertilizer used in a crop study
- Dependent variable: The one being measured.
  - Example: Growth of plants (cm)
- Categorical variable: Data that can be divided into specific groups or categories.
  - Example: Blood type (A, B, AB, O)
- Numerical variable: Data represented by numbers.
  - Example: Age, test scores
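The type of data decides which summaries make sense. The sketch below uses made-up records (the field names and values are hypothetical) with one discrete, one continuous, and one categorical variable.

```python
from collections import Counter
from statistics import mean

# Hypothetical records for five people; field names are illustrative only.
records = [
    {"children": 0, "height_cm": 161.2, "blood_type": "A"},
    {"children": 2, "height_cm": 163.8, "blood_type": "O"},
    {"children": 1, "height_cm": 170.5, "blood_type": "O"},
    {"children": 2, "height_cm": 158.0, "blood_type": "AB"},
    {"children": 0, "height_cm": 175.5, "blood_type": "O"},
]

# Quantitative, discrete: a count, so each value is a whole number.
mean_children = mean(r["children"] for r in records)

# Quantitative, continuous: a measurement, so any value in a range is possible.
mean_height = mean(r["height_cm"] for r in records)

# Qualitative (categorical): an average is meaningless, so count each category instead.
blood_counts = Counter(r["blood_type"] for r in records)

print(mean_children, round(mean_height, 1), blood_counts.most_common(1))
```

Averages are computed for the two numerical variables, while the categorical variable is summarized by frequency.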
Important Terms to Memorize
| Term | Definition |
|---|---|
| Population | The entire group of individuals or items |
| Sample | A selected part of the population |
| Census | Data collection from the whole population |
| Bias | An error that skews results |
| Random Sampling | Each member has equal chance of selection |
| Stratified Sampling | Sample from different groups (strata) in proportion |
| Quota Sampling | Fixed numbers from specific categories |
| Discrete Data | Only certain values possible |
| Continuous Data | Any value within a range |
| Qualitative Data | Descriptive, non-numeric |
| Quantitative Data | Measurable, numeric |
Example Exam Questions
- Define the term “census” and give one advantage and one disadvantage of using it.
- Explain the difference between stratified and quota sampling.
- Identify one source of bias in surveys and explain how it can be avoided.
- Give two examples each of discrete and continuous data.
- Distinguish between qualitative and quantitative data using examples.