1. Which type of chart is most appropriate for visualizing the distribution of a single continuous variable?
A. Bar chart
B. Pie chart
C. Histogram
D. Scatter plot
2. Which statistical technique is used to reduce the dimensionality of data while retaining as much variance as possible?
A. Linear Regression
B. Analysis of Variance (ANOVA)
C. Principal Component Analysis (PCA)
D. Chi-Squared Test
3. What is multicollinearity in regression analysis?
A. A perfect linear relationship between the dependent and independent variables.
B. A high correlation between two or more independent variables.
C. The absence of any correlation between independent variables.
D. The presence of outliers in the data.
4. Which of the following is NOT a valid property of a probability measure?
A. The probability of any event is between 0 and 1, inclusive.
B. The probability of the sample space is equal to 1.
C. The probability of the union of disjoint events is the sum of their probabilities.
D. The probability of an event can be a negative number if the event is rare.
5. For a normally distributed population, approximately what percentage of data falls within one standard deviation of the mean?
A. 50%
B. 68%
C. 95%
D. 99.7%
6. In regression analysis, what does the R-squared value represent?
A. The standard error of the regression model.
B. The correlation coefficient between the independent and dependent variables.
C. The proportion of variance in the dependent variable that is predictable from the independent variable(s).
D. The p-value for the overall significance of the regression model.
7. What does the Law of Large Numbers state?
A. As the number of independent trials grows, the average of the results converges to the expected value.
B. The probability of rare events increases with more trials.
C. The sample mean is always equal to the population mean.
D. Large samples are always more biased than small samples.
8. What is the difference between a population and a sample?
A. A population is a subset of a sample.
B. A sample is a subset of a population.
C. A population is used in statistics, while a sample is used in probability.
D. There is no difference; the terms are interchangeable in statistics.
9. In probability theory, what is a sample space?
A. A subset of the population being studied.
B. The set of all possible outcomes of a random experiment.
C. The average of all possible outcomes.
D. The probability of a specific event occurring.
10. What is the purpose of stratified sampling?
A. To ensure every individual in the population has an equal chance of being selected.
B. To reduce bias by randomly selecting participants.
C. To ensure representation from different subgroups within the population.
D. To simplify the data collection process.
11. Which type of error occurs when we reject a true null hypothesis in hypothesis testing?
A. Type I error
B. Type II error
C. Standard error
D. Sampling error
12. What is the purpose of cross-validation in machine learning and statistics?
A. To increase the size of the training dataset.
B. To evaluate the performance of a model on unseen data and prevent overfitting.
C. To simplify the model by reducing the number of features.
D. To improve the computational efficiency of model training.
13. What is the expected value of a random variable?
A. The most likely value of the random variable.
B. The average value of the random variable over many trials.
C. The median value of the random variable.
D. The maximum value of the random variable.
14. What is the main assumption of parametric statistical tests?
A. Data must be categorical.
B. Data must follow a specific distribution (e.g., normal distribution).
C. Sample sizes must be small.
D. Variables must be independent.
15. What is the fundamental difference between probability and statistics?
A. Probability deals with observed data, while statistics predicts future events.
B. Probability reasons from general principles to specific instances, while statistics infers from specific instances to general principles.
C. Probability is used in social sciences, and statistics is used in natural sciences.
D. There is no difference; the terms are interchangeable.
16. What is the purpose of a confidence interval?
A. To estimate the exact value of a population parameter.
B. To provide a range of values that is likely to contain the population parameter.
C. To determine if the sample mean is significantly different from zero.
D. To calculate the probability of a Type I error.
17. What is the difference between descriptive and inferential statistics?
A. Descriptive statistics is used for qualitative data, while inferential statistics is for quantitative data.
B. Descriptive statistics summarizes data, while inferential statistics uses sample data to make generalizations about a population.
C. Descriptive statistics is more complex than inferential statistics.
D. There is no significant difference between them; they are both used for data analysis.
18. What does statistical power refer to in hypothesis testing?
A. The probability of making a Type I error.
B. The probability of making a Type II error.
C. The probability of correctly rejecting a false null hypothesis.
D. The probability of correctly accepting a true null hypothesis.
19. What is the role of the degrees of freedom in statistical tests like the t-test?
A. To measure the spread of the data.
B. To adjust for the sample size and the number of parameters being estimated, influencing the shape of the test statistic's distribution.
C. To calculate the p-value directly.
D. To determine the level of significance.
20. Which of the following is a measure of linear association between two variables?
A. Variance
B. Standard deviation
C. Correlation coefficient
D. Expected value
21. In hypothesis testing, what is the meaning of a p-value?
A. The probability that the null hypothesis is true.
B. The probability of observing data as extreme as, or more extreme than, the data observed, assuming the null hypothesis is true.
C. The probability of rejecting the null hypothesis when it is false.
D. The probability of accepting the null hypothesis when it is true.
22. What does variance measure in statistics?
A. The central tendency of a dataset.
B. The number of data points in a dataset.
C. The spread or dispersion of data points around the mean.
D. The median of a dataset.
23. Which of the following is a measure of central tendency?
A. Standard deviation
B. Variance
C. Median
D. Range
24. Which of the following distributions is best suited for modeling the number of successes in a fixed number of independent Bernoulli trials?
A. Normal distribution
B. Poisson distribution
C. Binomial distribution
D. Exponential distribution
25. What is the difference between independent and dependent events in probability?
A. Independent events always have the same probability, while dependent events have different probabilities.
B. Independent events occur simultaneously, while dependent events occur sequentially.
C. The outcome of an independent event does not affect the probability of another event, while the outcome of a dependent event does affect the probability of subsequent events.
D. Dependent events are more common in real-world scenarios than independent events.
26. What is the purpose of bootstrapping in statistics?
A. To increase the sample size of a dataset.
B. To estimate the sampling distribution of a statistic by resampling with replacement from the original sample.
C. To remove outliers from a dataset.
D. To standardize data to have a mean of zero and a standard deviation of one.
27. In time series analysis, what does autocorrelation refer to?
A. The correlation between two different time series.
B. The correlation of a time series with itself at different time lags.
C. The average value of a time series over time.
D. The trend component in a time series.
28. What is the role of Bayes' Theorem in probability and statistics?
A. To calculate the probability of independent events.
B. To update the probability of an event based on new evidence.
C. To determine the sample size needed for a study.
D. To test the goodness of fit of a statistical model.
29. What is the Central Limit Theorem?
A. The sum of independent and identically distributed random variables tends to follow a uniform distribution.
B. The sample mean of a sufficiently large number of independent and identically distributed random variables with finite variance will be approximately normally distributed, regardless of the original distribution's form.
C. The variance of a sample is always smaller than the variance of the population.
D. The probability of any event in a sample space is equal.
30. Which of the following is an example of a discrete random variable?
A. The height of a student.
B. The temperature of a room.
C. The number of cars passing a point on a highway in an hour.
D. The time it takes to run a marathon.
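
Readers who want to check some of their answers empirically can do so by simulation. The following is a minimal sketch in Python (standard library only) related to questions 5, 7, and 26. The function names, sample sizes, and seeds are illustrative assumptions and are not part of the quiz.

```python
import random
import statistics


def fraction_within_one_sd(n=100_000, seed=0):
    """Question 5: share of standard-normal draws within one SD of the mean."""
    rng = random.Random(seed)
    draws = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return sum(1 for x in draws if abs(x) <= 1.0) / n


def mean_of_die_rolls(n=100_000, seed=0):
    """Question 7: average of n fair six-sided die rolls (expected value 3.5)."""
    rng = random.Random(seed)
    rolls = [rng.randint(1, 6) for _ in range(n)]
    return statistics.fmean(rolls)


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Question 26: means of resamples drawn with replacement from `sample`."""
    rng = random.Random(seed)
    k = len(sample)
    # rng.choices samples with replacement, which is what the bootstrap needs.
    return [statistics.fmean(rng.choices(sample, k=k)) for _ in range(n_resamples)]


if __name__ == "__main__":
    print("Share within one SD:", fraction_within_one_sd())
    print("Mean of die rolls:", mean_of_die_rolls())
    means = bootstrap_means([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
    print("Bootstrap SD of the sample mean:", statistics.stdev(means))
```

Increasing n in the first two functions should move the printed values closer to their theoretical counterparts, and the spread of the bootstrap means gives a rough estimate of the standard error of the sample mean.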