The science that applies statistical theory and mathematical principles to research in medicine, biology, environmental science, health and related fields.

The __________ statistic, named after its creators, tests the hypothesis that there is no autocorrelation of one time lag present in the errors obtained from forecasting.

**Durbin-Watson (DW)**
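
As a rough illustration of how this statistic is computed, the sketch below uses the usual textbook form: the sum of squared successive differences of the errors divided by the sum of squared errors. The function name and the error values are made up for the example.

```python
def durbin_watson(errors):
    """Sum of squared successive differences divided by the sum of squared errors."""
    numerator = sum((errors[t] - errors[t - 1]) ** 2 for t in range(1, len(errors)))
    denominator = sum(e ** 2 for e in errors)
    return numerator / denominator


# Hypothetical forecast errors that flip sign every period (negative lag-1 autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0]
dw = durbin_watson(alternating)
print(dw)
```

Values near 2 are consistent with no lag-1 autocorrelation, values toward 0 suggest positive autocorrelation, and values toward 4 suggest negative autocorrelation.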

A procedure of combining evidence in different reports on the same aspect. If different trials on the same regimen report varying efficacy, they can be combined to come to a unified conclusion, which may command substantially more confidence than the result of any one of the individual trials.

**Meta-analysis**

In regression analysis, a ______________ is one that takes the values 0 or 1 to indicate the absence or presence of some categorical effect (month, quarter, strike, fire loss) that may be expected to shift the outcome.

**dummy variable**

If x objects have a certain characteristic then the sample proportion “p” is: p = x*n.

- True
- **False**
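
The statement is false because the sample proportion divides the count by the sample size rather than multiplying. A minimal sketch, reusing the 15-out-of-50 figures from the snow day question elsewhere in this set (the function name is only illustrative):

```python
def sample_proportion(x, n):
    """Sample proportion: number of objects with the characteristic over the sample size."""
    return x / n


# 15 of 50 sampled students believe Monday will be a snow day.
p_hat = sample_proportion(15, 50)
print(p_hat)
```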

Cause-specific rate is the rate obtained when numerator is restricted to a particular cause (e.g., of morbidity or of mortality).

- **True**
- False

A good hypothesis statement should: include an “if” and “________________” statement.

**then**

A population has a mean of 60 and a standard deviation of 5. A random sample of 16 measurements is drawn from this population. Describe the sampling distribution of the sample means by computing its standard deviation. Assume that the population is infinite.

- 1.96
- 1.55
- 1.35
- **1.25**
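
The keyed answer follows from the standard error formula for an infinite population, σ/√n. A short sketch of that arithmetic (the function name is an assumption for illustration):

```python
import math


def standard_error(sigma, n):
    """Standard deviation of the sample mean for an infinite population."""
    return sigma / math.sqrt(n)


# Population standard deviation 5, sample size 16.
se = standard_error(5, 16)
print(se)
```

The same function gives 0.6 for the heights question in this set (σ = 3, n = 25).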

A _____ is a subset drawn from a larger population.

**sample**

The set of characteristics such as age, disease and severity, which are necessary in a subject to be considered eligible for inclusion in the study.

**Inclusion criteria**

A ____________ shows the number of observations falling into each of several ranges of values, which are typically portrayed as histograms.

**distribution**

A hypothesis is an ____________ about something in the world around you.

**educated guess**

The process of reaching a decision after considering probabilities of various outcomes and value judgments regarding the utility of those outcomes.

**Decision analysis**

A prospective study of a cohort for a specified period, generally to observe the occurrence of an outcome of interest, and thereby determine the incidence.

**Cohort study**

A good hypothesis statement should: Have design _______________.

**criteria**

Regardless of the shape of the population distribution, this theorem states that the sampling distribution of the mean of n independent sample values will approach the normal distribution as the sample size increases.

**central limit theorem**
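
The theorem can be checked informally by simulation. The sketch below draws repeated samples from a strongly skewed exponential population; the sample size, number of repetitions, and seed are arbitrary choices for the example, not values from the quiz.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # fixed seed so the run is repeatable

n = 50        # sample size (assumed for the example)
reps = 2000   # number of repeated samples (assumed for the example)

# Exponential population with mean 1 and standard deviation 1 (right-skewed).
sample_means = [mean(random.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

center = mean(sample_means)    # should sit close to the population mean, 1
spread = pstdev(sample_means)  # should sit close to 1 / sqrt(n)
print(center, spread)
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is not.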

The heights of female college students are normally distributed with a mean of 68 inches and a standard deviation of 3 inches. If 25 students are randomly drawn from the population, what would be the expected mean of the resulting sampling distribution of the means?

- **68**
- 90
- 45
- 88

The result stating that the chance of a summative measure such as sample mean following a Gaussian distribution rapidly increases in almost all practical situations as the number of individuals in a sample increases (i.e., sample size becomes large).

**Central Limit Theorem**

This condition exists when the errors do not have a constant variance across an entire range of values.

**heteroscedasticity**

An extraneous factor that could be an explanation of the outcome of interest in addition to the factor under study, so that its effect cannot be differentiated from the other: for example, dietary factors when examining the relationship between smoking and cervical cancer. Presence of unaccounted confounders decreases the validity of a study.

**Confounder**

The statistical procedure to classify units or individuals into groups such that the units are similar within each group but dissimilar across groups: generally used when the number and nature of the groups are not known.

**Cluster analysis**

Inverse of the standard error of the estimate or a derivative of this inverse.

**Efficiency of an estimate**

In hypothesis testing, the critical value is the threshold for significance.

**critical value**

In the game of Scrabble, each player begins by drawing 7 tiles from a bag containing 100 tiles. There are 42 vowel, 56 consonants, and 2 blank tiles in the bag. Cait chooses an SRS of 7 tiles. Let p-hat be the proportion of vowels in her sample.

- **Yes**
- No

A sample proportion is where a random sample of objects n is taken from a population P.

- **True**
- False

The simple arithmetic average of a distribution of variable values (or scores), the _____________ provides a single, concise numerical summary of a distribution.

- **mean**
- median
- mode

The normality assumption is at the core of a majority of standard statistical procedures, and it is important to be able to test this assumption.

**Lilliefors test**

A distribution is the arrangement of data by the values of one variable in order, from ______________.

- left to right
- **low to high**
- right to left
- high to low

_____ are a collection of statistical tools which are used to quantitatively describe or summarize a collection of data.

**Descriptive statistics**

If you are using a significance level of .05, a ______________ allots all 5 percent to testing the statistical significance in the one direction of interest.

**one-tailed test**
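
To see what allotting all 5 percent to one direction means in practice, the sketch below compares critical values for a z test, which is an assumption made here for simplicity. It uses `NormalDist` from Python's standard library.

```python
from statistics import NormalDist

alpha = 0.05
z = NormalDist()  # standard normal distribution

# One-tailed: the whole 5 percent sits in one tail.
one_tailed_critical = z.inv_cdf(1 - alpha)

# Two-tailed: 2.5 percent in each tail.
two_tailed_critical = z.inv_cdf(1 - alpha / 2)

print(round(one_tailed_critical, 3), round(two_tailed_critical, 3))
```

The one-tailed threshold is lower, so a result in the direction of interest reaches significance more easily than under a two-tailed test.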

The ______________, also called root mean squared error (RMSE), is the square root of the mean squared residual term from the ANOVA table of the summary output.

**standard error of the estimate (SEE)**

The tendency of getting poor output or poor outcome when the inputs or efforts are poor.

**Garbage-in, garbage-out syndrome**

The mean of sampling distribution of the proportion, P, is a special case of the _____.

**sampling distribution of the mean**

A less scientific but quick method to arrive at a consensus among experts.

**Delphi method**

The _____________ hypothesis is always the accepted fact.

**null**

28% of all Woodrow students believe Monday will be a snow day. You take a sample of 50 students and find that 15 of them believe Monday will be a snow day. What does 28 represent?

- E
- N
- S
- **P**

It is possible to fit a polynomial of any number of terms to a set of data.

**polynomial fitting**

The process of going into the depth of a phenomenon, data-set, thought, etc., and looking at its various components.

**Analysis**

The Gallup Poll asked a random sample of 1785 adults whether they attended church during the past week. Let p-hat be the proportion of people in the sample who attended church. A newspaper report claims that 40% of all U.S. adults went to church last week. Suppose this claim is true. What is the mean of the sampling distribution of p-hat?

- 0.55
- 0.66
- 0.5
- **0.4**

Hypothesis should be testable, either by experiment or ________________.

**observation**

The mean of the sampling distribution of the mean is the same as the _____.

**population mean**

The 300 customers who called the call center spend an average of 45 minutes on hold, with a standard deviation of 12 minutes. What is the expected average of the sampling distribution for a sample of 150 randomly selected customers?

- 55
- **45**
- 50
- 60

A standardized measure of the association or mutual dependence between two variables, say, X and Y.

**correlation coefficient**

Census is the survey of the entire population.

- **True**
- False

The probability of occurrence of an event such as disease when some a-priori information such as sign-symptoms are known: denoted by P(A/B) where after slash (/) sign is what is known a-priori.

**Conditional probability**

When two or more comparisons or other statistical tests of hypothesis are done on the same set of data, the total probability of alpha error can increase much beyond the prefixed level such as 5%. This is known as "blinding".

- True
- **False**

A characteristic that is assessed only in two categories such as ascites present or absent (or yes/no), or gender as male or female.

**Binary variable**

Hypothesis testing in statistics is a way for you to _______________ of a survey or experiment to see if you have meaningful results.

**test the results**

The first could be called missed diagnosis and the second as misdiagnosis. In place of healthy/diseased this could be any other categorization.

**Misclassification**

The average speed of 1500 vehicles traveled on a stretch of highway that day is 67 miles per hour with a standard deviation of 3.5 miles per hour. If 100 vehicles are randomly selected as samples, what would be the standard error of the resulting sampling distribution of sample means?

- 0.39
- **0.34**
- 0.56
- 0.78
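
The keyed answer appears to assume sampling without replacement from a finite population, since plain σ/√n would give 0.35 here. A sketch with the finite population correction applied (the function name is illustrative, and the use of the correction is inferred from the answer key rather than stated in the question):

```python
import math


def standard_error_fpc(sigma, n, population_size):
    """Standard error of the mean with the finite population correction."""
    correction = math.sqrt((population_size - n) / (population_size - 1))
    return (sigma / math.sqrt(n)) * correction


# 100 vehicles sampled from 1500, standard deviation 3.5 mph.
se_vehicles = standard_error_fpc(3.5, 100, 1500)
print(round(se_vehicles, 2))
```

The same formula reproduces the keyed 0.69 for the call center question (σ = 12, n = 150, N = 300).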

A ______________ is calculated by subtracting the forecast value from the actual value to give an error value for each forecast period. In forecasting, this term is commonly used as a synonym for residual.

**forecast error**

This consists of calculations that provide information about levels of variability within a regression model and form a basis for tests of significance.

**ANOVA**

The formula for the mean of the sampling distribution of the mean is μM = μ, where μM is the _____ of the mean.

**mean of the sampling distribution**

A _____________ is the probability that the observed result, or a result more extreme, could be obtained if the null hypothesis is true.

**p-value**

______________ are so called because the assumptions underlying their use are fewer and weaker than those associated with parametric tests

**Nonparametric or distribution-free tests**

This is simply the SEE divided by the average of the dependent variable.

**coefficient of variation**

The set of words that describes the essential features of a study.

**Keywords**

Variety of causes of death: some people meet death slowly such as by cancer, and some sudden such as by myocardial infarction.

**Death spectrum**

28% of all Woodrow students believe Monday will be a snow day. You take a sample of 50 students and find that 15 of them believe Monday will be a snow day. What does 50 represent?

- I
- P
- N (capital)
- **n**

The ___________ is calculated by dividing MSR (mean squared regression) by MSE (mean squared error), or explained variance by unexplained variance.

**F ratio or F statistic**

Increasing the sample size of an opinion poll will ___________ the variability of the estimates made from the data collected in the poll.

- decrease
- **reduce**
- stabilize

A population has a mean of 60 and a standard deviation of 5. A random sample of 16 measurements is drawn from this population. Describe the sampling distribution of the sample means by computing its mean. Assume that the population is infinite.

- **60**
- 75
- 70
- 65

If I…(do this to an independent variable)….then (this will happen to the _________________).

**dependent variable**

One of the most important measures of dispersion, the _____________ is the difference between the maximum and minimum values of a distribution.

- **range**
- variance
- IQR
- standard deviation

A person with disease classified as without disease. In place of disease, false negativity can be for any other attribute.

**False negative**

A nonparametric test for comparing central tendency in three or more groups.

**Kruskal-Wallis test**

When the sample is small, the sampling distribution of a proportion will have an approximate normal distribution.

- True
- **False**

What does this symbol represent?

- sample
- variable
- **population proportion**
- ratio

A sampling distribution is a graph of a statistic for your sample data.

- **True**
- False

The heights of female college students are normally distributed with a mean of 68 inches and a standard deviation of 3 inches. If 25 students are randomly drawn from the population, what would be the standard error of the resulting sampling distribution of the means?

- 0.5
- **0.6**
- 0.1
- 0.3

If you are going to propose a hypothesis, it’s customary to write a ______________.

**statement**

A trial with the objective to examine if a new regimen is different from another regimen by more than a prespecified medically unimportant margin.

**Equivalence trial**

Identification data of a document containing the authors' names, title, publication name, volume, publication date, page numbers, etc.

**Citation**

____________ is the Excel function that calculates the kurtosis of a data set’s distribution.

**KURT**

For data sets having a normal, bell-shaped distribution, approximately 68 percent of the data values are within 1 standard deviation of the mean; approximately 95 percent are within 2 standard deviations of the mean; and approximately 99.7 percent (nearly all) are within 3 standard deviations of the mean.

**empirical rule**
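
The three percentages can be recovered from the normal cumulative distribution function. A small check using `NormalDist` from Python's standard library (the helper name `share_within` is made up for the example):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1


def share_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```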

If we could take many such samples, the collection of possible values of the statistic would follow its _______________.

**sampling distribution**

A summary of the death and survival pattern of a group of people—generally for the entire population of an area, but can be used for patients of a particular disease also.

**Life table**

The Gallup Poll asked a random sample of 1785 adults whether they attended church during the past week. Let p-hat be the proportion of people in the sample who attended church. A newspaper report claims that 40% of all U.S. adults went to church last week. Suppose this claim is true. Calculate the standard deviation of the sampling distribution of p-hat.

- 0.25652
- 0.0045
- 0.8987
- **0.0116**
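
The keyed value comes from the usual formula for the standard deviation of a sample proportion, √(p(1 − p)/n). A sketch of the arithmetic, with an illustrative function name:

```python
import math


def sd_of_p_hat(p, n):
    """Standard deviation of the sampling distribution of a sample proportion."""
    return math.sqrt(p * (1 - p) / n)


# Claimed population proportion 0.40, sample of 1785 adults.
sd = sd_of_p_hat(0.4, 1785)
print(round(sd, 4))
```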

In a population of five university students with GPAs of 2.5, 2.3, 1.7, 1.4, and 1.1, a sample of three students is considered. What would be the standard deviation of the resulting sampling distribution?

- 0.88
- 0.56
- **0.22**
- 0.32
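
Because the population is tiny, the sampling distribution can be listed in full. The sketch below enumerates every possible sample of three students drawn without replacement, which is an assumption about how the question intends the sampling to work.

```python
from itertools import combinations
from statistics import mean, pstdev

gpas = [2.5, 2.3, 1.7, 1.4, 1.1]

# All 10 possible samples of three students, and the mean GPA of each.
sample_means = [mean(sample) for sample in combinations(gpas, 3)]

mean_of_means = mean(sample_means)  # matches the population mean
sd_of_means = pstdev(sample_means)  # standard deviation of the sampling distribution

print(round(mean_of_means, 2), round(sd_of_means, 2))
```

The mean of the sample means equals the population mean of 1.8, and the standard deviation rounds to the keyed 0.22.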

When the error terms remaining after application of a forecasting method show autocorrelation, it indicates that the forecasting method has not removed all of the pattern from the data.

**autocorrelated errors**

A difference in time between an observation and a previous observation.

**lag**

It’s good science to let people know if your study results are solid, or if they could have happened by chance. The usual way of doing this is to test your results with a _____________.

**p-value**

You can have the Y-axis on a logarithmic scale instead of a linear one.

**log scale**

How well the actual observations fit into a specified pattern.

**Goodness of fit**

Bibliography is a list of citations of the related literature.

- **True**
- False

The frequency of desired outcome per unit of resource inputs such as time, money and manpower.

**Efficiency**

The technique of estimating a smooth trend, usually by taking weighted averages of observations.

**smoothing**

As the sample size increases, the mean of the _____ of the mean will approach the population mean of μ.

**sampling distribution**

When there are more scores toward one end of the distribution than the other, this results in _____________.

- positive
- above average
- negative
- **skew**

A _____ is a value which is generated from a population.

**parameter**

The sampling distribution of a proportion is when you repeat your survey for all possible samples of the population.

- **True**
- False

A form of regression analysis where the observations are measured at the same point in time or over the same time period but differ along another dimension.

**cross-sectional model**

The statistical procedure to discover a construct out of data that can possibly explain the variation and relationship among different variables.

**Factor analysis**

The Gallup Poll asked a random sample of 1785 adults whether they attended church during the past week. Let p-hat be the proportion of people in the sample who attended church. A newspaper report claims that 40% of all U.S. adults went to church last week. Suppose this claim is true. Calculate the standard deviation of the sampling distribution.

- 0.2514
- 0.2585
- **0.0116**
- 0.5264

The 300 customers who called the call center spend an average of 45 minutes on hold, with a standard deviation of 12 minutes. What is the standard error of the sampling distribution for a sample of 150 randomly selected customers?

- 0.50
- 0.58
- 0.79
- **0.69**

The heights of children of exceptionally tall (or short) parents “regress” to the mean of the population

**regression**

A person without disease classified as with disease. In place of disease, false positivity can be for any other attribute

**False positive**

Similar course of the disease process in the two regimens under comparison: also evaluated in terms of comparable bioavailability of drug products, say, within 80% to 125% with respect to area under the concentration curve and Cmax.

**Bioequivalence**

A plot of the residuals versus a z value (or cumulative normal percentile) derived from the normal probability distribution for the ranking location of the residual.

**normal probability plot**

The average speed of 1500 vehicles traveled on a stretch of highway that day is 67 miles per hour with a standard deviation of 3.5 miles per hour. If 100 vehicles are randomly selected as samples, what would be the mean of the resulting sampling distribution of sample means?

- **67**
- 68
- 69
- 70

Using a computer to solve problems without understanding the implications of the underlying procedure. This is known as the "black box approach".

- **True**
- False

A statistic is unbiased if the mean of the sampling distribution is _____________ to the value of the population mean.

- greater
- lesser
- **equal**

In a population of five university students with GPAs of 2.5, 2.3, 1.7, 1.4, and 1.1, a sample of three students is considered. What would be the mean of the resulting sampling distribution?

- **1.8**
- 3.6
- 2.5
- 4.9

A _____ is a selected individual or group representing the full set of members of a certain group of interest.

**population**

The _____ theorem tells us that if we have a large number of independent, identically distributed variables, the distribution will approximately follow a normal distribution.

**central limit**

The ______________ is the score of a distribution residing at the 50th percentile, separating the top and bottom 50 percent of scores.

- mode
- **median**
- mean

The technique of multiple regression is an extension of simple regression.

**multiple regression**

This is a statement of what a statistical hypothesis test is set up to establish.

**alternative hypothesis**

A linear combination of individual forecasts to assist in obtaining a more accurate forecast.

**composite regression model**

An international organisation of producers and consumers of medical research that helps to clarify the research achievements, particularly health care interventions such as drugs, diet alteration and behavior change

**Cochrane Collaboration**

____________ are used to draw inferences about a population from a sample.

**Inferential statistics**

It equals the change in Y for each unit change in X.

**slope**

A _____ quantity is a quantity without a physical unit and is thus a pure number.

**dimensionless**

The probability of occurrence of one of two or more mutually exclusive events is the sum of the probabilities of their individual occurrence

**Addition rule (of probability)**

_____ refers to the ability to draw conclusions about the characteristics of the population as a whole based on the results of data collected from a sample.

**Generalizability**

A good hypothesis statement should: be based on information in ______________ research.

**prior**

If a forecast variable Y is regressed against several explanatory variables X1, X2, ..., Xk, then the estimated Y value is designated Ŷ.

**multiple correlation coefficient**

If you had to choose a "best statistic" to describe a population, which of the following would be best.

- **low bias, low variability**
- low bias, high variability
- high bias, high variability
- high bias, low variability

The area from the ROC curve to the base: used as an indicator of the efficacy of a test in terms of sensitivity and specificity – can be used to compare performance of various tests.

**Area under the curve**

A ______________ is a set of ordered observations of a phenomenon at equally spaced time points.

**time series**

The remaining portion of life at any age that would be spent without any morbidity

**Healthy life expectancy**

