Terms in this set (77)
which of the following statements is true about variables?
1.Nominal variables are expressed by numbers
2.A variable is something that can be measured but does not vary
3.Typically, we have two kinds of variables: discrete and continuous
4.Categorical variables are not discrete
Typically, we have two kinds of variables: discrete and continuous
which of the following variables is NOT a discrete variable?
1.Review out of 5 stars for service at the DMV
2.Average temperature across counties in Arizona
3.Number of "heads" in five coin flips
4.Number of moons around a planet
average temperature across counties in Arizona
which of the following is a discrete variable?
1.Weight of students in class
2.Distance traveled between classes
3.Number of red marbles in a jar
4.Time it takes to get to school
number of red marbles in a jar
Which of the following is a continuous variable?
1.Number of "heads" when flipping three coins
2.Height of students in class
3.Students' final letter grades
4.Number of students present
height of students in a class
which of the following statements is true about the median and the mean?
1.The mean is the middle number in a sorted set of observations
2.Median is a measure of the variation in a distribution
3.Extreme observations do not affect calculation of the median
4.If there is no middle number of a list of numbers, then we can say that the list does not have a median
extreme observations do not affect the calculation of the median
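A minimal Python sketch, using made-up numbers, of why an extreme observation moves the mean but leaves the median where it was. The lists below are hypothetical and only illustrate the point of this card.

```python
import statistics

# Hypothetical observations; the second list swaps one value for an extreme one.
values = [2, 3, 5, 7, 8]
with_outlier = [2, 3, 5, 7, 80]

mean_before = statistics.mean(values)
mean_after = statistics.mean(with_outlier)

median_before = statistics.median(values)
median_after = statistics.median(with_outlier)

# The mean shifts toward the extreme value; the middle observation does not change.
print(mean_before, mean_after)
print(median_before, median_after)
```

In this example the median stays at the middle observation because only the largest value changed.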
which of the following statements is true?
1.Standard deviation is the mean distance of the observations from their median
2.Standard deviation is a measure of center for a random variable
3.Mean and median are measures of center for a random variable
4.Histograms are different from frequency distributions
mean and median are measures of center for a random variable
which of the following statements is true about percentiles?
1.The 50th percentile is the mean of a distribution
2.The 25th percentile divides the data into two parts: 25% of the data at or above its value and 75% at or below its value
3.Percentiles are a quick way to understand the mean of a distribution without generating a chart
4.The 50th percentile is the median of a distribution and divides the data into two equal parts
the 50th percentile is the median of a distribution and divides the data into two equal parts
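As a rough check of this card, the Python sketch below computes quartile cut points for a hypothetical list and compares the middle one with the median. It assumes the default method of `statistics.quantiles`; other percentile conventions can give slightly different 25th and 75th percentiles.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10]  # hypothetical observations

# quantiles(..., n=4) returns the three quartile cut points:
# the 25th, 50th, and 75th percentiles.
q25, q50, q75 = statistics.quantiles(data, n=4)
median = statistics.median(data)

# The 50th percentile and the median are the same value.
print(q25, q50, q75, median)
```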
which of the following statements is true about populations and samples?
1.In most cases, we shouldn't worry about making sure our sample is randomly selected
2.Sometimes, the sample average we measured does not agree with the population average
3.The sample average always agrees with population average
4.When we use random sample, it will eliminate potential bias completely
Sometimes, the sample average does not agree with the population average
which of the following is true about probability distributions?
1.The chi-square distribution is not related to the t-distribution
2.A probability distribution shows the likelihood of occurrence of specific outcomes of a random variable
3.The normal distribution and the T distribution have extremely different shapes
4.μ is the symbol for the true standard deviation and sigma is the symbol for the true mean of a distribution
a probability distribution shows the likelihood of occurrence of specific outcomes of a random variable
which of the following statements is true about charts?
1.Charts can help us understand individual variables' distributions but cannot help us understand the relationships among variables
2.Scatter plots help us to see the frequency distribution of a variable
3.Histograms help us understand how variables behave together
4.Charts can help us understand individual variables' distributions and relationships among variables
Charts can help us understand individual variables' distributions and relationships among variables
which of the following statements is not true about scatterplots?
1.We don't need scatterplots; we can get the information we need from means, medians, and standard deviations
2.Once we can measure things, we can better manage them
3.Scatterplots show the relationship between two variables
4.In a scatterplot, a single dot represents the values of the two variables we're looking at for a single observation
we don't need scatterplots; we can get the information we need from the mean, median, and standard deviations
which of the following statements is true about lines?
1.The standard formula for a line is y=a+bx and a and b are numbers representing the intercept and slope, respectively
2.For example y=2-5x, 2 is the slope of the line and -5 is the intercept
3.The standard formula for a line is y=a+bx and a and b are numbers representing the slope and intercept, respectively
4.For example y=2-5x, 2 is the intercept and 5 is the slope
the standard formula for a line is y=a+bx and a and b are numbers representing the intercept and slope
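Written out with the example used in the options above, the roles of the two numbers are:

```latex
\begin{align*}
y &= a + bx && \text{($a$: intercept, $b$: slope)} \\
y &= 2 - 5x && \Rightarrow\ a = 2,\ b = -5
\end{align*}
```

The sign belongs to the slope, so the slope in the example is -5, not 5.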
which of the following statements is not true about R?
1.R is an open-source programming language
2.R can be used for statistical programming
3.R can be used for data manipulation
4. Excel can do everything R can do
excel can do everything R can do
R is an open-source programming language. Which of the following statements is NOT true about R?
1.R has versions for both Mac and Windows computers
2.It is free for all of us to use
3.Individuals cannot contribute any new code to R
4.We can use it to do data manipulation even when we find code bugs
individuals cannot contribute any new code to R
which of the following statements is true about the R console?
1.We cannot directly type commands into the console
2.We can use the console to run complicated code, but we cannot use it to do simple arithmetic operations
3.We can type commands into the console but it will not show any results
4.It is the user interface for the actual engine doing the computations
it is the user interface for the actual engine doing the computations
which of the following statements is true about R?
1.We can either save our codes in the console or script files depending on our personal preferences
2.Once I finish typing code into a script file, R will execute the code automatically
3.I don't need to save my workspace image if I save my script files often
4.It is not necessary to save script files too often because R will save them automatically
I don't need to save my workspace image if I save my script files often
which of the following statements is true about variables in R?
1. Only a single number can be stored as a variable in R
2. We cannot add two variables together in R
3. R will not return anything when you create a variable
4. Variables cannot store information in R
R will not return anything when you create a variable
which of the following statements is NOT true about lists in R?
1.A list cannot be stored as a variable
2.When we create a list, we need to use parentheses instead of brackets
3.The command for creating a list in R is "c"
4.Lists in R can either contain numbers or words
a list cannot be stored as a variable
which of the following statements is NOT true about matrices in R?
1.We use the "matrix" command to create a matrix in R
2.When we reference things in a matrix, we need parentheses
3.When we create a matrix, we must specify its dimensions
4.When we create a matrix, we use brackets
When we reference things in a matrix, we need parentheses
which of the following statements is true about Data Frames?
1. The working directory is the location from which I can import files
2. If using a Mac, the working directory changes automatically
3. If using a Windows machine, the working directory changes automatically
4. The process for changing the working directory is the same in Mac and Windows machines
a dataset is organized in rows and columns and a row represents an observation
which of the following is correct about importing data?
1.The working directory is the location from which I can import files
2.If using a Mac, the working directory changes automatically
3.If using a Windows machine, the working directory changes automatically
4. The process for changing the working directory is the same in Mac and Windows machines
the working directory is the place from which I can import files
which of the following statements is correct about manipulating stored data?
1.If we would like to view the number of drivers working on the first three days in our dataset, we can simply run the command driversworking[1:3]
2.If we would like to view the number of drivers working on the first three days in our dataset, we can simply run the command driversworking(1:3)
3.A $ sign allows us to access variables within a dataset
4.Before manipulating stored data, set our working directory again
a $ sign allows us to access variables within a dataset
which of the following statements is correct about using built in functions?
1.We can take advantage of R to get the mean, median, and standard deviation of our dataset very quickly
2.When we run weekend==1, we are telling R to change the value of variable weekend to 1
3.R automatically corrects our mistakes, so spelling and case-sensitivity are never an issue
4.We can always trust that R will import our dataset properly, there is no need for us to check using "dim" or by printing the first row of the dataset
we can take advantage of R to get the mean, median, and standard deviation of our dataset very quickly
which of the following statements is true about graphs and charts?
1.Graphs and charts in R help us to visualize our data so that we can see the entire distribution for a variable
2.When plotting in R, there is no need to import the dataset first
3.Scatter plots show the frequency distribution of a variable
4.Histograms show the relationship between two variables in R
graphs and charts in R help us to visualize our data so that we can see the entire distribution for a variable
which of the following statements is NOT true about the aggregate function?
1.We can count the observations in many groups at a time using the aggregate function
2.R will not allow us to aggregate over more than one variable at a time
3.The aggregate function in R is like the Pivot Table capability in Excel
4.We can take the average of observations in many groups at a time using the aggregate function
R will not allow us to aggregate over more than one variable at a time
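The idea behind this card, grouping by more than one variable at once and then counting or averaging within each group, can be sketched in Python. This is not R's `aggregate` function itself; the rows and values are hypothetical, and only the variable names `weekend`, `holiday`, and drivers working echo the course dataset.

```python
from collections import defaultdict

# Hypothetical rows: (weekend flag, holiday flag, number of drivers working).
rows = [
    (0, 0, 120),
    (0, 0, 130),
    (1, 0, 90),
    (1, 1, 60),
    (0, 1, 80),
]

# Group by two variables at once, like a pivot table with two row fields.
groups = defaultdict(list)
for weekend, holiday, drivers in rows:
    groups[(weekend, holiday)].append(drivers)

# Count and average the observations within each group.
counts = {key: len(vals) for key, vals in groups.items()}
means = {key: sum(vals) / len(vals) for key, vals in groups.items()}

for key in sorted(groups):
    print(key, counts[key], means[key])
```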
which of the following statements is NOT correct about reassigning existing values?
1. We can use the which function to see when d$holiday==1
2. We only need the which function to change existing values
3. R allows us to reassign something that already has a value
4. Changing existing values in a dataset requires only a few simple lines of code
we only need the which function to change existing values
which of the following is true about creating a new variable and adding it to our dataset?
1.R will allow us to create a new variable but it will not allow the new variable to be stored in an existing dataset
2.R will allow us to create a new variable and store it in an existing dataset
3.Once we add a new variable to a dataset in R, it changes our original csv file
4.R will allow us to add new variables but not overwrite variables in our dataset
R will allow us to create a new variable and store it in an existing dataset
For the linear regression Y=b0+b1*X+e, which of the following statements is true?
1.R is trying to maximize the "e" when finding the line of best fit
2."e" is the distance between the actual Y and the predicted Y
3.The line of best fit can represent each point perfectly, simultaneously
4.b0 is the slope and b1 is the intercept
"e" is the distance between the actual Y and the predicted Y
Which of the following statements is NOT correct?
1."lm" means linear model and this is the function that finds the line that minimizes the distance to each point simultaneously
2."lm" means linear model and this is the function that finds the line that maximizes the distance to each point simultaneously
3.When we use "=" in R, we are actually storing the result in that variable
4.The dependent variable always comes first in the "lm" function call
"lm" means linear model and this is the function that finds the line that maximizes the distance to each point simultaneously
Which of the following statements is NOT true?
1.The size of the errors indicates how "good" the line is as a model
2.We can say that we have a good model when we have small relative errors
3.Higher correlations between X and Y generally produce smaller relative errors
4.We can always get a perfect prediction if we try harder
we can always get a perfect prediction if we try harder
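To make the errors in these cards concrete, here is a minimal Python sketch of a one-predictor least squares fit. The data are hypothetical, and the closed-form formulas below are the standard ones for simple linear regression, not a reproduction of R's `lm` output. Each error is the actual Y minus the predicted Y, and the fitted line is the one with the smallest sum of squared errors.

```python
# Hypothetical data points.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Closed-form least squares estimates for one predictor.
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar

predicted = [b0 + b1 * xi for xi in x]

# e = actual Y minus predicted Y, one error per observation.
residuals = [yi - pi for yi, pi in zip(y, predicted)]
sse = sum(e ** 2 for e in residuals)

print(b0, b1, sse)
```

The errors are small here but not all zero, which is the usual situation: a line rarely passes through every point.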
Which of the following statements is NOT correct?
1.Ordinary least squares does not require any distributional assumptions
2.For a null hypothesis of "the coefficient is 0", R has intuition about the alternative hypothesis
3.The null hypothesis is always "this coefficient is equal to 0" for each test of slope or intercept
4.For the hypothesis test of the significance of each coefficient, we need a hypothesis, a test statistic, and the distribution of that test statistic when the null hypothesis is true
For a null hypothesis of "the coefficient is 0", R has intuition about the alternative hypothesis
Which of the following statements is NOT correct?
1.When we choose nominal coding, we need to assign one of the choices to be the reference
2.When considering gender, male has to be 0 and female has to be 1
3.When considering gender, we can choose either male or female as the reference case
4.When we choose ordinal coding, outcomes must have an inherent order that we can assign numerical values to
When considering gender, male has to be 0 and female has to be 1
Which of the following statements is NOT correct?
1.Sum of squared errors are the errors in the model equation squared and summed up across each data point
2.R-squared is a measure of the model fit for all kinds of models
3.Finding the line of best fit is the process of minimizing the error sum of squares
4.R-squared is equal to 1 minus the sum of squares for error divided by the total sum of squares
R-squared is a measure of the model fit for all kinds of models
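The quantities named in the options above can be written as follows. The notation is an assumption for illustration: $y_i$ is an observed value, $\hat{y}_i$ its predicted value, $\bar{y}$ the mean of the observed values, and $n$ the number of data points.

```latex
\begin{align*}
\mathrm{SSE} &= \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \\
\mathrm{SST} &= \sum_{i=1}^{n} (y_i - \bar{y})^2 \\
R^2 &= 1 - \frac{\mathrm{SSE}}{\mathrm{SST}}
\end{align*}
```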
Which of the following statements is NOT true?
1.When we interpret an intercept, we plug in zero for all of the X variables
2.Least squares estimation is the process by which we reduce the errors as much as possible
3.The phrase "on average" is not always used when interpreting model coefficients
4.R does not require distributional assumptions to find the line of best fit
The phrase "on average" is not always used when interpreting model coefficients
Which of the following statements is true about multicollinearity?
1.Multicollinearity is a strong correlation between predictor variables (X) and the dependent variable (Y) in a model
2.Perfect multicollinearity means two variables have a correlation of 0
3.Two variables are considered independent if their correlation is 1
4.Multicollinearity is a strong correlation between predictors (X variables) in a model
Multicollinearity is a strong correlation between predictors (X variables) in a model
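One simple way to look for multicollinearity is to compute the correlation between pairs of predictors. The Python sketch below does this by hand for hypothetical predictors; the variable names and values are made up for illustration.

```python
import math


def correlation(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    a_bar, b_bar = sum(a) / n, sum(b) / n
    cov = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    var_a = sum((ai - a_bar) ** 2 for ai in a)
    var_b = sum((bi - b_bar) ** 2 for bi in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical predictors: height in inches and the same height in centimeters.
height_in = [60.0, 64.0, 68.0, 70.0, 74.0]
height_cm = [h * 2.54 for h in height_in]
# An unrelated hypothetical predictor.
age = [31.0, 25.0, 40.0, 22.0, 35.0]

# One predictor is an exact linear function of the other: perfect multicollinearity.
r_perfect = correlation(height_in, height_cm)
# A weakly related pair of predictors.
r_other = correlation(height_in, age)

print(r_perfect, r_other)
```

Note that the correlation is computed between two X variables, not between an X variable and Y.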
Which of the following statements is NOT true about multicollinearity?
1.We cannot use a model that contains multicollinearity
2.When there is a strong correlation between two predictors (X variables) in a model, multicollinearity exists
3.If two variables have a correlation of 1 or -1, we have perfect multicollinearity
4.If two variables have a correlation of 0, they are independent
we cannot use a model that contains multicollinearity
Which of the following is correct about Assumption 1 for linear regression?
1.We can include predictor variables that are not relevant to the dependent variable in our model as long as the coefficients of other variables are significant
2.We can leave out relevant predictor variables as long as the coefficients of other variables are significant
3.This assumption doesn't actually matter
4.We need a good, linear model
we need a good linear model
Which of the following statements is correct about Assumption 2 in linear regression?
1.When we spot perfect multicollinearity in our model, we should remove one of the two predictor variables involved
2.When we spot perfect multicollinearity in our model, we should remove both of the two predictor variables involved
3.If the correlation of a predictor variable and the dependent variable is 1 or -1, we say there exists perfect multicollinearity and it violates Assumption 2
4.R will issue a warning message if our model contains multicollinearity
When we spot perfect multicollinearity in our model, we should remove one of the two predictor variables involved
Which of the following statements is correct about Assumption 3?
1.When we use ACF method, significant correlations beyond lag=0 are ok
2.When we use the residual plot method, patterns in the residuals are ok
3.It's okay for us to have some observations of the dependent variable (Y) be related with others
4.We must use our intuition and two other tests for this assumption
We must use our intuition and two other tests for this assumption
Which of the following statements is correct about Assumption 4?
1.Homoskedasticity means we have equal variances in the observations of the response
2.Heteroskedasticity means we have equal variances in the observations of the response
3.If a model is better at predicting salaries of people with college degrees than people with high school diplomas, we say the model is homoskedastic
4.Homoskedasticity means we have unequal variances in the observations of the response
Homoskedasticity means we have equal variances in the observations of the response
Which of the following statements is true about Assumption 5?
1.Distributional assumptions are required for calculating the coefficients in a model
2.Q-Q plots are not useful for testing this assumption
3.Normally distributed errors are required for a linear regression model
4.There are only three ways to test for normally distributed errors
Normally distributed errors are required for a linear regression model
Which of the following statements is true about prediction?
1.Higher R squared is always associated with good prediction
2.If our goal is to interpret coefficients, we will likely use the model that yields the smallest errors out-of-sample
3.Sometimes we choose a model based on best prediction accuracy and not on R squared
4.If our goal is predicting with accuracy, we will likely use R squared to select a model
Sometimes we choose a model based on best prediction accuracy and not on R squared
Which of the following statements is correct about cross-validation?
1.Cross-validation is a standard practice for validating a predictive model's performance
2.Cross-validation involves swapping datasets with a friend
3.It is a way of testing your model on data it has seen before
4.There is only one way to do cross-validation
Cross-validation is a standard practice for validating a predictive model's performance
Which of the following statements is correct?
1.In-sample errors are the errors that come from predicting results of observations the model has not yet seen
2.The training set serves as a "pseudo-reality" where we can test our model's performance
3.In-sample errors are the errors that come from predicting the values used to create the model
4.The test set is the "protective bubble" where we create the model
In-sample errors are the errors that come from predicting the values used to create the model
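The difference between in-sample and out-of-sample errors can be sketched in Python with a single train/test split. The data are hypothetical, the split is fixed by hand rather than random, and the helper names `fit_line` and `mean_squared_error` are made up for this example.

```python
def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    return y_bar - b1 * x_bar, b1


def mean_squared_error(x, y, b0, b1):
    """Average squared gap between actual and predicted values."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Hypothetical observations, split by hand into a training set and a test set.
train_x = [1.0, 2.0, 4.0, 5.0, 7.0, 8.0]
train_y = [1.2, 1.9, 3.8, 5.1, 6.8, 8.1]
test_x = [3.0, 6.0]
test_y = [3.2, 6.3]

# The model is created using the training set only.
b0, b1 = fit_line(train_x, train_y)

# In-sample: errors on the values used to create the model.
in_sample_mse = mean_squared_error(train_x, train_y, b0, b1)
# Out-of-sample: errors on observations the model has not seen.
out_of_sample_mse = mean_squared_error(test_x, test_y, b0, b1)

print(in_sample_mse, out_of_sample_mse)
```

Cross-validation repeats this kind of split so that the out-of-sample estimate does not depend on one particular choice of test set.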
Which of the following statements is true?
1.We should use linear regression to predict continuous outcomes and logistic regression to predict binary outcomes
2.Logistic regression cannot be adapted to predict more than two outcomes
3.For a logistic regression, we must have a normally distributed response, which means that our errors are normally distributed
4.For a linear regression, we don't necessarily need the errors to be normally distributed
We should use linear regression to predict continuous outcomes and logistic regression to predict binary outcomes
Which of the following statements is NOT correct about logistic regression?
1.The probabilities of success and failure add up to 1 in a binary logistic regression model
2.The sum of the probabilities of success and failure could be smaller than 1 in a binary logistic regression model
3.The left-hand side of the logistic model is log(p/(1-p)), the log of the odds ratio
4.The right-hand side of the logistic model is the same as in linear regression
The sum of the probabilities of success and failure could be smaller than 1 in a binary logistic regression model
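For a single predictor, the model described in the options above can be written as follows. The symbols $b_0$, $b_1$, and $X$ are borrowed from the linear regression cards, and $p$ is the probability of success; the quantity $p/(1-p)$ is what these cards call the odds ratio.

```latex
\begin{align*}
\log\!\left(\frac{p}{1-p}\right) &= b_0 + b_1 X \\
p &= \frac{1}{1 + e^{-(b_0 + b_1 X)}} \\
p + (1 - p) &= 1
\end{align*}
```

The last line is why the probabilities of success and failure cannot add up to less than 1.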
Which of the following statements is NOT true about odds ratios?
1.The odds ratio is the probability of success over the probability of failure
2.The odds ratio is the probability of failure divided by the probability of success
3.The odds ratio is different from the probability of success
4.If the odds ratio goes down, it means that the probability of success goes down and the probability of failure goes up
The odds ratio is the probability of failure divided by the probability of success
Which of the following statements is NOT true about logarithms?
1.If the odds ratio is greater than 1, the log odds ratio is positive
2.When the odds ratio equals 1, the probability of success equals 0.5
3.If the probability of success is smaller than the probability of failure, the log odds ratio is negative
4.We can take the log of any number
we can take the log of any number
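A short Python sketch of why this statement is false for real numbers: the logarithm is only defined for positive inputs. The helper name `log_is_defined` is made up for this example.

```python
import math

log_of_one = math.log(1)     # an odds ratio of 1 gives a log odds ratio of 0
log_of_half = math.log(0.5)  # an odds ratio below 1 gives a negative log


def log_is_defined(value):
    """Return True if math.log accepts the value, False if it raises ValueError."""
    try:
        math.log(value)
        return True
    except ValueError:
        return False


print(log_of_one, log_of_half)
print(log_is_defined(3), log_is_defined(0), log_is_defined(-2))
```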
Which of the following statements is true?
1."glm" means "generalized linear model" which is exactly the same as logistic regression
2.Adding family="binomial" to the "glm" function is optional when running a logistic regression model in R
3.The "glm" function doesn't require any assumptions
4.Maximum Likelihood Estimation is the estimation method by which we find the intercept and slope for a logistic regression model
Maximum Likelihood Estimation is the estimation method by which we find the intercept and slope for a logistic regression model
Which of the following statements is NOT correct?
1.For a logistic regression model, we can see how the predictors might change the odds of an outcome
2.If a coefficient is insignificant in our logistic regression model, we should remove it from our model
3.In a logistic regression hypothesis test, we are testing to see whether each intercept and slope is equal to 0, essentially whether that coefficient is significantly contributing to the model
4.For a logistic regression model, we can see how the predictors change the outcome directly
For a logistic regression model, we can see how the predictors change the outcome directly
Which of the following statements is correct?
1.When a log odds ratio is greater than 0, it means that the odds ratio is greater than 1
2.When a log odds ratio is greater than 0, it means that the probability of success is smaller than the probability of failure
3.When a log odds ratio is greater than 0, it means that the odds ratio is smaller than 1
4.When we have a negative log odds ratio, it means that the probability of success is greater than the probability of failure
When a log odds ratio is greater than 0, it means that the odds ratio is greater than 1
Which of the following statements is true about the logistic regression rule of thumb?
1.If the log odds ratio is 0, it means that the probability of success is 0.5
2.If the log odds ratio is 0, it means that the probability of success is close to 1
3.If the log odds ratio is -3 or smaller, the probability of success is very close to 1
4.If the log odds ratio is 3 or greater, the probability of success is close to 0
If the log odds ratio is 0, it means that the probability of success is 0.5
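The rule of thumb can be checked numerically by converting a log odds ratio back into a probability. This Python sketch uses the standard logistic function; the function name is made up for this example.

```python
import math


def probability_from_log_odds(k):
    """Invert log(p / (1 - p)) = k to recover the probability p."""
    return 1.0 / (1.0 + math.exp(-k))


p_zero = probability_from_log_odds(0)     # log odds of 0
p_plus3 = probability_from_log_odds(3)    # log odds of 3
p_minus3 = probability_from_log_odds(-3)  # log odds of -3

print(p_zero, p_plus3, p_minus3)
```

A log odds ratio of 0 gives exactly 0.5, a value of 3 gives a probability above 0.95, and a value of -3 gives a probability below 0.05.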
Which of the following statements is NOT correct about interpreting the coefficients of logistic regression?
1.If a variable has a significant positive coefficient, we can say that the log odds ratio increases by the amount of the coefficient, on average, when the variable increases by 1 unit
2.Increasing the log odds ratio means increasing the odds ratio, which means we are increasing the probability of success
3.Decreasing the log odds ratio means decreasing the odds ratio, which means we are decreasing the probability of success
4.If a variable has a significant positive coefficient, we can say that the probability of success increases by the amount of the coefficient as the variable increases by 1 unit
If a variable has a significant positive coefficient, we can say that the probability of success increases by the amount of the coefficient as the variable increases by 1 unit
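A small Python sketch of why the last statement is the incorrect one: a one-unit increase in the variable always adds the coefficient to the log odds ratio, but the resulting change in probability depends on where we start. The intercept and slope below are hypothetical.

```python
import math


def probability(b0, b1, x):
    """Probability of success for a one-predictor logistic model."""
    k = b0 + b1 * x  # the log odds ratio
    return 1.0 / (1.0 + math.exp(-k))


b0, b1 = -2.0, 0.5  # hypothetical intercept and slope

# The same one-unit increase in x, starting from two different places.
change_low = probability(b0, b1, 1) - probability(b0, b1, 0)
change_high = probability(b0, b1, 9) - probability(b0, b1, 8)

print(change_low, change_high)
```

Both changes are positive, but they differ from each other and neither equals the coefficient of 0.5.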
Which of the following statements is correct about the donations example?
1.It may be okay to have a model disagree with our intuition, because our intuition is sometimes flawed
2.If a model disagrees with our intuition, we need to double check the data set, otherwise, the model is useless
3.In this donation example, we use three independent variables
4.If a model disagrees with our intuition, we should assume our intuition is wrong
It may be okay to have a model disagree with our intuition, because our intuition is sometimes flawed
Which of the following statements is correct about complete information?
1.Complete information is not necessary if there is multicollinearity in the model
2.If we do not have complete information, R will give us an error message automatically
3.As long as we have all values of at least one predictor, we have complete information
4.If we do not have complete information, our model is likely to make poor predictions
If we do not have complete information, our model is likely to make poor predictions
Which of the following statements is NOT correct?
1.If we have no complete separation, then we cannot draw a vertical line that separates all the zeros and ones
2.Complete separation occurs when one or more variables classifies the observations into successes and failures perfectly
3.If we have complete separation, we can separate all the zeros and ones with a single vertical line in a scatterplot of X versus Y
4.No complete separation means that one or more variables classifies the observations into successes and failures perfectly
No complete separation means that one or more variables classifies the observations into successes and failures perfectly
Which of the following statements is NOT true about sample size?
1.Having a large sample size allows us to assume our response has a normal distribution
2.Having a large sample size helps to ensure that we meet the complete information requirement
3.Having a large sample size helps us to guarantee a stable solution
4.Having a large sample size allows us to assume our coefficients have a normal distribution
Having a large sample size allows us to assume our response has a normal distribution
Which of the following statements is true?
1.For a logistic regression model, R-squared represents the proportion of variation in the response that is explained by the predictor variables
2.Bigger AIC values indicate model improvement
3.The model fit statistics in the R output for logistic regressions are generated during the maximum likelihood estimation process
4.We will rely on AIC in this class for determining model fit
The model fit statistics in the R output for logistic regressions are generated during the maximum likelihood estimation process
Which of the following statements is true?
1.We may have a probability of success that is greater than 1
2.Things that make "k" (the entire right hand side) more positive will decrease the likelihood of success
3.Higher income that increases "k" (the entire right hand side) will make it less likely that we receive a donation
4.Things that make "k" (the entire right hand side) more positive will increase the likelihood of success
Things that make "k" (the entire right hand side) more positive will increase the likelihood of success
Which of the following statements is NOT correct?
1.We can use different classification rules for different logistic regression models
2.Classification is helpful for prediction
3.A classification rule helps us move from probabilities to predictions
4.If p > .5 in any modeling situation, we should classify the prediction as "success"
If p > .5 in any modeling situation, we should classify the prediction as "success"
Which of the following statements is NOT true about confusion matrices?
1.We can only create a confusion matrix for our training set, we cannot create a confusion matrix for our test set
2.We can create a confusion matrix for both training and test sets
3.A confusion matrix summarizes how often the model's predictions are correct versus incorrect
4.To create a confusion matrix, we must have both predicted and actual values stored as zeros and ones instead of probabilities
We can only create a confusion matrix for our training set, we cannot create a confusion matrix for our test set
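A minimal Python sketch of moving from predicted probabilities to a confusion matrix. The actual outcomes, the probabilities, and the 0.5 threshold are all hypothetical; a different classification rule may suit a different model. The same steps apply to a training set or a test set.

```python
from collections import Counter

# Hypothetical actual outcomes and predicted probabilities of success.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted_prob = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

# Classification rule: turn probabilities into zeros and ones.
threshold = 0.5
predicted = [1 if p > threshold else 0 for p in predicted_prob]

# Count each (actual, predicted) combination.
cells = Counter(zip(actual, predicted))
true_positives = cells[(1, 1)]
false_negatives = cells[(1, 0)]
false_positives = cells[(0, 1)]
true_negatives = cells[(0, 0)]

# Share of predictions that match the actual outcome.
accuracy = (true_positives + true_negatives) / len(actual)

print(true_positives, false_negatives, false_positives, true_negatives, accuracy)
```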
Which of the following statements is correct?
1.Stability relates to how often we make errors and what kind of errors we make
2.A good model (of any type) will have both accuracy and stability
3.Accuracy relates to whether the prediction performance in the training set and test set are similar
4.We only require stability and accuracy for linear regression models, not logistic regression models
A good model (of any type) will have both accuracy and stability
Which of the following statements is correct about overfitting?
1.Tackling an overfitting problem can help improve a model's stability
2.Overfitting can only affect linear and logistic models, not other types of models
3.High R-squared always indicates an overfitting problem
4.To address overfitting, we need to add more predictor variables to our model
Tackling an overfitting problem can help improve a model's stability
Which of the following statements is true?
1.When we add a new variable to a linear regression, as long as the R-squared increases by at least a tiny decimal, we should keep that variable
2.We use variable selection techniques to help remove unnecessary variables in our model and reduce overfitting
3.Backward stepwise regression is a technique where we start with no variables and we choose the next best addition at each step
4.Forward stepwise regression is a technique where you start with all the variables in the model and slowly remove one variable at a time, stopping whenever you have removed all variables that are not contributing substantially
We use variable selection techniques to help remove unnecessary variables in our model and reduce overfitting
Which of the following statements is NOT correct?
1.We always remove variables that contribute less than 5% to the model's R-squared
2.At each step in backward stepwise selection, we consider removing the variable that contributes the least to the model
3.For backward stepwise selection, the first step is to run the model with all the variables and check the R-squared
4.For backward stepwise selection, the second step is to try removing each variable in the model, one at a time, and recording the R-squared value each time a variable is removed
We always remove variables that contribute less than 5% to the model's R-squared
Which of the following statements is NOT true?
1.Even after the application of a variable selection technique, underfitting is still a possibility
2.Always keep the model as complicated as possible, keeping all variables with significant coefficients
3.If you have a good performance in the training set but poor performance in the test set, you may have a problem with overfitting
4.Even after the application of a variable selection technique, overfitting is still a possibility
Always keep the model as complicated as possible, keeping all variables with significant coefficients
Which of the following statements is correct?
1.Outliers are always observations that don't make sense in the context of the problem
2.We can identify outliers visually by looking at a histogram or scatter plot for both Y variables and X variables
3.Outliers only refer to the observations that are very far from the center or from the other observations
4.Even though outliers are unusual, we can still make good predictions for them as long as we create a good model
We can identify outliers visually by looking at a histogram or scatter plot for both Y variables and X variables
Which of the following statements is NOT true about outliers?
1.Using a scatter plot to visualize the relationship between distance and fare would not help us to identify outliers
2.Extreme values of fare like 300 dollars could be considered outliers and we can consider removing them
3.If distance was zero but fare was very high, we could consider removing these because they don't make sense in the context of this problem
4.Extreme values of distance like 70 miles could be considered outliers and we can consider removing them
Using a scatter plot to visualize the relationship between distance and fare would not help us to identify outliers
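The three kinds of suspicious rides named in this card can be flagged with simple rules once a plot has suggested where to look. The sketch below is only an illustration: the rides and the cutoffs `max_fare`, `max_distance`, and `zero_distance_fare` are hypothetical, since the card gives examples (300 dollars, 70 miles) rather than thresholds.

```python
# Hypothetical rides; distance in miles, fare in dollars.
rides = [
    {"distance": 2.5, "fare": 9.0},
    {"distance": 0.0, "fare": 120.0},
    {"distance": 70.0, "fare": 180.0},
    {"distance": 5.0, "fare": 300.0},
    {"distance": 8.0, "fare": 24.0},
]

def flag(ride, max_fare=250.0, max_distance=60.0, zero_distance_fare=50.0):
    """Return the reasons a ride looks like an outlier (empty list if none).

    All three cutoffs are assumed values chosen for this example.
    """
    reasons = []
    if ride["fare"] >= max_fare:
        reasons.append("extreme fare")
    if ride["distance"] >= max_distance:
        reasons.append("extreme distance")
    if ride["distance"] == 0 and ride["fare"] >= zero_distance_fare:
        reasons.append("high fare with zero distance")
    return reasons

# Keep only the rides that triggered at least one rule.
flagged = [(ride, flag(ride)) for ride in rides if flag(ride)]
for ride, reasons in flagged:
    print(ride, reasons)
```

Flagging is separate from removing: each flagged ride still needs a decision made in the context of the problem.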
Which of the following statements is NOT true?
1.When we edit an imported dataset in R, we are only editing its copy inside R, NOT the original file
2.There are clear, straightforward rules about which outliers to remove and how to do it
3.Trimming is the process of removing a top and bottom percentage of extreme values in X and Y variables
4.Different statisticians may have different rules for identifying outliers, even for the same data set
There are clear, straightforward rules about which outliers to remove and how to do it
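Trimming, as defined in option 3, can be sketched as sorting on one variable and cutting the same share of rows from each end. The function name `trim`, the 10% figure, and the fares are assumptions made for this example; the card does not prescribe a percentage.

```python
def trim(rows, key, percent):
    """Drop the bottom and top `percent` of rows, ranked by the value of `key`."""
    ordered = sorted(rows, key=lambda row: row[key])
    cut = len(ordered) * percent // 100  # rows removed from each end
    return ordered[cut:len(ordered) - cut]

# Hypothetical fares in dollars, including one extreme value.
rides = [{"fare": f} for f in [5, 7, 8, 9, 10, 11, 12, 13, 14, 300]]

trimmed = trim(rides, "fare", 10)
print([ride["fare"] for ride in trimmed])
```

Note that trimming removes the lowest values as well as the highest, and it works on a copy of the data, so the original rows are left untouched.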
Which of the following statements is true about interactions?
1.Interpreting interactions is always very straightforward
2.Interaction is the same as multicollinearity
3.Interactions exist when X variables affect each other's influence on Y
4.Interactions are correlations among X variables
Interactions exist when X variables affect each other's influence on Y
Which of the following statements is NOT true?
1.Interaction terms are always significant
2.We can create an interaction by multiplying two of our predictor variables and adding the result to our model
3.Adding an interaction term to a model can improve prediction accuracy
4.If we add an interaction term, we should also add its components to the model to determine whether they are significant contributors separately and together
Interaction terms are always significant.
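A small worked example may help show what multiplying two predictors does. The coefficients and the variables `x` (continuous) and `z` (binary) below are hypothetical, picked only to make the arithmetic easy to follow; in practice they would come from a fitted model.

```python
def predict(x, z, b0=1.0, b1=2.0, b2=0.5, b3=3.0):
    """Prediction from a model with both components and their interaction.

    The coefficients are made-up values for illustration.
    """
    interaction = x * z  # the interaction term is the product of the two predictors
    return b0 + b1 * x + b2 * z + b3 * interaction

def effect_of_x(z):
    """Change in the prediction when x rises by one unit, holding z fixed."""
    return predict(1.0, z) - predict(0.0, z)

print(effect_of_x(0))  # slope of x when z = 0 is b1
print(effect_of_x(1))  # slope of x when z = 1 is b1 + b3
```

With these values the effect of `x` is 2.0 when `z` is 0 and 5.0 when `z` is 1, which is what it means for one variable to change another's influence on Y. Whether the term earns its place is still checked with significance, R-squared, or prediction accuracy.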
Which of the following statements is NOT true?
1.An interaction term can contain any combination of discrete and continuous variables
2.When we are making decisions about whether to keep an interaction term in our model, we can use changes in R-squared
3.When we are making decisions about whether to keep an interaction term in our model, we can use changes in prediction accuracy
4.In order to form an interaction term, one of the variables has to be binary and the other has to be continuous
In order to form an interaction term, one of the variables has to be binary and the other has to be continuous
Which of the following statements is true about decision trees?
1.A parent node can have more than 2 children
2.A decision tree is a list of rules for systematic decision making
3.A parent node cannot have 0 children
4.Decision trees require more assumptions than linear or logistic regression
A decision tree is a list of rules for systematic decision making
Which of the following statements is true?
1.The algorithm finds a single variable and multiple values in that variable that best divide the observations at each step
2.Every time the algorithm splits the data in a node, it considers all variables at all possible values for its split location
3.The algorithm stops after it divides the data 10 times
4.The algorithm finds multiple variables and a single value in each variable that best divide the observations at each step
Every time the algorithm splits the data in a node, it considers all variables at all possible values for its split location
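The exhaustive search described in this card can be sketched for a single node. This is an illustrative sketch, not the implementation behind any particular R function: it assumes a continuous outcome scored by the sum of squared errors, and the names `best_split`, `distance`, and `hour` and the data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(rows, y, features):
    """Try every feature at every observed value; return (score, feature, threshold)."""
    best = None
    for feature in features:
        for threshold in sorted({row[feature] for row in rows}):
            left = [yi for row, yi in zip(rows, y) if row[feature] <= threshold]
            right = [yi for row, yi in zip(rows, y) if row[feature] > threshold]
            if not left or not right:
                continue  # a split must put observations on both sides
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, feature, threshold)
    return best

# Hypothetical rides: fare jumps with distance and is unrelated to hour.
rows = [
    {"distance": 1, "hour": 9}, {"distance": 2, "hour": 17},
    {"distance": 3, "hour": 9}, {"distance": 10, "hour": 17},
    {"distance": 11, "hour": 9}, {"distance": 12, "hour": 17},
]
fare = [5, 6, 7, 30, 31, 32]

score, feature, threshold = best_split(rows, fare, ["distance", "hour"])
print(feature, threshold, score)
```

The result is one variable and one cut point, which is why options 1 and 4 are wrong. Recursive partitioning repeats the same search inside each resulting child node.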
Which of the following statements is correct?
1.Using recursive partitioning, we are dividing the data into smaller sets, with members in the same set being wildly different
2.Partitioning the data means dividing the data into overlapping sets
3.Using recursive partitioning, we are dividing the data into smaller sets, each with members sharing similar characteristics
4.To create a tree, we use either the "lm" or "glm" functions in R
Using recursive partitioning, we are dividing the data into smaller sets, each with members sharing similar characteristics
Which of the following statements is true?
1.For a classification tree we use "method=anova" and for a regression tree we use "method=class"
2.ANOVA means analysis of covariance
3.ANOVA means analysis of variance
4.Trees for continuous outcomes are called classification trees, and trees for binary outcomes are called regression trees
ANOVA means analysis of variance
Which of the following statements is correct?
1.Decision trees are always better because they don't require assumptions
2.Decision trees are often more stable than linear or logistic regression
3.For decision trees, overfitting is a common problem
4.Decision trees should never be used because they don't have coefficients to interpret
For decision trees, overfitting is a common problem