Which of the following statements is not true about variables

Terms in this set (77)

which of the following statements is true about variables?

1.Nominal variables are expressed by numbers
2.A variable is something that can be measured but does not vary
3.Typically, we have two kinds of variables: discrete and continuous
4.Categorical variables are not discrete

typically we have two kinds of variables continuous and discrete

which of the following variables is NOT a discrete variable?
1.Review out of 5 stars for service at the DMV
2.Average temperature across counties in Arizona
3.Number of "heads" in five coin flips
4.Number of moons around a planet

average temperature across counties in Arizona

which of the following is a discrete variable?

1.Weight of students in class
2.Distance traveled between classes
3.Number of red marbles in a jar
4.Time it takes to get to school

number of red marbles in a jar

which of the following is a continuous variable?

1.Number of "heads" when flipping three coins
2.Height of students in class
3.Students' final letter grades
4.Number of students present

height of students in a class

which of the following statements is true about the median and the mean?

1.The mean is the middle number in a sorted set of observations
2.Median is a measure of the variation in a distribution
3.Extreme observations do not affect calculation of the median
4.If there is no middle number of a list of numbers, then we can say that the list does not have a median

extreme observations do not affect the calculation of the median
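
A minimal Python sketch of this card (the course itself works in R, and the numbers here are made up for illustration): replacing one value with an extreme one moves the mean a great deal but leaves the median where it was. It also shows that a list with no single middle number still has a median.

```python
from statistics import mean, median

# Hypothetical observations, already sorted for readability.
typical = [10, 12, 13, 15, 20]
# Same list, but the largest value is replaced by an extreme observation.
extreme = [10, 12, 13, 15, 1000]

print(mean(typical), median(typical))   # mean and median of the typical list
print(mean(extreme), median(extreme))   # mean jumps, median is unchanged

# An even-length list has no single middle number; the median is the
# average of the two middle values, so the list still has a median.
even_length = [1, 2, 3, 4]
print(median(even_length))
```

The median only depends on the order of the values around the middle, which is why the single extreme value has no effect on it here.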

which of the following statements is true?

1.Standard deviation is the mean distance of the observations from their median
2.Standard deviation is a measure of center for a random variable
3.Mean and median are measures of center for a random variable
4.Histograms are different from frequency distributions

mean and median are measures of center for a random variable

which of the following statements is true about percentiles?

1.The 50th percentile is the mean of a distribution
2.The 25th percentile divides the data into two parts: 25% of the data at or above its value and 75% at or below its value
3.Percentiles are a quick way to understand the mean of a distribution without generating a chart
4.The 50th percentile is the median of a distribution and divides the data into two equal parts

the 50th percentile is the median of a distribution and divides the data into two equal parts
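
As a rough check of this card, the following Python sketch (hypothetical data; the course uses R) computes the 25th, 50th, and 75th percentiles with the standard library and compares the 50th percentile to the median. Roughly 25% of the data lies at or below the 25th percentile, not at or above it.

```python
from statistics import median, quantiles

# Hypothetical sorted observations.
scores = [2, 4, 4, 5, 7, 9, 10, 12, 15]

# n=4 asks for the three cut points that split the data into quarters:
# the 25th, 50th, and 75th percentiles.
q25, q50, q75 = quantiles(scores, n=4)

print(q25, q50, q75)
print(q50 == median(scores))  # the 50th percentile is the median
```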

which of the following statements is true about populations and samples?

1.In most cases, we shouldn't worry about making sure our sample is randomly selected
2.Sometimes, the sample average we measured does not agree with the population average
3.The sample average always agrees with population average
4.When we use a random sample, it will eliminate potential bias completely

sometimes the sample average does not agree with the population average

which of the following is true about probability distributions?

1.The chi-square distribution is not related to the t-distribution
2.A probability distribution shows the likelihood of occurrence of specific outcomes of a random variable
3.The normal distribution and the T distribution have extremely different shapes
4.μ is the symbol for the true standard deviation and sigma is the symbol for the true mean of a distribution

a probability distribution shows the likelihood of occurrence of specific outcomes of a random variable

which of the following statements is true about charts?

1.Charts can help us understand individual variables' distributions but cannot help us understand the relationships among variables
2.Scatter plots help us to see the frequency distribution of a variable
3.Histograms help us understand how variables behave together
4.Charts can help us understand individual variables' distributions and relationships among variables

charts can help us understand individual variables' distributions and relationships among variables

which of the following statements is not true about scatterplots?

1.We don't need scatterplots; we can get the information we need from means, medians, and standard deviations
2.Once we can measure things, we can better manage them
3.Scatterplots show the relationship between two variables
4.In a scatterplot, a single dot represents the values of the two variables we're looking at for a single observation

we don't need scatterplots; we can get the information we need from the mean, median, and standard deviations

which of the following statements is true about lines?

1.The standard formula for a line is y=a+bx and a and b are numbers representing the intercept and slope, respectively
2.For example y=2-5x, 2 is the slope of the line and -5 is the intercept
3.The standard formula for a line is y=a+bx and a and b are numbers representing the slope and intercept, respectively
4.For example y=2-5x, 2 is the intercept and 5 is the slope

the standard formula for a line is y=a+bx and a and b are numbers representing the intercept and slope
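
To make the roles of a and b concrete, here is a small Python sketch (illustrative only; the function name `line` is an assumption, not something from the course) using the card's own example y=2-5x. The intercept is the value of y at x=0, and the slope is the change in y when x increases by 1.

```python
def line(x, a, b):
    """Return y = a + b*x, where a is the intercept and b is the slope."""
    return a + b * x

# The example from the card: y = 2 - 5x, so a = 2 and b = -5.
intercept, slope = 2, -5

y_at_0 = line(0, intercept, slope)  # value at x = 0 is the intercept
y_at_1 = line(1, intercept, slope)  # one unit to the right

print(y_at_0)           # the intercept
print(y_at_1 - y_at_0)  # the slope
```

Note that the slope in y=2-5x is -5, not 5; the sign belongs to the coefficient.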

which of the following statements is not true about R?

1.R is an open-source programming language
2.R can be used for statistical programming
3.R can be used for data manipulation
4. Excel can do everything R can do

excel can do everything R can do

R is an open-source programming language. which of the following statements is NOT true about R?

1.R has versions for both Mac and Windows computers
2.It is free for all of us to use
3.Individuals cannot contribute any new code to R
4.We can use it to do data manipulation even when we find code bugs

individuals cannot contribute any new code to R

which of the following statements is true about the R console?

1.We cannot directly type commands into the console
2.We can use the console to run complicated code, but we cannot use it to do simple arithmetic operations
3.We can type commands into the console but it will not show any results
4.It is the user interface for the actual engine doing the computations

it is the user interface for the actual engine doing the computations

which of the following statements is true about R?

1.We can either save our code in the console or script files depending on our personal preferences
2.Once I finish typing code into a script file, R will execute the code automatically
3.I don't need to save my workspace image if I save my script files often
4.It is not necessary to save script files too often because R will save them automatically

I don't need to save my workspace image if I save my script files often

which of the following statements is true about variables in R?
1. Only a single number can be stored as a variable in R
2. We cannot add two variables together in R
3. R will not return anything when you create a variable
4. Variables cannot store information in R

R will not return anything when you create a variable

which of the following statements is NOT true about lists in R?

1.A list cannot be stored as a variable
2.When we create a list, we need to use parentheses instead of brackets
3.The command for creating a list in R is "c"
4.Lists in R can either contain numbers or words

a list cannot be stored as a variable

which of the following statements is NOT true about matrices in R?

1.We use the "matrix" command to create a matrix in R
2.When we reference things in a matrix, we need parentheses
3.When we create a matrix, we must specify its dimensions
4.When we create a matrix, we use brackets

when we reference things in a matrix, we need parentheses

which of the following statements is true about Data Frames?

a dataset is organized in rows and columns and a row represents an observation

which of the following is correct about importing data?

1.The working directory is the location from which I can import files
2.If using a Mac, the working directory changes automatically
3.If using a Windows machine, the working directory changes automatically
4. The process for changing the working directory is the same in Mac and Windows machines

the working directory is the place from which I can import files

which of the following statements is correct about manipulating stored data?

1.If we would like to view the number of drivers working on the first three days in our dataset, we can simply run the command driversworking[1:3]
2.If we would like to view the number of drivers working on the first three days in our dataset, we can simply run the command driversworking(1:3)
3.A $ sign allows us to access variables within a dataset
4.Before manipulating stored data, set our working directory again

a $ sign allows us to access variables within a dataset

which of the following statements is correct about using built in functions?

1.We can take advantage of R to get the mean, median, and standard deviation of our dataset very quickly
2.When we run weekend==1, we are telling R to change the value of variable weekend to 1
3.R automatically corrects our mistakes, so spelling and case-sensitivity are never an issue
4.We can always trust that R will import our dataset properly, there is no need for us to check using "dim" or by printing the first row of the dataset

we can take advantage of R to get the mean, median, and standard deviation of our dataset very quickly

which of the following statements is true about graphs and charts?

1.Graphs and charts in R help us to visualize our data so that we can see the entire distribution for a variable
2.When plotting in R, there is no need to import the dataset first
3.Scatter plots show the frequency distribution of a variable
4.Histograms show the relationship between two variables in R

graphs and charts in R help us to visualize our data so that we can see the entire distribution for a variable

which of the following statements is NOT true about the aggregate function?

1.We can count the observations in many groups at a time using the aggregate function
2.R will not allow us to aggregate over more than one variable at a time
3.The aggregate function in R is like the Pivot Table capability in Excel
4.We can take the average of observations in many groups at a time using the aggregate function

R will not allow us to aggregate over more than one variable at a time

which of the following statements is NOT correct about reassigning existing values?

1. We can use the which function to see when d$holiday==1
2. We only need the which function to change existing values
3. R allows us to reassign something that already has a value
4. Changing existing values in a dataset requires only a few simple lines of code

we only need the which function to change existing values

which of the following is true about creating a new variable and adding it to our dataset?

1.R will allow us to create a new variable but it will not allow the new variable to be stored in an existing dataset
2.R will allow us to create a new variable and store it in an existing dataset
3.Once we add a new variable to a dataset in R, it changes our original csv file
4.R will allow us to add new variables but not overwrite variables in our dataset

R will allow us to create a new variable and store it in an existing dataset

For the linear regression Y=b0+b1*X+e, which of the following statements is true?

1.R is trying to maximize the "e" when finding the line of best fit
2."e" is the distance between the actual Y and the predicted Y
3.The line of best fit can represent each point perfectly, simultaneously
4.b0 is the slope and b1 is the intercept

"e" is the distance between the actual Y and the predicted Y

Which of the following statements is NOT correct?

1."lm" means linear model and this is the function that finds the line that minimizes the distance to each point simultaneously
2."lm" means linear model and this is the function that finds the line that maximizes the distance to each point simultaneously
3.When we use "=" in R, we are actually storing the result in that variable
4.The dependent variable always comes first in the "lm" function call

"lm" means linear model and this is the function that finds the line that maximizes the distance to each point simultaneously

Which of the following statements is NOT true?

1.The size of the errors indicates how "good" the line is as a model
2.We can say that we have a good model when we have small relative errors
3.Higher correlations between X and Y generally produce smaller relative errors
4.We can always get a perfect prediction if we try harder

we can always get a perfect prediction if we try harder
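
The cards above describe "e" as the distance between the actual Y and the predicted Y, and the line of best fit as the one that makes those errors as small as possible. A minimal Python sketch of that idea follows. It is not the course's R `lm` call; the data are hypothetical and the closed-form formulas are the usual single-predictor least squares estimates.

```python
from statistics import mean

# Hypothetical observations (not from the course dataset).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

def sse(b0, b1):
    """Sum of squared errors for the line Y = b0 + b1*X on the data above."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

# Closed-form least squares estimates for one predictor.
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

# e = actual Y minus predicted Y, one error per observation.
errors = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(b0, b1)
print(errors)                             # small, but not all zero
print(sse(b0, b1) < sse(b0, b1 + 0.1))    # nudging the slope makes the fit worse
```

Even the best line leaves nonzero errors on this data, which is the point of the last card: a perfect prediction is not something extra effort can guarantee.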

Which of the following statements is NOT correct?

1.Ordinary least squares does not require any distributional assumptions
2.For a null hypothesis of "the coefficient is 0", R has intuition about the alternative hypothesis
3.The null hypothesis is always "this coefficient is equal to 0" for each test of slope or intercept
4.For the hypothesis test of the significance of each coefficient, we need a hypothesis, a test statistic, and the distribution of that test statistic when the null hypothesis is true

For a null hypothesis of "the coefficient is 0", R has intuition about the alternative hypothesis

Which of the following statements is NOT correct?

1.When we choose nominal coding, we need to assign one of the choices to be the reference
2.When considering gender, male has to be 0 and female has to be 1
3.When considering gender, we can choose either male or female as the reference case
4.When we choose ordinal coding, outcomes must have an inherent order that we can assign numerical values to

When considering gender, male has to be 0 and female has to be 1

Which of the following statements is NOT correct?

1.Sum of squared errors are the errors in the model equation squared and summed up across each data point
2.R-squared is a measure of the model fit for all kinds of models
3.Finding the line of best fit is the process of minimizing the error sum of squares
4.R-squared is equal to 1 minus the sum of squares for error divided by the total sum of squares

R-squared is a measure of the model fit for all kinds of models
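
The formula in option 4 can be written out directly. The Python sketch below is illustrative only (the helper name `r_squared` and the numbers are assumptions); it computes 1 minus the error sum of squares divided by the total sum of squares for a few sets of hypothetical predictions.

```python
from statistics import mean

def r_squared(actual, predicted):
    """R-squared = 1 - SSE / SST, as stated on the card."""
    y_bar = mean(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # error sum of squares
    sst = sum((a - y_bar) ** 2 for a in actual)                 # total sum of squares
    return 1 - sse / sst

actual = [3, 5, 7, 9]                 # hypothetical observed Y values

perfect = [3, 5, 7, 9]                # predictions that match exactly
mean_only = [6, 6, 6, 6]              # always predicting the mean of Y
close = [3.5, 4.5, 7.5, 8.5]          # predictions that are off by 0.5 each

print(r_squared(actual, perfect))     # no error at all
print(r_squared(actual, mean_only))   # no better than the mean
print(r_squared(actual, close))       # in between
```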

Which of the following statements is NOT true?

1.When we interpret an intercept, we plug in zero for all of the X variables
2.Least squares estimation is the process by which we reduce the errors as much as possible
3.The phrase "on average" is not always used when interpreting model coefficients
4.R does not require distributional assumptions to find the line of best fit

The phrase "on average" is not always used when interpreting model coefficients

Which of the following statements is true about multicollinearity?

1.Multicollinearity is a strong correlation between predictor variables (X) and the dependent variable (Y) in a model
2.Perfect multicollinearity means two variables have a correlation of 0
3.Two variables are considered independent if their correlation is 1
4.Multicollinearity is a strong correlation between predictors (X variables) in a model

Multicollinearity is a strong correlation between predictors (X variables) in a model

Which of the following statements is NOT true about multicollinearity?

1.We cannot use a model that contains multicollinearity
2.When there is a strong correlation between two predictors (X variables) in a model, multicollinearity exists
3.If two variables have a correlation of 1 or -1, we have perfect multicollinearity
4.If two variables have a correlation of 0, they are independent

we cannot use a model that contains multicollinearity
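
One way to look for multicollinearity is to compute the correlation between pairs of predictors. The Python sketch below is an illustration under assumptions: the predictor names and values are hypothetical, and the course itself would do this in R. It shows a pair with a correlation of 1 (perfect multicollinearity, because one column is just the other in different units) and a pair that is only weakly correlated.

```python
from math import sqrt
from statistics import mean

def correlation(u, v):
    """Pearson correlation between two equal-length lists of numbers."""
    u_bar, v_bar = mean(u), mean(v)
    cov = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    spread_u = sum((a - u_bar) ** 2 for a in u)
    spread_v = sum((b - v_bar) ** 2 for b in v)
    return cov / sqrt(spread_u * spread_v)

# Hypothetical predictors (X variables).
miles = [1, 2, 3, 4, 5]
kilometers = [1.609 * m for m in miles]  # same information as miles
minutes = [7, 3, 9, 4, 8]

print(correlation(miles, kilometers))    # essentially 1: perfect multicollinearity
print(correlation(miles, minutes))       # much weaker relationship
```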

Which of the following is correct about Assumption 1 for linear regression?

1.We can include predictor variables that are not relevant to the dependent variable in our model as long as the coefficients of other variables are significant
2.We can leave out relevant predictor variables as long as the coefficients of other variables are significant
3.This assumption doesn't actually matter
4.We need a good, linear model

we need a good linear model

Which of the following statements is correct about Assumption 2 in linear regression?

1.When we spot perfect multicollinearity in our model, we should remove one of the two predictor variables involved
2.When we spot perfect multicollinearity in our model, we should remove both of the two predictor variables involved
3.If the correlation of predictor variable and dependent variable is 1 or -1, we say there exists a perfect multicollinearity and it violates our assumption 2.
4.R will issue a warning message if our model contains multicollinearity

When we spot perfect multicollinearity in our model, we should remove one of the two predictor variables involved

Which of the following statements is correct about Assumption 3?

1.When we use ACF method, significant correlations beyond lag=0 are ok
2.When we use the residual plot method, patterns in the residuals are ok
3.It's okay for us to have some observations of the dependent variable (Y) be related with others
4.We must use our intuition and two other tests for this assumption

We must use our intuition and two other tests for this assumption

Which of the following statements is correct about Assumption 4?

1.Homoskedasticity means we have equal variances in the observations of the response
2.Heteroskedasticity means we have equal variances in the observations of the response
3.If a model is better at predicting salaries of people with college degrees than people with high school diplomas, we say the model is homoskedastic
4.Homoskedasticity means we have unequal variances in the observations of the response

Homoskedasticity means we have equal variances in the observations of the response

Which of the following statements is true about Assumption 5?

1.Distributional assumptions are required for calculating the coefficients in a model
2.Q-Q plots are not useful for testing this assumption
3.Normally distributed errors are required for a linear regression model
4.There are only three ways to test for normally distributed errors

Normally distributed errors are required for a linear regression model

Which of the following statements is true about prediction?

1.Higher R squared is always associated with good prediction
2.If our goal is to interpret coefficients, we will likely use the model that yields the smallest errors out-of-sample
3.Sometimes we choose a model based on best prediction accuracy and not on R squared
4.If our goal is predicting with accuracy, we will likely use R squared to select a model

Sometimes we choose a model based on best prediction accuracy and not on R squared

Which of the following statements is correct about cross-validation?

1.Cross-validation is a standard practice for validating a predictive model's performance
2.Cross-validation involves swapping datasets with a friend
3.It is a way of testing your model on data it has seen before
4.There is only one way to do cross-validation

Cross-validation is a standard practice for validating a predictive model's performance

Which of the following statements is correct?

1.In-sample errors are the errors that come from predicting results of observations the model has not yet seen
2.The training set serves as a "pseudo-reality" where we can test our model's performance
3.In-sample errors are the errors that come from predicting the values used to create the model
4.The test set is the "protective bubble" where we create the model

In-sample errors are the errors that come from predicting the values used to create the model
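
The difference between in-sample and out-of-sample errors can be sketched in a few lines of Python. This is an illustration, not the course's R workflow; the data, the split, and the helper names `fit_line` and `mean_squared_error` are all assumptions.

```python
from statistics import mean

def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - b1 * x_bar, b1

def mean_squared_error(x, y, b0, b1):
    """Average squared distance between actual and predicted Y."""
    return mean([(yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y)])

# Hypothetical data, split into a training set and a test set.
train_x, train_y = [1, 2, 3, 4, 5, 6], [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]
test_x, test_y = [7, 8], [7.4, 7.7]

# The model is created from the training set only.
b0, b1 = fit_line(train_x, train_y)

in_sample = mean_squared_error(train_x, train_y, b0, b1)    # values used to create the model
out_of_sample = mean_squared_error(test_x, test_y, b0, b1)  # values the model has not seen

print(in_sample, out_of_sample)
```

On this made-up data the out-of-sample error is the larger of the two, which is common but not guaranteed.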

Which of the following statements is true?

1.We should use linear regression to predict continuous outcomes and logistic regression to predict binary outcomes
2.Logistic regression cannot be adapted to predict more than two outcomes
3.For a logistic regression, we must have a normally distributed response, which means that our errors are normally distributed
4.For a linear regression, we don't necessarily need the errors to be normally distributed

We should use linear regression to predict continuous outcomes and logistic regression to predict binary outcomes

Which of the following statements is NOT correct about logistic regression?

1.The probabilities of success and failure add up to 1 in a binary logistic regression model
2.The sum of the probabilities of success and failure could be smaller than 1 in a binary logistic regression model
3.The left-hand side of the logistic model is log(p/(1-p)), the log of the odds ratio
4.The right-hand side of the logistic model is the same as in linear regression

The sum of the probabilities of success and failure could be smaller than 1 in a binary logistic regression model

Which of the following statements is NOT true about odds ratios?

1.The odds ratio is the probability of success over the probability of failure
2.The odds ratio is the probability of failure divided by the probability of success
3.The odds ratio is different from the probability of success
4.If the odds ratio goes down, it means that the probability of success goes down and the probability of failure goes up

The odds ratio is the probability of failure divided by the probability of success

Which of the following statements is NOT true about logarithms?

1.If the odds ratio is greater than 1, the log odds ratio is positive
2.When the odds ratio equals 1, the probability of success equals 0.5
3.If the probability of success is smaller than the probability of failure, the log odds ratio is negative
4.We can take the log of any number

we can take the log of any number
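
A short Python sketch (illustrative; the function names are assumptions and the course uses R) ties these cards together. It computes what the cards call the odds ratio, p/(1-p), and its log for a few probabilities, then shows that the log of a negative number is not defined.

```python
import math

def odds(p):
    """Probability of success divided by probability of failure."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds; positive when success is more likely than failure."""
    return math.log(odds(p))

for p in (0.25, 0.5, 0.75):
    print(p, odds(p), log_odds(p))

# The log is only defined for positive numbers.
try:
    math.log(-1)
    log_of_negative_ok = True
except ValueError:
    log_of_negative_ok = False

print(log_of_negative_ok)
```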

Which of the following statements is true?

1."glm" means "generalized linear model" which is exactly the same as logistic regression
2.Adding family="binomial" to the "glm" function is optional when running a logistic regression model in R
3.The "glm" function doesn't require any assumptions
4.Maximum Likelihood Estimation is the estimation method by which we find the intercept and slope for a logistic regression model

Maximum Likelihood Estimation is the estimation method by which we find the intercept and slope for a logistic regression model

Which of the following statements is NOT correct?

1.For a logistic regression model, we can see how the predictors might change the odds of an outcome
2.If a coefficient is insignificant in our logistic regression model, we should remove it from our model
3.In a logistic regression hypothesis test, we are testing to see whether each intercept and slope is equal to 0, essentially whether that coefficient is significantly contributing to the model
4.For a logistic regression model, we can see how the predictors change the outcome directly

For a logistic regression model, we can see how the predictors change the outcome directly

Which of the following statements is correct?

1.When a log odds ratio is greater than 0, it means that the odds ratio is greater than 1
2.When a log odds ratio is greater than 0, it means that the probability of success is smaller than the probability of failure
3.When a log odds ratio is greater than 0, it means that the odds ratio is smaller than 1
4.When we have a negative log odds ratio, it means that the probability of success is greater than the probability of failure

When a log odds ratio is greater than 0, it means that the odds ratio is greater than 1

Which of the following statements is true about the logistic regression rule of thumb?

1.If the log odds ratio is 0, it means that the probability of success is 0.5
2.If the log odds ratio is 0, it means that the probability of success is close to 1
3.If the log odds ratio is -3 or smaller, the probability of success is very close to 1
4.If the log odds ratio is 3 or greater, the probability of success is close to 0

If the log odds ratio is 0, it means that the probability of success is 0.5
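
The rule of thumb follows from inverting log(p/(1-p)) = k, which gives p = 1/(1+exp(-k)). A minimal Python sketch (the function name is an assumption; the course uses R):

```python
import math

def probability_from_log_odds(k):
    """Invert log(p / (1 - p)) = k to recover the probability of success p."""
    return 1 / (1 + math.exp(-k))

for k in (-3, 0, 3):
    print(k, probability_from_log_odds(k))
```

A log odds ratio of 0 gives exactly 0.5, a value of 3 gives a probability close to 1, and a value of -3 gives a probability close to 0.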

Which of the following statements is NOT correct about interpreting the coefficients of logistic regression?

1.If a variable has a significant positive coefficient, we can say that the log odds ratio increases by the amount of the coefficient, on average, when the variable increases by 1 unit
2.Increasing the log odds ratio means increasing the odds ratio, which means we are increasing the probability of success
3.Decreasing the log odds ratio means decreasing the odds ratio, which means we are decreasing the probability of success
4.If a variable has a significant positive coefficient, we can say that the probability of success increases by the amount of the coefficient as the variable increases by 1 unit

If a variable has a significant positive coefficient, we can say that the probability of success increases by the amount of the coefficient as the variable increases by 1 unit
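
Why the last option is wrong can be seen numerically. In the Python sketch below, the intercept and slope are hypothetical values chosen for illustration, not estimates from the course data. A one-unit increase in x always raises the log odds ratio by the coefficient, but the change in the probability of success depends on where x starts.

```python
import math

# Hypothetical logistic regression coefficients.
b0, b1 = -1.0, 0.8

def log_odds(x):
    """Right-hand side of the model: b0 + b1*x."""
    return b0 + b1 * x

def probability(x):
    """Probability of success implied by the log odds at x."""
    return 1 / (1 + math.exp(-log_odds(x)))

for start in (0, 5):
    change_in_log_odds = log_odds(start + 1) - log_odds(start)
    change_in_probability = probability(start + 1) - probability(start)
    print(start, change_in_log_odds, change_in_probability)
```

The probability still rises with x, so the direction of the effect is as described in options 2 and 3; only the size of the change in probability is not equal to the coefficient.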

Which of the following statements is correct about the donations example?

1.It may be okay to have a model disagree with our intuition, because our intuition is sometimes flawed
2.If a model disagrees with our intuition, we need to double check the data set, otherwise, the model is useless
3.In this donation example, we use three independent variables
4.If a model disagrees with our intuition, we should assume our intuition is wrong

It may be okay to have a model disagree with our intuition, because our intuition is sometimes flawed

Which of the following statements is correct about complete information?

1.Complete information is not necessary if there is multicollinearity in the model
2.If we do not have complete information, R will give us an error message automatically
3.As long as we have all values of at least one predictor, we have complete information
4.If we do not have complete information, our model is likely to make poor predictions

If we do not have complete information, our model is likely to make poor predictions

Which of the following statements is NOT correct?

1.If we have no complete separation, then we cannot draw a vertical line that separates all the zeros and ones
2.Complete separation occurs when one or more variables classifies the observations into successes and failures perfectly
3.If we have complete separation, we can separate all the zeros and ones with a single vertical line in a scatterplot of X versus Y
4.No complete separation means that one or more variables classifies the observations into successes and failures perfectly

No complete separation means that one or more variables classifies the observations into successes and failures perfectly

Which of the following statements is NOT true about sample size?

1.Having a large sample size allows us to assume our response has a normal distribution
2.Having a large sample size helps to ensure that we meet the complete information requirement
3.Having a large sample size helps us to guarantee a stable solution
4.Having a large sample size allows us to assume our coefficients have a normal distribution

Having a large sample size allows us to assume our response has a normal distribution

Which of the following statements is true?

1.For a logistic regression model, R-squared represents the proportion of variation in the response that is explained by the predictor variables
2.Bigger AIC values indicate model improvement
3.The model fit statistics in the R output for logistic regressions are generated during the maximum likelihood estimation process
4.We will rely on AIC in this class for determining model fit

The model fit statistics in the R output for logistic regressions are generated during the maximum likelihood estimation process

Which of the following statements is true?

1.We may have a probability of success that is greater than 1
2.Things that make "k" (the entire right hand side) more positive will decrease the likelihood of success
3.Higher income that increases "k" (the entire right hand side) will make it less likely that we receive a donation
4.Things that make "k" (the entire right hand side) more positive will increase the likelihood of success

Things that make "k" (the entire right hand side) more positive will increase the likelihood of success

Which of the following statements is NOT correct?

1.We can use different classification rules for different logistic regression models
2.Classification is helpful for prediction
3.A classification rule helps us move from probabilities to predictions
4.If p > .5 in any modeling situation, we should classify the prediction as "success"

If p > .5 in any modeling situation, we should classify the prediction as "success"

Which of the following statements is NOT true about confusion matrices?

1.We can only create a confusion matrix for our training set, we cannot create a confusion matrix for our test set
2.We can create a confusion matrix for both training and test sets
3.A confusion matrix summarizes how often the model's predictions are correct versus incorrect
4.To create a confusion matrix, we must have both predicted and actual values stored as zeros and ones instead of probabilities

We can only create a confusion matrix for our training set, we cannot create a confusion matrix for our test set
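
A confusion matrix can be built from any set of actual and predicted zeros and ones, whether they come from the training set or the test set. The Python sketch below is illustrative only: the outcomes and predicted probabilities are made up, and the 0.5 cutoff is just one possible classification rule.

```python
from collections import Counter

# Hypothetical actual outcomes and model probabilities for eight observations.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
probabilities = [0.9, 0.2, 0.4, 0.7, 0.6, 0.1, 0.8, 0.3]

# A classification rule turns probabilities into zeros and ones.
threshold = 0.5
predicted = [1 if p > threshold else 0 for p in probabilities]

# Count each (actual, predicted) combination.
counts = Counter(zip(actual, predicted))
true_positives = counts[(1, 1)]
false_negatives = counts[(1, 0)]
false_positives = counts[(0, 1)]
true_negatives = counts[(0, 0)]

accuracy = (true_positives + true_negatives) / len(actual)

print("             predicted 0  predicted 1")
print(f"actual 0     {true_negatives:11d}  {false_positives:11d}")
print(f"actual 1     {false_negatives:11d}  {true_positives:11d}")
print("accuracy:", accuracy)
```

Running the same steps on a test set only requires swapping in that set's actual values and predicted probabilities.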

Which of the following statements is correct?

1.Stability relates to how often we make errors and what kind of errors we make
2.A good model (of any type) will have both accuracy and stability
3.Accuracy relates to whether the prediction performance in the training set and test set are similar
4.We only require stability and accuracy for linear regression models, not logistic regression models

A good model (of any type) will have both accuracy and stability

Which of the following statements is correct about overfitting?

1.Tackling an overfitting problem can help improve a model's stability
2.Overfitting can only affect linear and logistic models, not other types of models
3.High R-squared always indicates an overfitting problem
4.To address overfitting, we need to add more predictor variables to our model

Tackling an overfitting problem can help improve a model's stability

Which of the following statements is true?

1.When we add a new variable to a linear regression, as long as the R-squared increases by at least a tiny decimal, we should keep that variable
2.We use variable selection techniques to help remove unnecessary variables in our model and reduce overfitting
3.Backward stepwise regression is a technique where we start with no variables and we choose the next best addition at each step
4.Forward stepwise regression is a technique where you start with all the variables in the model and slowly remove one variable at a time, stopping whenever you have removed all variables that are not contributing substantially

We use variable selection techniques to help remove unnecessary variables in our model and reduce overfitting

Which of the following statements is NOT correct?

1.We always remove variables that contribute less than 5% to the model's R-squared
2.At each step in backward stepwise selection, we consider removing the variable that contributes the least to the model
3.For backward stepwise selection, the first step is to run the model with all the variables and check the R-squared
4.For backward stepwise selection, the second step is to try removing each variable in the model, one at a time, and recording the R-squared value each time a variable is removed

We always remove variables that contribute less than 5% to the model's R-squared

Which of the following statements is NOT true?

1.Even after the application of a variable selection technique, underfitting is still a possibility
2.Always keep the model as complicated as possible, keeping all variables with significant coefficients
3.If you have a good performance in the training set but poor performance in the test set, you may have a problem with overfitting
4.Even after the application of a variable selection technique, overfitting is still a possibility

Always keep the model as complicated as possible, keeping all variables with significant coefficients

Which of the following statements is correct?

1.Outliers are always observations that don't make sense in the context of the problem
2.We can identify outliers visually by looking at a histogram or scatter plot for both Y variables and X variables
3.Outliers only refer to the observations that are very far from the center or from the other observations
4.Even though outliers are unusual, we can still make good predictions for them as long as we create a good model

We can identify outliers visually by looking at a histogram or scatter plot for both Y variables and X variables

Which of the following statements is NOT true about outliers?

1.Using a scatter plot to visualize the relationship between distance and fare would not help us to identify outliers
2.Extreme values of fare like 300 dollars could be considered outliers and we can consider removing them
3.If distance was zero but fare was very high, we could consider removing these because they don't make sense in the context of this problem
4.Extreme values of distance like 70 miles could be considered outliers and we can consider removing them

Using a scatter plot to visualize the relationship between distance and fare would not help us to identify outliers
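
The kinds of points a scatter plot of distance against fare would expose can also be flagged with simple rules. The sketch below is only an illustration: the function name `flag_outliers`, the sample rides, and the cutoff for "very high fare at zero distance" are assumptions, while the 300-dollar and 70-mile figures echo the options above. Flagged rows are candidates to review, not rows that must be removed.

```python
def flag_outliers(rides, max_fare=300.0, max_distance=70.0, zero_distance_fare=50.0):
    """Return (ride, reasons) pairs for rides that look unusual under the given cutoffs."""
    flagged = []
    for distance, fare in rides:
        reasons = []
        if fare >= max_fare:
            reasons.append("extreme fare")
        if distance >= max_distance:
            reasons.append("extreme distance")
        # A high fare for a trip of zero distance does not make sense in context.
        if distance == 0 and fare >= zero_distance_fare:
            reasons.append("high fare with zero distance")
        if reasons:
            flagged.append(((distance, fare), reasons))
    return flagged


# Hypothetical (distance in miles, fare in dollars) pairs.
rides = [(2.1, 9.5), (5.0, 18.0), (0.0, 120.0), (72.0, 210.0), (3.4, 300.0), (1.2, 7.0)]

flagged = flag_outliers(rides)
for ride, reasons in flagged:
    print(ride, reasons)
```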

Which of the following statements is NOT true?

1.When we edit an imported dataset in R, we are only editing its copy inside R, NOT the original file
2.There are clear, straightforward rules about which outliers to remove and how to do it
3.Trimming is the process of removing a top and bottom percentage of extreme values in X and Y variables
4.Different statisticians may have different rules for identifying outliers, even for the same data set

There are clear, straightforward rules about which outliers to remove and how to do it
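
Trimming, as defined in option 3, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `trim`, the 5 percent default, and the sample fares are all made up for the example, and a different analyst could reasonably pick a different percentage. The function returns a new list, so the original data is left untouched, much like editing a copy of an imported dataset.

```python
def trim(values, percent=5):
    """Return a sorted copy of values with the lowest and highest `percent` removed."""
    ordered = sorted(values)
    k = len(ordered) * percent // 100  # number of values to cut from each end
    return ordered[k:len(ordered) - k]


# Hypothetical fares in dollars, including one very low and one very high value.
fares = [7.5, 12.0, 9.0, 300.0, 15.5, 0.5, 11.0, 8.0, 22.0, 14.0,
         10.0, 13.0, 18.0, 9.5, 16.0, 12.5, 20.0, 11.5, 17.0, 10.5]

trimmed = trim(fares, percent=5)
print(len(fares), len(trimmed), min(trimmed), max(trimmed))
```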

Which of the following statements is true about interactions?

1.Interpreting interactions is always very straightforward
2.Interaction is the same as multicollinearity
3.Interactions exist when X variables affect each other's influence on Y
4.Interactions are correlations among X variables

Interactions exist when X variables affect each other's influence on Y
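
To make this concrete, the sketch below builds an interaction term by multiplying two predictors and shows how it changes the effect of one X on Y depending on the other. Everything here is hypothetical: the variable names, the data, and the coefficients in `predict` are assumptions chosen for illustration, not fitted values. The example happens to pair a continuous variable with a binary one, but any combination works.

```python
# Hypothetical predictors: trip distance (continuous) and a rush-hour flag (binary).
distance = [1.0, 2.0, 3.0, 4.0]
rush_hour = [0, 1, 0, 1]

# The interaction term is the product of the two predictors.
interaction = [d * r for d, r in zip(distance, rush_hour)]


def predict(d, r, b0=2.0, b1=1.5, b2=1.0, b3=0.5):
    """Toy model with both components and their interaction (coefficients are made up)."""
    return b0 + b1 * d + b2 * r + b3 * (d * r)


# Effect of one extra mile on the prediction, outside and during rush hour.
slope_off = predict(2.0, 0) - predict(1.0, 0)
slope_on = predict(2.0, 1) - predict(1.0, 1)
print(interaction, slope_off, slope_on)
```

With a nonzero interaction coefficient, the slope for distance differs between the two rush-hour groups, which is what it means for one X variable to affect another's influence on Y.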

Which of the following statements is NOT true?

1.Interaction terms are always significant
2.We can create an interaction by multiplying two of our predictor variables and adding the result to our model
3.Adding an interaction term to a model can improve prediction accuracy
4.If we add an interaction term, we should also add its components to the model to determine whether they are significant contributors separately and together

Interaction terms are always significant

Which of the following statements is NOT true?

1.An interaction term can contain any combination of discrete and continuous variables
2.When we are making decisions about whether to keep an interaction term in our model, we can use changes in R-squared
3.When we are making decisions about whether to keep an interaction term in our model, we can use changes in prediction accuracy
4.In order to form an interaction term, one of the variables has to be binary and the other has to be continuous

In order to form an interaction term, one of the variables has to be binary and the other has to be continuous

Which of the following statements is true about decision trees?

1.A parent node can have more than 2 children
2.A decision tree is a list of rules for systematic decision making
3.A parent node cannot have 0 children
4.Decision trees require more assumptions than linear or logistic regression

A decision tree is a list of rules for systematic decision making

Which of the following statements is true?

1.The algorithm finds a single variable and multiple values in that variable that best divide the observations at each step
2.Every time the algorithm splits the data in a node, it considers all variables at all possible values for its split location
3.The algorithm stops after it divides the data 10 times
4.The algorithm finds multiple variables and a single value in each variable that best divide the observations at each step

Every time the algorithm splits the data in a node, it considers all variables at all possible values for its split location
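
The exhaustive search described in this answer can be sketched as below. This is a simplified Python illustration, not the routine any particular R package uses: the function names, the toy rows, and the choice of sum of squared errors as the split criterion (a common choice for continuous outcomes) are assumptions.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target):
    """Try every variable at every observed value; return (variable, threshold, score)."""
    best = None
    variables = [name for name in rows[0] if name != target]
    for var in variables:
        for threshold in sorted({row[var] for row in rows}):
            left = [row[target] for row in rows if row[var] <= threshold]
            right = [row[target] for row in rows if row[var] > threshold]
            if not left or not right:
                continue  # a split must put observations on both sides
            score = sse(left) + sse(right)
            if best is None or score < best[2]:
                best = (var, threshold, score)
    return best


# Hypothetical rides: fare is driven mainly by distance, not by passenger count.
rows = [
    {"distance": 1, "passengers": 1, "fare": 5},
    {"distance": 2, "passengers": 3, "fare": 6},
    {"distance": 3, "passengers": 2, "fare": 7},
    {"distance": 10, "passengers": 1, "fare": 30},
    {"distance": 11, "passengers": 3, "fare": 31},
    {"distance": 12, "passengers": 2, "fare": 32},
]

split = best_split(rows, "fare")
print(split)
```

Each split uses a single variable at a single value. A full tree would repeat this search inside each resulting node until a stopping rule is met.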

Which of the following statements is correct?

1.Using recursive partitioning, we are dividing the data into smaller sets, with members in the same set being wildly different
2.Partitioning the data means dividing the data into overlapping sets
3.Using recursive partitioning, we are dividing the data into smaller sets, each with members sharing similar characteristics
4.To create a tree, we use either the "lm" or "glm" functions in R

Using recursive partitioning, we are dividing the data into smaller sets, each with members sharing similar characteristics

Which of the following statements is true?

1.For a classification tree we use "method=anova" and for a regression tree we use "method=class"
2.ANOVA means analysis of covariance
3.ANOVA means analysis of variance
4.Trees for continuous outcomes are called classification trees, and trees for binary outcomes are called regression trees

ANOVA means analysis of variance

Which of the following statements is correct?

1.Decision trees are always better because they don't require assumptions
2.Decision trees are often more stable than linear or logistic regression
3.For decision trees, overfitting is a common problem
4.Decision trees should never be used because they don't have coefficients to interpret

For decision trees, overfitting is a common problem

Which of the following statement is true about a variable?

A variable cannot be subtracted. A variable represents a quantity, so X represents a quantity, which is 1. So the statement is TRUE.

What is not true about variables in statistics?

What is NOT true about variables? The type of variable does not determine the type of statistical analysis.

Which of the following two statements are true about variables?

Which of the following two statements are true about variables? Variables will be ignored by the compiler. The value assigned to a variable may never change. They allow code to be edited more efficiently.