Lesson 1: Simple Linear Regression STAT 501

In practice, more than one independent variable usually contributes to changes in the dependent variable. In that case, the independent variables and the dependent variable for each observation can be entered into multiple regression software. A simple linear regression model can also be shown as a graph. This post covers simple linear regression in detail. Using regression for parametric inference assumes that the errors (ε) are (1) independent of each other and (2) normally distributed with the same variance at each level of the independent variable. The equal-variance assumption is violated when, for example, the errors (residuals) are greater for higher values of x than for lower values.

  • If your data violate the assumption of independence of observations (e.g., if observations are repeated over time), you may be able to perform a linear mixed-effects model that accounts for the additional structure in the data.
  • To understand the relationship between two predictor variables, such as word count and country of origin, and the probability of an email being spam, researchers can perform logistic regression.
  • The linear regression model fits a straight line to the data to establish the relationship between two variables.

Linear regression ranks as one of the most important tools used in these disciplines. Some of the more common estimation techniques for linear regression are summarized below. It is possible for the unique effect of a predictor xj to be nearly zero even when its marginal effect is large. This may imply that some other covariate captures all the information in xj, so that once that variable is in the model, xj makes no further contribution to the variation in y.
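The difference between a marginal effect and a unique effect can be sketched with a small made-up example. Everything here is an assumption for illustration: the data, the variable names x1 and x2, and the helper functions. The response y depends only on x2, but x1 is correlated with x2, so x1 looks useful on its own and contributes nothing once x2 is in the model.

```python
from statistics import mean

def centered_sum(a, b):
    """Sum of products of deviations from the means of a and b."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def marginal_slope(x, y):
    """Least squares slope of y on x alone (simple linear regression)."""
    return centered_sum(x, y) / centered_sum(x, x)

def two_predictor_slopes(x1, x2, y):
    """Least squares slopes for y = b0 + b1*x1 + b2*x2 (intercept included)."""
    s11, s22, s12 = centered_sum(x1, x1), centered_sum(x2, x2), centered_sum(x1, x2)
    s1y, s2y = centered_sum(x1, y), centered_sum(x2, y)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2

# Hypothetical data: y is driven entirely by x2, and x1 is correlated with x2.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 3, 2, 5, 4]
y = [2 * v for v in x2]

marginal_x1 = marginal_slope(x1, y)                     # x1 on its own
unique_x1, unique_x2 = two_predictor_slopes(x1, x2, y)  # x1 and x2 together

print("marginal slope of x1:", marginal_x1)
print("unique slopes (x1, x2):", unique_x1, unique_x2)
```

With this data the marginal slope of x1 is clearly positive, while its coefficient in the two-predictor model is zero because x2 already carries all the information.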

Heteroscedastic models

Essentially, the CAPM equation is a model that determines the relationship between the expected return of an asset and the market risk premium. The slope is negative because the line slants down from left to right, as it must for two variables that are negatively correlated, reflecting that one variable decreases as the other increases. When the correlation is positive, β1 is positive, and the line slants up from left to right. In the fitted regression equation ŷ = b0 + b1x, ŷ is the predicted value of the response variable, b0 is the y-intercept, b1 is the regression coefficient (the slope), and x is the value of the predictor variable.
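As a minimal sketch of how b0 and b1 can be estimated by least squares, and of how the sign of the slope follows the direction of the relationship, consider the example below. The data and the function name fit_line are made up for illustration and do not come from any dataset discussed here.

```python
from statistics import mean

def fit_line(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical data: one response rises with x, the other falls.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 5, 4, 5]
y_down = [5, 4, 5, 4, 2]

b0_up, b1_up = fit_line(x, y_up)
b0_down, b1_down = fit_line(x, y_down)

print("increasing data: b0 =", b0_up, "b1 =", b1_up)
print("decreasing data: b0 =", b0_down, "b1 =", b1_down)

# Predicted value at x = 6 from the first fitted line.
y_hat = b0_up + b1_up * 6
```

The first fit has a positive slope and the second a negative one, matching the direction of each relationship.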

The p-value of the t-test for the input variable is less than 0.05, so the relationship between the input and the output variable is statistically significant. The F-statistic is large and its p-value is almost 0, which means our model fits better than the intercept-only model. Now that we know how to draw important inferences from the model summary table, let’s look at our model parameters and evaluate our model. From the above graphical representations, we can say there are no outliers in our data, and YearsExperience looks approximately normally distributed, while Salary doesn’t look normal.

For each of these deterministic relationships, the equation exactly describes the relationship between the two variables. Instead, we are interested in statistical relationships, in which the relationship between the variables is not perfect. R² takes a value between 0 and 1 and indicates how well the response variable can be explained by the predictor variable: the closer it is to 1, the more of the variation in the response the predictor explains.
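The contrast between a deterministic and a statistical relationship shows up directly in R². The sketch below uses made-up data and a hypothetical helper r_squared. It computes R² as 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
from statistics import mean

def r_squared(x, y):
    """Coefficient of determination for the least squares line of y on x."""
    x_bar, y_bar = mean(x), mean(y)
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    ss_res = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                       # total
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y_exact = [2 * xi + 1 for xi in x]  # deterministic: y = 2x + 1 holds exactly
y_noisy = [2, 4, 5, 4, 5]           # statistical: a trend with scatter around it

r2_exact = r_squared(x, y_exact)
r2_noisy = r_squared(x, y_noisy)

print("R^2, deterministic data:", r2_exact)
print("R^2, noisy data:", r2_noisy)
```

The deterministic data gives an R² of 1, while the scattered data gives a value strictly between 0 and 1.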


The coefficient β1 would represent the average change in points scored when weekly yoga sessions is increased by one, assuming the number of weekly weightlifting sessions remains unchanged. The coefficient β2 would represent the average change in points scored when weekly weightlifting sessions is increased by one, assuming the number of weekly yoga sessions remains unchanged. This was a simple linear regression example for a positive relationship in business. A regression model can be used when the dependent variable is quantitative, except in the case of logistic regression, where the dependent variable is binary.

Introduction to Simple Linear Regression

Linear regression is used in a wide variety of real-life situations across many different types of industries. Fortunately, statistical software makes it easy to perform linear regression. Businesses often use linear regression to understand the relationship between advertising spending and revenue.

Early evidence relating tobacco smoking to mortality and morbidity came from observational studies employing regression analysis. In order to reduce spurious correlations when analyzing observational data, researchers usually include several variables in their regression models in addition to the variable of primary interest. However, it is never possible to include all possible confounding variables in an empirical analysis.

Note that the numbers in red are the coefficients that the analysis provided. The coefficients, residual sum of squares and the coefficient of determination are also calculated. The above example shows how to use the Forecast function in Excel to calculate a company’s revenue, based on the number of ads it runs.

Regression Analysis – Simple Linear Regression

The mean of the sample residuals is always 0 because the least squares line (fitted with an intercept) is positioned so that the positive residuals above it and the negative residuals below it cancel out exactly. The equations that you used to estimate the intercept and slope determine a line of “best fit” by minimizing the sum of the squared residuals. Simple linear regression is a statistical method that allows us to summarize and study relationships between two continuous (quantitative) variables. This lesson introduces the concept and basic procedures of simple linear regression.
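Both properties of the residuals can be checked numerically. The sketch below uses made-up data and a hypothetical helper sse. It fits the least squares line, confirms that the residuals sum to (numerically) zero, and shows that nudging the slope or intercept away from the fitted values increases the sum of squared residuals.

```python
from statistics import mean

# Hypothetical data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least squares estimates of the slope (b1) and intercept (b0).
x_bar, y_bar = mean(x), mean(y)
b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
      / sum((xi - x_bar) ** 2 for xi in x))
b0 = y_bar - b1 * x_bar

def sse(intercept, slope):
    """Sum of squared residuals for the line y-hat = intercept + slope * x."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("sum of residuals:", sum(residuals))        # zero up to rounding error
print("SSE at the fitted line:", sse(b0, b1))
print("SSE with a steeper slope:", sse(b0, b1 + 0.1))
print("SSE with a higher intercept:", sse(b0 + 0.1, b1))

# The residuals cancel in total, not by count: here the number of positive
# and negative residuals differs.
n_positive = sum(r > 0 for r in residuals)
n_negative = sum(r < 0 for r in residuals)
```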

What is Regression Analysis?

Trend lines are often used to argue that a particular action or event (such as training, or an advertising campaign) caused observed changes at a point in time. This is a simple technique, and does not require a control group, experimental design, or a sophisticated analysis technique. However, it suffers from a lack of scientific validity in cases where other potential changes can affect the data. Linear regression is widely used in biological, behavioral and social sciences to describe possible relationships between variables.

Types of Linear Regression

It is important to interpret the slope of the line in the context of the situation represented by the data. You should be able to write a sentence interpreting the slope in plain English. Before proceeding, we must clarify what types of relationships we won’t study in this course, namely, deterministic (or functional) relationships. One way to measure how well the least squares regression line “fits” the data is to use the coefficient of determination, denoted as R². Data scientists for professional sports teams often use linear regression to measure the effect that different training regimens have on player performance. The formula estimates that for each increase of 1 dollar in online advertising costs, monthly e-commerce sales are predicted to increase by $171.5.
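The slope interpretation amounts to a one-line calculation. The sketch below uses the slope of 171.5 from the advertising example. The intercept is not given in the text, so only changes in the prediction are computed, not the predictions themselves. The function name and the example increases are made up for illustration.

```python
SLOPE = 171.5  # predicted change in monthly sales per extra dollar of advertising

def predicted_change(delta_x, slope=SLOPE):
    """Predicted change in y when x changes by delta_x, all else unchanged."""
    return slope * delta_x

# Hypothetical increases in advertising spend, in dollars.
change_for_1 = predicted_change(1)
change_for_10 = predicted_change(10)

print("extra $1 of advertising  ->", change_for_1)
print("extra $10 of advertising ->", change_for_10)
```

Because the model is linear, the predicted change scales in proportion to the change in x, whatever the starting level of spending.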

The coefficient β1 would represent the average change in crop yield when fertilizer is increased by one unit, assuming the amount of water remains unchanged. The coefficient β2 would represent the average change in crop yield when water is increased by one unit, assuming the amount of fertilizer remains unchanged. Medical researchers often use linear regression to understand the relationship between drug dosage and blood pressure of patients. In that model, the coefficient β1 would represent the average change in blood pressure when dosage is increased by one unit. The slope of 171.5 shows that for each increase of one unit in X, we predict the average of Y to increase by an estimated 171.5 units. The results of the model will tell the business how changes in word count and country of origin affect the estimated probability of a given email being spam.

For example, a hypothetical gene might increase mortality and also cause people to smoke more. For this reason, randomized controlled trials are often able to generate more compelling evidence of causal relationships than can be obtained using regression analyses of observational data. When controlled experiments are not feasible, variants of regression analysis such as instrumental variables regression may be used to attempt to estimate causal relationships from observational data. A fitted linear regression model can be used to identify the relationship between a single predictor variable xj and the response variable y when all the other predictor variables in the model are “held fixed”. Specifically, the interpretation of βj is the expected change in y for a one-unit change in xj when the other covariates are held fixed—that is, the expected value of the partial derivative of y with respect to xj.
