Regression analysis uses statistics to investigate the relationship between a dependent variable, i.e., the variable of interest, and one or more independent variables, also referred to as explanatory variables.1 It is important to note that standard regression techniques cannot establish that a causal relationship exists between the dependent and independent variables, but merely that changes in the variables are correlated; however, more advanced regression techniques can provide stronger evidence of causal relationships.2 A valuation analyst may employ regression analyses to better understand the relationships between variables, which may have an impact on the analyst’s opinion of value. However, regression analyses take many forms, and carelessly employing an incorrect model would, in all likelihood, render the resulting estimates, and the conclusions drawn from them, inaccurate, thereby undermining the validity of the valuation analysis. This fifth installment of the six-part Health Capital Topics series on statistical methods utilized in valuation reports will discuss, at a high level, how regression analysis can inform healthcare valuations, and the considerations analysts must keep in mind when developing and interpreting the estimates of regression models.
Suppose a valuation analyst wanted to estimate the extent to which the number of health insurance providers in a city was correlated with the average reimbursement rates for specialty service visits (e.g., dermatology visits) during 2015. The analyst can conduct a regression analysis by gathering cross-sectional data (i.e., data across space) on the number of health insurance providers and average specialty service reimbursement rates for numerous metropolitan areas during 2015, and estimating the correlation (if any) between the included variables.3 In contrast, suppose the valuation analyst was interested in whether this relationship changed within a single metropolitan area over a specified period of time, rather than whether the relationship exists across all metropolitan areas. This could be accomplished by performing a regression analysis using techniques developed to analyze time series data, which may provide insight into the evolution of reimbursement rates over time in the given geographic area.4 Additionally, the analysis could be extended to estimate the relationship across multiple cities, and over multiple years, in what is termed panel data analysis (i.e., across time and space), which combines elements of both cross-sectional and time series data.5 Each of these types of data has its own limitations, requires a specific set of skills, involves different parameters, and calls for interpretation specific to the methodology applied when running the regression. The scope of this article will, however, limit its focus to the most common forms of regression analysis.
Regression models are sufficiently flexible to allow for many functional forms. Linear regression, the most basic of these forms,6 assumes that the relationship between the independent and the dependent variables is constant.7 For instance, suppose that, using linear regression, a valuation analyst determined the elasticities between their independent variables and dependent variable, and estimated that a ten percent increase in healthcare providers in a city correlates with a 15 percent increase in annual, per capita physician visits for that area. A linear regression would assume that the magnitude of this effect holds regardless of the level of providers, i.e., whether a city has ten healthcare providers or 200 healthcare providers. Although estimates obtained in this manner could be accurate for these variables, it is also possible that the relationship does not follow a linear functional form. Perhaps when a city begins with ten healthcare providers, an increase of ten percent would lead to a more significant rise in annual, per capita physician visits than if the city begins with 200 providers. In this case, the relationship may be better explained by a functional form that incorporates this decreasing marginal impact. The valuation analyst should always consider what functional form the data may take in order to estimate the most accurate relationship. Often, the data itself will suggest the appropriate functional form. A scatterplot of the independent variable against the dependent variable will often indicate whether a linear relationship exists between the variables, or whether an alternative functional form should be considered. Alternatively, a theoretical, non-quantitative understanding of the relationship between the included variables may also suggest a preferred functional form for the regression analysis.
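The choice of functional form can be illustrated with a minimal sketch. The figures below are hypothetical and constructed for illustration only: per capita visits are assumed to grow with the square root of the number of providers, so the marginal impact of each additional provider declines. A straight line fitted to the levels leaves residuals with a systematic pattern, whereas a line fitted to the logarithms of both variables recovers the assumed constant elasticity of 0.5.

```python
import math

def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical data with a decreasing marginal impact (assumption, not real figures)
providers = [10, 20, 50, 100, 200]
visits = [2.0 * p ** 0.5 for p in providers]

# Linear form in levels: visits = a + b * providers
a, b = ols_fit(providers, visits)
level_residuals = [v - (a + b * p) for p, v in zip(providers, visits)]

# Log-log form: log(visits) = c + elasticity * log(providers)
c, elasticity = ols_fit([math.log(p) for p in providers],
                        [math.log(v) for v in visits])

print("residuals from the linear fit:", [round(r, 2) for r in level_residuals])
print("estimated elasticity from the log-log fit:", round(elasticity, 3))
```

In this constructed example, the linear fit overpredicts at both ends of the range and underpredicts in the middle, which is the kind of pattern a scatterplot of the data would also reveal.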
Depending on the available data and the goal of the analysis, the valuation analyst could use a number of different estimation techniques to conduct a regression analysis. One of the most common regression estimation techniques is called Ordinary Least Squares (OLS), a technique that calculates the best-fit relationship within the data by minimizing the sum of the squared differences between the predicted values and the actual values of the dependent variable (i.e., the residuals).8 OLS is the most frequently used regression estimation technique for several important reasons, including: (1) it is relatively easy to implement and understand; (2) the goal of minimizing the sum of the squared residuals is reasonable from a theoretical point of view; and, (3) OLS estimates, under ideal circumstances (described in the following paragraph), have the most desirable statistical properties.9
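As a rough illustration of what "minimizing the sum of the squared residuals" means, the sketch below fits a one-variable OLS line using the standard closed-form formulas and then confirms that a slightly different line produces a larger sum of squared residuals. The data are hypothetical (number of insurers and average reimbursement in seven metropolitan areas), and the function names are illustrative rather than drawn from any particular library.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line through (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def sum_squared_residuals(x, y, intercept, slope):
    """Sum of squared gaps between actual and predicted values."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical cross-sectional data for seven metropolitan areas
insurers = [2, 3, 4, 5, 6, 7, 8]
reimbursement = [98.0, 104.0, 103.0, 110.0, 113.0, 112.0, 120.0]

intercept, slope = ols_fit(insurers, reimbursement)
best_ssr = sum_squared_residuals(insurers, reimbursement, intercept, slope)

# Any other line, e.g. one with a slightly steeper slope, fits worse
nudged_ssr = sum_squared_residuals(insurers, reimbursement, intercept, slope + 0.5)

print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print(f"SSR at OLS estimates={best_ssr:.2f}, SSR with nudged slope={nudged_ssr:.2f}")
```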
In order for an OLS regression model to produce meaningful and reliable results, the seven classical linear regression model (CLRM) assumptions must hold true, as described by the Gauss-Markov theorem developed by Carl Friedrich Gauss and Andrey Markov.10 If only some of the CLRM assumptions hold true, then the OLS results may no longer be the “best” linear unbiased estimators, and the valuation analyst may consider alternative estimation methods with properties more suited to the data being analyzed. For instance, in cases where an analyst finds that the random error term (a term required in every regression equation) follows a trend, in violation of the Gauss-Markov theorem which requires that the error terms be uncorrelated and have a constant variance, the analyst might choose to use a more advanced technique to potentially rectify this problem and allow for a more accurate interpretation of the regression results.11
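One common diagnostic for error terms that follow a trend is the Durbin-Watson statistic, which is offered here as an example and is not a method named in this article. It compares successive residuals: values near 2 are consistent with uncorrelated errors, values near 0 suggest positive serial correlation, and values near 4 suggest negative serial correlation. The sketch below computes the statistic for two hypothetical residual series.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

# Hypothetical residuals that drift steadily upward over time
trending = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# Hypothetical residuals that flip sign every period
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

dw_trending = durbin_watson(trending)
dw_alternating = durbin_watson(alternating)

print("trending residuals:", round(dw_trending, 3))
print("alternating residuals:", round(dw_alternating, 3))
```

A value far from 2, as in both series above, would be a signal to consider an estimation technique designed for correlated errors.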
Even if an analyst uses the appropriate regression technique for their data, the regression analysis may still have several limitations. For instance, a regression model can only estimate the relationship between variables; it cannot determine the true relationship.12 This is a vital distinction, because the true relationship could only be known with complete information and perfect foresight, which, unfortunately, are never available in a real-world analysis.13 Analysts must, instead, settle for a best estimate of the true relationship given the limited data available, which leaves room for uncertainty, and possibly error, in the results of the regression analysis.14
Additionally, regression models may not capture all of the variability of a dependent variable, which produces the potential for error in the estimates through misspecification and bias.15 For example, if an analyst performs a regression analysis to determine if an increase in the number of cardiovascular surgeons in an area leads to a rise in the number of cardiovascular surgeries performed on that population, many other independent variables, beyond the growth in the number of cardiovascular surgeons, such as the percent of the population with health insurance, may also impact the resulting value of the dependent variable. Omitting relevant variables may introduce bias into the regression, creating a gap between the true relationship and the regression estimate.16 Therefore, a valuation analyst should consider all relevant factors that can affect the value of a dependent variable when creating a regression model, in order to minimize the bias in the regression analysis estimates.
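The effect of omitting a relevant variable can be sketched with simulated data. Everything below is an assumption made for illustration: the "true" effect of surgeons on surgeries is set to 1.0, the insured share of the population has its own effect of 2.0, and surgeons are more numerous where more of the population is insured. A regression that leaves out the insured share attributes part of its effect to surgeons, while a regression that includes both variables recovers a slope close to the assumed value.

```python
import random

random.seed(0)  # fixed seed so the illustration is repeatable

n = 5000
# Simulated, standardized variables (hypothetical, not real market data)
insured = [random.gauss(0, 1) for _ in range(n)]
surgeons = [0.8 * z + random.gauss(0, 1) for z in insured]
surgeries = [1.0 * x + 2.0 * z + random.gauss(0, 1)
             for x, z in zip(surgeons, insured)]

def demean(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, z, y = demean(surgeons), demean(insured), demean(surgeries)

# "Short" regression: surgeries on surgeons only (insured share omitted)
short_slope = dot(x, y) / dot(x, x)

# "Long" regression: surgeries on surgeons and insured share,
# solved from the two-variable normal equations
sxx, szz, sxz = dot(x, x), dot(z, z), dot(x, z)
sxy, szy = dot(x, y), dot(z, y)
long_slope = (szz * sxy - sxz * szy) / (sxx * szz - sxz ** 2)

print("slope on surgeons, insured share omitted:", round(short_slope, 2))
print("slope on surgeons, insured share included:", round(long_slope, 2))
```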
It is important to keep in mind that, like all statistical methods, regression analysis produces probabilistic estimates, not certainties, and analysts must exercise caution when interpreting results. One of the many criteria an analyst may use to interpret regression results is the calculated statistic referred to as R-squared, which measures the extent to which the variations in the dependent variable around its mean are explained by the model.17 Higher values of R-squared generally indicate that the regression model is a better fit for the data and accounts for more of the variation in the dependent variable.18 An analyst can use additional statistical tests, such as a t-test or an F-test,19 to determine the statistical significance of an estimated relationship or model. For example, suppose a regression estimate indicates that a ten percent decrease in the percent of the population covered by health insurance is associated with a 20 percent decrease in revenue for the average healthcare provider in that area. Although this relationship might seem noteworthy, if the valuation analyst performs a t-test on this regression estimate and finds that the estimate is not statistically significantly different from zero, then even though the response seems strong, it should not be relied upon on its own. In that case, the data indicate that it is possible that no relationship exists between the two variables, and that the estimated response is simply an artifact of the sampling process.
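The following sketch computes R-squared and the t-statistic of the slope for a one-variable regression, using two hypothetical seven-observation datasets. The critical value 2.571 is the standard two-sided five percent threshold for a t-distribution with five degrees of freedom (seven observations minus two estimated coefficients); the data and function name are illustrative assumptions.

```python
import math

def simple_regression_stats(x, y):
    """Return (slope, r_squared, t_statistic) for y = a + b * x fitted by OLS."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ssr = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    sst = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1 - ssr / sst
    # Standard error of the slope uses n - 2 degrees of freedom
    slope_se = math.sqrt(ssr / (n - 2) / sxx)
    return slope, r_squared, slope / slope_se

T_CRITICAL_5PCT_DF5 = 2.571  # two-sided 5% critical value, 5 degrees of freedom

x = [2, 3, 4, 5, 6, 7, 8]
tight_y = [98.0, 104.0, 103.0, 110.0, 113.0, 112.0, 120.0]  # hypothetical, little noise
noisy_y = [10.0, -20.0, 35.0, -5.0, 40.0, -15.0, 30.0]      # hypothetical, very noisy

tight_slope, tight_r2, tight_t = simple_regression_stats(x, tight_y)
noisy_slope, noisy_r2, noisy_t = simple_regression_stats(x, noisy_y)

for label, slope, r2, t in [("tight", tight_slope, tight_r2, tight_t),
                            ("noisy", noisy_slope, noisy_r2, noisy_t)]:
    significant = abs(t) > T_CRITICAL_5PCT_DF5
    print(f"{label}: slope={slope:.2f}, R-squared={r2:.3f}, t={t:.2f}, "
          f"significant at 5%: {significant}")
```

Both datasets yield a positive slope of similar size, but only the first clears the significance threshold, which mirrors the caution above about estimates that appear strong but cannot be distinguished from zero.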
Analysts can use regression analyses to estimate relationships between variables, which may inform and strengthen a valuation analysis, allowing for the quantification of a relationship. Although regression is a powerful tool, the analyst should consider the limitations of regression analyses and the various metrics upon which regression estimates can be compared and interpreted. The type of regression that the analyst should use varies based on the scope of the project, as well as the data available. Therefore, a valuation analyst using regression analysis to conduct a healthcare, or other, valuation should always be cognizant of the potential pitfalls with regression analyses when developing a model, and should interpret the estimates with caution.