## 5.1 Introduction

The REGRESS module fits linear, logistic, Poisson, polytomous and proportional hazards regression models. All keywords available for the **DESCRIBE** module also apply here, and this chapter describes the additional commands relevant to regression analysis. One important difference is that **DESCRIBE** uses Taylor Series Linearization for variance estimation, whereas **REGRESS** uses the Jackknife Repeated Replication (JRR) technique to estimate design-based variances (Kish and Frankel 1974).
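The replication idea behind JRR can be sketched in a few lines of Python. The sketch below is a delete-one-cluster jackknife for a generic weighted estimator. Each replicate drops one cluster and multiplies the weights of the remaining clusters in the same stratum by n_h/(n_h - 1). The squared deviations of the replicate estimates from the full-sample estimate are then combined with the factor (n_h - 1)/n_h. This replicate scheme and the names `jrr_variance` and `weighted_mean` are illustrative assumptions, not necessarily the exact algorithm REGRESS implements. A regression coefficient would be handled the same way by passing a weighted least-squares estimator as `estimator`.

```python
from collections import defaultdict
from typing import Callable, Dict, Sequence, Set, Tuple

# An estimator maps (values, weights) to a single number.
Estimator = Callable[[Sequence[float], Sequence[float]], float]


def weighted_mean(y: Sequence[float], w: Sequence[float]) -> float:
    """Example estimator: the weighted mean of y."""
    return sum(wi * yi for wi, yi in zip(w, y)) / sum(w)


def jrr_variance(
    y: Sequence[float],
    w: Sequence[float],
    stratum: Sequence[int],
    cluster: Sequence[int],
    estimator: Estimator = weighted_mean,
) -> Tuple[float, float]:
    """Return (full-sample estimate, delete-one-cluster jackknife variance).

    Assumed scheme: within each stratum h with n_h >= 2 clusters, one
    replicate per cluster is formed by zeroing that cluster's weights and
    multiplying the other weights in stratum h by n_h / (n_h - 1).
    """
    full = estimator(y, w)

    # Distinct clusters within each stratum.
    clusters_in: Dict[int, Set[int]] = defaultdict(set)
    for s, c in zip(stratum, cluster):
        clusters_in[s].add(c)

    variance = 0.0
    for h, clusters in clusters_in.items():
        n_h = len(clusters)
        if n_h < 2:
            continue  # assumption: single-cluster strata are skipped
        inflate = n_h / (n_h - 1)
        sq_dev = 0.0
        for dropped in clusters:
            rep_w = [
                wi if s != h               # other strata: unchanged
                else 0.0 if c == dropped   # deleted cluster
                else wi * inflate          # remaining clusters in stratum h
                for wi, s, c in zip(w, stratum, cluster)
            ]
            sq_dev += (estimator(y, rep_w) - full) ** 2
        variance += (n_h - 1) / n_h * sq_dev
    return full, variance


# One stratum, one observation per cluster, equal weights.
est, var = jrr_variance([2.0, 4.0, 6.0, 8.0], [1.0] * 4, [1] * 4, [1, 2, 3, 4])
print(est, var)  # est is 5.0; var equals 5/3 up to rounding
```

With a single stratum and one observation per cluster, the scheme reduces to the ordinary delete-one jackknife. For an unweighted mean, that jackknife reproduces the familiar s²/n; here s² = 20/3 and n = 4.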

## 5.2 REGRESS Statements

### 5.2.1 Models

**DEPENDENT variable name;**

This statement specifies the name of the dependent variable in the regression model. Dependent variables are assumed to be continuous unless the CATEGORICAL keyword is included as described below.

**PREDICTOR variable list;**

This statement specifies the right-hand side of the regression model. Predictor variables are assumed to be continuous unless they are declared CATEGORICAL as described below. Interaction terms can be specified with the '*' notation. For example,

**PREDICTOR Income Age Income*Age;**
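The following is a minimal sketch of how the '*' notation can be read. Each term in the predictor list becomes one column of the design matrix, and an interaction term is the product of the variables it names. The helper `design_row` is hypothetical and assumes continuous predictors. Its `intercept` flag only mimics the effect of the NOINTER keyword.

```python
from __future__ import annotations

from math import prod


def design_row(record: dict[str, float], terms: list[str], intercept: bool = True) -> list[float]:
    """Return one row of the design matrix for the given predictor terms.

    A term such as "Income*Age" is read as the product of the variables it
    names; a plain name such as "Income" contributes the value itself.
    """
    row = [1.0] if intercept else []  # leading 1.0 is the intercept column
    for term in terms:
        row.append(float(prod(record[name] for name in term.split("*"))))
    return row


person = {"Income": 50.0, "Age": 40.0}
print(design_row(person, ["Income", "Age", "Income*Age"]))
# [1.0, 50.0, 40.0, 2000.0]
print(design_row(person, ["Income", "Age", "Income*Age"], intercept=False))
# [50.0, 40.0, 2000.0]
```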

**LINK model;**

**LINK** defines the type of regression model to be fit. Specify

- *Linear* for fitting a multiple linear regression model,
- *Logistic* for fitting a logistic (binary) or generalized logistic (polytomous) regression model,
- *Log* for fitting a Poisson regression model for a count variable,
- *Tobit* for fitting a tobit model, or
- *Phreg* for fitting a proportional hazards model (Cox model).

**CENSOR variable name (number);**

*variable name* is the censoring variable, and *number* is the code indicating censoring. If *number* is omitted, 1 is used by default as the code indicating a censored observation. The **CENSOR** statement is required if **LINK** is specified as *Phreg*. For example,

**LINK Phreg;**

**DEPENDENT Survivaltime;**

**CENSOR Died (0);**

In this example, the outcome variable is *Survivaltime* and the censoring variable is *Died*, where *Died* = 0 denotes censored observations.

**CATEGORICAL variable list;**

This statement declares that the listed variables are to be treated as categorical. If a variable with *k* categories is listed on both the CATEGORICAL and PREDICTOR statements, then *k*-1 predictors (dummy variables) are included in the regression model. The category with the highest code value is the reference category. For logistic and multinomial logit models, the dependent variable must also be listed on the CATEGORICAL statement.
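The dummy-variable rule can be illustrated with a short, hypothetical helper `dummy_code`. It builds *k*-1 indicator columns and leaves out the category with the highest code, which then serves as the reference. It assumes the categories are the codes observed in the data. This is a sketch of the coding rule only, not IVEware code.

```python
from __future__ import annotations


def dummy_code(values: list[int]) -> tuple[list[int], list[list[int]]]:
    """Expand a categorical variable into k-1 indicator columns.

    Returns the codes that received a column and the indicator rows.
    The highest observed code is the reference category and gets no column.
    """
    levels = sorted(set(values))
    kept = levels[:-1]  # drop the highest code (reference category)
    rows = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, rows


kept, rows = dummy_code([1, 2, 3, 3, 1])
print(kept)  # [1, 2]
print(rows)  # [[1, 0], [0, 1], [0, 0], [0, 0], [1, 0]]
```

An observation in the reference category (code 3 here) has zeros in every dummy column. Its effect is therefore absorbed into the intercept.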

**OFFSETS count-variable(offset-variable);**

This statement specifies an offset variable when fitting a Poisson regression model. For example,

**OFFSETS Injuries(Years);**

will fit a model predicting the number of injuries occurring per year.
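The per-year interpretation follows from the standard Poisson regression with an offset, log(mu) = log(t) + x'b, where t is the exposure (here, *Years*). The sketch below assumes the offset variable enters the linear predictor on the log scale in this way. Whether REGRESS takes the log of the offset variable internally is an assumption here. The helpers `expected_count` and `expected_rate` are hypothetical.

```python
from __future__ import annotations

import math


def linear_predictor(beta: list[float], x: list[float]) -> float:
    """x'b: the covariate part of the linear predictor."""
    return sum(b * xi for b, xi in zip(beta, x))


def expected_count(beta: list[float], x: list[float], exposure: float) -> float:
    """Expected count when log(exposure) is used as an offset.

    log(mu) = log(exposure) + x'b, hence mu = exposure * exp(x'b).
    """
    return math.exp(math.log(exposure) + linear_predictor(beta, x))


def expected_rate(beta: list[float], x: list[float]) -> float:
    """Expected count per unit of exposure, e.g. injuries per year."""
    return math.exp(linear_predictor(beta, x))


# Hypothetical intercept-only model: 0.5 injuries per year.
beta = [math.log(0.5)]
x = [1.0]  # the intercept column
print(expected_count(beta, x, exposure=4.0))  # about 2.0 injuries over 4 years
print(expected_rate(beta, x))                 # about 0.5 injuries per year
```

Because the offset has a fixed coefficient of 1, the expected count is proportional to exposure. The coefficients b therefore describe the rate per unit of exposure.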

**ID variable name;**

Specifies the variable to be used as the unique subject identifier. This allows for linking the PREDOUT file (see below) created by the REGRESS module to other files.

**NOINTER;**

This keyword fits the regression model without an intercept term.

**ESTIMATES label: specification;**

This statement is useful for estimating the value of the dependent variable for a specific set of covariate values, or for testing hypotheses involving the estimated regression coefficients. For example, suppose that the following regression model is fit:

$$Y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3$$

and we are interested in predicting $Y$ for $x_1 = 1$, $x_2 = 2$ and $x_3 = 0$. We can obtain the predicted value and its 95% confidence interval with the following statement:

**ESTIMATES Mylabel : Intercept (1) x1(1) x2(2);**

Here x3 does not appear because its value is 0. Several estimates can be requested by separating them with a slash ('/').
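The estimate requested above is the linear combination c'b with c = (1, 1, 2, 0). Its standard error is sqrt(c'Vc), where V is the estimated covariance matrix of the coefficients. The sketch below computes both values and a 95% interval. The coefficient values are made up, and the helper `linear_combination` is hypothetical. The normal critical value 1.96 is also an assumption, since REGRESS may use a different critical value.

```python
import math


def linear_combination(c, b, V, z=1.96):
    """Estimate c'b, its standard error sqrt(c'Vc) and a two-sided interval.

    z = 1.96 (normal approximation) is an assumption of this sketch.
    """
    k = len(c)
    est = sum(c[i] * b[i] for i in range(k))
    var = sum(c[i] * V[i][j] * c[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    return est, se, (est - z * se, est + z * se)


# Hypothetical fitted coefficients (b0, b1, b2, b3) and their covariance matrix.
b = [1.0, 0.5, -0.2, 0.3]
V = [
    [0.04, 0.00, 0.00, 0.00],
    [0.00, 0.01, 0.00, 0.00],
    [0.00, 0.00, 0.01, 0.00],
    [0.00, 0.00, 0.00, 0.02],
]
# Mirrors "Intercept (1) x1(1) x2(2)"; x3 = 0, so its weight in c is 0.
c = [1, 1, 2, 0]
est, se, ci = linear_combination(c, b, V)
print(est, se, ci)
```

The same computation serves for a hypothesis test. For example, est / se can be compared with a reference distribution to test whether c'b = 0.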

### 5.2.2 Output files

The **REGRESS** module can produce diagnostic plots and output files for later processing. These features are described below.

**PLOT filename;**

This keyword creates a series of diagnostic plots, including residual, leverage, influence and normal probability plots. The plots are stored in the file *filename* specified after the **PLOT** keyword. The user can rely on the built-in graphics or use gnuplot by downloading the package and including its path in the XML settings file; see Chapter 9 for examples.

**PREDOUT filename;**

Outputs a file containing the predicted values, their standard errors and 95% confidence intervals. If an ID statement is included in the setup, the ID variable is also included in the output file.

**ESTOUT filename;**

Outputs a file containing the estimates and their variance-covariance matrix.

**REPOUT filename;**

Outputs a file containing the estimates for each replicate. Estimated regression coefficients are provided for each combination of the STRATUM, CLUSTER and BY variables.

### 5.2.3 Design Variables

The design features can be specified using the commands **STRATUM**, **CLUSTER**, and **WEIGHT** as illustrated in the **DESCRIBE** chapter.

- If the STRATUM, CLUSTER and WEIGHT variables are not specified, then a simple random sample analysis is performed.
- If a design-based analysis involves only a WEIGHT variable and no STRATUM or CLUSTER variable, then a pseudo-stratum variable and a pseudo-cluster variable should be used. When using pseudo variables, all observations in the data set should have the same value on the pseudo STRATUM variable (e.g., 1), while each observation should have a unique value on the pseudo CLUSTER variable (e.g., an observation ID number or the SAS automatic variable `_N_`). The pseudo variables should be created in the data before performing the analysis. Example SAS DATA step code for creating a pseudo STRATUM variable and a pseudo CLUSTER variable:

```sas
LIBNAME MYLIB 'C:\MYINDIR';

DATA MYLIB.MYDATA;
  SET MYLIB.MYDATA;
  PSEUD_STRAT = 1;    /* same value for every observation */
  PSEUD_CLUST = _N_;  /* unique value for each observation */
RUN;
```

Note that including pseudo variables increases the time REGRESS needs for the analysis.

**TITLE text \n text;**

Indicates the title(s) to be printed at the top of each page of the printout. A \n indicates that the text that follows should be printed on the next line. For example,

**TITLE This is the title on the first line \n This is the title on the second line;**