Chapter 4: DESCRIBE

4.1 Introduction

4.2 DESCRIBE Statements

4.2.1 Required or Standard Statements

4.2.2 Design Variables

4.2.3 Analysis Statements

4.2.4 Missing Data Handling

4.2.5 Other Commands

4.1 Introduction

The DESCRIBE module estimates population means, proportions, subgroup differences, and contrasts and linear combinations of means and proportions. A Taylor Series Linearization approach is used to obtain variance estimates for data derived from complex sample designs. Multiple imputation analysis can be performed using the DESCRIBE module.

4.2 DESCRIBE Statements

4.2.1 Required or Standard Statements

DATAIN filename;

This keyword identifies the location and name of the data set to be analyzed. See Section 1.4.1 for more information about specifying a filename using the libname statement in SAS or changing the working directory to match the location of the data set. To perform multiple imputation analysis, more than one SAS data set can follow the DATAIN keyword in the DESCRIBE module. When multiple data sets are specified, each is analyzed separately and the inferences (estimates and variances) are combined using the usual multiple imputation combining rules.
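The combining step can be illustrated outside IVEware. The Python sketch below assumes that "the usual multiple imputation combining rules" refers to Rubin's rules; the function name combine_imputations and the numbers are hypothetical and are not part of IVEware.

```python
from statistics import mean, variance


def combine_imputations(estimates, variances):
    """Combine per-data-set estimates and their variances with Rubin's rules."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two data sets, with one variance per estimate")
    q_bar = mean(estimates)        # combined point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, total


# Hypothetical mean BMI and its variance from three imputed data sets.
est, var = combine_imputations([27.0, 27.5, 28.0], [0.04, 0.05, 0.06])
print(est, var)
```

The combined variance exceeds the average within-imputation variance whenever the estimates differ across data sets, which is how the uncertainty due to imputation enters the standard error.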

RUN;

This should be the last statement in the setup file.

4.2.2 Design Variables

The commands described in this section are relevant only for data from complex sample surveys with stratification, clustering or weighting.

STRATUM variable name;

variable name is the name of the stratum variable for the data from a complex survey. No missing values are allowed for this variable. If the statement is missing then the sample is assumed to be non-stratified.

CLUSTER variable name;

variable name is the Primary Sampling Unit (PSU) or Sampling Error Computing Unit (SECU) variable for the data from a complex sample survey. No missing values are allowed for the cluster variable. If this statement is missing then the sample is assumed to be un-clustered.

WEIGHT variable name;

variable name is the survey weight variable. Survey weights are usually the product of selection weights, non-response adjustments, and post-stratification weights. No missing values are allowed for the weight variable. If this statement is not included then the sample is assumed to be self-weighted.

MODEL method;

MODEL indicates the variance estimation method to be used. Mult (Default) is useful when there are multiple PSUs within a stratum, Pair employs the paired selection method, and Diff employs the successive differences method. You can specify different methods for each stratum. For example,

MODEL Pair(15,16,17) Diff(20,21,27);

will use paired differences for strata 15, 16, and 17, successive differences for strata 20, 21, and 27, and Mult for the rest.
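To show how the stratum, cluster, and weight variables enter a linearization variance estimate, the Python sketch below computes a weighted mean and a common textbook form of its Taylor series variance for strata with multiple PSUs (PSUs treated as sampled with replacement). It is an illustration only: the function name and data are hypothetical, and the exact formulas IVEware uses for Mult, Pair, and Diff are not stated in this chapter.

```python
from collections import defaultdict


def linearized_mean_variance(y, w, stratum, cluster):
    """Weighted mean and its linearization variance (with-replacement PSU approximation)."""
    total_w = sum(w)
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Linearized value of each observation, summed to PSU totals within strata.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for yi, wi, h, c in zip(y, w, stratum, cluster):
        psu_totals[h][c] += wi * (yi - ybar) / total_w
    var = 0.0
    for h, totals in psu_totals.items():
        n_h = len(totals)
        if n_h < 2:
            raise ValueError(f"stratum {h} has a single PSU")
        z_bar = sum(totals.values()) / n_h
        var += n_h / (n_h - 1) * sum((z - z_bar) ** 2 for z in totals.values())
    return ybar, var


# Hypothetical data: two strata, two PSUs per stratum, equal weights.
ybar, var = linearized_mean_variance(
    y=[1.0, 3.0, 2.0, 4.0],
    w=[1.0, 1.0, 1.0, 1.0],
    stratum=[1, 1, 2, 2],
    cluster=[1, 2, 1, 2],
)
print(ybar, var)
```

The sketch requires at least two PSUs in every stratum, which is the situation the text describes for the default method.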

4.2.3 Analysis Statements

TABLE variable list;

This command will produce the weighted proportions and their standard errors for every level of each variable in the variable list. Some examples are given below.

TABLE Race;

produces the marginal distribution of the variable Race. Cross-tabulations are requested with an asterisk, for example:

TABLE Race*Gender;
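As a rough illustration of what a weighted proportion is, the Python sketch below sums the survey weights within each category and divides by the total weight. The function name and the data are hypothetical; standard errors, which TABLE also reports, are not computed here.

```python
from collections import defaultdict


def weighted_proportions(categories, weights):
    """Share of the total survey weight that falls in each category."""
    totals = defaultdict(float)
    for cat, wt in zip(categories, weights):
        totals[cat] += wt
    grand_total = sum(totals.values())
    return {cat: t / grand_total for cat, t in totals.items()}


# Hypothetical records: a Race code, a Gender code, and a survey weight.
race = ["A", "B", "A", "C"]
gender = ["F", "M", "M", "F"]
wt = [2.0, 1.0, 1.0, 4.0]

marginal = weighted_proportions(race, wt)             # analogous to TABLE Race;
cells = weighted_proportions(zip(race, gender), wt)   # analogous to TABLE Race*Gender;
print(marginal)
print(cells)
```

Passing pairs of codes treats each Race and Gender combination as one cell, which mirrors the cross-tabulation request.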

MEAN variable list;

Means, standard errors, and design effects are calculated for each variable in the variable list. For example,

MEAN BMI Age;

will compute the means of BMI and Age.

BY list;

The BY keyword is used in conjunction with the TABLE or MEAN keyword. The analyses will be performed for each level of the variable(s) specified in the BY statement. For instance,

TABLE Race;

BY Gender;

will produce the weighted proportion of each Race category for each of the two levels of Gender. If the variable Agecat records age in three categories, then

TABLE Race;

BY Gender Agecat;

will produce weighted proportions of each Race category for each of the six combinations of Gender and Agecat.

CONTRAST specifications;

CONTRAST is used in conjunction with the MEAN keyword to compare or estimate linear combinations of cell means (continuous variables) or proportions (binary variables). For example,

MEAN Income;

CONTRAST Race;

will produce all the pairwise comparisons of mean Income defined by Race. If Race has three categories then three pairwise comparisons will be produced. As another example,

MEAN Income;

CONTRAST Race*Gender;

will produce comparisons of Income means for all combinations of Race and Gender.

Linear combinations of means can be estimated using the contrast features. For example,

MEAN Income;

CONTRAST (0.5 0.5 -1);

will produce the estimate of the contrast (μ1 + μ2)/2 - μ3 of the means for the three categories of Race. (If Race has more than three levels, the above statement will produce an error message.) The statement,

CONTRAST (0.333333 0.333333 0.333333);

will produce an (approximate, due to rounding) estimate of the mean (μ1 + μ2 + μ3)/3. Note that this is not technically a contrast. The CONTRAST statement in IVEware can be viewed as a combination of the contrast and estimate features in SAS, for example. You can also specify complicated statements such as

MEAN Income;

CONTRAST Race(-1 0 1)*Gender(-1 1);

for contrasting the race differences for one gender group with the race differences for the other. The contrast features can be useful in testing the significance of some pre-planned contrasts in an ANOVA setting.
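The arithmetic behind these contrast specifications can be sketched in Python. The helper names, the Income values, the sign convention for pairwise differences, and the cell ordering (the second factor varying fastest) are all assumptions made for illustration; this chapter does not state how IVEware orders cells internally.

```python
from itertools import combinations


def contrast_estimate(means, coefficients):
    """Linear combination of cell means for one coefficient vector."""
    if len(means) != len(coefficients):
        raise ValueError("one coefficient is needed per cell mean")
    return sum(c * m for c, m in zip(coefficients, means))


def pairwise_contrasts(n_levels):
    """Coefficient vectors for every pairwise difference among n_levels means."""
    vectors = []
    for i, j in combinations(range(n_levels), 2):
        v = [0] * n_levels
        v[i], v[j] = 1, -1
        vectors.append(v)
    return vectors


def product_contrast(first, second):
    """Cell coefficients for first*second, with the second factor varying fastest."""
    return [a * b for a in first for b in second]


# Hypothetical mean Income (in thousands) for three Race categories.
income = [50.0, 40.0, 30.0]
average_vs_third = contrast_estimate(income, [0.5, 0.5, -1])
pairs = pairwise_contrasts(3)
interaction = product_contrast([-1, 0, 1], [-1, 1])
print(average_vs_third, pairs, interaction)
```

With three levels there are three pairwise vectors, matching the count given above, and the coefficients of the Race(-1 0 1)*Gender(-1 1) product sum to zero, so it is a true contrast.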

4.2.4 Missing Data Handling

There are four options for handling missing data: analyze previously multiply imputed data sets; perform multiple imputation concurrently, just for the variables in the analysis; skip subjects with missing values (that is, perform available case analysis); or stop the analysis and exit.

For previously multiply imputed data sets, list all the data sets in the DATAIN statement and omit the MDATA command. The other options, which apply when only one data set is listed in the DATAIN statement, are given below.

MDATA instruction;

The options for instruction are STOP, IMPUTE, and SKIP. If MDATA is not included in your setup, cases with missing data will be excluded from your analysis. This is equivalent to using the SKIP instruction. When the instruction is STOP, the DESCRIBE module stops if missing data are encountered in any analysis variables. If the instruction is IMPUTE, the missing data will be imputed. All of the IMPUTE keywords can be used to specify the models for the imputation process.
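The difference between SKIP and STOP can be mimicked on a small list of records. The Python sketch below is only an illustration of the behavior described above: apply_mdata and the sample records are hypothetical, and IMPUTE is deliberately left unimplemented because the imputation models are beyond a short example.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def apply_mdata(rows, analysis_vars, instruction="SKIP"):
    """Mimic the documented MDATA choices on a list of dict records."""
    complete = [
        r for r in rows
        if not any(is_missing(r.get(v)) for v in analysis_vars)
    ]
    if instruction == "SKIP":
        return complete  # available case analysis
    if instruction == "STOP":
        if len(complete) < len(rows):
            raise RuntimeError("missing data found in analysis variables")
        return complete
    if instruction == "IMPUTE":
        raise NotImplementedError("imputation models are outside this sketch")
    raise ValueError(f"unknown instruction: {instruction}")


rows = [
    {"BMI": 25.0, "Age": 40},
    {"BMI": None, "Age": 35},
    {"BMI": 30.0, "Age": float("nan")},
]
bmi_only = apply_mdata(rows, ["BMI"])            # two records kept
both_vars = apply_mdata(rows, ["BMI", "Age"])    # one record kept
print(len(bmi_only), len(both_vars))
```

Only the variables named in the analysis are checked, so the set of records that are skipped depends on which variables the analysis uses.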

4.2.5 Other Commands

NOBS number;

NOBS indicates the number of observations to be used in the analysis. By default, all observations in the data set will be used. Specification of NOBS to subset a large data set might be useful while testing the setup file.

PRINT instruction;

Indicates the printout desired. The options are STANDARD (default) and DETAILS. When a DESCRIBE procedure includes the IMPUTE missing-data option (see MDATA above), the DETAILS keyword instructs IVEware to print the number and distribution of observed values, imputed values, and combined observed and imputed values for each variable. If the DESCRIBE procedure includes multiple imputations, the DETAILS keyword instructs IVEware to print estimates and statistics for each imputed data set as well as combined estimates and statistics across the imputed data sets. The standard DESCRIBE printout does not include imputation results.

TITLE text \n text;

Indicates the title(s) to be printed at the top of each page of the printout. A \n indicates that the text that follows should be printed on the next line. For example,

TITLE This is the title on the first line \n This is the title on the second line;

IVEware User Guide
