SAS: Proc Tabulate and Percentage Statistics

The following proc tabulate procedure will generate at a report on sum of undergraduate teaching and graduate teaching and FTE for each Faculty by departments (row header) and by year (column header).

proc tabulate data=data;
class year faculty dept;
var undergrad_load graduate_load fte;
table faculty, (dept all), year*(undergrad_load="UG" graduate_load="GR" fte)*(sum="");
run;

Data table contains grade, program, and incoming year of students. “Total” variable equals to 1 if the student has a grade; “ge75” is 1 if the grade is great than and equal to 75; and “ge80” is 1 if the grade is great than and equal to 80.

The following proc tabulate procedure gives a summary table of total number of counts of students with grade, students with grade great than and equal to 75 and 80.

proc tabulate missing f=6. noseps data=data;
class program progname category year level2;
var total ge75 ge80;
tables category,program*progname*level2,year*(total ge75 ge80) / rts=40 indent=3;
run;

PERCENTAGE STATISTICS

  • reppctn: report percentage (all rows and all columns add up to 100)
  • colpctn: column percentage (every column adds up to 100)
  • rowpctn: row percentage (every row adds up to 100)
  • A*pctsum: construct a custom denomination, both A and B are analysis variables, and B as in the is the denominator for calculating the percentage.

The following tabulate procedure calculates the percentage of greater and equal to 75 and 80. The outomatic SAS naming with ‘1111111’ of the percentage output reflects the number of class variables in the procedure.

proc tabulate missing f=6. noseps data=data out=dataout (drop=_type_ _table_ _page_ rename=(ge75_pctsum_1111111=ge75 ge80_pctsum_1111111=ge80));
class level1 faculty program progname category level2 year;
var total ge75 ge80;
tables category*level1*faculty*program*progname*level2*year, ge75*pctsum='ge75%'*f=8.2 ge80*pctsum='ge80%'*f=8.2 / rts=40 indent=3;
run;

Examples: with multiple class variables in row expression

  • Reppctn/Reppctsum
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*reppctsum="" );
run;
  • Colpctn/Colpctsum — best for % sum for multiple class variables
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*colpctsum="" );
run;
  • Rowpctn/Rowpctsum
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all)*(heads="Heads"*SUM="" heads="% Total"*rowpctsum="" );
run;
  • pctsum – not working for the All column
proc tabulate data = table1 ;
class Agegroup gender acadyear;
var heads ;
table agegroup="Age Group" all gender="Gender" all ,
(acadyear="" all="All Year" )*(heads="Heads"*SUM="" heads="% Total"*pctsum < agegroup all gender all > ="" );
run;

Show % sign in the table

Reference: http://support.sas.com/kb/38/001.htmlhttp://support.sas.com/kb/38/001.html

proc format;                           
   picture fmtpct (round) low-high='009.99%';   
run;
/* use: f=fmtpct. */

SAS: Proc Report Design feature with Percentage and Sub-total

There are two procedures commonly used for reporting purpose in SAS, PROC Report and PROC Tabulate.  In this post, I will focus more of PROC Report because it is more convenient to produce column percentage by group and the sub-total.  The references of column in the computation of percentages are different between a cross-tab report and the regular one dimensional report.

1. Cross-tab report design

Source table layout (Varn will be any other categorical variable that not contribute to the calculation of the percentage)

Variables Type
Year Ordinal
Var1 Categorical/Nominal
Var2 Ordinal
Var3 Continous

Desired Report Output

  Year
Year1 Year2 Year3
Var1 Varn Var2 sum of Var3 % of sum by Var1 sum of Var3 % of sum by Var1 sum of Var3 % of sum by Var1
xxxxx xxxxx xxxxx xxx xx% xxx xx% xxx xx%
    xxxxx xxx xx% xxx xx% xxx xx%
    xxxxx xxx xx% xxx xx% xxx xx%
sub-total     xxx 100% xxx 100% xxx 100%
xxxxx xxxxx xxxxx xxx xx% xxx xx% xxx xx%
    xxxxx xxx xx% xxx xx% xxx xx%
    xxxxx xxx xx% xxx xx% xxx xx%
sub-total     xxx 100% xxx 100% xxx 100%
xxxxx xxxxx xxxxx xxx xx% xxx xx% xxx xx%
    xxxxx xxx xx% xxx xx% xxx xx%
    xxxxx xxx xx% xxx xx% xxx xx%
sub-total     xxx 100% xxx 100% xxx 100%

Code

Proc report data = sample nowd ;
column var1 varn var2 year,(var3 ptvar3);
define var1 / group;
define varn / group;
define var2 /group;
define  year / across 'Year';
define var3 / sum f=7.2 'SUM';
define ptvar3 /computed f=percent8.2 '%';
/* Sum total var3 by year */
Break after var1  / summarize;
compute before var1;
den0 = _c4_;
den1 = _c6_;
den2 = _c8_;
endcomp;
/* Calculate percentage */
compute ptvar3;
_c5_ = _c4_ / den0;
_c7_ = _c6_ / den1;
_c9_ = _c8_ / den2;
endcomp;
run;

2. One dimensional report

Source table layout

Variables Type
Var1 Categorical/Nominal
Var2 Ordinal
Var3 Continous

Desired Report Output

Var1 Varn Var2 sum of Var3 % of sum by Var1
xxxxx xxxxx xxxxx xxx xx%
    xxxxx xxx xx%
    xxxxx xxx xx%
sub-total     xxx 100%
xxxxx xxxxx xxxxx xxx xx%
    xxxxx xxx xx%
    xxxxx xxx xx%
sub-total     xxx 100%
xxxxx xxxxx xxxxx xxx xx%
    xxxxx xxx xx%
    xxxxx xxx xx%
sub-total     xxx 100%

Code

Proc report data = sample  nowd ;
column var1 varn var2 var3 ptvar3;
define var1 / group;
define varn / group;
define var2 /group;
define var3 / sum f=7.2 'Var3';
define ptvar3 /computed f=percent8.2 '%';
Break after var1  / summarize;
compute before var1;
var3den = var3.sum;
endcomp;
/* Calculate percentage */
compute ptvar3;
ptvar3 = var3.sum / var3den;
endcomp;
run;

Note:

  • for cross-tab report, in column statement use “year,”. There is a ‘,’ after the variable.
  • if there is more than one variable in the value section of the cross-tab report, use ‘( )’ to include all the value variables needed in the column statement. eg. ‘year,  (var3 ptvar3)’
  • use break statement to provide sub-totals for the report;
  • for computing the percentage by group, denX variables are used to provide sum value of the group and _cX_ variables are used to hold all results for the column value. The column number X in _cX_ needs to be matched with the exact column number in the output. 
  • for computing percentage for one dimensional report, refer to the variable name directly (eg. ptvar3 = var3.sum/ var3den) in the formula. Refer to the column number will not work in this case.

reference:

http://support.sas.com/kb/49/390.html

http://support.sas.com/kb/43/091.html

Heuristic Andrew

Good-enough solutions for an imperfect world

Social Media Oreo

Social Media Oreo

Real Data

Adventures in Data Science

A. C. Cockerill

Past to Future in a Picosecond

Social Mathematics

The interaction between Mathematics and the modern day world.

PH Bytes

Code is the New Literacy

Data Science Insights

In Pursuit of Smarter Decisions for Performance