SQL Join: Compile Statistics Summary from Multiple Statistics Tables

In the previous blog, I presented how to compile statistic results by groups and by variable into a long table for each type of statistics analysis.   All tables share three common field: group, test and var.

Here is the list of the table with the specific statistic that want to extract and compile into a summary table.

  • stat_tstat:
    • class = ‘1’: n, mean
    • class = ‘0’: n, mean
    • class = ‘Diff (1-2)’: mean,
  • stat_ttest:
    • variances = ‘Unequal’: tValue, Probt

PPOC SQL is the most convenient procedure to pick up these values under specific condition and combine them into one table.

proc sql;
create table stat1 as
select a.n as studyn, a.mean as studymean, b.n as controln, b.mean as controlmean, c.mean as diff, a.group, a.var, d.tValue, d.probt
from stat_tstat a
left join stat_tstat b
on a.group =b.group and a.var=b.var and b.class = '1'
left join stat_tstat c
on a.group =c.group and a.var=c.var and c.class = 'Diff (1-2)'
left join stat_ttest d
on a.group = d.group and a.var=d.var and d.variances = 'Unequal'
where a.class = '0';
quit;

SAS: Proc SQL Where … In to subset

SQL where … in can be used to subsetting dataset which meet contain condition and the condition is in a separate dataset.

Table ONE is a large dataset and contains variable ID and other related information. Table TWO contains unique ID that of interest.  Table THREE is the subset of table ONE which match the ID value with table TWO.

proc sql;
create table THREE as
select *
from ONE
where ID in (select ID from TWO);

 

Proc sql to get fiscal year using put function

Dataset only has calendar year in numeric format.  The following sql statement calculates the fiscal start year and fiscal end year based on the calendar month and year.Case statement requires the results of the ‘when’ clause is the same as the ‘then’ clause, therefor the put function can’t be used to convert bud_year to a character value.
proc sql;
create table step1 as
select A.*, B.proposed_amount, B.awarded_amount, B.period, B.start_dt as bud_start_dt, B.end_dt as bud_end_dt, B.bud_month, B.bud_year,
case
when  B.bud_month>=5 then B.bud_year
when  B.bud_month<5 then B.bud_year-1
else .
end as bud_fiscalyr_start,
case
when  B.bud_month>=5 then B.bud_year+1
when  B.bud_month<5 then B.bud_year
else .
end as bud_fiscalyr_end
from projectlist as A
left join project_budget as B
on B.project_id = A.project_id
;
quit;

step1

bud_month     bud_year      bud_fiscalyr_start        bud_fiscalyr_end

4                          2014               2013                                     2014

5                           2014               2014                                    2015
proc sql;
create table step2 (drop=bud_fiscalyr_start bud_fiscalyr_end) as
select *,
case
when put(bud_fiscalyr_start,4.)=’   .’ then ”
when put(bud_fiscalyr_start,4.) ne ‘   .’ then put(bud_fiscalyr_start,4.)||”-“||put(bud_fiscalyr_end,4.)
end as bud_fiscal_year
from step1
;
quit;

step2

bud_month     bud_year      bud_fiscal_year

4                          2014               2013-2014

5                           2014               2014-2015

.                           .

 

SAS SQL: Merge dataset with multiple conditions

Assume all the macro variables are defined. There are two lists A and B. A is list of year n students and B is a list student id for year n+1 students in the same program.  Step 1 is to create table that includes all the students with study level 1 or 2 in A but not in B (1 year later).  This step identifies the student population of study level 1 and 2 who are lost to the program one year later. Notice A and B are used as table aliases to simplify the statement. Also the syntax for left join needs to be followed by  the on statement to specify the column field to join the two table.

proc sql;
create table nov&yr.not as
select A.* from nov&yr. as A
left join nov&nextyr.id as B
on A.sisid = B.id
where B.id is null and A.std_lvl in (’01’,’02’)
;
quit;

Second step is to identify the students who changed their major to Psychology from results of step 1.  Notice I omit As when I assign the aliases.
Proc sql;
create table nov&yr.chg as
select A.*, B.sisall, B.newm1 from nov&yr.not A, nov&nextyr.all B
where A.sisid = B.sisall
and A.major1 ne B.newm1
and B.newm1 eq ‘PSYC’
;
quit;

Heuristic Andrew

Good-enough solutions for an imperfect world

Social Media Oreo

Social Media Oreo

Real Data

Adventures in Data Science

A. C. Cockerill

Past to Future in a Picosecond

Social Mathematics

The interaction between Mathematics and the modern day world.

PH Bytes

Code is the New Literacy

Data Science Insights

In Pursuit of Smarter Decisions for Performance