SAS: Join or Merge Tables

Note: Be careful to use merge if the the byvar are not unique in both tables, especially when the vars are numeric and can be aggregated .

Compare of SQL method and merge method

Inner Join

SELECT var
FROM table1 as a
INNER JOIN table2 as b
ON a.var=b.var;

merge table1 (in=a) table2 (in=b);
by var;
if a and b;

Left Join

SELECT var
FROM table1 as a
LEFT JOIN as b
ON a.var=b.var; 

merge table1 (in=a) table2;
by var;
if a;

Left Join exclude Inner Join

SELECT var
FROM table1 as a
LEFT JOIN table as b
ON a.var=b.var;
WHERE b.var IS NULL;

merge table1 (in = a) table2 (in=b);
by var;
if a and not b;

Right Join

SELECT var
FROM table1 as a
RIGHT JOIN table2 as b
ON a.var=b.var;

merge table1 (in=a) table2 (in=b);
by var;
if b;

Right Join exclude Inner Join

SELECT var
FROM table1 as a
RIGHT JOIN table as b
ON a.var=b.var

WHERE a.var IS NULL;

merge table1 (in = a) table2 (in=b);
by var;
if b and not a;

Full Join

SELECT var
FROM table1 as a
FULL OUTER JOIN table2 as b
ON a.var=b.var;

merge table1 (in=a) table2 (in=b);
by var;
if a or b;

or

merge table1 (in = a) table2 (in=b);
by var;

Full Join Inner Join

SELECT var
FROM table1 as a
FULL OUTER JOIN table2 as b
ON a.var=b.var;

merge table1 (in=a) table2 (in=b);

by var;
if a ne b; /* only work on 2 table merge*/

Merge data with Data step

Because usually the information for the subject of interest are scattered in different datasets, it is common to join the tables to gather all the variables of the subject.  There needs to be a common id field with the same data format in all the datasets that need to be merged.

Both dataset A and B have a variable id. Merge statement join the variables from A and B by the id variable in A.  All the records and fields from A will be in the merged table and those records with equal id field from B will be added to the records in the merged table.

Caution:

  1. Make sure id in A is unique
  2. id field in both A and B should be the same type of field (character or numeric)
  3. After merging the table, need to check if there is any duplicated information and why.

data merged;

merge A (in =a) B;

by id;

if a;

run;

 

SAS: data extraction and merge routine (1)

/*select data from raw data set*/

data x;

set y;

where var1 = ‘xxx’ and var2 = ‘yyy’;

run;

/*check frequency on variables of interest*/

proc freq;

tables var1 var2;

run;

/*merge to get additional variables needed for analysis*/

proc sort data = x;

by var3;

proc sort data =z;

by var3;

data xz;

merge x (in =a) z;

by var3;

if a;

run;

 

Heuristic Andrew

Good-enough solutions for an imperfect world

Social Media Oreo

Social Media Oreo

Real Data

Adventures in Data Science

A. C. Cockerill

Past to Future in a Picosecond

Social Mathematics

The interaction between Mathematics and the modern day world.

PH Bytes

Code is the New Literacy

Data Science Insights

In Pursuit of Smarter Decisions for Performance