This SAS code performs data preprocessing by filtering and cleaning...

September 2, 2025 at 07:45 PM


proc sort data=dataframe_sorted;
	by Oblgr_Nbr bi_dt;
run;

/*extract first two records per obligor */
data first_two;
	set dataframe_sorted;
	by Oblgr_Nbr;

	retain first_val first_date second_val second_date count;
	format first_date second_date date9.;

	if first.Oblgr_Nbr then do;
		count = 1;
		first_val = Default_Ind_OblgrAbsorb;
		first_date = bi_dt;
		second_val= .;
		second_date = .;
	end;
	else if count=1 then do;
		count = 2;
		second_val = Default_Ind_OblgrAbsorb;
		second_date = bi_dt;
	end;

	if last.Oblgr_Nbr then output;

	keep Oblgr_Nbr first_val first_date second_val second_date;
run;

data drop_ids;
	set first_two;
	if first_val=0 and second_val=1
		and intck('qtr',first_date,second_date) = 1 then drop_flag=1;
	keep  blgr_Nbr drop_flag;
run;

proc sql;
	create table cleaned_data as
	select a.*
	from dataframe_sorted as a
	left join drop_ids  as b
		on a.Oblgr_Nbr = b.Oblgr_Nbr
	where coalesce(b.drop_flag,0) =0;
quit;

This SAS code performs data preprocessing by filtering and cleaning records from a dataset. Here's what each step does:

1. `proc sort` Step:

proc sort data=dataframe_sorted;
	by Oblgr_Nbr bi_dt;
run;

Sorts the dataset dataframe_sorted by Oblgr_Nbr (obligor number) and bi_dt (possibly a date field). Sorting is required to properly process records by groups and within groups by date.

2. Extract First Two Records per Group:

data first_two;
	set dataframe_sorted;
	by Oblgr_Nbr;

	retain first_val first_date second_val second_date count;
	format first_date second_date date9.;

	if first.Oblgr_Nbr then do;
		count = 1;
		first_val = Default_Ind_OblgrAbsorb;
		first_date = bi_dt;
		second_val= .;
		second_date = .;
	end;
	else if count=1 then do;
		count = 2;
		second_val = Default_Ind_OblgrAbsorb;
		second_date = bi_dt;
	end;

	if last.Oblgr_Nbr then output;

	keep Oblgr_Nbr first_val first_date second_val second_date;
run;

Extracts the first two records for each Oblgr_Nbr (group of obligors):
1. if first.Oblgr_Nbr ensures that operations reset when a new obligor group starts (first. variable is automatically created when using by statements).
2. retain is used to persist values across rows for the same group.
3. For the first record in the group:
  - first_val and first_date store the values of Default_Ind_OblgrAbsorb and bi_dt.
4. When count=1, i.e., the second record:
  - second_val and second_date store the second record's values.
5. if last.Oblgr_Nbr ensures that the data for the group is output only after processing the last row of the group (to ensure both first and second records are captured).
6. The resulting dataset (first_two) contains:
  - Oblgr_Nbr
  - The first and second values (first_val, second_val)
  - The corresponding dates (first_date, second_date).

3. Identify Records to Drop:

data drop_ids;
	set first_two;
	if first_val=0 and second_val=1
		and intck('qtr',first_date,second_date) = 1 then drop_flag=1;
	keep  blgr_Nbr drop_flag;
run;

Flags records for dropping based on specific conditions:
1. Keeps only records where the first value (first_val) is 0 and the second value (second_val) is 1.
2. Uses the intck function (interval check) to calculate the number of quarters (qtr) between first_date and second_date. Filters records where the difference is exactly 1 quarter.
3. Adds a drop_flag variable (set to 1) for such records and keeps Oblgr_Nbr and drop_flag in the output dataset (drop_ids).

4. Filter Original Dataset:

proc sql;
	create table cleaned_data as
	select a.*
	from dataframe_sorted as a
	left join drop_ids  as b
		on a.Oblgr_Nbr = b.Oblgr_Nbr
	where coalesce(b.drop_flag,0) =0;
quit;

Creates a cleaned version of the original dataset (dataframe_sorted) by removing flagged records:
1. Performs a LEFT JOIN between dataframe_sorted (aliased as a) and drop_ids (aliased as b) on Oblgr_Nbr.
2. Uses coalesce(b.drop_flag, 0) to treat missing values of drop_flag as 0, i.e., records without a flag in drop_ids are retained.
3. Keeps only rows where b.drop_flag equals 0 (unflagged records).
4. The resulting dataset is stored as cleaned_data.

Summary:

This code:
1. Sorts the input data by obligor and date.
2. Extracts the first two records (and related fields) for each Oblgr_Nbr.
3. Flags obligors who meet specific criteria: first_val=0, second_val=1, and exactly one quarter (3 months) between the two dates.
4. Removes these flagged records from the original dataset to create a cleaned dataset (cleaned_data).

Generate your own explanations

Download our vscode extension

Read other generated explanations

Built by @thebuilderjr

This SAS code performs data preprocessing by filtering and cleaning...

1. proc sort Step:

2. Extract First Two Records per Group:

3. Identify Records to Drop:

4. Filter Original Dataset:

Summary:

1. `proc sort` Step: