This code snippet performs the following steps (a sketch of the full recipe appears after the list):
- **Read recipe inputs**: It starts by loading a dataset named `"rds_pd_fi_insurers"` from Dataiku into a PySpark DataFrame (`df`) using the `dkuspark.get_dataframe` function, which takes the `sqlContext` object and the Dataiku dataset as arguments.
- **Compute recipe outputs from inputs**:
  - It defines a dictionary (`renaming_dict`) that maps specific column names in the dataset to their new names.
  - Using a `for` loop, it iterates through the keys (representing old column names) in the dictionary and uses PySpark's `withColumnRenamed` method to rename each column in the DataFrame according to the mapping. Specifically:
    - `agg_parent_company_regulatory_code` is renamed to `agg_parent_company_ctr_code`.
    - `rd_ins_life_investedassetsqlty` is renamed to `rd_ins_life_investassetsqlty`.
    - `rd_ins_nonlife_investedassetsqlty` is renamed to `rd_ins_nonlife_investassetsqlty`.
    - `rd_ins_investedassetsqlty` is renamed to `rd_ins_investassetsqlty`.
    - `insurers_subcluster_award_criteria` is renamed to `insurers_subclusterawardcriteria`.
- **Write recipe outputs**: It writes the resulting DataFrame (`df`) back to a Dataiku dataset named `"shared_rds_pd_fi_insurers"` using the `dkuspark.write_with_schema` function, preserving the DataFrame's schema.
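The recipe source itself is not reproduced here, but a minimal sketch of what it likely looks like is shown below, assuming the standard Dataiku PySpark recipe template. The `SparkContext`/`SQLContext` setup lines and the Python variable names for the `dataiku.Dataset` handles are assumptions; the dataset names, `renaming_dict`, and the `dkuspark` calls come from the description above.

```python
# -*- coding: utf-8 -*-
import dataiku
from dataiku import spark as dkuspark
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Standard Dataiku PySpark recipe setup (assumed, not shown in the description)
sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)

# Read recipe inputs
rds_pd_fi_insurers = dataiku.Dataset("rds_pd_fi_insurers")
df = dkuspark.get_dataframe(sqlContext, rds_pd_fi_insurers)

# Compute recipe outputs from inputs:
# mapping of old column names -> new column names
renaming_dict = {
    "agg_parent_company_regulatory_code": "agg_parent_company_ctr_code",
    "rd_ins_life_investedassetsqlty": "rd_ins_life_investassetsqlty",
    "rd_ins_nonlife_investedassetsqlty": "rd_ins_nonlife_investassetsqlty",
    "rd_ins_investedassetsqlty": "rd_ins_investassetsqlty",
    "insurers_subcluster_award_criteria": "insurers_subclusterawardcriteria",
}

# Iterate over the old column names and rename each one in the DataFrame
for old_name in renaming_dict:
    df = df.withColumnRenamed(old_name, renaming_dict[old_name])

# Write recipe outputs
shared_rds_pd_fi_insurers = dataiku.Dataset("shared_rds_pd_fi_insurers")
dkuspark.write_with_schema(shared_rds_pd_fi_insurers, df)
```

Note that `withColumnRenamed` is a no-op for columns that are absent from the DataFrame, so the loop tolerates schema drift in the input dataset without failing.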
**Summary**
The code reads a Dataiku dataset into a PySpark DataFrame, renames specific columns using a predefined mapping, and then saves the updated DataFrame back into a new or existing Dataiku dataset. It's part of a Dataiku workflow for transforming and processing datasets.