This Python script contains three functions (`rank_corr_grid`, `anomly_cal`, and `averaged_rank_corr`) that perform data analysis tasks such as ranking, anomaly calculation, and correlation analysis, primarily involving data grouped by spatial grids and time. Here's the breakdown:
1. Function: rank_corr_grid(data, verifier, subject, name)
This function calculates rank correlation between two variables (`verifier` and `subject`) for specific geographical grid cells, considering only summer months. Here's what it does:
- Prepares data columns: The `Date` column is converted to a datetime object, and new columns for `Month` and `Year` are derived.
- Filters summer months: The function filters the data to only include entries for June, July, and August (`Month` values 6, 7, 8).
- Ranks data values: The function computes the rank of the `verifier` and `subject` values by grid (`PageName`) and month across years using the `rankdata` method.
- Computes correlation: For each grid cell (`PageName`), it calculates Pearson correlation between the ranks of `verifier` and `subject` using the helper function `grid_correlation`.
- Saves results: The results, including grid cell and correlation value (`R`), are saved to a CSV file with a naming convention based on the input arguments.
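
The steps above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the signature of `grid_correlation`, the `_rank` column suffix, and the toy column names `obs` and `model` are assumptions, and `rankdata` is assumed to be `scipy.stats.rankdata`. The sketch returns the result frame instead of writing a CSV, because the file naming convention is not shown in the description.

```python
import pandas as pd
from scipy.stats import rankdata


def grid_correlation(group, verifier, subject):
    """Pearson correlation between the two rank columns of one grid cell.

    Hypothetical signature; the original helper's arguments are not shown.
    """
    return group[f"{verifier}_rank"].corr(group[f"{subject}_rank"])


def rank_corr_grid_sketch(data, verifier, subject):
    df = data.copy()

    # Prepare date columns.
    df["Date"] = pd.to_datetime(df["Date"])
    df["Month"] = df["Date"].dt.month
    df["Year"] = df["Date"].dt.year

    # Keep only June, July, and August.
    df = df[df["Month"].isin([6, 7, 8])].copy()

    # Rank each variable across years within every (grid cell, month) group.
    for col in (verifier, subject):
        df[f"{col}_rank"] = (
            df.groupby(["PageName", "Month"])[col]
            .transform(lambda s: rankdata(s))
        )

    # One correlation value per grid cell.
    rows = [
        {"PageName": page, "R": grid_correlation(group, verifier, subject)}
        for page, group in df.groupby("PageName")
    ]
    return pd.DataFrame(rows)


# Toy input with made-up values: two grid cells, three summers, one winter row.
toy = pd.DataFrame({
    "Date": ["2001-06-15", "2002-06-15", "2003-06-15", "2003-01-15",
             "2001-06-15", "2002-06-15", "2003-06-15"],
    "PageName": ["A", "A", "A", "A", "B", "B", "B"],
    "obs": [1.0, 2.0, 3.0, 9.0, 1.0, 2.0, 3.0],
    "model": [10.0, 20.0, 30.0, 0.0, 30.0, 20.0, 10.0],
})
result = rank_corr_grid_sketch(toy, "obs", "model")
print(result)
```

In the toy data, the two variables rise together in grid cell `A` and move in opposite directions in grid cell `B`, and the January row is dropped by the summer filter before ranking.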
2. Function: anomly_cal(data, verifier, subject)
This function calculates anomalies for two variables (`verifier` and `subject`). Anomalies quantify the deviation of a value from its historical average. Here's what the function does:
- Prepares and cleans data: It converts the `Date` to datetime format, splits it into `Month` and `Year`, and drops missing values.
- Calculates monthly averages:
  - Computes the average values by year, month, and grid cell.
  - Computes the historical average values by month and grid cell.
- Calculates anomalies: For each data point, it calculates the deviation (anomaly) of `verifier` and `subject` from the historical monthly averages.
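
A rough sketch of this anomaly calculation is shown below. Several details are assumptions rather than facts about the original script: the historical average is taken here as the mean of the per-year monthly means, the anomaly is computed on those monthly means, missing values are dropped only in the two variable columns, and the `_anom` suffix and the toy column names `obs` and `model` are made up for illustration.

```python
import pandas as pd


def anomaly_sketch(data, verifier, subject):
    df = data.copy()

    # Prepare and clean the data.
    df["Date"] = pd.to_datetime(df["Date"])
    df["Month"] = df["Date"].dt.month
    df["Year"] = df["Date"].dt.year
    df = df.dropna(subset=[verifier, subject])

    # Average by year, month, and grid cell.
    monthly = df.groupby(
        ["Year", "Month", "PageName"], as_index=False
    )[[verifier, subject]].mean()

    # Historical average by month and grid cell, aligned row by row
    # with `monthly` (assumption: mean of the monthly means).
    historical = monthly.groupby(
        ["Month", "PageName"]
    )[[verifier, subject]].transform("mean")

    # Anomaly = monthly value minus the historical monthly average.
    for col in (verifier, subject):
        monthly[f"{col}_anom"] = monthly[col] - historical[col]
    return monthly


# Toy input with made-up values; the row with a missing value is dropped.
toy = pd.DataFrame({
    "Date": ["2001-06-10", "2001-06-20", "2002-06-15", "2002-06-20"],
    "PageName": ["A", "A", "A", "A"],
    "obs": [1.0, 3.0, 4.0, None],
    "model": [10.0, 30.0, 40.0, 50.0],
})
anomalies = anomaly_sketch(toy, "obs", "model")
print(anomalies)
```

If the original function instead subtracts the historical average from every raw row, the same `transform("mean")` pattern applies to the unaggregated frame.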
3. Function: averaged_rank_corr(data_t, verifier, subject)
This function computes the Spearman rank correlation between the anomalies of two variables (`verifier` and `subject`). Here's what it does:
- Aggregates anomalies: Calculates the monthly and yearly average anomalies for `verifier` and `subject`.
- Computes correlation: Uses the `spearmanr` function from SciPy to find the Spearman rank correlation between the average anomalies of the two variables.
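
As a minimal sketch of these two steps, the example below averages the anomalies over all grid cells for each (`Year`, `Month`) pair and passes the two averaged series to `scipy.stats.spearmanr`. The grouping keys are an assumption about what "monthly and yearly average" means here, and the anomaly column names `obs_anom` and `model_anom` are hypothetical.

```python
import pandas as pd
from scipy.stats import spearmanr


def averaged_rank_corr_sketch(data_t, verifier_anom, subject_anom):
    # Average the anomalies across grid cells for each (year, month).
    averaged = data_t.groupby(["Year", "Month"])[
        [verifier_anom, subject_anom]
    ].mean()

    # Spearman rank correlation between the two averaged series.
    rho, p_value = spearmanr(averaged[verifier_anom], averaged[subject_anom])
    return rho, p_value


# Toy anomalies with made-up values: two grid cells, three summer months.
toy = pd.DataFrame({
    "Year": [2001] * 6,
    "Month": [6, 6, 7, 7, 8, 8],
    "PageName": ["A", "B", "A", "B", "A", "B"],
    "obs_anom": [-1.0, -3.0, 0.5, -0.5, 1.0, 3.0],
    "model_anom": [-4.0, -6.0, 0.0, 2.0, 2.0, 4.0],
})
rho, p_value = averaged_rank_corr_sketch(toy, "obs_anom", "model_anom")
print(rho, p_value)
```

`spearmanr` returns both the correlation coefficient and a p-value; with only a handful of averaged points, as in this toy input, the p-value carries little weight.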
Summary of Workflow:
- Use `rank_corr_grid` for ranking and finding correlation over spatial grids in summer months.
- Use `anomly_cal` to generate anomaly data by comparing values to historical averages.
- Use `averaged_rank_corr` to compute the correlation between anomalies of two variables.
Each function serves a specific purpose in spatial-temporal analysis and provides intermediate or final correlation metrics.