User Defined Functions

Collation Functionalities

A Set of Function(s) to Collate/Aggregate based on Logic

Simple examples of aggregation are statistical measures such as the mean and the median. Additional aggregation functions that are used less frequently, but are still popular, are defined here for end users.

pandaswizard.functions.collate.weightedMA(initial: float, rate: float | Callable, length: int, decay: bool = True) → ndarray

Collate a Series based on Weighted Moving Average (WMA) Method

WMA is a variant of the simple and exponential moving averages (SMA/EMA) and is popular in financial analysis. It gives more weight to recent data and can, in some cases, produce a smoother line that gives a more accurate picture of the underlying data trend.

Parameters:

initial (float) – The initial weight of the value. A value of 0.5 is typically a good starting point.
rate (float, callable) – The rate at which subsequent values increase or decrease. A value of 2 (i.e., the impact is halved at each subsequent level, “half-life decay”) is typically a good starting point. The rate can either be a numeric value, in which case each subsequent value is calculated as n_1 = n_0 / rate, or a callable, in which case each value is calculated as n_1 = rate(n_0), which allows more control and a dynamic approach.
length (int) – Length of the window. This enables a quick summarization of the final outcome using x * weightedMA(), where x is also an n-dimensional NumPy array.
decay (bool) – When true (default), the returned array is reversed, i.e., it gives more priority to the recent data points (where x is sorted in ascending order). Otherwise, it returns a “growth” array in which more weight is given to older data.
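
The parameter descriptions above can be illustrated with a minimal sketch. This is not the library's implementation: the function name weighted_ma_sketch is hypothetical, and the behavior is only inferred from the parameter descriptions (start at initial, derive each next weight as n_1 = n_0 / rate or n_1 = rate(n_0), and reverse the result when decay is true).

```python
from typing import Callable, Union

import numpy as np


def weighted_ma_sketch(
    initial: float,
    rate: Union[float, Callable[[float], float]],
    length: int,
    decay: bool = True,
) -> np.ndarray:
    """Hypothetical re-creation of the documented weighting scheme."""
    if length < 1:
        raise ValueError("length must be a positive integer")

    weights = [float(initial)]
    for _ in range(length - 1):
        previous = weights[-1]
        # A callable rate maps the previous weight to the next one;
        # a numeric rate divides the previous weight.
        weights.append(rate(previous) if callable(rate) else previous / rate)

    result = np.array(weights)
    # Assumption: with decay=True the largest weight ends up last, so the
    # most recent observation of an ascending-sorted series weighs most.
    return result[::-1] if decay else result


weights = weighted_ma_sketch(initial=0.5, rate=2, length=4)
x = np.array([10.0, 11.0, 12.0, 13.0])  # oldest to newest
summary = float(np.sum(x * weights))
```

Note that the weights in this sketch do not sum to one; dividing the summary by weights.sum() gives a normalized weighted average if that is what is needed.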

Statistical Functionalities

A Set of Statistical Function(s) for pd.DataFrame Object

Statistics is the backbone of data analytics, and a collection of important, regularly used statistical methods is defined here. Check each function's documentation for more information.

pandaswizard.functions.statistics.quantileOutliers(array: ndarray) → ndarray

Outliers are extreme values that deviate from the other observations in the data. They may indicate variability in a measurement, experimental errors, or a novelty. In other words, an outlier is an observation that diverges from the overall pattern of a sample.

Outliers can be of two kinds: (I) univariate - typically found by looking at the distribution of a single feature, and (II) multivariate - determined by looking at the distributions of the n-dimensional features.

A quick way to identify outliers in a univariate series is to use the interquartile range, IQR = Q3 - Q1 (as in a box plot): any value in the range \([(Q1 - 1.5 * IQR), (Q3 + 1.5 * IQR)]\) is not an outlier.
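
The IQR rule can be sketched in a few lines of NumPy. The helper name iqr_outlier_mask is hypothetical, and returning a boolean mask is an assumption; the documentation above does not say whether quantileOutliers returns a mask, the outlying values, or something else.

```python
import numpy as np


def iqr_outlier_mask(array: np.ndarray) -> np.ndarray:
    """Hypothetical helper: True where a value lies outside the IQR fences."""
    values = np.asarray(array, dtype=float)

    # First and third quartiles (NumPy's default linear interpolation).
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1

    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr

    # Values inside [lower, upper] are not outliers.
    return (values < lower) | (values > upper)


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
mask = iqr_outlier_mask(data)
outliers = data[mask]
```

In this example only the value 100 falls outside the fences.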

pandaswizard.functions.statistics.zscoreOutliers(array: ndarray, thresh: float = 2.0) → ndarray

Outliers are extreme values that deviate from the other observations in the data. They may indicate variability in a measurement, experimental errors, or a novelty. In other words, an outlier is an observation that diverges from the overall pattern of a sample.

Outliers can be of two kinds: (I) univariate - typically found by looking at the distribution of a single feature, and (II) multivariate - determined by looking at the distributions of the n-dimensional features.

The Z-score is a statistical value that describes how far a data point lies from the feature mean, measured in standard deviations.
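
A minimal sketch of Z-score based detection follows. The helper name zscore_outlier_mask is hypothetical. Treating thresh as a cut-off on the absolute Z-score, using the population standard deviation, and returning a boolean mask are all assumptions, since the documentation above does not describe these details.

```python
import numpy as np


def zscore_outlier_mask(array: np.ndarray, thresh: float = 2.0) -> np.ndarray:
    """Hypothetical helper: True where |z| exceeds the threshold."""
    values = np.asarray(array, dtype=float)

    std = values.std()  # population standard deviation (ddof=0)
    if std == 0:
        # A constant series has no spread, so nothing can be flagged.
        return np.zeros(values.shape, dtype=bool)

    # z = (x - mean) / std, the distance from the mean in standard deviations.
    z_scores = (values - values.mean()) / std
    return np.abs(z_scores) > thresh


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
mask = zscore_outlier_mask(data, thresh=2.0)
outliers = data[mask]
```

Because the mean and standard deviation are themselves pulled by extreme values, this approach tends to be less robust than the IQR rule on small samples.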