User Defined Functions
Collation Functionalities
A Set of Function(s) to Collate/Aggregate based on Logic
Simple examples of aggregation are statistical measures such as the mean and the median. Additional aggregation functions, used less frequently but still popular, are defined here for end users.
- pandaswizard.functions.collate.weightedMA(initial: float, rate: float | Callable, length: int, decay: bool = True) → ndarray
Collate a Series based on Weighted Moving Average (WMA) Method
WMA is a variant of SMA/EMA that is popular in financial analysis. It gives more weight to recent data and can produce a smoother line, giving a more accurate picture of the underlying data trend.
- Parameters:
  - initial (float) – The initial weight of the value; typically, a value of 0.5 is a good starting point.
  - rate (float, callable) – The rate at which subsequent values increase or decrease. Typically, a value of 2 (i.e., at each subsequent level the impact is halved, a "half-life decay") is a good starting point. The rate can be either a numeric value, in which case each subsequent value is calculated as n_1 = n_0 / rate, or a callable, in which case each value is calculated as n_1 = rate(n_0), allowing more control and a dynamic approach.
  - length (int) – Length of the window. This enables a quick summarization of the final outcome using x * weightedMA(), where x is also an n-dimensional numpy array.
  - decay (bool) – When True (default), the returned array is reversed, i.e., it gives more priority to the recent data points (where x is sorted in ascending order); otherwise, it returns a "growth" array where more weight is given to older data.
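The weight series described by the parameters above can be sketched as follows. This is a minimal re-implementation under stated assumptions, not the library's own code: `weighted_ma_weights` is a hypothetical name, and it assumes the first weight equals `initial` and each following weight is derived from the previous one through `rate`, exactly as the parameter notes describe.

```python
from typing import Callable, Union

import numpy as np


def weighted_ma_weights(
    initial: float,
    rate: Union[float, Callable[[float], float]],
    length: int,
    decay: bool = True,
) -> np.ndarray:
    """Hypothetical sketch of the weight array described for weightedMA().

    Assumption: the first weight is `initial`, and each subsequent weight is
    n_1 = n_0 / rate (numeric rate) or n_1 = rate(n_0) (callable rate).
    """
    if length < 1:
        raise ValueError("length must be a positive integer")
    weights = [float(initial)]
    for _ in range(length - 1):
        previous = weights[-1]
        weights.append(rate(previous) if callable(rate) else previous / rate)
    result = np.asarray(weights, dtype=float)
    # decay=True: reverse so the largest weight lands on the last (most recent) point
    return result[::-1] if decay else result


# x is ordered oldest -> newest ("ascending" order)
x = np.array([10.0, 11.0, 12.0, 13.0])
w = weighted_ma_weights(0.5, 2, length=len(x))
print(w.tolist())     # [0.0625, 0.125, 0.25, 0.5]
print((x * w).sum())  # 11.5

# a callable rate gives a gentler, non-halving decay (hypothetical choice)
w_slow = weighted_ma_weights(0.5, lambda v: v * 0.9, length=len(x))
```

With initial = 0.5 and rate = 2, the weights sum to 1 - 0.5**length rather than 1, so (x * w).sum() is a weighted sum, not a normalized average. Dividing by w.sum() is one way to normalize it if an average is needed.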
Statistical Functionalities
A Set of Statistical Function(s) for pd.DataFrame Object
Statistics is the backbone of data analytics, and a collection of important, regularly used statistical methods is defined here. Check the function documentation for more information.
- pandaswizard.functions.statistics.quantileOutliers(array: ndarray) → ndarray
Outliers are extreme values that deviate from other observations in the data; they may indicate variability in a measurement, experimental errors, or a novelty. In other words, an outlier is an observation that diverges from the overall pattern of a sample.
Outliers can be of two kinds: (I) univariate, typically found by looking at the distribution of a single feature, and (II) multivariate, determined by looking at the distributions of the n-dimensional features.
A quick way to identify outliers in a univariate series is the IQR rule (as used in a box plot): any value in the range \([Q_1 - 1.5 \cdot \mathrm{IQR},\ Q_3 + 1.5 \cdot \mathrm{IQR}]\), where \(\mathrm{IQR} = Q_3 - Q_1\), is not an outlier.
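The IQR rule can be sketched with NumPy as below. `iqr_outlier_mask` is a hypothetical helper, not the library's quantileOutliers: it assumes NumPy's default linear interpolation for the quartiles (other quantile methods shift Q1 and Q3 slightly) and returns a boolean mask, which may differ from what quantileOutliers actually returns.

```python
import numpy as np


def iqr_outlier_mask(array: np.ndarray) -> np.ndarray:
    """Hypothetical helper: True where a value lies outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    values = np.asarray(array, dtype=float)
    q1, q3 = np.percentile(values, [25, 75])  # linear interpolation by default
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (values < lower) | (values > upper)


data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
mask = iqr_outlier_mask(data)
print(data[mask])  # [100]
```

Here Q1 = 3.25 and Q3 = 7.75, so the non-outlier range is [-3.5, 14.5] and only 100 falls outside it.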
- pandaswizard.functions.statistics.zscoreOutliers(array: ndarray, thresh: float = 2.0) → ndarray
Outliers are extreme values that deviate from other observations in the data; they may indicate variability in a measurement, experimental errors, or a novelty. In other words, an outlier is an observation that diverges from the overall pattern of a sample.
Outliers can be of two kinds: (I) univariate, typically found by looking at the distribution of a single feature, and (II) multivariate, determined by looking at the distributions of the n-dimensional features.
The Z-score is a statistical value that describes each data point by its relationship to the feature mean, measured in standard deviations: \(z = (x - \mu) / \sigma\).
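A Z-score filter can be sketched as follows. `zscore_outlier_mask` is a hypothetical helper, not the library's zscoreOutliers. It assumes that `thresh` is compared against the absolute Z-score (|z| > thresh flags an outlier) and that the population standard deviation (ddof=0) is used; the library may make different choices.

```python
import numpy as np


def zscore_outlier_mask(array: np.ndarray, thresh: float = 2.0) -> np.ndarray:
    """Hypothetical helper: True where |z| = |(x - mean) / std| exceeds thresh."""
    values = np.asarray(array, dtype=float)
    std = values.std()  # population standard deviation (ddof=0, NumPy's default)
    if std == 0:
        # all values identical: no point deviates from the mean
        return np.zeros(values.shape, dtype=bool)
    z = (values - values.mean()) / std
    return np.abs(z) > thresh


data = np.array([10, 10, 10, 10, 10, 10, 10, 10, 10, 50])
mask = zscore_outlier_mask(data)
print(data[mask])  # [50]
```

In this sample the mean is 14 and the standard deviation is 12, so the value 50 has z = 3 and exceeds the default threshold of 2, while each 10 has |z| of about 0.33.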