Package com.leumanuel.woozydata.service
Class DataStatisticsService
java.lang.Object
com.leumanuel.woozydata.service.DataStatisticsService
Service class providing statistical analysis and data transformation operations for DataFrames.
Includes methods for descriptive statistics, statistical tests, data reshaping, and aggregations.
- Version:
- 1.0
- Author:
- Leu A. Manuel
-
Constructor Summary
Constructors
- DataStatisticsService()
Method Summary
Modifier and Type, Method, Description
- apply(df, column, func): Applies a function to a column of the DataFrame.
- int[] binomialDist(int trials, double prob, int size): Generates binomial distribution samples.
- double calculateCovariance(DataFrame df, String col1, String col2): Calculates covariance between two numeric columns.
- calculateFrequency(DataFrame df, String column): Calculates frequency distribution of values in a column.
- double calculateIQR(DataFrame df, String column): Calculates Interquartile Range (IQR) of a numeric column.
- double calculateKurtosis(DataFrame df, String column): Calculates kurtosis of a numeric column.
- double calculateQuantile(DataFrame df, String column, double q): Calculates specified quantile of a numeric column.
- double calculateSkewness(DataFrame df, String column): Calculates skewness of a numeric column.
- describe(df): Generates comprehensive descriptive statistics for numeric columns.
- double[] getColumnValues(DataFrame df, String column): Gets numeric values from a specified column.
- groupBy(df, columns): Groups DataFrame by specified columns and calculates aggregate statistics.
- join(left, right, leftCols, rightCols): Joins two DataFrames based on matching column values.
- mannWhitney(DataFrame df, String col1, String col2): Performs Mann-Whitney U test between two columns.
- melt(df, idVars, valueVars): Unpivots DataFrame from wide to long format.
- pivot(df, index, columns, values, aggFunc): Creates a pivot table from the DataFrame data.
- sort(df, columns): Sorts DataFrame by specified columns in ascending order.
- sort(df, columns, ascending): Sorts DataFrame by specified columns.
- transform(df, column, func): Transforms values in a specified column using a provided function.
-
Constructor Details
-
DataStatisticsService
public DataStatisticsService()
-
-
Method Details
-
apply
Applies a function to a column of the DataFrame.
- Parameters:
df - DataFrame to analyze
column - Name of the column to apply function to
func - Function to apply to the column values
- Returns:
- New DataFrame containing the results of the function application
-
pivot
Creates a pivot table from the DataFrame data.
- Parameters:
df - Source DataFrame
index - Column to use as index
columns - Column to use for new columns
values - Column containing values to aggregate
aggFunc - Aggregation function to apply ("sum", "mean", "min", "max")
- Returns:
- New DataFrame containing the pivot table
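The reshaping that pivot describes can be sketched on plain Java collections. This is a minimal illustration of the "sum" aggregation only, not the library's implementation; `PivotSketch`, `Cell`, and `pivotSum` are hypothetical names, and a nested map stands in for the returned DataFrame.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class PivotSketch {
    // One long-format record: the index key, the column key, and the value to aggregate.
    static final class Cell {
        final String index;
        final String column;
        final double value;

        Cell(String index, String column, double value) {
            this.index = index;
            this.column = column;
            this.value = value;
        }
    }

    // Builds index -> (column -> sum of values). TreeMap keeps both key sets sorted.
    // Assumption: only the "sum" aggregation is shown here.
    static Map<String, Map<String, Double>> pivotSum(List<Cell> cells) {
        Map<String, Map<String, Double>> table = new TreeMap<>();
        for (Cell c : cells) {
            table.computeIfAbsent(c.index, k -> new TreeMap<>())
                 .merge(c.column, c.value, Double::sum);
        }
        return table;
    }
}
```

Index/column combinations that never occur in the input are simply absent from the inner map; how the library fills such cells is not stated on this page.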
-
transform
Transforms values in a specified column using a provided function.
- Parameters:
df - Source DataFrame
column - Column to transform
func - Function to apply to each value
- Returns:
- New DataFrame with transformed values
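A column transform of this kind can be sketched without the library by treating each row as a map. The code below is an assumption-laden illustration: `TransformSketch` is a hypothetical name, rows are `Map<String, Object>` rather than DataFrame rows, and the input is copied so the original rows stay unchanged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class TransformSketch {
    // Returns new rows in which `column` holds func(oldValue); input rows are not modified.
    static List<Map<String, Object>> transform(List<Map<String, Object>> rows,
                                               String column,
                                               Function<Object, Object> func) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            if (!row.containsKey(column)) {
                throw new IllegalArgumentException("Unknown column: " + column);
            }
            Map<String, Object> copy = new LinkedHashMap<>(row);
            copy.put(column, func.apply(row.get(column)));
            out.add(copy);
        }
        return out;
    }
}
```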
-
join
Joins two DataFrames based on matching column values.
- Parameters:
left - Left DataFrame
right - Right DataFrame
leftCols - Columns from left DataFrame to join on
rightCols - Columns from right DataFrame to join on
- Returns:
- New DataFrame containing joined data
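Matching rows on column values is commonly done with a hash join. The sketch below assumes inner-join semantics and a single key column per side, neither of which this page confirms for join; `JoinSketch` and `innerJoin` are hypothetical names, and rows are modelled as maps.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class JoinSketch {
    // Inner hash join on one key column per side (an assumption; the documented
    // method accepts several columns and does not state the join type).
    static List<Map<String, Object>> innerJoin(List<Map<String, Object>> left,
                                               List<Map<String, Object>> right,
                                               String leftCol, String rightCol) {
        // Index the right-hand rows by their join key.
        Map<Object, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> r : right) {
            index.computeIfAbsent(r.get(rightCol), k -> new ArrayList<>()).add(r);
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> l : left) {
            List<Map<String, Object>> matches = index.get(l.get(leftCol));
            if (matches == null) {
                continue; // no partner row: dropped under inner-join semantics
            }
            for (Map<String, Object> r : matches) {
                Map<String, Object> merged = new LinkedHashMap<>(l);
                // On a column-name clash the left value is kept.
                for (Map.Entry<String, Object> e : r.entrySet()) {
                    merged.putIfAbsent(e.getKey(), e.getValue());
                }
                out.add(merged);
            }
        }
        return out;
    }
}
```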
-
melt
Unpivots DataFrame from wide to long format.
- Parameters:
df - Source DataFrame
idVars - Columns to keep as identifiers
valueVars - Columns to unpivot into rows
- Returns:
- Melted DataFrame in long format
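The wide-to-long reshaping can be illustrated on map-based rows: each input row yields one output row per value column. The output column names "variable" and "value" follow a common convention and are an assumption here, as are the names `MeltSketch` and `melt` in this form.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MeltSketch {
    // Emits one long-format row per (input row, value column) pair.
    // Assumption: the new columns are called "variable" and "value".
    static List<Map<String, Object>> melt(List<Map<String, Object>> rows,
                                          List<String> idVars,
                                          List<String> valueVars) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            for (String var : valueVars) {
                Map<String, Object> longRow = new LinkedHashMap<>();
                for (String id : idVars) {
                    longRow.put(id, row.get(id)); // identifier columns are repeated
                }
                longRow.put("variable", var);
                longRow.put("value", row.get(var));
                out.add(longRow);
            }
        }
        return out;
    }
}
```

A table with n rows and k value columns therefore melts into n * k rows.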
-
sort
Sorts DataFrame by specified columns.
- Parameters:
df - Source DataFrame
columns - Columns to sort by
ascending - Array indicating sort direction for each column
- Returns:
- Sorted DataFrame
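A per-column sort direction maps naturally onto a chained `Comparator`. The sketch below is one possible way to do it on map-based rows, assuming non-null, mutually comparable values in each sort column; `SortSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

final class SortSketch {
    // Sorts a copy of the rows; ascending[i] gives the direction for columns[i].
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Map<String, Object>> sort(List<Map<String, Object>> rows,
                                          String[] columns,
                                          boolean[] ascending) {
        if (columns.length != ascending.length) {
            throw new IllegalArgumentException("columns and ascending must have equal length");
        }
        Comparator<Map<String, Object>> cmp = (a, b) -> 0; // neutral starting point
        for (int i = 0; i < columns.length; i++) {
            final String col = columns[i];
            Comparator<Map<String, Object>> byCol =
                (a, b) -> ((Comparable) a.get(col)).compareTo(b.get(col));
            // Later columns only break ties left by earlier ones.
            cmp = cmp.thenComparing(ascending[i] ? byCol : byCol.reversed());
        }
        List<Map<String, Object>> out = new ArrayList<>(rows);
        out.sort(cmp); // List.sort is stable
        return out;
    }
}
```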
-
describe
Generates comprehensive descriptive statistics for numeric columns.
- Parameters:
df - Source DataFrame
- Returns:
- Map of column names to their statistical measures
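The page does not list which measures describe reports. As an illustration only, the sketch below computes a typical set (count, mean, sample standard deviation, min, median, max) for a single numeric column; `DescribeSketch` and the map keys are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

final class DescribeSketch {
    // Summary statistics for one numeric column.
    // Assumption: the chosen measures and key names are illustrative.
    static Map<String, Double> describe(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty column");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int n = sorted.length;

        double sum = 0.0;
        for (double v : sorted) {
            sum += v;
        }
        double mean = sum / n;

        double squaredDev = 0.0;
        for (double v : sorted) {
            squaredDev += (v - mean) * (v - mean);
        }
        // Sample standard deviation (n - 1 denominator); undefined for a single value.
        double std = n > 1 ? Math.sqrt(squaredDev / (n - 1)) : Double.NaN;

        double median = (n % 2 == 1)
            ? sorted[n / 2]
            : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;

        Map<String, Double> stats = new LinkedHashMap<>();
        stats.put("count", (double) n);
        stats.put("mean", mean);
        stats.put("std", std);
        stats.put("min", sorted[0]);
        stats.put("median", median);
        stats.put("max", sorted[n - 1]);
        return stats;
    }
}
```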
-
calculateSkewness
Calculates skewness of a numeric column.
- Parameters:
df - Source DataFrame
column - Column to analyze
- Returns:
- Skewness value
- Throws:
IllegalArgumentException - if column is not numeric
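Several skewness estimators exist, and this page does not say which one calculateSkewness uses. The sketch below shows the moment-based (population) form, g1 = m3 / m2^1.5, on a plain `double[]`; `SkewnessSketch` is a hypothetical name.

```java
final class SkewnessSketch {
    // Moment-based skewness: third central moment divided by variance^1.5.
    // Assumption: population moments (divide by n), no bias correction.
    static double skewness(double[] values) {
        int n = values.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty column");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;

        double m2 = 0.0;
        double m3 = 0.0;
        for (double v : values) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= n;
        m3 /= n;
        return m3 / Math.pow(m2, 1.5); // NaN when all values are equal
    }
}
```

A symmetric sample gives 0, while a long right tail gives a positive value.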
-
calculateKurtosis
Calculates kurtosis of a numeric column.
- Parameters:
df - Source DataFrame
column - Column to analyze
- Returns:
- Kurtosis value
- Throws:
IllegalArgumentException - if column is not numeric
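Kurtosis is reported either as the raw ratio m4 / m2^2 or as excess kurtosis (the same ratio minus 3); which convention calculateKurtosis follows is not stated here. The sketch below computes the excess form from population moments; `KurtosisSketch` is a hypothetical name.

```java
final class KurtosisSketch {
    // Excess kurtosis: fourth central moment over squared variance, minus 3.
    // Assumption: population moments (divide by n), no bias correction.
    static double excessKurtosis(double[] values) {
        int n = values.length;
        if (n == 0) {
            throw new IllegalArgumentException("empty column");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;

        double m2 = 0.0;
        double m4 = 0.0;
        for (double v : values) {
            double d2 = (v - mean) * (v - mean);
            m2 += d2;
            m4 += d2 * d2;
        }
        m2 /= n;
        m4 /= n;
        return m4 / (m2 * m2) - 3.0; // NaN when all values are equal
    }
}
```

For the values 1, 2, 3, 4, 5 this gives m2 = 2 and m4 = 6.8, so the excess kurtosis is 6.8 / 4 - 3 = -1.3.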
-
calculateCovariance
Calculates covariance between two numeric columns.
- Parameters:
df - Source DataFrame
col1 - First column name
col2 - Second column name
- Returns:
- Covariance value
- Throws:
IllegalArgumentException - if either column is not numeric
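For reference, the sample covariance of two equally long columns is the sum of the products of their deviations from the mean, divided by n - 1. The sketch below assumes that sample form (the page does not say whether the method divides by n or n - 1); `CovarianceSketch` is a hypothetical name.

```java
final class CovarianceSketch {
    // Sample covariance with an n - 1 denominator (an assumption about the convention).
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns must have equal length");
        }
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two observations");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }
}
```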
-
calculateQuantile
Calculates specified quantile of a numeric column.
- Parameters:
df - Source DataFrame
column - Column to analyze
q - Quantile value (0 to 1)
- Returns:
- Quantile value
- Throws:
IllegalArgumentException - if column is not numeric or q is invalid
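Quantiles can be defined in several ways; the interpolation rule used by calculateQuantile is not given on this page. The sketch below uses linear interpolation between the two nearest order statistics at position q * (n - 1), a common default, and rejects q outside [0, 1]. `QuantileSketch` is a hypothetical name.

```java
import java.util.Arrays;

final class QuantileSketch {
    // Linear-interpolation quantile (an assumption about the method's definition).
    static double quantile(double[] values, double q) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty column");
        }
        if (Double.isNaN(q) || q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1); // fractional index into the sorted data
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        double frac = pos - lower;
        return sorted[lower] + frac * (sorted[upper] - sorted[lower]);
    }

    // Interquartile range: distance between the 0.75 and 0.25 quantiles.
    static double iqr(double[] values) {
        return quantile(values, 0.75) - quantile(values, 0.25);
    }
}
```

With the values 1, 2, 3, 4 this rule gives a median of 2.5, quartiles of 1.75 and 3.25, and an IQR of 1.5.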
-
calculateIQR
Calculates Interquartile Range (IQR) of a numeric column.
- Parameters:
df - Source DataFrame
column - Column to analyze
- Returns:
- IQR value
- Throws:
IllegalArgumentException - if column is not numeric
-
calculateFrequency
Calculates frequency distribution of values in a column.
- Parameters:
df - Source DataFrame
column - Column to analyze
- Returns:
- Map of values to their frequencies
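Counting occurrences per distinct value is a one-pass operation. The sketch below shows the idea on a plain list; `FrequencySketch` is a hypothetical name, and using absolute counts (rather than relative frequencies) is an assumption about what the returned map holds.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FrequencySketch {
    // Maps each distinct value to the number of times it occurs,
    // in order of first appearance.
    static <T> Map<T, Long> frequency(List<T> values) {
        Map<T, Long> counts = new LinkedHashMap<>();
        for (T v : values) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }
}
```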
-
getColumnValues
Gets numeric values from a specified column.
- Parameters:
df - Source DataFrame
column - Column name
- Returns:
- Array of numeric values
- Throws:
IllegalArgumentException - if column is not numeric
-
binomialDist
public int[] binomialDist(int trials, double prob, int size)
Generates binomial distribution samples.
- Parameters:
trials - Number of trials
prob - Success probability
size - Number of samples to generate
- Returns:
- Array of binomial distribution samples
- Throws:
IllegalArgumentException - if parameters are invalid
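One straightforward way to draw binomial samples is to count successes over repeated Bernoulli trials. The sketch below takes that approach; it is not necessarily how binomialDist is implemented. The extra `Random` parameter is an assumption added so results can be reproduced with a seed, and `BinomialSketch` is a hypothetical name.

```java
import java.util.Random;

final class BinomialSketch {
    // Draws `size` samples, each the number of successes in `trials` Bernoulli(prob) trials.
    // Assumption: the Random argument is not part of the documented signature.
    static int[] binomialDist(int trials, double prob, int size, Random rng) {
        if (trials < 0 || size < 0 || Double.isNaN(prob) || prob < 0.0 || prob > 1.0) {
            throw new IllegalArgumentException("invalid trials, prob, or size");
        }
        int[] samples = new int[size];
        for (int i = 0; i < size; i++) {
            int successes = 0;
            for (int t = 0; t < trials; t++) {
                // nextDouble() is uniform on [0, 1), so this succeeds with probability prob.
                if (rng.nextDouble() < prob) {
                    successes++;
                }
            }
            samples[i] = successes;
        }
        return samples;
    }
}
```

This costs O(trials * size) random draws, which is fine for small inputs but slow for very large trial counts.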
-
mannWhitney
Performs Mann-Whitney U test between two columns.
- Parameters:
df - Source DataFrame
col1 - First column name
col2 - Second column name
- Returns:
- Map containing test statistic and p-value
- Throws:
IllegalArgumentException - if either column is not numeric
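The test statistic is derived from ranks in the pooled sample. The sketch below computes only the U statistic, using average ranks for ties and reporting min(U1, U2); the p-value step is omitted, and whether mannWhitney reports U1 or the minimum is not stated on this page. `MannWhitneySketch` is a hypothetical name.

```java
import java.util.Arrays;

final class MannWhitneySketch {
    // Returns U = min(U1, U2), where U1 = R1 - n1 (n1 + 1) / 2 and R1 is the
    // rank sum of x in the pooled sample. Assumption: min(U1, U2) is reported.
    static double uStatistic(double[] x, double[] y) {
        int n1 = x.length;
        int n2 = y.length;
        if (n1 == 0 || n2 == 0) {
            throw new IllegalArgumentException("both samples must be non-empty");
        }
        double[] pooled = new double[n1 + n2];
        System.arraycopy(x, 0, pooled, 0, n1);
        System.arraycopy(y, 0, pooled, n1, n2);
        Arrays.sort(pooled);

        double rankSumX = 0.0;
        for (double v : x) {
            rankSumX += averageRank(pooled, v);
        }
        double u1 = rankSumX - n1 * (n1 + 1) / 2.0;
        double u2 = (double) n1 * n2 - u1;
        return Math.min(u1, u2);
    }

    // Average 1-based rank of v in the sorted pooled sample; tied values share a rank.
    private static double averageRank(double[] sorted, double v) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < sorted.length; i++) {
            if (sorted[i] == v) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        return (first + last) / 2.0 + 1.0;
    }
}
```

The linear rank lookup keeps the sketch short at O(n^2) cost; a production version would assign ranks in a single pass over the sorted data.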
-
groupBy
Groups DataFrame by specified columns and calculates aggregate statistics.
- Parameters:
df - Source DataFrame
columns - Columns to group by
- Returns:
- DataFrame containing grouped statistics
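Which aggregate statistics groupBy computes is not listed here. As a rough illustration, the sketch below groups map-based rows by one key column and collects count, sum, min, max, and average of one numeric column via `DoubleSummaryStatistics`; `GroupBySketch`, the single key column, and the explicit value column are all assumptions.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class GroupBySketch {
    // Groups rows by keyCol and summarizes the numeric valueCol within each group.
    // Assumption: one grouping column and one aggregated column.
    static Map<String, DoubleSummaryStatistics> groupStats(List<Map<String, Object>> rows,
                                                           String keyCol,
                                                           String valueCol) {
        Map<String, DoubleSummaryStatistics> groups = new TreeMap<>();
        for (Map<String, Object> row : rows) {
            String key = String.valueOf(row.get(keyCol));
            double value = ((Number) row.get(valueCol)).doubleValue();
            groups.computeIfAbsent(key, k -> new DoubleSummaryStatistics()).accept(value);
        }
        return groups;
    }
}
```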
-
sort
Sorts DataFrame by specified columns in ascending order.
- Parameters:
df - Source DataFrame
columns - Columns to sort by
- Returns:
- Sorted DataFrame
-