java.lang.Object

com.leumanuel.woozydata.service.DataStatisticsService

public class DataStatisticsService extends Object

Service class providing statistical analysis and data transformation operations for DataFrames. Includes methods for descriptive statistics, statistical tests, data reshaping, and aggregations.

Version: 1.0
Author: Leu A. Manuel

Constructor Summary

| Constructor | Description |
| --- | --- |
| DataStatisticsService() | |

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| DataFrame | apply(DataFrame df, String column, Function<List<Object>,Object> func) | Applies a function to a column of the DataFrame. |
| int[] | binomialDist(int trials, double prob, int size) | Generates binomial distribution samples. |
| double | calculateCovariance(DataFrame df, String col1, String col2) | Calculates covariance between two numeric columns. |
| Map<Object,Long> | calculateFrequency(DataFrame df, String column) | Calculates frequency distribution of values in a column. |
| double | calculateIQR(DataFrame df, String column) | Calculates Interquartile Range (IQR) of a numeric column. |
| double | calculateKurtosis(DataFrame df, String column) | Calculates kurtosis of a numeric column. |
| double | calculateQuantile(DataFrame df, String column, double q) | Calculates specified quantile of a numeric column. |
| double | calculateSkewness(DataFrame df, String column) | Calculates skewness of a numeric column. |
| Map<String,Map<String,Double>> | describe(DataFrame df) | Generates comprehensive descriptive statistics for numeric columns. |
| double[] | getColumnValues(DataFrame df, String column) | Gets numeric values from a specified column. |
| DataFrame | groupBy(DataFrame df, String... columns) | Groups DataFrame by specified columns and calculates aggregate statistics. |
| DataFrame | join(DataFrame left, DataFrame right, String[] leftCols, String[] rightCols) | Joins two DataFrames based on matching column values. |
| Map<String,Double> | mannWhitney(DataFrame df, String col1, String col2) | Performs Mann-Whitney U test between two columns. |
| DataFrame | melt(DataFrame df, String[] idVars, String[] valueVars) | Unpivots DataFrame from wide to long format. |
| DataFrame | pivot(DataFrame df, String index, String columns, String values, String aggFunc) | Creates a pivot table from the DataFrame data. |
| DataFrame | sort(DataFrame df, String... columns) | Sorts DataFrame by specified columns in ascending order. |
| DataFrame | sort(DataFrame df, String[] columns, boolean[] ascending) | Sorts DataFrame by specified columns. |
| DataFrame | transform(DataFrame df, String column, Function<Object,Object> func) | Transforms values in a specified column using a provided function. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- DataStatisticsService
  
  public DataStatisticsService()
Method Details
- apply
  
  public DataFrame apply(DataFrame df, String column, Function<List<Object>,Object> func)
  
  Applies a function to a column of the DataFrame.
  
  Parameters:
  
  df - DataFrame to analyze
  
  column - Name of the column to apply function to
  
  func - Function to apply to the column values
  
  Returns:
  
  New DataFrame containing the results of the function application
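
Going by the signature, `func` receives the whole column as one `List<Object>` and returns a single `Object`. The following is a minimal sketch of such a function, not part of the library: the class name `ColumnFunctionSketch`, the constant `MEAN`, and the choice to skip non-numeric entries are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ColumnFunctionSketch {
    // Reduces a whole column to its mean. Skipping non-numeric entries is an
    // assumption of this sketch, not documented library behavior.
    public static final Function<List<Object>, Object> MEAN = values -> {
        double sum = 0.0;
        int count = 0;
        for (Object v : values) {
            if (v instanceof Number) {
                sum += ((Number) v).doubleValue();
                count++;
            }
        }
        return count == 0 ? Double.NaN : sum / count;
    };

    public static void main(String[] args) {
        List<Object> column = Arrays.asList(2, 4.0, "n/a", 6L);
        System.out.println(MEAN.apply(column)); // prints 4.0
    }
}
```

A function of this shape would be passed as the third argument, for example `apply(df, "price", ColumnFunctionSketch.MEAN)`, where `"price"` is a hypothetical column name.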
- pivot
  
  public DataFrame pivot(DataFrame df, String index, String columns, String values, String aggFunc)
  
  Creates a pivot table from the DataFrame data.
  
  Parameters:
  
  df - Source DataFrame
  
  index - Column to use as index
  
  columns - Column to use for new columns
  
  values - Column containing values to aggregate
  
  aggFunc - Aggregation function to apply ("sum", "mean", "min", "max")
  
  Returns:
  
  New DataFrame containing the pivot table
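
To make the reshaping concrete, here is a standalone sketch of a pivot with the `"sum"` aggregation. It does not use the library's DataFrame: rows are modeled as `{index, column, value}` triples, and the class and method names (`PivotSketch`, `pivotSum`) are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PivotSketch {
    // Each input row is {indexKey, columnKey, value}. Values that share the
    // same index and column keys are summed into one cell.
    public static Map<String, Map<String, Double>> pivotSum(Object[][] rows) {
        Map<String, Map<String, Double>> table = new LinkedHashMap<>();
        for (Object[] row : rows) {
            String indexKey = (String) row[0];
            String columnKey = (String) row[1];
            double value = ((Number) row[2]).doubleValue();
            table.computeIfAbsent(indexKey, k -> new LinkedHashMap<>())
                 .merge(columnKey, value, Double::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        Object[][] sales = {
            {"north", "Q1", 10.0},
            {"north", "Q2", 5.0},
            {"south", "Q1", 7.0},
            {"north", "Q1", 2.5},
        };
        // prints {north={Q1=12.5, Q2=5.0}, south={Q1=7.0}}
        System.out.println(pivotSum(sales));
    }
}
```

The outer keys play the role of `index`, the inner keys the role of `columns`, and the summed numbers the role of `values`.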
- transform
  
  public DataFrame transform(DataFrame df, String column, Function<Object,Object> func)
  
  Transforms values in a specified column using a provided function.
  
  Parameters:
  
  df - Source DataFrame
  
  column - Column to transform
  
  func - Function to apply to each value
  
  Returns:
  
  New DataFrame with transformed values
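
Unlike `apply`, the `func` here has type `Function<Object,Object>`, so it sees one value at a time. A minimal sketch of such an element-wise function follows; `ValueFunctionSketch`, `NORMALIZE`, and the `mapColumn` stand-in are hypothetical names, and the list-based helper only imitates what happens to a single column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class ValueFunctionSketch {
    // Element-wise function: trims and upper-cases strings, passes every
    // other value (including null) through unchanged.
    public static final Function<Object, Object> NORMALIZE =
            v -> (v instanceof String) ? ((String) v).trim().toUpperCase(Locale.ROOT) : v;

    // Stand-in for a column transform: maps each value independently and
    // returns a new list, leaving the input untouched.
    public static List<Object> mapColumn(List<Object> column, Function<Object, Object> func) {
        List<Object> out = new ArrayList<>(column.size());
        for (Object v : column) {
            out.add(func.apply(v));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> column = Arrays.asList(" ab ", 3, null);
        System.out.println(mapColumn(column, NORMALIZE)); // prints [AB, 3, null]
    }
}
```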
- join
  
  public DataFrame join(DataFrame left, DataFrame right, String[] leftCols, String[] rightCols)
  
  Joins two DataFrames based on matching column values.
  
  Parameters:
  
  left - Left DataFrame
  
  right - Right DataFrame
  
  leftCols - Columns from left DataFrame to join on
  
  rightCols - Columns from right DataFrame to join on
  
  Returns:
  
  New DataFrame containing joined data
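
The documentation does not say which join type is used. As an illustration only, the sketch below shows an inner hash join on composite keys, with rows modeled as `Map<String,Object>`. The names `HashJoinSketch`, `innerJoin`, and `row`, and the rule that left values win on a column-name clash, are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // Convenience builder: row("id", 1, "name", "a") gives {id=1, name=a}.
    public static Map<String, Object> row(Object... keyValues) {
        Map<String, Object> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            r.put((String) keyValues[i], keyValues[i + 1]);
        }
        return r;
    }

    // Composite key: the row's values for the listed columns, in order.
    static List<Object> key(Map<String, Object> r, String[] cols) {
        List<Object> k = new ArrayList<>();
        for (String c : cols) {
            k.add(r.get(c));
        }
        return k;
    }

    // Inner join: emits one merged row per left/right pair with equal keys.
    public static List<Map<String, Object>> innerJoin(
            List<Map<String, Object>> left, List<Map<String, Object>> right,
            String[] leftCols, String[] rightCols) {
        Map<List<Object>, List<Map<String, Object>>> index = new HashMap<>();
        for (Map<String, Object> r : right) {
            index.computeIfAbsent(key(r, rightCols), k -> new ArrayList<>()).add(r);
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> l : left) {
            List<Map<String, Object>> matches =
                    index.getOrDefault(key(l, leftCols), Collections.emptyList());
            for (Map<String, Object> r : matches) {
                Map<String, Object> merged = new LinkedHashMap<>(l);
                r.forEach(merged::putIfAbsent); // left value wins on a name clash
                out.add(merged);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> users = new ArrayList<>();
        users.add(row("id", 1, "name", "a"));
        users.add(row("id", 2, "name", "b"));
        List<Map<String, Object>> scores = new ArrayList<>();
        scores.add(row("user_id", 1, "score", 9));
        System.out.println(innerJoin(users, scores,
                new String[] {"id"}, new String[] {"user_id"}));
    }
}
```

Here `leftCols` and `rightCols` are paired by position, so `leftCols[i]` is matched against `rightCols[i]`.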
- melt
  
  public DataFrame melt(DataFrame df, String[] idVars, String[] valueVars)
  
  Unpivots DataFrame from wide to long format.
  
  Parameters:
  
  df - Source DataFrame
  
  idVars - Columns to keep as identifiers
  
  valueVars - Columns to unpivot into rows
  
  Returns:
  
  Melted DataFrame in long format
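
The wide-to-long reshaping can be sketched without the library as follows. Rows are modeled as `Map<String,Object>`; the output column names `variable` and `value` follow a common convention and are an assumption here, as are the names `MeltSketch` and `melt`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MeltSketch {
    // For every input row, emits one output row per value column. Identifier
    // columns are copied; the value column's name and content go into the
    // (assumed) "variable" and "value" columns.
    public static List<Map<String, Object>> melt(
            List<Map<String, Object>> rows, String[] idVars, String[] valueVars) {
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            for (String valueVar : valueVars) {
                Map<String, Object> longRow = new LinkedHashMap<>();
                for (String id : idVars) {
                    longRow.put(id, row.get(id));
                }
                longRow.put("variable", valueVar);
                longRow.put("value", row.get(valueVar));
                out.add(longRow);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> wide = new LinkedHashMap<>();
        wide.put("name", "Ana");
        wide.put("math", 90);
        wide.put("art", 85);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(wide);
        // One wide row with two value columns becomes two long rows.
        System.out.println(melt(rows, new String[] {"name"}, new String[] {"math", "art"}));
    }
}
```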
- sort
  
  public DataFrame sort(DataFrame df, String[] columns, boolean[] ascending)
  
  Sorts DataFrame by specified columns.
  
  Parameters:
  
  df - Source DataFrame
  
  columns - Columns to sort by
  
  ascending - Array indicating the sort direction for each entry in columns, in the same order (presumably true for ascending, false for descending)
  
  Returns:
  
  Sorted DataFrame
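
A multi-column sort with per-column direction can be expressed by chaining comparators. The sketch below is standalone and makes several assumptions: rows are `Map<String,Object>`, cell values are non-null and mutually `Comparable`, and `true` means ascending. `MultiColumnSortSketch` and `byColumn` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class MultiColumnSortSketch {
    // Compares two rows on a single column, optionally in reverse order.
    @SuppressWarnings("unchecked")
    static Comparator<Map<String, Object>> byColumn(String column, boolean ascending) {
        Comparator<Map<String, Object>> cmp =
                (a, b) -> ((Comparable<Object>) a.get(column)).compareTo(b.get(column));
        return ascending ? cmp : cmp.reversed();
    }

    // Earlier columns take priority; later columns only break ties.
    public static List<Map<String, Object>> sort(
            List<Map<String, Object>> rows, String[] columns, boolean[] ascending) {
        if (columns.length == 0 || columns.length != ascending.length) {
            throw new IllegalArgumentException("columns and ascending must match in length");
        }
        Comparator<Map<String, Object>> order = byColumn(columns[0], ascending[0]);
        for (int i = 1; i < columns.length; i++) {
            order = order.thenComparing(byColumn(columns[i], ascending[i]));
        }
        List<Map<String, Object>> sorted = new ArrayList<>(rows); // input stays untouched
        sorted.sort(order);
        return sorted;
    }
}
```

Because `List.sort` is stable, rows that tie on every listed column keep their original relative order.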
- describe
  
  public Map<String,Map<String,Double>> describe(DataFrame df)
  
  Generates comprehensive descriptive statistics for numeric columns.
  
  Parameters:
  
  df - Source DataFrame
  
  Returns:
  
  Map of column names to their statistical measures
- calculateSkewness
  
  public double calculateSkewness(DataFrame df, String column)
  
  Calculates skewness of a numeric column.
  
  Parameters:
  
  df - Source DataFrame
  
  column - Column to analyze
  
  Returns:
  
  Skewness value
  
  Throws:
  
  IllegalArgumentException - if column is not numeric
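
The documentation does not state which skewness estimator is used. As a reference point, the sketch below computes the population (moment) skewness, the third central moment divided by the 1.5 power of the second; the library may apply a bias correction instead. `SkewnessSketch` is a hypothetical name.

```java
public class SkewnessSketch {
    // Population skewness: m3 / m2^(3/2), where mk is the k-th central moment.
    // Returns NaN for constant data because m2 is zero.
    public static double skewness(double[] x) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double m2 = 0.0;
        double m3 = 0.0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= x.length;
        m3 /= x.length;
        return m3 / Math.pow(m2, 1.5);
    }

    public static void main(String[] args) {
        System.out.println(skewness(new double[] {1, 2, 3, 4, 5})); // symmetric data: 0.0
        System.out.println(skewness(new double[] {1, 1, 1, 10}));   // long right tail: positive
    }
}
```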
- calculateKurtosis
  
  public double calculateKurtosis(DataFrame df, String column)
  
  Calculates kurtosis of a numeric column.
  
  Parameters:
  
  df - Source DataFrame
  
  column - Column to analyze
  
  Returns:
  
  Kurtosis value
  
  Throws:
  
  IllegalArgumentException - if column is not numeric
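
Kurtosis is reported either as the raw ratio of moments or as excess kurtosis (the ratio minus 3, so that a normal distribution scores 0), and the documentation does not say which convention applies here. The hypothetical `KurtosisSketch` below computes the population version of both.

```java
public class KurtosisSketch {
    // Population kurtosis: m4 / m2^2. With excess = true, 3 is subtracted so
    // that a normal distribution would score 0.
    public static double kurtosis(double[] x, boolean excess) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double m2 = 0.0;
        double m4 = 0.0;
        for (double v : x) {
            double d2 = (v - mean) * (v - mean);
            m2 += d2;
            m4 += d2 * d2;
        }
        m2 /= x.length;
        m4 /= x.length;
        double ratio = m4 / (m2 * m2);
        return excess ? ratio - 3.0 : ratio;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5};
        System.out.println(kurtosis(data, false)); // raw kurtosis, about 1.7
        System.out.println(kurtosis(data, true));  // excess kurtosis, about -1.3
    }
}
```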
- calculateCovariance
  
  public double calculateCovariance(DataFrame df, String col1, String col2)
  
  Calculates covariance between two numeric columns.
  
  Parameters:
  
  df - Source DataFrame
  
  col1 - First column name
  
  col2 - Second column name
  
  Returns:
  
  Covariance value
  
  Throws:
  
  IllegalArgumentException - if either column is not numeric
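
For orientation, the sketch below computes the sample covariance, dividing by n - 1. Whether the library divides by n - 1 or by n is not documented, so treat the divisor as an assumption; `CovarianceSketch` is a hypothetical name.

```java
public class CovarianceSketch {
    // Sample covariance: sum((x - meanX) * (y - meanY)) / (n - 1).
    public static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equally long arrays of at least two values");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);
    }

    public static void main(String[] args) {
        // y rises with x, so the covariance is positive.
        System.out.println(covariance(new double[] {1, 2, 3}, new double[] {2, 4, 6})); // prints 2.0
    }
}
```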
- calculateQuantile
  
  public double calculateQuantile(DataFrame df, String column, double q)
  
  Calculates specified quantile of a numeric column.
  
  Parameters:
  
  df - Source DataFrame
  
  column - Column to analyze
  
  q - Quantile value (0 to 1)
  
  Returns:
  
  Quantile value
  
  Throws:
  
  IllegalArgumentException - if column is not numeric or q is invalid
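
Several quantile definitions are in common use, and the documentation does not name the one implemented. The sketch below uses linear interpolation between neighboring order statistics, which is one widespread choice; it is an illustration under that assumption, and `QuantileSketch` is a hypothetical name.

```java
import java.util.Arrays;

public class QuantileSketch {
    // Sorts a copy of the data, locates the fractional position q * (n - 1),
    // and interpolates linearly between the two surrounding values.
    public static double quantile(double[] values, double q) {
        if (values.length == 0 || Double.isNaN(q) || q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("need non-empty data and 0 <= q <= 1");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1);
        int lower = (int) Math.floor(pos);
        int upper = (int) Math.ceil(pos);
        double fraction = pos - lower;
        return sorted[lower] + fraction * (sorted[upper] - sorted[lower]);
    }

    public static void main(String[] args) {
        double[] data = {7, 1, 3, 5, 9};
        System.out.println(quantile(data, 0.5));  // median: 5.0
        System.out.println(quantile(data, 0.25)); // first quartile: 3.0
    }
}
```

With this definition, `q = 0` returns the minimum and `q = 1` the maximum.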
- calculateIQR
  
  public double calculateIQR(DataFrame df, String column)
  
  Calculates the interquartile range (IQR) of a numeric column, that is, the difference between its third and first quartiles.
  
  Parameters:
  
  df - Source DataFrame
  
  column - Column to analyze
  
  Returns:
  
  IQR value
  
  Throws:
  
  IllegalArgumentException - if column is not numeric
- calculateFrequency
  
  public Map<Object,Long> calculateFrequency(DataFrame df, String column)
  
  Calculates frequency distribution of values in a column.
  
  Parameters:
  
  df - Source DataFrame
  
  column - Column to analyze
  
  Returns:
  
  Map of values to their frequencies
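
The returned `Map<Object,Long>` maps each distinct value to the number of times it occurs. A standalone sketch of that counting, with the column modeled as a `List<Object>` and with first-seen key order as an assumption of the sketch, might look like this (`FrequencySketch` is a hypothetical name):

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FrequencySketch {
    // Counts occurrences of each distinct value. LinkedHashMap keeps keys in
    // the order they were first seen.
    public static Map<Object, Long> frequency(List<Object> column) {
        Map<Object, Long> counts = new LinkedHashMap<>();
        for (Object v : column) {
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Object> column = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(frequency(column)); // prints {a=3, b=1, c=1}
    }
}
```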
- getColumnValues
  
  public double[] getColumnValues(DataFrame df, String column)
  
  Gets numeric values from a specified column.
  
  Parameters:
  
  df - Source DataFrame
  
  column - Column name
  
  Returns:
  
  Array of numeric values
  
  Throws:
  
  IllegalArgumentException - if column is not numeric
- binomialDist
  
  public int[] binomialDist(int trials, double prob, int size)
  
  Generates binomial distribution samples.
  
  Parameters:
  
  trials - Number of trials
  
  prob - Success probability
  
  size - Number of samples to generate
  
  Returns:
  
  Array of binomial distribution samples
  
  Throws:
  
  IllegalArgumentException - if parameters are invalid
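
One straightforward way to draw binomial samples is to simulate each trial as an independent Bernoulli draw and count the successes. The sketch below does that; it is not necessarily how the library samples, and the extra `Random` parameter (added so results can be seeded) and the name `BinomialSketch` are assumptions.

```java
import java.util.Random;

public class BinomialSketch {
    // Each sample is the number of successes in `trials` independent draws
    // that each succeed with probability `prob`.
    public static int[] sample(int trials, double prob, int size, Random rng) {
        if (trials < 0 || size < 0 || Double.isNaN(prob) || prob < 0.0 || prob > 1.0) {
            throw new IllegalArgumentException("need trials >= 0, size >= 0, 0 <= prob <= 1");
        }
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            int successes = 0;
            for (int t = 0; t < trials; t++) {
                if (rng.nextDouble() < prob) { // nextDouble() is in [0, 1)
                    successes++;
                }
            }
            out[i] = successes;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] draws = sample(10, 0.5, 5, new Random(42));
        for (int d : draws) {
            System.out.println(d); // each value lies between 0 and 10
        }
    }
}
```

This costs time proportional to `trials * size`, which is fine for small inputs; faster samplers exist for large trial counts.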
- mannWhitney
  
  public Map<String,Double> mannWhitney(DataFrame df, String col1, String col2)
  
  Performs Mann-Whitney U test between two columns.
  
  Parameters:
  
  df - Source DataFrame
  
  col1 - First column name
  
  col2 - Second column name
  
  Returns:
  
  Map containing test statistic and p-value
  
  Throws:
  
  IllegalArgumentException - if either column is not numeric
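
The U statistic at the heart of the test can be computed directly from its definition: count the pairs in which a value from the first sample exceeds a value from the second, counting ties as one half. The sketch below shows only that statistic. It omits the p-value, and which of the two U values the library reports, along with the keys of the returned map, is not documented. `MannWhitneySketch` is a hypothetical name.

```java
public class MannWhitneySketch {
    // U for sample x against sample y: +1 for every pair with x > y,
    // +0.5 for every tied pair. Runs in O(x.length * y.length).
    public static double uStatistic(double[] x, double[] y) {
        double u = 0.0;
        for (double xi : x) {
            for (double yj : y) {
                if (xi > yj) {
                    u += 1.0;
                } else if (xi == yj) {
                    u += 0.5;
                }
            }
        }
        return u;
    }

    public static void main(String[] args) {
        double[] a = {1, 4};
        double[] b = {2, 4};
        double uA = uStatistic(a, b); // 1.5
        double uB = uStatistic(b, a); // 2.5
        // The two statistics always sum to a.length * b.length.
        System.out.println(uA + " + " + uB + " = " + (uA + uB));
    }
}
```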
- groupBy
  
  public DataFrame groupBy(DataFrame df, String... columns)
  
  Groups DataFrame by specified columns and calculates aggregate statistics.
  
  Parameters:
  
  df - Source DataFrame
  
  columns - Columns to group by
  
  Returns:
  
  DataFrame containing grouped statistics
- sort
  
  public DataFrame sort(DataFrame df, String... columns)
  
  Sorts DataFrame by specified columns in ascending order.
  
  Parameters:
  
  df - Source DataFrame
  
  columns - Columns to sort by
  
  Returns:
  
  Sorted DataFrame
