Class DataStatisticsService

java.lang.Object
com.leumanuel.woozydata.service.DataStatisticsService

public class DataStatisticsService extends Object
Service class providing statistical analysis and data transformation operations for DataFrames. Includes methods for descriptive statistics, statistical tests, data reshaping, and aggregations.
Version:
1.0
Author:
Leu A. Manuel
  • Constructor Details

    • DataStatisticsService

      public DataStatisticsService()
  • Method Details

    • apply

      public DataFrame apply(DataFrame df, String column, Function<List<Object>,Object> func)
      Applies a function to a column of the DataFrame. The function receives all of the column's values as a single list.
      Parameters:
      df - DataFrame to analyze
      column - Name of the column to apply function to
      func - Function to apply to the column values
      Returns:
      New DataFrame containing the results of the function application
    • pivot

      public DataFrame pivot(DataFrame df, String index, String columns, String values, String aggFunc)
      Creates a pivot table from the DataFrame data.
      Parameters:
      df - Source DataFrame
      index - Column to use as index
      columns - Column to use for new columns
      values - Column containing values to aggregate
      aggFunc - Aggregation function to apply ("sum", "mean", "min", "max")
      Returns:
      New DataFrame containing the pivot table
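      The reshaping described above can be sketched without the library. The following is a minimal illustration, not the library's implementation: it assumes rows are held as a List of Map<String,Object>, that the values column is numeric, and that cells with several matching rows are reduced by the named aggregation. The class PivotSketch and its helper reduce are hypothetical names.

```java
import java.util.DoubleSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; rows are modelled as maps, not as a DataFrame.
final class PivotSketch {
    static Map<Object, Map<Object, Double>> pivot(List<Map<String, Object>> rows,
            String index, String columns, String values, String aggFunc) {
        // Collect a running summary for every (index value, column value) cell.
        Map<Object, Map<Object, DoubleSummaryStatistics>> cells = new LinkedHashMap<>();
        for (Map<String, Object> row : rows) {
            double v = ((Number) row.get(values)).doubleValue();
            cells.computeIfAbsent(row.get(index), k -> new LinkedHashMap<>())
                 .computeIfAbsent(row.get(columns), k -> new DoubleSummaryStatistics())
                 .accept(v);
        }
        // Reduce each cell's summary to a single number using the requested aggregation.
        Map<Object, Map<Object, Double>> result = new LinkedHashMap<>();
        cells.forEach((rowKey, byColumn) -> {
            Map<Object, Double> out = new LinkedHashMap<>();
            byColumn.forEach((colKey, stats) -> out.put(colKey, reduce(stats, aggFunc)));
            result.put(rowKey, out);
        });
        return result;
    }

    static double reduce(DoubleSummaryStatistics s, String aggFunc) {
        switch (aggFunc) {
            case "sum":  return s.getSum();
            case "mean": return s.getAverage();
            case "min":  return s.getMin();
            case "max":  return s.getMax();
            default: throw new IllegalArgumentException("Unknown aggFunc: " + aggFunc);
        }
    }
}
```

      Combinations of index and column values that never occur are simply absent from the result in this sketch; how the library fills such cells is not stated here.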
    • transform

      public DataFrame transform(DataFrame df, String column, Function<Object,Object> func)
      Transforms values in a specified column using a provided function.
      Parameters:
      df - Source DataFrame
      column - Column to transform
      func - Function to apply to each value
      Returns:
      New DataFrame with transformed values
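      The func parameter here has a different shape from the one taken by apply: transform takes a per-value function (Function<Object,Object>), while apply takes a function over the whole column (Function<List<Object>,Object>). The sketch below shows both shapes using only java.util; FunctionShapes, DOUBLE_IT, COLUMN_SUM and mapColumn are hypothetical names, and the column is assumed to hold Number values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

final class FunctionShapes {
    // Per-value shape, as accepted by transform: one value in, one value out.
    static final Function<Object, Object> DOUBLE_IT =
            v -> ((Number) v).doubleValue() * 2.0;

    // Whole-column shape, as accepted by apply: the full list of values in, one result out.
    static final Function<List<Object>, Object> COLUMN_SUM = values -> {
        double sum = 0.0;
        for (Object v : values) {
            sum += ((Number) v).doubleValue();
        }
        return sum;
    };

    // Hypothetical helper: applies a per-value function to a copy of a column.
    static List<Object> mapColumn(List<Object> column, Function<Object, Object> func) {
        List<Object> out = new ArrayList<>(column.size());
        for (Object v : column) {
            out.add(func.apply(v));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Object> prices = List.of(1, 2.5, 4);
        System.out.println(mapColumn(prices, DOUBLE_IT));
        System.out.println(COLUMN_SUM.apply(prices));
    }
}
```

      Because both functions receive Object, the casts to Number are the caller's responsibility; a non-numeric value would raise a ClassCastException in this sketch.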
    • join

      public DataFrame join(DataFrame left, DataFrame right, String[] leftCols, String[] rightCols)
      Joins two DataFrames based on matching column values.
      Parameters:
      left - Left DataFrame
      right - Right DataFrame
      leftCols - Columns from left DataFrame to join on
      rightCols - Columns from right DataFrame to join on
      Returns:
      New DataFrame containing joined data
    • melt

      public DataFrame melt(DataFrame df, String[] idVars, String[] valueVars)
      Unpivots DataFrame from wide to long format.
      Parameters:
      df - Source DataFrame
      idVars - Columns to keep as identifiers
      valueVars - Columns to unpivot into rows
      Returns:
      Melted DataFrame in long format
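      As an illustration of the wide-to-long reshaping, the sketch below emits one output row per input row and value column. It is not the library's code: rows are modelled as a List of Map<String,Object>, and the output column names "variable" and "value" are an assumption borrowed from the pandas convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch of a melt operation.
final class MeltSketch {
    static List<Map<String, Object>> melt(List<Map<String, Object>> rows,
                                          String[] idVars, String[] valueVars) {
        List<Map<String, Object>> longRows = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            for (String valueVar : valueVars) {
                Map<String, Object> out = new LinkedHashMap<>();
                // Identifier columns are copied unchanged into every output row.
                for (String id : idVars) {
                    out.put(id, row.get(id));
                }
                out.put("variable", valueVar);        // assumed output column name
                out.put("value", row.get(valueVar));  // assumed output column name
                longRows.add(out);
            }
        }
        return longRows;
    }
}
```

      An input with r rows and v value columns yields r × v output rows.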
    • sort

      public DataFrame sort(DataFrame df, String[] columns, boolean[] ascending)
      Sorts DataFrame by specified columns.
      Parameters:
      df - Source DataFrame
      columns - Columns to sort by
      ascending - Sort direction for each entry in columns, in the same order (true for ascending)
      Returns:
      Sorted DataFrame
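      A multi-column sort with per-column direction can be expressed as a chain of comparators. The sketch below is one way to do that with java.util.Comparator, assuming rows are maps, that every sort column holds mutually Comparable non-null values, and that true means ascending; MultiColumnSort is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-alone sketch; the input list is left untouched.
final class MultiColumnSort {
    @SuppressWarnings({"unchecked", "rawtypes"})
    static List<Map<String, Object>> sort(List<Map<String, Object>> rows,
                                          String[] columns, boolean[] ascending) {
        if (columns.length != ascending.length) {
            throw new IllegalArgumentException("columns and ascending must have equal length");
        }
        // Start from a comparator that treats all rows as equal, then refine it per column.
        Comparator<Map<String, Object>> cmp = (a, b) -> 0;
        for (int i = 0; i < columns.length; i++) {
            final String col = columns[i];
            Comparator<Map<String, Object>> byCol =
                    (a, b) -> ((Comparable) a.get(col)).compareTo(b.get(col));
            cmp = cmp.thenComparing(ascending[i] ? byCol : byCol.reversed());
        }
        List<Map<String, Object>> sorted = new ArrayList<>(rows);
        sorted.sort(cmp);  // List.sort is stable, so ties keep their input order
        return sorted;
    }
}
```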
    • describe

      public Map<String,Map<String,Double>> describe(DataFrame df)
      Generates descriptive statistics for each numeric column.
      Parameters:
      df - Source DataFrame
      Returns:
      Map keyed by column name; each value maps a statistic name to its computed value
    • calculateSkewness

      public double calculateSkewness(DataFrame df, String column)
      Calculates skewness of a numeric column.
      Parameters:
      df - Source DataFrame
      column - Column to analyze
      Returns:
      Skewness value
      Throws:
      IllegalArgumentException - if column is not numeric
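      Several definitions of skewness are in common use, and this page does not say which one the method returns. The sketch below computes the moment-based (population) form, g1 = m3 / m2^(3/2), where m2 and m3 are the second and third central moments. SkewnessSketch is a hypothetical name, and a bias-corrected sample estimator would give slightly different numbers.

```java
// Hypothetical stand-alone sketch of moment-based skewness.
final class SkewnessSketch {
    static double skewness(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;

        // Second and third central moments, both divided by n.
        double m2 = 0.0;
        double m3 = 0.0;
        for (double v : x) {
            double d = v - mean;
            m2 += d * d;
            m3 += d * d * d;
        }
        m2 /= n;
        m3 /= n;
        if (m2 == 0.0) {
            throw new IllegalArgumentException("skewness is undefined for constant data");
        }
        return m3 / Math.pow(m2, 1.5);
    }
}
```

      A symmetric sample gives 0; a long right tail gives a positive value and a long left tail a negative one.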
    • calculateKurtosis

      public double calculateKurtosis(DataFrame df, String column)
      Calculates kurtosis of a numeric column.
      Parameters:
      df - Source DataFrame
      column - Column to analyze
      Returns:
      Kurtosis value
      Throws:
      IllegalArgumentException - if column is not numeric
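      "Kurtosis" may mean either the raw fourth standardized moment (3 for a normal distribution) or the excess kurtosis (the same quantity minus 3); this page does not say which the method returns. The sketch below computes both from population moments so the difference is explicit. KurtosisSketch is a hypothetical name.

```java
// Hypothetical stand-alone sketch of moment-based kurtosis.
final class KurtosisSketch {
    // Raw kurtosis: m4 / m2^2, using central moments divided by n.
    static double kurtosis(double[] x) {
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;

        double m2 = 0.0;
        double m4 = 0.0;
        for (double v : x) {
            double d2 = (v - mean) * (v - mean);
            m2 += d2;
            m4 += d2 * d2;
        }
        m2 /= n;
        m4 /= n;
        if (m2 == 0.0) {
            throw new IllegalArgumentException("kurtosis is undefined for constant data");
        }
        return m4 / (m2 * m2);
    }

    // Excess kurtosis: raw kurtosis shifted so that a normal distribution scores 0.
    static double excessKurtosis(double[] x) {
        return kurtosis(x) - 3.0;
    }
}
```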
    • calculateCovariance

      public double calculateCovariance(DataFrame df, String col1, String col2)
      Calculates covariance between two numeric columns.
      Parameters:
      df - Source DataFrame
      col1 - First column name
      col2 - Second column name
      Returns:
      Covariance value
      Throws:
      IllegalArgumentException - if either column is not numeric
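      For reference, covariance is the average product of paired deviations from the two column means. The sketch below uses the sample form with an n - 1 divisor; whether the library divides by n or n - 1 is not stated on this page, so treat that choice as an assumption. CovarianceSketch is a hypothetical name.

```java
// Hypothetical stand-alone sketch of sample covariance.
final class CovarianceSketch {
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("columns must have the same length");
        }
        int n = x.length;
        if (n < 2) {
            throw new IllegalArgumentException("need at least two paired values");
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        // Sum of products of paired deviations from the means.
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += (x[i] - meanX) * (y[i] - meanY);
        }
        return sum / (n - 1);  // sample covariance (assumed divisor)
    }
}
```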
    • calculateQuantile

      public double calculateQuantile(DataFrame df, String column, double q)
      Calculates specified quantile of a numeric column.
      Parameters:
      df - Source DataFrame
      column - Column to analyze
      q - Quantile to compute, as a fraction from 0 to 1 (for example, 0.5 for the median)
      Returns:
      Quantile value
      Throws:
      IllegalArgumentException - if column is not numeric or q is invalid
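      Quantiles can be estimated in several ways, and this page does not say which rule the method follows. The sketch below uses linear interpolation between neighbouring order statistics at position (n - 1) * q, which is the default in NumPy and R; QuantileSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch of a linearly interpolated quantile.
final class QuantileSketch {
    static double quantile(double[] values, double q) {
        // The negated range check also rejects NaN.
        if (values.length == 0 || !(q >= 0.0 && q <= 1.0)) {
            throw new IllegalArgumentException("need non-empty data and q in [0, 1]");
        }
        double[] sorted = values.clone();  // keep the caller's array untouched
        Arrays.sort(sorted);

        // Fractional position of the quantile among the sorted values.
        double h = (sorted.length - 1) * q;
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }
}
```

      With the values 1, 2, 3, 4 and q = 0.5, the position is 1.5, so the result lies halfway between 2 and 3.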
    • calculateIQR

      public double calculateIQR(DataFrame df, String column)
      Calculates Interquartile Range (IQR) of a numeric column.
      Parameters:
      df - Source DataFrame
      column - Column to analyze
      Returns:
      IQR value
      Throws:
      IllegalArgumentException - if column is not numeric
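      The IQR is the difference between the third and first quartiles, Q3 - Q1. The sketch below computes it with linearly interpolated quartiles; that interpolation rule is an assumption, since this page does not specify one, and IqrSketch is a hypothetical name.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch of the interquartile range.
final class IqrSketch {
    // Linearly interpolated quantile of an already sorted, non-empty array.
    private static double quantileSorted(double[] sorted, double q) {
        double h = (sorted.length - 1) * q;
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, sorted.length - 1);
        return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
    }

    static double iqr(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("need at least one value");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double q1 = quantileSorted(sorted, 0.25);
        double q3 = quantileSorted(sorted, 0.75);
        return q3 - q1;
    }
}
```

      A common use of the result is outlier screening, where values more than 1.5 × IQR below Q1 or above Q3 are flagged.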
    • calculateFrequency

      public Map<Object,Long> calculateFrequency(DataFrame df, String column)
      Calculates frequency distribution of values in a column.
      Parameters:
      df - Source DataFrame
      column - Column to analyze
      Returns:
      Map of values to their frequencies
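      A frequency table of this shape (Map<Object,Long>) is what the standard Collectors.groupingBy and Collectors.counting collectors produce. The sketch below shows that idiom on a plain list; FrequencySketch is a hypothetical name, and the use of a LinkedHashMap to preserve first-seen order is a choice made for the example, not documented library behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch of a frequency count.
final class FrequencySketch {
    static Map<Object, Long> frequency(List<Object> values) {
        // Group equal values together and count the members of each group.
        // Note: groupingBy rejects null keys, so null values must be filtered out first.
        return values.stream().collect(
                Collectors.groupingBy(v -> v, LinkedHashMap::new, Collectors.counting()));
    }
}
```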
    • getColumnValues

      public double[] getColumnValues(DataFrame df, String column)
      Gets numeric values from a specified column.
      Parameters:
      df - Source DataFrame
      column - Column name
      Returns:
      Array of numeric values
      Throws:
      IllegalArgumentException - if column is not numeric
    • binomialDist

      public int[] binomialDist(int trials, double prob, int size)
      Generates random samples from a binomial distribution.
      Parameters:
      trials - Number of trials per sample
      prob - Success probability of each trial
      size - Number of samples to generate
      Returns:
      Array of binomial distribution samples
      Throws:
      IllegalArgumentException - if parameters are invalid
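      One straightforward way to draw binomial samples is to simulate the trials directly: each sample is the number of successes in the given number of independent trials. The sketch below does this with java.util.Random. It is an illustration rather than the library's algorithm, and the extra Random parameter (added so results can be reproduced with a seed) and the class name BinomialSketch are hypothetical.

```java
import java.util.Random;

// Hypothetical stand-alone sketch of binomial sampling by direct simulation.
final class BinomialSketch {
    static int[] sample(int trials, double prob, int size, Random rng) {
        // The negated range check also rejects NaN.
        if (trials < 0 || size < 0 || !(prob >= 0.0 && prob <= 1.0)) {
            throw new IllegalArgumentException(
                    "trials and size must be non-negative and prob must be in [0, 1]");
        }
        int[] out = new int[size];
        for (int s = 0; s < size; s++) {
            int successes = 0;
            for (int t = 0; t < trials; t++) {
                // nextDouble() is uniform on [0, 1), so this succeeds with probability prob.
                if (rng.nextDouble() < prob) {
                    successes++;
                }
            }
            out[s] = successes;
        }
        return out;
    }
}
```

      Direct simulation costs time proportional to trials × size; for very large trial counts a dedicated sampling algorithm would be preferable.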
    • mannWhitney

      public Map<String,Double> mannWhitney(DataFrame df, String col1, String col2)
      Performs Mann-Whitney U test between two columns.
      Parameters:
      df - Source DataFrame
      col1 - First column name
      col2 - Second column name
      Returns:
      Map containing test statistic and p-value
      Throws:
      IllegalArgumentException - if either column is not numeric
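      The Mann-Whitney U statistic is derived from the ranks of the pooled observations. The sketch below computes U for both samples, using average ranks for ties, plus the z-score of the usual normal approximation (without tie or continuity correction). It stops short of a p-value, and the map keys and the exact p-value method used by the library are not documented on this page. MannWhitneySketch and its method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical stand-alone sketch of the Mann-Whitney U statistic.
final class MannWhitneySketch {
    // Returns {U1, U2}: the U statistics of sample a and sample b.
    static double[] uStatistics(double[] a, double[] b) {
        int n1 = a.length;
        int n2 = b.length;
        if (n1 == 0 || n2 == 0) {
            throw new IllegalArgumentException("both samples must be non-empty");
        }
        int n = n1 + n2;
        double[] all = new double[n];
        System.arraycopy(a, 0, all, 0, n1);
        System.arraycopy(b, 0, all, n1, n2);

        // Sort indices of the pooled data by value.
        Integer[] order = new Integer[n];
        for (int k = 0; k < n; k++) {
            order[k] = k;
        }
        Arrays.sort(order, (p, q) -> Double.compare(all[p], all[q]));

        // Assign 1-based ranks; tied values share the average of their positions.
        double[] ranks = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            while (j + 1 < n && all[order[j + 1]] == all[order[i]]) {
                j++;
            }
            double midRank = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) {
                ranks[order[k]] = midRank;
            }
            i = j + 1;
        }

        double rankSum1 = 0.0;
        for (int k = 0; k < n1; k++) {
            rankSum1 += ranks[k];
        }
        double u1 = rankSum1 - n1 * (n1 + 1) / 2.0;
        double u2 = (double) n1 * n2 - u1;
        return new double[] {u1, u2};
    }

    // z-score of U1 under the normal approximation (no tie or continuity correction).
    static double zScore(double u1, int n1, int n2) {
        double mean = n1 * n2 / 2.0;
        double sd = Math.sqrt(n1 * (double) n2 * (n1 + n2 + 1) / 12.0);
        return (u1 - mean) / sd;
    }
}
```

      A two-sided p-value would follow from the standard normal CDF of |z|; the JDK has no built-in error function, so that step needs a numerical library or an approximation.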
    • groupBy

      public DataFrame groupBy(DataFrame df, String... columns)
      Groups DataFrame by specified columns and calculates aggregate statistics.
      Parameters:
      df - Source DataFrame
      columns - Columns to group by
      Returns:
      DataFrame containing grouped statistics
    • sort

      public DataFrame sort(DataFrame df, String... columns)
      Sorts DataFrame by specified columns in ascending order.
      Parameters:
      df - Source DataFrame
      columns - Columns to sort by
      Returns:
      Sorted DataFrame