Class StatisticalService

java.lang.Object
com.leumanuel.woozydata.service.StatisticalService

public class StatisticalService extends Object
Service class for advanced statistical operations and hypothesis testing. Provides implementations of several statistical tests, along with DataFrame utilities for column selection and random sampling.
Version:
1.0
Author:
Leu A. Manuel
  • Constructor Details

    • StatisticalService

      public StatisticalService()
  • Method Details

    • tTest

      public Map<String,Double> tTest(DataFrame df, String col1, String col2)
      Performs a t-test comparing the values of two numeric columns.
      Parameters:
      df - DataFrame containing the data
      col1 - First column name
      col2 - Second column name
      Returns:
      Map containing test results
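This page does not say which t-test variant tTest uses or which keys its result map holds. As a hedged illustration only, the sketch below computes Welch's unequal-variance t statistic and its Welch-Satterthwaite degrees of freedom, with plain double[] arrays standing in for col1 and col2. The class and method names are hypothetical and are not part of woozydata. Turning t and df into a p-value additionally requires a Student t CDF (for example Apache Commons Math's TDistribution), which is omitted here.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not the woozydata implementation): Welch's two-sample
 * t statistic and Welch-Satterthwaite degrees of freedom for two numeric columns.
 */
public final class WelchTTestSketch {

    /** Sample mean. */
    static double mean(double[] x) {
        return Arrays.stream(x).average().orElseThrow();
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    /** Returns {t, df} for Welch's unequal-variance t-test. */
    static double[] welch(double[] a, double[] b) {
        if (a.length < 2 || b.length < 2) {
            throw new IllegalArgumentException("each column needs at least two values");
        }
        double va = variance(a) / a.length; // squared standard error of mean(a)
        double vb = variance(b) / b.length; // squared standard error of mean(b)
        double t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
        double df = (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
        return new double[] {t, df};
    }

    public static void main(String[] args) {
        double[] r = welch(new double[] {1, 2, 3, 4}, new double[] {2, 4, 6, 8});
        System.out.printf("t = %.4f, df = %.4f%n", r[0], r[1]);
    }
}
```

Welch's variant is shown because it does not assume equal variances in the two columns; a pooled-variance (Student) or paired test would use different formulas for t and df.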
    • anova

      public Map<String,Double> anova(DataFrame df, String... columns)
      Performs an ANOVA test across multiple columns.
      Parameters:
      df - DataFrame containing the data
      columns - Column names to test
      Returns:
      Map containing the ANOVA test results
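A natural reading of anova(df, columns...) is a one-way ANOVA in which each listed column is one group, but that is an assumption, as are the class name and the {F, dfBetween, dfWithin} return layout below. The sketch partitions the total variation into between-group and within-group sums of squares and forms the F ratio of their mean squares; the p-value (from an F distribution) is left out.

```java
/**
 * Hypothetical sketch of a one-way ANOVA in which each numeric column is one group.
 * Not the woozydata implementation; double[] arrays stand in for DataFrame columns.
 */
public final class OneWayAnovaSketch {

    /** Returns {F, dfBetween, dfWithin}. */
    static double[] anova(double[]... groups) {
        if (groups.length < 2) {
            throw new IllegalArgumentException("need at least two groups");
        }
        int n = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            if (g.length == 0) {
                throw new IllegalArgumentException("empty group");
            }
            n += g.length;
            for (double v : g) {
                grandSum += v;
            }
        }
        if (n <= groups.length) {
            throw new IllegalArgumentException("need more observations than groups");
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0; // spread of group means around the grand mean
        double ssWithin = 0.0;  // spread of values around their own group mean
        for (double[] g : groups) {
            double m = 0.0;
            for (double v : g) {
                m += v;
            }
            m /= g.length;
            ssBetween += g.length * (m - grandMean) * (m - grandMean);
            for (double v : g) {
                ssWithin += (v - m) * (v - m);
            }
        }
        int dfBetween = groups.length - 1;
        int dfWithin = n - groups.length;
        double f = (ssBetween / dfBetween) / (ssWithin / dfWithin);
        return new double[] {f, dfBetween, dfWithin};
    }

    public static void main(String[] args) {
        double[] r = anova(new double[] {1, 2, 3}, new double[] {4, 5, 6}, new double[] {7, 8, 9});
        System.out.printf("F = %.2f (df %d, %d)%n", r[0], (int) r[1], (int) r[2]);
    }
}
```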
    • chiSquareTest

      public Map<String,Double> chiSquareTest(DataFrame df, String col1, String col2)
      Performs a chi-square test of independence between two categorical columns.
      Parameters:
      df - DataFrame containing the data
      col1 - First categorical column
      col2 - Second categorical column
      Returns:
      Map containing the chi-square test results
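For a test of independence, the two categorical columns are cross-tabulated into a contingency table, and each observed cell count is compared with the count expected under independence, (row total × column total) / N. The sketch below shows that computation on two String[] arrays standing in for col1 and col2. The class name and the {chi2, df} return layout are hypothetical, and the p-value (from a chi-square distribution with df degrees of freedom) is omitted.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not the woozydata implementation) of Pearson's
 * chi-square test of independence for two categorical columns.
 */
public final class ChiSquareIndependenceSketch {

    /** Returns {chi2, df} for two equally long categorical columns. */
    static double[] chiSquare(String[] col1, String[] col2) {
        if (col1.length == 0 || col1.length != col2.length) {
            throw new IllegalArgumentException("columns must be non-empty and of equal length");
        }
        // Observed contingency counts: row category -> (column category -> count)
        Map<String, Map<String, Integer>> observed = new LinkedHashMap<>();
        Map<String, Integer> rowTotals = new LinkedHashMap<>();
        Map<String, Integer> colTotals = new LinkedHashMap<>();
        for (int i = 0; i < col1.length; i++) {
            observed.computeIfAbsent(col1[i], k -> new LinkedHashMap<>())
                    .merge(col2[i], 1, Integer::sum);
            rowTotals.merge(col1[i], 1, Integer::sum);
            colTotals.merge(col2[i], 1, Integer::sum);
        }
        double n = col1.length;
        double chi2 = 0.0;
        for (Map.Entry<String, Integer> r : rowTotals.entrySet()) {
            for (Map.Entry<String, Integer> c : colTotals.entrySet()) {
                // Expected count under independence; always > 0 because every total is >= 1
                double expected = r.getValue() * c.getValue() / n;
                int obs = observed.get(r.getKey()).getOrDefault(c.getKey(), 0);
                chi2 += (obs - expected) * (obs - expected) / expected;
            }
        }
        double df = (rowTotals.size() - 1) * (colTotals.size() - 1);
        return new double[] {chi2, df};
    }

    public static void main(String[] args) {
        double[] r = chiSquare(new String[] {"a", "a", "b", "b"}, new String[] {"x", "x", "y", "y"});
        System.out.printf("chi2 = %.2f, df = %.0f%n", r[0], r[1]);
    }
}
```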
    • shapiroWilkTest

      public Map<String,Double> shapiroWilkTest(DataFrame df, String column)
      Performs Shapiro-Wilk test for normality on a numeric column. Tests the null hypothesis that the data is normally distributed.
      Parameters:
      df - DataFrame containing the data
      column - Column name to test for normality
      Returns:
      Map containing 'statistic' (W) and 'p_value'
      Throws:
      IllegalArgumentException - if column is not numeric
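For reference, the W value reported under the 'statistic' key follows, in the standard Shapiro-Wilk definition, the formula below. Here x_(i) is the i-th smallest value in the column, x̄ is the sample mean, and the coefficients a_i are derived from the expected values and covariance matrix of order statistics of a standard normal sample. W lies between 0 and 1; values close to 1 are consistent with normality, and a small p_value leads to rejecting the null hypothesis. How this class computes the a_i and the p-value (a common choice is Royston's approximation) is not documented on this page.

```latex
W = \frac{\left( \sum_{i=1}^{n} a_i \, x_{(i)} \right)^{2}}{\sum_{i=1}^{n} \left( x_i - \bar{x} \right)^{2}}
```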
    • select

      public DataFrame select(DataFrame df, String... columns)
      Selects specified columns from the DataFrame. Creates a new DataFrame containing only the selected columns.
      Parameters:
      df - Source DataFrame
      columns - Columns to select
      Returns:
      New DataFrame with only selected columns
      Throws:
      IllegalArgumentException - if any column doesn't exist
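DataFrame's internal layout is not shown on this page, so the sketch below assumes a hypothetical column-oriented table represented as a LinkedHashMap<String, List<Object>> from column name to values. It validates every requested column before copying anything, so a bad request fails with IllegalArgumentException and leaves no partially built result, and it copies each column so the new table does not share mutable lists with the source. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: column selection on a map-based stand-in for DataFrame. */
public final class SelectSketch {

    static Map<String, List<Object>> select(Map<String, List<Object>> table, String... columns) {
        // Validate first so that nothing is built for an invalid request
        for (String c : columns) {
            if (!table.containsKey(c)) {
                throw new IllegalArgumentException("Column not found: " + c);
            }
        }
        // Copy the selected columns, in the requested order, into a new table
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (String c : columns) {
            result.put(c, new ArrayList<>(table.get(c)));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> table = new LinkedHashMap<>();
        table.put("id", List.<Object>of(1, 2, 3));
        table.put("name", List.<Object>of("x", "y", "z"));
        table.put("score", List.<Object>of(0.5, 0.7, 0.9));
        System.out.println(select(table, "score", "id"));
    }
}
```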
    • sample

      public DataFrame sample(DataFrame df, int n)
      Creates a random sample of n rows from the DataFrame, drawn without replacement (no row appears more than once).
      Parameters:
      df - Source DataFrame
      n - Sample size (number of rows to draw)
      Returns:
      New DataFrame containing the random sample
      Throws:
      IllegalArgumentException - if n is larger than the number of rows in the DataFrame
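Sampling without replacement amounts to choosing n distinct row indices uniformly at random and then copying those rows into a new DataFrame. The sketch below covers only the index-selection step, using a partial Fisher-Yates shuffle. The method name, the explicit Random parameter (passed in so results can be reproduced with a fixed seed), and the rejection of negative n are assumptions, not documented behaviour of sample.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: choosing row indices for sampling without replacement. */
public final class SampleSketch {

    /** Returns n distinct indices in [0, rowCount), chosen uniformly without replacement. */
    static List<Integer> sampleRowIndices(int rowCount, int n, Random rng) {
        if (n < 0 || n > rowCount) {
            throw new IllegalArgumentException("n must be between 0 and " + rowCount + ", got " + n);
        }
        List<Integer> idx = new ArrayList<>(rowCount);
        for (int i = 0; i < rowCount; i++) {
            idx.add(i);
        }
        // Partial Fisher-Yates shuffle: only the first n positions need to be randomized
        for (int i = 0; i < n; i++) {
            int j = i + rng.nextInt(rowCount - i); // pick from the not-yet-chosen suffix
            Collections.swap(idx, i, j);
        }
        return new ArrayList<>(idx.subList(0, n));
    }

    public static void main(String[] args) {
        System.out.println(sampleRowIndices(10, 4, new Random(42)));
    }
}
```

Stopping the shuffle after n swaps avoids randomizing the whole index list when n is much smaller than the row count.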