Package com.leumanuel.woozydata.service
Class StatisticalService
java.lang.Object
com.leumanuel.woozydata.service.StatisticalService
Service class for advanced statistical operations and hypothesis testing.
Provides implementations of various statistical tests and data analysis methods.
- Version:
- 1.0
- Author:
- Leu A. Manuel
-
Constructor Summary
Constructors
StatisticalService()
Method Summary
Method - Description
anova - Performs ANOVA test on multiple columns.
chiSquareTest(DataFrame df, String col1, String col2) - Performs chi-square test for independence.
sample - Creates a random sample of specified size from the DataFrame.
select - Selects specified columns from the DataFrame.
shapiroWilkTest(DataFrame df, String column) - Performs Shapiro-Wilk test for normality on a numeric column.
tTest - Performs t-test between two columns.
-
Constructor Details
-
StatisticalService
public StatisticalService()
-
Method Details
-
tTest
Performs t-test between two columns.
- Parameters:
df - DataFrame containing the data
col1 - First column name
col2 - Second column name
- Returns:
- Map containing test results
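The page does not say which t-test variant tTest uses or which keys the returned Map holds. As an illustration only, the following self-contained sketch computes Welch's two-sample t statistic on plain arrays. The class name TTestSketch and the choice of Welch's variant are assumptions and are not taken from StatisticalService.

```java
public class TTestSketch {

    // Arithmetic mean of the values.
    public static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance (divides by n - 1).
    public static double variance(double[] x) {
        double m = mean(x);
        double ss = 0.0;
        for (double v : x) {
            ss += (v - m) * (v - m);
        }
        return ss / (x.length - 1);
    }

    // Welch's t statistic: difference of means over the combined standard error.
    // Assumption: unequal variances allowed; the library may use another variant.
    public static double welchT(double[] a, double[] b) {
        double se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
        return (mean(a) - mean(b)) / se;
    }

    public static void main(String[] args) {
        double[] a = {1, 2, 3, 4, 5};
        double[] b = {2, 3, 4, 5, 6};
        // Means 3 and 4, both variances 2.5, standard error 1, so t = -1.0.
        System.out.println(welchT(a, b));
    }
}
```

Turning the statistic into a p-value requires the t distribution's CDF, which is omitted here to keep the sketch dependency-free.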
-
anova
Performs ANOVA test on multiple columns.
- Parameters:
df - DataFrame containing the data
columns - Column names to test
- Returns:
- ANOVA test results
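The form of the ANOVA results is not documented here. To show what a one-way ANOVA computes, the sketch below derives the F statistic from the between-group and within-group sums of squares. It treats each column as one group, which is an assumption about how anova interprets its columns. AnovaSketch is a hypothetical name.

```java
public class AnovaSketch {

    // One-way ANOVA F statistic for k groups with n observations in total.
    public static double fStatistic(double[][] groups) {
        int k = groups.length;
        int n = 0;
        double grandSum = 0.0;
        for (double[] g : groups) {
            for (double v : g) {
                grandSum += v;
            }
            n += g.length;
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0; // variation of group means around the grand mean
        double ssWithin = 0.0;  // variation of observations around their group mean
        for (double[] g : groups) {
            double sum = 0.0;
            for (double v : g) {
                sum += v;
            }
            double groupMean = sum / g.length;
            ssBetween += g.length * (groupMean - grandMean) * (groupMean - grandMean);
            for (double v : g) {
                ssWithin += (v - groupMean) * (v - groupMean);
            }
        }
        // Mean squares use k - 1 and n - k degrees of freedom.
        return (ssBetween / (k - 1)) / (ssWithin / (n - k));
    }

    public static void main(String[] args) {
        double[][] groups = {{1, 2, 3}, {2, 3, 4}, {3, 4, 5}};
        // SSB = 6 on 2 df, SSW = 6 on 6 df, so F = 3.0.
        System.out.println(fStatistic(groups));
    }
}
```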
-
chiSquareTest
Performs chi-square test for independence.
- Parameters:
df - DataFrame containing the data
col1 - First categorical column
col2 - Second categorical column
- Returns:
- Chi-square test results
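A chi-square test for independence compares observed counts in a contingency table with the counts expected if the two variables were unrelated. The sketch below computes the statistic and its degrees of freedom from a table of counts. It assumes the two categorical columns have already been cross-tabulated. ChiSquareSketch is a hypothetical name and not part of the library.

```java
public class ChiSquareSketch {

    // Pearson chi-square statistic for an r x c table of observed counts.
    public static double statistic(long[][] observed) {
        int rows = observed.length;
        int cols = observed[0].length;
        double[] rowTotals = new double[rows];
        double[] colTotals = new double[cols];
        double total = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                rowTotals[i] += observed[i][j];
                colTotals[j] += observed[i][j];
                total += observed[i][j];
            }
        }
        double chi2 = 0.0;
        for (int i = 0; i < rows; i++) {
            for (int j = 0; j < cols; j++) {
                // Expected count under independence.
                double expected = rowTotals[i] * colTotals[j] / total;
                double diff = observed[i][j] - expected;
                chi2 += diff * diff / expected;
            }
        }
        return chi2;
    }

    // Degrees of freedom: (rows - 1) * (columns - 1).
    public static int degreesOfFreedom(long[][] observed) {
        return (observed.length - 1) * (observed[0].length - 1);
    }

    public static void main(String[] args) {
        long[][] table = {{10, 20}, {20, 10}};
        // Every expected count is 15, so the statistic is 4 * 25 / 15, about 6.667.
        System.out.println(statistic(table));
        System.out.println(degreesOfFreedom(table));
    }
}
```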
-
shapiroWilkTest
Performs Shapiro-Wilk test for normality on a numeric column. Tests the null hypothesis that the data is normally distributed.
- Parameters:
df - DataFrame containing the data
column - Column name to test for normality
- Returns:
- Map containing 'statistic' (W) and 'p_value'
- Throws:
IllegalArgumentException - if column is not numeric
-
select
Selects specified columns from the DataFrame. Creates a new DataFrame containing only the selected columns.
- Parameters:
df - Source DataFrame
columns - Columns to select
- Returns:
- New DataFrame with only selected columns
- Throws:
IllegalArgumentException - if any column doesn't exist
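The behavior described for select (copy only the named columns, reject unknown names) can be sketched on a simple column-oriented table. The Map-based table is a stand-in chosen for this example and is not the DataFrame class. SelectSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SelectSketch {

    // Returns a new table holding copies of the requested columns, in the requested order.
    // Throws IllegalArgumentException if a requested column is missing.
    public static Map<String, List<Object>> select(Map<String, List<Object>> table,
                                                   String... columns) {
        Map<String, List<Object>> result = new LinkedHashMap<>();
        for (String column : columns) {
            if (!table.containsKey(column)) {
                throw new IllegalArgumentException("Column not found: " + column);
            }
            // Copy the values so the source table is left untouched.
            result.put(column, new ArrayList<>(table.get(column)));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Object>> table = new LinkedHashMap<>();
        table.put("id", new ArrayList<Object>(List.of(1, 2, 3)));
        table.put("name", new ArrayList<Object>(List.of("a", "b", "c")));
        table.put("score", new ArrayList<Object>(List.of(0.5, 0.7, 0.9)));
        // Keeps only the "id" and "score" columns.
        System.out.println(select(table, "id", "score").keySet());
    }
}
```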
-
sample
Creates a random sample of specified size from the DataFrame. Uses random sampling without replacement.
- Parameters:
df - Source DataFrame
n - Sample size
- Returns:
- New DataFrame containing the random sample
- Throws:
IllegalArgumentException - if n is larger than DataFrame size
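Sampling without replacement means no row can appear twice in the result. One common way to do this is to shuffle a copy of the rows and keep the first n. The sketch below does that on a plain List. Whether sample uses this approach internally is not stated on this page. SampleSketch and the explicit Random parameter are assumptions made so the example is reproducible.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SampleSketch {

    // Draws n distinct rows by shuffling a copy and keeping the first n.
    // Throws IllegalArgumentException if n exceeds the number of rows.
    // The negative-n check is an addition of this sketch.
    public static <T> List<T> sample(List<T> rows, int n, Random rng) {
        if (n < 0 || n > rows.size()) {
            throw new IllegalArgumentException(
                "Sample size " + n + " is outside 0.." + rows.size());
        }
        List<T> copy = new ArrayList<>(rows); // leave the source order untouched
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, n));
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(10, 20, 30, 40, 50);
        // Three distinct values from rows; which ones depends on the seed.
        System.out.println(sample(rows, 3, new Random(42)));
    }
}
```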
-