Package com.leumanuel.woozydata.service
Class RegressionService
java.lang.Object
com.leumanuel.woozydata.service.RegressionService
Service class for regression analysis and correlation calculations.
Provides methods for various types of regression models and statistical correlations.
- Version: 1.0
- Author: Leu A. Manuel
Constructor Summary
Constructors
- RegressionService()
Method Summary
- double calculatePearsonCorrelation(DataFrame dataFrame, String column1, String column2)
  Calculates Pearson correlation coefficient between two columns.
- double calculateRSquared(DataFrame df, String xCol, String yCol)
  Calculates R-squared (coefficient of determination) for linear regression.
- DataFrame logisticRegression(DataFrame df, String xCol, String yCol)
  Performs logistic regression analysis for binary classification.
- DataFrame multipleRegression(DataFrame df, String[] xCols, String yCol)
  Performs multiple linear regression analysis.
- DataFrame polynomialRegression(DataFrame df, String xCol, String yCol, int degree)
  Performs polynomial regression analysis.
- double[] simpleLinearRegression(DataFrame dataFrame, String xColumn, String yColumn)
  Performs simple linear regression between two columns.
-
Constructor Details
-
RegressionService
public RegressionService()
-
Method Details
-
calculatePearsonCorrelation
Calculates Pearson correlation coefficient between two columns.
- Parameters:
  dataFrame - DataFrame containing the data
  column1 - Name of first column
  column2 - Name of second column
- Returns:
  Pearson correlation coefficient
- Throws:
  IllegalArgumentException - if columns are not numeric
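As an illustration of the quantity this method returns, the following is a minimal standalone sketch of the Pearson coefficient computed on plain double arrays. It is not the library's implementation: the class name PearsonSketch, the array-based signature, and the choice to reject constant columns are assumptions made for this example.

```java
final class PearsonSketch {
    /** Pearson r: sum of centered cross-products over the root of the centered sums of squares. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        if (sxx == 0 || syy == 0) {
            // Assumption: a constant column has no defined correlation, so reject it.
            throw new IllegalArgumentException("correlation is undefined for a constant column");
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] hours = {1, 2, 3, 4, 5};
        double[] score = {2, 4, 6, 8, 10};
        System.out.println("r = " + pearson(hours, score));
    }
}
```

The result always lies in [-1, 1]; values near the ends indicate a strong linear relationship, values near 0 a weak one.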
-
simpleLinearRegression
Performs simple linear regression between two columns. Returns array containing [slope, intercept].
- Parameters:
  dataFrame - DataFrame containing the data
  xColumn - Independent variable column name
  yColumn - Dependent variable column name
- Returns:
  double array where [0] = slope, [1] = intercept
- Throws:
  IllegalArgumentException - if columns are not numeric
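The [slope, intercept] layout described above can be sketched with ordinary least squares on plain arrays. This is an illustrative example only, not the library's code; SimpleLinearRegressionSketch and its fit method are hypothetical names, and the input validation is an assumption.

```java
final class SimpleLinearRegressionSketch {
    /** Ordinary least squares for y = slope * x + intercept; returns {slope, intercept}. */
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0) {
            // Assumption: a constant x column gives no unique line, so reject it.
            throw new IllegalArgumentException("x has zero variance");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        double[] result = fit(new double[]{1, 2, 3, 4}, new double[]{3, 5, 7, 9});
        System.out.println("slope = " + result[0] + ", intercept = " + result[1]);
    }
}
```

A prediction for a new x is then result[0] * x + result[1], matching the index convention in the Returns entry.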
-
calculateRSquared
Calculates R-squared (coefficient of determination) for linear regression.
- Parameters:
  df - DataFrame containing the data
  xCol - Independent variable column name
  yCol - Dependent variable column name
- Returns:
  R-squared value
- Throws:
  IllegalArgumentException - if columns are not numeric
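One common way to compute the coefficient of determination for a fitted line is 1 - SS_res / SS_tot, where SS_res is the sum of squared residuals and SS_tot the sum of squared deviations of y from its mean. The sketch below follows that definition on plain arrays; whether the library uses this exact formula is an assumption, and RSquaredSketch is a hypothetical name.

```java
final class RSquaredSketch {
    /** Fits y = slope * x + intercept by least squares, then returns 1 - SS_res / SS_tot. */
    static double rSquared(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double sxy = 0, sxx = 0, ssTot = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            ssTot += dy * dy;
        }
        if (sxx == 0 || ssTot == 0) {
            // Assumption: R-squared is treated as undefined when either column is constant.
            throw new IllegalArgumentException("R-squared is undefined for a constant column");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;

        double ssRes = 0;
        for (int i = 0; i < n; i++) {
            double residual = y[i] - (intercept + slope * x[i]);
            ssRes += residual * residual;
        }
        return 1.0 - ssRes / ssTot;
    }

    public static void main(String[] args) {
        System.out.println("R^2 = " + rSquared(new double[]{1, 2, 3}, new double[]{1, 3, 2}));
    }
}
```

For simple linear regression with an intercept, this value equals the square of the Pearson correlation between the two columns.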
-
multipleRegression
Performs multiple linear regression analysis.
- Parameters:
  df - DataFrame containing the data
  xCols - Array of independent variable column names
  yCol - Dependent variable column name
- Returns:
  DataFrame containing regression coefficients
- Throws:
  IllegalArgumentException - if any column is not numeric
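To show what "regression coefficients" means for several predictors, here is a minimal sketch that solves the normal equations (X'X) b = X'y by Gaussian elimination. It works on plain arrays and returns [intercept, b1, ..., bk]; the solving technique, the coefficient order, and the name MultipleRegressionSketch are all assumptions for illustration, and the layout of the DataFrame the library returns is not documented here.

```java
final class MultipleRegressionSketch {
    /** Least-squares fit of y = b0 + b1*x1 + ... + bk*xk; rows of x are observations. */
    static double[] fit(double[][] x, double[] y) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same non-zero row count");
        }
        int k = x[0].length + 1;              // +1 for the intercept column
        double[][] a = new double[k][k + 1];  // augmented matrix [X'X | X'y]
        for (int r = 0; r < x.length; r++) {
            double[] row = new double[k];
            row[0] = 1.0;
            System.arraycopy(x[r], 0, row, 1, k - 1);
            for (int i = 0; i < k; i++) {
                for (int j = 0; j < k; j++) {
                    a[i][j] += row[i] * row[j];
                }
                a[i][k] += row[i] * y[r];
            }
        }
        return solve(a);
    }

    /** Gaussian elimination with partial pivoting followed by back substitution. */
    static double[] solve(double[][] a) {
        int k = a.length;
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            if (Math.abs(a[pivot][col]) < 1e-12) {
                throw new IllegalArgumentException("predictors are collinear or data is insufficient");
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            for (int r = col + 1; r < k; r++) {
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        double[] beta = new double[k];
        for (int i = k - 1; i >= 0; i--) {
            double sum = a[i][k];
            for (int j = i + 1; j < k; j++) {
                sum -= a[i][j] * beta[j];
            }
            beta[i] = sum / a[i][i];
        }
        return beta;
    }

    public static void main(String[] args) {
        double[][] x = {{1, 2}, {2, 1}, {3, 4}, {4, 3}, {5, 5}};
        double[] y = {9, 8, 19, 18, 26};
        System.out.println(java.util.Arrays.toString(fit(x, y)));
    }
}
```

Normal equations are compact but can lose precision when predictors are nearly collinear; QR-based solvers are the usual choice when that matters.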
-
polynomialRegression
Performs polynomial regression analysis.
- Parameters:
  df - DataFrame containing the data
  xCol - Independent variable column name
  yCol - Dependent variable column name
  degree - Degree of polynomial
- Returns:
  DataFrame containing polynomial coefficients
- Throws:
  IllegalArgumentException - if columns are not numeric or degree is invalid
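Polynomial regression is a least-squares fit over the powers 1, x, x^2, ..., x^degree. The sketch below builds the normal equations from those powers and solves them with Gauss-Jordan elimination. It is a hypothetical illustration on plain arrays: the name PolynomialRegressionSketch, the lowest-order-first coefficient order, and the rule that degree must be at least 1 and smaller than the number of points are assumptions, not documented behavior.

```java
final class PolynomialRegressionSketch {
    /** Returns {c0, c1, ..., c_degree} for y ~ c0 + c1*x + ... + c_degree*x^degree. */
    static double[] fit(double[] x, double[] y, int degree) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same length");
        }
        if (degree < 1 || x.length <= degree) {
            // Assumption: this is what an "invalid" degree means in this sketch.
            throw new IllegalArgumentException("degree must be >= 1 and < number of points");
        }
        int k = degree + 1;
        double[][] a = new double[k][k + 1];  // augmented normal equations
        for (int r = 0; r < x.length; r++) {
            double[] pow = new double[k];
            pow[0] = 1.0;
            for (int i = 1; i < k; i++) {
                pow[i] = pow[i - 1] * x[r];
            }
            for (int i = 0; i < k; i++) {
                for (int j = 0; j < k; j++) {
                    a[i][j] += pow[i] * pow[j];
                }
                a[i][k] += pow[i] * y[r];
            }
        }
        // Gauss-Jordan elimination with partial pivoting: reduce to a diagonal system.
        for (int col = 0; col < k; col++) {
            int pivot = col;
            for (int r = col + 1; r < k; r++) {
                if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
            }
            double[] tmp = a[col];
            a[col] = a[pivot];
            a[pivot] = tmp;
            if (Math.abs(a[col][col]) < 1e-12) {
                throw new IllegalArgumentException("not enough distinct x values for this degree");
            }
            for (int r = 0; r < k; r++) {
                if (r == col) continue;
                double factor = a[r][col] / a[col][col];
                for (int c = col; c <= k; c++) {
                    a[r][c] -= factor * a[col][c];
                }
            }
        }
        double[] coeffs = new double[k];
        for (int i = 0; i < k; i++) {
            coeffs[i] = a[i][k] / a[i][i];
        }
        return coeffs;
    }

    public static void main(String[] args) {
        double[] x = {-2, -1, 0, 1, 2};
        double[] y = {9, 2, 1, 6, 17};
        System.out.println(java.util.Arrays.toString(fit(x, y, 2)));
    }
}
```

High degrees make the power sums badly conditioned, so keeping the degree low or rescaling x first is usually advisable.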
-
logisticRegression
Performs logistic regression analysis for binary classification.
- Parameters:
  df - DataFrame containing the data
  xCol - Independent variable column name
  yCol - Dependent variable column name (should contain binary values)
- Returns:
  DataFrame containing logistic regression coefficients
- Throws:
  IllegalArgumentException - if columns are not numeric or y is not binary
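For a single predictor, logistic regression models P(y = 1) as sigmoid(intercept + slope * x). The sketch below fits the two coefficients with batch gradient descent on the mean log-loss. The fitting algorithm the library uses is not documented here, so gradient descent, the learning rate and iteration count parameters, and the name LogisticRegressionSketch are assumptions chosen for illustration.

```java
final class LogisticRegressionSketch {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    /** Returns {intercept, slope} fitted by batch gradient descent on the mean log-loss. */
    static double[] fit(double[] x, double[] y, double learningRate, int iterations) {
        if (x.length == 0 || x.length != y.length) {
            throw new IllegalArgumentException("x and y must have the same non-zero length");
        }
        for (double label : y) {
            if (label != 0.0 && label != 1.0) {
                throw new IllegalArgumentException("y must contain only 0 and 1");
            }
        }
        int n = x.length;
        double intercept = 0.0, slope = 0.0;
        for (int it = 0; it < iterations; it++) {
            double gradIntercept = 0.0, gradSlope = 0.0;
            for (int i = 0; i < n; i++) {
                // Gradient of the log-loss: (predicted probability - label) times the input.
                double error = sigmoid(intercept + slope * x[i]) - y[i];
                gradIntercept += error;
                gradSlope += error * x[i];
            }
            intercept -= learningRate * gradIntercept / n;
            slope -= learningRate * gradSlope / n;
        }
        return new double[] {intercept, slope};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4, 5, 6};
        double[] y = {0, 0, 1, 0, 1, 1};
        double[] coef = fit(x, y, 0.1, 20000);
        System.out.println("intercept = " + coef[0] + ", slope = " + coef[1]);
        System.out.println("P(y = 1 | x = 6) = " + sigmoid(coef[0] + coef[1] * 6));
    }
}
```

The example data overlaps on purpose: with perfectly separable classes the maximum-likelihood coefficients grow without bound, so an unregularized fit never settles.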