Interface DataController
- All Known Implementing Classes:
Woozydata
public interface DataController
Interface that defines operations for data manipulation, analysis, and statistical computations
similar to popular data analysis libraries like pandas.
-
Method Summary
Modifier and Type | Method | Description
 | analyze | Performs basic analysis on a specified column.
 | anova | Performs one-way ANOVA test.
double | avg | Calculates the average of a column.
 | bin | Bins continuous data into discrete intervals.
int[] | binomialDist(int trials, double prob, int size) | Generates binomial distribution samples.
 | chiSquare | Performs chi-square test of independence.
 | clean() | Performs general data cleaning operations.
 | concat | Concatenates current DataFrame with another DataFrame.
 | convert | Converts column types according to the provided type map.
double | correl | Calculates correlation between two columns.
 | correlation(String... columns) | Calculates correlation matrix for specified columns.
long | count | Counts non-null values in a column.
double | cov | Calculates the covariance between two columns.
 | decompose | Decomposes time series into components.
 | describe | Provides descriptive statistics for a column.
 | detectOutliers(String timeCol, String valueCol) | Detects outliers in time series data.
 | dropDupes | Removes duplicate rows based on specified columns.
 | dropNa() | Removes rows with null values.
 | dummies | Creates dummy/indicator variables.
double[] | ema(double[] data, double alpha) | Calculates Exponential Moving Average.
 | fillNa | Fills null values with a specified value.
 | fillNaColumns(Object value, String... columns) | Fills null values in specified columns with a given value.
 | forecast | Forecasts future values using time series analysis.
 | frequency | Calculates frequency distribution for a column.
 | fromCsv | Reads data from a CSV file and creates a DataFrame.
 | fromJson | Reads data from a JSON file and creates a DataFrame.
 | fromMongo | Reads data from a MongoDB collection and creates a DataFrame.
 | fromXlsx | Reads data from an Excel (XLSX) file and creates a DataFrame.
 | fullReport(String... columns) | Generates a comprehensive analysis report for specified columns.
 | groupBy | Groups DataFrame by specified columns.
 | interpolate(String method, String... columns) | Interpolates missing values using specified method.
double | iqr | Calculates the interquartile range for a column.
double | kurt | Calculates the kurtosis of a column.
double[] | linearReg | Performs simple linear regression.
 | logisticReg(String xCol, String yCol) | Performs logistic regression.
 | mannWhitney(String col1, String col2) | Performs Mann-Whitney U test.
double | max | Finds the maximum value in a column.
double | mean | Calculates the mean of a column.
double | median | Calculates the median of a column.
 | melt | Unpivots DataFrame from wide to long format.
 | merge | Merges current DataFrame with another DataFrame.
double | min | Finds the minimum value in a column.
 | missingAnalysis | Analyzes missing values in the DataFrame.
 | multipleReg(String[] xCols, String yCol) | Performs multiple linear regression.
double | normalCdf(double x, double mean, double std) | Calculates normal cumulative distribution function value.
double[] | normalDist(int size, double mean, double std) | Generates normal distribution samples.
 | normalize | Normalizes specified columns to [0,1] range.
double | normalPdf(double x, double mean, double std) | Calculates normal probability density function value.
 | outlierAnalysis(String... columns) | Identifies and analyzes outliers in specified columns.
 | pivot | Creates a pivot table from the DataFrame.
double[] | poissonDist(double lambda, int size) | Generates Poisson distribution samples.
 | polynomialReg(String xCol, String yCol, int degree) | Performs polynomial regression.
double | quantile | Calculates the quantile value for a column.
 | quickAnalysis(String... columns) | Performs quick exploratory data analysis on specified columns.
 | reshape(int rows, int cols) | Reshapes the DataFrame to specified dimensions.
 | rollingWindow(String column, int window, String func) | Applies function over rolling window.
double | rsquared | Calculates R-squared value for linear regression.
 | sample(int n) | Creates a random sample of rows from the DataFrame.
 | seasonalAdjust(String timeCol, String valueCol) | Performs seasonal adjustment on time series.
 | select | Selects specified columns from DataFrame.
 | shapiroWilk(String column) | Performs Shapiro-Wilk normality test.
double | skew | Calculates the skewness of a column.
double[] | sma(double[] data, int window) | Calculates Simple Moving Average.
 | sort | Sorts DataFrame by specified columns.
 | standardize(String... columns) | Standardizes specified columns (z-score normalization).
 | stats | Calculates basic statistical measures for a column.
double | stdv | Calculates the standard deviation of a column.
double | sum | Calculates the sum of a column.
 | timeAnalysis(String dateCol, String valueCol) | Performs time-based analysis on a datetime column and corresponding value column.
void | toCsv | Exports DataFrame to CSV file.
void | toExcel | Exports DataFrame to Excel file.
void | toHtml | Exports DataFrame to HTML format.
void | toJson | Exports DataFrame to JSON file.
void | toLatex | Exports DataFrame to LaTeX format.
void | toPowerBI | Exports DataFrame to PowerBI format.
 | tTest | Performs t-test between two columns.
double[] | uniformDist(int size, double min, double max) | Generates uniform distribution samples.
double | vars | Calculates the variance of a column.
-
Method Details
-
fromCsv
Reads data from a CSV file and creates a DataFrame.
- Parameters:
filePath - Path to the CSV file
- Returns:
DataFrame containing the data from the CSV file
- Throws:
Exception - If there's an error reading the file
-
fromXlsx
Reads data from an Excel (XLSX) file and creates a DataFrame.
- Parameters:
filePath - Path to the Excel file
- Returns:
DataFrame containing the data from the Excel file
- Throws:
Exception - If there's an error reading the file
-
fromJson
Reads data from a JSON file and creates a DataFrame.
- Parameters:
filePath - Path to the JSON file
- Returns:
DataFrame containing the data from the JSON file
- Throws:
Exception - If there's an error reading the file
-
fromMongo
Reads data from a MongoDB collection and creates a DataFrame.
- Parameters:
connectionString - MongoDB connection string
dbName - Database name
collection - Collection name
- Returns:
DataFrame containing the data from MongoDB
-
analyze
Performs basic analysis on a specified column.
- Parameters:
column - Name of the column to analyze
- Returns:
DataFrame containing analysis results
-
stats
Calculates basic statistical measures for a column.
- Parameters:
column - Name of the column
- Returns:
Map containing statistical measures
-
mean
Calculates the mean of a column.
- Parameters:
column - Name of the column
- Returns:
Mean value
-
median
Calculates the median of a column.
- Parameters:
column - Name of the column
- Returns:
Median value
-
stdv
Calculates the standard deviation of a column.
- Parameters:
column - Name of the column
- Returns:
Standard deviation value
-
vars
Calculates the variance of a column.
- Parameters:
column - Name of the column
- Returns:
Variance value
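The quantities returned by mean, median, stdv, and vars can be sketched in plain Java as follows. This is only an illustration over a double[] and not the library's implementation: the class DescriptiveSketch and its method names are hypothetical, and the use of the sample variance (divisor n - 1) is an assumption, since the page does not say which form is used.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of DataController.
final class DescriptiveSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    static double median(double[] xs) {
        double[] sorted = xs.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        int n = sorted.length;
        return (n % 2 == 1)
                ? sorted[n / 2]
                : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    }

    // Assumption: sample variance (divisor n - 1); needs at least two values.
    static double variance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    static double stdDev(double[] xs) {
        return Math.sqrt(variance(xs));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.printf("mean=%.3f median=%.3f var=%.3f std=%.3f%n",
                mean(data), median(data), variance(data), stdDev(data));
    }
}
```

If the population form is wanted instead, the divisor in variance would be n rather than n - 1.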
-
skew
Calculates the skewness of a column.
- Parameters:
column - Name of the column
- Returns:
Skewness value
-
kurt
Calculates the kurtosis of a column.
- Parameters:
column - Name of the column
- Returns:
Kurtosis value
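Skewness and kurtosis are commonly defined through central moments. The sketch below uses the population (divisor n) moments and reports excess kurtosis (kurtosis minus 3); both choices are assumptions, because the page does not state which estimator skew and kurt use. ShapeSketch and its methods are hypothetical names.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class ShapeSketch {

    // k-th central moment with divisor n (population form).
    static double centralMoment(double[] xs, int k) {
        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double acc = 0.0;
        for (double x : xs) acc += Math.pow(x - mean, k);
        return acc / xs.length;
    }

    // Skewness g1 = m3 / m2^(3/2).
    static double skewness(double[] xs) {
        double m2 = centralMoment(xs, 2);
        return centralMoment(xs, 3) / Math.pow(m2, 1.5);
    }

    // Excess kurtosis g2 = m4 / m2^2 - 3 (zero for a normal distribution).
    static double excessKurtosis(double[] xs) {
        double m2 = centralMoment(xs, 2);
        return centralMoment(xs, 4) / (m2 * m2) - 3.0;
    }
}
```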
-
cov
Calculates the covariance between two columns.
- Parameters:
col1 - Name of the first column
col2 - Name of the second column
- Returns:
Covariance value
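As a rough illustration of what a covariance between two columns means, the following sketch computes the sample covariance of two equal-length arrays. The class name CovarianceSketch is hypothetical, and the n - 1 divisor is an assumption rather than documented behavior of cov.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class CovarianceSketch {

    // Sample covariance: sum((x - meanX) * (y - meanY)) / (n - 1).
    static double covariance(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double acc = 0.0;
        for (int i = 0; i < n; i++) acc += (x[i] - meanX) * (y[i] - meanY);
        return acc / (n - 1);
    }
}
```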
-
clean
DataFrame clean()
Performs general data cleaning operations.
- Returns:
Cleaned DataFrame
-
dropNa
DataFrame dropNa()
Removes rows with null values.
- Returns:
DataFrame with null values removed
-
dropDupes
Removes duplicate rows based on specified columns.
- Parameters:
columns - Column names to check for duplicates
- Returns:
DataFrame with duplicates removed
-
fillNa
Fills null values with a specified value.
- Parameters:
value - Value to fill nulls with
- Returns:
DataFrame with filled values
-
fillNaColumns
Fills null values in specified columns with a given value.
- Parameters:
value - Value to fill nulls with
columns - Columns to fill
- Returns:
DataFrame with filled values
-
convert
Converts column types according to the provided type map.
- Parameters:
typeMap - Map of column names to their target types
- Returns:
DataFrame with converted types
-
interpolate
Interpolates missing values using specified method.
- Parameters:
method - Interpolation method to use
columns - Columns to interpolate
- Returns:
DataFrame with interpolated values
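The page does not list the accepted values of method. As one plausible example, linear interpolation over a single numeric column could look like the sketch below. It assumes missing values are encoded as NaN, and the class InterpolationSketch and method linearFill are hypothetical names, not part of the documented API.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class InterpolationSketch {

    // Fills interior runs of NaN by drawing a straight line between the
    // nearest known neighbours. Leading and trailing NaNs are left as-is.
    static double[] linearFill(double[] xs) {
        double[] out = xs.clone();
        int prev = -1;                       // index of the last known value
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) continue;
            if (prev >= 0 && i - prev > 1) {
                double step = (out[i] - out[prev]) / (i - prev);
                for (int j = prev + 1; j < i; j++) {
                    out[j] = out[prev] + step * (j - prev);
                }
            }
            prev = i;
        }
        return out;
    }
}
```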
-
standardize
Standardizes specified columns (z-score normalization).
- Parameters:
columns - Columns to standardize
- Returns:
DataFrame with standardized values
-
normalize
Normalizes specified columns to [0,1] range.
- Parameters:
columns - Columns to normalize
- Returns:
DataFrame with normalized values
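The difference between standardize and normalize is easiest to see on a single column. The sketch below shows z-score scaling and min-max scaling on a double[]; ScalingSketch is a hypothetical class, the sample standard deviation (divisor n - 1) is an assumption, and mapping a constant column to zeros is a design choice made here, not documented behavior.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class ScalingSketch {

    // z-score: (x - mean) / sd, using the sample standard deviation.
    // Expects at least two values.
    static double[] zScore(double[] xs) {
        int n = xs.length;
        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= n;
        double ss = 0.0;
        for (double x : xs) ss += (x - mean) * (x - mean);
        double sd = Math.sqrt(ss / (n - 1));
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = (sd == 0.0) ? 0.0 : (xs[i] - mean) / sd;
        }
        return out;
    }

    // min-max: (x - min) / (max - min), which maps the column onto [0, 1].
    static double[] minMax(double[] xs) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double range = max - min;
        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            out[i] = (range == 0.0) ? 0.0 : (xs[i] - min) / range;
        }
        return out;
    }
}
```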
-
toCsv
Exports DataFrame to CSV file.
- Parameters:
filePath - Path where to save the CSV file
- Throws:
Exception - If there's an error writing the file
-
toJson
Exports DataFrame to JSON file.
- Parameters:
filePath - Path where to save the JSON file
- Throws:
Exception - If there's an error writing the file
-
toExcel
Exports DataFrame to Excel file.
- Parameters:
filePath - Path where to save the Excel file
- Throws:
Exception - If there's an error writing the file
-
toPowerBI
Exports DataFrame to PowerBI format.
- Parameters:
filePath - Path where to save the PowerBI file
- Throws:
Exception - If there's an error writing the file
-
toHtml
Exports DataFrame to HTML format.
- Parameters:
filePath - Path where to save the HTML file
- Throws:
Exception - If there's an error writing the file
-
toLatex
Exports DataFrame to LaTeX format.
- Parameters:
filePath - Path where to save the LaTeX file
- Throws:
Exception - If there's an error writing the file
-
sum
Calculates the sum of a column.
- Parameters:
column - Column name
- Returns:
Sum value
-
avg
Calculates the average of a column.
- Parameters:
column - Column name
- Returns:
Average value
-
count
Counts non-null values in a column.
- Parameters:
column - Column name
- Returns:
Count of non-null values
-
min
Finds the minimum value in a column.
- Parameters:
column - Column name
- Returns:
Minimum value
-
max
Finds the maximum value in a column.
- Parameters:
column - Column name
- Returns:
Maximum value
-
describe
Provides descriptive statistics for a column.
- Parameters:
column - Column name
- Returns:
Map of descriptive statistics
-
quantile
Calculates the quantile value for a column.
- Parameters:
column - Column name
q - Quantile value (0-1)
- Returns:
Quantile value
-
iqr
Calculates the interquartile range for a column.
- Parameters:
column - Column name
- Returns:
IQR value
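Several quantile definitions exist, and the page does not say which one quantile uses. The sketch below assumes linear interpolation between the two nearest order statistics (the default in pandas and NumPy) and derives the interquartile range from it. QuantileSketch is a hypothetical class name.

```java
import java.util.Arrays;

// Hypothetical helper class for illustration; not part of DataController.
final class QuantileSketch {

    // Quantile with linear interpolation: position q * (n - 1) in the sorted data.
    static double quantile(double[] xs, double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be in [0, 1]");
        }
        double[] sorted = xs.clone();
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        double frac = pos - lo;
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    // Interquartile range: Q3 - Q1.
    static double iqr(double[] xs) {
        return quantile(xs, 0.75) - quantile(xs, 0.25);
    }
}
```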
-
frequency
Calculates frequency distribution for a column.
- Parameters:
column - Column name
- Returns:
Map of values to their frequencies
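A map of values to their frequencies can be built in a single pass. The sketch below does this for a list of strings; FrequencySketch is a hypothetical class, and both skipping nulls and the Map<String, Long> result type are assumptions, since the page does not show the map's type parameters.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class for illustration; not part of DataController.
final class FrequencySketch {

    // Counts how often each non-null value occurs; keys come back sorted.
    static Map<String, Long> frequency(List<String> values) {
        Map<String, Long> counts = new TreeMap<>();
        for (String v : values) {
            if (v == null) continue;          // assumption: nulls are ignored
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }
}
```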
-
normalDist
double[] normalDist(int size, double mean, double std)
Generates normal distribution samples.
- Parameters:
size - Number of samples
mean - Mean of the distribution
std - Standard deviation
- Returns:
Array of samples
-
normalPdf
double normalPdf(double x, double mean, double std)
Calculates normal probability density function value.
- Parameters:
x - Input value
mean - Mean of the distribution
std - Standard deviation
- Returns:
PDF value
-
normalCdf
double normalCdf(double x, double mean, double std)
Calculates normal cumulative distribution function value.
- Parameters:
x - Input value
mean - Mean of the distribution
std - Standard deviation
- Returns:
CDF value
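For reference, the normal density and cumulative distribution can be evaluated with the JDK alone. The sketch below mirrors the (x, mean, std) parameter order of normalPdf and normalCdf, but NormalSketch is a hypothetical class and nothing here is taken from the library's implementation. Because java.lang.Math has no erf, the CDF relies on the Abramowitz and Stegun approximation 7.1.26, whose absolute error is about 1.5e-7.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class NormalSketch {

    // Density: exp(-z^2 / 2) / (std * sqrt(2 * pi)), with z = (x - mean) / std.
    static double pdf(double x, double mean, double std) {
        double z = (x - mean) / std;
        return Math.exp(-0.5 * z * z) / (std * Math.sqrt(2.0 * Math.PI));
    }

    // Error function via Abramowitz & Stegun formula 7.1.26 (approximation).
    static double erf(double x) {
        double sign = (x < 0) ? -1.0 : 1.0;
        double ax = Math.abs(x);
        double t = 1.0 / (1.0 + 0.3275911 * ax);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return sign * (1.0 - poly * Math.exp(-ax * ax));
    }

    // CDF: 0.5 * (1 + erf((x - mean) / (std * sqrt(2)))).
    static double cdf(double x, double mean, double std) {
        return 0.5 * (1.0 + erf((x - mean) / (std * Math.sqrt(2.0))));
    }
}
```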
-
binomialDist
int[] binomialDist(int trials, double prob, int size)
Generates binomial distribution samples.
- Parameters:
trials - Number of trials
prob - Success probability
size - Number of samples
- Returns:
Array of samples
-
poissonDist
double[] poissonDist(double lambda, int size)
Generates Poisson distribution samples.
- Parameters:
lambda - Rate parameter
size - Number of samples
- Returns:
Array of samples
-
uniformDist
double[] uniformDist(int size, double min, double max)
Generates uniform distribution samples.
- Parameters:
size - Number of samples
min - Minimum value
max - Maximum value
- Returns:
Array of samples
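Sample generators of this kind can be built on java.util.Random. The sketch below is one possible approach, not the library's: SamplingSketch is a hypothetical class, and the extra Random argument is an assumption added here so that results are reproducible with a fixed seed (the documented signatures take no such argument).

```java
import java.util.Random;

// Hypothetical helper class for illustration; not part of DataController.
final class SamplingSketch {

    // Normal samples: shift and scale standard Gaussian draws.
    static double[] normal(int size, double mean, double std, Random rng) {
        double[] out = new double[size];
        for (int i = 0; i < size; i++) out[i] = mean + std * rng.nextGaussian();
        return out;
    }

    // Uniform samples on [min, max).
    static double[] uniform(int size, double min, double max, Random rng) {
        double[] out = new double[size];
        for (int i = 0; i < size; i++) out[i] = min + (max - min) * rng.nextDouble();
        return out;
    }

    // Binomial samples: count successes over `trials` Bernoulli draws.
    static int[] binomial(int trials, double prob, int size, Random rng) {
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            int successes = 0;
            for (int t = 0; t < trials; t++) {
                if (rng.nextDouble() < prob) successes++;
            }
            out[i] = successes;
        }
        return out;
    }
}
```

Counting Bernoulli draws costs O(trials) per sample, which is fine for small trial counts; larger ones would call for a different algorithm.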
-
correl
Calculates correlation between two columns.
- Parameters:
col1 - First column name
col2 - Second column name
- Returns:
Correlation coefficient
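The page does not name the correlation coefficient that correl returns. Assuming the usual Pearson coefficient, a minimal sketch over two arrays looks like this; CorrelationSketch is a hypothetical class name.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class CorrelationSketch {

    // Pearson r = Sxy / sqrt(Sxx * Syy), computed from deviations about the means.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two arrays of equal length >= 2");
        }
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

The result lies in [-1, 1] and is NaN when either input is constant, because the denominator is then zero.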
-
linearReg
Performs simple linear regression.
- Parameters:
xCol - Independent variable column
yCol - Dependent variable column
- Returns:
Array containing slope and intercept
-
rsquared
Calculates R-squared value for linear regression.
- Parameters:
xCol - Independent variable column
yCol - Dependent variable column
- Returns:
R-squared value
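Simple linear regression and its R-squared value follow directly from the least-squares formulas. The sketch below returns the coefficients as {slope, intercept}, matching the order given in the linearReg description; RegressionSketch and its method names are hypothetical, and ordinary least squares is assumed.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class RegressionSketch {

    // Ordinary least squares fit of y = slope * x + intercept.
    // Returns {slope, intercept}.
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[]{slope, meanY - slope * meanX};
    }

    // R^2 = 1 - SS_res / SS_tot for the fitted line.
    static double rSquared(double[] x, double[] y) {
        double[] coef = fit(x, y);
        double meanY = 0.0;
        for (double v : y) meanY += v;
        meanY /= y.length;
        double ssRes = 0.0, ssTot = 0.0;
        for (int i = 0; i < x.length; i++) {
            double predicted = coef[0] * x[i] + coef[1];
            ssRes += (y[i] - predicted) * (y[i] - predicted);
            ssTot += (y[i] - meanY) * (y[i] - meanY);
        }
        return 1.0 - ssRes / ssTot;
    }
}
```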
-
multipleReg
Performs multiple linear regression.
- Parameters:
xCols - Independent variable columns
yCol - Dependent variable column
- Returns:
DataFrame with regression results
-
polynomialReg
Performs polynomial regression.
- Parameters:
xCol - Independent variable column
yCol - Dependent variable column
degree - Polynomial degree
- Returns:
DataFrame with regression results
-
logisticReg
Performs logistic regression.
- Parameters:
xCol - Independent variable column
yCol - Dependent variable column
- Returns:
DataFrame with regression results
-
tTest
Performs t-test between two columns.
- Parameters:
col1 - First column name
col2 - Second column name
- Returns:
Map containing test results
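The page does not say which t-test variant tTest runs or which keys its result map holds. As one common choice, the sketch below computes Welch's two-sample statistic and its approximate degrees of freedom; this variant is an assumption, and TTestSketch is a hypothetical class. A p-value would additionally need the Student t CDF, which the JDK does not provide, so it is left out.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class TTestSketch {

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample variance (divisor n - 1).
    static double sampleVariance(double[] xs) {
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    // Welch's t-test. Returns {t statistic, Welch-Satterthwaite degrees of freedom}.
    static double[] welch(double[] a, double[] b) {
        double va = sampleVariance(a) / a.length;   // squared standard error of mean(a)
        double vb = sampleVariance(b) / b.length;   // squared standard error of mean(b)
        double t = (mean(a) - mean(b)) / Math.sqrt(va + vb);
        double df = (va + vb) * (va + vb)
                / (va * va / (a.length - 1) + vb * vb / (b.length - 1));
        return new double[]{t, df};
    }
}
```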
-
anova
Performs one-way ANOVA test.
- Parameters:
columns - Column names to compare
- Returns:
Map containing test results
-
chiSquare
Performs chi-square test of independence.
- Parameters:
col1 - First column name
col2 - Second column name
- Returns:
Map containing test results
-
shapiroWilk
Performs Shapiro-Wilk normality test.
- Parameters:
column - Column name
- Returns:
Map containing test results
-
mannWhitney
Performs Mann-Whitney U test.
- Parameters:
col1 - First column name
col2 - Second column name
- Returns:
Map containing test results
-
sma
double[] sma(double[] data, int window)
Calculates Simple Moving Average.
- Parameters:
data - Input data array
window - Window size
- Returns:
Array of SMA values
-
ema
double[] ema(double[] data, double alpha)
Calculates Exponential Moving Average.
- Parameters:
data - Input data array
alpha - Smoothing factor
- Returns:
Array of EMA values
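The two moving averages can be sketched as below. The parameter lists match the documented sma and ema signatures, but the details are assumptions: this SMA emits one value per full window (so the result has data.length - window + 1 entries), and this EMA is seeded with the first data point. The library may pad or seed differently. MovingAverageSketch is a hypothetical class.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class MovingAverageSketch {

    // Simple moving average over each full window, using a running sum.
    static double[] sma(double[] data, int window) {
        if (window < 1 || window > data.length) {
            throw new IllegalArgumentException("window must be in [1, data.length]");
        }
        double[] out = new double[data.length - window + 1];
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            if (i >= window) sum -= data[i - window];   // drop the value leaving the window
            if (i >= window - 1) out[i - window + 1] = sum / window;
        }
        return out;
    }

    // Exponential moving average: ema[i] = alpha * data[i] + (1 - alpha) * ema[i - 1].
    static double[] ema(double[] data, double alpha) {
        double[] out = new double[data.length];
        if (data.length == 0) return out;
        out[0] = data[0];                               // seed with the first observation
        for (int i = 1; i < data.length; i++) {
            out[i] = alpha * data[i] + (1.0 - alpha) * out[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] data = {1, 2, 3, 4, 5};
        System.out.println(java.util.Arrays.toString(sma(data, 3)));
        System.out.println(java.util.Arrays.toString(ema(data, 0.5)));
    }
}
```

A larger alpha weights recent observations more heavily; alpha = 1 reproduces the input unchanged.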
-
forecast
Forecasts future values using time series analysis.
- Parameters:
timeCol - Time column name
valueCol - Value column name
periods - Number of periods to forecast
- Returns:
DataFrame with forecasted values
-
decompose
Decomposes time series into components.
- Parameters:
timeCol - Time column name
valueCol - Value column name
- Returns:
DataFrame with decomposition components
-
seasonalAdjust
Performs seasonal adjustment on time series.
- Parameters:
timeCol - Time column name
valueCol - Value column name
- Returns:
DataFrame with adjusted values
-
detectOutliers
Detects outliers in time series data.
- Parameters:
timeCol - Time column name
valueCol - Value column name
- Returns:
DataFrame with outlier information
-
pivot
Creates a pivot table from the DataFrame.
- Parameters:
index - Index column name
columns - Column names for pivot
values - Values column name
- Returns:
Pivoted DataFrame
-
melt
Unpivots DataFrame from wide to long format.
- Parameters:
idVars - Columns to use as identifier variables
valueVars - Columns to unpivot
- Returns:
Melted DataFrame
-
dummies
Creates dummy/indicator variables.
- Parameters:
columns - Columns to convert to dummy variables
- Returns:
DataFrame with dummy variables
-
bin
Bins continuous data into discrete intervals.
- Parameters:
column - Column to bin
bins - Number of bins
- Returns:
DataFrame with binned data
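The page does not say how bin chooses its interval boundaries. Assuming equal-width intervals between the column minimum and maximum, the assignment of values to bin indices could be sketched as follows; BinSketch is a hypothetical class, and placing the maximum value in the last bin is a design choice made here.

```java
// Hypothetical helper class for illustration; not part of DataController.
final class BinSketch {

    // Assigns each value to one of `bins` equal-width intervals over [min, max].
    // Returns zero-based bin indices; the maximum falls into the last bin.
    static int[] equalWidth(double[] xs, int bins) {
        if (bins < 1) throw new IllegalArgumentException("bins must be >= 1");
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double width = (max - min) / bins;
        int[] out = new int[xs.length];
        for (int i = 0; i < xs.length; i++) {
            if (width == 0.0) {
                out[i] = 0;                             // constant column: single bin
            } else {
                out[i] = Math.min((int) ((xs[i] - min) / width), bins - 1);
            }
        }
        return out;
    }
}
```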
-
rollingWindow
Applies function over rolling window.
- Parameters:
column - Column name
window - Window size
func - Function to apply
- Returns:
DataFrame with rolling window calculations
-
groupBy
Groups DataFrame by specified columns.
- Parameters:
columns - Columns to group by
- Returns:
Grouped DataFrame
-
sort
Sorts DataFrame by specified columns.
- Parameters:
columns - Columns to sort by
- Returns:
Sorted DataFrame
-
select
Selects specified columns from DataFrame.
- Parameters:
columns - Columns to select
- Returns:
DataFrame containing only the selected columns
-
sample
Creates a random sample of rows from the DataFrame.
- Parameters:
n - Number of rows to sample
- Returns:
DataFrame containing the sampled rows
-
merge
Merges current DataFrame with another DataFrame.
- Parameters:
other - DataFrame to merge with
how - Type of merge ('inner', 'outer', 'left', 'right')
on - Columns to merge on
- Returns:
Merged DataFrame
-
concat
Concatenates current DataFrame with another DataFrame.
- Parameters:
other - DataFrame to concatenate
axis - If true, concatenate along columns; if false, along rows
- Returns:
Concatenated DataFrame
-
reshape
Reshapes the DataFrame to specified dimensions.
- Parameters:
rows - Number of rows in reshaped DataFrame
cols - Number of columns in reshaped DataFrame
- Returns:
Reshaped DataFrame
-
quickAnalysis
Performs quick exploratory data analysis on specified columns.
- Parameters:
columns - Columns to analyze
- Returns:
DataFrame containing analysis results including basic statistics, distribution information, and potential anomalies
-
fullReport
Generates a comprehensive analysis report for specified columns.
- Parameters:
columns - Columns to analyze
- Returns:
Map containing detailed analysis results including statistical tests, visualizations, and data quality metrics
-
timeAnalysis
Performs time-based analysis on a datetime column and corresponding value column.
- Parameters:
dateCol - Column containing datetime values
valueCol - Column containing values to analyze
- Returns:
DataFrame with time-based analysis results including trends, seasonality, and temporal patterns
-
correlation
Calculates correlation matrix for specified columns.
- Parameters:
columns - Columns to include in correlation analysis
- Returns:
DataFrame containing correlation matrix with correlation coefficients between all pairs of specified columns
-
missingAnalysis
DataFrame missingAnalysis()
Analyzes missing values in the DataFrame.
- Returns:
DataFrame containing missing value analysis including counts, percentages, and patterns of missing data
-
outlierAnalysis
Identifies and analyzes outliers in specified columns.
- Parameters:
columns - Columns to check for outliers
- Returns:
DataFrame containing outlier analysis results including identified outliers, their impact, and statistical justification
-