Package com.leumanuel.woozydata
Class Woozydata
java.lang.Object
com.leumanuel.woozydata.Woozydata
- All Implemented Interfaces:
DataController
*WoozyData Library*
A comprehensive Java library for data analysis, providing a unified interface for
data manipulation, statistical analysis, and machine learning operations.
Key features include:
- Data loading from multiple sources (CSV, Excel, JSON, MongoDB)
- Statistical analysis and data cleaning
- Time series analysis
- Data visualization and export capabilities
- Machine learning operations
- Version:
- 1.0
- Author:
- Leu A. Manuel, github.com/Leupesquisa
-
Constructor Summary
Constructors
Constructor: Description
- Woozydata(): Initializes a new instance of Woozydata with all required analysis services.
Method Summary
Modifier and Type, Method: Description
- analyze: Analyzes a specific column for statistical measures and patterns.
- anova: Performs one-way ANOVA test.
- double avg: Calculates the average of a column.
- bin: Bins continuous data into discrete intervals.
- int[] binomialDist(int trials, double prob, int size): Generates binomial distribution samples.
- chiSquare: Performs chi-square test of independence.
- clean(): Performs automatic data cleaning on the current DataFrame.
- concat: Concatenates current DataFrame with another DataFrame.
- convert: Converts column data types according to the specified type map.
- double correl: Calculates correlation between two columns.
- correlation(String... columns): Calculates correlation matrix for specified columns.
- long count: Calculates basic count of non-null values in a column.
- double cov: Calculates the covariance between two columns.
- decompose: Decomposes time series into components.
- describe: Generates comprehensive statistical analysis of a column.
- detectOutliers(String timeCol, String valueCol): Detects outliers in time series data.
- dropDupes: Removes duplicate rows based on specified columns.
- dropNa(): Removes rows containing null values from the DataFrame.
- dummies: Creates dummy/indicator variables.
- double[] ema(double[] data, double alpha): Calculates Exponential Moving Average.
- fillNa: Fills null values in all columns with a specified value.
- fillNaColumns(Object value, String... columns): Fills null values in specified columns with a given value.
- forecast: Forecasts future values using time series analysis.
- frequency: Calculates frequency distribution for a column.
- fromCsv: Loads data from a CSV file into a DataFrame.
- fromJson: Loads data from a JSON file into a DataFrame.
- fromMongo: Connects to MongoDB and loads data from a collection into a DataFrame.
- fromXlsx: Loads data from an Excel file into a DataFrame.
- fullReport(String... columns): Generates a full statistical report.
- groupBy: Groups DataFrame by specified columns.
- interpolate(String method, String... columns): Interpolates missing values using specified method.
- double iqr: Calculates the Interquartile Range (IQR) of a column.
- double kurt: Calculates the kurtosis of a numeric column.
- double[] linearReg: Performs simple linear regression.
- logisticReg(String xCol, String yCol): Performs logistic regression.
- mannWhitney(String col1, String col2): Performs Mann-Whitney U test.
- double max: Finds the maximum value in a numeric column.
- double mean: Calculates the arithmetic mean of a numeric column.
- double median: Calculates the median value of a numeric column.
- melt: Reshapes data from wide to long format.
- merge: Merges current DataFrame with another DataFrame.
- double min: Finds the minimum value in a column.
- missingAnalysis: Analyzes missing values in the DataFrame.
- multipleReg(String[] xCols, String yCol): Performs multiple linear regression.
- double normalCdf(double x, double mean, double std): Calculates normal cumulative distribution function value.
- double[] normalDist(int size, double mean, double std): Generates normal distribution samples.
- normalize: Normalizes specified columns to range [0,1].
- double normalPdf(double x, double mean, double std): Calculates normal probability density function value.
- outlierAnalysis(String... columns): Performs quick comprehensive analysis of specified columns.
- pivot: Creates a pivot table from the DataFrame.
- double[] poissonDist(double lambda, int size): Generates Poisson distribution samples.
- polynomialReg(String xCol, String yCol, int degree): Performs polynomial regression.
- double quantile: Calculates quantile value for a numeric column.
- quickAnalysis(String... columns): Performs quick exploratory data analysis on specified columns.
- reshape(int rows, int cols): Reshapes the DataFrame to specified dimensions.
- rollingWindow(String column, int window, String func): Applies function over rolling window.
- double rsquared: Calculates R-squared value for linear regression.
- sample(int n): Creates a random sample of rows from the DataFrame.
- seasonalAdjust(String timeCol, String valueCol): Performs seasonal adjustment on time series.
- select: Selects specified columns from DataFrame.
- shapiroWilk(String column): Performs Shapiro-Wilk normality test.
- double skew: Calculates the skewness of a numeric column.
- double[] sma(double[] data, int window): Calculates Simple Moving Average (SMA) for time series data.
- sort: Sorts DataFrame by specified columns.
- standardize(String... columns): Standardizes specified columns using z-score normalization.
- stats: Calculates basic statistical measures for a column.
- double stdv: Calculates the standard deviation of a numeric column.
- double sum: Calculates the sum of a column.
- timeAnalysis(String dateCol, String valueCol): Performs time-based analysis on a datetime column and corresponding value column.
- void toCsv: Exports DataFrame to CSV file.
- void toExcel: Exports DataFrame to Excel format.
- void toHtml: Exports DataFrame to HTML format.
- void toJson: Exports DataFrame to JSON format.
- void toLatex: Exports DataFrame to LaTeX format for academic papers.
- void toPowerBI: Exports DataFrame to PowerBI format.
- tTest: Performs t-test between two columns.
- double[] uniformDist(int size, double min, double max): Generates uniform distribution samples.
- double vars: Calculates the variance of a numeric column.
-
Constructor Details
-
Woozydata
public Woozydata()
Initializes a new instance of Woozydata with all required analysis services. This constructor sets up all necessary services for data analysis operations.
-
-
Method Details
-
fromCsv
Loads data from a CSV file into a DataFrame.
- Specified by:
- fromCsv in interface DataController
- Parameters:
- filePath - Path to the CSV file
- Returns:
- DataFrame containing the loaded data
- Throws:
- Exception - if file cannot be read or is invalid
-
fromXlsx
Loads data from an Excel file into a DataFrame.
- Specified by:
- fromXlsx in interface DataController
- Parameters:
- filePath - Path to the Excel file (.xlsx)
- Returns:
- DataFrame containing the loaded data
- Throws:
- Exception - if file cannot be read or is invalid
-
fromJson
Loads data from a JSON file into a DataFrame.
- Specified by:
- fromJson in interface DataController
- Parameters:
- filePath - Path to the JSON file
- Returns:
- DataFrame containing the loaded data
- Throws:
- Exception - if file cannot be read or is invalid
-
fromMongo
Connects to MongoDB and loads data from a collection into a DataFrame.
- Specified by:
- fromMongo in interface DataController
- Parameters:
- connectionString - MongoDB connection string
- dbName - Database name
- collection - Collection name
- Returns:
- DataFrame containing the loaded data
-
mean
Calculates the arithmetic mean of a numeric column.
- Specified by:
- mean in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Mean value of the column
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalArgumentException - if column is not numeric
-
median
Calculates the median value of a numeric column.
- Specified by:
- median in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Median value of the column
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalArgumentException - if column is not numeric
-
stdv
Calculates the standard deviation of a numeric column.
- Specified by:
- stdv in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Standard deviation of the column
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalArgumentException - if column is not numeric
-
vars
Calculates the variance of a numeric column.
- Specified by:
- vars in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Variance value of the column
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalArgumentException - if column is not numeric
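The stdv and vars entries do not say whether the sample (n - 1) or population (n) divisor is used. The sketch below shows both textbook definitions on a plain array so the difference is visible. VarianceSketch and its method names are hypothetical and not part of WoozyData.

```java
// Illustrative sketch only: VarianceSketch is a hypothetical helper, not part of WoozyData.
final class VarianceSketch {

    /** Arithmetic mean of the values. */
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    /** Variance: divides by n - 1 when sample is true, otherwise by n. */
    static double variance(double[] x, boolean sample) {
        double m = mean(x);
        double sumSquares = 0.0;
        for (double v : x) {
            sumSquares += (v - m) * (v - m);
        }
        return sumSquares / (sample ? x.length - 1 : x.length);
    }

    /** Standard deviation is the square root of the variance. */
    static double stdDev(double[] x, boolean sample) {
        return Math.sqrt(variance(x, sample));
    }
}
```

For small columns the two divisors give noticeably different results, so it may be worth comparing the library's output against both before relying on it.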
-
skew
Calculates the skewness of a numeric column.
- Specified by:
- skew in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Skewness value of the column
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalArgumentException - if column is not numeric
-
kurt
Calculates the kurtosis of a numeric column.
- Specified by:
- kurt in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Kurtosis value of the column
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalArgumentException - if column is not numeric
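As a reference point for skew and kurt, the following sketch computes the moment-based (population) skewness and excess kurtosis of a plain array. It assumes the uncorrected definitions m3 / m2^1.5 and m4 / m2^2 - 3. The library may apply a bias correction or report non-excess kurtosis, which this page does not specify. MomentsSketch is a hypothetical name.

```java
// Illustrative sketch only: MomentsSketch is a hypothetical helper, not part of WoozyData.
final class MomentsSketch {

    /** k-th central moment, dividing by n (population form). */
    static double centralMoment(double[] x, int k) {
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= x.length;
        double sum = 0.0;
        for (double v : x) {
            sum += Math.pow(v - mean, k);
        }
        return sum / x.length;
    }

    /** Skewness as m3 / m2^(3/2); zero for symmetric data. */
    static double skewness(double[] x) {
        return centralMoment(x, 3) / Math.pow(centralMoment(x, 2), 1.5);
    }

    /** Excess kurtosis as m4 / m2^2 - 3; zero for a normal distribution. */
    static double excessKurtosis(double[] x) {
        double m2 = centralMoment(x, 2);
        return centralMoment(x, 4) / (m2 * m2) - 3.0;
    }
}
```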
-
cov
Description copied from interface: DataController
Calculates the covariance between two columns.
- Specified by:
- cov in interface DataController
- Parameters:
- col1 - Name of the first column
- col2 - Name of the second column
- Returns:
- Covariance value
-
clean
Performs automatic data cleaning on the current DataFrame. This method combines multiple cleaning operations:
- Removes missing values
- Removes duplicates
- Fixes data types
- Standardizes formats
- Specified by:
- clean in interface DataController
- Returns:
- Cleaned DataFrame
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
dropNa
Removes rows containing null values from the DataFrame.
- Specified by:
- dropNa in interface DataController
- Returns:
- DataFrame with null values removed
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
dropDupes
Removes duplicate rows based on specified columns.
- Specified by:
- dropDupes in interface DataController
- Parameters:
- columns - Column names to check for duplicates. If none specified, checks all columns
- Returns:
- DataFrame with duplicates removed
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
fillNa
Fills null values in all columns with a specified value.
- Specified by:
- fillNa in interface DataController
- Parameters:
- value - Value to use for filling null values
- Returns:
- DataFrame with filled values
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
fillNaColumns
Description copied from interface: DataController
Fills null values in specified columns with a given value.
- Specified by:
- fillNaColumns in interface DataController
- Parameters:
- value - Value to fill nulls with
- columns - Columns to fill
- Returns:
- DataFrame with filled values
-
toCsv
Exports DataFrame to CSV file.
- Specified by:
- toCsv in interface DataController
- Parameters:
- filePath - Output file path
- Throws:
- Exception - if export fails
- IllegalStateException - if no DataFrame is loaded
-
toJson
Exports DataFrame to JSON format.
- Specified by:
- toJson in interface DataController
- Parameters:
- filePath - Path where to save the JSON file
- Throws:
- Exception - if export fails
- IllegalStateException - if no DataFrame is loaded
-
toExcel
Exports DataFrame to Excel format.
- Specified by:
- toExcel in interface DataController
- Parameters:
- filePath - Path where to save the Excel file
- Throws:
- Exception - if export fails
- IllegalStateException - if no DataFrame is loaded
-
toPowerBI
Exports DataFrame to PowerBI format. Includes:
- Data sheet
- Metadata sheet
- Statistics sheet
- Specified by:
- toPowerBI in interface DataController
- Parameters:
- filePath - Output file path
- Throws:
- Exception - if export fails
- IllegalStateException - if no DataFrame is loaded
-
toHtml
Description copied from interface: DataController
Exports DataFrame to HTML format.
- Specified by:
- toHtml in interface DataController
- Parameters:
- filePath - Path where to save the HTML file
- Throws:
- Exception - if there's an error writing the file
-
toLatex
Exports DataFrame to LaTeX format for academic papers.
- Specified by:
- toLatex in interface DataController
- Parameters:
- filePath - Output file path
- Throws:
- Exception - if export fails
- IllegalStateException - if no DataFrame is loaded
-
describe
Generates comprehensive statistical analysis of a column. Includes:
- Basic statistics (mean, median, std)
- Distribution analysis
- Missing value analysis
- Outlier detection
- Specified by:
- describe in interface DataController
- Parameters:
- column - Column name to analyze
- Returns:
- Map containing statistical measures
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
quantile
Calculates quantile value for a numeric column.
- Specified by:
- quantile in interface DataController
- Parameters:
- column - Column name
- q - Quantile value (0-1)
- Returns:
- Quantile value
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalArgumentException - if q is not between 0 and 1
-
iqr
Calculates the Interquartile Range (IQR) of a column. IQR is the difference between the 75th and 25th percentiles.
- Specified by:
- iqr in interface DataController
- Parameters:
- column - Column name
- Returns:
- IQR value
- Throws:
- IllegalStateException - if no DataFrame is loaded
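The relationship between quantile and iqr can be illustrated with a minimal sketch. It assumes linear interpolation between order statistics at position q * (n - 1), which is one common convention. This page does not say which interpolation rule the library uses, so results may differ slightly. QuantileSketch is a hypothetical name.

```java
import java.util.Arrays;

// Illustrative sketch only: QuantileSketch is a hypothetical helper, not part of WoozyData.
final class QuantileSketch {

    /** Quantile by linear interpolation between the two nearest order statistics. */
    static double quantile(double[] x, double q) {
        if (q < 0.0 || q > 1.0) {
            throw new IllegalArgumentException("q must be between 0 and 1");
        }
        double[] sorted = x.clone();   // leave the caller's array untouched
        Arrays.sort(sorted);
        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    /** IQR: 75th percentile minus 25th percentile. */
    static double iqr(double[] x) {
        return quantile(x, 0.75) - quantile(x, 0.25);
    }
}
```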
-
frequency
Description copied from interface: DataController
Calculates frequency distribution for a column.
- Specified by:
- frequency in interface DataController
- Parameters:
- column - Column name
- Returns:
- Map of values to their frequencies
-
normalDist
public double[] normalDist(int size, double mean, double std)
Description copied from interface: DataController
Generates normal distribution samples.
- Specified by:
- normalDist in interface DataController
- Parameters:
- size - Number of samples
- mean - Mean of the distribution
- std - Standard deviation
- Returns:
- Array of samples
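One plausible way to produce such samples, shown here as a sketch rather than the library's implementation, is to scale and shift standard normal draws from java.util.Random. The class name NormalSampleSketch and the extra Random parameter (added so that results can be reproduced with a seed) are assumptions.

```java
import java.util.Random;

// Illustrative sketch only: NormalSampleSketch is a hypothetical helper, not part of WoozyData.
final class NormalSampleSketch {

    /** Draws size samples from N(mean, std^2) using mean + std * Z, where Z is standard normal. */
    static double[] normalDist(int size, double mean, double std, Random rng) {
        double[] out = new double[size];
        for (int i = 0; i < size; i++) {
            out[i] = mean + std * rng.nextGaussian();
        }
        return out;
    }
}
```

Passing a seeded generator such as new Random(42L) makes the sample repeatable across runs.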
-
normalPdf
public double normalPdf(double x, double mean, double std)
Description copied from interface: DataController
Calculates normal probability density function value.
- Specified by:
- normalPdf in interface DataController
- Parameters:
- x - Input value
- mean - Mean of the distribution
- std - Standard deviation
- Returns:
- PDF value
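The value returned by normalPdf should match the standard density formula exp(-z^2 / 2) / (std * sqrt(2 * pi)) with z = (x - mean) / std. The sketch below evaluates that formula directly. NormalPdfSketch is a hypothetical name, and the argument check is an assumption, not documented behavior.

```java
// Illustrative sketch only: NormalPdfSketch is a hypothetical helper, not part of WoozyData.
final class NormalPdfSketch {

    /** Normal density at x for the given mean and standard deviation. */
    static double pdf(double x, double mean, double std) {
        if (std <= 0.0) {
            throw new IllegalArgumentException("std must be positive");
        }
        double z = (x - mean) / std;
        return Math.exp(-0.5 * z * z) / (std * Math.sqrt(2.0 * Math.PI));
    }
}
```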
-
normalCdf
public double normalCdf(double x, double mean, double std)
Description copied from interface: DataController
Calculates normal cumulative distribution function value.
- Specified by:
- normalCdf in interface DataController
- Parameters:
- x - Input value
- mean - Mean of the distribution
- std - Standard deviation
- Returns:
- CDF value
-
poissonDist
public double[] poissonDist(double lambda, int size)
Description copied from interface: DataController
Generates Poisson distribution samples.
- Specified by:
- poissonDist in interface DataController
- Parameters:
- lambda - Rate parameter
- size - Number of samples
- Returns:
- Array of samples
-
uniformDist
public double[] uniformDist(int size, double min, double max)
Description copied from interface: DataController
Generates uniform distribution samples.
- Specified by:
- uniformDist in interface DataController
- Parameters:
- size - Number of samples
- min - Minimum value
- max - Maximum value
- Returns:
- Array of samples
-
correl
Description copied from interface: DataController
Calculates correlation between two columns.
- Specified by:
- correl in interface DataController
- Parameters:
- col1 - First column name
- col2 - Second column name
- Returns:
- Correlation coefficient
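This page does not name the correlation coefficient that correl returns. Assuming it is the Pearson coefficient, the computation on two equal-length arrays can be sketched as follows. CorrelationSketch is a hypothetical name.

```java
// Illustrative sketch only: CorrelationSketch is a hypothetical helper, not part of WoozyData.
final class CorrelationSketch {

    /** Pearson correlation: sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)). */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("arrays must have equal length");
        }
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

The result lies in [-1, 1]. A constant column makes the denominator zero, so the sketch returns NaN in that case.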
-
linearReg
Description copied from interface: DataController
Performs simple linear regression.
- Specified by:
- linearReg in interface DataController
- Parameters:
- xCol - Independent variable column
- yCol - Dependent variable column
- Returns:
- Array containing slope and intercept
-
rsquared
Description copied from interface: DataController
Calculates R-squared value for linear regression.
- Specified by:
- rsquared in interface DataController
- Parameters:
- xCol - Independent variable column
- yCol - Dependent variable column
- Returns:
- R-squared value
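To make the linearReg and rsquared results easier to interpret, here is a minimal ordinary least squares sketch on plain arrays. It returns the slope first and the intercept second, following the order in which the linearReg entry lists them, but the library's actual array layout is an assumption. LinearRegressionSketch is a hypothetical name.

```java
// Illustrative sketch only: LinearRegressionSketch is a hypothetical helper, not part of WoozyData.
final class LinearRegressionSketch {

    /** Least-squares fit of y = slope * x + intercept; returns {slope, intercept}. */
    static double[] fit(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double slope = sxy / sxx;
        return new double[] {slope, meanY - slope * meanX};
    }

    /** R-squared = 1 - SS_residual / SS_total for the fitted line. */
    static double rSquared(double[] x, double[] y) {
        double[] coef = fit(x, y);
        double meanY = 0.0;
        for (double v : y) {
            meanY += v;
        }
        meanY /= y.length;
        double ssRes = 0.0;
        double ssTot = 0.0;
        for (int i = 0; i < x.length; i++) {
            double predicted = coef[0] * x[i] + coef[1];
            ssRes += (y[i] - predicted) * (y[i] - predicted);
            ssTot += (y[i] - meanY) * (y[i] - meanY);
        }
        return 1.0 - ssRes / ssTot;
    }
}
```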
-
multipleReg
Description copied from interface: DataController
Performs multiple linear regression.
- Specified by:
- multipleReg in interface DataController
- Parameters:
- xCols - Independent variable columns
- yCol - Dependent variable column
- Returns:
- DataFrame with regression results
-
polynomialReg
Description copied from interface: DataController
Performs polynomial regression.
- Specified by:
- polynomialReg in interface DataController
- Parameters:
- xCol - Independent variable column
- yCol - Dependent variable column
- degree - Polynomial degree
- Returns:
- DataFrame with regression results
-
logisticReg
Description copied from interface: DataController
Performs logistic regression.
- Specified by:
- logisticReg in interface DataController
- Parameters:
- xCol - Independent variable column
- yCol - Dependent variable column
- Returns:
- DataFrame with regression results
-
tTest
Description copied from interface: DataController
Performs t-test between two columns.
- Specified by:
- tTest in interface DataController
- Parameters:
- col1 - First column name
- col2 - Second column name
- Returns:
- Map containing test results
-
anova
Description copied from interface: DataController
Performs one-way ANOVA test.
- Specified by:
- anova in interface DataController
- Parameters:
- columns - Column names to compare
- Returns:
- Map containing test results
-
chiSquare
Description copied from interface: DataController
Performs chi-square test of independence.
- Specified by:
- chiSquare in interface DataController
- Parameters:
- col1 - First column name
- col2 - Second column name
- Returns:
- Map containing test results
-
shapiroWilk
Description copied from interface: DataController
Performs Shapiro-Wilk normality test.
- Specified by:
- shapiroWilk in interface DataController
- Parameters:
- column - Column name
- Returns:
- Map containing test results
-
sma
public double[] sma(double[] data, int window)
Calculates Simple Moving Average (SMA) for time series data.
- Specified by:
- sma in interface DataController
- Parameters:
- data - Input time series data
- window - Window size for moving average
- Returns:
- Array containing SMA values
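A simple moving average replaces each point with the mean of the most recent window values. The sketch below uses a running sum and returns data.length - window + 1 values, one per complete window. Whether the library pads the leading positions instead is not stated here, so the output length is an assumption. SmaSketch is a hypothetical name.

```java
// Illustrative sketch only: SmaSketch is a hypothetical helper, not part of WoozyData.
final class SmaSketch {

    /** Mean of each complete window of the given size, computed with a running sum. */
    static double[] sma(double[] data, int window) {
        if (window <= 0 || window > data.length) {
            throw new IllegalArgumentException("window must be between 1 and data.length");
        }
        double[] out = new double[data.length - window + 1];
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            if (i >= window) {
                sum -= data[i - window];   // drop the value that left the window
            }
            if (i >= window - 1) {
                out[i - window + 1] = sum / window;
            }
        }
        return out;
    }
}
```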
-
ema
public double[] ema(double[] data, double alpha)
Description copied from interface: DataController
Calculates Exponential Moving Average.
- Specified by:
- ema in interface DataController
- Parameters:
- data - Input data array
- alpha - Smoothing factor
- Returns:
- Array of EMA values
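The usual recursive definition of an exponential moving average is e[t] = alpha * x[t] + (1 - alpha) * e[t - 1]. The sketch below applies it, seeding the series with the first observation. The seeding rule and the accepted range of alpha are assumptions, since this page does not document them. EmaSketch is a hypothetical name.

```java
// Illustrative sketch only: EmaSketch is a hypothetical helper, not part of WoozyData.
final class EmaSketch {

    /** Exponential moving average seeded with the first data point. */
    static double[] ema(double[] data, double alpha) {
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        double[] out = new double[data.length];
        if (data.length == 0) {
            return out;
        }
        out[0] = data[0];
        for (int i = 1; i < data.length; i++) {
            out[i] = alpha * data[i] + (1.0 - alpha) * out[i - 1];
        }
        return out;
    }
}
```

A larger alpha tracks recent values more closely, and alpha = 1 returns the input unchanged.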
-
forecast
Description copied from interface: DataController
Forecasts future values using time series analysis.
- Specified by:
- forecast in interface DataController
- Parameters:
- timeCol - Time column name
- valueCol - Value column name
- periods - Number of periods to forecast
- Returns:
- DataFrame with forecasted values
-
decompose
Description copied from interface: DataController
Decomposes time series into components.
- Specified by:
- decompose in interface DataController
- Parameters:
- timeCol - Time column name
- valueCol - Value column name
- Returns:
- DataFrame with decomposition components
-
seasonalAdjust
Description copied from interface: DataController
Performs seasonal adjustment on time series.
- Specified by:
- seasonalAdjust in interface DataController
- Parameters:
- timeCol - Time column name
- valueCol - Value column name
- Returns:
- DataFrame with adjusted values
-
detectOutliers
Description copied from interface: DataController
Detects outliers in time series data.
- Specified by:
- detectOutliers in interface DataController
- Parameters:
- timeCol - Time column name
- valueCol - Value column name
- Returns:
- DataFrame with outlier information
-
pivot
Creates a pivot table from the DataFrame.
- Specified by:
- pivot in interface DataController
- Parameters:
- index - Column to use as index
- columns - Column to use for new columns
- values - Column to use for values
- Returns:
- Pivoted DataFrame
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
melt
Reshapes data from wide to long format.
- Specified by:
- melt in interface DataController
- Parameters:
- idVars - Columns to use as identifiers
- valueVars - Columns to unpivot
- Returns:
- Melted DataFrame
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
dummies
Description copied from interface: DataController
Creates dummy/indicator variables.
- Specified by:
- dummies in interface DataController
- Parameters:
- columns - Columns to convert to dummy variables
- Returns:
- DataFrame with dummy variables
-
bin
Description copied from interface: DataController
Bins continuous data into discrete intervals.
- Specified by:
- bin in interface DataController
- Parameters:
- column - Column to bin
- bins - Number of bins
- Returns:
- DataFrame with binned data
-
rollingWindow
Description copied from interface: DataController
Applies function over rolling window.
- Specified by:
- rollingWindow in interface DataController
- Parameters:
- column - Column name
- window - Window size
- func - Function to apply
- Returns:
- DataFrame with rolling window calculations
-
quickAnalysis
Description copied from interface: DataController
Performs quick exploratory data analysis on specified columns.
- Specified by:
- quickAnalysis in interface DataController
- Parameters:
- columns - Columns to analyze
- Returns:
- DataFrame containing analysis results including basic statistics, distribution information, and potential anomalies
-
correlation
Description copied from interface: DataController
Calculates correlation matrix for specified columns.
- Specified by:
- correlation in interface DataController
- Parameters:
- columns - Columns to include in correlation analysis
- Returns:
- DataFrame containing correlation matrix with correlation coefficients between all pairs of specified columns
-
count
Counts the non-null values in a column.
- Specified by:
- count in interface DataController
- Parameters:
- column - Name of the column to count
- Returns:
- Count of non-null values
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
min
Description copied from interface: DataController
Finds the minimum value in a column.
- Specified by:
- min in interface DataController
- Parameters:
- column - Column name
- Returns:
- Minimum value
-
max
Finds the maximum value in a numeric column.
- Specified by:
- max in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Maximum value
- Throws:
- IllegalStateException - if no DataFrame is loaded
- IllegalStateException - if no numeric values found in column
-
analyze
Analyzes a specific column for statistical measures and patterns.
- Specified by:
- analyze in interface DataController
- Parameters:
- column - Name of the column to analyze
- Returns:
- DataFrame containing analysis results
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
convert
Converts column data types according to the specified type map.
- Specified by:
- convert in interface DataController
- Parameters:
- typeMap - Map of column names to their target Java types
- Returns:
- DataFrame with converted column types
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
standardize
Standardizes specified columns using z-score normalization.
- Specified by:
- standardize in interface DataController
- Parameters:
- columns - Columns to standardize
- Returns:
- DataFrame with standardized columns
- Throws:
- IllegalStateException - if no DataFrame is loaded
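Z-score normalization maps each value to (x - mean) / std, so the transformed column has mean 0 and unit standard deviation. The sketch below shows the transformation on a plain array. It assumes the sample standard deviation (n - 1 divisor) and therefore needs at least two values. The divisor the library uses is not documented here. ZScoreSketch is a hypothetical name.

```java
// Illustrative sketch only: ZScoreSketch is a hypothetical helper, not part of WoozyData.
final class ZScoreSketch {

    /** Returns (x - mean) / sampleStd for every element; requires at least two values. */
    static double[] standardize(double[] x) {
        int n = x.length;
        double mean = 0.0;
        for (double v : x) {
            mean += v;
        }
        mean /= n;
        double sumSquares = 0.0;
        for (double v : x) {
            sumSquares += (v - mean) * (v - mean);
        }
        double std = Math.sqrt(sumSquares / (n - 1));
        double[] out = new double[n];
        for (int i = 0; i < n; i++) {
            out[i] = (x[i] - mean) / std;
        }
        return out;
    }
}
```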
-
normalize
Normalizes specified columns to range [0,1].
- Specified by:
- normalize in interface DataController
- Parameters:
- columns - Columns to normalize
- Returns:
- DataFrame with normalized columns
- Throws:
- IllegalStateException - if no DataFrame is loaded
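Scaling to [0,1] is commonly done with min-max normalization, (x - min) / (max - min). The following sketch assumes that formula and, as a further assumption, maps a constant column to all zeros to avoid dividing by zero. MinMaxSketch is a hypothetical name.

```java
// Illustrative sketch only: MinMaxSketch is a hypothetical helper, not part of WoozyData.
final class MinMaxSketch {

    /** Rescales values linearly so the minimum maps to 0 and the maximum to 1. */
    static double[] normalize(double[] x) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : x) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            // A constant column has zero range; map it to 0 instead of dividing by zero.
            out[i] = range == 0.0 ? 0.0 : (x[i] - min) / range;
        }
        return out;
    }
}
```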
-
sum
Description copied from interface: DataController
Calculates the sum of a column.
- Specified by:
- sum in interface DataController
- Parameters:
- column - Column name
- Returns:
- Sum value
-
avg
Description copied from interface: DataController
Calculates the average of a column.
- Specified by:
- avg in interface DataController
- Parameters:
- column - Column name
- Returns:
- Average value
-
binomialDist
public int[] binomialDist(int trials, double prob, int size)
Description copied from interface: DataController
Generates binomial distribution samples.
- Specified by:
- binomialDist in interface DataController
- Parameters:
- trials - Number of trials
- prob - Success probability
- size - Number of samples
- Returns:
- Array of samples
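Each binomial sample is the number of successes in trials independent Bernoulli draws with success probability prob. The sketch below simulates that definition directly. It is a plain illustration, not the library's sampler, and the Random parameter (for reproducibility) and the class name BinomialSketch are assumptions.

```java
import java.util.Random;

// Illustrative sketch only: BinomialSketch is a hypothetical helper, not part of WoozyData.
final class BinomialSketch {

    /** Draws size samples, each counting successes among trials Bernoulli(prob) draws. */
    static int[] binomialDist(int trials, double prob, int size, Random rng) {
        if (prob < 0.0 || prob > 1.0) {
            throw new IllegalArgumentException("prob must be between 0 and 1");
        }
        int[] out = new int[size];
        for (int i = 0; i < size; i++) {
            int successes = 0;
            for (int t = 0; t < trials; t++) {
                if (rng.nextDouble() < prob) {
                    successes++;
                }
            }
            out[i] = successes;
        }
        return out;
    }
}
```

This direct simulation costs O(trials) per sample, which is fine for illustration but slow for very large trial counts.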
-
mannWhitney
Description copied from interface: DataController
Performs Mann-Whitney U test.
- Specified by:
- mannWhitney in interface DataController
- Parameters:
- col1 - First column name
- col2 - Second column name
- Returns:
- Map containing test results
-
groupBy
Description copied from interface: DataController
Groups DataFrame by specified columns.
- Specified by:
- groupBy in interface DataController
- Parameters:
- columns - Columns to group by
- Returns:
- Grouped DataFrame
-
sort
Description copied from interface: DataController
Sorts DataFrame by specified columns.
- Specified by:
- sort in interface DataController
- Parameters:
- columns - Columns to sort by
- Returns:
- Sorted DataFrame
-
select
Description copied from interface: DataController
Selects specified columns from DataFrame.
- Specified by:
- select in interface DataController
- Parameters:
- columns - Columns to select
- Returns:
- DataFrame containing only the selected columns
-
sample
Description copied from interface: DataController
Creates a random sample of rows from the DataFrame.
- Specified by:
- sample in interface DataController
- Parameters:
- n - Number of rows to sample
- Returns:
- DataFrame containing the sampled rows
-
merge
Description copied from interface: DataController
Merges current DataFrame with another DataFrame.
- Specified by:
- merge in interface DataController
- Parameters:
- other - DataFrame to merge with
- how - Type of merge ('inner', 'outer', 'left', 'right')
- on - Columns to merge on
- Returns:
- Merged DataFrame
-
concat
Description copied from interface: DataController
Concatenates current DataFrame with another DataFrame.
- Specified by:
- concat in interface DataController
- Parameters:
- other - DataFrame to concatenate
- axis - If true, concatenate along columns; if false, along rows
- Returns:
- Concatenated DataFrame
-
reshape
Description copied from interface: DataController
Reshapes the DataFrame to specified dimensions.
- Specified by:
- reshape in interface DataController
- Parameters:
- rows - Number of rows in reshaped DataFrame
- cols - Number of columns in reshaped DataFrame
- Returns:
- Reshaped DataFrame
-
timeAnalysis
Description copied from interface: DataController
Performs time-based analysis on a datetime column and corresponding value column.
- Specified by:
- timeAnalysis in interface DataController
- Parameters:
- dateCol - Column containing datetime values
- valueCol - Column containing values to analyze
- Returns:
- DataFrame with time-based analysis results including trends, seasonality, and temporal patterns
-
missingAnalysis
Description copied from interface: DataController
Analyzes missing values in the DataFrame.
- Specified by:
- missingAnalysis in interface DataController
- Returns:
- DataFrame containing missing value analysis including counts, percentages, and patterns of missing data
-
outlierAnalysis
Performs quick comprehensive analysis of specified columns. Includes:
- Basic statistics
- Missing value analysis
- Distribution analysis
- Outlier detection
Example:
DataFrame analysis = woozy.outlierAnalysis("sales", "profit");
analysis.show();
- Specified by:
- outlierAnalysis in interface DataController
- Parameters:
- columns - Columns to analyze
- Returns:
- DataFrame containing analysis results
- Throws:
- IllegalStateException - if no DataFrame is loaded
-
stats
Description copied from interface: DataController
Calculates basic statistical measures for a column.
- Specified by:
- stats in interface DataController
- Parameters:
- column - Name of the column
- Returns:
- Map containing statistical measures
-
interpolate
Description copied from interface: DataController
Interpolates missing values using specified method.
- Specified by:
- interpolate in interface DataController
- Parameters:
- method - Interpolation method to use
- columns - Columns to interpolate
- Returns:
- DataFrame with interpolated values
-
fullReport
Generates a full statistical report.
Example:
Map<String, Object> report = woozy.fullReport("sales", "profit");
System.out.println("Correlation: " + report.get("correlation"));
- Specified by:
- fullReport in interface DataController
- Parameters:
- columns - Columns to include in report
- Returns:
- Map containing comprehensive analysis
- Throws:
- IllegalStateException - if no DataFrame is loaded
-