Interface DataController

All Known Implementing Classes:
Woozydata

public interface DataController
Defines operations for data manipulation, analysis, and statistical computation, modeled on data analysis libraries such as pandas.
  • Method Details

    • fromCsv

      DataFrame fromCsv(String filePath) throws Exception
      Reads data from a CSV file and creates a DataFrame.
      Parameters:
      filePath - Path to the CSV file
      Returns:
      DataFrame containing the data from the CSV file
      Throws:
      Exception - If there's an error reading the file
    • fromXlsx

      DataFrame fromXlsx(String filePath) throws Exception
      Reads data from an Excel (XLSX) file and creates a DataFrame.
      Parameters:
      filePath - Path to the Excel file
      Returns:
      DataFrame containing the data from the Excel file
      Throws:
      Exception - If there's an error reading the file
    • fromJson

      DataFrame fromJson(String filePath) throws Exception
      Reads data from a JSON file and creates a DataFrame.
      Parameters:
      filePath - Path to the JSON file
      Returns:
      DataFrame containing the data from the JSON file
      Throws:
      Exception - If there's an error reading the file
    • fromMongo

      DataFrame fromMongo(String connectionString, String dbName, String collection)
      Reads data from a MongoDB collection and creates a DataFrame.
      Parameters:
      connectionString - MongoDB connection string
      dbName - Database name
      collection - Collection name
      Returns:
      DataFrame containing the data from MongoDB
    • analyze

      DataFrame analyze(String column)
      Performs basic analysis on a specified column.
      Parameters:
      column - Name of the column to analyze
      Returns:
      DataFrame containing analysis results
    • stats

      Map<String,Double> stats(String column)
      Calculates basic statistical measures for a column.
      Parameters:
      column - Name of the column
      Returns:
      Map containing statistical measures
    • mean

      double mean(String column)
      Calculates the mean of a column.
      Parameters:
      column - Name of the column
      Returns:
      Mean value
    • median

      double median(String column)
      Calculates the median of a column.
      Parameters:
      column - Name of the column
      Returns:
      Median value
    • stdv

      double stdv(String column)
      Calculates the standard deviation of a column.
      Parameters:
      column - Name of the column
      Returns:
      Standard deviation value
    • vars

      double vars(String column)
      Calculates the variance of a column.
      Parameters:
      column - Name of the column
      Returns:
      Variance value
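      The descriptions of stdv and vars do not say whether the population form (divide by n) or the sample form (divide by n - 1) is used. The sketch below shows the sample form, assuming the column has already been read into a double[]. VarianceSketch and its methods are hypothetical helpers for illustration and are not part of DataController.

```java
// Hypothetical helper, not part of DataController.
final class VarianceSketch {
    // Arithmetic mean of the values.
    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    // Sample variance: sum of squared deviations divided by n - 1.
    // Using n - 1 rather than n is an assumption; the interface does not specify it.
    static double sampleVariance(double[] xs) {
        if (xs.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return ss / (xs.length - 1);
    }

    // Sample standard deviation: square root of the sample variance.
    static double sampleStdDev(double[] xs) {
        return Math.sqrt(sampleVariance(xs));
    }

    public static void main(String[] args) {
        double[] data = {2, 4, 4, 4, 5, 5, 7, 9};
        System.out.println("mean     = " + mean(data));
        System.out.println("variance = " + sampleVariance(data));
        System.out.println("std dev  = " + sampleStdDev(data));
    }
}
```

      If an implementation uses the population form instead, the only change is dividing by n in place of n - 1.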
    • skew

      double skew(String column)
      Calculates the skewness of a column.
      Parameters:
      column - Name of the column
      Returns:
      Skewness value
    • kurt

      double kurt(String column)
      Calculates the kurtosis of a column.
      Parameters:
      column - Name of the column
      Returns:
      Kurtosis value
    • cov

      double cov(String col1, String col2)
      Calculates the covariance between two columns.
      Parameters:
      col1 - Name of the first column
      col2 - Name of the second column
      Returns:
      Covariance value
    • clean

      DataFrame clean()
      Performs general data cleaning operations.
      Returns:
      Cleaned DataFrame
    • dropNa

      DataFrame dropNa()
      Removes rows with null values.
      Returns:
      DataFrame with null values removed
    • dropDupes

      DataFrame dropDupes(String... columns)
      Removes duplicate rows based on specified columns.
      Parameters:
      columns - Column names to check for duplicates
      Returns:
      DataFrame with duplicates removed
    • fillNa

      DataFrame fillNa(Object value)
      Fills null values with a specified value.
      Parameters:
      value - Value to fill nulls with
      Returns:
      DataFrame with filled values
    • fillNaColumns

      DataFrame fillNaColumns(Object value, String... columns)
      Fills null values in specified columns with a given value.
      Parameters:
      value - Value to fill nulls with
      columns - Columns to fill
      Returns:
      DataFrame with filled values
    • convert

      DataFrame convert(Map<String,Class<?>> typeMap)
      Converts column types according to the provided type map.
      Parameters:
      typeMap - Map of column names to their target types
      Returns:
      DataFrame with converted types
    • interpolate

      DataFrame interpolate(String method, String... columns)
      Interpolates missing values using the specified method.
      Parameters:
      method - Interpolation method to use
      columns - Columns to interpolate
      Returns:
      DataFrame with interpolated values
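      The accepted values of method are not listed here. As an illustration of what a linear method might do, the sketch below fills interior gaps in a double[] where missing values are represented as NaN. Both the NaN convention and the choice to leave leading and trailing gaps untouched are assumptions; InterpolateSketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataController.
final class InterpolateSketch {
    // Fills runs of NaN that lie between two known values by drawing a
    // straight line between those neighbors. Leading and trailing NaNs have
    // only one neighbor and are left unchanged (an assumption).
    static double[] linear(double[] xs) {
        double[] out = xs.clone();
        int prev = -1; // index of the most recent non-NaN value
        for (int i = 0; i < out.length; i++) {
            if (Double.isNaN(out[i])) continue;
            if (prev >= 0 && i - prev > 1) {
                double step = (out[i] - out[prev]) / (i - prev);
                for (int j = prev + 1; j < i; j++) {
                    out[j] = out[prev] + step * (j - prev);
                }
            }
            prev = i;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] column = {1.0, Double.NaN, Double.NaN, 4.0};
        System.out.println(Arrays.toString(linear(column)));
    }
}
```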
    • standardize

      DataFrame standardize(String... columns)
      Standardizes specified columns (z-score normalization).
      Parameters:
      columns - Columns to standardize
      Returns:
      DataFrame with standardized values
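      A z-score rescales each value as (x - mean) / standard deviation, so the result has mean 0 and standard deviation 1. The sketch below applies this to a double[]. It uses the population standard deviation, which is an assumption since the interface does not say which form is used; StandardizeSketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataController.
final class StandardizeSketch {
    // Returns (x - mean) / sd for every value, where sd is the population
    // standard deviation (divide by n). That choice is an assumption.
    static double[] zScores(double[] xs) {
        int n = xs.length;
        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= n;

        double ss = 0.0;
        for (double x : xs) ss += (x - mean) * (x - mean);
        double sd = Math.sqrt(ss / n);
        if (sd == 0.0) {
            throw new IllegalArgumentException("constant column has zero standard deviation");
        }

        double[] out = new double[n];
        for (int i = 0; i < n; i++) out[i] = (xs[i] - mean) / sd;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(zScores(new double[] {1, 2, 3})));
    }
}
```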
    • normalize

      DataFrame normalize(String... columns)
      Normalizes the specified columns to the [0, 1] range.
      Parameters:
      columns - Columns to normalize
      Returns:
      DataFrame with normalized values
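      Scaling to [0, 1] is commonly done with min-max scaling, (x - min) / (max - min). The sketch below shows that calculation on a double[]. Whether the implementation uses exactly this formula, and how it treats a constant column, is not stated; mapping a constant column to 0.0 is an assumption. NormalizeSketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataController.
final class NormalizeSketch {
    // Min-max scaling: the smallest value maps to 0 and the largest to 1.
    static double[] minMax(double[] xs) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double x : xs) {
            min = Math.min(min, x);
            max = Math.max(max, x);
        }
        double range = max - min;

        double[] out = new double[xs.length];
        for (int i = 0; i < xs.length; i++) {
            // A constant column has zero range; returning 0.0 is an assumption.
            out[i] = (range == 0.0) ? 0.0 : (xs[i] - min) / range;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(minMax(new double[] {10, 20, 30})));
    }
}
```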
    • toCsv

      void toCsv(String filePath) throws Exception
      Exports the DataFrame to a CSV file.
      Parameters:
      filePath - Path where the CSV file is saved
      Throws:
      Exception - If there's an error writing the file
    • toJson

      void toJson(String filePath) throws Exception
      Exports the DataFrame to a JSON file.
      Parameters:
      filePath - Path where the JSON file is saved
      Throws:
      Exception - If there's an error writing the file
    • toExcel

      void toExcel(String filePath) throws Exception
      Exports the DataFrame to an Excel file.
      Parameters:
      filePath - Path where the Excel file is saved
      Throws:
      Exception - If there's an error writing the file
    • toPowerBI

      void toPowerBI(String filePath) throws Exception
      Exports the DataFrame to PowerBI format.
      Parameters:
      filePath - Path where the PowerBI file is saved
      Throws:
      Exception - If there's an error writing the file
    • toHtml

      void toHtml(String filePath) throws Exception
      Exports the DataFrame to HTML format.
      Parameters:
      filePath - Path where the HTML file is saved
      Throws:
      Exception - If there's an error writing the file
    • toLatex

      void toLatex(String filePath) throws Exception
      Exports the DataFrame to LaTeX format.
      Parameters:
      filePath - Path where the LaTeX file is saved
      Throws:
      Exception - If there's an error writing the file
    • sum

      double sum(String column)
      Calculates the sum of a column.
      Parameters:
      column - Column name
      Returns:
      Sum value
    • avg

      double avg(String column)
      Calculates the average of a column.
      Parameters:
      column - Column name
      Returns:
      Average value
    • count

      long count(String column)
      Counts non-null values in a column.
      Parameters:
      column - Column name
      Returns:
      Count of non-null values
    • min

      double min(String column)
      Finds the minimum value in a column.
      Parameters:
      column - Column name
      Returns:
      Minimum value
    • max

      double max(String column)
      Finds the maximum value in a column.
      Parameters:
      column - Column name
      Returns:
      Maximum value
    • describe

      Map<String,Object> describe(String column)
      Provides descriptive statistics for a column.
      Parameters:
      column - Column name
      Returns:
      Map of descriptive statistics
    • quantile

      double quantile(String column, double q)
      Calculates the value at quantile q for a column.
      Parameters:
      column - Column name
      q - Quantile level in the range 0 to 1 (for example, 0.5 for the median)
      Returns:
      Quantile value
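      Several quantile definitions are in use, and this page does not say which one applies. The sketch below uses linear interpolation between the two nearest order statistics, which is an assumption chosen because it is the default in pandas. QuantileSketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataController.
final class QuantileSketch {
    // Quantile by linear interpolation: position q * (n - 1) in the sorted
    // data, interpolating between the neighbors when it is not an integer.
    static double quantile(double[] xs, double q) {
        if (xs.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        if (!(q >= 0.0 && q <= 1.0)) {
            throw new IllegalArgumentException("q must be between 0 and 1");
        }
        double[] sorted = xs.clone();
        Arrays.sort(sorted);

        double pos = q * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        double frac = pos - lo;
        return sorted[lo] + frac * (sorted[hi] - sorted[lo]);
    }

    public static void main(String[] args) {
        double[] data = {4, 1, 3, 2};
        System.out.println("median = " + quantile(data, 0.5));
    }
}
```

      Under the same assumption, the interquartile range is quantile(xs, 0.75) - quantile(xs, 0.25).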
    • iqr

      double iqr(String column)
      Calculates the interquartile range for a column.
      Parameters:
      column - Column name
      Returns:
      IQR value
    • frequency

      Map<Object,Long> frequency(String column)
      Calculates frequency distribution for a column.
      Parameters:
      column - Column name
      Returns:
      Map of values to their frequencies
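      A frequency distribution of this shape can be built by counting occurrences of each distinct value. The sketch below does this for a List of values; skipping nulls and preserving first-seen order are assumptions, and FrequencySketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of DataController.
final class FrequencySketch {
    // Counts how often each distinct value occurs. LinkedHashMap keeps the
    // order in which values were first seen.
    static Map<Object, Long> frequency(List<?> values) {
        Map<Object, Long> counts = new LinkedHashMap<>();
        for (Object v : values) {
            if (v == null) continue; // skipping nulls is an assumption
            counts.merge(v, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> colors = Arrays.asList("red", "blue", null, "red");
        System.out.println(frequency(colors));
    }
}
```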
    • normalDist

      double[] normalDist(int size, double mean, double std)
      Generates samples from a normal distribution.
      Parameters:
      size - Number of samples
      mean - Mean of the distribution
      std - Standard deviation
      Returns:
      Array of samples
    • normalPdf

      double normalPdf(double x, double mean, double std)
      Calculates the value of the normal probability density function (PDF) at x.
      Parameters:
      x - Input value
      mean - Mean of the distribution
      std - Standard deviation
      Returns:
      PDF value
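      The normal density is f(x) = exp(-(x - mean)^2 / (2 std^2)) / (std sqrt(2 pi)). The sketch below evaluates that formula directly. NormalPdfSketch is a hypothetical helper, not part of DataController, and rejecting a non-positive std is an assumption about input handling.

```java
// Hypothetical helper, not part of DataController.
final class NormalPdfSketch {
    // Evaluates the normal density with the given mean and standard deviation at x.
    static double pdf(double x, double mean, double std) {
        if (!(std > 0.0)) {
            throw new IllegalArgumentException("std must be positive");
        }
        double z = (x - mean) / std;
        return Math.exp(-0.5 * z * z) / (std * Math.sqrt(2.0 * Math.PI));
    }

    public static void main(String[] args) {
        // Density of the standard normal at its peak, equal to 1 / sqrt(2 * pi).
        System.out.println(pdf(0.0, 0.0, 1.0));
    }
}
```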
    • normalCdf

      double normalCdf(double x, double mean, double std)
      Calculates the value of the normal cumulative distribution function (CDF) at x.
      Parameters:
      x - Input value
      mean - Mean of the distribution
      std - Standard deviation
      Returns:
      CDF value
    • binomialDist

      int[] binomialDist(int trials, double prob, int size)
      Generates samples from a binomial distribution.
      Parameters:
      trials - Number of trials
      prob - Success probability
      size - Number of samples
      Returns:
      Array of samples
    • poissonDist

      double[] poissonDist(double lambda, int size)
      Generates samples from a Poisson distribution.
      Parameters:
      lambda - Rate parameter
      size - Number of samples
      Returns:
      Array of samples
    • uniformDist

      double[] uniformDist(int size, double min, double max)
      Generates samples from a uniform distribution.
      Parameters:
      size - Number of samples
      min - Minimum value
      max - Maximum value
      Returns:
      Array of samples
    • correl

      double correl(String col1, String col2)
      Calculates the correlation between two columns.
      Parameters:
      col1 - First column name
      col2 - Second column name
      Returns:
      Correlation coefficient
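      The type of correlation coefficient is not named here. The sketch below computes the Pearson coefficient, which is an assumption; an implementation could equally use a rank-based measure. CorrelationSketch is a hypothetical helper, not part of DataController.

```java
// Hypothetical helper, not part of DataController.
final class CorrelationSketch {
    // Pearson correlation: the sum of products of deviations divided by the
    // square root of the product of the sums of squared deviations.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with at least two values");
        }
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;

        double sxy = 0.0, sxx = 0.0, syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - mx;
            double dy = y[i] - my;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {2, 4, 6};
        System.out.println(pearson(x, y));
    }
}
```

      If either input is constant, the denominator is zero and this sketch returns NaN.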
    • linearReg

      double[] linearReg(String xCol, String yCol)
      Performs simple linear regression.
      Parameters:
      xCol - Independent variable column
      yCol - Dependent variable column
      Returns:
      Array containing slope and intercept
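      Simple linear regression is usually fitted by ordinary least squares: slope = Sxy / Sxx and intercept = mean(y) - slope * mean(x). The sketch below computes both for two double[] inputs. Returning the slope at index 0 and the intercept at index 1 is an assumption about element order, and LinearRegSketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataController.
final class LinearRegSketch {
    // Ordinary least squares fit of y = slope * x + intercept.
    // Returns {slope, intercept}; this order is an assumption.
    static double[] fit(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length arrays with at least two values");
        }
        int n = x.length;
        double mx = 0.0, my = 0.0;
        for (int i = 0; i < n; i++) {
            mx += x[i];
            my += y[i];
        }
        mx /= n;
        my /= n;

        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < n; i++) {
            sxy += (x[i] - mx) * (y[i] - my);
            sxx += (x[i] - mx) * (x[i] - mx);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values are all equal; slope is undefined");
        }
        double slope = sxy / sxx;
        double intercept = my - slope * mx;
        return new double[] {slope, intercept};
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3, 4};
        double[] y = {3, 5, 7, 9};
        System.out.println(Arrays.toString(fit(x, y)));
    }
}
```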
    • rsquared

      double rsquared(String xCol, String yCol)
      Calculates the R-squared value of the linear regression of yCol on xCol.
      Parameters:
      xCol - Independent variable column
      yCol - Dependent variable column
      Returns:
      R-squared value
    • multipleReg

      DataFrame multipleReg(String[] xCols, String yCol)
      Performs multiple linear regression.
      Parameters:
      xCols - Independent variable columns
      yCol - Dependent variable column
      Returns:
      DataFrame with regression results
    • polynomialReg

      DataFrame polynomialReg(String xCol, String yCol, int degree)
      Performs polynomial regression.
      Parameters:
      xCol - Independent variable column
      yCol - Dependent variable column
      degree - Polynomial degree
      Returns:
      DataFrame with regression results
    • logisticReg

      DataFrame logisticReg(String xCol, String yCol)
      Performs logistic regression.
      Parameters:
      xCol - Independent variable column
      yCol - Dependent variable column
      Returns:
      DataFrame with regression results
    • tTest

      Map<String,Double> tTest(String col1, String col2)
      Performs a t-test between two columns.
      Parameters:
      col1 - First column name
      col2 - Second column name
      Returns:
      Map containing test results
    • anova

      Map<String,Double> anova(String... columns)
      Performs a one-way ANOVA test.
      Parameters:
      columns - Column names to compare
      Returns:
      Map containing test results
    • chiSquare

      Map<String,Double> chiSquare(String col1, String col2)
      Performs a chi-square test of independence.
      Parameters:
      col1 - First column name
      col2 - Second column name
      Returns:
      Map containing test results
    • shapiroWilk

      Map<String,Double> shapiroWilk(String column)
      Performs the Shapiro-Wilk normality test.
      Parameters:
      column - Column name
      Returns:
      Map containing test results
    • mannWhitney

      Map<String,Double> mannWhitney(String col1, String col2)
      Performs the Mann-Whitney U test.
      Parameters:
      col1 - First column name
      col2 - Second column name
      Returns:
      Map containing test results
    • sma

      double[] sma(double[] data, int window)
      Calculates the simple moving average (SMA).
      Parameters:
      data - Input data array
      window - Window size
      Returns:
      Array of SMA values
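      A simple moving average replaces each point with the unweighted mean of the last window values. The length of the returned array is not specified here; the sketch below returns one value per complete window (data.length - window + 1 values), which is an assumption. SmaSketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataController.
final class SmaSketch {
    // Mean of each run of `window` consecutive values, computed with a
    // running sum so every element is added and removed exactly once.
    static double[] sma(double[] data, int window) {
        if (window <= 0 || window > data.length) {
            throw new IllegalArgumentException("window must be between 1 and data.length");
        }
        double[] out = new double[data.length - window + 1];
        double sum = 0.0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
            if (i >= window) sum -= data[i - window]; // drop the value leaving the window
            if (i >= window - 1) out[i - window + 1] = sum / window;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sma(new double[] {1, 2, 3, 4, 5}, 3)));
    }
}
```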
    • ema

      double[] ema(double[] data, double alpha)
      Calculates the exponential moving average (EMA).
      Parameters:
      data - Input data array
      alpha - Smoothing factor
      Returns:
      Array of EMA values
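      An exponential moving average is commonly defined by the recurrence ema[i] = alpha * data[i] + (1 - alpha) * ema[i - 1]. The sketch below implements that recurrence, seeding the series with the first observation and requiring 0 < alpha <= 1; both choices are assumptions, since the interface does not specify them. EmaSketch is a hypothetical helper, not part of DataController.

```java
import java.util.Arrays;

// Hypothetical helper, not part of DataController.
final class EmaSketch {
    // Exponentially weighted average: larger alpha gives more weight to
    // recent values. The output has the same length as the input.
    static double[] ema(double[] data, double alpha) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        double[] out = new double[data.length];
        if (data.length == 0) return out;

        out[0] = data[0]; // seeding with the first observation is an assumption
        for (int i = 1; i < data.length; i++) {
            out[i] = alpha * data[i] + (1.0 - alpha) * out[i - 1];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(ema(new double[] {1, 2, 3}, 0.5)));
    }
}
```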
    • forecast

      DataFrame forecast(String timeCol, String valueCol, int periods)
      Forecasts future values using time series analysis.
      Parameters:
      timeCol - Time column name
      valueCol - Value column name
      periods - Number of periods to forecast
      Returns:
      DataFrame with forecasted values
    • decompose

      DataFrame decompose(String timeCol, String valueCol)
      Decomposes time series into components.
      Parameters:
      timeCol - Time column name
      valueCol - Value column name
      Returns:
      DataFrame with decomposition components
    • seasonalAdjust

      DataFrame seasonalAdjust(String timeCol, String valueCol)
      Performs seasonal adjustment on time series.
      Parameters:
      timeCol - Time column name
      valueCol - Value column name
      Returns:
      DataFrame with adjusted values
    • detectOutliers

      DataFrame detectOutliers(String timeCol, String valueCol)
      Detects outliers in time series data.
      Parameters:
      timeCol - Time column name
      valueCol - Value column name
      Returns:
      DataFrame with outlier information
    • pivot

      DataFrame pivot(String index, String columns, String values)
      Creates a pivot table from the DataFrame.
      Parameters:
      index - Index column name
      columns - Name of the column whose values become the columns of the pivot table
      values - Values column name
      Returns:
      Pivoted DataFrame
    • melt

      DataFrame melt(String[] idVars, String[] valueVars)
      Unpivots DataFrame from wide to long format.
      Parameters:
      idVars - Columns to use as identifier variables
      valueVars - Columns to unpivot
      Returns:
      Melted DataFrame
    • dummies

      DataFrame dummies(String... columns)
      Creates dummy (indicator) variables from the specified columns.
      Parameters:
      columns - Columns to convert to dummy variables
      Returns:
      DataFrame with dummy variables
    • bin

      DataFrame bin(String column, int bins)
      Bins continuous data into discrete intervals.
      Parameters:
      column - Column to bin
      bins - Number of bins
      Returns:
      DataFrame with binned data
    • rollingWindow

      DataFrame rollingWindow(String column, int window, String func)
      Applies a function over a rolling window.
      Parameters:
      column - Column name
      window - Window size
      func - Name of the function to apply
      Returns:
      DataFrame with rolling window calculations
    • groupBy

      DataFrame groupBy(String... columns)
      Groups the DataFrame by the specified columns.
      Parameters:
      columns - Columns to group by
      Returns:
      Grouped DataFrame
    • sort

      DataFrame sort(String... columns)
      Sorts the DataFrame by the specified columns.
      Parameters:
      columns - Columns to sort by
      Returns:
      Sorted DataFrame
    • select

      DataFrame select(String... columns)
      Selects the specified columns from the DataFrame.
      Parameters:
      columns - Columns to select
      Returns:
      DataFrame containing only the selected columns
    • sample

      DataFrame sample(int n)
      Creates a random sample of rows from the DataFrame.
      Parameters:
      n - Number of rows to sample
      Returns:
      DataFrame containing the sampled rows
    • merge

      DataFrame merge(DataFrame other, String how, String... on)
      Merges the current DataFrame with another DataFrame.
      Parameters:
      other - DataFrame to merge with
      how - Type of merge ('inner', 'outer', 'left', 'right')
      on - Columns to merge on
      Returns:
      Merged DataFrame
    • concat

      DataFrame concat(DataFrame other, boolean axis)
      Concatenates the current DataFrame with another DataFrame.
      Parameters:
      other - DataFrame to concatenate
      axis - If true, concatenate along columns; if false, along rows
      Returns:
      Concatenated DataFrame
    • reshape

      DataFrame reshape(int rows, int cols)
      Reshapes the DataFrame to the specified dimensions.
      Parameters:
      rows - Number of rows in reshaped DataFrame
      cols - Number of columns in reshaped DataFrame
      Returns:
      Reshaped DataFrame
    • quickAnalysis

      DataFrame quickAnalysis(String... columns)
      Performs quick exploratory data analysis on specified columns.
      Parameters:
      columns - Columns to analyze
      Returns:
      DataFrame containing analysis results including basic statistics, distribution information, and potential anomalies
    • fullReport

      Map<String,Object> fullReport(String... columns)
      Generates a comprehensive analysis report for specified columns.
      Parameters:
      columns - Columns to analyze
      Returns:
      Map containing detailed analysis results including statistical tests, visualizations, and data quality metrics
    • timeAnalysis

      DataFrame timeAnalysis(String dateCol, String valueCol)
      Performs time-based analysis on a datetime column and corresponding value column.
      Parameters:
      dateCol - Column containing datetime values
      valueCol - Column containing values to analyze
      Returns:
      DataFrame with time-based analysis results including trends, seasonality, and temporal patterns
    • correlation

      DataFrame correlation(String... columns)
      Calculates correlation matrix for specified columns.
      Parameters:
      columns - Columns to include in correlation analysis
      Returns:
      DataFrame containing correlation matrix with correlation coefficients between all pairs of specified columns
    • missingAnalysis

      DataFrame missingAnalysis()
      Analyzes missing values in the DataFrame.
      Returns:
      DataFrame containing missing value analysis including counts, percentages, and patterns of missing data
    • outlierAnalysis

      DataFrame outlierAnalysis(String... columns)
      Identifies and analyzes outliers in specified columns.
      Parameters:
      columns - Columns to check for outliers
      Returns:
      DataFrame containing outlier analysis results including identified outliers, their impact, and statistical justification