Package com.leumanuel.woozydata.model
Class DataFrame
java.lang.Object
com.leumanuel.woozydata.model.DataFrame
Core class for data manipulation and analysis.
Provides a fluent interface for common data operations.
Version: 1.0
Author: Leu A. Manuel
Constructor Summary
DataFrame: Creates a new DataFrame with the given data.
Method Summary
Method: Description
clean(): Performs automatic data cleaning operations.
dropDupes: Removes duplicate rows from the DataFrame.
dropNull(): Removes rows containing null values.
fill: Fills null values with a specified value.
fillNa: Fills null values in specified columns.
fixTypes(): Automatically converts data types based on content.
getData(): Returns the underlying data structure of the DataFrame.
rank: Ranks (sorts) the DataFrame based on specified columns.
select: Creates a new DataFrame containing only the specified columns.
void show(int limit): Displays the first n rows of the DataFrame.
Constructor Details
DataFrame
Creates a new DataFrame with the given data.
Parameters:
data - List of maps representing tabular data
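The input shape described above can be sketched with standard collections. This is a minimal sketch that assumes the list of maps is a `List<Map<String, Object>>` keyed by column name; the exact parameter type is not stated on this page, and `RowsSketch`, `row`, and `sampleRows` are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RowsSketch {
    // Builds one row. LinkedHashMap keeps column order and, unlike Map.of,
    // accepts null values, which stand in for missing data here.
    static Map<String, Object> row(String name, Integer age) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("age", age);
        return r;
    }

    // Hypothetical sample data: one row per map, column names as keys.
    static List<Map<String, Object>> sampleRows() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("Ana", 31));
        rows.add(row("Ben", null));
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = sampleRows();
        System.out.println(rows);
        // Assumed usage, given the constructor described above:
        // DataFrame df = new DataFrame(rows);
    }
}
```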
Method Details
clean
Performs automatic data cleaning operations. Includes null removal, duplicate removal, and type fixing.
Returns:
this DataFrame for method chaining
dropNull
Removes rows containing null values.
Returns:
this DataFrame for method chaining
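To make the row-level effect of null removal concrete, here is a standalone sketch that does not use the library. It assumes rows are maps from column name to value and that a row is dropped when any of its values is null; `DropNullSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class DropNullSketch {
    // Keeps only the rows in which every column holds a non-null value.
    static List<Map<String, Object>> dropNullRows(List<Map<String, Object>> rows) {
        List<Map<String, Object>> kept = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            boolean hasNull = row.values().stream().anyMatch(Objects::isNull);
            if (!hasNull) {
                kept.add(row);
            }
        }
        return kept;
    }

    static Map<String, Object> row(String name, Integer age) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("age", age);
        return r;
    }

    // Hypothetical sample data: the second row has a null in "age".
    static List<Map<String, Object>> sample() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("Ana", 31));
        rows.add(row("Ben", null));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(dropNullRows(sample()));
    }
}
```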
dropDupes
Removes duplicate rows from the DataFrame.
Returns:
this DataFrame for method chaining
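One common way to remove duplicate rows from a list of maps is to pass them through an insertion-ordered set, since `Map.equals` compares contents. The sketch below illustrates that idea only. Whether the library keeps the first occurrence, or how it defines equality, is not stated on this page, and `DropDupesSketch` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DropDupesSketch {
    // Removes rows whose contents equal an earlier row, keeping the first
    // occurrence and the original order.
    static List<Map<String, Object>> dropDuplicateRows(List<Map<String, Object>> rows) {
        Set<Map<String, Object>> unique = new LinkedHashSet<>(rows);
        return new ArrayList<>(unique);
    }

    static Map<String, Object> row(int id, String city) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("city", city);
        return r;
    }

    // Hypothetical sample data: the first two rows are identical.
    static List<Map<String, Object>> sample() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row(1, "Lima"));
        rows.add(row(1, "Lima"));
        rows.add(row(2, "Oslo"));
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(dropDuplicateRows(sample()));
    }
}
```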
fixTypes
Automatically converts data types based on content.
Returns:
this DataFrame for method chaining
fill
Fills null values with a specified value.
Parameters:
value - Value to replace nulls with
Returns:
this DataFrame for method chaining
fillNa
Fills null values in specified columns.
Parameters:
value - Value to replace nulls with
columns - Columns to fill
Returns:
this DataFrame for method chaining
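The column-restricted fill described above can be illustrated without the library. This sketch assumes that only keys already present with a null value are replaced and that absent columns are left untouched; the page does not say how absent columns are handled. `FillNaSketch` and `fillNulls` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FillNaSketch {
    // Replaces null values with `value`, but only in the listed columns.
    // Assumption: a column missing from a row is not added.
    static void fillNulls(List<Map<String, Object>> rows, Object value, String... columns) {
        for (Map<String, Object> row : rows) {
            for (String column : columns) {
                if (row.containsKey(column) && row.get(column) == null) {
                    row.put(column, value);
                }
            }
        }
    }

    // Hypothetical sample data: a single row with nulls in both columns.
    static List<Map<String, Object>> sample() {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", null);
        r.put("age", null);
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(r);
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = sample();
        fillNulls(rows, 0, "age"); // "name" is not listed, so it stays null
        System.out.println(rows);
    }
}
```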
rank
Ranks (sorts) the DataFrame by the specified columns in descending order. Null values are treated as the lowest values, so they sort last. For large datasets (over 1000 rows), parallel processing is used for better performance.
Parameters:
columns - Columns to use for ranking, in order of priority
Returns:
this DataFrame for method chaining
Throws:
IllegalArgumentException - if no columns are specified
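The ordering rule described above (descending, nulls lowest, columns compared in priority order) can be sketched as a plain comparator. This is an illustration under stated assumptions, not the library's implementation: values in a column are assumed to be mutually comparable, the parallel path for large datasets is not reproduced, and `RankSketch`, `rankRows`, and `compareDescending` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class RankSketch {
    // Descending comparison in which null counts as the lowest value,
    // so nulls end up after every non-null value.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static int compareDescending(Object a, Object b) {
        if (a == null && b == null) return 0;
        if (a == null) return 1;
        if (b == null) return -1;
        return ((Comparable) b).compareTo(a); // swapped arguments give descending order
    }

    // Sorts rows in place; later columns only break ties left by earlier ones.
    static void rankRows(List<Map<String, Object>> rows, String... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("no columns specified");
        }
        rows.sort((r1, r2) -> {
            for (String column : columns) {
                int c = compareDescending(r1.get(column), r2.get(column));
                if (c != 0) return c;
            }
            return 0;
        });
    }

    static Map<String, Object> row(String name, Integer score) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("score", score);
        return r;
    }

    // Hypothetical sample data with one null score.
    static List<Map<String, Object>> sample() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("a", 10));
        rows.add(row("b", null));
        rows.add(row("c", 30));
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> rows = sample();
        rankRows(rows, "score");
        System.out.println(rows);
    }
}
```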
select
Creates a new DataFrame containing only the specified columns. The original row order is preserved. Specified columns that do not exist are ignored.
Parameters:
columns - Names of columns to select
Returns:
new DataFrame containing only the selected columns
Throws:
IllegalArgumentException - if no columns are specified
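Column selection as described above amounts to projecting each row map onto a subset of its keys. The following standalone sketch assumes rows are maps keyed by column name and that a new list is built rather than modifying the input; `SelectSketch` and `selectColumns` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SelectSketch {
    // Builds new rows holding only the requested columns, in row order.
    // Columns that a row does not contain are skipped.
    static List<Map<String, Object>> selectColumns(List<Map<String, Object>> rows, String... columns) {
        if (columns.length == 0) {
            throw new IllegalArgumentException("no columns specified");
        }
        List<Map<String, Object>> out = new ArrayList<>();
        for (Map<String, Object> row : rows) {
            Map<String, Object> projected = new LinkedHashMap<>();
            for (String column : columns) {
                if (row.containsKey(column)) {
                    projected.put(column, row.get(column));
                }
            }
            out.add(projected);
        }
        return out;
    }

    static Map<String, Object> row(String name, int age, String city) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("age", age);
        r.put("city", city);
        return r;
    }

    // Hypothetical sample data.
    static List<Map<String, Object>> sample() {
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("Ana", 31, "Lima"));
        rows.add(row("Ben", 27, "Oslo"));
        return rows;
    }

    public static void main(String[] args) {
        // "missing" does not exist in any row, so it is ignored
        System.out.println(selectColumns(sample(), "name", "missing"));
    }
}
```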
show
public void show(int limit)
Displays the first n rows of the DataFrame, where n is the value of limit. Useful for quickly inspecting the data content.
Parameters:
limit - Number of rows to display
Throws:
IllegalArgumentException - if limit is negative
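The limit contract described above can be sketched on a plain list. This is an assumption-laden illustration: the page does not say what happens when limit exceeds the row count, so the sketch simply clamps to the available rows. `ShowSketch` and `firstRows` are hypothetical names, and the output format of the real method is not reproduced.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ShowSketch {
    // Returns a view of the first `limit` rows, rejecting negative limits.
    // Assumption: a limit larger than the row count yields all rows.
    static List<Map<String, Object>> firstRows(List<Map<String, Object>> rows, int limit) {
        if (limit < 0) {
            throw new IllegalArgumentException("limit must not be negative");
        }
        return rows.subList(0, Math.min(limit, rows.size()));
    }

    // Hypothetical sample data: two single-column rows.
    static List<Map<String, Object>> sample() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String name : new String[] {"Ana", "Ben"}) {
            Map<String, Object> r = new LinkedHashMap<>();
            r.put("name", name);
            rows.add(r);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (Map<String, Object> row : firstRows(sample(), 1)) {
            System.out.println(row);
        }
    }
}
```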
getData
Returns the underlying data structure of the DataFrame. Each element in the list represents a row, with column names mapped to values. The returned list is a direct reference to the DataFrame's data.
Note: Modifying the returned list will affect the DataFrame's content. For a safe copy, clone the data before modifying.
Returns:
List of Maps containing the DataFrame's data
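Because the returned list is a direct reference, copying only the list would still share the row maps. The sketch below shows one way to copy both levels, assuming the data is a `List<Map<String, Object>>` (the exact type is not stated on this page). The values themselves are not cloned, and `CopySketch` and `copyRows` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class CopySketch {
    // Copies the list and each row map, so adding, removing, or editing
    // rows in the copy leaves the source rows untouched.
    static List<Map<String, Object>> copyRows(List<Map<String, Object>> rows) {
        List<Map<String, Object>> copy = new ArrayList<>(rows.size());
        for (Map<String, Object> row : rows) {
            copy.add(new LinkedHashMap<>(row));
        }
        return copy;
    }

    // Hypothetical sample data standing in for the list returned by getData().
    static List<Map<String, Object>> sample() {
        List<Map<String, Object>> rows = new ArrayList<>();
        for (String name : new String[] {"Ana", "Ben"}) {
            Map<String, Object> r = new LinkedHashMap<>();
            r.put("name", name);
            rows.add(r);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> original = sample();
        List<Map<String, Object>> copy = copyRows(original);
        copy.get(0).put("name", "changed"); // edits the copy only
        System.out.println(original);
        System.out.println(copy);
    }
}
```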