DataAnalyzer

Documentation for DataAnalyzer.

Installation

using Pkg
Pkg.add("DataAnalyzer")

Usage

# Import the package
using DataAnalyzer

# Create configuration
config = AnalysisConfig(
    variability_threshold = 2.0,
    output_path = "results.png"
)

# Process data
data = get_data("data.csv")
processed = process_data(data)
results = analyze_results(processed, config)
plot_analysis(results, config)

DataAnalyzer.AnalysisConfig
DataAnalyzer.AnalysisResults
DataAnalyzer.analyze_results
DataAnalyzer.get_data
DataAnalyzer.plot_analysis
DataAnalyzer.process_data

API Reference

Types

DataAnalyzer.AnalysisConfig — Type

struct AnalysisConfig

Configuration for data analysis.

Fields

variability_threshold::Float64: Threshold for variability, must be positive.
output_path::String: Path of the file where plot_analysis saves the generated plot.

Constructor

AnalysisConfig(; variability_threshold=2.0, output_path="analysis_results.png")
- variability_threshold: Threshold for variability; must be positive. Defaults to 2.0.
- output_path: Path of the file where the plot is saved. Defaults to "analysis_results.png".
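
The positivity requirement on variability_threshold could be enforced in an inner constructor. The following is a minimal sketch under that assumption, not the package's actual source: SketchConfig is a hypothetical stand-in for AnalysisConfig, and the choice of ArgumentError for an invalid threshold is also an assumption.

```julia
# Hypothetical stand-in for AnalysisConfig; the real implementation may differ.
struct SketchConfig
    variability_threshold::Float64
    output_path::String

    # Inner constructor: reject non-positive thresholds up front.
    function SketchConfig(variability_threshold::Real, output_path::AbstractString)
        variability_threshold > 0 ||
            throw(ArgumentError("variability_threshold must be positive"))
        return new(Float64(variability_threshold), String(output_path))
    end
end

# Keyword constructor using the documented defaults.
SketchConfig(; variability_threshold = 2.0, output_path = "analysis_results.png") =
    SketchConfig(variability_threshold, output_path)

default_config = SketchConfig()
custom_config = SketchConfig(variability_threshold = 1.5, output_path = "results.png")
```

Validating in the constructor means an invalid threshold fails at configuration time rather than partway through an analysis.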

DataAnalyzer.AnalysisResults — Type

struct AnalysisResults

Results of the data analysis.

Fields

high_variance_categories::DataFrame: DataFrame containing categories with high variance.
total_samples::Int: Total number of samples analyzed.
anomaly_rate::Float64: Rate of anomalies detected in the data.

Constructor

AnalysisResults(high_variance_categories::DataFrame, total_samples::Int, anomaly_rate::Float64)
- high_variance_categories: DataFrame containing categories with high variance.
- total_samples: Total number of samples analyzed.
- anomaly_rate: Rate of anomalies detected in the data.

Functions

DataAnalyzer.get_data — Function

get_data(path::AbstractString) -> DataFrame

Reads data from a CSV file and returns it as a DataFrame.

Arguments

path::AbstractString: The path to the CSV file.

Returns

DataFrame: The data read from the CSV file.

Throws

ArgumentError: If the file does not exist.
ErrorException: If there is an error reading the data.
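
The two documented failure modes can be illustrated with a small reader built on CSV.jl and DataFrames.jl. This is only a sketch of behavior consistent with the description above; read_csv_sketch is a hypothetical name, and the package's own get_data may be implemented differently.

```julia
using CSV
using DataFrames

# Hypothetical equivalent of get_data; the package's implementation may differ.
function read_csv_sketch(path::AbstractString)
    # A missing file is reported as an ArgumentError, as documented above.
    isfile(path) || throw(ArgumentError("file not found: $path"))
    try
        return CSV.read(path, DataFrame)
    catch err
        # error() throws an ErrorException that wraps the underlying failure.
        error("failed to read $path: $(sprint(showerror, err))")
    end
end

# Usage: write a small temporary CSV file and read it back.
path = tempname() * ".csv"
write(path, "category,value\na,1.0\nb,2.5\n")
df = read_csv_sketch(path)
```

Checking isfile before parsing keeps the "file does not exist" case distinct from parse errors, so callers can handle the two separately.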

DataAnalyzer.process_data — Function

process_data(data::DataFrame) -> DataFrame

Processes the input DataFrame by validating required columns, removing missing values, and calculating statistics for each category.

Arguments

data::DataFrame: The input data containing at least 'category' and 'value' columns.

Returns

DataFrame: A DataFrame with calculated statistics including average value, standard deviation, and count for each category.

Throws

ArgumentError: If required columns are missing.
ErrorException: If there is an error in computing statistics.
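
The three steps described above (validate columns, drop missing values, aggregate per category) can be sketched with DataFrames.jl as follows. The function name summarize_sketch and the output column names avg_value, std_value, and count are assumptions made for illustration; the documentation does not state the actual column names.

```julia
using DataFrames
using Statistics

# Hypothetical equivalent of process_data; output column names are assumptions.
function summarize_sketch(data::DataFrame)
    required = ["category", "value"]
    missing_cols = setdiff(required, names(data))
    isempty(missing_cols) ||
        throw(ArgumentError("missing required columns: $(join(missing_cols, ", "))"))

    # Drop rows with a missing category or value, then aggregate per category.
    clean = dropmissing(data, [:category, :value])
    return combine(
        groupby(clean, :category),
        :value => mean => :avg_value,
        :value => std => :std_value,
        nrow => :count,
    )
end

raw = DataFrame(category = ["a", "a", "b", "b", "b"],
                value = [1.0, 3.0, 2.0, missing, 4.0])
stats = summarize_sketch(raw)
```

Note that std returns NaN for a category left with a single sample, so downstream code comparing the standard deviation against a threshold should expect that case.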

DataAnalyzer.analyze_results — Function

analyze_results(processed_data::DataFrame, config::AnalysisConfig=AnalysisConfig()) -> AnalysisResults

Analyzes the processed data to identify categories with high variability and calculates summary metrics (total sample count and anomaly rate).

Arguments

processed_data::DataFrame: The DataFrame of per-category statistics, as returned by process_data.
config::AnalysisConfig: Configuration for the analysis, including variability threshold.

Returns

AnalysisResults: An object containing the high-variance categories, the total number of samples, and the anomaly rate.

Throws

ErrorException: If there is an error during the analysis.
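
The documentation does not define how "high variability" or the anomaly rate is computed. The sketch below therefore rests on two explicit assumptions: a category is flagged when its standard deviation exceeds the threshold, and the anomaly rate is the share of all samples that belong to flagged categories. The name analyze_sketch and the column names std_value and count are likewise hypothetical.

```julia
using DataFrames

# Assumed column names (std_value, count) and assumed metric definitions.
function analyze_sketch(stats::DataFrame, threshold::Real)
    # Assumption: "high variability" means a standard deviation above the threshold.
    # NaN compares false, so single-sample categories are never flagged.
    flagged = filter(:std_value => s -> s > threshold, stats)
    total_samples = sum(stats.count)
    # Assumption: anomaly rate = share of samples that fall in flagged categories.
    anomaly_rate = total_samples == 0 ? 0.0 : sum(flagged.count) / total_samples
    return (high_variance_categories = flagged,
            total_samples = total_samples,
            anomaly_rate = anomaly_rate)
end

stats = DataFrame(category = ["a", "b", "c"],
                  std_value = [0.5, 3.0, 2.5],
                  count = [10, 5, 5])
result = analyze_sketch(stats, 2.0)
```

The sketch returns a named tuple with the same three fields as AnalysisResults, which keeps it independent of the package while mirroring the documented result shape.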

DataAnalyzer.plot_analysis — Function

plot_analysis(results::AnalysisResults, config::AnalysisConfig=AnalysisConfig()) -> Plot

Generates a bar plot for categories with high variability and saves it to a file.

Arguments

results::AnalysisResults: The results of the analysis, including the high-variance categories to plot.
config::AnalysisConfig: Configuration for the analysis, including output path for the plot.

Returns

Plot: The generated bar plot.

Throws

ErrorException: If there is an error during the plotting process.
