DataAnalyzer

Documentation for DataAnalyzer.

Installation

using Pkg
Pkg.add("DataAnalyzer")

Usage

# Import the package
using DataAnalyzer

# Create configuration
config = AnalysisConfig(
    variability_threshold = 2.0,
    output_path = "results.png"
)

# Process data
data = get_data("data.csv")
processed = process_data(data)
results = analyze_results(processed, config)
plot_analysis(results, config)

API Reference

Types

DataAnalyzer.AnalysisConfig (Type)
struct AnalysisConfig

Configuration for data analysis.

Fields

  • variability_threshold::Float64: Variability threshold; must be positive.
  • output_path::String: Path to save the analysis results.

Constructor

  • AnalysisConfig(; variability_threshold=2.0, output_path="analysis_results.png")
    • variability_threshold: Threshold for variability, default is 2.0.
    • output_path: Path to save the analysis results, default is "analysis_results.png".
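
The keyword constructor above can be sketched as follows. This is a minimal stand-in, not the package's source: the name AnalysisConfigSketch is hypothetical, and throwing an ArgumentError for a non-positive threshold is an assumption based only on the "must be positive" note.

```julia
# Hypothetical stand-in for AnalysisConfig; the real definition may differ.
struct AnalysisConfigSketch
    variability_threshold::Float64
    output_path::String

    # Keyword-only constructor using the documented defaults.
    # Assumption: a non-positive threshold is rejected with an ArgumentError.
    function AnalysisConfigSketch(; variability_threshold = 2.0,
                                    output_path = "analysis_results.png")
        variability_threshold > 0 ||
            throw(ArgumentError("variability_threshold must be positive"))
        return new(variability_threshold, output_path)
    end
end

# All defaults.
default_cfg = AnalysisConfigSketch()

# Override both fields by keyword.
custom_cfg = AnalysisConfigSketch(variability_threshold = 1.5, output_path = "out.png")
```

Validating inside an inner constructor means that, in this sketch, no instance can exist with an invalid threshold.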

DataAnalyzer.AnalysisResults (Type)
struct AnalysisResults

Results of the data analysis.

Fields

  • high_variance_categories::DataFrame: DataFrame containing categories with high variance.
  • total_samples::Int: Total number of samples analyzed.
  • anomaly_rate::Float64: Rate of anomalies detected in the data.

Constructor

  • AnalysisResults(high_variance_categories::DataFrame, total_samples::Int, anomaly_rate::Float64)
    • high_variance_categories: DataFrame containing categories with high variance.
    • total_samples: Total number of samples analyzed.
    • anomaly_rate: Rate of anomalies detected in the data.

Functions

DataAnalyzer.get_data (Function)
get_data(path::AbstractString) -> DataFrame

Reads data from a CSV file and returns it as a DataFrame.

Arguments

  • path::AbstractString: The path to the CSV file.

Returns

  • DataFrame: The data read from the CSV file.

Throws

  • ArgumentError: If the file does not exist.
  • ErrorException: If there is an error reading the data.

DataAnalyzer.process_data (Function)
process_data(data::DataFrame) -> DataFrame

Processes the input DataFrame by validating required columns, removing missing values, and calculating statistics for each category.

Arguments

  • data::DataFrame: The input data containing at least 'category' and 'value' columns.

Returns

  • DataFrame: A DataFrame with calculated statistics including average value, standard deviation, and count for each category.

Throws

  • ArgumentError: If required columns are missing.
  • ErrorException: If there is an error in computing statistics.
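
The three steps described for process_data (validate columns, drop missing values, compute per-category statistics) might look roughly like the sketch below. It uses DataFrames.jl and the Statistics standard library. The function name process_data_sketch and the output column names avg_value, std_value, and count are assumptions for illustration; the page does not state the actual names.

```julia
using DataFrames
using Statistics

# Hypothetical re-implementation for illustration; not DataAnalyzer's source.
function process_data_sketch(data::DataFrame)
    # Validate the two documented required columns.
    required = ["category", "value"]
    missing_cols = setdiff(required, names(data))
    isempty(missing_cols) ||
        throw(ArgumentError("missing required columns: " * join(missing_cols, ", ")))

    # Drop rows where either required column is missing.
    cleaned = dropmissing(data, [:category, :value])

    # One row per category: mean, standard deviation, and row count.
    # The output column names are assumed, not taken from the package.
    return combine(groupby(cleaned, :category),
                   :value => mean => :avg_value,
                   :value => std => :std_value,
                   nrow => :count)
end

df = DataFrame(category = ["a", "a", "b", "b", "b"],
               value = [1.0, 3.0, 2.0, missing, 4.0])
stats = process_data_sketch(df)
```

Here the row with a missing value is dropped before grouping, so category "b" is summarized from two rows rather than three.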

DataAnalyzer.analyze_results (Function)
analyze_results(processed_data::DataFrame, config::AnalysisConfig=AnalysisConfig()) -> AnalysisResults

Identifies categories with high variability in the processed data and computes the total sample count and the anomaly rate.

Arguments

  • processed_data::DataFrame: The DataFrame containing processed data with statistics.
  • config::AnalysisConfig: Configuration for the analysis, including the variability threshold.

Returns

  • AnalysisResults: An object containing the high-variability categories, the total sample count, and the anomaly rate.

Throws

  • ErrorException: If there is an error during the analysis.
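
This page does not define "high variability" or how anomaly_rate is computed. The sketch below shows one plausible reading, purely as an assumption: a category counts as high-variability when its standard deviation exceeds the threshold, and the anomaly rate is the share of samples that fall in such categories. The function name analyze_sketch and the column names std_value and count are likewise hypothetical.

```julia
using DataFrames

# Illustration only: the selection rule and the anomaly-rate formula are
# assumptions, not DataAnalyzer's documented behavior.
function analyze_sketch(processed::DataFrame; variability_threshold::Float64 = 2.0)
    # Keep categories whose standard deviation exceeds the threshold.
    high = filter(:std_value => s -> s > variability_threshold, processed)

    # Total samples across all categories.
    total = sum(processed.count)

    # Assumed definition: fraction of samples in high-variability categories.
    rate = total == 0 ? 0.0 : sum(high.count) / total

    return (high_variance_categories = high, total_samples = total, anomaly_rate = rate)
end

processed = DataFrame(category = ["a", "b", "c"],
                      std_value = [0.5, 3.0, 2.5],
                      count = [10, 5, 5])
res = analyze_sketch(processed)
```

The sketch returns a named tuple with the same three field names as AnalysisResults so that it runs without the package installed.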

DataAnalyzer.plot_analysis (Function)
plot_analysis(results::AnalysisResults, config::AnalysisConfig=AnalysisConfig()) -> Plot

Generates a bar plot for categories with high variability and saves it to a file.

Arguments

  • results::AnalysisResults: The analysis results, including the high-variability categories to plot.
  • config::AnalysisConfig: Configuration for the analysis, including the output path for the plot.

Returns

  • Plot: The generated bar plot.

Throws

  • ErrorException: If there is an error during the plotting process.