DataAnalyzer
Documentation for DataAnalyzer.
Installation
using Pkg
Pkg.add("DataAnalyzer")
Usage
# Import the package
using DataAnalyzer
# Create configuration
config = AnalysisConfig(
    variability_threshold = 2.0,
    output_path = "results.png"
)
# Process data
data = get_data("data.csv")
processed = process_data(data)
results = analyze_results(processed, config)
plot_analysis(results, config)
DataAnalyzer.AnalysisConfig
DataAnalyzer.AnalysisResults
DataAnalyzer.analyze_results
DataAnalyzer.get_data
DataAnalyzer.plot_analysis
DataAnalyzer.process_data
API Reference
Types
DataAnalyzer.AnalysisConfig — Type
struct AnalysisConfig
Configuration for data analysis.
Fields
variability_threshold::Float64: Threshold for variability, must be positive.
output_path::String: Path to save the analysis results.
Constructor
AnalysisConfig(; variability_threshold=2.0, output_path="analysis_results.png")
variability_threshold: Threshold for variability, default is 2.0.
output_path: Path to save the analysis results, default is "analysis_results.png".
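The keyword constructor and the "must be positive" constraint can be illustrated with a small stand-in type. This is a minimal sketch, not the package's source: the name SketchConfig is hypothetical, and throwing an ArgumentError for a non-positive threshold is an assumption, since the reference does not say how the constraint is enforced.

```julia
# Hypothetical stand-in for AnalysisConfig; the real implementation may differ.
struct SketchConfig
    variability_threshold::Float64
    output_path::String

    # Keyword-only constructor using the documented defaults.
    function SketchConfig(; variability_threshold = 2.0,
                          output_path = "analysis_results.png")
        # Assumption: a non-positive threshold is rejected with an ArgumentError.
        variability_threshold > 0 ||
            throw(ArgumentError("variability_threshold must be positive"))
        return new(variability_threshold, output_path)
    end
end

# All defaults.
default_config = SketchConfig()

# Override both fields by keyword.
custom_config = SketchConfig(variability_threshold = 1.5, output_path = "results.png")
```

Because the constructor is keyword-only, a caller can override either field without repeating the other.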
DataAnalyzer.AnalysisResults — Type
struct AnalysisResults
Results of the data analysis.
Fields
high_variance_categories::DataFrame: DataFrame containing categories with high variance.
total_samples::Int: Total number of samples analyzed.
anomaly_rate::Float64: Rate of anomalies detected in the data.
Constructor
AnalysisResults(high_variance_categories::DataFrame, total_samples::Int, anomaly_rate::Float64)
high_variance_categories: DataFrame containing categories with high variance.
total_samples: Total number of samples analyzed.
anomaly_rate: Rate of anomalies detected in the data.
Functions
DataAnalyzer.get_data — Function
get_data(path::AbstractString) -> DataFrame
Reads data from a CSV file and returns it as a DataFrame.
Arguments
path::AbstractString: The path to the CSV file.
Returns
DataFrame: The data read from the CSV file.
Throws
ArgumentError: If the file does not exist.
ErrorException: If there is an error reading the data.
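The error contract above can be sketched with CSV.jl and DataFrames.jl. This is a hedged illustration only: the function name read_csv_sketch is hypothetical, and using CSV.read as the reader, as well as wrapping parse failures in an ErrorException, are assumptions about how such a function could be written.

```julia
using CSV
using DataFrames

# Hypothetical re-creation of the documented contract; not the package's code.
function read_csv_sketch(path::AbstractString)
    # A missing file is reported as an ArgumentError.
    isfile(path) || throw(ArgumentError("file not found: $path"))
    try
        return CSV.read(path, DataFrame)
    catch err
        # Assumption: any read failure is re-raised as an ErrorException.
        error("failed to read $path: $(sprint(showerror, err))")
    end
end

# Usage: write a small CSV to a temporary file and read it back.
tmp = tempname() * ".csv"
write(tmp, "category,value\na,1.0\na,3.0\nb,2.0\n")
df = read_csv_sketch(tmp)
```

Checking isfile before parsing keeps the "file does not exist" case distinct from parsing errors, which matches the two exception types listed.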
DataAnalyzer.process_data — Function
process_data(data::DataFrame) -> DataFrame
Processes the input DataFrame by validating required columns, removing missing values, and calculating statistics for each category.
Arguments
data::DataFrame: The input data containing at least 'category' and 'value' columns.
Returns
DataFrame: A DataFrame with calculated statistics including average value, standard deviation, and count for each category.
Throws
ArgumentError: If required columns are missing.
ErrorException: If there is an error in computing statistics.
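The three steps described for process_data (validate columns, drop missing values, compute per-category statistics) can be sketched with DataFrames.jl. The function name summarize_sketch and the output column names avg_value, std_value, and count are assumptions made for illustration; the reference does not name the output columns.

```julia
using DataFrames
using Statistics

# Hypothetical sketch of the documented steps; not the package's code.
function summarize_sketch(data::DataFrame)
    # 1. Validate that the required columns are present.
    required = ["category", "value"]
    missing_cols = setdiff(required, names(data))
    isempty(missing_cols) ||
        throw(ArgumentError("missing required columns: $(join(missing_cols, ", "))"))

    # 2. Drop rows with a missing category or value.
    clean = dropmissing(data, [:category, :value])

    # 3. Compute mean, standard deviation, and row count per category.
    return combine(groupby(clean, :category),
                   :value => mean => :avg_value,
                   :value => std => :std_value,
                   nrow => :count)
end

raw = DataFrame(category = ["a", "a", "b", "b", "b"],
                value = [1.0, 3.0, 2.0, missing, 2.0])
stats = summarize_sketch(raw)
```

In this sketch a category with a single remaining sample gets a std_value of NaN, because the sample standard deviation is undefined for one observation.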
DataAnalyzer.analyze_results — Function
analyze_results(processed_data::DataFrame, config::AnalysisConfig=AnalysisConfig()) -> AnalysisResults
Analyzes the processed data to identify categories with high variability and calculates metrics.
Arguments
processed_data::DataFrame: The DataFrame containing processed data with statistics.
config::AnalysisConfig: Configuration for the analysis, including variability threshold.
Returns
AnalysisResults: An object containing high variability categories, total samples, and anomaly rate.
Throws
ErrorException: If there is an error during the analysis.
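The reference does not define how "high variability" or the anomaly rate is computed, so the following is only one plausible reading, stated as assumptions: a category is flagged when its standard deviation exceeds the threshold, and the anomaly rate is the share of samples that fall in flagged categories. The function name flag_high_variability and the column names std_value and count are hypothetical.

```julia
using DataFrames

# Hypothetical sketch; the package's actual criteria may differ.
function flag_high_variability(stats::DataFrame; threshold::Float64 = 2.0)
    # Assumption: "high variability" means std_value > threshold.
    flagged = filter(:std_value => s -> s > threshold, stats)

    total = sum(stats.count)

    # Assumption: anomaly rate = samples in flagged categories / all samples.
    rate = total == 0 ? 0.0 : sum(flagged.count) / total

    return (high_variance_categories = flagged,
            total_samples = total,
            anomaly_rate = rate)
end

stats = DataFrame(category = ["a", "b", "c"],
                  avg_value = [10.0, 5.0, 7.0],
                  std_value = [3.5, 0.4, 2.1],
                  count = [20, 50, 30])
result = flag_high_variability(stats; threshold = 2.0)
```

The returned named tuple mirrors the three fields of AnalysisResults, so the same field names can be used to inspect it.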
DataAnalyzer.plot_analysis — Function
plot_analysis(results::AnalysisResults, config::AnalysisConfig=AnalysisConfig()) -> Plot
Generates a bar plot for categories with high variability and saves it to a file.
Arguments
results::AnalysisResults: The results of the analysis containing high variability categories.
config::AnalysisConfig: Configuration for the analysis, including output path for the plot.
Returns
Plot: The generated bar plot.
Throws
ErrorException: If there is an error during the plotting process.
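A bar plot saved to a file could look like the sketch below. It assumes Plots.jl as the plotting backend, which the reference does not confirm, and it plots standard deviation per category, which is also an assumption; the sample values and axis labels are illustrative only.

```julia
using Plots

# Illustrative data standing in for the high-variability categories.
categories = ["a", "c"]
std_values = [3.5, 2.1]

# Write to a temporary directory so the example leaves no files behind.
output_path = joinpath(mktempdir(), "analysis_results.png")

# Build the bar plot, then save it to the configured path.
plt = bar(categories, std_values;
          xlabel = "Category",
          ylabel = "Standard deviation",
          title = "High-variability categories",
          legend = false)
savefig(plt, output_path)
```

Returning the plot object in addition to saving it lets a caller display or further customize the figure.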