📊 Custom Metrics

MAT-HPO provides a flexible parameter-based interface that allows you to customize metrics tracking and reward computation without modifying the library code. This feature is especially useful for:

  • Time Series Forecasting: Track MASE, SMAPE, MAE, RMSE instead of F1/AUC/G-mean
  • Regression Tasks: Use MSE, R², RMSE, MAE as metrics
  • Custom Domains: Define any metrics specific to your problem
  • Custom Rewards: Implement complex reward functions based on multiple objectives

✅ Key Benefits

  • No library code modification needed
  • Support for arbitrary number of metrics (not limited to 3)
  • Preserves original metric values for proper evaluation
  • Backward compatible with existing F1/AUC/G-mean interface
  • Domain-agnostic design

Three Core Components

1️⃣ BaseEnvironment Parameters

Configure metrics and rewards in your environment:

  • custom_metrics: List of metrics to track
  • metric_names_mapping: Display name mapping
  • reward_function: Custom reward logic

2️⃣ HPOLogger Configuration

Enhanced logging with custom metrics:

  • metrics_extractor: Extract metrics from hyperparams
  • metric_names: Custom display names
  • Automatic separation of original vs. transformed values

3️⃣ Automatic Integration

Everything works seamlessly:

  • Optimizer auto-detects environment config
  • Logger inherits metric settings
  • Flexible storage in best_hyperparams.json

Complete Time Series Example

Step 1: Define Custom Functions

python
import numpy as np

# Define custom reward function
def timeseries_reward(metrics: dict) -> float:
    """Reward based on training loss (avoid data leakage)"""
    train_loss = metrics.get('train_loss', 1.0)
    if train_loss < 300.0:
        return 0.9
    elif train_loss < 400.0:
        return 0.7
    return 0.3

def extract_timeseries_metrics(hyperparams: dict) -> dict:
    """Extract all time series metrics from hyperparams"""
    return {
        'train_loss': float(hyperparams.get('train_loss', 0.0)),
        'val_loss': float(hyperparams.get('val_loss', 0.0)),
        'mase': float(hyperparams.get('mase', 1.0)),
        'smape': float(hyperparams.get('original_smape', 0.0)),
        'mae': float(hyperparams.get('original_mae', 0.0)),
        'rmse': float(hyperparams.get('original_rmse', 0.0)),
    }

Step 2: Configure Environment

python
from MAT_HPO_LIB import BaseEnvironment

class TimeSeriesEnvironment(BaseEnvironment):
    def __init__(self, model_name, dataset_name):
        super().__init__(
            name=f"TS-{model_name}-{dataset_name}",
            #  Custom metrics list
            custom_metrics=['train_loss', 'val_loss', 'mase', 'smape', 'mae', 'rmse'],
            #  Metric name mapping (for display)
            metric_names_mapping={
                'f1': 'SMAPE',
                'auc': 'MAE',
                'gmean': 'RMSE'
            },
            #  Custom reward function
            reward_function=timeseries_reward
        )

    def train_evaluate(self, model, hyperparams):
        # Train your model...
        train_loss = 331.72
        val_loss = 346.51

        # Evaluate on test set...
        mase = 2.304
        mae = 632.53
        rmse = 809.25
        smape = 0.0618

        # Return all metrics
        return {
            # Original training metrics
            'train_loss': train_loss,
            'val_loss': val_loss,
            'overfitting_ratio': val_loss / train_loss,

            # Original test metrics
            'mase': mase,
            'smape': smape,
            'mae': mae,
            'rmse': rmse,

            # Transformed values for MAT-HPO (higher is better)
            'f1': 0.8 - min(0.8, smape / 2.0),
            'auc': 0.8 - min(0.8, mae / 1000.0),
            'gmean': 0.8 - min(0.8, rmse / 1000.0),

            # Save original values
            'original_smape': smape,
            'original_mae': mae,
            'original_rmse': rmse
        }

    def compute_reward(self, metrics):
        # Use custom reward function
        if self.custom_reward_function:
            return self.custom_reward_function(metrics)
        return 0.5

Step 3: Create Logger and Optimizer

python
from MAT_HPO_LIB import MAT_HPO_Optimizer, HyperparameterSpace
from MAT_HPO_LIB.utils import DefaultConfigs
from MAT_HPO_LIB.utils.logger import HPOLogger

# Create environment
env = TimeSeriesEnvironment("dlinear", "us_births")

# Create hyperparameter space
space = HyperparameterSpace()
space.add_continuous('learning_rate', 1e-5, 1e-2, agent=0)
space.add_discrete('batch_size', [8, 16, 32, 64], agent=0)

# Create config
config = DefaultConfigs.standard()
config.max_steps = 100

# Create optimizer (automatically inherits environment's custom metrics)
optimizer = MAT_HPO_Optimizer(env, space, config)

# Run optimization
results = optimizer.optimize()

print(f"Best reward: {results['best_performance']['reward']:.4f}")

Pro Tip

The optimizer automatically detects and uses the custom metrics configuration from your environment. No manual logger setup needed!

Output Format

best_hyperparams.json

json
{
  "hyperparameters": {
    "learning_rate": 0.001,
    "batch_size": 32
  },
  "performance": {
    "smape": 0.0618,
    "mae": 632.53,
    "rmse": 809.25,
    "mase": 2.304,
    "train_loss": 331.72,
    "val_loss": 346.51,
    "overfitting_ratio": 1.045,
    "reward": 0.7512
  },
  "step": 42
}

step_log.jsonl (each line)

json
{
  "step": 0,
  "timestamp": "2025-10-03T08:00:00",
  "metrics": {
    "train_loss": 331.72,
    "val_loss": 346.51,
    "overfitting_ratio": 1.045,
    "mase": 2.304,
    "smape": 0.0618,
    "mae": 632.53,
    "rmse": 809.25,
    "f1_transformed": 0.7691,
    "auc_transformed": 0.1675,
    "gmean_transformed": 0.1000
  },
  "timing": {...},
  "hyperparameters": {...}
}

Metric Separation

The logger automatically separates:

  • Original values: smape, mae, rmse - True metric values for evaluation
  • Transformed values: *_transformed - Used internally by MAT-HPO for optimization
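
The split is purely name-based and can be reproduced in a few lines, assuming the `_transformed` suffix convention shown in the log above; `split_metrics` is an illustrative helper, not part of the library:

```python
def split_metrics(metrics):
    """Separate original metric values from MAT-HPO's transformed ones.

    Assumes the '_transformed' suffix convention used in step_log.jsonl;
    this helper is illustrative, not part of the library.
    """
    original = {k: v for k, v in metrics.items() if not k.endswith('_transformed')}
    transformed = {k: v for k, v in metrics.items() if k.endswith('_transformed')}
    return original, transformed

step = {'smape': 0.0618, 'mae': 632.53, 'f1_transformed': 0.7691}
original, transformed = split_metrics(step)
# 'original' keeps the true evaluation values; 'transformed' holds the optimizer's view
```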

Advanced Usage

Manual Logger Configuration

For full control, you can manually configure the logger:

python
from MAT_HPO_LIB.utils.logger import HPOLogger

# Create custom logger
logger = HPOLogger(
    output_dir='./results',
    metric_names={'f1': 'SMAPE', 'auc': 'MAE', 'gmean': 'RMSE'},
    custom_metrics=['train_loss', 'val_loss', 'mase', 'smape', 'mae', 'rmse'],
    metrics_extractor=extract_timeseries_metrics
)

# Create optimizer
optimizer = MAT_HPO_Optimizer(env, space, config)
optimizer.logger = logger  # Override default logger

# Run optimization
results = optimizer.optimize()

Flexible Metric Count

Track as many metrics as you need:

python
super().__init__(
    name="MyEnv",
    custom_metrics=[
        'train_loss', 'val_loss', 'test_loss',
        'mase', 'smape', 'mae', 'rmse', 'mape', 'mse',
        'overfitting_ratio', 'training_time', 'inference_time'
    ],
    # ... other parameters
)

Complex Reward Functions

python
def sophisticated_reward(metrics: dict) -> float:
    """Multi-objective reward combining accuracy and efficiency"""
    # Accuracy component (70%): bounded so the weighting actually holds
    mase = metrics.get('mase', 10.0)
    accuracy_reward = 0.7 * min(1.0, 1.0 / max(mase, 0.1))

    # Efficiency component (20%)
    train_time = metrics.get('training_time', 1000)
    efficiency_reward = 0.2 * (1.0 / max(train_time / 100, 1.0))

    # Stability component (10%)
    overfitting = metrics.get('overfitting_ratio', 2.0)
    stability_reward = 0.1 * (1.0 if overfitting <= 1.2 else 0.5)

    total_reward = accuracy_reward + efficiency_reward + stability_reward
    return max(0.0, min(1.0, total_reward))

Use Cases

Time Series Forecasting

  • Metrics: MASE, SMAPE, MAE, RMSE
  • Reward: Based on validation loss
  • Track overfitting ratio

Regression

  • Metrics: MSE, RMSE, MAE, R²
  • Reward: Inverse of validation MSE
  • Track prediction intervals
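
The "inverse of validation MSE" reward above can be sketched in a few lines; note that `val_mse` is a hypothetical key name, so substitute whatever key your `train_evaluate()` actually returns:

```python
def regression_reward(metrics):
    """Reward as the inverse of validation MSE, bounded to [0, 1].

    'val_mse' is a hypothetical key name: substitute whatever key
    your train_evaluate() uses for the validation MSE.
    """
    val_mse = metrics.get('val_mse', float('inf'))
    return max(0.0, min(1.0, 1.0 / (1.0 + val_mse)))

# val_mse = 0 gives reward 1.0; large val_mse drives the reward toward 0.0
```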

Classification (Default)

  • Metrics: F1, AUC, G-mean, Precision, Recall
  • Reward: Weighted combination
  • Track per-class metrics

Multi-Objective

  • Metrics: Accuracy + Speed + Memory
  • Reward: Pareto optimization
  • Track resource usage
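
A simple scalarized version of such a multi-objective reward might look like the following; the weights and the metric keys (`accuracy`, `inference_time`, `peak_memory_mb`) are illustrative assumptions, and a true Pareto approach would keep the objectives separate rather than summing them:

```python
def multi_objective_reward(metrics):
    """Scalarized multi-objective reward: accuracy vs. speed vs. memory.

    A weighted sum of normalized components, not a full Pareto front;
    all metric keys and weights here are illustrative.
    """
    accuracy = metrics.get('accuracy', 0.0)                          # already in [0, 1]
    speed = 1.0 / (1.0 + metrics.get('inference_time', 1.0))         # seconds -> (0, 1]
    memory = 1.0 / (1.0 + metrics.get('peak_memory_mb', 100.0) / 1000.0)
    return 0.6 * accuracy + 0.25 * speed + 0.15 * memory
```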

Best Practices

⚠️ Important Considerations

  1. Avoid Data Leakage: Use training/validation metrics for rewards, not test metrics
  2. Normalize Rewards: Keep rewards in a reasonable range (e.g., 0.0-1.0)
  3. Save Original Values: Always preserve original metrics with original_* prefix
  4. Transform Consistently: Ensure "higher is better" for f1/auc/gmean used in optimization
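
Point 4 can be captured in a small helper; the 0.8 cap and the scale values mirror the transforms used in the time series example earlier, and the helper name itself is hypothetical:

```python
def to_higher_is_better(value, scale, cap=0.8):
    """Map a lower-is-better metric (SMAPE, MAE, RMSE, ...) onto a
    higher-is-better score in [0, cap], matching the f1/auc/gmean
    transforms in the time series example."""
    return cap - min(cap, value / scale)

to_higher_is_better(0.0618, 2.0)   # smape -> ~0.7691 (the 'f1' transform)
to_higher_is_better(632.53, 1000)  # mae   -> ~0.1675 (the 'auc' transform)
```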

✅ Recommended Patterns

python
# ✅ Good: Save both original and transformed
return {
    'mase': 2.304,                    # Original value
    'f1': 0.8 - min(0.8, mase / 5),  # Transformed (higher is better)
    'original_mase': 2.304            # Explicit original backup
}

# ❌ Bad: Only transformed values
return {
    'f1': 0.8 - min(0.8, mase / 5)   # Lost original information!
}

API Reference

BaseEnvironment Parameters

Parameter              Type                      Description
custom_metrics         List[str]                 Custom metric names to track (e.g., ['mase', 'smape', 'mae'])
metric_names_mapping   Dict[str, str]            Map internal names to display names (e.g., {'f1': 'SMAPE', 'auc': 'MAE'})
reward_function        Callable[[Dict], float]   Custom reward function: takes the metrics dict, returns a float

HPOLogger Parameters

Parameter           Type                      Description
metrics_extractor   Callable[[Dict], Dict]    Function that extracts metric values from the hyperparams dictionary
metric_names        Dict[str, str]            Custom metric display names for console output
custom_metrics      List[str]                 List of metrics to track in logs

More Examples

For complete runnable examples, see: