Standalone Usage¶
To run Part-X, we need a black-box function, an oracle function, and the values of various Part-X parameters.
Black-Box Function¶
The black-box function is the function for which we need to find the falsifying behaviors. For example, the non-linear, non-convex Himmelblau's function can be defined as follows:
def test_function(X):
    return (X[0]**2 + X[1] - 11)**2 + (X[1]**2 + X[0] - 7)**2 - 40
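As a quick check of the definition above, Himmelblau's function is 0 at its four global minima (for example at (3, 2)), so the shifted function is negative there:

print(test_function([3.0, 2.0]))   # 0 - 40 = -40.0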
Oracle Function¶
The oracle function defines the feasible and infeasible points that can be sampled by the Part-X algorithm. It takes in a point and returns a value; the constraint is of the form \(f(x) \leq 0\), so a point is feasible when the returned value is at most 0.
Here is an example of a simple constraint:
# if the constraint is X[0]**2 + (X[1]+1)**2 > 0.5, then
def oracle_function(X):
    return -1*(X[0]**2 + (X[1]+1)**2 - 0.5)
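Under this definition, a point is feasible exactly when it lies outside the disk of radius sqrt(0.5) centered at (0, -1). For example:

print(oracle_function([0.0, 0.0]))    # -0.5 <= 0: feasible, since 0**2 + (0+1)**2 = 1 > 0.5
print(oracle_function([0.0, -1.0]))   #  0.5 >  0: infeasible, since this point is the disk center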
If there are no constraints, you can define the oracle function as follows:
def oracle_function(X):
    return True
Defining GPR Model¶
The Gaussian Process Regressor (GPR) is an essential part of the Part-X algorithm. One can use the internal GPR model by importing it:
from partx.gprInterface import InternalGPR
gpr_model = InternalGPR()
This gpr_model can be passed to the Part-X algorithm.
However, we also provide a way for users to use their own GPR.
To do this, the user has to import the GPR interface and write their GPR model as shown below:
# Use the installed package path rather than a relative import.
from partx.gprInterface import GaussianProcessRegressorStructure
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, ConstantKernel, RBF, WhiteKernel
from scipy.optimize import fmin_l_bfgs_b
from sklearn.preprocessing import StandardScaler
from warnings import catch_warnings
import warnings

def optimizer_lbfgs_b(obj_func, initial_theta):
    # Custom L-BFGS-B routine for fitting the GPR hyperparameters.
    with catch_warnings():
        warnings.simplefilter("ignore")
        params = fmin_l_bfgs_b(
            obj_func, initial_theta, bounds=None, maxiter=30000, maxfun=1e10
        )
    return params[0], params[1]

class UserDefinedGPR(GaussianProcessRegressorStructure):
    def __init__(self, random_state=12345):
        self.gpr_model = GaussianProcessRegressor(
            kernel=Matern(nu=2.5), alpha=1e-6, normalize_y=True,
            n_restarts_optimizer=5, random_state=random_state
        )
        self.scale = StandardScaler()

    def fit_gpr(self, X, Y):
        """Method to fit the GPR model.

        Args:
            X: Samples from the training set.
            Y: Evaluated values of the samples from the training set.
        """
        X_scaled = self.scale.fit_transform(X)
        with catch_warnings():
            warnings.simplefilter("ignore")
            self.gpr_model.fit(X_scaled, Y)

    def predict_gpr(self, X):
        """Method to predict the mean and standard deviation from the GPR model.

        Args:
            X: Points at which to predict.

        Returns:
            mean
            std_dev
        """
        x_scaled = self.scale.transform(X)
        with catch_warnings():
            warnings.simplefilter("ignore")
            yPred, predSigma = self.gpr_model.predict(x_scaled, return_std=True)
        return yPred, predSigma
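A minimal smoke test for the user-defined model, using illustrative toy data (the data and responses below are not part of Part-X):

import numpy as np

rng = np.random.default_rng(0)
x_train = rng.uniform(-5.0, 5.0, size=(20, 2))           # 20 random 2-d samples
y_train = np.sin(x_train[:, 0]) + np.cos(x_train[:, 1])  # illustrative responses

gpr_model = UserDefinedGPR()
gpr_model.fit_gpr(x_train, y_train)
mean, std_dev = gpr_model.predict_gpr(np.array([[0.0, 0.0]]))

This gpr_model can then be passed to the Part-X algorithm in place of InternalGPR.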
Defining Bayesian Optimization Model¶
Like the GPR, the Bayesian Optimization (BO) model is an essential part of the Part-X algorithm. One can use the internal BO model by importing it:
from partx.bayesianOptimization import InternalBO
bo_model = InternalBO()
This bo_model can be passed to the Part-X algorithm.
However, we also provide a way for users to use their own BO code: the user can plug in an existing BO implementation, as long as it returns a single new point.
To do this, the user has to import the BO interface and write their BO model as shown below:
from typing import Callable, Tuple

import numpy as np
from numpy.typing import NDArray
from scipy.optimize import minimize
from scipy.stats import norm

# Use the installed package paths rather than relative imports.
from partx.bayesianOptimization.bointerface import BO_Interface
from partx.gprInterface import GPR
from partx.sampling import uniform_sampling

class UserDefinedBO(BO_Interface):
    def __init__(self):
        pass

    def sample(
        self,
        x_train: NDArray,
        y_train: NDArray,
        region_support: NDArray,
        gpr_model: Callable,
        oracle_info,
        rng,
    ) -> Tuple[NDArray]:
        """User-defined BO model (mirrors the internal implementation).

        Args:
            x_train: Samples from the training set.
            y_train: Evaluated values of the samples from the training set.
            region_support: Min and max of all dimensions.
            gpr_model: Gaussian Process Regressor model developed using the factory.
            oracle_info: Oracle defining the constraints.
            rng: RNG object from numpy.

        Raises:
            TypeError: If x_train is not a 2-dimensional numpy array or does not match dimensions.
            TypeError: If y_train is not an (n,) numpy array.
            TypeError: If there is a mismatch between x_train and y_train.

        Returns:
            x_new
        """
        model = GPR(gpr_model)
        model.fit(x_train, y_train)

        pred_sample_x = self._opt_acquisition(y_train, model, region_support, oracle_info, rng)
        return pred_sample_x

    def _opt_acquisition(self, y_train: NDArray, gpr_model: Callable, region_support: NDArray, oracle_info, rng) -> NDArray:
        """Optimize the acquisition function to get the next sample point.

        Args:
            y_train: Robustness values of the samples evaluated so far.
            gpr_model: The fitted GP model.
            region_support: The bounds of the region within which the sampling is to be done.
                An N x 2 array, where N is the dimensionality of the test function and
                each row holds the lower and upper bound of one dimension.
            oracle_info: Oracle defining the constraints.
            rng: RNG object from numpy.

        Returns:
            The new sample point chosen by BO.
        """
        tf_dim = region_support.shape[0]
        lower_bound_theta = np.ndarray.flatten(region_support[:, 0])
        upper_bound_theta = np.ndarray.flatten(region_support[:, 1])

        # Negate EI so that minimizing `fun` maximizes expected improvement.
        fun = lambda x_: -1 * self._acquisition(y_train, x_, gpr_model)

        # Seed the local optimizer with the best of 2000 random feasible samples.
        random_samples = uniform_sampling(2000, region_support, tf_dim, oracle_info, rng)
        min_bo_val = -1 * self._acquisition(
            y_train, random_samples, gpr_model, "multiple"
        )
        min_bo = np.array(random_samples[np.argmin(min_bo_val), :])
        min_bo_val = np.min(min_bo_val)

        # Restart the local optimizer a few times and keep the best result.
        for _ in range(9):
            new_params = minimize(
                fun,
                bounds=list(zip(lower_bound_theta, upper_bound_theta)),
                x0=min_bo,
            )

            if not new_params.success:
                continue

            if min_bo is None or fun(new_params.x) < min_bo_val:
                min_bo = new_params.x
                min_bo_val = fun(min_bo)
        new_params = minimize(
            fun, bounds=list(zip(lower_bound_theta, upper_bound_theta)), x0=min_bo
        )
        min_bo = new_params.x

        return np.array(min_bo)

    def _surrogate(self, gpr_model: Callable, x_train: NDArray):
        """Surrogate model function.

        Args:
            gpr_model: Gaussian process model.
            x_train: Input points.

        Returns:
            Predicted means and standard deviations of the points, using the Gaussian process model.
        """
        return gpr_model.predict(x_train)

    def _acquisition(self, y_train: NDArray, sample: NDArray, gpr_model: Callable, sample_type: str = "single") -> NDArray:
        """Acquisition function: Expected Improvement (EI) for minimization,
        EI(x) = (f* - mu(x)) * Phi(z) + sigma(x) * phi(z), with z = (f* - mu(x)) / sigma(x),
        where f* is the current best (minimum) observed value.

        Args:
            y_train: Corresponding robustness values.
            sample: Sample(s) whose EI is to be calculated.
            gpr_model: GPR model.
            sample_type: "single" for one sample (the default), "multiple" for a batch of samples.

        Returns:
            EI of the sample(s).
        """
        curr_best = np.min(y_train)

        if sample_type == "multiple":
            mu, std = self._surrogate(gpr_model, sample)
            ei_list = []
            for mu_iter, std_iter in zip(mu, std):
                pred_var = std_iter
                if pred_var > 0:
                    var_1 = curr_best - mu_iter
                    var_2 = var_1 / pred_var
                    ei = (var_1 * norm.cdf(var_2)) + (
                        pred_var * norm.pdf(var_2)
                    )
                else:
                    ei = 0.0
                ei_list.append(ei)
            return_ei = np.array(ei_list)
        elif sample_type == "single":
            mu, std = self._surrogate(gpr_model, sample.reshape(1, -1))
            pred_var = std[0]
            if pred_var > 0:
                var_1 = curr_best - mu[0]
                var_2 = var_1 / pred_var
                ei = (var_1 * norm.cdf(var_2)) + (
                    pred_var * norm.pdf(var_2)
                )
            else:
                ei = 0.0
            return_ei = ei

        return return_ei
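For contrast, here is a minimal sketch of an alternative user-defined BO model; the class name and strategy are hypothetical, not part of Part-X. It satisfies the same interface by ignoring the surrogate entirely and returning one random feasible point:

class RandomSearchBO(BO_Interface):
    """Hypothetical baseline: return one random feasible point instead of optimizing EI."""

    def sample(self, x_train, y_train, region_support, gpr_model, oracle_info, rng):
        tf_dim = region_support.shape[0]
        # Draw a single feasible point from the region; the surrogate model is unused.
        return uniform_sampling(1, region_support, tf_dim, oracle_info, rng)[0]

Either model can then be passed to Part-X as bo_model.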
Parameters¶
The parameters that need to be defined for Part-X are described below. The usage of these parameters is demonstrated in the reference examples.
- BENCHMARK_NAME: string
Name of the benchmark
- test_function: function
The black-box test function
- oracle_function: function
The oracle function
- num_macro_reps: int
The number of macro-replications
- init_reg_sup: 2-d numpy array
An array of shape (tf_dim, 2) that represents the initial region support of the function, where each row holds the lower and upper bound of one dimension. For example:
init_reg_sup = np.array([[-5,5], [-2,3], [-3,4]])
Here, the first dimension has the range [-5,5], the second dimension has the range [-2,3], and the third dimension has the range [-3,4].
- tf_dim: int
Needs to be an integer that represents the dimensionality of the black-box function.
- max_budget: int
The maximum budget or the maximum number of evaluations of the black-box function that are allowed.
- init_budget: int
The initialization budget of the algorithm. This refers to the minimum number of samples that are required to be present in a region in order to generate samples from Bayesian optimization and classify the region.
- bo_budget: int
The number of samples that need to be generated from Bayesian optimization.
- cs_budget: int
The number of samples that must be sampled in the continued sampling phase.
- n_tries_randomsampling: int
The number of times a point should be re-sampled to satisfy the constraints in the random sampling phase. In case of no constraint, set the value to 1. An error is raised if the number of tries is exhausted.
- n_tries_BO: int
The number of times a point should be re-sampled to satisfy the constraints in the BO sampling phase. In case of no constraint, set the value to 1. If the number of tries is exhausted, a random feasible point is selected.
- alpha: float, [0,1]
Region Classification percentile
- R: int
The number of Monte Carlo iterations. This is used in the calculation of the quantiles of a region.
- M: int
The number of evaluations per Monte Carlo iteration. This is used in the calculation of the quantiles of a region.
- delta: float
A number used to define the fraction of a dimension's length below which no further branching in that dimension takes place. It is used for classification of a region.
- fv_quantiles_for_gp: list
List of quantile values at which falsification volumes are computed.
- branching_factor: int
Number of sub-regions into which a region is branched.
- uniform_partitioning: bool
Whether to perform uniform partitioning or not.
- start_seed: int
Starting seed of the experiment to ensure reproducibility.
- gpr_model:
The Gaussian Process Regressor model, described in detail in Defining GPR Model above.
- bo_model:
The Bayesian Optimization model, described in detail in Defining Bayesian Optimization Model above.
- init_sampling_type: str
Initial Sampling Algorithms. Defaults to “lhs_sampling”. Can also use “uniform_sampling”
- cs_sampling_type: str
Continued Sampling Mechanism. Defaults to “lhs_sampling”. Can also use “uniform_sampling”
- q_estim_sampling: str
Quantile estimation sampling Mechanism. Defaults to “lhs_sampling”. Can also use “uniform_sampling”
- mc_integral_sampling_type: str
Monte Carlo Integral Sampling Mechanism. Defaults to “lhs_sampling”. Can also use “uniform_sampling”
- results_sampling_type: str
Results Sampling Mechanism. Defaults to “lhs_sampling”. Can also use “uniform_sampling”
- results_at_confidence: float
Confidence level at which the results are to be computed.
- results_folder_name: str
Name of the folder in which the results are stored.
- num_cores: int
Number of cores to use. If the value is 1, no parallelization is used. If the value is greater than 1, the macro-replications will be spread over the cores.
It is advisable to refer to Algorithms 1, 2, 3, and 4 in the Part-X paper to get a deeper understanding of these parameters and where they are used.
Running the Optimizer¶
Once the black-box function, the oracle function, and the parameters are defined, we can run the code. As a starting point, here is a sketch of plausible parameter values for the two-dimensional Himmelblau example above; every value is illustrative, not a recommendation:
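import numpy as np
from partx.gprInterface import InternalGPR
from partx.bayesianOptimization import InternalBO

BENCHMARK_NAME = "himmelblau_demo"           # illustrative benchmark name
num_macro_reps = 10
init_reg_sup = np.array([[-5., 5.], [-5., 5.]])
tf_dim = 2
max_budget = 5000
init_budget = 20
bo_budget = 20
cs_budget = 100
n_tries_randomsampling = 1                   # no constraints in this example
n_tries_BO = 1
alpha = 0.05
R = 20
M = 500
delta = 0.001
fv_quantiles_for_gp = [0.01, 0.05, 0.5]
branching_factor = 2
uniform_partitioning = True
start_seed = 12345
gpr_model = InternalGPR()
bo_model = InternalBO()
init_sampling_type = "lhs_sampling"
cs_sampling_type = "lhs_sampling"
q_estim_sampling = "lhs_sampling"
mc_integral_sampling_type = "lhs_sampling"
results_sampling_type = "lhs_sampling"
results_at_confidence = 0.95
results_folder_name = "himmelblau_results"   # illustrative folder name
num_cores = 1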
If we are using psy-taliro and passing Part-X as an optimizer, we can define these same parameters and pass them as options to psy-taliro. In standalone mode, they are passed directly to run_partx:
...
from partx.partxInterface import run_partx
run_partx(BENCHMARK_NAME,
test_function,
oracle_function,
num_macro_reps,
init_reg_sup,
tf_dim,
max_budget,
init_budget,
bo_budget,
cs_budget,
n_tries_randomsampling,
n_tries_BO,
alpha,
R,
M,
delta,
fv_quantiles_for_gp,
branching_factor,
uniform_partitioning,
start_seed,
gpr_model,
bo_model,
init_sampling_type,
cs_sampling_type,
q_estim_sampling,
mc_integral_sampling_type,
results_sampling_type,
results_at_confidence,
results_folder_name,
num_cores)