Hardware specifications

I want to use Bayesian optimization to find the best set of hyperparameter values. My dataset has shape [102069 rows x 1415 columns], and my model is:

def build_model(hp):
    """Creates a model."""

    growing_strategy = hp.Choice("growing_strategy", ["LOCAL", "BEST_FIRST_GLOBAL"])
    split_axis = hp.Choice("split_axis", ["SPARSE_OBLIQUE", "AXIS_ALIGNED"])
    model_params = {
        "task": tfdf.keras.Task.REGRESSION,
        "min_examples": hp.Choice("min_examples", [2, 5, 7, 10]),
        "categorical_algorithm": hp.Choice("categorical_algorithm", ["CART", "RANDOM"]),
        "shrinkage": hp.Choice("shrinkage", [0.02, 0.05, 0.10, 0.15]),
        "num_candidate_attributes_ratio": hp.Choice("num_candidate_attributes_ratio", [0.2, 0.5, 0.9, 1.0]),
        "growing_strategy": growing_strategy,
        "split_axis": split_axis,
        "max_depth": hp.Choice("max_depth", [3, 4, 5, 6, 8]),
    }

    if growing_strategy == "BEST_FIRST_GLOBAL":
        model_params["max_num_nodes"] = hp.Choice("max_num_nodes", [16, 32, 64, 128, 256])

    if split_axis == "SPARSE_OBLIQUE":
        model_params["sparse_oblique_weights"] = hp.Choice("sparse_oblique_weights", ["BINARY", "CONTINUOUS"])
        model_params["sparse_oblique_normalization"] = hp.Choice("sparse_oblique_normalization", ["NONE", "STANDARD_DEVIATION", "MIN_MAX"])
        model_params["sparse_oblique_num_projections_exponent"] = hp.Choice("sparse_oblique_num_projections_exponent", [1.0, 1.5])

    model = tfdf.keras.GradientBoostedTreesModel(**model_params)

    # Optimize the model accuracy as computed on the validation dataset.
    return model
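To sanity-check the conditional part of build_model without TensorFlow or keras_tuner installed, a hand-rolled stub can stand in for the hp object. Everything below (the StubHP class and build_params helper) is my own illustration, not keras_tuner API; it only mimics hp.Choice and mirrors a subset of the search space above:

```python
class StubHP:
    """Minimal stand-in for keras_tuner's HyperParameters (only Choice)."""
    def __init__(self, overrides=None):
        self.overrides = overrides or {}  # force specific choices
        self.space = {}                   # records every declared choice

    def Choice(self, name, values):
        self.space[name] = values
        return self.overrides.get(name, values[0])

def build_params(hp):
    """Mirrors the conditional logic of build_model (model creation omitted)."""
    growing_strategy = hp.Choice("growing_strategy", ["LOCAL", "BEST_FIRST_GLOBAL"])
    split_axis = hp.Choice("split_axis", ["SPARSE_OBLIQUE", "AXIS_ALIGNED"])
    params = {
        "growing_strategy": growing_strategy,
        "split_axis": split_axis,
        "max_depth": hp.Choice("max_depth", [3, 4, 5, 6, 8]),
    }
    # Conditional hyperparameters, as in build_model above.
    if growing_strategy == "BEST_FIRST_GLOBAL":
        params["max_num_nodes"] = hp.Choice("max_num_nodes", [16, 32, 64, 128, 256])
    if split_axis == "SPARSE_OBLIQUE":
        params["sparse_oblique_weights"] = hp.Choice("sparse_oblique_weights", ["BINARY", "CONTINUOUS"])
    return params

params = build_params(StubHP({"growing_strategy": "BEST_FIRST_GLOBAL",
                              "split_axis": "AXIS_ALIGNED"}))
print(sorted(params))  # ['growing_strategy', 'max_depth', 'max_num_nodes', 'split_axis']
```

This confirms that only the hyperparameters relevant to the sampled trial end up in the parameter dict, which is why some combinations never appear together.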

# Implement the Bayesian optimization
keras_tuner_bayes = kt.BayesianOptimization(

but I run into this problem:

{'task': 2, 'min_examples': 7, 'categorical_algorithm': 'CART', 'shrinkage': 0.05, 'early_stopping': 'NONE', 'num_candidate_attributes_ratio': 0.2, 'growing_strategy': 'LOCAL', 'split_axis': 'AXIS_ALIGNED', 'l1_regularization': 0.0, 'l2_regularization': 0.0, 'l2_categorical_regularization': 10.0, 'num_trees': 700, 'max_depth': 6}
Use /tmp/tmpahoa71ah as temporary training directory
[WARNING 23-10-30 19:47:27.0726 CET gradient_boosted_trees.cc:1818] "goss_alpha" set but "sampling_method" not equal to "GOSS".
[WARNING 23-10-30 19:47:27.0729 CET gradient_boosted_trees.cc:1829] "goss_beta" set but "sampling_method" not equal to "GOSS".
[WARNING 23-10-30 19:47:27.0732 CET gradient_boosted_trees.cc:1843] "selective_gradient_boosting_ratio" set but "sampling_method" not equal to "SELGB".
Reading training dataset…
Segmentation fault (core dumped)
On the test system I am using at the moment I have no GPU at my disposal. However, I am planning for a much better kit. Unfortunately, I am not sure what I should be looking for. Should it be only GPU power? If so, how much (for a decent processing time, i.e. less than 1 hour)? And what about memory?
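For a rough sense of the memory side of that question: as far as I know, TF-DF trains its decision forests on the CPU (a GPU is not used for tree training), so RAM and cores matter more than GPU power here. A back-of-the-envelope estimate for the raw feature matrix alone, using the shape stated above:

```python
# Rough RAM for the raw feature matrix alone
# (no model, no intermediate copies, no pandas overhead).
rows, cols = 102069, 1415

def gib(n_bytes):
    return n_bytes / 2**30

f32 = rows * cols * 4  # float32
f64 = rows * cols * 8  # float64 (pandas' default for floats)
print(f"float32: {gib(f32):.2f} GiB")  # float32: 0.54 GiB
print(f"float64: {gib(f64):.2f} GiB")  # float64: 1.08 GiB
```

Training copies and TF-DF's internal dataset format will multiply this, so a machine with well over that amount of free RAM is a sensible starting point; the exact multiplier depends on the pipeline.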

Any help will be much appreciated.

Hi @Caterina_Alessi
It seems to me the messages you are getting are info and warning messages, as opposed to error messages (see here). Also, at a glance, the warning messages are just TensorFlow telling you it skips some combinations of hyper-parameters.
In the end, do you get a result after the code finishes running?
As for computer requirements, maybe you can start with a smaller space of hyper-parameters to optimize?
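To see why a smaller space helps, counting the options declared in build_model gives an upper bound on the grid (Bayesian optimization samples the space rather than enumerating it, but a smaller space still converges in fewer trials; the conditional parameters mean the effective space is somewhat smaller than this bound):

```python
# Upper bound on the number of distinct hyperparameter combinations
# in the build_model search space above.
from math import prod

n_choices = {
    "min_examples": 4,
    "categorical_algorithm": 2,
    "shrinkage": 4,
    "num_candidate_attributes_ratio": 4,
    "growing_strategy": 2,
    "split_axis": 2,
    "max_depth": 5,
    "max_num_nodes": 5,                            # only if BEST_FIRST_GLOBAL
    "sparse_oblique_weights": 2,                   # only if SPARSE_OBLIQUE
    "sparse_oblique_normalization": 3,             # only if SPARSE_OBLIQUE
    "sparse_oblique_num_projections_exponent": 2,  # only if SPARSE_OBLIQUE
}
print(prod(n_choices.values()))  # 153600
```

Dropping even one or two of the larger choice lists cuts this bound by a factor of 4-5.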

hi @tagoma ,
It stops with Segmentation fault (core dumped), which is an error, so I don't reach my result. I also don't have enough memory, etc. I can ask for more memory and also for a GPU, but I don't know how much. I also tried with a smaller space.

1 Like

Hi @Caterina_Alessi. Ok, I missed it at the end of your message.
Maybe you can also try out the smaller batch size trick?
Whatever you try out, hopefully you'll get your model/code running soon.
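A library-free sketch of what a smaller batch size means: each training step only materializes one contiguous slice of rows at a time, so the per-step tensor footprint shrinks proportionally. (One caveat: as far as I know, TF-DF still loads the full dataset into its internal format before growing trees, so this mostly relieves the tf.data input pipeline rather than peak training memory.)

```python
def batch_spans(n_rows, batch_size):
    """Yield (start, end) row spans covering n_rows in order."""
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)

# With the dataset size from this thread:
spans = list(batch_spans(102069, 1000))
print(len(spans))  # 103
print(spans[-1])   # (102000, 102069)
```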