Basic Keras model underperforming against Scikit-Learn MLPRegressor

I’ve experimented with scikit-learn’s MLPRegressor class and found that it does fairly well on my dataset without much tuning. However, I’d like to build a more complex model in TensorFlow/Keras using a split LSTM and Dense network.

To that end, I’m first trying to replicate the performance of MLPRegressor in TensorFlow with a very basic architecture, but I’m struggling so far.

Here’s an attempt to create identical models with each library. The parameters in the TF implementation are based on the MLPRegressor documentation, including its default values where applicable.
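
For reference, the defaults being mirrored can be read straight from the estimator rather than from the documentation. This is a minimal sketch: the selection of keys is my own assumption about which settings matter for the Keras replica, and the printed values depend on the installed scikit-learn version.

```python
from sklearn.neural_network import MLPRegressor

# Hyperparameters assumed to be relevant when mirroring the model in Keras
# (this selection is illustrative, not exhaustive).
keys = [
    "hidden_layer_sizes", "activation", "solver", "alpha",
    "learning_rate_init", "beta_1", "beta_2", "epsilon",
    "early_stopping", "validation_fraction",
]

# get_params() on a freshly constructed estimator returns its default settings.
defaults = MLPRegressor().get_params()
mlp_defaults = {key: defaults[key] for key in keys}

for name, value in mlp_defaults.items():
    print(f"{name}: {value}")
```

The comparison script itself follows.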

import numpy as np
import matplotlib.pyplot as plt

from tensorflow.keras.layers import Input, Dense
from tensorflow.keras.models import Model
from tensorflow.keras.wrappers.scikit_learn import KerasRegressor
from tensorflow.keras.regularizers import L2
from tensorflow.random import set_seed

from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import Pipeline
from sklearn.compose import TransformedTargetRegressor
from sklearn.preprocessing import StandardScaler

from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

all_sk = []
all_tf = []

for _ in range(100):

    X, y = make_regression(n_samples=10000, n_features=20, n_informative=10, n_targets=1)
    # y = StandardScaler().fit_transform(y[:,None])[:,0]
    use_scaling = True
    seed = np.random.randint(0, 1000)
    set_seed(seed)  # also seed TensorFlow's global RNG with the same value
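    def simple_tf_model():
        # One hidden layer of 100 ReLU units with L2 weight decay, as in MLPRegressor
        dense_input = Input(shape=(X.shape[1],))
        dense = Dense(100, activation="relu", kernel_regularizer=L2(l2=0.0001))(dense_input)
        dense = Dense(1, activation="linear", kernel_regularizer=L2(l2=0.0001))(dense)
        tf_model = Model(inputs=[dense_input], outputs=dense)
        # KerasRegressor needs a compiled model. Assumed compile step: MSE loss with
        # Adam at Keras defaults (learning rate 0.001); MLPRegressor also defaults to Adam.
        tf_model.compile(loss="mse", optimizer="adam")
        return tf_model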
    sk_model = MLPRegressor(max_iter=5, hidden_layer_sizes=(100,), batch_size=200, random_state=seed, verbose=1)
    tf_model = KerasRegressor(build_fn=simple_tf_model, batch_size=200, epochs=5, validation_split=0.1)
    if use_scaling:
        sk_pipeline = Pipeline([('scaler', StandardScaler()), ('model', sk_model)])
        sk_model = TransformedTargetRegressor(regressor=sk_pipeline, transformer=StandardScaler()) 
        tf_pipeline = Pipeline([('scaler', StandardScaler()), ('model', tf_model)])
        tf_model = TransformedTargetRegressor(regressor=tf_pipeline, transformer=StandardScaler()) 
    # Fit both models on the same data before predicting
    sk_model.fit(X, y)
    tf_model.fit(X, y)
    sk_preds = sk_model.predict(X)
    tf_preds = tf_model.predict(X)
    def get_mse(preds, name):
        mse = mean_squared_error(y, preds)
        print(name, mse)
        # Record each model's MSE in its own list
        if name == "SK":
            all_sk.append(mse)
        else:
            all_tf.append(mse)
    get_mse(sk_preds, "SK")
    get_mse(tf_preds, "TF")

sk_arr = np.array(all_sk)
tf_arr = np.array(all_tf)

print(sk_arr.mean())  # 350.0151514048654
print(tf_arr.mean())  # 382.19699899150226

Running the code above, I observe two things:

  • The training loss reported by MLPRegressor is roughly half that of the TF model. This is also what I’ve observed on the real dataset.

  • The final MSE of the predictions on the training set is always lower for MLPRegressor (note: I’m not sure if the random seeds have the same effect on both models, but running the above in the loop and then comparing means should show this).
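
One way to check the first observation on the scikit-learn side is to put the loss that MLPRegressor reports next to the plain MSE computed from its predictions on the same scaled data. This is a minimal sketch, separate from the benchmark above: the smaller sample size and the fixed `random_state` are illustrative choices, and it does not assume that `loss_` and the loss Keras logs are defined in the same way.

```python
import warnings

import numpy as np
from sklearn.datasets import make_regression
from sklearn.exceptions import ConvergenceWarning
from sklearn.metrics import mean_squared_error
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# Smaller synthetic dataset than in the benchmark (illustrative choice).
X, y = make_regression(n_samples=2000, n_features=20, n_informative=10,
                       n_targets=1, random_state=0)
X_scaled = StandardScaler().fit_transform(X)
y_scaled = StandardScaler().fit_transform(y[:, None])[:, 0]

mlp = MLPRegressor(hidden_layer_sizes=(100,), batch_size=200,
                   max_iter=5, random_state=0)

# Five iterations will not converge, so silence the expected warning.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", category=ConvergenceWarning)
    mlp.fit(X_scaled, y_scaled)

reported_loss = mlp.loss_  # loss value stored by scikit-learn after the last iteration
train_mse = mean_squared_error(y_scaled, mlp.predict(X_scaled))

print("MLPRegressor loss_:", reported_loss)
print("MSE on training data:", train_mse)
print("ratio loss_ / MSE:", reported_loss / train_mse)
```

If the ratio is far from 1, the two numbers are not directly comparable, and the same check on the Keras side (logged `loss` versus MSE from `predict`) would show whether the gap comes from how each library reports loss or from the fitted models themselves.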

Any suggestions on why this might be are appreciated.