How can I perform multi-modal early-fusion using these 2 models?

I have two models that both take 224x224x3 images as input: the first performs binary classification with an output shape of (None, 1), and the second returns an output of shape (None, 365). The fused model should produce the same binary classification as the first model. This is for my thesis, and the idea is to check whether early fusion improves on the accuracy of the first model used on its own.

For this, I need to perform early fusion with these two models, but I have never done multi-modal fusion before. Although I understand the theory, I am lost when it comes to implementing it in code (I have only seen late-fusion examples, not early-fusion ones). So far, my understanding is that I have to:

  1. Extract input-level features of both models.
  2. Concatenate them in a single layer.
  3. Build a new model that ends up in binary classification.
  4. Use this new model without retraining it.
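
To make the shapes concrete, here is a minimal NumPy sketch of how I picture steps 1 to 3. Random arrays stand in for the features extracted from each model (shaped like the two models' outputs), and all names (`scene_features`, `binary_features`, `fused`) are hypothetical, so this only shows the data flow and is not a working classifier:

```python
import numpy as np

rng = np.random.default_rng(0)
batch_size = 4

# Step 1 (stand-in): hypothetical features from each model for the same batch of images.
scene_features = rng.random((batch_size, 365))   # shaped like the (None, 365) output
binary_features = rng.random((batch_size, 1))    # shaped like the (None, 1) output

# Step 2: concatenate along the feature axis, giving shape (batch_size, 366).
fused = np.concatenate([scene_features, binary_features], axis=1)

# Step 3: a single sigmoid unit as a placeholder binary head.
# The weights are random here, so the resulting probabilities carry no meaning.
weights = rng.normal(size=(366, 1))
bias = np.zeros(1)
probabilities = 1.0 / (1.0 + np.exp(-(fused @ weights + bias)))

print(fused.shape)          # (4, 366)
print(probabilities.shape)  # (4, 1)
```

Because the head's weights are random in this sketch, it only confirms that the shapes line up; whether step 4 can actually hold is part of what I am asking.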

What advice can you give me for solving this issue? Are there any concepts I am misunderstanding?

This is what I have so far:

import keras
import pandas as pd
from places_365 import VGG16_Places365
from keras import losses
from keras.layers import Concatenate, Dense, Input
from keras.models import Model, load_model
from keras.optimizers import Adam
from keras.metrics import Precision, Recall


# One input tensor shared by both models, so they both see the same image.
image_input = Input(shape=(224, 224, 3))

# Scene model: output shape (None, 365).
places365_model = VGG16_Places365(weights='places')
model_365_features = places365_model(image_input)

# Binary classifier: output shape (None, 1).
vgg19_model_location = 'models/vgg19_binary.keras'
vgg19_model = load_model(vgg19_model_location)
vgg19_model_features = vgg19_model(image_input)

# Concatenate is a layer: instantiate it first, then call it on the list of tensors.
# The result has shape (None, 366).
fused_features = Concatenate()([model_365_features, vgg19_model_features])

# Classification head on top of the fused features (functional API, so the
# Dense layers are actually connected to the concatenated tensor).
# These Dense layers are freshly initialized.
x = Dense(256, activation='relu')(fused_features)
x = Dense(128, activation='relu')(x)
x = Dense(64, activation='relu')(x)
x = Dense(32, activation='relu')(x)
x = Dense(16, activation='relu')(x)
x = Dense(8, activation='relu')(x)
output = Dense(1, activation='sigmoid')(x)

model = Model(inputs=image_input, outputs=output)