I have a dataset of 60 videos; from these videos I extract images and audio.
From each video I extract 30 images, so 1800 images in total. The shapes of x_train and y_train are
x_train.shape => (1800,224,224,3),
y_train.shape => (1800,5)
From the audio I extract 15 signals (arrays of numbers) per video, so 60 * 15 = 900 in total. The shapes of x_train and y_train are
x_train.shape => (900, 128)
y_train.shape => (900,5)
The images are fed into a fine-tuned VGGFace model (model 1); its output shape is (1800, 128).
The audio is fed into a VGGish model (model 2); its output shape is (900, 128).
After training models 1 and 2, both are used as inputs to a third model (model 3). The problem I face is that when I call
model3.fit([x_audio_train, x_video_train], y_train, …)
I get an error saying the inputs should all have the same size: the audio branch has 900 samples but the video branch has 1800.
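For context, here is a minimal numpy sketch of the mismatch, and one possible way to align the two branches by averaging the embeddings per video. This assumes the rows of each array are grouped contiguously per video, in the same video order; mean-pooling is just one illustrative alignment strategy, not necessarily the right one for my task:

```python
import numpy as np

n_videos = 60
emb_dim = 128

# Placeholder embeddings standing in for the real model outputs:
x_video = np.zeros((n_videos * 30, emb_dim))  # VGGFace output, 30 frames per video -> (1800, 128)
x_audio = np.zeros((n_videos * 15, emb_dim))  # VGGish output, 15 signals per video -> (900, 128)

# fit([x_audio, x_video], y) fails because the sample counts differ:
print(x_video.shape[0], x_audio.shape[0])  # 1800 vs 900

# Possible fix (assuming contiguous per-video ordering): pool each video's
# embeddings down to one row, so both branches have 60 aligned samples.
x_video_pooled = x_video.reshape(n_videos, 30, emb_dim).mean(axis=1)  # (60, 128)
x_audio_pooled = x_audio.reshape(n_videos, 15, emb_dim).mean(axis=1)  # (60, 128)
```

With this pooling, y_train would also need to be reduced to one label per video, i.e. shape (60, 5).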
I hope my issue is clear; how can I fix this?