My question might be a little off-topic because it is mostly about comparing the setup and performance of two models. I designed a model in Matlab and tried to convert it to Tensorflow, hoping to get roughly similar results. My task is a simple sequence (2 features) to label classification (binary): I have 6000 time steps of 2 features each, and every time step belongs to either class 0 or class 1.

I designed the following in Matlab:

```
layers = [ ...
    sequenceInputLayer(numFeatures)
    convolution1dLayer(16,32,"Name","conv1d","Padding","same","PaddingValue","replicate")
    bilstmLayer(512,"Name","bilstm","OutputMode","sequence")
    fullyConnectedLayer(512,"Name","fc_1")
    fullyConnectedLayer(512,"Name","fc_2")
    dropoutLayer(0.5,"Name","dropout")
    fullyConnectedLayer(numClasses)
    softmaxLayer
    classificationLayer];
```

I tried many combinations in Tensorflow, but I am not expert enough to know what is actually crucial for performance. Here is the Tensorflow model that I tried to make identical:

```
import tensorflow as tf
from tensorflow.keras import Sequential, layers

model = Sequential()
# filters=32, kernel_size=16 mirrors Matlab's convolution1dLayer(16,32);
# note that Keras 'same' padding zero-pads rather than replicating edge values
model.add(layers.Conv1D(32, 16, padding='same',
                        input_shape=(X_train.shape[1], X_train.shape[2])))
model.add(layers.Bidirectional(layers.LSTM(512, return_sequences=True)))
model.add(layers.Dense(512))
model.add(layers.Dense(512))
model.add(layers.Dropout(0.5))
# one sigmoid unit replaces the 2-class softmax + classificationLayer
model.add(layers.Dense(1, activation='sigmoid'))
```
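As a quick sanity check of my own (not the training code above): Matlab's `convolution1dLayer(16,32)` takes the filter size first and the number of filters second, so the mirrored Keras call should be `Conv1D(filters=32, kernel_size=16)`. Built that way with an unspecified sequence length, the network emits one sigmoid per time step, which is what `"OutputMode","sequence"` does:

```python
import tensorflow as tf
from tensorflow.keras import Sequential, layers

num_features = 2

check = Sequential([
    # None sequence length: the model accepts sequences of any length
    layers.Input(shape=(None, num_features)),
    layers.Conv1D(filters=32, kernel_size=16, padding='same'),
    layers.Bidirectional(layers.LSTM(512, return_sequences=True)),
    layers.Dense(512),
    layers.Dense(512),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),
])

# one prediction per time step: (batch, time, 1)
print(check.output_shape)  # (None, None, 1)
```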

Now my questions:

1- Are these two models the same? I know one is in Matlab and this is not a Matlab forum, but I am asking in case people have some ideas.

2- In Matlab I set up my data as X: 1x1 cell → 2x6000 double, Y: 1x1 cell → 1x6000. The 1x1 cell holds multiple samples (replications), which is just 1 here. In Tensorflow I eventually ended up with X=(6000,1,2) and Y=(6000,1).
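For reference, the reshaping from the Matlab layout (features x time) to that Tensorflow layout can be sketched like this (the arrays below are placeholders for my real data):

```python
import numpy as np

# Matlab stores the single sample as a 2x6000 double (features x time steps)
x_matlab = np.random.randn(2, 6000)                 # placeholder features
y_matlab = np.random.randint(0, 2, size=(1, 6000))  # placeholder labels

# treat every time step as its own length-1 "sample":
# (2, 6000) -> (6000, 2) -> (6000, 1, 2)
X = x_matlab.T[:, np.newaxis, :]
Y = y_matlab.reshape(-1, 1)

print(X.shape, Y.shape)  # (6000, 1, 2) (6000, 1)
```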

I also tried (85,358,2), which splits the data into the contiguous runs of class 0 and class 1 and pads each run to the longest sequence. The problem with this setup shows up at test time. If I hand the model the time intervals on the test data and ask only for labels, it reaches 100% accuracy, but that is cheating, because the model itself should determine how long the 0-runs and 1-runs are. So for 1000 test time steps I have to build (1000,358,2) and pad 357 redundant steps (the sequence length must match the training length, and only the first position of each sequence carries a label). In that case the model performs very poorly. Spoiler: with the X=(6000,1,2), Y=(6000,1) setup I get 96.14% accuracy.

Finally, my question is how to set up the data. I have tested many different setups and searched around for a long time. Am I doing it right? Also, when I use batch size 32, which dimension does the batch run over? The Tensorflow setup for sequence classification is (None, seqLen, features), where None is the number of samples.
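One way I have seen to get the (None, seqLen, features) layout without handing the model the class boundaries is to cut the long recording into fixed-length overlapping windows and label every time step inside each window (the window length and stride below are made-up values, not tuned):

```python
import numpy as np

def make_windows(x, y, seq_len=100, stride=50):
    """Cut a (T, features) series and its (T,) labels into
    overlapping windows: (n_windows, seq_len, features)."""
    xs, ys = [], []
    for start in range(0, len(x) - seq_len + 1, stride):
        xs.append(x[start:start + seq_len])
        ys.append(y[start:start + seq_len])
    return np.stack(xs), np.stack(ys)

x = np.random.randn(6000, 2)           # placeholder (time, features)
y = np.random.randint(0, 2, size=6000)

X, Y = make_windows(x, y)
print(X.shape, Y.shape)  # (119, 100, 2) (119, 100)
```

With this layout the batch (and batch size 32) runs over the first axis, i.e. over windows, while the model still predicts a label for every time step inside each window.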

3- I used a learning rate decay schedule in Matlab and could achieve 99.51% accuracy. I applied the same in Tensorflow, and in the best case, with very fine tuning and a long running time, I could get 96.14% accuracy. Here is my approach:

```
import numpy as np

def scheduler(epoch, lr):
    # step decay from the initial 1e-4; returning lr * factor instead
    # would compound the decay every epoch
    return 1e-4 * np.power(0.97, epoch // 20)
```

```
myOptimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)
model.compile(loss='binary_crossentropy', optimizer=myOptimizer, metrics=['accuracy'])
callback = tf.keras.callbacks.LearningRateScheduler(scheduler)
model.fit(X_train, y_train, epochs=100, batch_size=32, callbacks=[callback],
          validation_data=(val_data, val_labels))
```
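For what it's worth, the same drop-by-0.97-every-20-epochs step decay can be written without a callback using Keras's built-in `ExponentialDecay` with `staircase=True`. Note that `decay_steps` counts optimizer steps, not epochs, so I scale by an assumed steps-per-epoch value:

```python
import tensorflow as tf

steps_per_epoch = 188  # e.g. ceil(6000 / 32); adjust to the real data

# multiply the rate by 0.97 once every 20 epochs, as a step function
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=1e-4,
    decay_steps=steps_per_epoch * 20,
    decay_rate=0.97,
    staircase=True,
)
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)

# the schedule is callable: initial rate, then one decay step later
print(float(lr_schedule(0)))
print(float(lr_schedule(steps_per_epoch * 20)))
```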

My third question: given the model at the top, my data setup, and my optimizer setup, is it possible to reach Matlab's performance? Any hint is appreciated.