I am trying to train a model in TensorFlow, but the dataset is too large to fit in memory, so I need to build a tf.data pipeline that loads it from an HDF5 file batch by batch. The problem is that `train_test_split` does not accept the HDF5 format directly, so I would have to convert the data to NumPy first; but converting the whole dataset (two arrays, trainX and trainY) to NumPy so that `train_test_split` can read it is still too heavy for my RAM.
What is the correct syntax for reading the data per batch from a file (such as HDF5) so that the model never has to hold it all in RAM, then splitting it into train and test sets, making tensor slices from it, and applying repeat, shuffle, and batching?
The data is saved in an HDF5 file containing two arrays, trainX and trainY (i.e. the data and its ground-truth values). I want to apply pipelining techniques so the data is read per batch and then the operations below are performed on it. I have read about `TFRecordDataset`, but I still can't figure out how to implement it in my case.
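As a minimal check of the per-batch idea (a sketch assuming the file can be opened with h5py; the file name `data.h5` here is a placeholder, and the snippet creates a small stand-in file just so it runs on its own), slicing an HDF5 dataset loads only the requested rows into RAM:

```python
import os
import tempfile
import numpy as np
import h5py

# Stand-in file so the snippet is self-contained; in my case the
# file already exists with the arrays "trainX" and "trainY".
path = os.path.join(tempfile.mkdtemp(), "data.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("trainX", data=np.arange(100).reshape(50, 2))
    f.create_dataset("trainY", data=np.arange(50))

with h5py.File(path, "r") as f:
    dsX = f["trainX"]     # a handle into the file; nothing is loaded yet
    n = dsX.shape[0]      # shape comes from file metadata, not from data
    batch = dsX[0:8]      # only these 8 rows are read from disk into RAM
print(n, batch.shape)     # 50 (8, 2)
```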
```python
from sklearn.model_selection import train_test_split
import tensorflow as tf

trX, teX, trY, teY = train_test_split(trainX, trainY,
                                      test_size=0.1, random_state=42)
train_data = tf.data.Dataset.from_tensor_slices((trX, trY))
train_data = train_data.repeat().shuffle(buffer_size=500,
                                         seed=8).batch(batch_size).prefetch(1)
```
The steps above work on the whole in-memory dataset; how do I turn them into a pipeline that reads one batch at a time?
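One approach I am considering (a sketch, not a confirmed solution): split the sample *indices* with `train_test_split` instead of the arrays themselves, then feed a `tf.data.Dataset.from_generator` that reads rows from the HDF5 file on demand. The file name, sample shape, and batch size below are illustrative, and the snippet creates a small stand-in file so it runs on its own:

```python
import os
import tempfile
import numpy as np
import h5py
import tensorflow as tf
from sklearn.model_selection import train_test_split

# Stand-in HDF5 file so the sketch is runnable; in the real case the
# file already exists on disk with the arrays "trainX" and "trainY".
path = os.path.join(tempfile.mkdtemp(), "data.h5")
with h5py.File(path, "w") as f:
    f.create_dataset("trainX", data=np.random.rand(200, 4).astype("float32"))
    f.create_dataset("trainY", data=np.random.rand(200).astype("float32"))

batch_size = 32

with h5py.File(path, "r") as f:
    n = f["trainX"].shape[0]        # only metadata is read here

# Split indices, not data: train_test_split never touches the big arrays.
train_idx, test_idx = train_test_split(
    np.arange(n), test_size=0.1, random_state=42)

def make_generator(indices):
    def gen():
        with h5py.File(path, "r") as f:
            X, Y = f["trainX"], f["trainY"]
            for i in indices:       # one sample is read from disk per yield
                yield X[i], Y[i]
    return gen

train_data = tf.data.Dataset.from_generator(
    make_generator(np.sort(train_idx)),   # sorted reads are faster in HDF5
    output_signature=(
        tf.TensorSpec(shape=(4,), dtype=tf.float32),
        tf.TensorSpec(shape=(), dtype=tf.float32)))

train_data = (train_data.repeat()
                        .shuffle(buffer_size=500, seed=8)
                        .batch(batch_size)
                        .prefetch(1))

xb, yb = next(iter(train_data))
print(xb.shape, yb.shape)           # (32, 4) (32,)
```

With this layout the shuffle buffer holds at most 500 samples rather than the whole dataset, and `prefetch(1)` overlaps the disk reads with training.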