Train Test Split Using np.random.rand()

Why does the TensorFlow documentation use the np.random.rand() function to create a train/test split of the dataset? For example:

import numpy as np

def split_dataset(dataset, test_ratio=0.30):
  """Splits a pandas DataFrame in two."""
  test_indices = np.random.rand(len(dataset)) < test_ratio
  return dataset[~test_indices], dataset[test_indices]

The above code snippet is copied from the Automated hyper-parameter tuning tutorial. We cannot guarantee that this code splits exactly 30% of the dataset into the test set and the remaining 70% into the training set, since each row is assigned independently by a random draw, and no initial seed is set. So why does the official documentation use NumPy random number generation to split the dataset?

Hi @Shiv_Katira, yes, using np.random.rand we cannot split the dataset into exactly 70% and 30%. But if you look at the number of samples present after splitting, it will be close to 70% and 30%. That is likely the reason for using this random method. Thank You.
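A quick sketch (not from the tutorial itself) that illustrates this: for a reasonably large DataFrame, the fraction of rows landing in the test split concentrates around `test_ratio`, and seeding the generator makes the split reproducible. The DataFrame below is made up for illustration.

```python
import numpy as np
import pandas as pd

def split_dataset(dataset, test_ratio=0.30):
  """Splits a pandas DataFrame in two (same logic as in the question)."""
  test_indices = np.random.rand(len(dataset)) < test_ratio
  return dataset[~test_indices], dataset[test_indices]

np.random.seed(42)  # optional: fixing the seed makes the split reproducible
df = pd.DataFrame({"x": range(10_000)})  # hypothetical example data

train, test = split_dataset(df)
print(len(test) / len(df))  # close to 0.30, but rarely exactly 0.30
```

The test-set size here follows a binomial distribution with p = 0.30, so its fraction of the whole dataset has a standard deviation of roughly sqrt(0.3 * 0.7 / 10000) ≈ 0.0046 for 10,000 rows, which is why the observed split stays close to 70/30.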