Hi all,

I ran the following script on Intel and AMD machines and observed differences in the raw output files it generates.

```
import numpy as np
import tensorflow as tf
# float32 NumPy array
a = np.arange(100, dtype=np.float32)
# The same array with the same dtype in TensorFlow
a_tf = tf.constant(a, dtype=tf.float32)
# Square root with NumPy
sqrt = np.sqrt(a)
sqrt.tofile('../np_exp/sqrt.raw')
# Square root with TensorFlow
with tf.Session() as sess:
    sqrt_tf = sess.run(tf.sqrt(a_tf))
    # Dump the raw float32 bytes (no header) for cross-machine comparison
    sqrt_tf.tofile('./sqrt.raw')
```

I compared the raw files produced by the TensorFlow session on the AMD and Intel machines. The elements that differ are listed below:

```
1.4142135 1.4142134 3
2.0 1.9999999 5
2.4494898 2.4494896 7
2.828427 2.8284268 9
3.0 2.9999998 10
3.162278 3.1622777 11
3.6055512 3.6055508 14
4.0 3.9999998 17
4.5825763 4.5825753 22
4.6904163 4.6904154 23
4.795831 4.7958307 24
4.8989797 4.898979 25
5.099019 5.0990186 27
5.1961527 5.196152 28
5.385165 5.3851647 30
5.656854 5.6568537 33
5.7445626 5.744562 34
5.830952 5.830951 35
6.0 5.9999995 37
6.0827627 6.0827622 38
6.324556 6.3245554 41
6.557439 6.557438 44
7.0000005 6.9999995 50
7.0710683 7.0710673 51
7.2111025 7.2111015 53
7.2801104 7.280109 54
7.416199 7.4161983 56
8.0 7.9999995 65
8.062258 8.062257 66
8.124039 8.124038 67
8.306624 8.306623 70
8.42615 8.426149 72
8.5440035 8.544003 74
8.6602545 8.660254 76
9.0 8.999999 82
9.055386 9.055385 83
9.110435 9.110433 84
9.165153 9.165151 85
9.273619 9.273618 87
9.380833 9.380831 89
9.486833 9.486834 91
9.539392 9.5393915 92
9.591662 9.591661 93
9.797959 9.797958 97
9.899495 9.899494 99
9.949875 9.949874 100
./AMD/sqrt.raw
./INTEL/sqrt.raw
Total Matches : 54 Mismatches : 46
```

(The first column is the AMD value, the second is the Intel value, and the third is the 1-based index of the element in the output array. The comparison tolerance is 0.0000001, i.e., a pair is printed only if the absolute difference is >= 0.0000001.)
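The comparison described above can be reproduced with a short NumPy script. This is a minimal sketch, not the script that produced the listing: `compare_raw` is a hypothetical helper name, and it assumes the `.raw` files are headerless float32 dumps written by `ndarray.tofile()`, as in the script above.

```python
import os
import tempfile

import numpy as np


def compare_raw(path_a, path_b, tol=1e-7):
    """Compare two raw float32 dumps element by element (hypothetical helper).

    Prints "value_a value_b index" (1-based index) for every pair whose
    absolute difference is >= tol, then a match/mismatch summary.
    Returns (matches, mismatches).
    """
    # Assumes headerless float32 files, as written by ndarray.tofile()
    a = np.fromfile(path_a, dtype=np.float32)
    b = np.fromfile(path_b, dtype=np.float32)
    if a.shape != b.shape:
        raise ValueError(f"length mismatch: {a.size} vs {b.size}")
    # Subtract in float64 so the comparison itself adds no rounding error
    diff = np.abs(a.astype(np.float64) - b.astype(np.float64))
    mismatch = diff >= tol
    for i in np.flatnonzero(mismatch):
        print(a[i], b[i], i + 1)
    print(path_a)
    print(path_b)
    matches = int(np.count_nonzero(~mismatch))
    mismatches = int(np.count_nonzero(mismatch))
    print(f"Total Matches : {matches} Mismatches : {mismatches}")
    return matches, mismatches


if __name__ == "__main__":
    # Self-contained demo: two arrays that differ by one float32 ULP at one index
    x = np.sqrt(np.arange(100, dtype=np.float32))
    y = x.copy()
    y[2] = np.nextafter(y[2], np.float32(0))  # nudge sqrt(2) down by one ULP
    with tempfile.TemporaryDirectory() as d:
        pa, pb = os.path.join(d, "a.raw"), os.path.join(d, "b.raw")
        x.tofile(pa)
        y.tofile(pb)
        compare_raw(pa, pb)
```

Note that for values of 1 or larger, one float32 ULP is at least 2^-23 (about 1.19e-7), which already exceeds the 1e-7 tolerance, so for those elements this check reports any bitwise difference at all.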

**Note**: the raw files produced by the NumPy `sqrt` computation in the script above are identical on both architectures.
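To quantify the differences, it can help to measure them in float32 units in the last place (ULPs) against a correctly rounded reference. This is an illustrative sketch, not part of the original experiment: `ulp_distance` is a hypothetical helper that only handles non-negative finite float32 values (which covers the square roots here), and the reference is computed in float64 and then rounded to float32. For square root, float64 carries enough extra precision that this double rounding still gives the correctly rounded float32 result.

```python
import numpy as np


def ulp_distance(x, y):
    """Number of float32 ULPs between x and y (sketch: non-negative finite inputs only).

    For non-negative finite IEEE 754 floats, the bit patterns read as integers
    are ordered the same way as the values, so their difference counts the
    representable float32 steps between x and y.
    """
    x = np.asarray(x, dtype=np.float32)
    y = np.asarray(y, dtype=np.float32)
    if np.signbit(x).any() or np.signbit(y).any():
        raise ValueError("this sketch only handles non-negative inputs")
    xi = x.view(np.int32).astype(np.int64)
    yi = y.view(np.int32).astype(np.int64)
    return np.abs(xi - yi)


a = np.arange(100, dtype=np.float32)
# Rounding the float64 sqrt to float32 yields the correctly rounded float32 sqrt
reference = np.sqrt(a.astype(np.float64)).astype(np.float32)

# NumPy's float32 sqrt is expected to match the correctly rounded reference
print(ulp_distance(np.sqrt(a), reference).max())

# First mismatching pair from the listing (AMD vs Intel, element 3)
print(ulp_distance(np.float32(1.4142135), np.float32(1.4142134)))  # prints 1
```

Running `ulp_distance` on each machine's TensorFlow output against `reference` would show how far each result is from the correctly rounded value, and whether one machine, or both, deviates.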

**Steps to reproduce**:

- Used TensorFlow 1.15.5, built from source following the official TensorFlow "Build from source" guide.
- Disabled all optional features (XLA, CUDA, etc.) during the `./configure` step.
- Passed `-march=x86-64` through the `--copt` and `--host_copt` flags to `bazel build`, and set `--config=v1`.

Is there a way to make TensorFlow produce identical results across different CPU architectures?

Thank you!