I have a simple hierarchical model, currently developed in PyMC3 as follows:

```
a_b = np.array([[3, 4, 6, 1, 5, 2, 9, 7],
[1, 2, 8, 4, 5, 3, 9, 6],
[2, 2, 3, 1, 5, 5, 9, 8],
[2, 1, 8, 2, 9, 3, 8, 8],
[2, 4, 9, 1, 4, 3, 5, 5],
[1, 2, 9, 1, 3, 5, 5, 4]])
a_w = np.array([
[ 7, 6, 4, 9, 5, 8, 1, 3],
[ 9, 8, 2, 5, 4, 5, 1, 3],
[ 8, 8, 5, 9, 5, 5, 1, 2],
[ 8, 9, 2, 8, 1, 8, 2, 2],
[ 8, 6, 1, 9, 6, 7, 4, 4],
[ 9, 8, 1, 9, 7, 5, 5, 6]])
aw_tc = np.sum(a_w, axis=1)
ab_tc = np.sum(a_b, axis=1)
dm_no, c_no = a_w.shape
with pm.Model() as BayesianBWM:
w_star = pm.Dirichlet("w_star", a=0.01*np.ones(c_no))
gamma_agg = pm.Gamma("beta_agg",alpha=0.01, beta=0.01)
for i in range(dm_no):
globals()[f"w{i}"] = pm.Dirichlet('w'+str(i), a=gamma_agg*w_star)
globals()[f"aw{i}"] = pm.Multinomial('a_w'+str(i),n=aw_tc[i], p=globals()[f"w{i}"], observed=a_w[i,:])
globals()[f"ab{i}"] = pm.Multinomial('a_b'+str(i),n=ab_tc[i], p= 1 / globals()[f"w{i}"], observed=a_b[i,:])
```

As you see, we have two observed matrices, a_w and a_b, and for each row of these matrices, we learn a w, and finally based on these w’s, we learn w_star.

I have difficulties implementing this in Tensorflow JointDistributionSequential. I want to develop it purely in TF probability and not using other high level libraries such as Edward2.

Any help is highly appreciated!