Best place to submit RFC for tensorflow core developers?


I’m working on a proof-of-concept tool for TensorFlow ops argument checking and automatic documentation. Where is the best place to submit an RFC for core developers?

I tried a week ago but still no response. It seems to get extremely low traffic.




/cc @markdaoust What do you think? Can we discuss these things informally in the forum first, before eventually preparing a formal RFC PR?


Thanks @Bhack,

Yeah, we can always discuss things here.

I own a lot of the documentation tools, so it’s right to have me in the loop.
And I can always help track down the right people, or ping an internal list if necessary.

What’s the idea?

If I miss an update on this thread, just ping me again.


Hi @markdaoust, @Bhack,

Apologies, it’s not quite ready to present for discussion yet. But I’m pleased to have found the right venue; thank you for the prompt response. I’ll fix some things and be in touch shortly.




The best place to submit an RFC for core developers is the TensorFlow GitHub repository. You can post an issue on the repository with a detailed description of your tool. The core developers will review your issue and provide feedback if needed.


The RFC process is formally defined at:

But as the process has non-trivial overhead, it is better to discuss the idea informally in the forum, or in any other SIG-specific channel, for an early evaluation first.

Hi @Bhack,

Thank you - sounds good. I’m cleaning up some documentation and fixing bugs at the moment, and will send an email in the next few days.



Hi @Bhack, @markdaoust,

The idea is here:

If you want to skip the examples and see how it works, check out:

schema section

and the following section #computation-graphs (sorry, it won’t let me link more than two)

As I mention in the README, it is a proof-of-concept only. I originally intended it to be a PyPI package to make using TensorFlow easier. But, I realized only recently that it is not maintainable in that form since these schemas would all need to exist in separate versions corresponding to each TensorFlow release.

When I started it, I thought it would take a week because the idea was “simple”. I was wrong… While the core idea is simple, I discovered that TensorFlow ops were quite a bit more complex in their behavior than I’d realized.

In any case, I have some sense for why this is, and why it is so difficult from an engineering standpoint for TensorFlow to keep documentation in sync with the actual capabilities of the ops, and why many of the exceptions it raises are cryptic or wrong. I believe it would be feasible to implement within TensorFlow, but it might be a major undertaking. If so, I’d be glad to discuss further.

And if you see some ideas in here that might be feasible but not others, would be great to hear your thoughts.



    import opschema

    Schema for tf.gather_nd

    Index  Description
    b      batch
    r      read location
    w      write location
    e      slice element
    c      read address component

    params  indices  return[0]
    bre     bwc      bwe

    Index ranks
    rank(b)        Unconstrained
    rank(r)        in [1, 7]
    rank(w)        Unconstrained
    rank(e)        Unconstrained
    rank(c)        = 1
    rank(b,c,w)    in [0, 10]    sum-range constraint
    rank(b,e,r)    in [0, 10]    sum-range constraint

    Computed dimensions

    Index predicates

    DType Rules
    indices.dtype in (int32, int64)
    params.dtype in (int32, int64, float16, float32, float64)

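To make the signature table concrete, here is a pure-Python sketch (not part of opschema; the helper name is hypothetical) of the output-shape rule encoded by the `bre` / `bwc` → `bwe` signature, for the simple case of no batch dimensions:

```python
def gather_nd_output_shape(params_shape, indices_shape):
    """Result shape of tf.gather_nd, non-batched case (a sketch).

    The last axis of `indices` is the read-address component `c`;
    its size is the number of leading `params` axes (`r`) consumed
    by each read. The remaining axes split into the write location
    `w` (from `indices`) and the slice element `e` (from `params`).
    """
    index_depth = indices_shape[-1]          # size of c == rank(r)
    if not 1 <= index_depth <= len(params_shape):
        raise ValueError("index depth out of range for params rank")
    write_shape = indices_shape[:-1]         # the w axes
    slice_shape = params_shape[index_depth:] # the e axes
    return write_shape + slice_shape

# params (4, 5), one read axis: each index picks a row of length 5
assert gather_nd_output_shape([4, 5], [3, 1]) == [3, 5]
# full-depth indices: each index picks a single scalar element
assert gather_nd_output_shape([4, 5], [3, 2]) == [3]
```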
Interesting, so you’re building up data structures to document the ops.

I do like the named index descriptions. Overall this does come out clearer than trying to describe it any other way.

For example, compare this to the shape constraints block that I added to tensor_scatter_nd_update:

assert tf.rank(indices) >= 2
index_depth = indices.shape[-1]
batch_shape = indices.shape[:-1]
assert index_depth <= tf.rank(tensor)
outer_shape = tensor.shape[:index_depth]
inner_shape = tensor.shape[index_depth:]
assert updates.shape == batch_shape + inner_shape
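Those same constraints can be restated as a small runnable check; this is a hedged pure-Python sketch operating on shape lists, not actual TensorFlow validation code:

```python
def check_scatter_shapes(tensor_shape, indices_shape, updates_shape):
    """Pure-Python restatement of the shape constraints above
    for tensor_scatter_nd_update (a sketch, not TF internals)."""
    assert len(indices_shape) >= 2
    index_depth = indices_shape[-1]
    batch_shape = indices_shape[:-1]
    assert index_depth <= len(tensor_shape)
    inner_shape = tensor_shape[index_depth:]
    # updates must carry one inner slice per index row
    assert updates_shape == batch_shape + inner_shape

# tensor (4, 5), indices (3, 1): updates must be (3, 5)
check_scatter_shapes([4, 5], [3, 1], [3, 5])
```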

My biggest concern here is that there are already multiple systems in tensorflow for describing ops, and I’m not sure we want to add another.

For example look at the code for registering an op in tensorflow:

And the api_def pbtxt files:

And the data-structure definitions are here:

Plus some ops have additional wrappers and docs in Python (this one is trivial, some aren’t; some of these accidentally mask docs generated lower down the stack), and these wrappers can do even more than ops:

Plus there was someone looking into trying to make python typing understand tensors, dtypes and shapes (and shape-functions?), but IDK if that effort is alive or dead.

So personally I feel like this already has a lot of moving parts (too many), so I’d vote on the side of cleaning up and expanding the systems we already have instead of rolling new ones.

The other concern I have about this approach is that there are 3000+ APIs in TensorFlow at the moment, and even if we can find a good sponsor for this, it would be months of work to get the top 10% of the API covered.

You’ve got some nice ideas here, I’m just trying to think if there are ways we could introduce them incrementally.

Hi Mark,

Yes, it would be a major undertaking to implement this. Ultimately I would only advocate a solution that has a single source of truth, and one that actually achieves a mathematically complete description of legal inputs. I know there is a very high premium on maintainability for any proposed feature, and I respect that completely.

It seems like the current mechanism of REGISTER_OP and pbtxt files is not a single source of truth, and is also not a complete description of legal inputs - maybe there is some place where there are more constraints defined?

Perhaps a first step would be to coalesce the REGISTER_OP and pbtxt mechanisms into one single mechanism? And, then augment them with a richer set of constraints?

I guess it also depends how much enthusiasm there is for cleaning up the exceptions and/or the documentation. At the moment both are pretty wild, as you can hopefully see from the many examples.

Would you and @Bhack be up for a pow-wow some time? I’d like to contribute to tensorflow somehow - not sure if this would be it, but I have some smaller ideas as well.



I would also add that we still have many low-level API cases related to asserts.
Just two sample cases, as we don’t have a specific GitHub label:

Hi @Bhack,

I’m trying to write a schema for LSTMBlockCell - would you be interested in trying? If I can write one, then I could use opschema to generate many different test cases, valid and invalid.

Or could you give an example of a set of successful inputs? That would be helpful.


As mentioned by @markdaoust, rather than addressing single cases, I think we need to find a way to consolidate what we have, taking into account the available PR review resources.

Mark, I don’t know what your perception is of the current review bandwidth in the main repo. What kind of PR do you think we could review to make progress on a consolidation?


I added an LSTMBlockCell schema to opschema, hoping to use opschema to generate and run lots of variations of inputs on it. Unfortunately when I try to run it, it aborts:

python -m validate tf.raw_ops.LSTMBlockCell reports

From inspection, this is happening at:


=>    tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
                                        inputs, attrs, num_outputs)
  except core._NotOkStatusException as e:

Inputs were:

x: [495, 21]:float16
cs_prev: [495, 198]:float16
h_prev: [495, 198]:float16
w: [219, 792]:float16
b: [792]:float16
wci: [198, 198]:float16
wcf: [198, 198]:float16
wco: [198, 198]:float16

I tried setting a signal handler in Python:

signal.signal(signal.SIGABRT, handler) 

but it doesn’t work. Do you happen to know if there’s a way to trap the signal?
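Python-level signal handlers only run between bytecode instructions, so a synchronous C++ `abort()` kills the process before the handler ever fires. One common workaround is to isolate each op invocation in a child process and inspect its exit status; here is a minimal sketch (the `run_isolated` helper is hypothetical, not part of opschema or TensorFlow):

```python
import signal
import subprocess
import sys

def run_isolated(snippet: str) -> int:
    """Run a Python snippet in a child process so a C-level abort()
    kills only the child; return the child's exit status."""
    proc = subprocess.run([sys.executable, "-c", snippet],
                          capture_output=True)
    return proc.returncode

# On POSIX, death by signal is reported as a negative return code,
# so the parent survives and can record the input as a crash case.
assert run_isolated("import os; os.abort()") == -signal.SIGABRT
assert run_isolated("print('ok')") == 0
```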


Hi @markdaoust @Bhack

(I removed all links - this chat keeps saying “Sorry, you can’t post a link to that host”)

I looked over those artifacts Mark mentioned.

So personally I feel like this already has a lot of moving parts (too many), so I’d vote on the side of cleaning up and expanding the systems we already have instead of rolling new ones.

Yes I agree - I would not advocate for just creating a new system and having it sit alongside the existing infrastructure. That would just create clutter, redundancy, and confusion. My sense from looking at some of this current infrastructure, for example shape_inference.h, is that the basic abstractions are lacking. So I feel the first step, if there were interest, would be to discuss at a purely logical level what abstractions are needed to define legal inputs.

For instance, tf.gather_nd accepts over 1600 different combinations, just for int32 inputs:

python -m explain tf.gather_nd -i \
    | awk '{ if ($2 == "int32" && $4 == "int32") { print $0 } }' \
    | less

which are specified in ops/tf/

The TensorFlow counterpart seems to be partly expressed in:

But, it doesn’t seem like the set of primitives defined in shape_inference.h are sufficient to truly define the legal inputs of an op.

My sense is that the first step, if this were viable, would be to talk to the people who wrote shape_inference.{h,cc} and discuss what other problems came up and what motivated its design.

I would love to hear from other developers who have thought about this problem, as well as your thoughts about the other aspects of my proposal - such as the comp_dims and notion of signatures and layouts etc.

The other concern I have about this approach is that there are 3000+ APIs in TensorFlow at the moment, and even if we can find a good sponsor for this, it would be months of work to get the top 10% of the API covered.

When you say ‘APIs’ do you mean APIs for each op, or something else?

For sure it would be a ton of work. However, I spent almost four months full time on opschema (through many blind alleys), so I’d be eager to put in a significant effort on the TF codebase. Part of my motivation, I’ll admit, is to pad my CV.

Also, my hope is that ultimately it will save a lot of time for TF core developers. Defining the op in the schema automatically gives you a generator for unit tests, so individual edge cases that cause a new issue (such as with LSTMBlockCell), which would otherwise be discovered late, can be found very early and more thoroughly. As an example, it took about an hour to write the schema for LSTMBlockCell, but I can now use it to quickly discover that it crashes.
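To illustrate the kind of generator this enables, here is a hedged pure-Python sketch (not opschema’s actual API; the function name is hypothetical) that enumerates rank assignments from per-index bounds plus a sum-range constraint, like those shown in the gather_nd report:

```python
from itertools import product

def enumerate_ranks(bounds, sum_bound):
    """Enumerate rank assignments satisfying per-index bounds and a
    sum-range constraint, as in a schema's 'Index ranks' section.

    `bounds` maps an index name to an inclusive (lo, hi) range;
    `sum_bound` caps the combined rank. Hypothetical illustration.
    """
    names = sorted(bounds)
    ranges = [range(lo, hi + 1) for lo, hi in (bounds[n] for n in names)]
    for combo in product(*ranges):
        if sum(combo) <= sum_bound:
            yield dict(zip(names, combo))

# e.g. rank(r) in [1, 2], rank(b) and rank(e) in [0, 2], total <= 3;
# each yielded dict is one rank assignment to instantiate as a test.
cases = list(enumerate_ranks({"b": (0, 2), "r": (1, 2), "e": (0, 2)}, 3))
```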



As many TF ops are going to be lowered to StableHLO sooner or later, I think it could be interesting to investigate how OpenXLA/StableHLO is going to design the coverage with its interpreter:


@markdaoust If you are interested take a look at the StableHLO team reply on Github