Hello World of efficient in-browser pose estimation

After a brief search, it looks like there are many competing options for casual pose estimation in the browser (yay choices! yay improvements in 2022!).


For Halloween, I’d like trick-or-treaters to be able to control stuff by waving their arms. I figured the Hello World version would be: open a page on a Pixel 6 phone that turns on the camera, raise my arms, and have the page send back to the server “hey, this guy is raising his right arm about 90 degrees and his left arm about 30 degrees.”
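The “about 90 degrees” part can be computed from just two keypoints per arm. Below is a minimal sketch, assuming the pose model reports 2D keypoints (MoveNet's 17 COCO-style keypoints include both shoulders and both wrists) in image pixel coordinates with y pointing down, and assuming the angle is measured from the arm hanging straight down (0°) through horizontal (90°) to straight up (180°). The function name `arm_raise_angle` and the `(x, y)` tuple format are hypothetical, not part of any library.

```python
import math


# Hypothetical helper: keypoints are (x, y) tuples in image pixels,
# with y increasing downward as in typical camera frames.
def arm_raise_angle(shoulder: tuple[float, float], wrist: tuple[float, float]) -> float:
    """Return how far an arm is raised, in degrees.

    0 means hanging straight down, 90 means horizontal,
    180 means pointing straight up. Left/right direction is ignored.
    """
    dx = wrist[0] - shoulder[0]
    dy = wrist[1] - shoulder[1]  # positive when the wrist is below the shoulder
    # Angle between the shoulder->wrist vector and the downward vertical (0, 1).
    return math.degrees(math.atan2(abs(dx), dy))


if __name__ == "__main__":
    shoulder = (200.0, 300.0)
    print(round(arm_raise_angle(shoulder, (200.0, 450.0))))  # arm down: 0
    print(round(arm_raise_angle(shoulder, (350.0, 300.0))))  # arm horizontal: 90
    print(round(arm_raise_angle(shoulder, (200.0, 150.0))))  # arm up: 180
```

Measuring against the vertical and ignoring the sign of `dx` gives the same scale for both arms, so “30 degrees” just means the arm is swung a little out from the body. In practice you'd also want to skip frames where the shoulder or wrist keypoint has a low confidence score.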

But there are a LOT of options! Which should I start with? (I admit I don’t know what most of these are, and I hope to squeak by without learning the details.)

  1. MediaPipe or not? (new hotness! sounds like yes, as long as I'm not on iOS)
  2. WebAssembly or WebGL? (sounds like WebAssembly is better for Chrome Mobile on a Pixel 6, 'cause no beefy laptop GPU)
  3. MoveNet or not? MoveNet: Ultra fast and accurate pose detection model | TensorFlow Hub (sounds like yes)
  4. MoveNet fp16 or int8? (I’m guessing int8, but this one is a total guess)
  5. Lightning or Thunder? Next-Generation Pose Detection with MoveNet and TensorFlow.js | The TensorFlow Blog (sounds like I should use Lightning)

I’ve got a hello world that opens a video stream with navigator.mediaDevices.getUserMedia. I’m not running Node or any modern tooling (TypeScript, webpack, etc.). Seriously, this is as hacky as possible: “Latest mobile Chrome on a Pixel 6 phone, page served from a local folder with python3 -m http.server, sending back left-arm and right-arm angles once per second.”
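For the “sending back angles” part, plain `python3 -m http.server` only serves files; a POST to it gets a 501 error. A minimal sketch, assuming the page POSTs JSON such as `{"left": 30, "right": 90}` to a hypothetical `/angles` path once per second (for example with `fetch` inside `setInterval`), is to subclass the standard library's `SimpleHTTPRequestHandler` so the same script still serves the folder and also accepts the angles. The `/angles` path, the JSON shape, `PoseHandler`, and `latest_angles` are all made up for illustration.

```python
import json
import sys
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Most recent angles reported by the phone (hypothetical JSON shape:
# {"left": <degrees>, "right": <degrees>}).
latest_angles = {"left": None, "right": None}


class PoseHandler(SimpleHTTPRequestHandler):
    """Serves files from the current folder (like `python3 -m http.server`)
    and additionally accepts POST /angles with a small JSON body."""

    def do_POST(self):
        if self.path != "/angles":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            data = json.loads(self.rfile.read(length))
            left = float(data["left"])
            right = float(data["right"])
        except (ValueError, KeyError, TypeError):
            self.send_error(400, "Expected JSON with numeric 'left' and 'right'")
            return
        latest_angles["left"] = left
        latest_angles["right"] = right
        # React to the arms here, e.g. trigger a Halloween effect.
        print(f"left arm {left:.0f} deg, right arm {right:.0f} deg")
        self.send_response(204)  # success, no response body
        self.end_headers()


def main(port: int) -> None:
    # Bind to all interfaces so the phone on the same Wi-Fi can reach it.
    server = ThreadingHTTPServer(("0.0.0.0", port), PoseHandler)
    print(f"Serving current folder and POST /angles on port {port}")
    server.serve_forever()


if __name__ == "__main__":
    if len(sys.argv) == 2:
        main(int(sys.argv[1]))  # e.g. python3 pose_server.py 8000
    else:
        print("usage: python3 pose_server.py PORT")
```

Run it from the folder that holds the HTML page instead of `python3 -m http.server`. One caveat: Chrome only exposes `getUserMedia` in a secure context (HTTPS or localhost), so a phone loading the page from a laptop's LAN address over plain HTTP will likely be refused camera access; serving over HTTPS, or using Chrome's remote-debugging port forwarding so the phone can load it via localhost, are common workarounds.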