On another note, and something I think would be especially interesting for PT: would it be possible to get an extra channel in the video stream that carries depth data? It could be encoded in the red parts of the alpha mask, or however it works with the separate mask data used for the AI PT.
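Just to illustrate what I mean by an extra channel, here is a minimal sketch of packing depth into an 8-bit plane; the near/far range and the inverse-depth encoding are purely my assumptions, not how the existing mask data actually works.

```python
# Sketch: pack metric depth into an 8-bit channel and recover it on the player side.
# NEAR/FAR and the inverse-depth mapping are assumptions for illustration only.
import numpy as np

NEAR, FAR = 0.3, 10.0  # assumed clip range in meters

def encode_depth(depth_m: np.ndarray) -> np.ndarray:
    """Encode metric depth as normalized inverse depth in 0..255 (near = bright)."""
    inv = 1.0 / np.clip(depth_m, NEAR, FAR)
    inv_near, inv_far = 1.0 / NEAR, 1.0 / FAR
    norm = (inv - inv_far) / (inv_near - inv_far)
    return (norm * 255).astype(np.uint8)

def decode_depth(channel: np.ndarray) -> np.ndarray:
    """Recover approximate metric depth from the 8-bit channel."""
    inv_near, inv_far = 1.0 / NEAR, 1.0 / FAR
    inv = channel.astype(np.float32) / 255.0 * (inv_near - inv_far) + inv_far
    return 1.0 / inv
```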
Having depth data opens up a whole range of possibilities: from depth clipping your virtual hands against the model (hands are unfortunately not enabled for the Pico 4 in DeoVR), to more realistic perspective re-projection on head movements, where you get actual perspective changes instead of the 2D warping we currently have with 6DOF. And more.
The hand clipping would add a big level of immersion. I have implemented this in a VR viewer for Gaussian splatting, where I have some scenes with human models, and I was amazed by the immersion it provides. Of course there is no haptic feedback, but the visual feedback alone tricks me into a sensation of touching.
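To make concrete what the clipping amounts to: it is essentially a per-pixel depth test at composite time (in practice a fragment shader); the sketch below uses numpy arrays as stand-ins for the GPU buffers and is only meant to show the idea.

```python
# Illustrative compositing step: hide hand pixels that lie behind the video's depth,
# so the filmed model occludes the virtual hands. All buffers are hypothetical
# numpy stand-ins for what would normally be GPU textures.
import numpy as np

def composite_hands(video_rgb, video_depth_m, hand_rgb, hand_alpha, hand_depth_m):
    """Per-pixel depth test: a hand pixel is drawn only if it is closer than the video."""
    visible = (hand_alpha > 0) & (hand_depth_m < video_depth_m)
    a = np.where(visible, hand_alpha, 0.0)[..., None]          # HxWx1 blend factor
    out = (1.0 - a) * video_rgb.astype(np.float32) + a * hand_rgb
    return out.astype(video_rgb.dtype)
```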
As for how to get the depth (estimation) data, I don't think this can be done in real time on the device itself. But if the preprocessing infrastructure used for the AI image segmentation PT masks can be leveraged, then there are options like https://depth-anything.github.io/, which I believe is the current state of the art for monocular depth estimation. Of course, here we have stereo and can use that extra information, in which case I believe the state of the art is https://haofeixu.github.io/unimatch/, although I haven't been following the progress that closely.
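To make the preprocessing idea a bit more concrete, here is a rough per-frame offline pass, assuming something like the Hugging Face depth-estimation pipeline is available on the server side; the checkpoint name and file path are assumptions, and a stereo method like UniMatch would slot into the same place, just consuming both eye views instead of one.

```python
# Rough offline preprocessing sketch (not real time, not on device):
# run a monocular depth model per frame and collect the depth planes,
# which could then be muxed into the stream as the extra channel.
import cv2
import numpy as np
from PIL import Image
from transformers import pipeline

# Assumed checkpoint; any depth-estimation model supported by the pipeline would do.
depth = pipeline("depth-estimation", model="LiheYoung/depth-anything-small-hf")

cap = cv2.VideoCapture("scene_left_eye.mp4")  # hypothetical per-eye input file
depth_frames = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    result = depth(Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)))
    depth_frames.append(np.array(result["depth"]))  # relative depth as an 8-bit image
cap.release()
# depth_frames would then be encoded alongside the video, e.g. like encode_depth above.
```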