For maximum immersion, I play around with the pitch, FOV, zoom, etc., but it would be amazing if there were a feature that could do this automatically!
I imagine it would have to be done using optical processing/recognition, or "AI" like you do for the passthrough, because I doubt any of the videos have the performer's body mapped out for the scene. Perhaps this is overly simplistic thinking, but could you use some part of the viewer to "snap to", like a penis, and then adjust the pitch, FOV, zoom, etc. based on that? If not, would it be possible to somehow "snap to" a controller held at penis level?
I don't expect it to be perfect because it depends on the recording, but I feel like this would be amazing.