Unfortunately, all of these models are trained on 2D datasets. I just don't see how they can be adapted to produce a stereo image pair without some wonky depth map estimation -> 3D reconstruction step (which, from all the tests I've seen, results in tons of artifacts and flickering).
When you can turn a 2D image into a 3D 180° SBS video, that will be the real killer app for VR.
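
To make the depth map route concrete, here is a rough sketch of the warping step, assuming you already have a per-pixel depth map from some estimator (not shown). The function name `synthesize_right_view` and the `max_disparity` parameter are made up for illustration, and this is a toy forward warp, not what any particular tool actually does. It shifts each pixel left by a disparity that grows as depth shrinks, and reports the pixels that nothing landed on.

```python
import numpy as np

def synthesize_right_view(left, depth, max_disparity=8):
    """Toy forward warp of a left-eye image into a right-eye view.

    left:  (H, W) or (H, W, C) array.
    depth: (H, W) array, larger values = farther away (assumed convention).
    Returns (right, holes), where holes marks pixels that received no data.
    """
    h, w = depth.shape
    # Disparity is inversely related to depth; normalise "nearness" to [0, 1].
    inv = 1.0 / np.maximum(depth, 1e-6)
    span = inv.max() - inv.min()
    near = (inv - inv.min()) / span if span > 0 else np.zeros_like(inv)
    disparity = np.rint(near * max_disparity).astype(int)

    right = np.zeros_like(left)
    filled = np.zeros((h, w), dtype=bool)
    best = np.full((h, w), -1, dtype=int)  # disparity of the pixel written so far
    for y in range(h):
        for x in range(w):
            d = disparity[y, x]
            xr = x - d  # nearer pixels shift further left in the right-eye view
            # Keep the nearest (largest-disparity) pixel when several collide.
            if 0 <= xr < w and d > best[y, xr]:
                right[y, xr] = left[y, x]
                best[y, xr] = d
                filled[y, xr] = True
    return right, ~filled

# One-row example: a near object (depth 1) in front of a far background (depth 10).
left = np.array([[10, 11, 12, 13, 14, 15, 16, 17]])
depth = np.array([[10, 10, 10, 10, 1, 1, 10, 10]], dtype=float)
right, holes = synthesize_right_view(left, depth, max_disparity=2)
print(right)  # the near object has moved two pixels to the left
print(holes)  # True where the background behind it was never seen
```

The `holes` mask is the part that hurts: those pixels were hidden behind the foreground in the original image, so something has to invent them. If the depth estimate also wobbles from frame to frame, the holes move around too, which is one plausible source of the artifacts and flickering mentioned above.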