Video compression works a bit differently from image compression.
On a film reel, the illusion of motion is created by showing one image after another in rapid succession, something I guess everyone is familiar with.
In digital media, storing every frame in full like this is referred to as RAW: no compression. This is usually the format professional recording studios and filmmakers shoot their footage in.
This results in huge files: a 30-minute scene on the website can easily be a couple of terabytes in RAW footage, for 1 angle.
1 frame or 1 image, even in (current) VR, is a flat 2D object.
In video, when we're talking about encoding and decoding (codec), there is a 3rd dimension: time.
Digitally, with math, we can store not only the data of one frame but also how one frame relates to another frame. The most basic and common technique is the use of P-frames. These frames are computer-generated images where pixels from frame 1 are reused, and even moved, to generate frame 2.
Sometimes this creates an effect most people have encountered at least once, where the video gets smudged. https://www.youtube.com/watch?v=i-bz21deEeY
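To make the idea of reusing pixels from the previous frame a bit more concrete, here is a minimal sketch in Python. It is heavily simplified and not how any real codec works: a frame is just a short list of brightness values, and the function names `encode_delta` and `apply_delta` are made up for this illustration. Real P-frames work on blocks of pixels and can also move them around, which this sketch does not attempt.

```python
def encode_delta(previous, current):
    """Keep only the pixels that differ from the previous frame."""
    return [
        (index, new)
        for index, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]


def apply_delta(previous, delta):
    """Rebuild a frame from the previous frame plus the stored changes."""
    frame = list(previous)
    for index, value in delta:
        frame[index] = value
    return frame


# Two tiny "frames": a bright 2-pixel object moves one position to the right.
frame_1 = [10, 10, 10, 200, 200, 10, 10, 10]
frame_2 = [10, 10, 10, 10, 200, 200, 10, 10]

delta = encode_delta(frame_1, frame_2)
rebuilt = apply_delta(frame_1, delta)

print(delta)               # [(3, 10), (5, 200)]
print(rebuilt == frame_2)  # True
```

Only 2 of the 8 pixels had to be stored for frame 2. If a stored change is lost or gets applied to the wrong previous frame, the rebuilt picture is wrong, which is roughly where the smudging comes from.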
When you add more frames to a video, you need more of this transitional data. In the example below, adding 1 frame takes us from 1 change to 2 changes.
E.g., let's say we have 2 frames per second: frames A B
1 change
Now let's increase the framerate by 50%, thus 3 frames per second
A B C, 2 changes in the same amount of time. That's a 100% increase.
In both cases, the next second starts again with a frame A, which is a whole new frame. No change over time needs to be recorded for it.
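The counting in the example above can be checked with a few lines of Python. This is only a sketch of the simplified model used here: it assumes every second starts with one whole new frame and every other frame in that second is stored as a change. The function name `changes_per_second` is made up for this illustration.

```python
def changes_per_second(fps):
    """Changes to store per second when the first frame of each second is a whole frame."""
    return max(fps - 1, 0)


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return (new - old) / old * 100


old_fps, new_fps = 2, 3
old_changes = changes_per_second(old_fps)  # A B   -> 1 change
new_changes = changes_per_second(new_fps)  # A B C -> 2 changes

print(percent_increase(old_fps, new_fps))          # 50.0
print(percent_increase(old_changes, new_changes))  # 100.0
```

In this model, going from 2 to 3 frames per second is a 50% increase in frames but a 100% increase in changes.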
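There is a lot more to it, a lot a lot, but with this I want to illustrate that although seemingly only 50% of extra content is added, the amount of transitional data doubles. Mathematically this is not a one-to-one relation: n frames per second need n - 1 changes, so at low framerates the transitional data grows by a larger percentage than the framerate does.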
Something extra on video frames https://www.youtube.com/watch?v=eYHBSoCmC0I