I'm not sure what you mean. Maybe I'm missing something.
Assuming the recording of the original performance included sound, you'd still be able to hear the game sounds and the control panel or keyboard noise from the original recording. You could then seamlessly add a mic to the stream and add whatever live "vocals" you wanted. A viewer would see and hear the game and the controls, plus you talking with chat about the score somebody got 20 minutes ago or whatever, and would just assume you were talking while playing.
As long as your face wasn't part of the shot, no chicanery would be apparent.
The only time suspicion would arise is if there were vocals but NO game/CP sound (i.e., playback of a recording with no sound, or accidentally muting the system sound in XSplit), because then the question would be "how can we hear this guy talking, but not his joystick knocking around?" That's no problem if the joystick knocking is in the recording.
And loading an INP undetectably is simple. Just write a .bat file and double-click it. Nobody could possibly know what your shortcut actually does, and when MAME starts in playback mode it doesn't give any indication that it's playing an INP.