Ha, ok. AFAICT, the standard way of virtual texturing is to have just one single, huge texture. So you would scale down the duck and cat UVs and translate them so they don't overlap but sit beside each other in one common texture space. We also put both texture images into one big image so they match the new UVs.
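To make that concrete: the remap per model is just a scale and an offset applied to the original UVs. A minimal sketch (AtlasSlot and the slot values here are made up for illustration):

```cpp
#include <cstdio>

struct AtlasSlot { float scaleU, scaleV, offsetU, offsetV; }; // hypothetical per-model slot

// Remap an original [0,1] UV into the model's rectangle inside the shared atlas.
void remapUV(const AtlasSlot& slot, float u, float v, float& outU, float& outV)
{
    outU = u * slot.scaleU + slot.offsetU;
    outV = v * slot.scaleV + slot.offsetV;
}

int main()
{
    AtlasSlot duck = { 0.25f, 0.25f, 0.0f,  0.0f }; // duck gets one quarter-size corner
    AtlasSlot cat  = { 0.25f, 0.25f, 0.25f, 0.0f }; // cat sits right beside it
    float u, v;
    remapUV(duck, 0.5f, 0.5f, u, v);
    std::printf("duck UV (0.5, 0.5) -> atlas UV (%.3f, %.3f)\n", u, v);
    remapUV(cat, 0.5f, 0.5f, u, v);
    std::printf("cat  UV (0.5, 0.5) -> atlas UV (%.3f, %.3f)\n", u, v);
}
```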
At runtime, only the tiles containing the duck and cat textures would be loaded if that's all we render.
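Working out which tiles that is amounts to dividing the used UV rectangle by the tile size. A rough sketch, assuming a fixed 64x64 tile grid (the constant and names are mine):

```cpp
#include <algorithm>
#include <cstdio>

// Assume the virtual texture is split into a grid of TILE_COUNT x TILE_COUNT tiles.
constexpr int TILE_COUNT = 64; // assumption for this example

// Map a UV rectangle (the part of the atlas a visible model uses) to the range
// of tile indices that must be resident. The max index is inclusive, so it is
// slightly conservative when the rectangle ends exactly on a tile boundary.
void tilesForUVRect(float u0, float v0, float u1, float v1,
                    int& tx0, int& ty0, int& tx1, int& ty1)
{
    tx0 = std::clamp(static_cast<int>(u0 * TILE_COUNT), 0, TILE_COUNT - 1);
    ty0 = std::clamp(static_cast<int>(v0 * TILE_COUNT), 0, TILE_COUNT - 1);
    tx1 = std::clamp(static_cast<int>(u1 * TILE_COUNT), 0, TILE_COUNT - 1);
    ty1 = std::clamp(static_cast<int>(v1 * TILE_COUNT), 0, TILE_COUNT - 1);
}

int main()
{
    int tx0, ty0, tx1, ty1;
    tilesForUVRect(0.0f, 0.0f, 0.25f, 0.25f, tx0, ty0, tx1, ty1); // the duck's quarter
    std::printf("load tiles x[%d..%d], y[%d..%d]\n", tx0, tx1, ty0, ty1);
}
```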
You also want a preprocessing tool that generates the common UV atlas and image tiles automatically, so you don't have to rescale UVs by hand for every model.
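Such a tool mostly needs a rectangle packer that assigns each model a slot and records the resulting scale/offset. Here's a naive shelf packer as a sketch; real tools would also add gutter/border pixels so filtering doesn't bleed between neighbors:

```cpp
#include <cstdio>
#include <vector>

struct Rect { float x, y, w, h; };

// Naive "shelf" packer: place rectangles left to right in normalized atlas
// space, starting a new row when the current one is full. Padding between
// slots is omitted here for brevity.
std::vector<Rect> packShelves(const std::vector<Rect>& sizes)
{
    std::vector<Rect> placed;
    float x = 0, y = 0, rowH = 0;
    for (Rect r : sizes)
    {
        if (x + r.w > 1.0f) { x = 0; y += rowH; rowH = 0; } // start a new shelf
        r.x = x; r.y = y;
        x += r.w;
        if (r.h > rowH) rowH = r.h;
        placed.push_back(r);
    }
    return placed;
}

int main()
{
    std::vector<Rect> sizes = { {0, 0, 0.25f, 0.25f}, {0, 0, 0.25f, 0.25f} }; // duck, cat
    for (const Rect& r : packShelves(sizes))
        std::printf("slot at (%.2f, %.2f), size (%.2f x %.2f)\n", r.x, r.y, r.w, r.h);
}
```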
The only limitation we get from this is that all our textures must share the same number of channels. If we want an alpha channel for some foliage model, every other model gets an alpha channel too, even though they don't need it. That's maybe the point where we'd want two textures, one RGB and one RGBA, and the problem you mention becomes unavoidable.
You could then store a material ID in your GBuffer, and the material knows whether it needs RGB or RGBA and picks the corresponding texture.
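That lookup could be as simple as a material table indexed by the GBuffer ID. A CPU-side sketch (the enum, table, and values are all invented; the actual sampling would happen in a shader):

```cpp
#include <cstdint>
#include <cstdio>

enum class AtlasKind : std::uint8_t { RGB, RGBA }; // which shared texture to sample

struct Material
{
    AtlasKind atlas;                        // the material knows whether it needs alpha
    float scaleU, scaleV, offsetU, offsetV; // its slot in that atlas
};

// Hypothetical material table indexed by the ID stored in the GBuffer.
Material gMaterials[256];

// Resolve pass: read the material ID from the GBuffer and pick the texture.
AtlasKind pickAtlas(std::uint8_t materialIdFromGBuffer)
{
    return gMaterials[materialIdFromGBuffer].atlas;
}

int main()
{
    gMaterials[7] = { AtlasKind::RGBA, 0.25f, 0.25f, 0.0f, 0.0f }; // e.g. the foliage material
    std::printf("material 7 samples the %s atlas\n",
                pickAtlas(7) == AtlasKind::RGBA ? "RGBA" : "RGB");
}
```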
The topic is related to ‘global parametrization’, which might make it easier to think about. Say we make a game like Rage, where each surface in the game has its own unique texture. Our scene has 10 ducks, but each duck can have different colors. To achieve this, we generate unique UVs for each instance of the duck. So even though the geometry is the same for all of them, each has its own texture. Now we could even bake static lighting into our texture, and each duck gets its own correct, unique lighting. We can also paint our runtime decals straight into this texture. And besides the material ID, we also need an instance ID in the GBuffer.
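The only structural change from before is that atlas slots are handed out per instance instead of per mesh, and the returned index can double as the instance ID. A sketch, with all names being mine:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

struct AtlasSlot { float scaleU, scaleV, offsetU, offsetV; };

// Hand out a fresh atlas slot per *instance*, not per mesh, so all 10 ducks
// share geometry but each one owns unique texels (for baked lighting, decals...).
class InstanceAtlas
{
public:
    std::uint32_t allocate(float slotSize) // returns an ID usable in the GBuffer
    {
        if (x_ + slotSize > 1.0f) { x_ = 0; y_ += slotSize; } // naive row packing
        slots_.push_back({ slotSize, slotSize, x_, y_ });
        x_ += slotSize;
        return static_cast<std::uint32_t>(slots_.size() - 1);
    }
    const AtlasSlot& slot(std::uint32_t instanceId) const { return slots_[instanceId]; }
private:
    std::vector<AtlasSlot> slots_;
    float x_ = 0, y_ = 0;
};

int main()
{
    InstanceAtlas atlas;
    for (int i = 0; i < 10; ++i) // 10 ducks, 10 unique texture regions
    {
        std::uint32_t id = atlas.allocate(0.1f);
        const AtlasSlot& s = atlas.slot(id);
        std::printf("duck instance %u -> offset (%.2f, %.2f)\n", id, s.offsetU, s.offsetV);
    }
}
```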
That's very different from the traditional way of reusing the same texture multiple times, but it's the same idea we already know from static lightmaps.