RuntimeError: module must have its parameters and buffers on device cuda:1 (device_ids[0]) but found one of them on device: cuda:2

DataParallel expects the module's parameters and buffers, as well as every input tensor, to be on the first device in its device_ids list.

It uses that device as a staging area before scattering to the other GPUs, and it is also the device where the final outputs are gathered before forward returns. If you want device 2 to be the primary device, put it at the front of the list:

model = nn.DataParallel(model, device_ids=[2, 0, 1, 3])
model.to(f'cuda:{model.device_ids[0]}')

After that, all tensors passed to the model should be on the first device as well:

x = ... # input tensor
x = x.to(f'cuda:{model.device_ids[0]}')
y = model(x)
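
If the primary device is chosen at runtime, the reordering can be done programmatically instead of hard-coding the list. A minimal sketch, assuming a hypothetical helper named put_primary_first (it is not part of PyTorch) that only manipulates a plain list of GPU indices:

```python
def put_primary_first(device_ids, primary):
    """Return a copy of device_ids with `primary` moved to index 0.

    Hypothetical helper for illustration; it does not touch PyTorch itself.
    """
    if primary not in device_ids:
        raise ValueError(f"device {primary} is not in {device_ids}")
    # Keep the relative order of the remaining devices unchanged.
    return [primary] + [d for d in device_ids if d != primary]


device_ids = put_primary_first([0, 1, 2, 3], primary=2)
print(device_ids)  # [2, 0, 1, 3]
```

The resulting list can then be passed as device_ids to nn.DataParallel, with the model and inputs moved to f'cuda:{device_ids[0]}' as shown above.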