How to create a VHD disk image from a Linux live system?

For future reference, here is how I finally proceeded, with a few comments on the various issues or pitfalls encountered:

1. Boot the machine with a Linux live system

First step was to boot the machine containing the disk to image, using a Linux live system.

NOTE: My first idea was to use an Ubuntu Live USB disk, but the machine did not support booting from USB, so I found it easier to use an old Knoppix live CD.

2. Image the disk using dd and pipe the data through ssh

Then, I copied all the disk content to a file image on my local server using dd and piping the data through ssh:

$ dd if=/dev/hdX bs=4k conv=noerror,sync | ssh -c blowfish myuser@myserver 'dd of=myfile.dd'

A few comments here: this method reads the entire disk, so it can take a very long time (about 5 hours for an 80 GB disk in my case). The bottleneck was not the network but the disk read speed. Before launching the copy, I advise checking the BIOS/disk/system parameters to ensure that the disk and the motherboard are working at their highest possible speed (this can be checked using the command hdparm -i and by running a test with hdparm -Tt /dev/hdX). Also note that recent OpenSSH releases have dropped the blowfish cipher, so on a current system the -c blowfish option has to be removed or replaced with a supported cipher.
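To put that timing in perspective, here is a rough back-of-the-envelope calculation of the average throughput. It assumes exactly 5 hours and uses the disk size that parted reports for this image (80026361856 bytes); the variable names are only illustrative.

```shell
# Average throughput of the copy, in bytes per second (integer division).
disk_bytes=80026361856        # disk size as reported by parted for this image
elapsed_s=$((5 * 3600))       # assumption: the copy took exactly 5 hours
echo $((disk_bytes / elapsed_s))
```

That works out to roughly 4.4 MB/s, a figure that can be compared with the read speed hdparm -Tt reports for the same disk.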

NOTE: dd does not output progress of the operation, but we can force it to do so by sending the USR1 signal to the dd process PID from another terminal:

$ kill -USR1 PIDofdd
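To see what this looks like without touching a real disk, here is a small sketch that starts a throwaway dd (zeros copied to /dev/null), sends it USR1 and shows what it printed. It assumes GNU dd; the temporary log file and the variable names are only for the demonstration.

```shell
# Throwaway dd: reads zeros and discards them; stderr goes to a log file.
log=$(mktemp)
dd if=/dev/zero of=/dev/null bs=4k 2>"$log" &
dd_pid=$!

sleep 1                 # give dd time to install its signal handler
kill -USR1 "$dd_pid"    # ask dd to print its current statistics
sleep 1                 # give dd time to write them

kill "$dd_pid"          # stop the throwaway copy
wait "$dd_pid" 2>/dev/null || true
cat "$log"              # the statistics dd wrote when it received USR1
```

For the real copy, the PID of the running dd can be looked up from another terminal, for example with pgrep -x dd.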

Note: Newer versions of dd support the status=LEVEL option (man dd)

   status=LEVEL
          The  LEVEL of information to print to stderr; 'none' suppresses everything but error messages, 'noxfer' suppresses
          the final transfer statistics, 'progress' shows periodic transfer statistics
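A quick way to try the option is to run it on a scratch file first. The sketch below uses temporary files in place of the real disk and ssh target, and assumes a dd from GNU coreutils 8.24 or later, which is where status=progress appeared.

```shell
# Scratch source and destination instead of a real disk and ssh target.
src=$(mktemp)
dst=$(mktemp)
dd if=/dev/zero of="$src" bs=4k count=2560 2>/dev/null   # 10 MiB of zeros

# Same options as the imaging command, plus status=progress.
dd if="$src" of="$dst" bs=4k conv=noerror,sync status=progress

cmp "$src" "$dst" && echo "copy is identical"
```

In the imaging pipeline itself, status=progress would simply be added to the first dd, before the pipe to ssh.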

3. Reclaim the unused space

At this point, the source machine is no longer needed and we will work exclusively on the destination server (running Linux as well). VirtualBox will be used to convert the raw disk image to the VHD format, but before doing so, we can zero out the unused blocks, so that VirtualBox does not allocate space for them in the final file.

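In order to do so, I mounted the partition inside the image as a loopback device, filled its free space with a file of zeros (cat stops with a "No space left on device" error once the partition is full, which is expected), and then deleted that file: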

$ mount -o loop,rw,offset=26608813056 -t ntfs-3g /mnt/mydisk/myfile.dd /mnt/tmp_mnt
$ cat /dev/zero > /mnt/tmp_mnt/zero.file
$ rm /mnt/tmp_mnt/zero.file
$ umount /mnt/tmp_mnt

NOTE: The offset indicating the beginning of the partition within the disk image can be obtained by using parted on the image file:

$ parted /mnt/mydisk/myfile.dd
(parted) unit
Unit?  [compact]? B
(parted) print
Model:  (file)
Disk /mnt/mydisk/myfile.dd: 80026361856B
Sector size (logical/physical): 512B/512B
Partition Table: msdos

Number  Start         End           Size          Type      File system  Flags
 1      32256B        21936821759B  21936789504B  primary   ntfs         boot
 2      21936821760B  80023749119B  58086927360B  extended               lba
 5      26608813056B  80023749119B  53414936064B  logical   ntfs
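The offset can also be extracted without the interactive session. The sketch below defines a small helper, partition_start (a hypothetical name, not part of parted), that picks the Start column for a given partition number out of "unit B print" output; here it is fed the table shown above so that it runs without the image file.

```shell
# Print the start offset (in bytes, without the trailing "B") of partition
# number $1, reading "parted ... unit B print" output from stdin.
partition_start() {
    awk -v n="$1" '$1 == n { sub(/B$/, "", $2); print $2 }'
}

# Sample input: the partition table lines shown above.
partition_start 5 <<'EOF'
Number  Start         End           Size          Type      File system  Flags
 1      32256B        21936821759B  21936789504B  primary   ntfs         boot
 2      21936821760B  80023749119B  58086927360B  extended               lba
 5      26608813056B  80023749119B  53414936064B  logical   ntfs
EOF
```

Against the real image, the same helper could be fed from parted in script mode, e.g. parted -s /mnt/mydisk/myfile.dd unit B print | partition_start 5, and the result passed to the offset= mount option.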

NOTE 2: The default Linux kernel NTFS driver provides read-only access, so the userspace ntfs-3g driver must be installed and used; otherwise, writing to the mounted image will fail with an error!

4. Create the VHD image using VBoxManage

At this point, we can use the VirtualBox utilities to convert the raw image to a VHD file:

$ VBoxManage convertfromraw myfile.dd myfile.vhd --format VHD
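A quick sanity check on the result is possible from the shell. In the VHD format, the file ends with a 512-byte footer that starts with the ASCII cookie "conectix", so a sketch like the following can tell whether the output at least looks like a VHD. The helper name has_vhd_footer is made up for this example, and the check is only a plausibility test, not a validation of the whole file.

```shell
# Return success if the last 512 bytes of file $1 start with the VHD
# footer cookie "conectix". NUL bytes are dropped before the comparison.
has_vhd_footer() {
    [ "$(tail -c 512 "$1" | head -c 8 | tr -d '\000')" = "conectix" ]
}
```

It could be used as has_vhd_footer myfile.vhd && echo "footer found" once the conversion has finished.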

I was trying to do exactly the same thing as the OP (while rescuing a Windows installation) and ended up creating a new tool for it, ntfsclone2vhd.

You would then simply do something like this:

ntfsclone --save-image -o - /dev/sdXX | ntfsclone2vhd - /mnt/usb/output.vhd

One approach is to use a couple of handy technologies: VirtualBox, and the ntfsprogs package.

Recent versions of VirtualBox allow you to create VHD hard disk files, while ntfsprogs provides the ntfsclone utility. As its name suggests, ntfsclone clones NTFS filesystems, and I believe that it does it at the filesystem level, skipping over unused disk blocks.

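So, to begin, create a new VM in VirtualBox, and provision a new, empty VHD-file drive for it. Since ntfsclone appears to restore clusters to their original positions, the drive's capacity should be at least the size of the source NTFS partition (well actually, make it a little bit larger, to allow for some wiggle room). If the VHD is dynamically allocated, the file itself should only grow to roughly the size of the data in use on the physical drive.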

Next, find a Linux live CD that contains the ntfsprogs package, as well as openssh-server. I like System Rescue CD for this, but pretty much any Debian- or Ubuntu-based live CD should work as well.

Boot the VirtualBox VM with the Linux live CD, and start sshd within the VM so that you will be able to execute commands on it remotely. Partition the empty VHD drive appropriately, using whatever partitioning tool you prefer (I like plain old fdisk, but I'm somewhat old school).

With another copy of the Linux live CD, boot the machine containing the physical disk you want to clone. I assume that the VirtualBox VM and this machine are accessible to each other over the network. On this machine, execute the following command (all on one line):

ntfsclone --save-image -o - /dev/sdXX |
    ssh root@VirtualBox-VM 'ntfsclone --restore-image --overwrite /dev/sdYY -'

where:

  • /dev/sdXX is the device name (on the local machine) of the NTFS partition on the physical drive you want to clone, and
  • /dev/sdYY is the device name (in the VM) of the destination partition on the VHD drive.

Explanation: The first ntfsclone command in the pipeline extracts an image of the source NTFS filesystem and sends it out through the ssh tunnel, while the second ntfsclone command receives the image and restores it to the VHD drive.
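Before starting a long transfer, it may be worth confirming that /dev/sdXX really is the NTFS partition. An NTFS boot sector carries the OEM ID "NTFS" followed by four spaces at byte offset 3, so a minimal check could look like the sketch below; is_ntfs is a hypothetical helper name, and reading a real device will normally require root.

```shell
# Return success if the boot sector of device or image file $1 carries the
# NTFS OEM ID ("NTFS" followed by four spaces, starting at byte offset 3).
is_ntfs() {
    [ "$(dd if="$1" bs=1 skip=3 count=8 2>/dev/null | tr -d '\000')" = "NTFS    " ]
}
```

For example, is_ntfs /dev/sdXX && echo "NTFS boot sector found" on the source machine.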

Once the operation completes, the VHD file should contain a file-for-file exact clone of the original physical disk (barring any hardware errors, like bad sectors, that might cause the process to abort prematurely).

One last thing you may want to do is to run a Windows chkdsk on the VHD drive, just to ensure the cloning didn't introduce any problems (it shouldn't have, but hey, I'm a bit paranoid about these things).