Incredibly low KVM disk performance (qcow2 disk files + virtio)
Well, yeah, qcow2 files aren't designed for blazingly fast performance. You'll have much better luck with raw partitions (or, preferably, LVs).
How to achieve top performance with QCOW2:
qemu-img create -f qcow2 -o preallocation=metadata,compat=1.1,lazy_refcounts=on imageXYZ
The most important option is preallocation, which gives a nice boost according to the qcow2 developers. It is almost on par with LVM now! Note that this is usually enabled in modern (Fedora 25+) Linux distros.
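As a rough sketch of a complete invocation: qemu-img create also expects a virtual size after the file name, which the command above leaves out. The file name imageXYZ.qcow2 and the 1G size are placeholders chosen for illustration, not values from the original post.

```shell
#!/bin/sh
# Placeholder name and size (assumptions, adjust to your setup).
IMG="${IMG:-imageXYZ.qcow2}"
SIZE="${SIZE:-1G}"

if command -v qemu-img >/dev/null 2>&1; then
    # Same options as above, plus the virtual size argument.
    qemu-img create -f qcow2 \
        -o preallocation=metadata,compat=1.1,lazy_refcounts=on \
        "$IMG" "$SIZE"
    # Show what was actually created (format, virtual size, on-disk size).
    qemu-img info "$IMG"
else
    echo "qemu-img not found; install the QEMU tools first" >&2
fi
```

With preallocation=metadata only the qcow2 metadata is written up front, so the file should stay small on disk even though its virtual size is the full 1G.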
You can also use the unsafe cache mode if this is not a production instance (this is dangerous and not recommended; it is only good for testing):
<driver name='qemu' cache='unsafe' />
Some users report that this configuration beats the LVM/unsafe configuration in some tests.
All of these parameters require QEMU 1.5 or later! Again, most modern distros have it.
I achieved great results for a qcow2 image with this setting:
<driver name='qemu' type='raw' cache='none' io='native'/>
which bypasses the host page cache and enables native AIO (asynchronous I/O). Running your dd command gave me 177MB/s on the host and 155MB/s in the guest. The image is placed on the same LVM volume where the host's test was done.
The qemu-kvm version is 1.0+noroms-0ubuntu14.8 and the kernel is 3.2.0-41-generic, from stock Ubuntu 12.04.2 LTS.
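The dd command from the question is not reproduced in this answer, so here is only a minimal sketch of a sequential write test that can be run the same way on the host and in the guest. The test file path and the 64 MiB size are assumptions; a real comparison would want a much larger file.

```shell
#!/bin/sh
# Hypothetical test file and size (assumptions, not the original command).
TESTFILE="${TESTFILE:-./dd-test.bin}"
COUNT_MB="${COUNT_MB:-64}"

# conv=fdatasync makes GNU dd flush the data to disk before it reports
# the rate, so cached writes do not inflate the figure.
dd if=/dev/zero of="$TESTFILE" bs=1M count="$COUNT_MB" conv=fdatasync

# Record how much was written, then clean up.
BYTES=$(wc -c < "$TESTFILE")
echo "wrote $BYTES bytes"
rm -f "$TESTFILE"
```

dd prints the elapsed time and throughput on stderr when it finishes; that is the number to compare between host and guest.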
On old QEMU/KVM versions, the qcow2 backend was very slow when not preallocated, more so if used without writeback cache enabled. See here for more information.
On more recent QEMU versions, qcow2 files are much faster, even with no preallocation (or metadata-only preallocation). Still, LVM volumes remain faster.
A note on the cache modes: writeback cache is the preferred mode, unless using a guest with no or disabled support for disk cache flush/barriers. In practice, Win2000+ guests and any Linux guest using EXT4, XFS or EXT3 with the barrier mount option are fine. On the other hand, cache=unsafe should never be used on production machines, as cache flushes are not propagated to the host system. An unexpected host shutdown can literally destroy the guest's filesystem.
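To check the guest side of that advice, a small sketch like the following can be run inside a Linux guest to list filesystems mounted with barriers explicitly turned off (nobarrier or barrier=0). The function name list_nobarrier_mounts is made up for this example, and the check only sees options that appear in the mounts table, so a filesystem whose default is "no barriers" will not be flagged.

```shell
#!/bin/sh
# Print "mountpoint (fstype)" for every entry whose mount options
# explicitly disable write barriers.
# $1: mounts table to read (defaults to /proc/mounts).
list_nobarrier_mounts() {
    # Wrap the option list in commas so each option can be matched whole.
    awk '("," $4 ",") ~ /,(nobarrier|barrier=0),/ { print $2 " (" $3 ")" }' \
        "${1:-/proc/mounts}"
}

if [ -r /proc/mounts ]; then
    found=$(list_nobarrier_mounts)
    if [ -n "$found" ]; then
        echo "Barriers explicitly disabled on:"
        echo "$found"
    else
        echo "No mounted filesystem explicitly disables barriers."
    fi
fi
```

If anything shows up here, writeback caching on the host is no longer a safe choice for that guest until the mount options are fixed.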
I experienced exactly the same issue. Inside a RHEL7 virtual machine I run the LIO iSCSI target software, to which other machines connect. As the underlying storage (backstore) for my iSCSI LUNs I initially used LVM, but then switched to file-based images.
Long story short: when the backing storage was attached to the virtio_blk (vda, vdb, etc.) storage controller, performance from an iSCSI client connecting to the iSCSI target was, in my environment, ~20 IOPS, with throughput (depending on I/O size) of ~2-3 MiB/s. After I changed the virtual disk controller within the virtual machine to SCSI, I get 1000+ IOPS and 100+ MiB/s throughput from my iSCSI clients.
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2' cache='none' io='native'/>
  <source file='/var/lib/libvirt/images/station1/station1-iscsi1-lun.img'/>
  <target dev='sda' bus='scsi'/>
  <address type='drive' controller='0' bus='0' target='0' unit='0'/>
</disk>