UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 34: unexpected end of data

site[i:i+35].decode('utf-8')

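You cannot cut the bytes you've received at arbitrary positions and then ask Python to decode each piece as UTF-8. UTF-8 is a variable-width encoding: a single character takes anywhere from 1 to 4 bytes. If a slice boundary falls in the middle of a multibyte sequence, the decoder sees a lead byte without its continuation bytes and raises the "unexpected end of data" error. That is exactly what happened here. 0xC3 is the first byte of a two-byte sequence (it introduces characters such as é and ü), and it sits at position 34, the last byte of your 35-byte slice, so its second byte ended up in the next slice.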
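
Here is a minimal Python 3 sketch of the failure and two ways around it. The sample string and the helper name `decode_in_chunks` are illustrative only, not taken from the question. The first fix decodes the whole buffer before slicing, so slices count characters instead of bytes. The second uses the standard library's incremental decoder (`codecs.getincrementaldecoder`), which holds back an incomplete multibyte sequence until the rest of it arrives.

```python
import codecs

# "é" is encoded in UTF-8 as two bytes: 0xC3 (lead byte) and 0xA9 (continuation).
data = "café au lait".encode("utf-8")

# Slicing the raw bytes can cut that two-byte sequence in half.
chunk = data[:4]  # b"caf\xc3": the continuation byte 0xA9 is missing
try:
    chunk.decode("utf-8")
except UnicodeDecodeError as exc:
    print(exc)  # 'utf-8' codec can't decode byte 0xc3 in position 3: unexpected end of data

# Fix 1: decode the whole buffer once, then slice the str (str slices count characters).
text = data.decode("utf-8")
print(text[:4])  # café


# Fix 2 (hypothetical helper): when the bytes arrive in pieces, an incremental
# decoder buffers an incomplete sequence until the bytes completing it show up.
def decode_in_chunks(data: bytes, size: int) -> str:
    decoder = codecs.getincrementaldecoder("utf-8")()
    parts = [decoder.decode(data[i:i + size]) for i in range(0, len(data), size)]
    # final=True raises UnicodeDecodeError if an incomplete sequence is left over
    parts.append(decoder.decode(b"", final=True))
    return "".join(parts)


print(decode_in_chunks(data, 4))  # café au lait
```

With the incremental decoder, the chunk size (35 bytes in the question) no longer matters, because a sequence split across two chunks is reassembled before it is decoded.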

Look into a tool that does this for you. BeautifulSoup and lxml are two alternatives: both can take the raw bytes of the whole page and take care of decoding them.


Open the CSV file in Sublime Text and use File -> "Save with Encoding" -> UTF-8.
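
If you would rather do the same conversion from a script, here is a minimal Python 3 sketch. It assumes the file is currently encoded as Windows-1252 (`cp1252`); that is only a guess, and if it does not match the file's real encoding the text will be misread or a `UnicodeDecodeError` raised. The function name `convert_to_utf8` and the file names are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, src_encoding: str = "cp1252") -> None:
    """Re-save a text file as UTF-8, like Sublime Text's "Save with Encoding" -> UTF-8.

    ASSUMPTION: src_encoding must match how the file is really encoded;
    a wrong guess garbles the text or raises UnicodeDecodeError.
    """
    # newline="" passes line endings through untouched, which matters for CSV
    with open(src, encoding=src_encoding, newline="") as fin, \
            open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout)


# Demo with a throwaway file (hypothetical name "data.csv") saved as cp1252.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "data.csv"
    dst = Path(tmp) / "data-utf8.csv"
    src.write_bytes("name,city\r\nJosé,Zürich\r\n".encode("cp1252"))
    convert_to_utf8(src, dst)
    converted = dst.read_bytes()

print(converted.decode("utf-8"))
```

Writing to a separate output file keeps the original intact in case the assumed source encoding turns out to be wrong.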