Using a web-proxy service to get the HTML content of the target URL?

I suggest using a direct proxy by IP:port, for example 115.238.225.26:80. You can then handle the problem with the following code:

using System;
using System.IO;
using System.Net;

// Route the request through the proxy at 115.238.225.26:80.
var req = (HttpWebRequest) WebRequest.Create(new Uri("http://example.com"));
var webproxy = new WebProxy("115.238.225.26", 80);
webproxy.BypassProxyOnLocal = false;
req.Method = "GET";
req.Proxy = webproxy;

var result = "";
// Dispose the response and its stream once the body has been read.
using (var response = (HttpWebResponse) req.GetResponse())
using (var respStream = response.GetResponseStream()) {
    if (respStream != null) {
        using (var strReader = new StreamReader(respStream)) {
            result = strReader.ReadToEnd();
        }
    }
}

The result variable then holds the content of the returned page, or an empty string if the response stream was null (respStream == null). You may also need to add exception handling around this code, since GetResponse() throws a WebException on connection problems and on HTTP error statuses.
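As a rough sketch of that exception handling, the request can be wrapped in a helper that reports failure instead of throwing. The names ProxyFetcher and TryFetch and the 10-second timeout are assumptions made for illustration, not part of the original code.

```csharp
using System;
using System.IO;
using System.Net;

static class ProxyFetcher
{
    // Hypothetical helper: returns true and the page body on success,
    // false if the proxy or target could not be reached or returned an error.
    public static bool TryFetch(string url, string proxyHost, int proxyPort, out string content)
    {
        content = "";
        try
        {
            var req = (HttpWebRequest)WebRequest.Create(new Uri(url));
            req.Proxy = new WebProxy(proxyHost, proxyPort);
            req.Method = "GET";
            req.Timeout = 10000; // milliseconds; an arbitrary value chosen for this sketch

            using (var response = (HttpWebResponse)req.GetResponse())
            using (var stream = response.GetResponseStream())
            {
                if (stream == null) return false;
                using (var reader = new StreamReader(stream))
                {
                    content = reader.ReadToEnd();
                }
                return true;
            }
        }
        catch (WebException ex)
        {
            // Raised for unreachable proxies, DNS failures, timeouts and HTTP error statuses.
            Console.Error.WriteLine("Request failed: " + ex.Status + " (" + ex.Message + ")");
            return false;
        }
        catch (IOException ex)
        {
            // Raised if the connection drops while the body is being read.
            Console.Error.WriteLine("Read failed: " + ex.Message);
            return false;
        }
    }
}
```

A call such as ProxyFetcher.TryFetch("http://example.com", "115.238.225.26", 80, out var html) then returns false rather than crashing when the proxy is down, so the caller can retry or switch to another proxy.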


The main issue you seem to be encountering is that the proxy you're using requires a POST to set the destination URL you want to browse through it. That's why you're not getting any content from the target page, and why you get this error message instead:

<div id="error">Hotlinking directly to proxied pages is not permitted.</div>

I don't know what your code looks like, but it seems you could use the HttpWebRequest POST method:

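using System;
using System.IO;
using System.Net;
using System.Text;

var request = (HttpWebRequest)WebRequest.Create("http://www.glype-proxy.info/includes/process.php?action=update");

// Percent-encode the target URL so characters such as ':' and '/' survive the form body.
var postData = "url=" + Uri.EscapeDataString("http://www.example.com");
postData += "&allowCookies=on";
var data = Encoding.ASCII.GetBytes(postData);

request.Method = "POST";
request.ContentType = "application/x-www-form-urlencoded";
request.ContentLength = data.Length;

// Write the form body to the request.
using (var stream = request.GetRequestStream()) {
    stream.Write(data, 0, data.Length);
}

// Read the proxied page and dispose the response afterwards.
string responseString;
using (var response = (HttpWebResponse)request.GetResponse())
using (var reader = new StreamReader(response.GetResponseStream())) {
    responseString = reader.ReadToEnd();
}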

You're going to need to find or host a proxy that returns the HTML of the page, such as http://www.glype-proxy.info/. Even so, in order to function correctly, a proxy must rewrite the links to the page's resources to its own "proxied" paths, for example:

http://www.glype-proxy.info/browse.php?u=https%3A%2F%2Fwww.example.com%2F&b=4&f=norefer

In the URL above, the original address is passed in the u= parameter. If you want the paths to the original resources, you'll have to find every link the proxy has rewritten and URL-decode the value of its u= parameter (this is specific to this proxy). You may also wish to ignore any additional elements injected by the proxy, in this case the <div id="include"> element.
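Decoding the u= parameter could look roughly like the sketch below. It assumes the proxy passes the target as a plain percent-encoded value, as in the URL above; a proxy configured to obfuscate its URLs would need different handling. ProxiedUrl and ExtractOriginal are hypothetical names used only for illustration.

```csharp
using System;

static class ProxiedUrl
{
    // Hypothetical helper: returns the original URL carried in the "u" query
    // parameter of a proxied link, or null if the parameter is missing.
    public static string ExtractOriginal(string proxiedUrl)
    {
        // Uri.Query keeps the leading '?' and the percent-encoding intact.
        var query = new Uri(proxiedUrl).Query.TrimStart('?');
        foreach (var pair in query.Split('&'))
        {
            // Split on the first '=' only, in case the value itself contains one.
            var parts = pair.Split(new[] { '=' }, 2);
            if (parts.Length == 2 && parts[0] == "u")
            {
                return Uri.UnescapeDataString(parts[1]);
            }
        }
        return null;
    }
}
```

Applied to the proxied link shown above, ExtractOriginal gives back https://www.example.com/. The same call can be run over every href and src attribute in the returned HTML to restore the original resource paths.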


I believe the proxy you're using works the same way as the "Glype" proxy I used in this example, but I did not have access to it at the time of posting. Also, if you want to use other proxies, note that many of them display the result in an iframe (probably for XSS prevention, navigation, or skinning).

Note: Generally, relying on a third-party service outside of an official API is bad practice, since such services often get a GUI update or some other change that could break your script. They can also experience interruptions or simply be taken down.