Logging Into A Website Using C# Programmatically

Logging into websites programatically is difficult and tightly coupled with how the site implements its login procedure. The reason your code isn't working is because you aren't dealing with any of this in your requests/responses.

Let's take fif.com for example. When you type in a username and password, the following post request gets sent:

POST https://fif.com/login?task=user.login HTTP/1.1
Host: fif.com
Connection: keep-alive
Content-Length: 114
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
Origin: https://fif.com
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Content-Type: application/x-www-form-urlencoded
Referer: https://fif.com/login?return=...==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1

username=...&password=...&return=aHR0cHM6Ly9maWYuY29tLw%3D%3D&9a9bd5b68a7a9e5c3b06ccd9b946ebf9=1

Notice the cookies (especially the first, your session token). Notice the cryptic url-encoded return value being sent. If the server notices these are missing, it won't let you login.

HTTP/1.1 400 Bad Request

Or worse, a 200 response of a login page with an error message buried somewhere inside.

But let's just pretend you were able to collect all of those magic values and pass them in an HttpWebRequest object. The site wouldn't know the difference. And it might respond with something like this.

HTTP/1.1 303 See other
Server: nginx
Date: Wed, 10 Sep 2014 02:29:09 GMT
Content-Type: text/html; charset=utf-8
Transfer-Encoding: chunked
Connection: keep-alive
Location: https://fif.com/

Hope you were expecting that. But if you've made it this far, you can now programatically fire off requests to the server with your now validated session token and get the expected HTML back.

GET https://fif.com/ HTTP/1.1
Host: fif.com
Connection: keep-alive
Cache-Control: max-age=0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.103 Safari/537.36
Referer: https://fif.com/login?return=aHR0cHM6Ly9maWYuY29tLw==
Accept-Encoding: gzip,deflate
Accept-Language: en-US,en;q=0.8
Cookie: 34f8f7f621b2b411508c0fd39b2adbb2=gnsbq7hcm3c02aa4sb11h5c87f171mh3; __utma=175527093.69718440.1410315941.1410315941.1410315941.1; __utmb=175527093.12.10.1410315941; __utmc=175527093; __utmz=175527093.1410315941.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); __utmv=175527093.|1=RegisteredUsers=Yes=1

And this is all for fif.com - this juggling of cookies and tokens and redirects will be completely different for another site. In my experience (with that site in particular), you have three options to get through the login wall.

  1. Write an incredibly complicated and fragile script to dance around the site's procedures
  2. Manually log into the site with your browser, grab the magic values, and plug them into your request objects or
  3. Create a script to automate selenium to do this for you.

Selenium can handle all the juggling, and at the end you can pull the cookies out and fire off your requests normally. Here's an example for fif:

//Run selenium
ChromeDriver cd = new ChromeDriver(@"chromedriver_win32");
cd.Url = @"https://fif.com/login";
cd.Navigate();
IWebElement e = cd.FindElementById("username");
e.SendKeys("...");
e = cd.FindElementById("password");
e.SendKeys("...");
e = cd.FindElementByXPath(@"//*[@id=""main""]/div/div/div[2]/table/tbody/tr/td[1]/div/form/fieldset/table/tbody/tr[6]/td/button");
e.Click();

CookieContainer cc = new CookieContainer();

//Get the cookies
foreach(OpenQA.Selenium.Cookie c in cd.Manage().Cookies.AllCookies)
{
    string name = c.Name;
    string value = c.Value;
    cc.Add(new System.Net.Cookie(name,value,c.Path,c.Domain));
}

//Fire off the request
HttpWebRequest hwr = (HttpWebRequest) HttpWebRequest.Create("https://fif.com/components/com_fif/tools/capacity/values/");
hwr.CookieContainer = cc;
hwr.Method = "POST";
hwr.ContentType = "application/x-www-form-urlencoded";
StreamWriter swr = new StreamWriter(hwr.GetRequestStream());
swr.Write("feeds=35");
swr.Close();

WebResponse wr = hwr.GetResponse();
string s = new System.IO.StreamReader(wr.GetResponseStream()).ReadToEnd();

Checkout this post. It's another way of doing it and you don't need to install any package although it might be easier with Selenium.

"You can continue using WebClient to POST (instead of GET, which is the HTTP verb you're currently using with DownloadString), but I think you'll find it easier to work with the (slightly) lower-level classes WebRequest and WebResponse.

There are two parts to this - the first is to post the login form, the second is recovering the "Set-cookie" header and sending that back to the server as "Cookie" along with your GET request. The server will use this cookie to identify you from now on (assuming it's using cookie-based authentication which I'm fairly confident it is as that page returns a Set-cookie header which includes "PHPSESSID").


POSTing to the login form

Form posts are easy to simulate, it's just a case of formatting your post data as follows:

field1=value1&field2=value2

Using WebRequest and code I adapted from Scott Hanselman, here's how you'd POST form data to your login form:

string formUrl = "http://www.mmoinn.com/index.do?PageModule=UsersAction&Action=UsersLogin";

NOTE: This is the URL the form POSTs to, not the URL of the form (you can find this in the "action" attribute of the HTML's form tag

string formParams = string.Format("email_address={0}&password={1}", "your email", "your password");
string cookieHeader;
WebRequest req = WebRequest.Create(formUrl);
req.ContentType = "application/x-www-form-urlencoded";
req.Method = "POST";
byte[] bytes = Encoding.ASCII.GetBytes(formParams);
req.ContentLength = bytes.Length;
using (Stream os = req.GetRequestStream())
{
    os.Write(bytes, 0, bytes.Length);
}
WebResponse resp = req.GetResponse();
cookieHeader = resp.Headers["Set-cookie"];

Here's an example of what you should see in the Set-cookie header for your login form:

PHPSESSID=c4812cffcf2c45e0357a5a93c137642e; path=/; domain=.mmoinn.com,wowmine_referer=directenter; path=/;

domain=.mmoinn.com,lang=en; path=/;domain=.mmoinn.com,adt_usertype=other,adt_host=-


GETting the page behind the login form

Now you can perform your GET request to a page that you need to be logged in for.

string pageSource;
string getUrl = "the url of the page behind the login";
WebRequest getRequest = WebRequest.Create(getUrl);
getRequest.Headers.Add("Cookie", cookieHeader);
WebResponse getResponse = getRequest.GetResponse();
using (StreamReader sr = new StreamReader(getResponse.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}

EDIT:

If you need to view the results of the first POST, you can recover the HTML it returned with:

using (StreamReader sr = new StreamReader(resp.GetResponseStream()))
{
    pageSource = sr.ReadToEnd();
}

Place this directly below cookieHeader = resp.Headers["Set-cookie"]; and then inspect the string held in pageSource."