Wednesday, 7 September 2011

Internet Explorer cookie contents - the new format analysed






Microsoft has changed the way that Internet Explorer cookie files work and, for security reasons, randomised cookie filenames. The cookie text files now have random names which don't indicate the name of the website that saved the cookie, so you have to open up each cookie file individually to check which site it belongs to.

However, you can still view the contents of all your IE cookies, unmangled, by exporting your cookies to a single cookies.txt file. If you then open up that cookies.txt file, you can see the cookie info in a much more intelligible, user-friendly format, eg:

Webtrends seems to be used by Microsoft for recording web visitor analytics & statistics info.

I compared the contents of a couple of the new cookies against the cookies.txt versions to try to figure out how they work. I found that if you copy and paste the text from the cookie file into something else (eg a new text document), the info is broken up into separate lines (ie there are hidden newlines separating the different components of the info).

For example, the contents of a Twitter cookie file named J0R4GWEF.txt, which (like the contents of the other cookie files) appeared to run on continuously in the txt file, were split up like this:

guest_id
v1%3A131542058071389408
twitter.com/
214748475215010693123032155316242192030174605*

The cookies.txt equivalent of that was:

twitter.com TRUE / FALSE 1378897943 guest_id v1%3A131542058071389408
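
To make the cookies.txt layout concrete, here's a minimal Python sketch that splits one exported line into its seven fields. The field meanings follow the usual Netscape-style cookies.txt convention (domain, subdomain flag, path, secure flag, expiry, name, value); the names NetscapeCookie and parse_cookies_txt_line are made up for illustration and aren't part of any IE tool.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class NetscapeCookie(NamedTuple):
    domain: str
    include_subdomains: bool
    path: str
    secure: bool
    expires: int  # seconds since 1 Jan 1970 (UTC)
    name: str
    value: str


def parse_cookies_txt_line(line: str) -> NetscapeCookie:
    # Exported files normally separate fields with tabs; splitting on any
    # whitespace also copes with the space-separated lines quoted in this post.
    domain, subdomains, path, secure, expires, name, value = line.split(None, 6)
    return NetscapeCookie(
        domain,
        subdomains == "TRUE",
        path,
        secure == "TRUE",
        int(expires),
        name,
        value.strip(),
    )


cookie = parse_cookies_txt_line(
    "twitter.com TRUE / FALSE 1378897943 guest_id v1%3A131542058071389408"
)
# Turn the Unix timestamp into a readable UTC date.
expiry = datetime.fromtimestamp(cookie.expires, tz=timezone.utc)
print(cookie.name, cookie.value, expiry.isoformat())
```

The expiry comes out as a date in September 2013, ie roughly two years after the cookie was set.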

So that helps to figure out the new format of the cookie file. The elements seem to be in this order:

  1. variable name (eg "guest_id")
  2. variable value (eg "v1%3A131542058071389408") - the equivalent of the old "guest_id=v1%3A131542058071389408"
  3. domain name (ie the website which set the cookie, eg "twitter.com/")
  4. something I haven't figured out yet (in the example above, it's "214748475215010693123032155316242192030174605") - but it must convert to the expiration date for the variable (ie 1378897943 in the example above), which traditionally is the number of seconds since 1 Jan 1970 and shows up as the "proper" figure in the cookies.txt version. Maybe this long figure also contains other info about the cookie file.
  5. * symbol - which marks the end of this variable, and the start of the next variable set by the website, whose name etc follow in the same order.
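
The order above can be sketched in Python as follows. This is only an illustration of the layout as I've guessed it, not a definitive parser: IECookieRecord and split_records are hypothetical names, the undeciphered numeric part is kept as raw text, and I'm assuming a record ends either at a line containing only * or at a * stuck to the end of the numeric part.

```python
from typing import List, NamedTuple


class IECookieRecord(NamedTuple):
    name: str
    value: str
    domain: str
    trailer: List[str]  # the numeric part, kept verbatim and not interpreted


def split_records(text: str) -> List[IECookieRecord]:
    records, fields = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # Only treat a trailing "*" as a terminator once name, value and
        # domain have been read, so a "*" inside a value is left alone.
        ends_record = line == "*" or (len(fields) >= 3 and line.endswith("*"))
        if not ends_record:
            fields.append(line)
            continue
        if line != "*":
            fields.append(line[:-1])
        if len(fields) >= 3:
            name, value, domain, *trailer = fields
            records.append(IECookieRecord(name, value, domain, trailer))
        fields = []
    return records


twitter_file = """guest_id
v1%3A131542058071389408
twitter.com/
214748475215010693123032155316242192030174605*
"""
records = split_records(twitter_file)
for record in records:
    print(record.domain, record.name, record.value)
```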

I worked out the purpose of the * by looking at a single Google cookie file. For example, these contents of one txt file:

PREF
ID=15025770280c4f56:U=8cbfd7d77ff8ecf4:FF=0:TM=1315398473:LM=1315408615:S=zAzaJeJ5lq1Y-EEk
google.com/
1536
521981312
30321428
744646208
30174577
*
NID
50=IVMzsW2RssDmmdt21XYqM-m6GMBe731GqCispetEG495dEdHdl_tlLqIv8h8tINpCg1kI2lgsAgLheW-TVQzbGoBoiHfBjSJuhOPJSEfWVNTw-H-_Nt16tyNCyIL2zCf
google.com/
9728
2103298560
30211390
722926208
30174577
*

- showed up in the cookies.txt file as this:

google.com TRUE / FALSE 1378844158 PREF ID=15025770280c4f56:U=8cbfd7d77ff8ecf4:FF=0:TM=1315398473:LM=1315408615:S=zAzaJeJ5lq1Y-EEk

google.com TRUE / FALSE 1331583355 NID 50=IVMzsW2RssDmmdt21XYqM-m6GMBe731GqCispetEG495dEdHdl_tlLqIv8h8tINpCg1kI2lgsAgLheW-TVQzbGoBoiHfBjSJuhOPJSEfWVNTw-H-_Nt16tyNCyIL2zCf
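
One possible reading of the five numeric lines, which is an assumption rather than anything the comparison above proves: the first number is some kind of flags field, and the other four are two Windows FILETIME timestamps (counts of 100-nanosecond ticks since 1 Jan 1601), each split into a low and a high 32-bit half, with the expiry first and the creation time second. The Python sketch below tries that reading on the PREF numbers; filetime_to_unix and decode_numeric_lines are hypothetical helper names.

```python
FILETIME_TICKS_PER_SECOND = 10_000_000   # FILETIME counts 100 ns ticks
SECONDS_1601_TO_1970 = 11_644_473_600    # gap between the 1601 and 1970 epochs


def filetime_to_unix(low: int, high: int) -> int:
    """Join two 32-bit halves and convert to whole seconds since 1 Jan 1970."""
    ticks = (high << 32) | low
    return ticks // FILETIME_TICKS_PER_SECOND - SECONDS_1601_TO_1970


def decode_numeric_lines(lines):
    # Assumed order: flags, expiry low/high, creation low/high.
    flags, exp_low, exp_high, made_low, made_high = (int(x) for x in lines)
    return {
        "flags": flags,
        "expires": filetime_to_unix(exp_low, exp_high),
        "created": filetime_to_unix(made_low, made_high),
    }


pref = decode_numeric_lines(
    ["1536", "521981312", "30321428", "744646208", "30174577"]
)
print(pref["expires"])  # 1378480615
print(pref["created"])
```

Under this reading the PREF expiry is exactly two years after the LM=1315408615 figure inside the cookie value, and the creation time lands within a few seconds of it, which makes the guess look plausible. It doesn't reproduce the cookies.txt figure exactly, though: the decoded expiry is 363,543 seconds earlier than 1378844158, and the same gap appears for NID, so either the export adjusts the value or there's more going on than this sketch captures.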

Final example, from a Facebook cookie:

datr
eLlnTol8k9yayreWIGxF-h6m
facebook.com/
2147492864
3767864320
30321455
3978419216
30174604
*

translates to:

facebook.com TRUE / FALSE 1378856079 datr eLlnTol8k9yayreWIGxF-h6m

I've not yet worked out how the name of the cookie text file relates to anything in its contents (which no doubt is part of the purpose of the security fix!), so you still can't tell which file was set by which site without opening up each file. The order of info in the cookies.txt document doesn't match the order of the dates that the cookies were created or modified, and they're not in alphabetical order of domain name either. But at least it's possible to check out all the contents of all cookie files at once.
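
Until the filename puzzle is solved, a small script can at least do the opening-up for you. Here's a rough Python sketch that reads every .txt file in a folder and reports the domain on the third non-empty line of each, ie the domain of the first cookie in the file. The function name domains_by_file is invented, the folder location is left as a parameter because it varies between Windows versions, and the demo uses a temporary folder with a copy of the Twitter example rather than a real cookie store.

```python
import tempfile
from pathlib import Path


def domains_by_file(cookie_dir):
    """Map each cookie filename to the domain of the first record it holds."""
    result = {}
    for path in sorted(Path(cookie_dir).glob("*.txt")):
        text = path.read_text(errors="replace")
        lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
        # Lines run name, value, domain, so the domain is the third one.
        if len(lines) >= 3:
            result[path.name] = lines[2]
    return result


# Demo on a throwaway folder containing one made-up cookie file.
with tempfile.TemporaryDirectory() as demo_dir:
    Path(demo_dir, "J0R4GWEF.txt").write_text(
        "guest_id\nv1%3A131542058071389408\ntwitter.com/\n"
    )
    mapping = domains_by_file(demo_dir)

print(mapping)
```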

1 comment:

Anonymous said...

The data format in the exported file (cookies.txt) is the format that Netscape browsers use. It's not a "new format" for IE.

This feature has been around since the Netscape browser was widely used.