Restoring PDF from chrome cache hex dump

Alfredo Palhares
Hello everyone,

So I was trying to be a good person and pay my taxes, some of them for my cars.

So I went to my country's tax web portal and got to it. After 2 attempts to
extract the document (the first request got a pretty 500 as an answer), I finally
got the PDF with the payment details.

This was all being done in Chromium, and I am affected by this bug[1], so when
I pressed "Save PDF" my browser just crashed, yikes... After restarting
Chromium I found that the web portal does not issue the file again (yes, because
apparently there is no use in that).

After some research I found the HTTP response in chrome://cache. But it only
shows the HTTP headers and a hex dump of the full response.

To try and separate the rest of the HTTP response from the PDF, I opened a few
valid PDFs with vim and noticed they all start with %PDF-<format-version> and
end with %%EOF.

So now I can isolate the PDF hex from the rest. Here are some excerpts, first the beginning:

00000000: 25 50 44 46 2d 31 2e 34 0a 25 e2 e3 cf d3 0a 32  %PDF-1.4.%.....2
00000010: 20 30 20 6f 62 6a 20 3c 3c 2f 4c 65 6e 67 74 68   0 obj <</Length
00000020: 20 35 32 2f 46 69 6c 74 65 72 2f 46 6c 61 74 65   52/Filter/Flate
00000030: 44 65 63 6f 64 65 3e 3e 73 74 72 65 61 6d 0a 78  Decode>>stream.x
00000040: 9c 2b e4 72 0a e1 32 36 53 b0 30 30 d5 b3 34 57  .+.r..26S.00..4W
00000050: 08 49 e1 72 0d e1 0a e4 2a 54 30 54 30 00 42 08  .I.r....*T0T0.B.
00000060: 99 9c ab a0 1f 91 66 a8 e0 92 af 10 c8 05 00 08  ......f.........

And the end:

0000cde0: 74 20 33 38 20 30 20 52 2f 49 44 20 5b 3c 39 30  t 38 0 R/ID [<90
0000cdf0: 31 66 33 31 65 63 31 61 39 65 65 32 38 62 39 31  1f31ec1a9ee28b91
0000ce00: 36 33 64 32 35 33 63 35 31 66 66 36 33 35 3e 3c  63d253c51ff635><
0000ce10: 37 63 32 30 62 36 33 65 63 30 35 37 62 62 65 35  7c20b63ec057bbe5
0000ce20: 30 61 66 30 66 30 62 38 33 35 34 31 35 37 30 31  0af0f0b835415701
0000ce30: 3e 5d 2f 49 6e 66 6f 20 33 39 20 30 20 52 2f 53  >]/Info 39 0 R/S
0000ce40: 69 7a 65 20 34 30 3e 3e 0a 73 74 61 72 74 78 72  ize 40>>.startxr
0000ce50: 65 66 0a 35 31 38 38 30 0a 25 25 45 4f 46 0a     ef.51880.%%EOF.
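Once the dump has been turned back into raw bytes, the marker-based isolation described above can also be done on the binary data. Below is a minimal Python sketch (the thread asks about Lua; Python is used here only for illustration, and the function name `carve_pdf` is hypothetical). It assumes the blob holds a single PDF, keeps everything from the first `%PDF-` through the last `%%EOF`, and keeps one trailing line ending, like the final `0a` in the excerpt above.

```python
def carve_pdf(blob):
    """Return the bytes from the first %PDF- through the last %%EOF.

    Hypothetical helper: assumes the blob contains exactly one PDF.
    """
    start = blob.find(b"%PDF-")
    end = blob.rfind(b"%%EOF")  # last marker, not the first one
    if start == -1 or end < start:
        raise ValueError("no %PDF- ... %%EOF span found")
    end += len(b"%%EOF")
    # Keep one trailing line ending (CRLF, LF or CR) if there is one.
    if blob.startswith(b"\r\n", end):
        end += 2
    elif blob[end:end + 1] in (b"\n", b"\r"):
        end += 1
    return blob[start:end]


blob = b"HTTP/1.1 200 OK\r\n\r\n%PDF-1.4\n...\n%%EOF\ntrailing junk"
print(carve_pdf(blob))  # b'%PDF-1.4\n...\n%%EOF\n'
```

Searching for the last `%%EOF` rather than the first matters because an incrementally updated PDF can contain more than one `%%EOF` marker.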

With some more googling, I found some articles on restoring files from the
Chrome cache[2][3], but none of their scripts worked for me.

I would like to approach this in Lua, but honestly I have no idea where to
start, so I wrote this email to you guys.

Any ideas or suggestions would be welcome.

Regards,

[1]: https://code.google.com/p/chromium/issues/detail?id=435538
[2]: http://www.alexkorn.com/blog/2010/05/how-to-recover-deleted-javascript-files-using-the-cache-in-chrome-or-firefo/
[3]: http://www.frozax.com/blog/2011/05/recover-file-google-chrome-cache-gzipped/

--
Alfredo Palhares
GPG/PGP Key Fingerprint
68FC B06A 6C22 8B9B F110
38D6 E8F7 4D1F 0763 CAAD

Re: Restoring PDF from chrome cache hex dump

Choonster TheMage
On 1 May 2015 at 02:32, Alfredo Palhares wrote:

Lua's equivalent of PHP's `preg_match_all` (fill an array with regular
expression matches) would be `string.gmatch` (iterate over pattern
matches). Lua's patterns aren't regular expressions, so you'd need to
replace `\s` with actual spaces and `{2}` with a repeat of the
preceding character class (`[0-9a-f]`). You can also replace that
character class with `%x` (matches any hexadecimal digit).
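For comparison, the matching step might look like this in Python (hypothetical helper `hex_pairs`; `re.finditer` stands in for `string.gmatch`, and `[0-9a-fA-F]{2}` for `%x%x`). The layout regex is an assumption based on the excerpts in the first message: an offset, a colon, single-space-separated hex pairs, then a double space before the ASCII column. Restricting the search to the hex column matters, because the ASCII column can contain text that looks like a byte, such as the ` 38 ` in `t 38 0 R/ID [<90`. Also, a pattern that consumes a space on both sides of a pair (such as `\s[0-9a-f]{2}\s`) skips every second byte, since the space after one pair cannot also serve as the space before the next.

```python
import re

# Assumed dump layout (from the excerpts): "<offset>: <hex pairs>  <ASCII>".
# Captures the run of single-space-separated hex pairs after the offset;
# the run ends at the first double space, where the ASCII column begins.
HEX_COLUMN = re.compile(r"^\s*[0-9a-fA-F]+:\s+((?:[0-9a-fA-F]{2} )*[0-9a-fA-F]{2})")

# Two hex digits, the Python spelling of Lua's "%x%x".
HEX_PAIR = re.compile(r"[0-9a-fA-F]{2}")


def hex_pairs(dump_text):
    """Yield every two-digit hex string from the hex column, in order."""
    for line in dump_text.splitlines():
        column = HEX_COLUMN.match(line)
        if column is None:
            continue  # not a dump row (blank line, header text, ...)
        # Like string.gmatch, re.finditer walks successive non-overlapping
        # matches; inside the isolated column every pair is space-separated.
        for match in HEX_PAIR.finditer(column.group(1)):
            yield match.group(0)


row = "0000cde0: 74 20 33 38 20 30 20 52 2f 49 44 20 5b 3c 39 30  t 38 0 R/ID [<90"
print(len(list(hex_pairs(row))))  # 16: the " 38 " in the ASCII column is ignored
```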

To convert the hexadecimal digit strings into numbers (PHP's
`hexdec`), you'd need to use `tonumber` with a base of 16. To convert
that number into the corresponding character (PHP's `chr`), you'd need
to use `string.char`.
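The same conversion, sketched in Python (hypothetical helper `pairs_to_bytes`): `int(pair, 16)` plays the part of `hexdec` / `tonumber(pair, 16)`. One difference worth noting is that Python's `chr` produces a text character, so binary data is packed into a `bytes` object instead; Lua's `string.char` works directly because Lua strings are plain byte strings.

```python
def pairs_to_bytes(pairs):
    """Convert two-digit hex strings such as "25" into raw bytes."""
    # int(pair, 16) is the counterpart of PHP's hexdec / Lua's tonumber(pair, 16);
    # bytes() packs the resulting 0-255 integers into binary data.
    return bytes(int(pair, 16) for pair in pairs)


header = pairs_to_bytes("25 50 44 46 2d 31 2e 34".split())
print(header)  # b'%PDF-1.4'
```

`bytes.fromhex("25 50 44 46")` does the same job in one call, since it accepts spaces between byte pairs. Either way, write the result to a file opened in binary mode (`"wb"`, which Lua's `io.open` also accepts) so no newline translation happens on Windows.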

Regards,
Choonster

Re: Restoring PDF from chrome cache hex dump

Jeff Pohlmeyer
In reply to this post by Alfredo Palhares
On Thu, Apr 30, 2015 at 11:32 AM, Alfredo Palhares wrote:

Not a Lua solution, but if you just want the file back (or want
something to compare your answer to), you could take a look at
NirSoft's ChromeCacheView:

http://www.nirsoft.net/utils/chrome_cache_view.html

(I am neither a Chrome user nor a Windows user, but I just tried it
in Wine and it seems like it might work there, too.)

 - Jeff

Re: Restoring PDF from chrome cache hex dump

Alfredo Palhares
In reply to this post by Choonster TheMage
Hello,

To answer my own question: I was able to fully restore the file with the help
of xxd[1]. On the isolated hex dump I just ran:

```
xxd -r hex-dump.hex test.pdf
```
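For anyone without `xxd` at hand, here is a rough Python analogue of `xxd -r` for this particular dump layout (hypothetical function `reverse_dump`; the layout regex is an assumption based on the excerpts). Like `xxd -r`, it places each row at the offset given in its first column; in this sketch, any gap between rows is filled with zero bytes.

```python
import re

# Assumed layout: "<hex offset>: <single-space-separated hex pairs>  <ASCII>".
# The hex column ends at the first double space before the ASCII column.
ROW = re.compile(r"^\s*([0-9a-fA-F]+):\s+((?:[0-9a-fA-F]{2} )*[0-9a-fA-F]{2})")


def reverse_dump(dump_text):
    """Rebuild binary data from a dump, placing each row at its offset."""
    out = bytearray()
    for line in dump_text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # skip anything that is not a dump row
        offset = int(m.group(1), 16)
        row = bytes.fromhex(m.group(2))  # fromhex tolerates the spaces
        if len(out) < offset + len(row):
            # Grow the buffer; bytes not covered by any row stay zero.
            out.extend(b"\x00" * (offset + len(row) - len(out)))
        out[offset:offset + len(row)] = row
    return bytes(out)


# Usage: data = reverse_dump(open("hex-dump.hex").read())
#        open("test.pdf", "wb").write(data)
```

Because rows are placed by offset, the isolated dump should start at offset `00000000`, as the excerpt does; otherwise this sketch's output begins with zero padding.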

I did a quick test with:

```
file test.pdf
test.pdf: PDF document, version 1.4
```
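`file` identifies a PDF from the `%PDF-` magic at the start of the file, so it would not notice a truncated tail. A slightly stricter check, sketched in Python (hypothetical function `pdf_summary`), also looks for `%%EOF` near the end. The 1024-byte window is an arbitrary tolerance chosen for this sketch, not something stated in this thread.

```python
import re


def pdf_summary(data, tail=1024):
    """Return the header version (e.g. "1.4") if data looks like a whole PDF.

    Checks for a %PDF-x.y header at offset 0 and a %%EOF marker within
    the last `tail` bytes; returns None if either check fails.
    """
    m = re.match(rb"%PDF-(\d\.\d)", data)
    if m is None or b"%%EOF" not in data[-tail:]:
        return None
    return m.group(1).decode("ascii")


# Usage: print(pdf_summary(open("test.pdf", "rb").read()))
```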

Now I can successfully open the PDF, nice!

[1]: http://linux.about.com/library/cmd/blcmdl1_xxd.htm
