File content and Forensics

Have you ever accidentally opened a video, picture, or audio file in Notepad, Notepad++, or a similar text editor? If so, the editor probably lagged for a moment and then showed you something like this:

Fig. 1: PNG Picture opened by Atom (a text editor)

Pretty weird, huh?

Today's topic is what is actually inside these weird files, plus a few basics of the forensics field in cyber security.

1) Content of the file:

“Everything that is stored in a computer is binary!” – One Little Hacker

And I mean it, EVERY F*KING THING IS IN BINARY. The games you open up and play, the icons you see on the Desktop, even the data you send when you connect to the Internet and interact with others: every detail is in binary.

So what is binary anyway? Binary is just a way of writing numbers using only two digits, 0 and 1.

For example, to count from 0 to 12 in our everyday system (decimal), we count: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.

In binary, the same count is: 0, 1, 10, 11, 100, 101, 110, 111, 1000, 1001, 1010, 1011, 1100.

Aaaand you can see it all here. The rest is pretty basic, so I’m gonna skip it.
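
If you want to check the list above yourself, here is a minimal Python sketch. The helper name `to_binary` is made up for this post; it just wraps Python's built-in `format`.

```python
def to_binary(n: int) -> str:
    """Return the binary digits of a non-negative integer, without a '0b' prefix."""
    return format(n, "b")


# Print 0..12 in decimal next to the same number written in binary.
for n in range(13):
    print(n, "->", to_binary(n))
```

The last line printed should pair 12 with 1100, the same as the end of the list above.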

2) So where are the 0s and 1s in the file I see in Figure 1?

Okay, what we see on the screen is not the same as the raw binary stored on the computer.

See your keyboard? QWERTYU… and so on? As I said before, everything stored on the computer is in binary, so the letter Q and the letter W must somehow be translated into binary. And guess what, even the characters 0, 1, 2, 3, … 9 are translated into binary.

(P/S: the character 0 in binary is not 0. In ASCII it is 00110000. I know, so much magic!)
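
To see this for yourself, here is a small Python sketch. It assumes plain ASCII characters, and `char_to_bits` is just a name I picked for the example.

```python
def char_to_bits(ch: str) -> str:
    """Return the 8-bit pattern of a single ASCII character."""
    return format(ord(ch), "08b")


# The letters Q and W, and the *character* 0 (not the number 0).
for ch in "QW0":
    print(repr(ch), ord(ch), char_to_bits(ch))
```

The character 0 comes out as code 48, which is 00110000 in binary, not 0.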

Some of you already have basic knowledge of binary, ASCII and so on, so I’m gonna skip that. If you are new, explore this link to understand what ASCII is and how to convert it into binary.

So what are the “<?>” characters that you see in Figure 1? The text editor groups the file's bits into bytes of 8 consecutive bits and tries to render each one as a character. When a byte has no printable character, for example a control code like delete, or a value that is not valid in the editor's text encoding, it is displayed as “<?>”.
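
Here is a rough Python sketch of that idea. It is a simplification and not how Atom really works: real editors decode the file with an encoding such as UTF-8, while this toy `render` function (a made-up name) only treats printable ASCII as displayable.

```python
def render(data: bytes) -> str:
    """Show printable ASCII bytes as characters and everything else as '<?>'."""
    out = []
    for b in data:
        if 32 <= b <= 126:       # printable ASCII range (space to '~')
            out.append(chr(b))
        else:                    # control codes and non-ASCII byte values
            out.append("<?>")
    return "".join(out)


# The first 8 bytes of every PNG file (the PNG signature).
png_signature = bytes.fromhex("89504e470d0a1a0a")
print(render(png_signature))
```

Only the three bytes that spell PNG are printable, so the rest show up as “<?>”, which is roughly what the start of Figure 1 looks like.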

3) Forensics:

So what is forensics? The field is massive and hard to describe in one post, so here I’ll just talk about it briefly and give some basic examples.

Note: if you are not familiar with the Linux environment and don’t have much experience in the IT industry, I suggest you skip this part, because it may be hard to follow.

a) How to recognise a file type:

The obvious answer is the extension: .txt for text files, .png or .jpg for pictures.

But no, extensions can be faked: anyone can rename a file to anything else.

On Linux, the Command Line Interface (CLI) is very commonly used by developers (and hackers) to run tools and scripts.

There is a tool called exiftool that can determine the file type by checking the content and metadata of the file rather than its name. For example, the file in Figure 1 is a picture, but I renamed it to “my.txt”, which made Atom think it was a text file and show its content as text. Now I’ll run exiftool on it:

Fig 2: exiftool

Even PHP websites can do this kind of check: PHP ships with a built-in Exif extension (for example the exif_imagetype() function) that inspects a file's content to determine its type. A site can use it to filter what users upload, for example to accept picture files only and reject script files.
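
The general idea of looking at the content instead of the name can be sketched in a few lines of Python. This is not how exiftool or PHP implement it; it is a toy version that only knows a handful of well-known signatures, and the names `SIGNATURES`, `sniff_bytes` and `sniff_type` are invented for this example.

```python
# A few well-known file signatures ("magic numbers") and the type they indicate.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_bytes(head: bytes) -> str:
    """Guess a file type from its leading bytes, ignoring the file name."""
    for magic, name in SIGNATURES.items():
        if head.startswith(magic):
            return name
    return "unknown"


def sniff_type(path: str) -> str:
    """Read the first bytes of the file at `path` and guess its type."""
    with open(path, "rb") as f:
        return sniff_bytes(f.read(16))


# A PNG that was renamed to my.txt still starts with the PNG signature.
print(sniff_bytes(b"\x89PNG\r\n\x1a\n" + b"...image data..."))
print(sniff_bytes(b"just some plain text"))
```

Calling `sniff_type("my.txt")` on a renamed picture would report png, because the extension is never consulted.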

b) A poorly designed website with a file upload vulnerability:

Yesterday I came across a poorly designed website that allows users to upload a profile image. It used exiftool to check whether the uploaded file is a png or jpg. But even exiftool can be tricked. Here is how.

The check in PHP only looks at the first few bytes of the file to determine its type. After you upload a picture, you can view it by going to the URL the site provides. However, when I visited that URL, I found that they still kept my file's original extension (maybe d…), which made me think: what if I upload a php file and then run a script on their server?

So I built a php file like this:

Fig 3: php file

Then I put these bytes at the beginning of the php file: FF D8 FF DB. This tricks the check into thinking that the file is a jpg. FF D8 FF DB is a magic number, which you can find here.

To put FF D8 FF DB at the beginning of the file, you cannot just type the characters FF D8 FF DB into it, because they must go in as raw bytes. Here I used a hex editor called Bless to do it.
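
To show why a check like that is weak, here is a toy Python version of a checker that only looks at the leading bytes. It is an assumption about how such a check behaves, not the website's real code, and `looks_like_jpeg` is a made-up name. The script body is a harmless placeholder.

```python
JPEG_MAGIC = bytes.fromhex("FFD8FFDB")


def looks_like_jpeg(data: bytes) -> bool:
    """Naive check: trust the file if it merely *starts* with a JPEG magic number."""
    return data.startswith(JPEG_MAGIC)


# Placeholder content standing in for a script file.
script = b"<?php /* placeholder */ ?>"
disguised = JPEG_MAGIC + script

print(looks_like_jpeg(script))     # no magic number at the start
print(looks_like_jpeg(disguised))  # same content, four extra bytes in front
```

The first call returns False and the second returns True, even though nothing after the first four bytes resembles a picture.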

Fig 4: Hex editor Bless
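
The difference between typing the characters and writing the raw bytes is easy to see in Python. This sketch is only an illustration and is not what Bless does internally.

```python
# Typing "FF D8 FF DB" in a text editor stores the *characters* F, F, space, D, 8, ...
typed_as_text = "FF D8 FF DB".encode("ascii")

# A hex editor stores the four byte *values* 0xFF, 0xD8, 0xFF, 0xDB.
real_bytes = bytes.fromhex("FF D8 FF DB")

print(len(typed_as_text), typed_as_text.hex(" "))
print(len(real_bytes), real_bytes.hex(" "))
```

The typed version is 11 bytes long and starts with 46 46 20 (the codes for F, F, space), while the real one is just the 4 bytes ff d8 ff db.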

After that, I uploaded it to the website and successfully ran my script on the server. From there I could do anything I wanted, like delete things (well, I didn’t; I’m an ethical hacker, of course).


To wrap things up, it is surprisingly hard to reliably determine the type of a given file. That is why popular services like Facebook and Google have dedicated security teams looking for flaws in their systems. Furthermore, they usually reward bounty hunters who find and report bugs to them (the prizes can reach thousands of dollars).

That’s it guys. See you again in the next chapter!

