Technoir - Blog of Satharus

Technoir: Reflecting on Five Years

2023-12-07T00:00:00+02:00

Five years. Five whole years. So many things have changed since I first started this blog. I was still in my second year of university, no one could have even imagined that COVID-19 would happen, the GDPR had just been freshly implemented, and we were all looking forward to the end of the 2010s as we were approaching a new decade. Don’t we all just love how that went?

The first blog post I ever posted here was on 7 Dec. 2018. However, my story with writing goes way beyond that. If you’re not interested in the story, you can skip to “But, why?”. In the second half of this post, I talk about why I started this blog, what I gained from it, and why you may want to do the same.

Backstory

The whole story started with me writing in high school on small Facebook groups for some friends and acquaintances with mutual interests. Short posts such as game and comic reviews. A while after that, I joined Big Geeks (🫡R.I.P.) and started writing more on similar (but more technical) topics such as comic books and videogames, computer hardware, the tech industry, etc. At the time, it was just a side thing I was doing besides school.

Fast forward to the last year of high school, after Big Geeks weren’t that active anymore, I went back to writing for my circle on Facebook at the time (not just in groups) in the form of Notes*. My circle at the time included a lot of like-minded people, so the notes would initiate lots of interesting (mostly civil) discussions. I used to write anything from comic book discussions to technical articles like Linux distro reviews. I did that for a while until my second year of university. At this point, I decided to start this blog for, mainly, knowledge sharing and organising my posts better. I also enjoyed writing as it made me relax. So, that did also benefit me. However, it wasn’t the only benefit!

*Back in the day, Facebook had a feature where you could write notes on your profile and people could read them. They would be published as a post the first time but they would be kept as a section on your profile so that anyone could go back to them.

Facebook Notes

Unfortunately, Facebook doesn’t have that feature anymore. I also (fortunately) no longer use Facebook. So, I can’t get any of the original notes I posted from Facebook. However, I dug deep in my drives and found a file which contained a draft of one of my early notes :) Sharing it here just for completeness.

As you can see, it was more of a note that was supposed to start a discussion. Needless to say, I did not really know much about the tech industry at the time. I just had some hopes and dreams, a bit of technical knowledge, and the will to communicate them. In my defence, I was in high school, ha!

satharus.wordpress.com

At first, I started the blog on WordPress using the first template I could find. However, throughout the years and after many different iterations, themes, and names: Satharus’ not-so-secret diary, Satharus’ Blog, and finally, Blog of Satharus, I moved to GitHub Pages for a more reader-friendly experience with no forced tracking. At that point, I gave it the name “Technoir”. This is the current version of “Technoir - Blog of Satharus” which I plan on keeping.

A small piece of trivia, “Satharus” was the name of my character in a couple of RPG games during high school and it just kinda stuck with me. It doesn’t mean anything and I literally got it from a random name generator in World of Warcraft.

But, why?

Why not? It is fun.

In all seriousness, though, I started this blog and continue to write for many reasons:

This blog started as, and still is, a way for me to share any knowledge or ideas I have with the community. Sometimes it was purely a guide such as using MASM on Linux or getting started with cybersecurity. Sometimes it would be a tutorial or an educational piece such as PowerShell deobfuscation, adversary emulation through EDR tests, or reverse engineering. Other times, it was a series on a specific topic such as the 8Bit computer build or buffer overflows. A couple of times, it was just pure banter, whether about open source, the meaning of life, or how there isn’t a specific roadmap for success. And many, many more topics.

This blog became a canvas for me to share anything and everything I wanted to share.

Experimentation

I love tinkering. I love messing around with everything and getting to know how things work. This blog gave me a chance to experiment and share my experiments with the world. Such as that time I thought of a fresh bypass for a newly-released Sysmon event. While the bypass itself is simple (and doesn’t require a blog), it connected me with many like-minded individuals with similar interests online which helped me connect more with the community and see what people at the time thought of the new Sysmon release, red-teaming strategies, and many other interesting topics. It also opened the door for me to experiment more with how Sysmon works.

Joy of Writing

Writing is fun. Or at least, in my opinion. This blog has been a side hobby for me over the years and has helped me relax many times by just writing whatever I wanted to. With the exception of 2023, I never had any pressure from myself to write regularly here. I only wrote when I felt like it. Therefore, there was no stress factor.

What Did I Gain?

Knowledge

I learnt. A lot. I learnt many things, whether while researching topic ideas, writing about topics I already knew a fair share about, researching topics I somewhat knew, or having a reader message me or comment a fun fact regarding a topic I wrote. I learnt from all of these. More importantly, I learnt more about writing and improved my communication skills and consistency.

Friends!

While this blog started originally for my inner circle and friends, I think it went beyond that and at the same time connected me with many other professionals and new friends. I got to know so many like-minded people.

Impact on the Community

It is always nice to give back. I feel that this blog, in one way or another, has given back to the community. You have no idea how happy it makes me when someone reaches out for advice or a question based on something they read on this blog. It does make me feel like I have truly given back to the community.

Giving back to the community matters because without the community, public resources, and knowledge sharing, you probably wouldn’t be where you are now. It is nice to be proud of yourself, your effort, dedication, and work, but in the end, you still have to remember those who have helped you (even indirectly), be grateful to them, and give back when you can. If you believe that you’ve succeeded completely through your own hard work and research, I have bad news for you. :)

Commitment (partially)

“The exception of 2023”… Well, this year, as you can see, I have published a blog post every month. While it may not have been easy for me to think of different ideas to write about, I managed to do it.

However, in previous years, I sometimes committed myself to getting a blog post out “in time” for a specific event or date. Overall, I saw a slight increase in my commitment skills 😃. This blog was, and still is, a way for me to commit to writing ideas I have.

People Liked It!

This blog has been acknowledged by many people in the industry. This helped me get myself out there. No one can deny that it is good for your career!

Management at previous jobs acknowledged my writing skills after reading my blog and allowed me to write for the company’s blog even though writing wasn’t part of my job at the time. One of the side duties I had at some point was advising the presales and marketing teams technically by helping them write. This gave me exposure to other parts of the industry which I had no idea about.

Recruiters have also read my blog and some have commented on it in my interviews over the years. This is obviously because I have it on my resume. However, if they wanted to find it some other way, they probably could.

Get Yourself Out There!

If you have any ideas or knowledge you want to share, create a blog, and post the ideas there! There are many platforms you can blog on. This includes Substack, Medium, Dev, or even LinkedIn Articles. Use whatever makes you get your ideas out there. You don’t have to know web development or create your website from scratch. Just get blogging! There is no reason to hesitate about it. Get your ideas out there and don’t let any fears hold you back. There is no such thing as a bad idea.

Epilogue

It has been five years of maintaining this blog, and I don’t mean to stop any time soon. (just pls don’t expect a blog each month next year as well thx)

Thanks for sticking around throughout the journey! Here’s to five years of creativity, knowledge sharing, joy, and success 🍻. Obviously, none of this would’ve existed without you, my readers. Those who follow the blog, give feedback, and share it with their friends are the reason I continue doing this!

Since it started with comic books, to finish this reflection, I will share one of my favourite comic strips. Here’s to anyone who ever felt that whatever they’re doing isn’t enough or isn’t making an impact.

The Brave and The Bold Volume 3 #30, DC Comics (Feb 2010)

Until next time, dear reader. As always, thanks for reading.

Cover image credit: Everyone on the beach by /u/RetroFreak05

Decoding Circuits: Hardware Reverse Engineering

2023-11-30T00:00:00+02:00

In a previous blog post, I mentioned that hardware can be reverse-engineered. But, why? And how? Is it legal? We shall embark on a journey today to reverse engineer an Arduino Uno R3 ethically and legally, guiding you through how it is done.

This post is an introduction to hardware reverse engineering. There are many more advanced techniques and ways to reverse engineer boards which I won’t get into this time.

But, wait, isn’t Arduino open-source?

Well, yes. The Arduino Uno R3 is an open-source microcontroller board, but we’ll treat it as a black box. Legally (and I’ll touch on this later), reverse engineering an open-source project is the safest choice for us. We’ll ignore Arduino’s public schematic for now. However, if you decide to give reversing this entire board a try, you can compare your output to the original schematic!

Ethics and Legality of Reverse Engineering

Please be careful. Before doing any type of reverse engineering, make sure it’s ethical and legal to do so where you live. Read the Electronic Frontier Foundation’s Reverse Engineering FAQs for a US perspective on the legality and ethics of reverse engineering. While there is a lot of open-source or public-domain software which you can safely and legally reverse engineer, I can’t say the same about hardware. You can check the list of open-source hardware projects on Wikipedia and maybe you’ll find something you like there.

I am not a lawyer, so don’t take anything I say here as any form of legal advice.

Now that we have our legal bits figured out, let’s get started!

Step 0: Disassembly

This is a prerequisite which we won’t really get into because it isn’t really in our scope. Also, the Arduino board we’re working with doesn’t need disassembly as the board is already exposed.

The only tip I’d give for this stage is to try and find a teardown or disassembly guide for the device you’re trying to reverse engineer, or a similar one.

Step 1: Identifying Components

The first step in reverse engineering hardware is identifying the components on the board.

Chip Markings

Most of the time, there are markings on the chips which are soldered to/placed on the board. These markings usually indicate exactly what a chip is or are some sort of code that you’d have to look up to know what that component is, or what its value is. For example, a chip may have its exact part number on it. A resistor, on the other hand, will usually have either a number on it or colour bands which indicate the resistance value.

An example of how resistor values are read

Identifying the Main Components

“Main” will differ depending on what your goal is with reverse engineering. However, what I mean by “main” here is the parts that stand out and could be quite easy to read the markings on. For example, on this Arduino board, we can see a huge Atmel chip with very clear markings. The chip in this case, especially with the number of pins it has, could be a good place to start.

This isn’t always the case, and the size of the chip doesn’t have anything to do with how important it is. However, as an example, it can just give us a starting point.

So, let’s do just that! Let’s identify the “main” components.

PSA: You may not need a fancy camera, microscope, or high-end phone for this kind of reverse engineering. Depending on your case, you could most likely get by with the macro lens (or a high zoom in good lighting) on your smartphone. The adequate photos here were taken on an over 3-year-old Samsung Galaxy M31 using the default macro camera mode.

That’s a fair share of components. Where do we begin? It would be a good idea to start with whatever you’re familiar with. But, don’t fret, if you’re not familiar with any of them, let’s get to a search engine!

My First Component

Let’s start with component #1. Assuming that I have no idea what that is, what do I do? Look up whatever is on it. You literally don’t even have to think about it. Just type in whatever you see!

Clicking on the first link takes us to the datasheet of a Low-Dropout Linear Voltage Regulator. Upon inspecting the datasheet, it seems like we have the 5-volt output variant and we can find the pin configuration for it as well.

Congrats! Now we know our first component and what its pins are. We can do the same thing for all of the remaining components to try and identify what they are.

Component List

Following the exact same steps as above, we can identify each component. Right now, it doesn’t matter why each component is in the circuit. All that we care about is that we identify what each component is, which will lead to us understanding what it does, and (hopefully) why it is on this board. Some components are easier than others to identify and find. Let’s take a look at the detective work required! 🕵️

Can I reverse engineer, Daddy?

Component #1 - 1117 Voltage Regulator

This one we already identified. Datasheet here.

Component #2 - SGM8542 Operational Amplifier

A simple search leads to the datasheet here.

Component #4 - 47μF 25V Electrolytic Capacitor

Looking up “CS 47 25V” leads us to this page which tells us, to no one’s surprise, that it is a 47μF 25V electrolytic capacitor.

Component #5 - M7 General Purpose Diode/Rectifier

Again, looking up “M7 SMD” leads us to many results which tell us about a series of diodes M1 through M7 with varying reverse voltages. This page has more info on them.

A Surface-Mount Device (SMD) is one that is mounted on the surface of the PCB itself and not through a hole in the PCB. SMDs are usually smaller and allow for a higher component density on the PCB. You’ll know an SMD when you see one!

Component #8 - ATmega328P Microcontroller

The centrepiece of this board. The main microcontroller. Just by looking up the chip marking, we can reach the product page here and from the datasheets, we can find its pin diagram as well.

Component #9 - Atmel MEGA16U2

Just like component #8, we can just look up the chip marking and find ourselves on the product page which contains the summary datasheet. This datasheet tells us that this is a microcontroller with a “USB 2.0 Full-speed Device Module”. The fact that it looks like it is connected to the USB Type-B port suggests that it could be the USB controller.

Please guide me

Component #3 - 10KΩ SMD Resistor Network

Looking up “103 SMD” leads us to some results which look similar, but not quite the same. Looking up “103 SMD 8 pins” leads us to learn that this is an SMD resistor network (array) where each resistor is 10×10³Ω = 10KΩ.

Component #11 - 22Ω SMD Resistor Network

The same can be done for component #11, which is a 22×10⁰Ω = 22Ω resistor network.
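The arithmetic behind those two markings can be sketched in a few lines of C. This is only a minimal sketch assuming the common 3-digit SMD convention (two significant digits followed by a power-of-ten multiplier). The function name smd_code_to_ohms is made up for illustration, and I'm assuming component #11 is marked "220", which is what 22×10⁰ implies.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Decode a 3-digit SMD resistor marking into ohms.
 * The first two digits are the significant figures and the third digit is
 * the power-of-ten multiplier, so "103" is 10 x 10^3 = 10000 ohms.
 * Returns -1 for anything that is not exactly three decimal digits.
 * smd_code_to_ohms is a hypothetical helper name, not from any library. */
long long smd_code_to_ohms(const char *code)
{
    assert(code != NULL);
    if (strlen(code) != 3)
        return -1;
    for (int i = 0; i < 3; i++)
        if (!isdigit((unsigned char)code[i]))
            return -1;

    long long value = (code[0] - '0') * 10 + (code[1] - '0');
    for (int exp = code[2] - '0'; exp > 0; exp--)
        value *= 10; /* apply the multiplier one decade at a time */
    return value;
}
```

Calling smd_code_to_ohms("103") gives 10000 (10KΩ) and smd_code_to_ohms("220") gives 22. Markings that use letters (such as an "R" for a decimal point) are deliberately rejected here.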

Component #10 - 16MHz Crystal Oscillator

Again, this is much easier if you know what a crystal looks like. Otherwise, you could look up “T16.000” which will eventually lead you to find out that it is a 16MHz crystal oscillator.

Bring ‘em on!

Component #7 - Self-resetting SMD Fuse

This could be hard to search without somewhat knowing what to look for. I know what this is because it is one of the ways to tell if an Arduino is counterfeit or not. According to this article, you can see that they have a custom design for a golden-black SMD fuse. However, most SMD fuses don’t look that different. If you know what an SMD fuse looks like, you’ll probably guess this. If not, then you may be able to figure it out later. Either from the context of the connections or by digging deep enough on the internet.

I am Chips incarnate!

Component #6 - G2L SMD Component

“G2L SMD” leads us to this page, a Chinese online shopping platform, which states that it is a small outline transistor with 5 leads/pins (SOT23-5). This is a little bit misleading, as we can also find this page on Taobao which states that it is a linear voltage regulator in a SOT23-5 package. We could figure that out later at some point. For now, we have sort of an idea of what it is.

Side protip: You can use GIMP, Photoshop, Paint, or any other similar program to label chips and traces

Step 2: Trace, Trace, Trace!

Now that we know what each component is, it is time to figure out how they’re connected. There are multiple techniques you can use to figure this out. Let’s look at some examples.

One of the easy things you can do is trace the connections with your eyes. You will get varying accuracy with this method, depending on how visible the traces are and if the PCB has multiple layers or not. In the image above, we can see that the two labelled points seem to be connected.

To confirm our connection hypothesis, we can test if the traces are connected with a multimeter. Most multimeters have a continuity test mode, where you can use the leads to test if two points in a circuit are connected.

The multimeter is set to test continuity (green box) and the two points are connected (red line)

Indeed, when we test for continuity, we get a buzzing sound from the multimeter which indicates that these two points are connected. What are these, though? Well, these two points are both solder joints. We’d have to flip the board to see which components are soldered at which location. If you’re unsure, just use the multimeter!

Cross-referencing the datasheet with the pins we found, we can see that pin 13 (PD7) of the ATmega328P is connected directly to the female header labelled “7” which is pin 7 of the Arduino. We can safely assume that the Arduino uses PD7 directly as a digital port. We can tell which way the chip is placed based on the notch (marked in yellow in the picture).

Now all we have to do is do the same for each chip or component on this board, and see where it is connected.

A) Select a chip or component on the board
B) Trace where its connections are going
C) Check the datasheet to see what the functionality or names of the pins are
- This is just to be able to refer to them later as we analyse the circuit a bit further
D) Rinse and repeat

To keep track of it all, you can use electronic design automation (EDA) software like EasyEDA, KiCad, EAGLE, Proteus, or use pen and paper (the more colourful the better!). Whatever you find easy, can afford, and can obtain legally. Some EDA software is available for students at discounts or even free.
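If you'd rather keep your notes as code, a tiny netlist works too. The sketch below is just one possible way to record what was traced. The struct, the function names, the net labels (D7, XTAL_A), and the crystal's pin number are all made up for illustration; only the PD7-to-header-7 link and the PB6 crystal link come from the tracing in this post.

```c
#include <stddef.h>
#include <string.h>

/* One traced connection: a component pin and the net it sits on. */
struct pin_entry {
    const char *component; /* e.g. "ATmega328P" */
    const char *pin;       /* name from the datasheet, e.g. "PD7" */
    const char *net;       /* our own label for the net (hypothetical) */
};

/* Entries recorded while tracing; net labels and the crystal's pin
 * number are assumptions chosen for this example. */
static const struct pin_entry netlist[] = {
    { "ATmega328P", "PD7", "D7" },
    { "Header",     "7",   "D7" },
    { "ATmega328P", "PB6", "XTAL_A" },
    { "Crystal",    "1",   "XTAL_A" },
};

/* Return the net a pin was recorded on, or NULL if it was never traced. */
static const char *net_of(const char *component, const char *pin)
{
    for (size_t i = 0; i < sizeof netlist / sizeof netlist[0]; i++)
        if (strcmp(netlist[i].component, component) == 0 &&
            strcmp(netlist[i].pin, pin) == 0)
            return netlist[i].net;
    return NULL;
}

/* Two pins are connected if both were traced and share the same net. */
int is_connected(const char *comp_a, const char *pin_a,
                 const char *comp_b, const char *pin_b)
{
    const char *a = net_of(comp_a, pin_a);
    const char *b = net_of(comp_b, pin_b);
    return a != NULL && b != NULL && strcmp(a, b) == 0;
}
```

Every time the multimeter beeps, you add an entry. Pins that were never traced simply report as not connected, which is a handy reminder of what is still left to probe.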

Step 3: Understand (or Guess) the Connections

You can do this step simultaneously with the previous step (i.e. figure out the functionality of each chip on its own, referring to its datasheet), or you can start checking the datasheets after you’ve traced all the connections.

Let’s for example figure out some of the pins of the ATmega328P and what their functions are on this board.

Just by tracing from the pictures we took, we can see that there are some pins on the ATmega chip which are connected to Arduino’s digital pins, just like the previous example. However, when there is a via, the pictures weren’t enough. The yellow, orange, and off-white traces in the picture above are coming out from the Arduino pins into a via. If we flip the board (top right image), we can see the traces coming out of the vias into the ATmega chip. Even though this isn’t really visible to us due to the traces going under components already in place, we can verify it with our trusty multimeter and test to see where the trace coming out of the via goes!

A via is a connection between two or more layers of a PCB. It’s usually a drilled hole and you’ll see a trace on one side terminate at the via and another on the other side start from it (or vice-versa!).

A seemingly small part that stands out is this silver thing connected between two of the ATmega’s pins. We have no idea what that component is and it doesn’t seem to have a very clear marking. However, using our multimeter and by looking at the traces, we can tell that the traces connect the silver component to pins PB6 and PB7 of the ATmega chip.

To the Datasheet!

Looking at the pinout diagram from the datasheet, we can see that digital pins 0 through 11 (as well as 12 and 13, trust me!) are connected to what seem like I/O ports on the ATmega328P. Arduino pin 0 is connected to ATmega PD0, pin 1 to PD1, pin 2 to PD2, and so on, all the way up to pin 7 on PD7. Pin 8 is actually connected to PB0, then pin 9 is connected to PB1, and so on until we reach pin 13 which is connected to PB5.
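That mapping is regular enough to write down as a small function. This is a minimal sketch of the pattern traced above (pins 0 to 7 on port D, pins 8 to 13 on port B). The name arduino_pin_to_port is made up for illustration and is not part of any Arduino API.

```c
#include <assert.h>
#include <stdio.h>

/* Write the ATmega328P port pin name for an Arduino Uno digital pin
 * into out (e.g. "PD7"), following the mapping traced on the board:
 * pins 0-7 -> PD0-PD7, pins 8-13 -> PB0-PB5.
 * Returns 0 on success, -1 if the pin is outside 0-13.
 * arduino_pin_to_port is a hypothetical helper name. */
int arduino_pin_to_port(int pin, char *out, size_t out_size)
{
    assert(out != NULL && out_size >= 4); /* room for "PDn" plus '\0' */
    if (pin < 0 || pin > 13)
        return -1;
    if (pin <= 7)
        snprintf(out, out_size, "PD%d", pin);
    else
        snprintf(out, out_size, "PB%d", pin - 8);
    return 0;
}
```

For example, pin 7 yields "PD7" and pin 13 yields "PB5". PB6 and PB7 never show up because they aren't routed to the digital headers.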

But, what do these pins on the ATmega actually do? If we check the datasheet, we can find their exact descriptions.

They’re both 8-bit bi-directional I/O ports. Cool! Now we know why they’re connected to Arduino’s digital pins. But, wait a minute… What about PB6 and PB7?

We can see under 1.1.3. Port B - PB6 and PB7 can be connected to a crystal oscillator. We now know what that silver component is :)!

Multi-layer PCBs

Since we mentioned vias, it is worth mentioning that not all boards are double-layer PCBs like this Arduino here. Many boards have way more than two, especially more complex ones. So, “the other side”, can be just a different layer that isn’t the outer one on either side. Depending on the design, you may find only the power traces in the outer layers and all of the data traces in the middle. In that case, it’ll require a bit more work :) For now, we’ll stick to our simple double-layer Arduino board!

Step 4 Onwards

At this point, you can do many things depending on what the goal of reverse engineering the hardware is. The options range from simply identifying the roles of the chips and drawing the schematic of the board, all the way to analysing the internal electronic design of a chip on the board. Let’s take a look at some of the things we can do.

Storage/Flash Dumping and Analysis

If the board has a flash or a form of storage device on it, you could consider dumping it and analysing it. It could have important code, microcode, or anything which could hint at what the hardware is doing or help understand it better.

You can watch this video by Ben Eater on analysing a ROM of an old TV censoring device! Super interesting.

Test Pads or Debug Headers

Some devices make it to the market with test pads or debug headers still connected and enabled on the board. These pads/headers are often used during development or for debugging. However, when they’re left by devs, whether accidentally or on purpose (for debugging or maintenance), they can be a very useful addition to analysis. You can use a logic analyser or an oscilloscope for this and see if you get anything useful. Here’s an example of test pad locations on a Raspberry Pi Zero 2 W, a board whose test pads are publicly documented.

Decapping and RTL Recovery

Another option to understand chips (especially those which don’t have a datasheet, weren’t documented, or are too old) is to decap them and look at them under a microscope. While that is probably expensive, it can yield some good results. If you don’t want to do it yourself, there are some services online which can do it. An example service is DirtyPCBs’ Dirty Decapping.

I’ve seen some people even X-ray chips and memory and analyse their content after digitally processing the images they got from the X-ray scans.

And then?

With enough time, patience, effort, and takeaway meals, we could reverse engineer the entire board and have an identical schematic for it just like the Arduino’s. However, this post is meant to be an introduction to hardware reverse engineering. Maybe, just maybe, sometime in the future we could do that :). For now, you know that hardware reversing exists and is just as fun as software reverse engineering!

Also, if hardware topics like this make you super curious and interested, you may want to check out The Hardware Hacking Handbook by Colin O’Flynn and Jasper van Woudenberg. Again, just please be careful. Only reverse engineer boards you are legally and ethically allowed to reverse engineer.

Thanks for reading! I hope this blog post was fun to read and was beneficial to you :) If you liked this post, please share it with your friends who take X-rays of flash chips for fun in their spare time!

Partial cover image credit: digital multimeter by Good Father.

Back to Basics: Why do we return 0?

2023-10-27T00:00:00+02:00

There is a very high chance that if you read my blog, you’ve seen the line of code return 0; at some point. Specifically, you’ve probably seen it in a main() function, like so:

int main()
{
	//Do stuff
	return 0;
}

Why, though? Can I return something that isn’t zero?

The simple answer is yes. main() is, after all, a function and can return whatever you want it to. Not just that, but programs in general can have any exit code.

Exit Codes

We have discussed before that when you run a program, the OS loads it as a process. That process executes instructions until it exits. However, before it exits, it sets an exit code (value) that tells the OS why or how it exited. Returning from the main() function makes the process terminate and sets the exit code to main()’s return value.

On Linux, you can check the exit code of the last process by using the shell variable ? (e.g. echo $?). On Windows, you can do the same by checking the variable ERRORLEVEL (e.g. echo %ERRORLEVEL%).

What if I want to exit the program from a function that isn’t main()?

Exit System Call

You use an exit syscall. This differs between languages and OSs. However, let’s look at Linux as an example.

On Linux, with the standard C library, you can call void exit(int status). This will exit the process and set the exit code to status which is the parameter given to it.

It is worth noting that returning from the main function does eventually lead to an implicitly-called exit() as well.

Example

#include <stdlib.h>

void exitWithError()
{
	exit(128); // Terminates the process right here with exit code 128
}

int main()
{
	exitWithError();
	return 0; // Never reached
}

What does this mean?

Well, it is just a way of showing why or how a process terminated. Conventionally, 0 means “All good!”. The process executed fine and exited due to a normal reason (i.e. not a crash or error)^[1]. Any other exit code would indicate some form of message regarding the process termination to the OS or user.

^[1] - That is if the developer programmed it correctly.

On some modern operating systems (at least Linux), you’ll find exit codes ranging from 0 to 255 (an 8-bit integer value).
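You can see that 8-bit limit for yourself with a small experiment. The sketch below assumes a POSIX system such as Linux: it forks a child that exits with whatever value you ask for, then reports what the parent actually observes. The function name observed_exit_code is made up for illustration.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits with `requested`, then return the exit code the
 * parent actually observes. Returns -1 if fork/waitpid fails or the child
 * did not exit normally. observed_exit_code is a hypothetical helper name. */
int observed_exit_code(int requested)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(requested); /* child: terminate immediately with this status */

    int status = 0;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status); /* only the low 8 bits of the status survive */
}
```

Asking for 128 gives 128 back, but asking for -1 gives 255 and asking for 256 gives 0, because only the lowest 8 bits of the status make it back to the parent.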

Example Use Case

If you’ve ever used a Debian-based Linux distro before, you may have seen someone tell you to run the command line: sudo apt update && sudo apt upgrade. Have you ever asked yourself what && does?

&&

&& runs the next command only if the previous “statement” (in this case, the execution of apt update) is true (i.e. its exit status is 0). Two other useful Linux commands we can use to demonstrate this are true and false, which return 0 and 1 respectively.

You can use this to only proceed if the previous command executed successfully. In the apt example, it wouldn’t upgrade the packages if there was an issue updating the package information.

||

Even more interesting is &&’s evil sibling: ||, which will proceed only if the previous command returned anything other than 0 (i.e. had a non-zero exit code).
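To make the logic concrete, here is a rough C sketch of what the shell does for && and ||. It assumes a POSIX system and uses system() to run each command through the shell. The helper names run, and_then, and or_else are made up for illustration.

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a shell command and return its exit code, or -1 if it could not be
 * run or did not exit normally. run is a hypothetical helper name. */
static int run(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}

/* Emulates `first && second`: second runs only if first exited with 0.
 * Returns the exit code of the last command that actually ran. */
int and_then(const char *first, const char *second)
{
    int code = run(first);
    return code == 0 ? run(second) : code;
}

/* Emulates `first || second`: second runs only if first exited non-zero.
 * Returns the exit code of the last command that actually ran. */
int or_else(const char *first, const char *second)
{
    int code = run(first);
    return code != 0 ? run(second) : code;
}
```

With this, and_then("true", "exit 7") runs the second command and returns 7, while and_then("false", "exit 7") stops after false and returns 1. or_else behaves the other way around.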

||, &&, and Exit Codes

Ultimately, we can use both in combination to do different things based on the success (or failure) of the process. We can use the following program to demonstrate that:

#include <stdio.h>

int main()
{
    int c = getchar(); // getchar() returns an int so that EOF can be told apart
    if (c == 'A')
        return 0;   // Success
    else
        return 255; // Failure
}

You may or may not have asked yourself before why we return 0. Whether you did or not, I hope you found this post useful, fun to read, or both!

Bonus

But, wait! There’s a bonus :) Remember when I said that main() is just a function and can return whatever you want? Can we make main() return main()? Pretty sure nothing can go wrong with that…

#include <stdio.h>

int main()
{
    printf("Bad Recursion, Brb...\n");
    return main();
}

If you liked this post, please share it with your friends who like returning -1 instead of 255.

As always, thanks for reading.

Labelling: A Pandemic of Our Generation

2023-09-29T00:00:00+02:00

A while back I read about something called the “labelling theory”. The labelling theory suggests that an individual’s identity and behaviour may be determined by the terms that are used to describe them by others (i.e. labelling). This theory, for example, states that if you keep calling someone a criminal, they will eventually become an actual criminal.

Is this true? I don’t know. But, it made me think of something prominent in our generation: labelling. It has become extremely easy to label people with single words, especially on social media platforms. In a world where a lot of our communication consists of short comments, emojis, and reactions to posts, it has become really easy to label people and judge them. Not just that, but the hive mentality makes it even worse. The hive mentality is when a person changes their opinion because a large group of people have an opposing or different opinion. For example, you see a video which you like on some internet platform, you go to check the comments and you see that people don’t like it. Instead of liking the video and carrying on, you follow the hive mentality and dislike the video or join the hate, even if that isn’t something you actually believe. It happens so subtly that you may not even notice yourself doing it.

Now imagine someone labels a person in the comments, and even though you may have never thought of such a shallow label, you start agreeing with that person. Is it actually your opinion or is it the hive mentality and the ease of labelling affecting your opinion as well? Whether that label is insulting or not, it still can’t fully describe a person. It can describe a behaviour, and even then, does it describe it or is it a judgement? What you saw in that meme/video/post is just a part of that person and their behaviour.

That’s the moral of whatever I am trying to get to with this post. We try labelling ourselves with simple words, even seemingly innocent ones such as “introvert, extrovert, etc…”. However, we are very complex beings and we can’t be described with single words or labels.

I noticed this pattern a lot on many social media platforms and I really don’t like it. I think that as humans, we should have better communication and less judgement. Anyway, the next time you’re browsing the internet and come across a label, ask yourself the following:

Is it a description, or is it just a form of judgement?
Is it adding something constructive or is it purely an insult?
Do I agree with that label or am I following the hive?

Until next time…

Icon used in the cover by Iconjam on Flaticon.

Reverse Engineering 101: Dissecting Software

2023-08-25T00:00:00+02:00

Reverse engineering… Could it be just the opposite of engineering? Is it that simple? Let’s see!

Reverse engineering is a very broad field which has lots of applications. Not just that, but almost anything can be reverse engineered too. Let’s take a look at it in this post.

Reverse Engineering

Reverse engineering is basically understanding the internals of something without having access to the original design. While a lot of people think of software when the words “reverse engineering” are said, it, unsurprisingly, isn’t limited to software.

Anything that has properties and behaviour can be reverse engineered. For example, mechanical parts can be reverse engineered to create a replacement for a broken part in a mechanical system if a replacement is no longer being sold. You would have to reverse engineer the system, know where that part fits, what its dimensions are, what material it is made of, etc…

Anything, you say? Well, yes. Maybe you could even figure out how to make a Nuka-Cola… Easy on the radiation, though!

Electronics Reverse Engineering

Modern electronic circuit boards are usually complex and have a lot of components. Just by looking at a PCB, you wouldn’t really know what it does most of the time. However, you can start following the traces, chips, and other components on the PCB and figure out exactly what it consists of and how they’re connected. Add some more specialised tools, some reverse engineering skills, and you now know how the logic inside the circuit itself (or even the chips themselves) works.

Software Reverse Engineering

Like all other forms of reverse engineering, software reverse engineering is understanding how software works without having access to its original design.

In this case, the design/implementation is the source code. You can check The Life of a Binary to understand more on how software is built. But the summary is as follows:

Software is compiled from source code into a binary file (also known as a program)
The binary file is loaded by the operating system
The program behaves based on how it is programmed
- e.g. You click some button you get some action

We can do exactly the opposite of that and then we would’ve reverse engineered a program. With only one difference: we don’t go back to the original source code. We only want to understand how the program is behaving. We can’t get the exact original source code from the binary, and luckily, that doesn’t matter! It doesn’t matter what a variable’s name is or how a developer decided to write a function. As long as we have a behaviour, we can reverse engineer it and we can get pretty close to what the developer wrote. Just not the exact same source code.

To explain this a bit further, you will understand what the binary is doing and what its internals are. But, you will never be able to know what the actual code the developer wrote was. Which, again, doesn’t matter. You know what the program does and how it does it. That’s the goal of reverse engineering.

We will get to know more about how reversing is done and what we will be looking at to do so when we talk about disassemblers and decompilers later on in this post!

For now, let’s see how any of this is useful.

Applications of Reverse Engineering

Reverse Engineering has many uses. Here are some of them!

Exploit and Malware Development

To develop malware and exploits for a specific platform, you may need to reverse engineer parts of it to understand how it works. This is particularly useful if you’re doing a black box penetration test where you don’t know much about the platform beforehand.

Exploit/Malware Analysis

Malware is literally just malicious software, software that does bad things. Therefore, malware analysis can be seen as “just” reverse engineering such bad software. Of course, it is more complicated than that. But, we can stick to that abstraction for now. :)

Cyber Espionage

How do you best understand the capabilities of your enemy? Get hold of it and take it apart. Militaries do so with aircraft and tanks, and cyber weapons are no exception.

Hardware and Low Level Security

Lower level software and firmware are often more locked down and complex. A lot of the time, a researcher has to reverse engineer a certain part of whichever platform they’re researching because it is undocumented, for example.

Interfacing and Electronic Component Obsolescence

When you want to interface two electronic components, you may have to reverse engineer one of them (or both) to find out how it works and what would be the suitable way to connect it to the other components. This also helps when an old component becomes obsolete and you want to replace it with a newer component but there isn’t documentation on how they work together.

Deep Dive: Software Reverse Engineering

At this point, I really recommend that you go and read The Life of a Binary if you haven’t. Anyway, the end product of the building process of software leads to having a program which behaves in a certain way. This program is actually more or less just machine code (AKA 0s and 1s) telling the CPU what to do. Sure, there are other elements to it, which we discuss in the other blog post mentioned. The summary is, we have a program which is basically a file of machine code and we want to reverse engineer it.

How can we do so?

Disassemblers

To put it simply, a disassembler does the opposite of what an assembler does. It parses the machine code and displays it in human-readable assembly. Or at least, that’s what a basic disassembler would do. Modern disassemblers often add features like symbol resolution, API call parameter highlighting, and a couple of other neat features that may help you.

objdump (part of GNU Binutils v2.41.0), a command line utility which has disassembly features

Disassemblers are generally used to get to the deepest details of what a binary does. A disassembler shows you all the assembly instructions, and there usually won’t be anything beyond that except API calls, which can often be looked up in their documentation.

IDA Free 8.3, a very powerful disassembler and debugger. Also a bit of a nicer tool to use compared to objdump

Some malware and sometimes legitimate software try to trick disassemblers into displaying the wrong assembly code by exploiting the way they work. Anti-disassembly is a very interesting topic which you may want to read about.

Debuggers

Debuggers, well, help you debug. Wouldn’t have guessed it, would you? Jokes aside, debuggers can be used to run a program instruction-by-instruction and see what is actually happening. Just like any debugger you’ve used before, they have breakpoints, step into, step over, step out, etc… Very useful.

IDA Free 8.3 debugging a binary

gdb, a powerful command line-based debugger

Decompilers

A decompiler, like you may have already guessed, shows you the program in a pseudo-code-like format. It presents the file in an arguably more readable format which is sometimes easier to understand but also not always accurate. The decompiler tries to interpret and guess what the assembly it sees was as code before it got compiled. It won’t always work 100% accurately. However, it is very useful sometimes to understand blocks of assembly that may get a bit confusing.

IDA Free 8.3 Cloud-based Decompiler

Just because these exist, doesn’t mean you don’t need to learn assembly. You will need assembly.

Ghidra 10.3, a very powerful decompiler and disassembler, with a recently added debugger

Notice here how both decompilers show printf("%d\n", 1); even though that isn’t what is exactly happening in the assembly. I can also assure you that it isn’t what I wrote for this example binary!

The original code I wrote was:

int x = 0;
x++;
printf("%d\n", x);

While both the decompiled code and the one I wrote present the same output, they’re not exactly the same. I am not saying that decompilers are entirely inaccurate, just be mindful that they’re not entirely accurate either.

Intermediate Language

Some languages aren’t compiled directly to machine code. They are, instead, compiled to some form of intermediate language which is then compiled at runtime into machine code. These languages are much easier to decompile and usually don’t require an understanding of actual machine code. Example languages are Java and C#; their intermediate code is compiled at runtime by a VM such as the JVM (part of the JRE) or the .NET CLR.

dnSpy, a .NET decompiler and debugger

Notice how in dnSpy, there is no assembly to read or anything. You get a decompiled version of the original code which could be pretty similar to the original.

Other Useful Tools

Some other tools can be useful to find out information about binaries you want to reverse. Some examples:

Explorer Suite: A complete suite of Windows binary utilities
GNU Binutils: A suite of utilities for Linux binaries. Contains the popular tool strings which can be used to find readable strings in binaries, as well as objdump
Analysers, such as:
- capa by Mandiant: Can analyse some of the capabilities a binary has such as encryption, encoding, internet connectivity, etc…
- binwalk by ReFirm Labs: Has useful features such as entropy calculation and file extraction by signature
Hex Editors: Have multiple uses including modifying existing binaries if needed, extracting data from them, etc…
- Two popular hex editors in the reverse engineering and malware analysis community are: HxD for Windows and Bless for Linux

Ethics of Reverse Engineering

You legally and ethically aren’t allowed to reverse engineer everything in the world. The reason why you’re reversing something also matters a lot. For example, reversing commercial software is often restricted. But, reversing malware, open sample code, and open source software is fine for the most part.

Make sure to check which regulations apply in your region of residence AND region of work and obviously make sure you follow them! You can also check out the Electronic Frontier Foundation’s Reverse Engineering FAQs for a US perspective on this.

I am not a lawyer, don’t take anything I say here as any form of legal advice.

Great, Where do I Learn More?

Well, depends on what kind of reverse engineering you want to do.

Personally, I would generally recommend the following if you want to go for software reverse engineering:

Brush up on your C knowledge
- Understanding pointers and data structures is usually enough
Learn x86 Assembly
Learn Reverse Engineering and PRACTICE reversing binaries
Learn more about binary file structures
- PE Files, ELF Files, Mach-O, etc…
Learn more advanced x86 and OS-related topics (go as deep as you like)
- Segmentation, paging, operation modes, control registers, SMM

Or don’t… Do whatever makes you good at what you want to do! There is No Single Roadmap.

Needless to say, it doesn’t have to be in this exact order, as long as you cover the basics (the first 4 points)!

You may also want to check OpenSecurityTraining2’s learning paths, depending on what you want to do in the long run.

Some Resources

Open Security Training 2:
- Architecture 1001: x86-64 Assembly
- Architecture 2001: x86-64 OS Internals
Open Security Training:
- Introduction To Reverse Engineering Software
- The Life of Binaries
Other Great Resources:

That’s about it, thanks for reading! I hope you learned something new from this post. If you did, or you at least enjoyed it (or both), please share it with your friends who enjoy staring at hex bytes and assembly code for hours upon end.

Some icon and image credits: WikiMedia Commons, Flaticon, Pixabay, The Noun Project, Pixel perfect

The Life of a Binary: From LoC to a PID

2023-07-28T00:00:00+02:00

Programs… Binaries… PE Files… ELF Files… What are those? If you’ve read about computers at some point or even just used them, you’ve probably come across these terms. Today we’ll take a look at how programs are built and the stages they go through. This post is a bit of a primer on knowledge required for multiple fields of software engineering and computer science. One of which is software reverse engineering, which I’ll talk about in the next post.

Refresher: Trip to the Computer Class

If you remember, back in primary school you were probably taught something like this:

We use input devices such as a mouse and keyboard to make the computer process whatever we want and then the computer would give us some sort of output such as something visual on a monitor or a sound.

Then, at some point you may have been taught this in university/college if you studied CS or CE.

We give our computer input and get output. What happens in the middle, though?

Here is a quick summary so that you can follow the upcoming parts: the processor (Central Processing Unit) executes instructions given to it by the user. The processor also uses memory to store the data it needs while executing those instructions.

e.g. The OS loads the program you just double clicked into memory, and the processor starts executing the instructions in that program.

The Birth of a Binary

As mentioned a couple of times previously on this blog, computer programs are written in code. Programming languages such as C, C++, etc… This code is referred to as “Source Code”, as it is the source from which the programs are born.

A PE file is a Portable Executable, a Windows format. Usually seen with extensions such as .exe, .dll, etc… ELF is the Executable and Linkable Format on Linux. Usually seen without an extension or with .so, .o, etc…

The Compiler

The source code files are passed to a compiler, which compiles the high-level human-written code into machine code: instructions which can be executed by the CPU. Your CPU can’t execute x++ but it can execute inc eax, for example. That translation is basically what a compiler does. The output of a compiler is an object file, a file containing machine code.

The Linker

When developers write code, they often use libraries which are basically code that has been written by someone else before for a specific purpose and then other developers can just reuse it in their code. Libraries contain functions, and functions are referenced in code by developers. For example:

#include <stdio.h>

int main()
{
	printf("Hello, world!");
	return 0;
}

The above program prints “Hello, world!”. However, you don’t see a definition for printf() in my code. That is because I included stdio.h, the standard C header for input and output (I/O). The header only declares printf(); the definition itself lives in the C standard library. The compiled program then has to be linked to that library in order for it to work!

This is where the linker comes in. The linker sees which references to other libraries the binary needs, and it links the binary to them. i.e. It tells the binary where the function is in the library, and it writes information in the binary for the OS to know that this program needs this library.

But, how does the OS load this program and make it work?

The Life of a Binary

The OS reads the file, loads the content of it where it is appropriate for it to execute, and loads any libraries needed by the binary. Libraries, after all, are just programs. The only difference is that you can’t double click and run them like normal programs.

This is a very simple abstraction, but if I go into how programs are loaded and the actual structures of binary files, this post will be way too long and you will probably click off now. So, let’s talk about it some other day over a cup of tea. For now, check the link to a course at the end of this post to get an idea on where you can learn more.

That way, your program is now running and it has a process ID (PID). What started as some Lines of Code (LoC) is now a running program with an ID. Gosh, they grow up so quickly.

The Compilation Process

Let’s take a bit of a deeper look at the whole process. We now have a high level understanding of how lines of code become a running program. Let’s take a look under the hood and see the four stages that a modern compiler like gcc, for example, goes through.

This is some C code.

#include <stdio.h>
#define RETURN_VALUE 2

int main()
{
	int x = 0;
	x++;
	printf("%d\n", x);
	return RETURN_VALUE;
}

When compiled, it is compiled to the assembly on the left here.

After it is assembled, it’ll look like the machine code (in hex) on the right.

The assembly is for demonstration only. Actual compiled code would look slightly different as it would not yet be linked. However, this code IS linked.

Preprocessing

The preprocessor is a component of the compiler and it processes header files, macro expansions, conditional compilations, etc… For example, before preprocessing we can have a macro like this in our code:

#define SIZE 2*1024 //Macro here
int main()
{
	printf("%d", SIZE);
}

After preprocessing, SIZE is replaced with its defined expansion:

int main()
{
	printf("%d", 2*1024);
}

The existence of the preprocessor allows us to write code that may be more understandable and easier to change later. Don’t quote me on that, though. As different people have decided that macros should be used in different ways.

I think it is fine, as long as your code doesn’t look like this.

Compilation

The compilation process takes the code produced by the preprocessor and compiles it into assembly. You can run gcc -S -masm=intel and that will only preprocess and compile your file, without assembling or linking it. This will show you how the assembly looks.

This file on the right (hello.asm, the output of gcc) is purely human-readable assembly.

Assembly

The assembly process assembles the human-readable assembly code into a machine code binary. That binary still isn’t executable as it hasn’t been linked yet. However, it will contain very similar assembly instructions. You can run gcc -c which will preprocess, compile, and assemble the code only. No linking yet.

Notice how, on the right, the file is not executable, and to read it in a human readable format, you need to use special software known as a disassembler (check the next blog post).

Note that during compilation, one line of code could be one instruction or multiple instructions when compiled to assembly. But, when the assembly is converted into machine code, it is mapped one-to-one where each assembly instruction is represented in its machine code format. tl;dr: Assembly is the human readable form of machine code.

C Code	Assembly	Machine Code
`x++`	`inc eax`	`0100 0000` (0x40, 32-bit x86)

I know that incrementing a variable x won’t necessarily increment the register eax, please don’t @me… This is a simplification.

Linking

The linker does the last step of the process. It links the binary file to the libraries it needs in order to run. You can run gcc without any flags and it will preprocess, compile, assemble, and link.

Note that on the right, the program is now executable and prints “Hello” as expected. It is also now a “dynamically linked” executable as shown when running the file command. Notice how that compares to the file hello.obj from the previous stage, which doesn’t state that it is linked.

And that is it. That’s how a binary is born, lives a hopefully fulfilling life, and then is killed by a user because they decided to try a shiny red button with an X on it…

Let’s all come together for computer process rights and stop killing them when they do something as simple as freezing :(

Thanks for reading! If you want to know more about binaries, their structures, and get into this topic in much more detail, please do check out The Life of Binaries. It is an outstanding free course, and -obviously- inspired this blog post.

I hope you enjoyed this post and learned something new! If you did, please share it with your friends who think of the threads and decide to save processes’ lives by using SIGTERM instead of SIGKILL.

Some icon and image credits: WikiMedia Commons, Flaticon, The Noun Project

Shield Yourself: Five Tips to Strengthen your Online Security

2023-06-30T00:00:00+02:00

I could talk for the entirety of this post about how connected we are and how much we rely on the internet and technology and all that stuff, but I am pretty sure we all know that. So, let’s just get straight to the point. The fact is that most people have horrible online security habits. We reuse passwords, we set weak passwords, we accidentally post sensitive data online, and we have many more bad practices, to the point that, more often than not, we prioritise convenience over security.

What inspired me to write this post is that when I asked a couple of people in my inner circle, they admitted that they reuse pretty weak passwords and have a couple of bad security practices. The bigger problem is that most of my inner circle are techies and if that is what they do, I can guess how bad it is for people who aren’t really tech-savvy. In this post, I’ll give you five things you can start doing literally now to improve your online security.

#1 : Use Multi-Factor Authentication

This is my number 1 tip for anyone as it is often the most abused weakness when present.

More than one factor?

You see, we often think of usernames and passwords as a factor of authentication. But, they aren’t the only factors. Factors of authentication have been classified as the following:

Something you know: A password, a pin, etc…
Something you are: Something that is part of you, your fingerprint or eye scan
Something you have: A key, your mobile phone, a key card, etc…
There are also “something you do” and “somewhere you are” but those aren’t used as much as they can be impersonated fairly easily

Combine two or more of those and you have got a pretty functional MFA system.

Example

You log in with your password (something you know) and get an SMS with a code on your phone (something you have) used to verify that this is you logging in
You log in with your password (something you know) and an app on your phone asks you to verify your fingerprint (something you are)

Which factors should I use?

Any two or more of them. You can use your fingerprint via your phone. You can use a physical USB key such as YubiKey. You can use an authenticator app on your phone such as Google Authenticator.

DO NOT USE YOUR PASSWORD MANAGER (see the next section) AS YOUR MFA. This makes it essentially a single factor as both methods of authentication are accessed from the same application.

Also, just a small side note: If an alternative to SMS 2FA exists for the service you are using, please use it instead. SMS 2FA isn’t as strong as for example authenticator app 2FA. However, it is still way better than no 2FA.

#2: Use a Password Manager

This tip is also extremely important as it makes it difficult for evil people to get your password to begin with.

Passwords

Most of the time we use passwords as our login method. If one password gets known somehow and you’re using it somewhere else, that is a huge issue as that other account could be accessed using the leaked password too.

A small technical detail: Passwords are usually stored on servers for whichever service/website you use. However, to increase security, the passwords should never be saved as they are (this is called plaintext). When you register, the password goes through a mathematical function which derives a very large number, called the hash, from the password you entered. The function is designed to be one-way, so if everything is implemented well, no one should ever know what your password is! When you try to log in, the server hashes the password you entered and compares the result to the hash stored on the server before letting you in.

Have any of your current passwords been leaked before? Well, why don’t you go over to haveibeenpwned.com and find out! This website will show you which websites you use have been hacked and if your password was found in the database. If your password was weak, a hacker may have been able to figure out what it is. It is a great idea to change the passwords for the accounts (and everywhere else you use that same password) that appear on that page.

Well, it looks like I have been pwned :(

Attack Example

Have you ever received an email like this? If you have, you’ll know that they usually send you your real password. But how do they know it? They usually find it in an online leak, or they find its “hash” (mentioned above) and figure it out by trying many different passwords until one matches. Sometimes they even send you this email with a password you no longer use. That just shows that if you’ve changed your password lately, they probably don’t actually have access to your account. If you ever get an email like this, change your password, reset your 2FA recovery codes, delete the email, and pray that it won’t end like the Black Mirror episode “Shut Up and Dance”.

Password Manager?

A password manager is simply software that generates strong passwords for your online accounts and stores them for you. The passwords are stored either locally or on a server in the cloud and are encrypted with your master password, which is the main password you use to access your other passwords. Most password managers have fancy features such as auto-filling your login details when logging in and automatically locking your password vault when you’re away. You can use an offline local one like KeePassXC (which is free and open source) or any other cloud-based password manager. I won’t recommend any specifically as they’re mostly commercial products. However, you can check out passwordmanager.com for comparisons between products categorised by their support for operating systems.

Master Password?

Well, the master password for your password manager is crucial. This is a password that has to be strong, easy to remember for yourself, and at the same time long enough to not be guessed.

Now you may be asking, what’s a strong password? A strong password is long, doesn’t contain any personal information, and has a good combination of mixed-case characters, numbers, and special characters.

Example

Say your wife is called Linda and your birthday is on the 21st of January 1977. Linda*21011977 is a horrible password. It is “long”, has a special character, a number, and mixed-case characters. However, anyone who spends maybe 3 minutes on your Facebook profile, Instagram, Twitter, or anywhere you have personal info posted should be able to figure out your birthday and your wife’s name. They can then start guessing and trying different combinations such as linda21011977, 21011977-linda, etc… It goes without saying that this doesn’t apply to your birthday and your wife’s name only. It applies to the birthdays of friends and relatives, places you’ve lived, and things you like. Anything that could be figured out from your public profiles or found through social engineering is a big no.

To avoid confusion, keep in mind that the hacker won’t be using the regular login screen which has a timeout after usually 3 tries. They’d usually have to obtain your password’s hash from a leak or similar.

A good master password would be something like Junkyard-Museum-Dr1ver-Apple which makes absolutely no sense and has nothing to do with you. You can also create a fun story in your head to remember it!

I really shouldn’t have to mention that using that specific password is not a good idea. Generate and use something similar, but not that exact one.

How do I get started?

Choose a password manager
Spend the lesser part of an evening changing the passwords of your most important accounts to ones automatically generated by the password manager
- That could be your bank, social media accounts, game accounts, etc… Anything you value
Use the password manager to login now instead of using your old password
Whenever you use an account which you haven’t changed the password for, change it and add it to the password manager
Enjoy your enhanced security

#3: Protect your Bank Cards

Image credit: Wikimedia Commons

The convenience of using your online debit/credit card is very nice. You don’t have to worry about cash or anything and most of the time you find it necessary to shop online. Needless to say, it is a horrible practice to enter your card number everywhere you want to shop. A lot of online vendors store your payment info in very non-secure ways. Because of that, you’d usually want to avoid entering your payment details anywhere.

What should I do then?

Use a temporary or virtual card when possible
- This feature exists for some banks where you can create a virtual card from your banking app
- It also exists for some mobile wallets and other similar options
Use cash on delivery or pay using another method which doesn’t include handing over your payment details
Use a service such as PayPal and pay using it when possible
- Sure, you’ve handed over your payment details to PayPal. But, would you rather give it to just PayPal or give it to all other websites and services?

#4: Be Careful About What You Post and Where

Image credit: Stockio.com

It can be tempting for a lot of people to share details of their lives online. Sure, it can be fun. But be careful about which details you share. Are you posting pictures with location data in them? Do you have personal data on your profile that is unnecessary?

What you post online is used to identify you. It can be used to guess your passwords, location, where your home is, when your home is empty, etc…

Think of something like this picture above but for your social media profiles. I’ve come up with a short non-exhaustive list of things you should avoid posting:

Location data: Where you live, where you are, where you go often, etc…
Your personal schedule: When you’re out for work, when you’re in the gym, etc…
Personal details: Your birthday, your exact wedding day (i.e. anniversary date), etc…

Malicious actors can use this information against you in many different ways. One of which is impersonation, they may be able to impersonate you using the information you have made public. They could also tie the information with the real world and perform malicious actions such as theft.

Bonus: This isn’t just limited to what you post yourself online. Check the permissions of your phone apps, see if any of them are collecting your location, accessing your files, or recording using your microphone when they shouldn’t.

#5: Protect your Devices

Protecting your computer or phone is perhaps the most crucial step. No security or privacy really matters if your device gets malware or gets stolen, right? I mean, sure, maybe you don’t have important files but aren’t you logged in from your browser on important accounts?

Software Protection

This is quite simple. Invest in a proper anti-malware solution (also known as an antivirus) and keep it updated. There are way too many solutions out there. I will not recommend any of them, just do your research and buy whatever works for you and is within your budget. And please, please, don’t pirate your antivirus. That is one of the worst things you can do.

Physical Protection

This boils down to maybe three things you can do without going through too much hassle:

Always lock your device when you’re not using it or when you leave your desk
Try to keep your device in physically safe places so that they won’t get stolen
- I know, nobody loses their device on purpose. Just remember to be careful
Avoid plugging unknown devices into your devices or vice versa, including:
- USB drives you found somewhere
- Mice/Keyboards/Thunderbolt docks that aren’t from a trusted person
- Public USB charging sockets or stations for phones

Summary

Here is a nice tl;dr for you if you just want the tips right away!

Start using MFA everywhere
Use a password manager, make your passwords stronger
Avoid as much as you can entering your debit/credit card number anywhere
Be very careful what you post online, is it really necessary?
Install an antivirus, keep it updated, avoid plugging your devices into anything untrusted and vice versa

Thanks a lot for reading! I really hope this post has made you realise how the small things you do can go a long way for your online security. I hope it also showed you how small things we dismiss as unimportant may be a security or privacy risk to us! If you ever feel like the things mentioned here are too much effort, just remember that convenience and security are two opposite sides of the scale. You can’t have both fully at any given time, but you can compromise based on how much risk you can afford and how much effort you can put in.

If you liked this post, please share it with your friends who are worried that they are being surveilled by The Galactic Empire.

PS/2 vs USB, FIGHT!

2023-05-27T00:00:00+02:00

Last month, I published 3 blog posts on how computers work on a low level from a hardware perspective. Such a low level that we built an entire computer on breadboards. Now, you may be wondering how we interact with a computer. Well, you already know how computers execute instructions to process values in memory and you’ve probably used a keyboard and mouse before… But, how does a computer accept and process inputs from keyboards and mice?

Not All Input Devices are Equal

Two Ways of Input

Since the dawn of modern computing (as far back as the 1950s), we have generally had two methods of handling user input: interrupts and device polling. Both of these methods can be used to receive user inputs, even though they are very different from a hardware and software perspective.

Interrupts

Interrupts (or at least hardware ones) are a way for devices to tell the CPU that they have something for it which it needs to process. A device that uses interrupts basically says to the CPU: “Hey! Stop what you’re doing. I have data for you” and the CPU pauses whatever it is doing, processes the input, and then continues execution right where it stopped. How rude of the device…

The CPU doesn’t have to waste any time to check if there is any new data. It is just interrupted whenever something is ready for it to process. And when it is interrupted, it executes an interrupt handler. After that, it returns to whatever it was doing.

A pseudo-implementation in which the CPU processes data from an interrupt-based device may be something like this:

//Interrupt Handler
void handleInput(void)
{
	//Load the data and process it
}

//Define handleInput() as an interrupt handler. This is specific to AVRs. But, you get the idea.
#pragma interrupt_handler handleInput

int main()
{
	doStuff(); //The CPU doesn't care about anything as long as it isn't interrupted
}
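If you want something you can compile on a regular PC, here is a minimal sketch that fakes the same idea with standard C signals. It is only a stand-in under a few assumptions: `raise(SIGINT)` plays the role of the device pulling the interrupt line, and the names `inputs_handled`, `handle_input`, and `run_with_interrupt` are made up for illustration.

```c
#include <assert.h>
#include <signal.h>

/* Counts how many times the handler ran. */
static volatile sig_atomic_t inputs_handled = 0;

/* Stand-in for the interrupt handler: "load the data and process it". */
static void handle_input(int signum)
{
    (void)signum;
    inputs_handled = inputs_handled + 1;
}

/*
 * Does `work_items` units of work. A hypothetical device interrupts once,
 * just before unit `interrupt_at`. Returns how many interrupts were handled.
 */
int run_with_interrupt(int work_items, int interrupt_at)
{
    volatile long busy = 0;

    assert(work_items >= 0);
    inputs_handled = 0;
    signal(SIGINT, handle_input);

    for (int i = 0; i < work_items; i++) {
        if (i == interrupt_at)
            raise(SIGINT);   /* the device says: "Hey! I have data for you" */
        busy += i;           /* the main code never checks the device itself */
    }

    signal(SIGINT, SIG_DFL); /* restore the default behaviour */
    return (int)inputs_handled;
}
```

Notice that the loop never asks whether there is input. The handler only runs when the "device" raises the signal, which is the whole point of interrupts.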

Well, now you may be wondering: How would a device interrupt the CPU? The answer is simple: through a hardware interrupt.

Hardware Interrupts

A hardware interrupt is an electrical signal sent to the CPU via a specific pin or connection. Let’s simplify things a bit and look at the original Intel 8086 CPU’s pin-outs.

We can see pin 18 labelled as INTR which is short for interrupt (sometimes also labelled as INT). The input device would send a HIGH signal to pin 18 to interrupt the CPU. Depending on the application, the developer would have already implemented the handler for the device and that handler would be executed.

The NMI pin also exists, which stands for Non-maskable Interrupt. But, let’s just not get into that today, ok?

Polling

Devices that use polling, on the other hand, are a bit more polite. They wait for the CPU to check whatever the device has for it. The CPU in this case would keep asking the device: “Do you have something for me? Huh? Pls reply” and the device would then send whatever data it has. This data could be a result of the user typing something on a keyboard, moving the mouse, etc…

This whole process is rather wasteful for the CPU as it keeps wasting time to check if the device has anything for it to process. However, it is much cheaper to implement.

A pseudo-implementation where the CPU is polling a device would be kind of like this:

int main()
{
	doStuff();
	/*
	 The CPU would waste some cycles here to check on the device
	 even if the device has nothing useful going on
	*/
	if (somethingToProcess)
	{
		//Load the data and process it
	}
	doMoreStuff();
}
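To get a feel for how wasteful this can be, here is a small runnable sketch that counts the checks that find nothing. Everything in it is hypothetical: the `device` struct, the `poll_device` and `count_wasted_polls` names, and the idea that the device has data once every `period` ticks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device: it only has data ready on some ticks. */
struct device {
    bool ready;
    int  data;
};

/* One poll: ask the device if it has something; true if data was read. */
static bool poll_device(struct device *dev, int *out)
{
    if (!dev->ready)
        return false;      /* wasted check: nothing to process */
    *out = dev->data;
    dev->ready = false;
    return true;
}

/*
 * Runs `ticks` iterations of the main loop. The device becomes ready every
 * `period` ticks. Returns the number of polls that found nothing.
 */
int count_wasted_polls(int ticks, int period)
{
    struct device keyboard = { false, 0 };
    int wasted = 0;
    int value = 0;

    assert(period > 0);
    for (int t = 0; t < ticks; t++) {
        if (t % period == 0) {          /* the user pressed a key */
            keyboard.ready = true;
            keyboard.data = t;
        }
        if (!poll_device(&keyboard, &value))
            wasted++;
    }
    (void)value;
    return wasted;
}
```

With 100 ticks and a key press every 10 ticks, `count_wasted_polls(100, 10)` returns 90: nine out of every ten checks were for nothing.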

Common Connectors

USB

USB is a very commonly-used example of a protocol which uses polling. Devices which are connected via USB are polled regularly to check if the device has anything to send over.

We are ignoring the software side of things here. Things are a bit different when it comes to software, and sometimes USB events are labelled as “interrupts” due to how software treats them, even though, physically, USB connections are always polled.

Almost every peripheral you buy these days will probably be using USB, especially after the introduction of the type C connector. And why not? It is convenient, easy to plug in and out, cheap to implement, and hot-pluggable, which means you can plug/unplug your USB device anytime.

The only downside of USB is that it uses polling instead of interrupts. In theory, that makes it slower compared to interrupt-based connectors. How slow though? Does it even matter? In short, not really. For the long(er) answer, continue reading!

You can watch the following video to understand on a low level how USB works.

PS/2

PS/2 on the other hand, although not as popular nowadays due to the use of USB being more prevalent, still exists and was very popular up until USB took over in the late 1990s/early 2000s.

It uses interrupts to communicate with the system and requires virtually no special drivers. Almost any PS/2 keyboard or mouse you connect to your computer will just work out of the box. That is, if you have a PS/2 port.

If that’s the case, why don’t we all use PS/2? Well, if you’ve ever used it, you’ll probably know why. The connector isn’t really easy to plug/unplug, and is also very bulky compared to modern USB. It also isn’t hot-pluggable.

PS/2 Ports?

Image Credit: Wikimedia Commons by Jud McCranie

Well, they are still being used by some people as they argue that they are “better” and faster than USB. For example, some professional gamers still use PS/2 keyboards as they are relatively faster compared to USB. Another reason is that PS/2 ports are very simple and work as soon as the computer starts. They can help greatly in debugging systems that are failing to start, as USB sometimes doesn’t work in those cases.

Even as of 2022 and 2023, some top of the line motherboard manufacturers still include dedicated PS/2 ports.

You can watch the following video to understand on a low level how PS/2 works.

Which is better?

None. You can use either. Both connectors have their pros and cons, use whatever suits your use case.

The summary is that PS/2 is in theory slightly faster than USB. On the other hand, USB is much more convenient to use with features like being hot-pluggable and being able to use USB ports for any device (not just keyboards and mice). However, CPUs these days are way faster than you can imagine. An interrupt from a keyboard or polling a USB keyboard really shouldn’t be much of a concern at all.

If you have issues with your computer, especially with USB drivers or controllers, then using a PS/2 keyboard may just be your thing. Or at least temporarily, until you have things figured out. Also, I read somewhere that some people still use PS/2 for security purposes and disable USB completely since PS/2 is strictly used for keyboard and mouse input and can’t be used for any other purposes.

This could be a good idea? I’m really not sure about how effective this is. But, it may make sense if all you need for a specific machine is a keyboard and mouse. Using PS/2 allows you to disable USB, which in turn disables USB drives and bad USB devices or “Rubber Duckies”. And please, don’t get me started on creating a “Rubber Ducky” using PS/2 because PS/2 isn’t even hot-pluggable so you’d have to restart the whole system and then… yeah… just forget it.

Most people just use USB for convenience, even professional gamers that care about the tiniest of delays. I definitely am using USB, at least. I really wouldn’t want to use PS/2 again. It’s one of the old technologies that I never really miss.

One time as a child, I bent the pins on my favourite PS/2 keyboard and since then I haven’t been fond of connectors with exposed pins. (Yes, I am looking at you, VGA)

Thanks for reading! I hope you enjoyed reading this blog post. If you did, please share it with your friends who enjoy interrupting their CPUs.

The 4043 - Part 3: Days 8 to 10, Well, how do they understand instructions?

2023-04-09T00:00:00+02:00

In the last two posts, we covered quite a bit of the von Neumann architecture and how The 4043 breadboard computer maps to it. This is the third post in a 3-part series on computer architecture and how I built an 8Bit breadboard computer inspired by Ben Eater. If you haven’t read posts one and two, go read them first.

In this post, we will be concluding the series. We will discuss the output module (highlighted in green) and the control logic for the computer (everything else that isn’t ticked, and the Flags Register which we skipped last time). Let’s get started!

Refresher

A quick refresher, we talked in the first post about the “bare minimum” computer which needs:

CPU:
- Registers (✓)
- ALU (✓)
- Control Unit (—)
RAM (✓)
Bus to connect them together (✓)
Clock to sync all of these components together (✓)

Key:

✓ - Discussed
— - Still To be Discussed

By now, we have an idea of how all of these modules work on their own and how they work with each other. What controls them, though? How does a register know when to load a value? When would RAM output a value it is storing at a certain address? We will get into these topics in this post. First, I want to talk about one last module that was built before the control logic: The output module.

Input/Output

I/O is part of the von Neumann architecture, and that makes sense as a computer won’t be all that useful without some form of input and output. By I/O here, we don’t just mean user input and output such as keyboards, mice, speakers, and monitors. We mean any device that communicates with the CPU of the computer and vice versa. This includes Ethernet controllers, storage device controllers, GPUs, and pretty much anything that the CPU communicates with.

I/O is usually done through ports. Ports are basically an interface between a computer and any other device. This device can be part of your personal computer overall, but not part of the “bare minimum” computer we’ve been talking about. For example, your computer will function without a storage device controller, but I bet you won’t be able to do much work without a functioning OS which requires some form of storage device to be loaded from.

The summary here is that all devices, even “internal” ones, communicate with the CPU through an I/O port. Modern CPUs mostly have this “port” as a memory location (in the case of memory-mapped I/O) or an actual port from a separate address space (in the case of port-based I/O). This is a bit of a big topic, so I won’t get into it.

Modern computers use chipsets on the motherboard and a lot of other devices to help the CPU communicate with all these devices. We won’t get into that, but it is generally a nice topic to look up.

I/O in The 4043

In our breadboard computer, we have only one output device and no input devices. The output device is a seven-segment display module which can be used to display whatever is in the A register. As for input, we don’t have any interactive user input. The user can only interact with the computer via programming it through the memory module. i.e. manually placing the opcodes for instructions and data in memory.

We are communicating with the output module using the bus, and the device has a load signal which allows it to read what is on the bus and display it. This may sound simple, but it becomes really troublesome if we want to add other I/O devices, as each device would need its control pins hard-wired to the CPU. This is a limitation in our breadboard computer’s design, but it really isn’t that hard to implement more I/O devices. We’d just have to add more microcode (explained later) and connections to handle these instructions.

This is why typically CPUs use a separate chip for I/O which can be used to address and communicate with different devices. An example of such a chip is the Intel 8255 chip which was commonly used with the original Intel 8086 processor.

At the end of this post, I have added a link to a Reddit post where someone added more I/O and parts to the computer and was able to get the 8Bit breadboard computer to run the game Snake.

Now we have an idea of everything we need to know in order to start talking about the control unit.

The Control Unit

The control unit is the actual “manager” of the CPU, in the sense that it tells each module when and what to do so that each instruction is executed properly. In order to understand how the control unit operates, we need to differentiate between three things: a clock cycle, an instruction cycle, and a micro-instruction.

Clock Cycle

A clock cycle is a single pulse of the clock signal. i.e. a single transition from 0V to 5V and then back to 0V again.

Each coloured section here is a single clock pulse

Instruction Cycle

Every single CPU has an instruction cycle. This cycle is different between different architectures. But, generally, it follows something like the following figure.

The instruction is fetched from memory. If you remember, in the last post we touched on how the program counter is used to determine which address from memory the due instruction will be fetched from
The instruction is decoded by the control unit. This is the stage where the control unit understands what the instruction will do. Based on the instruction, the control unit sends control signals to prepare the instruction for execution. For example, it would fetch any required data from memory
The instruction is finally executed. The control unit sends control signals to all modules that are relevant to this execution. For example: If the instruction was going to subtract, it would send a subtract (SUB) signal to the ALU
This cycle is repeated for as long as the computer hasn’t been turned off or halted for any reason

Every single instruction ever executed on a computer follows an instruction cycle similar to this. Even the popular nop instruction which stands for “No Operation”, still gets to the “Execute” phase and the control unit would then let the components be idle until the current instruction cycle finishes execution.

A little bit of trivia: in the x86 architecture (before the 64-bit iterations), nop (opcode 0x90) was actually just an alias for xchg eax, eax, which basically swaps the value in the eax register with itself, to end up doing literally nothing.

Micro-Instructions

A micro-instruction is what each instruction divides into. Each instruction is basically a set of actions the computer needs to do. Each action is considered a micro-instruction.

Now, is it safe to assume that each instruction cycle takes 3 clock cycles? No. Each instruction cycle consists of several micro-instructions. In our breadboard computer, we have a max of 8 micro-instructions per instruction. However, actual instructions that are implemented can take anywhere from 2 to 5 micro-instructions. We will get to know why it requires a minimum of 2 in a bit. Each micro-instruction takes one clock cycle.

Micro-instructions are usually labelled as Tₙ. Let’s take a look at an example: Say you want to move a value to the A register, the computer would have to do the following micro-instructions:

The Fetch Phase:

T₁: Move the content of the program counter into the memory address register
- This is to address the instruction that is to be executed, which is kept track of using the program counter
- At this point, the memory data register already loaded the value at the memory address since it reads the address from the MAR automatically

The Decode Phase:

T₂: The MDR is set to output its content, the Instruction register is set to load what’s on the bus, and the program counter is incremented
- Once the Instruction register has loaded the value in memory (the instruction), it is automatically decoded by the control unit and the execution phase is prepared

The Execution Phase:

T₃: The value is output from the instruction register onto the bus and the A register is set to load what’s on the bus
T₄: The computer does nothing
T₅: The computer does nothing
Step 6 (actually back to T₁): Rinse and repeat!

Now to answer the question, why does each instruction need at least 2 micro-instructions? We need one for fetching and one for decoding. Depending on what was decoded, we may or may not need more micro-instructions for the execution phase.
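The five steps above can be sketched in C. This is just an illustration under my own assumptions: the `machine` struct and function names are hypothetical, the instruction being traced is an `ldi`, and only the low 4 bits of the instruction register are assumed to reach the bus at T₃ (which matches the 4-bit operand we see later).

```c
#include <assert.h>
#include <stdint.h>

/* Just the registers this example touches, plus the 16 bytes of RAM. */
struct machine {
    uint8_t pc, mar, ir, a;
    uint8_t ram[16];
};

/* Executes micro-instruction T1..T5 of an `ldi` instruction. */
void ldi_micro_step(struct machine *m, int t)
{
    assert(t >= 1 && t <= 5);
    switch (t) {
    case 1: /* Fetch: PC -> MAR */
        m->mar = m->pc & 0x0F;
        break;
    case 2: /* Decode: memory -> IR, and the PC counts */
        m->ir = m->ram[m->mar];
        m->pc = (m->pc + 1) & 0x0F;
        break;
    case 3: /* Execute: operand in the IR -> bus -> A register */
        m->a = m->ir & 0x0F;
        break;
    default: /* T4 and T5: the computer does nothing */
        break;
    }
}

/* Runs one whole instruction cycle and returns the value left in A. */
uint8_t run_ldi_cycle(struct machine *m)
{
    for (int t = 1; t <= 5; t++)
        ldi_micro_step(m, t);
    return m->a;
}
```

If address 0 holds `0x57` (0101 for `ldi`, 0111 for 7), one call to `run_ldi_cycle` leaves 7 in the A register and the program counter pointing at address 1.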

The Control Word

But, now you’re probably wondering: “What is a micro-instruction actually doing?” How does it tell the A register to load what’s on the bus or tell the program counter to count?

The answer is simply through control signals, which the control word consists of.

Control Signals

The control word consists of 17 signals, 16 of which are relevant to our microcode. Our computer never uses the OUTB signal (which outputs whatever is in the B register to the bus), I just added it for completeness. Thankfully, it sits right in the middle between the high 8 signals and low 8 signals so we can consider it a divider.

Each signal can have two values: high or low. When a signal is high it functions, effectively signalling the module what to do.

Signal	Description	Signal	Description
`HLT`	Halt Execution	`LDB`	Load what’s on the bus into the B reg.
`LDMA`	Load what’s on the bus into the MAR	`OUTΣ`	Output the ALU’s output to the bus
`OUTMEM`	Output what’s in the MDR onto the bus	`SUB`	Set the ALU to subtract instead of add
`LDMEM`	Load what’s on the bus into the MDR	`PCC`	Make the PC count
`OUTI`	Output what’s in the IR onto the bus	`LDPC`	Load what’s on the bus into the PC
`LDI`	Load what’s on the bus into the IR	`OUTPC`	Output what’s in the PC onto the bus
`OUTA`	Output what’s in the A reg. onto the bus	`LDO`	Load what’s on the bus into the Output
`LDA`	Load what’s on the bus into the A reg.	`FLD`	Load the state of the Flags Reg.

To sum it up, each instruction consists of micro-instructions, and these micro-instructions consist of control signals that are enabled/disabled based on what the computer needs to do. This is exactly how our computer works (and most other computers do).
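One way to picture a control word in code is as a 16-bit value with one bit per signal. The sketch below is only that, a sketch: the bit positions are an assumption (I simply followed the order of the table and left the unused OUTB out), `OUTS` stands in for OUTΣ, and the helper names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* One bit per control signal. The bit positions are an assumption. */
enum control_signal {
    HLT  = 1 << 15, LDMA = 1 << 14, OUTMEM = 1 << 13, LDMEM = 1 << 12,
    OUTI = 1 << 11, LDI  = 1 << 10, OUTA   = 1 << 9,  LDA   = 1 << 8,
    LDB  = 1 << 7,  OUTS = 1 << 6,  SUB    = 1 << 5,  PCC   = 1 << 4,
    LDPC = 1 << 3,  OUTPC = 1 << 2, LDO    = 1 << 1,  FLD   = 1 << 0
};

typedef uint16_t control_word;

/* True if the given signal is high in this control word. */
bool signal_is_high(control_word word, enum control_signal sig)
{
    assert(sig != 0);
    return (word & sig) != 0;
}

/* Counts how many signals are high in this control word. */
int count_high_signals(control_word word)
{
    int n = 0;
    for (; word != 0; word >>= 1)
        n += word & 1;
    return n;
}

/* T1: the PC is output onto the bus and the MAR loads it. */
control_word fetch_t1(void) { return LDMA | OUTPC; }

/* T2: memory is output, the IR loads it, and the PC counts. */
control_word fetch_t2(void) { return OUTMEM | LDI | PCC; }
```

A micro-instruction is then nothing more than a few of these bits OR-ed together, and every other signal stays low.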

We will visualise this in a bit and trace a program written in 4043 assembly.

Implementing the jmp

I remember asking you to think of a way to implement a jmp instruction. Have you thought of one?

Well, now that we know the control signals for the computer, we can actually use the LDPC signal and, depending on our architecture, output the value of some register or memory location to the bus to load it in the program counter. This effectively makes the CPU fetch the next instruction from the address that was loaded into the program counter, making it the next instruction to be executed.

The 4043 Assembly

Our computer has a limited set of simple instructions. Limited they are, but the computer is still Turing complete and can compute pretty much anything. It does have two major limitations, though: its speed and the size of its RAM, which is only 16 bytes. Some people even argue that this computer isn’t really an 8Bit computer since it isn’t able to address 256 bytes of memory and can’t have an instruction set of more than 16. I think it’ll do for now regardless of that.

Instruction	Opcode	Description	Summary
`nop`	`0000`	No Operation
`lda` addr	`0001`	Load value in addr in memory into the A register	A = [addr]
`add` addr	`0010`	Add value in addr in memory into the A register	A += [addr]
`sub` addr	`0011`	Subtract value in addr from the A register	A -= [addr]
`sta` addr	`0100`	Store the value in the A register in addr in memory	[addr] = A
`ldi` imm	`0101`	Loads the value imm to the A register	A = imm
`jmp` addr	`0110`	Jumps to addr, i.e. sets the program counter to addr	PC = addr
`jc` addr	`0111`	Same as `jmp`, but if the Carry Flag (CF) is set (high)
`jz` addr	`1000`	Same as `jc`, but for the Zero Flag (ZF)
`out`	`1110`	Outputs the content of the A register on the 7-segment display
`hlt`	`1111`	Halts execution

But, how does the CPU map which instructions use which micro-instructions on which cycle?

CPU Microcode

For each instruction, we have five micro-instructions. We have a counter that counts from 0 to 4 for each instruction, thus allowing the computer to know which micro-instruction it is executing. Depending on which micro-instruction we’re on and the instruction loaded, these values are used to address into an EEPROM. The EEPROM stores values that represent the signals we mentioned before.

For example, we can program the EEPROMs with values such as the following to implement instructions.

microcode[16][8] = {
  //T1          //T2             //T3        //T4         //T5           //T6 - T8, not used
  {LDMA|OUTPC,  OUTMEM|LDI|PCC,  0,          0,           0,             0, 0, 0},   // 0000 - NOP
  {LDMA|OUTPC,  OUTMEM|LDI|PCC,  OUTI|LDMA,  OUTMEM|LDA,  0,             0, 0, 0},   // 0001 - LDA
  {LDMA|OUTPC,  OUTMEM|LDI|PCC,  OUTI|LDMA,  OUTMEM|LDB,  OUTS|LDA|FLD,  0, 0, 0},   // 0010 - ADD
  ... //More instructions, check GitHub

You can see the full microcode on Ben Eater’s GitHub.

Summary is:
Micro-Instruction + Instruction's opcode = Address
Address into EEPROM + state of the Flags Register = control signal values
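As a rough sketch of that summary, here is how the pieces could be packed into a single EEPROM address. The exact bit layout is an assumption on my part (step in bits 0 to 2, opcode in bits 3 to 6, then the carry and zero flags on top), and `ucode_address` is a hypothetical name. The real wiring may order these bits differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Builds the EEPROM address for one micro-instruction.
 * Assumed layout: [zero flag][carry flag][4-bit opcode][3-bit step]
 */
uint16_t ucode_address(uint8_t opcode, uint8_t step, bool carry, bool zero)
{
    assert(opcode < 16); /* 4-bit opcode */
    assert(step < 8);    /* up to 8 micro-instructions per instruction */

    return (uint16_t)((zero << 8) | (carry << 7) | (opcode << 3) | step);
}
```

Under this layout, the fifth micro-instruction (step 4) of `add` (opcode 0010) with both flags low lives at address 20. Setting a flag moves the lookup to a different part of the EEPROM, which is how the same opcode can produce different control signals for `jc` and `jz`.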

Your First 4043 Program

Let’s write a small 4043 program! How about a program that keeps adding 5 to the A register, outputs the result on the seven-segment display, and then halts execution once it goes past 255 (overflows)?

The program is pretty simple, but it effectively uses almost 40% of our available memory. This really shows the limitation of The 4043.
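If you don’t have a breadboard computer lying around, the program can be tried out on a tiny emulator. Treat the following as a sketch under stated assumptions: the memory layout is a reconstruction from the instructions traced below (`add 5` at address 0, then `out`, `jc 4`, `jmp 0`), plus an assumed `hlt` at address 4 and the value 5 stored at address 5. The struct and function names are made up, and the flags are assumed to only change on `add` and `sub`.

```c
#include <assert.h>
#include <stdint.h>

/* Opcodes from the instruction table (upper 4 bits of each byte). */
enum { OP_NOP = 0x0, OP_LDA = 0x1, OP_ADD = 0x2, OP_SUB = 0x3, OP_STA = 0x4,
       OP_LDI = 0x5, OP_JMP = 0x6, OP_JC = 0x7, OP_JZ = 0x8,
       OP_OUT = 0xE, OP_HLT = 0xF };

#define INSTR(op, arg) ((uint8_t)(((op) << 4) | ((arg) & 0x0F)))

struct cpu {
    uint8_t ram[16];
    uint8_t a, pc, display;
    int carry, zero, halted, outputs;
};

/* Fetches, decodes, and executes exactly one instruction. */
void cpu_step(struct cpu *c)
{
    uint8_t ir = c->ram[c->pc & 0x0F];
    uint8_t opcode = ir >> 4, arg = ir & 0x0F;
    unsigned sum;

    c->pc = (c->pc + 1) & 0x0F;
    switch (opcode) {
    case OP_LDA: c->a = c->ram[arg]; break;
    case OP_ADD:
    case OP_SUB:
        /* SUB is done the way an adder does it: A + ~value + 1 */
        sum = (opcode == OP_ADD) ? c->a + c->ram[arg]
                                 : c->a + (uint8_t)~c->ram[arg] + 1u;
        c->carry = sum > 0xFF;
        c->a = (uint8_t)sum;
        c->zero = (c->a == 0);
        break;
    case OP_STA: c->ram[arg] = c->a; break;
    case OP_LDI: c->a = arg; break;
    case OP_JMP: c->pc = arg; break;
    case OP_JC:  if (c->carry) c->pc = arg; break;
    case OP_JZ:  if (c->zero)  c->pc = arg; break;
    case OP_OUT: c->display = c->a; c->outputs++; break;
    case OP_HLT: c->halted = 1; break;
    default: break; /* nop and unused opcodes do nothing */
    }
}

/* Loads the reconstructed program and runs it until hlt or max_steps. */
struct cpu run_add5_program(int max_steps)
{
    struct cpu c = {0};

    assert(max_steps > 0);
    c.ram[0] = INSTR(OP_ADD, 5); /* A += [5] */
    c.ram[1] = INSTR(OP_OUT, 0);
    c.ram[2] = INSTR(OP_JC, 4);  /* overflowed? go to hlt */
    c.ram[3] = INSTR(OP_JMP, 0);
    c.ram[4] = INSTR(OP_HLT, 0);
    c.ram[5] = 5;                /* the value being added */

    for (int i = 0; i < max_steps && !c.halted; i++)
        cpu_step(&c);
    return c;
}
```

Under these assumptions, the emulator displays 52 values (5, 10, and so on up to 255, and then 4 once the addition overflows) before it halts. The program and its data take up 6 of the 16 bytes of RAM.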

The state of the computer after this program finishes executing, notice the HLT signal

Let’s trace what happens in each micro-instruction!

In the following animations, the LEDs at the bottom left indicate which micro-instruction we’re executing. Each component of the computer has its control lines connected to the Control Unit. Look at the changes between each of the diagrams and it should map to what we discussed earlier.

add 5

This instruction in binary is 0010 (add), 0101 (5), making it 00100101.
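In code, building that byte is just a shift and an OR. Here is a minimal sketch with hypothetical helper names (`encode`, `opcode_of`, `operand_of`), assuming the opcode always sits in the upper 4 bits and the operand in the lower 4.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a 4-bit opcode and a 4-bit operand into one instruction byte. */
uint8_t encode(uint8_t opcode, uint8_t operand)
{
    assert(opcode < 16 && operand < 16);
    return (uint8_t)((opcode << 4) | operand);
}

/* The upper 4 bits tell the control unit which instruction this is. */
uint8_t opcode_of(uint8_t instruction)
{
    return instruction >> 4;
}

/* The lower 4 bits are the address or immediate value. */
uint8_t operand_of(uint8_t instruction)
{
    return instruction & 0x0F;
}
```

For example, `encode(0x2, 5)` gives `0x25`, which is 00100101 in binary.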

We can see that in each micro-instruction, the control unit is enabling components based on the CPU microcode. The signals are determined through a combination of the micro-instruction counter, the opcode coming from the Instruction register, and the values in the flags register. Let’s take a look at two more instructions before we talk about one final thing and then wrap up this really long post.

We will skip looking at jc 4 because it isn’t really that different from a jmp. The only difference is that jc will only work if the Carry flag is set. Look at the jmp 0 section and from there, you should be able to figure out how jc works.

out

For this instruction, we will skip T₁ and T₂ as they are the same for every instruction. All we’re going to look at is T₃ to T₅. As of the start of T₃, the instruction has been fetched and is now in the instruction register and the program counter has been incremented.

This instruction in binary is 1110 (out) and the rest of the bits don’t really matter but I like to set them to 0 when programming the computer, making it 11100000.

In this instruction, after T₃, the control unit isn’t doing anything interesting. It already set the control lines at T₃ and the only remaining thing was for the clock to pulse. That’s why at T₄ we see the output module’s display change to a 5.

jmp 0

For this instruction, we will also be skipping T₁ and T₂ as they are the same for every instruction. This instruction in binary is 0110 (jmp), 0000 (0), making it 01100000.

As we can see, all this unconditional jmp does is load the specific address (0) in the program counter. We can see that T₁ and T₂ of the next instruction are basically fetching and decoding add 5 which, indeed, is the instruction at address 0.

A picture of the jmp instruction, notice the OUTI and LDPC signals

The Build

On days 8, 9, and 10, I built the EEPROM programmer, the output module, and the control unit.

Module 6: The EEPROM Programmer

The EEPROM programmer is used to write values to the EEPROM easily using an Arduino. There wasn’t much interesting about building this module. However, it was pretty useful and its Arduino code was easy to understand so modifying it was pretty easy. This module took a total of 2 and a half hours. At this point, I had posted the Day 7 update on Twitter.

Module 7: The Output Module

The output module displaying the numbers of the Fibonacci sequence

This module uses an EEPROM to replace the combinational logic to decode the values which are to be used to display numbers on the seven-segment display. Ben has a great video on that topic.

This module uses a separate clock which is used to switch between the 4 seven-segment displays in order to multiplex them and drive them using a single EEPROM instead of 3. The output register is still however connected to the main clock of the system and is synced with the other modules.
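To make the “EEPROM instead of combinational logic” idea more concrete, here is a sketch of what such a lookup could compute. All of it is assumed for illustration: the address layout (the 8-bit value in bits 0 to 7 and the display being driven in bits 8 and 9), the common a-to-g segment encoding in bits 6 down to 0, the fourth display being left blank, and the name `display_rom`.

```c
#include <assert.h>
#include <stdint.h>

/* Segment patterns for 0-9, with segments a..g in bits 6..0 (assumed). */
static const uint8_t DIGITS[10] = {
    0x7E, 0x30, 0x6D, 0x79, 0x33, 0x5B, 0x5F, 0x70, 0x7F, 0x7B
};

/*
 * Returns the byte the EEPROM would hold at `address`.
 * Bits 0-7: the value to display. Bits 8-9: which display is lit right now
 * (0 = ones, 1 = tens, 2 = hundreds, 3 = left blank in this sketch).
 */
uint8_t display_rom(uint16_t address)
{
    uint8_t value = address & 0xFF;
    uint8_t place = (address >> 8) & 0x3;

    assert(address < 1024); /* 10 address bits in this sketch */
    switch (place) {
    case 0:  return DIGITS[value % 10];
    case 1:  return DIGITS[(value / 10) % 10];
    case 2:  return DIGITS[value / 100];
    default: return 0x00;
    }
}
```

The separate clock would cycle the upper address bits through the displays quickly enough that all of them appear lit at once, while the lower bits come from the output register.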

I’d also like to mention that the wiring for this module was absolute hell. Remember when I mentioned that the wires were too thick in the first blog post? This is the part where I was barely able to get by and wire them properly. You can even see some of the wires heavily scuffed from how difficult it was to place them.

This module took 4 hours and a half to build. At this point, we had reached Day 8 on Twitter.

Module 8: The Control Logic

This module by far was the hardest and the longest one to build. It took a total of 7 and a half hours, divided as follows:

The control word and its signals: 2 and a half hours
CPU microcode and control lines: 4 hours
The Flags register and all of its connections: 1 hour

When connecting the control word, I connected them in a different order from what Ben did. Using a multimeter, I was able to figure out what was connected where and I modified the code to match what I had connected.

Finally, this module spanned the 2 final updates of Day 9 and Day 10 on Twitter.

Conclusion

The trilogy comes to an end! In this series, we talked about computer architecture, a little bit of electronics, and saw my journey of building The 4043: An 8Bit computer on a breadboard.

With the project now finished, here is a picture of the finished build.

I also filmed the following video to showcase the whole build and trace the execution while explaining it.

Some Fun Facts

The computer has 16 bytes of RAM
The computer draws 800mA from my power supply on idle
Overall, I used approximately 42 meters of wiring in this build
The computer in total has 63 chips, including the two shift registers used in the EEPROM programmer
The build took a total build time of approximately 34 and a half hours. This excludes the time initially spent to research and plan, find chip replacements, and the things I learned the first time I attempted this project

Now What?

Well, for myself, I’ll be taking a bit of a break because this series and the project were a lot of effort. As of writing this exact line, I started connecting the first chip and wire exactly 17 days ago.

As for you, my dear reader, don’t forget to also take a break and drink lots of water :)

You can also possibly check the following:

Check out the top posts of all time on Ben Eater’s subreddit, where you can find a guy who made his 8Bit computer run the game Snake, or this amazing beautiful build, and a lot of other fun stuff
Check out the following courses on OpenSecurityTraining, if you want to learn more about modern hardware, assembly, and computer architecture:
- Architecture 1001: x86-64 Assembly - A great primer on x86 assembly and some computer science fundamentals such as boolean logic and other topics needed to get started
  - There is Introductory Intel x86: Architecture, Assembly, Applications, & Alliteration - The older version of this course if that’s more your thing
- Architecture 2001: x86-64 OS Internals - An awesome course on x86 architecture. How memory actually works, how the OS uses it, etc… Absolutely outstanding
  - There is also Intermediate Intel x86 - The older version of this course if that’s more your thing
Computer Science Crash Course, which is actually pretty fun and touches on a lot of the topics we discussed here but with many more added topics
Check out the rest of Ben Eater’s videos, he has a series where he plays with a 6502 processor which is really fun
If you want to go deeper, you can check out this article, where Ken Shirriff reverse engineers the original Intel 8086 processor’s hardware to show you where its components are
- There are also other blog posts on reverse engineering the microcode for these processors on his blog, which I think is really cool

And that’s it! Thank you so much for reading, I really, really hope you learned something new from this series and from following the updates on Twitter. I hope you enjoyed it.

I’ve worked really hard to prepare and document a lot of this project in advance to be able to write these blog posts. I really hope you enjoyed them and that they taught you something new somehow!

Please share this series with your friends who like adding pointless instructions to their 8Bit CPUs.

The 4043 - Part 2: Days 5 to 7, But, how do they execute code?

2023-04-07T00:00:00+02:00

In the last post, we talked a bit about computer architecture and how computers generally work. We also got to know a little about how the 8Bit breadboard computer maps to all of that, discussing the clock, registers, and ALU (checked in the image below). If you haven’t already, I recommend reading the first part before continuing with this post.

In this post, we’ll talk about the RAM (highlighted in cyan), the bus (highlighted in blue), and one of the parts of the control unit which is the program counter (top left).

Refresher

A quick refresher, we talked last time about the “bare minimum” computer which needs:

CPU:
- Registers (✓)
- ALU (✓)
- Control Unit (—)
RAM (—)
Bus to connect them together (—)
Clock to sync all of these components together (✓)

Key:

✓ - Discussed
— - Still To be Discussed

By now, we have an idea of how the clock syncs these modules together, how registers are used to store data, and how the ALU operates. Now, we have to ask ourselves: “What really is RAM?” and “What is RAM used for?”.

Random Access Memory

You may be reminded of your favourite Daft Punk album, but bear with me for a moment. Computers generally have two types of memory, volatile and non-volatile. Volatile memory such as RAM loses all of its content upon being powered off. Non-volatile memory such as ROM, NVRAM, and other types we won’t get into stores data in a more permanent way and thus keeps its content after being powered off.

Why use RAM?

Well, you may now be wondering, what is RAM used for? RAM is used to store mainly two things: code and data.

Code

By code here I don’t mean your C or Python lines of code. I mean computer instructions which are stored in binary. As an example, say you write in your C code: x++, which basically increments the x variable by 1.

For simplicity’s sake, let’s assume that the CPU will somehow magically execute this code line using a single instruction. Assuming that you’re using an Intel or AMD modern processor, which both use x86 architecture, we luckily have an instruction for that!

The instruction is inc, which stands for increment. Incrementing the eax register in 32-bit code has an opcode of 0x40^[1] or 01000000 in binary. This value would then be stored in RAM at some place (address) from which the CPU can read this instruction, decode it, and then execute it. Later, we will look at exactly how these instructions are decoded and executed. For now, let’s just note that they reside in RAM waiting to be fetched by the CPU.

High Level Code	Assembly	Opcode	Content of RAM
`x++`	`inc eax`	`0x40`	`0100 0000`

^[1] I know that this opcode (0x40) is only for incrementing the eax register in 32-bit code, not any inc instruction (0x41, for example, increments ecx), and that 64-bit code encodes inc differently, please don’t @me. I am trying to simplify things here.

Data

Say you wanted to create a global variable for your program. Where does it get stored? It gets stored in RAM. Specifically, in a data segment. Modern computers divide RAM into segments, but let’s not get into that. The bottom line is that data such as variables and constant values are also stored in RAM, not just program code.

For example, let’s assume you created a global variable: int count. A fixed address (fixed for the duration of the program’s execution) in RAM will be decided by your OS to store the contents of that variable. Depending on the instructions being executed, the CPU may have to access this address in RAM to retrieve this value or to change it.
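To make the idea of code and data sharing the same RAM a bit more concrete, here is a minimal Python sketch. The 16-byte size matches our breadboard computer, but the memory layout and the byte values standing in for "instructions" are made up for illustration and don't correspond to any real instruction set.

```python
# A toy model of RAM: 16 one-byte cells, each reachable by its address.
RAM_SIZE = 16
ram = bytearray(RAM_SIZE)

# Hypothetical layout (an assumption for this sketch):
# addresses 0-1 hold "code", address 15 holds the variable `count`.
CODE_START = 0
COUNT_ADDRESS = 15

# Made-up instruction bytes; real opcodes depend on the CPU.
program = [0b0001_1111, 0b1110_0000]
for offset, instruction in enumerate(program):
    ram[CODE_START + offset] = instruction

# The variable lives at one fixed address for the whole run.
ram[COUNT_ADDRESS] = 42


def dump(memory):
    """Return every cell as 'address: bits', a bit like a row of LEDs."""
    return [f"{address:04b}: {value:08b}" for address, value in enumerate(memory)]


for line in dump(ram):
    print(line)
```

From the RAM's point of view, there is no difference between the two: both the "instructions" and the variable are just bytes sitting at addresses.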

This may be making you wonder even more: why use volatile memory for all of this? Well, we could achieve the same thing with non-volatile memory, but the main reason we use RAM is that it is much, much faster, and things such as program code and data generally don’t need any form of long-term persistence. In fact, when a program is executing on your computer, the content of RAM changes so fast that you only need it to persist for as long as the program is still being executed.

Cool. So far, we’ve learned that RAM holds our code and data for the CPU to use. But now, we need to answer three questions to get a better picture:

How does the CPU operate on this code or data?
How does the CPU know which address in memory to fetch code from?
How is the code or data moved from RAM to the CPU?

And look no further, we are going to answer them right now!

The Role of Registers

When a CPU wants to operate on data or decode and execute an instruction, it uses registers. For example, if you want to add a value to a variable stored in a memory location, the CPU would first read the value into a register, add to it, and then store it back in memory. The same goes for code. To decode an instruction, it is loaded from memory into the instruction register. We will get into this in more detail in the next blog post.
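The read, modify, and write-back sequence can be sketched in a few lines of Python. This is only an illustration: the address of the variable, its starting value, and the function name are assumptions, and the register is modelled as a plain 8-bit value.

```python
ram = bytearray(16)
COUNT_ADDRESS = 15          # hypothetical address of the variable
ram[COUNT_ADDRESS] = 7      # hypothetical starting value


def add_to_memory(memory, address, amount):
    """Add `amount` to the byte at `address` by going through a register."""
    register = memory[address]               # 1. load the value into a register
    register = (register + amount) & 0xFF    # 2. operate on it (kept to 8 bits)
    memory[address] = register               # 3. store the result back in memory
    return register


add_to_memory(ram, COUNT_ADDRESS, 5)
print(ram[COUNT_ADDRESS])  # 12
```

The memory cell itself is never changed "in place"; the value always takes a round trip through the register.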

Some modern CPUs have ways to directly operate on data in memory without having to move them to registers first. I am really simplifying things here and I don’t want to get into the edge case of each different computer architecture.

Speaking of instruction decoding, how does a CPU know which address in memory to fetch code from?

The Reset Vector

Each CPU is designed to have a reset vector. A reset vector is the default memory location the CPU will start fetching instructions from and executing them. For example, the original Intel 8086 CPU has a reset vector of 0xFFFF0. Our simple breadboard computer has a reset vector of 0 which is the first location in memory.

The CPU then needs a way to go fetch the next instruction after the first one has finished executing.

The Program Counter

A CPU would typically have some sort of counter or register which stores the address of the instruction to be executed next. For x86 processors, for example, we have the Instruction Pointer register (ip/eip/rip for 16Bit, 32Bit, and 64Bit respectively).

For our computer, we have a program counter (PC) which can count from 0 to 15, thus being able to address the 16 bytes of memory. Whenever an instruction is decoded, the program counter is incremented by one.
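As a rough sketch, a 4-bit program counter starting at a reset vector of 0 could be modelled in Python like this. The class and method names are made up for this example; the roll-over back to 0 after 15 is simply what a 4-bit counter does when it runs out of bits.

```python
RESET_VECTOR = 0    # our breadboard computer starts fetching at address 0
ADDRESS_BITS = 4    # 4 bits can address 16 memory locations


class ProgramCounter:
    def __init__(self):
        self.value = RESET_VECTOR

    def reset(self):
        """Go back to the reset vector."""
        self.value = RESET_VECTOR

    def increment(self):
        """Move to the next address, rolling over after the last one."""
        self.value = (self.value + 1) % (1 << ADDRESS_BITS)
        return self.value


pc = ProgramCounter()
visited = [pc.value]
for _ in range(16):
    visited.append(pc.increment())

print(visited)  # 0 through 15, then back to 0
```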

Bonus: Currently we’re executing all instructions sequentially. I want you to try and think of a way to implement a jmp instruction (similar to a goto statement in higher level languages). A jmp instruction basically tells the CPU to execute at a specific address. Have you thought of a way yet? We’ll answer that in the next blog post!

Now that we know how the CPU operates on code and data, how are they moved into these registers?

The Bus

The bus connects all of these together. If we want to move anything between any of these components we’d have to depend on the bus. A bus in general is just a way of communication between the components. In the case of our breadboard computer, it is a set of 8 connections (wires): one for each bit. Each component has a buffer or a load function which is used to connect and disconnect them from the bus in order to avoid loading unwanted values since it is a single common bus. If the modules aren’t controlled properly, a value could be loaded into the wrong register or stored in the wrong address in memory.

A Single Bus

The von Neumann architecture has only one bus that is used for data and code. So, typically, when you move data between modules, it has to be done in sequential steps and only the relevant components should be enabled to read from the bus or write to it. This is done by the control unit which we will discuss in the next part of this series.

The figure below is an example of how a bus may work. This is a scenario where the user gave the CPU the instruction ADD 0 which adds the value stored at memory address 0 to the A register. We’ll get to assembly in the next blog post, but for now, this description is enough.

In the figure, we can see the following happen in order:

The value 5 is copied from address 0 in memory to the B register through the bus
- At this point, register B is loading whatever is on the bus and the memory is outputting the content of address 0 to the bus
The ALU adds the values of registers A and B which is done without using the bus since the ALU is hard-wired to the A and B registers as we mentioned in the last post
The ALU is set to output its content to the bus, and the A register is set to read whatever is on the bus, effectively loading the sum into the A register
The sum is loaded into the A register and the instruction has now finished execution
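The steps above can be sketched as a small Python simulation. The value 5 at address 0 comes from the example; the initial value of the A register (3) and all class and method names are assumptions made for this sketch. The single `bus` attribute stands in for the 8 shared wires.

```python
class Computer:
    def __init__(self, a, memory):
        self.a = a                  # A register
        self.b = 0                  # B register
        self.memory = list(memory)  # 16 bytes of RAM
        self.bus = 0                # the single shared 8-bit bus

    def alu_sum(self):
        # The ALU is hard-wired to A and B, so it doesn't need the bus.
        return (self.a + self.b) & 0xFF

    def add(self, address):
        # Step 1: memory outputs to the bus, register B loads from the bus.
        self.bus = self.memory[address]
        self.b = self.bus
        # Step 2: the ALU adds A and B without touching the bus.
        total = self.alu_sum()
        # Step 3: the ALU outputs to the bus, register A loads from the bus.
        self.bus = total
        self.a = self.bus
        return self.a


computer = Computer(a=3, memory=[5] + [0] * 15)
print(computer.add(0))  # 8
```

Note that only one module writes to `bus` in each step, and only one module reads from it, which is what the control signals have to guarantee on the real build.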

This shows how a bus may work in a computer. In fact, this is exactly how our bus works in the 8Bit computer. Let’s now take a look at how all of this is built on the breadboards.

The Build

On days 5, 6, and 7, I built the RAM, the program counter, and started connecting the bus.

Module 4: The RAM

The RAM module (middle 3 boards) with the clock (top) and the instruction register (bottom)

The RAM module consists of three parts:

The RAM itself, which uses 74LS189 chips. Each chip has 16x4Bit RAM. We use two of these to have a total of 16 bytes
The memory address register (MAR), which is built the same way as the A and B registers. Except, this time, it is only 4 bits wide because 4 bits are enough to address 16 bytes
The memory data register (MDR), which is used to output data from and write data to memory

Both the data register and the address register have dip switches which we can use to place values in memory at any of the 16 locations. This is used to program the computer later on.
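Here is a small Python sketch of how two 16x4Bit chips can act as one 16-byte memory. It ignores all electrical details of the real chips, and which chip holds the upper or lower half of each byte is an assumption for this example, as are the class names.

```python
class Ram16x4:
    """Model of one chip: 16 addresses, 4 bits stored at each."""

    def __init__(self):
        self.cells = [0] * 16

    def write(self, address, nibble):
        self.cells[address & 0xF] = nibble & 0xF

    def read(self, address):
        return self.cells[address & 0xF]


class Ram16x8:
    """Two 4-bit chips sharing the same 4-bit address give 16 full bytes."""

    def __init__(self):
        self.high = Ram16x4()  # upper 4 bits of each byte
        self.low = Ram16x4()   # lower 4 bits of each byte

    def write(self, address, byte):
        self.high.write(address, byte >> 4)
        self.low.write(address, byte & 0xF)

    def read(self, address):
        return (self.high.read(address) << 4) | self.low.read(address)


ram = Ram16x8()
ram.write(0b0101, 0b11010010)          # like setting the dip switches
print(f"{ram.read(0b0101):08b}")       # 11010010
```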

In this module, I didn’t change anything in Ben’s designs except for the same replacement I did for the registers (explained in the previous post). One thing I did change is my usual convention for connecting inputs and outputs to and from chips: I ended up using a different order for the RAM data input bits.

I just chose this order because it made wiring easier, for these specific chips. It is okay to change the order of the bits as long as you’re aware which bit is where and try to keep the wiring consistent across the same module.

This module took a total of 6 and a half hours to build, divided as follows:

RAM: 1 hour and a half
MAR: 3 hours
MDR: 2 hours

At this point, I had shared the 5th day’s update on Twitter which was after finishing the RAM module. I had also cleaned up the LED mess using the nice LED bars I mentioned in the previous post.

Module 5: The Program Counter

The program counter was connected exactly like Ben’s. No changes were made at all. However, something important that I have to note is that the first time I attempted this project, I left the clear pin of the 74LS161 chip floating, which made it behave very strangely and often reset itself randomly. This made sense because:

Floating input is the root of all evil.

Before realising it was the clear pin, I thought the chip itself was faulty. This made me test it using an Arduino where I made a clock signal using one of the Arduino’s pins and then read whatever the chip outputted and printed it to the serial monitor. The chip turned out to be fine and I had to check my connections again.

This is generally a very useful technique. Some people on forums and subreddits have even suggested that you should write unit tests for all chips and verify that they’re all working before starting the project. I personally only tested chips when necessary and most of the testing was done manually. I only automated this one because I wanted to remove all other variables and test just the chip itself. I had no extra wires, no bad breadboard connections, nothing. Just the chip being directly powered from the Arduino and having all of its pins connected to it.
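My actual test ran on the Arduino, but the checking logic is simple enough to sketch in Python. The idea, under the assumption that you have recorded one reading of the counter’s outputs per clock pulse, is to verify that each reading is the previous one plus one, rolling over after 15. The function name and the sample readings below are made up for illustration.

```python
def find_counting_errors(samples, bits=4):
    """Return (index, expected, actual) for every reading that breaks the count."""
    modulus = 1 << bits
    errors = []
    for index in range(1, len(samples)):
        expected = (samples[index - 1] + 1) % modulus
        if samples[index] != expected:
            errors.append((index, expected, samples[index]))
    return errors


healthy = [14, 15, 0, 1, 2]   # rolls over correctly
glitchy = [3, 4, 0, 1]        # resets itself when it shouldn't

print(find_counting_errors(healthy))  # []
print(find_counting_errors(glitchy))  # [(2, 5, 0)]
```

An unexpected jump back to 0 in the readings, like in the second list, is the kind of symptom a floating clear pin could produce.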

By now, the 6th daily update was posted on Twitter. This module took approximately an hour to build.

Connecting the Bus

Connecting the bus wasn’t hard at all. However, after connecting it, I noticed that the registers would often load the values correctly but the most significant four bits would always be high. I soon realised that this was due to floating inputs to the 74LS245 bus transceiver chips. Connecting these inputs to ground fixed this issue.

The 74LS245 chips are basically the buffers which have been mentioned before. They’re used to determine if a module will output to the bus or not. On the other hand, LOAD pins on registers and other modules are what allow input in.
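A minimal Python sketch of that rule, assuming each module is described by an "output enabled" flag and the value it would put on the bus. The function name and module names are only for illustration, and `None` stands in for a bus that nothing is driving, whose value shouldn’t be relied on.

```python
def resolve_bus(drivers):
    """Work out what is on the bus.

    `drivers` maps a module name to (output_enabled, value).
    """
    active = [(name, value) for name, (enabled, value) in drivers.items() if enabled]
    if len(active) > 1:
        names = ", ".join(name for name, _ in active)
        raise ValueError(f"bus contention between: {names}")
    if not active:
        return None  # nothing is driving the bus, so its value is undefined
    return active[0][1]


print(resolve_bus({"A register": (True, 5), "RAM": (False, 9)}))   # 5
print(resolve_bus({"A register": (False, 5), "RAM": (False, 9)}))  # None
```

Enabling two outputs at once raises an error here; on the real breadboards, nothing stops you, which is why the control signals matter so much.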

Connecting the bus took around an hour. After connecting the bus and finishing the output module (discussed in the next and final blog post), I filmed this video to explain the state of the build so far.

Conclusion

We now have an idea of how code execution works in a computer, how data and code are moved between the modules, and how the computer keeps track of which instruction to execute. But, how does it understand the instructions? How do we program it? I’ll answer these questions in the next post! We will get into how these modules are enabled/disabled according to the instructions and an example of how an output device works.

In the days spanning from the 5th to the 7th, I built the RAM module, the program counter, and started playing around with the bus. The build now has an accumulated build time of 11 and a half hours plus 8 and a half hours, which totals 20 hours.

The build looked like this at this stage:

The ALU is adding 255 to 255 which overflows to 254
The program counter has counted to 2
The memory address register is holding the address 0101 (5), which was entered manually using the dip switches
Address 5 in memory holds the value 11010010
The instruction register is holding a value of all 1s
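If the ALU result looks odd, here is the arithmetic as a short Python sketch. The function name is made up; the point is that 255 + 255 is 510, which needs 9 bits, so an 8-bit result only keeps the lower 8 bits.

```python
def add_8bit(a, b):
    """Add two values the way an 8-bit adder would: keep 8 bits, report the carry."""
    total = a + b
    return total & 0xFF, total > 0xFF


result, carry = add_8bit(255, 255)
print(result, carry)       # 254 True
print(f"{result:08b}")     # 11111110
```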

That’s all for now! I hope you enjoyed this post and learned something from it.

Thanks for reading! If you enjoyed this, please share it with your friends that like dereferencing random pointers in memory.
