Understanding Computer Memory: Bits, Bytes, and Data Storage
Key Terms
- Bit
Short for binary digit, a bit is a fundamental unit of information in Computer Science that represents a state with one of two values, typically 0 and 1.
Any data stored in a computer is, at the most basic level, represented in bits.
- Byte
A group of eight bits. For example, 01101000 is a byte. A single byte can represent up to 256 distinct values (2⁸). Since each of its eight bits holds one of two symbols, 0 or 1, a byte can represent every integer from 0 to 255, inclusive, in binary format.
The following bytes represent the numbers 1, 2, 3 and 4 in binary format.
1: 00000001
2: 00000010
3: 00000011
4: 00000100
Every bit position from right to left represents an increasing power of 2, starting from 2⁰. This is the standard binary positional notation. Note that endianness is a separate concept that refers to the ordering of bytes (not bits) in multi-byte values.
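This positional notation is easy to verify in Java, whose standard library can convert between base-2 strings and integers (a minimal sketch):

```java
public class BinaryDemo {
    public static void main(String[] args) {
        // 00000100 = 0*2^0 + 0*2^1 + 1*2^2 = 4
        int value = Integer.parseInt("00000100", 2); // parse a base-2 string
        System.out.println(value);                   // prints 4

        // Going the other way: 4 in binary is "100" (leading zeros omitted)
        System.out.println(Integer.toBinaryString(4)); // prints 100
    }
}
```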
- Fixed-Width Integer
An integer represented by a fixed amount of bits. For example, a 32-bit integer is an integer represented by 32 bits (4 bytes), and a 64-bit integer is an integer represented by 64 bits (8 bytes).
The following is the 32-bit representation of the number 1, with clearly separated bytes.
00000000 00000000 00000000 00000001
The following is the 64-bit representation of the number 10, with clearly separated bytes.
00000000 00000000 00000000 00000000 00000000 00000000 00000000 00001010
Regardless of how large an integer is, its fixed-width-integer representation is, by definition, made up of a constant number of bits. It follows that an operation performed on a fixed-width integer consists of a constant number of bit manipulations. An integer equal to 1 therefore takes up exactly as much memory as 2147483647, the largest 32-bit signed integer.
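Java exposes these fixed widths as standard library constants, which makes the point easy to check (a small sketch):

```java
public class FixedWidth {
    public static void main(String[] args) {
        System.out.println(Integer.SIZE); // prints 32 (bits), i.e. 4 bytes
        System.out.println(Long.SIZE);    // prints 64 (bits), i.e. 8 bytes

        // 1 and 2147483647 both occupy exactly 32 bits as ints
        System.out.println(Integer.MAX_VALUE); // prints 2147483647
        // 31 one-bits; the 32nd (sign) bit is 0 and leading zeros are omitted
        System.out.println(Integer.toBinaryString(Integer.MAX_VALUE).length()); // prints 31
    }
}
```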
Memory
Broadly speaking, memory is the foundational layer of computing, where all data is stored.
It’s important to note the following points:
- Data stored in memory is stored in bytes and, by extension, bits.
- Bytes in memory can “point” to other bytes in memory, to store references to other data.
- The amount of memory that a machine has is bounded, making it valuable to limit how much memory an algorithm takes up.
- Accessing a byte or a fixed number of bytes (like 4 bytes or 8 bytes in the case of 32-bit and 64-bit integers) is an elementary operation, which can be loosely treated as a single unit of operational work.
- A memory slot can fit 8 bits, which is 1 byte. For example, a 32-bit integer would take 4 memory slots.
- A memory slot can store an address to another memory slot; that is called a pointer. On a 32-bit system, a pointer occupies 4 bytes; on a 64-bit system, it occupies 8 bytes.
- When you are storing an integer in languages like C, C++, or Java, it is typically a fixed-width integer, meaning it’s either 8, 16, 32, or 64 bits. The point is that we know exactly how many bytes it will take up. Note that some languages, such as Python, use arbitrary-precision integers that can grow as large as memory allows.
- If a value takes more than one memory slot, the required number of contiguous memory slots are allocated back to back to store it.
- Storing a list works similarly. If you want to store a list of five 32-bit integers, for instance, that list is going to occupy 20 memory slots.
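Using the slot model above, the arithmetic can be sketched in Java with the `Integer.BYTES` constant. (A real Java `int[]` also carries an object header and a length field, so its true footprint is slightly larger than the raw data.)

```java
public class ArrayFootprint {
    public static void main(String[] args) {
        int[] list = {1, 2, 3, 4, 5};
        // Each 32-bit int occupies 4 bytes, i.e. 4 one-byte memory slots
        int dataBytes = list.length * Integer.BYTES; // 5 * 4
        System.out.println(dataBytes); // prints 20
    }
}
```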
Strings and Character Encoding
Not all data is numeric. Strings, for example, are stored as sequences of encoded characters. A common encoding is ASCII, which uses 1 byte per character and covers 128 characters (letters, digits, punctuation). Modern systems typically use UTF-8, a variable-width encoding that is backward-compatible with ASCII. In UTF-8, a character can take anywhere from 1 to 4 bytes — standard Latin letters take 1 byte each, while characters from other scripts (e.g., Chinese, Arabic, emoji) may take 2, 3, or 4 bytes. The string “hello” in ASCII/UTF-8 occupies 5 bytes, one per character.
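These byte counts can be checked directly in Java with `String.getBytes` and `StandardCharsets.UTF_8`; the non-Latin examples below are illustrative:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Demo {
    public static void main(String[] args) {
        // ASCII letters take 1 byte each in UTF-8
        System.out.println("hello".getBytes(StandardCharsets.UTF_8).length); // prints 5
        // Accented Latin letter: 2 bytes
        System.out.println("é".getBytes(StandardCharsets.UTF_8).length);     // prints 2
        // Chinese character: 3 bytes
        System.out.println("你".getBytes(StandardCharsets.UTF_8).length);    // prints 3
        // Emoji: 4 bytes
        System.out.println("😀".getBytes(StandardCharsets.UTF_8).length);    // prints 4
    }
}
```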
Stack vs Heap Memory
At runtime, memory is broadly divided into two regions: the stack and the heap. The stack is used for static memory allocation — function call frames, local variables, and return addresses. It is fast because allocation and deallocation follow a strict last-in-first-out order. The heap is used for dynamic memory allocation — objects, arrays, and data structures whose size may not be known at compile time. Heap allocation is more flexible but comes with overhead: the program must explicitly request and release memory (or rely on a garbage collector). Understanding where your data lives helps explain performance characteristics and potential pitfalls like stack overflows or memory leaks.
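The distinction can be sketched in Java (names here are illustrative): objects are allocated on the heap, local variables live in stack frames, and unbounded recursion exhausts the stack:

```java
public class StackVsHeap {
    static int depth = 0;

    static void recurse() {
        depth++;   // each call pushes a new frame onto the stack
        recurse(); // no base case: eventually the stack is exhausted
    }

    public static void main(String[] args) {
        int[] heapArray = new int[1_000_000]; // the array object lives on the heap
        int local = heapArray.length;         // this primitive lives in main's stack frame
        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("Stack overflowed after ~" + depth + " frames");
        }
        System.out.println(local); // prints 1000000
    }
}
```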
The Memory Hierarchy: From Registers to Disk
Modern computers organize memory in a hierarchy that trades off speed, size, and cost at each level. At the top sit registers — tiny storage locations inside the CPU itself. A modern processor has a few kilobytes of register space, but access takes less than a nanosecond. One level down is L1 cache, roughly 32-64 KB per core, with access times around 1 ns. Below that is L2 cache (256 KB to 1 MB per core, ~4-10 ns) and L3 cache (several megabytes shared across all cores, ~10-40 ns). Then comes RAM, measured in gigabytes, with access latency around 100 ns. Finally, persistent storage: SSDs respond in roughly 100 microseconds, while traditional HDDs require on the order of 10 milliseconds — millions of times slower than a register read.
| Level | Typical Access Time | Approximate Size |
|---|---|---|
| Register | ~0.5 ns | A few KB |
| L1 Cache | ~1 ns | 32-64 KB per core |
| L2 Cache | ~5 ns | 256 KB - 1 MB per core |
| L3 Cache | ~20 ns | Several MB (shared) |
| RAM | ~100 ns | Gigabytes |
| SSD | ~100,000 ns | Terabytes |
| HDD | ~10,000,000 ns | Terabytes |
Notice the orders-of-magnitude gaps between levels. Going from L1 cache to RAM is two orders of magnitude slower; going from RAM to an SSD is another three, and to an HDD another two beyond that. This staggering difference is the entire reason caches exist.
Caching works because real-world programs exhibit two key patterns. Temporal locality means that data accessed recently is likely to be accessed again soon — think of a loop counter or a frequently queried database row. Spatial locality means that data stored near recently accessed data is likely to be needed next — think of iterating through an array element by element. These principles hold at every level of the hierarchy, from L1 all the way to disk.
This is also why in-memory caches like Redis and Memcached are so effective in production systems. By keeping hot data in RAM (~100 ns) rather than fetching it from an SSD (~100,000 ns) or HDD (~10,000,000 ns) on every request, you can reduce latency by several orders of magnitude. The memory hierarchy is not just a hardware curiosity — it directly shapes the architecture of high-performance software.
Cache Locality and Why It Matters
When the CPU loads a value from RAM, it does not fetch just that one byte. Instead, it loads an entire cache line — typically 64 bytes — into L1 cache. This design decision has profound consequences for how we should structure data.
Consider spatial locality in practice. When you iterate over an array of 32-bit integers, consecutive elements sit next to each other in memory. Once the CPU fetches one element, the entire 64-byte cache line is loaded, which fits 16 integers (64 bytes / 4 bytes per int = 16 elements total). That means accessing a single int brings the surrounding 15 into cache as well, ready to be read at L1 speed. Contrast this with a linked list: each node is a separate object allocated somewhere on the heap, and following a node.next pointer typically means jumping to a completely different memory address. Each jump is likely to trigger a cache miss, forcing the CPU to wait for a full RAM access.
```java
// Array: cache-friendly sequential access
int arraySum = 0;
for (int i = 0; i < array.length; i++) {
    arraySum += array[i];
}

// Linked list: each node may cause a cache miss
int listSum = 0;
Node current = head;
while (current != null) {
    listSum += current.value;
    current = current.next;
}
```
Temporal locality complements this: data accessed recently tends to be accessed again. Loop variables, counters, and frequently used objects benefit from staying warm in cache across iterations.
The key takeaway is that arrays and linked lists may both be O(n) for traversal, but arrays can be several times faster in practice due to cache-friendly access patterns. Big-O notation tells you how an algorithm scales, but cache behavior tells you how fast it actually runs.
Memory Alignment and Endianness
Memory alignment means that data types are most efficiently accessed when their starting address is a multiple of their size. A 4-byte int should start at an address divisible by 4; an 8-byte long should start at an address divisible by 8. Misaligned access can be slower — potentially requiring two separate memory reads instead of one — or even cause hardware exceptions on some architectures (such as older ARM processors). In practice, compilers handle alignment automatically by inserting padding bytes between fields in structs and objects, so you rarely need to think about it unless you are optimizing memory layout by hand.
Endianness determines the byte order of multi-byte values in memory. In big-endian format, the most significant byte is stored first. The 4-byte value 0x12345678 would be laid out as [12, 34, 56, 78]. In little-endian format — used by x86, x86-64, and most modern processors — the least significant byte comes first: [78, 56, 34, 12]. Network byte order is big-endian by convention, which is why network protocols must convert between host and network byte order when sending or receiving multi-byte values.
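Java's `ByteBuffer` lets you observe both layouts directly (`ByteOrder.BIG_ENDIAN` is the default for a new buffer):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class EndianDemo {
    public static void main(String[] args) {
        byte[] big = ByteBuffer.allocate(4)
                .order(ByteOrder.BIG_ENDIAN)
                .putInt(0x12345678)
                .array();
        byte[] little = ByteBuffer.allocate(4)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putInt(0x12345678)
                .array();
        System.out.printf("big-endian:    %02x %02x %02x %02x%n", big[0], big[1], big[2], big[3]);
        System.out.printf("little-endian: %02x %02x %02x %02x%n", little[0], little[1], little[2], little[3]);
        // big-endian:    12 34 56 78
        // little-endian: 78 56 34 12
    }
}
```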
When does endianness matter in practice? Primarily when you are serializing data across systems, reading binary file formats, or inspecting raw memory dumps during debugging. In everyday application code, the language runtime handles byte ordering transparently, so you can safely treat integers as abstract values rather than worrying about their byte layout.
Virtual Memory
Every process on a modern operating system believes it has access to a large, contiguous block of memory all to itself. This illusion is created by virtual memory. The OS assigns each process its own virtual address space, and the CPU’s memory management unit (MMU) translates virtual addresses to physical addresses at runtime. Two processes can use the same virtual address internally without conflict, because they map to different physical locations.
The mechanism behind this translation is called paging. The OS divides virtual memory into fixed-size chunks called pages (typically 4 KB on x86 systems) and physical memory into corresponding frames of the same size. A page table maintained by the OS maps each virtual page to a physical frame. When a program accesses an address, the MMU looks up the page table to find the actual location in RAM. To speed up this lookup, the CPU maintains a translation lookaside buffer (TLB), a small cache of recently used page table entries.
When a process tries to access a page that is not currently in physical RAM, a page fault occurs. The OS then loads the required page from disk (swap space) into a free frame, updates the page table, and resumes execution. If physical memory is full, the OS must evict an existing page first, typically using a least-recently-used (LRU) or similar replacement policy. This is why a system with insufficient RAM starts to feel slow: it spends more and more time swapping pages between RAM and disk, a condition known as thrashing.
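The LRU idea is not specific to page replacement; it shows up throughout software. As a sketch, Java's `LinkedHashMap` can act as a tiny LRU cache by enabling access order and overriding `removeEldestEntry` (capacity and names here are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: iteration order tracks recency of use
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(2);
        cache.put(1, "a");
        cache.put(2, "b");
        cache.get(1);      // touch entry 1 so it becomes most recently used
        cache.put(3, "c"); // evicts entry 2, the least recently used
        System.out.println(cache.keySet()); // prints [1, 3]
    }
}
```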
Virtual memory also provides memory protection. Each page table entry includes permission bits that control whether a page is readable, writable, or executable. If a process tries to write to a read-only page or access memory outside its allocated range, the MMU triggers a fault and the OS terminates the process with a segmentation fault. This isolation is fundamental to system stability, because a bug in one process cannot corrupt the memory of another.
How JVM Memory Maps to Physical Memory
When you run a Java application, the JVM requests memory from the OS for its various regions: the heap (where objects live), the thread stacks (where local variables and call frames reside), and metaspace (where class metadata is stored). From the OS perspective, all of these are just ranges of virtual addresses within the JVM process. The OS maps them to physical RAM frames through the same paging mechanism described above.
This means the JVM’s heap is not necessarily contiguous in physical memory, even though it appears contiguous to the JVM. The OS may spread heap pages across scattered physical frames and even swap some to disk under memory pressure. The JVM’s garbage collector operates entirely in virtual address space, unaware of the physical layout beneath. This is also why setting -Xmx (max heap size) to more than the available physical RAM technically works but leads to severe performance degradation, as the OS pages heap memory out to disk.
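You can observe the configured heap ceiling from inside the JVM via the `Runtime` API; the actual values depend on your `-Xmx` flag and machine, so no specific output is shown:

```java
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // All figures are virtual-memory sizes in bytes, not physical RAM
        System.out.println("max heap (-Xmx):    " + rt.maxMemory());
        System.out.println("currently reserved: " + rt.totalMemory());
        System.out.println("free within that:   " + rt.freeMemory());
    }
}
```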
Related posts: Java Memory Management, Arrays: Static and Dynamic Implementations