Computer Systems · Data Representation

CS3 — Floating-Point Representation

📅 Thu 11 Jun 2026 · P3+P4 (double)

⏱ ~120 minutes

Learning intentions

I can explain why integers are insufficient to represent very large numbers and fractions
I can describe the structure of a floating-point number as mantissa × 2^exponent
I can explain what normalised form means and why it is used
I can convert a denary number to normalised floating-point binary
I can convert a normalised floating-point binary number back to denary

Success criteria

I can state that a normalised positive mantissa starts with 0.1 and a negative with 1.0
I can convert a positive denary number to normalised floating-point using a three-step process
I can convert a negative denary number to floating-point by finding the two's complement of the positive mantissa
I can extract a mantissa and exponent from a bit pattern and compute the denary value

Warm up — recap from CS2

Answer from memory · check when done

Convert −37 to 8-bit two's complement.

Convert 11110110 to denary. (Treat as 8-bit two's complement.)

What is the minimum value that can be stored in 8-bit two's complement?

Convert −1 to 8-bit two's complement.

What is the maximum value that can be stored in 6-bit two's complement?

Key vocabulary

Floating-point

A system for representing numbers with a fractional component, using a mantissa and an exponent

Mantissa

The significant digits of the number, stored as a binary fraction in two's complement

Exponent

A power of 2 stored in two's complement that scales the mantissa to produce the final value

Binary point

The equivalent of a decimal point in binary — bits to the right represent fractions (½, ¼, ⅛…)

Normalised form

The standard mantissa format: positive starts 0.1…, negative starts 1.0…, ensuring maximum precision

Precision

The number of significant binary digits available — determined by the mantissa bit width

Why integers are not enough

Everything we have stored so far — positive integers in CS1, negative integers in CS2 — has been a whole number. But real-world data is full of values that cannot be expressed as integers:

Temperatures: 36.8°C, −3.5°C, absolute zero at −273.15°C
Measurements: 1.602 × 10⁻¹⁹ coulombs (charge of an electron)
Coordinates: GPS latitude 55.9533° N
Financial data: £14.99, exchange rate 1.2847

None of these fit into any integer type — they require a fractional component. We could scale by 100 and pretend everything is in pence, but this breaks for very large or very small scientific values. We need a fundamentally different representation.

Floating-point solves this using the same idea as scientific notation: separate the significant digits from the scale. Just as 6.5 × 10³ = 6500 in denary, floating-point stores a mantissa and an exponent independently in binary.

The floating-point structure

Every floating-point number is stored as:

value = mantissa × 2^exponent

Mantissa — a binary fraction representing the significant digits, stored in two's complement with the binary point after the sign bit
Exponent — a two's complement integer telling you how far to shift the binary point

In our lessons we use a simplified 8-bit format: 5 bits for the mantissa + 3 bits for the exponent. Real systems use far more bits (IEEE 754 double precision uses 64 in total) and store the sign and exponent slightly differently, but the underlying idea of mantissa × 2^exponent is the same.

Mantissa (5 bits)					Exponent (3 bits)
M₄ sign	M₃	M₂	M₁	M₀	E₂ sign	E₁	E₀
sign (−2⁰)	2⁻¹	2⁻²	2⁻³	2⁻⁴	sign (−2²)	2¹	2⁰
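As a rough sketch of how the two fields combine, the Python below splits an 8-bit pattern into the 5+3 layout used in these lessons and evaluates mantissa × 2^exponent. The function names twos_complement and decode are made up for illustration; they are not from any library.

```python
def twos_complement(bits: str) -> int:
    """Interpret a bit string as a two's complement integer."""
    value = int(bits, 2)
    if bits[0] == "1":              # sign bit set: subtract 2^width
        value -= 1 << len(bits)
    return value


def decode(pattern: str) -> float:
    """Evaluate a 5+3 floating-point pattern such as '01100 111'."""
    bits = pattern.replace(" ", "")
    mantissa_bits, exponent_bits = bits[:5], bits[5:]
    # The binary point sits after the sign bit, so the 5-bit integer
    # is divided by 2^4 to give a fraction between -1 and 1.
    mantissa = twos_complement(mantissa_bits) / 16
    exponent = twos_complement(exponent_bits)
    return mantissa * 2 ** exponent


print(decode("01100 111"))   # 0.375
print(decode("10011 011"))   # -6.5
```

Treating the mantissa as an integer and dividing by 16 is just one convenient way to place the binary point; adding up the place values by hand gives the same result.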

Binary fractions — place values

Just as denary fractions use tenths, hundredths, thousandths…, binary fractions use halves, quarters, eighths… The binary point divides whole-number place values (left) from fractional place values (right):

Sign	.	Bit 1	Bit 2	Bit 3	Bit 4
sign	.	2⁻¹	2⁻²	2⁻³	2⁻⁴
±	.	0.5	0.25	0.125	0.0625

To read a mantissa: sign bit 0 = positive, sign bit 1 = negative (two's complement). Then add up all fractional columns with a 1, treating the sign bit as −1 if set.

Example: mantissa 01101 → sign=0, then 0.5 + 0.25 + 0 + 0.0625 = 0.8125

Example: mantissa 10011 → sign=1, then −1 + 0 + 0 + 0.125 + 0.0625 = −0.8125
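The column-adding method above can be sketched in Python as follows. The name mantissa_value is illustrative only, and the sketch assumes the mantissa is given as a string of 0s and 1s with the sign bit first.

```python
def mantissa_value(bits: str) -> float:
    """Sum the place values of a two's complement mantissa.

    The first bit has weight -1; the rest have weights 1/2, 1/4, 1/8, ...
    """
    total = -1.0 if bits[0] == "1" else 0.0
    for position, bit in enumerate(bits[1:], start=1):
        if bit == "1":
            total += 2 ** -position   # 0.5, 0.25, 0.125, 0.0625, ...
    return total


print(mantissa_value("01101"))   # 0.8125
print(mantissa_value("10011"))   # -0.8125
```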

Normalised form

Many values can be expressed in floating-point in more than one way. For example, 0.375 could be represented as 0.1100 × 2⁻¹, or as 0.0110 × 2⁰, or 0.0011 × 2¹. This ambiguity causes problems: comparisons fail, precision is wasted, and hardware gets complicated.

Normalised form defines exactly one valid representation for every value:

Positive normalised: mantissa starts 0.1… — the bit immediately after the sign bit is always 1
Negative normalised: mantissa starts 1.0… — the bit immediately after the sign bit is always 0

This ensures the mantissa is as large as possible in magnitude (≥ 0.5 for positive, < −0.5 for negative), using every available bit for precision. If a mantissa is not in normalised form, shift it left one place at a time, subtracting 1 from the exponent for each shift, until it is.
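The normalisation rule can be sketched in Python as below. The names is_normalised and normalise are hypothetical, and the sketch ignores the limited range of the 3-bit exponent, so treat it as an illustration of the shifting rule rather than a complete implementation.

```python
def is_normalised(mantissa_bits: str) -> bool:
    """Normalised means the sign bit and the next bit differ (0.1... or 1.0...)."""
    return mantissa_bits[0] != mantissa_bits[1]


def normalise(mantissa_bits: str, exponent: int) -> tuple[str, int]:
    """Shift the mantissa left, lowering the exponent, until it is normalised."""
    if "1" not in mantissa_bits:
        raise ValueError("zero has no normalised form")
    while not is_normalised(mantissa_bits):
        mantissa_bits = mantissa_bits[1:] + "0"   # shift left one place
        exponent -= 1                             # compensate for the doubling
    return mantissa_bits, exponent


# Two non-normalised ways of writing 0.375 end up as the same pattern.
print(normalise("00110", 0))   # ('01100', -1)
print(normalise("00011", 1))   # ('01100', -1)
```

Dropping the leading bit is safe here because, whenever the mantissa is not normalised, the first two bits are equal, so the sign is preserved by the shift.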

Converting denary to floating-point (positive numbers)

Convert the denary number to binary (whole part + fractional part)
Normalise: shift the bits until the mantissa is in the form 0.1…, counting the places shifted. Shifting right by n places (as in 110.1 → 0.1101) gives exponent +n; shifting left by n places (as in 0.011 → 0.11) gives exponent −n
Store the normalised mantissa in two's complement, pad with zeros to fill all mantissa bits; store the exponent in two's complement
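The three steps can be sketched in Python as below. This is a minimal illustration, not a general encoder: encode_positive is a made-up name, it works on the numeric value rather than on written-out binary digits, and it assumes the value fits exactly in the lesson's 5+3 format (it raises an error otherwise).

```python
def encode_positive(value: float) -> str:
    """Encode a positive number in the lesson's 5+3 format (assumed exact fit)."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    mantissa, exponent = value, 0
    while mantissa >= 1:        # bits shift right: exponent goes up
        mantissa /= 2
        exponent += 1
    while mantissa < 0.5:       # bits shift left: exponent goes down
        mantissa *= 2
        exponent -= 1
    scaled = mantissa * 16      # four fractional bits
    if scaled != int(scaled) or not -4 <= exponent <= 3:
        raise ValueError("value does not fit exactly in 5+3 bits")
    mantissa_bits = format(int(scaled), "05b")        # sign bit 0 + 4 fraction bits
    exponent_bits = format(exponent & 0b111, "03b")   # 3-bit two's complement
    return f"{mantissa_bits} {exponent_bits}"


print(encode_positive(0.375))   # 01100 111
print(encode_positive(3.25))    # 01101 010
```

Masking with 0b111 is one way to get the 3-bit two's complement of a small negative exponent: −1 becomes 111.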

Converting denary to floating-point (negative numbers)

Convert the positive version to normalised floating-point (steps 1–3 above)
Find the two's complement of the positive mantissa: invert all bits, then add 1
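Keep the exponent the same, since it was already worked out for the normalised magnitude. Then check that the mantissa starts 1.0…: if the magnitude is an exact power of 2 (for example −0.5), the two's complement starts 1.1…, so shift left once more and subtract 1 from the exponent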
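The invert-and-add-1 step can be sketched in Python as follows. The name negate_mantissa is illustrative, and the mantissa is assumed to be given as a bit string with the sign bit first.

```python
def negate_mantissa(bits: str) -> str:
    """Two's complement of a mantissa: invert every bit, then add 1."""
    width = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    result = (int(inverted, 2) + 1) % (1 << width)   # discard any carry out
    return format(result, f"0{width}b")


print(negate_mantissa("01101"))   # 10011
```

Applying the function twice returns the original pattern, which is a quick way to check a hand-worked answer.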

Worked examples

Example 1 — Represent 0.375 in normalised 8-bit floating-point (5+3)

Convert 0.375 to binary fraction:
0.375 × 2 = 0.75 → bit 0
0.75 × 2 = 1.5 → bit 1
0.5 × 2 = 1.0 → bit 1
So 0.375 = 0.011₂

Normalise to 0.1… form — shift left once (multiply mantissa by 2, subtract 1 from exponent):
0.011 → 0.110, exponent = −1
Pad to 4 fractional bits: 0.1100

Store mantissa and exponent:
Mantissa = 01100 (sign bit 0, fractional bits 1100)
Exponent = −1 → 3-bit two's complement = 111

✓

Result: 01100 111
Verify: mantissa 01100 = 0.75, exponent 111 = −1, 0.75 × 2⁻¹ = 0.375 ✓
Normalised? Sign=0, next bit=1 → ✓
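The repeated-doubling method used in the first step of Example 1 can be sketched in Python as below. The name fraction_to_bits and the max_bits cut-off are assumptions made for this illustration; the cut-off simply stops fractions that never terminate in binary from looping forever.

```python
def fraction_to_bits(fraction: float, max_bits: int = 8) -> str:
    """Convert 0 <= fraction < 1 to binary digits by repeated doubling."""
    bits = ""
    while fraction > 0 and len(bits) < max_bits:
        fraction *= 2
        if fraction >= 1:       # the whole-number part is the next bit
            bits += "1"
            fraction -= 1
        else:
            bits += "0"
    return bits


print("0." + fraction_to_bits(0.375))   # 0.011
```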

Example 2 — Represent −6.5 in normalised 8-bit floating-point (5+3)

Convert +6.5 to binary:
6 = 110₂, 0.5 = .1₂, so 6.5 = 110.1₂

Normalise the magnitude to 0.1… form — shift right 3 places:
110.1 → 0.1101, exponent = 3
Positive mantissa bits: 01101

The number is negative — find two's complement of the mantissa:
Invert 01101 → 10010, add 1 → 10011
Exponent = 3 → 3-bit two's complement = 011

✓

Result: 10011 011
Verify: mantissa 10011 = −1 + 0.125 + 0.0625 = −0.8125, exponent 011 = 3, −0.8125 × 8 = −6.5 ✓
Normalised? Sign=1, next bit=0 → ✓

Example 3 — Convert 01101 010 to denary

Extract components:
Mantissa bits = 01101 · Exponent bits = 010

Calculate mantissa value:
Sign bit = 0 → positive
Fractional bits .1101: 0.5 + 0.25 + 0 + 0.0625 = 0.8125

Calculate exponent value:
010 in two's complement = 2 (MSB = 0, so positive)

✓

Value = 0.8125 × 2² = 0.8125 × 4 = 3.25
Normalised? Sign=0, next bit=1 → ✓
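One possible way to check worked answers like these: Python's standard math.frexp function splits a number into a fraction m and an exponent e with 0.5 ≤ |m| < 1, which matches the normalised mantissa range for positive numbers. It returns a signed fraction rather than a two's complement bit pattern, and for negative exact powers of 2 its convention differs from the lesson's, so use it only to check values.

```python
import math

# math.frexp(x) returns (m, e) with x == m * 2**e and 0.5 <= abs(m) < 1.
for value in (0.375, -6.5, 3.25):
    mantissa, exponent = math.frexp(value)
    print(value, "=", mantissa, "x 2 **", exponent)
# 0.375 = 0.75 x 2 ** -1
# -6.5 = -0.8125 x 2 ** 3
# 3.25 = 0.8125 x 2 ** 2

# math.ldexp goes the other way: mantissa and exponent back to the value.
print(math.ldexp(0.8125, 2))   # 3.25
```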

Now you try

Convert the floating-point bit pattern 10110 001 to denary. Show all steps, then verify normalisation.

⚠️ Common mistakes — examiner feedback

Forgetting to normalise. Always check that the bit after the sign bit is 1 (positive) or 0 (negative). A non-normalised mantissa like 0.0110 costs you the mark even if the maths is otherwise correct.
Wrong direction of shift and wrong sign of exponent. Shifting the mantissa left by n places means the exponent is −n (not +n). Think of it as: you made the mantissa bigger, so the exponent must compensate by being smaller.
Not converting the mantissa to two's complement for negative numbers. Finding the binary for +6.5 and writing it with a 1 sign bit is wrong. You must invert all 5 bits and add 1 to get the proper two's complement negative mantissa.
Confusing binary fractions with denary fractions. 0.1₂ = 0.5₁₀ — not 0.1₁₀. Always convert binary fractions using the place value table.
Losing bits when padding. If the mantissa only has 3 significant fractional bits and your format has 4, pad with a 0 on the right. Dropping bits or misaligning changes the value.

📝 Exam tip

Floating-point is the most commonly failed topic in Higher Computing. Pupils who lose marks almost always do so for one reason: they rush. This is a completely mechanical process — there is no trick and no insight required beyond the method. Every mark is available if you write every step.

Expect these question forms:

"Represent X in normalised floating-point" (1–2 marks: method + answer)
"Convert bit pattern XXXXX XXX to denary" (1–2 marks)
"Explain what normalised form means" (1–2 marks: definition + reason)
"Explain why 0.1 cannot be represented exactly" (1–2 marks)

Task Set A — Higher core

All questions use the 8-bit format: 5-bit mantissa + 3-bit exponent, both in two's complement.

Represent 0.625 in normalised floating-point. Give the full 8-bit pattern (mantissa then exponent, space between).

Represent 3.5 in normalised floating-point. Give the full 8-bit pattern.

Represent −3.5 in normalised floating-point. Give the full 8-bit pattern.

Convert 01100 011 to denary.

Convert 10110 010 to denary.

Which of these 5-bit mantissa patterns represents a normalised positive number?

00101

Incorrect — sign=0, next bit=0. Not normalised. A normalised positive mantissa must start 0.1… (next bit must be 1).

01011

Correct — sign=0, next bit=1 → starts 0.1… This is the normalised positive form. Value = 0.5+0.125+0.0625 = 0.6875.

10011

Incorrect — sign=1, next bit=0 → starts 1.0… This is a normalised negative number, not positive.

11001

Incorrect — sign=1, next bit=1 → starts 1.1… This is not normalised. A normalised negative must start 1.0…

Explain what normalised floating-point means and why it is used.

Convert 01110 001 to denary.

B9 — past paper style (2 marks)

Represent −0.625 in normalised 8-bit floating-point (5+3). Show all working.

B10 — past paper style (2 marks)

Explain why the same bit pattern can represent completely different denary values depending on whether it is treated as an unsigned integer or as a floating-point number.

✅ Higher checkpoint — B7 (normalised form explanation) and B9 (full negative conversion) are the highest-value question types in this topic. Confident on both = exam-ready.

Task Set B — Extension · Beyond the specification

In most programming languages, 0.1 + 0.1 + 0.1 == 0.3 evaluates to False. Using your knowledge of binary fractions, explain why.

Store the value 1.5 in 8-bit floating-point (5+3). Then explain what changes in the bit pattern when you multiply by 2, and give the new bit pattern. Why is multiplication by powers of 2 so efficient in floating-point?

IEEE 754 double precision uses 64 bits: 1 sign bit, 11 exponent bits, 52 mantissa bits. Compare the range and precision this offers against our simplified 8-bit format (3-bit exponent, 4 fractional mantissa bits). What practical consequence does the larger exponent have?
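For anyone curious what the 64-bit layout looks like in practice, here is a minimal Python sketch that pulls the three fields out of a float using the standard struct module. The name double_fields is made up. Note that IEEE 754 differs from the lesson format: it uses a separate sign bit instead of two's complement, stores the exponent with a bias of 1023, and leaves the leading 1 of the mantissa implicit.

```python
import struct


def double_fields(value: float) -> tuple[int, int, int]:
    """Split a Python float (IEEE 754 double) into sign, exponent and fraction fields."""
    (raw,) = struct.unpack(">Q", struct.pack(">d", value))   # 64 bits as an integer
    sign = raw >> 63
    exponent = (raw >> 52) & 0x7FF          # 11 bits, stored with a bias of 1023
    fraction = raw & ((1 << 52) - 1)        # 52 bits
    return sign, exponent, fraction


sign, exponent, fraction = double_fields(-6.5)
# -6.5 is -1.101 (binary) x 2^2, so the stored fraction begins 101...
print(sign, exponent - 1023, format(fraction, "052b")[:8])   # 1 2 10100000
```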

📁 File this in OneNote under:
Higher Computing Science → Computer Systems → CS3

📌 Teacher notes — not for pupils

Timing (120 min double):
5 min — warm up (CS2 recap), circulate
5 min — key vocabulary together
10 min — why integers aren't enough (discuss: how would you store 36.8°C?)
5 min — binary fraction place values (do a few together: what is 0.101₂?)
10 min — normalised form: show what it means visually with the format diagram
15 min — Examples 1 and 2 worked on board together, Example 3 and Now You Try independently
5 min — common mistakes ("has anyone made this one just now?")
25 min — tasks
5 min — cold call review on B4/B5 (conversion back, fastest to check)

Watch for: pupils who normalise in the wrong direction (shifting right instead of left, flipping the exponent sign); pupils who forget the two's complement step for negative numbers (just writing a 1 in the sign bit); and pupils who misread binary fractions (0.1₂ ≠ 0.1₁₀).

Whiteboard tip: draw the format diagram on the board before the lesson. Keep it visible throughout. Pupils lose track of which bits are which under exam conditions.

C1 is worth a brief mention even for pupils who don't attempt it — the 0.1 + 0.1 + 0.1 ≠ 0.3 result visibly surprises most pupils and motivates why floating-point precision matters.
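A short Python demonstration that could be run live for C1 is sketched below. It relies only on the standard library; Fraction(x) is used here because it reveals the exact value a float stores.

```python
from fractions import Fraction
import math

total = 0.1 + 0.1 + 0.1
print(total == 0.3)               # False
print(math.isclose(total, 0.3))   # True

# Fraction(x) shows the exact value a float actually stores.
print(Fraction(0.1) == Fraction(1, 10))   # False: 0.1 is not exact in binary
print(Fraction(0.5) == Fraction(1, 2))    # True: 0.5 is exactly 2^-1
```

Comparing with math.isclose instead of == is the usual workaround when floating-point results need to be tested for equality.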