Objects and Data Types

The term object has many meanings in programming, just as it does in real life. The first and most fundamental definition is that an object is a typed unit of data in memory.

Memory is your canvas and objects are your paint. Understanding the fundamental types of data you can work with, and their capabilities, is the first step on your journey toward programming!

Let’s zoom in a bit to explore memory more closely, before zooming our way out toward thinking in terms of objects in memory.

Bits and Bit Patterns

Your computer’s memory is composed of billions of electronic cells. These cells, called transistors, are now the most frequently manufactured device in history, with over 13 sextillion produced [1]! Each transistor has two states, off or on. This is represented numerically as a bit whose value is either 0 or 1. Despite 1 being the maximal value of a bit, notice there are two possible values thanks to 0. Unlike humans, bits are binary.

You can form many unique patterns by stringing bits together. For example, with two bits strung together, you can form four different bit patterns: 00, 01, 10, and 11. With three bits strung together, you can form eight different combinations: 000, 001, 010, 011, 100, 101, 110, 111. How many bit patterns can you form with four bits?

Each time you extend a bit pattern by another bit, you double the number of possible patterns. Thus, given any number of bits, you can calculate the number of possible unique patterns with the mathematical expression 2^bits. With 8 bits, called a byte, there are 2^8, which evaluates to 256, different patterns. Exponential growth is a powerful force! With only 64 bits, there are 2^64 patterns, which evaluates to 18,446,744,073,709,551,616 possibilities.
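The doubling rule can be checked with a short Python sketch. The function names count_patterns and list_patterns are made up for this illustration; they are not built into Python.

```python
from itertools import product


def count_patterns(bits: int) -> int:
    """Return how many distinct bit patterns fit in the given number of bits."""
    return 2 ** bits


def list_patterns(bits: int) -> list[str]:
    """List every pattern of the given length as a string of 0s and 1s."""
    return ["".join(pattern) for pattern in product("01", repeat=bits)]


print(list_patterns(3))     # the eight three-bit patterns, from 000 to 111
print(count_patterns(8))    # number of patterns in one byte
print(count_patterns(64))   # number of patterns in 64 bits
```

Listing every pattern is only practical for small counts of bits, which is exactly why the formula is handy.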

Humanity has settled on the decimal system as its de facto numbering system. Languages have their own alphabets for textual data. There is an emergent universal character set in emoji 🤠. As all of memory is the binary system of 0s and 1s, how can computers process these widely different types of data?

Classification and Representation

A symbol follows this question: what does the symbol mean? V

If you classify it as an English character, it represents the letter “vee”. Classified as a Roman Numeral, though, it represents the number five. If you read it in the phrase, “I’m V excited!” it’s an abbreviation of the word “very”. Choosing to classify, and interpret, the simple symbol V as different types of data allows for the same symbol to represent very different ideas.

Programmers decide to classify each block of memory they use, or each object, as a specific type of data: numerical data, textual data, sequences of data, compound groupings of many types of data, and so on. In doing so, the same underlying bit patterns are interpreted to represent different ideas.

Binary Bit Pattern    Number (Integer)    Text (Character)
00000000              0                   NULL (invisible)
00000001              1                   Start of Heading (invisible)
00000010              2                   Start of Text (invisible)
00110000              48                  0
00110001              49                  1
00110010              50                  2
01000001              65                  A
01000010              66                  B
01000011              67                  C
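The rows of this table can be rebuilt in Python. This is a minimal sketch: describe is a hypothetical helper name, while format, chr, and repr are built-in functions.

```python
def describe(code: int) -> tuple[str, int, str]:
    """Return the 8-bit pattern, the integer reading, and the character reading."""
    pattern = format(code, "08b")   # for example, 65 becomes "01000001"
    character = chr(code)           # the same value classified as text
    # repr() shows invisible control characters as visible escape codes
    return pattern, code, repr(character)


for code in [0, 1, 2, 48, 49, 50, 65, 66, 67]:
    print(*describe(code))
```

The first three rows display escape codes rather than letters because those characters are invisible.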

You can poke at this concept directly in an interactive Python session. To follow along, open a new terminal session and run the python program by typing it in and pressing enter. Each of the lines beginning with >>> is one where you will type in an expression and press enter for the Python interpreter to evaluate.

>>> int(0b01000001)

>>> chr(0b01000001)

>>> int(0b01000010)

>>> chr(0b01000010)

>>> int(0b01000011)

>>> chr(0b01000011)

You are encouraged to try other bit patterns! When you are done, evaluate the function call expression quit() in the interactive Python program.

Each line had its own function call expression. A function call starts with a function’s name; these examples used two functions, int and chr. The word int is short for integer, and chr is short for character, which includes letters, digits, punctuation, and other textual symbols. Following the function name is a pair of parentheses, and inside the parentheses are any pieces of input data the function needs, called arguments. In these examples, a single argument was given to each function call, and each argument was a bit pattern. In Python, and many other languages, bit patterns are denoted with a prefix of 0b followed by the pattern of 0s and 1s. Each expression you wrote was evaluated by the Python interpreter and its resulting value was printed to the screen.

Note: When working in an interactive Python interpreter, the evaluation of each expression is displayed, or “printed”, back to you automatically. This is great for messing around! This will not be the case when writing stored programs. There you would need to surround any expression you want displayed in a print function call, for example print(chr(0b01000011)).
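As a small sketch of that difference, imagine the lines below saved in a file and run as a stored program. The variable names letter and number are chosen only for this illustration.

```python
letter = chr(0b01000011)   # the bit pattern classified as a character
number = int(0b01000011)   # the same bit pattern classified as an integer

letter           # evaluated, but nothing is displayed in a stored program
print(letter)    # displays C
print(number)    # displays 67
```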

What an object’s bit pattern means or represents is decided entirely by the type of data you decide to classify it as. Thus, when programming in a high-level language like Python, it is more important to understand the types of data you are working with than exactly how they are represented as bit patterns. This is one of the most important examples of data abstraction in computer science: we will not need to concern ourselves with the details of these lower, bit-level concerns for quite some time [2]!

Built-in Types

Programming languages offer many built-in data types for you to work with, typically including:

  • numerical
    • integers
    • decimal numbers
    • complex numbers
  • textual
  • logical
  • collections of many objects
    • sequences
    • sets
    • dictionaries

The names programming languages choose for these common data types, their capabilities, and their limitations, are usually very similar. However, there will be subtle differences specific to each language. When learning a new programming language, start by spending time understanding and exploring its built-in data types and their capabilities.

Primitive Data Types

Let’s explore these common data types in the context of Python, starting with primitive types which are some of the simplest available to you in a programming language.

Literal Expressions and Type Inspection

How do you express the integer number one hundred ten in Python? Quite literally! 110

What about fifty percent represented as a decimal? Quite literally! 0.50

What about the textual word programming? Quite literally! "programming"

The built-in, primitive types of a programming language often have literal syntax for constructing and representing objects directly in your programs. (Notice: none of these examples involve any binary! Though, previously, you wrote a binary literal without knowing it. Abstraction for the win!) These are examples of literal expressions.

To learn the type classification of any object in Python, you can use the built-in type function. Try it by starting python in a terminal, if you don’t already have one running, and following along:

>>> 110

>>> type(110)
<class 'int'>

>>> type(211)
<class 'int'>

>>> 0.5

>>> type(0.5)
<class 'float'>

>>> "programming"

>>> type("programming")
<class 'str'>

Each literal expression evaluates to literally itself. There is no further evaluation necessary or possible in these expressions. You also certainly noticed the classifications of each object’s type have peculiar names such as int, float, and str. Let’s address each individually.

int - Integers

The built-in type int is short for integer. Integers are useful for counting and programs count more things than you’d expect! For example, the number of likes on your last social media post, the number of words in your paper, the number of walking steps in your activity tracker, the number of points scored in games, and so on.

In Python, an int literal is either a zero or a nonzero digit followed by zero or more digits. So, 0, 1, 987654321, and 2020 are all example int literals, but 0110 is not. For large numbers you cannot use commas to denote places, such as in 1,000, but you can use underscores which are completely ignored. Thus, 1_000_000 is the same int value as 1000000.
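A quick sketch of the underscore rule, using a made-up variable name:

```python
one_million = 1_000_000

print(one_million == 1000000)   # True: the underscores are ignored
print(1_234 + 1)                # 1235
```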

We’ll explore the kinds of capabilities an int object has shortly, but to connect them with your understanding of calculations, try asking the Python interpreter to evaluate some integer arithmetic expressions:

>>> 110 * 2

>>> 1 + 2 * 3

>>> (1 + 2) * 3

>>> 1_234 + 1

float - Floating-point “Decimal” Numbers

Why not classify the type of decimal numbers, such as 0.25, with a name like dec or decimal? Because they’re not exactly decimal numbers, if you need to be very precise about it. The name float is short for floating-point.

Here’s a perplexing, motivating example of very subtle limitations of floating-point arithmetic versus true decimal arithmetic. Can you spot any minuscule errors?

>>> 0.25 + 0.25

>>> 10.0 / 3.0

>>> 0.1 + 0.2

The exact reasons for these computations having a slight error, less than a trillionth, are beyond your concerns right now, but the intuition behind it, and for the name floating-point, is as follows. Let’s imagine an elementary numbering system where you are given three digit placeholders to form a number with:

d.d × 10^x

For the two d digits, you can choose any digit between 0 and 9. For the x digit, you can also choose any digit between 0 and 9, and can choose for it to be positive or negative. In reality, a float value has far more digits to work with than this; the system here is dramatically simplified for illustrative purposes.

What is the largest number you can represent?

9.9 × 10^9 which is 9,900,000,000!

What is the smallest positive number you can represent?

0.1 × 10^-9 which is 0.0000000001

That’s a massive range of numbers considering you only needed 3 digits and a sign to represent each!

What’s the catch? Try describing a limitation of this system on your own.

What is the second largest number you can represent?

The second largest number would be 9.8 × 10^9, or 9,800,000,000. That means there are 99,999,999 whole numbers between the largest and second largest that are impossible to represent using this system! More precisely, there are only 1,719 unique numbers this system can represent in the range from 0.0000000001 to 9,900,000,000! Your mathematics has taught you there are infinitely many decimal numbers in this range!
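If you are curious, the toy system is small enough to explore by brute force. The sketch below is only an illustration of the three-placeholder system described above, not of how real float values are stored. It uses Python's fractions module so every value is exact, and it collects values in a set because different digit choices can form the same number (1.0 × 10^1 and 0.1 × 10^2 are both ten).

```python
from fractions import Fraction

values = set()
for first in range(10):            # the digit before the decimal point
    for second in range(10):       # the digit after the decimal point
        for x in range(-9, 10):    # the exponent: one digit with a sign
            mantissa = Fraction(first * 10 + second, 10)   # d.d as an exact fraction
            values.add(mantissa * Fraction(10) ** x)

positive = sorted(value for value in values if value > 0)

print(len(positive))         # how many distinct positive numbers exist
print(float(positive[0]))    # the smallest positive number
print(float(positive[-1]))   # the largest number
print(float(positive[-2]))   # the second largest number
```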

The good news is that modern floating-point numbers use 64 bits and can thus represent up to 2^64 different possible values. That’s astronomically more precision (over 18 quintillion possible bit patterns!) than our toy system above, but it is neither infinite nor arbitrary precision. The motivating examples demonstrated there are often very small round-off errors in floating-point arithmetic that can be ignored. However, if you find yourself writing software that has no room for small errors in precision (rockets, bank software, medical devices, and so on), you will want to invest more time in understanding the trade-offs of floating-point numbers versus slower, more specialized options available for handling precise, numerical data.
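As a brief, non-exhaustive sketch of those more specialized options, Python's standard library includes the decimal and fractions modules, plus math.isclose for comparing float values within a small tolerance. The variable names are chosen only for this illustration.

```python
import math
from decimal import Decimal
from fractions import Fraction

float_sum = 0.1 + 0.2
print(float_sum == 0.3)               # False: the float sum is very slightly off
print(math.isclose(float_sum, 0.3))   # True: equal within a tiny tolerance

decimal_sum = Decimal("0.1") + Decimal("0.2")
print(decimal_sum == Decimal("0.3"))  # True: exact decimal arithmetic

fraction_sum = Fraction(1, 10) + Fraction(2, 10)
print(fraction_sum == Fraction(3, 10))  # True: exact fraction arithmetic
```

These types trade speed for exactness, which is why float remains the default for everyday work.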

We will use float for numbers with decimal points without hesitation or concern moving forward. For our purposes in COMP110, and most purposes you will encounter in computing (apps, games, most data science applications) the exceedingly small amounts of “round-off error” pose no problems and will be ignored moving forward.

str - Strings of Characters for “Textual” Data

The “textual” data type in Python is str, short for string of characters, which is common terminology across programming languages. Strings are used all the time in modern programs! Every application you use makes use of string values. Anywhere you see a label or are able to type text in, strings make it possible.

Why not simply call it text? Text often conveys a bias toward thinking in words and visible symbols like letters, whereas “characters” can be invisible (such as spaces, tabs, and new line breaks), non-“textual” (digits, code, special symbols, and so on), and from many different character sets (Arabic, Chinese, English, Emoji, German, and so on).

There are a few different syntaxes for working with str literals, but we will default to choosing double-quoted str literals, such as "hello, world". The characters following the first double-quote character " begin the contents of the string value and the matching double quote ends the str value. You can also use single quotes, such as 'hello, world', and the result is the same. Most programming languages use double-quoted strings, while some allow either, so we’ll settle on the syntax that’s more generalizable.

If a str begins and ends with a double quote, how can you have a double quote within a string? To solve this thorny issue, programming languages have settled on a technique called escaping which means placing a backslash character \ before some character with special meaning. This is best demonstrated through tinkering in an interactive Python session:

>>> "The students rejoiced, \"we love programming!\""
'The students rejoiced, "we love programming!"'

Notice the backslashes disappeared! Escape codes in str data only appear in the code you write and when that code is evaluated to become a true str object, the \" is interpreted as simply a ". Without the leading backslashes in this example, there would have been a syntax error.
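A few more expressions make the same point. This is a small sketch; the variable name quote is chosen only for this illustration.

```python
quote = "The students rejoiced, \"we love programming!\""

print(quote)                   # the backslashes are gone from the displayed text
print(len("\""))               # 1: the escape stands for a single character
print("\"" == '"')             # True: both literals build the same str
print("line one\nline two")    # \n is the escape code for a new line break
```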

It is worth pausing to emphasize the characters of a string have no meaning in our program beyond simply being characters. For example, the string "110" may have only digit characters in it, but it is not a number and you cannot do numerical computations with it. Try to, and you will see some surprising results:

>>> "110" + "110"

The add operator, when used with two str values, produces a third str value that is the first str’s characters immediately followed by the second’s. The technical term for this operation is concatenation, which comes from the Latin word catena, meaning “chain”. The chemical term catenation, for carbon atoms forming chained bonds with other carbon atoms, shares the same root. Here, characters are being “bonded” into a longer sequence.
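To see the difference side by side, here is a short sketch. It assumes you actually want a numerical sum, in which case the built-in int function can build an int object from a str made of digit characters. The variable names are hypothetical.

```python
text_sum = "110" + "110"              # concatenation of two str values
number_sum = int("110") + int("110")  # addition of two int values

print(text_sum)      # 110110
print(number_sum)    # 220
```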

A str is a Sequence of Characters

Through your programming journey you will encounter many types of data that are considered sequences of simpler values. In fact, you already have! Notice a str is a sequence of characters. The ordering of these characters is important! There’s a meaningful difference between “please” and “asleep”, even though they share the exact same characters.

We will explore sequences in much more depth soon, but while we are on the subject of str values, let’s foreshadow a bit with some examples of how to access individual items in a sequence. In a str value, each item is a character.

>>> "str!"[0]

>>> "str!"[1]

>>> "str!"[2]

>>> len("str!")

>>> "str!"[len("str!") - 1]

>>> 110[0]
TypeError: 'int' object is not subscriptable

Subscription syntax is formed by pairs of square brackets following a compound value. The individual items in a sequence can be accessed by their index inside of the square brackets. You can find the number of items in any sequence using the built-in len function. Notice the str value "str!" has a length of 4 because there are three letter characters followed by an exclamation point character. The quotes are not a part of the str’s contents once it is evaluated; they are simply how we denote str literals in our programs.

Index numbering starts from 0. This is true in Python, as well as in most general-purpose programming languages. It takes some getting used to! The first item in a sequence is always at index 0. In a sequence with 10 items, the last item’s index is 9. Since the len function will return the number of items in a sequence, you can always find the last item’s index by computing len(sequence) - 1.
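The indexing rules above can be sketched with a small helper. The name last_item is made up for this illustration, and the sketch assumes the str is not empty.

```python
def last_item(sequence: str) -> str:
    """Return the final character of a non-empty str."""
    last_index = len(sequence) - 1   # indexes start at 0, so the last is len - 1
    return sequence[last_index]


word = "programming"
print(word[0])           # p
print(len(word))         # 11
print(last_item(word))   # g
```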

Integers and floating-point values are not sequences; they are singular numerical values. Thus, you cannot subscript them like a str.

Docstrings are for documentation

A docstring is a special kind of string in Python used to document the programs you write. Typically you will write a docstring at the top of every file describing its purpose. As we begin defining subprograms, or functions, you will also write docstrings in them describing their purposes in plain English.

Docstrings are written for other humans to read, or for you to read in the future to help refresh the purpose of specific files and functions you authored. Rather than beginning and ending with a single double quote ", a docstring stands out because it begins and ends with three double quotes strung together. For example: """This is a docstring."""

As your programs become more complex, using docstrings will become more important. For now, when working in a stored program, begin each file with a docstring describing its purpose.
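Here is a minimal sketch of what a documented file might look like. The temperature example and the function name to_celsius are invented for illustration; only the triple-quote syntax and the __doc__ attribute come from Python itself.

```python
"""Convert a temperature from Fahrenheit to Celsius and display it."""


def to_celsius(fahrenheit: float) -> float:
    """Return the Celsius equivalent of a Fahrenheit temperature."""
    return (fahrenheit - 32) * 5 / 9


print(to_celsius(212.0))     # 100.0
print(to_celsius.__doc__)    # Python keeps a function's docstring with it
```

The first docstring describes the whole file, and the second describes one function.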

An object is a typed unit of data in memory

In the opening of this lesson it was said, “an object is a typed unit of data in memory”. There is a lot of information packed into that statement now that you know a bit more about memory and types! Types are important because they decide how data is interpreted and, as you will explore in the next lesson, inform how you can make use of the object and its capabilities. The operations you can perform on numbers, such as adding them or subtracting them, are different than the operations you can perform on strings. This is broadly true both in programming and the real world: when you have different types of objects you’ll want to do different things with them.
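As a closing sketch, the same + operator behaves differently depending on the types of its operands, and subtraction is simply not an operation str values support. The helper name describe_addition is hypothetical.

```python
def describe_addition(left, right):
    """Add two values and report the result along with the name of its type."""
    result = left + right
    return result, type(result).__name__


print(describe_addition(110, 110))       # (220, 'int')
print(describe_addition(0.5, 0.25))      # (0.75, 'float')
print(describe_addition("110", "110"))   # ('110110', 'str')

try:
    "110" - "110"                        # str values cannot be subtracted
except TypeError as error:
    print("TypeError:", error)
```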

  1. That’s 13,000,000,000,000,000,000,000 transistors produced! https://en.wikipedia.org/wiki/MOSFET

  2. In later computer science courses you will encounter systems programming languages, as you work closer to the actual machine, and will learn in much more depth the various binary systems behind integer, floating point, ASCII/UTF-8 data, and more.