5. Data types

This section covers the basic material that every program works with: data.

Data comes in two basic forms: literals and variables (cf. 1. Basic concepts). It is also distinguished by its type. Just as there are different file types for music and pictures and documents, there are different data types for text data, integer numbers and decimal numbers. In general, you will use the string data type ('str') for text, the 'int' data type for whole numbers, the 'float' data type for decimal numbers and the Boolean data type ('bool') for true or false data.
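As a quick illustration of the four types named above, the built-in type function reports the type of any value. This is only a sketch; the variable names and values are made up for the example.

```python
# Illustrative values only; the names are hypothetical.
greeting = "Hello"      # text -> str
count = 42              # whole number -> int
price = 3.75            # decimal number -> float
is_ready = True         # true/false value -> bool

# Print each value next to the name of its type.
for value in (greeting, count, price, is_ready):
    print(value, type(value).__name__)
```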

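Python does not have separate built-in versions of C-style data types such as char, double or long: text is always a string, an 'int' can hold whole numbers of any size, and a 'float' is usually implemented as a C double. C-style types are available through add-on modules such as ctypes and numpy, but you will not need to use these explicitly in a normal Python program.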

5.1. Strings

Strings are like sentences (and in fact, often are). They're sequences of typographical characters, like numbers and letters, and are delimited by several types of quotation marks:

a) Single quotes:
  • 'This is a string'
b) Double quotes:
  • "This is a string"
c) Triple quotes:
  • """This is a string
    that extends over several lines"""
Generally speaking, it makes no difference which sort of delimiter you choose to enclose your string in. The differences arise from the symbols that mean special things to Python. For example, you can't use a plain single quote mark in a string delimited by single quotes, since Python will interpret it as the end of the string. So if you have text that has a lot of single quotes, it's better to use double quotes to enclose your text, and vice versa. To display a special character in a string, you need to tell Python that it's not what it seems to be, via the 'escape symbol', the backslash:
  • print("Hello, it's me!") # prints: Hello, it's me!
    print('Hello, it\'s me!') # same thing
    print("Your dad said: \"I'm coming home late.\"") # prints: Your dad said: "I'm coming home late."
You can also indicate certain whitespace characters by escape sequences:
  • print("This is one line.\nThis will appear on the next line.") # the \n tells Python to output a newline
You also need to escape the backslash, to output a backslash:
  • print("The Windows system folder is C:\\Windows\\System") #prints: The Windows system folder is C:\Windows\System
A list of escape sequences follows:
  • \n Newline
    \t Tab
    \' Single quote
    \" Double quote
    \\ Backslash
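The escape sequences listed above can be tried out in a short script. This is a minimal sketch with invented sample strings; it also shows that each escape sequence becomes a single character in the resulting string.

```python
# Sample strings invented for this illustration.
tabbed = "name\tvalue"          # \t inserts a tab
two_lines = "first\nsecond"     # \n starts a new line
path = "C:\\Windows\\System"    # \\ produces one backslash

print(tabbed)
print(two_lines)
print(path)

# Each escape sequence counts as one character.
print(len("\n"), len("\\"))
```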
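Triple-quoted strings can contain unescaped single and double quotes, and they preserve line breaks and tabs typed inside them (previously mentioned with respect to docstrings). Escape sequences are still processed, so a literal backslash must still be written as \\. To switch escape processing off entirely, put an r in front of the opening quote (a 'raw' string).
  • """This text won't care that there are double quotes "" or single quotes ' ' strewn about,
        it even preserves tabs and newlines! A backslash is still written as \\ and
        it only ends when it sees another triple quote mark """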
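A hedged sketch of two related string forms: a triple-quoted string holding quotes and a line break, and a raw string (written with an r prefix), which is the form that leaves backslashes exactly as typed. The sample text and path are invented.

```python
# A triple-quoted string may contain both kinds of quote marks and real line breaks.
note = """She said: "it's fine"
and then left."""

# A raw string (r prefix) keeps every backslash exactly as typed.
raw_path = r"C:\new\table"

print(note)
print(raw_path)
```

Without the r prefix, the \n and \t in the path would be read as a newline and a tab.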

5.1.1. String operations

String manipulation is one of the most common things that simple programs are written to do. There are a few key operations that you'll use frequently: concatenating strings; getting the length of a string; extracting subsets of a string; modifying subsets of a string; and checking if a string includes a given substring.

Concatenating strings is easy in Python: you just add them with +:
  • firstword = "Hello"
    secondword = "world!"
    phrase = firstword+", "+secondword
    print(phrase) # prints "Hello, world!"
As mentioned previously, you can also repeat strings with *:
  • phrasetwice = phrase*2 #"Hello, world!Hello, world!"
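The two operators above can be combined in a short runnable sketch. One point worth seeing in practice is that + only joins strings to strings, so a number has to be converted with str first. The variable visits is a hypothetical value added for illustration.

```python
firstword = "Hello"
secondword = "world!"

phrase = firstword + ", " + secondword
print(phrase)

# + only joins strings to strings, so numbers must be converted first.
visits = 3  # hypothetical value
message = phrase + " Visit number " + str(visits)
print(message)

# Repetition with * takes a string and an integer.
divider = "-" * 20
print(divider)
```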
To extract a substring from a string, you should understand how indexing works. Consider the string "phrase" in the example above ("Hello, world!"). This string is a sequence of characters, each of which has a number that indicates its position in the string, starting with 0. So phrase[0] is "H", phrase[1] is "e" and so on. You can also refer to characters by a negative number counting from the end of the string: phrase[-1] is "!", phrase[-2] is "d" and so on. To refer to a substring of more than one letter, use a colon. The substring starts at the first index and stops just before the second:
  • phrase[1:5] # "ello"
    phrase[-6:-1] # "world"
    phrase[1:-1] # "ello, world"
Including the colon but omitting one of the numbers extends the reference to the beginning or end of the string:
  • phrase[1:] # "ello, world!"
    phrase[:-1] # "Hello, world"
This functionality can be useful for cutting out the last letter of a string, for example.
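The indexing rules can be checked with a few lines of code. This sketch assumes the same "Hello, world!" string as above; the key thing to notice is that a slice runs from the start index up to, but not including, the end index.

```python
phrase = "Hello, world!"

print(phrase[0])      # H
print(phrase[-1])     # !
print(phrase[7:12])   # world
print(phrase[:5])     # Hello
print(phrase[:-1])    # Hello, world  (last character cut off)
```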

To find out the length of a string, use the built-in len function:
  • len(phrase) # returns the integer number 13
There are two useful ways of finding if a string includes a particular substring (without getting into the wide world of regular expressions).

The "find" method searches for the given substring and returns the position index of the first match, or -1 if there is no match:
  • phrase.find("world") # returns 7
Using the "count" method returns the number of times the given substring occurs in the test string:
  • phrase.count("l") # returns 3
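A short sketch of both methods on the same string, including the case where nothing is found. The in operator shown at the end is not covered in the text above; it is included here as a common alternative when only a yes/no answer is needed.

```python
phrase = "Hello, world!"

print(phrase.find("world"))   # 7
print(phrase.find("Mom"))     # -1, because "Mom" does not occur
print(phrase.count("o"))      # 2

# When only a yes/no answer is needed, the in operator reads more naturally.
if "world" in phrase:
    print("found it")
```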
To replace one substring with another, the replace method works well:
  • greeting = phrase.replace("world", "Mom") # greeting = "Hello, Mom!"
You can split a string into a list of its component words:
  • words = phrase.split() # words[0] = "Hello,"; words[1] = "world!"
(You could specify a different separator character by giving an argument inside the parentheses of the split method. This is occasionally useful when turning string data delimited by tabs or commas into lists.)
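As a sketch of the separator argument, the following splits comma-delimited and tab-delimited text. The sample records are invented, and the join method at the end is an addition not covered above; it performs the reverse operation.

```python
# Hypothetical comma-delimited record.
record = "apple,banana,cherry"
fields = record.split(",")
print(fields)

# Tab-delimited data works the same way with "\t" as the separator.
row = "1\t2\t3"
numbers = [int(item) for item in row.split("\t")]
print(numbers)

# join glues a list of strings back together with the given separator.
rejoined = ",".join(fields)
print(rejoined)
```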

Lastly, you can convert strings into upper or lower case with the 'upper' and 'lower' methods:
  • print(phrase.upper()) # outputs "HELLO, WORLD!"
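One common use of these methods, sketched here with an invented user answer, is comparing text without regard to case. Note that both methods need the parentheses to be called, and that they return a new string rather than changing the original.

```python
phrase = "Hello, world!"
print(phrase.upper())
print(phrase.lower())
print(phrase)  # the original string is unchanged

# Hypothetical user answer; lowercasing it makes the comparison case-insensitive.
answer = "YES"
is_yes = answer.lower() == "yes"
print(is_yes)
```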

5.2. Lists & Arrays

Arrays are rather like matrices in linear algebra. In regular Python, however, you will generally deal with lists. A list is an ordered collection of things, indexed in all the same ways strings are:
  • phrase = ["an", "apple", "a", "day"]
    print(phrase[0]) # "an"
    print(phrase[2]) # "a"
    print(phrase[1:3]) # "['apple', 'a']"
You can have lists of lists (multidimensional lists):
  • threesquarematrix = [[1,2,3],[4,5,6],[7,8,9]]
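Elements of a list of lists are reached by indexing twice: first the row, then the position within that row. The sketch below also changes one element in place, which lists allow and strings do not.

```python
threesquarematrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

print(threesquarematrix[0])      # first row: [1, 2, 3]
print(threesquarematrix[1][2])   # row 1, position 2: 6

# Unlike strings, lists can be changed in place.
threesquarematrix[1][2] = 60
print(threesquarematrix[1])
```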
Lists can be treated in much the same way strings can: you can join them with +, repeat them with * and measure them with len.

There is also something called a tuple, which is mostly the same as a list, but created with round parentheses rather than square brackets, and whose contents cannot be changed after creation (i.e., trying to assign tuple[3] = "orange" will result in an error).
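The difference from a list can be seen in a few lines. This is a minimal sketch with an invented tuple; the error Python raises on assignment is a TypeError, which is caught here so that the script keeps running.

```python
fruits = ("apple", "pear", "plum", "fig")  # hypothetical tuple

print(fruits[3])     # indexing works as for lists
print(fruits[1:3])   # so does slicing

try:
    fruits[3] = "orange"
except TypeError as error:
    print("cannot modify a tuple:", error)
```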

5.2.1. Array operators

There is also a true array data type, provided by the numpy package, which can be created from a list or tuple with the 'array' function:
  • array([1,2,3,4])
Note the presence of both the parentheses and square brackets. The square brackets denote a list, while the call to array (the parentheses) turns that list into an array.

To use arrays, you must include one of the following lines at the start of your code:
  • import numpy # the function is then called as numpy.array
  • from numpy import * # the function is then called as plain array
For more on arrays, refer to the numpy tutorial.

5.3. Objects; conversion of data types

Python is an object-oriented programming language, which means that basically, everything in Python is an object. Python hides most of this functionality from you unless you need it. In general, when you write a line such as the following:
  • phrase.replace("world", "Mom")
you are invoking the "method" 'replace' on the string object named phrase with the arguments given in the parentheses.
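To make the idea of a method call more concrete, the sketch below shows that replace returns a new string and leaves the original object alone, and that the same call can be written through the str type itself. The last line, an addition for illustration, shows that even plain numbers are objects with methods.

```python
phrase = "Hello, world!"

greeting = phrase.replace("world", "Mom")
print(greeting)
print(phrase)   # unchanged: replace returns a new string

# The same call written through the str type itself.
same_greeting = str.replace(phrase, "world", "Mom")
print(same_greeting)

# Numbers are objects with methods too.
print((255).bit_length())
```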

Remember, while every variable in Python is an object, different objects may be of different types even if they contain the same information (e.g. the string "5" is not the same as the number 5), which is why data type conversion is important.

The following functions are useful:
  • b = int(a) # interprets variable 'a' as an integer number and puts it into variable 'b' (truncating any decimals, i.e., rounding toward zero)
    b = float(a) # interprets variable 'a' as a floating point (decimal) number and puts it into variable 'b'
    b = str(a) # formats the variable 'a' into a nicely-printable string
    b = bin(a) # turns the integer 'a' into a binary string (i.e. a string of 1s and 0s prefixed with "0b")
N.B. Remember, the input function always returns a string! So convert your result to the appropriate type before doing anything with it.
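The conversions above can be sketched as follows. Because text typed by a user may not be a valid number, the hypothetical helper parse_age (a name invented for this example) catches the ValueError that int raises and returns None instead. In a real program its argument would typically come from the input function.

```python
def parse_age(text):
    """Convert text such as '42' to an int, or return None if it is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None

print(parse_age("42"))
print(parse_age("forty-two"))

print(int(3.9), int(-3.9))   # truncation toward zero: 3 -3
print(float("2.5") * 2)      # 5.0
print(str(5) + "5")          # string concatenation: 55
print(bin(10))               # 0b1010
```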
