2025-01-05
Julia is a modern programming language that is commonly used for numerical analysis and scientific computing. It combines the speed of languages like C++ or Fortran with the ease of use of Matlab or Python. This is because Julia was designed to solve the “two-language problem”: software is often first developed in a dynamic language like Python and then re-implemented in a statically typed language for better performance. With Julia, you get the best of both worlds:
Julia walks like Python, and runs like C++.
Other Resources:
Warning
This course is fairly fast-paced.
It is assumed that the reader is already familiar with a programming language such as MATLAB, Python or C++.
I will be making comparisons to these languages throughout the course.
Let’s start with a simple hello-world. The print function is called just like in Python, but unlike Python’s print it does not append a newline:
There is also the println() command, which is exactly the same except that it ends the output with a newline character.
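A minimal sketch of the difference between the two commands. The `sprint` calls are only there to capture the output as a string so that the trailing newline becomes visible; the variable names are illustrative.

```julia
# print writes its arguments without a trailing newline
print("Hello, ")
print("World!")
println()                        # emits only the newline

# println appends the newline itself
println("Hello, World!")

# Capture the output as strings to make the difference explicit
without_newline = sprint(print, "Hello")     # "Hello"
with_newline = sprint(println, "Hello")      # "Hello\n"
```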
Of course, you can use Julia like a calculator:
Note that division implicitly converts the input into float; if you want to do integer division, use div(n, m) or the equivalent operator n ÷ m.
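A short sketch of the arithmetic operators mentioned above; the variable names are illustrative.

```julia
quotient = 7 / 2          # 3.5, a Float64 even though both operands are integers
exact = 6 / 3             # 2.0, still a Float64
int_quotient = div(7, 2)  # 3, integer division truncates toward zero
int_operator = 7 ÷ 2      # 3, the operator form of div
remainder = 7 % 2         # 1
```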
Julia provides a very flexible system for naming variables. In the Julia REPL, you can enter mathematical symbols and other Unicode characters by typing a LaTeX-like abbreviation followed by Tab; for example, the Greek letter π can be typed via \pi<TAB>.
This makes it possible to translate mathematical formulas into code in a very elegant way.
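As a small illustration, here is how two formulas could be written with Unicode names. The variables `r`, `α`, `area`, and `identity_check` are chosen for this sketch only.

```julia
r = 2.0
area = π * r^2                          # area of a circle with radius r

α = 0.5                                 # typed as \alpha<TAB> in the REPL
identity_check = sin(α)^2 + cos(α)^2    # should be (approximately) 1
```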
There are a lot of built-in math functions:
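A few of the functions from Julia's standard library, as a non-exhaustive sample:

```julia
root = sqrt(16.0)           # 4.0
magnitude = abs(-3)         # 3
exponential = exp(0.0)      # 1.0
logarithm = log(1.0)        # 0.0
rounded = round(2.7)        # 3.0 (still a Float64)
floored = floor(Int, 2.7)   # 2 (converted to Int)
hypotenuse = hypot(3, 4)    # 5.0
```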
You might be wondering what happens when you try to overwrite a built-in function or symbol:
Like Python, Julia is a dynamically typed language. This means that variables do not have a fixed data type like in C++, but can point to different data via dynamic binding.
Consider two variables, x and y. After assigning y to x, both variables point to the same memory location; no data is being copied.
Dynamic Variable Binding
Figure was created with app.diagrams.net and is hereby licensed under Public Domain (CC0)
In Python you can use the built-in id() function to see what’s actually going on:
As you can see, after the assignment, both variables have the same memory address. Something like that would not be possible in C++.¹
This distinction may seem trivial, but has some important implications when dealing with mutable types, whose contents can be changed:
As no copy is being made, any change made through variable a will also be visible through variable b. To actually make a deep copy, use the deepcopy() function¹:
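A minimal sketch of this behaviour with an array, which is a mutable type. The names `a`, `b`, and `c` are illustrative.

```julia
a = [1, 2, 3]
b = a                 # b is bound to the same array as a; nothing is copied
a[1] = 42             # the change is visible through b as well
same_object = (a === b)

c = deepcopy(a)       # an independent copy
a[2] = -1             # does not affect c
```

After running this, `b[1]` is 42, while `c[2]` is still 2.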
Warning
For performance reasons, avoid binding values of different types to the same variable.
Code that avoids changing the type of a variable is called type stable.
You can see the type of a variable with the typeof() function:
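For example (the integer results assume a 64-bit system, where `Int` is an alias for `Int64`):

```julia
t_int = typeof(1)            # Int64 on a 64-bit system
t_float = typeof(1.0)        # Float64
t_string = typeof("abc")     # String
t_char = typeof('a')         # Char
t_complex = typeof(1 + 2im)  # Complex{Int64} on a 64-bit system
```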
On 64-bit systems, Julia uses 64 bits for integers by default (the Int alias follows the platform word size); floating-point literals are Float64. Other types available are:
Int8, Int16, Int32, Int64, Int128, BigInt
UInt8, UInt16, UInt32, UInt64, UInt128
Float16, Float32, Float64, BigFloat
To define a variable of a given size, call the type name as a constructor, e.g. x = Int16(100). For example, to define an integer of arbitrary length, use BigInt, e.g. x = BigInt(100).
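A short sketch of constructing values of a specific type; the variable names are illustrative.

```julia
small = Int16(100)        # 16-bit signed integer
single = Float32(1.5)     # 32-bit floating point number
huge = BigInt(2)^100      # arbitrary-precision integer, far beyond the Int64 range
```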
As specified in the IEEE 754 standard, floating point numbers support Inf and NaN values.
Most real numbers can only be approximated by floating point numbers, so a direct comparison using a == b may give unexpected results:
This is a general problem with floating point numbers, and exists in other programming languages as well.
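The classic example, together with the special values mentioned earlier:

```julia
s = 0.1 + 0.2
equal = (s == 0.3)          # false: s is 0.30000000000000004

infinity = 1.0 / 0.0        # Inf
not_a_number = 0.0 / 0.0    # NaN
nan_equal = (NaN == NaN)    # false: NaN is not equal to anything, including itself
```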
The machine precision can be obtained with eps(), which gives the distance between 1.0 and the next larger representable floating-point value:
Using that, we can implement a function isapprox(a, b) to test whether two numbers are approximately equal:
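One possible sketch of such a function. It is named `approx_equal` here (a hypothetical name) to avoid clashing with the `isapprox` function that Julia already provides, and the default relative tolerance `sqrt(eps())` is an assumption of this sketch.

```julia
# Hypothetical helper: compares two numbers using a relative tolerance
function approx_equal(a::Real, b::Real; rtol = sqrt(eps()))
    return abs(a - b) <= rtol * max(abs(a), abs(b))
end

machine_eps = eps()                       # 2.220446049250313e-16 for Float64
close = approx_equal(0.1 + 0.2, 0.3)      # true
far = approx_equal(1.0, 1.001)            # false
```

In practice you would normally use the built-in `isapprox(a, b)` or its operator form `a ≈ b`.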
When multiplying a variable by a numeric literal coefficient, you can omit the multiplication symbol *.
As a consequence, coefficients have a higher priority than other binary operations (“multiplication via juxtaposition”):
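A few examples of how juxtaposed coefficients are parsed; the variable names are illustrative.

```julia
x = 3
double = 2x             # 6, same as 2 * x
polynomial = 2x^2       # 18, parsed as 2 * (x^2)
power = 2^3x            # 512, parsed as 2^(3x)
fraction = 1 / 2x       # parsed as 1 / (2x), not (1 / 2) * x
```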
As in other programming languages, exceeding the maximum representable value of a given integer type results in wraparound behaviour:
In this sense, calculating with integers is always a form of modulo arithmetic.
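A brief sketch of the wraparound, using `typemax` and `typemin` to query the limits of a type:

```julia
largest = typemax(Int8)                       # 127
wrapped = largest + Int8(1)                   # -128
wrapped64 = typemax(Int64) + 1                # equals typemin(Int64)
unsigned_wrap = UInt8(255) + UInt8(1)         # 0x00

# Switching to BigInt avoids the overflow
no_overflow = BigInt(typemax(Int64)) + 1      # 9223372036854775808
```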
Control structures such as branches and loops are easy to implement in Julia; the syntax is very similar to MATLAB:
Just as in C++, Julia supports the ternary operator (condition ? a : b):
Multiple logical conditions can be combined with the logical operators && and ||; comparisons can also be chained:
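The three constructs side by side, as a minimal sketch. The function names `classify`, `sign_label`, and `in_unit_interval` are made up for this example.

```julia
# if / elseif / else block
function classify(x)
    if x < 0
        return "negative"
    elseif x == 0
        return "zero"
    else
        return "positive"
    end
end

# Ternary operator
sign_label(x) = x >= 0 ? "non-negative" : "negative"

# Combining conditions with &&; the chained form 0 <= x <= 1 is equivalent
in_unit_interval(x) = 0 <= x && x <= 1
```

For example, `classify(-2)` returns `"negative"` and `in_unit_interval(0.5)` returns `true`.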
To iterate over a range or an array, use a for-each loop:
This can be used to iterate over a specific range:
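A small sketch of a for-each loop over arrays and ranges. The helper `sum_of_squares` is a hypothetical name; wrapping the loop in a function also avoids scoping surprises with global variables.

```julia
function sum_of_squares(values)
    total = 0
    for v in values          # works for arrays, ranges, and other iterables
        total += v^2
    end
    return total
end

from_array = sum_of_squares([1, 2, 3])   # 14
from_range = sum_of_squares(1:10)        # 385
from_step = sum_of_squares(1:2:9)        # 165, iterating over 1, 3, 5, 7, 9
```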
Exceptions are a way of dealing with unexpected errors. When such an error occurs, it is best to deal with the problem as early as possible. Throwing an exception aborts the current function and unwinds the call stack until it reaches a point where the exception is caught.
For example, the sqrt function throws a DomainError when applied to a negative real value:
An exception like this can be thrown using the throw function:
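A minimal sketch of throwing and catching a `DomainError`. The wrapper `safe_sqrt` is a hypothetical function written for this example.

```julia
# Hypothetical wrapper that throws explicitly for negative input
function safe_sqrt(x::Real)
    x < 0 && throw(DomainError(x, "argument must be non-negative"))
    return sqrt(x)
end

# Catch the exception and fall back to NaN; other exceptions are passed on
result = try
    safe_sqrt(-1.0)
catch err
    err isa DomainError ? NaN : rethrow()
end
```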
There are many built-in exceptions available.
Exception |
---|
DomainError |
ArgumentError |
BoundsError |
OverflowError |
You may also define your own exceptions in the following way:
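One way this could look is sketched below. The exception type `InsufficientFundsError` and the function `withdraw` are invented for illustration; the optional `Base.showerror` method only customises the error message.

```julia
# A custom exception is a struct that subtypes Exception
struct InsufficientFundsError <: Exception
    shortfall::Float64
end

# Optional: customise how the error is displayed
Base.showerror(io::IO, e::InsufficientFundsError) =
    print(io, "InsufficientFundsError: short by ", e.shortfall)

function withdraw(balance, amount)
    amount > balance && throw(InsufficientFundsError(amount - balance))
    return balance - amount
end

caught = try
    withdraw(10.0, 25.0)
    nothing
catch err
    err
end
```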
An error raised with the error() function is an exception of type ErrorException. It can be used to interrupt the normal control flow.
Simple functions can be defined via:
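For instance, using the compact assignment form and an anonymous function (the names are illustrative):

```julia
square(x) = x^2            # one-line function definition
increment = x -> x + 1     # anonymous function bound to a variable

four = square(2)
three = increment(2)
```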
More advanced functions are defined using the function keyword:
Note that we use the @assert macro to ensure that the arguments are positive.
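A sketch of such a function; `geometric_mean` is a hypothetical example chosen because it only makes sense for positive arguments.

```julia
function geometric_mean(a, b)
    # Throws an AssertionError if the condition does not hold
    @assert a > 0 && b > 0 "both arguments must be positive"
    return sqrt(a * b)
end

mean_value = geometric_mean(2, 8)   # 4.0
```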
Functions can be applied element-wise to arrays using the dot notation, f.(x):
The same can be achieved with the map(f, arr) function:
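Both variants side by side, with an illustrative function `g`:

```julia
g(x) = x^2 + 1
xs = [1, 2, 3]

broadcasted = g.(xs)          # [2, 5, 10]
mapped = map(g, xs)           # same result
roots = sqrt.(xs .+ 1)        # the dot also works for operators and built-in functions
```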
Functions in Julia can have positional arguments and keyword arguments, which are separated by a semicolon (;).
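A minimal sketch with two positional and two keyword arguments. The function `power_sum` and its parameters are invented for this example.

```julia
# x and y are positional; exponent and scale are keyword arguments with defaults
function power_sum(x, y; exponent = 2, scale = 1.0)
    return scale * (x^exponent + y^exponent)
end

default_call = power_sum(1, 2)                  # 5.0
cubic = power_sum(1, 2; exponent = 3)           # 9.0
scaled = power_sum(1, 2, scale = 0.5)           # 2.5, a comma also works at the call site
```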
Sometimes it is convenient to write functions which can take an arbitrary number of arguments. Such a function is called a varargs function. You can define a varargs function by following the last positional argument with an ellipsis:
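A sketch of a varargs function; the name `total` is illustrative.

```julia
function total(first, rest...)
    # rest is a tuple holding all remaining positional arguments
    s = first
    for r in rest
        s += r
    end
    return s
end

single = total(1)                 # 1, rest is the empty tuple
several = total(1, 2, 3, 4)       # 10
splatted = total([1, 2, 3]...)    # 6, the ellipsis at the call site unpacks a collection
```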
Note
Note that the varargs mechanism works differently in Julia than in C++. In C++, a fold expression such as (... + args) is expanded at compile time over a parameter pack, meaning that the expression is evaluated as ((((x1 + x2) + x3) + x4) + ... ).
In Julia, however, it is much simpler: the varargs argument is just a tuple that you can iterate over.
Important
As a convention in Julia, functions that modify an argument should have a ! at the end.
For example, sort() and sort!() both sort an array; however, sort() returns a sorted copy, while sort!() sorts the array in place.
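The difference in a short example (variable names are illustrative):

```julia
v = [3, 1, 2]
sorted_copy = sort(v)     # returns a new sorted array
unchanged = copy(v)       # v itself is still [3, 1, 2] at this point

sort!(v)                  # sorts v in place
```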
Exercise
Implement a function which calculates the sine of a real number x.
\[ \sin(x) = \sum_{k=0}^\infty (-1)^k \frac{x^{2k+1}}{(2k+1)!} \]
One can think of a String as an array of characters with some convenience functions. Julia supports Unicode characters via the UTF-8 encoding.
As in Java and Python, strings are immutable. The value of a string object cannot be changed.
To change a character in a string, you have to first convert the string to an array, modify the desired character, and then join the array back into a string:
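A minimal sketch of these three steps, using `collect` and `join`:

```julia
s = "Hello"
chars = collect(s)     # Vector{Char}: ['H', 'e', 'l', 'l', 'o']
chars[1] = 'J'         # arrays are mutable, so this is allowed
t = join(chars)        # "Jello"; the original string s is unchanged
```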
There is a dedicated type for single characters, Char, which is a subtype of AbstractChar:
Keep in mind that not all integer values are valid Unicode code points. For performance, the Char conversion does not check that every value is valid; use isvalid(Char, x) to check explicitly.
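A few operations on characters, as a brief sketch:

```julia
c = 'A'
code = Int(c)                           # 65, the Unicode code point
next_char = c + 1                       # 'B', characters support simple arithmetic
from_code = Char(0x42)                  # 'B'
is_abstract = c isa AbstractChar        # true
valid = isvalid(Char, 0x110000)         # false, beyond the Unicode range
```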
String literals are delimited by double quotes (not single quotes):
A SubString is a view into another string. It does not copy the character data, but instead references the original string.
# Range indexing
str = "Hello, World!"
substring_copy = str[1:5] # Creates a new string copy
println(substring_copy) # Outputs: Hello
# SubString constructor
str = "Hello, World!"
substring_view = SubString(str, 1, 5) # Creates a view into the original string
println(substring_view) # Outputs: Hello
So while both methods can extract a substring, the SubString function is more memory-efficient as it does not create a new string but rather a view into the original string.
As mentioned above, Julia supports Unicode characters. Because UTF-8 is a variable-length encoding, you cannot step through a string by consecutive integer indices as you can with a normal array. Not every integer is a valid index.
This also means that the number of characters in a string is not always the same as the last index.
julia> str
"∀ x ∃ y"
julia> length(str)
7 # number of characters
julia> lastindex(str)
11 # last index
To iterate through a string, you can use the string as an iterable object:
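A short sketch using the string from the example above. `eachindex` returns only the valid indices, which shows the gaps left by multi-byte characters.

```julia
str = "∀ x ∃ y"

# Iterating yields one Char per character, regardless of its byte length
chars = [c for c in str]

# Only these indices are valid; ∀ and ∃ occupy three bytes each
valid_indices = collect(eachindex(str))   # [1, 4, 5, 6, 7, 10, 11]
second = nextind(str, 1)                  # 4, the index of the character after ∀
```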
Multiple strings can be concatenated:
The choice of * to concatenate strings may seem unusual, but mathematically it makes sense, since concatenation is a non-commutative operation.
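For example (variable names are illustrative):

```julia
greeting = "Hello" * ", " * "World!"
same_greeting = string("Hello", ", ", "World!")   # function form of concatenation

# The order of the operands matters
ab = "ab" * "cd"    # "abcd"
ba = "cd" * "ab"    # "cdab"
```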
You can interpolate variables and expressions into a string with the $ character:
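A minimal sketch; use `$(...)` for full expressions and `\$` for a literal dollar sign.

```julia
name = "Julia"
n = 3
msg = "Hello, $name! $n squared is $(n^2)."   # "Hello, Julia! 3 squared is 9."
price = "Cost: \$5"                            # escaped, no interpolation
```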
Basic string operations
julia> "Avocado" < "Coffee"
true
julia> findfirst("and", "Avocados and Chocolate and Coffee.")
10:12
julia> findall("and", "Avocados and Chocolate and Coffee.")
2-element Vector{UnitRange{Int64}}:
10:12
24:26
To repeat a string multiple times, use repeat:
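For example:

```julia
tripled = repeat("ab", 3)    # "ababab"
powered = "ab"^3             # "ababab", the operator form
ruler = repeat('-', 10)      # "----------", also works for single characters
```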
Two other very handy operations are split and join:
These functions are very useful for handling CSV data.
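A sketch of parsing a single comma-separated line; the sample data in `line` is made up for this example.

```julia
line = "Berlin,3645000,891.7"          # hypothetical record: name, population, area
fields = split(line, ",")              # ["Berlin", "3645000", "891.7"]

population = parse(Int, fields[2])     # convert text to numbers
area = parse(Float64, fields[3])

rejoined = join(fields, ";")           # "Berlin;3645000;891.7"
```

Note that `split` returns `SubString` views into the original line rather than new strings.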
Julia uses Perl-compatible regular expressions (regexes), as provided by the PCRE library.
Regular expressions are a common concept found in other programming languages, so there is no need to go into detail here. For a quick refresher, I refer the reader to the Python regex documentation and the tutorial on regular-expressions.info.
For example, to match comment lines, you can use the following regex:
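One possible pattern, assuming that a comment line is a line whose first non-whitespace character is `#`. The names `comment_re` and `comment_text` are illustrative.

```julia
comment_re = r"^\s*#"

is_comment = occursin(comment_re, "# a comment")            # true
is_indented = occursin(comment_re, "   # indented comment") # true
is_code = occursin(comment_re, "x = 1  # trailing comment") # false

# A capture group extracts the text after the comment marker
m = match(r"^\s*#\s*(.*)$", "  # TODO: tidy up")
comment_text = m.captures[1]                                # "TODO: tidy up"
```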
Exercise
Write a regular expression to parse bibliography data in the following format:
surname, forename, and surname2, forename2. year. Title. Publisher.
Example:
Lauwens, Ben, and Allen B. Downey. 2019. Think Julia. O’Reilly Media.
Solution
julia> m = match(r"^(?P<names>.*)\. (?P<year>\d{4})\. (?<title>.*)\. (?<publisher>.*)\.$", str)
RegexMatch("Lauwens, Ben, and Allen B. Downey. 2019. Think Julia. O’Reilly Media.", names="Lauwens, Ben, and Allen B. Downey", year="2019", title="Think Julia", publisher="O’Reilly Media")
julia> authors = split(m["names"], " and ")
2-element Vector{SubString{String}}:
"Lauwens, Ben,"
"Allen B. Downey"
julia> year = parse(Int, m["year"])
2019
Symbols are a special type of immutable data that represent identifiers or names. They are denoted by a colon (:) followed by the name, such as :example.
The advantage of symbols over strings is that they offer very efficient comparisons:
julia> using BenchmarkTools
julia> @btime "abcd" == "abcd"
5.632 ns (0 allocations: 0 bytes)
true
julia> @btime :abcd == :abcd
0.025 ns (0 allocations: 0 bytes)
true
In this sense, symbols are very similar to enums, except that they do not provide type safety: all symbols are of type Symbol, whereas enums have their own distinct types.
Symbols are also used for meta-programming, which we will learn more about later.
In many data science applications we have to deal with strings that are only a few characters long. For example, city names are usually very short, and country codes are only two characters long.
For better performance, it is advantageous to store such data using a fixed-width string. This can be done using the InlineStrings.jl package, which provides eight fixed-width string types of up to 255 bytes.
julia> using InlineStrings
julia> country = InlineString("South-Korea")
"South-Korea"
julia> typeof(country)
String15
TODO: Move this to chapter 5.
It is possible to store additional information inside a string by