Julia is a fast, flexible and robust language. Having used Julia for 2 weeks, and Python for 7 years, I can already say I prefer Julia. It is not as mature as Python, but I believe it has the potential to far exceed it.
It was Ars Technica’s “The unreasonable effectiveness of the Julia programming language” that finally convinced me to learn the Julia programming language. For years, I’ve heard whispers and glowing praises about Julia. It’s a dynamic programming language, but is said to be much faster than Python, and even as fast as C. The terse, clean syntax is supposed to allow very generic but also robust code creation, reducing friction to collaboration. On top of that, it has math friendly syntax like in Matlab or R but also with real maths symbols. All of this has been driving adoption of Julia in academia, a highly scientific, collaborative environment with very high computational needs. The adoption in industry has been predictably slower.
So does Julia live up to the hype? I decided to find out. To this end, I completed the recently released Julia Academy course “Computational Modeling in Julia with Applications to the COVID-19 Pandemic”. This is a 16 hour course which works with real Covid-19 pandemic data, and teaches you how to implement SIR epidemiology models as well.1 See the above picture. After completing this course I challenged myself to rewrite my Random Forest Python code in Julia and also my corresponding blog post. You can see the Julia code here and the twin blog post here. This resulted in code of similar length, but that was 9 times faster and that felt much more robust.
The table below has a quick comparison of the Python and Julia Random Forests fitting times. This was on the Universal Bank Loan data, with 4000 training samples and 1000 test samples. The random forest had 20 trees, with each tree having 40 to 120 leaves. Tests were run from the Anaconda CMD and Julia REPL.
|Number of runs||Scikit-learn||Python||Julia|
|Fitting time (s)*||10||0.04810 ± 0.00951||6.87132 ± 0.31106||0.73991 ± 0.04208|
|ratio mean times||10||1.00||142.85||15.38|
|test accuracy (%)||10||98.43 ± 0.58||98.56 ± 0.46||98.66 ± 0.37|
Scikit-learn (written in Cython) is still the fastest, but that code is more heavily optimised than mine, and also it use parallel processing.
In general, my experience with Julia was very positive. It is indeed fast and powerful, and the syntax is very nice. It is not an upgrade to Python, but I am going to frame much of this article as such. It operates under some very different paradigms - for example, Julia is very much a functional programming language, whereas Python is object-orientated. The creators themselves tried to integrate the best of several different languages into Julia including Python, R, Matlab, Ruby, C and Lisp (see their 2012 release statement). But my experience is mostly with Matlab, C++ and Python. Of these, Julia is most likely to replace the code I write with Python.
C++ is a complex and powerful static typed language with memory management capabilities, and I don’t see Julia replacing mission critical software written with it. Matlab is proprietary software which has great support for specialised scientific computing. Its Simulink control software and image processing toolbox are the nicest of their kind that I’ve used. Python, however, is a different story.
Python is easy to learn and lovely to tinker with, but as soon as your project expands that joy dissipates. It’s noticeably slower than other languages. It has no type checking and very generous scoping rules. This makes you think less when writing your own code, which is great, but it makes you think more when reviewing someone else’s. Did they intend this variable to be an array? A string? A custom type? Have I accidently reused an identifier? Popular packages go out of their way to not use Python for core processing, including Numpy, Scikit-learn and TensorFlow (they use, C, C++ and Cython). Julia promises to fix these many issues, and it’s a relief.
I want to state upfront that my main frustration with Julia is that it is not as mature as Python. The release of Julia 1.0 was just over two years ago; Python 1.0 was released 25 years ago. The Julia community is playing catch-up with Python and has the second-mover advantage of knowing what works and what doesn’t. But there simply are not as many packages, features, tutorials or videos as Python has. Fewer people ask questions on StackOverflow or Discourse. Like many open-source projects, the documentation is often lacking.2 The language itself is changing fast, and some code on Julia Academy’s own online tutorials is already out of date.3 Then there is all the massive amounts of legacy code in companies and institutions. So if you’re a beginner programmer, you can stop reading now. My advice is focus on Python. It has more resources and will get you further. But if you can relate to my frustrations with Python or have more of your own, read on.
So what makes Julia special? A good summary is given by Serdar Yegulalp at www.infoworld.com, which I’ll briefly repeat here:
gaussian(x, μ, σ) = 1/(σ*√(2π))*exp(-(x-μ)^2/(2σ^2)). A disadvantage is that string indexing sometimes breaks because Unicode characters can take two or more bytes. So Base functions like
eachind()should be used to handle strings properly.
quote. So Julia programs can generate other Julia programs.
The next three sections are Things I like about Julia, Neutral issues about Julia, and Things I dislike about Julia. Lastly there are small Annoyances I would like to vent after using it for 2 weeks. I guess no programming language is perfect.
Like all programming languages, there was a learning curve to Julia. This was made more difficult by the lack of resources. However generally it was easy to learn coming from a Python background and there are many things I definitely prefer about Julia.
This really does balance controlling types with giving the user flexibility.
For my random forest code, it essentially provided a way to make object specific functions, even though Julia is a functional language.
For example, the
RandomForestClassifier have different
fit!() methods associated with each of them.
I had no problem calling the
fit!() method from within the
A mistake I made at first is to use
AbstractFloat inside the struct, whereas the recommendation is to always have concrete types in definitions.
This definitely slowed down my code, which is why I added the type to the struct definition:
I also added an outer constructor to set this to
Float64 as a default.
Multiple dispatch can easily be abused because finding the best fit is a problem that grows exponentially with the number of different arguments.
However most functions have a very low number of methods associated with them.4
Another fault is that sometimes it is not fully unambiguous which method should be called, especially with
Union data types.
But this is being worked on and clearer rules should be published in the future.
A prime advantage of the type system is on display with my
For my Python code, I wrote a separate score function inside the
They are however essentially identical functions.
For my Julia code, I wrote a single function which takes in a type of
Since I defined both my classifiers to be subtypes of
AbstractClassifier, calling score on those objects dispatches to this
Of course you can argue I could have made a super class in Python which implements
score, and then have both
RandomForestClassifier inherit from it.
However this adds complexity without much benefit.
In Julia, there is a robust type system and it makes sense to follow those design patterns.
Another use case is with my
This function prefers to take in a confusion matrix.
But another version will accept two vectors, use these to create a confusion matrix, and then dispatch that matrix to the main
The Julia keywords: abstract type, baremodule, begin, break, catch, const, continue, do, else, elseif, end, export, false, finally, for, function, global, if, import, let, local, macro, module, mutable struct, primitive type, quote, return, struct, true, try, using, while
The syntax of Julia is very nice. I think the keywords are well chosen and make the language very flexible.
I specifically like the module and namespace control keywords:
export allows you to define which variable names and functions in a module you want to expose to another namespace with
using, the functions will not be extensible - the new namespace will not be able to edit them or accidently overwrite them.
This can be overridden with
import, but that requires directly importing each identifier.
On this topic, a minor annoyance is that during debugging it is nicer to
import so you can update the functions without restarting the code.
There is no import all option like with Python’s
import *. So you have to import functions one at a time, which is annoying.
But considering how dangerous importing all is in Python, this is probably for the better.
Update: the popular Revise package addresses exactly this problem.
Julia requires you to end all functions and control loops with the
This is the standard for most languages, and I prefer this to Python where the ‘end’ of a function is inferred by indents.
In fact I would prefer if
end was used in more cases. For example,
end is not required for
export. This enables you
to write code like the following:
This is a contrived example, but shows a real error I had. The correct code should be
export x, y where the
, tells the compiler that there is more to export.
But I forgot this comma (it was a long list of exports) and therefore everything after it was ignored and I got “undefined” errors.
Unlike Python where vectorisation is recommended as much as possible, you don’t have to vectorise in Julia. Julia loops overs arrays just as fast as vectorisation. In fact, because you use potentially less allocations with for loops and in-place operations, these can be faster. See #6 at ww.stochasticlifestyle.com/7-julia-gotchas-handle/.
Indeed I found this to be the case with my random forest classifier. In Python, I was able to get significant speed increases by predicting samples in batches. All samples which have similar values are evaluated in the same recursive call. In Julia, I found that these were roughly equivalent, with the batch prediction method being about 10% slower.
If you do want to use broadcasting, there is a nice syntax using
This can be used to broadcast most functions e.g.
f.(x) will apply
f individually to each value in the array
Julia tends to err on the side of performance, whereas Python errs on the side of flexibility.
An example is with integer overflow. By default Julia does not check for integer overflow because this slows down the code. So
3^40 will result in a negative number.
Python has automatic overflow checking, so
3**40 gives the correct answer.
But if you really need to do overflow checking, such as with Cryptography, then you can use the
big() function e.g.
Before I go on to the negatives, there are few issues I don’t feel strongly about either way.
Finally, the dislikes. I’ve certainly passed my enchantment phase with Julia. There are some things I really do not like about Julia.
scoremethods. But in Julia a struct only contains a constructor method, and all other methods are external to it. For example, here is the definition of the
BinaryTreestruct in my code:
add_node!- are all external to it. There is a
methodswith()function which can retrieve methods which use a type, but this is limited to search in a single module. For example, calling
methodswith(BinaryTree)will only retrieve a
Base.sizemethod, which was an extension. Calling
methodswith(BinaryTree, TreeEnsemble)will retrieve the other eight methods that I defined for it. A simple fix to this all is to have good documentation, especially for Julia Help mode.
fit!methods for my
AbstractClassifierstruct. But there is nothing forcing a subtype of
AbstractClassifierto implement concrete versions of these methods (e.g. like the
virtualkeyword would in C++). Also concrete structs cannot have children. For example, making a struct which is a subtype of
These are minor issues I don’t like. They may be changed in future updates or are not important issues.
globalkeyword if you want to modify a variable that is external to a function or control loop (I understand this is a recent change). However, you don’t have to write
globalif you just want to read the value. Here is an example:
yeven though no
globalkeyword was used for
y. Hence running
f()will return 60. I had a few errors where I accidently used the name of a global variable and hence my values were wrong. I would prefer this to require a call to load the variable
yin the function, e.g.
returnkeyword, a function will return the last calculated value. To return nothing, you have to explicitly write
return nothing. I would prefer if the latter was the default.
So I may have a few complaints about Julia (the programming language) but overall I really like it. I mean it when I say I will use it for all personal programming where I would have used Python before. The extra control, flexibility and robustness is a big reason to switch to Julia. If none of this convinces you, I think achieving a 9 times increase in speed for code of similar length and effort should.
This was a long review of Julia but it is by no means exhaustive. There are many other features I liked about it but it shares many of these with Python. This includes type inference, containers which can hold multiple types, and higher order functions. I’ve also avoided reviewing packages, including curated packages like Plots and DataFrame. I can say briefly that Plots is capable (the image at the top was generated with it) but lacking in features compared to Python’s Matplotlib or Seaborn, and that I prefer DataFrame to Pandas. But this is a topic for a whole other time.
In closing, I think Julia is an awesome language and I hope the community grows. I also hope industry begins to adopt it like academia already has.
SIR stands for Susceptible, Infected and Recovered/Removed. I have not published my code because it is a solution for an assignment in the online course. ↩
A typical example. This is the full documentation in Julia Help for the pie chart function
pie in the Plots package: “Plot a pie diagram”. What? What are the keyword arguments? How do I rotate the chart? How do I add annotations? How do I set colours? ↩
For example, the
CSV.read(file_name) syntax in the 6 months old Computational Modelling course doesn’t work anymore. The new syntax requires you to specify the data sink type - it’s a good change. So the new syntax is:
CSV.read(file_name, DataFrame). ↩
Exceptions to this are the operator functions such as + and *, which have 184 and 364 methods respectively. If one loads the DifferentialEquations package, this grows to 420 and 833 respectively. ↩