Julia Cheat Sheet



I wanted to make a cheat sheet for myself containing a reference of things I use when it comes to Unicode and when using Unicode in Vim, Python, Julia and Rust.


First some basics:

  1. Unicode code points (https://unicode.org/glossary/#code_point) are unique mappings from hexadecimal integers to an abstract character, concept or graphical representation. These graphical representations may look visually similar but can represent different “ideas”. For example: A, Α, А, A are all different Unicode code points.

    • A : U+0041 LATIN CAPITAL LETTER A
    • Α : U+0391 GREEK CAPITAL LETTER ALPHA
    • А : U+0410 CYRILLIC CAPITAL LETTER A
    • A : U+FF21 FULLWIDTH LATIN CAPITAL LETTER A

    The Unicode Consortium defines a grapheme (https://unicode.org/glossary/#grapheme) as “what a user thinks of as a character”. Multiple code points may be used to represent a grapheme. For example, my name in Devanagari and Tamil can be written as 3 graphemes, but it consists of 4 and 5 code points respectively in these scripts:

    • DEVANAGARI: दीपक
      • द : U+0926 DEVANAGARI LETTER DA
      • ी : U+0940 DEVANAGARI VOWEL SIGN II
      • प : U+092A DEVANAGARI LETTER PA
      • क : U+0915 DEVANAGARI LETTER KA
    • TAMIL: தீபக்
      • த : U+0BA4 TAMIL LETTER TA
      • ீ : U+0BC0 TAMIL VOWEL SIGN II
      • ப : U+0BAA TAMIL LETTER PA
      • க : U+0B95 TAMIL LETTER KA
      • ் : U+0BCD TAMIL SIGN VIRAMA

    Additionally, multiple “ideas” may be defined as a single code point. For example, the following grapheme ﷺ translates to “peace be upon him” and is defined as the code point at U+FDFA:

    • ﷺ : U+FDFA ARABIC LIGATURE SALLALLAHOU ALAYHE WASALLAM

    And to make matters more complicated, graphemes and visual representations of code points may not be a single column width wide, even in monospaced fonts. See the code point at U+FDFD:

    • ﷽ : U+FDFD ARABIC LIGATURE BISMILLAH AR-RAHMAN AR-RAHEEM

    Code points fall into different categories: Normal, Pictographic, Spacer, Zero Width Joiner, Control, etc.

  2. The same “idea”, i.e. code point, can be encoded into different bits when it has to be represented on a machine. The bits used to represent the idea depend on the encoding chosen. An encoding is a map or transformation of a code point into bits or bytes. For example, the code point for a 🐉 can be encoded in Python as UTF-8, UTF-16 or UTF-32, and each encoding produces a different byte sequence.

    Python prints the bytes of a byte string as human-readable characters when they are valid ASCII characters. ASCII defines 128 characters, half of the 256 values that a single 8-bit byte can take. Valid ASCII byte strings are also valid UTF-8 byte strings.

  3. When receiving or reading data, we must know the encoding that was used in order to interpret it correctly. Encoded data is not guaranteed to contain any information about its encoding. Different encodings exist for efficiency, performance and backward compatibility. UTF-8 is a good pick for an encoding in the general case.
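As a minimal sketch of points 2 and 3, the snippet below encodes the dragon code point (U+1F409) with three encodings and then decodes the UTF-8 bytes with both the right and the wrong encoding. The variable names are illustrative, and the big-endian codec variants are chosen here only so that no byte order mark is added.

```python
dragon = "\U0001F409"  # 🐉 DRAGON, written as an escape so the code point is visible

# The same code point becomes different bytes under different encodings.
utf8 = dragon.encode("utf-8")
utf16 = dragon.encode("utf-16-be")  # big-endian variant, no byte order mark
utf32 = dragon.encode("utf-32-be")  # big-endian variant, no byte order mark

print(utf8)   # 4 bytes
print(utf16)  # 4 bytes (a surrogate pair)
print(utf32)  # 4 bytes (one 32-bit unit)

# Bytes that are valid ASCII are printed as readable characters.
ascii_bytes = "abc".encode("utf-8")
print(ascii_bytes)

# Decoding only round-trips when we use the encoding the data was written in.
right = utf8.decode("utf-8")
wrong = utf8.decode("latin-1")  # succeeds, but yields four unrelated characters
print(right == dragon, wrong == dragon)
```

Decoding with Latin-1 does not raise an error because every byte is a valid Latin-1 character, which is exactly why the encoding has to be known rather than guessed.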

In vim in insert mode, we can type Ctrl+V (check out :help i_CTRL-V_digit for more information) followed by either:

  • a decimal number [0-255]. Ctrl-v255 will insert ÿ.
  • the letter o and then an octal number [0-377]. Ctrl-vo377 will insert ÿ.
  • the letter x and then a hex number [00-ff]. Ctrl-vxff will insert ÿ.
  • the letter u and then a 4-hexchar Unicode sequence. Ctrl-vu03C0 will insert π.
  • the letter U and then an 8-hexchar Unicode sequence. Ctrl-vU0001F409 will insert 🐉.
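The numbers typed after Ctrl-v are simply code point values written in different bases. A quick way to double-check them, sketched here in Python with an illustrative dictionary name, is to pass the same numbers to chr():

```python
# Each Vim input names a code point number in decimal, octal or hexadecimal.
vim_inputs = {
    "Ctrl-v255": chr(255),              # decimal
    "Ctrl-vo377": chr(0o377),           # octal
    "Ctrl-vxff": chr(0xFF),             # two hex digits
    "Ctrl-vu03C0": chr(0x03C0),         # four hex digits
    "Ctrl-vU0001F409": chr(0x0001F409), # eight hex digits
}

for keys, char in vim_inputs.items():
    print(f"{keys:<16} -> U+{ord(char):04X}")
```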

Using unicode.vim, we can use :UnicodeName to get the Unicode number of the code point under the cursor. With unicode.vim and fzf installed, you can even fuzzy find Unicode symbols.

Since Python >=3.3, the Unicode string type supports a “flexible string representation”. This means that any one of multiple internal representations may be used depending on the largest Unicode ordinal (1, 2, or 4 bytes) in a Unicode string.

In the common case, a string that contains only ASCII or Latin-1 characters is stored using 1 byte per code point. If the string contains other characters from the Basic Multilingual Plane, it is stored as UCS-2 (2 bytes per code point). If it contains any character outside the Basic Multilingual Plane, it is stored as UCS-4 (4 bytes per code point).
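The effect of the flexible representation can be observed, at least on CPython, by measuring how much a string grows when one more code point is appended. The helper below is a hypothetical name introduced for illustration, and the results are a CPython implementation detail rather than a language guarantee.

```python
import sys


def bytes_per_code_point(ch: str) -> int:
    """Estimate the storage used per code point for strings made only of `ch`.

    Assumes CPython >= 3.3; other implementations may report other sizes.
    """
    # The fixed object header cancels out when we subtract the two sizes.
    return (sys.getsizeof(ch * 101) - sys.getsizeof(ch)) // 100


print(bytes_per_code_point("a"))           # ASCII
print(bytes_per_code_point("\u00e9"))      # Latin-1 (é)
print(bytes_per_code_point("\u03c0"))      # Basic Multilingual Plane (π)
print(bytes_per_code_point("\U0001F409"))  # outside the BMP (🐉)
```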


In each of these cases, the internal representation uses the same number of bytes for each code point. This allows efficient indexing into a Python Unicode string, but indexing returns a code point, not a grapheme. The length of a Unicode string is defined as the number of code points in the string.

As an example, let’s take this emoji: 🤦🏼‍♂️ [1]. This emoji actually consists of 5 code points (we can view this breakdown using uniview; in vim, we can use :UnicodeName):

  • 🤦 : U+1F926 FACE PALM
  • 🏼 : U+1F3FC EMOJI MODIFIER FITZPATRICK TYPE-3
  • : U+200D ZERO WIDTH JOINER
  • ♂ : U+2642 MALE SIGN
  • : U+FE0F VARIATION SELECTOR-16

In Python, a string that contains just this emoji has length equal to 5.

If we want to keep a Python file pure ASCII but want to use Unicode in string literals, we can use the \U escape sequence.
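A short sketch of both points, using only ASCII in the source: the emoji is spelled out with \U and \u escapes, and its length is the number of code points. The variable names are illustrative.

```python
# \U takes exactly 8 hex digits, \u takes exactly 4, \N{...} takes a Unicode name.
facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"
dragon = "\U0001F409"
same_dragon = "\N{DRAGON}"

print(len(facepalm))          # number of code points in the emoji
print(dragon == same_dragon)  # both escapes denote the same code point
```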

As mentioned earlier, indexing into a Python Unicode string gives us the code point at that location.

Iterating over a Python string gives us the code points as well.
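Indexing and iteration can be sketched as follows. The standard library unicodedata module is used here only to print the name of each code point; the variable names are illustrative.

```python
import unicodedata

facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

# Indexing returns a one-code-point string, not the whole emoji.
first = facepalm[0]
print(f"U+{ord(first):04X}")

# Iterating yields one code point at a time.
names = [unicodedata.name(code_point) for code_point in facepalm]
for code_point, name in zip(facepalm, names):
    print(f"U+{ord(code_point):04X} {name}")
```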

However, in practice, indexing into a string may not be what we want or may not be useful. More often, we are either interested in:

  1. indexing into the byte string representation or
  2. indexing into the graphemes.

We can use the s.encode('utf-8') method to get a Python byte string representation of the Python Unicode string in s.

If we are interested in the number of graphemes, we can use the third-party grapheme package.
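For instance, the byte string has its own length and its own indexing, both counted in bytes rather than code points. This is a minimal sketch with illustrative variable names:

```python
facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

utf8_bytes = facepalm.encode("utf-8")
print(len(facepalm))    # code points
print(len(utf8_bytes))  # UTF-8 bytes
print(utf8_bytes[0])    # indexing a bytes object returns an integer

# UTF-16 code units, the unit that some other languages use for string length.
utf16_units = len(facepalm.encode("utf-16-le")) // 2
print(utf16_units)
```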

For historical reasons, Unicode allows the same set of characters to be represented by different sequences of code points.

We can use the built-in standard library module unicodedata to normalize Python Unicode strings.

It is best practice to add the following lines to the top of Python files that you expect to run as scripts.

If your Python files are part of a package, just adding the second line is sufficient. I recommend using pre-commit hooks to ensure that the encoding pragma of Python files is fixed before making a git commit.
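As a small example, “é” can be written either as one precomposed code point or as “e” followed by a combining accent. The two strings compare unequal until they are normalized to the same form. The choice of “é” and the variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT

print(composed == decomposed)          # different code point sequences
print(len(composed), len(decomposed))

# NFC composes where possible, NFD decomposes where possible.
nfc = unicodedata.normalize("NFC", decomposed)
nfd = unicodedata.normalize("NFD", composed)
print(nfc == composed, nfd == decomposed)
```

Normalizing both sides to the same form before comparing or hashing avoids surprises when text comes from different sources.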
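The two lines themselves are not shown above. A common choice, assumed here rather than taken from the text, is a shebang line followed by an encoding declaration in the form described by PEP 263:

```python
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# With the declaration above, non-ASCII literals in this file are read as UTF-8.
symbol = "π"
print(len(symbol))
```

Python 3 already treats source files as UTF-8 by default, so the declaration mainly documents the intent for editors and other tools.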

Let’s take a look at how Julia handles strings.

Printing the length of the string in Julia returns 5. As we saw earlier, this is the number of code points in the Unicode string.

Julia String literals are encoded using the UTF-8 encoding. In Python, indexing into a string returns the code point at that index. In Julia, string indices refer to code units (https://unicode.org/glossary/#code_unit). For the default String type these are UTF-8 bytes, so indexing returns the Char that starts at the given byte index and throws an error if the index falls in the middle of a character.

If we want each code point in a Julia String, we can use eachindex (see the Julia manual strings documentation for more information: https://docs.julialang.org/en/v1/manual/strings/).

And finally, we can use the Unicode module that is built in to the standard library to get the number of graphemes.

If we wish to encode a Julia string as UTF-8, we can use the following (as of Julia v1.5.0, only conversion to/from UTF-8 is currently supported: https://docs.julialang.org/en/v1/base/strings/#Base.transcode):

Let’s also take a look at Rust. We can create a simple main.rs file:

And compile and run it like so:

[1] “It’s Not Wrong that ‘🤦🏼‍♂️’.length == 7.” [Online]. Available: https://hsivonen.fi/string-length/.

[2] “Working with strings in Rust.” [Online]. Available: https://fasterthanli.me/articles/working-with-strings-in-rust.

[3] “Let’s Stop Ascribing Meaning to Code Points.” [Online]. Available: https://manishearth.github.io/blog/2017/01/14/stop-ascribing-meaning-to-unicode-code-points/.

[4] “Breaking Our Latin-1 Assumptions.” [Online]. Available: https://manishearth.github.io/blog/2017/01/15/breaking-our-latin-1-assumptions/.

[5] “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!).” [Online]. Available: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/.

[6] “Dark corners of Unicode.” [Online]. Available: https://eev.ee/blog/2015/09/12/dark-corners-of-unicode/.


[7] “How Python does Unicode.” [Online]. Available: https://www.b-list.org/weblog/2017/sep/05/how-python-does-unicode/.

At its core, this article is about a simple cheat sheet for basic operations on numeric matrices, which can be very useful if you are working and experimenting with some of the most popular languages that are used for scientific computing, statistics, and data analysis.


Sections

  • Introduction
  • MATLAB/Octave
  • Cheat sheet

Introduction

Matrices (or multidimensional arrays) are not only the fundamental elements of many algebraic equations that are used in many popular fields, such as pattern classification, machine learning, data mining, and math and engineering in general. In the context of scientific computing, they also come in very handy for managing and storing data in a more organized tabular form.
Such multidimensional data structures are also very powerful performance-wise thanks to the concept of automatic vectorization: instead of the individual and sequential processing of operations on scalars in loop structures, the whole computation can be parallelized in order to make optimal use of modern computer architectures.

Language overview

Before we jump to the actual cheat sheet, I wanted to give you at least a brief overview of the different languages that we are dealing with.

All four languages, MATLAB/Octave, Python, R, and Julia, are dynamically typed, have a command line interface for the interpreter, and come with a great number of additional and useful libraries to support scientific and technical computing. Conveniently, these languages also offer great solutions for easy plotting and visualizations.

Combined with interactive notebook interfaces or dynamic report generation engines (MuPAD for MATLAB, IPython Notebook for Python, knitr for R, and IJulia for Julia, based on IPython Notebook), data analysis and documentation have never been easier.

MATLAB (stands for MATrix LABoratory) is the name of an application and language that was developed by MathWorks. One of its strengths is the variety of different and highly optimized “toolboxes” (including very powerful functions for image and other signal processing tasks), which makes it suitable for tackling basically every possible science and engineering task.
Like the other languages covered in this article, it has cross-platform support and uses dynamic types, which allows for a convenient interface but can also be quite “memory hungry” for computations on large data sets.

Even today, MATLAB is probably (still) the most popular language for numeric computation used for engineering tasks in academia as well as in industry.


GNU Octave

It is also worth mentioning that MATLAB is the only language in this cheat sheet which is not free and open-sourced. But since it is so immensely popular, I want to mention it nonetheless. And as an alternative there is also the free GNU Octave re-implementation that follows the same syntactic rules so that the code is compatible with MATLAB (except for very specialized libraries).

This image is a freely usable media file in the public domain and represents the first eigenfunction of the L-shaped membrane, resembling (but not identical to) MATLAB’s logo trademarked by MathWorks Inc.

Initially, the NumPy project started out under the name “Numeric” in 1995 (renamed to NumPy in 2006) as a Python library for numeric computations based on multi-dimensional data structures, such as arrays and matrices. Since it makes use of pre-compiled C code for operations on its “ndarray” objects, it is considerably faster than using equivalent approaches in (C)Python.

Python NumPy is my personal favorite since I am a big fan of the Python programming language. Although similar tools exist for other languages, I found myself to be most productive doing my research and data analyses in IPython notebooks.
It allows me to easily combine Python code (sometimes optimized by compiling it via the Cython C-Extension or the just-in-time (JIT) Numba compiler if speed is a concern) with different libraries from the SciPy stack, including matplotlib for inline data visualization (you can find some of my example benchmarks in this GitHub repository).


The R programming language was developed in 1993 and is a modern GNU implementation of an older statistical programming language called S, which was developed in the Bell Laboratories in 1976. Since its release, it has had a fast-growing user base and is particularly popular among statisticians.

R was also the first language which kindled my fascination for statistics and computing. I used it quite extensively a couple of years ago before I discovered Python as my new favorite language for data analysis.
Although R has great built-in functions for performing all sorts of statistics, as well as a plethora of freely available libraries developed by the large R community, I often hear people complaining about its rather unintuitive syntax.

Julia

With its first release in 2012, Julia is by far the youngest of the programming languages mentioned in this article. While Julia can also be used as an interpreted language with dynamic types from the command line, it aims for high performance in scientific computing that is superior to the other dynamic programming languages for technical computing thanks to its LLVM-based just-in-time (JIT) compiler.

Personally, I haven’t used Julia that extensively yet, but there are some exciting benchmarks that look very promising:

C compiled by gcc 4.8.1, taking best timing from all optimization levels (-O0 through -O3). C, Fortran and Julia use OpenBLAS v0.2.8. The Python implementations of rand_mat_stat and rand_mat_mul use NumPy (v1.6.1) functions; the rest are pure Python implementations.

Bezanson, J., Karpinski, S., Shah, V.B. and Edelman, A. (2012), “Julia: A fast dynamic language for technical computing”.
(Source: http://julialang.org/benchmarks/, with permission from the copyright holder)

Alternative data structures: NumPy matrices vs. NumPy arrays

Python’s NumPy library also has a dedicated “matrix” type with a syntax that is a little bit closer to the MATLAB matrix. For example, the “*” operator performs a matrix-matrix multiplication on NumPy matrices, while the same operator performs element-wise multiplication on NumPy arrays.

Vice versa, element-wise multiplication of NumPy matrices requires the “np.multiply()” function, whereas matrix-matrix multiplication of NumPy arrays is achieved via the “.dot()” method.

Most people recommend the usage of the NumPy array type over NumPy matrices, since arrays are what most of the NumPy functions return.
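The two products that “*” can stand for are easy to confuse, so here is a minimal sketch of both in plain Python, deliberately without NumPy so that it runs anywhere. The function names elementwise and matmul are hypothetical helpers written for this illustration and are not NumPy APIs.

```python
def elementwise(a, b):
    """Multiply entries position by position (what "*" does for NumPy arrays)."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def matmul(a, b):
    """Row-by-column product (what "*" does for NumPy matrices)."""
    columns_of_b = list(zip(*b))
    return [
        [sum(x * y for x, y in zip(row, col)) for col in columns_of_b]
        for row in a
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]

print(elementwise(a, b))  # [[5, 12], [21, 32]]
print(matmul(a, b))       # [[19, 22], [43, 50]]
```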