Why learn Latex?

2020.Nov.25

If you are pursuing a degree in STEM, you’ll probably run into this dreaded/beloved rule tool at one point: LaTeX. Opinions on using LaTeX as the tool for type-settings assignments seem to be extremely polarized between students: those who like it usually take their passion to the extreme, declaring anyone who uses anything other typesetter as either tasteless or mindless slaves of the evil Microsoft Office suite; while people who hate it also hate it with a passion, often refusing to use it unless absolutely required for an assignment, and often vowing never to touch it again.

I thought to revisit this question after this question pop up in one of my social media thread: outside STEM academia, what good is learning LaTeX? And I was surprised to see that most respondent jumped in immediately to say a firm: no, LaTeX is in fact useless outside of STEM academia, even people that have been avid users of LaTeX. I personally think this view, though “correct” in the sense there are little if no direct applications of LaTeX outside of STEM academia, cannot be further from the truth: learning LaTeX properly can give you a set of skill greatly beneficial to you outside academia. I feel the issue is deeply tied with my main cases against LaTeX, how the use of LaTeX in most STEM assignments failed to address what LaTeX is, and what it is used for. So let us start the discussion with that.

The case against LaTeX

My case against LaTeX is largely summed up in this article from 4 years ago, and my stance against teaching LaTeX the haphazard method that usually occurs in STEM assignments has largely stayed the same: LaTeX is not taught as a typesetting tool, with proper introductions to it benefits and limitations. Instead, it is taught as a loosely and often uncoordinated list of seemingly arbitrary rules, which ends up with LaTeX being reduced to a merely a requirement which acts as a barrier of entry for students who have never used LaTeX before. For the few lucky individuals that manage to wrap their heads around basic LaTeX then use this as a bragging right instead of diving deeper into what LaTeX can achieve and usually end up generating documents that I can easily typeset better in a typical WYSIWYG processor.

LaTeX, at the end of the day, is a tool developed by computer scientist to use computer science methods to solve typesetting issues. Failing to introduce it as such leave so much wasted potential on the table, with much of the reusable concepts being left out, leading most to declare LaTeX as useless. The following discussion for some cases of LaTeX are by no means universal, but highlight how properly learning LaTeX can benefit you in areas outside pure academic writing.

The case for LaTeX

So how can LaTeX be beneficial outside of academia, the following is by no means an exhaustive list, but I think presents interesting arguments for the beginning of discussions.

The Golden standard for mathematical symbol typesetting

Probably the most important reason for STEM to be using LaTeX is it being the golden standard for mathematical symbol typesetting. Fractions, nested super/subscripts, mixing uniform and non-uniform symbol heights, custom character decorations, dynamically changing symbol size based on other content… Name any sort of weird and exotic symbol placements for mathematical expressions, LaTeX most likely has a way of typesetting what you want. LaTeX was so much in the lead so many years ago that it has become more than a tool in this aspect: it is now a standard that other tools try to follow. Be it plotting tools in Python, WYSISWG equation editors, even the hotkeys for equation typesetting in Microsoft Word are based on LaTeX syntax.

If you are a STEM student, chances are that you will be using mathematical symbols and related typesetting functions in you work, whether you choose to stay in academia or not. If you are a bit of a document neat-freak like I am, and if seeing even simply mathematic text like in plots or descriptions instead of the formal makes your skin crawl, then learning latex is definitely worth it, even if just for learning how to typeset mathematical expression. It might feel like a brute force method to learn something as mundane as hotkeys and character sheets, but LaTeX puts this method of type setting first and foremost under your eyes, forcing you to learn and get familiar with this typesetting standard.

Document structure enforced by default

Formal documents often have predefined structures: sections and subsections, chapters and such. Titles of such structures are typically highlighted using enlarged fonts with perhaps a different font. How you create such structures in a WYSIWYG editor? Given the toolbar show at the top of the window, it most straight forward way making such structure is to simply type in the editing window:

In you editor window... [plaintext]
1 My first structure title

1.1 My first sub-structure title

Then move to the many pronounced buttons in to toolbar and start pushing buttons until you get it to look right. While this is all fine and good for the first few lines, the problem then starts when the document starts getting large. What if the style of the sub-subsection title? Let me scroll to the previous instance of a sub-subsection title. Wait… is this one such title that I forgot to format, or is this just a one-line paragraph starting with a decimal number? What number am I on again to add a new section? Oh no, the font settings for section titles is too large, let me scroll through the document and fix all the formats. Then when the document is printed, every notices that one instance where you missed the formatting, and now you feel angry at having just wasted a bunch of paper. All hell breaks loose when you use manual labor to construct structure tracking content such as list-of-figures.

This is the issue of not using document structuring tools provided by the office suites (Word and LibreOffice usage documentations here). While the office suites have been trying hard to push such functionalities forwards to the user, people typically stick with what they are familiar with, leading to people generating truly horrific documents that are excessively difficult to maintain and edit. Meanwhile, in LaTeX land, maybe one of the first command one learns is the construction of document structures.

Default way of handling structure [latex]
\section{My first structure title}
\subsection{My first sub-structure title}

The equivalent of all the button pushing in a WYSISYG editor is effectively writing a LaTeX document as something like:

Hacking way of handling structure styling [latex]
\textbf{\fontsize{18}{24}\selectfont 1. My first structure title}
\textbf{\fontsize{16}{20}\selectfont 1.1 My first sub-structure title}

It is explicitly more exhausting to structure a document wrongly than it is to write it in the “correct” way in LaTeX! This is one of the key aspects of LaTeX that can be a takeaway for anyone doing document management, even if you are not a computer science enthusiast: Computer are so much better at keeping track of mundane details than the human brain is, find the way for the computer to assist you with the mundane, and leave your brain free for the more important type settings and writing task! WYSISWG editors leaves this as an option to be explored, LaTeX presents this as the default option.

This of course, isn’t an entirely foolproof solution to document structures. I’ve still seen LaTeX documentations that resort to the horrific typesetting method in the second code snippet above when some people are trying to do fancy title formatting instead of RTFM. The true solution would be to use LaTeX as a starting point for learning more fundamental concepts in document management such as markup languages and style-context separation, which incidentally is also an important building block for Web applications that are ubiquitous today.

Appreciation for good typesetting and micro typesetting

This I feel is the biggest plus of learning LaTeX in my personal experience. A lot of people claim that LaTeX generates good type-setting, but they spend little time in investing what makes the default typesetting generated by LaTeX good. This is probably the best example of how powerful a tool LaTeX is, with one key philosophy behind the TeX typesetting engine being that there are some typesetting just too complicated to do by hand, so let’s have a machine do it for us. Here are a couple of examples to the typesetting and micro-typesetting used by the LaTeX that is greatly used in the field of formal publications. The following is by no means an exhaustive list, but is meant to get people to appreciate what LaTeX does well and get people to appreciate good typesetting in general medium.

For the comparison for default typesetting used in LaTeX and a WYSIWYG editor, I’m going to paste the first two paragraphs of this article into the two typesetters. For the sake of image generation, I’m using a A5 paper with a 0.8” margin all around the page (this is hopefully the last time I use imperial units in my writing), with a font size of 10pt. All other settings will be left at default unless otherwise specified. The font will be left default for both typesetters, so Computer Modern for LaTeX and Times New Roman for the Office Suites (Aha! Here we are already seeing a problem with LaTeX! How many users know how to apply these settings to a LaTeX document without resorting to Google? I certainly didn’t! With LibreOffice/Word, I’m pretty confidence anyone who is mildly tech-savvy can find these setting by looking through the dropdown menus) Now let us see the result:

Display of the documents generated as described, the document from the left uses LaTeX, the document on the right uses LibreOffice

I think most would agree that the document on the left looks, better. But why? Let us get into the specifics.

Proper justified text boxes

One feature that most people will notice is that the right side of the document is flushed at some bounding box in the LaTeX version. A visual well-defined boundary enhances the reading experience by having the brain immediately understanding where to start and stop before actually beginning to read. This is formally called “justified” alignment. But some may complain that this option is available in Office suites too, and it is unfair that bash on office suites when justified alignment is literally 1 button press away. So lets compare the two documents again with the justified settings on in the Office suite document:

Display of the documents generated as described, the document from the left uses LaTeX, the document on the right uses LibreOffice with justified text alignment

Now the comparison feels closer, but the feel of the right document still feels inhomogeneous. Two obvious examples of this inhomogeneous feeling can be found on the forth row of the first paragraph and the second row of the second paragraph on the right. Both lines stand out as feeling oddly blank, with some word feeling oddly stretched out.

The reason for this is that true-justified alignment is incredibly difficult to do! The spacing between words and characters must be taken into account when calculating what needs to be done for just a single line of justified alignment, can this calculation will in turn determine at which word should new lines begin. For WYSIWYG editors, where characters can literally disappear at the push of a button, running the full calculation is unacceptable for most users (you probably don’t want to wait 30 seconds for the page to render when you accidentally hit a backspace at the beginning of a 30-page document!) so such tools typically resort to “lazy” justified alignment methods simply blow up the spacing in a simple flush-left aligned document and hope for the best.

Indications for this technique being employed is most obviously seen by the fact that no words change lines when toggling between flush-left and justified alignment. But if you do the toggle between a flush-left and justified alignment in LaTeX, the context shift is very pronounced. LaTeX has the benefit that the where words can characters can go is calculated after all the context is provided by the user, therefore having all the time in the world to chew the numbers required for a pretty justified layout (As a side note, the blog you are staring at right now also uses lazy justified alignment defined in the CSS standard and implemented by various browsers. That is why some lines may look“weird”, and why most website still don’t use justified alignment.)

The problem becomes even more complicated when we take into account hyphen-separated line breaks (which unfortunately doesn’t show up in our example): where can the hyphens be? Everyone can agree that breaking the line in between a word like “you” using a hyphenation like y-ou is very ugly and breaks the flow of reading. But what about more involved words such as “involved”, where can the hyphen be there? Are the better ways of line breaks without resorting to hyphens? Is there in fact a better solution when considering later text combinations? All these questions require a surprising amount of computation time, which WYSISYG editors simply don’t have.

Micro typesetting --- Ligatures and Kerning

Micro typesetting mainly refers to the placement relation of characters within a word that optimizes the flow of reading. A good micro-typesetting contributes to the feeling of togetherness of elements when reading. Let’s look at the most obvious fault in the LibreOffice document, the word “beneficial”:

Display of the word "beneficial" as written by the default font in Latex (left, Computer Modern), and LibreOffice (right, Times New Roman).

The dissonance in the Times New Roman font of the f and i characters being “kind of together but not quite” generate a feeling that the word is somehow broken. Computer Modern gets around this by having what is known as ligatures: simply defining sets of characters into a single glyph to be used together in print: instead of 2 separate glyphs f and i, the document contains a single character fi. While computer Modern doesn’t contain an entire set of ligatures used in English (looking at the double t’s in the documents, I would say that Times New Roman does a better job than Computer Modern), why the default font for Office suites is one with so little ligatures sets is kind of perplexing. But I here you say: changing fonts is just a push of a couple of buttons! This “flaw” cannot be held against my Office tools. And I would say no they should not be held accountable… if you know how to enable them! Ligatures are off by default in the popular office suite: Microsoft Office. For the sake of argument, let us render the documents again with a ligature complete font: Linux Libertine. Here, I’m going to cheat a bit by taking a screenshot of the Office editing window instead of exporting it to PDF to highlight certain issues with WYSIWYG editor not doing a good job at rendering “good” typesetting to the user.

Display of the documents rendered by LaTeX (left) and the LibreOffice editing window(right), both using Linux Libertine as the font.

Here we can see the editing window of the LibreOffice is riddled with strange words designs, despite using the same font: words like “vowing” in the last line of the first paragraph looks disjointed between the i and n; words like “people” have two distinct designs between the first and second paragraph. These are examples of poor kerning, or inter-glyphs spacing within a word. Again, good kerning gives the characters of a work a sense of togetherness that enhances reading, while bad kerning makes certain words look like typos. While kerning issues displayed is fixed when LibreOffice exports the document to a PDF or a printer, I feel that having the user have an immediate feel of what is good typesetting during editing is equally important for a pleasant writing experience.

Proper substitution of fonts

I think most people that use bold fonts under windows will subconsciously think bold fonts look… strange. This is best highlighted with denser Chinese characters, where character strokes are packed closer together. Let us look at this two words using the same font family in Microsoft Office. The left most characters using a regular Noto Sans Font, the center characters uses the “bold” method used by clicking a big B button in Microsoft Office, and the right characters uses the correct method of generating bold fonts: using Noto Sans Bold.

Display of 2 Chinese characters in regular (left) fake bold (center) and real bold (right) in Microsoft Word using Noto Sans CJK TC as the example font.

We can see that the default bold method generated by Microsoft Office is a horrific mess, basically it takes the existing font and just extrapolates each stroke, resulting in strokes being mushed together as well as horrible anti-aliasing artifacts appearing towards the ends of the strokes. The correct method of generating bold font is to use a designed font set with the spacing management specifically designed to handle broader strokes. For LaTeX the type setting engine automatically substitutes the correct font when you specify that you want a bold font, only resorting to the horrible fake bold method is no corresponding bold font can be found on the system. You probably don’t know the existence of the bold variation of the Time New Roman font!

General typesetting rules

None the typesetting decisions discussions above are strictly perks of LaTeX then sub-par choices made by various office suites. Different office suites handle the various typesetting above differently. For example: Microsoft Word has a much more sophisticated justified calculation engine (it runs per paragraph rather than per-line) though the lack of hyphening word breaks still leave a couple of lines feeling a bit empty; Word also handles a more consistent kerning rule in the presentation window, though (by default) refuses to use ligatures even when the features are present in the font, and refuses (by default) to perform font substitution when switching font weights. LibreOffice on the other hand, has the horrific kerning rules in the presentation window, with the application of ligatures a bit too enthusiastic leading to weird intra-word spacing, and opts to fix these typesetting issues during the export process; what LibreOffice does correctly is substitute the font for font weights. But the overall argument still stands: see how much LaTeX gets right by default? This is what people mean by LaTeX documents look “nicer”. When compounded with the fact of consistent formatting among document structure elements, it is little wonder why LaTeX remains the defacto standard for academic typesetting.

Of course, what we have discussed mainly focuses on what computers (and in turn LaTeX) does well: micromanagement of well-defined typesetting rules. In general, I like to describe the results of the LaTeX typesetting as beautifully boring, it lacks any visual flare that can be used to drawing your eyes to certain important elements in a page (why LaTeX beamer slide nearly always looks boring), but what presents structure in a consistent and neat format exceedingly well, which is exactly what most books for formal presentation are looking for, hence it’s popularity in academic works, particularly in STEM related fields.

That is not the end of the story of typesetting. LaTeX does not guard against bad macro-typesetting practices. Take this textbook that is obviously designed using LaTeX as an example:

Example page take from the textbook "Pattern Recognition and Machine Learning", by Christopher Bishop. Contextually, this is a splendid book! The main culprit I want to discuss here is less the author, and more the publisher: Springer.

Why is the larger margin for the right pages, which should be designed to guard against physical damaged, as well as act as a space for note-taking placed towards the spine of the book, where damage is least likely to occur and the most annoying part of the book to write on?? As a small rant, I find this sort of bad typesetting decision much more prominent in recent textbooks than the older ones. With a lot of learning material moving to electronic media, such design choices are understandable: a PDF by default may be seen as a list of single faced paper rather than a book, but at the same time I find this failure to adapt content for the presentation media a huge oversight: this is what computer typesetting engines are good for, adapting layouts without manual contextual edits!

Why any of this is important

LaTeX is a tool, specifically a typesetting tool (not a writing tool!) with incredible features and beautiful defaults. I feel that a good tool, beside the obvious gets-things-done requirement, assists the user in getting things done the right way, and make the user appreciate why the right is considered correct. LaTeX, when taught correctly, can do a bunch of these things, as well as give great insight as to what makes good typesetting good, and all of this is already leaving out the more nerdy comp-sci features such as code-reusability and modularization and such! At the end of the day, declaring a tool as “useless” just because one missed the point of what LaTeX tries to provide, what it does well, and appreciation of what goes into making “good” tool, feel a tad hasty, wouldn’t you say?