KFX Guide – 02 – The ASS subtitle format

If you would like to contribute to the guide, you can do that at https://github.com/Zahuczky/zahuczkys-kfx-guide/.

In this chapter, you're going to get introduced to the ASS subtitle format, and every ASS tag.

I will make it clear now that this is not a full-on “Aegisub guide.” It is assumed that you already know your way around Aegisub, at least in the sense that you can navigate the software.

If you have no experience with the software, Google can probably help you get going, and you can always ask others for help.
The best place to ask for help is the GoodJob! Media Discord server.
Upon joining, you can claim the “Subber role” in the #get-roles channel, and with that you can have access to subtitling specific channels. I recommend asking these kinds of questions in the #typesetting channel.

With that out of the way, let's go through how a line is constructed in the ASS format.
You can divide what's inside a line into two basic parts:
things that are in {curly brackets} and things that are outside of them.

{This is in the brackets}This is out of the brackets

As you can see, only the text outside the brackets is visible on the video.
We can place formatting tags and comments inside the brackets.

\t(<time1[ms]>,<time2[ms]>,\tag<value>)

For a total beginner, this tag might be a bit complicated as a first introduction, but kfx-ing is more or less based on this tag, and it'll make understanding the other tags easier later. \t is short for “transform”, as it can be used to transform between two values of one tag. Its structure is

{\t(0,1000,\fs200)}

where 0 is the starting point of the transform and 1000 is the end point of the transform, both in milliseconds (1 second = 1000 milliseconds) relative to the start time of our line, \fs is our tag, and 200 is its value. In this configuration, it'll transform the font size of our text to 200 over one second.

You can add acceleration to the \t tag by supplying a third argument after the times, like \t(0,1000,1.5,\fs200). An acceleration value of 1 will make the transform linear, while numbers above 1 will make it start slow and end fast, and numbers between 0 and 1 will make it start fast and end slow. Feel free to experiment with it a bit. (This is an order, not a request.)

In the above example, the original font size was 100.
Multiple \t tags can be used on a line.
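
To make the timing and acceleration arguments a bit more concrete, here's a rough Python sketch of how a renderer could work out the animated value at a given moment. It assumes the commonly documented curve y = x^accel; the function name transform_value and its parameters are made up for illustration and aren't part of Aegisub or any renderer's API.

```python
def transform_value(start, end, t1, t2, now, accel=1.0):
    # start/end: tag value before and after the transform (e.g. font size)
    # t1/t2/now: times in milliseconds, relative to the start of the line
    if now <= t1:
        return start
    if now >= t2:
        return end
    progress = (now - t1) / (t2 - t1)  # 0.0 .. 1.0
    # accel == 1 is linear, > 1 starts slow, between 0 and 1 starts fast
    return start + (end - start) * progress ** accel


# Halfway through \t(0,1000,\fs200) on a line whose font size is 100
print(transform_value(100, 200, 0, 1000, 500))
# Same moment with an acceleration of 2
print(transform_value(100, 200, 0, 1000, 500, accel=2))
```

With an acceleration of 2 the value is still lagging behind at the halfway point, which is exactly the "start slow, end fast" behavior.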

\fs<number>

The tag \fs is short for "font size". It specifies the height of the text in pixels.

Its argument can be any number, integer or decimal. Be careful though, since fonts can have padding above and below them, so the text might not appear exactly as many pixels tall as you specified. It's font dependent.


You can have multiple \fs tags throughout your line, and the text after your tag will be what the tag applies to.


It can be animated by \t, as you can see in the above example.

\fn<name>

\fn specifies the font, or typeface, of our text.

{\fnArial}Arial {\fnTimes New Roman}Times New Roman

You can have multiple \fn tags throughout your line, and the text after your tag will be affected by it.
Can't be animated by \t.

\an<1-9>

"an" is short for "alignment" and it has two use cases. If our line doesn't have a \pos tag (more on that later, but basically if it doesn't have a defined place on the screen), it defines the placement of our subtitle. Numbers from 1 to 9 can be its argument, and they reflect the sides/corners/middle of the screen, with each number representing its relative place on a keyboard numpad. In this state, they respect margins set in the style or on the line.

{\an5}\an5

Its other use case is setting the anchor point of our text, relative to its center. The center of a line is... well the center of the text. With the \an tag we can define the anchor point of our text relative to its center, and this point is going to be the one that tags such as \pos or \move use for the coordinates of the line. In the image below, you can see the lines and their anchor points.

Only one \an tag can be used on one line.

Can't be animated by \t.
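
If the numpad layout is hard to keep in your head, this small Python sketch spells the mapping out. The names describe_alignment, HORIZONTAL and VERTICAL are just illustrative helpers, nothing that exists in Aegisub.

```python
HORIZONTAL = ("left", "center", "right")
VERTICAL = ("bottom", "middle", "top")


def describe_alignment(an):
    # Numpad layout: 1-3 is the bottom row, 4-6 the middle row, 7-9 the top row
    if not 1 <= an <= 9:
        raise ValueError("alignment must be between 1 and 9")
    return HORIZONTAL[(an - 1) % 3], VERTICAL[(an - 1) // 3]


for an in (7, 5, 2):
    print(an, describe_alignment(an))
```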

\pos(<xCoord>,<yCoord>)

"pos" is short for "position," as it can be used to define the position of the text on the screen by providing it X and Y coordinates. Coordinates are always relative to the top left of our screen. For example, on a Full HD video with a Full HD subtitle file, \pos(0,0) will make our line appear in the top left, while \pos(1920,1080) will put it in the bottom right. (You can set the subtitle resolution in File->Subtitle Props.; it should always match the video resolution.)

Only one \pos tag can be used on a line.
Can't be animated by \t.

\move(<xCoord1>,<yCoord1>,<xCoord2>,<yCoord2>,<time1[ms]>,<time2[ms]>)

As its name implies, \move can be used to move subtitles around the screen. Its structure is:

{\an5\move(0,540,1920,540,0,2000)}Moving text

where "0,540" are our starting coordinates, "1920,540" are our ending coordinates, and "0,2000" are the starting and ending times of the movement in milliseconds.
So \move with these arguments will move the text from 0,540 to 1920,540 in two seconds, starting from the beginning of the line.

A \move tag can't have acceleration, and only one can be used on a line.
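
Since \move is always linear, the position at any moment is plain linear interpolation between the two points. Here's a minimal Python sketch of that idea; move_position is a hypothetical helper written for this guide, not a function from any subtitle library.

```python
def move_position(x1, y1, x2, y2, t1, t2, now):
    # Times are in milliseconds, relative to the start of the line
    if now <= t1:
        return (x1, y1)
    if now >= t2:
        return (x2, y2)
    progress = (now - t1) / (t2 - t1)
    return (x1 + (x2 - x1) * progress, y1 + (y2 - y1) * progress)


# One second into \move(0,540,1920,540,0,2000)
print(move_position(0, 540, 1920, 540, 0, 2000, 1000))
```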

I wouldn't think it's worth mentioning, but for the sake of completeness, no, it can't be animated by \t.

(For multi-step/non-linear movement, you can use the shadow tags (\shad, \xshad, \yshad); more on those later.)

\bord<num>, \xbord<num>, \ybord<num>

\bord is short for "border", and it can be used to add a border to our text. Its argument is the thickness of the border in pixels. \xbord and \ybord set the border thickness on the X and Y axes separately.



It can be animated by \t.

\shad<num>, \xshad<num>, \yshad<num>

\shad is short for "shadow", and it can be used to add a shadow to our text. Its argument is the distance of the shadow from the text in pixels.

Using the \shad tag will always place our shadow to the bottom right of the text, and increasing the value will place it further away.


\xshad and \yshad offset the shadow along the X and Y axes separately. They also accept negative values, which move the shadow left or up, so the shadow can be placed in any direction.


Using \xshad and \yshad can give all kinds of offsets to your subtitles. You can also use them together.

There's also something known as the “shad trick.”
The shad trick is using 0xFE (254) alpha (transparency, more on that below) values on your fill and border, which will make them practically invisible, while still allowing you to render a shadow. It's useful when you would like to blur (more on that below) the entire text by high amounts, or want to move your text in any non-linear way, for which you can use multiple \t tags and even acceleration.
With this you can overcome the limitations of the \move tag, and create any kind of movement you want.

All shadow tags can be animated by \t, which makes them a valuable tool for kfx-ing. Keep that in mind.
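
Here's a hedged Python sketch of how you could generate a shad trick tag block for multi-step movement. The helper name shad_trick_tags and the shape of the steps list are my own invention for illustration; the alpha values are written in the &HFE& spelling used in ASS scripts, and you'd still want to set the shadow color (\4c) to whatever your text should look like.

```python
def shad_trick_tags(x, y, steps):
    # steps: list of (t1_ms, t2_ms, accel, x_offset, y_offset) tuples
    tags = [
        rf"\pos({x},{y})",
        r"\1a&HFE&\3a&HFE&",  # fill and border practically invisible
        r"\xshad0\yshad0",    # shadow starts exactly on top of the text
    ]
    for t1, t2, accel, dx, dy in steps:
        # Each step animates the shadow offset, optionally with acceleration
        tags.append(rf"\t({t1},{t2},{accel},\xshad{dx}\yshad{dy})")
    return "{" + "".join(tags) + "}"


# Slide right for half a second, then drift upwards while slowing down
print(shad_trick_tags(960, 540, [(0, 500, 1, 100, 0), (500, 1000, 0.5, 100, -80)]))
```

Every step gets its own \t, so each leg of the movement can have its own timing and acceleration, which is exactly what \move can't do.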

\blur<num>, \be<num>

As the name implies, \blur can be used to blur our text, specifically with gaussian blur.
Its usage can be quite tricky though, since using \blur on lines with a border and/or shadow will only blur the border and/or shadow.
If both are present, both will be blurred, but the fill of the text will remain sharp.

As mentioned in the shadow section, you can use the shad trick to use blur on the entire line, even if you want borders, although in that case your border and fill will be the same color.


There's also another kind of blur, \be (short for "blur edges"), which is rarely used nowadays.
While \blur's gaussian blur extends outwards from the text, with \be the blur sort of extends inwards.

Both can be animated by \t.

\fscx<num>, \fscy<num>

\fscx and \fscy can be used to scale our text on the X and Y axis respectively. (They're short for "font scale X / Y".)
Their argument is a percentage, and it can be any number, integer or decimal.
It's important to note that the scaling is relative to the original size of the text, where 100 is the default value, so if you use \fscx50, it'll make the text half as wide as it originally was, and if you use \fscy200, it'll make the text twice as tall as it originally was.
There is no tag to scale the text on both axes at the same time. (There is one in VSFilterMod, \fsc, though that's kinda irrelevant as it was never widely adopted.)

Both can be animated by \t.

\fsp<num>

\fsp is short for "font spacing", and it can be used to change the spacing between the letters of our text.
Its argument is the number of pixels to add to the space between the letters.
It can be any number, integer or decimal, positive or negative. (negative values will make the letters overlap)

It can be animated by \t.

\frx<num>, \fry<num>, \frz<num>

\frx, \fry, and \frz can be used to rotate our text in the X, Y, and Z axis respectively.
Their argument is the number of degrees to rotate the text.
It can be any number, integer or decimal, positive or negative.
It's important to note that while the rotation values are in degrees, so 0 to 360, you can use numbers outside of this range.
360 is the same as 0, and 330 is the same as -30, and 30 is the same as 390, and so on.
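
If you want to check which angles end up looking the same, wrapping them into the 0 to 360 range does the job. This is just a tiny Python sketch with a made-up helper name, normalize_angle.

```python
def normalize_angle(degrees):
    # Python's % with a positive divisor always returns a non-negative result
    return degrees % 360


for angle in (360, -30, 390):
    print(angle, "->", normalize_angle(angle))
```

Keep in mind that this only tells you about the final orientation. When you animate a rotation with \t, going from 0 to 390 is a full spin plus 30 degrees, while going from 0 to 30 is not.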



All of them can be animated by \t.

\fax<num>, \fay<num>

\fax and \fay can be used to skew or slant our text in the X and Y axis respectively. It's often referred to as "shearing."
Their argument is a shear factor: with \fax, every point of the text is shifted horizontally by the factor times its vertical offset, which corresponds to a slant angle of atan(factor).
So 1 will make the text skew/slant by 45 degrees, 2 by about 63.4 degrees, 3 by about 71.6 degrees, and so on.
It can be any number, integer or decimal, positive or negative, but you probably want to stay within the range of -2 to 2.



Both can be animated by \t.

\c<colorCode>, \1c<colorCode>, \2c<colorCode>, \3c<colorCode>, \4c<colorCode>

\c is short for "color", and it can be used to change the color of our text.
Its argument is a color code, which is a six-digit hexadecimal number, and it can be any number from 000000 to FFFFFF.
It's important to note that the color code is in the format of 0xBBGGRR, where BB is the blue value, GG is the green value, and RR is the red value. In the script itself, this is written as &HBBGGRR&, for example \c&H0000FF& for red.
So 0xFF0000 is blue, 0x00FF00 is green, 0x0000FF is red, 0xFFFFFF is white, 0x000000 is black, and so on.
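
Because most color pickers and websites give you colors in RGB order, it's easy to mix the channels up. Here's a small Python sketch that does the swap for you, assuming the usual &HBBGGRR& spelling used in ASS scripts; rgb_to_ass is a hypothetical helper name.

```python
def rgb_to_ass(red, green, blue):
    # ASS stores the channels in blue-green-red order
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("each channel must be between 0 and 255")
    return f"&H{blue:02X}{green:02X}{red:02X}&"


print(rgb_to_ass(255, 0, 0))      # pure red
print(rgb_to_ass(0, 0, 255))      # pure blue
print(rgb_to_ass(255, 165, 0))    # an orange
```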

\c and \1c are equivalent, and they change the color of the text's fill.
\2c changes the secondary color of the text, which is used for karaoke timing tags, but we'll not go deep into those here. You'll learn more about them later.
\3c changes the color of the text's border.
\4c changes the color of the text's shadow.



All of them can be animated by \t.

\alpha<HEX>, \1a<HEX>, \2a<HEX>, \3a<HEX>, \4a<HEX>

As its name implies, \alpha can be used to change the transparency of our text. (Why is transparency called "alpha"?)
Its argument is a two-digit hexadecimal number, and it can be any number from 00 to FF. In the script itself, it's written as &HXX&, for example \alpha&H80&.
So 00 is fully opaque, and FF is fully transparent.
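
Note that this is the opposite of how opacity usually works in image editors, where a higher number means more visible. Here's a quick Python sketch for converting an opacity between 0.0 and 1.0 into an ASS alpha value, assuming the &HXX& spelling used in scripts; opacity_to_alpha is a made-up helper name.

```python
def opacity_to_alpha(opacity):
    # opacity: 1.0 is fully visible, 0.0 is fully invisible
    opacity = min(1.0, max(0.0, opacity))
    alpha = round(255 * (1 - opacity))  # ASS alpha counts transparency, not opacity
    return f"&H{alpha:02X}&"


print(opacity_to_alpha(1.0))
print(opacity_to_alpha(0.0))
```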

\alpha applies the transparency to everything, including the fill, border and shadow.
\1a changes the transparency of the text's fill.
\2a changes the transparency of the text's secondary color.
\3a changes the transparency of the text's border.
\4a changes the transparency of the text's shadow.

All of them can be animated by \t.

Referring back to the "shad trick", you can use 0xFE (254) alpha values on your fill and border, which will make the text practically but not technically invisible, while still allowing you to render a shadow.

\org(<xCoord>,<yCoord>)

\org is short for "origin", and it can be used to define the origin of our text, which is the point around which the text is rotated.
Its argument is the X and Y coordinates of the origin, and it's always relative to the top left of the screen.
It's important to note that by default, the origin is the anchor point of the text (set by the alignment), so the same point that the \pos tag positions, and its effect is only visible when you have rotation on your line with \frx, \fry, or \frz.

In the above example, you can see {\org(960,540)\frz45}\org(960,540)\frz45

Only one \org tag can be used on a line.

\fad(<fadein>,<fadeout>), \fade

\fad is kinda short for "fade", and it can be used to add a fade-in and fade-out effect to our text.
Its arguments are the time in milliseconds for the fade in and fade out, and it's always relative to the start and end of the line.
So \fad(100,100) will make the text fade in and out in 100 milliseconds.

There's also another kind of fade tag, which is \fade.
It can be used to make complex fades.
It takes in 3 alpha values in decimal format, so 0-255, and 4 time values in milliseconds.
Let's call them a1, a2, a3, t1, t2, t3, t4.
The line is going to start with the a1 alpha value.
From t1 to t2, the line will fade from a1 to a2.
After t2, the line will stay at a2.
From t3 to t4, the line will fade from a2 to a3.
After t4, the line will stay at a3.

So basically, it's kinda like the \fad tag, but instead of the fading time always being relative to the start and end, and the alpha values always being fully transparent or fully visible, we can modify each one of those separately.

Let's take this example {\fade(200,50,255,500,1000,2000,2500)\an5}\fade(200,50,255,500,1000,2000,2500)



Here, our text starts out as alpha 200, which is almost totally transparent, but still visible.
Then, after being like that for 500 milliseconds, so 0.5 seconds, it fades to alpha 50 in 500 milliseconds, which is almost totally visible.
After that, it stays like that for a second, then in another 500 milliseconds fades to 255, so completely invisible.
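
If you'd rather see the same timeline as code, here's a minimal Python sketch of the rules listed above. The function name fade_alpha is hypothetical; the argument order simply mirrors the tag.

```python
def fade_alpha(a1, a2, a3, t1, t2, t3, t4, now):
    # Alpha values are decimal (0-255), times are in milliseconds
    if now < t1:
        return a1
    if now < t2:
        return a1 + (a2 - a1) * (now - t1) / (t2 - t1)
    if now < t3:
        return a2
    if now < t4:
        return a2 + (a3 - a2) * (now - t3) / (t4 - t3)
    return a3


# The example from above: \fade(200,50,255,500,1000,2000,2500)
for ms in (0, 750, 1500, 2250, 3000):
    print(ms, fade_alpha(200, 50, 255, 500, 1000, 2000, 2500, ms))
```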

Realistically, I've never seen anyone use this, as this can be achieved by \t and \alpha tags.

Only one \fad or \fade tag can be used on a line.

\p<num>

\p can be used to insert vector drawings into the subtitles.
Using \p1 will set our line into drawing mode, while \p0 will exit it.
Using a number larger than 1 in the \p tag will scale the drawing down. Using \p2 will halve the size of the drawing, using \p3 will make it 1/4 the size, using \p4 will make it 1/8 the size, and so on. Another way of saying it is that the drawing resolution doubles with every step, so it gets multiplied by 2^(n-1), where n is the argument of \p.
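
Following the halving pattern from the examples (\p2 is 1/2, \p4 is 1/8), the scale works out to 1 / 2^(n-1). Here's a tiny Python sketch of that; drawing_scale and scale_point are names I made up for illustration.

```python
def drawing_scale(p):
    # \p1 is 1:1, and every step above that halves the drawing again
    if p < 1:
        raise ValueError("drawing mode needs a value of at least 1")
    return 1 / 2 ** (p - 1)


def scale_point(x, y, p):
    # Where a drawing coordinate ends up, measured in \p1 units
    factor = drawing_scale(p)
    return (x * factor, y * factor)


for p in (1, 2, 3, 4):
    print(p, drawing_scale(p))
```

The practical use of this is sub-pixel precision: with \p4 you can write coordinates eight times larger and get the same shape, with finer steps between points.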

A vector drawing is basically a shape, that we define by its outer points.
The format to draw vectors in ASS subtitles is similar to SVGs but not a perfect match.
On this site, and in recent versions of Aegisub, X coordinates in vector drawings are red, while Y coordinates are green. In bezier curves, the point coordinate is underlined, while the control points aren't.

All vector drawings have to start with the character "m" which places the "pen" to the coordinates that come after it.
After that, we can switch to either line mode with "l" (lowercase L) or to bezier mode with "b".
In line mode, when we define a pair of x and y coordinates our "pen" is going to stroke a straight line between those.
With vector drawings, we can't really draw lines by themselves; it has to be a closed shape, so we need at least 2 pairs of coordinates after the starting position and the first "l" command.
With bezier mode, after the "b" character, we have to define 3 sets of coordinates. The first two sets are going to be "control points" which can be imagined as points of magnets, and they're going to pull the drawing to themselves. The third point is the end of the line.
This might sound a bit complicated at first, but let's look at two examples, I swear it's not that hard.



In this example, you can see {\pos(0,0)\an7\p1}m 100 100 l 200 100 200 200 100 200.
\p1 puts us into drawing mode at 1:1 scale. \an7 makes sure our drawing is anchored to the top left and \pos(0,0) will make sure the anchor is exactly in the top left.
While we specify coordinates in the drawings, those are relative to the subtitle anchor point that we discussed in the \an part of this guide. So if we want our coordinates to be exactly representative of the video, we have to make sure we use \an7 and \pos(0,0).
After the tags, outside of the curly brackets, we have the letter "m" which places our pen to 100, 100 X/Y coordinates.
Then, we switch into line mode and define three more points. 200,100 / 200,200 / 100,200. This way, we defined every corner of a rectangle. A line was drawn from each coordinate to the next, and the shape gets filled. Bamm, that's a rectangle.

You may notice that we didn't draw a line from the bottom left back to the top left. Our renderer will always close any shape that was left open, by connecting the last point with the first.
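
When you generate drawings from a script, this kind of rectangle is the shape you'll build most often. Here's a small Python sketch that produces the drawing commands from two corners; rect_drawing is a hypothetical helper, not part of any existing tool.

```python
def rect_drawing(x1, y1, x2, y2):
    # Top left -> top right -> bottom right -> bottom left.
    # The renderer closes the shape back to the first point on its own.
    return f"m {x1} {y1} l {x2} {y1} {x2} {y2} {x1} {y2}"


# Reproduces the drawing from the example above
print(r"{\pos(0,0)\an7\p1}" + rect_drawing(100, 100, 200, 200))
```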

Technically, there are a few more drawing commands, "s" for cubic b-splines, "p" for extended b-splines and "c" for closing b-splines, which are supported by both libass and VSFilter, but no other tooling supports them, including the visual typesetting tools in Aegisub. Most tooling will likely break on them, or corrupt them, so for now, let's just forget that they exist.
See? It's not that complicated.



Let's look at another example. It's almost the same shape, but we are using a bezier curve on the last line.
In the image above, you can see this drawing:
{\pos(0,0)\an7\p1}m 100 100 l 200 100 200 200 b 170 230 130 230 100 200
It's a bit zoomed in for clarity's sake. As we can see, up until the "b" character it's exactly like the previous drawing, but after that, we have three sets of coordinates. Two of these are the coordinates for the control points, and the last, underlined one is the coordinate of the point. The two red squares show the control points on the image, and as you can see, those "pull" the drawing to themselves.
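
If you're curious where exactly the curve runs between its two ends, the "b" command describes a cubic bezier curve, and you can evaluate it with the standard formula. Below is a Python sketch of that; cubic_bezier is a made-up helper, and t is just the usual curve parameter going from 0 (start) to 1 (end).

```python
def cubic_bezier(p0, c1, c2, p3, t):
    # p0: where the pen currently is, c1/c2: control points, p3: end point
    u = 1 - t
    x = u**3 * p0[0] + 3 * u**2 * t * c1[0] + 3 * u * t**2 * c2[0] + t**3 * p3[0]
    y = u**3 * p0[1] + 3 * u**2 * t * c1[1] + 3 * u * t**2 * c2[1] + t**3 * p3[1]
    return (x, y)


# The curve from the example: the pen is at 200,200 when "b" starts
start, ctrl1, ctrl2, end = (200, 200), (170, 230), (130, 230), (100, 200)
for t in (0, 0.25, 0.5, 0.75, 1):
    print(t, cubic_bezier(start, ctrl1, ctrl2, end, t))
```

Notice that the curve never actually touches the control points. At its lowest point it only reaches y = 222.5, even though both control points sit at y = 230.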

Neither the drawings, nor the \p tag can be animated by \t.

\clip, \iclip

The \clip tag can be used to apply clipping masks to our subtitles.

There are two types of clips we can use, rectangular and vector clips.
Rectangular clips, as the name implies can only be used to create a rectangular clipping mask, but they have a much easier syntax.
It requires two sets of X and Y coordinates, the first will define the top left of the rectangle, and the second will define the bottom right.



In the above example, you can see {\an5\clip(400,500,1500,550)}Half of this text is clipped off.
Here's the clip visually rendered by Aegisub:



As you can see, only the parts inside the clip are rendered in our subtitles.
If you'd like to invert this, so only the things outside of the clip area are rendered, you can just change \clip to \iclip which stands for "inverted clip." This works on both rectangular and vector clips.
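
The logic of a rectangular clip and its inverted version boils down to a simple "is this pixel inside the rectangle" check. Here's a minimal Python sketch of it; is_visible is a hypothetical helper, and real renderers obviously work on whole bitmaps rather than single points.

```python
def is_visible(x, y, clip, inverted=False):
    # clip: (left, top, right, bottom), same order as the tag's arguments
    left, top, right, bottom = clip
    inside = left <= x <= right and top <= y <= bottom
    # \clip keeps what's inside, \iclip keeps what's outside
    return not inside if inverted else inside


clip = (400, 500, 1500, 550)
print(is_visible(960, 520, clip))                  # inside a \clip
print(is_visible(960, 520, clip, inverted=True))   # same point with \iclip
print(is_visible(960, 600, clip))                  # below the clip area
```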

For vector clips, instead of comma-separated arguments, we can use a vector drawing discussed in the drawing section of this guide.
For example, {\an5\clip(m 485 569 l 1408 570 1406 489 b 1180 586.25 674 544.75 485 499)}Half of this text is clipped off.



Vector clips are scaled 1:1 by default. Like drawings with the \p tag, they can take an optional scale as a first argument, in the form \clip(<scale>,<drawing commands>), but you'll rarely need it.

An important thing to note about clips is that the performance of rectangular and vector clips differs widely, as vector clips need much more processing.
With rectangular clips, the area in which the characters are rendered just gets clamped down to the area of the clip, but vector clips are rasterized into a bitmap and then multiplied into the subtitle's alpha channel, so a lot of vector clips, especially big ones, can have a serious impact on performance.
For typesetting purposes, this usually isn't really a concern, but for kfx-ing, you may have to pay attention to this.



In the image above, you can see some rendering statistics generated with Noro's Assytics (precisely, with joletb's fork).
In the first seconds, thousands of rectangular clips are rendered, and at 3 seconds, thousands of vector clips are rendered. You can see the big jump in the graph. The red line at the top marks 42 milliseconds, which is the frame timing of 24fps video, so if the rendering time goes above that, you might have some performance issues. Note, that this is a fabricated example, but with a lot of clips in your kfx, it's easy to hit that ceiling.
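
The 42 millisecond ceiling comes straight from dividing one second by the frame rate. Here's a small Python sketch of that arithmetic; the helper names frame_budget_ms and fits_in_frame are made up, and the render times passed in are just example numbers, not measurements.

```python
def frame_budget_ms(fps):
    # How long the renderer has for one frame before playback falls behind
    return 1000.0 / fps


def fits_in_frame(render_ms, fps=24):
    return render_ms <= frame_budget_ms(fps)


print(round(frame_budget_ms(24), 2))
print(fits_in_frame(30))   # hypothetical 30 ms render time
print(fits_in_frame(50))   # hypothetical 50 ms render time
```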

Only rectangular clips can be animated by \t, vector clips can't be.

\i, \b, \u, \s

These tags can be used to apply italic, bold, underline, and strikeout to our text respectively.
Each of them takes a binary argument, 0 or 1, where 0 means off, and 1 means on.



In the above example, you can see each of these applied to the text.
{\an5\fnArial\i1}Italicized text.{\i0}\N
{\b1}Bold text{\b0}\N
{\u1}Underlined Text.{\u0}\N
{\s1}Crossed text.{\s0}


You can toggle them on and off throughout the line, and if you don't have a "closing" tag, it's just applied to the rest of the line.

None of them can be animated by \t.

\n, \N, \q, \r

For kfx-ing, you're rather unlikely to use any of these, maybe \r, but for completeness's sake, I'll include them.
The \n and \N tags can be used for soft linebreaks and hard linebreaks respectively.
A soft line break will only apply if the wrapping style is set to 2.
A hard line break will insert a line break regardless of the wrapping style.

\q can be used to set the wrapping style.
  • 0: Smart wrapping, make each line approximately equally long, but top line wider when equal width is impossible. Only \N forces line breaks.
  • 1: End-of-line wrapping, fill as much text in a line as possible, then break to the next line. Only \N forces line breaks.
  • 2: No word wrapping, wide lines will extend beyond the edges of the screen. Both \n and \N force line breaks.
  • 3: Smart wrapping, similar to style 0, but bottom lines are made wider.

(Yes, I straight up copied these from the Aegisub guide.)

Without an argument, \r resets all formatting to the style default for the text following it. If you provide a style name as an argument, it switches the text following it to that style instead.

None of them can be animated by \t.

Writers/contributors: Zahuczky