Teaching Exuberant Ctags to Support New Languages

Exuberant Ctags is a wonderful tool that I use to help navigate source code, e.g. finding function definitions. It supports many programming languages but also allows you to extend its understanding of those languages. I have written about how you can add new ‘rules’ for Lua and for PHP.

But what if Ctags has no support for a language? You can still use the program to create tags if you ‘teach’ Ctags about the language. Today I will explain how to do so.

The Goal

Exuberant Ctags can already create tags for build scripts written for GNU Make. For example, it recognizes the variables you define in Makefiles. In this article I will demonstrate how to configure Ctags to understand a new language, using Tup as the example. Tup, like GNU Make, is a build tool, and it has similar concepts of variables, ‘targets’, and so on.

So the goal in this article is to teach Ctags how to build tags for the variables and rules in Tup files.

The Example Tup File

By the end of the article we want Exuberant Ctags to extract useful tags from this file:

######################################################################
#
# This Tup file compiles PHP Mode for GNU Emacs, available at:
#
#     https://github.com/ejmr/php-mode
#
# It will byte-compile all of the Emacs Lisp code for the project.
#
######################################################################

# The path to Emacs itself.
EMACS := /usr/local/bin/emacs

# Flags to give Emacs when compiling each individual Elisp file.
FLAGS += -Q
FLAGS += -L .
FLAGS += -batch
FLAGS += -f batch-byte-compile

# The source files we want to compile.
SOURCES += php-mode.el
SOURCES += php-mode-test.el

# This is the one and only 'rule' in our Tup file, which compiles all
# of the sources listed above.
: foreach $(SOURCES) |> ^ Compiling %f^ $(EMACS) $(FLAGS) %f |> %B.elc

Beginning the Configuration

First we need to find the configuration file for Ctags. For example, I use /home/eric/.ctags. The manual's section on configuration files will help you decide which file to use or create. The contents we will write, however, will be the same in any case.

Our Ctags configuration file will consist of command-line options that we could type every time we run the program, but that would be tedious. It is useful, though, to keep the manual at hand since we will be writing those command-line options into our configuration file.

The first thing we must define is the name of the language, like so:

--langdef=tup

This tells Exuberant Ctags that we are defining support for the tup language, and then we can use the name tup with other command-line options. For example, we can tell Ctags which files are Tup files.

--langmap=tup:(Tupfile)
--langmap=tup:+.tup

The value given to --langmap has this format: language:pattern. For the language part we use tup since that is the value we gave to --langdef. Then we have the value (Tupfile), telling Ctags that it should parse any file named Tupfile using the patterns we will later define for the tup language. Note that we must wrap the name of the file in parentheses; Ctags would give us an error if we wrote --langmap=tup:Tupfile.

Next we also want to associate any file whose name ends with the *.tup extension. Note that we do not need to type the asterisk; when we write .tup, Ctags correctly assumes we mean *.tup.

But that does not explain the plus sign. Each time we use --langmap for a language, Ctags replaces the existing map for that language. Let’s look at what happens when we leave out the plus sign:

--langdef=tup
--langmap=tup:(Tupfile)
--langmap=tup:.tup

If we run ctags --list-maps we will see this:

tup      *.tup

In other words, Ctags is only mapping files that match *.tup to the tup language, meaning it will ignore any file named Tupfile. This happens because the second use of --langmap overrides the first. We don’t want that; we want to append the second map to the first, and to do that we must add a plus sign to the beginning of the second map. After adding the plus we can run ctags --list-maps and we will see this:

tup      *.tup Tupfile

That indicates that Ctags is mapping both filename patterns to the tup language.
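To make the difference between replacing and appending concrete, here is a toy model in Python. It is only a sketch of the behaviour described above, not how Ctags is implemented: the helper names apply_langmap and language_for are hypothetical, and it handles just one pattern per option, which is all this article uses.

```python
from fnmatch import fnmatchcase

def apply_langmap(maps, option):
    """Apply one 'language:pattern' value to a dict of language -> patterns."""
    language, _, spec = option.partition(":")
    append = spec.startswith("+")
    if append:
        spec = spec[1:]
    # "(Tupfile)" names a file exactly; ".tup" is treated as "*.tup".
    if spec.startswith("(") and spec.endswith(")"):
        pattern = spec[1:-1]
    else:
        pattern = "*" + spec
    # Without the plus sign the previous patterns are thrown away.
    existing = maps.get(language, []) if append else []
    maps[language] = existing + [pattern]
    return maps

def language_for(maps, filename):
    """Return the first language whose patterns match the file name."""
    for language, patterns in maps.items():
        if any(fnmatchcase(filename, p) for p in patterns):
            return language
    return None

without_plus = {}
for option in ["tup:(Tupfile)", "tup:.tup"]:
    apply_langmap(without_plus, option)

with_plus = {}
for option in ["tup:(Tupfile)", "tup:+.tup"]:
    apply_langmap(with_plus, option)

print(without_plus)  # {'tup': ['*.tup']}
print(with_plus)     # {'tup': ['Tupfile', '*.tup']}
```

In this model, language_for(without_plus, "Tupfile") returns None while language_for(with_plus, "Tupfile") returns "tup", which mirrors what ctags --list-maps showed us.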

Before moving on I want to mention that we could use --langmap only once and combine both patterns like this:

--langmap=tup:(Tupfile),tup:+.tup

We must still use the plus sign to append the second mapping. Personally I prefer to write one mapping per line, but that’s merely a matter of style. You can provide a comma-separated list of mappings all as one value if you like.

Creating Tags for Variables

Now that we’ve defined the name of our new language and some file associations we are ready to tell Ctags how to actually parse the language. To do this we use regular expressions. For the rest of this article I will assume you are comfortable with regular expressions, particularly capturing groups because we will often use them.

In order for Ctags to find variables in Tup files we need to write a regular expression to match them. We do this via --regex-tup=…, which gets its name from our initial --langdef=tup. Here is the regular expression we will use:

--regex-tup=/([a-z0-9_]+)[ \t]*[:+]?=/\1/v,variable/i

Now we’ll break it down. The value has four parts, each separated with a forward slash.

  1. The regular expression itself. We match a name, optional whitespace, and an equals sign that may be preceded by an optional colon or plus sign. For this example that is good enough to recognize a line as defining a variable. But note well that we wrap the variable name in parentheses to capture it. We do that because the second part is…

  2. …The name of the tag. When we search for tags, these are the names we will see. So the regular expression will match EMACS := /usr/local/bin/emacs but the tag name will simply be EMACS (due to the back-reference, i.e. \1).

  3. The part v,variable is what Ctags calls the kind. It provides semantic information about the nature of the tag. Each kind is a single letter followed by a comma and then a more descriptive label. You can run ctags --list-kinds=all to see all of the built-in kinds.

  4. The final part contains any flags for the regular expression. In our example we use i so that the expression is case-insensitive.

Exuberant Ctags requires only the first two parts. Personally I recommend defining the ‘kind’ as well, but it is optional, as are the flags for the regular expression.

Note: That regular expression does not cover all valid Tup variable names, e.g. &var or foo-@(BAR). I leave that as an exercise for the reader. Which is just another way of saying, “I am lazy.”
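If you want to try the expression before putting it in your configuration, a quick Python sketch can stand in for Ctags. This is an approximation under a few assumptions: Python's re module is not the regex engine Ctags uses (though this particular pattern should behave the same way in both), the i flag is expressed as re.IGNORECASE, and the helper name variable_tags is hypothetical. The sample text is a shortened copy of our example Tup file.

```python
import re

# The pattern from --regex-tup; the "i" flag becomes re.IGNORECASE.
VARIABLE = re.compile(r"([a-z0-9_]+)[ \t]*[:+]?=", re.IGNORECASE)

TUPFILE = """\
# The path to Emacs itself.
EMACS := /usr/local/bin/emacs

FLAGS += -Q
FLAGS += -L .
SOURCES += php-mode.el

: foreach $(SOURCES) |> ^ Compiling %f^ $(EMACS) $(FLAGS) %f |> %B.elc
"""

def variable_tags(text):
    """Return (tag name, line number) for every line the pattern matches."""
    tags = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = VARIABLE.search(line)
        if match:
            # Group 1 plays the role of the \1 back-reference.
            tags.append((match.group(1), number))
    return tags

for name, number in variable_tags(TUPFILE):
    print(f"{name}\tline {number}\tkind: variable")
```

The sketch reports EMACS, FLAGS (twice), and SOURCES, and skips the comment and the rule. It also shows the limitation mentioned in the note: a line such as &var = x would be tagged as var, because the ampersand is not part of the character class.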

Creating Tags for Rules

Now that we have looked at the individual parts for tag definitions, let’s add one to match Tup rules:

--regex-tup=/:[ \t]+(foreach[ \t]+)?([^ \t|>]+)/Build \2/r,rule/i

So given the rule in our example file, : foreach $(SOURCES) |> …, Ctags will create a tag named Build $(SOURCES). This shows how we can put arbitrary text into the tag names if we want. When using an editor or IDE that supports tab-completion for tags this can be a useful way to distinguish groups of tags.
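The same kind of quick check works for the rule expression. Again this is only a Python sketch that assumes re behaves like the Ctags regex engine for this pattern; rule_tag is a hypothetical helper, and the second rule in the usage lines is made up for illustration.

```python
import re

# The pattern from --regex-tup for rules, case-insensitive like the "i" flag.
RULE = re.compile(r":[ \t]+(foreach[ \t]+)?([^ \t|>]+)", re.IGNORECASE)

def rule_tag(line):
    """Return the tag name for a rule line, or None if the line is not a rule."""
    match = RULE.search(line)
    if match is None:
        return None
    # "Build \2" in the Ctags option: literal text plus the second group.
    return "Build " + match.group(2)

rule = ": foreach $(SOURCES) |> ^ Compiling %f^ $(EMACS) $(FLAGS) %f |> %B.elc"
print(rule_tag(rule))                                      # Build $(SOURCES)
print(rule_tag(": hello.c |> gcc -c hello.c |> hello.o"))  # Build hello.c
print(rule_tag("EMACS := /usr/local/bin/emacs"))           # None
```

Because the optional foreach group is the first capture, the rule's input is always the second group, whether or not foreach is present. That is why the tag name uses \2 rather than \1.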

Conclusion

Those are the basics of extending Exuberant Ctags to support new languages. Just for reference, here is the complete definition for Tup that we built over the course of this article:

--langdef=tup
--langmap=tup:(Tupfile)
--langmap=tup:+.tup
--regex-tup=/([a-z0-9_]+)[ \t]*[:+]?=/\1/v,variable/i
--regex-tup=/:[ \t]+(foreach[ \t]+)?([^ \t|>]+)/Build \2/r,rule/i

There is plenty of room for improvement there, but hopefully you now understand how to make those improvements and how to add support for any other language you want to use with Exuberant Ctags.
