Common Lisp pathnames, used to represent file paths, have the reputation of being hard to work with. This article aims to change this unfair reputation while highlighting the occasional quirks along the way.
Filenames and file paths
The distinction between filenames and file paths is not always obvious. On POSIX systems, the filename is the name of the file, while a file path represents its absolute or relative location in the file system. This also means that all filenames are file paths, but not the other way around.
Common Lisp uses the term filename for objects which are either pathnames or namestrings, both being representations of file paths. We will try to avoid confusion by using the terms filenames, pathnames and namestrings when referring to Common Lisp concepts, and we will talk about file paths when referring to the language-agnostic file system concept.
Pathnames
Pathnames are an implementation-independent representation of file paths made of six components:
- a host identifying either the file system or a logical host;
- a device identifying the logical or physical device containing the file;
- a directory representing an absolute or relative list of directory names;
- a name;
- a type, a value nowadays known as file extension;
- a version, because yes file systems used to support file versioning.
While this representation might seem overly complicated (it originates from a time when the file system ecosystem was much richer), it is still suitable for modern file systems.
The make-pathname function is used to create pathnames and lets you specify all components. For example the following call yields a pathname representing the file path written /var/run/example.pid on POSIX systems:
(make-pathname :directory '(:absolute "var" "run") :name "example" :type "pid")
Common Lisp functions manipulating file paths of course accept pathnames, letting you keep the same convenient structured representation everywhere, only converting from/to a native representation at the boundaries of your program.
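As a minimal sketch of what this structured representation gives you, the standard accessor functions return each component of the pathname built above. The *PID-FILE* variable name is only illustrative, and the values in the comments assume the default :LOCAL case.

```lisp
;; *PID-FILE* is a hypothetical name used for this example only.
(defparameter *pid-file*
  (make-pathname :directory '(:absolute "var" "run")
                 :name "example"
                 :type "pid"))

;; Each accessor returns one component of the structured representation.
(pathname-directory *pid-file*) ; (:ABSOLUTE "var" "run")
(pathname-name *pid-file*)      ; "example"
(pathname-type *pid-file*)      ; "pid"
```

No string parsing is involved at any point: the components are stored and returned as given.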
Special characters
What happens when you construct a pathname with components containing separation characters, e.g. a directory name containing / on a POSIX system or a type containing . ? According to Common Lisp 19.2.2.1.1, the behaviour is implementation-defined; but if the implementation accepts these component values it must handle quoting correctly.
For example:
- CLISP rejects separator characters in component values, signaling a SIMPLE-ERROR condition.
- CCL accepts them and quotes them when converting the pathname to a namestring. So (namestring (make-pathname :name "foo/bar" :type "a.b")) yields "foo\\/bar.a\\.b" on Linux.
- SBCL accepts and quotes them but does not quote . in type components, yielding "foo\\/bar.a.b" for the example above.
- ECL accepts them but fails to quote them when converting the pathname to a namestring.
One could wonder which implementation, CCL or SBCL, is correct regarding the quoting of the . character in type strings on POSIX platforms. While everyone understands that / is special in file and directory names, . is debatable because POSIX does not mention the type extension in its definitions: foo.txt is the name of the file, not a combination of a name and a type. As such, I would argue that quoting and not quoting are both correct. And as you will realize when reading about namestrings further in this article, it is irrelevant since namestrings are not POSIX paths.
Note that whether ECL violates the standard or not is unclear since there is no character quoting for POSIX paths. In other words, there is no such thing as a directory named a/b, because it could not be referenced in a way different from a directory named b in a directory named a. This behaviour derives directly from POSIX systems treating paths as strings and not as structured objects.
Invalid characters
The Common Lisp standard mentions special characters but is silent on the subject of invalid characters. For example POSIX forbids null bytes in filenames. But since it is not a separation character, implementations are free to deal with it as they see fit.
When testing implementations with a pathname containing a null byte using (make-pathname :name (string (code-char 0))), CCL, SBCL and ECL accept it while CLISP signals an error mentioning an illegal argument.
I am not convinced by CLISP’s behaviour since null bytes are only invalid in POSIX paths, not in Common Lisp filenames, meaning that the error should occur when the pathname is internally converted to a format usable by the operating system.
Pathname component case
A rarely mentioned property of pathnames is the support for case conversion. MAKE-PATHNAME and functions returning pathname components (e.g. PATHNAME-TYPE) support a :CASE argument, either :COMMON or :LOCAL, indicating how to handle character case in strings.
With :LOCAL (which is the default value), these functions assume that component strings are already represented following the conventions of the underlying operating system. It also dictates that if the host only supports one character case, strings must be returned converted to this case.
With :COMMON, these functions will use the default (customary) case of the host if the string is provided all uppercase, and the opposite case if the string is provided all lowercase. Mixed case strings are not transformed.
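The following sketch illustrates the :COMMON convention. The *REPORT* variable is a hypothetical name, and the strings actually stored depend on the customary case of the host, so the only thing relied upon here is that reading components back with :COMMON returns what was passed with :COMMON.

```lisp
;; Build a pathname with the "common" convention: an all-uppercase string
;; stands for the customary case of the host, whatever that case is.
;; *REPORT* is a hypothetical name used for this example only.
(defparameter *report*
  (make-pathname :name "REPORT" :type "TXT" :case :common))

;; Reading the components back with :COMMON gives the strings passed in.
(pathname-name *report* :case :common)
(pathname-type *report* :case :common)

;; Reading them with the default :LOCAL case shows how the host stores them.
;; On a host whose customary case is lowercase, expect lowercase strings.
(pathname-name *report*)
(pathname-type *report*)
```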
These behaviours are not intuitive and made much more sense at a time when some file systems only supported one specific case. You should probably stay away from component case handling unless you really know what you are doing.
On a personal note, as someone running Linux and FreeBSD, I am curious about the behaviour of various implementations on Windows and macOS since both NTFS and APFS are case-insensitive by default.
Unspecific components
While all components can be null, some of them can be :UNSPECIFIC (which ones is implementation-defined). The only real use case for :UNSPECIFIC is to affect the behaviour of MERGE-PATHNAMES: if a component is null, the function uses the value of the component in the pathname passed as its second (default pathname) argument; if a component is :UNSPECIFIC, the function keeps :UNSPECIFIC in the resulting pathname.
For example:
(merge-pathnames (make-pathname :name "foo")
                 (make-pathname :type "txt"))
yields a pathname whose namestring is "foo.txt", but
(merge-pathnames (make-pathname :name "foo" :type :unspecific)
                 (make-pathname :type "txt"))
yields a pathname whose namestring is "foo".
Unfortunately the inability to rely on its support for specific component types (since it is implementation-defined) makes it more interesting than useful.
Namestrings
Namestrings are another representation for file paths. While pathnames are structured objects, namestrings are just strings. The most important aspect of namestrings is that unless they are logical namestrings (something we will cover later), the way they represent paths is implementation-defined (cf. Common Lisp 19.1.1 Namestrings as Filenames). In other words the namestring for the file foo of type txt in directory data could be data/foo.txt. Or data\foo.txt. Or data|foo#txt. Or any other nonsensical representation.
Fortunately implementations tend to act rationally and use a representation as
similar as possible to the one of their host operating system.
One should always remember that even though namestrings look and feel like paths, they are still a representation of a Common Lisp pathname, meaning that they may or may not map to a valid native path. The most obvious example would be a pathname whose name is the null byte, created with (make-pathname :name (string (code-char 0))), whose namestring is a one character string that has no valid native representation on modern operating systems.
Pathnames can be converted to namestrings using the NAMESTRING function, while namestrings can be parsed into pathnames with PARSE-NAMESTRING. The #P reader macro uses PARSE-NAMESTRING to read a pathname. As such, #P"/tmp/foo.txt" is identical to #.(parse-namestring "/tmp/foo.txt").
Note that most functions dealing with files will accept a pathname designator, i.e. either a pathname, a namestring or a stream.
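As a small sketch of the conversions described above, the following parses a namestring, inspects the result and converts it back. It assumes an implementation using POSIX-like namestring syntax (remember that this syntax is implementation-defined); *PATH* is a hypothetical variable name.

```lisp
;; *PATH* is a hypothetical name; the namestring syntax used here is the
;; POSIX-like one most implementations use on Unix systems.
(defparameter *path* (parse-namestring "/tmp/foo.txt"))

(pathname-directory *path*) ; (:ABSOLUTE "tmp") with POSIX-like syntax
(pathname-name *path*)      ; "foo"
(pathname-type *path*)      ; "txt"

;; Converting back produces a namestring denoting the same pathname.
(namestring *path*)

;; A namestring is a pathname designator, so it can be passed directly to
;; functions expecting a pathname.
(pathname-name "/tmp/foo.txt")
```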
Native namestrings
An unfortunately missing feature from Common Lisp is the ability to parse native namestrings, i.e. paths that use the representation of the underlying operating system.
To understand why it is a problem, let us take *.txt, a perfectly valid filename at least on any POSIX platform. In Common Lisp, you can construct a pathname representing this filename with (make-pathname :name "*" :type "txt"). No problem whatsoever. However the "*.txt" namestring actually represents a pathname whose name component is :WILD. There is no portable namestring that will return the former pathname when passed to PARSE-NAMESTRING.
As a result, when processing filenames coming from the external world (a command line argument, a list of paths in a document, etc.), you have no way to handle those that contain characters used by Common Lisp for wild components.
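The difference between the two pathnames can be observed directly. This is a sketch assuming an implementation that accepts "*" as a name string and parses "*" in namestrings as a wildcard, which is what SBCL and CCL do; other implementations may behave differently.

```lisp
;; A pathname whose name is literally the one-character string "*".
(pathname-name (make-pathname :name "*" :type "txt"))

;; The same characters read as a namestring produce a wild name component
;; (:WILD on SBCL and CCL), so the result is a wild pathname.
(pathname-name (parse-namestring "*.txt"))
(wild-pathname-p (parse-namestring "*.txt"))
```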
There is no standard way of solving this issue. Some implementations provide functions to parse native namestrings, e.g. SBCL with SB-EXT:PARSE-NATIVE-NAMESTRING or CCL with CCL:NATIVE-TO-PATHNAME.
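One possible way to hide the difference between implementations is a small wrapper using read-time conditionals. This is only a sketch: PARSE-NATIVE-PATH is a hypothetical name, only the two implementations mentioned above are covered, and both functions are called with a single string argument.

```lisp
(defun parse-native-path (string)
  "Hypothetical helper: parse STRING as a native path of the operating
system, without interpreting wildcard characters."
  #+sbcl (sb-ext:parse-native-namestring string)
  #+ccl (ccl:native-to-pathname string)
  #-(or sbcl ccl)
  (error "No known way to parse the native namestring ~S." string))

;; The name component should be the literal string "*", not :WILD.
(pathname-name (parse-native-path "*.txt"))
```

Using such a wrapper at the boundaries of a program keeps implementation-specific code in a single place.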
Wildcards
Up to now pathnames may have looked like a slightly unusual representation for paths. But we are just getting started.
Pathnames can be wild, meaning that they contain one or more wild components. Wild components can match any value. All components can be made wild with the special value :WILD. Directory elements also support :WILD-INFERIORS which matches any number of directory levels.
As such
(make-pathname :directory '(:absolute "tmp" :wild) :name "foo" :type :wild)
is equivalent to the /tmp/*/foo.* POSIX glob pattern, while
(make-pathname :directory '(:absolute "tmp" :wild-inferiors "data" :wild)
               :name :wild :type :wild)
is equivalent to /tmp/**/data/*/*.*.
Wild pathnames only really make sense for the DIRECTORY function which returns files matching a specific pathname.
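To experiment with wild pathnames without touching the file system, the standard PATHNAME-MATCH-P function tests whether a pathname matches a wild one. The sketch below reuses the first pattern above; *PATTERN* is a hypothetical variable name.

```lisp
;; *PATTERN* is a hypothetical name; it corresponds to /tmp/*/foo.*
(defparameter *pattern*
  (make-pathname :directory '(:absolute "tmp" :wild)
                 :name "foo" :type :wild))

;; Matches: exactly one directory level below /tmp, name "foo", any type.
(pathname-match-p
 (make-pathname :directory '(:absolute "tmp" "a") :name "foo" :type "txt")
 *pattern*)

;; Does not match: :WILD stands for a single directory level, not two.
(pathname-match-p
 (make-pathname :directory '(:absolute "tmp" "a" "b") :name "foo" :type "txt")
 *pattern*)

;; Does not match: the name differs.
(pathname-match-p
 (make-pathname :directory '(:absolute "tmp" "a") :name "bar" :type "txt")
 *pattern*)
```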
Logical pathnames
So far we have talked about pathnames representing either paths to physical files or patterns of filenames. Logical pathnames go further and let you work with files in a location-independent way.
Logical pathnames are based on logical hosts, set as pathname host components. Logical pathnames can be passed around and manipulated as any other pathnames; when used to access files, they are translated to a physical pathname, i.e. a pathname referring to an actual file in the file system.
SBCL uses logical pathnames for source file locations. While SBCL is shipped with its source files, their actual location on disk depends on how the software was installed. Instead of manually merging pathnames with a base directory value everywhere, SBCL uses the SYS logical host to map all pathnames whose directory starts with SRC to the actual location on disk. For example on my machine:
(translate-logical-pathname "SYS:SRC;ASSEMBLY;MASTER.LISP")
yields #P"/usr/share/sbcl-source/src/assembly/master.lisp".
Another example would be CCL which maps pathnames with the HOME logical host to subpaths of the home directory of the user.
Note that logical hosts are global to the Common Lisp environment. While SYS is reserved for the implementation, all other hosts are free to use by anyone. To avoid collisions, it is a good idea to name hosts after their program or library.
Logical namestrings
Logical namestrings are implementation-independent, meaning that you can safely use them in your programs without wondering about how they will be interpreted. Their syntax, detailed in section 19.3.1 of the Common Lisp standard, is quite different from usual POSIX paths. The host is separated from the rest of the path by a colon character, and directory names are separated by semicolon characters.
For example "SOURCE:SERVER;LISTENER.LISP" is the logical namestring equivalent of the /server/listener.lisp POSIX path for the SOURCE logical host.
The astute reader will notice the use of uppercase characters in logical namestrings. The different parts of a logical namestring are defined as using uppercase characters, but the implementation translates lowercase letters to uppercase letters when parsing the namestrings (cf. Common Lisp 19.3.1.1.7). We use the canonical uppercase representation for clarity.
Translation
Translation is controlled by a table that maps logical hosts to a list of patterns (wild pathnames or namestrings) and their associated wild physical pathnames.
One can obtain the list of translations for a logical host with LOGICAL-PATHNAME-TRANSLATIONS and update it with (SETF LOGICAL-PATHNAME-TRANSLATIONS). Each translation is a list where the first element is a logical pathname or namestring (usually a wild pathname) and the second element is a pathname or namestring to translate into.
The translation process looks for the first entry that satisfies PATHNAME-MATCH-P, which is guaranteed to behave in a way consistent with DIRECTORY. When there is a match, the translation process replaces the corresponding patterns for each component.
And of course if translation results in a logical pathname, it will be recursively translated until a physical pathname is obtained.
A simple example would be the use of a logical host referring to a temporary directory. This lets a program manipulate temporary pathnames without having to know their actual physical location, the translation process being controlled in a single location.
(setf (logical-pathname-translations "tmp")
      (list (list (make-pathname :host "tmp"
                                 :directory '(:absolute :wild-inferiors)
                                 :name :wild :type :wild)
                  (make-pathname :directory '(:absolute "var" "tmp" :wild-inferiors)
                                 :name :wild :type :wild))))
or if we were to use namestrings:
(setf (logical-pathname-translations "tmp")
      '(("TMP:**;*.*.*" "/var/tmp/**/*.*")))
Translating pathnames or namestrings using the TMP logical host yields the expected results. For example (translate-logical-pathname "TMP:CACHE;DATA.TXT") yields #P"/var/tmp/cache/data.txt".
Caveats
While logical pathnames are an elegant abstraction, they are plagued by multiple issues that make them hard to use correctly and in a portable way.
Logical namestring components can only contain letters, digits and hyphens (or the * and ** sequences for wild namestrings). This limitation probably comes from a need to be compatible with all existing file systems, but it can be a showstopper if one needs to refer to files whose naming scheme is not controlled by the program.
Namestring parsing is confusing: calling PARSE-NAMESTRING on an invalid logical namestring (because it contains invalid characters or because the host is not a known logical host) will not fail. Instead the string will be parsed as a physical namestring, introducing silent bugs. The LOGICAL-PATHNAME function can be used to validate logical pathnames and namestrings.
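A validation helper built on LOGICAL-PATHNAME could look like the following sketch. VALID-LOGICAL-NAMESTRING-P and the DEMO logical host are hypothetical names introduced for the example, and the helper simply treats any error signaled during the conversion as a validation failure.

```lisp
(defun valid-logical-namestring-p (string)
  "Hypothetical helper: return T if STRING can be converted to a logical
pathname, NIL if the conversion signals an error."
  (handler-case
      (typep (logical-pathname string) 'logical-pathname)
    (error () nil)))

;; Define a hypothetical DEMO logical host so that there is something
;; valid to test against.
(setf (logical-pathname-translations "DEMO")
      '(("DEMO:**;*.*.*" "/tmp/demo/**/*.*")))

(valid-logical-namestring-p "DEMO:CACHE;DATA.TXT") ; known host, valid syntax
(valid-logical-namestring-p "/tmp/foo.txt")        ; physical namestring
```

Calling such a helper before storing namestrings coming from configuration files avoids the silent fallback to physical parsing described above.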
The way translation converts between both pathname patterns is unclear. It is not specified by the Common Lisp standard. Debugging patterns can quickly become very frustrating, especially with implementations unable to produce quality error diagnostics.
Finally, the behaviour of logical pathnames with other functions is rarely obvious, leading to frustrating debugging sessions.
They nevertheless are a unique and helpful feature for very specific use cases.
Recipes
Resolving a path
Files are accessible through multiple paths. For example, on POSIX systems, foo/bar/baz.txt and foo/bar/../bar/baz.txt refer to the same file. If your operating system and file system support symbolic links, you can refer to the same physical file from multiple links, themselves being files.
It is sometimes useful to obtain the canonical path of a file. On POSIX systems, the realpath function serves this purpose. In Common Lisp, this canonical path is called truename, and the TRUENAME function returns it.
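TRUENAME signals an error of type FILE-ERROR when the file cannot be located, so callers that only want a canonical path when one exists may prefer a small wrapper. This is a sketch; TRUENAME-OR-NIL is a hypothetical name and the path used below is just an example of a file assumed not to exist.

```lisp
(defun truename-or-nil (path)
  "Hypothetical helper: return the truename of PATH, or NIL if TRUENAME
signals a FILE-ERROR (for example because the file does not exist)."
  (handler-case (truename path)
    (file-error () nil)))

;; Returns NIL as long as no such file exists on the system.
(truename-or-nil "/nonexistent-directory-for-this-example/missing.txt")
```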
Transforming paths
The :DEFAULTS option of MAKE-PATHNAME is useful to construct a pathname that is a variation of another pathname. When a component is not supplied to MAKE-PATHNAME, its value is taken from the pathname passed with :DEFAULTS; a component explicitly supplied as NIL stays null.
For example to create the pathname of a file in the same directory as another pathname:
(make-pathname :name "bar"
               :defaults (make-pathname :directory '(:absolute "tmp") :name "foo"))
Or to create a wild pathname matching the same file names but with any extension:
(make-pathname :type :wild
               :defaults (make-pathname :name "foo" :type "txt"))
Or to obtain a pathname for the directory of a file:
(make-pathname :name nil
               :defaults (make-pathname :directory '(:relative "a" "b" "c")
                                        :name "foo"))
Joining two paths
Joining (or concatenating) two paths can be done with MERGE-PATHNAMES. In general calling (MERGE-PATHNAMES PATH1 PATH2) returns a new pathname whose components are taken either from PATH1 when they are not null, or from PATH2 when they are. As a special case, if the directory component of PATH1 is relative, the directory component of the result pathname is the concatenation of the directory components of both paths.
In other words
(merge-pathnames (make-pathname :directory '(:relative "x" "y"))
                 (make-pathname :directory '(:absolute "a" "b" "c")))
yields "/a/b/c/x/y/" but
(merge-pathnames (make-pathname :directory '(:absolute "x" "y"))
                 (make-pathname :directory '(:absolute "a" "b" "c")))
yields "/x/y/".
Finding files
The DIRECTORY function returns files matching a pathname, wild or not.
If the pathname is not wild, DIRECTORY returns a list of zero or one element depending on whether a file exists at this location or not.
If the pathname is wild, DIRECTORY behaves similarly to POSIX globs. Due to the way pathnames are structured, with the name and type being two different components, a common error is to specify a wild name without a type. In this case, DIRECTORY will not return any file with an extension (since their pathname has a non-null type). To match all files with any extension, set both the name and the type to :WILD.
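A helper listing the content of a directory could be sketched as follows. LIST-ENTRIES is a hypothetical name, and whether subdirectories are included in the result depends on the implementation.

```lisp
(defun list-entries (directory)
  "Hypothetical helper: return the entries of DIRECTORY, a pathname with
null name and type components, matching any name and any type."
  (directory (make-pathname :name :wild :type :wild :defaults directory)))

;; List the content of /tmp; the result depends on the machine.
(list-entries (make-pathname :directory '(:absolute "tmp")))
```

Passing the directory through :DEFAULTS keeps its host, device and directory components untouched and only makes the name and type wild.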
Another interesting possibility is to only match directories. Directories are represented by pathnames with a non-null directory component and a null name component. Therefore to find all directories in /tmp (top-level only):
(directory (make-pathname :directory '(:absolute "tmp" :wild)))
Note that DIRECTORY returns truenames, i.e. pathnames representing the canonical location of the files. An unexpected consequence is that the function will resolve symlinks. Since the Common Lisp standard explicitly allows extra optional arguments, some implementations have a way to disable symlink resolving, e.g. SBCL with :RESOLVE-SYMLINKS or CCL with :FOLLOW-LINKS.
Resolving tildes in paths
It is commonly believed that support for tilde characters in paths is a universal feature. It is not. Tilde prefixes are defined in POSIX in the context of the shell (cf. POSIX 2017 2.6.1 Tilde Expansion) and are only supported in very specific locations.
To obtain the path of a file relative to the home directory of the current user, use the USER-HOMEDIR-PATHNAME function.
For example:
(merge-pathnames (make-pathname :directory '(:relative ".emacs.d")
                                :name "init" :type "el")
                 (user-homedir-pathname))