A few years ago, I took a class on computational theory concurrently with a class on abstract algebra and I noticed a few interesting things:
- Divisibility rules are a simple application of modular arithmetic.
- Such modular arithmetic operations can be implemented in deterministic finite-state automata (DFAs).
- DFAs and regular expressions have equivalent computational power.
Therefore, any divisibility rule can be written as a regular expression. In fact, there's a straightforward algorithm for doing this for any divisor and in any base system.
Divisibility
Say there is a natural number . When written as a series of base- digits , is expressed as a polynomial as shown in Equation 1:
Using Horner's method, this can be re-written into the series shown in Equation 2:
Writing the series as in Equation 2 makes it possible to process the digits in order, with once the series is fully evaluated.
If is divisible by , it will be in its congruence class (i.e. ). To test this using modular arithmetic, there needs to be a way to calculate from . Equation 3 shows how this can be done using some properties of modular arithmetic:
This gives Equation 4 as a modular form of Equation 2:
DFAs
A Deterministic Finite-state Automaton (DFA) is a type of basic computing machine that accepts or rejects a string of input symbols by reading them one at a time and advancing its internal state for each input symbol using a transition function to one of a finite set of possible internal states. The divisibility problem is especially well-suited for this type of computing model—there's a string of input symbols (the digits of the number), a finite set of internal states (the congruence classes), a transition function (Equation 4), and simple acceptance/rejection criteria (numbers in congruence class ). Putting this together gives the DFA definition in Equation 5:
In Equation 5,
- The set of states is the set of congruence classes for plus an extra state added as the DFA's initial state.
- The input alphabet is the set of base- digits.
- The transition function is defined based on Equation 4, with corresponding to the initial case. This case is only separated for clarity since it's mathematically equivalent to the case.
- The initial state is the extra initial state .
- The set of accepting states is just the congruence class , which must be part of to be divisible by .
DFAs are often shown as graphs with the states as nodes and the input symbols as edges. Figure 1 shows a diagram of a DFA generated from Equation 5 with and :
Regular Expressions
Regular expressions are a ubiquitous tool for text searching, and have equivalent computational power to DFAs. As part of this equivalence, it's possible to construct a regular expression from any DFA by way of Kleene's algorithm. Kleene's algorithm makes use of Generalized Non-deterministic Finite-state Automata (GNFAs) as an intermediary between DFAs and regular expressions. GNFAs use regular expressions on their transition edges rather plain input symbols; Figure 2 shows the GNFA version of the DFA in Figure 1:
Kleene's algorithm is similar to the Floyd-Warshall algorithm and works by eliminating nodes in the GNFA one-by-one until only one edge with the final regular expression remains. This is done by incoporating paths through each eliminated node into the regular expressions between every other pair of nodes; Figure 3 shows this process with the GNFA in Figure 2:
Code
Constructing the DFA and then applying Kleene's algorithm to get a regular expression is pretty straightforward, but also pretty tedious to do by hand. Luckily, all of the math translates pretty directly to code. The first step is to define the input symbols, as shown in Listing 1:
Next, the transition function is defined as in Listing 2:
This is then used to construct the GNFA as in Listing 3:
With the GNFA constructed, the next step is to reduce it using Kleene's algorithm as shown in Listing 4:
Finally, only the start and accept states remain, and the edge between them is combined with the accept state's self-edge to form the final regular expression as in Listing 5:
The full code is available in divregex.cpp; it's very simple and under 100 lines in total, though the output can get pretty long since nothing is done to optimize the generated regular expressions.
Usage
The program takes two arguments: the first is the divisor and the second is the base (which is optional: the default is decimal). Listing 6 shows how to compile the program and use it to generate the regular expression found in Figure 3:
Since the output is written to stdout, it is very easy to use it with other programs as shown in Listing 7:
The divregex
program can also be compiled with Emscripten to run in web browsers, as shown in Listing 8:
The compilation results from Listing 8 power the live demo in the next section:
Live Demo
Regular expression matching numbers divisible by in base :
…But, Why?
Why not? Sure, grep probably still isn't the best tool to use when trying to figure out whether numbers are multiples of a particular number, but the fact that it can be used for this at all is an interesting application of computational theory. It also demonstrates how surprisingly powerful regular expressions can be in some cases, especially given some of the more “obvious” things that they can't be used for.