Emacs Font Lock How to highlight multiline text

I use a custom start and end tag to mark comment text. One of the problem is when the comment is long and mixed with content, it's hard to identify the boundary of comment and content, my eyes fatigue sometimes.

I would like to let Emacs to highlight the comment text block so I can distinguish them and focus on the content I'm writing. Other popular editors like VIM or Sublime can also achieve this, this post illustrates how to do it in Emacs.

Create a custom mode

Use define-derived-mode to create a new mode based on an existing mode such as html-mode or fundamental-mode is the first step.

(define-derived-mode mytext-mode fundamental-mode "mytext"

Enable multiline text matching

By default, Font Lock only process line of text, to support multiline text, we have to add those code

(define-derived-mode mytext-mode fundamental-mode "mytext"
  (set (make-local-variable 'font-lock-multiline) t)
  (add-hook 'font-lock-extend-region-functions
(defun test-font-lock-extend-region ()
  "Extend the search region to include an entire block of text."
  ;; Avoid compiler warnings about these global variables from font-lock.el.
  ;; See the documentation for variable `font-lock-extend-region-functions'.
  (eval-when-compile (defvar font-lock-beg) (defvar font-lock-end))
    (goto-char font-lock-beg)
    (let ((found (or (re-search-backward "\n\n" nil t) (point-min))))
      (goto-char font-lock-end)
      (when (re-search-forward "\n\n" nil t)
        (setq font-lock-end (point)))
      (setq font-lock-beg found))))            

Setup regular expression and highlighting options

The naive version looks like below.

(setq mytext-highlights '(("START_TAG\\(.\\|\n\\)*?END_TAG" . font-lock-warning-face)))

This not always work, it's works well in a small file(dozens of KB), but no effect in a large file(dozen of MB).

By default, the Font Lock use jit-lock to perform highlighting, so called just in time fontification or lazy fontification. It works incrementally based on the current visible part of the file in buffer. I was wondering if it's the jit-lock causes the problem. So I disabled it as below:

(set (make-local-variable 'font-lock-support-mode) nil)

Still no problem with small file, but when jit-lock is disabled, an eager one is used which has a limit size for the buffer, in my case, the limit is 256KB. Here is how to cancel the limit

(set (make-local-variable 'font-lock-maximum-size) nil)

Another problem arises: the regular expression overflowed. I realized why it has to set the limit of the buffer, an eager fontifier will highlight the whole buffer once for all, it's slow, but it's not the biggest problem, the bigger problem is it's hard to find a regular expression that matches all multiline block in a big file without stack overflow.

When I digged more in ELISP code, I found the real problem is not the jit-lock, here are the critical code about regular expression based highlighting:

(defun font-lock-fontify-keywords-region (start end &optional loudly)
  "Fontify according to `font-lock-keywords' between START and END.
START should be at the beginning of a line.
LOUDLY, if non-nil, allows progress-meter bar."
  (unless (eq (car font-lock-keywords) t)
    (setq font-lock-keywords
      (font-lock-compile-keywords font-lock-keywords)))
  (let ((case-fold-search font-lock-keywords-case-fold-search)
    (keywords (cddr font-lock-keywords))
    (bufname (buffer-name)) (count 0)
        (pos (make-marker))
    keyword matcher highlights)
    ;; Fontify each item in `font-lock-keywords' from `start' to `end'.
    (while keywords
      (message (format "start %s end %s loop keywords %s" start end (car keywords)))
      (if loudly (message "Fontifying %s... (regexps..%s)" bufname
              (make-string (cl-incf count) ?.)))
      ;; Find an occurrence of `matcher' from `start' to `end'.
      (setq keyword (car keywords) matcher (car keyword))
      (goto-char start)
      (while (and (< (point) end)
          (if (stringp matcher)
              (re-search-forward matcher end t)
            (funcall matcher end))
                  ;; Beware empty string matches since they will
                  ;; loop indefinitely.
                  (or (> (point) (match-beginning 0))
                      (progn (forward-char 1) t)))
     (message (format "looping matches start %s end %s" (match-beginning 0) (match-end 0)))
    (when (and font-lock-multiline
           (>= (point)
               (save-excursion (goto-char (match-beginning 0))
                       (forward-line 1) (point))))
      ;; this is a multiline regexp match
      ;; (setq font-lock-multiline t)
      (put-text-property (if (= (point)
                      (goto-char (match-beginning 0))
                      (forward-line 1) (point)))
                 (1- (point))
                   (match-beginning 0))
                 'font-lock-multiline t))
    ;; Apply each highlight to this instance of `matcher', which may be
    ;; specific highlights or more keywords anchored to `matcher'.
    (setq highlights (cdr keyword))
    (while highlights
      (if (numberp (car (car highlights)))
          (font-lock-apply-highlight (car highlights))
        (set-marker pos (point))
            (font-lock-fontify-anchored-keywords (car highlights) end)
            ;; Ensure forward progress.  `pos' is a marker because anchored
            ;; keyword may add/delete text (this happens e.g. in grep.el).
            (if (< (point) pos) (goto-char pos)))
      (setq highlights (cdr highlights))))
      (setq keywords (cdr keywords)))
    (set-marker pos nil)))
(defsubst font-lock-apply-highlight (highlight)
  "Apply HIGHLIGHT following a match.
HIGHLIGHT should be of the form MATCH-HIGHLIGHT, see `font-lock-keywords'."
  (let* ((match (nth 0 highlight))
     (start (match-beginning match)) (end (match-end match))
     (override (nth 2 highlight)))
    (if (not start)
    ;; No match but we might not signal an error.
    (or (nth 3 highlight)
        (error "No match %d in highlight %S" match highlight))
      (let ((val (eval (nth 1 highlight))))
        (message (format "in font-lock-apply-hightlight start %s end %s with face val %s" start end val))
    (when (eq (car-safe val) 'face)
      (add-text-properties start end (cddr val))
      (setq val (cadr val)))
        (message (format "in font-lock-apply-hightlight cond list (not (or val (eq override t))) %s (not override) %s override %s" (not (or val (eq override t))) (not override) override))
     ((not (or val (eq override t)))
      ;; If `val' is nil, don't do anything.  It is important to do it
      ;; explicitly, because when adding nil via things like
      ;; font-lock-append-text-property, the property is actually
      ;; changed from <face> to (<face>) which is undesirable.  --Stef
     ((not override)
      ;; Cannot override existing fontification.
      (or (text-property-not-all start end 'face nil)
          (put-text-property start end 'face val)))
     ((eq override t)
      ;; Override existing fontification.
      (put-text-property start end 'face val))
     ((eq override 'prepend)
      ;; Prepend to existing fontification.
      (font-lock-prepend-text-property start end 'face val))
     ((eq override 'append)
      ;; Append to existing fontification.
      (font-lock-append-text-property start end 'face val))
     ((eq override 'keep)
      ;; Keep existing fontification.
      (font-lock-fillin-text-property start end 'face val)))))))

What the code does is performing re-search-forward repeatedly and apply text property and font face to matched text.

From the debugging output I found every matched pieces of text are found correctly, but the font face is not applied.

Here is the thing: there are two phases of fontifications, the first round is called syntactic fontification, for example if you treat the text as HTML, it will be fontified according to HTML syntax, in my case, the text that matched with custum regular expression are fontified as string. The second round is so called search based fontification, as the code above shows, when override is nil, the already fontified text will not be override.

The solution is do not derive your mode from a parent mode that will fontify the undesired part of the file and set the override option to t.

Your code would be something like this:

(setq mytext-highlights '(("START_TAG\\(.\\|\n\\)*?END_TAG" . (0 font-lock-warning-face t))))

Here is the documentation about the format of the options:

(matcher . subexp-highlighter)
In this kind of element, subexp-highlighter is a list which specifies how to highlight matches found by matcher. It has the form:
          (subexp facespec [override [laxmatch]])

The car, subexp, is an integer specifying which subexpression of the match to fontify (0 means the entire matching text). The second subelement, facespec, is an expression whose value specifies the face, as described above.

The last two values in subexp-highlighter, override and laxmatch, are optional flags. If override is t, this element can override existing fontification made by previous elements of font-lock-keywords. If it is keep, then each character is fontified if it has not been fontified already by some other element. If it is prepend, the face specified by facespec is added to the beginning of the font-lock-face property. If it is append, the face is added to the end of the font-lock-face property.

If laxmatch is non-nil, it means there should be no error if there is no subexpression numbered subexp in matcher. Obviously, fontification of the subexpression numbered subexp will not occur. However, fontification of other subexpressions (and other regexps) will continue. If laxmatch is nil, and the specified subexpression is missing, then an error is signaled which terminates search-based fontification.

If you are not sure about the parent mode will fontify undesired text, the override should always be set to t. So your custom style always get the highest priority.

The full code list

(define-derived-mode mytext-mode fundamental-mode "mytext"
  (setq mytext-highlights '(("START_TAG\\(.\\|\n\\)*?END_TAG" . (0 font-lock-warning-face t))))
  (setq  font-lock-defaults '(mytext-highlights))
  (message "deriving mode mytext")
  (set (make-local-variable 'font-lock-multiline) t)
  (add-hook 'font-lock-extend-region-functions
(defun test-font-lock-extend-region ()
  "Extend the search region to include an entire block of text."
  ;; Avoid compiler warnings about these global variables from font-lock.el.
  ;; See the documentation for variable `font-lock-extend-region-functions'.
  (eval-when-compile (defvar font-lock-beg) (defvar font-lock-end))
    (goto-char font-lock-beg)
    (let ((found (or (re-search-backward "\n\n" nil t) (point-min))))
      (goto-char font-lock-end)
      (when (re-search-forward "\n\n" nil t)
        (setq font-lock-end (point)))
      (setq font-lock-beg found))))

Evaluate this code in Emacs with eval-buffer and execute M-x mytext-mode in your buffer to apply the custom highlighting.