ハフマン符号 (2) で説明した「無記憶情報源モデル」はもっとも簡単な情報源モデルです。このモデルは、記号を生成するとき以前に生成した記号との間に関係がないため「無記憶」と呼ばれますが、このモデルを一般化して状態 (記憶) を持つモデルを考えることができます。
参考文献 [1] によると、記憶があるモデルを「有限状態確率モデル」とか「マルコフ情報源モデル」と呼ぶそうです。一般には、入力データから適切な情報源モデルを作成するのは大変難しいので、次のような単純なモデルを考えることにします。
これを「有限文脈モデル」といい、直前に出現した記号列の長さを「次数 (order)」といいます。有限文脈モデルは 1 次 (order-1) がいちばん簡単です。直前に出力した記号を覚えておいて、それに従って出現頻度表を切り替えるという単純な方法で実現できます。つまり、各記号ごとに出現頻度表を用意しておいて、直前に出力した記号が a であれば、a の出現頻度表を使って符号化を行うわけです。
したがって、記号が 256 種類あれば、出現頻度表も 256 個必要になります。order-2 であれば、ab や cd のあとに現れる記号の出現頻度表が必要になるので、個数は 256 * 256 = 65536 になります。このように、次数が大きくなるほど必要となるメモリ量が爆発的に増えるので、単純な方法では低次の有限文脈モデルしか実現できないのが欠点です。
有限文脈モデルの詳しい説明は、以下に示す拙作のページをお読みくださいませ。
;;; ;;; rca1.lisp : 適応型レンジコーダ (order-1) ;;; ;;; Copyright (C) 2023 Makoto Hiroi ;;; ;;; Released under the MIT license ;;; https://opensource.org/license/mit/ ;;; (require :arc) (use-package '(:arc :bitree :rangecoder)) ;;; ファイルサイズの書き込み (defun write-file-size (out size) (write-byte (logand (ash size -24) #xff) out) (write-byte (logand (ash size -16) #xff) out) (write-byte (logand (ash size -8) #xff) out) (write-byte (logand size #xff) out)) ;;; ファイルサイズの読み込み (defun read-file-size (in) (+ (ash (read-byte in) 24) (ash (read-byte in) 16) (ash (read-byte in) 8) (read-byte in))) ;;; 出現頻度表の取得 (defun get-freq (freq c) (unless (aref freq c) (setf (aref freq c) (bitree-initialize 256))) (aref freq c)) ;;; ファイルの符号化 (defun encode-file (in-file out-file) (call-with-byte-output-file out-file (lambda (out) (call-with-byte-input-file in-file (lambda (in) (let ((size (file-length in)) (freq (make-array 256 :initial-element nil)) (c0 0)) (write-file-size out size) (when (plusp size) (call-with-range-encoder out (lambda (rc) (dotimes (x size) (let ((c (read-byte in))) (arc-encode rc (get-freq freq c0) c :limit #x4000 :inc 16) (setf c0 c)))))))))))) ;;; ファイルの復号 (defun decode-file (in-file out-file) (call-with-byte-input-file in-file (lambda (in) (let ((size (read-file-size in)) (freq (make-array 256 :initial-element nil)) (c0 0)) (call-with-byte-output-file out-file (lambda (out) (when (plusp size) (call-with-range-decoder in (lambda (rc) (dotimes (x size) (let ((c (arc-decode rc (get-freq freq c0) :limit #x4000 :inc 16))) (write-byte c out) (setf c0 c)))))))))))) ;;; 簡単なテスト (defun test () (let ((files '("alice29.txt" "asyoulik.txt" "cp.html" "fields.c" "grammar.lsp" "kennedy.xls" "lcet10.txt" "plrabn12.txt" "ptt5" "sum" "xargs.1"))) (format t "----- encode -----~%") (time (dolist (file files) (encode-file (format nil "./canterbury/~a" file) (format nil "~a.en" file)))) ;; (format t "----- decode -----~%") (time (dolist (file files) (decode-file (format nil "~a.en" file) (format nil "~a.de" file))))))
;;; ;;; rca2.lisp : 適応型レンジコーダ (order-2) ;;; ;;; Copyright (C) 2023 Makoto Hiroi ;;; ;;; Released under the MIT license ;;; https://opensource.org/license/mit/ ;;; (require :arc) (use-package '(:arc :bitree :rangecoder)) ;;; ファイルサイズの書き込み (defun write-file-size (out size) (write-byte (logand (ash size -24) #xff) out) (write-byte (logand (ash size -16) #xff) out) (write-byte (logand (ash size -8) #xff) out) (write-byte (logand size #xff) out)) ;;; ファイルサイズの読み込み (defun read-file-size (in) (+ (ash (read-byte in) 24) (ash (read-byte in) 16) (ash (read-byte in) 8) (read-byte in))) ;;; 出現頻度表の取得 (defun get-freq (freq c0 c1) (unless (aref freq c0 c1) (setf (aref freq c0 c1) (bitree-initialize 256))) (aref freq c0 c1)) ;;; ファイルの符号化 (defun encode-file (in-file out-file) (call-with-byte-output-file out-file (lambda (out) (call-with-byte-input-file in-file (lambda (in) (let ((size (file-length in)) (freq (make-array '(256 256) :initial-element nil)) (c0 0) (c1 0)) (write-file-size out size) (when (plusp size) (call-with-range-encoder out (lambda (rc) (dotimes (x size) (let ((c (read-byte in))) (arc-encode rc (get-freq freq c0 c1) c :limit #x4000 :inc 64) (setf c0 c1 c1 c)))))))))))) ;;; ファイルの復号 (defun decode-file (in-file out-file) (call-with-byte-input-file in-file (lambda (in) (let ((size (read-file-size in)) (freq (make-array '(256 256) :initial-element nil)) (c0 0) (c1 0)) (call-with-byte-output-file out-file (lambda (out) (when (plusp size) (call-with-range-decoder in (lambda (rc) (dotimes (x size) (let ((c (arc-decode rc (get-freq freq c0 c1) :limit #x4000 :inc 64))) (write-byte c out) (setf c0 c1 c1 c)))))))))))) ;;; 簡単なテスト (defun test () (let ((files '("alice29.txt" "asyoulik.txt" "cp.html" "fields.c" "grammar.lsp" "kennedy.xls" "lcet10.txt" "plrabn12.txt" "ptt5" "sum" "xargs.1"))) (format t "----- encode -----~%") (time (dolist (file files) (encode-file (format nil "./canterbury/~a" file) (format nil "~a.en" file)))) ;; (format t "----- decode -----~%") (time (dolist (file files) (decode-file (format nil "~a.en" file) (format nil "~a.de" file))))))
* (load "rca1.lisp") T * (test) ----- encode ----- Evaluation took: 0.860 seconds of real time 0.707117 seconds of total run time (0.701392 user, 0.005725 system) 82.21% CPU 2,067,199,482 processor cycles 2,825,168 bytes consed ----- decode ----- Evaluation took: 1.170 seconds of real time 1.179626 seconds of total run time (1.179626 user, 0.000000 system) 100.85% CPU 2,831,209,272 processor cycles 2,822,208 bytes consed NIL
* (load "rca2.lisp") T * (test) ----- encode ----- Evaluation took: 1.200 seconds of real time 1.200058 seconds of total run time (1.170202 user, 0.029856 system) [ Run times consist of 0.008 seconds GC time, and 1.193 seconds non-GC time. ] 100.00% CPU 2,880,208,486 processor cycles 43,211,008 bytes consed ----- decode ----- Evaluation took: 1.690 seconds of real time 1.689014 seconds of total run time (1.669090 user, 0.019924 system) 99.94% CPU 4,053,649,071 processor cycles 43,198,000 bytes consed NIL
表 : 適応型レンジコーダの結果 (order-1 :limit #x4000, :inc 16) (order-2 :limit #x4000, :inc 64) ファイル名 サイズ order-0 order-1 order-2 ------------------------------------------------------ alice29.txt 152,089 87,147 66,515 54,244 asyoulik.txt 125,179 75,533 55,136 46,562 cp.html 24,603 16,299 12,026 10,221 fields.c 11,150 7,164 5,022 4,134 grammar.lsp 3,721 2,305 1,819 1,645 kennedy.xls 1,029,744 460,734 317,753 202,944 lcet10.txt 426,754 249,491 187,635 150,625 plrabn12.txt 481,861 273,392 205,733 174,123 ptt5 513,216 78,090 54,518 59,754 sum 38,240 25,638 18,392 17,445 xargs.1 4,227 2,743 2,279 2,241 ------------------------------------------------------ 合計 2,810,784 1,278,536 926,828 723,938 表 : 実行時間 (秒) : order-0 : order-1 : order-2 -------+---------+---------+--------- 符号化 : 0.73 : 0.86 : 1.20 復号 : 1.04 : 1.17 : 1.69 実行環境 : SBCL 2.1.11, Ubunts 22.04 (WSL2), Intel Core i5-6200U 2.30GHz
;;; ;;; rcb1.lisp : バイナリレンジコーダ (order-1) ;;; ;;; Copyright (C) 2023 Makoto Hiroi ;;; (require :brc) (use-package '(:brc :rangecoder)) ;;; ファイルサイズの書き込み (defun write-file-size (out size) (write-byte (logand (ash size -24) #xff) out) (write-byte (logand (ash size -16) #xff) out) (write-byte (logand (ash size -8) #xff) out) (write-byte (logand size #xff) out)) ;;; ファイルサイズの読み込み (defun read-file-size (in) (+ (ash (read-byte in) 24) (ash (read-byte in) 16) (ash (read-byte in) 8) (read-byte in))) ;;; 出現頻度表の取得 (defun get-freq (freq c) (unless (aref freq c) (setf (aref freq c) (initialize-binary-model 256))) (aref freq c)) ;;; ファイルの符号化 (defun encode-file (in-file out-file) (call-with-byte-output-file out-file (lambda (out) (call-with-byte-input-file in-file (lambda (in) (let ((size (file-length in)) (freq (make-array 256 :initial-element nil)) (c0 0)) (write-file-size out size) (when (plusp size) (call-with-range-encoder out (lambda (rc) (dotimes (x size) (let ((c (read-byte in))) (bm-encode rc (get-freq freq c0) c :limit #x200 :inc 4) (setf c0 c)))))))))))) ;;; ファイルの復号 (defun decode-file (in-file out-file) (call-with-byte-input-file in-file (lambda (in) (let ((size (read-file-size in)) (freq (make-array 256 :initial-element nil)) (c0 0)) (call-with-byte-output-file out-file (lambda (out) (when (plusp size) (call-with-range-decoder in (lambda (rc) (dotimes (x size) (let ((c (bm-decode rc (get-freq freq c0) :limit #x200 :inc 4))) (write-byte c out) (setf c0 c)))))))))))) ;;; 簡単なテスト (defun test () (let ((files '("alice29.txt" "asyoulik.txt" "cp.html" "fields.c" "grammar.lsp" "kennedy.xls" "lcet10.txt" "plrabn12.txt" "ptt5" "sum" "xargs.1"))) (format t "----- encode -----~%") (time (dolist (file files) (encode-file (format nil "./canterbury/~a" file) (format nil "~a.en" file)))) ;; (format t "----- decode -----~%") (time (dolist (file files) (decode-file (format nil "~a.en" file) (format nil "~a.de" file))))))
;;; ;;; rcb2.lisp : バイナリレンジコーダ (order-2) ;;; ;;; Copyright (C) 2023 Makoto Hiroi ;;; ;;; Released under the MIT license ;;; https://opensource.org/license/mit/ ;;; (require :brc) (use-package '(:brc :rangecoder)) ;;; ファイルサイズの書き込み (defun write-file-size (out size) (write-byte (logand (ash size -24) #xff) out) (write-byte (logand (ash size -16) #xff) out) (write-byte (logand (ash size -8) #xff) out) (write-byte (logand size #xff) out)) ;;; ファイルサイズの読み込み (defun read-file-size (in) (+ (ash (read-byte in) 24) (ash (read-byte in) 16) (ash (read-byte in) 8) (read-byte in))) ;;; 出現頻度表の取得 (defun get-freq (freq c0 c1) (unless (aref freq c0 c1) (setf (aref freq c0 c1) (initialize-binary-model 256))) (aref freq c0 c1)) ;;; ファイルの符号化 (defun encode-file (in-file out-file) (call-with-byte-output-file out-file (lambda (out) (call-with-byte-input-file in-file (lambda (in) (let ((size (file-length in)) (freq (make-array '(256 256) :initial-element nil)) (c0 0) (c1 0)) (write-file-size out size) (when (plusp size) (call-with-range-encoder out (lambda (rc) (dotimes (x size) (let ((c (read-byte in))) (bm-encode rc (get-freq freq c0 c1) c :limit #x200 :inc 4) (setf c0 c1 c1 c)))))))))))) ;;; ファイルの復号 (defun decode-file (in-file out-file) (call-with-byte-input-file in-file (lambda (in) (let ((size (read-file-size in)) (freq (make-array '(256 256) :initial-element nil)) (c0 0) (c1 0)) (call-with-byte-output-file out-file (lambda (out) (when (plusp size) (call-with-range-decoder in (lambda (rc) (dotimes (x size) (let ((c (bm-decode rc (get-freq freq c0 c1) :limit #x200 :inc 4))) (write-byte c out) (setf c0 c1 c1 c)))))))))))) ;;; 簡単なテスト (defun test () (let ((files '("alice29.txt" "asyoulik.txt" "cp.html" "fields.c" "grammar.lsp" "kennedy.xls" "lcet10.txt" "plrabn12.txt" "ptt5" "sum" "xargs.1"))) (format t "----- encode -----~%") (time (dolist (file files) (encode-file (format nil "./canterbury/~a" file) (format nil "~a.en" file)))) ;; (format t "----- decode -----~%") (time (dolist (file files) (decode-file (format nil "~a.en" file) (format nil "~a.de" file))))))
* (load "rcb1.lisp") T * (test) ----- encode ----- Evaluation took: 2.110 seconds of real time 2.100453 seconds of total run time (2.091775 user, 0.008678 system) 99.53% CPU 5,041,774,433 processor cycles 13,492,432 bytes consed ----- decode ----- Evaluation took: 2.300 seconds of real time 2.303512 seconds of total run time (2.303330 user, 0.000182 system) 100.17% CPU 5,528,557,649 processor cycles 13,484,672 bytes consed NIL
* (load "rcb2.lisp") T * (test) ----- encode ----- Evaluation took: 2.409 seconds of real time 2.410752 seconds of total run time (2.380574 user, 0.030178 system) [ Run times consist of 0.053 seconds GC time, and 2.358 seconds non-GC time. ] 100.08% CPU 5,786,410,519 processor cycles 187,424,016 bytes consed ----- decode ----- Evaluation took: 2.790 seconds of real time 2.783597 seconds of total run time (2.703823 user, 0.079774 system) [ Run times consist of 0.125 seconds GC time, and 2.659 seconds non-GC time. ] 99.78% CPU 6,681,089,123 processor cycles 187,469,536 bytes consed NIL
表 : バイナリレンジコーダの結果 (limit #x200, :inc 4) ファイル名 サイズ order-0 order-1 order-2 ------------------------------------------------------ alice29.txt 152,089 86,830 65,572 52,827 asyoulik.txt 125,179 75,167 54,475 45,191 cp.html 24,603 16,161 11,618 9,431 fields.c 11,150 6,866 4,699 3,885 grammar.lsp 3,721 2,185 1,657 1,540 kennedy.xls 1,029,744 404,990 293,293 167,118 lcet10.txt 426,754 245,166 185,426 147,871 plrabn12.txt 481,861 274,882 204,410 171,053 ptt5 513,216 68,112 53,373 56,926 sum 38,240 21,188 17,635 16,570 xargs.1 4,227 2,623 2,131 2,083 ------------------------------------------------------ 合計 2,810,784 1,204,170 894,289 674,495 表 : 実行時間 (秒) : order-0 : order-1 : order-2 -------+---------+---------+--------- 符号化 : 2.03 : 2.11 : 2.41 復号 : 2.21 : 2.30 : 2.79 実行環境 : SBCL 2.1.11, Ubunts 22.04 (WSL2), Intel Core i5-6200U 2.30GHz
;;; ;;; lzrc1.py : LZSS + Binary Range Coder (order-1) ;;; ;;; Copyright (C) 2023 Makoto Hiroi ;;; ;;; Released under the MIT license ;;; https://opensource.org/license/mit/ ;;; (require :brc) (use-package '(:brc :rangecoder)) ;;; 定数 (defconstant min-len 6) (defconstant max-len 256) (defconstant pos-bits 16) ; 13 (8k), 15 (32k), 16 (64k) (defconstant slide-size (ash 1 pos-bits)) (defconstant slide-size2 (* slide-size 2)) (defconstant slide-limit (+ slide-size2 max-len)) ;;; ファイルサイズの書き込み (defun write-file-size (out size) (write-byte (logand (ash size -24) #xff) out) (write-byte (logand (ash size -16) #xff) out) (write-byte (logand (ash size -8) #xff) out) (write-byte (logand size #xff) out)) ;;; ファイルサイズの読み込み (defun read-file-size (in) (+ (ash (read-byte in) 24) (ash (read-byte in) 16) (ash (read-byte in) 8) (read-byte in))) ;;; スライド窓 (defstruct slide-window stream buff ht data-size) (defun initialize-slide-window (s) (let* ((buff (make-array slide-limit :element-type 'unsigned-byte)) (data-size (read-sequence buff s))) (make-slide-window :stream s :buff buff :ht (make-hash-table) :data-size data-size))) ;;; ハッシュ値を求める (defun hash-value (sw rp) (do ((buff (slide-window-buff sw)) (value 0) (i 0 (1+ i))) ((>= i min-len) value) (setf value (+ (* value 256) (aref buff (+ rp i)))))) ;;; データ入力 (defun read-data (sw start size) (read-sequence (slide-window-buff sw) (slide-window-stream sw) :start start :end (+ start size))) ;;; データ移動 (defun move-data (sw to from size) (let ((buff (slide-window-buff sw))) (dotimes (n size) (setf (aref buff (+ to n)) (aref buff (+ from n)))))) ;;; スライド窓の更新 (defun update-slide (sw rp) (cond ((< (slide-window-data-size sw) slide-limit) ;; EOF rp) (t ;; buffer update (move-data sw 0 slide-size (+ slide-size max-len)) (setf (slide-window-data-size sw) (read-data sw (+ slide-size max-len) slide-size)) ;; hash update (maphash (lambda (k v) (if (< (car v) slide-size) (remhash k (slide-window-ht sw)) (do ((xs v (cdr xs))) ((null xs)) (decf (car xs) slide-size) (when (and (consp (cdr xs)) (< (cadr xs) slide-size)) (setf (cdr xs) nil))))) (slide-window-ht sw)) (- rp slide-size)))) ;;; データの挿入 (defun insert-data (sw rp) (let ((value (hash-value sw rp)) (ht (slide-window-ht sw))) (if (gethash value ht) (push rp (gethash value ht)) (setf (gethash value ht) (list rp))))) ;;; 最長一致列の探索 (defun longest-match (buff s1 s2 &key (start 0) (limit max-len)) (do ((i start (1+ i))) ((>= i limit) i) (unless (= (aref buff (+ s1 i)) (aref buff (+ s2 i))) (return i)))) (defun search-data (sw rp) (let ((buff (slide-window-buff sw)) (value (hash-value sw rp)) (low-limit (- rp slide-size)) (match-len 0) (match-pos 0)) (do ((xs (gethash value (slide-window-ht sw)) (cdr xs))) ((or (null xs) (< (car xs) low-limit))) (when (and (< (+ rp (max min-len match-len)) (slide-window-data-size sw)) (= (aref buff (+ rp match-len)) (aref buff (+ (car xs) match-len))) (let ((n (longest-match buff rp (car xs) :start 3))) (when (< match-len n) (setf match-len n match-pos (car xs)) (when (= n max-len) (return))))))) ;; データの終端をチェック (when (>= match-len (- (slide-window-data-size sw) rp)) (setf match-len (- (slide-window-data-size sw) rp))) (values match-len match-pos))) ;;; 出現頻度表の取得 (defun get-freq-flag (freq f0) (unless (aref freq f0) (setf (aref freq f0) (initialize-alpha-model 2))) (aref freq f0)) (defun get-freq-code (freq c0) (unless (aref freq c0) (setf (aref freq c0) (initialize-binary-model 256))) (aref freq c0)) ;;; LZRC 符号 : 符号化 (defun encode (rc fin) (do ((freq-flag (make-array 2 :initial-element nil)) (freq-code (make-array 256 :initial-element nil)) (freq-len (initialize-gamma-model (1+ (- max-len min-len)))) (freq-pos (initialize-gamma-model slide-size)) (sw (initialize-slide-window fin)) (rp 0) (c0 0) (f0 0)) ((>= rp (slide-window-data-size sw))) (multiple-value-bind (match-len match-pos) (search-data sw rp) (cond ((< match-len min-len) (setf match-len 1) (alpha-encode rc (get-freq-flag freq-flag f0) 0 :limit #x200 :inc 4) (setf f0 0) (bm-encode rc (get-freq-code freq-code c0) (aref (slide-window-buff sw) rp) :limit #x200 :inc 4)) (t (alpha-encode rc (get-freq-flag freq-flag f0) 1 :limit #x200 :inc 4) (setf f0 1) (gamma-encode rc freq-len (- match-len min-len) :limit #x200 :inc 4) (gamma-encode rc freq-pos (- rp match-pos 1) :limit #x200 :inc 4))) ;; (do ((n match-len (1- n))) ((zerop n)) (setf c0 (aref (slide-window-buff sw) rp)) (insert-data sw rp) (incf rp) (when (>= rp slide-size2) (setf rp (update-slide sw rp))))))) ;;; ファイルの符号化 (defun encode-file (in-file out-file) (call-with-byte-output-file out-file (lambda (fout) (call-with-byte-input-file in-file (lambda (fin) (let ((size (file-length fin))) (write-file-size fout size) (when (plusp size) (call-with-range-encoder fout (lambda (rc) (encode rc fin)))))))))) ;;; LZSS 符号 : 復号 (defun decode (rc fout size) (do ((freq-flag (make-array 2 :initial-element nil)) (freq-code (make-array 256 :initial-element nil)) (freq-len (initialize-gamma-model (1+ (- max-len min-len)))) (freq-pos (initialize-gamma-model slide-size)) (buff (make-array (ash 1 pos-bits) :element-type 'unsigned-byte)) (rp 0) (c0 0) (f0 0)) ((zerop size)) (cond ((= (alpha-decode rc (get-freq-flag freq-flag f0) :limit #x200 :inc 4) 1) (setf f0 1) (let ((num (+ (gamma-decode rc freq-len :limit #x200 :inc 4) min-len)) (pos (- rp (1+ (gamma-decode rc freq-pos :limit #x200 :inc 4))))) (when (minusp pos) (incf pos (length buff))) (do ((num num (1- num))) ((zerop num)) (let ((c (aref buff pos))) (write-byte c fout) (setf (aref buff rp) c c0 c) (incf pos) (incf rp) (when (>= pos (length buff)) (setf pos 0)) (when (>= rp (length buff)) (setf rp 0)))) (decf size num))) (t (setf f0 0) (let ((c (bm-decode rc (get-freq-code freq-code c0) :limit #x200 :inc 4))) (write-byte c fout) (setf (aref buff rp) c c0 c) (incf rp) (when (>= rp (length buff)) (setf rp 0))) (decf size))))) ;;; ファイルの復号 (defun decode-file (in-file out-file) (call-with-byte-input-file in-file (lambda (fin) (let ((size (read-file-size fin))) (call-with-byte-output-file out-file (lambda (fout) (when (plusp size) (call-with-range-decoder fin (lambda (rc) (decode rc fout size)))))))))) ;;; 簡単なテスト (defun test () (let ((files '("alice29.txt" "asyoulik.txt" "cp.html" "fields.c" "grammar.lsp" "kennedy.xls" "lcet10.txt" "plrabn12.txt" "ptt5" "sum" "xargs.1"))) (format t "----- encode -----~%") (time (dolist (file files) (encode-file (format nil "./canterbury/~a" file) (format nil "~a.en" file)))) ;; (format t "----- decode -----~%") (time (dolist (file files) (decode-file (format nil "~a.en" file) (format nil "~a.de" file))))))
;;; ;;; lzrc2.py : LZSS + Binary Range Coder (order-2) ;;; ;;; Copyright (C) 2023 Makoto Hiroi ;;; ;;; Released under the MIT license ;;; https://opensource.org/license/mit/ ;;; (require :brc) (use-package '(:brc :rangecoder)) ;;; 定数 (defconstant min-len 9) (defconstant max-len 256) (defconstant pos-bits 16) ; 13 (8k), 15 (32k), 16 (64k) (defconstant slide-size (ash 1 pos-bits)) (defconstant slide-size2 (* slide-size 2)) (defconstant slide-limit (+ slide-size2 max-len)) ;;; ファイルサイズの書き込み (defun write-file-size (out size) (write-byte (logand (ash size -24) #xff) out) (write-byte (logand (ash size -16) #xff) out) (write-byte (logand (ash size -8) #xff) out) (write-byte (logand size #xff) out)) ;;; ファイルサイズの読み込み (defun read-file-size (in) (+ (ash (read-byte in) 24) (ash (read-byte in) 16) (ash (read-byte in) 8) (read-byte in))) ;;; スライド窓 (defstruct slide-window stream buff ht data-size) (defun initialize-slide-window (s) (let* ((buff (make-array slide-limit :element-type 'unsigned-byte)) (data-size (read-sequence buff s))) (make-slide-window :stream s :buff buff :ht (make-hash-table) :data-size data-size))) ;;; ハッシュ値を求める (defun hash-value (sw rp) (do ((buff (slide-window-buff sw)) (value 0) (i 0 (1+ i))) ((>= i min-len) value) (setf value (+ (* value 256) (aref buff (+ rp i)))))) ;;; データ入力 (defun read-data (sw start size) (read-sequence (slide-window-buff sw) (slide-window-stream sw) :start start :end (+ start size))) ;;; データ移動 (defun move-data (sw to from size) (let ((buff (slide-window-buff sw))) (dotimes (n size) (setf (aref buff (+ to n)) (aref buff (+ from n)))))) ;;; スライド窓の更新 (defun update-slide (sw rp) (cond ((< (slide-window-data-size sw) slide-limit) ;; EOF rp) (t ;; buffer update (move-data sw 0 slide-size (+ slide-size max-len)) (setf (slide-window-data-size sw) (read-data sw (+ slide-size max-len) slide-size)) ;; hash update (maphash (lambda (k v) (if (< (car v) slide-size) (remhash k (slide-window-ht sw)) (do ((xs v (cdr xs))) ((null xs)) (decf (car xs) slide-size) (when (and (consp (cdr xs)) (< (cadr xs) slide-size)) (setf (cdr xs) nil))))) (slide-window-ht sw)) (- rp slide-size)))) ;;; データの挿入 (defun insert-data (sw rp) (let ((value (hash-value sw rp)) (ht (slide-window-ht sw))) (if (gethash value ht) (push rp (gethash value ht)) (setf (gethash value ht) (list rp))))) ;;; 最長一致列の探索 (defun longest-match (buff s1 s2 &key (start 0) (limit max-len)) (do ((i start (1+ i))) ((>= i limit) i) (unless (= (aref buff (+ s1 i)) (aref buff (+ s2 i))) (return i)))) (defun search-data (sw rp) (let ((buff (slide-window-buff sw)) (value (hash-value sw rp)) (low-limit (- rp slide-size)) (match-len 0) (match-pos 0)) (do ((xs (gethash value (slide-window-ht sw)) (cdr xs))) ((or (null xs) (< (car xs) low-limit))) (when (and (< (+ rp (max min-len match-len)) (slide-window-data-size sw)) (= (aref buff (+ rp match-len)) (aref buff (+ (car xs) match-len))) (let ((n (longest-match buff rp (car xs) :start 3))) (when (< match-len n) (setf match-len n match-pos (car xs)) (when (= n max-len) (return))))))) ;; データの終端をチェック (when (>= match-len (- (slide-window-data-size sw) rp)) (setf match-len (- (slide-window-data-size sw) rp))) (values match-len match-pos))) ;;; 出現頻度表の取得 (defun get-freq-flag (freq f0 f1) (unless (aref freq f0 f1) (setf (aref freq f0 f1) (initialize-alpha-model 2))) (aref freq f0 f1)) (defun get-freq-code (freq c0 c1) (unless (aref freq c0 c1) (setf (aref freq c0 c1) (initialize-binary-model 256))) (aref freq c0 c1)) ;;; LZRC 符号 : 符号化 (defun encode (rc fin) (do ((freq-flag (make-array '(2 2) :initial-element nil)) (freq-code (make-array '(256 256) :initial-element nil)) (freq-len (initialize-gamma-model (1+ (- max-len min-len)))) (freq-pos (initialize-gamma-model slide-size)) (sw (initialize-slide-window fin)) (rp 0) (c0 0) (c1 0) (f0 0) (f1 0)) ((>= rp (slide-window-data-size sw))) (multiple-value-bind (match-len match-pos) (search-data sw rp) (cond ((< match-len min-len) (setf match-len 1) (alpha-encode rc (get-freq-flag freq-flag f0 f1) 0 :limit #x200 :inc 4) (setf f0 f1 f1 0) (bm-encode rc (get-freq-code freq-code c0 c1) (aref (slide-window-buff sw) rp) :limit #x200 :inc 4)) (t (alpha-encode rc (get-freq-flag freq-flag f0 f1) 1 :limit #x200 :inc 4) (setf f0 f1 f1 1) (gamma-encode rc freq-len (- match-len min-len) :limit #x200 :inc 4) (gamma-encode rc freq-pos (- rp match-pos 1) :limit #x200 :inc 4))) ;; (do ((n match-len (1- n))) ((zerop n)) (setf c0 c1 c1 (aref (slide-window-buff sw) rp)) (insert-data sw rp) (incf rp) (when (>= rp slide-size2) (setf rp (update-slide sw rp))))))) ;;; ファイルの符号化 (defun encode-file (in-file out-file) (call-with-byte-output-file out-file (lambda (fout) (call-with-byte-input-file in-file (lambda (fin) (let ((size (file-length fin))) (write-file-size fout size) (when (plusp size) (call-with-range-encoder fout (lambda (rc) (encode rc fin)))))))))) ;;; LZSS 符号 : 復号 (defun decode (rc fout size) (do ((freq-flag (make-array '(2 2) :initial-element nil)) (freq-code (make-array '(256 256) :initial-element nil)) (freq-len (initialize-gamma-model (1+ (- max-len min-len)))) (freq-pos (initialize-gamma-model slide-size)) (buff (make-array (ash 1 pos-bits) :element-type 'unsigned-byte)) (rp 0) (c0 0) (c1 0) (f0 0) (f1 0)) ((zerop size)) (cond ((= (alpha-decode rc (get-freq-flag freq-flag f0 f1) :limit #x200 :inc 4) 1) (setf f0 f1 f1 1) (let ((num (+ (gamma-decode rc freq-len :limit #x200 :inc 4) min-len)) (pos (- rp (1+ (gamma-decode rc freq-pos :limit #x200 :inc 4))))) (when (minusp pos) (incf pos (length buff))) (do ((num num (1- num))) ((zerop num)) (let ((c (aref buff pos))) (write-byte c fout) (setf (aref buff rp) c c0 c1 c1 c) (incf pos) (incf rp) (when (>= pos (length buff)) (setf pos 0)) (when (>= rp (length buff)) (setf rp 0)))) (decf size num))) (t (setf f0 f1 f1 0) (let ((c (bm-decode rc (get-freq-code freq-code c0 c1) :limit #x200 :inc 4))) (write-byte c fout) (setf (aref buff rp) c c0 c1 c1 c) (incf rp) (when (>= rp (length buff)) (setf rp 0))) (decf size))))) ;;; ファイルの復号 (defun decode-file (in-file out-file) (call-with-byte-input-file in-file (lambda (fin) (let ((size (read-file-size fin))) (call-with-byte-output-file out-file (lambda (fout) (when (plusp size) (call-with-range-decoder fin (lambda (rc) (decode rc fout size)))))))))) ;;; 簡単なテスト (defun test () (let ((files '("alice29.txt" "asyoulik.txt" "cp.html" "fields.c" "grammar.lsp" "kennedy.xls" "lcet10.txt" "plrabn12.txt" "ptt5" "sum" "xargs.1"))) (format t "----- encode -----~%") (time (dolist (file files) (encode-file (format nil "./canterbury/~a" file) (format nil "~a.en" file)))) ;; (format t "----- decode -----~%") (time (dolist (file files) (decode-file (format nil "~a.en" file) (format nil "~a.de" file))))))
* (load "lzrc1.lisp") T * (test) ----- encode ----- Evaluation took: 5.840 seconds of real time 5.839015 seconds of total run time (5.790404 user, 0.048611 system) [ Run times consist of 0.035 seconds GC time, and 5.805 seconds non-GC time. ] 99.98% CPU 14,014,431,458 processor cycles 106,589,136 bytes consed ----- decode ----- Evaluation took: 1.080 seconds of real time 1.083720 seconds of total run time (1.073604 user, 0.010116 system) 100.37% CPU 2,601,053,221 processor cycles 19,074,848 bytes consed NIL
(load "lzrc2.lisp") T (test) ----- encode ----- Evaluation took: 7.960 seconds of real time 7.965677 seconds of total run time (7.675003 user, 0.290674 system) [ Run times consist of 0.516 seconds GC time, and 7.450 seconds non-GC time. ] 100.08% CPU 19,119,176,450 processor cycles 629,636,976 bytes consed ----- decode ----- Evaluation took: 1.479 seconds of real time 1.473866 seconds of total run time (1.444078 user, 0.029788 system) [ Run times consist of 0.059 seconds GC time, and 1.415 seconds non-GC time. ] 99.66% CPU 3,537,532,647 processor cycles 180,439,024 bytes consed NIL
表 : LZRC 符号の結果 (スライド窓 : 64 k, 最長一致 : 256, :limit #x200, :inc 4) (最短一致長 : order-0 = 4, order-1 = 6, order-2 = 9) ファイル名 サイズ order-0 order-1 order-2 ----------------------------------------------------- alice29.txt 152,089 52,811 50,614 49,096 asyoulik.txt 125,179 48,017 45,426 43,821 cp.html 24,603 7,957 7,718 8,551 fields.c 11,150 3,116 3,116 3,347 grammar.lsp 3,721 1,220 1,246 1,401 kennedy.xls 1,029,744 88,097 70,405 69,846 lcet10.txt 426,754 139,935 132,644 127,691 plrabn12.txt 481,861 190,350 179,751 167,477 ptt5 513,216 51,536 48,370 52,435 sum 38,240 12,591 12,447 13,754 xargs.1 4,227 1,732 1,739 1,977 ----------------------------------------------------- 合計 2,810,784 597,362 553,476 539,396 表 : 実行時間 (秒) : order-0 : order-1 : order-2 -------+---------+---------+--------- 符号化 : 6.46 : 5.84 : 7.96 復号 : 1.06 : 1.08 : 1.48 実行環境 : SBCL 2.1.11, Ubunts 22.04 (WSL2), Intel Core i5-6200U 2.30GHz