<!DOCTYPE html>
<html lang="en"><head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="shortcut icon" href="/img/icon.png" type="image/png">
<meta name="generator" content="Hugo 0.78.2" />
<meta property="og:title" content="King James Bible: An Adventure in Compression" />
<meta property="og:description" content="Figuring out how much space the Bible takes on a calculator or a Game Boy is fun" />
<meta property="og:type" content="article" />
<meta property="og:url" content="http://toasters.rocks/king-james-bible/" />
<meta property="og:image" content="http://toasters.rocks/images/2020/01/screenshot20200110191340.png" />
<meta property="article:published_time" content="2020-01-11T00:38:16+00:00" />
<meta property="article:modified_time" content="2020-01-11T00:59:58+00:00" />
<meta name="twitter:card" content="summary_large_image"/>
<meta name="twitter:image" content="http://toasters.rocks/images/2020/01/screenshot20200110191340.png"/>
<meta name="twitter:title" content="King James Bible: An Adventure in Compression"/>
<meta name="twitter:description" content="Figuring out how much space the Bible takes on a calculator or a Game Boy is fun"/>
<title>King James Bible: An Adventure in Compression - toasters rocks</title>
<link rel="stylesheet" href="/css/styles.css" />
<link rel="stylesheet" href="/css/syntax.css" />
<script src="https://kit.fontawesome.com/8ced65a629.js" crossorigin="anonymous"></script>
</head><body>
<header><img src="/img/icon.png"><h1>toasters rocks</h1></header>
<main>
<aside><nav>
<a href="/">
<i class="fas fa-home"></i>
Home
</a><br/>
<a href="http://juju2143.ca/">
<i class="fas fa-user"></i>
About
</a><br/>
<a href="/fr/">
<i class="fas fa-globe"></i>
Français
</a><br/>
<a href="https://yukiis.moe/">
<i class="far fa-comment"></i>
Comics
</a><br/>
<a href="https://codewalr.us/">
<i class="far fa-folder-open"></i>
Forums
</a><br/>
</nav>
<br/>
<nav>
<a title="Twitter " href="https://twitter.com/juju2143">
<i style="color: #4da7de" class="fab fa-twitter"></i>
<span style="color: #4da7de">Twitter</span>
</a><br/>
<a title="Discord " href="https://discord.gg/cuZcfcF">
<i style="color: #7289da" class="fab fa-discord"></i>
<span style="color: #7289da">Discord</span>
</a><br/>
<a title="GitHub " href="https://github.com/juju2143">
<i style="color: #221e1b" class="fab fa-github"></i>
<span style="color: #221e1b">GitHub</span>
</a><br/>
<a title="Patreon " href="https://patreon.com/juju2143">
<i style="color: #F96854" class="fab fa-patreon"></i>
<span style="color: #F96854">Patreon</span>
</a><br/>
<a title="YouTube " href="https://youtube.com/user/julosoft">
<i style="color: #e02a20" class="fab fa-youtube"></i>
<span style="color: #e02a20">YouTube</span>
</a><br/>
<a title="YouTube 2 " href="https://youtube.com/c/juju2143">
<i style="color: #e02a20" class="fab fa-youtube"></i>
<span style="color: #e02a20">YouTube 2</span>
</a><br/>
<a title="Twitch " href="https://twitch.tv/juju2143">
<i style="color: #6441a5" class="fab fa-twitch"></i>
<span style="color: #6441a5">Twitch</span>
</a><br/>
<a title="Instagram " href="https://instagram.com/j.p.savard">
<i style="color: #d6249f" class="fab fa-instagram"></i>
<span style="color: #d6249f">Instagram</span>
</a><br/>
<a title="DeviantArt " href="https://deviantart.com/juju2143">
<i style="color: #c5d200" class="fab fa-deviantart"></i>
<span style="color: #c5d200">DeviantArt</span>
</a><br/>
<a title="SoundCloud " href="https://soundcloud.com/juju2143">
<i style="color: #fe3801" class="fab fa-soundcloud"></i>
<span style="color: #fe3801">SoundCloud</span>
</a><br/>
</nav></aside>
<article style="background-image: url('/images/2020/01/screenshot20200110191340.png');">
<div class="metadata" style="height: calc((var(--height) - 2em) * 0.7478152309612984 - 3.5em)">
<h2 name="top">King James Bible: An Adventure in Compression</h2>
<p>Figuring out how much space the Bible takes on a calculator or a Game Boy is fun</p>
<i class="far fa-calendar-alt"></i>
<time datetime="2020-01-11">January 11, 2020</time><br/>
<i class="fas fa-tags"></i>
#<a class="btn btn-sm btn-outline-dark tag-btn" href="http://toasters.rocks/tags/tech">Tech</a>
<br/>
<i class="fas fa-hourglass"></i> ~7 minutes
</div>
<p>Well, time for another adventure, and like every adventure it begins with a very silly thought, one that isn’t even mine this time:</p>
<p><img src="/images/2020/01/screenshot20200110194154.png" alt="Discord screenshot of DJ Omnimaga who says &quot;I wonder if one can fit the entire bible on a TI-Nspire CX with mViewer GX PDF converter&quot;">
“I wonder if one can fit the entire bible on a TI-Nspire CX with mViewer GX PDF converter”, says our friend DJ</p>
<p>And there I go, searching for the answer:</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">me: trying to find out how big the Bible is in terms of computer storage because someone asked on Discord<br><br>me, literally 30 seconds later: <a href="https://t.co/qQiEqTKnCk">https://t.co/qQiEqTKnCk</a></p>— 輝き雪 Yuki, CEO of snow (@juju2143) <a href="https://twitter.com/juju2143/status/1215378475277787137?ref_src=twsrc%5Etfw">January 9, 2020</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
That’s the Wikipedia effect right there: you look for something, and before you know it you know everything there is to know about religion and you’re on some completely unrelated page about quantum theory.</p>
<p>So I downloaded the whole King James Version from <a href="http://www.gutenberg.org/">Project Gutenberg</a> and removed the header and footer they put there, for better text processing; that leaves about 4.4 MB. I converted it to PDF, and since the format supports plain text directly the size stays in the same ballpark (I got a 3 MB file). Then I tried converting that to work on a TI-Nspire with the <a href="https://tiplanet.org/forum/editgx.php">mViewer GX PDF converter</a> and I… I think I broke TI-Planet. Well, from what it was able to generate (76 pages out of 1664, pretty much the book of Genesis?), every 10 pages take about 1.3 MB, so by extension the whole thing should be around 216 MB. We’re dealing with images now, and not just plain text, so yeah. It could be lower if you set the resolution to something almost unreadable, but at that point you’re better off using a plain text reader on your calc.</p>
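<p>For what it’s worth, the extrapolation above is just a linear scale-up. Here is a minimal Python sketch of it, assuming every batch of 10 pages weighs about the same as the ones that did get generated; the variable names and the 100 MB free-space figure for an empty calculator are only labels for the numbers quoted in this post.</p>

```python
# Back-of-the-envelope estimate using the figures quoted in the post.
TOTAL_PAGES = 1664       # pages in the converted PDF
MB_PER_BATCH = 1.3       # observed size of each 10-page batch, in MB
CALC_FREE_MB = 100       # approximate free space on an empty calculator


def estimate_total_mb(total_pages: int, mb_per_batch: float, batch_pages: int = 10) -> float:
    """Scale the per-batch size linearly to the whole document."""
    return total_pages / batch_pages * mb_per_batch


estimated_mb = estimate_total_mb(TOTAL_PAGES, MB_PER_BATCH)
print(f"Estimated size: {estimated_mb:.0f} MB")
print(f"Fits in {CALC_FREE_MB} MB as is: {estimated_mb <= CALC_FREE_MB}")
```

<p>The estimate only holds if later pages render to images of roughly the same size as the first 76, which I haven’t verified.</p>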
<p>So in conclusion, maybe. Maybe you can manage to do it at a lower resolution. But it’s gonna take most of your calc space, which, with nothing installed, is about 100 MB.</p>
<p>But wait a minute, we have another contender…</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Didn't they manage to cram the whole Bible on a GameBoy cartridge?</p>— Minty Root (@Minty_Root) <a href="https://twitter.com/Minty_Root/status/1215378787652833282?ref_src=twsrc%5Etfw">January 9, 2020</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
what are you talkin' about Minty</p>
<p>Oh God, we’re gonna have some fun with that. Sure enough, there was an unlicensed King James Bible for the Game Boy published by Wisdom Tree in 1994. If you want to see it in action, there was an <a href="https://www.youtube.com/watch?v=Kz0TOQ1BF-M">Angry Video Game Nerd episode about it</a>. What’s amazing about it is that the ROM is only one megabyte, including the entire text of the Bible, a search engine and two word search games.</p>
<p>(Note: if you’re emulating it, use <a href="http://bgb.bircd.org/">BGB</a>. Any other emulator will introduce bugs due to the cartridge’s weird mapping, which nothing but BGB seems to understand. Of course, I will not provide the ROM, for the usual copyright reasons.)</p>
<p><img src="/images/2020/01/screenshot20200109163510.png" alt="Screenshot of the hangman game running in an emulator that is not BGB featuring characters you can’t normally input">
Here’s what I mean. The reader will crash and the games will make you guess garbage you can’t input.</p>
<p>So for fun, with the KJB text in hand, I tested some of the most common compression utilities, all set to their maximum/best/slowest settings:</p>
<table>
<thead>
<tr>
<th style="text-align:left">Compression</th>
<th style="text-align:right">Size (bytes)</th>
<th style="text-align:right">Ratio</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">zpaq -m5</td>
<td style="text-align:right">739407</td>
<td style="text-align:right">16.682%</td>
</tr>
<tr>
<td style="text-align:left">bzip2 -9</td>
<td style="text-align:right">993406</td>
<td style="text-align:right">22.412%</td>
</tr>
<tr>
<td style="text-align:left">lzma -9</td>
<td style="text-align:right">1048408</td>
<td style="text-align:right">23.653%</td>
</tr>
<tr>
<td style="text-align:left">xz -9</td>
<td style="text-align:right">1048616</td>
<td style="text-align:right">23.658%</td>
</tr>
<tr>
<td style="text-align:left">7z -mx9</td>
<td style="text-align:right">1048710</td>
<td style="text-align:right">23.660%</td>
</tr>
<tr>
<td style="text-align:left">zstd --ultra -22</td>
<td style="text-align:right">1068137</td>
<td style="text-align:right">24.099%</td>
</tr>
<tr>
<td style="text-align:left">rar -m5</td>
<td style="text-align:right">1142360</td>
<td style="text-align:right">25.773%</td>
</tr>
<tr>
<td style="text-align:left">gzip -9</td>
<td style="text-align:right">1385457</td>
<td style="text-align:right">31.258%</td>
</tr>
<tr>
<td style="text-align:left">zip -9</td>
<td style="text-align:right">1385595</td>
<td style="text-align:right">31.261%</td>
</tr>
<tr>
<td style="text-align:left">lz4 -9</td>
<td style="text-align:right">1596418</td>
<td style="text-align:right">36.017%</td>
</tr>
<tr>
<td style="text-align:left">lzop -9</td>
<td style="text-align:right">1611939</td>
<td style="text-align:right">36.367%</td>
</tr>
<tr>
<td style="text-align:left">Uncompressed</td>
<td style="text-align:right">4432375</td>
<td style="text-align:right">100%</td>
</tr>
</tbody>
</table>
<p>Note that some of these are different containers for the same algorithm, hence the similar file sizes (lzma, xz and 7z are all LZMA-based; gzip and zip both use DEFLATE). Some of them are better suited to other uses: lz4 and lzop, for example, are good for decompressing the Linux kernel at boot time because they’re fast and use less memory, and zstd is starting to replace xz because it’s 1300% faster despite producing slightly bigger files.</p>
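<p>If you want to try a rough version of this comparison without installing anything, Python’s standard library ships three of these families: <code>zlib</code> (the DEFLATE used by gzip and zip), <code>bz2</code> and <code>lzma</code>. The sketch below is not how the table above was produced (that was done with the command-line tools); the <code>compressed_sizes</code> helper and the repeated-verse sample are placeholders so the example runs on its own.</p>

```python
import bz2
import lzma
import zlib


def compressed_sizes(data: bytes) -> dict[str, int]:
    """Return the compressed size in bytes for each stdlib codec at its highest level."""
    return {
        "zlib": len(zlib.compress(data, 9)),
        "bz2": len(bz2.compress(data, 9)),
        "lzma": len(lzma.compress(data, preset=9)),
    }


# Stand-in sample so the sketch is self-contained; read the real text file instead
# (for example with open(path, "rb")) to get meaningful numbers.
sample = ("In the beginning God created the heaven and the earth. " * 2000).encode("ascii")

sizes = compressed_sizes(sample)
for name, size in sorted(sizes.items(), key=lambda item: item[1]):
    print(f"{name:6} {size:>8} bytes {size / len(sample):8.3%}")
```

<p>A sample this repetitive compresses far better than real text, so don’t expect the ratios to resemble the table.</p>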
<p>So, our goal is a ROM of 1048576 bytes with enough space left for a decompressor that is fast enough to be usable on a Game Boy, a good-looking UI, a search engine and some games. Only zpaq and bzip2 would fit the bill, and even then it’s tight. (Special mention to lzma, which fits a megabyte almost exactly.) Most of those algorithms were devised after 1994; bzip2 in particular was developed between 1996 and 2000, and even though it has one of the best compression ratios here it’s way slower than gzip.</p>
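<p>To make the budget concrete, here is a small Python sketch that subtracts each compressed size from the one-megabyte ROM. The sizes are copied from the first table; the <code>headroom</code> helper is just an illustrative name.</p>

```python
ROM_SIZE = 1024 * 1024  # 1048576 bytes, the size of the Game Boy ROM

# Compressed sizes in bytes, copied from the compression table.
SIZES = {
    "zpaq -m5": 739407,
    "bzip2 -9": 993406,
    "lzma -9": 1048408,
    "xz -9": 1048616,
    "7z -mx9": 1048710,
    "zstd --ultra -22": 1068137,
    "rar -m5": 1142360,
    "gzip -9": 1385457,
    "zip -9": 1385595,
    "lz4 -9": 1596418,
    "lzop -9": 1611939,
}


def headroom(sizes: dict[str, int], rom_size: int = ROM_SIZE) -> dict[str, int]:
    """Bytes left in the ROM after storing the compressed text (negative if it overflows)."""
    return {name: rom_size - size for name, size in sizes.items()}


room = headroom(SIZES)
for name, left in room.items():
    verdict = "fits" if left >= 0 else "too big"
    print(f"{name:18} {left:>8} bytes left ({verdict})")
```

<p>With these numbers bzip2 would leave about 54 KiB for code, UI and games, and lzma a mere 168 bytes, which is why only the first two rows are realistic candidates.</p>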
<p>Anyway, I’m not an expert, but yeah, there are more efficient compressors out there; we don’t usually use them because they’re experimental and/or very, very slow, the PAQ ones in particular. So I’d imagine the ROM uses a slow compressor with a fast decompressor, tuned for English text.</p>
<p>So, now that we have our compression benchmark on file size, it’s appropriate to make a decompression benchmark based on time, because that’s what we need, right? So here are some tests under a normal load on my good ol' iMac 27" mid-2011 running Linux (don’t laugh, it’s old af but it’s still my daily driver and it still works for me), decompressing the above files to <code>/dev/null</code> and running each several times until the results were somewhat consistent. I didn’t bother to time the software during the compression phase because it’s irrelevant to our use case (and I hadn’t thought of it when I tested), but all of them were quite fast except zpaq.</p>
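<p>The “run it several times” approach can be sketched in Python as follows. This is only an illustration with the standard-library codecs, not the commands I actually ran: the <code>best_time</code> helper, the choice of keeping the fastest run, and the repeated-verse sample are all assumptions.</p>

```python
import bz2
import lzma
import time
import zlib


def best_time(func, payload: bytes, repeats: int = 5) -> float:
    """Return the fastest of several runs, in seconds, to smooth out system noise."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(payload)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Stand-in sample so the sketch is self-contained; use the real text for real numbers.
sample = ("In the beginning God created the heaven and the earth. " * 20000).encode("ascii")

# Each entry pairs a pre-compressed blob with the function that decompresses it.
codecs = {
    "zlib": (zlib.compress(sample, 9), zlib.decompress),
    "bz2": (bz2.compress(sample, 9), bz2.decompress),
    "lzma": (lzma.compress(sample, preset=9), lzma.decompress),
}

timings = {name: best_time(decompress, blob) for name, (blob, decompress) in codecs.items()}
for name, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:6} {seconds:.4f} s")
```

<p>Compression happens once, outside the timed section, so only decompression is measured.</p>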
<table>
<thead>
<tr>
<th style="text-align:left">Decompression</th>
<th style="text-align:right">Time (s)</th>
</tr>
</thead>
<tbody>
<tr>
<td style="text-align:left">lz4</td>
<td style="text-align:right">0.008</td>
</tr>
<tr>
<td style="text-align:left">zstd</td>
<td style="text-align:right">0.015</td>
</tr>
<tr>
<td style="text-align:left">lzop</td>
<td style="text-align:right">0.016</td>
</tr>
<tr>
<td style="text-align:left">rar</td>
<td style="text-align:right">0.035</td>
</tr>
<tr>
<td style="text-align:left">gzip/zip</td>
<td style="text-align:right">0.040</td>
</tr>
<tr>
<td style="text-align:left">lzma/xz/7z</td>
<td style="text-align:right">0.080</td>
</tr>
<tr>
<td style="text-align:left">bzip2</td>
<td style="text-align:right">20.210</td>
</tr>
<tr>
<td style="text-align:left">zpaq</td>
<td style="text-align:right">16.203</td>
</tr>
</tbody>
</table>
<p>So you get a list that is rather backwards from the other one, with the notable exception of zpaq. Of course, it’s going to be at least a thousand times slower on a Game Boy (I could try to run these tests on a 486 or something to get better numbers), and it’s kinda hard to weigh compressed bytes against decompression time, but it’s quite enough to draw conclusions about what kind of compression we’re dealing with. The more compressed it is, the slower it decompresses, which rather contradicts our “efficient compressor, fast decompressor” theory. One solution would be decompressing in chunks only when needed, and the Game Boy screen is rather small, so it could work. The search engine is able to search words fast and the included games deal with words too, so maybe there is something to do with whole words as well, such as mapping words to IDs or a similar technique.</p>
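<p>Both ideas are easy to prototype. The Python sketch below is pure speculation about what such a scheme could look like, not what the ROM does: it compresses fixed-size chunks independently so a reader only decompresses what it shows, and separately maps each distinct word to an ID. The chunk size, the use of <code>zlib</code> and all function names are assumptions.</p>

```python
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk size; a real ROM would tune this


def compress_chunks(text: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Compress each fixed-size slice independently so any one can be decoded alone."""
    return [zlib.compress(text[i:i + chunk_size], 9) for i in range(0, len(text), chunk_size)]


def read_range(chunks: list[bytes], start: int, length: int, chunk_size: int = CHUNK_SIZE) -> bytes:
    """Return text[start:start + length], decompressing only the chunks that overlap it."""
    if length <= 0:
        return b""
    first = start // chunk_size
    last = (start + length - 1) // chunk_size
    data = b"".join(zlib.decompress(chunk) for chunk in chunks[first:last + 1])
    offset = start - first * chunk_size
    return data[offset:offset + length]


def build_vocabulary(text: str) -> tuple[list[str], list[int]]:
    """Map every distinct word to an ID; return (vocabulary, sequence of IDs)."""
    ids: dict[str, int] = {}
    sequence = [ids.setdefault(word, len(ids)) for word in text.split()]
    return list(ids), sequence


verse = "In the beginning God created the heaven and the earth."
chunks = compress_chunks((verse * 500).encode("ascii"))
print(read_range(chunks, 7, 9))  # a small slice, decoded from a single chunk

vocabulary, sequence = build_vocabulary(verse)
print(sequence)
print(" ".join(vocabulary[i] for i in sequence) == verse)
```

<p>Smaller chunks mean less work per screen but a worse ratio, since each chunk starts with no history. The word list would double as an index for a search engine or a word game, which is why the idea is tempting here.</p>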
<p>Now it’s time to actually figure out what the decompression is in that Bible ROM. Sadly, I’m not well-versed enough in Z80 debugging to figure it out, but I can already imagine it’s a very efficient algorithm even by today’s standards, and if you figure it out it could probably compete with gzip and zstd or something.</p>
<p><blockquote class="twitter-tweet"><p lang="en" dir="ltr">Plot twist: you reverse engineer some unlicensed bible reader for the Game Boy from 1994 and you find a previously unknown compression algorithm that can compete with today's algorithms<br><br>Wanted: someone with enough Z80 debugging knowledge to figure it out and get the credit</p>— 輝き雪 Yuki, CEO of snow (@juju2143) <a href="https://twitter.com/juju2143/status/1215496156526075910?ref_src=twsrc%5Etfw">January 10, 2020</a></blockquote>
<script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
I have ideas of grandeur here, as usual</p>
<p>So there you go, open-ended thoughts about obscure ways to read the Bible. If you have any information about it or you feel like doing the gruesome work of debugging the ROM, feel free to comment below or share it with me on <a href="https://twitter.com/juju2143">Twitter</a>, and I will make a follow-up eventually, a part 2 if you will. Likewise if you have any suggestions, such as a compression algorithm I could add to the tables above.</p>
<p>It’s a pretty interesting project since, well, I’m not that religious and I’m definitely not the kind of idiot who quotes the Bible out of context (please don’t do that), but I still like to research it, and I still consider myself a nice Christian who believes in science. I’ve always said that it’s about what you personally believe and not what others believe. Always read everything with a rational mind and, uh, yeah, I could rant a long time about that and it’s not really the point here, maybe another time. If you’ve followed along this far and you want to look for more about this, well, about that quite interesting ROM that is, I wish you good luck, and I’ll see you in another blog post :)</p>
</article>
<ul class="pagination">
<li class="page-item">
<a class="previous" href="http://toasters.rocks/miyuki-2019/">« Miyuki 2019</a>
</li>
<li class="page-item">
<a class="next" href="http://toasters.rocks/emoji-region-flags/">Emoji region flags »</a>
</li>
</ul>
<article>
<div id="disqus_thread"></div>
<script type="application/javascript">
var disqus_config = function () {
};
(function() {
if (["localhost", "127.0.0.1"].indexOf(window.location.hostname) != -1) {
document.getElementById('disqus_thread').innerHTML = 'Disqus comments not available by default when the website is previewed locally.';
return;
}
var d = document, s = d.createElement('script'); s.async = true;
s.src = '//' + "juju2143" + '.disqus.com/embed.js';
s.setAttribute('data-timestamp', +new Date());
(d.head || d.body).appendChild(s);
})();
</script>
<noscript>Please enable JavaScript to view the <a href="https://disqus.com/?ref_noscript">comments powered by Disqus.</a></noscript>
<a href="https://disqus.com" class="dsq-brlink">comments powered by <span class="logo-disqus">Disqus</span></a>
</article>
</main>
<footer>Copyright © 2020 J.P. Savard</footer>
</body>
</html>