Bichon – Email Exporter Script for backup & recovery

You can reuse Bichon’s archived emails with another app, but there are a few things to know first, because Bichon does not implement email export (yet). The .store files inside data/eml/ are Tantivy index files, not raw .eml files, so other apps cannot read them directly. You have to extract the original emails first.

1.) Set up a Python virtual environment (see: Creating isolated userland Python virtualenvs in TrueNAS – Infrastructure Blog)

Bash
truenas[/mnt/ssd-pool/bichon]$ python3 -m venv venv_bichon --without-pip

truenas_admin@truenas[/mnt/ssd-pool/bichon]$ source venv_bichon/bin/activate
(venv_bichon) truenas_admin@truenas[/mnt/ssd-pool/bichon]$ curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2131k  100 2131k    0     0  3759k      0 --:--:-- --:--:-- --:--:-- 3758k

## NOTE: The active venv is shown as (venv_bichon); type 'deactivate' to exit. ##

(venv_bichon) truenas_admin@truenas[/mnt/ssd-pool/bichon]$ python3 get-pip.py
Collecting pip
  Downloading pip-26.0-py3-none-any.whl.metadata (4.7 kB)
Collecting setuptools
  Using cached setuptools-80.10.2-py3-none-any.whl.metadata (6.6 kB)
Collecting wheel
  Using cached wheel-0.46.3-py3-none-any.whl.metadata (2.4 kB)
Collecting packaging>=24.0 (from wheel)
  Using cached packaging-26.0-py3-none-any.whl.metadata (3.3 kB)
Downloading pip-26.0-py3-none-any.whl (1.8 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 1.8/1.8 MB 12.1 MB/s  0:00:00
Using cached setuptools-80.10.2-py3-none-any.whl (1.1 MB)
Using cached wheel-0.46.3-py3-none-any.whl (30 kB)
Using cached packaging-26.0-py3-none-any.whl (74 kB)
Installing collected packages: setuptools, pip, packaging, wheel
Successfully installed packaging-26.0 pip-26.0 setuptools-80.10.2 wheel-0.46.3

(venv_bichon) truenas_admin@truenas[/mnt/ssd-pool/bichon/venv_bichon]$ pip install zstandard
Collecting zstandard
  Downloading zstandard-0.25.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (3.3 kB)
Downloading zstandard-0.25.0-cp311-cp311-manylinux2014_x86_64.manylinux_2_17_x86_64.whl (5.6 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.6/5.6 MB 12.5 MB/s  0:00:00
Installing collected packages: zstandard
Successfully installed zstandard-0.25.0
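Before moving on, it can help to confirm that the interpreter really runs inside the venv and that zstandard is importable from it. The following is a minimal sketch using only the standard library; the helper names in_virtualenv and module_available are made up for this example and are not part of Bichon or zstandard.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv."""
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def module_available(name: str) -> bool:
    """Return True if `name` could be imported, without importing it."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("zstandard available:", module_available("zstandard"))
```

If either line prints False, re-run the activate and pip install steps above before starting the export.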

2.) What is required for the script?

The Python library for tantivy was built without Zstd support enabled. Since we can’t easily recompile the Python tantivy library on TrueNAS to enable that feature, we are going to use a “Brute Force” recovery method.

Because a .store file is basically just a giant pile of Zstd-compressed blocks, we don’t actually need the database index to read it. We can “carve” the emails out by scanning each .store file (for example 65f42b91.store) for Zstd magic headers, decompressing whatever follows them, and checking whether the result looks like an email.

Why this works:
Bypasses the Index: We aren't asking Tantivy for permission anymore; we're just reading the raw bytes off the disk.

Standard Compression: Even if the Python tantivy library wasn't compiled with Zstd, the zstandard library we installed via Pip certainly supports it.

Magic Numbers: Every compressed block in that .store file starts with the same 4 bytes. We are just slicing the giant file at every occurrence of those bytes and trying to decompress each slice.
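To make the magic-number idea concrete, here is a small sketch that only locates candidate block starts in a byte buffer. It uses bytes.find instead of the split used in the export script, and the function name find_magic_offsets and the sample buffer are invented for illustration. Note that the same four bytes can also occur by chance inside compressed data, so every hit is only a candidate until decompression succeeds.

```python
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"  # Zstandard frame magic number


def find_magic_offsets(data: bytes, magic: bytes = ZSTD_MAGIC) -> list[int]:
    """Return every offset at which `magic` occurs in `data`."""
    offsets = []
    pos = data.find(magic)
    while pos != -1:
        offsets.append(pos)
        # Continue searching one byte past the previous hit.
        pos = data.find(magic, pos + 1)
    return offsets


if __name__ == "__main__":
    # Fake buffer: some leading bytes, then two pretend "blocks".
    sample = b"index-bytes" + ZSTD_MAGIC + b"block-1" + ZSTD_MAGIC + b"block-2"
    print(find_magic_offsets(sample))  # [11, 22]
```

Each offset marks where a decompression attempt would start; the bytes between two offsets correspond to one chunk in the script's split-based approach.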
bichon_mail_export.py
#!/usr/bin/env python3

"""
# SPDX-License-Identifier: MIT
#
# Copyright (c) 2026 MRi-LE
#
# This software is provided "as is", without warranty of any kind.
# Authored by Michael Richter, with assistance from AI tools.
"""

import io
import re
import tarfile
from pathlib import Path

import zstandard as zstd

# === CONFIG ===
STORE_DIR = "/mnt/ssd-pool/bichon/eml"
OUTPUT_FILE = "/mnt/ssd-pool/bichon/exported_emails.tar.gz"
ZSTD_MAGIC = b'\x28\xb5\x2f\xfd'
MIN_SIZE = 2000 

def main():
    store_files = sorted(Path(STORE_DIR).glob("*.store"))
    dctx = zstd.ZstdDecompressor(max_window_size=2**31)
    total_count = 0

    print(f"Creating compressed archive: {OUTPUT_FILE}")
    
    # Open tarball for writing with gzip compression ('w:gz')
    with tarfile.open(OUTPUT_FILE, "w:gz") as tar:
        for store_file in store_files:
            print(f"Streaming from: {store_file.name}")
            with open(store_file, 'rb') as f:
                data = f.read()

            chunks = data.split(ZSTD_MAGIC)
            
            for chunk in chunks[1:]:
                try:
                    decompressed = dctx.decompress(ZSTD_MAGIC + chunk, max_output_size=100*1024*1024)
                    parts = re.split(b'\n(?=Return-Path:|Received:|From: )', decompressed)
                    
                    for part in parts:
                        clean_part = part.strip()
                        
                        # Apply our proven "Strict Filters"
                        if len(clean_part) > MIN_SIZE:
                            if clean_part.startswith((b"Return-Path:", b"Received:", b"From:")):
                                # Create a file-like object in memory
                                email_stream = io.BytesIO(clean_part)
                                
                                # Create header info for the file inside the tar
                                tar_info = tarfile.TarInfo(name=f"email_{total_count:05d}.eml")
                                tar_info.size = len(clean_part)
                                
                                # Add the memory stream to the tarball
                                tar.addfile(tar_info, fileobj=email_stream)
                                total_count += 1
                                
                                if total_count % 500 == 0:
                                    print(f"  Archived {total_count} emails...")
                except zstd.ZstdError:
                    # Not a valid Zstd frame (false-positive magic match); skip it
                    continue

    print(f"\nSuccess! Total of {total_count} emails streamed into {OUTPUT_FILE}")

if __name__ == "__main__":
    main()

NOTE: Adjust STORE_DIR and OUTPUT_FILE to match your setup. The run below wrote to recovered_emails.tar.gz.
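The script decides whether a chunk "looks like an email" by checking its size and its first header name. If that turns out to be too loose or too strict for your data, one possible alternative is to let Python's standard email parser read the header block and require a few headers to be present. This is only a sketch and not what the script above does; looks_like_email and REQUIRED_HEADERS are names made up for this example, and the choice of From and Date is an assumption.

```python
from email import policy
from email.parser import BytesHeaderParser

# Assumption: a real message carries at least these headers. Adjust to taste.
REQUIRED_HEADERS = ("From", "Date")


def looks_like_email(raw: bytes) -> bool:
    """Heuristic: the header block parses and contains the required headers."""
    # BytesHeaderParser stops at the first blank line and never parses the body.
    headers = BytesHeaderParser(policy=policy.compat32).parsebytes(raw)
    return all(headers.get(name) is not None for name in REQUIRED_HEADERS)


if __name__ == "__main__":
    good = b"From: a@example.com\r\nDate: Mon, 1 Jan 2024 00:00:00 +0000\r\n\r\nbody"
    print(looks_like_email(good))              # True
    print(looks_like_email(b"\x00\x01binary"))  # False
```

A check like this could replace or complement the startswith test, at the cost of parsing every candidate chunk.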

3.) Execute and monitor the output

Bash
(venv_bichon) truenas_admin@truenas[/mnt/ssd-pool/bichon]$ python3 bichon_mail_export.py
Creating compressed archive: /mnt/ssd-pool/bichon/recovered_emails.tar.gz
Streaming from: 2b5ebeda9eb142719759bf49a1ce6e50.store
  Archived 500 emails...
  Archived 1000 emails...
  Archived 1500 emails...
  Archived 2000 emails...
  Archived 2500 emails...
  Archived 3000 emails...
  Archived 3500 emails...
  Archived 4000 emails...
  Archived 4500 emails...
  Archived 5000 emails...
  Archived 5500 emails...
Streaming from: 3259b5417e894544919ebc9cc5c9ad1e.store
Streaming from: 35688f9a8f1f467691850201b395eaa4.store
Streaming from: c11b9c7701d24df5b1d5aa1357c61edb.store

Success! Total of 5651 emails streamed into /mnt/ssd-pool/bichon/recovered_emails.tar.gz
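As a quick cross-check, the number of .eml members in the archive should match the total printed by the export script. The following sketch counts them with the standard tarfile module; count_eml_members is a helper name invented for this example.

```python
import tarfile


def count_eml_members(archive_path: str) -> int:
    """Count regular files ending in .eml inside a tar archive."""
    # "r:*" lets tarfile detect the compression (gzip here) automatically.
    with tarfile.open(archive_path, "r:*") as tar:
        return sum(1 for m in tar if m.isfile() and m.name.endswith(".eml"))
```

Calling count_eml_members with the path from OUTPUT_FILE should return the same number the script reported in its final line.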

4.) You can sanity-check the exported .eml files by looking at their headers.

For that I created the inspection script below, which supports header and body search.

bichon_inspect_export.py
#!/usr/bin/env python3

"""
# SPDX-License-Identifier: MIT
# Copyright (c) 2026 MRi-LE
# This software is provided "as is", without warranty of any kind.
# Authored by Michael Richter, with assistance from AI tools.


Bichon Inspect Export Tool: Inspect or Search .eml files inside a .tar.gz archive.

positional arguments:
  archive               Path to the .tar.gz or .tar file to process

options:
  -h, --help            show this help message and exit

Inspection Options:
  -l LIMIT, --limit LIMIT
                        Limit the number of files displayed during inspection (default: 10)

Search Options:
  -s KEYWORD, --search KEYWORD
                        Search for a specific word/phrase inside the archive
  -b, --body            Extend search to the email body (default: headers only)
"""

import tarfile
import re
import argparse
import os
from email.header import decode_header

def decode_mime(text):
    try:
        decoded_parts = decode_header(text)
        result = ""
        for decoded_string, charset in decoded_parts:
            if isinstance(decoded_string, bytes):
                result += decoded_string.decode(charset or 'utf-8', errors='ignore')
            else:
                result += decoded_string
        return result
    except Exception:
        # Fall back to the raw header text if decoding fails
        return text

def process_archive(args):
    if not os.path.exists(args.archive):
        print(f"Error: File '{args.archive}' not found.")
        return

    # Prepare search keyword (Case Insensitive)
    search_keyword = args.search.lower() if args.search else None
    
    print(f"Opening: {os.path.basename(args.archive)}")
    if search_keyword:
        mode = "BODY" if args.body else "HEADERS"
        print(f"Action: Searching for '{args.search}' (Case Insensitive) in {mode}")
        print(f"Limit:  Stopping after {args.limit} matches.")
    else:
        print(f"Action: Inspecting top {args.limit} files")
    
    print(f"{'Filename':<18} | {'Size (KB)':<10} | {'Subject (Decoded)'}")
    print("-" * 80)

    try:
        with tarfile.open(args.archive, "r:*") as tar:
            found_count = 0
            for member in tar:
                if member.isfile() and member.name.endswith(".eml"):
                    f = tar.extractfile(member)
                    if f:
                        # Read only the start of each message: the first 64 KB when
                        # searching bodies, otherwise the first 8 KB (headers)
                        read_size = 1024 * 64 if (search_keyword and args.body) else 8192
                        raw_content = f.read(read_size)
                        
                        # Convert content to lower case for case-insensitive matching
                        content_text = raw_content.decode('utf-8', errors='ignore')
                        content_lower = content_text.lower()
                        
                        # Filter logic
                        should_display = False
                        if search_keyword:
                            if search_keyword in content_lower:
                                should_display = True
                        else:
                            should_display = True

                        if should_display:
                            # Extract Subject only if we are actually displaying the file
                            subject = "No Subject Found"
                            s_match = re.search(r'(?i)^Subject:\s*(.*)', content_text, re.MULTILINE)
                            if s_match:
                                subject = decode_mime(s_match.group(1).strip())
                            
                            size_kb = member.size / 1024
                            print(f"{member.name:<18} | {size_kb:<10.2f} | {subject}")
                            found_count += 1
                        
                        # Apply the limit to BOTH inspection and search
                        if found_count >= args.limit:
                            break

            print("-" * 80)
            status = f"Found {found_count} matches." if search_keyword else f"Showed {found_count} files."
            print(f"Done. {status}")
            
    except Exception as e:
        print(f"Error: {e}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description="EML Archive Tool: Inspect or Search .eml files inside a .tar.gz archive.",
        formatter_class=argparse.RawTextHelpFormatter
    )
    
    parser.add_argument("archive", help="Path to the .tar.gz or .tar file to process")

    # Global Options
    parser.add_argument("-l", "--limit", type=int, default=10, 
                        help="Limit displayed files or search results (default: 10)")

    # Search Options
    src_group = parser.add_argument_group('Search Options')
    src_group.add_argument("-s", "--search", metavar="KEYWORD", 
                           help="Search for a word/phrase (Case Insensitive)")
    src_group.add_argument("-b", "--body", action="store_true", 
                           help="Extend search to the email body (default: headers only)")

    args = parser.parse_args()
    process_archive(args)

NOTE: Make sure the virtual environment is still active, or call its python3 binary directly.

Bash
(venv_bichon) truenas_admin@truenas[/mnt/ssd-pool/bichon]$ python3 bichon_inspect_export.py recovered_emails.tar.gz
Inspecting top 10 emails inside: recovered_emails.tar.gz
Filename           | Size (KB)  | Subject (Decoded)
---------------------------------------------------------------------------
email_00000.eml    | 44.32      | Jetzt live verfolgen - Ihr DHL Paket kommt heute...
email_00001.eml    | 10.44      | Issue kann sich nicht per WiFi verbinden was
email_00002.eml    | 35.40      | möchten Sie Ihre Tran
email_00003.eml    | 9.32       | Bearbeitete Bewertung von Nobilia
email_00004.eml    | 12.91      | wichtige Nachricht in Ihrer Online-Filiale
email_00005.eml    | 48.42      | RE: Vorbereitung
email_00006.eml    | 61.76      | Geliefert:  Ihre Amazon.de-Bestellung
email_00007.eml    | 20.38      | Neue Benachrichtigung: Du wurdest im Thema
email_00008.eml    | 38.48      | Ihr Amazon Paket liegt am gewünschten Ablageort
email_00009.eml    | 37.67      | Eine neue Onlinerechnung liegt vor
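The inspector's regex captures only the first physical line of the Subject header, so a subject that is folded across several lines gets cut off. If you need the full subject of a single extracted message, the standard library's email package with policy.default unfolds and decodes it in one step. This is a sketch, not part of the inspector script; read_subject is a name made up for this example.

```python
from email import message_from_bytes, policy


def read_subject(raw: bytes) -> str:
    """Return the decoded Subject of a raw message, or a placeholder."""
    # policy.default unfolds multi-line headers and decodes MIME
    # encoded-words such as =?utf-8?q?...?= into plain str values.
    msg = message_from_bytes(raw, policy=policy.default)
    subject = msg["Subject"]
    return str(subject) if subject is not None else "No Subject Found"


if __name__ == "__main__":
    raw = b"Subject: =?utf-8?q?m=C3=B6chten_Sie?=\r\nFrom: a@example.com\r\n\r\nbody"
    print(read_subject(raw))
```

The raw bytes can come from tar.extractfile(member).read() on any member of the archive.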