This code is a **word unscrambler** program that uses a...

August 30, 2025 at 09:31 AM

from enum import Enum import logging from string import ascii_lowercase import sys import random import bootstrap import config cfg = config.add_commandline_args( f"Unscramble! ({__file__})", "A fast word unscrambler." ) cfg.add_argument( "--sparsefile", help="Path to the sparsefile to populate or query.", metavar='FILENAME', default="/usr/share/dict/sparse_words", ) cfg.add_argument( "--dictfile", help="Path to a file containing language words.", metavar='FILENAME', default="/usr/share/dict/words" ) group = cfg.add_mutually_exclusive_group() group.add_argument( '--query', '-q', type=str, help="The scrambled form of a word to ponder." ) group.add_argument( "--populate_destructively", help="Discard, overwrite and rebuild the sparsefile using a new random seed.", action="store_true" ) logger = logging.getLogger(name) letter_sigs = {} sparse = {} sparsefile_bits = 38 # 238 = 256Gb sparsefile sparsefile_length = 2 ** sparsefile_bits sparsefile_mask = sparsefile_length - 1 length_multiplier = int(sparsefile_length / 100) def populate_letter_sigs() -> None: for x in ascii_lowercase: letter_sigs[x] = random.getrandbits(sparsefile_bits) def compute_word_sig(word: str) -> int: sig = (len(word) * length_multiplier) & sparsefile_mask pop = {} for letter in word: if letter in letter_sigs: pop[letter] = pop.get(letter, 0) + 1 for letter in pop: factor = (letter_sigs[letter] * pop[letter]) & sparsefile_mask sig = sig ^ factor logger.debug( "(%d x %s) => 0x%x (factor=0x%x)" % (pop[letter], letter, sig, factor) ) return sig def read_dict_write_sparsefile(dictfile: str, sparsefile: str) -> None: with open(dictfile, "r") as f: for word in f: word = word.replace('\n', '') word = word.lower() sig = compute_word_sig(word) logger.debug("%s => %x" % (word, sig)) if sig in sparse: sparse[sig] += ",%s" % word else: sparse[sig] = word with open(sparsefile, "r+b") as f: total = len(sparse) count = 0 for sig in sparse: word = bytes(sparse[sig], "ascii") count += 1 logger.debug( f"[{count} / {total}]: Write {len(word)} bytes at {sig:x} for \"{word}\"" ) # Ensure no collisions f.seek(sig) for x in range(0, len(word)): c = bytes(f.read(1)) if c != b'\0': logger.error(f"About to clobber {c} @ 0x{f.tell():x}") sys.exit(1) # Write this one out f.seek(sig) f.write(word) class State(Enum): FIRST_CHAR = 1 SKIP_EMPTY = 2 ACCUMULATE_NOT_EMPTY = 3 @bootstrap.initialize def main() -> None: seed = 12347 random.seed(seed) populate_letter_sigs() if config.config['populate_destructively']: print(f"Rebuilding the sparsefile using random seed {seed}") read_dict_write_sparsefile( config.config['dictfile'], config.config['sparsefile'] ) sys.exit(0) # Otherwise, we're looking up a word... assert(config.config['query'] is not None) sig = compute_word_sig(config.config['query']) with open(config.config['sparsefile']) as f: result = "" f.seek(sig) state = State.FIRST_CHAR failed = False while True: c = f.read(1) logger.debug("c='%c', result='%s', failed=%s, state=%s" % ( c, result, failed, state)) if state == State.FIRST_CHAR: if c == chr(0) or c == '': failed = True print("Lookup failed. Searching for approximate match...") state = State.SKIP_EMPTY else: failed = False result = result + c state = State.ACCUMULATE_NOT_EMPTY elif state == State.SKIP_EMPTY: if c != chr(0) and c != '': result = result + c state = State.ACCUMULATE_NOT_EMPTY elif state == State.ACCUMULATE_NOT_EMPTY: if c != chr(0) and c != '': result = result + c else: break if failed: print("Closest next match was: '%s'" % result) sys.exit(1) else: print(result) sys.exit(0) if __name__ == "__main__": main()

This code is a word unscrambler program that uses a technique involving a sparse file and hashing-like computations to map words to unique identifiers (signatures) for efficient lookup.

Here's a breakdown of what the code does:

  1. Configuration and Argument Parsing:

    • The config setup defines a command-line interface for the program, allowing users to run it with certain options.
      • --sparsefile: Specifies a file path for storing the computed sparse representations of words.
      • --dictfile: Specifies a file containing the dictionary of valid words to use.
      • --query: Lets the user input a scrambled word to search.
      • --populate_destructively: Rebuilds the sparse file, discarding any existing content.
  2. Sparse File and Letter Signatures:

    • A sparse file is a large virtual file used to map the words efficiently.
    • Each letter of the alphabet is assigned a random hash-like signature (letter_sigs) using random.getrandbits, ensuring a unique mapping for each letter within a specific bit range.
    • Words are processed letter by letter to compute their signature (compute_word_sig) using the formula:
      sig = (length_multiplier * len(word)) XOR random_letter_signatures
      
      This approach ensures that small changes to letters or length significantly affect the signature, reducing collisions.
  3. Rebuilding the Sparse File (--populate_destructively):

    • The read_dict_write_sparsefile() function reads the specified dictfile (a dictionary of valid words), computes the unique signature for each word, and writes it to the correct position in the sparse file.
    • Before writing to the sparse file, the code checks to ensure no signature collision/clobbering occurs.
  4. Word Lookup (--query):

    • If a query (scrambled word) is provided, the program computes its signature using the same hashing mechanism.
    • It seeks the sparse file at the computed signature's position, reads back any stored word(s), and checks if the word matches.
    • If no exact match is found, it attempts to find the closest match (based on subsequent sparse entries).
  5. Error Handling and States:

    • State is an Enum used to manage and track the state of the lookup process:
      • FIRST_CHAR: Indicates the first character read from a sparse file.
      • SKIP_EMPTY: Skips null bytes (empty regions in the file).
      • ACCUMULATE_NOT_EMPTY: Collects non-null bytes to form a word.
  6. Main Program Operation:

    • The main() function initializes the random seed and populates letter signatures.
    • Depending on the flags, it either:
      • Rebuilds the sparse file.
      • Or queries and unscrambles the input word using the sparsefile lookup.

High-Level Workflow:

  • Initialization: Configure random seed and arguments. Assign random letter signatures.
  • Population (optional): Create or rebuild the sparse file by computing each dictionary word's signature and storing it in the sparsefile.
  • Query (unscramble): Given a scrambled word, compute its signature, look it up in the sparsefile, and return the unscrambled word or the closest match.

Example Use Case:

  1. Populate a sparse file:

    python unscramble.py --populate_destructively
    

    This processes a dictionary and stores hashed representations into the sparse file.

  2. Unscramble a word:

    python unscramble.py --query "elppa"
    

    If "elppa" is a scrambled form of "apple," it will compute the signature and return "apple."

Key Takeaways:

  • Sparse File: Efficient storage for a large key-value structure using direct file access.
  • Signature Computation: A hashing-like mechanism provides each word a unique identifier based on its letters and length.
  • Primary Functionality: Fast unscrambling of words via precomputed hashes.
Generate your own explanations
Download our vscode extension
Read other generated explanations

Built by @thebuilderjr
Sponsored by beam analytics
Read our terms and privacy policy
Forked from openai-quickstart-node