  • Custom User Avatar

    The kata does not define the bare minimum properties of "enemy language" required for frequency analysis to be valid.

    To identify arbitrary mappings of the graphemes of the language there needs to be an assumption on how the language is composed, aka the language model: what is defined as a morpheme in the language? How are graphemes composed in a morpheme? How are the morphemes combined?

    Frequency analysis rests on a lot of underlying assumptions about how the language operates. Without them you can easily construct "unbreakable" conlangs. For example, invent a Caesar English conlang as

    1. split text into words (by whitespace or fixed width)
    2. for each word, choose a random character `c`, then replace word with `c + caesar(word)` with `c` as the shift
    

    or a RLE English language as

    1. take away a run of up to 26 same/consecutive (looping) characters from the start of the text
    2. encode the run's length `1-26` as a letter `A-Z`, followed by the run's starting character
    3. repeat 1-2 until the text is exhausted
    

    and these will defeat any frequency analysis.
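
    A minimal sketch of the first construction, assuming uppercase A-Z words separated by whitespace. The helper names (`caesar`, `to_conlang`, `from_conlang`) are made up for illustration. Anyone who knows the rule can decode the text trivially, yet every word carries its own random shift, so letter counts taken over the whole text should say very little about the underlying English.

```python
import random
from string import ascii_uppercase as ALPHA


def caesar(word, shift):
    """Shift every letter of an uppercase A-Z word by `shift` positions."""
    return "".join(ALPHA[(ALPHA.index(ch) + shift) % 26] for ch in word)


def to_conlang(text, rng=random):
    """Prefix each word with a random letter c and Caesar-shift the word by c."""
    out = []
    for word in text.split():
        c = rng.choice(ALPHA)
        out.append(c + caesar(word, ALPHA.index(c)))
    return " ".join(out)


def from_conlang(text):
    """Undo to_conlang: the first letter of each word is that word's shift."""
    return " ".join(caesar(w[1:], -ALPHA.index(w[0])) for w in text.split())


if __name__ == "__main__":
    encoded = to_conlang("ATTACK AT DAWN")
    print(encoded, "->", from_conlang(encoded))
```

    Encoding the same sentence twice will usually give two different texts, both of which decode to the original.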

    And even after all this, a mapping is different from a shift, which adds another layer of complexity.

  • Custom User Avatar

    Random tests almost never generate cases where there are multiple keys that can decrypt to CODEWARSHEADQUARTERS, so any solution that finds and returns the first valid key will pass all the tests.

    Random tests also don't test for keys with periods (e.g. ABABAB vs AB): length = 2 is trivial (just start searching from length 2), length = 5 is irrelevant because 5 is prime (and 5 identical characters in a row is extremely unlikely), and the 6 <= length <= 11 case almost never generates such keys. A blind search that ignores the actual range of key lengths would fail on such tests by finding a key shorter than any the kata can generate.
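
    To illustrate the period problem, here is a small sketch, assuming a repeating-key cipher where a key and any whole-number repetition of it produce the same keystream. `primitive_root`, `shortest_equivalent` and the `min_len` parameter are hypothetical names, not part of the kata.

```python
def primitive_root(key):
    """Return the shortest string that, repeated, gives exactly `key`."""
    n = len(key)
    for p in range(1, n + 1):
        if n % p == 0 and key[:p] * (n // p) == key:
            return key[:p]
    return key  # only reached for the empty string


def shortest_equivalent(key, min_len):
    """Shortest key of length >= min_len with the same keystream as `key`."""
    root = primitive_root(key)
    reps = max(1, -(-min_len // len(root)))  # ceiling division
    return root * reps


if __name__ == "__main__":
    print(primitive_root("ABABAB"))        # the period a blind search would find
    print(shortest_equivalent("AB", 6))    # the same key stretched to length 6
```

    A solver that stops at the primitive root would report `AB` even when the generator only produces longer keys, which is the failure described above.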

  • Custom User Avatar

    The kata premise is flawed: the author's intention for the kata, as detailed in the comments below, is

    collect a corpus to aid the decryption process

    However, since part of the plaintext is known and, most importantly, the known plaintext is almost double the key length (usually more), the problem reduces to a known-plaintext attack instead, which has much simpler attack methods (e.g. crib dragging). Every existing solution does this instead of doing any frequency analysis.
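
    A rough sketch of such a crib drag, under assumptions that are not confirmed by the kata description: a Vigenère-style cipher over uppercase A-Z, the crib CODEWARSHEADQUARTERS, and key lengths between 2 and 11. The function names and the `min_len`/`max_len` parameters are hypothetical.

```python
from string import ascii_uppercase as ALPHA

CRIB = "CODEWARSHEADQUARTERS"


def vigenere(text, key, sign=1):
    """Encrypt (sign=1) or decrypt (sign=-1) uppercase A-Z text with a repeating key."""
    return "".join(
        ALPHA[(ALPHA.index(ch) + sign * ALPHA.index(key[i % len(key)])) % 26]
        for i, ch in enumerate(text)
    )


def crib_drag(ciphertext, crib=CRIB, min_len=2, max_len=11):
    """Yield (key, plaintext) for every offset and key length consistent with the crib."""
    for offset in range(len(ciphertext) - len(crib) + 1):
        window = ciphertext[offset:offset + len(crib)]
        # Subtracting the crib from the ciphertext exposes the keystream fragment.
        stream = [(ALPHA.index(c) - ALPHA.index(p)) % 26 for c, p in zip(window, crib)]
        for length in range(min_len, min(max_len, len(crib) - 1) + 1):
            # The fragment must repeat with this period to be a valid key.
            if all(stream[i] == stream[i + length] for i in range(len(crib) - length)):
                key = [""] * length
                for i in range(length):
                    # Re-align the fragment so the key starts at text position 0.
                    key[(offset + i) % length] = ALPHA[stream[i]]
                yield "".join(key), vigenere(ciphertext, "".join(key), -1)


if __name__ == "__main__":
    secret = vigenere("MEETATCODEWARSHEADQUARTERSATDAWN", "LEMON")
    for key, plaintext in crib_drag(secret):
        print(key, plaintext)
```

    Because the crib is longer than the key, the periodicity check alone pins the key down; no letter frequencies are involved. The generator may also yield repeated forms of the same key (for example a doubled key), which ties in with the period issue raised in the comment above.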

    If the kata needs to be de-cheesed, the key length needs to be at least as long as the known plaintext (which might not be feasible performance-wise).