3 kyu

Huffman Encoding

206 of 1,571muesli4
Description
Loading description...
Algorithms
  • Please sign in or sign up to leave a comment.
  • wol42verine Avatar

    Code is passing tests but failing the Attempt. I didn't even know that was possible, and can't understand what's happening.

  • exec Avatar

    Very good challenge, took me quite a bit of brainpower to figure it out... hoping to encounter more like this.

    Creating this solution was both enlightening and challenging, especially when it came to handling edge cases. One key complexity arose from dealing with scenarios where there was only one unique character in the input string. The original instructions left room for interpretation on this specific case, and my initial approach ended up returning an empty list, which didn't align with the test expectations.

  • EPiph Avatar

    This probably isn't a spoiler... but spoiler it if it is...

    If the description of building the tree here is confusing to you, have a look at this amazing video by Tom Scott on YouTube.

    You're welcome ;p

  • arsuhinars Avatar

    I have problem with "for an alphabet with same frequencies and with size of 2^n all encodings should have the length n" tests. Only a half in average from it are passed. I've checked my code several times that it implements Huffman algortihm from description and I don't know how to fix it.

  • pcyanide Avatar

    "the higher frequencies are, the lower the length of their encoding should be (can also be equal for non-powers of 2)"

    Why powers of two should make the difference?

    For the frequencies [('b', 2), ('e', 2), ('g', 4), ('i', 8), ('m', 16), ('n', 32), etc, my Python code gets: b: 000, e: 001, g: 010100, i: 010101, m: 010110, etc.

    So, 'g' has frequency 4 (2²) and code 010100 length 6; 'i' has frequency 8 (2³) with code 010101 also length 6. Both frequencies are powers of two, have same code length, but I don't see any problem with that, as there are no frequencies between 4 and 8.

  • ArtBalan Avatar

    I keep getting this error :

    TypeError: Cannot read property 'map' of null at Context.<anonymous> (test.js:122:35) at processImmediate (internal/timers.js:464:21) But my code ain't 122 line long (only 80) and I don't even use map property in my code. What am I doing wrong ?

  • Stryder_ZX25 Avatar

    I get that input validations is necessary and all, but I spent more time wondering why would one pass an empty list to encode (I am fine with empty strings) than finding errors in my code. Great kata nevertheless. I would suggest imporoving the functions signatures to remove some redundency in the computations but maybe it is me who wasn't clever enough (though I could ignore some variables but that annoys me even more). Again, great kata.

  • noktelfa Avatar

    I pass the initial test, but when I run the Attempts, I'm getting errors outside my own code.

  • trashy_incel Avatar

    tests in Python are not using decorators, and in the full tests suite, the basic_tests have no assertions attached to them and therefore appear as failed

  • artyomSultanov Avatar

    Good tests... This kata is not for the algs, but for testing

  • mjpieters Avatar

    Please make it clearer how code should 'reject' a frequency table with just one or no elements. Currently the description states:

    Note: Frequency lists with just one or less elements should get rejected. (Because then there is no information we could encode, but the length.)
    

    There is no clear definition of 'reject' across programming languages. Make it explicit what should happen in such cases.

    For the Python version, the starter text (the template solution text you get when you start training) confuses matters too, as it tells you only str is expected:

    # takes: [ (str, int) ], str; returns: String (with "0" and "1")
    def encode(freqs, s):
        return ""
    
    # takes [ [str, int] ], str (with "0" and "1"); returns: str
    def decode(freqs,bits):
        return ""
    

    Where the input should be rejected, the tests expect None here, which is definitely not a str object. A seasoned Python developer might reasonably expect to have to raise an exception here, as that's the ideomatic way of rejecting bad input in Python. With no further clarity in the description and the comments stating a str return is expected, that's a perfectly reasonable assumption to make.

    So, for Python at least, please update the returns lines in the template:

    # takes: [ (str, int) ], str; returns: str (with "0" and "1"), or None when rejecting
    def encode(freqs, s):
        return ""
    
    # takes [ [str, int] ], str (with "0" and "1"); returns: str, or None when rejecting
    def decode(freqs,bits):
        return ""
    

    (This issue was re-raised by request)

  • B1ts Avatar

    JS random tests are too easy to bypass.

  • AlejandorLazaro Avatar

    As mentioned by others in this kata discussion, there are hidden requirements that don't quite make sense until looking into other discussion items, such as when to return empty strings versus Nones for results.

    Also, I'd love if we could see logs and more details for 'odd' failures like the ones I'm hitting, where I've created a working Huffman Encoding script, but there's 1 test that fails with:'0' should equal None, with no other information.

  • dc22 Avatar

    I am testing Huffman encoding in Python: Passed tests : 2781 but Failed tests: 265

    I don't undestand the following error Log frequencies(l) l encode([],l) Test Passed ERROR restore sorted sequence from frequencies should succeed: [] should equal ['l']

    Can you help me Please

  • einsum Avatar

    This kata's basic tests are incredibly inconsistent

    test.assert_equals( encode( fs, []), '')
    test.assert_equals( encode( [], "" ), None );
    test.assert_equals( encode( [('a', 1)], "" ), None );
    test.assert_equals( encode( [ ("a",4), ("b",1), ("c",2) ], "" ), "" )
    
    1. Why is it that we are testing enode(fs, []), when according to the hint the second argument should be a string ???

      takes: [ (str, int) ], str; returns: String (with "0" and "1")

    2. Why is encode(fs, "") supposed to return None if len(fs)<2 and else "" ????
  • MateuszPuto Avatar

    This comment has been hidden.

  • RiyazMo Avatar

    I know that there are different ways to build the Huffman tree but I would like to know if it is possible to use the module "networkx" in python. Thanks :)

  • rachmiles@gmail.com Avatar

    Hello - I don't really understand the requirements fully here! One of the test cases has freqs = [] and s = 'v' My encode function correctly returns None With no encoded value, my decode function doesn't have a lot to work with, and so it also returns None. However, I'm getting the error "restore sorted sequence from frequencies should succeed: [] should equal ['v']" If there is no frequency table, how does it know to decode to 'v'? Any help gratefully received! Thanks

  • rsa Avatar

    Clojure Translation ready for review.

  • mjpieters Avatar

    Please make it clearer how could should 'reject' a frequency table with just one or no elements. Currently the description states:

    Note: Frequency lists with just one or less elements should get rejected. (Because then there is no information we could encode, but the length.)

    For the Python version, the starter text confuses matters too, as it tells you str is expected:

    # takes: [ (str, int) ], str; returns: String (with "0" and "1")
    def encode(freqs, s):
        return ""
    
    # takes [ [str, int] ], str (with "0" and "1"); returns: str
    def decode(freqs,bits):
        return ""
    

    The tests expect None here (and not an exception, which is the ideomatic way of rejecting bad input).

    Please update the description to state what should be returned when rejecting, and update the returns lines:

    # takes: [ (str, int) ], str; returns: str (with "0" and "1"), or None when rejecting
    def encode(freqs, s):
        return ""
    
    # takes [ [str, int] ], str (with "0" and "1"); returns: str, or None when rejecting
    def decode(freqs,bits):
        return ""
    
  • Tarlanc Avatar

    Brilliant Kata and it was great fun to solve it. But were the >2500 random tests really necessary? It posed quite an additional challenge on the performance of the algorithm. I know, it's 3kyu, but getting the tree right is already a puzzle without the added complication of scarce time.

  • sashamc Avatar

    Now I work with java translation and try to choose kata api design. Well, any solution should be able:

    1. on text sample, calculate symbol frequencies table;
    2. with that frequencies given, build appropriate data structures for encoding with compression, and a) serialize coding table calculated, b) encode message;
    3. with coding table calculated or provided, decode message just coded. So, there are 4 methods in kata api, and 2-a), 3) are not the same as haskell variant. I think, it's more clear how to work with code table, then just 'decode a bit sequence using the given frequencies' (haskell kata description). For example, a:4, b:1, c:2 (symbol:frequency) "aaaabcc" coding result is 0000101111 - here a:0, b:10, c:11. When "usually we choose 0 for the left branch and 1 for the right branch", shouldn't be b:11, c:01 if a:0? a=0 and freq(a)=4 > freq(b+c)=3; so last code letter for c should be the same as first for a, because of freq(c)=2 > freq(b)=1 (in accordance with select branch rule. Whould it be convenient to work with this 1), 2-a), 2-b), 3) api schema in kata java version?
  • lonkaan Avatar

    Test cases are vulnarable If user makes changes to the given list in the function, because other tests are dependent in the former ones. In Python

  • lonkaan Avatar

    Why while testing encode function you give frequency list as [('dSg?VQu4NJF8UMhZvq!eC9maiW1ko7Ax', 320)] and expect us to encode this like every single character in the text has frequency 320? And In the description you say that if "frequency list has one or less element should be rejected". Very confusing. If you give me freq just like that, I would choose whole string as a character in the encdoing. Understanding test suit is the real deal in this kata btw.

  • dawk1ns Avatar

    Could someone translate this in Java when and if they had the time please?Looks like a really fun kata.

  • urxvtcd Avatar

    Hi, for the Haskell version my code is failing with:

    the higher frequencies are, the lower the length of their encoding should be (can also be equal for non-powers of 2)
    Falsifiable (after 1 test): 
    expected: GT
     but got: LT
    "aefklmnopqrstuxyz"
    

    however I find this message somewhat unhelpful, I can't figure out what is the root cause, what is being compared to what, or what frequencies are used.

  • user9644768 Avatar

    There is something entirely wrong with this kata! I didn't test out my code yet because it is irritating example, how b can have 10 and c can have 11, despite the fact that the occurence of b is lower than c. Please check example on the wikipedia, from that it is expected that the frequency should be in order of a> b > c which is certainly not true here. The problem is not with the convention, it has to do with inherent inconsistency! I am sure something is wrong with me, as it is quite improbable that this number of people would have completed the kata. Please help! I just want to get out of the irritation.

  • idlang Avatar

    I completed this kata yesterday, but didn't submit as I wanted to tidy up my code. Having come back to it today, the code has disappeared, and there is just the skeleton! Is there any way to get it back, or do I have to recode from scratch?

  • MastaB Avatar

    Haskell

    I've been investigating why my solution is too slow.

    I use trace to see the input to the tests.

    There are roughly : 3300 tests in total (before it times out) 1100 of those are repeated tests

    Can we have repeated tests removed please?

    Or even better just do 3000 tests in total, then I can pass this kata! :-)

    Thanks!

  • MastaB Avatar

    Haskell - Speed

    I'm in the dark. Please help. For me...

    'Run Sample Tests' Completes in 'Time 2147ms' and 'Completed in 0.0069 seconds'

    'Attempt' 'Timed Out 12000ms' - Hangs after the passing the test 'the higher frequencies are, the lower the length of their encoding should be (can also be equal for non-powers of 2)'

    Question - For those who did this in Haskell, how long does it take to complete the sample tests? Question - And how long does it take to complete the 'Attempt' tests? Question - What is the test after the above test?

  • MastaB Avatar

    Hi! Good fun so far, but...

    Haskell - I pass the sample tests, but timeout on the full tests.

    Is there a way to see what the tests are to debug this?

    I'm still a noob in Haskell so not sure how to do this.

  • moeatsy Avatar

    "Huffman coding" article at wikipedia and "nerdfirst huffman encoder online" was very helpfull to understand problem of this kata and debug code :)

  • Shiv  Singh Avatar

    Can anyone help me? I am stuck on the decode function,I have coded the encode function which is easy enough(Create a tree according to the frequency argument and then find the code for each character by visiting it from the root node). but how do i figure out the word boundaries for each letter in the encoded string as there are no separators. for ex: we get 0000101111 after encoding the example in description,should it be like |0|0|0|0|10|11|11| so that we know how many bits for each word,can someone explain please!

  • Madjosz Avatar

    Why is there a parameter for the frequencies in the encode method? Normally one should get the frequencies (and the resulting encoding tree) directly from the given String s or is this for simulating the encoding of a substring with frequencies gathered from a bigger text?

  • dagolinuxoid Avatar

    This comment has been hidden.

  • zapakh Avatar

    Python. I had tremendous fun; thank you.

  • Limpo Avatar

    There can be multiple results after encoding or decoding because there are multiple trees we can get after using function encode() or decode() For instance, in example in the instruction we can get tree with leaves 'b' and 'c' swapped and therefore we will get after encoding this: encode(frequency('aaaabcc'), 'aaaabcc') = '0000111010' after decoding this: decode(frequency('aaaabcc'), '0000101111') = 'aaaacbb'

  • Blind4Basics Avatar

    This comment has been hidden.

  • JustifiedFlaw Avatar

    Some tests don't make sense. It expects an empty string to be returned when passing an empty string, except if the frequencies have 1 key. Why? Decoding or encoding an empty string should return an empty string.

  • StefanPochmann Avatar

    The tests are lacking. The most "Clever"-voted Python "solution" doesn't do Huffman. For example it turns the string 'abcdefgh' into the table {'a': '0', 'b': '10', 'c': '110', 'd': '1110', 'e': '11110', 'f': '111110', 'g': '1111110', 'h': '1111111'}, which is just wrong (and leads to a far too long encoded result string). The eight characters occur equally often, so they should be encoded as the eight different three-bits strings '000' to '111'.

  • tmxk Avatar

    I have passed, but I still have some little questions.

    (1).What do we do if we find the two smallest subtrees with the same frequency?

    (2).What do we do if we find several smallest subtrees all with the same frequency?

    I think theoretically we can pick any two of them in any order?

  • pinealan Avatar

    The Python test suite used the standard library function randint() without importing it. Please fix it.

  • Blind4Basics Avatar

    I don't know if it's the same in all languages (it seems to be...) but, the example tests are almost a joke :o => Only edge cases and no test of the actual encoding/decoding!? (meaning, not just the length of the string....!)

  • kirilloid Avatar
  • metalim Avatar
  • kirillzzz Avatar

    In our example fs = [ [ 'a', 4 ], [ 'c', 2 ], [ 'b', 1 ] ]

    I don't understand what're the differents in these test: 1) Test.assertEquals( decode( fs, "" ), "" ); Test.assertEquals( decode( [ ["a",1] ], "" ), null );

    1. Test.assertEquals( encode( fs, "" ), "" ); Test.assertEquals( encode( [ ["a",1] ], "" ), null );

    When my function has to return null? And when it has to return ""?

  • wunder Avatar

    A very nice kata. My JS-skills are nothing to brag about, so the code structure is messy. I would love to get some hints on katas that help to learn to create nice, readable, structured code in JS without using the "class" and "new" keywords, etc. which I heard "should be avoided" as they were introduced for the sake of java folk (like myself) to make them feel more "comfortable" with js ;)

  • hufftheweevil Avatar

    I finally completed this, after weeks of putting it off because I didn't want to take the time to understand it. One thing that helped me was realizing that it was kind-of open-ended in that you decided exactly how it was coded, as long as you decoded it in the same manner.

    I'm quite surprised that my solution was the shortest so far by a landslide. After completing the frequencies function, I set my aim to keep it short, and I'm pleased with the result. Sure, it's probably not readable, but I left some comments under my solution if anyone is interested in how I kept it so short. And I challenge any golfers out there to beat it.

  • docgunthrop Avatar

    EDIT: This is in regards to the JavaScript translation.

    The description explains what Huffman encoding is (which the web has an abundance of information on), but doesn't explain at all what is required of this kata.

    • What is the expected output of the encode and decode functions? One of the sample tests is: Test.assertEquals( encode( fs, s ).length, 10 );, expecting a value with a length of 10. I'm failing that test because my solution returns a string of length 2. So what is encode supposed to return exactly? Also, is the decode function expected to return anything besides just the original uncompressed string?
  • dcsmith Avatar

    cool! I have a very inefficient solution but it was fun to figure out :)

  • JohanWiltink Avatar

    This was fun. :]

    I wouldn't approve it until I had solved it, but there you are.

    RSN I'll get the JS translation up. ( I actually solved it in JS first, then translated. I wonder if it's visible in my code. :P )