• AdrianTheFrog@lemmy.world · 1 year ago

    UTF-8 and ASCII already store English text at about 1 character per byte. With good general-purpose compression, you could probably reach 2 characters per byte, i.e. 4 bits per character. One character per bit is probably impossible, except maybe with some sort of AI-based compression that uses a model’s knowledge of the English language to predict the message.

    Edit: Wow, apparently that already exists, and it achieves an even higher compression ratio, almost 10:1! (on 1 GB of UTF-8 text from Wikipedia) bellard.org/nncp/

    If an average book has 70k five-character words, this could compress it to around 303 kilobits (about 38 kB), meaning you could fit roughly 1.6 million books in 64 GB.

    You can get a 2 TB SSD for around $70. With this compression scheme you could fit 52 million books on it.
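
    A quick sanity check of those numbers, taking “almost 10:1” as roughly 0.87 bits per character (an assumption on my part, backed only by the Wikipedia figure above):

    ```python
    # Back-of-the-envelope check; all inputs are the assumptions from the
    # comment above, not measured results.
    BITS_PER_CHAR = 0.87          # assumed from the ~10:1 NNCP ratio on Wikipedia text
    CHARS_PER_BOOK = 70_000 * 5   # 70k words of 5 characters each

    book_bits = CHARS_PER_BOOK * BITS_PER_CHAR   # ~304,500 bits, i.e. ~300 kilobits
    book_bytes = book_bits / 8                   # ~38 kB per compressed book

    books_per_64gb = 64e9 / book_bytes           # ~1.7 million books
    books_per_2tb = 2e12 / book_bytes            # ~52.5 million books

    print(f"compressed book: {book_bits / 1e3:.0f} kbit ({book_bytes / 1e3:.0f} kB)")
    print(f"books on 64 GB:  {books_per_64gb / 1e6:.1f} million")
    print(f"books on 2 TB:   {books_per_2tb / 1e6:.1f} million")
    ```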

    I’m not sure if I’ve interpreted the speed data right, but it looks like it would take around a minute to decode each book on a 3090. It would take about a year to encode all of the books on the 2 TB SSD if you used 50 A100s (~$9,000 each). You could also use 100 3090s to achieve around the same speed (~$1,000 each).
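
    Those two figures line up if encoding runs at about the same speed as decoding, which I’m assuming here rather than taking from the nncp page:

    ```python
    # Rough wall-clock estimate, taking "one minute per book on a 3090" at face
    # value and assuming encode speed ~= decode speed (neither is a benchmark).
    TOTAL_BOOKS = 52_000_000
    MINUTES_PER_BOOK = 1   # assumed per-book time on one RTX 3090
    GPUS = 100             # 100x 3090s, or ~50 A100s at roughly twice the per-GPU speed

    days = TOTAL_BOOKS * MINUTES_PER_BOOK / GPUS / 60 / 24
    print(f"~{days:.0f} days")   # ~361 days, i.e. about a year
    ```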

    52 million books is roughly the number of books written worldwide in the past 20 years. All stored for $70 (plus ~$100k of graphics cards).

    • Sotuanduso@lemm.ee · 1 year ago

      There’s something comical about the low, low price of $70 (plus $100k of graphics cards) still leaving out the year of compute time it will take.

      • Cicraft@lemmy.world · 1 year ago

        Well, I guess you could sacrifice a bit of the space for an index system and just decode the one book you’re trying to read, something like the sketch below.
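
        A hypothetical layout for that (nothing nncp itself provides): compress each book separately, keep a small table of byte offsets, and decode only the blob you ask for.

        ```python
        # Hypothetical index: each book is compressed independently and the archive
        # keeps a (book_id -> offset, length) table, so reading one book means
        # decoding one ~38 kB blob instead of the whole 2 TB archive.
        import json

        def build_archive(compressed_books, archive_path, index_path):
            """compressed_books: dict mapping book_id -> already-compressed bytes."""
            index = {}
            with open(archive_path, "wb") as archive:
                for book_id, blob in compressed_books.items():
                    index[book_id] = (archive.tell(), len(blob))  # offset, length
                    archive.write(blob)
            with open(index_path, "w") as f:
                json.dump(index, f)

        def read_book(book_id, archive_path, index_path):
            """Return the compressed bytes for one book; pass them to the decompressor."""
            with open(index_path) as f:
                offset, length = json.load(f)[book_id]
            with open(archive_path, "rb") as archive:
                archive.seek(offset)
                return archive.read(length)
        ```

        Compressing each book as its own stream would cost a little ratio compared to one giant stream, but it’s what makes that kind of random access possible.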