4 alphabets #6

Open
jbenet opened this issue Jan 6, 2015 · 6 comments

@jbenet

jbenet commented Jan 6, 2015

I propose:

4 alphabets:

  • 😊 be256 - 1 emoji 1 byte
  • 😀 be512 - 1 emoji 2 bytes
  • 😃 be768 - 1 emoji 3 bytes
  • 😄 be1024 - 1 emoji 4 bytes

notes:

  • emoji are unicode, and since we're picking alphabets, we already have tables at play.
  • there are 1244 emoji, so we could sacrifice 4 to be alphabet identifiers. I suggest those above (increasing in happiness 😄 ).
  • Each alphabet should be contained in the next larger one, so there's no ambiguity about what number each emoji represents.
  • You get a smiley face at the beginning of every (self-describing) emoji-text!! (it would make me so happy)
  • if there is no leading encoding identifier (quirksmode), you assume the smallest alphabet possible; when you run into a character from a larger set, you bump up to that. statistically, we're very unlikely to find sequences of bytes like ([0-256]000)+, i.e. [10, 0, 0, 0, 2, 0, 0, 0, ... ]
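
To make the self-describing idea concrete, here is a minimal Python sketch of a be256-style codec. The alphabet is only a placeholder (256 consecutive code points starting at U+1F300), and the identifier emoji and the names `encode_be256` / `decode_be256` are assumptions for illustration; the thread does not define the actual tables.

```python
# Placeholder alphabet (assumption): 256 consecutive code points from U+1F300.
# The real be256 table is not specified in this thread.
BASE = 0x1F300
ALPHABET = [chr(BASE + i) for i in range(256)]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}

# Assumed identifier emoji for be256; it lies outside the placeholder alphabet,
# so it can never be confused with a data character.
PREFIX = "\U0001F60A"


def encode_be256(data: bytes) -> str:
    """Emit the alphabet identifier, then one alphabet character per byte."""
    return PREFIX + "".join(ALPHABET[b] for b in data)


def decode_be256(text: str) -> bytes:
    """Check the leading identifier, then map each character back to a byte."""
    if not text.startswith(PREFIX):
        raise ValueError("missing be256 identifier")
    return bytes(INDEX[ch] for ch in text[len(PREFIX):])
```

With this layout, `n` bytes of input always become `n + 1` characters: one identifier plus one character per byte.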

[ deleted a portion in which jbenet utterly fails to do simple math ]

Guys, be1024 is going to be huge sweet.


updates:

  • added 🔥 to sha1, and "(+1 alphabet)"
  • deleted math fail in which I claimed the equivalent of 2^10 = 2^32
@pfrazee
Owner

pfrazee commented Jan 6, 2015

just to note, sha1 is 160 bits long, so that example truncates it to 40 bits. a be1024 of the full sha1 would be 17 chars (1 encoding + 16 data)
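
The 17-char figure can be checked with a few lines of Python. This is only a sketch of the arithmetic: `encoded_length` is a hypothetical helper, and it assumes each char carries floor(log2(alphabet size)) bits, plus one leading identifier char.

```python
def encoded_length(payload_bits: int, alphabet_size: int, prefix_chars: int = 1) -> int:
    """Chars needed to encode payload_bits, including the identifier prefix."""
    # floor(log2(alphabet_size)) using integer math only
    bits_per_char = alphabet_size.bit_length() - 1
    # ceiling division: a partially filled last char still costs a whole char
    data_chars = -(-payload_bits // bits_per_char)
    return prefix_chars + data_chars


print(encoded_length(160, 1024))  # 17 (1 encoding + 16 data)
print(encoded_length(160, 256))   # 21 (1 encoding + 20 data)
```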

@jbenet
Author

jbenet commented Jan 6, 2015

@pfraze

5 char/hash * 4 byte/char * 8 bits/byte = 160 bits/hash

(i had to triple check)

but yeah, i missed + 1 char for encoding (corrected above). sad to lose a whole 1024 bits on encoding char.

@pfrazee
Owner

pfrazee commented Jan 6, 2015

i think you're miscalculating how many bits you can pack into a be1024 emoji. at most it's 10, right? 2^10 = 1024. you're saying 4 bytes, which would be 32 bits. we'd need 4,294,967,296 emojis for that
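
The same arithmetic as a small Python sketch (helper names are hypothetical): an alphabet of N symbols carries at most floor(log2(N)) bits per symbol, and packing k whole bytes into one symbol needs 2^(8k) distinct symbols.

```python
def bits_per_symbol(alphabet_size: int) -> int:
    """Whole bits one symbol can carry: floor(log2(alphabet_size))."""
    return alphabet_size.bit_length() - 1


def symbols_needed(bytes_per_symbol: int) -> int:
    """Alphabet size required to pack this many whole bytes into one symbol."""
    return 2 ** (8 * bytes_per_symbol)


print(bits_per_symbol(1024))  # 10
print(bits_per_symbol(1244))  # 10 (all 1244 emoji still give only 10 whole bits)
print(symbols_needed(4))      # 4294967296
```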

@jbenet
Author

jbenet commented Jan 6, 2015

@pfraze arghhhhhh you're totally right. idk what the hell I was thinking with 4 byte/char.

@pfrazee
Owner

pfrazee commented Jan 6, 2015

hah no worries, i did the same in some of the other issues

@jbenet
Author

jbenet commented Jan 6, 2015

yeah, i'm way less excited about be1024 now.

what are there 65,536 of? (... there are ~50,000 kanji. but i think that'd be way worse than looking at a hash, or even raw ascii.)
