Monday, December 17, 2012

Unique Short URLs


Today I was trying to create a unique short URL.

I had been using Java's UUID class for ID generation, but the IDs it produced (36 characters in the standard string form) were too long for our purposes.
The recommendation was to use something already in use on another project: hash(ip + timeofvisit).
I ended up using sha1(ip + timeofvisit), cutting the 20-byte digest in half to 10 bytes, and then Base64-encoding those bytes into a URL-safe string.
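
The steps above can be sketched roughly as follows. This is a Python illustration rather than the Java code I actually wrote; the function name short_id, the way the IP and timestamp are concatenated, and the stripping of the trailing '=' padding are all assumptions made for the example.

```python
import base64
import hashlib
import time


def short_id(ip: str, time_of_visit: float) -> str:
    """Hypothetical helper: sha1(ip + timeofvisit), halved, URL-safe Base64."""
    # SHA-1 always yields a 20-byte digest.
    digest = hashlib.sha1(f"{ip}{time_of_visit}".encode("utf-8")).digest()
    # Keep only the first half: 10 bytes (80 bits).
    half = digest[:10]
    # The URL-safe alphabet uses '-' and '_' instead of '+' and '/'.
    # 10 bytes encode to 16 characters, the last two being '=' padding,
    # which is dropped here (an assumption) to leave 14 characters.
    return base64.urlsafe_b64encode(half).decode("ascii").rstrip("=")


print(short_id("203.0.113.7", time.time()))
```

Note that this gives short IDs, not guaranteed-unique ones: two visits with the same IP and timestamp produce the same ID, and truncating the digest makes accidental collisions more likely than with the full hash.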

Later I got into a discussion about why I was using Base64 encoding to shorten the string. Here is how it went.
My point was that if you start with an MD5 hash (which produces a 128-bit digest) as a byte[],
then of the two representations, hex string and Base64, the Base64 version is the shorter string: 24 characters including padding, versus 32 for hex.

My partner argued the following:
as this shows, Base64-encoding a STRING obviously makes it larger.

dhcp199:apache-tomcat-7.0.33 randy$ php -r "echo md5('123').PHP_EOL; echo base64_encode(md5('123')).PHP_EOL;"
202cb962ac59075b964b07152d234b70
MjAyY2I5NjJhYzU5MDc1Yjk2NGIwNzE1MmQyMzRiNzA=

Below you can see my point. Passing true for raw_output makes md5 return the 16 raw bytes instead of the 32-character hex string, and the Base64 encoding of those raw bytes is only 24 characters.


randys-MacBook-Air:~ randy$ php -r "echo md5('123').PHP_EOL; echo base64_encode(md5('123', true)).PHP_EOL;"
202cb962ac59075b964b07152d234b70
ICy5YqxZB1uWSwcVLSNLcA==
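
For comparison, here is the same experiment as a small Python sketch, putting the three representations side by side with their lengths. The variable names are just for illustration.

```python
import base64
import hashlib

raw = hashlib.md5(b"123").digest()         # 16 raw bytes (128 bits)
hex_str = hashlib.md5(b"123").hexdigest()  # same digest as 32 hex characters

# Base64 of the hex STRING: 32 input bytes grow to 44 characters.
b64_of_hex = base64.b64encode(hex_str.encode("ascii")).decode("ascii")
# Base64 of the raw BYTES: 16 input bytes become 24 characters.
b64_of_raw = base64.b64encode(raw).decode("ascii")

for label, value in [("hex", hex_str),
                     ("base64(hex)", b64_of_hex),
                     ("base64(raw)", b64_of_raw)]:
    print(f"{label:12} {len(value):3} {value}")
```

Base64 always turns 3 input bytes into 4 output characters, so it only wins when it is applied to the raw digest rather than to an already hex-encoded string.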


There is good Python code here for generating unique, random-looking IDs from a sequential key:
https://github.com/adecker89/Tiny-Unique-Identifiers/blob/master/tuid.py
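
To give a feel for the general idea, here is a minimal sketch of one way to do it. It is not the linked code: the multiplier, the 32-bit range, the base-62 alphabet, and the names encode and decode are all assumptions chosen for illustration. It multiplies the sequential key by an odd constant modulo 2**32, which is a reversible one-to-one mapping, and then writes the result in base 62.

```python
ALPHABET = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
MODULUS = 2 ** 32
MULTIPLIER = 2654435761                 # any odd number is invertible mod 2**32
INVERSE = pow(MULTIPLIER, -1, MODULUS)  # modular inverse, needs Python 3.8+


def encode(seq: int) -> str:
    """Map a sequential key in [0, 2**32) to a scrambled base-62 string."""
    if not 0 <= seq < MODULUS:
        raise ValueError("key out of range")
    n = (seq * MULTIPLIER) % MODULUS    # bijective scramble
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode(token: str) -> int:
    """Recover the original sequential key from an encoded string."""
    n = 0
    for ch in token:
        n = n * 62 + ALPHABET.index(ch)
    return (n * INVERSE) % MODULUS


for key in (1, 2, 3):
    print(key, encode(key))
```

Because the mapping is one-to-one, distinct keys always give distinct IDs, unlike a truncated hash. It only makes the IDs look random, though; anyone who knows the constants can reverse them, so it should not be treated as a security measure.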
