Gnowt


gnowt: a URL shortening service that uses memorable words

About

With Twitter's move over to shortening every URL in every tweet, I quickly got sick of seeing URLs that look like shrt.lnk/sdl83JU, i.e. a domain name followed by an unfriendly muddle of uppercase letters, lowercase letters and numbers.

Gnowt is a proof-of-concept URL shortening service that stores URLs and provides a short link like:

gnowt.offend.me.uk/out/now/here

I felt that a URL shortening service that used words to identify the link would be more useful in a number of ways:

  • I can remember a short sequence of words much better than a sequence of characters
  • It's probably just as quick to type a few familiar words as it is to type an unfamiliar code
  • You can tell another human being, in person or over the phone, how to get to your URL much more easily if it's made of words
    • "Go to shrt dot lnk slash ess dee ell eight three upper-jay upper-yoo"
    • "Go to gnowt dot offend dot me dot uk slash out now here" (this will be better if I buy a nice short domain name, of course)
  • If the service knows it's looking for words, it can be flexible over input

What I mean by the last point is that, if you've entered a URL into gnowt and it says your new URL is http://gnowt.offend.me.uk/out/now/here, you can actually also access that as:

  • http://gnowt.offend.me.uk/out now here
  • http://gnowt.offend.me.uk/out_now_here
  • http://gnowt.offend.me.uk/out-now-here
  • http://gnowt.offend.me.uk/out.now.here

Whichever you find most aesthetically pleasing, I guess.
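As a rough illustration of that flexibility, here is a minimal Python sketch that reduces all of the variants above to the same sequence of words. It is not gnowt's actual code: the function name `normalise_path` is hypothetical, and lowercasing the words is my own assumption rather than something the service is known to do.

```python
import re
from typing import List
from urllib.parse import unquote

# The separators listed above: "/", space, "_", "-" and "."
_SEPARATORS = re.compile(r"[/\s_.\-]+")

def normalise_path(path: str) -> List[str]:
    """Split a request path into its words, whichever separator was typed."""
    decoded = unquote(path)  # a typed space usually arrives as %20
    # Assumption: lookups are case-insensitive, so lowercase every word.
    return [word.lower() for word in _SEPARATORS.split(decoded) if word]

if __name__ == "__main__":
    for variant in ("/out/now/here", "/out%20now%20here", "/out_now_here",
                    "/out-now-here", "/out.now.here"):
        print(variant, "->", normalise_path(variant))
```

Because every variant collapses to the same list of words, the lookup only needs to be keyed on that list.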

Stats

Mmmmmmm numbers.

One concern I had was that, once gnowt is storing a good number of URLs, my shortened URLs might get considerably longer than those of other services and so lose the benefit of being memorable.

To offset that possibility, I made sure that every word in the list was well known and unambiguous. I did this by:

  1. Taking a list of the top 6000 most frequent words
  2. Removing all but nouns, adjectives, and adverbs
  3. Removing plurals
  4. Removing profanities (I'd be upset (ok, maybe amused) if my URL ended up being gnowt.com/cock/balls/poo)
  5. Removing any words that are spelled differently in different forms of English (I'm thinking of colour vs. color here)
  6. Sorting the remaining words in order of size so that we use up shorter words first
  7. Removing any words longer than 6 letters
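The steps above can be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the script actually used: the `Candidate` record, the part-of-speech labels, and the `profanities` and `regional_spellings` sets are all hypothetical inputs that would have to come from a frequency list and some manual curation. Breaking ties alphabetically within each word length is also my own choice.

```python
from typing import Iterable, List, NamedTuple, Set

class Candidate(NamedTuple):
    """One entry from a (hypothetical) tagged word-frequency list."""
    word: str
    pos: str         # part of speech, e.g. "noun", "verb", "adverb"
    is_plural: bool

KEEP_POS = {"noun", "adjective", "adverb"}  # step 2
MAX_LEN = 6                                 # step 7

def build_word_list(candidates: Iterable[Candidate],
                    profanities: Set[str],
                    regional_spellings: Set[str]) -> List[str]:
    """Filter the candidates, then order them shortest first."""
    kept = {
        c.word.lower()
        for c in candidates
        if c.pos in KEEP_POS                          # step 2
        and not c.is_plural                           # step 3
        and c.word.lower() not in profanities         # step 4
        and c.word.lower() not in regional_spellings  # step 5
        and len(c.word) <= MAX_LEN                    # step 7
    }
    # Step 6: shortest words first (alphabetical within a length, by assumption).
    return sorted(kept, key=lambda w: (len(w), w))
```

Applying the length limit before or after the sort gives the same list, so the sketch filters first and sorts once at the end.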

The word list contains 1693 words. This means that there can be:

  • 1,693 one-word URLs
  • 2,866,249 two-word URLs
  • 4,852,559,557 three-word URLs
  • 8,215,383,330,001 four-word URLs

By the time it gets up to four-word URLs, that's a total of 8,220,238,757,500 URLs.

Let me say that again: 8 trillion, 220 billion, 238 million, 757 thousand and 500 URLs.

(When you get to five-word URLs, it's nearly 14 quadrillion.)
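For anyone who wants to check the arithmetic, here is a short Python sketch. It assumes, as the figures above imply, that word order matters and that a word may be repeated within a URL, so there are 1693 to the power n possible n-word URLs. The function names are just for illustration.

```python
WORD_COUNT = 1693  # size of the word list quoted above

def urls_with_exactly(n_words: int, vocab: int = WORD_COUNT) -> int:
    """Number of distinct URLs made of exactly n_words words."""
    return vocab ** n_words

def urls_up_to(n_words: int, vocab: int = WORD_COUNT) -> int:
    """Number of distinct URLs made of between 1 and n_words words."""
    return sum(vocab ** k for k in range(1, n_words + 1))

if __name__ == "__main__":
    for n in range(1, 6):
        print(f"{n}-word URLs: {urls_with_exactly(n):,} "
              f"(running total: {urls_up_to(n):,})")
```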

I reckon that's enough to be getting on with for now. If I'm wrong, I'm wrong :)
