Monday, June 30, 2008

Stop! Words.

As promised, I have redone Wordle’s handling of so-called “stopwords”, words that are too common to visualize in most cases. Now, when you give Wordle some text, it does its best to figure out what language the text is in, and hides the stopwords for that language. Of course, as before, you can ask Wordle not to remove common words, via the “Language” menu.
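For the curious, here is a minimal sketch in Python of one way such a guess can work. It is not Wordle's actual code. The tiny STOPWORDS lists, the overlap-count scoring, and the function names are all hypothetical. The sketch scores each candidate language by how many of the text's words appear on that language's stopword list, then filters out the winner's list.

```python
import re
from collections import Counter

# Tiny illustrative stopword lists (hypothetical; real lists are far longer).
STOPWORDS = {
    "english": {"the", "and", "of", "to", "a", "in", "is", "that"},
    "french": {"le", "la", "les", "et", "de", "un", "une", "est"},
    "german": {"der", "die", "das", "und", "ist", "ein", "eine", "nicht"},
}

def tokenize(text):
    # Lowercased word tokens; \w is Unicode-aware for str patterns.
    return re.findall(r"\w+", text.lower())

def guess_language(words):
    # Score each language by how many tokens appear on its stopword list.
    scores = {lang: sum(1 for w in words if w in stops)
              for lang, stops in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    # No stopword hits at all means we can't guess.
    return best if scores[best] > 0 else None

def word_counts(text, remove_common=True):
    # Returns (word counts, guessed language or None).
    words = tokenize(text)
    lang = guess_language(words) if remove_common else None
    stops = STOPWORDS.get(lang, set())
    return Counter(w for w in words if w not in stops), lang
```

Short lists like these are easily fooled by very short or mixed-language text. Longer lists make the vote much more reliable.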

I still need stopwords for Hebrew, Nepali, and (of course) many other languages. Please let me know, dear Wordle users, if you can scrape up these or any others that I’ve missed.

Make Wordles of Blogs

One of the more frequently requested features has been “Let me just paste the URL of a blog, so I can make a Wordle of the contents.” Because I don’t want to do any fetching or parsing of XML on the Wordle server side, I haven’t been able to implement this. However, our friends at the Google have done all of the hard work for me: the Google AJAX Feed API.

So, now, you can enter the URL of anything that either is a feed (RSS or Atom), or has a feed, and Wordle will fetch all of the text it can get right there in your browser, and fire up a Wordle.

Thank you, the Google.

Friday, June 27, 2008

I wish I could remove stars from certain feature requests.

The top four feature requests in Google App Engine’s issue tracker:

  1. please add java or groovy support
  2. PHP support is a must
  3. Please add ruby support
  4. Add Perl support

While it’s a matter of taste to prefer one language over another, these are real head-scratchers for me. I can’t understand why the folks who have starred these issues don’t simply learn Python and start creating internet-scale apps. What are they waiting for? What are they afraid of?

The problem with those feature requests is that implementing them would take development, documentation, and QA time away from improving the existing stuff.

I know that the GAE developers pay attention to “star power” in their issue tracker, but I hope that this class of requests will be dismissed with prejudice.

New layout options

The “Layout” menu now lets you choose the overall shape of the Wordle cloud: straight(er) or round(er) edges. These shapes become especially apparent in Wordles with many words, more than a couple hundred. Play with the new settings, and see what you like.

One good strategy is to wail on the new “Randomize” button, which will automatically apply new color, font, and layout settings.

Remember that you can undo everything you do. It’s safe to try things, even if you have a layout you like.

Other settings you may have missed: maximum words, prefer alphabetical order. These have a profound impact on the result.

Thursday, June 26, 2008

Now with less cholesterol

I’ve spent a couple of hours in the kind of numerical tweaking that visualization programmers do a lot of. You wind up with many mysterious double-precision constants floating around your core layout routines, and you might not remember how you came to them, or why they’re there. If you’re a “software engineer”, like me, you’ll remember that it’s bad to have random numeric constants in your code, so you’ll replace them with named constants. On the other hand, the constants wind up with names like “SMUDGE_FACTOR”, “TWEAKISHNESS”, and “RHO”.

But anyway, my aim was to make Wordles a little less egg-shaped, especially high-density ones, with many words. It was hard to balance that desire with the requirement to keep the words fairly tightly packed, and I’m not sure I’m quite there yet. Now some of the Wordles take on rather a kidney shape, which is certainly no better. I’m working on it.

Wednesday, June 25, 2008

Things to come

For the curious, here are the improvements I’m most likely to make in the coming weeks.

  • Completely rewrite Wordle’s handling of “stop words”, common words that, for most uses, should not appear in the visualization. Currently, there is one huge list. There needs to be a separate list for each language (easy) and Wordle needs to make a heuristic guess as to which language is in use (harder).
  • Custom palettes.
  • “Next”, “Previous”, and “Random” buttons on the wordle-viewing page.

There are other design problems I’m aware of—no search!? no [insert your favorite language here]!?—and I’ll get to them as soon as I’m able.

Please keep your emails coming. They’ve been critical in showing me where the flaws lie.

Monday, June 23, 2008

Keep Words Together with Tilde

Wordle now displays a “build number” in the lower right-hand corner of the page, which means that I can now say...

As of build 506, you may now join words together with the tilde character (~).

So, for example, the text “The~Who played Leeds with Mott~the~Hoople and Bruce~Springsteen” might be rendered with “The Who”, “Mott the Hoople”, and “Bruce Springsteen” each appearing as a single word.

This ought to ease the pain folks have experienced in trying to emit a Unicode non-breaking space.
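Here is a minimal sketch of how tilde joining might be handled. It assumes, hypothetically, that a word is a run of word characters, and the name count_words is illustrative, not Wordle’s. The tilde counts as part of a word while counting and becomes a space only for display.

```python
import re
from collections import Counter

# A word is a run of word characters, optionally chained to further runs
# by tildes, so "Mott~the~Hoople" matches as a single token.
JOINED_WORD = re.compile(r"\w+(?:~\w+)*")

def count_words(text):
    # Count each tilde-joined token as one word, displaying tildes as spaces.
    return Counter(tok.replace("~", " ") for tok in JOINED_WORD.findall(text))
```

With the example sentence above, this yields seven distinct words, among them “The Who” and “Mott the Hoople”.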

Saturday, June 21, 2008

I Think I Understand

Wordle now guesses which Unicode block your text mostly inhabits, chooses an appropriate font to start with, and emphasizes suitable fonts in the font menu. Detected blocks include Arabic, Cyrillic, Greek and Coptic, Hebrew, Basic Latin, and the various Latin Supplements and Extensions. The Arabic block includes the whole Farsi (Persian) alphabet.
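A minimal sketch of this kind of guess, in Python, follows. Python's standard library has no block-lookup function, so the sketch hard-codes a few block ranges from the Unicode standard. The function names and the simple majority vote are assumptions, not Wordle’s actual logic.

```python
from collections import Counter

# A few Unicode block ranges (inclusive start, inclusive end, block name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
    (0x1E00, 0x1EFF, "Latin Extended Additional"),
]

def block_of(ch):
    # Name of the block containing ch, or None if it isn't in our table.
    cp = ord(ch)
    for start, end, name in BLOCKS:
        if start <= cp <= end:
            return name
    return None

def dominant_block(text):
    # Vote only with letters; ignore letters outside the table.
    counts = Counter(block_of(ch) for ch in text if ch.isalpha())
    counts.pop(None, None)
    return counts.most_common(1)[0][0] if counts else None
```

Counting only letters matters. Spaces, digits, and punctuation all live in Basic Latin and would otherwise tip mixed text toward a Latin font. The Farsi-specific letters, such as U+06CC (Farsi yeh), fall inside the Arabic range.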

More and better Farsi fonts are coming soon, thanks to my frequent correspondent and personal authority on Farsi orthography, Alireza Farrokhi.

I am *almost* clever enough.

When I added non-breaking spaces, I warned that they render as "unknown character" boxes in most of the fonts. Blogger/hacker Ned Batchelder wrote to me and said, "Why don't you just replace the non-breaking spaces with regular spaces when rendering?"

I was like, "duhhhhhhhh."

Thank you.

Friday, June 20, 2008

Feeds for People

In designing Wordle, I felt it was important to put up as few barriers as possible in front of the act of saving one's work. Hence: no logins! On the other hand, that means that it's the wild wild West out there for user names. On the other other hand, you may want to say to someone, "Here's my work."

So, you may now click on user names in the gallery, and that will show you wordles created with that user name. The form of the URL is

and the associated Atom feed is

Because anyone can adopt any user name when saving, be cautious about presenting the URL for your special user name as "My Wordles".

I wonder if I should create logins with official user names. Opinions?

Keeping words together

Several people have written to me to ask whether they could keep certain words together, as in "The Who", or "Chevy Chase". I was about to compose a FAQ entry saying, "Sorry, no," when I decided instead to at least make it possible.

You'll need to generate the Unicode "non-breaking space" character, \u00A0, and place it between words you wish to keep joined. How you generate that character is operating-system-specific. Perhaps users can leave tips for each other here.
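If you produce your text with a script, Python can write the character as "\u00A0". There is a catch if you also split the text into words in Python. Both str.split() and the regular-expression class \s treat U+00A0 as whitespace, so they would break "The\u00A0Who" apart. Below is a minimal sketch, with hypothetical function names, that splits only on ordinary ASCII whitespace. It also shows a plain space for display, since many fonts draw U+00A0 as an "unknown character" box.

```python
import re

NBSP = "\u00A0"

def tokenize(text):
    # Split on ASCII whitespace only, so U+00A0 stays inside a token.
    # (str.split() and \s would both split on U+00A0.)
    return [t for t in re.split(r"[ \t\n\r\f\v]+", text) if t]

def display_form(token):
    # Show a regular space where the non-breaking space was.
    return token.replace(NBSP, " ")

tokens = tokenize("The\u00A0Who and Chevy\u00A0Chase")
print([display_form(t) for t in tokens])
```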