Blogger Hacks, Categories, Tips & Tricks

Wednesday, September 27, 2006
With subscribers foremost in mind, Julie has some ideas for recent migrants to Blogger Beta on keeping your readers happy and connected to the good stuff that you're publishing. These include announcing your new feed URL & updating your FeedBurner source feed. All good stuff.....

Posted at 7:00 PM by John.
Tuesday, September 26, 2006

Taking as a starting point the idea that there is undiscovered quality content out there, I've been doing a lot of thinking lately (hence the smell of burning) about the structure of the blogosphere, "microspheres philosophy", and the pluses and minuses thereof. I'm interested in three main issues:

  1. The reputation / authority economy in the blogosphere, & how that creates & encourages "microspheres" - communities of shared interest with similar authority levels.

  2. The challenge of developing your blog, when the above-mentioned structure can be hostile to competition.

  3. The limitations of the structure of the blogosphere, and the emergence of parallel / duplicate microspheres.

So what is a microsphere? Well... the notion of "the blogosphere" is no longer very helpful (if it ever was). There are now blogs about anything you can imagine, and many thousands of "microspheres": clusters of sites that are interconnected by links and share a community of readers. To my mind, the microsphere is a much more useful tool for thinking about the social & informational connections between blogs than the concept of a monolithic "blogosphere." One microsphere (Freshblog's, as defined by its inbound links) is represented graphically by the new visualizations at The Truth Laid Bear.


I imagine each blog as having multiple microspheres.... Inbound links, outbound links, readers, subscribers, sources.... These are the online communities in which either bloggers or their content participate. Significantly, these communities seem to coalesce around two factors. The first, similar content, is no great surprise: I want to read, be informed by, & respond to blogs that post about stuff that I'm interested in. The second, and far more interesting to me, is authority.


1. The Reputation / Authority Economy

I have riffed previously on how useless rankings are for anything other than estimating the "weight" of incoming links or assessing the relative performance of your blog against the blogs you want to emulate. What else is the reputation / authority economy about?

  • Adding to the conversation with relevant, original and insightful material

  • Drawing attention to that material

  • Actively participating in discussions to support your positions / respond to challenges / integrate new material

What is the currency in this economy? Links. The relationship here is, of course, cyclical:

  • The more inbound links you have, the more people will see your stuff

  • The more people see your stuff, the more links you'll have

This is great, because it is self-reinforcing, & a single post can reverberate through the blogosphere for a good while and change the fortunes of your blog. It also means, though, that it is hard to get started. Authoritative bloggers are selective in their linking, and very authoritative bloggers are very selective. This suggests that microspheres tend to include blogs of similar levels of authority, & that it is difficult to gain attention from blogs which are more authoritative than your own. Community formation amongst peers seems to be the strongest phase in the formation of a microsphere; these blogs then grow in reputation and authority together.

2. Niche Building: Becoming Reputable and Authoritative

The Squidoo folks invite the authors of complete lenses to "create your own recommendation economy and traffic network online." Well, isn't that just the thing of it. The work of years in 10 words or less.... Ultimately (and at multiple levels as your microsphere expands) your blog will gather momentum and come to be viewed as both reputable and authoritative by some percentage of your readers. How to get this done? Time is critical, of course (you can't turn this around in a week!), as are links and active participation in the conversation.

Steve Rubel encourages us to "be generous to be effective." To my mind, that's always been the key. Link as you wish to be linked to! I first noticed a significant percentage of the blogs that I read regularly because they linked here, or because they were "recommended" by a link from another regular read. Dialogue in comments and e-mail also raises the profile of your blog & lets you sharpen opinions / clear up confusion in exchanges with individual readers. Ultimately, I think that reputation and authority in the virtual world derive from the same sources as in the real world (however much we'd like to think otherwise).... Civility and relevance / interest. To grab a niche, write well about something that you know about, and add value to the debate.

While we're talking about niches, Amit links to Mike Rundle's plea to "go find your own niche" rather than trying to emulate the top 100. Agreed. Write what you know. The structure of blogging as a medium will allow you to choose any topic that sounds good to you. So let's say you find a niche... a half-dozen blogs that talk about the same good stuff that you do. How do you get a seat at the table?

3. Parallel Microspheres

Whether deliberately or coincidentally, the blog world is a place in which multiple communities address similar concerns, and are not always integrated effectively with one another. What is it about the structure and functions of blogs that fails to discourage duplication?

  • The structure of blogs encourages and places disproportionate value on "fresh" content, and so there is a pressure to produce.

  • Mechanisms for finding content are imperfect, & so existing quality content can't always be found - this encourages duplication.

  • Authors may not search for existing content before making their contribution.

  • The volume of content in the blogosphere is too great to navigate comprehensively, and so we define our own "microspheres" and navigate those.

  • Duplicate content can be rewarded if your post has the highest profile.

Is the duplication of information necessarily a bad thing? No. Content theft is, of course, but the offering of similar information in multiple locations spreads the wealth and maximizes the chance that a curious reader will find what they're looking for. There is, though, an inefficiency there. If you can find my "page titles optimization" hack, for example, your time might be better spent improving upon it, rather than re-creating it.

What about integration? Well, there's a counter-argument there too. Maybe your "page titles optimization" hack is your ticket to the big time (or at least to a bigger time?) & so not only do you want to present your hack to the world, but you want to obliterate mine in the process (bwahahahah!) Now sure, competition drives creativity to a degree, but collaboration (at least in my experience) has a significant role to play.

How might the blogosphere recognize and reward collaboration? Mary Hodder has argued persuasively for a "visible microsphere" of declared interconnections between authors:

We want to see where people link, what the relationships are between them, and make our own decisions as readers and conversants about what those author relationships mean, as we take in the work. It's the author who matters, and the author who must decide how and what to show about their own biases and relationships. Because otherwise the online communities will decide for that author. It's so much cleaner if authors and creators give it to us up front. Readers like it and we need it to evaluate trust because authors have become uncommodified.

Such a "declared microsphere" would also make it easier for authors to locate related content, & to identify online communities that they'd like to integrate with. It would raise the level of "personal" interaction in the blogosphere, too, and therefore demand a certain standard of civility.

4. Over to You

As you can tell by now, I'm interested in blogging as a social process, and as an opportunity for community building, collaboration and the sharing / refining of ideas so that they might reach their greatest potential. Nor am I alone. Confused of Calcutta explores a list of six unintended consequences of blogging. They all have to do with testing opinions, forming communities and opening up new avenues for collaboration. Meanwhile John at Beelerspace develops a great theory of blogs as tools for social networking and communication:

We have created blogs – and other technologies like it (email, for example) – so that we can maintain our village, so that we don’t have to create new meaningful relationships every 2-3 years. Technology hasn’t created a global village, it has enabled global villages. Blogs and email are a direct evolutionary reaction to the demanding mobility of modern society.

I'm curious to hear your ideas for consolidation and integration.... I'm not talking about censorship, exclusion, or steamrolling the spheres that operate in parallel to yours, but about ways to form stronger bonds between blogs that have similar content, & ways to present a single (expanding) topical microsphere that encompasses more voices. This might be a new third-party service for blogs, tighter search, more widespread use of search by bloggers, a consistent pattern of tagging, a declared community hub, or more personal activity (off site) between bloggers. What do you think? Write your own post & track back, or leave a comment here.

  • How do we tie this thing together more tightly?
  • How can we find related and relevant content more effectively?
  • How do we restructure the process so that there is a greater reward for collaboration / inclusion & a diminished incentive for excluding / marginalising “competing” blogs?
  • How might we encourage or achieve greater "vertical" interaction between blogs to complement the "horizontal" process of peer-to-peer microsphere formation?
Answers on a postcard please.....

Posted at 8:44 AM by John.
Monday, September 25, 2006
More good stuff from Hoctro, this time conditional widget code that displays the "recent posts" and "recent comments" feed widgets on the main page of his blog & leaves them off the post pages. Streamline your post pages with Blogger Beta's new if / then syntax (which is also handy for fixing your comment count). As Hoctro points out, the conditional code can be put to work with any widget. Cool!

Posted at 4:54 PM by John.
My eye was recently caught by a Slashdot comment on Scott's code-table submission. The uber-geek commenter in question, in an attempt to belittle Scott's efforts, wondered "just how hard is it to view source".... (har-de-har!)

Not hard, of course, but in Blogger Beta view source won't show you the template code. Beta Help says that "in your published blog, all <b:section></b:section> and <b:widget></b:widget> tags will be replaced with <div></div> tags, which will have the specified ID. So you are welcome to refer to, for example, div.header or div.myList in your CSS if you want to."

Long & short of it: view source will no longer give you cut-&-pasteable examples of the features that you want to add.

I bring this up because it presents an opportunity for collaboration & for the sharing of widget code. Singpolyma has taken a significant step down this road by establishing & publishing a proposed shared standard for JavaScript widgets known as WidgetData. Excellent.

Might I also propose a "beta code" category on the wiki for swapping & sharing beta widgets?

Posted at 8:30 AM by John.
Sunday, September 24, 2006
Another very useful hack for Blogger Beta from Michael at Basang Panganip. This one takes advantage of Beta's new if / else architecture to correct the grammar of the comment count (0 comments, 1 comment, 2 comments etc). Michael also has a helpful review of the history of the hack, & catalogs your existing options for getting this done if you're not on beta yet. Great stuff!
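The rule the hack encodes is simple enough to state in a few lines. Here's a rough JavaScript sketch of the same if / else logic, for illustration only: Michael's version lives in the Beta template's conditional tags, and the function name commentLabel is hypothetical.

```javascript
// Hypothetical helper mirroring the if / else rule: only a count of exactly 1
// takes the singular "comment"; 0, 2 and everything else take "comments".
function commentLabel(count) {
  if (count === 1) {
    return "1 comment";
  } else {
    return count + " comments";
  }
}
```

So commentLabel(0) gives "0 comments" and commentLabel(1) gives "1 comment", which is exactly the case the old one-size-fits-all template got wrong.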

See Also: Freshblog: Grammatically Correct Comment Count

Posted at 2:38 PM by John.
Saturday, September 23, 2006
Brian Clark at Performancing compares Digg with Del.icio.us, & finds that, in the long run, 'tis better to be Del.icio.us'd than to be Dugg.

Creating content with the bookmark in mind tends to make you concentrate more on delivering truly useful resources, rather than just pulling stunts to pull traffic. Getting the right type of traffic (rather than just tons of traffic) is one of the main keys to a successful blog. So, aim for getting a bookmark, and you just might get Dugg too.

Go read the 3 reasons why.... As you know I'm philosophically pre-primed to agree with Brian that quality content is the critical issue. Whether you carefully craft your posts so that they get bookmarked, so that your ads get eyeballs, or so that they make sense and can inform / persuade your reader, the important part is that you craft (the majority of) your posts, rather than slapping up the first thing that comes to mind. Say it with me... Content then Linking then Ranking.

Posted at 5:25 PM by John.
In a post that I missed at the time, Phydeaux3 has decoded the info required to determine label-feed addresses in Blogger Beta. Now you can set your beta blog up so that readers can sub to your labels!
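Once you know the pattern, building a label-feed address is just string assembly. The sketch below assumes the GData-style category path /feeds/posts/default/-/<label> on your blogspot address; that pattern is an assumption here, so check it against Phydeaux3's post before you publish subscription links. labelFeedUrl is a hypothetical helper.

```javascript
// Assumption: label feeds follow the pattern
//   http://<blog host>/feeds/posts/default/-/<label>
// Verify against Phydeaux3's write-up before relying on it.
function labelFeedUrl(blogHost, label) {
  // Encode the label so spaces and punctuation survive in the URL path.
  return "http://" + blogHost + "/feeds/posts/default/-/" + encodeURIComponent(label);
}
```

For a label like "Blogger Beta", the space comes out as %20, which is the bit people most often trip over when hand-writing these links.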

Posted at 5:08 PM by John.
Wednesday, September 20, 2006
In the ceaseless quest for stuff to roll into my sidebar, I thought I'd try that old standard, the reader poll. Beloved by all media, it's found a natural home in the blogosphere. And why not: it's a great way to make your blog interactive, foster a sense of community and garner some insights into your readership.

The downside is that (as any losing politician will happily point out), polls are meaningless, arbitrary, prone to abuse, afford only a snapshot view and, for participants, finding out the results isn't exactly exhilarating. From a bloghacking perspective, the widespread availability of good quality in-blog polling options means it's very quick and easy to get a polished result with hardly any effort. Honestly, where's the fun in that? So I thought I'd turn all that on its head with integrated prediction markets.

Read on for how to build a deeply-committed interactive community around your blog, using the latest ideas in applied economics and cutting-edge web technology.

The Prediction Market Concept

The idea behind prediction markets is that if you want to know what will happen, you should let people speculate (ie bet) on the possible outcomes (or events). The hope is that the prices in the market represent the distilled, aggregate assessment of the underlying odds (or probabilities) by informed participants. Having a large group of people put money on the line encourages individuals to act on (and so reveal) their private information, and new information is quickly absorbed into prices, showing up as price movements (ie changes in odds).

These markets have been put to use (very successfully) in a range of human activities. Some of the more notable ones include picking the US president, picking Oscar winners and picking champion sports teams. There was also the unfortunate matter of the US Defense Department's Policy Analysis Market, designed to predict coups and the like. It was lambasted as a "terrorism futures market" and unceremoniously dumped. Of course, big, forward-looking companies like Google are not stymied by knee-jerk popular opinions and so use these internally.

Prediction Markets as Polls

Generally speaking, a poll lets you pick one option (from a set), once. Some might ask about your preferences (not open to speculation), but a lot ask you to make a prediction about the future or other unknowns. This is where Prediction Markets come to the fore.

You need to set up a contract type for each alternative, with a "pay-off" (say, $100) if that outcome eventuates. You then release a fixed number of contracts of each type (say, 100), and let readers buy and sell at whatever price they wish. If someone thinks the chance of an event is 15%, then they'll pay up to $15 for a $100 pay-off. (That is, a 15% shot at $100 is worth $15.) If a second person thinks the odds are really 30%, then they'll pay up to $30, and the first person will gladly sell them the contract. In this way, market mechanisms push prices towards the best assessment of the underlying odds. This is sometimes dubbed "the wisdom of crowds".
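The arithmetic is just probability times pay-off. Here's a minimal sketch, working in whole percentages to sidestep floating-point rounding; fairPrice and tradeRange are hypothetical names for illustration, not part of any market's API.

```javascript
// Pay-off of a winning contract, as in the example above.
const PAYOFF = 100;

// What a contract is worth to someone who puts the outcome's chance at `percent`.
function fairPrice(percent) {
  return (percent * PAYOFF) / 100;
}

// The current holder and a would-be buyer both gain from a trade at any price
// between their two valuations; if the buyer doesn't value it more, no trade.
function tradeRange(holderPercent, buyerPercent) {
  const low = fairPrice(holderPercent);
  const high = fairPrice(buyerPercent);
  return high > low ? { low: low, high: high } : null;
}
```

With the figures above, tradeRange(15, 30) gives a range of $15 to $30. As readers keep trading inside ranges like that, the price moves towards the crowd's estimate.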

Participants can trade as often as they like across whichever outcomes they choose. Typically, they would hold a "portfolio" of contracts at different prices, buying and selling as news comes to hand. When the event materialises, the market owner "closes out" the contracts: the "winning" contracts pay out $100 each and all the rest are worthless. Savvy pundits aim to hold more of the winning contracts than anyone else, bought at cheaper prices. (Hey, if you just paid $99 for a winning contract, you're hardly ahead, are you?)
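Closing out is equally mechanical. A rough sketch of settling one trader's portfolio, assuming a hypothetical holdings format of {outcome, quantity, costPerContract}; a real market tracks more than this (short positions, cash balances and so on).

```javascript
// Settle a portfolio once the event happens: winning contracts pay `payoff`
// each, everything else pays nothing. Holdings format is hypothetical.
function closeOut(holdings, winningOutcome, payoff) {
  let payout = 0;
  let cost = 0;
  for (const h of holdings) {
    cost += h.quantity * h.costPerContract;
    if (h.outcome === winningOutcome) {
      payout += h.quantity * payoff;
    }
  }
  return { payout: payout, cost: cost, profit: payout - cost };
}
```

A single winning contract bought at $99 settles for a profit of just $1, which is the point of the $99 quip: being right isn't enough, you also have to have been right early.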

So, you can see why prediction markets outperform polls: you can "spread your bets" across multiple outcomes, revise your estimates continuously and see what "odds" (prices) others have set. As a blog publisher, this translates into more readers, more monitoring activity, more discussion and an emotional commitment to keep coming back as news comes to hand.

Implementing Prediction Markets

This might all be sounding a bit pie-in-the-sky, right? Well, as they say on the cooking shows, here's one I prepared earlier. The Aussie Rules Misbehaviour Market is a prediction market I set up last month that allows people to speculate on which football club will next see a player appear in court. (We have a problem with criminality amongst professional footballers where I live.) Check out the market link. You can see there are 16 mutually-exclusive options (one for each team in the league). There are presently 13 participants in the market actively trading contracts, and their trading, in turn, determines prices. The only footballers in court this past month have been there for prior matters (which don't count under my rules), so I haven't "closed out" the current contracts. Once that happens, I'll start it all up again for the next round of speculation.
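With 16 mutually-exclusive options, and assuming exactly one club closes out at the pay-off, the prices together imply a full set of odds. A minimal sketch of reading them off under that assumption: divide each price by the sum of all prices, so the implied probabilities add to 1 even if traders have pushed the total above or below the $100 pay-off. impliedProbabilities is a hypothetical helper.

```javascript
// prices: current contract prices, one per mutually-exclusive outcome.
function impliedProbabilities(prices) {
  const total = prices.reduce(function (sum, p) { return sum + p; }, 0);
  if (total <= 0) {
    throw new Error("Prices must sum to a positive number");
  }
  // Normalize so the implied probabilities sum to 1.
  return prices.map(function (p) { return p / total; });
}
```

For example, three contracts priced $30, $30 and $60 (total $120) imply odds of 25%, 25% and 50%.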

My technology partner in this dubious exercise is CrowdIQ, a Web 2.0 startup that, amongst other things, makes a damn fine free prediction market product. They host dozens of markets in all sorts of sectors, and I heartily recommend them for this purpose. Please note, this is play money only, so only egos are at stake for now. Still, that's a sufficient motivator to get things rolling.

CrowdIQ makes it easy to set up a market, taking one minute to create an account and perhaps ten minutes to get the market up and running. It has nice financial engineering flourishes: an IPO (Dutch auction style, like Google's) to set the initial prices, limit buys/sells and short selling.
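For the curious, here's a simplified uniform-price (Dutch) auction of the general kind used for Google's IPO: fill the highest bids first until the fixed supply runs out, then charge every winner the lowest accepted price. This sketches the general technique under those assumptions, not CrowdIQ's actual allocation rules, which may differ (for instance in how ties or partial fills are handled); dutchAuction and the bid format are hypothetical.

```javascript
// bids: hypothetical [{ bidder, quantity, price }]; supply: contracts on offer.
function dutchAuction(bids, supply) {
  // Highest bids are filled first.
  const sorted = bids.slice().sort(function (a, b) { return b.price - a.price; });
  const allocations = [];
  let remaining = supply;
  let clearingPrice = null;
  for (const bid of sorted) {
    if (remaining === 0) break;
    const filled = Math.min(bid.quantity, remaining);
    allocations.push({ bidder: bid.bidder, quantity: filled });
    remaining -= filled;
    clearingPrice = bid.price; // lowest price accepted so far
  }
  // Every successful bidder pays the same clearing price.
  return { clearingPrice: clearingPrice, allocations: allocations, unsold: remaining };
}
```

The appeal of the uniform price is that bidders have less reason to shade their bids: you state what you think a contract is worth, and you pay the market-clearing price, not your own bid.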

More importantly, they publish the market as an RSS feed. For those of you using Blogger Beta, you should be able to just plug the feed into a sidebar widget and get the information flowing. (Or, install it using WidgetData for nice presentation.) They also have a REST API, which means you can programmatically poke and prod at your market, if you so desire.

Collective Wisdom - Prediction Market Blog Integration

Not surprisingly, I wrote my own hack - humbly dubbed Collective Wisdom (source code) - for rolling in the information. The goals are two-fold. Firstly, to display a formatted list of current contracts and prices in the sidebar. Secondly, to highlight keyword occurrences in the blog text, with current prices and links, much as the financial media does for company name mentions.

While I sweet-talked the principals at CrowdIQ into adopting JSON (Hi, Chris!), there are still some technical foibles, so I run their RSS feed through Singpolyma's Ning converter. The client-side script uses that info to build a drop-down menu with the current contracts and prices, with extra details in the mouseover. It also scans the blog page for text that matches a contract name and inserts a link to that contract along with its price and other details.
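The menu half of that is straightforward once the converted feed is in hand. A minimal sketch, assuming the feed has already been turned into an array of hypothetical {name, price, url, details} objects (the real field names from the converter may differ); buildContractMenu and the CIQ-menu class are hypothetical too.

```javascript
// Escape text before dropping it into HTML attributes or element content.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// contracts: hypothetical [{ name, price, url, details }] records.
function buildContractMenu(contracts) {
  const options = contracts.map(function (c) {
    // The title attribute supplies the mouseover text.
    return '<option value="' + escapeHtml(c.url) + '" title="' + escapeHtml(c.details || "") + '">' +
      escapeHtml(c.name) + " ($" + escapeHtml(c.price) + ")</option>";
  });
  return '<select class="CIQ-menu">' + options.join("") + "</select>";
}
```

In the page, you'd write the result into the placeholder element, eg by setting the innerHTML of the element whose ID you chose for the list.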

At this point, you might want to check out an example page. In the sidebar, check under "Informants' Tips" for the contract menu. In the blog text, look in the second block of quoted text. You'll see that two team names (Cats and Tiger - it's a feline article) have been dolled up with a fancy stock code and price and linked to the contract on CrowdIQ, with more details in the mouseover.

The Getting of Wisdom

Want that for yourself? Once your market is up and running on CrowdIQ, install this code into the header of your blog (or in the external script, if that's how you arrange things):


<script type="text/javascript" src = "http://ghill.customer.netspace.net.au/wisdom/wisdom-dev-v01.js">
</script>

<script type="text/javascript">
function setCollectiveWisdom()
{
// Collective Wisdom Parameters

crowdRSS='http://www.crowdiq.com/opex/rss/ ...xml'; // URL of market RSS feed
crowdNode='collective-wisdom'; // ID of div to put list
scanNode='main-content'; // ID of div to scan for keywords

doHighlight=true; // highlight contract names found inside scanNode
doList=true; // build the contract list inside crowdNode

return;
}
</script>

Update: Code truncated on suggestion of Singpolyma (thanks, chief!).

The options are straightforward. crowdRSS is the URL on CrowdIQ for your market's RSS feed. crowdNode is the ID of the element where the list will be put (see below). scanNode is the ID of the div to scan for keywords; use this to stop the script interfering with other elements. Depending on your blog template, it might be something like "main-body" or "main". The do... variables simply turn their respective functions on or off. Lastly, if you don't want the highlighting code to use the whole text of the contract name, you can define reN as your own custom regular expression. (In my case, I only wanted to use the last word of the contract name, eg "Demons" from "Melbourne Demons".) Don't touch the launch code.
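To make the last-word idea concrete, here's a sketch of last-word matching with first-occurrence replacement, done on a plain string. It assumes hypothetical contract records of the form {name, code, price, url}; it is not the Collective Wisdom source, and like any string-level replace on HTML it can also match inside tags or attributes, so treat it as an illustration rather than a drop-in.

```javascript
// Escape regex metacharacters so a contract name is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Highlight the first whole-word occurrence of the last word of the contract
// name, eg "Demons" for "Melbourne Demons". contract: hypothetical record.
function highlightFirst(text, contract) {
  const words = contract.name.trim().split(/\s+/);
  const re = new RegExp("\\b" + escapeRegExp(words[words.length - 1]) + "\\b");
  // No "g" flag, so only the first match is replaced. A replacer function is
  // used because "$" sequences such as "$&" are special in replacement strings.
  return text.replace(re, function (match) {
    return '<a class="CIQ-highlight" href="' + contract.url + '" title="' + contract.name + '">' +
      match + "</a> [" + contract.code + " $" + contract.price + "]";
  });
}
```

The output reuses the CIQ-highlight class from the stylesheet, so the dotted underline and grey background styles apply to the generated link.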

NB: If you don't already have the generic AddLoadEvent utility function defined, copy/paste the ten lines or so into your header too.
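In case you don't have it to hand, here's the widely circulated addLoadEvent pattern (usually credited to Simon Willison). It chains functions onto window.onload rather than overwriting whatever is already there. The copy the Collective Wisdom script expects may differ in details, so check that the function name matches the one the script calls (the note above writes it AddLoadEvent).

```javascript
// Queue `func` to run on page load without clobbering existing onload handlers.
function addLoadEvent(func) {
  var oldonload = window.onload;
  if (typeof window.onload !== "function") {
    // Nothing registered yet: just install this function.
    window.onload = func;
  } else {
    // Wrap the existing handler so both run, oldest first.
    window.onload = function () {
      if (oldonload) {
        oldonload();
      }
      func();
    };
  }
}
```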

Next, in the body template of your blog, put this where you want to see the contract menu appear:

<p id="collective-wisdom"> ... fetching prices from CrowdIQ ... </p>

Change collective-wisdom to match the crowdNode variable above, if you don't like that. And you can use whatever loading message you like.

Finally, all the Collective Wisdom elements have a class associated with them, so you can make them spiffy with CSS. For starters, try putting this into your CSS file (or CSS block in your header):


/* ---( Collective Wisdom ) --- */

.CIQ-title
{
font-size: 120%;
text-align: center;
}

.CIQ-highlight
{
border-bottom: black 1px dotted;
font-weight:bold;
}

a.CIQ-highlight
{
font-weight: normal;
cursor: help;
background: #e0e0e0;
}


While this is highly experimental, it seems to work okay under IE (eventually ... *wince* ...) and Firefox. The highlighting function is a bit clunky in that it does a search/replace on the entire document.body.innerHTML, causing a brief re-render at the end of the page load - not ideal, so I'm open to suggestions for improvements. Update: Use scanNode (above) to restrict the search/replace to your blog's main content div. It will now only highlight the first instance of a match, to save repetition throughout the blog.

Also, if I were a bit smarter (or less lazy), I'd have figured out how to re-purpose the FreshTags code for displaying different types of lists, including clouds, which would be neat.

Well done on sticking through to the end of this monster post. I hope that you're inspired to set about being the first to create a prediction market based around your blog's topic. Any suggestions, feedback etc welcome - along with ideas for a Freshblog market! "When will Blogger support categories?" would have been great, but that's closed out now ...

Posted at 10:56 PM by Greg.
Imp reports that the bug preventing Beta users from commenting on regular ole' Blogger blogs (like this one) has been fixed. The details are on Blogger Buzz, courtesy (as the post title suggests) of the Grogmaster himself. Arrr! Get that man a parrot...

...or maybe he already has one?

Posted at 6:18 PM by John.
The Blogger Hacks Wiki continues to roll on, gather steam & justify our faith in the format. The latest addition, courtesy of Tom Thomas, is a method for adding a full-service Digg It button to your post footer. This hack suffers from the same chicken / egg issue that other efforts do....
Chicken: You need a published post to bookmark something to Digg.
Egg: You need the URL of the dugg story before you can add the button....

This, however, is not Tom's problem, and his workaround has the required outcome. Perhaps a little too much effort if you're looking to have every story dugg, but if you want to raise the profile of a few key posts, this is a workable option.

Update 9/23: As Ariel points out in the comments, there's a Feedburner Feedflare by Ross Belmont that will include "Digg It" links on your posts. For instructions on adding Feedflare to your blog instead of / as well as your feed, see Feedburner's quickstart.

Posted at 5:14 PM by John.
Tuesday, September 19, 2006
Haloscan trackback / commenting is another feature / function that needs to be wrapped in a widget and edited before it will play effectively on Beta. Logical Philosopher has written a how-to for making Haloscan comments beta-friendly. Any thoughts out there on deploying the trackback feature in beta?

Update 10/11: The wait is over. See today's post for pointers to code that is beta-friendly.

Posted at 1:46 PM by John.
Monday, September 18, 2006
FreshTags users who jumped ship early to the new Blogger platform have cause to rejoice: Singpolyma has coded up a version that supports the new environment! (Instructions and explanation on his post.) Kudos and praise to Singpolyma for this important breakthrough.

The new release is based on his earlier FreshTags version and integrates the tags and posts as sidebar widgets. The focus is on supporting the long-standing technique of tag passing between blogs. This is where readers select tags on one blog, follow a link to another and find that the same tag is already selected on this new page.

Singpolyma has also finally achieved what so many have clamoured for, but none has seen in Blogger: full text of tagged posts on a single page. That is, when a reader selects a tag (say, "folksonomy"), all posts with that tag are listed in full on the same page. There have been various work-arounds whereby summary text is provided using a page from the distant past, but this uses the new label search feature in Blogger Beta. As long as your tags in Delicious sync with your Blogger labels, the trick is flawless.

Under the hood, we're also seeing the first instantiation of important changes to the way FreshTags variants store, access and share configuration information. This new standard will be developed further and will support a wider range of future applications, particularly around multiple FreshTags-enabled widgets on the page.

With Singpolyma's recent development of Beta-friendly "FreshRolls" (a "tag-aware" extension to Singpolyma's Wrinks blogroll/blogring app), we're seeing a distinct ramp-up in FreshTags development. To keep that momentum going, if you're a FreshTags user still on Blogger Classic and keen to help try out some new presentation styles (eg asynchronous tag clouds and nested post titles), please leave a note and I'll see what we can do.

Posted at 3:46 AM by Greg.
Sunday, September 17, 2006
Google have added tabs to their personal page, allowing you to keep track of many more sources of information and to organise them by topic. As commenters on Paul Stamatiou's post ask, though, how useful are tabs compared with an RSS reader?

via Google Operating System

Posted at 3:09 PM by John.
Wednesday, September 13, 2006
As expected, Blogger Beta has kicked up a storm of blog hacking activity. It's appropriate to start the round up with the irrepressible Ramani at Hackosphere. Buoyed by being listed as a Blog Of Note, he's been burning the candle at both ends, calling in sick and leaving the phone off the hook in his efforts to roll out hack after hack.

First off is the multi-style label widget. This button lets your readers choose how your labels are presented to them - list, drop-down or cloud. It doesn't get much more flexible than that!

If you like clouds for your labels, you might want to check out the Blogger Beta implementation of clouds at Phydeaux3. This is a great-looking hack, and while it takes a bit of faffing about, the end result looks pretty sweet.

Not satisfied with that? How about using your labels as tabs. That's right, tabs running across the top of your blog that let your readers select content. Hoctro has the write up (via Hackosphere). I've just discovered Hoctro's blog - causing quite a bit of buzz amongst the blogoscenti - where there's also a post on re-conceiving your labels as a breadcrumb trail. That's two fresh takes on labels. Hence the buzz. Check it out.

Ramani's also been a busy boy with the post content too. He has an article on how to show only post titles (expandable with a single click of a +/- button), or only post summaries (linking to the full content with a "read more ..." link). Or both at once. This packs a lot more information per page on the main/archive pages and on the label results pages.

Lastly, if you're still struggling to corral those pesky comment feeds into your sidebar widget, take a page out of 失踪's book and employ Dapper to scrape your comments for you and turn them into a sidebar-ready RSS feed. Neat.

We know this is just a taste of some of the work going on out there - if you've got hackage to share, we'd love to hear about it so please leave a link below.

Posted at 10:07 PM by Greg.
Thursday, September 07, 2006
It's not often that we get to see disruptive technologies emerge. The new Dapper service (from Dappit) is firmly in that category. Billed as a mashup tool, it lets you simply grab content from static or dynamic web pages and integrate it in a mind-boggling range of ways. Services like this will further break down the distinction between content providers and consumers, and force an overhaul of existing customs and business models around web content.

What Does Dapper Do?

Similar to Ning, it's a managed service to let people create and host their own applications, or clone and edit others'. At its heart, the service is an online web scraper; it lets you extract content and shunt it around. You nominate elements on the target page you'd like, and it will go and fetch them for you.

In principle, all you ever needed to do this was the good old Unix utilities wget and grep (this is how FreshTags started, in the days before JSON). Oh, but you'll also need a net-connected, secure machine. And the ability to write hairy regular expressions. Plus a lot of patience. And if you were thinking of actually doing anything with the results, then you'd need something like a PHP server and working knowledge of Perl, with various libraries for handling outputs and transformations. Not to mention a managed website to publish it all through. Yeah, it's sounding like one big headache, isn't it?

With Dapper, you simply create a "Dapplication". It's a very straightforward, step-by-step process, with the requisite Web 2.0 flavour. You submit some examples of your target page and use a simple "point and click" interface to nominate the elements you wish to extract. These elements could be tabular data, links, text and so on. The Dapper system guesses the underlying structure of the page, and you correct its guesses where needed. You assign some names to these fields and (optionally) group them. That's it: your Dapp is done.

A really neat feature of the system is the way you can specify inputs for the target page (you rarely want to scrape exactly the same page each time). If your target page uses URL parameters, you can instruct Dapper to pass those in for you using curly brackets, e.g.:

http://somewhere.com/action?display=printer&mo={month}&da={day}&range=allusers

This will cause Dapper to prompt users for month and day variables. Or, you can nominate the existing input fields on the target page for data insertion in the same way. It's really very simple and intuitive.
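
Dapper builds the final URL itself, so you never see this step. Purely as an illustration of the idea, here is a Python sketch that lists the `{name}` placeholders in a template and fills them with URL-encoded values. The helpers `fill_template` and `placeholders` are hypothetical and not part of Dapper:

```python
import re
from urllib.parse import quote

# The example template from above; {month} and {day} are the user inputs.
TEMPLATE = "http://somewhere.com/action?display=printer&mo={month}&da={day}&range=allusers"
PLACEHOLDER = re.compile(r"\{(\w+)\}")

def placeholders(template):
    """List the variable names a user would be prompted for, in order."""
    return PLACEHOLDER.findall(template)

def fill_template(template, values):
    """Replace each {name} with the URL-encoded value supplied for it."""
    def substitute(match):
        name = match.group(1)
        if name not in values:
            raise KeyError(f"no value supplied for {{{name}}}")
        return quote(str(values[name]), safe="")
    return PLACEHOLDER.sub(substitute, template)

if __name__ == "__main__":
    print(placeholders(TEMPLATE))
    print(fill_template(TEMPLATE, {"month": 9, "day": 7}))
```

URL-encoding each value (rather than pasting it in raw) keeps characters such as `&` or spaces from corrupting the query string.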

Once it's up and running, there is a truly dizzying array of options for getting the data out: the usual vegetable soup of web standards (XML, HTML, JSON, YAML, RSS), plus some novel ones (email, image loop, Google Gadget and an alert mechanism). What's more, Dapper's not shy about accepting requests. The service was lacking callbacks for the JSON feed, which make it easy for lightweights like me to play with the data. I emailed the developer, Jon Aizen, and within a couple of hours it was done! Thanks, Jon!
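
A JSON feed "with a callback" (often called JSONP) wraps the data in a function call. A page can then load it with a plain `<script>` tag, and the named function receives the object directly. The Python sketch below shows both sides of that convention. The callback name `showData` and the helper names are hypothetical, and Dapper's exact output may differ in detail:

```python
import json
import re

CALLBACK_NAME = r"[A-Za-z_$][\w$.]*"  # a conservative JavaScript identifier pattern

def wrap_jsonp(data, callback):
    """Wrap a JSON payload in a call to `callback`, as a callback-enabled feed does."""
    if not re.fullmatch(CALLBACK_NAME, callback):
        raise ValueError("unsafe callback name")
    return f"{callback}({json.dumps(data)});"

def unwrap_jsonp(text):
    """Strip the callback wrapper and return the parsed JSON payload."""
    match = re.fullmatch(r"\s*" + CALLBACK_NAME + r"\((.*)\);?\s*", text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))
```

Validating the callback name matters on the serving side: otherwise anyone could inject arbitrary script into the response.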

Case Study

To test out the service, I built a Dapp in a few minutes to extract tabular data from a particular website (I'm currently negotiating an informal content and link-sharing agreement with a website related to Speccy, so I'm afraid I'll have to keep things a bit vague). One of the roadblocks in the negotiations is that some data I want from the other site is locked up in their SQL database, only served up as an HTML table by PHP. It would be very difficult for me to get at this. With Dapper, I was able to nominate the fields and extract the data I wanted, lowering the hassle involved and improving the chances of concluding a mutually beneficial deal.
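
Dapper's point-and-click interface hides the work, but the underlying job in a case like this is pulling cell text out of an HTML table. Here is a rough stand-alone equivalent using Python's standard `html.parser`. It assumes a simple, well-formed table with no nesting, and the sample markup in the test is made up:

```python
from html.parser import HTMLParser

class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # cells of the row currently being read
        self._cell = None # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

def extract_rows(html):
    """Return the table's rows as lists of stripped cell text."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

Real pages are rarely this tidy, and that fiddly clean-up is exactly what Dapper's guess-and-correct interface saves you.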

Dapper has two mechanisms to let you fine-tune your content selection: a slider that selects how "restrictive" it is in guessing what you're after, and a container that limits the grab to certain elements. Unfortunately, in my case, neither was very successful and I was getting unwanted extra content. I tried some other pages and both mechanisms worked as advertised; I must have been dealing with a pathological site. In the end, I opted to "over-extend", grabbing more content than required, and knocked together some regular expressions to parse out the bits I wanted. It works beautifully: I can enter the parameters and Dapper builds the appropriate URL (with parameters inserted into the query string), fetches the pages, strips out the data I want (plus a bit more) and hands it back to me as a JSON object, with a callback function!
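
The "over-extend, then filter" approach is easy to reproduce once the data is back in hand. Here is a minimal Python sketch, assuming the Dapp returns a list of text fragments. The fragment format and the `ENTRY` pattern are hypothetical stand-ins, since the real site is deliberately left unnamed:

```python
import re

# Hypothetical format: useful fragments look like
# "Posted 2006-09-07 by Greg | 12 comments", mixed in with navigation junk.
ENTRY = re.compile(r"Posted (\d{4}-\d{2}-\d{2}) by (\w+) \| (\d+) comments?")

def parse_entries(fragments):
    """Keep only fragments matching ENTRY and return (date, author, comment_count) tuples."""
    results = []
    for fragment in fragments:
        match = ENTRY.search(fragment)
        if match:
            date, author, count = match.groups()
            results.append((date, author, int(count)))
    return results
```

Grabbing too much and discarding the excess is often more robust than trying to make the selection itself perfectly precise.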

With a bit of confidence and some practice, I'm sure that anyone can extract content from a page of interest and display it on their own page (perhaps as an iframe element or an image loop, for simplicity). Blogger Beta's new RSS display widget really opens things up. This, I believe, is the disruptive element of the technology.

Implications For The Web

Many of us involved in blog hacking are comfortable with content being passed around like this; we provide RSS and Atom versions of our content (plus social bookmarking of titles and summaries) and actively encourage others to pick it up. We also have an informal code for link sharing ("link love") that defines norms and governs behaviour. Dapper knocks all that on its head, and provides new challenges for content production and consumption.

For starters, web feeds are a push technology; publishers elect to syndicate their content in this way. By contrast, Dapper is a pull approach, whereby others suck content out of your site without your permission or knowledge.

Despite what the odd (and I mean weird) lawyer might believe, no one can control who you link to on the web. But extracting slabs of content ... that's different. Clearly, new customs and practices - not laws - will have to emerge to deal with this. (I believe existing intellectual "property" laws are simply not up to it, being too clumsy and blunt.) Dapper has gone some way to facilitate this, with its "empowering content providers" (ie site-based access restriction) form. Hopefully, more content providers will see the benefits of their users figuring out new and powerful uses of their content rather than just blocking requests from the dappit.com domain.

(For what it's worth, in my case, I'm not using the tech as an excuse to barge in and pillage the target site. Instead, I see it as a means for lowering the barriers to exchange and thus (hopefully) allowing a fruitful partnership to develop where it might not have been possible before.)

Of course, content-sharing issues don't arise if you scrape your own stuff. You could create Dapps to parse out interesting bits and pieces from your blog and offer them as emails, alerts, feeds, looped images and the like. Nearly all blogs (and wikis, for that matter) employ templating of some kind. This means that you can be more-or-less guaranteed that Dapper will have a good shot at easily parsing the underlying structure. Headings, profiles, post titles, dates, leading paragraphs, links, quotes, tags, authors, comments, timestamps ... all that static content (ie not generated by JavaScript) is ripe for extraction and syndication.
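
Dapper already lists RSS among its output formats. As an illustration of what syndicating scraped bits involves, here is a Python sketch that turns a list of extracted (title, link) pairs into a minimal RSS 2.0 document. The `build_rss` helper and the sample URLs in the test are hypothetical:

```python
import xml.etree.ElementTree as ET

def build_rss(channel_title, channel_link, items):
    """Return a minimal RSS 2.0 document (as a string) for a list of (title, link) pairs."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = f"Items extracted from {channel_link}"
    for title, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "link").text = link
    return ET.tostring(rss, encoding="unicode")
```

ElementTree handles escaping of `&` and `<` in titles, which is easy to get wrong when building feeds by string concatenation.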

Another suggestion is to look for stuff with RSS or Atom feeds. Chances are, if the publisher is happy pushing out content in this way, they'll also be cool with you grabbing it with a Dapp. Ditto for content released under (some) Creative Commons licences. And, hey, you can always ask: I'd love to know if someone's built a Dapp to do something novel with my content! I'm sure that proper attribution, link back, notification and being respectful of server/bandwidth load will all be part of the basis of an emerging Dapper netiquette.

If this post has piqued your interest, please go ahead and check out the growing list of Dapps already available, read the Dapper blog or just dive right in and create your own Dapp. I'm sure that within five minutes you'll grok the disruptive nature of this service and get a glimpse of the jaw-dropping possibilities.

Filed in:
Posted at 2:08 AM by Greg.
Tuesday, September 05, 2006
Vivek Sanghi has built on Ramani's social bookmarking syntax for beta & published the code to add 6 social-bookmarking icons to your post footer. These include Digg, Del.icio.us, Furl, Spurl, Simpy & My Yahoo. There are 11 others in Vivek's post footer too. Fine work, Vivek! Great to see existing hacks being re-engineered for beta, & new contributors....

Filed in:
Posted at 10:48 AM by John.
Sunday, September 03, 2006
More great stuff at Hackosphere.... this time a couple of posts that note a change in the structure of Beta's label URLs so that they follow the rel-tag microformat.

As rolled out w/ the beta, the format was:

http://yourblog.blogspot.com/search?label=xxx

which has now been amended to:

http://yourblog.blogspot.com/search/label/xxx

making the tagword the last segment of the URL, and the whole thing kosher with respect to the rel-tag specification. Methinks that having the query part of the URL back in play will open up some modification possibilities that had previously been closed off too.....
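
The mapping between the two forms is mechanical. Here is a Python sketch, assuming only the two URL shapes shown above (the helper names are hypothetical). `tag_from_url` reflects the rel-tag rule that the tag is the last segment of the URL path:

```python
from urllib.parse import parse_qs, quote, unquote, urlsplit, urlunsplit

def to_rel_tag_url(old_url):
    """Rewrite .../search?label=xxx into the newer .../search/label/xxx form."""
    parts = urlsplit(old_url)
    labels = parse_qs(parts.query).get("label")
    if parts.path != "/search" or not labels:
        raise ValueError("not an old-style label URL")
    new_path = "/search/label/" + quote(labels[0], safe="")
    return urlunsplit((parts.scheme, parts.netloc, new_path, "", parts.fragment))

def tag_from_url(url):
    """Under rel-tag, the tag is the (decoded) last segment of the URL path."""
    path = urlsplit(url).path.rstrip("/")
    return unquote(path.rsplit("/", 1)[-1])
```

Because the tag now lives in the path, the query string is free again for other parameters.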

Filed in:
Posted at 8:06 PM by John.
Saturday, September 02, 2006
The Blogger Help pages with the meat of the new template language are live. These include Widget Tags for Layouts and the mind-bendingly expansive full list of layout data tags.

Let the widget-authoring commence!

Some great posts that will help you get started include:
Filed in:
Posted at 10:17 AM by John.
Friday, September 01, 2006
Ramani at Hackosphere has spent some time immersed in the new HTML interface, and has developed a method for formatting Blogger's default list of labels as a drop-down menu. In addition, he's also restructured the label results display so that it only shows permalinked post titles (rather than the default whole posts), which will be a much more user-friendly output style once we all end up with a large collection of labelled content. All good stuff...
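
Ramani's method works inside Blogger's new HTML interface. As a rough illustration of the kind of markup a label drop-down produces, here is a Python sketch that generates it from a list of label names. It assumes the newer `/search/label/` URL form and a small inline `onchange` handler; the function name and placeholder text are hypothetical:

```python
from html import escape
from urllib.parse import quote

def label_dropdown(blog_url, labels):
    """Build a <select> menu that jumps to each label's search page when chosen."""
    options = ['<option value="">Choose a label...</option>']
    for label in sorted(labels, key=str.lower):
        href = f"{blog_url.rstrip('/')}/search/label/{quote(label, safe='')}"
        options.append(f'<option value="{escape(href)}">{escape(label)}</option>')
    return (
        '<select onchange="if (this.value) window.location.href = this.value;">\n  '
        + "\n  ".join(options)
        + "\n</select>"
    )
```

Sorting case-insensitively keeps "Beta" and "beta tips" next to each other rather than splitting capitalised labels into a separate block.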


Filed in:
Posted at 9:35 AM by John.
