Spare Cycles

Entries from January 2009

Eliminating Feed Noise

January 29, 2009

Last month I did a major overhaul of my feed reader’s subscriptions. I was tired of the repeated content and stories I was seeing across tech feeds. Whenever a company had an announcement or bit of news, every single publication I followed would release a write-up.

I resolved to trim my tech feeds down to no more than five as part of an overall RSS pruning. Here are the criteria I used to decide which feeds to keep:

  • I find myself gravitating towards that feed’s stories in general
  • Engaging editorial perspective
  • Provides a broad overview of relevant industry content
  • Provides actionable information
  • Professional (not gossip-y)
  • Does more than simply replay the same press releases out there

For tech news, this generally meant clearing out all of the personality-driven blogs: TechCrunch, Scobleizer, GigaOM, Silicon Alley Insider, Valleywag, etc. They’re all gone. The sacrifice is that I’m not totally up to date on the latest funding rounds of all the Bay Area startups, and I don’t hear about the new feature coming out of kyte.tv as soon as everyone else does. The payoff is that I don’t have to sift through piles and piles and piles of unsubstantiated hearsay and gossip to find some relevant kernel of information.

For tech news I now just follow these sites:

  • TechMeme – it gives me a quality broad overview of what’s going on in the tech industry and pulls in the best of the sites that I dropped from my feed list
  • Wired – Quality original content alongside good coverage of major industry events
  • Ars Technica – There’s a lot of bloat in this publication, but I have to keep it around for its quality coverage of privacy, security, and innovation in tech.
  • ReadWriteWeb – The best, IMO, of the Web 2.0 coverage sites. I learn about new tools and their usefulness to me. TechCrunch, by contrast, generally seems more concerned with the personalities and business implications of any new application released online. Plus RWW has more intelligent commentary than Arrington’s blathering.
  • Slashdot – Simply because I continue to learn interesting things from this site that I generally don’t find anywhere else.

There are some other publications that I’ve enjoyed for years which didn’t make the cut. The Register is one. There’s just TOO much content coming through on their main feed. It’s too much noise.

If you had only five tech feeds you needed to follow, what would they be and why?

Categories: Uncategorized

iPhone and iCal ToDo Sync with Appigo

January 19, 2009 · 2 Comments

Today I finally have ToDo syncing between iCal and my iPhone. It’s a feature that I’ve had on my cellular phones since, oh… about 2003. It is something I lost when I switched to an iPhone, and is one of the features that I can’t believe Apple decided not to tackle.

Thanks to Appigo’s sync software and their ToDo iPhone app, I can now manage my tasks from my desktop and my iPhone. The downside is that I had to pay $9.99 to get the iPhone app and this feature.

Well done to Appigo for picking up where Apple left off. Now I can get back to efficiently adding items to my ToDo list from wherever I am, and very inefficiently checking them off.

Categories: Uncategorized

Playing around with maatkit

January 13, 2009 · 1 Comment

I was doing some MySQL query performance tuning on DGM Live today using maatkit. Maatkit is a toolkit for measuring, what else, MySQL performance. The two tools that I played around with specifically were the slow query log parser and the query profiler.

I don’t use Perl a whole lot, so the first thing I had to do was install Perl’s DBI module.

I jumped into the cpan prompt:

shell> cpan

And then installed the module:

cpan> install DBI

And then force installed DBD::mysql. I had to use force install so that when the tests that connect to my local MySQL server failed, it didn’t roll back the install. I know that the MySQL server is working and can be reached from my local box, so I don’t care about the tests failing:

cpan> force install DBD::mysql

I then exited the cpan shell, and the maatkit scripts were now part of my executable path. First, I ran the mk-query-profiler script over a couple of sluggish queries that were stored in a .sql file. NOTE: Each query in the SQL file needs a semicolon at the end of the statement, and if you’re analyzing multiple queries there has to be a full line of white space between them, not just a line break.
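The file-format note above is easy to get wrong, so here is a minimal sketch of a pre-flight check in Python. It is not part of maatkit; the function names are made up for illustration, and the rule it enforces (blank-line separators, trailing semicolons) is just the one described in the note.

```python
import re


def split_queries(sql_text):
    """Split the text of a .sql file into statements separated by blank lines."""
    # One or more whitespace-only lines count as a separator.
    chunks = re.split(r"\n\s*\n", sql_text.strip())
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def check_queries(sql_text):
    """Return a list of problems; an empty list means the file looks fine."""
    problems = []
    for number, query in enumerate(split_queries(sql_text), start=1):
        if not query.endswith(";"):
            problems.append(f"query {number} does not end with a semicolon")
        # A semicolon before the final character suggests two statements
        # separated only by a line break. This is a rough check: a semicolon
        # inside a string literal would trigger it too.
        if ";" in query[:-1]:
            problems.append(f"query {number} may contain more than one statement")
    return problems


if __name__ == "__main__":
    # Hypothetical file contents: the second and third statements are
    # separated by a bare line break, and the last one has no semicolon.
    text = "SELECT 1;\n\nSELECT 2;\nSELECT 3;\n\nSELECT 4"
    for problem in check_queries(text):
        print(problem)
```

With the file in that shape, the profiler run looks like this: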

shell> mk-query-profiler --user backend --askpass --database dgmms ~/queries.sql

The profiler’s output summarizes the counters that MySQL exposes through its SHOW STATUS call, measured across the profiled queries.

+----------------------------------------------------------+
|                      2 (3.1201 sec)                      |
+----------------------------------------------------------+

__ Overall stats _______________________ Value _____________
   Total elapsed time                        6.231
   Questions                                 2
     COMMIT                                  0
     DELETE                                  0
     DELETE MULTI                            0
     INSERT                                  0
     INSERT SELECT                           0
     REPLACE                                 0
     REPLACE SELECT                          0
     SELECT                                  2
     UPDATE                                  0
     UPDATE MULTI                            0
   Data into server                        944
   Data out of server                   169031
   Optimizer cost                        10919.360

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      6
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                             137
     Range sorts                             0
     Merge passes                            0
     Table scans                             2
     Potential filesorts                     2

The key value that’s helpful to see here, either as a bulk value for all queries or on a per-query basis, is the optimizer cost. I want to get this number as low as possible. I can add a -s flag to my command to break the results out by query.

+----------------------------------------------------------+
|                   QUERY 1 (3.0744 sec)                   |
+----------------------------------------------------------+
SELECT COUNT...

__ Overall stats _______________________ Value _____________
   Elapsed time                              3.074
   Data into server                        478
   Data out of server                    11224
   Optimizer cost                         5459.680

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      3
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                              10
     Range sorts                             0
     Merge passes                            0
     Table scans                             1
     Potential filesorts                     1

+----------------------------------------------------------+
|                   QUERY 2 (3.0618 sec)                   |
+----------------------------------------------------------+
SELECT COUNT...

__ Overall stats _______________________ Value _____________
   Elapsed time                              3.062
   Data into server                        466
   Data out of server                   157807
   Optimizer cost                         5459.680

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      3
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                             127
     Range sorts                             0
     Merge passes                            0
     Table scans                             1
     Potential filesorts                     1

+----------------------------------------------------------+
|                   QUERY 2 (3.0618 sec)                   |
+----------------------------------------------------------+

__ Overall stats _______________________ Value _____________
   Total elapsed time                        6.136
   Questions                                 2
     COMMIT                                  0
     DELETE                                  0
     DELETE MULTI                            0
     INSERT                                  0
     INSERT SELECT                           0
     REPLACE                                 0
     REPLACE SELECT                          0
     SELECT                                  2
     UPDATE                                  0
     UPDATE MULTI                            0
   Data into server                        944
   Data out of server                   169031
   Optimizer cost                        10919.360

__ Table and index accesses ____________ Value _____________
   Table locks acquired                      6
   Table scans                               0
     Join                                    0
   Index range scans                         0
     Join without check                      0
     Join with check                         0
   Rows sorted                             137
     Range sorts                             0
     Merge passes                            0
     Table scans                             2
     Potential filesorts                     2

The difference between these two queries is that one of them has a LIMIT clause to reduce the result set size. That explains the difference in the size of the ‘Data out of server’ value, but it doesn’t make any difference to the optimizer cost.
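Comparing the per-query numbers by eye gets tedious after a few runs. The sketch below pulls a handful of values out of the profiler's text report so they can be compared in code. It assumes the report layout shown above (a `QUERY n` banner followed by label/value lines); the parser and its names are illustrative, not something maatkit provides.

```python
import re

QUERY_HEADER = re.compile(r"\|\s*QUERY (\d+) \(")
WANTED = ("Elapsed time", "Data into server", "Data out of server", "Optimizer cost")


def per_query_stats(report):
    """Map query number -> {label: value} for the labels listed in WANTED."""
    stats = {}
    current = None
    for line in report.splitlines():
        header = QUERY_HEADER.search(line)
        if header:
            current = int(header.group(1))
            stats.setdefault(current, {})
            continue
        if current is None:
            continue
        text = line.strip()
        for label in WANTED:
            if text.startswith(label):
                # Keep the first value seen for a label within a query.
                stats[current].setdefault(label, float(text.split()[-1]))
    return stats


# Trimmed copy of the two per-query sections from the report above.
SAMPLE = """
|                   QUERY 1 (3.0744 sec)                   |
   Elapsed time                              3.074
   Data into server                        478
   Data out of server                    11224
   Optimizer cost                         5459.680
|                   QUERY 2 (3.0618 sec)                   |
   Elapsed time                              3.062
   Data into server                        466
   Data out of server                   157807
   Optimizer cost                         5459.680
"""

if __name__ == "__main__":
    stats = per_query_stats(SAMPLE)
    q1, q2 = stats[1], stats[2]
    print("same optimizer cost:", q1["Optimizer cost"] == q2["Optimizer cost"])
    print("data-out ratio:", q2["Data out of server"] / q1["Data out of server"])
```

On the sample, the optimizer cost is identical for both queries while the amount of data sent back differs by more than an order of magnitude, which matches the LIMIT explanation.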

This was a difficult query to tune because the data was joined across three tables that all had a high cardinality, and the row traversals were minimal. I determined that a GROUP BY clause on the largest of the tables was causing a lot of CPU consumption which was slowing the query down. In the end, I decided to break the larger complex query up into two simpler queries. The total execution time was cut down to almost a tenth of the original query execution time, and resulted in less work for the processor overall.

The query profiler didn’t lead me to that conclusion right away, but it provided a very useful tool to quickly try different approaches and test them for performance.

The query profiler can do more than just take a file or standard input of queries. It can also analyze the performance of queries run against the server, by aggregating the show status information for all the queries. You simply use the --external flag instead of passing in a file and press ENTER when you’re done profiling the server.

This is very handy for getting a quick sense of the database overhead on a single page or action in your web application. I used ab, the Apache benchmarking utility, to load test the server and ran the query profiler against the load test to get a better picture of how many queries are run and how they perform when the server is under heavy load.
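The idea behind profiling an external workload can be sketched as a before/after comparison of MySQL's status counters. The Python below is only an illustration of that idea, not maatkit's implementation. It assumes two snapshots of `SHOW GLOBAL STATUS` have already been fetched into dictionaries (how you fetch them is left out), and the sample numbers are invented. The counter names (`Questions`, `Com_select`, `Bytes_sent`, `Sort_rows`) are standard MySQL status variables.

```python
def status_delta(before, after):
    """Return the change in each integer counter between two status snapshots."""
    delta = {}
    for name, end_value in after.items():
        try:
            delta[name] = int(end_value) - int(before.get(name, 0))
        except ValueError:
            # Values that are not plain integers (flags, empty strings,
            # decimals) are skipped rather than guessed at.
            continue
    return delta


if __name__ == "__main__":
    # Hypothetical snapshots taken before and after a load test.
    before = {"Questions": "100", "Com_select": "40", "Bytes_sent": "5000",
              "Sort_rows": "10", "Ssl_cipher": ""}
    after = {"Questions": "102", "Com_select": "42", "Bytes_sent": "174031",
             "Sort_rows": "147", "Ssl_cipher": ""}
    for name, change in sorted(status_delta(before, after).items()):
        print(f"{name:12} {change}")
```

One caveat with this approach: the statements used to take the snapshots are themselves counted by the server, so a careful tool has to account for its own overhead.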

The query profiler is only one piece of the maatkit toolkit, and I look forward to playing with the rest of the available scripts to further tune the database layer of my applications.

Categories: MySQL

Purely Anecdotal #256: The Power of Facebook

January 7, 2009

This is purely anecdotal…

But recently I was reading on a friend’s blog about how much they liked the Facebook chat tool and how it had changed their life.

Basically, this person had never really used IM chat before. The reason was that it involved installing new software and learning a new communications paradigm.

Facebook’s footer toolbar forces chat on its users, and it does so in a way that’s more intuitive than the built-in chat interfaces in Gmail, Yahoo! Mail, or AOL.

Facebook continues to grow exponentially and attract a wider demographic of users. It may struggle to monetize as effectively as more established content powerhouses, but the platform’s latent power to create experiences like my friend’s is a real success, and one that will continue to entrench Facebook as THE social network, regardless of how its current number of users compares to MySpace, Google, or Yahoo!.

Categories: Uncategorized

Internet Authentication Needs To Change

January 5, 2009

Today’s big tech news is the hijacking of several prominent Twitter accounts. The hijack method isn’t confirmed, but it was most likely the result of a recent phishing exploit.

Twitter is very vulnerable to phishing attacks. In order for third parties to interact with a Twitter account they need the account’s authentication credentials. There are a number of useful services, Snaptweet for instance, which require your Twitter user name and password to operate.

Twitter does not yet support a token-based API authentication protocol. It has announced support for OAuth, but that has not been implemented yet.
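To make the difference concrete, here is a toy sketch in Python of what token-based access buys you. It is not OAuth and has nothing to do with Twitter's actual API; the `Service` class, its method names, and the user and app names are all invented for illustration. The point is only that the third-party app holds a revocable token and never sees the password.

```python
import secrets


class Service:
    """Toy service that issues per-application access tokens."""

    def __init__(self):
        self._passwords = {}  # user -> password (a real service would store a hash)
        self._tokens = {}     # token -> (user, app_name)

    def register(self, user, password):
        self._passwords[user] = password

    def grant(self, user, password, app_name):
        """The user authenticates with the service itself and gets a token for one app."""
        stored = self._passwords.get(user)
        if stored is None or not secrets.compare_digest(stored.encode(), password.encode()):
            raise PermissionError("bad credentials")
        token = secrets.token_urlsafe(32)
        self._tokens[token] = (user, app_name)
        return token

    def post(self, token, message):
        """What the third-party app calls; it only ever holds the token."""
        if token not in self._tokens:
            raise PermissionError("unknown or revoked token")
        user, app_name = self._tokens[token]
        return f"{user} via {app_name}: {message}"

    def revoke_app(self, user, app_name):
        """Cut off one app without changing the user's password."""
        self._tokens = {
            token: owner for token, owner in self._tokens.items()
            if owner != (user, app_name)
        }


if __name__ == "__main__":
    service = Service()
    service.register("alice", "correct horse")
    token = service.grant("alice", "correct horse", "photo-app")
    print(service.post(token, "hello"))
    service.revoke_app("alice", "photo-app")
```

If the app is compromised or turns out to be a phishing front, revoking its token limits the damage to that one app, and the password never left the service's own login page.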

Authentication is one of those internet-wide thorny problems. Not only is it a primary vector for technological security problems, but it’s vulnerable to sociological exploits as well. A technically secure application can be easily infiltrated simply by tricking a user into giving you their password.

There are a number of initiatives on the internet to simplify the authentication problem, including Facebook Connect and OpenID. While “single sign-on” architectures may be ideal within an organization, like a large company or university, I think they’re inherently dangerous as a means of authenticating to potentially any service on the internet.

I would like to see a key chain style approach to service authentication. This would have the following features:

  • Would maintain unique credentials for any service, preventing users from recycling passwords.
  • Would require identification for usage. Ideally this would incorporate eye or fingerprint recognition, or some other method of physically verifying the identity of the user, while falling back on a passphrase.
  • Would be portable, by syncing the key chain with a mobile device or networked drive.
  • Usage of the key chain on an unregistered computer (that is, anything other than your registered home or office computers) would result in text, email, or voice mail notifications sent to your points of contact, so you would be alerted to unauthorized usage.
  • Implementation of the key chain would be built into the browser’s architecture.
  • Usage of your key chain would be logged, and you could look at a usage statement similar to a credit card statement at the end of the month.
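A few of the features above can be sketched in a handful of lines of Python. This is a toy model under stated assumptions, not a design: the `KeyChain` class, the `notify` hook, and the device and service names are all hypothetical, and identity verification, syncing, and browser integration are left out entirely. It covers only unique per-service credentials, alerts on unregistered devices, and the usage log.

```python
import secrets
from datetime import datetime, timezone


class KeyChain:
    """Toy key chain: unique credentials, usage log, unregistered-device alerts."""

    def __init__(self, registered_devices, notify):
        self.registered_devices = set(registered_devices)
        self.notify = notify   # hypothetical alert hook (text, email, ...)
        self._passwords = {}   # service -> generated password
        self.usage_log = []    # (timestamp, service, device) entries

    def credential_for(self, service, device):
        # Generate a distinct random password the first time a service is
        # seen, so no password is reused across services.
        if service not in self._passwords:
            self._passwords[service] = secrets.token_urlsafe(24)
        self.usage_log.append((datetime.now(timezone.utc), service, device))
        if device not in self.registered_devices:
            self.notify(f"key chain used for {service} from unregistered device {device}")
        return self._passwords[service]

    def statement(self):
        """Usage statement, one line per access, oldest first."""
        return [
            f"{when:%Y-%m-%d %H:%M} {service} ({device})"
            for when, service, device in self.usage_log
        ]


if __name__ == "__main__":
    alerts = []
    chain = KeyChain({"home-laptop"}, alerts.append)
    chain.credential_for("twitter.com", "home-laptop")
    chain.credential_for("twitter.com", "library-pc")
    print(alerts)
    print("\n".join(chain.statement()))
```

Passing the alert mechanism in as a plain callable keeps the sketch independent of any particular delivery channel.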

Obviously there are a number of big tech hurdles there, but overcoming them would lead to an improvement of the security of the internet as a whole.

Categories: Security