Ruby script to show the (Dutch) weather forecasts in your console

Just a quick Ruby script to show the weather forecast for the coming week, using the HTML screen-scraping gem Nokogiri and a table formatter gem called Hirb.

require 'nokogiri'
require 'open-uri'
require 'hirb'

table = []
headers = []
# URI.open comes from open-uri; a bare open() no longer accepts URLs in Ruby 3.
page = Nokogiri::HTML(URI.open("http://knmi.nl/waarschuwingen_en_verwachtingen/"))
# Walk the rows of the first forecast table on the page.
page.css(".realtable")[0].xpath("tr").each do |tr|
  row = []
  # Header cells supply the column names.
  tr.xpath("th").each do |th|
    headers << th.text
  end
  # Data cells make up one table row.
  tr.xpath("td").each do |td|
    row << td.text
  end
  table << row if row.count > 0
end

puts Hirb::Helpers::AutoTable.render table, :headers=>headers, :description=>false

The result in my zsh (using the agnoster theme) looks like this:

[Screenshot: knmi]

MacBook + Windows + VS2010 + ReSharper

Just a quick reminder for myself…

Recently I bought myself a MacBook Pro, but as I am a .NET developer I also installed Win7 + VS2010 + ReSharper. Until now I plugged in an external USB keyboard just to be able to press Alt+Ins (ReSharper’s shortcut for Generate Code), as there is no Insert key on a MacBook. Today I finally spent a couple of minutes finding the key combination for the Insert key, so that I don’t need the other keyboard anymore.

Ins = Fn+Return
Alt+Ins = Fn+Option+Return

How to determine if a ToolStripMenuItem has handlers registered to its Click event

In one of the WinForms applications I maintain there’s something that has annoyed me for quite some time (but I never took the time to fix it). This application has a large menu with lots of items. It contains some well-known submenus (injection points) and there’s a mechanism that lets modules dynamically inject their own menu items. The well-known submenus were rendered even if they didn’t have any children, resulting in useless menu items cluttering up my screen.

So all I had to do was remove the menu items that have no children and no registered handlers for the Click event. Sounds easy. The problem is that there is no easy way to determine whether any handlers are registered to the Click event of a ToolStripMenuItem. This seems to be by design: from outside the class you can only add and remove handlers, you cannot enumerate them. By using Reflector I found out how the ToolStripMenuItem works internally. System.Windows.Forms.ToolStripMenuItem derives (via ToolStripDropDownItem) from System.Windows.Forms.ToolStripItem, which in turn derives from System.ComponentModel.Component.

When you write in your form:

// In the actual form
myToolStripMenuItem.Click += new System.EventHandler(OnMyToolStripMenuItemClick);

the ToolStripItem class adds the handler to the protected property Events of its base class (Component) using EventClick as a key.

// In ToolStripItem
internal static readonly object EventClick;

public event EventHandler Click
{
   add { base.Events.AddHandler(EventClick, value); }
   remove { base.Events.RemoveHandler(EventClick, value); }
}

// In Component
protected EventHandlerList Events
{
   get
   {
      if (this.events == null) this.events = new EventHandlerList(this);
      return this.events;
   }
}

My first idea was to create a derived class MyToolStripMenuItem so that I could access the Events property:

public class MyToolStripMenuItem : ToolStripMenuItem
{
   public bool HasRegisteredHandlersForClickEvent
   {
      get
      {
         // THIS IS NOT WORKING!!!
         return this.Events[EventClick] != null && this.Events[EventClick].GetInvocationList().Length > 0;
      }
   }
}

The problem is that the key used to access the handlers in the EventHandlerList is marked as internal, so my derived class cannot use it. Fortunately reflection can solve this. It’s more or less a hack and it is not guaranteed to work in future versions of the framework, but for now it does exactly what I want. I don’t need a derived class anymore to access the Events property, so I created a static helper method somewhere in our framework:

public static bool HasRegisteredHandlersForClickEvent(ToolStripItem item)
{
   // Get the protected Events property and the internal static readonly field EventClick via reflection.
   var eventsProperty = item.GetType().GetProperty("Events", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.FlattenHierarchy);
   var keyField = item.GetType().GetField("EventClick", BindingFlags.Static | BindingFlags.NonPublic | BindingFlags.FlattenHierarchy);

   var eventList = (EventHandlerList)eventsProperty.GetGetMethod(true).Invoke(item, null);
   var theDelegate = eventList[keyField.GetValue(item)];

   return theDelegate != null && theDelegate.GetInvocationList().Length > 0;
}

And finally the method to remove the non-working menu items (i.e. no children and no handlers registered for the Click event):

// First call:  RemoveNonWorkingMenuItems(mainMenu.Items);

public static void RemoveNonWorkingMenuItems(ToolStripItemCollection toolStripItems)
{
   if (toolStripItems == null) return;

   var toBeRemoved = new List<ToolStripItem>();
   foreach (ToolStripItem item in toolStripItems)
   {
      var dropdownItem = item as ToolStripDropDownItem;
      if (dropdownItem != null && dropdownItem.HasDropDownItems)
      {
         // Recursive walk through the submenus.
         RemoveNonWorkingMenuItems(dropdownItem.DropDownItems);
         // After all the children are removed,
         // this item may be empty.
         if (!HasRegisteredHandlersForClickEvent(dropdownItem) && !dropdownItem.HasDropDownItems)
         {
            toBeRemoved.Add(dropdownItem);
         }
      }
      else
      {
         if (!HasRegisteredHandlersForClickEvent(item))
         {
            toBeRemoved.Add(item);
         }
      }
   }
   foreach (ToolStripItem item in toBeRemoved)
   {
      toolStripItems.Remove(item);
   }
}

Sorry for the layout of the code snippets but I hope you find it useful.


Text analysis in Lapsang

Lapsang is my personal programming project. It will download article titles from an RSS feed and recommend which ones are probably the most interesting for the user. Initially it will only use the title for analysis (later versions may use the actual article content, URL, poster, etc.). For every title the user must say whether it sounds attractive to read, and based on this input the program learns the user’s interests. As soon as the user rates a title, the scores of the individual words in that title are adjusted: for attractive titles the word scores increase and for uninteresting titles they decrease. Based on these word scores the program can give a recommendation for new titles: when a title contains words that the program has seen before, it uses their scores to calculate a recommendation.
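A minimal sketch of this scoring idea, written in Ruby. The class name WordScorer, the fixed +1/-1 step and the averaging in recommend are assumptions made for illustration only; the text above does not specify how Lapsang actually adjusts or combines the scores.

```ruby
# Hypothetical sketch of per-word scoring; the step size and the averaging are assumptions.
class WordScorer
  def initialize
    @scores = Hash.new(0) # words start at a neutral score of 0
  end

  # Very naive tokenizer: lowercase the title and pick out the words.
  def words(title)
    title.downcase.scan(/[a-z0-9']+/)
  end

  # The user rates a title: attractive titles raise the score of every
  # word in the title, uninteresting titles lower it.
  def rate(title, attractive)
    delta = attractive ? 1 : -1
    words(title).uniq.each { |w| @scores[w] += delta }
  end

  # Recommendation for a new title: the average score of the words that
  # were seen before. Returns nil when none of the words is known yet.
  def recommend(title)
    known = words(title).uniq.select { |w| @scores.key?(w) }
    return nil if known.empty?
    known.sum { |w| @scores[w] }.to_f / known.size
  end
end

scorer = WordScorer.new
scorer.rate("Ruby tricks for beginners", true)
scorer.rate("Celebrity gossip roundup", false)
puts scorer.recommend("Advanced Ruby tricks")  # prints 1.0
puts scorer.recommend("More celebrity gossip") # prints -1.0
p scorer.recommend("Unrelated headline")       # prints nil
```

A positive result suggests the title is worth reading, a negative one suggests skipping it, and nil means the program has nothing to go on yet.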

Not all words in a title will be used for scoring. Stop words like ‘a’, ‘the’, ‘and’, etc. can be ignored as they add no significant value. How about singular and plural: should ‘language’ be considered the same word, and thus have the same score, as ‘languages’? And how should I deal with verbs in the present and past tense? For simplicity’s sake I will assume all titles are in English, as multilingual text analysis is way over my head for now :-) It is probably sufficient to store only the stem of a word. There are various stemming algorithms available and I found a C# implementation of Porter stemming that I could integrate into my own tokenizer. As this text analysis was getting more and more complex, I decided to take a step back, stop my own implementation and have a look at the open source libraries that are already available.
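To make the stop-word and singular/plural questions concrete, here is a small Ruby sketch. The stop-word list and the crude_stem rule (strip one trailing ‘s’) are illustrative assumptions; this is deliberately not the Porter algorithm and it does nothing for verb tenses.

```ruby
# Hypothetical sketch: tokenize a title, drop stop words, fold plurals onto the singular.
# STOP_WORDS is a tiny made-up list; crude_stem is NOT Porter stemming.
STOP_WORDS = %w[a an the and or of to in is are for].freeze

def crude_stem(word)
  # Strip a single trailing "s" (but not "ss"), keeping at least three characters.
  stem = word.sub(/(?<!s)s\z/, '')
  stem.length >= 3 ? stem : word
end

def tokenize(title)
  title.downcase
       .scan(/[a-z]+/)                          # split into lowercase words
       .reject { |w| STOP_WORDS.include?(w) }   # drop stop words
       .map { |w| crude_stem(w) }               # reduce each word to its crude stem
end

p tokenize("Programming languages and the language of the web")
# prints ["programming", "language", "language", "web"]
```

With this rule ‘languages’ and ‘language’ end up sharing one score, while a real stemmer would also fold forms such as ‘programming’ and ‘programmer’ together.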

Lucene.NET (a port of the Java search engine) is a mature project and contains various text analyzers. Maybe other parts of this library will be useful in my program too, but first I will focus on the text analysis. I downloaded the binaries from the Lucene.NET website to start experimenting. The library is ported literally from the Java version (easier for them to maintain, they say), so the API feels very Java-ish and not so .NET-ish. The download contains the core Lucene.Net.dll and various additional libraries for more advanced analysis purposes.

I created a temporary project and added references to the Lucene.Net.dll and the Snowball.Net.dll (for language-specific analysis using stemming). The following code shows the results using various analyzers to tokenize a text:

using System;
using System.IO;

using Lucene.Net.Analysis;

namespace TryTextAnalysis
{
    class MainClass
    {
        static void Main(string[] args)
        {
            string title = "My husband is a programmer; I have no idea what that means.";
            Console.WriteLine(title);

            ShowTokens(title, new WhitespaceAnalyzer());
            ShowTokens(title, new SimpleAnalyzer());
            ShowTokens(title, new StopAnalyzer());
            ShowTokens(title, new Lucene.Net.Analysis.Standard.StandardAnalyzer());
            ShowTokens(title, new Lucene.Net.Analysis.Snowball.SnowballAnalyzer("English", StopAnalyzer.ENGLISH_STOP_WORDS));
        }

        static void ShowTokens(string text, Analyzer analyzer)
        {
            Console.WriteLine(analyzer.GetType());
            TokenStream stream = analyzer.TokenStream("text", new StringReader(text));
            while (true)
            {
                Token token = stream.Next();
                if (token == null)
                {
                    break;
                }
                Console.Write(" [{0}]", token.TermText());
            }
            stream.Close();
            Console.WriteLine();
        }
    }
}

The results when you run this program are as follows:

My husband is a programmer; I have no idea what that means.
Lucene.Net.Analysis.WhitespaceAnalyzer
 [My] [husband] [is] [a] [programmer;] [I] [have] [no] [idea] [what] [that] [means.]
Lucene.Net.Analysis.SimpleAnalyzer
 [my] [husband] [is] [a] [programmer] [i] [have] [no] [idea] [what] [that] [means]
Lucene.Net.Analysis.StopAnalyzer
 [my] [husband] [programmer] [i] [have] [idea] [what] [means]
Lucene.Net.Analysis.Standard.StandardAnalyzer
 [my] [husband] [programmer] [i] [have] [idea] [what] [means]
Lucene.Net.Analysis.Snowball.SnowballAnalyzer
 [my] [husband] [programm] [i] [have] [idea] [what] [mean]

The SnowballAnalyzer suits my needs, so now I can start coding the word scoring algorithm.

The birth of Lapsang

In my previous post I described a personal programming project I was planning. I wanted to use Ruby as the programming language, but there are a couple of reasons for going back to C#. First, I couldn’t find a working GUI toolkit for Ruby (at least not one that works on my OS X machine). I spent a full day on it and then decided to move back to System.Windows.Forms (I can use the Mono implementation). Another reason is that, as Michel already commented, a new language would slow me down too much, and at this moment I prefer a working application over a new language.

The hardest part of a new project is the name, and after some deep thought I came up with Lapsang (a black Chinese tea).

When downloading the data from the Hacker News RSS feed I saw there are only 25 items in the document. I need more items to train the network, so I decided to make a quick-and-dirty screen scraper to browse the website and download the items from multiple pages. This will probably not be part of the final product; I prefer a pure RSS feed (or several, maybe via an OPML file) as my data source.

My personal programming project

Introduction

Lately I have been following the news feed Hacker News (http://news.ycombinator.com), where members can post links to interesting (IT-related) articles. There are a couple of new items per hour and members can vote these items up or down in the list. Not all articles are interesting to me and my time is limited, so I must choose which articles to read. The site shows 30 items per page, with only the title and the website it refers to, and no description. Based on the title I select the most interesting items and open them in new browser tabs, read them, close the tabs and finally move on to the next page, etc.

A cunning plan

My idea is to make an application that can help me by suggesting interesting articles based on my previously selected items. Over time it should get better at giving suggestions. I finally have a reason to dive into neural networks. Ten years ago, during my traineeship in Bangkok, I worked in the Software and Language Laboratory of NECTEC, where colleagues were working on Thai OCR and text-to-speech software using neural networks. I was fascinated, but never found a reason to use them in my professional career as a developer.

I have been developing applications in C# for the last decade so it’s time for something new! The application will be built using Ruby, as this language has been on my ‘learning’ wishlist for a couple of years. I am not sure yet which libraries to use for the user interface. At home I use OS X and at work I use Windows, and it would be nice if it runs on both platforms. I am still having trouble getting wxRuby up and running at home, so I will have a look at IronRuby in combination with Mono (WinForms) or maybe another cross-platform GUI toolkit. At the very least this is a good reason to use a pattern like Model-View-Controller or Model-View-Presenter and make the UI layer as thin as possible.

The user interface will have a list of titles and a button to open a web browser with the selected article. I want to rate every title, probably by dragging new items into one of three buckets/lists (must-read, maybe, not-worth-my-time) or by using 1-3 stars. The title and the score will be used in the learning part of the application. When the application downloads new items from the news feed, it will give each of them a suggested score that should be visible to me (e.g. colors, size, stars, sorted). First I will make it work, then I will make it beautiful.

Conclusion

This is a project with a lot of new things to learn (for me and for the application), but it is definitely a challenge :-)

Command line Twitter

Just for the ‘geekiness’ of it I was looking for a way to use Twitter from the Bash shell prompt.

To read your twitter home timeline:

curl -s -u userid:password -g http://api.twitter.com/1/statuses/home_timeline.atom \
| xpath -q -e '/feed/entry/title/text()' \
| tac

Curl is a tool to transfer data from and to a server. The -s option keeps it silent. -u userid:password sends your Twitter account name and password (warning: this is HTTP basic authentication over plain HTTP, so it is not secure). -g switches off curl’s URL globbing, so the URL is fetched exactly as written; the data at this URL is in Atom format (XML).

Pipe the output to the tool xpath (-q to keep it quiet) and select only the text of each entry’s title. Pipe that to tac (cat in reverse) to reverse the order of the lines, so that the latest tweets end up at the bottom, one per line (every title is prefixed by the sender). This is more natural at the command line: no need to scroll up to see the latest updates. The result is extremely readable (at least if the people you follow know how to express themselves).
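For comparison, the xpath and tac steps of the pipeline can be sketched in Ruby with the REXML library. The ATOM sample and the method name titles_oldest_first are made up for illustration; the sample stands in for what curl would download and is not real Twitter output.

```ruby
require 'rexml/document'

# Made-up Atom document standing in for the downloaded timeline (newest entry first).
ATOM = <<~XML
  <feed xmlns="http://www.w3.org/2005/Atom">
    <title>Hypothetical home timeline</title>
    <entry><title>alice: newest tweet</title></entry>
    <entry><title>bob: older tweet</title></entry>
  </feed>
XML

# Collect the title of every entry, then reverse the list so that the
# newest title comes last (the job of tac in the shell pipeline).
def titles_oldest_first(atom_xml)
  doc = REXML::Document.new(atom_xml)
  titles = []
  doc.root.each_element do |entry|
    next unless entry.name == 'entry' # skip the feed-level title
    entry.each_element { |child| titles << child.text if child.name == 'title' }
  end
  titles.reverse
end

puts titles_oldest_first(ATOM)
# prints:
# bob: older tweet
# alice: newest tweet
```

Comparing element names instead of using an XPath expression sidesteps the Atom default namespace in this sketch.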

And to post a tweet:

curl -u userid:password -d status="Hello world." http://twitter.com/statuses/update.xml

The web interface of Twitter is much more convenient but at least now I know how to use the tools curl and xpath.

Hello world!

Welcome to TDev.org. This is my first post using WordPress to check if it is possible to use source code in my posts.

using System;

namespace HelloWorld
{
    static class Program
    {
        /// <summary>
        /// The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main(string[] args)
        {
            // TODO: Add some extremely useful code here.
            Console.WriteLine("Hello world wide web...");
        }
    }
}