Reading URLs from OS X clipboard with PyObjC

I’m definitely lazy. I don’t like making more effort than needed, under any circumstances. And when it comes to downloading a (quite big) bunch of files, I don’t like doing it manually. That’s why I’m currently using jDownloader on my iMac. However, it still has an issue: when I copy URLs from Safari, it instantly notices that URLs have been copied to the clipboard and adds them for download; but when I copy HTML links from Safari whose text is plain text rather than a URL, this no longer works, and I have to copy the links one by one, which really bothers me, because I’m lazy (and because it wastes a lot of my time).

So, the problem is that everything works fine when I copy something like

or something like

but it doesn’t work anymore when the link is like

this is a link to an example domain

And, since I’m lazy, I decided to write a program that copies a bunch of links at once, without all that hassle. My first idea was a Python script that used mechanize to copy all the URLs in a web page to the clipboard, but at the time that didn’t seem like the prettiest solution.

But then a second approach occurred to me: I could just copy the web page’s text to the clipboard and let a program read the rich text, HTML or whatever format is on the clipboard and find the URLs in it. Besides, it seemed like an interesting program to write. I first considered RubyCocoa, but a deep knowledge of sorcery is needed to use Xcode, and I’m not that good a sorcerer. After that, my tests with Qt weren’t successful, so I searched the internet for recipes to access the OS X clipboard from Python.

After digging for a while, I found an article on Python and the Mac Clipboard that pointed me in the right direction: use PyObjC to access the NSPasteboard class. After reading the documentation and experimenting a bit with the help of Python’s built-in dir() and help() functions, I found that

  1. all of the items currently on the pasteboard are returned as an NSArray by the NSPasteboard.pasteboardItems method,
  2. the list of all types to which each pasteboard item can be converted is returned as an NSArray by the NSPasteboardItem.types method, and
  3. all text copied from Safari can be accessed both with the NSPasteboardItem.dataForType_ method (as raw data) and, as public.rtf, with the NSPasteboardItem.stringForType_ method (as a string).
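The exploration described in the list above, checking which types each pasteboard item offers, can be wrapped in a small helper. This is a minimal sketch rather than the author’s code: the name describe_items is hypothetical, and it accepts the items as a parameter (defaulting to the general pasteboard through PyObjC) so that it can also be tried with stand-in objects on a machine without AppKit.

```python
def describe_items(items=None):
    """Return the list of type identifiers offered by each pasteboard item.

    If no items are given, they are read from the general pasteboard via
    PyObjC (macOS only). Any objects with a types() method can be passed.
    """
    if items is None:
        # Imported lazily so the helper can be used without PyObjC installed.
        from AppKit import NSPasteboard
        items = NSPasteboard.generalPasteboard().pasteboardItems()
    # Convert each NSArray of type strings into a plain list of str.
    return [[str(t) for t in item.types()] for item in items]
```

On a Mac, running print(describe_items()) right after copying part of a page in Safari is a quick way to confirm that public.rtf is among the offered types before relying on it.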
Given that we’re interested in getting the URLs from the rich text representation, the best approach was to use a regular expression to extract all hyperlinks from the RTF representation of the copied part of the web page, which resulted in the following Python code:
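from AppKit import NSPasteboard  # provided by PyObjC
import re

# The general pasteboard is the one used by copy and paste.
pb = NSPasteboard.generalPasteboard()
# Links appear in the RTF as field instructions like HYPERLINK "url";
# [^"]+ stops at the closing quote, so several links on one line stay separate.
pattern = re.compile(r'HYPERLINK "([^"]+)"')
for item in pb.pasteboardItems():
    if 'public.rtf' in item.types():
        rtf = item.stringForType_('public.rtf')
        if rtf is None:  # the item could not provide an RTF string
            continue
        for url in pattern.findall(rtf):
            print(url)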
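Because the regular expression only needs the RTF text, the extraction step can be separated from the pasteboard access and tried on a hand-written string. The sketch below assumes, as the pattern above does, that links show up as RTF field instructions of the form HYPERLINK "url"; the sample RTF is made up for illustration, and extract_urls is a hypothetical name. It also drops duplicates while keeping the original order, since the same link can appear more than once on a page.

```python
import re

# Matches the URL inside an RTF field instruction such as HYPERLINK "http://...".
# [^"]+ stops at the closing quote, so two links on one line stay separate.
HYPERLINK_RE = re.compile(r'HYPERLINK "([^"]+)"')


def extract_urls(rtf):
    """Return the hyperlink targets found in an RTF string, without duplicates."""
    seen = set()
    urls = []
    for url in HYPERLINK_RE.findall(rtf):
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls


# Hand-written single-line sample with three links, one of them repeated.
SAMPLE_RTF = (
    r'{\rtf1\ansi '
    r'{\field{\*\fldinst{HYPERLINK "http://www.example.com/a.zip"}}{\fldrslt first}} '
    r'{\field{\*\fldinst{HYPERLINK "http://www.example.com/b.zip"}}{\fldrslt second}} '
    r'{\field{\*\fldinst{HYPERLINK "http://www.example.com/a.zip"}}{\fldrslt again}}}'
)

print(extract_urls(SAMPLE_RTF))
# prints ['http://www.example.com/a.zip', 'http://www.example.com/b.zip']
```

With a greedy (.+) group instead, the three links on this single line would collapse into one bogus match running from the first URL up to the last closing quote, which is why the character class [^"]+ is used.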

You can then copy the output URLs to the clipboard manually, which makes jDownloader recognize them, or you could improve the script so that it copies the output back to the clipboard itself.
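The copy-back improvement could look roughly like the following sketch. It rests on assumptions: copy_urls_to_clipboard is a hypothetical helper; it joins the URLs one per line on the guess that jDownloader’s clipboard monitor treats that like the plain URLs it already recognizes; and it writes plain text under the public.utf8-plain-text type, the identifier behind AppKit’s NSPasteboardTypeString. clearContents and setString_forType_ are the PyObjC spellings of NSPasteboard’s clearContents and setString:forType: methods. The pasteboard can be passed in, which keeps the helper testable without a Mac.

```python
# Same value as AppKit's NSPasteboardTypeString.
PLAIN_TEXT_TYPE = "public.utf8-plain-text"


def copy_urls_to_clipboard(urls, pasteboard=None):
    """Put the URLs on the pasteboard as plain text, one per line.

    Returns the text that was written. Without an explicit pasteboard, the
    general (copy and paste) pasteboard is used, which needs PyObjC on macOS.
    """
    if pasteboard is None:
        from AppKit import NSPasteboard
        pasteboard = NSPasteboard.generalPasteboard()
    text = "\n".join(urls)
    # Clearing the pasteboard is the first step before providing new data.
    pasteboard.clearContents()
    if not pasteboard.setString_forType_(text, PLAIN_TEXT_TYPE):
        raise RuntimeError("could not write the URLs to the pasteboard")
    return text
```

In the script above, the matches could be collected into a list instead of printed and then handed to this helper in a single call.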