Leaving Garlik. In Search of New Adventures

December 5th, 2011 by Mischa

Today I am writing a little note to say goodbye and a big thank you to all at Garlik. After spending the last (nearly) 4 years of my life working, living, and breathing Garlik, I have decided to start a new adventure. Garlik was a wonderful place to work. We built some exciting applications, ones which delivered utility to people, as well as ones which innovated, to the highest standard, pushing boundaries in the development of commercial semantic web applications.

I have decided to leave my role as the Senior Research Engineer at Garlik, and I am now mega excited to be starting my new job working at the award winning London start-up PeerIndex.

I would like to say a big thank you to Steve Harris, and the rest of the team at Garlik. I have had a ball of a time working with, or being sat next to, Steve since starting my PhD at Southampton University. We built some awesome things using what I believe to be the first industrial scale commercial semantic web technology stack. Obviously I am biased.

Finally, I would like to say a big thank you to the whole of the Semantic Web community, all the folk on #swig, and all the lovely people working on making the web a better place at the W3C. And to all the people at Southampton Uni, who got me into web technology in a way I never was before. The Semantic Web will enable the data web of the future, I am sure of it.

I am looking forward to staying as involved with the community as I have been over the last few years, but most of all I am looking forward to starting this next adventure in my life, as Head of Research Engineering at PeerIndex.

Using Git

December 1st, 2011 by Mischa

I have been making use of git, initially designed and developed by Linus Torvalds, over the last few years in both my personal and professional lives.

Git is a fantastic piece of software. I have been using it for everything, from document/paper writing to adding version control to the /etc directory on my Linux boxes.

In this post I will summarise how I have been using git over the last few years. Firstly though, I should mention a friend/former colleague of mine, tialaramex, who helped me get my head around using git to start with.

Cloning a repository:

git clone git@github.com:mischat/sprotocol.git

or, the following command which will clone the repo to a directory named “sprotocol-dev”


git clone git@github.com:mischat/sprotocol.git sprotocol-dev

If this is your first time using git you can set your name and email address, so that your changes are labelled correctly when you push them upstream.

git config user.email "you@email.com"
git config user.name "Your Name"

Note that you can use the --global flag to set them globally, instead of just in a given repo.

Setting up a git repo.

You can always use an online service such as github – this will allow for pointing and clicking.

But sometimes you may wish to setup your own git repository. You can do it like so:

mkdir LAMEREPO
cd LAMEREPO
git init --shared=group
echo "Readme file for LAMEREPO" > README
git add README
git commit -m "Initial commit for LAMEREPO"

Branches in git.

You can create branches in git. You may want to do this if you are planning on making considerable changes to your repo. Branches are most useful!

To find out which branch you are currently in:

git branch

To create a new branch, in turn checking it out:

git checkout -b lamebranch

To check out the master branch:

git checkout master

To merge branches, firstly you should checkout the branch which you want to merge into, and then use the merge command:

git checkout master
git merge lamebranch

Finally, you can delete branches in git using the -d flag:

git branch -d lamebranch

Note that -d will refuse to delete a branch which has not been fully merged. You can use -D instead, which will force the deletion of the branch regardless of whether it has been merged.

If you would like to track a remote branch, perhaps one created by someone else committing to the repo, you can check it out with the --track flag. This will allow you to follow any changes made to the remote branch.
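
The branch commands above can be tried end to end without touching a real project. What follows is only a rough sketch, assuming a throwaway repository made with mktemp; the README contents and commit messages are made up for illustration, and the symbolic-ref line is just there so that the first branch is called master whatever your git defaults are.

```shell
#!/bin/sh
set -e

# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master
git config user.email "you@email.com"
git config user.name "Your Name"

echo "first line" > README
git add README
git commit -q -m "Initial commit"

# Do some work on a new branch...
git checkout -q -b lamebranch
echo "second line" >> README
git commit -q -a -m "Work done on lamebranch"

# ...then merge it back into master and delete the branch.
git checkout -q master
git merge -q lamebranch
git branch -d lamebranch

merged_lines=$(wc -l < README | tr -d ' ')
echo "README now has $merged_lines lines on master"
```

Because lamebranch was fully merged, the lower-case -d is enough here.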

git checkout --track -b lame origin/lame

If you would like to create a remote branch, so that other people can track it, you need to create a local branch, and then you need to push it upstream to origin.

git checkout -b lamebranch
git push origin lamebranch:lamebranch
# then look in .git/config (make sure it is a remote branch)
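
If you do not have a server handy, the same thing can be tried against a local bare repository standing in for origin. This is a sketch under that assumption; the origin.git and clone directory names are hypothetical, and a real setup would point at something like github instead.

```shell
#!/bin/sh
set -e

work=$(mktemp -d)

# A bare repository plays the part of the upstream "origin".
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/clone"   # git warns that the repo is empty
cd "$work/clone"
git symbolic-ref HEAD refs/heads/master
git config user.email "you@email.com"
git config user.name "Your Name"

echo "hello" > README
git add README
git commit -q -m "Initial commit"

# Create the local branch and publish it so that others can track it.
git checkout -q -b lamebranch
git push -q origin lamebranch:lamebranch

# The branch should now be listed as a remote branch.
remote_branches=$(git branch -r)
echo "$remote_branches"
```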

Cherry-picking changes in git

If you find that you would like to select commits from a different branch, and merge into a different branch without having to merge the whole lot, you can cherry-pick git commits individually.

git cherry-pick 1d67bdbbdb4b98d142bdcce1b78cbe4d2d396afd
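
To see what cherry-picking does, here is a small sketch in a throwaway repository. The file names (wanted.txt, unwanted.txt) and commit messages are invented for the example; the point is only that master ends up with the one commit that was picked and nothing else from the branch.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.email "you@email.com"
git config user.name "Your Name"

echo "base" > base.txt
git add base.txt
git commit -q -m "Initial commit"

# Two commits on a branch, only the first of which we want on master.
git checkout -q -b lamebranch
echo "wanted" > wanted.txt
git add wanted.txt
git commit -q -m "The change we want"
wanted_commit=$(git rev-parse HEAD)

echo "unwanted" > unwanted.txt
git add unwanted.txt
git commit -q -m "A change we do not want yet"

# Back on master, pick just the one commit by its hash.
git checkout -q master
git cherry-pick "$wanted_commit"

ls
```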

Tagging your git repo

You can also create tags in a git repo. This is how I make tags in my repos:

git tag -a "TAG_NAME"
git push --tags

Cloning a repo so that multiple people can update it.

This is useful when working in a team.


git clone git@github.com:mischat/sprotocol.git
cd sprotocol
git config --add core.sharedRepository group
chown -R username:sharedgroup .
find . -type d -print0 | xargs -0 chmod g+s

I should mention that I always use rebase when pulling commits from upstream on to my version of a repo. I have added an alias to my global git configuration, which allows me to type git up whenever I wish to grab upstream commits.

git config --global --add alias.up 'pull --rebase'

And finally, I also make constant use of git’s stashing and popping functions. Most useful if you have changes you wish not to commit.

git stash
git up    # or: git pull
git stash pop
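
As a rough illustration of what stashing does to your working tree, the sketch below uses a throwaway repository with a made-up notes.txt file. There is no upstream in this example, so the pull step is only marked with a comment.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "you@email.com"
git config user.name "Your Name"

echo "committed" > notes.txt
git add notes.txt
git commit -q -m "Initial commit"

# An uncommitted change that we are not ready to commit.
echo "work in progress" >> notes.txt

git stash
clean_after_stash=$(git status --porcelain)   # empty: the change is tucked away

# ...this is where git up (or git pull) would go...

git stash pop
cat notes.txt
```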

I have a blog post coming up on how one can add a submodule to a git repo. Thanks for your attention!

shareNice : unintrusive social sharing

July 19th, 2011 by Mischa

This post is about the shareNice social sharing widget I have been working on recently. I am pitching shareNice as an “unintrusive social sharing” tool for webmasters.

Webmasters can add shareNice to their websites if they want to let their users share the pages they browse with their friends, via the many social network platforms.

So, what is different about shareNice, and why should you choose to use it?

Below is an example screenshot of the shareNice tool being used on The University of Southampton’s OpenData site.

shareNice example


As it stands, shareNice is being used on the following sites : http://data.southampton.ac.uk, http://www.garlik.com, http://mmt.me.uk/, and http://sharenice.org/ itself. And yes, most of these sites are friendlies, but I would love other people to start making use of it too!

We are currently working on a WordPress plugin for the shareNice widget, and an eprints plugin too. If someone would like to create a Drupal plugin for shareNice, that would be great! You can see a list of feature requests on : http://sharenice.org/.

Given that the apache instance which this runs off of doesn’t generate ANY logs, I have no way of knowing if people are using the service, so please do let me know, either via my blog, twitter, or github.

Finally, I should name-drop all the nice people who have helped in the development of shareNice: Monika Stepinska, Steve Harris, and Sebastien Francois. You guys rule!

Knocking up my own RSS reader

December 13th, 2010 by Mischa

Since Newsgator’s RSS Reader asked me to supply them with a Google Account, becoming Google Reader, I gave up on the service. I had been using an RSS reader which would sync my laptop with my phone, ever since my phone became capable of reading the interwebs when traveling about. In short, RSS is awesome, and Google Search is awesome, but I try and spread my personal data thin across many companies instead of giving it all away to one. I used the iGoogle page for a while, but I thought I could just emulate that on my website, so I did …

In short:

Google offer a great search experience, they can have my search history, Facebook offer a good way to stay in touch with your friends, Yahoo/Flickr provide a good photo sharing experience, and well Last.fm do an awesome job of recommending me music – get what I am hinting at? I could just use Google to do all of the above, but somehow I feel better about myself spreading my data around a bit (sorry, crazy isn’t it!).

So, I made a start at knocking together my own RSS reader which I can use to catch up on stuff : http://mmt.me.uk/rss/. I used this PHP RSS Reader library. Sadly, it doesn’t understand RSS 1.0, which is RDF; it only seems to parse RSS 2.0.

It was really easy to do, not more than an hour’s work. I am going to start parsing in the descriptions of the items in the RSS feeds, but sadly this isn’t trivial. People are starting to flood their streams with links to google-analytics, to facebook share (including links to icons hosted on facebook.com [naughty]), and other nastiness. So I will have to write some code to pull out all the evil in the descriptions before adding them to http://mmt.me.uk/rss/. I will update my blog when I get round to it. I can also release code if people want, just shout…

On a similar topic, the blog post below states:

“One privacy protection model is to scatter your data about to make it more difficult to parse, akin to keeping valuables in different hiding spots in your house to thwart intruders getting everything in one go.”

This was said when referring to the use of Facebook as your one stop shop for IM messages, photos, emails and your social graph.

http://blogs.forbes.com/kashmirhill/2010/11/29/how-facebook-applications-can-download-all-the-messages-in-your-inbox/

My Response to the NHS

November 30th, 2010 by Mischa

My letter to the NHS Choices Team dated 2010-11-30

Hi Team NHS Choices,

So, I just thought I would let you know that, as per my blog post, the NHS Choices website is sharing information with Facebook on pages which DON’T have the Facebook Like button. This was pointed out by this person, and has been on my blog post for the last few days:

http://www.privacylives.com/v3-co-uk-ico-probes-nhs-choices-over-data-privacy-fears/2010/11/30/

Note that, NHS Choices keeps stating that its privacy policy is correct, but if you read the last few paragraphs of my blog post, right before the “comments” section, you will see that this is not the case.

http://mmt.me.uk/blog/2010/11/21/nhs-and-tracking/

Do have a look at the following page on your website, there is NO like button, and the same data exchange is STILL happening with Facebook.

http://www.nhs.uk/livewell/depression/pages/depressionhome.aspx

The following screenshot illustrates this. People are free to replicate, all you need is Firefox and the Firebug plugin (both free and open source).

Firebug + Firefox + NHS Website + Facebook.com HTTP request + No Like Button

I also talked about how there is a German website which changed the manner in which it implemented the Like button functionality in a non-intrusive manner. That is, a manner which does NOT send any information to Facebook.com unless the user ACTIVELY CLICKS (i.e. OPT-IN) the Like button; quoting my blog post.

“There is a way to deploy the Facebook Like button which would resemble an OPT-IN based user interaction, instead of the intrusive standard iframe based approach. This involves the use of an “onClick” function call in Javascript which would tell Facebook only when explicitly “liked”. Obviously this method of interaction does not display the “social information” such as like counts, and whether or not you would be the first of your friends to “like” a given page. The German social networking site jetzt.de moved from the iframe to the self-hosted version after vigorous backlash from the userbase about being tracked (see for instance http://jetzt.sueddeutsche.de/texte/anzeigen/385237, line 350). This example was given to me by Sören Preibusch from the University of Cambridge.”

Please see the Garlik blog for more information : http://www.garlik.com/blog/?p=419

Warmest Regards,

Mischa

NHS.uk allowing Google, Facebook, and others to track you

November 21st, 2010 by Mischa

The NHS is allowing Google, Facebook, and others to track your http://www.nhs.uk/ browsing habits, regardless of the fact that people use the page to seek medical advice. It was recently pointed out to me that the NHS Choices website’s social features include the Facebook Like button (see e.g. the page on Testicular Cancer). Due to the fact that the standard method of Facebook Like button deployment is intrusive to say the least, I thought I would look into identifying which third party companies have been given permission to track users on NHS Choices, and my results are rather disconcerting.

In short, there are four third-party advertising/tracking companies which are informed every time a user visits one of the “conditions pages” on the NHS Choices website. These, listed below, all get to make a call from the user’s browser, in turn allowing the four companies to access their cookies, tracking the users (explained in a previous blog post of mine, and in Bala’s research). This means that if one has ever logged into a Google account or a Facebook account, and then visits one of the pages on the NHS site, the company will then know that their user X was just looking at a page about condition Y on the NHS website.

These are the four third party companies that make requests every time a “conditions page” on http://www.nhs.uk/ is viewed by a user:

jambi:~ mt $ grep "Host" tcpdump.ext.20101121.log | sort -u
Host: l.addthiscdn.com
Host: statse.webtrendslive.com
Host: www.facebook.com
Host: www.google-analytics.com
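
If you want to try the same trick without my log file, the sketch below builds a tiny stand-in log and runs the same grep and sort over it. The request lines in the sample are made up for illustration (only the Host lines matter), and the real capture is obviously much bigger.

```shell
#!/bin/sh
set -e

# A tiny stand-in for tcpdump.ext.20101121.log; the GET lines are illustrative.
log=$(mktemp)
cat > "$log" <<'EOF'
GET /plugins/like.php HTTP/1.1
Host: www.facebook.com
GET /__utm.gif HTTP/1.1
Host: www.google-analytics.com
GET /plugins/like.php HTTP/1.1
Host: www.facebook.com
EOF

# Every distinct third-party host that the browser contacted.
third_parties=$(grep "^Host:" "$log" | sort -u)
echo "$third_parties"
```

However many requests a host receives, sort -u lists it once, which is what makes the short list above easy to read off a long capture.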

Two of the four third-party sites (facebook.com and addthiscdn.com) are contacted in order to provide the “social functionality” shown in the following screenshot. This intrusive OPT-OUT method of adding social features to the NHS website is, in my opinion, NOT acceptable. I would only deem this to be acceptable if the NHS has written declarations from the two aforementioned services stating that they WOULDN’T be tracking peoples’ browsing habits on http://www.nhs.uk/.

And the other two sites contacted (webtrendslive.com and google-analytics.com) seem to be used for analytics purposes. In my view, this task should NOT be outsourced to a third party. If this was a website about pub reviews these third-party services would be acceptable, but due to the nature of the information on the Choices website, I feel the NHS should be hosting their own analytics code. Ok, I understand that the NHS needs to gather statistics about their website usage, but their users’ privacy should be of utmost importance, and there do exist a number of open source analytics packages which the NHS could run themselves.

In order to show that I am not making this up, I have captured all of the HTTP requests made by my browser when loading the HIV and AIDS information page on NHS Choices.

http://www.nhs.uk/conditions/HIV/Pages/Introduction.aspx

The below two files are logs of all HTTP requests made when loading the HIV page:

http://mmt.me.uk/misc/nhscookies/tcpdump.full.20101121.log

And this cut down log file shows all of the third-party HTTP requests made by one’s browser when loading the aforementioned page:

http://mmt.me.uk/misc/nhscookies/tcpdump.ext.20101121.log

The above logs were captured using the following bash command:
tcpdump -A -s 1024 -i en0 dst port 80

An example:

My colleague Steve captured output from the HTTP trace via the NHS website, it can be found on http://pastebin.com/4TfDRRZJ

The browser (Safari) had its history cleared, was logged into facebook, had the facebook window closed, and was then sent to the NHS page.

Bits of confidential data replaced with XXXs

GET /plugins/like.php?href=http%3A%2F%2Fwww.nhs.uk%2fConditions%2fHIV%2fPages%2fIntroduction.aspx&layout=button_count&show_faces=true&width=450&action=like&colorscheme=light&height=21 HTTP/1.1
Host: www.facebook.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-gb) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Referer: http://www.nhs.uk/conditions/HIV/Pages/Introduction.aspx
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
Cookie: presence=DJ290173073BchADhA_22112.channelH1L60XXXXXXXXXXXXXXXXX4104WMblcMsndPBXXXXXXXXXXXsbPBtA_5b_5dBfAnullBuctMsA0QBblADacP0VXXXXXXXXXXXX0K290173073QQQ; x-referer=http%3A%2F%2Fwww.facebook.com%2Fhome.php%23%2Fhome.php; xs=976XXXXXXXXXXXXXXXXXXX0e11a0e600; sid=2; sct=12XXXXXX70; made_write_conn=12XXXXXX70; lu=ghXXXXXXXXXXXXXXXXXXYgqQ; datr=12XXXXXXXX-XXXXXXXXXXXf24

Possible Data Protection Violation

It was pointed out to me that I was referencing the incorrect ICO filing. The data controller for the NHS Choices website is the Department of Health and not NHS Direct.

Find below, my amended version of this and the following section of this blog post. – Mischa 2010-11-24 11:00:00.

In order to see the NHS’s Data Protection Policy, I had a look at their ICO filing, which led me to the following pages:

http://www.ico.gov.uk/ESDWebPages/DoSearch.asp?reg=4693360
http://www.ico.gov.uk/ESDWebPages/DoSearch.asp?reg=4906007

I should start this section by saying that I am not a lawyer. But it seems like sections 6 and 4 purpose 2 are relevant to my question of “how come the NHS website has third-party tracking enabled, especially given that the tracking is provided by for profit advertising companies?”. Firstly, it should be noted that by contacting facebook and google, data is being sent outside of the European Economic Area, which is in violation of their Data Protection commitment of “Transfers: None outside the European Economic Area”.

And as per the ICO filing the potential recipients are: Business associates and other professional advisers, Central Government, Data subjects themselves, Employees and agents of the data controller, Healthcare, social and welfare advisers or practitioners, Local Government, Ombudsmen and regulatory authorities, Other companies in the same group as the data controller, Persons making an enquiry or complaint, Police forces, Relatives, guardians or other persons associated the data subject, Survey and research organisations.

I would like to point out that nowhere in the ICO filing can one see that the NHS will be sharing data with advertising companies.

It should be noted that the ICO filing does not make it explicit that the Department of Health would not:

  • Sell advertising to patients
  • Sell/Provide user data for third party advertising

Next Steps: FOI

I am about to post off a Freedom Of Information request (tomorrow morning), asking the NHS to please supply the minutes of all policy and technical meetings involved in the decision to deploy iframes referencing non-NHS sites and to use third-party analytics software on NHS choices pages.

Next Steps: Official Complaint

Further to the FOI request I am going to submit an official complaint via the official NHS Choices feedback form :http://www.nhs.uk/aboutNHSChoices/Pages/ContactUs.aspx.

I have the latest copy of the FOI Request (updated FOI Request) and the Letter of Complaint up in .pdf format on my website.

Note that:

There is a way to deploy the Facebook Like button which would resemble an OPT-IN based user interaction, instead of the intrusive standard iframe based approach. This involves the use of an “onClick” function call in Javascript which would tell Facebook only when explicitly “liked”. Obviously this method of interaction does not display the “social information” such as like counts, and whether or not you would be the first of your friends to “like” a given page. The German social networking site jetzt.de moved from the iframe to the self-hosted version after vigorous backlash from the userbase about being tracked (see for instance http://jetzt.sueddeutsche.de/texte/anzeigen/385237, line 350). This example was given to me by Sören Preibusch from the University of Cambridge.

And Finally…

I would like to thank Steve Harris and Dan Brickley for helping me decide how to take this forward. And I would like to thank Richard Northover for the link.

Amendment 2010-11-24 16:57 Conflict in terms of NHS privacy policy
It has been pointed out to me that the NHS’s statement that communication with Facebook.com only occurs on pages with the Like button is also not true :

See the following page :

http://www.nhs.uk/livewell/depression/pages/depressionhome.aspx

I see no “Like” button on this page. But according to Firebug, there is still an HTTP request made to facebook.com from my browser.

The following quote from the NHS Choices privacy policy states :

“While we only share your information with the Data Processors, when you visit pages on our site that display a Facebook Like button, Facebook will collect information about your visit. For more information, read the relevant section of the Facebook privacy policy.”

Which is NOT true.

The following screenshot illustrates this. People are free to replicate, all you need is Firefox and the Firebug plugin (both free and open source).

Firebug + Firefox + NHS Website + Facebook.com HTTP request + No Like Button

NHS privacy policy

I thought I would cut and paste the NHS’s privacy policy, in case it changes dated 2010-11-25 15:58:00 GMT

“As a government department, we do not share data with other organisations unless the law permits us to do so. We do not sell individual information. We will share it only with our authorised Data Processors, who must act at all times on our instructions as the Data Controller under the Data Protection Act 1998. Before you submit any information, we will notify you as to why we are asking for specific information and it is up to you whether you provide it.

While we only share your information with the Data Processors, when you visit pages on our site that display a Facebook Like button, Facebook will collect information about your visit. For more information, read the relevant section of the Facebook privacy policy.”

Furthermore

It should be noted that it has been pointed out to me that AddThis’ privacy policy reveals they monetise their product through behavioural targeting. Would you be somewhat surprised if you started to receive adverts about your ailments on third-party websites?

Disabling Referer Headers in Firefox

November 21st, 2010 by Mischa

Given the awesome work detailed by Bala from AT&T, and some recent privacy related measures I have been taking in my Firefox browser (see https-everywhere and adblocking fb), I have decided to instruct my browser to stop sending the Referrer header (nb: misspelled as the ‘Referer’ header in the HTTP specification) when I am clicking around on the web.

The following example shows the Referrer header of the HTTP request telling facebook.com, that I have just been looking at a page about HIV on the NHS choices website.

GET /
Host: www.facebook.com
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_4; en-gb) AppleWebKit/533.18.1 (KHTML, like Gecko) Version/5.0.2 Safari/533.18.5
Accept: application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Referer: http://www.nhs.uk/conditions/HIV/Pages/Introduction.aspx
Accept-Language: en-gb
Accept-Encoding: gzip, deflate
Cookie: presence=DJ290173073BchADhA_22112.channelH1L60X...
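
To make the leak a bit more concrete, here is a small sketch which pulls the Referer value out of a saved request. It assumes the request has been saved to a text file; the sample below is a trimmed copy of the headers above with the cookie left out.

```shell
#!/bin/sh
set -e

# A trimmed copy of the request above, saved to a temporary file.
request=$(mktemp)
cat > "$request" <<'EOF'
GET / HTTP/1.1
Host: www.facebook.com
Referer: http://www.nhs.uk/conditions/HIV/Pages/Introduction.aspx
Accept-Language: en-gb
EOF

# Print only the value of the Referer header.
leaked_page=$(sed -n 's/^Referer: //p' "$request")
echo "The third party learns you came from: $leaked_page"
```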

I followed instructions on the following blog post http://cafe.elharo.com/privacy/privacy-tip-3-block-referer-headers-in-firefox/ to configure my Firefox instance to not send the “referer header”.

In short, the steps needed are as follows:

  • Type about:config into your firefox awesome bar, to bring up your settings
  • Find the setting network.http.sendRefererHeader. This is probably set to 2.
  • Choose one of the following values:
    • 0: Completely disables the referer header (mischa’s setting)
    • 1: Sends a referer header when following a link to another page, but not when loading images on the page
    • 2: Always sends the referer header (default)

I am going to experiment with setting it to 0, disabling the referer header all the time, I will post back here to say if it causes me any problems.
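
If you would rather not click through about:config on every machine, the same preference can be kept in a user.js file in your Firefox profile directory, which Firefox reads at startup. The sketch below is only an illustration: set_referer_pref is a hypothetical helper of my own naming, and a temporary directory stands in for the real profile directory, whose location varies between systems.

```shell
#!/bin/sh
set -e

# Hypothetical helper: append the Referer preference to a profile's user.js.
#   $1 = profile directory, $2 = value (0, 1 or 2 as listed above)
set_referer_pref() {
  printf 'user_pref("network.http.sendRefererHeader", %s);\n' "$2" >> "$1/user.js"
}

# A temporary directory stands in for the real Firefox profile directory.
profile_dir=$(mktemp -d)
set_referer_pref "$profile_dir" 0

cat "$profile_dir/user.js"
```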

HTTPS: Making more use of SSL

October 26th, 2010 by Mischa

There has been a lot of talk about how more and more people are using their laptops on public wifi connections, and with the advent of the Firesheep plugin, there have been a number of scares around session hijacking, and unencrypted login details being sent through the ether.

As a result, I thought I would describe the steps I have taken in securing my Firefox instance on my laptop. These are :

  • Installing the HTTPS Everywhere plugin from the EFF, which attempts to select https if available when accessing a site. I have tested it with Facebook, Google, Hotmail, LinkedIn and a few other sites
  • I have set my homepage to be encrypted.google.com
  • I have changed the search engine in top right hand of my Firefox instance to use the encrypted google service, by installing their plugin
  • I have set a master password on my Firefox keychain, which gives my stored passwords some level of protection
  • And I run Adblocking software, (with a custom Facebook Like Button blocking extension) as per an earlier blog post

Furthermore, I use Firefox as my main browser. I have chrome installed, but I hardly ever use it, and I have a locked down, stateless Safari instance which I wrote about earlier.

NQUADS -> TRIG … A Noddy Perl Script

October 22nd, 2010 by Mischa

I keep coming across nquads files, and libraptor doesn’t support this serialisation; it only supports the quad-based TriG format. I knocked together a dirty perl file which will parse an nquads file to TriG if ever need be.

You can find the perl file on my site :

http://mmt.me.uk/examples/perl/nquads_to_trig.pl


#!/usr/bin/perl
# Converts an NQuads file to TriG by grouping triples under their graph URI.
use strict;
use warnings;

if (scalar(@ARGV) != 1) {
  print "NQuads importer. ./nquads_importer nquads_file\n";
  exit();
}

my $input_filename = $ARGV[0];

open (INPUT, "<", $input_filename) || die "Input file does not exist\n";
open (OUTPUT, ">", "$input_filename.trig") || die "Failed to open output file\n";

my $count = 0;
my %quads = ();

while (my $line = <INPUT>) {
  # $1 is the triple (subject, predicate, object), $2 is the trailing graph URI
  if ($line =~ m/^(.*?)(<[^>]+>?)\s*\.$/) {
   my $model_uri = $2;
   my $triple = $1.".\n";

   # Collect all of the triples which belong to the same graph
   $quads{$model_uri} .= $triple;

  } else {
   print STDERR "boo this nquad doesn't pass regex\n$line\n*************\n";
  }
  $count++;
}

# Write out one TriG graph block per graph URI
foreach my $graph (keys %quads) {
  print OUTPUT "$graph { ".$quads{$graph}." }\n";
}

print "Finished\n";
close(INPUT);
close(OUTPUT);

# vi:set ts=8 sts=4 sw=4 et:

Facebook and their Horrible “OPT-OUT” Policy

August 20th, 2010 by Mischa

So Facebook announced their new Facebook Places functionality a couple of days ago, the service seems well implemented, and following the uptake of 4square, probably a timely service for fb – good luck to them.

What I am most disappointed about (**rant) is the way that Facebook seem to think that an “OPT-OUT” policy is the right way to go about landing new functionality on their users. By default, Facebook allows your friends to log your geolocation at a given point in time. And this is simply NOT ACCEPTABLE. As far as I am aware (and please do let me know if I am wrong), none of the other popular geo-logging services allow for other people to log your location at a given point in time. I see this as a massive invasion of your privacy, as have others, as discussed in the following CNET article:

Shots already fired over Facebook Places privacy

An OPT-OUT policy for services which compromise your privacy and your personal information is simply NOT acceptable, and DRACONIAN. I mean, Facebook DID NOT even attempt to inform me that friends of mine can geolog my location at any given point in time. I mean, what is stopping a friend of mine, who is hanging out in a brothel, from geologging me and defaming my character, by suggesting that I too was at the same place as him?

I noticed this yesterday, and then I got round to tweeting it, and had a lot of people thanking me for informing them of this change of service. So, I thought I would expand what is going on in a bit more detail. If you would like a more verbose write up on how to disable this new “feature”, visit the Garlik blog article:


Garlik Blog: Disabling Facebook Places

As far as I am aware there have been no recent changes to Facebook’s privacy policy or their terms of service, as illustrated on the awesome Terms of Service Tracking site. From my point of view, Facebook should inform their users about new functionality, especially new functionality which by definition shares your geolocation information both with people within Facebook, and with the Skyhook geolocation gazetteer.