## Cosine Similarity for Dummies

Have you ever wondered how recommender systems work? Or how Spotify or Amazon recommends songs you might like or products you might want to buy? I have. In this post, I’m going to try to explain how a recommendation algorithm works. First, let’s create a perfect scenario. I like to start with an ideal example because it’s easier to understand.

Let’s say you have some very simple data, collected from your site, about the movies your users like, and you would like to match those people together based on their interests. How would you do that? One of the most popular methods is Cosine Similarity. When I first saw the name I was so confused: why cosine? I remembered my teacher telling me about trigonometry when I was a kid, so what does it have to do with that?

Here’s the sample data.

User 1 likes these movies

User 2 likes these movies

Even without any algorithm we can say that the two users like the same movies. But we want the algorithm to tell us that the two users are very similar. Before we get into the world of mathematical formulas, we have to understand what a vector is.

# What’s a vector?

In physics, a vector has two things: magnitude and direction, which can be written as

I’d like to explain what a vector is, but this site explains it a lot better.

However, in computer science, a 1-dimensional array is called a vector. A plain Python list cannot perform vector operations, so we have to use numpy or build our own, which I don’t recommend.

Now we know what a vector is, but how does it relate to Cosine Similarity? In a nutshell, Cosine Similarity is a measure that calculates the cosine of the angle between two vectors.

# Cosine Similarity

In order to find the cosine of the angle between the two vectors, we divide their dot product by the product of their magnitudes, as in the formula below.

\begin{align} \text{cosine-similarity}(A,B) = \frac{\left<A,B\right>}{||A||\cdot||B||} \end{align}

# Show me the code

Ok. enough about explanation, show me the code.
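The original snippet isn’t shown here, so this is only a minimal sketch of the formula above. The movie data is made up (1 means the user likes the movie), and I’m using plain Python so the arithmetic is visible; for real data, numpy’s np.dot and np.linalg.norm do the same job.

```python
import math

# Hypothetical data: each position is one of four made-up movies,
# 1 means the user likes it and 0 means they don't.
user_1 = [1, 1, 1, 1]
user_2 = [1, 1, 1, 1]


def dot(a, b):
    """Dot product <A, B>: multiply element by element and sum."""
    return sum(x * y for x, y in zip(a, b))


def magnitude(a):
    """Magnitude ||A||: square root of the vector dotted with itself."""
    return math.sqrt(dot(a, a))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both must be non-zero)."""
    return dot(a, b) / (magnitude(a) * magnitude(b))


similarity = cosine_similarity(user_1, user_2)
print(similarity)
```

A value close to 1 means the two vectors point in almost the same direction, and a value close to 0 means the users have almost nothing in common.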

In the perfect example, we can see that the two users have the same interests.

Oct 29th, 2015

## Python's Monkey Patch for Dummies

Alright, I’m going to cut to the chase here. I’m having problems with Monkey patching in Python and I want to make it clear for myself and anybody who might stumble upon my post in the future. So, what’s the big deal here?

Let’s say you have a model

And you have a Phonebook class that’s trying to access the database

Now, we know that get_name is accessing some database and we don’t want that to happen in a unit test. We would like to stub that.

Coming from Java, I’d write my test like this.

It makes sense, right? I want to stub something from models.person.get_name, so I’m telling mock to stub that function, but my test failed miserably.

Why? Because patch behaves differently from what we expected. This is explained in Where to patch. I’m going to summarize it for you. Basically, patch has to be applied where the object is looked up… After reading that I was still confused. I might be the only one who’s confused here, so I’m going to continue writing.

If we take a closer look at how import behaves in Python, it becomes clearer.

The line says: please import get_name into the namespace of models/phonebook.py. So, when we want to use it we can just call get_name() without having to write models.person.get_name(). Now if you change your code to be

Your test would pass, because now our Phonebook looks up get_name through the models.person namespace instead of having the function get_name imported into its own namespace.

Now if you want the old test to work, your patch has to be changed to

That’s it for now. If you’re wondering why this is the case, looking at the source code of patch helps a lot. It uses the __import__ function.
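To see the whole thing in one runnable file, here’s a rough sketch. The two modules are built in memory and are only stand-ins for models/person.py and models/phonebook.py; the names person, phonebook and lookup are made up for the illustration.

```python
import sys
import types
from unittest import mock

# Hypothetical stand-in for models/person.py.
person = types.ModuleType("person")
exec("def get_name():\n    return 'name from the database'\n", person.__dict__)
sys.modules["person"] = person

# Hypothetical stand-in for models/phonebook.py. The from-import copies
# get_name into phonebook's own namespace.
phonebook = types.ModuleType("phonebook")
sys.modules["phonebook"] = phonebook
exec(
    "from person import get_name\n"
    "def lookup():\n"
    "    return get_name()\n",
    phonebook.__dict__,
)

# Patching the module that defines get_name leaves phonebook's copy alone.
with mock.patch("person.get_name", return_value="stub"):
    patched_at_source = phonebook.lookup()

# Patching the name where it is looked up is what lookup() actually sees.
with mock.patch("phonebook.get_name", return_value="stub"):
    patched_at_use = phonebook.lookup()

print(patched_at_source)  # name from the database
print(patched_at_use)     # stub
```

The first patch swaps the attribute on person, but lookup() resolves get_name in phonebook’s namespace, which still holds the original function.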

Oct 25th, 2015

## Why You Should or Should Not SSL Your Blog

After I switched to Octopress, I knew that I wanted to write about performance and SSL. Those are the main reasons why I switched.

Last year, Google announced that they will include HTTPS as a signal in their ranking. So, if you want to be the cool kid, go and SSL your site now. But what does SSL really do to your site? Have you seen that in action? I only knew about it from reading blog posts. In this post, I’ll show you what SSL does to your site.

Thanks to my friend Suksant, who helped me conduct the test.

## What will you need?

1. Wireshark, a network protocol analyzer.

## Simple website.

I’ve created a simple site with a fake login form. You can go ahead and deploy it to your own Heroku account. I chose Heroku as the platform because you can try the site both with and without SSL.

There are a couple of things you need to do before you can capture the password.

1. Open Wireshark, go to Capture -> Interfaces and select en0, which should be your Wi-Fi connection.

Then click ‘Start’ to capture the packets.

1. In the Filter section, put this: frame contains topsecret (that’s going to be part of your password).

1. I deployed the application here. Go ahead and enter “username” as the username and “topsecretpassword” as the password. It could be anything. Make sure the URL is not using SSL.

Without even trying to do anything hard, you can clearly see the password.

## Now with SSL.

1. Change your URL to be https:

## What gives?

In conclusion, what have we learned here? SSL encrypts everything being sent to the server. It’s safer and makes the site more trustworthy. However, if you’re just running a blog you probably won’t need SSL. If you have a website that captures anything from the user, then it’s a big ‘YES’: you need SSL. For me, I just want to be a cool kid so I SSLed my site.

Oct 9th, 2015

## Why Algorithm Matters?

If you have been to one of those technical interviews, you will like this.

I’m not going to rant about how broken technical interviews are. There are enough people who are more qualified than me to rant about this.

So, why does this matter? I just want to give a quick example of why algorithms matter. Please, no haters! I know some of you might read this and say “I do that all the time, what’s the big deal”. I’m still a bad developer and I’m still learning.

## Scenario

You are a general, your home country is at war and you have to fight for your country. You are given a group of soldiers. You want to come up with a strategy to win the battle. Here’s the example of the soldiers.

You came out of a high-profile meeting and all the generals agreed that this formation would be best for fighting the enemy: infantry, machine-gun and rocket-man. How can I rearrange the soldiers quickly enough, given that we’re going to attack tomorrow? Simple: I go ahead and write the code.

First, let’s generate a bunch of soldiers

Then let’s rearrange them.
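The original code isn’t shown here, so this is only a sketch of what a straightforward first version could look like. The category names come from the formation above; generate_soldiers, rearrange_naive and the seed are names I made up for the example.

```python
import random

# Formation order agreed in the meeting.
CATEGORIES = ["infantry", "machine-gun", "rocket-man"]


def generate_soldiers(count, seed=0):
    """Create `count` soldiers, each represented by its category name."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [rng.choice(CATEGORIES) for _ in range(count)]


def rearrange_naive(soldiers, categories):
    """Scan the whole list once per category: n soldiers x m categories."""
    formation = []
    for category in categories:
        for soldier in soldiers:
            if soldier == category:
                formation.append(soldier)
    return formation


soldiers = generate_soldiers(10)
formation = rearrange_naive(soldiers, CATEGORIES)
print(formation)
```

Every category triggers a full pass over all the soldiers, which is where the cost comes from.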

It works, but you’re too late: you can’t form up the soldiers in time. If you take a closer look, this algorithm has O(nm)* time complexity, where n is the number of soldiers and m is the number of categories. If you have a million soldiers and a million categories, that is effectively O(n^2). How can we make this faster?

Here’s my second version. Hmm, rearranging into categories… a category is a bucket. How about using a map?
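Again, this is a sketch under my own assumptions rather than the original snippet: soldiers are plain category strings and rearrange_with_buckets is a made-up name. The idea is one pass to drop every soldier into its bucket, then read the buckets in formation order.

```python
from collections import defaultdict

# Formation order agreed in the meeting.
CATEGORIES = ["infantry", "machine-gun", "rocket-man"]


def rearrange_with_buckets(soldiers, categories):
    """Group soldiers by category in one pass, then join the buckets."""
    buckets = defaultdict(list)
    for soldier in soldiers:  # single pass over the n soldiers
        buckets[soldier].append(soldier)

    formation = []
    for category in categories:  # one bucket lookup per category
        formation.extend(buckets[category])
    return formation


sample = ["rocket-man", "infantry", "machine-gun", "infantry"]
print(rearrange_with_buckets(sample, CATEGORIES))
```

Each soldier is touched a constant number of times, so the work grows with n + m instead of n times m.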

This is the time difference of those two algorithms.

By just changing the data structure, you can see that the map version is almost twice as fast. I hope this demonstrates how choosing the right algorithm matters in your program.

Oct 8th, 2015

## Octopress on Crack

I believe you will find a ton of blogs writing about making Octopress faster. It seems that for nearly everyone who migrates their blog to Octopress, the first post is about the migration and the next is about performance tuning. I want to be the cool kid, so here you go.

Once you’re off WordPress there’s much more you can do about your performance. But before you start, you need baseline numbers.

## Before

Here’s my before performance from Google PageSpeed.

And from webpagetest.org

In a nutshell, my page starts to render in 2.4s and finishes in 4s. Yikes! I wouldn’t even want to wait for my site to load. Let’s see where the lowest-hanging fruit is.

A picture is worth a thousand words. Who doesn’t like images, right? I tend to have a lot of screenshots, they’re all PNGs, and they tend to be big. Since they’re all screenshots I don’t really care about losing some quality, so I converted them to JPG. If you’re using OSX, there’s a command that you can run.

Now what can we do to reduce the size? There are lots of tools out there that will compress an image for you if you just throw it in. I use compressor.io. It’s really good. Just try throwing your image in there and see what happens. Compressor.io reduced my images by up to 60%, and that saves a lot of bandwidth.

## Minify CSS/JS

Fortunately, I use Cloudflare and they have a feature that minifies CSS and JS for you. So, I just flipped the switch and it works like magic.

## Browser caching

Going back to what Google PageSpeed tells us again: it complains about Leverage browser caching. I’m lucky again, because in Cloudflare you can set the cache expiry to 30 days, and that seems to be OK for Google.

## Inline CSS

I use the Slash theme and when I looked at the CSS, it’s only 22K. So why not just inline it and save the browser one more request? In _includes/head.html you can change the line that includes your stylesheet to be

This might be different from theme to theme but it shouldn’t be difficult to find that out.

## After

Now, let’s have a quick look at the after performance.

Here’s my after performance from Google PageSpeed.

It’s a lot better now, but there’s one tiny problem with mobile, which I’ll cover in another blog post.

Let’s have a look at webpagetest now.

A lot better! It’s still not the best, but now my page starts to render within 0.5s and finishes in 1.9s. However, there’s still a lot of work to do on mobile.

I could shed about 200ms by not using SSL, but I’m willing to trade that for a little bit of security.

## What’s next

I’d like to try hosting the images on S3 to see if that would speed up the load time. And I want to make the images responsive for mobile.

Oct 7th, 2015

## Migrated From WordPress to Octopress

I have heard so much about Octopress but never had the time to actually get to know what it can offer. This week I finally made the time to migrate my blog to Octopress and I love it. Thanks to HabitRPG and Pomodoro (these will be another blog post).

Don’t get me wrong, I still love WordPress. In fact, I’m actually a WordPress developer myself. However, for the past 9 years that I have had noppanit.com I haven’t really used any of the WordPress features. What I really need is just a blog that I can write and publish. Octopress serves me well.

I’d like to thank Scott Muc that inspired me for this migration.

Here are some of the reasons why I migrated my blog to Octopress.

## Hosting

Having my own hosting is nice, but it would be nicer if I didn’t have to maintain it, especially when I have to contact customer service. Octopress gives you various choices for where to deploy your blog. I chose to deploy mine to Github as it’s the fastest and easiest option. I don’t need anything else, just github.com, which I use every day, and a couple of commands. Now, I don’t have to worry about upgrading my plugins or WordPress anymore. To be fair, my hosting would usually do it for me if I hadn’t updated for a long time.

## Writing code snippet

I think this is personal, because any good WordPress developer might argue that you can do the same in the WordPress editor, which I agree with. However, having written a lot of Markdown on Github, I’m just used to writing code blocks like this.

WordPress gets on my nerves every time my blog crashes and all the indentation of my code blocks is lost.

## Performance

There’s nothing faster than plain HTML. Well, that depends on your server. After deploying to Github pages, all my posts are backed up and secured in Github. After that, it’s up to Github how fast they can deliver my HTML to the browser, which I think is pretty fast.

One can argue that with WordPress properly tuned you can get the same performance out of it as well. However, I just believe that Octopress is easier to make changes to. Another reason is how limited your hosting can be: some hosts restrict how much you can do with the web server unless you have a dedicated server.

Here’s a performance breakdown, before any tuning.

### Before

noppanit.com performance

### After

noppanit.github.io performance

The numbers don’t lie. After I switched to Octopress, my website renders in 1.392s. That’s without doing anything else. I’m going to write another blog post about how I tuned my blog to get the most out of it. WordPress is a great framework, but when it comes to performance tuning, it requires a lot of hacking.

I have attached the link to webpagetest as well in case anybody is interested in more details.

## No Database

Everybody knows that maintaining a database is a headache. Every time you need to upgrade WordPress you need to back up the database just in case something goes wrong, and when you want to transfer the site you need some hacks to ensure that the data is transferred properly. However, one might argue that not having a database in Octopress can make the next migration a problem. If in the next year or two another blog framework comes along, how will we move the data? I guess that’s the same question for WordPress as well. Personally, when the time comes I think somebody will create a plugin to transfer it. Open source is the best.

## Problems

Here are some of the problems I encountered during the migration. All of them were minor and didn’t take much time to resolve.

1. Code snippets weren’t converted correctly. Some of the <code> blocks got transferred during the export as pure HTML. So, I went through all my posts and converted them manually to Markdown format. It’s a plus for me because I took the opportunity to clean up old posts.
2. One of my posts was encoded in UTF-8. I got lucky because it’s just one post. So, I didn’t have any problem with it.
3. Disqus comments weren’t visible. This is because comments: true wasn’t added during the export, so I had to add it manually. This is my command to get it done.

You need to run this in your posts directory. It will add comments: true after the title
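The original command isn’t shown here, so as a rough equivalent, here is a small Python sketch that does the same thing. It assumes each post starts with YAML front matter containing a title: line; the function names and the *.markdown pattern are my own guesses, so adjust them to your posts.

```python
from pathlib import Path


def add_comments_flag(post_text):
    """Insert 'comments: true' right after the title line of a post."""
    lines = post_text.splitlines(keepends=True)

    # Leave the post untouched if the front matter already has the flag.
    for index, line in enumerate(lines):
        if index > 0 and line.strip() == "---":
            break  # end of the front matter
        if line.startswith("comments:"):
            return post_text

    for index, line in enumerate(lines):
        if line.startswith("title:"):
            lines.insert(index + 1, "comments: true\n")
            return "".join(lines)
    return post_text  # no title line found, nothing to do


def add_flag_to_posts(posts_dir, pattern="*.markdown"):
    """Rewrite every matching post in posts_dir (pattern is an assumption)."""
    for path in Path(posts_dir).glob(pattern):
        original = path.read_text(encoding="utf-8")
        path.write_text(add_comments_flag(original), encoding="utf-8")


sample_post = "---\nlayout: post\ntitle: Hello\ndate: 2015-10-02\n---\nBody\n"
print(add_comments_flag(sample_post))
```

Calling add_flag_to_posts(".") from the posts directory would update the files in place, so it’s worth committing first.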

I got really lucky that I didn’t have many issues during the migration.

## What tool did I use

1. wordpress-to-jekyll-exporter This is an awesome tool. One click and I get everything including all the pictures I uploaded.
2. Cloudflare This is for my SSL.

## SSL

This is just me. You can do the same for WordPress as well. I think every blog I came across about migrating to Octopress on Github pages mentioned adding SSL to their website. Cloudflare is not the perfect solution, but it’s the best free option I can find on the Internet. If Github decides to support full SSL later, I’d be happy to switch.

## It’s not for everyone

Octopress claims to be a blogging framework for hackers and it’s true. You can’t just give Octopress to non-technical people and expect them to be fluent. I think that’s what WordPress is good for. You need to be familiar with git and the command line to get it set up and deployed.

Oct 2nd, 2015

## TL;DR

If you don’t know Code Review Stack Exchange, you gotta get on it now. It works and it’s awesome.

## Longer version

In the Agile world, everybody is talking about failing fast and faster feedback. If you’re a serious coder then code review and feedback are just as important. If you’re like me (maybe I’m alone), you used to hate code review because you don’t want someone to criticize your work; coding is like an art. You spend a lot of time on a piece of code and somebody just comes in and says it sucks, or asks why you would do something like that. Not everybody is a great coder like Linus Torvalds or Ryan Dahl. Especially me. Because of rigorous code review I went from the worst coder ever to a bad coder (I’m still bad and I’m still learning). I believe the best way to learn any skill is to first be wrong and then learn from your mistakes. The quickest way to do that is to code a lot and get your code reviewed by someone who’s a stronger coder than you, and you will find a lot of them on the Internet. If you work with some of the best programmers in your life then you’re lucky. But not everybody is that fortunate. I’ve found a better solution: http://codereview.stackexchange.com/

## Let’s get to the meat

For example, I wrote this piece of code to learn about dynamic programming.

I know that my code is not bad (or maybe it’s really bad), but let’s see what I got out of this thread. Within a day I had at least 2-3 points to make my code a lot better. Not only did I learn more about Python, I also learned how to optimize the script.

If you want to learn a new language fast, you need to learn from the experts. In just a few days I improved my code substantially. The best thing about the Internet is that someone will review the reviewer as well. So, you can be sure to some extent that the one who answered your question is trustworthy.

Now you might wonder, how about http://stackoverflow.com/? I use Stackoverflow too, but Stackoverflow has its own purpose. If you’re stuck on some problem and can’t really wrap your head around it, posting to Stackoverflow will give you an answer in no time. However, if you ask somebody on Stackoverflow to review your code, it’s likely that your thread will be voted down or closed.

## What if you only have an idea?

I always have new ideas, and I want experts to review or confirm them so I don’t waste time building something that somebody has already done. Stackoverflow or codereview.stackexchange.com is not really the place for that. That’s what I love about Quora.com.

For example, I wanted to understand more about machine learning and I couldn’t really ask on stats.stackexchange.com. That’s why I turned to Quora. For example, in this thread I got a really detailed response from someone I can trust because of the number of upvotes.

## Some note

I have one bad habit of just copying and pasting someone else’s code. Once you receive feedback, don’t just copy and paste it. And if you get feedback on Github, don’t just merge it right away. You will not fully understand the concept behind it. In a book I read recently, A Mind For Numbers, Barbara teaches how we can master math and science, and she explains that we have two modes of learning: focused mode and diffuse mode. If you just copy and paste someone else’s work, you only imagine that you understand what’s going on. The best way to learn is to actually do it, then go into diffuse mode, which is just closing your eyes and thinking about it, and then go back to focused mode and do it again. I made a big mistake as an engineer (I still do): I just copy someone else’s code and think that I understand it fully, which I don’t. So, don’t just copy the feedback and commit it. Make sure you fully understand the code and actually type it out. Barbara also suggests that writing by hand is better than typing, but I think that might be difficult for us programmers.

Sep 22nd, 2015

## Maps All Parking Signs in NYC.

This post might be too easy for any data-viz people, but as a beginner in this area it took me quite a long time to figure out, so I just want to share it, hoping that it might save people some time.

Parking in NYC is really a pain, especially street parking, where there are lots of signs and regulations. There are some apps on Android and iPhone that you can download to see the signs, but I haven’t been able to find one that suits my needs. I want an application that can tell me where to circle around, and on which day and time, to find a parking spot. For example, say I parked my car on a Tuesday on a street that has street cleaning on Wednesday and Friday from 8:30 to 9:30. That means I have to move the car on Wednesday morning to find a spot for Friday. The problem is I don’t know what’s around me. The closest application I found is http://www.nycparklife.com/streetparker/, which doesn’t cover Manhattan.

## Getting the data

So, I googled, and thanks to NYC.gov for providing the data. First, I played around with the CSV files, but they don’t have coordinates that I can use to place locations on the map. And I had no experience dealing with shapefiles.

## Extracting the data

After a few hours of mangling and munging the CSV with Pandas and R, I still could not get the exact locations of all the signs. So I turned to the shapefile, hoping that I might get lucky. I found http://www.shpescape.com/, which promised to transform a shapefile into a Google Fusion Table. Awesome! I went ahead and tried it. It works great, but it only gives you the first 100,000 rows. So I dug deeper. After another few hours of googling I found QGIS, an open-source project that you can use to open and view shapefiles.

Here’s the example.

One thing I learnt was that you cannot just click Open and choose the file. What you need to do is **Add Vector Layer**. You can use the shortcut **Ctrl+Shift+V**.

Now what we need is to import what we have into Google Fusion Tables. How are we going to turn this beautiful layer into Google Maps? We need CSV…

In QGIS you can save the layer as CSV. It’s in Layer > Save As. You just need to make sure that Geometry is set so you get the coordinates.

## Import to Google Fusion Table

Now we can import that CSV into Google Fusion Tables. It would look something like this.

You need to make sure that you specify which fields are Lat and Long so Google can plot them for you.

Here’s how you do it.

Then click change. You will see something like this, and you can choose which field you want to be Longitude or Latitude.

Once the coordinates have been set, we can go to the Map tab and see beautiful little dots that show where all the signs are in NYC.

Voilà! Now you have something you can build an application on top of. In the next post, I will create an application on top of this map to make use of our data.

## Things I have tried and failed

I’ve tried using Proj4 on both R and Python to convert X,Y WGS84 to Lat, Long. Here’s my little snippet.

The result is not quite accurate, which I think is because I need to find the correct proj4string.

Python has a similar wrapper, which is pretty much what I want as well.

Jul 30th, 2015

## I’m Never Going Back to Modern Editor Again.

Ok, the title might be a little bit of an exaggeration. But let me clear this up first. I still use modern editors: TextMate, Atom.io or IntelliJ. There are pros and cons. But here’s my real answer if people ask me: I want to be the cool kid!

I don’t know if you’re like me, but almost every time I go to a meetup or conference I get asked, “What’s your favourite editor?” People almost always say Vim or Emacs. I tried both of them in the past and gave up more times than I can remember, because of all the shortcuts and plugins; I just got too frustrated to use either of them. Finally, I decided that I was going to stick with Vim for a month and see whether at the end of the month I still couldn’t use it. It’s been almost a month and I’d say I’m never going back to modern editors again (at least when I’m coding in scripting languages). Here’s why.

### It’s just easier

I know it’s not really easier than TextMate or Sublime. You still have to learn a lot of shortcuts and commands. Also, bare-bones Vim just doesn’t have what you get in TextMate or Sublime (e.g., find file, Command+T, directory structure or code completion). However, after you power through that learning phase everything just feels natural.

If you want to clear any trailing whitespace you just have to type :%s/\s\+$// and hit enter. If you want to run an external command you can always do it from Vim. For example, if you just want git status you can install a Git wrapper or you can just run :!git status. You don’t even need to go to your terminal, which I think is faster.

If you’re a good engineer, you will always do TDD. Going back and forth between the editor and the terminal is just too annoying. I remember when I was coding Java I could just hit Cmd+T in IntelliJ or .NET and see green bars. Isn’t that just awesome? I realised that TextMate and Sublime have plugins to do that as well, or you can just write your own plugin or package. I tried a couple of plugins and they never worked for me. In Vim, it just works!

### Community

If Vim or Emacs doesn’t do what you want, you will almost always find a plugin that does it for you. Currently, my standard plugins are NERDTree, CommandT, vim-fugitive, vim-rooter and vim-virtualenv. All these plugins make my Vim function the same as Atom.io or Sublime.

### Not for everyone

I have to admit that for the first few weeks I got really frustrated. I almost banged my keyboard against the monitor. I found myself opening Atom.io every time I couldn’t do some basic editing in Vim. For example, using vimgrep wasn’t what I expected and I ended up googling a lot. Now I’ve got the hang of it and I don’t use Atom anymore. In the worst case I just use the command line instead.

Jul 17th, 2015

## How to Start Doing TDD for jQuery Plugin.

I’m a big fan of TDD. I get nervous every time I put some code in without having tests. I’m developing a simple jQuery plugin and I thought, hey, we can TDD this.

## What is it?

The plugin is really simple. It turns a ul tag into a taggable field. It’s similar to tag-it but with a lot less functionality, and it doesn’t depend on jquery-ui.

## What you need

I decided to use Karma because I’m going to test a lot of behaviours and Karma seems like a good fit as it runs in a real browser. Here’s how I set up my project.

I chose jasmine-jquery because it makes it easier to create elements to test and it’s easy to set up.

This is my gulpfile.js

This is my karma.conf.js

Here’s my first test

You will see that the test failed. Now we implement some code.

Now the test passed.

Now let’s add some event so when you hit enter the tag is added. So, I added one more test

Now the test failed.

I’ll fix the test by doing this.

Now I want to add some negative test case.

Oops the test failed, looks like I missed something

I will fix the test by

That’s it. I hope you enjoy and love TDD more. And here’s the github repo

Jul 9th, 2015