In Valid Logic

Endlessly expanding technology

One of these things is not like the other, except to SQL

without comments

I was recently working on an issue so perplexing that I almost had David Penton, our resident SQL expert, stumped.

We had an issue with some username handling, and it boiled down to this: we could go into SQL Server, run this query, and get a result when you’d normally expect it not to match:

select 1 where N'ß' = N'ss'

After that, I was truly stumped. I’d Googled everything I could think of and came up empty. Had I known what the German sharp S was, life would have been easier. Come to find out through some sources that some of the SQL standards (in this case, SQL92 is referenced) call for the German sharp S to be translated down to “ss” in string comparisons. So although it might seem like a bug, it is to-spec. Oracle is also reported as working this way.

One option is a workaround function for SQL Server, which essentially has you convert the string to a binary column and store it. This may be outside what you can do in your application, though; in ours, we’d need to update every place that does string comparisons on the username, as well as alter the ASP.NET Membership tables and stored procedures.

An alternative is to change your database collation to a binary one such as Latin1_General_BIN. This will treat ß and ‘ss’ as different strings, though be aware that since it is binary, it is also case sensitive, and some behaviors such as sorting may change.
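The difference between the two kinds of comparison can be sketched outside the database. This is only an analogy in Ruby (2.4 or later), not SQL Server's actual collation algorithm, and the helper names folded_equal? and binary_equal? are made up for illustration. Folding both strings before comparing behaves like a linguistic collation; comparing raw bytes behaves like a binary one.

```ruby
SHARP_S = "\u00DF" # the German sharp S, written as an escape to avoid encoding surprises

# Linguistic-style comparison: apply Unicode case folding to both sides first.
# Full case folding maps the sharp S to "ss".
def folded_equal?(a, b)
  a.downcase(:fold) == b.downcase(:fold)
end

# Binary-style comparison: the strings match only if their bytes match.
def binary_equal?(a, b)
  a.bytes == b.bytes
end

puts folded_equal?(SHARP_S, "ss")    # sharp S and "ss" are treated as the same
puts binary_equal?(SHARP_S, "ss")    # different bytes, so no match
puts folded_equal?("Admin", "admin") # case is ignored
puts binary_equal?("Admin", "admin") # case matters in a byte-wise comparison
```

The last line is the trade-off to watch for with usernames: a byte-wise comparison separates ß from "ss", but it also separates "Admin" from "admin".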

A topic like this is important for any developer for a few reasons:

First, being in the US, it is easy to be ignorant of other cultures’ language handling. You may work to add resource labels so your application can be translated and such, but every so often you will find behaviors related to internationalization that perplex you because you’ve never heard of the German sharp S (as in this case).

Second, when it’s an issue in SQL and you’re in doubt, look at collation first. SQL Server has a huge list of collations, and each one does something different. It’s either a behavior of the one you’re using, or the user is using a collation you haven’t tried (tested) before.

And finally… need to try harder to stump Penton. It can be done!

Written by krobertson

February 8th, 2010 at 10:56 am

Posted in Code

Hiring in today’s market

without comments

Today’s job market is crazy. Earlier I was looking over the new topics on Hacker News for the “Ask HN” threads and was surprised to find three “who’s hiring” ones in the past 7 hours.

Today’s market is super competitive, especially for entry level jobs. My sister graduated with a degree in Journalism and Graphic Design from CSU Fullerton, which has a strong communications program and is well respected. She was also a contributing author to the school newspaper, interned at Us Weekly in Los Angeles for two years, and interned for Us Weekly in New York over the summer.

She graduated in May, but only just now was able to get a job in her field. That is nuts. Granted, over the summer she was doing an internship in New York and expected to stay there, but later decided to come back to California. Still, she was actively looking for a job from August to the end of January. For the job she finally got, she beat out 250 other people who’d submitted resumes. She was also in consideration for another job, where she was up against even more people and it came down to her and one other person.

The market today is a stark contrast from when I was in college and all my professors and advisors were touting how easy it would be to get a job making $60k. I would be scared to be graduating in today’s market.

Today, a degree is just a piece of paper. Everyone else who is graduating with you has the same piece of paper. To beat them out, it’s who you know and the experience you have. I’m by no means saying college is worthless, but rather that it isn’t everything. I’m a college dropout… I left college to work at Telligent. I’ve been lucky enough to have it work out, and now my work experience would be more relevant. It’s no longer enough to just go to college.

Building on James’s comment in my last post, a web developer today needs Google juice behind them. A degree is great, but so is a blog with relevant topics, involvement in projects, a Github profile, and more.

Written by krobertson

February 5th, 2010 at 12:29 pm

Posted in Life

Being professional and managing your identity

with 2 comments

This new podcast has been causing quite the discussion on Twitter this morning, and I thought I’d take a moment to share a few thoughts. The podcast is this one, and you might want to listen to the first couple of minutes, but it will truly be a waste of your time and will leave you feeling dirty about some of the other people in your profession. Sexist, racist, homophobic: they run the full gamut. One person was brave enough to post on forums.asp.net “announcing” the podcast with his normal user account, but it got deleted as the podcast is quite inappropriate. He had another post looking for contributors, but that whole thing is beside my point. I won’t link to him, to spare him additional Google hits that would damage what remains of his credibility. And I don’t necessarily want to talk about the podcast.

I want to talk about professionalism and your brand.

It is kind of ironic that the podcast talks about how you should always be professional in your code comments while producing what they did (think beyond just code comments). In this day and age, the internet is pervasive and data is everywhere. Google crawls everything, and it remembers everything. The post looking for contributors talks about other podcasts being “careful about how they talk”. When it’s online, it isn’t about being careful how you talk, or even being professional. It is about managing your identity.

The days of “dress to impress” are over. You are no longer judged just by how you physically present yourself; you are your Google results. You could go into a job interview dressed to a T, well presented and the pillar of what they are looking for. But it’s becoming more common for them to Google your name and find out who you really are.

What you do online stays with you forever. You don’t need to be prim and proper by any means. Just use common sense. You are managing your brand: what your coworkers see, your family, your future employers. You can be bold and opinionated, you can still speak your mind, but do it intelligently. Something like that single podcast can ignite a fire of comments and posts, links, and name mentions, and then when someone Googles your name, they’ll find your drunk rants. And it can be hard to manage your brand once it has taken off like that.

A while ago, I’d found this presentation through Twitter about “Evangelizing Yourself” by Whitney Hess. It is chock full of wisdom. Listen to it (the play button has the audio, I missed that at first). It is all about what I’m trying to say and says it much better.

Written by krobertson

February 5th, 2010 at 10:55 am

Posted in Uncategorized

Named many-to-many relationships in DataMapper

without comments

When implementing the collaborators feature in Trunks, I ran into a bit of a roadblock with how to describe the relationships between the repository and the users.

Trunks is built on Merb and DataMapper, and the implementation involved two models: a User and a Project.  In Trunks, a repository is a Project because “repository” is a reserved attribute in DataMapper and I was running into issues early on, so I changed it to Project.

A project belongs to a user, and a user has many projects.  A user can also be a collaborator on many projects, and a project can have many collaborators.  So I already had the first relationship which was a simple 1-to-many relationship.  The problem came about with adding collaborators.  In a sense, it is a normal many-to-many, which you can do in DataMapper similar to how you would in ActiveRecord using a “has_many …, :through => …” relationship, but I was creating a relationship between models that had an existing and separate relationship.

I didn’t want the user.projects collection to return the projects a user collaborated on, because it was intended to only be the user’s own projects.  And similarly, a project only has one owner, not many.

I essentially found I needed a “has many through” with named attributes.  To do this, I had to actually create the proxy class that goes between the user and the project model, rather than let DataMapper create it automatically.  That way, I can control the attribute names on the proxy class so that “users” and “projects” on the two models won’t get munged.

The result looked like this:

class User
  include DataMapper::Resource
  ...
  has n, :collaborations
  has n, :collab_projects, :model => 'Project', :child_key => [:id], :parent_key => [:user_id], :through => :collaborations
end

class Project
  include DataMapper::Resource
  ...
  has n, :collaborations
  has n, :collab_users, :model => 'User', :child_key => [:id], :parent_key => [:project_id], :through => :collaborations
end

class Collaboration
  include DataMapper::Resource

  property :id, Serial

  belongs_to :collab_user, :model => 'User', :child_key => [:user_id]
  belongs_to :collab_project, :model => 'Project', :child_key => [:project_id]
end

Having “collab_user” on the Collaboration class allowed the project model to get an attribute named “collab_users” instead of trying to use “users”. And on user, I could have “collab_projects” instead of mixing things up with the existing “projects” attribute.

When creating the relationships though, I had to be careful to spell out the actual parent and child fields rather than let them be automatic, otherwise it’ll run into issues where it’ll try to create ‘collab_user_id’ and ‘collab_project_id’ and inserts will fail because those columns aren’t being set.

Take the relationship definition on the user model:

	has n, :collab_projects, :model => 'Project', :child_key => [:id], :parent_key => [:user_id], :through => :collaborations

I am essentially saying: create an attribute named ‘collab_projects’ through the ‘collaborations’ relationship. ‘collab_projects’ will be of type Project. The child key on Collaboration is ‘user_id’ and the primary key on User is ‘id’. Here, the fields I specify are between User and Collaboration, not on Project.

Then on Collaboration, I have:

	belongs_to :collab_user, :model => 'User', :child_key => [:user_id]

Here I tell it that the attribute on Collaboration will be collab_user, but the model it is linking to is User. Basically I’m saying I belong to user, but don’t create an attribute named ‘user’. I set the child key, which is the column on Collaboration, to be ‘user_id’. I do that because otherwise it will try to create it as ‘collab_user_id’.

Confused enough? It all boils down to having two separate relationships between two models and having control over what the attributes are named. After all this, I end up with these calls:

  user.projects        # the user's own projects
  user.collab_projects # the projects the user is a collaborator on

  project.user         # the owner of the project
  project.collab_users # the users who are collaborators on the project
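To see the shape of the two relationships without DataMapper or a database in the way, here is a plain-Ruby stand-in. It is only a sketch: the classes below are not the real Trunks models, add_collaborator is a hypothetical helper, and the join object is a bare Struct rather than a DataMapper resource. The attribute names match the ones used above.

```ruby
# Join record: one per (user, project) collaboration.
Collaboration = Struct.new(:collab_user, :collab_project)

class User
  attr_reader :name, :projects, :collaborations

  def initialize(name)
    @name = name
    @projects = []       # projects this user owns (the 1-to-many side)
    @collaborations = [] # join records for projects owned by someone else
  end

  # Stand-in for `has n, :collab_projects, ..., :through => :collaborations`
  def collab_projects
    collaborations.map(&:collab_project)
  end
end

class Project
  attr_reader :name, :user, :collaborations

  def initialize(name, owner)
    @name = name
    @user = owner        # the single owner
    @collaborations = []
    owner.projects << self
  end

  # Stand-in for `has n, :collab_users, ..., :through => :collaborations`
  def collab_users
    collaborations.map(&:collab_user)
  end

  # Hypothetical helper: build the join record and register it on both sides.
  def add_collaborator(user)
    link = Collaboration.new(user, self)
    collaborations << link
    user.collaborations << link
    link
  end
end

alice = User.new("alice")
bob = User.new("bob")
trunks = Project.new("trunks", alice)
trunks.add_collaborator(bob)

p alice.projects.map(&:name)      # alice's own projects
p bob.projects.map(&:name)        # bob owns nothing
p bob.collab_projects.map(&:name) # but he collaborates on trunks
p trunks.collab_users.map(&:name) # collaborators, not including the owner
```

The point of keeping the join record explicit is the same as in the DataMapper version: ownership and collaboration stay in separate collections, so neither one leaks into the other.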

Written by krobertson

January 27th, 2010 at 5:59 pm

Posted in Code


Migrated to Wordpress

with 4 comments

Last night, I managed to migrate my blog from Graffiti CMS to Wordpress.  I’d been planning on migrating for a while now, but it was just a matter of finally deciding to sit down and do it.

Now, why did I migrate?  I know a few months ago, I caught some flak on Twitter for stating that I didn’t see what all the drama over Graffiti was about (this was pre-open sourcing).  And I still agree.  Graffiti is a great tool, and I liked it and enjoyed it; however, I ended up deciding to migrate for a few reasons.  Most notable is the plugin availability/community.  Wordpress has tons of plugins for different tasks, and Graffiti simply doesn’t compare.  The developer in me may say “hey, I can write my own” but the reality is that I don’t have the time to or really want to.  I wanted a drop dead easy way to include code in posts and didn’t want to fiddle with it.  I wanted easier media management, and didn’t want to come up with something new on my own.

I had tossed around the idea of writing my own blog, since I wanted it off of my servers, but other services like Tumblr didn’t have quite what I wanted.  I thought it’d be nice to write a simple one in Sinatra and use MongoDB (via MongoHQ), but it again came back to prioritizing my different projects, and I’ve called blogging apps the new Hello World.

In the move, I also wanted to bring over all the content from my original blog, qgyen.net.  Now I have basically merged the two blogs and everything from my original domain redirects here.  I gain more Google juice, and sadly, my old dead domain still had more subscribers than my new domain.  Yes, sad.

How was the process?  Not too bad.  For a good guide, I’d recommend Jef’s post.  I made a few changes though:

  1. I used the original VB.NET version of the GraffitiToBlogML tool (link).  I simply found it before the C# port.
  2. Both the VB.NET and C# versions have the potential of producing invalid XML.  They basically just write out XML directly rather than use the XML libraries within .NET.  One place it broke for me is where they produce the old post uri by basically taking the post title and replacing spaces with dashes; it doesn’t strip invalid characters.  I had some posts with quotations in the title (“).  This broke the XML.  I changed it to use category.LinkName and post.Name, since those are the url-able portions used directly by Graffiti itself.  I also had it not append .aspx, since Graffiti didn’t actually do that.
  3. BlogML seems to treat the post’s creation date as when it was published.  This can be an issue if you had migrated to Graffiti from something else, like Community Server.  I had 300 or so posts show up in December 2007 at first.  The migrated data had a creation date of December 2007, but Graffiti had a separate Published field that marked when it was published.  I changed the tool to handle that so I could migrate my data right.
  4. At least in the VB version, it was missing some null checks.  I ran into one with TagList being null instead of empty.
  5. I imported using the BlogML importer rather than the Movable Type one.  For a really simple walk through, see this guide.
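The invalid XML problem from item 2 is easy to reproduce. The sketch below is in Ruby rather than the VB.NET or C# of the actual tool, and the element name, attribute name, and helper functions are invented for illustration; it is not the GraffitiToBlogML code. It shows what happens when a title containing quotes is pasted straight into an attribute, and one way to avoid it.

```ruby
# The five characters that need escaping inside XML text and attributes.
XML_ESCAPES = {
  "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
  '"' => "&quot;", "'" => "&apos;"
}.freeze

def xml_escape(text)
  text.gsub(/[&<>"']/, XML_ESCAPES)
end

# Builds the attribute the way the post describes: swap spaces for dashes
# and write the result out directly.
def naive_post_element(title)
  slug = title.tr(" ", "-")
  %(<post url="/#{slug}" />)
end

# Same slug, but escaped before it is written into the attribute.
def escaped_post_element(title)
  slug = title.tr(" ", "-")
  %(<post url="/#{xml_escape(slug)}" />)
end

title = 'Why "simple" is hard'
puts naive_post_element(title)   # the quotes in the title end the attribute early
puts escaped_post_element(title) # the quotes are written as &quot; entities
```

Escaping by hand is shown here only to make the failure visible. Letting an XML library build the document, or using identifiers that are already URL-safe as I did with category.LinkName and post.Name, avoids the problem at the source.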

I will definitely miss Graffiti, since it was so drop dead simple to use.

Written by krobertson

January 27th, 2010 at 9:08 am

Posted in Life

Expressing your passion

without comments

How many times have you had a dialog like this: I should blog more. Well, what should I blog about? I don’t know, but I should blog more. Though I don’t want to sound boring, repetitive, or like an idiot.

There are probably two problems at work. First, overthinking a simple problem. Second, finding what you’re passionate about and how to best express it.

There are a number of top notch developers who hardly blog, if at all. But when they do, it is pure gold. They speak more with code than on a blog. You don’t need jaw dropping libraries or masterpieces, but just useful stuff someone else might want to use or read.

Want to find who the true leaders are? Find prominent people on Github and see what public projects they have. You’ll quickly find who is down in the trenches.

Having just gone to CodeMash 2010 last week, I am certainly reinvigorated, especially after Joe O’Brien’s talk on "Refactoring the Programmer". It was really spot on with so many aspects of the developer’s lifestyle. Some things I’ve taken away include:

  • Scale back Twitter. It is a time sink and I rarely learn anything in that time.
  • Blogs are great, but scale back what I read. All about finding the most value for the time. Ignore banter and marketing. Find others who show their passion in their writing.
  • Write more code. Practice makes perfect. Athletes aren’t just naturally talented, they practice and are learning continuously.
  • Read more code. Find good leaders, and learn what they’ve already learned.
  • Read more books. And not necessarily on coding, but on improving yourself overall. Happiness comes from improving overall, not just in your profession.
  • Go out on limbs. If you put something out there, ask people to check it out and give some honest feedback. If you have a question, seek out an expert and ask them. Be clear in what you’re asking, courteous of their time, and you’ll likely get your answer.
  • Reciprocate. Asking for help is a two way street, so you need to pay it forward. Forming connections goes a lot further than hoarding your time.

My passion is in doing. Nothing makes me happier than hacking away on some random idea until 1-2am. And it is contagious. It bleeds over into all that I do. I put out better work during the day. I’m glad to go to the grocery store before dinner. I’m happier paying my bills. I came out of CodeMash with a whole list of things to hack on and hopefully get even more to keep me invigorated until next year. And hopefully I can throw the ideas out there, get some feedback, and create some value for others.

Written by krobertson

January 19th, 2010 at 4:21 pm

Posted in Life

Branding Gone Wild

with 2 comments

Typically, branding is a good thing. It brings you recognition, people remember your product/company, and you hope all that translates into sales. But branding can also be overdone. Take, for example, Smart Assembly (or rather "{smartassembly}"), which was acquired by Red Gate back in September. Don’t get me wrong, Smart Assembly makes an awesome product, but the whole "{…}" thing has been overdone.

For example:

  1. All emails from them use "{smartassembly}". Even one hand typed support response did.
  2. The press release when Red Gate acquired them uses "{smartassembly}".
  3. The MSI you download is "{smartassembly}.Setup.msi"
  4. The default install location is "C:\Program Files\{smartassembly}"
  5. The executables are "{smartassembly}.exe" and "{smartassembly}.com" (.com is the command line runtime)
  6. And yes, the project files you save are ".{sa}proj".

The branding can also get in the way of using the product. Smart Assembly is an obfuscation tool, so it would seem reasonable to integrate it into your build process. NAnt is of course a popular tool for automated builds. NAnt scripts use "${…}" to denote variables. While this doesn’t create a conflict, in my opinion, it clouds your scripts with excess brackets and takes away from the experience. (Note: I originally said it conflicts with using NAnt. I was wrong, since I jumped ahead of myself, forgetting that it needs a ‘$’ too.)

Branding is cool and all, but when overuse gets in the way of your user experience, it’s gone a bit too far.

Written by krobertson

January 11th, 2010 at 1:10 pm

Posted in Software

New Papercut release!

with 3 comments

This afternoon, I finally zipped up and released a new version of Papercut. This release features:

  • Our first release to CodePlex! Yes, Papercut is now a CodePlex project, and CodePlex also hosts the source. Feel free to grab the source, contribute patches, and enjoy.
  • Fixes a long standing bug with the handling of quoted printable messages where stray equal signs (=) would show up within messages. I had fixed this bug a while ago, but was slow to actually package up a release.
  • The forward message dialog box now remembers the settings you enter. I had to use Papercut this afternoon, found myself wishing it remembered the settings, so I quickly opened the project and added it in.
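For anyone wondering where equal signs come from in a quoted-printable message: the encoding uses "=" as its escape character. A literal equal sign is written as "=3D", and a "=" at the very end of a line is a soft line break that a decoder should swallow. The sketch below uses Ruby's built-in quoted-printable decoding to show both cases. It does not reproduce the Papercut bug or its fix (Papercut is not written in Ruby, and the cause isn't described here), and the sample message is invented.

```ruby
# Decode a quoted-printable body using Ruby's built-in "M" unpack directive.
def decode_quoted_printable(text)
  text.unpack1("M")
end

# Invented sample: one escaped equal sign and one soft line break ("=\n").
raw = "Total =3D 42, and this sentence is long enough that the encoder wra=\npped it."

puts raw                          # what is on the wire
puts decode_quoted_printable(raw) # what the reader should see
```

A decoder that mishandles either case leaves "=" characters behind in the displayed text.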

So please head on over and download it now!

Written by krobertson

December 18th, 2009 at 4:32 pm

Posted in Software

Trunks: It’s alive!

without comments

Pleased to announce that Trunks is live! Head over to the new Trunks blog for the announcement.

We are now open to beta requests and will be sending invites out here and there. I really want to progressively increase the number of users, rather than unleashing hell on it.

I’ve had everything pretty much ready to deploy since Friday, but then decided to wait until Monday. Then, after doing the whole family pictures with Santa on Sunday night, I was too pooped to write up the blog post and make the final tweaks on the website. Plus I was going to a concert last night. Last night, I finally got the last two things done and everything was set for this morning.

So please head to the site, sign up to receive a beta invite, and enjoy!

Written by krobertson

November 17th, 2009 at 7:39 am

Posted in Code

Trunks: Pricing plans and features

with 5 comments

With every new online service, the question on everyone’s mind is always "what is it going to cost?" Or perhaps these days, "how much will I get for free?" The "freemium" model has become very prevalent in online services, where you get a basic set of services for free and then pay for the really good stuff.

Trunks is a bootstrapped service on a small budget. Trunks will run off hardware I already own, in datacenter space I already have. No cloud wizardry here. It is cheaper for me to shift around stuff I was already running to free up a few servers than to do anything else.

My goal with Trunks is to grow it organically (sweet, got in a buzzword). As it grows, I want to ensure it is covering the costs it generates. If it grows slowly, that is fine. That would actually be better. I’d love to see the hardware gradually grow in utilization rather than have explosive growth and be beyond capacity.

In my view, there are two problems with the freemium model that keep it from fitting my goals.

First, conversion ratios are unknown. At the time you launch your service, you have no idea what your conversion ratio will be, and thus your profitability and financial stability. Good ratios are often considered to be 4% or greater. Some companies get awesome conversion ratios. But it can go very bad too. You may end up with only 2%, 1%… or 0. With my budget, I don’t have the capital to risk on a bad conversion ratio.

Second, your paid users are carrying your free users. A conversion ratio of around 5% is often considered good, but with that you have one paid user covering 19 free users. The argument for freemium is that with most online services, the cost per user is so minimal that it doesn’t matter. But with hosted source control, I’m essentially selling storage, which has a much more noticeable cost. Storage is made up of capacity and throughput, and you can only have one or the other. Throughput is definitely the better one to favor, since otherwise you’ll get poor performance before you use all your capacity.
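The "one paid user covering 19 free users" figure is just the ratio of non-paying to paying users, and it gets worse quickly as the conversion ratio drops. A small sketch of the arithmetic (the function name is made up, and the conversion ratio is passed as a whole-number percentage to keep the math exact):

```ruby
# Free users carried by each paid user at a given conversion percentage.
# Out of every 100 users, `conversion_percent` pay and the rest do not.
def free_users_per_paid_user(conversion_percent)
  unless conversion_percent > 0 && conversion_percent <= 100
    raise ArgumentError, "conversion must be above 0% and at most 100%"
  end
  Rational(100 - conversion_percent, conversion_percent)
end

[5, 4, 2, 1].each do |percent|
  carried = free_users_per_paid_user(percent).to_f.round(1)
  puts "#{percent}% conversion: each paid user carries #{carried} free users"
end
```

At 0% there are no paid users to carry anyone, which is why the function refuses that input rather than dividing by zero.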

So where is all this leading?

Trunks will be a "premium" service. When you sign up, you will get a 30 day trial. At the end of the trial, you either upgrade to a paid plan or your repositories get locked. If your account gets locked, the repositories won’t be accessible, but will be kept around for a while in case you decide to come back.

The cost of a paid plan? The entry level plan will be $20-30/year for 250 MB of storage. Yes, you read that right if it sounds too low. Since my target audience is individual developers, I wanted to keep it to a reasonable price. Developers often love getting toys, but need to get spouse approval too. Since everyone using the service beyond 30 days is paying, users are paying for themselves and not for themselves and 19 others. I can have a lower price while still maintaining a decent profit margin. Additionally, I sell based on space. You get 250 MB, which will likely be enough for most. If you need more space, then that is when you upgrade.

Other features will include:

  • Unlimited repositories
  • Unlimited collaborators
  • SSL Encryption
  • Dual Remote Backups

With collaborators, you can give others commit access to one of your repositories. They don’t need a paid account either. After your trial period, if you don’t pay for an account, you will be classified as a collaborator. You can’t have your own repositories, but you can still commit to other users’ repositories that have been shared with you.

SSL Encryption is obvious: keep your communications secure. The website is entirely over HTTPS. SVN access is entirely over HTTPS. And all mercurial and git access is over SSH.

Dual backups mean data is backed up to two offsite locations. One site is backed up to with every commit; the other is backed up to nightly and maintains a history of repositories.

Written by krobertson

November 14th, 2009 at 8:19 pm

Posted in Code