Sunday, 30 October 2016

Talking at WordCamp Europe 2016 about News Corp Australia's migration to WordPress VIP

At the start of 2016, Juan (a friend and colleague) and I decided it would be a great idea to present on News Corp Australia's migration to WordPress at WordCamp Europe in Vienna. We had been through a huge migration project over the previous 18 months, moving Australia's largest media websites over to an enterprise version of WordPress.

Jump forward 2 months and we found out we were accepted. That was followed by 2 months of planning, and of cramming 18 months of project work into a 30 minute presentation. Before we knew it we were flying from Sydney, Australia to Vienna for a quick 3 day trip to WordCamp Europe - the largest WordCamp event in the world.

Below is the talk we presented, along with a transcript, on how we scaled WordPress to support over 3 million posts across 22 sites, with over 16 million users a day.




Transcript

00:00

18 months ago, News Corp Australia started its journey migrating all of its websites over to WordPress. Both Juan here and myself were on the software engineering team that was part of that migration project. We had often heard at the start of this project that WordPress doesn't scale. Well, our talk today is to prove them wrong, and to demonstrate that WordPress provides a really good base. It's a really solid platform, and if you implement the correct technical solutions to extend WordPress, it can scale really well for large and high volume traffic websites like ours.

00:48

Historically people may have interpreted News Corp Australia to be a little like this, a 90 year old company, potentially focusing a lot on the newspaper side of the business. But over the past 18 months, a few of the people you can see in this photo, plus a lot more in News Corp, migrated 22 of Australia's largest sites, including news.com.au, theaustralian.com.au & foxsports.com.au, over to WordPress VIP. We've imported over 3 million posts, and are receiving around 16 million users a day.

01:26

Why did we move to WordPress VIP? First of all, what is WordPress VIP? WordPress VIP is an enterprise level hosting option for WordPress sites that is powered by wordpress.com. We needed to escape our old CMS, which was unscalable and very expensive to maintain. WordPress was chosen because it is a very reliable tool, and one that most of our journalists are already familiar with, making training and upskilling across thousands of journalists much easier. We just needed to find a way to scale WordPress to meet the needs of our business.

02:06

Lastly and possibly the best part, with WordPress VIP we didn't need to worry about this. wordpress.com manages our production infrastructure, so updates to the core get directly deployed, we don't need to worry about updating core, or server patches. Our main focus is developing the best new features for our customers.

At a high level we have developed 45 WordPress plugins and 22 themes. Apart from our production infrastructure, we run 58 AWS non production environments, which are a combination of EC2, RDS and SQS, where we try to mimic the WordPress.com environments as closely as technically possible. From a deployment perspective, we run 25 deployments to production each week through our CI/CD platform; and we love our unit tests, we have over 5000 and counting.



03:21

Our journey started in November 2014. When we look back at those 18 months, there are 5 key factors that really challenged us. We wish we had known about these before we started the project. However, encountering these challenges and discovering their solutions was part of our journey, and actually a great team building exercise. I'm not going to lie though, it was a tough tough journey.

We wanted to share our story for 2 main reasons. First, to demonstrate that WordPress can scale for the enterprise, and secondly, to share the challenges we faced and the solutions we implemented.

04:03

We are going to focus on 5 key areas: site build; front-end; content ingestion; authoring - how we are creating and updating new stories; as well as continuous integration and deployments. We are going to talk about the challenges we faced and the solutions we implemented.

04:24

The first talking point, site build. We really needed the simplest way for our WordPress admins to manage and organise the content on our sites. Customizer was a great start to this, it provided a really good base, but we needed to extend it, and this is the functionality we are going to take you through. Working really closely with the XWP team, who I can see a lot of in the front here, we added a lot of functionality into Customizer, a fair bit of which has been moved into WordPress core. Here is a short video demonstrating just a few of the key features that have been added in, or may already now exist in WordPress core.

05:02

All of our sites utilize multiple content areas - that is pretty standard stuff. Within each of our content areas, we have this concept of default widgets, also known as global widgets, which will render on every single page. We are currently on the business page, and we are looking at the right hand rail, it's just a list of stories. If we now go to the news section in the navigation, you will notice that the widgets are actually the same widgets that were just rendered. So we can spin up a lot of pages that have a very similar structure very quickly. Then to extend that, there is a plugin which we developed which allows you to override the widgets on a specific page and content area. We are on the homepage now, and checking the 'localise to the current page' checkbox allows you to add a completely different set of new widgets for a specific content area on that specific page.
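As a rough illustration of the default versus localised widget lookup described above, here is a minimal Python sketch. It is not the plugin's code (which would be PHP inside WordPress); the names `DEFAULT_WIDGETS`, `PAGE_OVERRIDES` and `widgets_for`, and all widget names, are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical default ("global") widgets, one list per content area.
DEFAULT_WIDGETS: Dict[str, List[str]] = {
    "right-rail": ["latest-stories", "most-read"],
    "footer": ["newsletter-signup"],
}

# Hypothetical page-specific overrides: page -> content area -> widgets.
PAGE_OVERRIDES: Dict[str, Dict[str, List[str]]] = {
    "home": {"right-rail": ["breaking-news", "live-scores"]},
}


def widgets_for(page: str, area: str) -> List[str]:
    """Return the localised widgets for a page and area, else the defaults."""
    localised: Optional[List[str]] = PAGE_OVERRIDES.get(page, {}).get(area)
    if localised is not None:
        return localised
    return DEFAULT_WIDGETS.get(area, [])
```

With this data, `widgets_for("business", "right-rail")` and `widgets_for("news", "right-rail")` return the same default list, while `widgets_for("home", "right-rail")` returns the localised one.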

05:54

We call these contextual settings and they are stored as a JSON data structure as a custom post type in WordPress. The next thing you are going to see is our homepage, and that is the next big challenge we had. We have a lot of widgets that keep scrolling, and scrolling, and it's maybe a bit too much content, and that's what caused us a few challenges. By default WordPress, as I'm sure a few of us know, stores all of its widget data in sidebars, which are then stored in WordPress options. Because we have a huge huge homepage, it meant that the amount of data we were storing in WordPress options was more than 1 MB. We leverage Memcache quite extensively, and 1 MB is unfortunately the limit in Memcache, so as soon as any key has more than 1 MB of data, it's no longer getting cached. This meant that for every single page request, we weren't caching WordPress options, which was a huge performance issue for us.

Another plugin was developed, we called this widgets-plus. Essentially it's a new custom post type, and it now stores all of the widget data in its own custom post type in WordPress. This means that we can store everything relevant to just a single widget in isolation, and cache it in isolation; we don't need to worry about caching everything as part of WordPress options. Within our previous CMS it took us hours, sometimes a day, to build a relatively simple page. Within WordPress, we only showed you a very small part of it, but we can build really complex pages in literally a few minutes. It's a huge efficiency boost for our business.
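The caching problem and the fix can be sketched in a few lines of Python. This is only a toy model under stated assumptions: `SizeLimitedCache` is a hypothetical in-memory stand-in for a cache with a 1 MB per-item limit, and `cache_widgets_individually` is an illustrative helper, not the widgets-plus plugin.

```python
import json

CACHE_ITEM_LIMIT = 1024 * 1024  # the 1 MB per-item limit mentioned in the talk


class SizeLimitedCache:
    """Toy in-memory cache that refuses any value above a size limit."""

    def __init__(self, limit=CACHE_ITEM_LIMIT):
        self.limit = limit
        self.store = {}

    def set(self, key, value):
        payload = json.dumps(value).encode("utf-8")
        if len(payload) > self.limit:
            return False  # too large, so the value is simply not cached
        self.store[key] = payload
        return True

    def get(self, key):
        payload = self.store.get(key)
        return None if payload is None else json.loads(payload)


def cache_widgets_individually(cache, widgets):
    """Cache each widget under its own key; return the keys that were stored."""
    stored = []
    for widget_id, data in widgets.items():
        key = "widget:" + widget_id
        if cache.set(key, data):
            stored.append(key)
    return stored


# Hypothetical homepage: 100 widgets of roughly 20 KB each, about 2 MB in total.
homepage_widgets = {"w%d" % i: {"html": "x" * 20000} for i in range(100)}
```

Storing `homepage_widgets` under one key is rejected by the toy cache because the combined payload exceeds the limit, while storing each widget under its own key succeeds for all 100 of them.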

07:24

After a few months we had a solution for site build, now the front-end architecture was really key for us. We needed to make sure when all of the sites were moving across into WordPress, we weren't going to end up with a mess of spaghetti front-end code.


07:38

I'm sure all of you here who, like us, are developers (front-end or backend) know that PHP as a templating language isn't necessarily the best approach. It gets really messy really quickly. We needed a templating engine to ensure we decoupled our front-end code from our backend. Twig was our templating engine of choice, and there are 2 plugins that power this implementation. The first one is called vip-twig, and this one is responsible for integrating Twig into WordPress core - it is a very standalone plugin. Our next plugin, template-integration, is responsible for exposing all of those core WordPress functions to our Twig templates. It basically bridges the gap between WordPress core and the data our templates need to render out posts on the front-end. The good part about this is that if we need to implement a new templating language because Twig is no longer the language to use, we can keep our template-integration plugin as it's all reusable, and we can simply build a new vip-smarty or vip-ejs plugin to manage the integration for that new templating engine.
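The separation between an engine plugin and an engine-agnostic bridge can be sketched as follows. This is a loose Python analogy, not the vip-twig or template-integration code: the two renderers stand in for interchangeable templating engines, and `build_post_context` and `render_post` are hypothetical names for the bridge.

```python
from string import Template
from typing import Callable, Dict

# A renderer takes template source plus a context and returns markup.
Renderer = Callable[[str, Dict[str, str]], str]


def dollar_renderer(source: str, context: Dict[str, str]) -> str:
    """Engine adapter using string.Template ($name placeholders)."""
    return Template(source).substitute(context)


def brace_renderer(source: str, context: Dict[str, str]) -> str:
    """Alternative engine adapter using str.format ({name} placeholders)."""
    return source.format(**context)


def build_post_context(post: Dict[str, str]) -> Dict[str, str]:
    """Engine-agnostic bridge: prepares the data that templates need."""
    return {"title": post["title"].strip(), "section": post["section"].lower()}


def render_post(post: Dict[str, str], source: str, render: Renderer) -> str:
    """Render a post with whichever engine adapter is passed in."""
    return render(source, build_post_context(post))
```

Swapping the engine only means passing a different adapter to `render_post`; `build_post_context` stays untouched, which is the property the talk attributes to the template-integration plugin.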

08:40

We have now built our sites, and we have a templating engine in place, now we need content. We had been working on content ingestion functionality for around 6 months in parallel, and in May 2015 we were moving into beta testing in a VIP production environment, planning on launching our first site in June 2015. As you can see from our slides, the timeline is nowhere near the far right of the screen - yes, we ran into some problems.

09:14

In our first attempt we ran into 2 major problems. First, our posts are authored in a different platform, they are not authored in WordPress, and we had a benchmark that content needed to be published to the WordPress front-end within 60 seconds of the moment it was saved. Secondly, we were seeing some rare race conditions in content ingestion, which caused content to be duplicated on our WordPress sites.

As well as the above 2 major problems, when we moved into production we needed to do an initial content import of around 300,000 posts per site. We didn't have MySQL access as our sites are hosted on wordpress.com - we couldn't expect them to give us access. It was a challenge to import that amount of information. Lastly, after that, we would have a constant stream of data updates including news stories, videos and photos coming into our sites at a rate of around 16 updates per second.

10:28

We needed an architecture to support this. Here is a simplified diagram (it is a very simplified diagram) of what we have implemented. It can be read from bottom to top. First our journalist will create a story or update it in one of our editorial tools. This post will then be sent to our API platform which manages all of our content. It will organise it, categorise it, and make it available to other platforms to consume it, WordPress being one of those platforms. On top of that, as soon as a message arrives into the API platform, it will send a notification to our SQS system in WordPress informing it of the new story and to 'start' ingesting.

11:18

With the help of the WordPress VIP team, we developed an ingestor daemon, we call it Turbo, because in the beginning ingestion was built using WordPress crons, but wp-cron was not meant to work in that way. Like I said before, we were doing 16 updates a second and wp-cron was not coping with that, it was not fast enough. We needed to create this multi-threaded daemon. This whole process that you are seeing is what we call 'end to end publishing'. With this architecture in place, we have a single entry point for content imports, whether we are doing 300,000 updates in an initial import, or 16 updates a second. With SQS we have also reduced the amount of duplicate content, actually it is now at zero. And finally, we can ingest content in well under 60 seconds: we are basically ingesting content in less than 10 seconds, and as soon as a journalist saves a story, it appears on our site within about 6 seconds.
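The general shape of a queue-driven, multi-threaded ingestor can be sketched in Python. This is not Turbo's implementation and it does not talk to SQS: the standard library `queue.Queue` stands in for the notification queue, and the upsert keyed by content ID is an assumption about one way to avoid duplicates, not something the talk describes. The function name `ingest_all` and the message fields `id` and `version` are hypothetical.

```python
import queue
import threading


def ingest_all(messages, worker_count=4):
    """Drain notification messages with several workers; return stored posts."""
    pending = queue.Queue()
    for message in messages:
        pending.put(message)

    posts = {}               # content ID -> latest stored version of the story
    lock = threading.Lock()  # guards the check-and-write below

    def worker():
        while True:
            try:
                message = pending.get_nowait()
            except queue.Empty:
                return  # everything was queued up front, so empty means done
            with lock:
                current = posts.get(message["id"])
                # Upsert keyed by content ID: a repeated or stale message
                # can never create a second copy of the same story.
                if current is None or message["version"] > current["version"]:
                    posts[message["id"]] = message

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return posts
```

Because every write goes through the same entry point and is keyed by content ID, the same code path serves a bulk import and a steady trickle of updates, and replayed notifications collapse into a single stored post.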

12:38

Another challenge that we faced was the number of fields and the amount of data that represent a 'NewsStory', or a 'WordPress post', in our business. This is stored in our APIs, this is the object we get. As you can see, this is a trimmed down version of how our JSON object looks, it's massive, it's a really big object. At the beginning we thought, let's split this into pieces, save everything into post meta fields, and build them up again to send writes back to our APIs. However, we thought, why re-invent the wheel? If we are already using this JSON object across all our other platforms, let's just save it in our WordPress database, and consume it from there, and read it from there. Due to this we decided to store this JSON object in the post_content field in the posts table. With this architecture, we needed to change how our templates and our plugins work with this field. We are no longer reading it as a text field, but as a JSON object.

Saving the JSON object here allowed us to have revisions of an entire post object. Also when rendering a post on the page, we can do it with a single database call. We finally kept a simple rule, we would only use post_meta for searchable parameters.

All this gave us a huge performance benefit by more than halving the number of database calls.
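The storage rule above (whole JSON object in one field, meta rows only for searchable values) can be sketched with an in-memory SQLite database. The table and column names only loosely echo WordPress's posts and postmeta tables; the functions `open_store`, `save_story` and `load_story`, and the choice of `section` as the searchable key, are hypothetical.

```python
import json
import sqlite3


def open_store():
    """Create an in-memory database with simplified posts and postmeta tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, post_content TEXT)")
    conn.execute(
        "CREATE TABLE postmeta (post_id INTEGER, meta_key TEXT, meta_value TEXT)"
    )
    return conn


def save_story(conn, post_id, story, searchable_keys=("section",)):
    """Store the whole story as JSON; copy only searchable fields to postmeta."""
    conn.execute(
        "INSERT OR REPLACE INTO posts (id, post_content) VALUES (?, ?)",
        (post_id, json.dumps(story)),
    )
    conn.execute("DELETE FROM postmeta WHERE post_id = ?", (post_id,))
    for key in searchable_keys:
        if key in story:
            conn.execute(
                "INSERT INTO postmeta (post_id, meta_key, meta_value) VALUES (?, ?, ?)",
                (post_id, key, str(story[key])),
            )


def load_story(conn, post_id):
    """Fetch everything needed to render a post with a single query."""
    row = conn.execute(
        "SELECT post_content FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    return None if row is None else json.loads(row[0])
```

Rendering needs only `load_story`, which is one query, while searches such as "all stories in the business section" go through the small postmeta table.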

14:19

As you can probably guess, June came and went and we didn't launch our sites. Instead they stayed in beta testing for a few months. Our internal development team, the XWP team and the WordPress VIP team worked for around 2 months to implement solutions to all the problems we just discussed; and one night in September, a very very late night, we launched our first site on WordPress VIP. It was a night filled with anxiety, way too much pizza, and then a sigh of relief when it switched on and everything just worked.

By now we have implemented a simplistic site build approach, our front-end is clean, and we have content quickly ingesting into our WordPress sites.

15:21

Now we needed to use WordPress for what it is meant to be used for: editing and creating content. We developed these authoring screens within WordPress that expose the JSON data structure that defines our news story content, so people can edit it in the simplest way. As you can see, we leveraged meta boxes quite extensively, and used CSS and JavaScript to provide the look and feel we were after. We tried not to deviate from WordPress core at all, so that any updates that come through to WordPress.com would not affect us.

15:58

Here is another image of one of our authoring screens to demonstrate what we are doing for image management within posts. We are leveraging the media element in WordPress and allowing authors to edit/crop images within the UI.

16:15

Our category management is also a little different: our stories have to have the URL of the specific category they belong to in the permalink. So we needed to add a primary category (or section as we call it) that authors have to select when creating a story. They can then put that story in multiple categories if needed.
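A primary section driving the permalink might look something like this in Python. The URL layout, the `example.com` base URL and the function name `build_permalink` are assumptions for illustration only; the talk does not show the real permalink structure.

```python
def build_permalink(slug, primary_section, base_url="https://example.com"):
    """Build a story URL from its primary section path and slug."""
    section_path = primary_section.strip("/")
    if not section_path:
        # Mirrors the rule that authors must pick a primary section.
        raise ValueError("a primary section is required")
    return "{}/{}/{}".format(base_url.rstrip("/"), section_path, slug)
```

Any additional categories a story belongs to would play no part in the URL; only the primary section does.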

16:48

Within the first week of February in 2016, we launched our last 4 sites into wordpress.com in just one week. And everything just rolled out very very smoothly. Then gradually over the next 4 months, until pretty much last week, all that authoring functionality that Juan just showed you, rolled out into a production environment.

17:08

So we needed to maintain this complex WordPress platform that we had setup. We also needed to ensure that at all costs it was impossible for any cowboy coders to rise up and try and take control of our codebase.

For this it was really critical that we had a solid continuous integration and deployment platform setup. We needed to make sure that a developer can get his code or her code from their local environment all the way into production in the most stable and risk free way.



17:31

For us it comes down to feature based branching and deployments. Every feature sits within its own feature branch, branched off master in Git. It's always developed in isolation, it's always tested in isolation, it's always deployed in isolation. Our team chose to never bundle up more than one feature and deploy it at once, as for us this added too many risks and too many dependencies.

Our team then utilizes pull requests to control any code merges into master. This is where code reviews happen, which is one of our favourite pastimes. Every single line of code is code reviewed by another developer. No single line of code reaches production without being code reviewed by at least 1, 2 or sometimes 3 other developers in the team.

18:14

We then utilize Atlassian's Bamboo continuous integration platform to manage all of our automation. This is where on every code commit we run PHPCS to make sure all of our coding standards are being met, and we run PHPUnit to make sure all of our unit tests are passing and there are no regression issues we are aware of. Once all of those tests pass we package up the application, running any front-end build tasks in Gulp or Grunt, we compile our Twig templates, then we deploy that out to our 50+ non-production environments.

This is when testing happens, either manual regression testing, or through our suite of Selenium based automation tests using the Robot Framework. Once all manual and automated tests have passed we deploy this feature through our continuous integration platform all the way into the WordPress VIP environment. We celebrate with a few beers, and then we start that process all over again for the next feature.
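The gated flow described above, where each stage must pass before the next one runs, can be sketched as a tiny Python runner. This is not Bamboo configuration; `run_pipeline` and the stage names are hypothetical, and each check is a placeholder for a real tool invocation.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True when the stage passes.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that passed.
    """
    passed = []
    for name, check in stages:
        if not check():
            break  # later stages, including deployment, never run
        passed.append(name)
    return passed


# Hypothetical stage list echoing the steps in the talk; a failing unit test
# run means packaging and deployment are skipped.
example_stages = [
    ("coding standards", lambda: True),
    ("unit tests", lambda: False),
    ("package and build", lambda: True),
    ("deploy to non-production", lambda: True),
]
```

With `example_stages`, `run_pipeline` returns only `["coding standards"]`, because the failing unit test stage blocks everything after it.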

The best part is, this isn't the process we started with 18 months ago. This evolved over countless retrospectives, team huddles, and big issues with our previous processes, to what it is today.

19:48

With a solid continuous integration and deployment platform set up, the only other challenge we faced was that with WordPress VIP you cannot control the exact time your code is deployed to production, and if your feature has changes in 3, 4 or 5 different code repositories, you can't guarantee the order in which the code in those repositories is deployed.

19:32

We needed a way to be able to turn features on and off in a production environment. Welcome to the feature toggle. This was developed by one of our developers called Roman back in Australia, and it is an implementation of the feature toggle technique by Martin Fowler. It abstracts all of the complexities around feature toggling. So with just a few lines of code you can set up a new feature, you can give it a nice descriptive name, and in the last parameter (the boolean) you can decide if it is active or deactivated by default.

20:00

Then all of our themes and plugins have access to check if this feature is enabled or not. If it is enabled you can execute a certain chunk of code. Lastly, this plugin exposes an admin page in WordPress that lists all of the features we currently have available to us, along with their activated or deactivated state, and this is where you can toggle a certain feature on or off. For our really big features, this feature toggle has allowed us to roll them out to all of our sites from a deployment perspective, and then gradually toggle them on one site at a time so we can stagger the regression testing required.
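A feature toggle of the kind described above can be sketched in Python. This is not the plugin's API: the class `FeatureToggles`, its method names and the per-site override model are all hypothetical, chosen to mirror the talk's description of a registration call with a name, a description and a default boolean, plus site-by-site activation.

```python
class FeatureToggles:
    """Minimal feature toggle registry with a default state and per-site overrides."""

    def __init__(self):
        self._defaults = {}   # feature key -> (description, default state)
        self._overrides = {}  # (feature key, site) -> state set by an admin

    def register(self, key, description, enabled_by_default=False):
        """Declare a feature; the last argument is its default state."""
        self._defaults[key] = (description, bool(enabled_by_default))

    def set_enabled(self, key, site, enabled):
        """Toggle a registered feature on or off for one site."""
        if key not in self._defaults:
            raise KeyError(key)
        self._overrides[(key, site)] = bool(enabled)

    def is_enabled(self, key, site):
        """Unknown features are treated as off, so unreleased code stays dark."""
        if key not in self._defaults:
            return False
        return self._overrides.get((key, site), self._defaults[key][1])

    def listing(self, site):
        """Rows for an admin-style overview: (key, description, state)."""
        return sorted(
            (key, description, self.is_enabled(key, site))
            for key, (description, _) in self._defaults.items()
        )
```

Calling code would wrap the new behaviour in a check such as `if toggles.is_enabled("new-gallery", site): ...`, so the code can be deployed everywhere while the feature is switched on one site at a time.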

20:32

So we have come to the end, and what have we really learnt? First of all, as Juan pointed out, migration projects are hard hard work. No matter how much planning you do, you are always going to run into unknowns. It was really critical for us to get our code into a beta version on the WordPress VIP infrastructure, so we could test our end to end solutions in their environment, rather than our non-production setup.

To all those people that said to us that WordPress isn't going to scale for what we are doing: well, if you just download WordPress core, install it on a single server and try to support 16 million users, then no, it isn't going to work. But if you implement the correct solution architecture, with things like Memcache, Elasticsearch and asynchronous content ingestion, it scales really well for sites like ours.



21:21

Lastly, us partnering with two really awesome teams. With XWP who really put a lot of effort in at the start of our project, their expertise in WordPress was absolutely priceless for us. They were very pivotal in our project, especially in the first 12 months, and a lot of us have become really good mates.

Secondly, with the WordPress VIP team, they just kept going and going just like the Energizer bunny, they never stopped no matter what challenges and issues we threw their way.

21:58

At a very high level that was our team's journey to WordPress over the last 18 months. If anyone has any questions I believe we have some time now, otherwise please come find us throughout WordCamp in Vienna if you’re chasing any more details on our talk, or just want to have a chat.





Thursday, 4 August 2016

Experimenting with mob programming at NewsCorp Australia


Mob programming has been gathering more and more momentum over the last 6-12 months so we decided to experiment with it in some of our software development teams at NewsCorp Australia. So what exactly is Mob Programming?

“The basic concept of mob programming is simple: the entire team works as a team together on one task at the time. That is: one team – one (active) keyboard – one screen (projector of course). It’s just like doing full-team pair programming.”


Why mob programming?

How often do specific developers in a team only understand certain aspects of the code? And while code reviews are a great collaboration technique, sometimes you need more. Mob programming is a great technique to solve technical problems together as a team, allowing everyone's input to be voiced. As a positive side effect it can also help to encourage pair programming within a development team.

So how did mob programming go in NewsCorp?

Like anything, you rarely find success by forcing a particular practice onto a team; it needs to be adopted by a team naturally if they see it really adds value. We started by simply running a few exercises with a few of our front-end and back-end development teams to experiment with how mob programming could work within our business.

Front-end development team

 

Our session had 9 developers (rather large for a single session), all taking part in a JavaScript FizzBuzz challenge. Each developer was driving for a single 3 minute interval (we only had time to go around once).

Opening remarks started with, "ahh FizzBuzz, this should be easy", and "I'm not sure how I feel about people watching me code".

Round 1 & 2:

Mood: Cautious and careful
It started off fast as the first person to drive took the laptop and started coding... until... the team quickly pointed out that the driver has to listen to the navigators. This is when everything slowed down and everyone started discussing, "do we do unit tests first", "can we re-read the instructions in the readme", and "this is harder than you think when we all have to do it together". By the end of the first round, the team had created the start of the "FizzBuzz" method.

By the end of the second round the basic FizzBuzz method was complete.

Round 3:

Mood:  Focused and more opinionated
Seeing as the team had started with the core functionality, at the start of round 3 questions were being raised about starting unit tests, and how they were going to test this functionality. By the end of round 3, we had an initial unit test that checked the basic 'FizzBuzz' was outputting correctly.

Round 4-5:

Mood: Positive and excited
Round 4 and 5 saw a wider set of unit tests being created to test the validity of Fizz, Buzz and FizzBuzz outputs.

Round 6:

Mood: Calm
There was some refactoring of the FizzBuzz application, as well as some cleanup to the unit tests to improve coverage.

Round 7-8:

Mood: Frantic and fun
By the last 2 rounds the team focused on outputting all of the numbers from 1-100 in the console. Challenge complete!
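For readers who have not met the exercise, here is a minimal FizzBuzz with a few unit tests. It is written in Python purely for illustration (the front-end session used JavaScript) and it is not the code the team produced; `fizzbuzz` and `fizzbuzz_lines` are hypothetical names.

```python
import unittest


def fizzbuzz(n):
    """Return 'Fizz', 'Buzz', 'FizzBuzz' or the number itself as a string."""
    if n % 15 == 0:
        return "FizzBuzz"
    if n % 3 == 0:
        return "Fizz"
    if n % 5 == 0:
        return "Buzz"
    return str(n)


def fizzbuzz_lines(start=1, stop=100):
    """Return the output for every number from start to stop inclusive."""
    return [fizzbuzz(n) for n in range(start, stop + 1)]


class FizzBuzzTest(unittest.TestCase):
    def test_multiples_of_three(self):
        self.assertEqual(fizzbuzz(9), "Fizz")

    def test_multiples_of_five(self):
        self.assertEqual(fizzbuzz(10), "Buzz")

    def test_multiples_of_both(self):
        self.assertEqual(fizzbuzz(30), "FizzBuzz")

    def test_other_numbers(self):
        self.assertEqual(fizzbuzz(7), "7")
```

The tests can be run with `python -m unittest` against the file, and printing `fizzbuzz_lines()` line by line gives the 1-100 console output from the final rounds.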

The team's observations:

The team started off slow in giving directions, or at least in giving them in a way the driver understood. It took a few rounds for the team members to be talking the same language. After this, all someone had to say was, "create a unit test", "copy that function and paste it there", "rename this to that".

As expected, there were the more confident developers navigating more than others at the start, but when it was their turn they immediately became the listeners, which gave an opportunity for other developers to voice up.

The mood was also really positive. At the start people were saying, "oh now everyone sees how I code in real time", but by the end everyone realized their peers were quite open and not judgemental.

Back-end development team:

This session had 5 developers, all taking part in the same FizzBuzz challenge (but in PHP). As before, each developer was driving for a single 3 minute interval, however we went around 2 rotations.

Opening remarks started with, "ok, let's see how this goes"

Round 1, 2 & 3:

Mood: Energetic
Team was keen to get started quickly and smashed through most of the requirements in about 2-3 rounds.

Round 4, 5 & 6:

Mood: Strategic and careful
These rounds saw the team work closely on unit testing to get confidence in the core code that was developed. A lot of discussion around correct architecture also took place, when the team decided what and how to refactor.

Round 7, 8 & 9:

Mood: Agreement, disagreement and focus
Within the last 3 rounds, the team rapidly implemented refactoring to allow better unit testing of the implementation of FizzBuzz.

Observations:

Within 3 rounds the application was done. It was a quick MVP to get the basics in place. This development team had been working together for over 12 months so they knew how to work with each other very well.

The team took the approach of fleshing out an MVP, then adding unit tests, then refactoring the application to improve scalability. Why did they not do unit testing first? This is a team that has thousands of unit tests around their current code base, and there is a culture that code never reaches production without unit tests. To them, getting a quick MVP with the understanding that unit tests will be there for release is already well accepted.

What the team did uncover was that using sessions like this to flesh out complex features together could really speed up the overall delivery of a future feature. Currently code reviews are heavily used in this team, and this is when potential issues in design decisions are picked up. But having a 1-2 hour session to mob out the basic architecture of a new feature could really reduce feedback in the code review process.

What's next? Will we keep mob programming?

This is up to each team to decide... I believe we all agree, sessions like this can be great to get an entire team across a single concept or feature, or to architect a new application with everyone's input right from the start.

It was definitely a fun exercise that we will keep experimenting with.


Monday, 14 March 2016

Making code reviews part of your team culture


Code reviews are like unit testing: as developers, we really want to do them, but not every team is able to introduce them successfully. This is a post about a presentation I recently gave at What Do You Know Sydney on how to make code reviews part of your development team culture.

I want to share how code reviews in one of the teams at NewsCorp are no longer a manual chore, but are a part of the team culture and are treated in the same way as scoping, development and testing.

I've worked with amazing development leads over my career who dedicate so many hours each and every night code reviewing and providing great feedback on the code developers in the team have developed. Although this works and adds huge value in ensuring the highest level of code, it is not a scalable solution.

What is a code review?

“Using your experience as a developer to review and assume joint ownership of that code”

What do code reviews involve?

  • Checking out the code locally and ensuring it runs as expected.
  • Making sure the code complies with the team's syntax standards.
  • Running the unit tests (and ensuring there are unit tests).
  • Making sure the code architecture & design patterns being used are the right ones.
  • Being 100% certain there are no security vulnerabilities in the code.
  • ...Every team will have its own interpretation of what a code review entails...

Why do code reviews fail within some teams?

  • It's an ad-hoc process.
  • Developers don't understand what a code review should involve.
  • There is not enough time allocated to code reviews.
  • When only the leads or senior developers are doing the code reviews.
  • Most importantly, when there are no surrounding processes in place to encourage code reviews.

How do you make code reviews part of your development team culture?


Make it part of your task estimation

  • Unit testing is usually around 20% of the development time (on average), we have found that code reviews are around 5% of the development time. If a feature takes a 5 day week to develop, usually half a day is enough to code review this newly developed feature. 
  • Remember to allow some time to implement development feedback from the code review.
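Read literally, the percentages above can be turned into a quick estimate. The helper below is only a sketch of that arithmetic; `estimate_days` is a hypothetical name, and the default percentages are the rough averages quoted above rather than fixed rules.

```python
def estimate_days(development_days, unit_test_pct=20, review_pct=5):
    """Split an estimate into unit testing and code review allowances (in days)."""
    return {
        "development": development_days,
        "unit_testing": development_days * unit_test_pct / 100,
        "code_review": development_days * review_pct / 100,
    }
```

For a 5 day feature, the 5% figure works out at a quarter of a day of code review, so the half day suggested above leaves some headroom for discussion and follow-up.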


Utilize Pull Requests (and enforce approvals before merging)

In their simplest form, pull requests are a mechanism for a developer to notify team members that they have completed a feature. In the open source community this is normally how new changes are submitted to core contributors to review and implement into the core open source platform. Within your team ensuring that at least one developer must code review and approve a Pull Request before it is merged is a great step to encouraging code reviews within a team.

Use a great tool like Stash for code reviews

  • Stash is a great tool from Atlassian that makes code reviews simple.
  • You can configure Stash to enforce an approver for every code review before it can be merged. 
  • You can actually take it a step further and not allow developers to merge code (only allow testers to merge Pull Requests, this allows the testing team to control the flow of code that goes into UAT and production environments). 
  • Stash has an awesome UI to make comments on code reviews easy and to even allow adding tasks to Pull Requests that have to be completed before merging.

Add it to your Agile board

  • Don't hide the code review process - keep it out in the open and display it on your Agile board.
  • This encourages the wider business to accept code reviews as part of the day-to-day.
  • If a task fails code review it moves back into development.


Use code reviews as a knowledge sharing tool

This is the single greatest value code reviews have given the team. I often hear in development teams that this developer knows this, and that developer knows that. When your team embraces code reviews, you obviously still have people who are more experienced in different areas of the code, but at least 2 developers understand the same area of the code. This is because 1 developer has coded it, and at least 1 other developer has reviewed it - they have checked it out locally, run the unit tests, understood the code and given their tick of approval for it to be released.

How can you introduce code reviews in your team?

  • Don't ask permission just do it!
    As developers it is our responsibility to own building stable, secure and scalable code. If we need to spend time unit testing and code reviewing our code, then that is our responsibility to do so (and fight for it).
  • Use a great tool to encourage code reviews like Stash.
  • Use Pull Requests within your branching strategy and enforce approvals.
  • Don’t have exemptions - it's all or nothing!!
  • Have a reward each sprint or month for the person who does the most code reviews.
  • Have a token for code review immunity if you find some developers are getting too many code reviews.
    We use a Jedi robe, if you are the one wearing the Jedi robe (of which there is only 1), then no person can assign you a code review until you take off the robe.
  • Most importantly, be an honest code reviewer.
    Don't just approve a Pull Request. Every constructive piece of feedback you add makes your whole team stronger and better developers.

The team I am referring to in this presentation has code reviewed over 172,000 lines of code in the last 12 months, which in my books is an impressive statistic.