Tuesday, 9 December 2014

How to run stand-ups efficiently

Stand-ups are designed to be a short, fast and efficient meeting each morning to keep projects on track and to expose and provide visibility on blockers to the team, the SCRUM master and the product owner.

Although this sounds relatively straightforward, I've seen and been involved in too many stand-ups that get distracted when they are not effectively managed by the SCRUM master. Below are some areas to focus on when running and trying to improve your daily stand-ups.


Each person talks about 3 things only

Each team member has the permission to talk about 3 things only. 
  • What they did yesterday
  • What they are doing today
  • Any blockers they are facing

They do not need to go into technical implementation details, or how they solved a really hard problem. Most people can communicate the 3 points above in less than 60 seconds. Each person needs to focus on communicating to the team the highlights and not the technical details. If this doesn't happen, the SCRUM master needs to intervene, make a note of the discussion subject and take it offline.

Start the stand-up at the same place and same time every morning

Consistency is so important when it comes to stand-ups. Find a time in the day when your entire team is available and a place that is close to or next to the task board.
  • If needed, you can get the team to pick the time of the stand-up so that it encourages ownership from the team to make it to the stand-up on time each morning.

Make sure everyone stands up!

Enough said - it will make your stand-ups more efficient.

Communicate with the team, not the task board

Stand-ups are about communicating progress and blockers to the team, not to the SCRUM master and not to the task board.

SCRUM master to record blockers and ensure their resolution

When a person raises a blocker or a problem, it needs to be recorded, usually by the SCRUM master. This lets the SCRUM master follow up to find out whether the person needs any help resolving it and, more importantly, ensures the blocker doesn't get forgotten and actually gets resolved.

Take technical discussions offline

If any person in the team starts going into technical details (or starts talking about their weekend), they need to be stopped by the SCRUM master and told to take the conversation offline after the stand-up. The SCRUM master can take a quick note about the subject, and grab the relevant people after the stand-up to discuss in more detail.

Team members should know what they have to say before the stand-up

Team members shouldn't spend the first 60 seconds of their update trying to remember what they did yesterday and what they are going to do today. Everyone should prepare beforehand and, if they need to, come to the meeting with a list of things to talk about.

A team size of 5-8 people is ideal - don't make it larger than 12 people

12 people in a stand-up is large, but if everyone can stick to 60 second updates you can get it done in 15 minutes. Essentially you want to make sure that everyone who attends the stand-up is providing benefit to the other people in the stand-up. If their updates are not relevant or required, think about whether they need to be involved in the stand-up.

No phones or tablets at the stand-up

People participating in the stand-up should be focused and always paying attention - there is no need for people to be on their phone checking emails throughout stand-up.

Last team member to arrive starts first

This is a good practice to help get your team to arrive promptly to stand-ups. If they are the last person to arrive in the morning, they need to go first at the stand-up (as the entire team has had to wait for them). Don't start the stand-up until everyone is present.

Randomise the order of who speaks at the stand-up

You want to keep things interesting and everyone paying attention. Some days you can go left to right, other days right to left. You can even use a 'talking token'. We've previously used a football: once you have finished your update, you throw it to the next person of your choosing. People often look for the person that wasn't paying attention to throw it to - it's a great way to keep everyone paying attention.

Use a heavy ball as the 'talking token' if required

An approach to stop people from talking too much and getting into too much detail is to use a heavy object (like a medicine ball) as the 'talking token'. People usually don't want to be holding it for too long so they will tend to be more efficient in their update. Don't make it so heavy that you cause your team back damage though.

Stand close together

Huddle together as a team; there is no need to have a metre of space between each person. The SCRUM master should ensure people don't have to yell to be heard.

The team should run the meeting (not the manager)

Stand-ups aren't about providing updates to the manager (although they are a useful way for managers to stay up to date); they are about the team communicating and providing visibility about user stories to the team. The SCRUM master is there to keep the stand-ups under control and on track.

Lastly, remember to have fun at your stand-ups!

Friday, 5 December 2014

What makes you a senior software engineer?

Are you a senior software engineer?

Software engineers are a sought-after resource at the moment. You will sometimes see job adverts stating that someone with 4+ years' experience can satisfy the requirements of a senior software engineer. But regardless of experience, what really makes someone a senior software engineer?

1. Accountability and ownership of their code

Ownership of a project is one of the biggest steps a senior software engineer makes. No longer are they a junior engineer who can rely on their coding issues being picked up by senior engineers; now they are the last line of defence for stable, secure applications that are delivered on time. They are 100% responsible for ensuring that their own code and their team's code is well tested, scalable and secure.

2. They can solve complex technical problems (any problem)

Senior software engineers have the expertise to look at any problem and find a solution (or undertake the required research to find a solution themselves). They sometimes work with their team members to find a solution, but they never come back to their team lead, or manager without a solution or variation of a solution. I believe this is one of the most distinguishable traits of a senior software engineer.

3. They develop reusable code

Senior software engineers understand that the code they develop may be used through the command line, from a controller in an MVC framework, or even from within a unit test. They develop their code to be used in any environment without duplication. This essentially means building loosely coupled code within the platforms they work in.

Service-Oriented Architecture (SOA) is a really good methodology that promotes decoupling of code and reduces code duplication. Also have a read about coding to interfaces to help achieve loosely coupled code.
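To make 'coding to interfaces' a little more concrete, here is a minimal sketch in Python. The names (PriceSource, InMemoryPrices, OrderTotaller) are hypothetical and only there for illustration; the point is that the business logic depends on an interface, so the same class can be driven from a command line script, an MVC controller or a unit test.

```python
from typing import Dict, Protocol


class PriceSource(Protocol):
    """The interface: anything that can look up a price for a SKU."""

    def price_for(self, sku: str) -> float: ...


class InMemoryPrices:
    """Hypothetical implementation backed by a dict, handy for unit tests."""

    def __init__(self, prices: Dict[str, float]) -> None:
        self._prices = prices

    def price_for(self, sku: str) -> float:
        return self._prices[sku]


class OrderTotaller:
    """Business logic that only knows about the PriceSource interface."""

    def __init__(self, prices: PriceSource) -> None:
        self._prices = prices

    def total(self, quantities: Dict[str, int]) -> float:
        return sum(
            self._prices.price_for(sku) * qty for sku, qty in quantities.items()
        )


# The same OrderTotaller could be given a database-backed PriceSource instead.
totaller = OrderTotaller(InMemoryPrices({"apple": 0.5, "pear": 0.75}))
basket_total = totaller.total({"apple": 4, "pear": 2})
```

Swapping InMemoryPrices for another implementation requires no change to OrderTotaller, which is the loose coupling described above.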

4. They code for the future

Senior software engineers approach coding a solution to a problem by looking into how the component they are developing may need to be used in the future or extended in the future. They write their code so it is easily extensible and doesn't require a full re-write to add additional functionality at a later date. This is easier said than done, but after working on a variety of platforms, your past experiences in software development really help you excel in this area.

5. They unit test their code

Whether you have allocated time or not, a senior software engineer should and will unit test their code. They see the benefits of automated testing and will write unit tests to cover off at least the core libraries of their codebase.
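As a rough illustration of what covering a core library function can look like, here is a small sketch using Python's built-in unittest module. The apply_discount function is a made-up example, not code from any particular project.

```python
import unittest


def apply_discount(price: float, percent: float) -> float:
    """Return the price after a percentage discount (hypothetical core function)."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)


class ApplyDiscountTest(unittest.TestCase):
    def test_applies_percentage(self):
        self.assertEqual(apply_discount(200.0, 25), 150.0)

    def test_rejects_invalid_percentage(self):
        with self.assertRaises(ValueError):
            apply_discount(200.0, 150)


def run_tests() -> unittest.TestResult:
    """Run the test case programmatically and return the result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ApplyDiscountTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Tests like these are cheap to run on every commit, which is what makes them a good fit for the automation ideas further down.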

Some helpful links on unit testing

6. They can estimate work effort and gather requirements

Work effort estimation is a skill that takes many years to master. It is one skill to be able to accurately estimate work effort for your own tasks, but being able to effectively estimate work effort for your team members is a really valuable skill. Although in Agile development environments the entire team is involved in work effort estimation, senior software engineers should be able to identify when work effort estimates are too low or too high and voice their concern to correct the situation.

7. They mentor junior developers

It is the responsibility of the senior engineer to always look for continuous improvement within the team. That can be exposing the team to more efficient technologies, helping them write more secure code or ensuring the code is well written and unit tested. This mentorship from senior engineers to other software engineers in the team really helps move the team towards greatness.

8. They have a solid understanding of programming methodologies (not just a language)

Senior software engineers have either studied or been exposed to different programming methodologies. It doesn't matter which language they are coding in, software methodologies cross over the boundary of coding languages. These best practices may enhance scalability of an e-commerce website, or provide secure APIs for mobile applications. It is important to understand that senior software engineers can take their learnings from multiple languages and frameworks and apply them to their current projects. They also find it very easy to learn new languages as they already understand the theory behind programming.

9. They're always looking for areas to automate

Senior software engineers are some of the most experienced technical people within a team. They should have a good understanding of how to automate aspects of their team's work. This can include:

  • Partial automation of code reviews (e.g. using Git hooks to run automated checks of coding standards using tools like PHPCS or JSLint).
  • Unit testing code to ensure its security and stability.
  • Implementing continuous integration and deployment services - check out Bamboo by Atlassian.
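The Git hook idea in the list above can be sketched roughly as follows. This is only an outline under a few assumptions: the script is installed as a pre-commit hook, PHPCS is available on the PATH, and PSR2 stands in for whatever standard your team has agreed on. The function names are my own.

```python
import subprocess
from typing import List

PHP_STANDARD = "PSR2"  # assumption: replace with your team's coding standard


def staged_files() -> List[str]:
    """Ask Git for files that are added, copied or modified in the index."""
    output = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACM"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return [line for line in output.splitlines() if line]


def php_files(paths: List[str]) -> List[str]:
    """Keep only the PHP files, since PHPCS is the checker in this sketch."""
    return [path for path in paths if path.endswith(".php")]


def phpcs_command(paths: List[str]) -> List[str]:
    """Build the PHPCS command line for the given files."""
    return ["phpcs", "--standard=" + PHP_STANDARD] + paths


def main() -> int:
    """Return 0 to allow the commit, or non-zero to block it."""
    targets = php_files(staged_files())
    if not targets:
        return 0
    return subprocess.run(phpcs_command(targets)).returncode
```

To use it as a hook, the script would finish by exiting with the value returned from main(), so that a failed check stops the commit.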

10. They comply with and promote coding standards

Senior software engineers understand the importance of consistency of code within applications. They are responsible for ensuring coding standards are defined within their team, and that everyone within the team is following them. They should also actively review their peers' code and the coding standards themselves to ensure they are up to date.

11. They can explain problems and solutions to non-technical people

A senior software engineer needs to be able to communicate to non-technical people. Although I have met some great senior engineers that maybe don't excel in people skills, they are still able to communicate through the use of diagrams, presentations or user stories. Communication is a crucial skill in progressing past a senior software engineer role to a team lead role.

12. They are always up to date with the latest technology

This is actually a hard thing to consistently maintain as there is so much technology out there that is constantly evolving. However, it is crucial that a senior software engineer does their best to stay updated (at least with technologies in their space). You don't always get to experiment with different technologies within the workplace, but side projects you work on outside of work are an amazing opportunity to explore a new framework or technology. Senior software engineers also take this one step further and look for ways to implement their learnings within their current organisation, as well as up-skilling their team members on what they have learnt.

If you want to read and learn about how to become a better software engineer, check out this great post by Artur Ejsmont.

So if you can cover off most (or all) of the above, you've got a pretty impressive chance of landing that senior software engineer role you have been wanting to apply for.

Good Luck!

Wednesday, 26 November 2014

Ideas on Implementing Agile Development


Do you need Agile Methodologies?

The first question to ask yourself is, "do you need to change your current project management processes?"

Don't change something that is already working efficiently just because Agile development is what everyone is doing. Implement Agile (or any project management process for that matter) as a response to a problem that you and your team are facing. If your company's backlog or task tracker is feeling something like the image above, maybe keep reading...

At a high level, possible reasons to implement Agile development methodologies could be:

  • To improve team efficiency.
  • To reduce developer distraction and frustration.
  • To improve team focus and operational efficiency.
  • To improve team morale.
  • To provide better visibility of priorities to your team and wider business.
  • To encourage team ownership and accountability for features they develop, test and release.

Get your team's buy-in

The number 1 most important thing to do before you implement an Agile development environment is to get your team's support. Without your team's buy-in, you are destined for failure. Demonstrate to your team the benefits of Agile methodologies:

  • Run some informal training sessions with your team.
  • Demonstrate how it can solve problems your team is facing:
    • Reduces changing priorities.
    • Can reduce stress within the team.
    • Can deliver a more focused development environment.

It is also just as important to get the wider business to support the changing processes. They need to understand how it will affect them, but more so the benefits they will gain as part of this implementation.

  • Reduction in changing business priorities and more focused efforts.
  • ROI drives priorities which drives development effort.
  • Encourages forward planning.
  • Less important features will be de-prioritised and most likely not built.
  • There are shorter more manageable timeframes that are consistently met by the development team.

Start small but aim big

Agile is a massive suite of project management methodologies that can be implemented in so many different ways. It's not advisable (or possible) to implement every single aspect of Agile development in one go - you are really setting yourself up for failure if you do this.

Agile is about creating the most efficient ways to run projects for your team - and remember every team is completely different. What works in one team may not work for your team. So it is crucial that you start with the basics and constantly iterate and improve the processes to create the greatest efficiency - remove what's not working, and add new processes that do work.

Some advice a very experienced Agile coach gave me was, "If your offline processes for running projects don't work, then how can you expect your online processes to work?"

What he was driving at is that you really need to get your Agile processes working offline first before you start implementing any online technologies or project management tools. A gradual roll-out plan I have successfully implemented is loosely documented below:

  1. Spend a few weeks training up your team on Agile methodologies (what it is, how it will help).
  2. Document all of the projects/tasks in a single backlog for your business.
  3. Prioritise all of these projects/tasks with your executive or management team (all department heads need to be on the same page).
  4. Work with your team to estimate the work effort for each priority.
  5. Get the most important projects/tasks and set up a basic Agile task board (swim lanes can include "TODO", "In Progress", "Done"). Do this on a physical wall or whiteboard.
  6. Start with a 2 week sprint (try really hard to not change priorities).
  7. Run morning stand-ups each day (no more than 15 minutes a day).
  8. Have a single row at the top of the board for 'unplanned tasks'. This will demonstrate the distractions your team is facing that are limiting them from delivering on the sprint.
  9. Run a retrospective at the end of the 2 week sprint (last day, Friday).
  10. Run sprint planning on the next Monday.
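Although I recommend a physical board to begin with, the structure of the board in the steps above is simple enough to sketch in a few lines of Python. The Board class and its method names are hypothetical; the lanes and the 'unplanned tasks' row come straight from the steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List

LANES = ("TODO", "In Progress", "Done")


@dataclass
class Board:
    """A sprint board with planned lanes plus an 'unplanned tasks' row."""

    planned: Dict[str, List[str]] = field(
        default_factory=lambda: {lane: [] for lane in LANES}
    )
    unplanned: List[str] = field(default_factory=list)

    def add_story(self, story: str) -> None:
        """New planned work always starts in the TODO lane."""
        self.planned["TODO"].append(story)

    def move(self, story: str, to_lane: str) -> None:
        """Move a planned story from its current lane to another lane."""
        if to_lane not in self.planned:
            raise ValueError(f"unknown lane: {to_lane}")
        for cards in self.planned.values():
            if story in cards:
                cards.remove(story)
                self.planned[to_lane].append(story)
                return
        raise ValueError(f"story not on the board: {story}")

    def unplanned_share(self) -> float:
        """Fraction of all cards that were unplanned distractions."""
        planned_count = sum(len(cards) for cards in self.planned.values())
        total = planned_count + len(self.unplanned)
        return len(self.unplanned) / total if total else 0.0


board = Board()
for story in ("Checkout page", "Search filters", "Login"):
    board.add_story(story)
board.move("Login", "In Progress")
board.unplanned.append("Urgent hotfix")
```

A number like unplanned_share() is one way to show the wider business how much unplanned work is eating into a sprint.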

Start with a physical task board, create user stories on cards and sub-tasks on post-it notes. After a few sprints, once you have adapted the processes so that they are working, you can move some of these processes to an online project management suite. Atlassian has some great project management tools for managing Agile teams including JIRA and Confluence.


Continuously improve

Agile isn't about putting in place a process and leaving it. It's about constantly improving the processes every single sprint, week or feature release. Perhaps the most important part of the Agile methodologies is running retrospectives frequently and determining what has worked and what needs improving in the next sprint.


Retrospectives

Retrospectives should include everyone that was part of the sprint (developers, designers, QA, product managers). A retrospective should be a constructive and quick meeting at the end of each sprint (usually 30 minutes is enough). If you are running the retrospectives you need to ensure the team focuses on constructive feedback and is not negative in its responses.

Get the team to identify:
  • What went well that the team should continue to do.
  • What needs to be improved next sprint.
  • Each improvement recommendation should come with a potential solution to the issue (this encourages your team members to always be thinking about solutions, not just problems).
As a team you can then vote on which areas should be improved within the next sprint. Assign action points and owners to ensure improvement actually occurs (create tasks for your next sprint if you really need to allocate time).

Summary and learnings about Agile

  • It should create visibility for your team and wider business about priorities to focus on.
  • Allows your team to focus on the priorities at hand and reduce changing priorities.
  • Define priorities based on ROI, operational efficiency or other company goals.
  • Run sprint planning throughout the sprint, not just at the start of the next - this way you are constantly planning.
  • Get your team to estimate task work effort, don't estimate as a manager.
  • Assign ownership for each user story so your team develops ownership and accountability for the work they are delivering.
  • Agile should create an environment that your team can flourish in.
  • Allow your team to release quality features, often! Always be delivering.
  • Always improve - your Agile development processes are never finalised, you should always be looking for and promoting improvement.
  • Always run retrospectives at the end of each sprint.



Friday, 7 November 2014

How to lead software developers

I want to start with the keyword 'lead' in the title of this post, and not the word 'manage'. If you can lead a software development team to success it means you are managing them well. But 'managing' a team in no way guarantees greatness.


Work as part of the team

Be a leader who works closely with their team, who sits with them in an open plan environment. If you are always working directly with your team you will absorb so much more about people in your team like:
  • What they excel at.
  • Areas they are struggling in.
  • Any frustrations they may have.
  • Who in other departments is causing distractions to your team (this is critical to understand).
  • If they are being efficient in their day-to-day, or just getting distracted.
  • If they are being challenged or are just bored at work.
  • Understanding the overall sentiment and culture of your team.
You don't want to be that manager guy who sits in his closed off (or open door) office, sending commands down the wire to his troops.

Remember, software developers are humans just like you.

Don't keep changing priorities of the team

Developers like to understand the scope of work, figure out how to develop it and then start coding away. If you are running a team with constantly changing priorities, where already coded solutions are thrown out or redone just because the scope was outlined incorrectly, it will really get your developers offside.

How would you like it if you were told to go to a meeting at one of your suppliers, only to arrive and find out it was actually meant to be in your company's board room, and then went back to your office only to find out it had been rescheduled to the following week? Frustrating, right? That's the same as a manager constantly changing priorities on a developer's day-to-day work.

Developers understand that priorities do change due to financial reasons or a change of company direction, but there is no excuse for priorities changing week after week after week.

Running an Agile development environment using SCRUM in 2 week sprints is a great way to define priorities for a 2 week block of work and allow your developers to focus on delivering those priorities without interruption for 2 solid weeks.

Provide feedback in real time

I admit this is one of the hardest things to do, especially if it is critical feedback. But I always put myself in a developer's shoes and think: I would rather know how to improve something I'm doing wrong now so I can better myself now, rather than hearing about it in 3 or 6 months' time when I can barely remember what happened.

As a leader, providing critical feedback comes with the responsibility of providing different ways your team member can improve themselves - be constructive with your critical feedback.

Positive feedback on the other hand is a lot easier. It's much easier to tell a person they are doing a great job and even a thanks for putting in extra hours to get a project over the line.

Also be cautious about giving feedback in public (in a team meeting) or in private (email or one-on-one chat) - different people prefer different methods. It's up to you to be able to understand what each member of your team prefers.

Always make time for your team

This goes for leading any team. If a team member comes up to you to ask a question, or for help on a problem, they have taken the time to do so, so make the time yourself to listen to them and help them with their query. It's important to stop what you are doing and really interact with them - don't sit at your computer typing away at a financial report while only listening to 50% of what they say - it's rude to them and it's making you less efficient in what you are doing. If you always put your team first no matter what you are doing, they will respect you a whole lot more.

Earn their respect


Respect is a very hard thing to earn, especially if you are walking into a new team. Respect is earned in many ways and it's usually different with every developer. Some areas that help earn respect include:

  • Stand up for your team - don't let 'your' managers control or boss around 'your' team.
  • Be flexible with working hours - understand that team members have a life outside of work.
  • Really technical developers often want a manager to prove they can play ball in the technical space. Show them you can cut code, or help them solve a day-to-day issue they have been having with a software platform they are working on.
  • Remove distractions - show your team that you can give them an environment that allows them to focus and get the work done.

Performance Reviews

Some people love them, some people hate them, but official performance reviews are an important part of running a business and leading a team. It is so important to frequently catch up with each member (once a fortnight is a good frequency). This means that there are few or no surprises for either of you when you need to do an official performance review. Before a performance review, give your team a heads-up so they can prepare, and give them some questions to think about beforehand. Within the review, focus on what they did well and areas they can improve on, and provide direction on how they can further their career.

Celebrate releases and team successes

Software development goes in cycles of high intensity during projects and slower cycles, usually for a few weeks after a release. It is really important to celebrate the success of a release or a job well done. Different teams like to celebrate in different ways, whether it's going to the pub after work for a few beers, or taking the afternoon off for a few hours of online gaming - be open to whatever your team wants - just make it happen.

Challenge your developers

Most developers love a challenge in their work. There is normally nothing worse than working on bug fixes day in and day out. Developers are usually creative in their own way in how they tackle a software problem or how they code a new feature. You need to balance giving your developers enough scope so they build the solution in line with the requirements, but not so much scope that you are telling them how to code the solution (as that is why you hired them).

Sometimes it's helpful to recommend different approaches you would take, but ultimately let your developer explore a bit and learn some things for themselves. This way they get more satisfaction from delivering the solution to you as it's their solution. However, it is very important to keep an eye on progress so there aren't any surprises 2 weeks later when the wrong feature was built - this is why daily stand-ups are a great way to get transparency within the team.

Create an amazing work environment

Work should be fun; it's where you spend more than 8 hours a day. As a leader it is up to you to make the work environment for your team amazing. Whether that's supplying MacBooks to all your team, setting up a Foosball table, purchasing a PS4 with FIFA or giving the team a fridge full of beers - it's up to you. Just make it fun to work as part of your team!


Thursday, 6 November 2014

Magento Performance Tips for Scalability

Magento is one of the most powerful open source e-commerce platforms out there. However, in saying that, it can be a real pain to set up and make run fast! Below are some areas to focus on if you want to get the most out of your Magento e-commerce website. I won't be going into huge detail about each section; these posts are just here to point you in the right direction when setting up Magento.


Magento Baseline Setup

Installations

The information provided in the posts below is based on setting up 3 different Magento platforms:
  • Installation 1
    1 Magento shopfront running around 4,000 products.
  • Installation 2
    2 Magento shopfronts running over 80,000 products combined.
  • Installation 3
    4 Magento shopfronts running over 700,000 products combined.

Performance

When the Magento sites were first set up (before production, and primarily the 700,000 product site), page loads were more than 15 seconds, the database was always crashing and we could handle about 10 users on the site - it was just a bad experience. With the approaches I go through in the following posts we can:
  • Serve hundreds a second (thousands if we simply add more servers to our load balancer).
  • Have Magento return the HTML to the browser within 200ms.
  • Load a full page in around 1.1 to 1.6 seconds.
  • Give users a site that just feels fast and responsive - which is what we all want.

Magento Performance Tips



Magento Part 6 - Caching

With Magento you need to utilise some sort of caching service for its performance to be acceptable.

Memcache

Memcache is a really easy caching service to set up. Simply set up memcache on all of your front-end servers.

Add this config to your local.xml (I just use it for staging and production environments).

<cache>
    <backend>memcached</backend>
    <slow_backend>database</slow_backend>
    <slow_backend_store_data>0</slow_backend_store_data>
    <auto_refresh_fast_cache>0</auto_refresh_fast_cache>
    <lifetime>259200</lifetime>
    <memcached>
        <servers>
            <server>
                <host><![CDATA[127.0.0.1]]></host>
                <port><![CDATA[11211]]></port>
                <persistent><![CDATA[0]]></persistent>
                <weight><![CDATA[2]]></weight>
                <timeout><![CDATA[5]]></timeout>
                <retry_interval><![CDATA[5]]></retry_interval>
                <status><![CDATA[1]]></status>
            </server>
        </servers>
        <compression><![CDATA[0]]></compression>
        <cache_dir><![CDATA[]]></cache_dir>
        <hashed_directory_level><![CDATA[]]></hashed_directory_level>
        <hashed_directory_umask><![CDATA[]]></hashed_directory_umask>
        <file_name_prefix><![CDATA[]]></file_name_prefix>
    </memcached>
</cache>
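If you want to sanity-check a cache section like the one above before deploying it, a small script can read the values back out. This is only a sketch: it parses a trimmed copy of the snippet held in a string rather than a real local.xml file, and the function names are my own.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Trimmed copy of the cache section above, used as sample input.
CACHE_XML = """
<cache>
    <backend>memcached</backend>
    <slow_backend>database</slow_backend>
    <lifetime>259200</lifetime>
    <memcached>
        <servers>
            <server>
                <host><![CDATA[127.0.0.1]]></host>
                <port><![CDATA[11211]]></port>
            </server>
        </servers>
    </memcached>
</cache>
"""


def memcached_servers(cache_xml: str) -> List[Tuple[str, int]]:
    """Return (host, port) for every memcached server in the section."""
    root = ET.fromstring(cache_xml)
    servers = []
    for server in root.findall("./memcached/servers/server"):
        host = (server.findtext("host") or "").strip()
        port = int((server.findtext("port") or "0").strip())
        servers.append((host, port))
    return servers


def lifetime_in_days(cache_xml: str) -> float:
    """Convert the lifetime value, which is in seconds, into days."""
    root = ET.fromstring(cache_xml)
    return int(root.findtext("lifetime")) / 86400
```

For the config above, the lifetime of 259200 seconds works out to 3 days.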

Redis Cache

Redis is an amazing caching platform that you can use with Magento. The great news is you can easily set up Redis on AWS ElastiCache.

Below is an example of a Redis configuration in local.xml:

<cache>
    <backend>Mage_Cache_Backend_Redis</backend>
    <backend_options>
        <server><![CDATA[REDIS_SERVER_HOST]]></server>
        <port><![CDATA[REDIS_SERVER_PORT]]></port>
        <persistent><![CDATA[]]></persistent>
        <database><![CDATA[0]]></database>
        <password><![CDATA[]]></password>
        <force_standalone><![CDATA[0]]></force_standalone>
        <connect_retries><![CDATA[1]]></connect_retries>
        <read_timeout><![CDATA[10]]></read_timeout>
        <automatic_cleaning_factor><![CDATA[0]]></automatic_cleaning_factor>
        <compress_data><![CDATA[1]]></compress_data>
        <compress_tags><![CDATA[1]]></compress_tags>
        <compress_threshold><![CDATA[20480]]></compress_threshold>
        <compression_lib><![CDATA[gzip]]></compression_lib>
    </backend_options>
    <id_prefix>YOUR_CACHING_PREFIX</id_prefix>
</cache>

Varnish

This is where the fun begins! Varnish is a front-end proxy (or reverse proxy). Varnish is amazing, however with Magento it can be a pain to configure correctly (give yourself a few days to get this configured correctly). The setup is easy, it is the testing and tuning that takes some time. I use a free Magento extension called 'Turpentine - varnish cache' to connect Magento and Varnish.

Grab yourself a copy here
http://www.magentocommerce.com/magento-connect/turpentine-varnish-cache.html

This is the basic setup

  1. Install the Varnish Turpentine extension for Magento.
  2. Install Varnish on all your front-end servers (you could run this on separate servers if you wish, but I haven't needed to).
  3. Change Apache over to run on port 8080.
  4. Configure Varnish to run on port 80.
  5. How the request cycle works is:
    1. All/most HTTP traffic comes in and hits your Varnish server.
    2. Varnish checks its local cache for that content/URL.
    3. If it exists in cache, Varnish returns it immediately to the user. 
    4. If it doesn't exist in cache, Varnish passes the request to Apache (in the background on port 8080).
    5. Apache returns the content to Varnish.
    6. Varnish returns the content to the user.
    7. All subsequent requests are then served from Varnish cache.
Varnish will reduce your CPU usage massively, so it's a great feature to install and configure correctly (it is worth the effort).
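The request cycle above is easier to see in code. Below is a toy sketch in Python of the hit/miss logic only; it is not how Varnish is implemented or configured, and CachingProxy and render_page are hypothetical names standing in for Varnish and Apache.

```python
from typing import Callable, Dict, List


class CachingProxy:
    """Toy stand-in for Varnish: serve from cache, else ask the backend."""

    def __init__(self, backend: Callable[[str], str]) -> None:
        self._backend = backend  # stands in for Apache on port 8080
        self._cache: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> str:
        if url in self._cache:  # steps 2-3: found in cache, return it
            self.hits += 1
            return self._cache[url]
        self.misses += 1  # steps 4-5: pass the request to the backend
        body = self._backend(url)
        self._cache[url] = body  # step 7: later requests are served from cache
        return body  # step 6: return the content to the user

    def purge_all(self) -> None:
        """Clear everything, as you would after deploying a new codebase."""
        self._cache.clear()


backend_calls: List[str] = []


def render_page(url: str) -> str:
    """Pretend backend that records how often it is actually called."""
    backend_calls.append(url)
    return "<html>" + url + "</html>"


proxy = CachingProxy(render_page)
first = proxy.get("/shoes")
second = proxy.get("/shoes")
```

The second request never reaches render_page, which is where the CPU saving comes from.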

Make sure you clear your Varnish cache on deploying a new codebase (otherwise you may notice strange behavior/bugs on your site). The deployment scripts in Part 2 - Prepare for Scalability have an example of clearing the Varnish cache on deployment.

Here is an example of a Magento configuration for Varnish




That's about it - head on back to the Magento Performance Tips for Scalability post if you missed anything.


Magento Part 5 - Bulk Importing



This part isn't really related to Magento performance, but it does provide some helpful recommendations about efficiency when it comes to importing bulk products/images into Magento.

Magmi (Magento Mass Importer)

If you need to import products in bulk (once off or frequently), then this is the extension for you; if not, feel free to skip this section. I have set up many different Magento sites using MAGMI, from sites that import over 80,000 product updates each night to sites that update products from their ERP system constantly throughout the day.

MAGMI essentially uses SQL to import your products into Magento REALLY FAST. I have been able to import around 15,000 products in 10-15 seconds.

Some handy extensions in MAGMI to enable are:
  • On the fly category creator/importer (This will automatically create categories for you)
  • On the fly indexer v0.1.5 (This will index as you import products)

Bulk image import

I don't want to go into too many details, but here are 2 options to import bulk images into Magento.

Use Magmi to import images

Pros

  • Can import images from a URL (remotely scrape and download images from URLs).
  • It is quite fast.

Cons

  • It's really fiddly to get working.
  • If you only have 1 image, you need to import it 3 times (for the base_image, small_image & thumbnail image).
  • I had problems with it correctly setting the 'Base Image', 'Thumbnail', 'Image' radio buttons for the media gallery.
  • I found it would only import the image to the filesystem (I couldn't get it to upload to the database file storage).

DataFlow Profiles Image Import

Pros

  • Magento built-in functionality.
  • Can import images.
  • Can store images in the database file storage (if enabled in Magento admin).

Cons

  • It's a little slow.
  • Same as MAGMI, you cannot import a single image and set it for 'Base image', 'Small image' & 'Thumbnail' in the gallery of a product.
  • It automatically triggers a re-index after uploading (this could be a 'pro' in some scenarios).

An approach I've used

I ended up using Magento dataflow profiles to import a single image for a product, and then used a custom script to set that image correctly in the media gallery for all image types. A little hacky, but I'm sure there are a few of you out there who may want to do something similar.

Basic steps

  1. Create a new DataFlow profile in System -> Import/Export -> DataFlow Profiles.
  2. Create a CSV with 2 columns, with the headers "sku","image"
  3. Create a record for each SKU you want to import an image for, eg: "1023819","/1023819.jpg"
    Hint: Test with just 1 image first
  4. Notice the '/' at the start of the image filename.
  5. Now upload all of your images that correspond to what you entered in the CSV to /media/import/ directory in Magento.
  6. Upload this file in the 'Upload File' tab.
  7. Once uploaded run the profile in the 'Run Profile Tab'.
  8. Once import is completed I run the below script to set the 'Base image', 'Small image' & 'Thumbnail image' as the first image in the gallery (if not already set).
Create a PHP file in the root Magento directory called 'script-fix-product-images.php' and copy this content in (run at your own risk and always test first):
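<?php

// Bootstrap Magento (this script lives in the Magento root directory).
require_once 'app/Mage.php';
Mage::app()->setCurrentStore(Mage_Core_Model_App::ADMIN_STORE_ID);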

$entityTypeId = Mage::getModel('eav/entity')->setType('catalog_product')->getTypeId();
$mediaGalleryAttribute = Mage::getModel('catalog/resource_eav_attribute')->loadByCode($entityTypeId, 'media_gallery');

// All products
$products = Mage::getModel('catalog/product')->getCollection()->addAttributeToSelect('thumbnail');

$productCount = count($products);

foreach ($products as $product) {

    $product = Mage::getModel('catalog/product')->load($product->getId());

    $gallery = $product->getMediaGalleryImages();

    if( NULL === $product->getThumbnail() || 'no_selection' === $product->getThumbnail() ||
         NULL === $product->getImage() || 'no_selection' === $product->getImage()) {

        $paths = array();
        foreach ($gallery as $image) {
            $paths[] = $image->getFile();
        }
        sort($paths);

        $path = array_shift($paths);

        if( NULL !== $path ) {
            try {
                $product->setSmallImage($path)
                        ->setThumbnail($path)
                        ->setImage($path)
                        ->save();
            } catch (Exception $e) {
                echo $e->getMessage();
            }
        } else {
            var_dump('No Base Image ' . $product->getId() . ' of ' . $productCount . ' [' . $product->getSku() . '] [' . $product->getThumbnail() . ']');
        }
    }

    // If product has multiple images, delete all but the first one (optional if you only ever have 1 image per product like one of my clients):
    if( count($gallery->getItems()) > 1 ) {
        $index = 1;
        foreach( $gallery->getItems() as $item ) {
            if( $index > 1 ) {
                $mediaGalleryAttribute->getBackend()->removeImage($product, $item->getFile());
                $product->save();
            }
            $index++;
        }
    }
}

print 'Images Done';
exit();

This isn't the most straightforward process, but it does work and gets around some of the frustrations of bulk uploading images in Magento. There are extensions out there that apparently do this.
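
If you have a lot of SKUs, the CSV from steps 2 and 3 is easy to generate. Below is a minimal sketch; buildImageImportCsv is a hypothetical helper, and it assumes each image is named after its SKU with a .jpg extension (as in the example row above) and that SKUs contain no double quotes.

```php
<?php
// Builds the two-column CSV ("sku","image") used by the DataFlow profile.
// Assumption: the image file for a SKU is "<sku>.jpg" inside /media/import/.
function buildImageImportCsv(array $skus)
{
    $lines = array('"sku","image"');
    foreach ($skus as $sku) {
        // Note the leading '/' on the image filename (step 4).
        $lines[] = sprintf('"%s","/%s.jpg"', $sku, $sku);
    }
    return implode("\n", $lines) . "\n";
}

echo buildImageImportCsv(array('1023819'));
```

Write the result to a file and upload it in the 'Upload File' tab as described in step 6.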

Part 6 - Magento Caching

Magento Part 4 - Application Tuning



Flat Products/Categories

Enabling flat products and categories within Magento admin makes your website run faster (it allows your Magento application to do less processing on the front-end).

It's really simple to enable.

In "System -> config -> catalog -> catalog -> frontend"

Enable flat products and categories, and clear your Magento cache.

Merge CSS/Javascript

In "System -> config -> developer" (down the bottom of the page) you can choose to merge your Javascript and CSS files. This can reduce your file requests by 10-30 requests per page load. It's simple to do and you will be able to support a lot more users on your website.

Magento Indexing

This is a blog post in itself (but I will try and be brief). Magento's indexing is such a pain! But it is there to make your site run faster (much faster, it's just unfortunate that their built-in indexing processes are super slow)!

If you are going to be running multiple stores with more than a few thousand products you are going to want to use an asynchronous indexer. I haven't used this extension below but it should give you huge improvements:

In the development team I was leading we wrote our own Magento extension that allowed us to run the indexing in the background with multiple processes running at one time. Instead of indexing 80,000 products one at a time, we split them into 80 batches of 1,000 products and index the batches in parallel, which dramatically reduces indexing time.

Indexing (especially the 'Catalog URL Rewrite' index) was taking over 2.5 hours to complete. It now takes less than 1 minute to run!
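
The batching itself is the simple part. The sketch below only shows how 80,000 product IDs become 80 work units; planIndexBatches is a hypothetical name, and the extension's real work (handing each batch to its own background process) is not shown here.

```php
<?php
// Splits a list of product IDs into fixed-size batches, one per worker.
function planIndexBatches(array $productIds, $batchSize)
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }
    return array_chunk($productIds, $batchSize);
}

$batches = planIndexBatches(range(1, 80000), 1000);
echo count($batches), "\n"; // prints 80
```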

If you want me to connect you with the developer who built this amazing extension let me know.

Enable Magento Compiler

Magento has a feature called 'compilation' in 'System -> tools -> compilation' that essentially compiles all files to run Magento so a single include path can be used.

Using the compiler can increase your website performance by more than 25%.

Disable modules you are not using

Magento has a bunch of built in modules that you most likely are not using, but are costing you CPU and Memory resources on your servers.

Go to 'System -> config -> advanced -> Advanced' and simply disable the modules you are not using.

Modules I have disabled in the past include:
  • Mage_Authorizenet
  • Mage_Captcha
  • Mage_Downloadable
  • Mage_Poll
  • Mage_Rating
  • Mage_Tag
  • Phoenix_Moneybookers
Thoroughly test your site if you choose to disable any modules.

Set config at a website level

Not really a performance tip, but more of a recommendation if you plan on having multiple stores on your Magento installation.

Before you set any 'default' values within Magento 'System -> config', think to yourself,

"Will this setting apply to every site I run on this Magento installation?"

If it doesn't, drill down to the site config level and override the config there. Even if you are only running 1 store at the moment, try to think about future plans. If you set every config variable at the 'default' level now, if you ever need to install multiple sites you may have to go through and adjust a lot of config variables (which most likely means re-testing your entire website again). Some config options to think about are:
  • Timezone
  • Themes
  • Base URLs
  • Category display (list/grid)
  • Category pagination
  • Customer configuration
  • Facebook configuration
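
The reason this matters is the way config values fall back from the most specific scope to the least specific. Here is a minimal sketch of that fallback in plain PHP. resolveConfig is a hypothetical function (not Magento's API), and the paths and values are only examples.

```php
<?php
// Returns the most specific value for a config path: store, then website, then default.
function resolveConfig($path, array $default, array $website = array(), array $store = array())
{
    foreach (array($store, $website, $default) as $scope) {
        if (array_key_exists($path, $scope)) {
            return $scope[$path];
        }
    }
    return null; // not set at any level
}

$default = array('catalog/frontend/list_mode' => 'grid-list');
$website = array('general/locale/timezone' => 'Australia/Sydney');

echo resolveConfig('general/locale/timezone', $default, $website), "\n";    // prints Australia/Sydney
echo resolveConfig('catalog/frontend/list_mode', $default, $website), "\n"; // prints grid-list
```

Values that really are global stay at the default level, while anything site-specific is set per website, so adding a second site later does not force you to unpick the defaults.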

Block bad crawlers?

Magento can really struggle if you are getting hit by a lot of crawlers. Not that I really recommend blocking access to your site from crawlers, but there are scenarios where it may make sense. We have a client that runs a large Australian based e-commerce shop. Their site was being crawled by a bunch of Russian, Chinese and backlink bots that offered no benefit to them. In this case it made sense to block some of those bots (within the .htaccess file) rather than add 1-2 new servers to their cluster.

Below is an example of a few lines you can add to the bottom of your .htaccess file to block some bots (I have removed a bunch of lines so it wasn't a massive file - if you google bad bots you will be able to find the list).

# Block Bad Bots & Scrapers
# -----------------------------------
SetEnvIfNoCase User-Agent "^AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "Aboundex" bad_bot
SetEnvIfNoCase User-Agent "80legs" bad_bot
SetEnvIfNoCase User-Agent "360Spider" bad_bot
# … I have removed a few hundred lines from here …
SetEnvIfNoCase User-Agent "^Xenu" bad_bot
SetEnvIfNoCase User-Agent "^Zeus" bad_bot
SetEnvIfNoCase User-Agent "ZmEu" bad_bot
SetEnvIfNoCase User-Agent "^Zyborg" bad_bot

# Vulnerability Scanners
SetEnvIfNoCase User-Agent "Acunetix" bad_bot
SetEnvIfNoCase User-Agent "FHscan" bad_bot

# Aggressive Chinese Search Engine
SetEnvIfNoCase User-Agent "Baiduspider" bad_bot

# Aggressive Russian Search Engine
SetEnvIfNoCase User-Agent "Yandex" bad_bot

<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all

    Deny from env=bad_bot
</Limit>

You can also block bots using a robots.txt file (but some robots do not honor robots.txt).
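
It helps to know how the SetEnvIfNoCase patterns above match: each one is a case-insensitive regular expression, and a leading ^ only matches at the very start of the User-Agent string. The sketch below mimics that matching in plain PHP so you can test patterns before deploying them. isBadBot is a hypothetical helper and the user agent strings are made up.

```php
<?php
// Mimics SetEnvIfNoCase: case-insensitive regex match against the User-Agent header.
function isBadBot($userAgent, array $patterns)
{
    foreach ($patterns as $pattern) {
        // Use '~' as the regex delimiter and escape it inside the pattern.
        $regex = '~' . str_replace('~', '\~', $pattern) . '~i';
        if (preg_match($regex, $userAgent) === 1) {
            return true;
        }
    }
    return false;
}

$patterns = array('^AhrefsBot', 'Aboundex', '80legs', '360Spider', 'Baiduspider');

var_dump(isBadBot('ahrefsbot/1.0', $patterns));                              // bool(true)
var_dump(isBadBot('Mozilla/5.0 (compatible; AhrefsBot/1.0)', $patterns));    // bool(false)
var_dump(isBadBot('Mozilla/5.0 (compatible; Baiduspider/2.0)', $patterns));  // bool(true)
```

The second call shows the catch with anchored patterns: a bot whose user agent starts with "Mozilla/5.0 (compatible; ..." slips past a rule that begins with ^. Check your access logs for the exact strings before relying on an anchored rule.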

Part 5 - Magento Bulk Importing

Magento Part 3 - MySQL Setup & Performance



MySQL Master/Slave

As mentioned in Part 1 - Infrastructure & Hosting post, you need to take advantage of the MySQL master/slave support in Magento.

Gone are the days of running a single MySQL server for websites. MySQL's master/slave replication is a great way to leverage the power of multiple MySQL servers with very little effort.

Once you have set up your RDS (AWS database instances) as detailed in Part 1 - Infrastructure & Hosting, make sure your local.xml config file looks similar to the below (replace with your database connection details).

<resources>
    <db>
        <table_prefix><![CDATA[]]></table_prefix>
    </db>
    <default_setup>
        <connection>
            <host><![CDATA[RDS_HOST_MASTER:3306]]></host>
            <username><![CDATA[USER]]></username>
            <password><![CDATA[PASSWORD]]></password>
            <dbname><![CDATA[DATABASE]]></dbname>
            <initStatements><![CDATA[SET NAMES utf8]]></initStatements>
            <model><![CDATA[mysql4]]></model>
            <type><![CDATA[pdo_mysql]]></type>
            <pdoType><![CDATA[]]></pdoType>
            <active>1</active>
        </connection>
    </default_setup>
    <default_read>
        <connection>
            <use/>
            <host><![CDATA[RDS_HOST_REPLICA:3306]]></host>
            <username><![CDATA[USER]]></username>
            <password><![CDATA[PASSWORD]]></password>
            <dbname><![CDATA[DATABASE]]></dbname>
            <type><![CDATA[pdo_mysql]]></type>
            <model><![CDATA[mysql4]]></model>
            <pdoType><![CDATA[]]></pdoType>
            <initStatements>SET NAMES utf8</initStatements>
            <active>1</active>
        </connection>
    </default_read>
</resources>

With this basic setup, Magento will push all 'READ' queries to your slave database, and all of the writes and critical reads to your master MySQL server.

Tune MySQL

If you have just released a Magento site and it's not performing, don't lose hope; you most likely need to tune MySQL a little to perform better.

There is this great script you can run on your server which helps identify issues you may need to fix. You can find it here:
http://turnkeye.com/blog/magento-performance-optimize-mysql/

Here are some of the config changes I usually make to MySQL servers:
key_buffer                 = 16M
max_allowed_packet         = 16M
thread_stack               = 192K
thread_cache_size          = 8
max_connections            = 120
query_cache_limit          = 1M
query_cache_size           = 48M
table_open_cache           = 3000
Part 4 - Magento Application Tuning

Magento Part 2 - Prepare for Scalability



Use a continuous integration/deployment server

Whenever you deploy a Magento store, there are always a bunch of scripts/processes that you will need to run to ensure the environment is setup and running correctly (you really don't want to be doing this manually on each release). I use Capistrano for all code deployments - it's lightweight and relatively easy to setup. Below are some of the tasks the deployment process does:
  • Swaps in production or staging configuration files (discussed below).
  • Clears filecache.
  • Clears memcache.
  • Clears varnish cache.
  • Removes previous builds.
  • Sets correct file permissions on directories (for uploads/imports etc...).

How to setup

deploy.rb

set :application, "APPLICATION_NAME"
set :scm, :git
set :repository, "YOUR_GIT_REPOSITORY"
set :user, "USER"
set :use_sudo, false
set :deploy_via, :remote_cache
set :copy_exclude, ['.git']
set :ssh_options, {:forward_agent => true}
set :keep_releases, 5

set :stages, ["staging", "production"]
set :default_stage, "staging"

default_run_options[:pty] = true

namespace :cache do
  desc "Clear Magento cache\nUsage: cap [stage] cache:clear -s type=[all|image|data|stored|js_css|files]"
  task :clear do
    if type.nil? || type.empty? || type == "all"
      cache_type = "all"
    else
      cache_type = "--clean #{type}"
    end
    run "if [ -e '#{current_path}/magento/shell/clearCache.php' ]; then cd #{current_path}/magento/shell && php clearCache.php -- #{cache_type}; else rm -rf #{current_path}/magento/var/cache/* ; fi "
  end

  task :flush do
    run "cd #{current_path}/magento/shell && php cleanCache.php -- flush"
  end

  task :varnish do
    #run "if [ -e '#{current_path}/magento/shell/varnish.php' ]; then cd #{current_path}/magento/shell && php varnish.php -- apply; fi "
  end
end

# Clean up old releases, then clear the Varnish and Magento caches after each deploy:
after "deploy:restart", "deploy:cleanup"
after "deploy:restart", "cache:varnish"
after "deploy:restart", "cache:flush"

deploy/production.rb

server "localhost", :app, :web, :db, :primary => true
set :deploy_to, "DEPLOY_DIRECTORY"
set :branch, "master"

namespace :deploy do
  task :restart, :roles => :web do
    # Copy production config into local.xml
    run "cp #{ current_path }/magento/app/etc/local.xml.production #{ current_path }/magento/app/etc/local.xml"
    run "cp #{ current_path }/magento/errors/local.xml.sample #{ current_path }/magento/errors/local.xml"
    #run "cp #{ current_path }/magento/downloader/connect.production.cfg #{ current_path }/magento/downloader/connect.cfg"
    run "cp #{ current_path }/varnish/default.vcl.production #{ current_path }/varnish/default.vcl"
    run "cp #{ current_path }/varnish/secret.production #{ current_path }/varnish/secret"
    run "chmod 755 #{ current_path }/magento"
    run "chmod 755 #{ current_path }/magento/media"
    run "mkdir -p #{ current_path }/magento/media/catalog/product"
    run "mv #{ current_path }/magento/robots.txt.production #{ current_path }/magento/robots.txt"
    run "php -f #{ current_path }/magento/shell/compiler.php -- compile"
  end
end
Now check out this repository onto your production servers.

To setup

Run the following command in the root directory:
cap deploy:setup

To deploy

Run:
cap production deploy

Support for multiple environments

Magento is a bit of a pain in supporting multiple environments (development, testing, production). But it can be easily achieved (you should be using a deployment server as mentioned above).
In /app/etc/ directory you can setup something like:

Config files

  • local.xml.development.example (Example config for dev)
  • local.xml.development (use .gitignore so this file is not checked in)
  • local.xml.staging (For staging environment)
  • local.xml.testing (For unit testing environment)
  • local.xml.production (For production environment)
Use a .gitignore to stop people from committing local.xml.

The continuous integration server swaps in the correct config file based off the environment (as you can see in the deploy scripts above).

System logging

Magento stores its errors in a variety of places (/var/report/, the PHP ini error log file, the Apache error log file). When error logs are stored all over the place it takes longer to debug issues, and when something goes wrong, time isn't something you have on your side. If you are running Magento on multiple servers it's even harder to debug!

I have built a "system_log" extension which stores every single error message in a system_log MySQL table using delayed writes (so as to have minimal impact on the live site). This database table of error logs is an aggregation of all errors from all of your Magento servers, making it so much easier to debug issues. You could look at pushing these errors to external providers if you don't have the resources in house to have higher spec servers (PaperTrailApp and Loggly are worth a look).

Another helpful feature is a simple cron that runs every minute and emails the software teams if any error messages occur (that are not debug messages). This saves someone checking the system_log table constantly. Because this is an asynchronous service, even if something goes horribly wrong and you trigger thousands of errors, you will only be sent a combined email of errors once a minute (so it won't bring your server down).
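
The digest logic of that cron can be sketched in a few lines of plain PHP. This is not the extension's code: buildErrorDigest is a hypothetical function, the row keys borrow the Message, PriorityName and Created column names from the system_log table shown in the DB migrations section, and I am assuming Created is available as a Unix timestamp and that debug rows carry the priority name 'DEBUG'.

```php
<?php
// Combines all non-debug log rows newer than the last run into one email body.
// Returns null when there is nothing to send.
function buildErrorDigest(array $rows, $lastRunTimestamp)
{
    $lines = array();
    foreach ($rows as $row) {
        if ($row['Created'] <= $lastRunTimestamp) {
            continue; // already reported in an earlier run
        }
        if (strcasecmp($row['PriorityName'], 'DEBUG') === 0) {
            continue; // debug messages are never emailed
        }
        $lines[] = sprintf('[%s] %s', $row['PriorityName'], $row['Message']);
    }
    if (count($lines) === 0) {
        return null;
    }
    return count($lines) . " error(s) since the last check:\n" . implode("\n", $lines);
}

$rows = array(
    array('Message' => 'Payment gateway timeout', 'PriorityName' => 'ERR',   'Created' => 1000),
    array('Message' => 'Cache warm-up finished',  'PriorityName' => 'DEBUG', 'Created' => 1001),
    array('Message' => 'Import file missing',     'PriorityName' => 'CRIT',  'Created' => 1002),
    array('Message' => 'Old error',               'PriorityName' => 'ERR',   'Created' => 900),
);

$digest = buildErrorDigest($rows, 950);
echo $digest, "\n";
```

However many rows pile up in a minute, the cron sends at most one email per run, which is what keeps an error storm from turning into an email storm.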

If you are interested in this 'system_log' extension send me a message.

DB migrations

You may already know this, but you don't need to be creating SQL files to manage migrations in Magento. For database/SQL migrations setup a directory:

app/code/local/ORGANISATION/MODULE/sql/ORGANISATION_MODULE_setup

Inside this directory create a PHP install or upgrade script to manage running migrations.

eg:
<?php
$installer = $this;
$installer->startSetup();

$installer->run("
CREATE TABLE IF NOT EXISTS {$this->getTable('system_log')} (
  `SystemLogId` int(10) NOT NULL AUTO_INCREMENT,
  `Message` text NOT NULL,
  `PriorityName` varchar(100) DEFAULT NULL,
  `PriorityLevel` int(5) DEFAULT NULL,
  `UserIp` varchar(50) DEFAULT NULL,
  `UserHost` varchar(250) DEFAULT NULL,
  `SectionId` int(10) NOT NULL DEFAULT '1',
  `Attachment` text,
  `Created` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`SystemLogId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `system_log_section` (
  `SystemLogSectionId` int(10) NOT NULL AUTO_INCREMENT,
  `Name` varchar(50) NOT NULL,
  PRIMARY KEY (`SystemLogSectionId`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

INSERT INTO `system_log_section` (`SystemLogSectionId`, `Name`)
VALUES
 (1,'Default'),
 (2,'Magmi');
");

$installer->endSetup();


I found very little documentation about these SQL migrations, but nearly all modules use them in the Magento world, so it's fairly easy to find some examples.

File storage, sessions and images

For any system to scale you need to have your data (or file assets in this case) stored in a central location (a single server or a cluster of servers). Why? Because when you move from a single-server environment to a multi-server environment, locally stored data such as images and session files will only exist on 1 server (not on all of your servers), which can be problematic.

Sessions

In Magento you should use database session storage so your users' sessions persist across multiple servers. Within your local.xml file, set database storage for sessions:

<config>
    <global>
        <session_save>db</session_save>
    </global>
</config>

A good article about performance of different session storage options
http://magebase.com/magento-tutorials/magento-session-storage-which-to-choose-and-why/

Image Assets (product images)

I typically store all product images in the database of Magento (or a central storage platform that isn't the file system) - before you worry about performance, read on. Magento has 'partial' functionality to store images in the database but it is nowhere near complete. A team I was working with ended up developing our own extension to enhance the Magento core so we had a more seamless integration with database storage for product images.

If you upload and store images on the filesystem only, with every deployment you will wipe out and delete those images. Unless you store them in a folder outside the Magento directory and symlink them in (but then you need to worry about another type of backup).

Storing all product images in the database gives you a central place where all images are stored: you can copy databases from production to testing or to development and have everything continue to work very easily, without the need to copy images between servers. However, most of you are probably thinking about performance, because rendering images from the database is seriously slow.

The good news is, Magento automatically writes the images to the filesystem on first request. This means on first request the image may take around 1 second to load, but on every subsequent request it is being read directly from Apache and the filesystem which is very fast with no application overheads.
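
The idea is "pay the database cost once". The sketch below shows it in plain PHP, assuming a hypothetical fetchImage function and a closure standing in for the database read. In the real setup Apache serves the file directly on later requests, so PHP is not involved at all after the first hit.

```php
<?php
// Returns image bytes, copying them from the database to the filesystem on first request.
function fetchImage($relativePath, $mediaDir, callable $loadFromDatabase)
{
    $target = rtrim($mediaDir, '/') . '/' . ltrim($relativePath, '/');

    // Later requests: the file already exists on disk.
    if (is_file($target)) {
        return file_get_contents($target);
    }

    // First request: read from the database (slow) and write it to disk.
    $bytes = call_user_func($loadFromDatabase, $relativePath);
    if (!is_dir(dirname($target))) {
        mkdir(dirname($target), 0755, true);
    }
    file_put_contents($target, $bytes);
    return $bytes;
}

$mediaDir = sys_get_temp_dir() . '/media_sketch_' . uniqid();
$dbReads = 0;
$loader = function ($path) use (&$dbReads) {
    $dbReads++;
    return 'fake-image-bytes';
};

fetchImage('/catalog/product/1023819.jpg', $mediaDir, $loader);
fetchImage('/catalog/product/1023819.jpg', $mediaDir, $loader);
echo $dbReads, "\n"; // prints 1
```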

Go a step further and implement a CDN and you have completely offloaded the product image loading process to your CDN servers, so images will load within milliseconds.

To implement the above requires a custom Magento extension, so send me a message if you are interested.

CDN (Content Delivery Network)

As mentioned above, with Magento you need to offload as many resources as possible from your servers to make it perform fast. Serving all of your product images from CDN is a great way to do this.

Grab this extension to start:
http://www.magentocommerce.com/magento-connect/onepica-imagecdn-1.html

A team I worked with patched this extension to support Origin-Pull for CDN (this is a far simpler implementation of CDN, which doesn't require your application to push your content to CDN, your CDN pulls the content from your server on first request).
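
With origin-pull, the application's only job is to print image URLs that point at the CDN host instead of your own domain. Here is a minimal sketch of that rewrite in plain PHP. toCdnUrl is a hypothetical helper (it is not part of the extension), www.example.com is a placeholder, and the CDN host is the sample CloudFront domain from Part 1.

```php
<?php
// Swaps the host of an asset URL for the CDN host, keeping the path and query string.
function toCdnUrl($url, $cdnHost)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['path'])) {
        return $url; // leave anything we cannot parse untouched
    }
    $scheme = isset($parts['scheme']) ? $parts['scheme'] : 'http';
    $query  = isset($parts['query']) ? '?' . $parts['query'] : '';
    return $scheme . '://' . $cdnHost . $parts['path'] . $query;
}

echo toCdnUrl(
    'http://www.example.com/media/catalog/product/1/0/1023819.jpg',
    'd1u5cic2xm0cb9.cloudfront.net'
), "\n";
```

Because the path is unchanged, the CDN can request the same path from your server the first time an image is asked for and cache it from then on.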

If you want to learn more about this extension send me a message.

Magento Part 3 - MySQL Setup & Performance

Magento Part 1 - Infrastructure & Hosting



Shared Hosting

Getting straight to the point - simply don't use it - ever :-) Magento requires tuning of your MySQL servers, your web servers, caching servers etc. On a shared hosting environment you don't have access to these settings. Save yourself a lot of pain and set up your own infrastructure.

Recommended Infrastructure Setup

Set up your Magento environment on AWS (Amazon Web Services). It's extremely cost effective, and if you follow the guidelines below you can scale up your servers as your server load increases over time.

Route 53 for DNS

Why? Because it is super simple to setup, easy to maintain, has redundancy and is really cheap!

EC2 & Load Balancer for Front-end web servers

You will want to run at least 2x EC2 front-end servers for your Magento website both sitting behind a load balancer.

"Load balancers allow you to run multiple front-end (Apache/PHP) servers for your Magento installation. They are great for performance, redundancy and for creating downtime free releases."

  1. Setup 2x medium instance web servers for your Apache/PHP/Magento codebase.
  2. You will want to use EC2 servers that have at least 2 CPUs (Magento in most cases will perform much better with more CPU power than memory).
  3. Setup the 2 servers identically (look into CloudFormation if you want to automate this).
  4. Now setup 1 load balancer for your 2 front-end servers.
  5. Add your 2 front-end servers to your load balancer (You will need to setup a polling end-point so the load balancers know the server is in service). It could be as simple as having a PHP file in the root directory like below:

    /load-balancer-status.php
    <?php var_dump($_SERVER['REMOTE_ADDR']); ?>

    In your Amazon Load Balancer your health check should look something like:
    HTTP:80/load-balancer-status.php

EC2 & Load Balancer for admin web servers

It is good practice to run your Magento admin on its own servers. This will ensure that your staff and admin users won't affect your front-end website when using Magento admin heavily. It also allows you to add a layer of security to your Magento admin (and lock it down to just your office network).
    1. Setup 2x medium instance admin web servers (these can be identical to your front-end web servers) for your Apache/PHP and Magento codebase.
    2. Again, you will want to use EC2 servers that have at least 2 CPUs.
    3. Setup the 2 servers identically.
    4. Now setup 1 new load balancer for your 2 admin servers.
    5. Add your 2 admin servers to your load balancer (like you did for your front-end servers).

    RDS for your database

    You will want to run your database on a separate server to your admin and front-end servers. This will give you much better performance and decouple your application servers from your database (good for scaling).
    1. Setup 1x medium RDS instance for your database.
    2. Import your initial MySQL database dump to this database.
    3. You will want to leverage the power of MySQL replication for your Magento website, so within the RDS configuration of your database above, create a 'read replica'.
    4. This way you can share the load of your MySQL queries over both of your databases (more details about how to configure Magento for read replica in Part 3 - MySQL setup and performance).

    ElastiCache

    Magento needs cache for it to support even just a few users efficiently.
    1. Setup an ElastiCache instance (Redis), or setup Memcache on your front-end and admin servers.
    2. Configure Magento to use this caching service
    3. More details about how to configure Magento for caching in Part 6 - Magento Caching.

    CloudFront CDN for product images

    Using a CDN allows you to offload a lot of static resources (like product images, CSS, Javascript etc...) to CDN servers all around the world. This will free up your server resources.
    1. Setup a CloudFront distribution within AWS.
    2. This will give you a URL like:
      d1u5cic2xm0cb9.cloudfront.net
      that we can configure later in Magento to serve content from.
      Part 2 explains how to setup CDN within Magento

    SES for transactional emails

    Your Magento website will send out transactional emails (contact form emails, order emails etc...). You may as well keep this on AWS for simplicity.
    1. Setup SES within AWS.
    2. You will need to configure your domain, and allowed senders.
    3. You will also have to apply for production use (which can take up to 48 hours). Make sure you do this a few days before launch.

    Reserved instances

    You can dramatically reduce your AWS costs by purchasing reserved instances. You should purchase 'heavy utilisation' reserved instances for all of your EC2 and RDS instances (your costs will be about 1/3 to 1/2 of the costs of not purchasing reserved instances).

    Part 2 - Prepare for Scalability


    Tuesday, 3 June 2014

    Why every database table should have created/modified columns

    This is a short and sweet post about those crucial 'created and modified' database columns that can be your saviour down the line for any project.

    I'm sure almost everyone has worked on applications where the original developer missed adding a created or modified timestamp column on a database. There are many reasons for it - the developer forgot, ran out of time, or sometimes failed to realise the importance of time stamping every database record.

    The simple rule is:

    "Every table should have a column for 'created timestamp' and 'modified timestamp'."

    There are extensions to this in regards to having an 'owner' who modified and versioning history, but that is a far greater conversation.

    The benefits you will get within your application:
    • You will always have a record of when the record was created.
    • You will always have a record of when the record was last modified.
    • It will dramatically speed up debugging of issues.
    • It provides necessary auditing information for your application.
    • Allows you to implement data archiving practices in the future.
    What can happen if you don't implement:
    • You will have no idea when records were created or modified.
    • Makes debugging issues related to times and external error logs much harder.
    • You can't recover records based off date/times for legal purposes.
    • It will be near impossible to archive data in the future.
    Gotchas:

    • Use a consistent method for timestamps. What this means is don't use native database timestamps in one area of your application (eg: 'NOW()' in SQL), and then use PHP timestamps in another area of your application (eg: new \DateTime()). This can cause inconsistencies if your application logic and database logic are on different servers in different timezones and not configured correctly. Use a single method (either PHP or in SQL) and stick with it.
    Most frameworks (eg: Symfony2, Zend) provide events you can plug into to make implementation much easier. You can even look at creating an abstract class that all of your entity classes extend and that provides this core functionality.
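
As a minimal sketch of that abstract class idea in plain PHP: TimestampedEntity and Customer are hypothetical names, and touch() is called by hand here, whereas in a framework you would wire it to the "before save" event. Timestamps are always generated in PHP and in UTC, which follows the "single method" rule above.

```php
<?php
// Base class giving every entity consistent created/modified timestamps.
abstract class TimestampedEntity
{
    /** @var DateTimeImmutable|null set once, on first save */
    protected $created;

    /** @var DateTimeImmutable|null updated on every save */
    protected $modified;

    // Call this just before the entity is written to the database.
    public function touch(?DateTimeImmutable $now = null)
    {
        if ($now === null) {
            $now = new DateTimeImmutable('now', new DateTimeZone('UTC'));
        }
        if ($this->created === null) {
            $this->created = $now;
        }
        $this->modified = $now;
    }

    public function getCreated()
    {
        return $this->created;
    }

    public function getModified()
    {
        return $this->modified;
    }
}

final class Customer extends TimestampedEntity
{
}

$utc = new DateTimeZone('UTC');
$customer = new Customer();
$customer->touch(new DateTimeImmutable('2014-06-03 10:00:00', $utc)); // insert
$customer->touch(new DateTimeImmutable('2014-06-03 11:30:00', $utc)); // later update

echo $customer->getCreated()->format('Y-m-d H:i:s'), "\n";  // prints 2014-06-03 10:00:00
echo $customer->getModified()->format('Y-m-d H:i:s'), "\n"; // prints 2014-06-03 11:30:00
```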