I do a lot of Code Reviews, and without proper automation of most of the low level items that you are usually "remarking" to colleagues it is a frustrating experience for everyone involved and takes more time that needed.
My natural tendency is towards action over planning, so I tend to find arguments over coding style are a waste of time if sufficient standards exist that you can just adopt. In my opinion this is is something team leads and managers need to rule on to avoid that their teams become unproductive in little coding-standard and style discussions.
Choosing a set of strict rules is very important and enforcing them helps reducing time spent on unimportant details during code review and allows more junior developers to prepare their changes in a way that most feedback can be automated.
Coding Styles alone are not enough in most cases. There should also be rules with respect to what type of language constructs and building blocks are preferred over others.
Equipped with a fixed ruleset, you can automate the low-level parts of code review in such a way that every developer can perform it themselves and fix violations before you even get to see the code in the first place.
In this post I give an (probably incomplete) overview of rules and patterns we follow for Tideways code base.
The first thing I put in place when Tideways hired its first engineer was PHP Code Sniffer with a slightly adopted Doctrine Standard (PSR-2 + a lot more), coupled with static analysis using Psalm. Combined with Github Checks Run API and Scotts diff filter tool all the violations are shown directly within Github commits and Pull Requests.
Using the phpcbf
command and a lot of manual work we are down to zero
violations. If you start using a standard from the beginning this won't happen,
but strict enforcement of coding standard is not something I was used to
before.
Violation check runs on Github include a message with a command to copy paste and auto-fix violations if available for PHP Code Sniffer, ESLint and TSLint.
We also have PHPStorm configurations to integrate each tool into the IDE.
In my opinion PSR-2 is a good first step, but it leaves a lot of details open that the Doctrine Coding Standard locks further down. A lot those rules are very strict and we had to disable them on our code base. If you are interested our phpcs rule definitions are in this Gist.
Besides "simple" code style checks, we have some clean code guidelines that we don't check with phpcs, because they are not black and white but require some deliberation.
For Javascript we enforce the strict defaults of ESLint/TSLint and for Go the default coding standard that `go fmt` produces.
We use Psalm for static code analysis. It automatically tries to find out all types of all variables in your application and then checks hundrets of different rules that might indicate bugs or improve the design of your application.
While type safety is awesome, without generics forcing Psalm or PHP-Stan at the highest level leads to a lot of clutter and boilerplate code that is only necessary to make the tool happy.
This is why I recommend to only fail the build on the obvious rules and leave the other ones as Warnings for the developer visible from the IDE.
We use a lenient psalm.xml and the Baseline feature to get down to zero errors. Every error fails the build.
A lot of accidental complexity in code comes from exposing implementation details to the outer layers of your application. This is mostly due to function arguments or return values that expose the implementation details.
This concept is called Information Hiding.
When adding arguments or return values think about the client calling the function:
- Do they only get the information required to work, or does it return additional information (extra array keys, objects) that are unncessary?
- Does a function argument expose the internal implementation of a datastructure? Can you turn it into a simpler input that is converted to the desired variable internally?
This is extremely difficult skill to master and takes a lot of practice.
One of the most influential blog posts on me has been To new or not to new by Misko Hevery.
It clearly defines which types of objects can be constructed with new in place (data objects, entities, value objects) and which ones should also be injected and centrally constructed by a factory object or a dependency injection container.
Blocks of statements and blocks of declarations usually get separated by an empty line, to make a very visual distinction. Also statements with a different context tend to be split by empty lines as well. See this example from one our controllers, where each statement and declaration is in its own paragraph.
<?php
public function deleteAction(PageContext $context, $serverName)
{
$server = $this->findServer($serverName, $context);
$this->serverRepository->remove($server);
$this->jobQueue->queue(new BalanceServerTask(['applicationId' => $context->application->getId()]));
return $context->redirectRoute('XhprofProfilerBundle.Server.environment');
}
I would define paragraphs are statements or declarations that belong to each other. This concept closely follows paragraphs in natural languages and is easy to grasp and follow.
Code almost never needs the else keyword. It can be written in a way to use early exits with return or continue instead. This following code could have been written with an else. This rule keeps the main/happy path of the code on the main level of the method.
<?php
public function formatQueryString(string $query): string
{
if (strlen($query) === 0) {
return '';
}
parse_str($query, $queryArgs);
$queryArgs = $this->truncateArrayValues($queryArgs);
return json_encode($queryArgs, JSON_PRETTY_PRINT);
}
For loops we continue to the next iteration for early exits:
<?php
private function parseErrorTrace(string $trace, $removeArguments = false)
{
$traceResult = [];
$parts = explode("\n", $trace);
foreach ($parts as $line) {
if (strpos($line, '{main}') !== false) {
continue;
}
if (strpos($line, '#') !== 0) {
continue;
}
// more
}
}
Another rule I try to follow is "`only one level of indentation per method <https://williamdurand.fr/2013/06/03/object-calisthenics/#1-only-one-level-of-indentation-per-method>`_" from the Object Calisthenics rulebook.
Guilherme from the Doctrine team introduced me to Object Calisthenics many years ago and I found it a very good guidestick to improve my code without having to think too much.
The deleteAction
in the controller example above inhibits this rule, having
no additional level of indentation. This second example shows a controller
with one level of indentation, using the "Guard clause" pattern instead of
nested conditions (See the refactoring)
<?php
public function process(Task $task)
{
$application = $this->organizationRepository->findApplication($task->applicationId);
if (!$application) {
return;
}
$retentionDays = $application->getRetentionDays();
$daysDiff = (new \DateTime('now'))->diff($task->date)->days;
if ($daysDiff > $retentionDays) {
return;
}
$this->historicalDataAction->compute($task->applicationId, $task->date, $task->overwrite);
}
When methods use loops two levels of indention are ok, if the second level is used only for guard clauses.
<?php
$lastProductionCommunication = null;
foreach ($servers as $server) {
if ($server->getEnvironment() !== 'production') {
continue;
}
if (!$server->getLastCommunication()) {
continue;
}
if ($lastProductionCommunication !== null && $lastProductionCommunication > $date)
continue;
}
$lastProductionCommunication = $date;
}
It is not easy to follow this rule to the dot, and we do have a lot of code in Tideways that doesn't follow this rule as we don't enforce it. But I view the violation of the one level rule as an automtaic, immediate signal of both complexity and potential for refactoring.
This rule is easier in PHP than in Go (by design). As we use both languages, we force ourselves to write out what a variable, class, constant or function is doing, and avoid abbreviations.
func (previous PreviousSpans) GetLastIndex(category, summary string) int {
if entry, ok := previous[category]; ok {
if entry.Summary == summary {
return entry.Index
}
}
return -1
}
Certain names still get abbreviated though, for example i, idx, j
and other
common ones.
We still work with a lot of arrays, because PHP makes it much easier to use
them than complex types (objects). The risk of using arrays is not knowing
if a key exists, and code being littered with if (isset($data['key'])) {
statements all over the place. It is much better to use the null coalesce
operator ??
to define default values for keys that you don't know exist
on an array:
<?php
$hostName = $serverNames[$currentOccurrance['sId']] ?? 'unknown';
If something doesn't exist it is NULL or false right? Wrong, if you care about simplicity of your code.
- If a method is supposed to return an object but can't, it may be better to throw an exception.
- If an array is empty, then it is empty, not null or false. Iterating over an empty array can correctly do nothing.
I prefer methods written in a way, where they either only handle the default (happy path) by throwing exceptions otherwise, or by defining all variables in a way where even the "unhappy/error" paths can run successfully through the happy code path and yield the exactly right result.
PHP allows this, but it can lead to very confusing code if you re-use the same variable with different types in a method or function. Avoid at all costs.
<?php
public function deleteAction(PageContext $context, string $serverName)
{
$server = $this->findServer($serverName, $context);
In the previous block the argument $serverName
could easily have also be
called $server
and therefore be both a string and an object of type
Server
within different parts of the method.
PHP has no way to typehint the items of collections (for example arrays, iterators).
You either typehint array
, \Doctrine\Common\Collections\Collection
or
any other class in your methods, with no way to specify type of the key or value
once you start iterating with foreach
.
There are various docblock supported styles, to define these types, which I recommend using on class properties and return values such as:
<?php
class XhprofReport extends Struct
{
/** @var array<string> \*/
public $namespaces = [];
}
But to help both PHPStorm and Psalm (our static code analyser) to understand
items of a collection, we use assert()
with instanceof, if we can't
provide the types ourselves on return values where we cannot define the
typehint, for example when using Doctrine repositories:
<?php
$organizations = $organizationRepository->findAll();
foreach ($organizations as $organization) {
assert($organization instanceof Organization);
// ..
}
To avoids typical complex array problems such as not knowing what keys exist or
what types values have, arrays should always be wrapped in objects extending
from a Struct
base class.
The struct base class maps an array to properties of the class in the constructor:
<?php
abstract class Struct
{
public function __construct(array $data = [])
{
foreach ($data as $property => $value) {
if (!property_exists($this, $property)) {
throw new \RuntimeException();
}
$this->$property = $value;
}
}
}
Often data in Tideways is best represented in domain objects that doesn't fit the entity or database model 100%. In those cases we manually map the database to the desired objects using this Struct pattern for simplicity.
Take this ServerConfiguration
struct which is a combination of data from
three Doctrine entities and is assembled in a repository from custom SQL.
<?php
class ServerConfiguration extends Struct
{
/** @var int */
public $id;
/** @var string */
public $environment;
/** @var int */
public $tracesPerMinute;
/** @var bool */
public $disabled;
/** @var bool */
public $newlyCreated = false;
/*\*/
}
I am really looking forward when we deploy PHP 7.4 with typed properties where this pattern will become much more powerful and will allow us to write a "mini orm" for mapping SQL results to typed objects.
You can read more about Struct objects on the Qafoo blog and look into Kore's DataObject package that implements this pattern in a reusable Composer package.