Last week, I reviewed a seemingly simple patch to an Apache Crunch pipeline. The functional details don't matter, but the pipeline processes loads of hits from HTTP proxies. Each hit has a timestamp, and the patch aims to enrich every hit with some metadata indexed by user and business date (i.e. we query a repository with a composite key, part of which is the date formatted as a compact string such as '20111203').
The functional requirements are clear, the code is simple, and the repository is more than fast enough not to affect the performance profile of the pipeline.
However, while QA'ing the patch I noticed an unexpected slowdown. Since our end-to-end tests are configured to always profile a few map and reduce tasks with HPROF, it was easy to generate a CPU Flame Graph, which showed that the following line accounted for ~25% of the on-CPU time:
String date = localDate.format(DateTimeFormatter.BASIC_ISO_DATE);
According to its Javadoc, BASIC_ISO_DATE is exactly what we should use to format a LocalDate as a compact date string such as '20111203':
The ISO date formatter that formats or parses a date without an offset, such as '20111203'.
This returns an immutable formatter capable of formatting and parsing the ISO-8601 basic local date format. The format consists of:
- Four digits for the year. Only years in the range 0000 to 9999 are supported.
- Two digits for the month-of-year. This is pre-padded by zero to ensure two digits.
- Two digits for the day-of-month. This is pre-padded by zero to ensure two digits.
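To make the format described in the Javadoc concrete, here is a minimal sketch of the formatting call in isolation, using the same example date as the Javadoc. The class name BasicIsoDateExample and the helper toKey are hypothetical names introduced only for illustration; they are not part of the pipeline or the patch.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical standalone example; not taken from the reviewed patch.
class BasicIsoDateExample {

    // Formats a LocalDate as a compact yyyyMMdd string with the built-in formatter.
    static String toKey(LocalDate date) {
        return date.format(DateTimeFormatter.BASIC_ISO_DATE);
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2011, 12, 3);

        // Month and day are zero-padded to two digits each.
        System.out.println(toKey(date)); // prints 20111203

        // The same formatter also parses the compact form back into a LocalDate.
        LocalDate parsed = LocalDate.parse("20111203", DateTimeFormatter.BASIC_ISO_DATE);
        System.out.println(parsed.equals(date)); // prints true
    }
}
```

The call itself is a one-liner, which is why it looks harmless in review; only the profile reveals its cost when it runs once per hit.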
HPROF is far from …