Server Developer: 2021 Year in Review

The year 2021 went by fast.
I picked the three topics I remember most from this year and wrote them up.

Batch Jobs

Batch jobs are essential for server developers. This year, there were many changes in our service operations, and I ended up designing and developing a lot of batch jobs for more efficient management.
The most valuable thing I gained this year was not any specific advanced technology but a better sense of design.

Here are the lessons I learned from working on batch jobs:

1. Split Deployments Whenever Possible

Deployments at our company are a bit cumbersome. The details vary by module, but the verification and approval steps make every deployment tedious.
Because of this, we tended to bundle multiple changes into a single deployment, and when something went wrong it was hard to pinpoint which change caused it. (Though this may really call for a change in our deployment system itself.)

2. Ensure Migration Batches Are Rollback-Friendly

While working on many migration batches, I occasionally became overconfident with simple migrations and neglected the rollback process. However, the thought of not being able to roll back after deployment is terrifying.
Since migrations handle user data, it’s crucial to clearly define the pre-check and post-check steps before execution.
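
As a rough illustration of the idea, here is a minimal sketch of a migration that backs up every value before changing it and rolls back when the post-check fails. Everything in it is an assumption made for the example: the class name ReversibleMigration, the in-memory Map standing in for a real data store, and the simple checks. It is not our actual migration code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch: the Map stands in for a real data store (record id -> value).
final class ReversibleMigration {
    private final Map<String, String> store;
    private final Map<String, String> backup = new HashMap<>(); // values before migration

    ReversibleMigration(Map<String, String> store) {
        this.store = store;
    }

    // Pre-check: refuse to start if any record is already in an unexpected state.
    boolean preCheck() {
        return store.values().stream().noneMatch(v -> v == null || v.isEmpty());
    }

    // Apply the change, remembering the old value of every record that is touched.
    void migrate(UnaryOperator<String> transform) {
        for (Map.Entry<String, String> entry : store.entrySet()) {
            backup.putIfAbsent(entry.getKey(), entry.getValue());
            entry.setValue(transform.apply(entry.getValue()));
        }
    }

    // Post-check: every migrated value must satisfy the caller's validity rule.
    boolean postCheck(Predicate<String> isValid) {
        return store.values().stream().allMatch(isValid);
    }

    // Restore the values captured before the migration.
    void rollback() {
        store.putAll(backup);
        backup.clear();
    }

    // Full flow: pre-check, migrate, post-check, and roll back on failure.
    boolean run(UnaryOperator<String> transform, Predicate<String> isValid) {
        if (!preCheck()) {
            return false; // nothing has been modified yet
        }
        migrate(transform);
        if (postCheck(isValid)) {
            return true;
        }
        rollback();
        return false;
    }
}
```

The point of the sketch is the order of operations: the rollback path exists before the migration runs, and the checks are written down as code rather than left to memory.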

3. Design for Scalability

Most of the batches we manage have fixed deadlines.
To meet deadlines or adjust resource usage (e.g., CPU or the number of instances), scalability of batch workers (instances or threads) is critical.
For example, a batch is inefficient to operate if you cannot add instances while it is running, or if adding them forces you to restart everything.

It’s helpful to track metrics like TPS (Transactions Per Second) and estimated batch completion time to ensure deadlines are met.
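
The arithmetic behind these two metrics is simple. The sketch below is a hypothetical helper (BatchProgress and its method names are made up for this example) that computes average TPS from the items processed so far and estimates the remaining time from it.

```java
import java.time.Duration;

// Hypothetical helper for checking whether a batch is on track for its deadline.
final class BatchProgress {
    private final long totalItems;
    private long processedItems;
    private Duration elapsed = Duration.ZERO;

    BatchProgress(long totalItems) {
        this.totalItems = totalItems;
    }

    // Record that `items` more items were processed during `spent`.
    void record(long items, Duration spent) {
        processedItems += items;
        elapsed = elapsed.plus(spent);
    }

    // Average throughput so far, in items per second (0 if nothing was measured).
    double tps() {
        double seconds = elapsed.toMillis() / 1000.0;
        return seconds == 0 ? 0 : processedItems / seconds;
    }

    // Estimated time to finish the remaining items at the current average TPS.
    Duration estimatedRemaining() {
        double currentTps = tps();
        if (currentTps == 0) {
            throw new IllegalStateException("no throughput measured yet");
        }
        long remaining = totalItems - processedItems;
        return Duration.ofMillis(Math.round(remaining / currentTps * 1000));
    }

    // True if the estimate fits into the time left before the deadline.
    boolean willMeetDeadline(Duration timeLeft) {
        return estimatedRemaining().compareTo(timeLeft) <= 0;
    }
}
```

For example, if 100,000 of 1,000,000 items took 200 seconds, the batch is running at 500 TPS and needs about 30 more minutes. If less time than that is left, it is time to add workers. The sketch uses a plain average; a real job might prefer a moving window so that recent throttling shows up sooner.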

4. Consider Resumability and Idempotency

Batch jobs often need to be paused or restarted for various reasons.
For instance, an instance might go down, memory issues may require changing instance types, or DB throttling might force you to shuffle input data and retry.
In such cases, batch jobs must be designed with idempotency in mind to handle repeated processing gracefully.
Idempotency here means not only that repeated tasks won’t cause issues but also that tasks already processed can be flagged to avoid unnecessary reprocessing.
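
A minimal sketch of the "flag what is already processed" idea follows. ResumableBatch is a hypothetical name, and the in-memory Set is an assumption standing in for a durable marker such as a status column or a separate progress table.

```java
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical sketch: remembers finished ids so that a restart skips them.
final class ResumableBatch {
    private final Set<String> processedIds; // would be persisted in a real job
    private final Consumer<String> task;    // the work to do for one id

    ResumableBatch(Set<String> processedIds, Consumer<String> task) {
        this.processedIds = processedIds;
        this.task = task;
    }

    // Runs the task for every id not yet marked done; returns how many ran in this call.
    int run(List<String> ids) {
        int executed = 0;
        for (String id : ids) {
            if (processedIds.contains(id)) {
                continue;             // finished in an earlier run, skip it
            }
            task.accept(id);          // may throw; the id then stays unmarked
            processedIds.add(id);     // mark as done only after success
            executed++;
        }
        return executed;
    }
}
```

Note that the marker alone is not enough. If the job dies after the task succeeds but before the id is marked, the task runs again on restart, so the task itself still has to be safe to repeat. The marker only removes the unnecessary reprocessing.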

5. Confirm the Data Flow

We currently operate across five global regions.
For batch jobs involving file transfers, binary data may cross regions. Efficiently managing this binary flow is critical.
Once, despite having a well-designed system, we overlooked binary flow optimization, which significantly hurt performance and required redevelopment.

6. Minimize Code Duplication

Minimizing code duplication is a developer’s mission.
When a batch job shares its life cycle with a service, keeping their code in separate places can lead to significant problems.
For example, this year a new feature was deployed to our WAS (web application server), but the corresponding code update was not applied to the batch, which almost resulted in user data loss.

Design

As mentioned earlier, I learned a lot about design in 2021.
This did not come from one particular lesson so much as from studying and applying DDD (Domain-Driven Design), which noticeably improved my design skills.

1. The More You Learn DDD, the Harder It Feels

Previously, I thought I had designed modules following DDD principles. However, after studying more, I realized there were many areas for improvement.
While I feel I’m getting better, I still find parts of DDD unclear.

2. Include Statistics in the Design

When gathering requirements for a design, I did not treat statistics as a requirement.
As a result, every time statistics were needed we fell back on inconvenient workarounds, such as analyzing logs or writing an additional batch job.
Services should be designed to quickly and easily provide relevant statistics.

3. Use First-Class Objects

We have older modules developed before our team took over, and they’re often hard to read due to excessive use of structures like HashMap<String, Object>.
Worse, these objects sometimes contain nested HashMaps, making them extremely confusing.
By introducing wrapper classes (e.g., using an ID class instead of plain strings), we reduced errors and improved clarity.
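
A minimal sketch of such a wrapper class, assuming hypothetical names (PageId and Page are illustrative, not the classes in our modules):

```java
// Hypothetical value class that replaces a plain String page id.
final class PageId {
    private final String value;

    PageId(String value) {
        // Validation lives in one place instead of being repeated at every call site.
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("page id must not be blank");
        }
        this.value = value;
    }

    String value() {
        return value;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof PageId && ((PageId) other).value.equals(value);
    }

    @Override
    public int hashCode() {
        return value.hashCode();
    }

    @Override
    public String toString() {
        return "PageId(" + value + ")";
    }
}

// A typed object instead of a HashMap<String, Object> with "id" and "title" keys.
final class Page {
    private final PageId id;
    private final String title;

    Page(PageId id, String title) {
        this.id = id;
        this.title = title;
    }

    PageId id() {
        return id;
    }

    String title() {
        return title;
    }
}
```

With types like these, passing some other string where a page id is expected becomes a compile error instead of a runtime surprise, and the fields of Page are visible in the class definition rather than hidden behind map keys.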

Database

This year involved a lot of database-related work.

1. Database Consolidation

One module stored a single entity across two databases.
While having interfaces that support multiple databases is good for flexibility, using multiple databases without a clear purpose should be avoided.
For example, using Cassandra for pages and DynamoDB for comments might make sense. However, we were storing the same page data in both Cassandra and DynamoDB (due to historical decisions).
Consolidating databases was challenging but greatly improved efficiency in development, code reviews, and testing.

2. DynamoDB Table Consolidation

We merged multiple DynamoDB tables that were previously split unnecessarily (e.g., separate tables for page main data and page metadata).
This brought significant efficiency in development, code reviews, and testing, and even reduced elapsed time by half.
This reminded me of the saying, “The database is the most critical factor for service response time.”

3. DynamoDB Blob Limit Issues

I gained a deeper understanding of DynamoDB this year.
For example, creating an LSI (Local Secondary Index) can lead to size limit issues if not handled carefully.
For more details, refer to DynamoSizeLimitException.

Looking back, 2021 felt less exciting than 2020 in terms of the work I got to do.
While I feel like I accomplished a lot, there are still tasks that weigh on me.

For 2022, I hope to further improve my design skills and explore and master new technologies.