Lesedi Molemi

Introduction

So the idea behind doing this came about when I was interested in knowing how the company I work for turn payroll data imported from a .csv file and make payments.
The concept behind it is simple, instead of making companies enter employess 1-by-1 and creating a never ending list of beneficiaries, they can simply go to whatever accounting software they use. Export a file with all the employee data. Load it up and release a bulk payment at once. Simple.
I could've went to the codebase and made sense of it there, but I decided to spin up my own project and make it work. To my surprise, it was simpler than I thought.

The Approach

I don't yet have a use case for this so I didn't want to hard-code a strict format the file should align with. With the bulk payment feature, you want to have a strict format that clients will stick to so it's much simpler to handle and so we don't collect a bunch of data we don't need for the payment.
With my solution, I wanted it to be fluid so you decide how many columns you want and what data you want to map. The only caveat is that the header has to be in the first row of the sheet.
I have other rules like how I only accept STRING or NUMERIC types and I don't accept empty cells. I wrote up the entire service in an hour so the logic isn't quite refined just yet.
There also 2 approaches I had in mind for how to structure the data, I call refer to the as row-based retrieval and column-based retrieval.

Row-based retrieval

Row-based retrieval is essentially a 2D array a storing the data per row

Column-based retrieval

Column-based retrieval is slightly different to row-based. It would've been an array that held a list of Maps with lists as their value pairs. The reason I would've done it this way is to seperate the headers of the columns with the data.

Downloading the model

The logic

I'm going to skip over the lesser important bits and dive into the logic that might not come as obvious. In order for me to do this I had to make use of the apache.poi library which allows us to interact with excel files.
In the code block below, we are just declaring the variables that will be used to store data. The headerRow variable stores the first row in the sheet which should be made up of the headers. The headers List is going to store the String value of the cells in that header for use later. The rows List is essentially the 2D array that makes up how the data is stored in the row-based retrieval option as shown in an image above.

Row headerRow = sheet.getRow(0);
List<String> headers = new ArrayList<>();
List<List<Object>> rows = new ArrayList<>();

This next block is shows how to loop through the cells in a row to get the string value of a cell. In this case we are getting the values in the first row of the sheet which should be the headers and adding them to the headers list.

for (Cell cell : headerRow) {
    headers.add(cell.getStringCellValue());
}

The code below is the guts of the entire application. We read an Excel sheet starting from the second row (the first row contains headers), extracts cell values for each column, and store the row’s data into a list called rows. We handles different cell types and throw custom exceptions when we encounter invalid formats.
This code is what allows us to extract values from a sheet and create objects that will be stored in memory and returned as json to our frontend.

// Loops through the "data" rows in a sheet (row[1] and beyond)
for (int rowCount = 1; rowCount < sheet.getPhysicalNumberOfRows(); rowCount++) {
    Row row = sheet.getRow(rowCount);
    List<Object> rowData = new ArrayList<>();

    // Loops through the cells in a row
    for (int i = 0; i < headers.size(); i++) {
        Cell cell = row.getCell(i);

        if (cell == null) {
            throw new InvalidCellTypeException("Invalid cell format in column %s".formatted(headers.get(rowData.size())));
        }

        switch (cell.getCellType()) {
            case NUMERIC -> rowData.add(cell.getNumericCellValue());
            case STRING -> rowData.add(cell.getStringCellValue());
            case BOOLEAN ->
                    throw new InvalidCellTypeException("Please remove the boolean value in the '%s' column".formatted(headers.get(rowData.size())));
            default ->
                    throw new InvalidCellTypeException("Invalid cell type in the %s column".formatted(headers.get(rowData.size())));
        }
    }
    rows.add(rowData);
}

Conclusion

To wrap things up, this logic is the first interation and does a good job at getting the data, throwing exceptions when needed (with a detailed response informing us of the column it failing at) and parsing the data. The next step will be writing the logic that will display the data and allow for manipulation by the user that will allow them to view it in whatever sequence they desire.

Breakdown: Mapping any excel file into objects

Introduction

The Approach

Row-based retrieval

Column-based retrieval

The logic

Conclusion