Traversing Folder Structure

Published in Level Up Coding · 9 min read · Jan 12, 2023

Image: green-leafed tree surrounded by fog during daytime (from Unsplash, unsplash.com)

Blogging after a long time. This time, it is a very interesting use case that I had to work on in one of my current projects. So, let’s begin!

We allow end users to create a directory structure in our web application. Once logged in, a user can create folders as needed. Folders can be nested, and each folder can contain both files and other folders.

But the story does not end here. The requirement said that any folder in the hierarchy could be downloaded as a ZIP archive. When the folder is downloaded, it must download all the subfolders and files in that folder. These contents are to be zipped and then delivered to the calling client/front-end application.

To add to the complexity, the files are not present in any local folder or database. Instead, they have to be fetched from another microservice. And if a folder is empty, it still needs to be created in the ZIP archive.

To tackle the above scenario, let’s first break it into steps.

1. Gather the data.
2. Iterate over the folder structure.
3. Download the files from the external service.
4. Create the entries (both files and folders) in the ZIP archive.
5. Zip the contents and write the byte array to the output stream.

Of the above steps, I am going to skip step 3 (downloading the files). This is because the external service is a black box for me. I have listed the step for the sake of completeness.

For creating the ZIP archive, I will be using the ZipArchive class of .NET Core (namespace System.IO.Compression).

By the way, the code already contains ample comments so I will skip the detailed explanation here.

Getting the data

We have our data in the database, and we use an ORM (EF Core) to fetch it. Below are the entity classes for both Folder and File.

In this post, I am assuming that the folders have already been loaded as a tree, with the root folder holding all of its descendants. How to fetch the data in such a tree-like structure is something I will blog about soon as well.

public class File
{
  // Primary Key
  public int Id { get; set; }
  public string FileName { get; set; }
  public int FolderId { get; set; }

  // To reference the Foreign Key entity
  public Folder Folder { get; set; }
}

public class Folder
{
  // Primary Key
  public int Id { get; set; }

  // Self-join for the nested structure. For root folders, ParentFolderId is null
  public int? ParentFolderId { get; set; }
  public string FolderName { get; set; }

  // Child folders of this folder. The traversal code further below reads this property
  public List<Folder> SubFolders { get; set; }

  // To reference the foreign key relation
  public List<File> Files { get; set; }
}

Iterate over the folder structure!

The starting point is deciding which data structure this is and how to store it. The structure itself is a tree. I am a fan of the List class of .NET Core, so that’s going to be my go-to: each folder simply keeps its children in a List.

Next, there are two methods of traversing the Tree data structure: depth-first and breadth-first.

I chose to travel depth-first. Why? Because I was the one writing the code! Fair Enough?

I will be using recursion. The reason is that any folder can contain further folders, which in turn can contain folders of their own, so the traversal logic stays the same at every level. Hence recursion!

To traverse:

1. First, we pick the root folder. There is always exactly one root folder, but since it arrives in a list, we start with a foreach loop.
2. We iterate over the folders one by one. At every iteration, we push the name of the folder being visited onto a stack, using the .NET class Stack.
3. Next, we check whether the folder has subfolders. If it does, this routine (points 1 and 2) is called again for them. This way we keep track of the path of folders traversed so far.
4. Once the subfolders (if any) have been handled, ZIP entries are created for the folder's files. If there are no files, an empty folder entry is created.
5. Once we have created the entries, we have to go one level up. This is because while traversing down, we went one level deeper with each recursive call.
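To make the traversal steps above easier to follow, here is a minimal, self-contained sketch that only records the path of each folder instead of writing to a ZIP archive. The names DemoFolder, TraversalSketch, and CollectPaths are made up for this illustration and are not part of the project code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the Folder entity; it only has what the traversal needs.
public class DemoFolder
{
    public string Name { get; set; } = "";
    public List<DemoFolder> SubFolders { get; set; } = new List<DemoFolder>();
}

public static class TraversalSketch
{
    // Walks the tree depth-first and records the full path of every folder.
    public static List<string> CollectPaths(List<DemoFolder> folders)
    {
        var result = new List<string>();
        Visit(folders, new Stack<string>(), result);
        return result;
    }

    private static void Visit(List<DemoFolder> folders, Stack<string> paths, List<string> result)
    {
        foreach (var folder in folders)
        {
            // Going one level down: remember where we are.
            paths.Push(folder.Name);

            // Handle the children first; this is what makes the walk depth-first.
            Visit(folder.SubFolders, paths, result);

            // A Stack<T> copies out top-first, so reverse the copy to get root-first order.
            var segments = paths.ToArray();
            Array.Reverse(segments);
            result.Add(string.Join("/", segments) + "/");

            // Going one level up again before moving on to the next sibling.
            paths.Pop();
        }
    }
}
```

For a root folder with the children A (which itself contains A1) and B, CollectPaths records the deepest folder first and the root last, because each folder is recorded only after its children are done.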

To go one level up, I am using the below code:

private void GetOneLevelUp(Stack<string> paths)
{
      if (paths.Count != 0)
      {
            _logger.LogInformation("Level popped: {level}", paths.Pop());
      }
}

The above code is very simple. It just pops the top value from the Stack. Recall that Stack is a LIFO (last in, first out) data structure, so the value popped is always the most recently pushed folder name.
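To see the order in which Stack hands values back, here is a small sketch. StackOrderSketch and PushThenPopAll are hypothetical names used only for this demonstration.

```csharp
using System.Collections.Generic;

public static class StackOrderSketch
{
    // Pushes the given names in order, then pops everything and returns the pop order.
    public static List<string> PushThenPopAll(params string[] names)
    {
        var stack = new Stack<string>();
        foreach (var name in names)
        {
            stack.Push(name);
        }

        var popped = new List<string>();
        while (stack.Count > 0)
        {
            // Pop always removes the item that was pushed most recently.
            popped.Add(stack.Pop());
        }

        return popped;
    }
}
```

Calling PushThenPopAll("Root", "Reports", "2023") returns the names in the opposite order, starting with "2023". This is why a single Pop is enough to step from a folder back to its parent.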

To traverse the folders, I am using the below code:

/// <summary>
/// Takes the ZIP archive to add entries to and recursively iterates through the folders, depth first, to recreate the folder tree inside the archive.
/// </summary>
/// <param name="zipArchive">ZIP archive to write to</param>
/// <param name="paths">The stack holding the names of the folders on the path currently being iterated</param>
/// <param name="folders">Folders to iterate over. When the method is called recursively, the children of a folder are passed</param>
/// <returns>A task, so that the caller can await this method</returns>
public async Task CreateDirectoryStructure(ZipArchive zipArchive, Stack<string> paths, List<Folder> folders)
{
      // Iterate over each of the folders
      foreach (var folder in folders)
      {
            // Push the current folder onto the stack
            paths.Push(folder.FolderName);

            _logger.LogInformation("Processing {folder}", folder.FolderName);

            // Check whether the folder has children
            if (folder.SubFolders != null && folder.SubFolders.Any())
            {
                  // Because this folder has children, we call the method recursively
                  await CreateDirectoryStructure(zipArchive, paths, folder.SubFolders);
            }

            // The children (if any) are done; log the path of the current folder
            _logger.LogInformation("{path}", GetIterationPath(paths));

            // Now create entries inside the ZIP archive
            // If there are no files, an empty directory is created
            await CreateZipEntries(zipArchive, folder, paths);

            // This folder is finished, so go one level up
            // Otherwise the next sibling would be nested under it
            GetOneLevelUp(paths);

            // An empty stack means the root folder itself has just been completed, so we return
            // Note: this assumes a single root folder. With several roots, this check would skip the remaining ones
            if (paths.Count == 0)
                  return;
      }
}

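So, when do we stop iterating and return from the recursion? When all the paths have been popped, that is, when the stack is empty again. This only happens after the root folder has been fully processed, which is why the check is the last step of the loop. In the nested calls, the foreach simply ends once a folder's children are exhausted.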

Adding the entries (both files and folders) in the ZIP archive

In the above part, we traversed the whole folder structure. Now we will be creating files and folders inside the ZIP archive.

We are going to use the ZipArchive class (namespace System.IO.Compression) that comes with .NET Core 3.0. It contains all the methods and functionality that we need to finish this user story.

If you come from another programming language, such as Java or Python, you can replace this functionality with its equivalent there.

Below is the code for creating entries in the ZIP archive. The documentation for the ZipArchive class can be found in the official .NET API reference.

Below let me summarize what is happening here:

We get the iteration path. Recall that we were traversing depth-wise. When we copy the contents of a Stack in .NET Core, the most recently pushed item comes first, which is the reverse of the folder path. Hence, we have to reverse the contents of the array. Below is the code to get the iteration path.


/// <summary>
/// Constructs the path from the stack.
/// This path is used to create entries inside the ZIP archive
/// </summary>
/// <param name="paths">Stack to construct the path from</param>
/// <returns>The path from the stack, ending with a separator</returns>
private static string GetIterationPath(Stack<string> paths)
{
      // ZIP entry names use the forward slash as separator on every platform
      // Path.DirectorySeparatorChar would yield '\' on Windows, which the ZIP format does not treat as a folder separator
      const char zipSeparator = '/';

      // Copy the stack into a new array so that the state of the stack is not modified
      string[] pathsDirectory = new string[paths.Count];
      paths.CopyTo(pathsDirectory, 0);

      // The copy starts with the most recently pushed item, so reverse it to get the path from the root
      Array.Reverse(pathsDirectory);

      // Return the path with a separator at the end
      return string.Join(zipSeparator, pathsDirectory) + zipSeparator;
}

If the folder does not have any files, then we create the folder only and then return from the method. From the requirements perspective, we had to create folders even if no files are present.
Next, we get the bytes from the external service. Since this implementation is beyond the scope of this article, just keep in mind that this method returns the byte array of the file.
Finally, we create the zip entries in the ZIP archive. An entry represents the file or folder inside the ZIP archive. We use the memory stream so that we can deliver the bytes to the consumer.
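The path-building step described above can also be written more compactly with LINQ. The sketch below is only an illustration with hypothetical names (PathSketch, BuildPath), and it assumes the forward slash as separator, since that is what ZIP entry names use.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PathSketch
{
    // Builds "Root/Child/.../" from a stack whose top is the deepest folder.
    public static string BuildPath(Stack<string> paths)
    {
        // Enumerating a Stack<T> yields the most recently pushed item first,
        // so Reverse() gives root-first order without modifying the stack itself.
        return string.Join("/", paths.Reverse()) + "/";
    }
}
```

After pushing "Root", "Reports", and "2023", BuildPath returns "Root/Reports/2023/", and the stack still holds all three names, so the traversal can continue where it left off.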

/// <summary>
/// This method takes the ZIP Archive and the files. Then it adds the files to the ZIP archive on the passed folder path
/// If the list of files passed is empty, this method just creates the folder entry in the ZIP archive
/// </summary>
/// <param name="zipArchive">ZIP Archive to add to</param>
/// <param name="folder">The folder to create entries for. This method also creates the directory entry if there are no files inside it. Files of this folder are downloaded from the DMS service</param>
/// <param name="paths">The stack of folder names from which the path inside the ZIP archive is built</param>
/// <returns>A task that completes once the entries have been added</returns>
private async Task CreateZipEntries(ZipArchive zipArchive, Folder folder, Stack<string> paths)
{
    // Get the current path to create files/folder in
    string folderPath = GetIterationPath(paths);

    // If there are no files in the folder, create an entry for the folder itself
    // This will create empty directories if no files are present.
    if (folder.Files == null || !folder.Files.Any())
    {
        // There are no files in the folder, so just create the folder entry
        zipArchive.CreateEntry(folderPath);
        return;
    }

    // Download the files and the get their byte array
    var bytes = await DownloadFilesFromDMS(folder.Files);

    // Take out the entries from the DMS
    // DMS returns ZIP of all the files in a single file
    // So we unzip the file from DMS to pick one file at a time and add to our archive
    // We have to do this since we cannot return zipped archive from DMS inside the folder in our archive
    using (var zipFromDMS = new ZipArchive(new MemoryStream(bytes)))
    {
        // Get all the files from the DMS entry
        var dmsEntries = zipFromDMS.Entries;

        // No need to move forward if there are no entries
        if (dmsEntries == null || !dmsEntries.Any())
        {
            return;
        }

        // Iterate over all the entries one by one
        foreach (var entry in dmsEntries)
        {
            // Create the path inside the ZIP archive to add to it
            string name = folderPath + entry.FullName;

            // Create a new entry on the above created path
            // This entry is the file we take from DMS and add to our ZIP
            var zipEntry = zipArchive.CreateEntry(name);

            // Finally copy the contents of the DMS entry straight into the new entry
            // Both streams are disposed as soon as the copy is done
            using (var source = entry.Open())
            using (var target = zipEntry.Open())
            {
                await source.CopyToAsync(target);
            }
        }
    }
}
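To try out how folder and file entries behave without the database or the external service, here is a minimal sketch that builds a tiny archive in memory and reads its entry names back. The class and method names (ZipEntrySketch, BuildSample, ListEntries) and the sample entry names are invented for this example.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text;

public static class ZipEntrySketch
{
    // Builds a small archive in memory: one empty folder and one text file.
    public static byte[] BuildSample()
    {
        using (var ms = new MemoryStream())
        {
            using (var archive = new ZipArchive(ms, ZipArchiveMode.Create, leaveOpen: true))
            {
                // An entry whose name ends with '/' and has no content acts as an empty folder.
                archive.CreateEntry("Root/Empty/");

                // A regular entry; the folder part of its name places it inside Root/Docs/.
                var fileEntry = archive.CreateEntry("Root/Docs/hello.txt");
                using (var target = fileEntry.Open())
                {
                    var content = Encoding.UTF8.GetBytes("hello");
                    target.Write(content, 0, content.Length);
                }
            }

            // The archive is disposed at this point, so the ZIP is complete.
            return ms.ToArray();
        }
    }

    // Reopens the bytes in read mode and lists the entry names.
    public static List<string> ListEntries(byte[] zipBytes)
    {
        using (var archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read))
        {
            return archive.Entries.Select(e => e.FullName).ToList();
        }
    }
}
```

ListEntries(BuildSample()) should contain both "Root/Empty/" and "Root/Docs/hello.txt". Note that no separate entry is needed for Root/Docs/, because the file's own name already carries the folder path.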

Zipping the contents and extracting the byte array

Now, I don’t know who is going to utilize my API. Its consumers can change from time to time. So how do I ensure that any client is able to read and download the contents? Simple: output the ZIP archive as a byte array.

The byte array is simple. It is just a sequence of bytes, and a byte is the smallest addressable unit of data in practically every programming language, so any client can consume it.

I have added detailed comments in this code piece to help the readers along the way.

// The final ZIP archive bytes
byte[] documentBytes = null;

// The stack that holds the names of the folders on the path being iterated. It starts empty
var paths = new Stack<string>();

// The memory stream below receives the bytes of the final ZIP archive
using (var ms = new MemoryStream())
{
    // Create the ZIP archive on top of the memory stream. Passing false for leaveOpen closes the stream together with the archive
    using (var zipArchive = new ZipArchive(ms, ZipArchiveMode.Create, false))
    {
        // Iterate over all the folders and create the directories and files as entries in the ZIP archive
        // folders is the list holding the root folder that was fetched from the database
        await CreateDirectoryStructure(zipArchive, paths, folders);
    }

    // The archive must be disposed before reading the bytes, because the ZIP directory is only written on dispose
    // ToArray still works on a closed MemoryStream
    documentBytes = ms.ToArray();
}
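On the consuming side, the byte array can be handled with whatever ZIP tooling the client has. As one possible illustration in .NET, the sketch below saves the bytes to a temporary file and extracts them. ConsumerSketch and SaveAndExtract are hypothetical names, and a real client would more likely just save the download under a file name of its choice.

```csharp
using System.IO;
using System.IO.Compression;

public static class ConsumerSketch
{
    // Saves the ZIP bytes to a temporary file and extracts them into targetDirectory.
    public static void SaveAndExtract(byte[] zipBytes, string targetDirectory)
    {
        string zipPath = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");
        File.WriteAllBytes(zipPath, zipBytes);
        try
        {
            // Recreates the folder tree, including empty folders, below targetDirectory.
            ZipFile.ExtractToDirectory(zipPath, targetDirectory);
        }
        finally
        {
            // The temporary ZIP file is no longer needed once extraction is done.
            File.Delete(zipPath);
        }
    }
}
```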

Conclusion

Traversing a tree-like structure is a very common use case. While I implemented it above for folders and subfolders, the same structure comes up in many other scenarios, such as location hierarchies, family trees, and so on.

Thanks for reading. If you like this content, you can support me by buying me a coffee.