top of page

Advent of code in SPL - 2025 day 9

  • Writer: Gabriel Vasseur
    Gabriel Vasseur
  • 1 hour ago
  • 9 min read

Day 9 is here.


Part 1


The first part is easy. We already know how to enumerate pairs from day 8, so all we have to do is that plus calculate the area.


So as usual we save the challenge data to a csv, add a header line of "x,y" a the top, upload to splunk as a new lookup, and we can easily upgrade to the full challenge data for part 1:


That was easy!


Part 2


Now this is another story. I'm just going to document the various ideas and iterations I went through, just because it's more interesting to see how one stumbles through to a solution than just to feed the final product as if it's obvious. I won't go into as many details explaining the SPL, partly because it would take too long for this post, and partly because by day 9 I expect you're starting to be more comfortable with advanced SPL.


My first instinct was to maintain a lookup for each colour of tiles.

So for instance the red tiles for the example are:


For green tiles, I initially went into quite a naive approach. This is because I overlooked the fact that the red tiles are not listed in just any random order. Instead they are listed like destinations for a turtle path that can only go up and down and left and right. I initially overlooked that. So I thought for green tiles I can start by going line by line and column by column, identifying the 2 furthest apart red tiles with min and max, and joining them. That should give me the green perimeter, and then I can repeat the same approach but looking at the green tiles we have so far instead of the read tiles. (That's bad thinking but that's what I did initially)


But before we go there, I thought we needed a search to visualise what we're doing:


So like I say, start by doing the green perimeter with the naive min & max approach:


Which amazingly is working for this example:


And then do an extra iteration with the green tiles themselves:


Again, that's working in this case:


Note: in real life a tile can't be both green and red, but here I don't mind.


I started to think about what could possibly go wrong and started to craft alternative red tiles samples:


The green perimeter (version 1 of our green search) almost works (with 3 extra green tiles inside the shape):


And the full green search unfortunately create green tiles inside the concave shape:


So we need to rethink. For a long time, I couldn't think of a way to handle the inside green tiles. So my approach was:

  • start with red tiles

  • establish the green perimeter properly

  • any tile outside the green perimeter is blank (easy to know where to start, then can propagate)

  • any tile that hasn't had a colour yet is green


I'm not saying it's a great approach, we'll do better later, but this is where I started.


Let's ignore the sample data:


And play instead with the made-up concave sample set:


Let's also improve our visualisation search so it'll work with all cases:


Notice how we can use the highlighted_cells multivalue to highlight any positions as "O". This will be useful later.


The first step is to establish the green tile perimeter correctly:


Notice the "circular streamstats" code that ensures the path is looped properly.


This works much better than the previous approach!


Then we can start the blank tiles by looking outside the limits:


And from there we can handle concave shapes by propagating blank tiles until they hit green/red tiles.

This enumerates all the positions (not the most elegant approach) and if it's not already coloured and has a blank tile for neighbour, then it becomes blank:


That last search needs to be repeated until it returns 0.


It's quite fun to visualise the progress:


Then we can deduce the inner green tiles:


And only then we have all the data sorted to start answering the question!


Answering the question


The first problem to solve is: Given a pair of red tiles, do the rectangle they define include any blank tiles? Once we have a solution, we'll work on iterating it over all the possible pairs.


We need to enumerate all the tiles in the rectangle and look them up in our blank tile lookup. All tiles? Maybe not. Maybe we can just enumerate the perimeter of the rectangle. (we'll look into the truthfulness of this assumption a bit later)


This is what I ended up with:


This is correct for these highlighted ("O") pair of red tiles:


Cool. Now all we have to do is enumerate all pairs like in part 1 and plug our solution in. We get:


Notice how we try to optimise by ignoring rectangles where one of the corners is a blank tile.


This does look like the correct solution:


That should do it!


Note: There is a weird error caused by the second mvexpand in the subsearch:

However, I couldn't figure it out and it does not affect the search in any way, so I ignored it.


If we redo the whole set of searches for red1, we get the correct answer 24:


And I also came up with a third red tiles sample set that is not only concave but I guess I would say cavernous (you'll see what I mean):


For this starting point, the solution we get is 24 with 3 different options:


So we did it! We solved the problem and we had fun visualising it.


Scale


We kind of did it, but our solution is totally unscalable. Look at the scale of the full challenge data:


If we want to have a perimeter of blank tiles around the existing red tiles, we need to deal with a canvas of 9.3 trillion positions!


This is quite a scalability challenge. Keep in mind that Splunk itself when used normally is very scalable. What we're trying to do here happens entirely on the search head and doesn't benefit from having a cluster of indexers. Also we're completely abusing SPL functions that were not designed to work on these scales.


Our biggest problems:

  • mvrange is limited to 10k (and does not generate an error when the limit is reached!!)

  • mvexpand is not memory efficient and triggers an error when it reaches its limits.


Even if we can work around these limits, 9.3 trillion positions would be huge in a CSV and/or in memory. I did try though. Of course I did.


First I generated a list of x values (convoluted to work around mvrange limits):


And then I tried to build the 9.3 trillion a list of all the positions (Do not run this search!):


There be dragons:

  • this search is difficult to interrupt. Even if you ask for it to end or if the search "auto cancels", splunk will still diligently run the 100000 sub searches one after the other...! I thought one thing that would help is to flush TEMP_Gabs_100k.csv, but it still took days to clear.

  • if you're creating this in an environment with an indexer cluster, you should make sure you have a knowledge bundle exception so that this massive CSV is not included in the knowledge bundle sent to all the indexers.


Again, there is no doubt we are completely abusing the platform here.


I kept looking for optimisations and ways to avoid enumerating positions where possible.


The initial blank tile search is just generating a perimeter, we can improve it quite a bit by going column by column and line by line and blanking all the tiles between the sides and the min/max x/y of the perimeter green tiles. We ran into mvexpand limits quite a bit and had to do batches.


We could also rewrite the blank tile propagator search so that instead of looking at every position one-by-one, it would only look at the existing known blank tiles. An obvious improvement in retrospect.


I also spent a lot of time optimising the answer search.


But the whole approach was doomed.


That's when I went into the overdue "be clever and think beforehand" mode.


Figuring out the blank tiles, and by extension the "inner" green tiles, is killing us. Can we just not have to do it?


Assumption: "there cannot be blank tiles if there are no red tiles within the rectangle (inside the perimeter, excluding the perimeter)"

FALSE! E.g.:

      #-#
#-O   |.|
|.|   |.|
|.|   |.O--#
|.#---#....|
|..........|
#----------#

Key:

  • # red tile

  • O highlighted red tile

  • - and | perimeter green tile

  • . inner green tile


Next assumption: "there cannot be blank tiles if there are no green tiles within the rectangle (inside the perimeter, excluding the perimeter)" That's better, but actually still false:


      #-#
#-O   |.|
|.|   |.|
|.|   |.#--#
|.#---O....|
|..........|
#----------#

Damn it!


These ideas do seem silly in retrospect (even to me now, the past is a foreign country) but I find it interesting to document where the brain goes and gets muddy before eventually finding solid ground.


My solution at small scale relies on the idea that we only need to look at the perimeter of the rectangle formed by the two chosen red tiles and we don't really need to look inside the rectangle. Let's explore that more: Is it always true?


For instance, consider the following example:

O----##------#
|....|##.....|
|....| |.....|
|....#-#.....|
|............|
#------------O

In this case, the rectangle made by the "O" red tiles does not have blank tiles in its perimeter, but it does have blank tiles inside. However, this requires two or more red tiles to be close together. So maybe we need to check if there are no two red tiles next to each other?


Or look at this similar case which does not require two red tiles next to each other:

   #--#
   |..|
   |#-#
   ||
O-#|#-----#
|.||......|
|.|#-#....|
|.|  |....|
|.#--#....|
|.........|
#---------O

I was also concerned to know if the path of red tiles (and the green tiles perimeter) ever crossed itself or not as that would make things really weird:

    #---#
    |...|
#---+---#
|...|
#---#

I think if none of this happens, then my assumption holds. So let's check this with the full data:


And now check the size of the steps:


Let's do a more sophisticated version, that also checks if the path is always alternating between going up/down and left/right:


The total distance should give us the count of green tiles. That's where we noticed there was a problem with the green tiles: we had too few of them! Of course, that blasted mvexpand does not generate an error message when its 10,000 limit is reached!


Time to sort it once and for all with the following macro:


So the new green tiles perimeter search is:


Now let's check if the red tiles path is crossing over itself. I think if it is there will definitely be green tiles with more than 2 neighbours:


Ok so the turtle path is not crossing itself. And is never coming right next to itself either.


Note: I hadn't noticed the number of green tiles was incorrect initially, so here I saw some green tiles with only one neighbour, which should be impossible. It was tough to debug.


This should be a lesson: always validate the data! And don't trust mvrange...


The more we do, the better we learn to think about the problem. Suddenly we crystalized around this question:


How can we distinguish this:

      #-#
#-O   | |
| |   | |
| |   | #--#
| #---O    |
|          |
#----------#

From this:

#---#
|   |
|   #-#
#-O   |
  |   |
  |   |
  #---O

When we just look at the highlighted rectangle, it looks the same:

O   |
|   |
|   |
#---O

We really need to figure out blank VS inner green tiles, there's no way around this.


However, we can do it a much simpler way than what we've done before. All we need is figure out the immediate neighbours of the green perimeter tile:

      #-#
#-O   |.|
|.|   |.|
|.|   |.#--#
|.#---O....|
|.......  .|
|.        .|
|..........|
#----------#

Versus:

#---#
|...|
|. .|
|...#-#
#-O...|
  |. .|
  |...|
  #---O

That way we don't have to keep track of trillions of green or blank tiles.


When walking along the green perimeter, in any one direction, the left-hand neighbour tile is always one colour (e.g. green) and the right-hand tile is always the opposite (e.g. blank). That's the key! What we need first is to figure out in which direction we're turning, and therefore which side is which colour:


If the total total_angle is 360, we are globally turning clockwise, so as we follow the perimeter, anything on our left-hand side is green and anything on our right is blank.


Ok, so we're onto something. Let's go back to our cavernous example:


And get the green perimeter:


Now if we extend our stepcheck search we can start making blank tiles:


The purpose of this first search is to workout what needs to be expanded. We split it there to avoid running out of memory for the next step when running on the whole data. The next step is simply:


Very promising.


Note: a lot of blank tiles are missing, but my hope is that they are not needed to invalidate the incorrect red tile pairs. My understanding is that it should always be ok.


We can now run our answer search:


It works!


Full challenge data


Let's now run on the full challenge data. Note: because of the scale, we won't bother with screenshots.


Now the problem we have is that the answer search is very slow. We can make it 4 times faster with the following approach.


First we add an extra blank tiles step to make a lookup with x and y separate:


Then we modify our answer search so that instead of enumerating each position and looking them up in the blank tiles lookup, we lookup either the line or column and check if any of the blank tiles is within range:


That's better but this search still took 16 hours! It did give us the right solution though.


When running such long search, make sure to extend its lifetime by editing its job settings. This will give you a link you can go back to the next day...!


The next day:


We got the correct answer!

Comments


©2021 by Gabriel Vasseur. Proudly created with Wix.com

bottom of page