Skip to main content

How to access the latest file in a local folder using PowerQuery (in about 32 clicks with no code needed)

In a recent post, I showed you how to connect to multiple files in a folder.

This is a similar method but it connects to just one file that meets a specific rule or criteria such as the latest file with a filename beginning with a certain string. A common example of this would be where a system automatically generates a file and saves it in a folder then there is a need to analyse just the latest file.

1. Open PowerQuery and use the 'get data' function to access the 'More…' option

2. Select the 'Folder' option and click 'Connect'

3. In the next window, click 'browse' to locate the folder that contains the file you're attempting to connect to then click 'OK'

4. In the next window, click 'Transform Data'

Note, clicking 'Combine and Transform Data' will initiate the process of combining the files. As we need to apply a transformation to the list before accessing the file, we'll need to work with the metadata from the files (to sort and filter) before loading the contents of a single file

Once loaded, the query preview will display the data seen in the previous window

5. Now we'll transform the list to only show a single file before loading its contents. To do this, filter out any unwanted files using the drop-down options. For this example, I'm going to remove all the files that aren't CSV files by filtering the 'Extension' column

6. Next I'm going to apply a filter to the 'Name' column to remove filename that contains a dash or the word faulty. This extra step is to demonstrate the method of removing any unnecessary files which you may not need to do if the folder you're accessing contains only the desired files.

Now the list is clean and only contains the files we're interested in

Next is to sort the list in chronological (date) order so the most recent is at the top of the list. Normally this would be done by sorting the Date Created' column. In most cases this would be fine however, in this example the files have all been moved to this folder so the 'Created' dates are all the same. If your files can be sorted by the 'Date Created' column the next steps won't apply (skip to step 14 / section titled 'Sorting the rows into Chronological order')

7. In this example, the date is in the filename in the format ddmmyyyy. There are a number of methods to extract this, the easiest method is to use the relatively new 'Column From Examples' which can be initiated from the buttons in the ribbon, under 'Add Column' and then 'From Selection' as we only want to use the 'Name' column to determine the date associated with the file.

Once clicked the following will be displayed

8. Tick the column that you would like to use to derive the date associated with the file. In this instance, it’s the 'Name' column. Once done it'll look like this

9. In the first cell in the Custom row, the value we'd like to return is 01/01/2011 as the date in the filename is the first of January 2011. Once this value is typed, we'll see the rows below are empty. This demonstrates that Power Query has not identified the pattern we need to extract. Hit enter to access the next row where we'll enter the date from the second filename

10. Type in the next value (in this example 05/01/2020) and hit enter

After typing the value, the subsequent rows are blank but as we hit enter the values are populated

Now that there are suggested values in the subsequent rows, PowerQuery has shown it is able to identify a rule to apply to each row. A quick visual check of the rows suggests that the function hasn’t correctly identified the month element of the date as seen here on row 3.

This is also apparent when we look at the function applied to each row. We can see the month is hardcoded as /01/. When we look at the first two values entered, this is logical because the month for each row is the same.

11. Enter the next (3rd) value in the 'Custom' column and hit enter.

As we hit enter, the suggested values for the subsequent rows changes as does the transformation formula in the 'Add Columns From Examples' section

We can see the suggested function is 

    'Text.Combine({Text.Middle([Name], 10, 2), "/", Text.Middle([Name], 12, 2), "/", Text.Middle([Name], 14, 4)})' 

Which uses the Text.Middle function to isolate the desired values and then Text.Combine to combine each element to create the desired string

12. Double click on the column name to give it a more meaningful title, in this example I'll use 'FileDate' then click 'OK' to confirm the additional column

13. Set the column type to Date using the drop down option in the title row

Sorting the rows into Chronological order

14. Sort the date in descending order to put the latest value at the top

15. Using the quick menu, select 'Keep Top Rows…

16. Enter 1 into the following window and click 'ok'

This will return a single row which will be the latest file

17. Select the 'Content' and 'Filename' columns, right click on the title row and then select 'Remove Other Columns'

18. Click on the word 'Binary' to access the content of the most recent file. This will then display the contents of the latest file

19. Click 'Close and Apply' to load the data

I've just repeated the steps above and it takes around 32 clicks (excluding the extra steps to generate the date column) and around 2 minutes to create. This is all done using the Power Query interface and no code is required.

I'd love to hear how this has helped you, pop a comment below or get in touch at eddie.r.thomas@gmail.com




Comments

Popular posts from this blog

How to combine multiple files with Power Query (with no VBA and just 10 mouse clicks!)

The need to combine information from multiple files is one that most users of Excel will have come across at some point in the use of Excel. I've personally spent far too many hours aggregating data from multiple files, that are identical in structure, so that I can analyse larger datasets and provide insights into products and processes. For anyone who has also done this and not yet discovered Power Query you'll probably be amazed how simple the process has become. I realise there might be some who will say "just use VBA, its easy once you learn how to code..." and they would be right. The method using Power Query provides a zero code solution that is an evolution of the Excel interface that many will already be familiar with. In this example, I've created a sample file and created a number of duplicates of the file which I've saved in a folder. The folder contains only these files and i'd recommend you do the same if you're looking to try out this pr

Extracting data from Word (.docx) files into Power Query

Word and Excel don’t usually get along too well so it's no surprise that Power Query isn't directly compatible with its estranged cousin Word either. If you are presented with the need to import data from Word into Power Query you'll be please to hear it is possible however it requires a couple of manual steps to make it work.  The manual steps could fairly easily be completed by a batch file which would automate the process further. Here is the Excel data pasted 'as values' in a Word file which i'll use for the first example Here is the Excel data pasted with 'keep source formatting' which i'll reference a couple of times in the article. Although the steps I've covered below aren't complex, this whole process has some unknowns around it so you may find the result in your instance varies from mine. The Word file I've used contains the contents of a range of excel cells that I deliberately pasted as values into Word to create a test file f

How to automate the import of all files in a Google Drive folder to PowerBI, now updated, please read the first paragraphs!! (PowerQuery)

Update!!! This method no longer works although there is a new Google Sheets connector for PowerQuery that is currently in beta. If you're using PowerBI, you'll need to enable the preview features to enable it. https://docs.microsoft.com/en-us/power-query/connectors/googlesheets In the interest if demonstrating how it 'did' work, the original post is provided below. If you attempt to replicate this, you'll quickly realise the website doesn't behave as it used to. (here is the original article)  Using PowerQuery to access multiple files within the same folder on a local or network drive is a game-changing feature that will almost certainly save many people hours of effort. This functionality is great if your data exists in a place that is easily accessible but what do you do if your data is somewhere less accessible like Sharepoint, OneDrive or even Google Docs? I have previously connected to data on SharePoint and found it fairly straight forward which raises the