How to access the latest file in a sharepoint folder using PowerQuery (in about 32 clicks with no code needed)
In a recent post, I showed you how to connect to a folder and combine multiple files on Sharepoint.
This is a similar method but it connects to just one file that meets a specific rule or criteria such as the latest file with a filename beginning with a certain string. A common example of this would be where a system automatically generates a file and saves it in a folder then there is a need to analyse just the latest file.
- Open PowerQuery and use the 'get data' function to access the 'More…' option
- Select the 'Sharepoint Folder' option and click 'Connect'
- In the next window, paste the URL of the sharepoint site (not the folder) that contains the file you're attempting to connect to then click 'OK'
- In the next window, click 'Transform Data'
Note, clicking 'Combine and Transform Data' will initiate the process of combining the files. As we need to apply a transformation to the list before accessing the file, we'll need to work with the metadata from the files (to sort and filter) before loading the contents of a single file
Once loaded, the query preview will display the data seen in the previous window
- Now we'll transform the list to only show a single file before loading its contents. To do this, filter out any unwanted files using the drop-down options. For this example, I'm going to remove all the files that aren't CSV files by filtering the 'Extension' column
- Next I'm going to apply a filter to the 'Name' column to remove filename that contains a dash or the word faulty. This extra step is to demonstrate the method of removing any unnecessary files which you may not need to do if the folder you're accessing contains only the desired files.
Now the list is clean and only contains the files we're interested in
Next is to sort the list in chronological (date) order so the most recent is at the top of the list. Normally this would be done by sorting the Date Created' column. In most cases this would be fine however, in this example the files have all been moved to this folder so the 'Created' dates are all the same. If your files can be sorted by the 'Date Created' column the next steps won't apply (skip to step 14 / section titled 'Sorting the rows into Chronological order')
- In this example, the date is in the filename in the format ddmmyyyy. There are a number of methods to extract this, the easiest method is to use the relatively new 'Column From Examples' which can be initiated from the buttons in the ribbon, under 'Add Column' and then 'From Selection' as we only want to use the 'Name' column to determine the date associated with the file.
Once clicked the following will be displayed
- Ensure only the columns needed to create the additional column are ticket. In this instance, it’s the 'Name' column. Once done it'll look like this
- In the first cell in the Custom row, the value we'd like to return is 01/01/2011 as the date in the filename is the first of January 2011. Once this value is typed, we'll see the rows below are empty. This demonstrates that Power Query has not identified the pattern we need to extract. Hit enter to access the next row where we'll enter the date from the second filename
- Type in the next value (in this example 06/07/2010) and hit enter
After typing the value, the subsequent rows are blank but as we hit enter the values are populated
Now that there are suggested values in the subsequent rows, PowerQuery has shown it is able to identify a rule to apply to each row. A quick visual check of the rows suggests that the function has correctly identified the day, month and year elements.
We can see the suggested function is
Which uses the Text.Middle function to isolate the desired values and then Text.Combine to combine each element to create the desired string
- Double click on the column name to give it a more meaningful title, in this example I'll use 'FileDate' then click 'OK' to confirm the additional column
- Set the column type to Date using the drop down option in the title row
Sorting the rows into Chronological order
- Sort the date in descending order to put the latest value at the top
- Using the quick menu, select 'Keep Top Rows…
- Enter 1 into the following window and click 'ok'
This will return a single row which will be the latest file
- Select the 'Content' and 'Filename' columns, right click on the title row and then select 'Remove Other Columns'
- Click on the word 'Binary' to access the content of the most recent file. This will then display the contents of the latest file
- In this instance, the first row needs to be promoted to the header row using the 'Use First Row as Headers' function.
- Finally, click 'Close and Apply' to load the data
I've just repeated the steps above and it takes around 32 clicks (excluding the extra steps to generate the date column) and around 2 minutes to create. This is all done using the Power Query interface and no code is required.
Comments
Post a Comment