
Use Power Query (M Language) to scrape a web page and output it to a CSV file

When using Power Query (M) in Power BI to scrape web pages, it's often useful to output the data to a CSV file. This lets you keep a record of multiple scrapes (with the timestamp in the filename) and also makes transformation and reloading more efficient.

Below is a piece of M code I recently used to capture a web page and record its details to a CSV file. It is deliberately simplified to demonstrate the approach; if additional transformations are required, they can be added before the RScript step in the code. Note that the RScript step calls R, so R needs to be installed and set up in Power BI for it to run.

Click here to download the file used in this example



let
    // Load the page in a browser engine and return its HTML (the URL must include the scheme)
    Source = Web.BrowserContents("https://www.bbc.co.uk"),

    // Map CSS selectors to columns; RowSelector defines what counts as one row
    #"Extracted Table From Html" = Html.Table(Source, {{"Column1", ".module--highlight .media__link"}, {"Column2", ".module--highlight .media__tag"}, {"Column3", ".module--highlight .block-link__overlay-link"}, {"Column4", ".module--highlight .module__title__link.tag"}, {"Column5", ".module--highlight .media__summary"}}, [RowSelector=".module--highlight .media__link"]),

    // Treat every scraped column as text
    #"Changed Type" = Table.TransformColumnTypes(#"Extracted Table From Html",{{"Column1", type text}, {"Column2", type text}, {"Column3", type text}, {"Column4", type text}, {"Column5", type text}}),

    // Write the table to a timestamped CSV using R (the C:/Temp folder must already exist)
    RScript = R.Execute("write.csv(dataset, paste0(""C:/Temp/bbc "", format(Sys.time(), ""%d-%b-%Y %H.%M""), "".csv""))",[dataset=#"Changed Type"]),

    out = RScript
in
    out
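As mentioned above, any extra transformations belong before the RScript step. The sketch below shows what a few such steps might look like. To keep it runnable on its own, it uses a small hand-made table in place of the #"Changed Type" step; the sample rows, the step names and the ScrapedAt column are all made up for illustration and are not part of the original query.

```powerquery
let
    // Stand-in for the #"Changed Type" step: hypothetical sample rows, not real scraped data
    Scraped = #table(
        type table [Column1 = text, Column2 = text],
        {
            {"  Headline one  ", "News"},
            {null, "Sport"},
            {"Headline two", "Weather"}
        }
    ),

    // Drop rows where the first selector matched nothing
    #"Removed Empty Rows" = Table.SelectRows(Scraped, each [Column1] <> null and [Column1] <> ""),

    // Trim stray whitespace around the scraped text
    #"Trimmed Text" = Table.TransformColumns(#"Removed Empty Rows", {{"Column1", Text.Trim, type text}}),

    // Work out the scrape time once, as text, so every row gets the same value
    ScrapeTime = DateTime.ToText(DateTime.LocalNow(), "yyyy-MM-dd HH:mm"),

    // Add the scrape time as a column (ScrapedAt is a hypothetical column name)
    #"Added Scrape Time" = Table.AddColumn(#"Trimmed Text", "ScrapedAt", each ScrapeTime, type text)
in
    #"Added Scrape Time"
```

In the real query, these steps would sit between #"Changed Type" and RScript, and the R.Execute call would then take [dataset=#"Added Scrape Time"] instead of [dataset=#"Changed Type"]. Storing the scrape time as text is a cautious choice here, since it avoids relying on how date/time values are handed over to R.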




