Credit: Data Science Central
Most modern companies have processes they would like to automate. The reasons vary, but manual processes are typically routine, repetitive, and boring. They also tend to consume a lot of time and human resources, and manual office work is prone to input mistakes. As a result, staff lose motivation and companies lose time and money. An effective business can be built by applying modern automation technologies. One of the most promising technologies in this field is Robotic Process Automation (RPA), which relies on building agents that simulate different types of user activity (mouse clicks, keyboard input, data scraping, etc.) to carry out routine, mostly Windows-based, tasks. RPA has use cases in many areas: finance and banking, insurance, telecommunications, healthcare, retail, government, HR, IT, and many others. This post covers one of the simplest and most popular automation tasks, data scraping, which may be of interest to business analysts and web developers.
Top RPA tools
Many RPA tools from various vendors are on the market today. The best-known RPA companies are Automation Anywhere, Blue Prism, Daythree Business Services, IntelliCog Technologies, Kofax, Kryon Systems, Pegasystems, QIHAN, Softomotive, Visual Cron, WorkFusion and UiPath. For a first look at RPA technologies and an easy start, it is reasonable to choose a vendor that offers its software for free or as a trial. These include Visual Cron (45-day free trial), WorkFusion RPA Express (free) and UiPath Community Edition (free). For this blog post, UiPath Community Edition was chosen because it is one of the most powerful RPA tools and one of the easiest to start with. A significant advantage is its extensive set of learning materials (video instructions, documentation, and a community forum).
Let’s start with a simple UiPath automation for data scraping. As an example, we will scrape topic starters from https://www.kaggle.com/discussion and save them to a CSV file. To perform this automation, you have to install the UiPath software, build the corresponding automation agent, and run it.
UiPath Software installation
To start using UiPath, follow the instructions below:
- Go to https://www.uipath.com/, click “GET STARTED” and choose Community Edition.
- Click the “GET COMMUNITY EDITION” button.
- Fill in the form and click the “REQUEST COMMUNITY EDITION” button.
- After that, you will receive an email with a download link.
- Download UiPath.
Implementation of data scraping automation using UiPath
- Launch the UiPath Studio by executing the downloaded file “UiPathStudioSetup.exe”.
- You will see the following window:
- Click “Continue Free”.
- On the left side of the window, choose “Blank” to start a new project.
- Fill in the “Name” field and change the “Location” (if you want), then click “Create”.
On the left side of the window, you will see the list of available activities, which you can add to your project by dragging and dropping.
- Choose the Open browser activity, then drag and drop it into your project.
In the input field, type https://www.kaggle.com/.
If you want to change the default browser, select this activity and, in the right part of the window, choose the browser you want.
Add the Delay activity. Time delay is needed to let the webpage finish loading.
In the Duration field, type 00:00:03 (3 seconds).
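For readers who think in code, the hh:mm:ss value typed into the Duration field is simply a time span. The sketch below is plain Python, not UiPath code; the helper names parse_duration and pause are hypothetical and only illustrate how such a value maps to a pause of 3 seconds.

```python
import time
from datetime import timedelta


def parse_duration(text: str) -> timedelta:
    """Convert an hh:mm:ss string such as "00:00:03" into a timedelta."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def pause(text: str) -> None:
    """Block the script for the given hh:mm:ss duration (hypothetical helper)."""
    time.sleep(parse_duration(text).total_seconds())


delay = parse_duration("00:00:03")
print(delay.total_seconds())  # 3.0
```

A fixed pause is the simplest way to wait for a page, but it is only a guess: a slow connection may need more than 3 seconds, and a fast one wastes the remainder.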
Then add a Double click activity.
Click “Indicate on Screen” to indicate the necessary element in the previously opened browser. We want to indicate the “Discussion” element.
Now we need to scrape the topic starters on this page. Before doing this, add one more Delay activity of about 3 seconds (as described above) if you want to see the page that will be scraped.
Add a Data Scraping activity.
Now click “Indicate on Screen” again and choose the elements you want to scrape in the previously opened browser. To do this, follow the UiPath instructions.
As a result, your UiPath project flow will look similar to the one depicted below:
When you select the Extract Structured Data activity and open the Variables tab, you will see that the ExtractDataTable variable was created automatically.
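To give a rough idea of what extracting structured data means, here is a minimal Python sketch that turns repeated page elements into table rows. It is not how UiPath works internally. The HTML snippet, the class names (topic, title, author), and the names TopicParser and extract_rows are all invented for illustration; the real Kaggle page is structured differently.

```python
from html.parser import HTMLParser

# Hypothetical markup, invented for this example (not Kaggle's real page structure).
SAMPLE_HTML = """
<div class="topic"><a class="title" href="/discussion/1">Getting started with pandas</a><span class="author">alice</span></div>
<div class="topic"><a class="title" href="/discussion/2">Feature engineering tips</a><span class="author">bob</span></div>
"""


class TopicParser(HTMLParser):
    """Collect one {"title", "author"} row per topic block."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._field = None  # name of the field whose text is currently being read

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "div" and css_class == "topic":
            # A new topic block starts a new row.
            self.rows.append({"title": "", "author": ""})
        elif css_class in ("title", "author") and self.rows:
            self._field = css_class

    def handle_data(self, data):
        if self._field:
            self.rows[-1][self._field] += data.strip()

    def handle_endtag(self, tag):
        if tag in ("a", "span"):
            self._field = None


def extract_rows(html):
    """Return the list of rows found in the given HTML string."""
    parser = TopicParser()
    parser.feed(html)
    return parser.rows


rows = extract_rows(SAMPLE_HTML)
for row in rows:
    print(row)
```

The result is a list of rows with the same columns, which plays a role similar to the data table that the scraping wizard stores in a variable.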
Now we need to write the scraped information to a CSV file.
Add a Write CSV activity.
In the “File path” field, enter the path of the file you want to save (for example, “res.csv”). In the “Data table” field, enter the name of the variable created in the previous step (ExtractDataTable).
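For comparison, the same step (a table of rows written to a file such as res.csv) can be sketched in Python with the standard csv module. This is an illustration only, not part of the UiPath workflow. The function name write_csv, the column names, and the sample rows are placeholders, and the file is written to a temporary folder so that running the sketch leaves nothing behind.

```python
import csv
import os
import tempfile


def write_csv(path, rows, fieldnames):
    """Write a list of dict rows to `path`, starting with a header line."""
    # newline="" lets the csv module control line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Placeholder rows standing in for the scraped table (not real Kaggle data).
sample_rows = [
    {"title": "Example topic A", "author": "user_a"},
    {"title": "Example topic B", "author": "user_b"},
]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "res.csv")
    write_csv(path, sample_rows, ["title", "author"])
    # Read the file back to confirm what was written.
    with open(path, newline="", encoding="utf-8") as handle:
        saved_rows = list(csv.DictReader(handle))

print(saved_rows)
```

Reading the file back right after writing it is a quick way to confirm that the header and every row ended up in the file.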
Now press F5 (or click the Run button).
You have just run your first automation! After this, go to the directory you specified and check your CSV file. The source page which was scraped looked as follows:
The obtained CSV file looks as follows:
Interest in Robotic Process Automation tools is growing fast because they provide handy, functional digital solutions for organizing and optimizing business processes. Among the huge variety of modern RPA tools, UiPath Community Edition stands out for its accessibility, convenience, and simplicity. Having implemented a simple data scraping automation, one can conclude that it is not necessary to be a programmer to perform similar and even more complicated tasks. The UiPath ecosystem includes software with a user-friendly interface and a variety of learning resources, namely thorough documentation, many video tutorials, and diverse usage examples.