iRobo.Activity.Smart.Scrapper
Scrapes data from an HTML page using one of several methods (for example, XPath), then stores the output in a DataTable of scraped elements.
Input
- Keep Different Length - Keeps the differences in column lengths instead of ignoring them.
- Max Number of Elements - The number of elements to scrape; use 0 for all.
- Parent XPath - XPath of the HTML tag that contains the elements to scrape from the site.
- Scraping Tasks - The list of scraping tasks, formatted as JSON.
- URL - URL of the website to scrape.
Misc
- DisplayName - Activity header name.
Output
- Scrapped Results - DataTable containing the scraped elements.
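The effect of Keep Different Length and Max Number of Elements on the output table can be sketched in Python. This is only an illustration of one plausible reading, not the activity's actual implementation: it assumes that keeping different lengths pads shorter columns with empty cells, and that ignoring them cuts every column to the shortest one. The function name build_rows and the sample column names are hypothetical.

```python
from itertools import zip_longest


def build_rows(columns, keep_different_length=True, max_elements=0):
    """Turn {alias: [values]} into a list of row dictionaries.

    Assumption: keep_different_length=True pads short columns with None,
    False truncates all columns to the shortest one.
    """
    if max_elements > 0:  # 0 means "all elements"
        columns = {alias: values[:max_elements]
                   for alias, values in columns.items()}
    aliases = list(columns)
    if keep_different_length:
        rows = zip_longest(*columns.values(), fillvalue=None)
    else:
        rows = zip(*columns.values())
    return [dict(zip(aliases, row)) for row in rows]


# Two scraping tasks that returned a different number of elements.
scraped = {"Name": ["a", "b", "c"], "Price": ["1", "2"]}
padded = build_rows(scraped)                                 # 3 rows
trimmed = build_rows(scraped, keep_different_length=False)   # 2 rows
```

Each key plays the role of a task Alias, which becomes a column name in the output table.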
Activity Designer Controls
- Scraping Tasks Combo box: displays the created scraping tasks that will be sent for execution.
- Detect Button: detects the URL of the currently active tab.
- Add: opens the build scrap object form, where you create a scraping task object by filling in the following controls:
- Identify Task Tab
- Alias: the name of the task, also used as the column name in the output data table.
- URL: the URL of the website to scrape.
- Parent Xpath: the XPath of the HTML tag that contains the elements to scrape from the site.
- HTML Tag: the tag to scrape.
- Scrap: what to scrape from the element: an attribute value (such as src or href), the inner text, or the inner HTML.
- Scrolling: enable this option if the website loads more elements as you scroll down, as YouTube does.
- Scraping Exp: opens the query expression form, used to build an expression that identifies which of the selected tags under the parent XPath to scrape, for example [@class = "item-name"].
- Filtrating Exp: an expression that filters the elements to scrape, making the Scraping Exp more precise, for example [contains(@text, "iPhone")].
- Paginator: enable the pagination option to scrape items across multiple pages, then set the fields below.
- Max pages: the number of pages to scrape elements from.
- Xpath: the XPath of the paginator section of the website.
- Next navigator tag: the tag of the Next button in the paginator section, either <a> or <button>.
- is Clickable: check this box if the Next navigator tag is a <button>.
- Get link from: if the Next navigator tag is an <a> tag, select href from the combo box.
- Next paginator expression: opens the query expression form to build an expression that identifies the next navigator tag, for example [text() = "Next"].
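How the Identify Task fields might combine can be sketched in Python with the standard library. This is a rough illustration under stated assumptions, not how the activity works internally: the pages are hypothetical in-memory documents, the Parent Xpath, HTML Tag and Scraping Exp are assumed to be joined into a single query, and the Filtrating Exp and Next paginator expression are applied as plain text checks because xml.etree.ElementTree supports only a subset of XPath. Scrolling and a real browser are not modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical in-memory "site": two well-formed pages keyed by URL.
PAGES = {
    "/page1": """<html><body>
        <div id="list">
          <span class="item-name">iPhone 13</span>
          <span class="item-name">Galaxy S22</span>
          <span class="price">999</span>
        </div>
        <div id="pager"><a href="/page2">Next</a></div>
    </body></html>""",
    "/page2": """<html><body>
        <div id="list">
          <span class="item-name">iPhone 14</span>
        </div>
        <div id="pager"><a href="/page1">Previous</a></div>
    </body></html>""",
}


def build_xpath(parent_xpath, html_tag, scraping_exp=""):
    """Join Parent Xpath, HTML Tag and Scraping Exp (assumed composition)."""
    return f"{parent_xpath}/{html_tag}{scraping_exp}"


def scrape(start_url, max_pages, contains_text=None):
    """Collect inner text page by page, following the Next link."""
    query = build_xpath(".//div[@id='list']", "span", "[@class='item-name']")
    results, url = [], start_url
    for _ in range(max_pages):
        root = ET.fromstring(PAGES[url])
        for element in root.findall(query):
            text = (element.text or "").strip()
            # Stand-in for a Filtrating Exp such as contains(..., "iPhone").
            if contains_text is None or contains_text in text:
                results.append(text)
        # Stand-in for the Next paginator expression [text() = "Next"].
        next_links = [a for a in root.findall(".//div[@id='pager']/a")
                      if (a.text or "").strip() == "Next"]
        if not next_links:
            break  # no further page
        url = next_links[0].get("href")  # "Get link from: href"
    return results


all_items = scrape("/page1", max_pages=5)
iphones = scrape("/page1", max_pages=5, contains_text="iPhone")
```

Here Max pages caps the loop, and the loop also stops early when no Next link is found on the current page.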
- Perform Pre-Actions Tab: use this tab to perform actions before scraping, such as clicking a button or entering text in a field.
- Actions: choose the action to perform from the combo box.
- If you choose the click action, enter the time to wait after the click in the text box next to it.
- If you choose the input text action, enter the text to type in the text box next to it.
- Sensitive: check this box to hide the input text if it is a password or other sensitive text.
- On: choose Absolute Xpath if the element is clearly identifiable on the website, or Search for element if it is not and an expression is needed to find it.
- Xpath: the XPath of the element if Absolute Xpath is chosen, or the parent XPath if Search for element is chosen.
- HTML tag: for Search for element, the tag to search for under the parent XPath.
- Expression: an expression that makes the search for the element more specific.
Note: You can add more than one action. The actions appear in a list box, where each one can be edited, cloned, or deleted, or the whole list can be cleared.
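The pre-action settings can be pictured as a small record per action. The Python sketch below is hypothetical: the class name PreAction, its field names, the use of seconds for the waiting time, and the way the parent XPath, HTML tag and expression are joined for Search for element are all assumptions made for illustration, not the activity's real data model.

```python
from dataclasses import dataclass


@dataclass
class PreAction:
    action: str               # "click" or "input_text" (assumed names)
    on: str                   # "absolute_xpath" or "search_for_element"
    xpath: str                # element XPath, or parent XPath when searching
    html_tag: str = ""        # only used with "search_for_element"
    expression: str = ""      # only used with "search_for_element"
    text: str = ""            # only used with "input_text"
    sensitive: bool = False   # hide the text when displaying the action
    wait_seconds: float = 0.0  # only used with "click" (unit is assumed)

    def target(self):
        """Return the XPath that locates the element to act on."""
        if self.on == "absolute_xpath":
            return self.xpath
        # Assumed composition: search the tag anywhere under the parent.
        return f"{self.xpath}//{self.html_tag}{self.expression}"

    def describe(self):
        """Return a display string, masking sensitive input text."""
        if self.action == "click":
            return f"click {self.target()} then wait {self.wait_seconds}s"
        shown = "*" * len(self.text) if self.sensitive else self.text
        return f"type '{shown}' into {self.target()}"


# A login sequence performed before scraping starts.
login = [
    PreAction("input_text", "absolute_xpath", "//*[@id='user']", text="demo"),
    PreAction("input_text", "search_for_element", "//form", html_tag="input",
              expression="[@type='password']", text="secret", sensitive=True),
    PreAction("click", "absolute_xpath", "//button[@id='login']",
              wait_seconds=2.0),
]
summary = [step.describe() for step in login]
```

Keeping the actions in an ordered list mirrors the list box in the form, where each action can be edited, cloned, or deleted.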
- Review & Add to list Tab: displays the generated task or task list formatted as JSON, with a Save button to save it to the Scraping Tasks list.
- Clone: creates a new task from an existing one with similar attributes; select the task, modify it, and save it as a new task.
- Import: imports a JSON-formatted task or list of tasks into the activity.
- Edit: edits the selected task in the list.
- Delete: deletes the selected scraping task from the list.
- Clear: clears all scraping tasks from the list.
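Import and Clone both operate on tasks stored as JSON. The Python sketch below shows the general idea only. The key names in EXAMPLE_TASK are hypothetical and chosen to mirror the form fields; the JSON shown in the Review & Add to list tab is the authoritative format. The helper names import_tasks and clone_task are likewise illustrative.

```python
import copy
import json

# Hypothetical task object; the real key names may differ.
EXAMPLE_TASK = {
    "Alias": "ProductName",
    "Url": "https://example.com/products",
    "ParentXpath": "//div[@id='list']",
    "HtmlTag": "span",
    "Scrap": "InnerText",
    "ScrapingExp": '[@class = "item-name"]',
}


def import_tasks(text):
    """Accept either a single task object or a list of tasks."""
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def clone_task(tasks, alias, new_alias, **changes):
    """Copy the task named alias, apply changes, and append the copy."""
    source = next(task for task in tasks if task["Alias"] == alias)
    clone = copy.deepcopy(source)  # leave the original untouched
    clone.update(changes)
    clone["Alias"] = new_alias
    tasks.append(clone)
    return clone


tasks = import_tasks(json.dumps(EXAMPLE_TASK))
price_task = clone_task(tasks, "ProductName", "ProductPrice",
                        ScrapingExp='[@class = "price"]')
exported = json.dumps(tasks, indent=2)  # text that could be imported again
```

Cloning is useful when several columns come from the same page and differ only in one or two fields, such as the scraping expression.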
Steps of Using Smart Scrapper Activity
- Open iRobo Studio, then open a new Workflow.
- Drag the Scrapping Designer activity into the Sequence.
- Create tasks using the build scrap object form, or import previously created ones.
- Set the maximum number of elements to retrieve and whether to Keep Different Length, then declare a DataTable variable and assign it to the Scrapped Results output argument.
- Drag a Write Excel Range activity into the workflow, assign the output DataTable to it, then run the workflow and check the result.