iRobo.Activity.Smart.Scrapper

Scrapes data from an HTML page using multiple methods (e.g. XPath), then writes the results to a DataTable of scraped elements.


Properties

Input

  • Keep Different Length - when enabled, keeps the differences in column lengths (tasks that return different numbers of elements) rather than ignoring them.
  • Max Number of Elements - the number of elements to scrape; use 0 to scrape all of them.
  • Parent XPath - the XPath of the HTML tag that contains the elements to scrape from the site.
  • Scraping Tasks - the list of scraping tasks, formatted as JSON.
  • URL - the URL of the website to scrape.
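
How Max Number of Elements and Keep Different Length interact is easiest to see on a small example. The Python sketch below is not the activity's implementation. It assumes that each scraping task yields one column of values named by its Alias, that 0 means "no limit", and that with Keep Different Length off the table is cut to the shortest column, while with it on the shorter columns are padded with empty (None) cells. The function name build_table is hypothetical, and a list of dicts stands in for the DataTable.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def build_table(columns: Dict[str, List[str]],
                max_elements: int = 0,
                keep_different_length: bool = False) -> List[Row]:
    """Combine per-task result columns into rows (a stand-in for the output DataTable).

    Assumed behaviour, not confirmed by the activity documentation:
    - max_elements == 0 keeps every element; otherwise each column is cut to max_elements.
    - keep_different_length=False truncates all columns to the shortest one.
    - keep_different_length=True keeps every value and pads shorter columns with None.
    """
    if max_elements > 0:
        columns = {alias: values[:max_elements] for alias, values in columns.items()}
    if not columns:
        return []
    lengths = [len(values) for values in columns.values()]
    row_count = max(lengths) if keep_different_length else min(lengths)
    return [
        {alias: values[i] if i < len(values) else None for alias, values in columns.items()}
        for i in range(row_count)
    ]


# Two tasks (aliases "Name" and "Price") that matched different numbers of elements.
results = {"Name": ["iPhone 14", "iPhone 15", "iPhone 15 Pro"], "Price": ["799", "899"]}
print(len(build_table(results)))                             # 2
print(build_table(results, keep_different_length=True)[-1])  # {'Name': 'iPhone 15 Pro', 'Price': None}
print(build_table(results, max_elements=1))                  # [{'Name': 'iPhone 14', 'Price': '799'}]
```

Padding with None rather than dropping rows keeps every scraped value, at the cost of empty cells that later steps (such as writing to Excel) must tolerate.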


Misc

  • DisplayName - the activity's header name.


Output

  • Scrapped Results - a DataTable of the scraped elements.


Activity Designer Controls

  • Scraping Tasks combo box: displays the created scraping tasks that will be sent for execution.
  • Detect button: detects the URL of the currently active browser tab.
  • Add button: opens the Build Scrap Object form, where you create the scraping task object by filling in the following controls:


    • Identify Task Tab


      1. Alias: the name of the task, also used as the column name in the output data table.
      2. URL: the URL of the website to scrape.
      3. Parent Xpath: the XPath of the HTML tag that contains the elements to scrape from the site.
      4. HTML Tag: the tag to scrape.
      5. Scrap: what to scrape from this element: an attribute value (such as src or href), its inner text, or its inner HTML.
      6. Scrolling: enable this option if the website loads more elements as you scroll down (YouTube, for example).
      7. Scraping Exp: opens the query expression form, used to build an expression that identifies which of the selected tags under the parent XPath to scrape, e.g. [@class = "item-name"].
      8. Filtrating Exp: an expression that filters the elements matched by the Scraping Exp, making the selection more precise, e.g. [contains(@text, "iPhone")].
      9. Paginator: enable the pagination option to scrape items across multiple pages.
      10. Max pages: the number of pages to scrape elements from.
      11. Xpath: the XPath of the paginator section of the website.
      12. Next navigator tag: whether the Next button in the paginator section is an <a> or a <button> tag.
      13. is Clickable: check this box if the Next navigator tag is a <button>.
      14. Get link from: if the Next navigator tag is an <a> tag, select href from the combo box.
      15. Next paginator expression: opens the query expression form to build an expression that identifies the Next navigator tag, e.g. [text() = "Next"].
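
The Identify Task fields combine into XPath queries. The Python sketch below assumes, without confirmation from the activity, that the HTML Tag is searched on the descendant axis of the Parent Xpath and that the Scraping Exp and Filtrating Exp predicates are appended in that order; the Next button query is built the same way from the paginator Xpath, Next navigator tag, and Next paginator expression. The pagination loop stops after Max pages pages or when no Next button is found. The function names, the callbacks that stand in for a real browser, and the sample ids and classes ("products", "pagination") are all hypothetical.

```python
from typing import Callable, List


def compose_xpath(parent_xpath: str, html_tag: str, *predicates: str) -> str:
    """Search for html_tag anywhere under parent_xpath and append each non-empty predicate."""
    return parent_xpath.rstrip("/") + "//" + html_tag + "".join(p for p in predicates if p)


def scrap_target(element_xpath: str, scrap: str) -> str:
    """In this sketch, attribute values (e.g. "@href") get an attribute step; inner text and
    inner HTML are read from the matched element itself, so its XPath is left unchanged."""
    return f"{element_xpath}/{scrap}" if scrap.startswith("@") else element_xpath


def scrape_pages(scrape_page: Callable[[], List[str]],
                 go_next: Callable[[], bool],
                 max_pages: int) -> List[str]:
    """Collect items page by page; go_next() follows Next and returns False if there is none."""
    results: List[str] = []
    for page in range(max_pages):
        results.extend(scrape_page())
        # Do not follow Next after the last requested page.
        if page == max_pages - 1 or not go_next():
            break
    return results


items = compose_xpath('//div[@id="products"]', "span",
                      '[@class = "item-name"]', '[contains(@text, "iPhone")]')
print(items)
# //div[@id="products"]//span[@class = "item-name"][contains(@text, "iPhone")]
print(scrap_target(compose_xpath("//ul", "a"), "@href"))  # //ul//a/@href
print(compose_xpath('//nav[@class="pagination"]', "a", '[text() = "Next"]'))
# //nav[@class="pagination"]//a[text() = "Next"]
```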


    • Perform Pre-Actions Tab: use this tab when actions must be performed before scraping, such as clicking a button or entering text in a field.
      1. Actions: choose the action to perform from the combo box.
        • For the click action, enter in the next text box how long to wait after the click.
        • For the input text action, enter the text to type in the next text box.
      2. Sensitive: if the input text is a password or other sensitive text, check the Sensitive box to hide its content.
      3. On: choose Absolute Xpath if the element can be located directly by its XPath, or Search for element if it cannot and an expression is needed to find it.
      4. Xpath: the XPath of the element when Absolute Xpath is chosen, or the parent XPath when Search for element is chosen.
      5. HTML tag: when Search for element is chosen, the tag to search for under the parent XPath.
      6. Expression: an expression applied to the searched element to make the match more specific.

     Note: 

        You can add more than one action. The actions are shown in a list box, where each action can be edited, cloned, or deleted, and the whole list can be cleared.
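
A pre-action can be modeled as a small record. This Python sketch is hypothetical: the class, field names, and action strings are illustrations, not the activity's JSON keys. It shows how the On choice selects the target (an absolute XPath, or a search for an HTML tag plus expression under a parent XPath), how Sensitive hides typed text when the action is displayed, and how the list box operations map onto ordinary list operations.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class PreAction:
    action: str                     # hypothetical values: "click" or "input_text"
    xpath: str                      # absolute XPath, or parent XPath when on == "search"
    on: str = "absolute"            # "absolute" or "search"
    html_tag: Optional[str] = None  # tag to search for under the parent XPath (search only)
    expression: str = ""            # extra predicate on the searched tag (search only)
    wait_ms: int = 0                # click: how long to wait after clicking
    text: str = ""                  # input_text: the text to type
    sensitive: bool = False         # hide the text whenever the action is displayed

    def target_xpath(self) -> str:
        if self.on == "absolute":
            return self.xpath
        return f"{self.xpath.rstrip('/')}//{self.html_tag}{self.expression}"

    def display(self) -> str:
        if self.action == "click":
            return f"click {self.target_xpath()} then wait {self.wait_ms} ms"
        # A fixed mask avoids revealing the length of the secret.
        shown = "********" if self.sensitive else self.text
        return f"type '{shown}' into {self.target_xpath()}"


actions: List[PreAction] = [
    PreAction("input_text", '//input[@id="user"]', text="robot"),
    PreAction("input_text", '//input[@id="pass"]', text="s3cret", sensitive=True),
    PreAction("click", "//form", on="search", html_tag="button",
              expression='[text() = "Login"]', wait_ms=2000),
]
actions.append(replace(actions[0]))               # clone the first action
actions[3] = replace(actions[3], text="robot2")  # edit the clone
del actions[3]                                    # delete it again
for a in actions:
    print(a.display())
actions.clear()                                   # clear the whole list
```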


    • Review & Add to list Tab: displays the generated task or task list formatted as JSON, with a Save button that saves it to the Scraping Tasks list.
      1. Clone: create a new task by selecting an existing task with similar attributes, modifying it, and saving it as a new task.
      2. Import: import a JSON-formatted task or list of tasks into the activity.
      3. Edit: edit the selected task in the list.
      4. Delete: delete the selected scraping task from the list.
      5. Clear: remove all scraping tasks from the list.
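
The JSON shown on this tab and the list buttons can be approximated as follows. The task keys below are invented for illustration, since this page does not document the actual JSON schema, and the URL is a placeholder. The sketch only shows that Import accepts either a single task object or a list of tasks, and that Clone copies a task before changing it so the original stays untouched.

```python
import json
from copy import deepcopy
from typing import Any, Dict, List

Task = Dict[str, Any]


class TaskList:
    """Hypothetical in-memory model of the Scraping Tasks list."""

    def __init__(self) -> None:
        self.tasks: List[Task] = []

    def add(self, task: Task) -> None:
        self.tasks.append(deepcopy(task))

    def clone(self, index: int, **changes: Any) -> Task:
        new_task = deepcopy(self.tasks[index])  # deep copy: nested values are not shared
        new_task.update(changes)
        self.tasks.append(new_task)
        return new_task

    def edit(self, index: int, **changes: Any) -> None:
        self.tasks[index].update(changes)

    def delete(self, index: int) -> None:
        del self.tasks[index]

    def clear(self) -> None:
        self.tasks.clear()

    def to_json(self) -> str:
        return json.dumps(self.tasks, indent=2)

    def import_json(self, text: str) -> None:
        data = json.loads(text)
        # Accept a single task object or a list of tasks.
        for task in (data if isinstance(data, list) else [data]):
            self.add(task)


tasks = TaskList()
tasks.add({"alias": "Name", "url": "https://example.com/phones",
           "parentXPath": '//div[@id="products"]', "htmlTag": "span",
           "scrap": "innerText", "scrapingExp": '[@class = "item-name"]'})
tasks.clone(0, alias="Price", scrapingExp='[@class = "item-price"]')
print(tasks.to_json())
```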


Steps for Using the Smart Scrapper Activity

  1. Open iRobo Studio, then open a new workflow.
  2. Drag the Scrapping Designer activity into the Sequence.
  3. Create tasks using the Build Scrap Object form, or import previously created ones.
  4. Set the maximum number of elements to retrieve and choose whether to keep different lengths, then declare a DataTable variable and assign it to the Scrapped Results output argument.
  5. Drag a Write Excel Range activity into the workflow, assign the output DataTable to it, then run the workflow and check the result.
