• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Help with Web scraping script

darius12

New Member
Hello,

So i am not to experimented with VBA or HTML, but i am a pretty technical as i am a electrical technician by profession. I am trying my luck with web scraping to help in my endeavors of making something extra on the side.
The issue with my code is that it does not output anything but it does not give me any errors.


Code:
Sub TutorailsPoint()
Dim objIE As InternetExplorer

Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, posts As Object, topic As Object
Dim x As Long
Set objIE = New InternetExplorer

objIE.Visible = True
objIE.Navigate "https://www.unibet.ro/betting#filter/all/all/all/all/in-play"
    'wait for page to load
    Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
x = 2



Set topics = html.getElementsByClassName("KambiBC-list-view KambiBC-event__list KambiBC-js-event-list KambiBC-list-view__column")
For Each posts In topics
    For Each topic In posts.getClassName("KambiBC-event-item")
        Cells(x, 1) = topic.innerText
        x = x + 1
    Next topic
Next posts
End Sub

If anyone could point me in the right direction i would be nice.
Thank you
 
I understand that i need to get to the container the contains the names of the participants

so i tried :
Code:
Set topics = html.getElementsByClassName("KambiBC-event-participants")
For Each posts In topics
    For Each topic In posts.getElementsByClassName("KambiBC-event-participants__name")
        Cells(x, 1) = topic.innerText
        x = x + 1
    Next topic
Next posts


But again nothing
 
Looks like you are mixing up two separate code in your code.

You start off by navigating objIE (Internet Explorer). But then you suddenly jump to MSXML2 (http variable). Since you have not sent the initial request through MSXML2, you are not getting any result.

Refer to one of many examples of web scraping codes found in the forum to learn how to use MSXML2.
 
Indeed you are right. That is because in the example that i found with MSXML2 i did not see the line that made the browser wait for the web page to load as i was getting the error:"The data necessary to complete this action is not yet available".
So i tried to mix it with the do while loop from the html lib. example.

This is the MSXML2 code:

Code:
Sub TutorailsPoint()
Const URL = "https://www.unibet.ro/betting#filter/all/all/all/all/in-play"
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, posts As Object, topic As Object
Dim x As Long

x = 2

http.Open "GET", URL, True
http.send
html.body.innerHTML = http.responseText

Set topics = html.getElementsByClassName("KambiBC-event-item__participants-container")
For Each posts In topics
    For Each topic In posts.getElementsByTagName("KambiBC-event-participants__name")
        Cells(x, 1) = topic.innerText
        x = x + 1
    Next topic
Next posts
End Sub


and this is the Html lib code:
Code:
Sub useClassnames()
    Dim lists As IHTMLElementCollection
    Dim anchorElements As IHTMLElementCollection
    Dim ulElement As HTMLUListElement
    Dim liElement As HTMLLIElement
    Dim row As Long
    Dim ie As InternetExplorer

    Set ie = New InternetExplorer

    With ie
        .navigate "https://www.unibet.ro/betting#filter/all/all/all/all/in-play"
        .Visible = True

        Do While ie.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
    End With

    Set lists = ie.document.getElementsByClassName("KambiBC-event-item__participants-container")

    row = 1

    For Each ulElement In lists
        For Each liElement In ulElement.getElementsByClassName("KambiBC-event-participants")
            Set anchorElements = liElement.getElementsByClassName("KambiBC-event-participants__name")

            If anchorElements.Length > 0 Then
                Cells(row, 1) = anchorElements.Item(0).innerText
                row = row + 1
            End If
        Next liElement
    Next ulElement
End Sub


What is strange with this last code is that it is working, as it is doing what i wanted it to do, but only if i go trough it line by line, with F8. Otherwise it won't output anything. That is a real head scratcher for me.
 
1. You won't be able to scrape the info as is using MSXML2 since fields are populated using Java script. You can use following URL to get json response.

Code:
https://e2-api.kambi.com/offering/api/v3/ubro/listView/all/all/all/all/in-play.json?lang=ro_RO&market=RO&client_id=2&channel_id=1&ncid=1496076045203&categoryGroup=COMBINED&displayDefault=true

You will then need to parse json via various methods (RegExp, String manipulation, VBA-JASON, Script.Control etc).

EDIT: You also should Open using following arguments (i.e. bAsync argument should be set to false).
Code:
http.Open "GET", URL, False

2. This is because of asynchronous javascript query. Since it runs in background after page is loaded just checking for readyState won't capture it.

Try following code.
Code:
Sub useClassnames()
    Dim lists As IHTMLElementCollection
    Dim anchorElements As IHTMLElementCollection
    Dim ulElement As HTMLUListElement
    Dim liElement As HTMLLIElement
    Dim row As Long
    Dim ie As InternetExplorer

    Set ie = New InternetExplorer

    With ie
        .navigate "https://www.unibet.ro/betting#filter/all/all/all/all/in-play"
        .Visible = True

        Do While ie.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
    End With
        Do Until ie.document.getElementsByClassName("KambiBC-event-item__participants-container").Length > 0
            DoEvents
        Loop
      
    Set lists = ie.document.getElementsByClassName("KambiBC-event-item__participants-container")

    row = 1

    For Each ulElement In lists
        For Each liElement In ulElement.getElementsByClassName("KambiBC-event-participants")
            Set anchorElements = liElement.getElementsByClassName("KambiBC-event-participants__name")

            If anchorElements.Length > 0 Then
                Cells(row, 1) = anchorElements.Item(0).innerText
                row = row + 1
            End If
        Next liElement
    Next ulElement
End Sub
 
Last edited:
I was thinking the same that the script ran faster than the page could load but i was thinks of a solution involving "delay" or "wait". But this is clever as it makes sure that the information that i am waiting for is loaded on the page. Thank a lot.

So you advice me to stick with Html for web scraping especially if it involves java script.
 
So you advice me to stick with Html for web scraping especially if it involves java script.

Depends on your preference. I tend to lean toward XMLHTTP since it's faster and less prone to error. However, for many it's much easier to use IE automation. Since you can just use object model provided, instead of parsing json response.
 
Back
Top