How to download files from web using vba?

shahin · Dec 13, 2017

I've created a macro to download some files from a webpage. The thing is I can only scrape the links to those files. As I never worked with downloading files from websites, I got stuck and can't go ahead. Any help as to how i can accomplish this will be greatly appreciated. Thanks in advance.

Here is what I've written so far:

Code:

Sub download_files()
    Dim HTTP As New XMLHTTP60, html As New HTMLDocument, post As Object

    With HTTP
        .Open "GET", "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        .send
        html.body.innerHTML = .responseText
    End With
   
    For Each post In html.getElementById("latest").getElementsByTagName("a")
        Debug.Print post.href
    Next post
End Sub

Output:

Code:

http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.pdf
http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.txt
http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.xls
http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.xml

How can i download these files and save in my desktop?

Chihiro · Dec 13, 2017

Something like below. I don't like saving things on Desktop, so used my test folder.

Code:

Sub DemoDownload()
    Dim intFF As Integer
    Dim fData() As Byte
    Dim whttp As Object, html As New HTMLDocument
    Dim x, Post
    Dim fPath As String
    Set whttp = CreateObject("WinHTTP.WinHTTPrequest.5.1")
    With whttp
        .Open "GET", "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each Post In html.getElementById("latest").getElementsByTagName("a")
        With whttp
            .Open "GET", Post.href, False
            .send
        End With
        fData = whttp.responseBody
        intFF = FreeFile()
        fPath = "C:\Test\" & Split(Post.href, "/")(UBound(Split(Post.href, "/")))
        Open fPath For Binary Access Write As #intFF
        Put #intFF, 1, fData
        Close #intFF
    Next Post
End Sub

Though I'd recommend making portion within the loop into function like Marc L has done it in link below.
https://chandoo.org/forum/threads/e...-work-in-a-few-cases.28239/page-2#post-169277

shahin · Dec 13, 2017

@sir Chihiro, It perfectly did the job. Few terms used in your script is totally new to me. What if I wish to do the same using IE (in case any site is javascript enabled and i can't avail with "winhttp request")?

Code:

Sub download_files()
    Dim IE As New InternetExplorer, html As HTMLDocument, post As Object

    With IE
        .Visible = True
        .navigate "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
  
    For Each post In html.getElementById("latest").getElementsByTagName("a")
        Debug.Print post.href
    Next post
  
    IE.Quit
End Sub

Chihiro · Dec 13, 2017

Same thing. Only difference is inside for loop and needing to createobject for winhttp.

Nothing new here. Pretty much same concept as your thread below from earlier this year.
https://chandoo.org/forum/threads/unable-to-save-images-using-vba.32987/#post-195627

shahin · Dec 13, 2017

It didn't come to my mind in the first place that downloading images and files are almost identical. Thanks a lot sir for your suggestion and solution as you have always enriched me with.

shahin · Dec 13, 2017

One thing to mention sir: if i go for xmlhttp instead of winhttp request then the process of downloading is way faster. Any reason I should stick to winhttp request. Thanks again sir.

Chihiro · Dec 13, 2017

Read the thread I referenced for the reason.
At any rate I had no speed difference on my end.

shahin · Dec 14, 2017

Does "WinHTTP request" has the ability to handle (i meant parse) the content of javascript enabled sites? I'm slightly confused. Thanks sir.

Chihiro · Dec 14, 2017

? Javascript has nothing to do with downloading a file.

If you need to deal with Java, pilot IE or use Selenium. Or code it to handle/fire script as needed.

shahin · Dec 14, 2017

Sorry sir, my question was not explicit enough to make you understand what i meant. What i wanted to know is whether `winhttp` requests has the capability to catch the dynamically generated contents as in, parsing titles etc? I'm not talking about downloading files.

Marc L · Dec 14, 2017

As both are just for requesting, not for browsing !

But, again, as yet pointed out by Chihiro, just read post #2 link
so you must understand why winhttp instead of xmlhttp,
see also from first page of the link where there is a sample link
for a direct file URL (DownloadFile) …

shahin · Dec 14, 2017

Seems to have found another way to download files from the web using xmlhttp request.

Code:

Sub Savingfiles()
    Dim http As New XMLHTTP60, htmldoc As New HTMLDocument
    Dim htmla As Object, stream As Object, tempArr As Variant
   
    With http
        .Open "GET", "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        .send
        htmldoc.body.innerHTML = .responseText
    End With
       
    For Each htmla In htmldoc.getElementById("latest").getElementsByTagName("a")
        tempArr = Split(htmla.href, "/")
        tempArr = tempArr(UBound(tempArr))

        With http
            .Open "GET", htmla.href, False
            .send
        End With

        Set stream = CreateObject("ADODB.Stream")
        With stream
            .Open
            .Type = 1
            .write http.responseBody
            .SaveToFile ("D:\Test\Files\" & tempArr)
            .Close
        End With
    Next htmla
End Sub

shahin · Dec 15, 2017

I've also got it working with IE. The only thing i need to optimize is place this line "IE.Quit" within the below scraper in it's right position. Where I've kept this line at this moment (if uncommented) throws an error "Interface Unknown". I would like to quit the "IE" browser in such a way that "Winhttp" request will take up where "IE" has already left off.

Code:

Sub Savingfiles()
    Dim IE As New InternetExplorer, html As HTMLDocument, http As New WinHttp.WinHttpRequest
    Dim htmla As Object, stream As Object, tempArr As Variant

    With IE
        .Visible = True
        .navigate "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
    
    For Each htmla In html.getElementById("latest").getElementsByTagName("a")
        tempArr = Split(htmla.href, "/")
        tempArr = tempArr(UBound(tempArr))

        http.Open "GET", htmla.href, False
        http.send

'        IE.Quit         I'was talking about this line to place in the right position

        Set stream = CreateObject("ADODB.Stream")
        With stream
            .Open
            .Type = 1
            .write http.responseBody
            .SaveToFile ("D:\Test\Files\" & tempArr)
            .Close
        End With
    Next htmla
End Sub

Chihiro · Dec 15, 2017

You can't quit until end of your routine.

shahin · Dec 15, 2017

That is also possible but definitely not a good idea to pursue. Here it is:

Code:

Sub Savingfiles()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim http As New WinHttp.WinHttpRequest
    Dim htmla As Object, stream As Object
    Dim tempArr As Variant, link As Variant
    Dim data As Variant, link_storage As Variant
   
    With IE
        .Visible = True
        .navigate "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
       
    For Each htmla In html.getElementById("latest").getElementsByTagName("a")
        data = data & IIf(data = "", "", vbNewLine) & htmla.href
    Next htmla
   
    IE.Quit  ''closed the browser (act like relay race)

    link_storage = Split(data, vbNewLine)
    For Each link In link_storage
        tempArr = Split(link, "/")
        tempArr = tempArr(UBound(tempArr))
       
        http.Open "GET", link, False
        http.send

        Set stream = CreateObject("ADODB.Stream")
        With stream
            .Open
            .Type = 1
            .write http.responseBody
            .SaveToFile ("D:\Test\Files\" & tempArr)
            .Close
        End With
    Next link
End Sub

Chihiro · Dec 15, 2017

Same thing no? You are terminating IE only at end of routine where you use htmla. Whether you have another routine inside the loop is beside the point

shahin · Dec 15, 2017

Right you are sir. However, It would be very nice if there was any library like "BeautifulSoup" in vba, then it was possible to catch the response within IE and reuse it outside IE very smoothly..

Chihiro · Dec 15, 2017

Basically, you are doing just that, by putting it in to string.

If you wanted to, you could write your own class module to handle it. But then, for the amount of time it takes to write one and debug. Not really worth the effort.

How to download files from web using vba?

shahin

Active Member

Chihiro

Excel Ninja

shahin

Active Member

Chihiro

Excel Ninja

shahin

Active Member

shahin

Active Member

Chihiro

Excel Ninja

shahin

Active Member

Chihiro

Excel Ninja

shahin

Active Member

Marc L

Excel Ninja

shahin

Active Member

shahin

Active Member

Chihiro

Excel Ninja

shahin

Active Member

Chihiro

Excel Ninja

shahin

Active Member

Chihiro

Excel Ninja