• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

How to download files from web using vba?

shahin

Active Member
I've created a macro to download some files from a webpage. The thing is I can only scrape the links to those files. As I never worked with downloading files from websites, I got stuck and can't go ahead. Any help as to how i can accomplish this will be greatly appreciated. Thanks in advance.

Here is what I've written so far:

Code:
Sub download_files()
    Dim HTTP As New XMLHTTP60, html As New HTMLDocument, post As Object

    With HTTP
        .Open "GET", "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        .send
        html.body.innerHTML = .responseText
    End With
   
    For Each post In html.getElementById("latest").getElementsByTagName("a")
        Debug.Print post.href
    Next post
End Sub

Output:
Code:
http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.pdf
http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.txt
http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.xls
http://usda.mannlib.cornell.edu/usda/current/wasde/wasde-12-12-2017.xml
How can i download these files and save in my desktop?
 
Something like below. I don't like saving things on Desktop, so used my test folder.
Code:
Sub DemoDownload()
    Dim intFF As Integer
    Dim fData() As Byte
    Dim whttp As Object, html As New HTMLDocument
    Dim x, Post
    Dim fPath As String
    Set whttp = CreateObject("WinHTTP.WinHTTPrequest.5.1")
    With whttp
        .Open "GET", "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        .send
        html.body.innerHTML = .responseText
    End With

    For Each Post In html.getElementById("latest").getElementsByTagName("a")
        With whttp
            .Open "GET", Post.href, False
            .send
        End With
        fData = whttp.responseBody
        intFF = FreeFile()
        fPath = "C:\Test\" & Split(Post.href, "/")(UBound(Split(Post.href, "/")))
        Open fPath For Binary Access Write As #intFF
        Put #intFF, 1, fData
        Close #intFF
    Next Post
End Sub

Though I'd recommend making portion within the loop into function like Marc L has done it in link below.
https://chandoo.org/forum/threads/e...-work-in-a-few-cases.28239/page-2#post-169277
 
@sir Chihiro, It perfectly did the job. Few terms used in your script is totally new to me. What if I wish to do the same using IE (in case any site is javascript enabled and i can't avail with "winhttp request")?

Code:
Sub download_files()
    Dim IE As New InternetExplorer, html As HTMLDocument, post As Object

    With IE
        .Visible = True
        .navigate "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
  
    For Each post In html.getElementById("latest").getElementsByTagName("a")
        Debug.Print post.href
    Next post
  
    IE.Quit
End Sub
 
It didn't come to my mind in the first place that downloading images and files are almost identical. Thanks a lot sir for your suggestion and solution as you have always enriched me with.
 
One thing to mention sir: if i go for xmlhttp instead of winhttp request then the process of downloading is way faster. Any reason I should stick to winhttp request. Thanks again sir.
 
Does "WinHTTP request" has the ability to handle (i meant parse) the content of javascript enabled sites? I'm slightly confused. Thanks sir.
 
? Javascript has nothing to do with downloading a file.

If you need to deal with Java, pilot IE or use Selenium. Or code it to handle/fire script as needed.
 
Sorry sir, my question was not explicit enough to make you understand what i meant. What i wanted to know is whether `winhttp` requests has the capability to catch the dynamically generated contents as in, parsing titles etc? I'm not talking about downloading files.
 
Last edited:

As both are just for requesting, not for browsing !

But, again, as yet pointed out by Chihiro, just read post #2 link
so you must understand why winhttp instead of xmlhttp,
see also from first page of the link where there is a sample link
for a direct file URL (DownloadFile) …
 
Seems to have found another way to download files from the web using xmlhttp request.
Code:
Sub Savingfiles()
    Dim http As New XMLHTTP60, htmldoc As New HTMLDocument
    Dim htmla As Object, stream As Object, tempArr As Variant
   
    With http
        .Open "GET", "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194", False
        .send
        htmldoc.body.innerHTML = .responseText
    End With
       
    For Each htmla In htmldoc.getElementById("latest").getElementsByTagName("a")
        tempArr = Split(htmla.href, "/")
        tempArr = tempArr(UBound(tempArr))

        With http
            .Open "GET", htmla.href, False
            .send
        End With

        Set stream = CreateObject("ADODB.Stream")
        With stream
            .Open
            .Type = 1
            .write http.responseBody
            .SaveToFile ("D:\Test\Files\" & tempArr)
            .Close
        End With
    Next htmla
End Sub
 
I've also got it working with IE. The only thing i need to optimize is place this line "IE.Quit" within the below scraper in it's right position. Where I've kept this line at this moment (if uncommented) throws an error "Interface Unknown". I would like to quit the "IE" browser in such a way that "Winhttp" request will take up where "IE" has already left off.

Code:
Sub Savingfiles()
    Dim IE As New InternetExplorer, html As HTMLDocument, http As New WinHttp.WinHttpRequest
    Dim htmla As Object, stream As Object, tempArr As Variant

    With IE
        .Visible = True
        .navigate "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
    
    For Each htmla In html.getElementById("latest").getElementsByTagName("a")
        tempArr = Split(htmla.href, "/")
        tempArr = tempArr(UBound(tempArr))

        http.Open "GET", htmla.href, False
        http.send

'        IE.Quit         I'was talking about this line to place in the right position

        Set stream = CreateObject("ADODB.Stream")
        With stream
            .Open
            .Type = 1
            .write http.responseBody
            .SaveToFile ("D:\Test\Files\" & tempArr)
            .Close
        End With
    Next htmla
End Sub
 
Last edited:
That is also possible but definitely not a good idea to pursue. Here it is:
Code:
Sub Savingfiles()
    Dim IE As New InternetExplorer, html As HTMLDocument
    Dim http As New WinHttp.WinHttpRequest
    Dim htmla As Object, stream As Object
    Dim tempArr As Variant, link As Variant
    Dim data As Variant, link_storage As Variant
   
    With IE
        .Visible = True
        .navigate "http://usda.mannlib.cornell.edu/MannUsda/viewDocumentInfo.do?documentID=1194"
        Do Until .readyState = READYSTATE_COMPLETE: Loop
        Set html = .document
    End With
       
    For Each htmla In html.getElementById("latest").getElementsByTagName("a")
        data = data & IIf(data = "", "", vbNewLine) & htmla.href
    Next htmla
   
    IE.Quit  ''closed the browser (act like relay race)

    link_storage = Split(data, vbNewLine)
    For Each link In link_storage
        tempArr = Split(link, "/")
        tempArr = tempArr(UBound(tempArr))
       
        http.Open "GET", link, False
        http.send

        Set stream = CreateObject("ADODB.Stream")
        With stream
            .Open
            .Type = 1
            .write http.responseBody
            .SaveToFile ("D:\Test\Files\" & tempArr)
            .Close
        End With
    Next link
End Sub
 
Same thing no? You are terminating IE only at end of routine where you use htmla. Whether you have another routine inside the loop is beside the point ;)
 
Right you are sir. However, It would be very nice if there was any library like "BeautifulSoup" in vba, then it was possible to catch the response within IE and reuse it outside IE very smoothly..
 
Basically, you are doing just that, by putting it in to string.

If you wanted to, you could write your own class module to handle it. But then, for the amount of time it takes to write one and debug. Not really worth the effort.
 
Back
Top