• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Looking to speed up big data analysis

So I've got a workbook that's big and slow it's generally somewhere between 75 - 100 Mb. It contains inventory transaction data for a rolling 3 years which populates from linked tables that have half a million or more lines each.

These tables have been pivoted, then summaries are done by VLOOKUPs to the pivot tables.

Obviously, INDEX(MATCH works faster than VLOOKUP, but would SUMIF or SUMPRODUCT work faster straight to the source data than pivoting all that data to begin with?

Would also need to run some MIN MAX functions on the source as well.

This sheet works so slowly that I have to set formula calculations to manual, and when I do refresh the sheet I go get a cup of coffee and read Chandoo.org while I wait for it to process, it's killing my productivity.

Thanks Ninjas,

The Doctor
 
Get PowerQuery and PowerPivot. This will speed up your workbook.

PowerQuery to pull in data from source (be it Worksheet or Database) and filter and transform.

PowerPivot to add measures and KPIs.

Then you can summarize via Pivot or CUBE functions on worksheets.

Deals well with large data set be it 32 bit or 64 bit Excel. Though 64 bit will out perform 32 bit on large data set when you have enough RAM on your machine.
 
PowerQuery is very easy to use. Especially if you know SQL database. But even without that knowledge, it's pretty easy to learn. M function will need some work, but there are some good resources on the web.

PowerPivot will need some getting used to, especially using DAX functions. When to use Measure vs. Calculated Columns etc.

However, it's very good skill set to have, especially if you use Excel as your primary data analysis/visualization tool and deal with large data set.
 
Looks like I've gotten myself lost. I've pulled in my data tables and linked them very much like I would in an access database. Now I need to consolidate the data on them. If I were still using them in excel as normal, I would use a SUMIF to grab all the quantities from the received in table if the item matches, would also use an array to get min date from that same received in table. Then would have a similar pair of columns to pull in the usage quantity and max usage date.

With PowerPivot I'm not sure how to begin merging the tables.
 
Looks like I've gotten myself lost. I've pulled in my data tables and linked them very much like I would in an access database. Now I need to consolidate the data on them. If I were still using them in excel as normal, I would use a SUMIF to grab all the quantities from the received in table if the item matches, would also use an array to get min date from that same received in table. Then would have a similar pair of columns to pull in the usage quantity and max usage date.

With PowerPivot I'm not sure how to begin merging the tables.

Did you use PowerQuery to get to where you are? Are you lost in PowerPivot or PowerQuery - sounds as if you're PowerPivot ....
You'll need to use some DAX formulas to get where I *think* you're trying to get to .... The most recent good resource I've seen out there is Curbal on YouTube - nice short, succinct videos on specific DAX expressions
 
Most of data transformation and SUMIFS/LookUp type operation can be done at PowerQuery stage, even before considering DAX formulas.

Which I'd recommend doing.

Also, take it one step at a time and ensure that you have final consolidated data set up before you load the data model to PowerPivot. Otherwise, you may find yourself rebuilding your DAX over and over as you make changes to the model.
 
You should be doing the Merging, Filtering and Sorting as part of the Power Query stage
Then sumarising/reporting in Power Pivot
 
Hi ,

The simple fact is that you are using Excel to store data , and voluminous data , at that.

Excel is not intended for data storage , it is intended to manipulate data.

The ideal thing to do would be to use Access or any other database management system to deal with data input and storage , and use Excel solely for the data manipulation and presentation part.

The interfacing of Access and Excel is straightforward , and the improvement in response will be dramatic.

Power Pivot is recommended only if your data manipulation requirements are more sophisticated than what Excel can deal with , and if you are unfamiliar with Access and familiar with Power Pivot.

Stepping up from being an reasonably proficient user of Excel to becoming a reasonably proficient user of Power Pivot is a large step , and unless your job requires it , may not be advisable. Getting to work with Access only for data input and data storage is far easier.

This is just my opinion.

Narayan
 
Did you use PowerQuery to get to where you are? Are you lost in PowerPivot or PowerQuery - sounds as if you're PowerPivot ....
You'll need to use some DAX formulas to get where I *think* you're trying to get to .... The most recent good resource I've seen out there is Curbal on YouTube - nice short, succinct videos on specific DAX expressions
Well, I inherited this workbook from a previous analyst that held this position. It had 7 OLE DB queries already in it using SQL scripting to pull the data in from the ERP system, creating a read only linked table. These tables were then pivoted in various ways and vlookups used to report out age of Raw Material, WIP, and Finished Goods inventoried.

It's a quite messy and slow book, which I'm wanting to clean up. It seems like a good power query of the same dbo tables could reduce the need to touch the same data over and over.

Where I'm running into issues are trying to parse the power query in the same way as the sql. For example, one script uses a date limiting function WHERE lst_trx_date > (getdate()-400). I can't figure out how to make a dynamic filter in power query that would only pull transactions that are less than 400 days old.
 
PowerQuery has its own Date Functions - have a peek at this
https://www.powerquery.training/portfolio/time-intelligence-with-power-query/

I have a book entitled M is for (Data) Monkey, subtitled the Excel Pro's Definitive Guide to Power Query that I found useful in getting to grips with PQ.

If you intend to do the date filtering from within Power Pivot, which others have suggested is not the best idea, you will need a Calendar Table. Power Query can help you create one of those too!

I'm running round like a fool today, but I'll be interested to hear how you get along with this - it's a clever tool, that Microsoft *seems* to want to keep a bit secret - you have to go looking for help on ..
 
For example, one script uses a date limiting function WHERE lst_trx_date > (getdate()-400). I can't figure out how to make a dynamic filter in power query that would only pull transactions that are less than 400 days old.

Actually, you can use native SQL query in PowerQuery.

When you initiate connection, click on advanced option. And you can enter SQL Statement (as long as it's from SQL server, if you don't have full Pro-Plus license. Pro-Plus adds ability to connect to various other database types).

upload_2016-10-11_8-50-56.png

Note that some native functions may not be available or require special configuration (such as Pivot function).

Also, you can use M fuction to filter dynamically, but this will be less efficient in most cases as it needs to pull entire data set prior to filtering operation.

EDIT: For dynamic date filter using M function only. See my post below.
http://chandoo.org/forum/threads/powerquery-filter-for-today-1-using-m.30079/

Just change "-1" to "-400" and change "[ReportDate]" to your date column.
 
Last edited:
Back
Top