• Hi All

    Please note that at the Chandoo.org Forums there is Zero Tolerance to Spam

    Post Spam and you Will Be Deleted as a User

    Hui...

  • When starting a new post, to receive a quicker and more targeted answer, Please include a sample file in the initial post.

Conditional Formatting for Similar Items

I've got a huge inventory list (approx. 25,000) individual items that we've imported into a new ERP system. The upload seems to have items that could potentially be duplicated, but only have a similar item number, things such as leading zeros being dropped, or a dash missing in the middle of the number. I am hoping to find a way to filter this huge list down to only items that may be potential matches. I've attached a sample that shows items that could potentially be the same item and need reviewed.

As always, I appreciate all the help these forums offer.
 

Attachments

  • find similar spreadsheet example.xlsx
    9.8 KB · Views: 4
Do all item numbers tend to follow the same format? Like,
FirstPart-SecondPart-ThirdPart(optional)

Would these two be considered similar?
001D00013
1D13

Taking a shot, here's a possible solution using a UDF. Once you have the names cleaned, it's much easier to write the CF looking for duplicate items, doing something like:
=COUNTIF(SomeRange,E2)>1
 

Attachments

  • Clean IDS.xlsm
    17.9 KB · Views: 3
Unfortunately, there is not much consistency with part numbering as they are generally set to match to customers part number, we may very well have 001D00013 and 1D13 that are completely different and would bog down the process more than speed it up. Although the ability to drop the dashes, periods, and hash marks will definitely help, trying to take out any of the other letters/numbers can cause more issues. I think I'll give it a try with just symbols and see how many more hits I get.
 
Let me know what you find. Alternatively, here's my attempt at modifying the UDF. If you take the 0 off the "ignore list", and change UDF to this, this function will just strip out leading 0's from groups of numbers.
Code:
Function CleanID(oldWord As String)
Dim i As Long
Dim badItems As Object
Set badItems = CreateObject("scripting.dictionary")
Dim c As Range
Dim newWord As String
Dim xLet As String
Dim numWord As String


'Build dictionary of all characters to ignore
For Each c In Range("rngIgnore")
    badItems.Add CStr(c.Value), c.Value
Next

'Search through ID, only keeping characters of importance
For i = 1 To Len(oldWord)
    xLet = Mid(oldWord, i, 1)
    If Not badItems.exists(xLet) Then
        'Checks for leading zeros
        Select Case xLet
        Case 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
            numWord = numWord & xLet
        Case Else
            If numWord <> "" Then newWord = newWord & CLng(numWord)
            numWord = ""
            newWord = newWord & xLet
        End Select
    Else
        If numWord <> "" Then newWord = newWord & CLng(numWord)
        numWord = ""
    End If
Next i
'Add on any trailing numbers
If numWord <> "" Then newWord = newWord & CLng(numWord)
   

CleanID = newWord

End Function
 
Back
Top