John's Blog

Everyone and his dog has a 'blog these days, right? So why not me? If you are interested in anything I'm saying, great! Let's talk about it! (Leave comments.) If not, try one of my other links. Better yet, why are you wasting your time staring at your computer screen, go do something real!

Let me know if you'd like to be to be updated whenever I post a new article. It won't happen every day!

Querying a database with wildcards

Tuesday, June 12, 2007
Occasionally I have a need to look up something in a database where one or more fields may contain a wildcard. For example, here is a table you could use to see what to call a shape given some of its properties:
nbr_of_sidessides_equalangles_equalshape_name
3**Triangle
4Y*Square
4NYRectangle
***Polygon

So, how do you look something up in this table?
(Click title to find out...)

First, note that in this example, I am using the * character as the wildcard, forcing all columns to be a character type even if they otherwise look like numbers. There are some others ways to do it, but you have to be careful. See below for details. I chose '*' because it is easily recognized as a wildcard in many situations.

You will probably want to put this query into a procedure or function, so that you can reuse it and just pass each set of inputs. Let's say in this example we are passing the number of sides and the two "equal" parameters. We'll call them p_nbr_of_sides, p_sides_equal, p_angles_equal. Using these parameters, your query can be as easy as this:


SELECT shape_name FROM shapes
WHERE nbr_of_sides IN ( p_nbr_of_sides, '*' )
AND sides_equal IN ( p_sides_equal, '*' )
AND angles_equal IN ( p_angles_equal, '*' )
ORDER BY nbr_of_sides DESC, sides_equal DESC, angles_equal DESC


That's it!

How does it work? Easy. For each input parameter, you are selecting rows where the corresponding table column either matches that input, or is the wildcard. So you could potentially match multiple rows, some more specific than others. That's where the ORDER BY clause comes in. Because '*' (and most other punctuation) has a lower sort value than numbers and letters, we tell it to sort the results DESCENDING, to ensure that more specific matches are listed first. This is like Google returning search results in order of most likely to less likely. This ensures that the first row of multiples (if any) is always the most specific, thus the one we want. Try it for several variations of input and see for yourself!

In your procedure, then, just return the results from the first row and discard the others. Syntax for all this will vary by database platform, and so is intentionally left as an exercise to the reader.

So what about that '*' as the wildcard? There are some drawbacks. As noted above, you may not want to force all your columns to be character type if the specific values are all numbers. You might be tempted to just use NULL, but I'm not sure how that will be treated in the ORDER BY, and it may well vary by platform, so I would advise against that. The '*' may also not be the greatest choice if you expect that your column could start with other punctuation, some of which sorts lower than the asterisk. (See any ASCII table for actual values.) This shows that a series of spaces may actually be the best value for character fields, since the space is lowest sorting printable character.

The bottom line is that any non-NULL "special" value will do, as long as it sorts lower than any of your expected actual values. For numeric columns, if you know you will only have non-zero positive values (like in this example), 0 would do. Or some negative low number such as -9999999.

Hope you found this helpful.

Labels: ,



0 comments