Penn Computing



PennNames Generate Algorithm

Following is a discussion of how PennNames are generated when using the PennNames Generate command.



Terminology

In this discussion of the Generate command algorithm the following terminology will be used:

  • PennCommunity-Fullname: The full name of the user as stored in PennCommunity, in the format LAST_NAME, FIRST_NAME MIDDLE_NAME. Example: "Smith, John H"
  • User-Supplied-Name: Any name information supplied as an optional argument to the Generate command in the <FULLNAME> variable. This may be the full name of the user, or it may be something completely arbitrary. Example: "Jack Smith"
  • Seed: A string that the user would prefer as his/her username. It is supplied as an optional argument to the Generate command in the <SEED> variable, the string that is preceded by a colon when issuing the Generate command. The colon itself is not factored into the algorithm. The Seed carries the highest weight of all Name-Material. Example: "bozo"
  • Name-Material: The combination of the PennCommunity-Fullname plus the Seed, or of the User-Supplied-Name plus the Seed, used to generate a list of potential usernames. Any punctuation is stripped out, so that "John H. Smith, Jr." becomes "John H Smith Jr". Example: "bozo Jack Smith"
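The punctuation stripping described for Name-Material might look like this in Python. It is a minimal sketch, not the PennNames code: treating every character other than letters, digits, and whitespace as punctuation is an assumption, and `strip_punctuation` and `build_name_material` are hypothetical helpers.

```python
import re
from typing import Optional

def strip_punctuation(text: str) -> str:
    """Drop punctuation and collapse repeated whitespace."""
    # Assumption: anything that is not a letter, digit, or whitespace counts as punctuation.
    cleaned = re.sub(r"[^\w\s]", "", text)
    return " ".join(cleaned.split())

def build_name_material(fullname: str, seed: Optional[str] = None) -> str:
    """Hypothetical helper: put the optional Seed in front of a full name, then clean it."""
    parts = [seed, fullname] if seed else [fullname]
    return strip_punctuation(" ".join(parts))

print(strip_punctuation("John H. Smith, Jr."))         # John H Smith Jr
print(build_name_material("Jack Smith", seed="bozo"))  # bozo Jack Smith
```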

Overview

A good-looking username consists of a large portion of one part of a user's name, either followed or preceded by a small portion of another part of his/her name.

For example, let's consider the user whose PennCommunity FIRST_NAME is "Ziggy", whose PennCommunity LAST_NAME is "Bozo", and who doesn't have a PennCommunity MIDDLE_NAME. It is likely that Ziggy would prefer the following names:

  • bozo
  • ziggy
  • zbozo
  • bozoz
  • ziggyb
These should be considered the best possible names for this user.

In practice, some of those names will be unavailable since they might be assigned to some other person on campus or reserved by another sponsor. Likely secondary choices for Ziggy might be:

  • bozo2
  • ziggy2
  • zbozo2
  • bozoz2
  • ziggyb2
If the names ending in "2" are taken, we might suggest the same names ending in "3", and so on.

Of course, there are a lot of variables. There's no guarantee that we'll have a middle initial. Even if we do, the user may prefer to go by their second name, the user could have two middle names, the PennCommunity information might be inaccurate, and so on. So a wide variety of names should be generated to try to cover as many of these possibilities as is practical.

Previous versions of the PennNames name generation algorithm placed too much weight on the middle name, because the first name, last name, and middle name were all weighted equally. The current version of the algorithm tries to deduce which name is the middle name and gives it a lower priority than the rest of the Name-Material.

Sources of the Names

The implemented algorithm has three sources of potential Name-Material. They are:

  • the PennCommunity-Fullname
  • an optional User-Supplied-Name
  • an optional Seed

Basic generation of the names

The basic premise is that all Name-Material falls into one of three categories: high, medium, or low weight. The PennCommunity-Fullname is examined, and middle names are given a low weight. The optional Seed, if provided, is given a high weight. Titles and suffixes (e.g., Mrs., Jr., Sr.) are discarded. Everything else is given a medium weight.
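The three weight classes can be pictured as a small classifier over the PennCommunity-Fullname and the optional Seed. This is a hedged sketch, not the PennNames code: the `TITLES_AND_SUFFIXES` set is hypothetical, and the rules that every word after the first name counts as a middle name and that extra User-Supplied-Name words get medium weight are assumptions made for illustration.

```python
from typing import Dict, List, Optional

# Hypothetical list: the exact titles and suffixes PennNames discards are not documented here.
TITLES_AND_SUFFIXES = {"mr", "mrs", "ms", "dr", "jr", "sr", "ii", "iii"}

def _tokens(text: str) -> List[str]:
    """Lower-cased words of `text`, minus periods, commas, titles, and suffixes."""
    words = (w.strip(".,").lower() for w in text.split())
    return [w for w in words if w and w not in TITLES_AND_SUFFIXES]

def classify_material(penn_fullname: str, seed: Optional[str] = None,
                      user_supplied: Optional[str] = None) -> Dict[str, List[str]]:
    """Sort Name-Material into high, medium, and low weight buckets (sketch)."""
    weights: Dict[str, List[str]] = {"high": [], "medium": [], "low": []}
    if seed:
        weights["high"].append(seed.lower())        # the Seed always gets the highest weight
    # Assumed PennCommunity format: "LAST, FIRST MIDDLE ..."
    last, _, given = penn_fullname.partition(",")
    given_words = _tokens(given)
    weights["medium"] += _tokens(last) + given_words[:1]  # last name and first name
    weights["low"] += given_words[1:]                      # everything after the first name
    # Assumption: extra words from a User-Supplied-Name get medium weight.
    for word in _tokens(user_supplied or ""):
        if word not in weights["medium"] and word not in weights["low"]:
            weights["medium"].append(word)
    return weights

print(classify_material("Bozo, Ziggy Quartermaine"))
# {'high': [], 'medium': ['bozo', 'ziggy'], 'low': ['quartermaine']}
```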

The namestream (the name generator) is a series of "lazy enumeration" functions that generate more results on demand. These functions are stacked on top of each other so that they generate results from the high-weight material first; then from the high- and medium-weight material; then from the high-, medium- and low-weight material together. The lazy enumeration does only as much work as it needs to in order to return the target number of suggested names.

There is also a time limit placed on the namestream generator. If the time expires then the result list will be cut short. This prevents a runaway server or malfunctioning service from denying access to the PennNames service.
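A time limit of this kind can be layered over any lazy generator. The sketch below is an assumption about how such a cutoff might look, not the PennNames implementation; `with_deadline` is a hypothetical helper, and its clock starts when the first name is requested.

```python
import time
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def with_deadline(items: Iterable[T], seconds: float) -> Iterator[T]:
    """Pass items through until `seconds` have elapsed, then stop quietly."""
    deadline = time.monotonic() + seconds   # the clock starts at the first request
    for item in items:
        if time.monotonic() > deadline:
            return                          # time expired: the result list is cut short
        yield item

print(list(with_deadline(range(5), seconds=1.0)))  # [0, 1, 2, 3, 4]
```

The check runs between items, so a single slow step can still overrun the limit; the wrapper only bounds how long the enumeration keeps going.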

The mixing of names is done in stages:

  • Each piece of material by itself
  • Mixed pairs of material
  • Mixed triplets of material

No more than three pieces of material are ever considered for a generated name.
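The stacked lazy enumeration and the staging into singles, pairs, and triplets can be pictured together with Python generators. This is a simplified sketch, not the PennNames code: pieces are joined whole (the real generator also mixes partial pieces), `namestream` and `combos` are hypothetical names, and each tier is assumed to yield only the combinations that use at least one piece from its own weight class, so lower-weight material is untouched until everything above it is exhausted.

```python
from itertools import islice, permutations
from typing import Iterator, List

MAX_PIECES = 3  # no generated name draws on more than three pieces of material

def combos(pool: List[str], newest: List[str]) -> Iterator[str]:
    """Singles, then ordered pairs, then ordered triplets from `pool` that use a piece from `newest`."""
    for size in range(1, MAX_PIECES + 1):
        for group in permutations(pool, size):
            if any(piece in newest for piece in group):
                yield "".join(group)

def namestream(high: List[str], medium: List[str], low: List[str]) -> Iterator[str]:
    """High material alone, then high+medium, then high+medium+low, generated on demand."""
    pool: List[str] = []
    for tier in (high, medium, low):
        pool = pool + tier
        # This tier starts only after the previous tier's combinations are used up.
        yield from combos(pool, tier)

stream = namestream(high=[], medium=["bozo", "ziggy"], low=["quartermaine"])
print(list(islice(stream, 5)))
# ['bozo', 'ziggy', 'bozoziggy', 'ziggybozo', 'quartermaine']
```

Because `islice` stops pulling from the generator once it has enough items, asking for only the first four names never starts the low-weight tier at all.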

In our simplest example, where the PennCommunity FIRST_NAME is "Ziggy", the PennCommunity LAST_NAME is "Bozo", the PennCommunity MIDDLE_NAME is blank, and no User-Supplied-Name or Seed has been supplied, the Name-Material will be "Bozo Ziggy" (note: last name first) and we will generate these names:
  1. bozo
  2. ziggy
  3. bozoz
  4. ziggyb
  5. bziggy
  6. zbozo
  7. bozozi
  8. ziggybo
  9. bz
  10. zb
  11. boziggy
  12. zibozo
  13. bozozig
  14. ziggyboz
  15. bzi
  16. zbo
  17. boz
  18. zib
  19. bozziggy
  20. zigbozo
  21. bozozigg
  22. zboz
  23. bzig
  24. zibo
  25. bozi
  26. zigb
  27. bozz
  28. ziggbozo
  29. bzigg
  30. ziboz
  31. bozig
  32. zigbo
  33. bozzi
  34. ziggb
  35. bozigg
  36. zigboz
  37. bozzig
  38. ziggbo
  39. bozzigg
  40. ziggboz
  41. bo
  42. zi
  43. zig
  44. zigg

Those are the only 44 possible names generated from the given source material.
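The 44 names above can be reproduced, as a set, by a short enumeration over prefixes. The rules in this sketch are inferred from this example rather than taken from the PennNames source: a name is either a prefix of one piece at least two characters long, or a non-empty prefix of one piece followed by a non-empty prefix of the other; names longer than eight characters are dropped; and duplicates (such as "boz", which arises both as a prefix of "bozo" and as "bo" + "z") are kept once. It does not attempt to reproduce the order shown, and `two_piece_names` is a hypothetical name.

```python
from itertools import permutations
from typing import List, Set

MAX_LEN = 8  # inferred: no name in the example list is longer than eight characters

def prefixes(word: str, min_len: int = 1) -> List[str]:
    """All prefixes of `word` with at least `min_len` characters."""
    return [word[:i] for i in range(min_len, len(word) + 1)]

def two_piece_names(a: str, b: str) -> Set[str]:
    """Candidate names from two pieces of Name-Material (rules inferred from the example)."""
    names: Set[str] = set()
    for piece in (a, b):
        names.update(prefixes(piece, min_len=2))   # single pieces: "bo", "boz", "bozo", ...
    for first, second in permutations((a, b)):
        for head in prefixes(first):
            for tail in prefixes(second):
                names.add(head + tail)             # e.g. "z" + "bozo" -> "zbozo"
    return {n for n in names if len(n) <= MAX_LEN}

print(len(two_piece_names("bozo", "ziggy")))  # 44
```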

Extended generation of the names

As a final resort, once the material is completely used up, the lazy enumerator begins to append numbers. The names generated will look like this:

  1. bozo1
  2. bozo2
  3. ziggy1
  4. bozo3
  5. ziggy2
  6. bozo4
  7. ziggy3
  8. bozo5
  9. ziggy4
  10. bozo6
  11. ziggy5
  12. bozo7
  13. ziggy6
  14. bozo8
  15. ziggy7
  16. bozo9
  17. ziggy8
  18. bozo10
  19. ziggy9
  20. bozo11
  21. ziggy10
  22. bozo12
  23. ziggy11
  24. bozo13
  25. ziggy12
  26. bozo14
  27. ziggy13
  28. bozo15
  29. ziggy14
  30. bozo16
  31. ziggy15
  32. b1
  33. bozo17
  34. ziggy16
  35. b2
  36. z1
  37. bozo18
  38. ziggy17
  39. b3
  40. z2
  41. bozoz1
  42. bozo19
  43. ziggy18
  44. b4
  45. z3
  46. bozoz2
  47. bozo20
  48. ziggy19
  49. b5
  50. z4
  51. bozoz3
  52. ziggyb1
  53. bozo21
  54. ziggy20
  55. b6
  56. z5
  57. bozoz4
  58. ziggyb2
  59. bozo22
  60. ziggy21

You'll notice that this starts with the first results of the initial generate, above (bozo and ziggy). The order in which the base names themselves are consulted is guaranteed, but the point at which each one is first used is not. In this case there are two uses of "bozo" before the first use of "ziggy"; this may not be true in all cases. In general, number appending is considered a mechanism of last resort, so it is hoped that no one needs to rely on how well it works; if you get this far, none of the names you want is available anyway.
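The fixed consultation order with variable entry points can be modeled as a round-robin in which each base name joins the rotation at its own round. This is a sketch under stated assumptions: `numbered_fallback` is a hypothetical name, the base list is read off the listing above, and the `join_rounds` values were chosen to match that listing; they are not documented PennNames parameters.

```python
from itertools import count, islice
from typing import Iterator, List

def numbered_fallback(bases: List[str], join_rounds: List[int]) -> Iterator[str]:
    """Endlessly append increasing numbers to base names.

    Bases are consulted in a fixed order on every round; base i only joins the
    rotation once the round counter reaches join_rounds[i].
    """
    next_number = [1] * len(bases)
    for round_no in count():
        for i, base in enumerate(bases):
            if round_no >= join_rounds[i]:
                yield f"{base}{next_number[i]}"
                next_number[i] += 1

# Join rounds fitted to the listing above (assumption, not documented behavior).
bases = ["bozo", "ziggy", "b", "z", "bozoz", "ziggyb"]
fallback = numbered_fallback(bases, join_rounds=[0, 1, 15, 16, 17, 19])
print(list(islice(fallback, 6)))
# ['bozo1', 'bozo2', 'ziggy1', 'bozo3', 'ziggy2', 'bozo4']
```

With these join rounds the first 60 outputs match the listing exactly; changing them moves where each base first appears without changing the order in which bases are consulted.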

Name-material weighting

As mentioned above, not every piece of Name-Material carries the same weight. For example, if we add a middle name ("Quartermaine") for Ziggy, we then have the Name-Material "Bozo Ziggy Quartermaine" and generate these names:
  1. bozo
  2. ziggy
  3. bozoz
  4. ziggyb
  5. bziggy
  6. zbozo
  7. bozozi
  8. ziggybo
  9. bz
  10. zb
  11. boziggy
  12. zibozo
  13. bozozig
  14. ziggyboz
  15. bzi
  16. zbo
  17. boz
  18. zib
  19. bozziggy
  20. zigbozo
  21. bozozigg
  22. zboz
  23. bzig
  24. zibo
  25. bozi
  26. zigb
  27. bozz
  28. ziggbozo
  29. bzigg
  30. ziboz
  31. bozig
  32. zigbo
  33. bozzi
  34. ziggb
  35. bozigg
  36. zigboz
  37. bozzig
  38. ziggbo
  39. bozzigg
  40. ziggboz
  41. bozoq
  42. ziggyq
  43. bo
  44. zi
  45. qu
  46. qbozo
  47. qziggy
  48. zig
  49. qua
  50. bozoqu
  51. ziggyqu
  52. zigg
  53. quar
  54. bq
  55. zq
  56. qb
  57. qz
  58. bozozq
  59. bozoqz
  60. ziggybq
  61. ziggyqb
  62. quart
  63. qubozo
  64. quziggy
  65. quarte
  66. bozoqua
  67. ziggyqua
  68. quarter
  69. bqu
  70. zqu
  71. qbo
  72. qzi
  73. bozozqu
  74. bozoqzi
  75. ziggybqu
  76. ziggyqbo
  77. quarterm
  78. boq
  79. ziq
  80. qub
  81. quz
  82. bziggyq
  83. zbozoq
  84. qbozoz
  85. qziggyb
  86. quabozo
  87. quaziggy
  88. bozoquar
  89. qboz
  90. bqua
  91. zqua
  92. qubo
  93. qzig
  94. bozozqua
  95. bozoqzig
  96. qziggybo
  97. boqu
  98. ziqu
  99. quab
  100. quzi
  101. bziggyqu
  102. zbozoqu
  103. ziggyqub
  104. qbozozi
  105. bozq
  106. zigq
  107. quarbozo
  108. quaz
  109. bozoziq
  110. bozoquz
  111. ziggyboq
  112. zqbozo
  113. qzbozo
  114. quboz
  115. bqziggy
  116. qbziggy
  117. bquar
  118. quabo
  119. qzigg
  120. boqua
  121. zquar
  122. quarb
  123. quzig
  124. zqb
  125. qbozozig
  126. qzb
  127. bozqu
  128. ziqua
  129. quazi
  130. bozoquzi
  131. zbozoqua
  132. zigqu
  133. quaboz
  134. quarz
  135. bozoziqu
  136. bqz
  137. qbz
  138. bquart
  139. ziggq
  140. quarbo
  141. bzq
  142. zbq
  143. zqbo
  144. qzbo
  145. boquar
  146. quartb
  147. quzigg
  148. quziggyb
  149. bozqua
  150. zquart
  151. quazig
  152. ziquar
  153. quarboz
  154. quarzi
  155. bqzi
  156. zqboz
  157. qbzi
  158. qzboz
  159. bquarte
  160. zigqua
  161. quartbo
  162. quartz
  163. qubozoz
  164. boquart
  165. ziggqu
  166. quarteb
  167. bozquar
  168. quazigg
  169. bzqu
  170. zbqu
  171. zqubozo
  172. qzibozo
  173. zquarte
  174. quartboz
  175. quarzig
  176. boziggyq
  177. bqzig
  178. zibozoq
  179. qbzig
  180. bquarter
  181. ziquart
  182. quartebo
  183. quartzi
  184. qubozozi
  185. boquarte
  186. zigquar
  187. quarterb
  188. quartez
  189. bozoquaz
  190. zqub
  191. qzib
  192. bozquart
  193. ziggqua
  194. bquziggy
  195. ziqbozo
  196. qboziggy
  197. quzbozo
  198. quarzigg
  199. bqzigg
  200. qbzigg
  201. zquarter
  202. quartzig
  203. zqubo
  204. qzibo
  205. ziquarte
  206. quartezi
  207. bzqua
  208. zbqua
  209. ziqb
  210. quzb
  211. zigquart
  212. quarterz
  213. bquz
  214. zibozoqu
  215. ziggquar
  216. bozozigq
  217. boqziggy
  218. zquboz
  219. qubziggy
  220. qziboz
  221. ziqbo
  222. quzbo
  223. bquzi
  224. qbozi
  225. boqz
  226. ziqboz
  227. qubz
  228. quzboz
  229. bzquar
  230. zbquar
  231. bquzig
  232. zquabozo
  233. qbozig
  234. qzigbozo
  235. boqzi
  236. qubzi
  237. bziq
  238. zboq
  239. quabozoz
  240. zquab
  241. qzigb
  242. bquzigg
  243. ziqubozo
  244. qbozigg
  245. quzibozo
  246. boqzig
  247. qubzig
  248. zquabo
  249. qzigbo
  250. ziqub
  251. qbozz
  252. quzib
  253. bzquart
  254. zbquart
  255. zigqbozo
  256. quazbozo
  257. boqzigg
  258. zquaboz
  259. qubzigg
  260. qzigboz
  261. ziqubo
  262. quzibo
  263. bziqu
  264. zboqu
  265. zigqb
  266. qbozzi
  267. quazb
  268. bquaz
  269. zibq
  270. ziquboz
  271. quziboz
  272. zigqbo
  273. quazbo
  274. qbozzig
  275. bquazi
  276. qubozi
  277. boquz
  278. zigqboz
  279. quabz
  280. quazboz
  281. bzquarte
  282. zbquarte
  283. qbozzigg
  284. qziggb
  285. bquazig
  286. qubozig
  287. bziqua
  288. boquzi
  289. zboqua
  290. quabzi
  291. bozqz
  292. zibqu
  293. qziggbo
  294. zigbozoq
  295. zquarb
  296. quzigb
  297. bquazigg
  298. qubozigg
  299. boquzig
  300. quabzig
  301. qziggboz
  302. bozqzi
  303. zquarbo
  304. quzigbo
  305. ziquab
  306. qubozz
  307. quazib
  308. boquzigg
  309. zquarboz
  310. quabzigg
  311. quzigboz
  312. bozqzig
  313. ziquabo
  314. quazibo
  315. zigqub
  316. qubozzi
  317. quarzb
  318. bziquar
  319. bquarz
  320. zboquar
  321. zibqua
  322. ziquaboz
  323. quaziboz
  324. bozqzigg
  325. zigqubo
  326. quarzbo
  327. zbozq
  328. ziggqb
  329. qubozzig
  330. bquarzi
  331. quabozi
  332. boquaz
  333. zigquboz
  334. quarbz
  335. quarzboz
  336. ziggqbo
  337. quziggb
  338. bquarzig
  339. quabozig
  340. boquazi
  341. ziggqboz
  342. quarbzi
  343. bozquz
  344. quziggbo
  345. bziquart
  346. zboquart
  347. zquartb
  348. quazigb
  349. zibquar
  350. boquazig
  351. quarbzig
  352. bozquzi
  353. zbozqu
  354. zquartbo
  355. quazigbo
  356. bzigq
  357. ziboq
  358. ziquarb
  359. quabozz
  360. quarzib
  361. bozquzig
  362. ziquarbo
  363. quarzibo
  364. zigquab
  365. quabozzi
  366. quartzb
  367. bquartz
  368. zigquabo
  369. quartzbo
  370. zibquart
  371. ziggqub
  372. bquartzi
  373. quarbozi
  374. boquarz
  375. zbozqua
  376. quartbz
  377. bzigqu
  378. ziboqu
  379. ziggqubo
  380. boziq
  381. zigbq
  382. quaziggb
  383. boquarzi
  384. quartbzi
  385. bozquaz
  386. zquarteb
  387. quarzigb
  388. bozquazi
  389. ziquartb
  390. quarbozz
  391. quartzib
  392. zbozquar
  393. bzigqua
  394. ziboqua
  395. boziqu
  396. zigbqu
  397. bozzq
  398. zigquarb
  399. quartezb
  400. bquartez
  401. ziggquab
  402. boquartz
  403. quartebz
  404. bzigquar
  405. ziboquar
  406. boziqua
  407. zigbqua
  408. bozzqu
  409. bozquarz
  410. bziggq
  411. zibozq
  412. boziquar
  413. zigbquar
  414. bozzqua
  415. bziggqu
  416. zibozqu
  417. bozigq
  418. zigboq
  419. bozzquar
  420. bziggqua
  421. zibozqua
  422. bozigqu
  423. zigboqu
  424. bozziq
  425. ziggbq
  426. bozigqua
  427. zigboqua
  428. bozziqu
  429. ziggbqu
  430. boziggq
  431. zigbozq


Note that the middle name is not consulted until the other Name-Material is nearly exhausted (it first appears at position 41). This means that a typical Generate request, which usually asks for 15-25 names, will generally not use the middle name at all.

Name validation and results

Since this algorithm uses a lazy iterator, the results are vetted immediately upon generation, and names that are not available are discarded on the spot. This makes the algorithm significantly faster than the previous version: generating 1000 names takes approximately 8 seconds with the new algorithm, whereas Generate requests for 100 names took over 30 seconds with the previous algorithm. This should help avoid server deadlock under high load.

The list of results is returned to the PennNames client, up to the maximum number of names that were requested. Because the iterator should never run out of material (there is an endless supply of numbers), it should always return exactly the number of names requested. Even so, do not assume that asking for 25 names will return 25 names; it will return up to 25 names. Some cases wind up returning only one name, or only the set of names already held for a given PennID. The only guarantee is that the server will not return more results than you requested.
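Vetting names as they are generated, and stopping at the requested count, composes naturally from lazy iterators. This is a minimal sketch, not the PennNames server code: `suggest` is a hypothetical function, and `is_available` stands in for whatever availability check the service actually performs.

```python
from itertools import islice
from typing import Callable, Iterable, List

def suggest(candidates: Iterable[str], is_available: Callable[[str], bool],
            requested: int) -> List[str]:
    """Vet candidates lazily and stop as soon as `requested` available names are found."""
    # filter() and islice() are both lazy, so no candidate past the last needed one is checked.
    return list(islice(filter(is_available, candidates), requested))

taken = {"bozo", "ziggy"}  # hypothetical set of unavailable names
names = suggest(["bozo", "ziggy", "bozoz", "ziggyb", "bziggy"],
                lambda name: name not in taken, requested=2)
print(names)  # ['bozoz', 'ziggyb']

# Client side: treat the requested count as an upper bound, never an exact promise.
assert len(names) <= 2
```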


Information Systems and Computing
University of Pennsylvania