Paula Thornton’s notes on #SharePointSearch from March 30, 2010
Paula Thornton (@rotkapchen) tweeted some great notes about Tagging and Taxonomies in SharePoint 2010 along with some valuable insight into deciding whether or not to pay for the FAST search license.
- Materials being presented by Smartlogic, whose product Semaphore is a taxonomy product integrated with FAST search.
- Why are taxonomies needed? Terms have different meanings depending on context. Expert vocabularies may not match search terms.
- Terms may appear in documents where they are irrelevant to the overall focus of the piece, producing meaningless results.
- Taxonomy: Allows for capture of a domain's knowledge, vocabulary and topical relationships. An ontology is multiple taxonomies.
- Content as an asset should be organized/managed. Ontologies help with this (what’s missing, what are the relationships).
- Ontologies can help support records management policies.
- [Professional services firm Ascentium now presenting their experiences doing implementations w/2010, including inside Microsoft.]
- 2010 has a utility Term Store Manager for facilitating taxonomical activities.
- Concerns: Rely on users for taxonomies? Preferably not, but allow them to add their own folksonomies.
- Concerns: If you build a taxonomy, it must be maintained, yet many companies are not willing to add staff to maintain it.
- The manager is extensible, but there are challenges (ones they ran into inside Microsoft via the 'eat own dogfood' effort).
- One primary goal: improve findability. Architecture of Term Store: Group, Term Set, Term
- [search for blogs that deal with Term Store Manager -- they offer great advice]
- [BTW Term Store Manager is still in beta and not yet generally released. Great steps forward, but one step back.]
- [Showing UI. Have created two groups: fruits, vegetables. Adding tomato to both. Can 'borrow' but also have multi-terms]
- [Have talked to Microsoft about the issues with disambiguation when multiple meanings are presented for a term.]
- Added Term Set “vine” to house “tomato”. But have different Term Sets of Vine under Fruit and Vegetables. Different GUIDs
- [Issue: Right now anyone with access can change definitions. Governance models needed to manage for disambiguation.]
- [I'm still trying to figure out why they're going to all this effort when FAST does most of this automatically.]
- [I'm guessing this is still the 'brute force' version of SharePoint, which still requires 'brute force' search management.]
- [The dropdown lists can get unwieldy because there is no description in the dropdowns, no way to know which term is which]
- [Back to Smartlogic staff] Small or Large Ontology? Small: Easier to maintain, buckets larger, results less granular.
- Asking users to self-tag against a really large taxonomy requires considerable effort (requires understanding of whole)
- Companies restructure, change clients, add lines of business: all of these affect the total taxonomy (and existing content base).
- In 2007 there were no real taxonomies available. Keywords are not the same thing. Results can be tuned via taxonomy.
- Smartlogic now demoing Semaphore. The taxonomy is only half the equation. The content itself must have relevant metadata.
- [Semaphore is totally integrated as an add-on to the SharePoint UI, as if it were simply utilities in SharePoint.]
- Adding a document to SharePoint brings up the Semaphore UI for taxonomical additions, via ‘assisted’ tagging.
- That is, recommendations are made, and these can then be edited. Effectively ‘automated’ with overrides.
- This UI is useful for existing SharePoint stores to be reviewed and classified as well. [Again, only relevant w/o FAST]
- [With this level of effort, I'm still trying to figure out why a company wouldn't pay for the FAST licenses instead?]
- [Interesting navigation of topic maps and managing the 'collection' from the whole rather than the discrete elements.]
- Semaphore Architecture: Allows existing structures to be imported and reorganized. Text mining and classification.
- Rules-based approach used for classification server to make recommendations (rules can be tweaked).
- Now talking about the FAST Server and explaining the greater control over the results, multi-content results, etc.
- May 11th there will be another event to cover the FAST Server in more detail.
- Q&A raised cross-cultural and language issues for global taxonomies. See Motorola study http://twurl.nl/f3crub
- Bottom line, assuming that you can get meaningful search results out of the box from SharePoint is erroneous.
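
To make the Term Store notes above more concrete, here is a minimal Python sketch of the Group, Term Set, Term hierarchy and of the "tomato" disambiguation problem. It does not use the SharePoint API. The class names, the `find_term` helper and the use of `uuid4` as a stand-in for Term Store GUIDs are all assumptions made for illustration. It models two independent "tomato" terms only and does not attempt the 'borrow' feature mentioned in the demo.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Term:
    label: str
    # Each term gets its own identifier, standing in for the GUIDs in the notes.
    guid: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class TermSet:
    name: str
    terms: list = field(default_factory=list)

    def add_term(self, label: str) -> Term:
        term = Term(label)
        self.terms.append(term)
        return term


@dataclass
class Group:
    name: str
    term_sets: dict = field(default_factory=dict)

    def term_set(self, name: str) -> TermSet:
        # Create the term set on first use, then reuse it.
        if name not in self.term_sets:
            self.term_sets[name] = TermSet(name)
        return self.term_sets[name]


def find_term(groups, label):
    """Return (path, guid) for every term whose label matches (hypothetical helper)."""
    matches = []
    for group in groups:
        for term_set in group.term_sets.values():
            for term in term_set.terms:
                if term.label == label:
                    path = f"{group.name} > {term_set.name} > {term.label}"
                    matches.append((path, term.guid))
    return matches


# Mirror the demo: a "Vine" term set under each group, each holding "tomato".
fruits = Group("Fruits")
vegetables = Group("Vegetables")
fruits.term_set("Vine").add_term("tomato")
vegetables.term_set("Vine").add_term("tomato")

matches = find_term([fruits, vegetables], "tomato")
for path, guid in matches:
    print(path, guid)
```

The bare label "tomato" matches two terms with different identifiers, which is the dropdown problem noted above. Showing the full path (or a description) alongside the label is one way a UI could tell them apart.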