<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Janes&#039; Code Weblog &#187; oracle</title>
	<atom:link href="http://code.davidjanes.com/blog/tag/oracle/feed/" rel="self" type="application/rss+xml" />
	<link>http://code.davidjanes.com/blog</link>
	<description>Just another WordPress weblog</description>
	<lastBuildDate>Sun, 11 Apr 2010 12:32:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>How to do multi-column multilingual full text searching in Oracle</title>
		<link>http://code.davidjanes.com/blog/2008/10/27/full-text-searching-in-oracle/</link>
		<comments>http://code.davidjanes.com/blog/2008/10/27/full-text-searching-in-oracle/#comments</comments>
		<pubDate>Mon, 27 Oct 2008 23:43:37 +0000</pubDate>
		<dc:creator>David Janes</dc:creator>
				<category><![CDATA[db]]></category>
		<category><![CDATA[oracle]]></category>

		<guid isPermaLink="false">http://code.davidjanes.com/blog/?p=41</guid>
		<description><![CDATA[Here&#8217;s how you do full text searching across multiple columns in a multilingual environment on Oracle 9 and later, quick and easy. I&#8217;m more of a MySQL guy, so you&#8217;ll have to excuse me if my Oracle lingo isn&#8217;t up to scratch.
One time setup, as system
This has to be executed by a privileged Oracle [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s how you do full text searching across multiple columns in a multilingual environment on Oracle 9 and later, quick and easy. I&#8217;m more of a MySQL guy, so you&#8217;ll have to excuse me if my Oracle lingo isn&#8217;t up to scratch.</p>
<h4>One time setup, as system</h4>
<p>This has to be executed by a privileged Oracle user. If you get an error message, it&#8217;s probably because Oracle Text (which provides CTX_DDL) has not been installed, and you&#8217;ll need to talk to a skilled DBA or figure out how to install it yourself.</p>
<pre>GRANT EXECUTE ON CTX_DDL TO mydb;</pre>
<h4>Lexer setup</h4>
<p>Once the step above is done, you can do everything as the Oracle user ID you normally work under.</p>
<p>This step sets up the &#8216;global_lexer&#8217;, which determines how Oracle understands text (e.g. that oxen is the plural of ox in the English locale). If you are only working in English, I believe you can just drop all the French references.</p>
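<pre>-- On a first run the three drop_preference calls fail because the
-- preferences don't exist yet; each has its own block so the rest still runs.
begin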
ctx_ddl.drop_preference('global_lexer');
end;
/

begin
ctx_ddl.drop_preference('english_lexer');
end;
/

begin
ctx_ddl.drop_preference('french_lexer');
end;
/

begin
ctx_ddl.create_preference('english_lexer','basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.set_attribute('english_lexer','printjoins','$_#@*&amp;^%/\()');
ctx_ddl.set_attribute('english_lexer', 'skipjoins', '-');
ctx_ddl.create_preference('french_lexer','basic_lexer');
ctx_ddl.set_attribute('french_lexer','index_themes','no');
ctx_ddl.set_attribute('french_lexer','base_letter','yes');
end;
/

exec ctx_ddl.create_preference('global_lexer','multi_lexer') ;

begin
ctx_ddl.add_sub_lexer('global_lexer', 'french', 'french_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'default','english_lexer');
end;
/</pre>
<h4>Table setup</h4>
<p>If you&#8217;re doing a multilingual set up, you&#8217;ll need a field in your table that specifies the language. One can never be quite sure about how things are done in Oracle, but the values EN and FR seem to do the right thing. Our table looks something like this (we&#8217;re just showing the important stuff here):</p>
<pre>CREATE TABLE Post
(
    search    CLOB,
    subject   NVARCHAR2(256)  NOT NULL,
    body      CLOB,
    lang      VARCHAR2(6)    NOT NULL
);</pre>
<p>The <code>subject</code> and <code>body</code> fields have the data we want to search. Because the full text index we create below is built on a single column, we concatenate them into <code>search</code> at UPDATE/INSERT time. <code>lang</code> stores the language code for each row, i.e. EN or FR.</p>
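<p>One hedge on those language codes: as far as I can tell, Oracle matches the <code>lang</code> value against its own language names and abbreviations, and anything it doesn&#8217;t recognize falls through to the <code>default</code> sub-lexer. If you want to be explicit that FR means French, <code>add_sub_lexer</code> takes an optional fourth <code>alt_value</code> argument. A sketch of that variant of the call from the lexer setup, assuming your rows really do use FR:</p>
<pre>begin
-- variant of the add_sub_lexer call in the lexer setup: 'FR' is registered
-- as an alternate language-column value for the French sub-lexer
ctx_ddl.add_sub_lexer('global_lexer', 'french', 'french_lexer', 'FR');
end;
/</pre>
<p>This would go in place of the three-argument French call, before the index is created. We haven&#8217;t needed it ourselves, so treat it as something to test rather than a known fix.</p>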
<p>Next we must set up a trigger that will maintain the <code>search</code> field for you:</p>
<pre>CREATE OR REPLACE TRIGGER post_text_trigger
BEFORE UPDATE OR INSERT ON post
FOR EACH ROW
DECLARE
    a CLOB;
BEGIN
    a := :NEW.subject || ' ';
    a := a || :NEW.body;
    :NEW.search := a;
END post_text_trigger ;
/</pre>
<p>Now you may be asking yourself: why is that &#8216;a&#8217; assignment all over the place? Simple: who the hell knows, it&#8217;s Oracle.</p>
<p>After that we create a new INDEX on the table to do the full text searching:</p>
<pre>CREATE INDEX post_text ON Post(search)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER global_lexer STOPLIST ctxsys.default_stoplist LANGUAGE COLUMN lang');</pre>
<h4>Maintaining the Index</h4>
<p>Unlike what you might expect, Oracle doesn&#8217;t magically keep the full text search index up-to-date. This is something you have to maintain on your own. If you were to execute a search, you wouldn&#8217;t find anything right now.</p>
<p>To bring the index up to date, use the following command:</p>
<pre>EXEC CTX_DDL.SYNC_INDEX('post_text');</pre>
<p>In our implementation we just run that command from a cron script every 30 minutes. I understand there&#8217;s also a way to schedule this inside Oracle, but we&#8217;re quite comfortable with UNIX commands. How often you run it will depend on how much text you have and how often you update it.</p>
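<p>For the in-Oracle route, one option is Oracle&#8217;s <code>DBMS_JOB</code> package, which can rerun a statement on an interval. This is a minimal sketch rather than what we run in production; it assumes the job queue is enabled on your instance and that 30 minutes is the interval you want:</p>
<pre>DECLARE
    v_job NUMBER;
BEGIN
    -- run the sync now, then again every 30 minutes (30/1440 of a day)
    DBMS_JOB.SUBMIT(v_job, 'ctx_ddl.sync_index(''post_text'');', SYSDATE, 'SYSDATE + 30/1440');
    COMMIT;  -- the job is only picked up by the job queue after a commit
END;
/</pre>
<p>The job runs as the user who submitted it, so that user needs the CTX_DDL grant from the one time setup.</p>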
<p>You also have another option: create a trigger that will update the index every time a row is modified. There may be performance issues involved with this, but if you want to try it, here&#8217;s the magic:</p>
<pre>CREATE OR REPLACE TRIGGER post_text_trigger2
AFTER INSERT OR UPDATE OR DELETE ON post
DECLARE
    v_job NUMBER;
BEGIN
    IF deleting THEN
        DBMS_JOB.SUBMIT(v_job, 'ctx_ddl.optimize_index(''post_text'',''FULL'');', SYSDATE);
    ELSE
        DBMS_JOB.SUBMIT(v_job, 'ctx_ddl.sync_index(''post_text'');', SYSDATE);
    END IF;
END;
/</pre>
<h4>Doing searches</h4>
<p>Oracle provides all sorts of query methods, obtusely documented. See the references below for more. What you probably want to do is, well, look for stuff. Here&#8217;s what we did.</p>
<p>First, we convert the search string into a <em>safe</em> list of words (no punctuation, etc.). Then we create a search string that looks something like the following:</p>
<pre>SELECT * FROM Post WHERE contains(search, '${mutt} AND ${and} AND ${jeff}') &gt; 0;</pre>
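<p>Note the <code>{}</code>: this stops the word &#8216;and&#8217; in the search from being recognized as an Oracle keyword. The <code>$</code> in front of each word is Oracle Text&#8217;s stem operator, so related forms of the word match too.</p>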
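<p>The word-cleaning step could also live in the database. This is a rough sketch rather than the code we run: <code>to_contains_query</code> is a made-up name, and it assumes Oracle 10g or later for <code>REGEXP_REPLACE</code>.</p>
<pre>CREATE OR REPLACE FUNCTION to_contains_query(p_input IN VARCHAR2)
RETURN VARCHAR2
IS
    v_words VARCHAR2(4000);
BEGIN
    -- turn every run of non-alphanumeric characters into a single space
    v_words := TRIM(REGEXP_REPLACE(p_input, '[^[:alnum:]]+', ' '));
    IF v_words IS NULL THEN
        RETURN NULL;  -- nothing searchable was typed
    END IF;
    -- wrap each word as ${word} and join the words with AND
    RETURN '${' || REPLACE(v_words, ' ', '} AND ${') || '}';
END to_contains_query;
/</pre>
<p>With that in place, <code>to_contains_query('mutt and jeff')</code> builds the search string shown above, and the query can sort by relevance using <code>SCORE</code>. The caller should check for an empty search before running it:</p>
<pre>SELECT SCORE(1) AS relevance, subject
FROM Post
WHERE CONTAINS(search, to_contains_query('mutt and jeff'), 1) &gt; 0
ORDER BY SCORE(1) DESC;</pre>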
<h4>Further reading</h4>
<p>There&#8217;s lots of info here and the first reference in particular told me most of the information I needed to know.</p>
<ul>
<li><a href="http://www.oracle-base.com/articles/9i/FullTextIndexingUsingOracleText9i.php">http://www.oracle-base.com/articles/9i/FullTextIndexingUsingOracleText9i.php</a></li>
<li><a href="http://www.oracle.com/technology/pub/articles/asplund-textsearch.html">http://www.oracle.com/technology/pub/articles/asplund-textsearch.html</a></li>
<li><a href="http://download-west.oracle.com/docs/cd/B10501_01/text.920/a96518/cqoper.htm">CONTAINS Query Operators</a></li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://code.davidjanes.com/blog/2008/10/27/full-text-searching-in-oracle/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
