
The CTXRULE Index Type - A New Oracle Text Indextype for Building Document Classification Applications
 

What is a Document Classification Application?

A document classification application is one that classifies an incoming stream of documents based on their content. These applications are also known as document routing or filtering applications. For example, an online news agency might need to classify its incoming stream of articles as they arrive into different categories like politics, business, sports, health, and travel.

What is the CTXRULE Index Type?

CTXRULE is a new index type designed for classification, which is the inverse of information retrieval. In traditional information retrieval applications, you index a set of documents and find documents with a text query. In classification, you index a set of queries, and find queries with a document.

Creating a CTXRULE index

The first step is to create a table of queries that define our classifications:

       create table queries (
         query_id      number primary key,
         category      varchar2(40),
         query         varchar2(80)
       );

We now populate the table with the classifications and the queries that define each:

       insert into queries values (1,'Sports','tennis or football or soccer');
       insert into queries values (2,'Health','cardiovascular medicine');

Now we create the CTXRULE index. We can also specify lexer, storage, section group, and wordlist parameters if needed:

       create index query_idx on queries(query) indextype is ctxsys.ctxrule;
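
Before wiring the index into an application, it can help to check it with an ad hoc query. The following is a minimal sketch, assuming the two rows inserted above; the sample sentence is made up for illustration. The second argument to MATCHES is the document text, and the rows returned are the stored queries that the text satisfies.

```sql
-- Ad hoc check of the CTXRULE index.
-- The literal below stands in for an incoming document (illustrative text only).
select query_id, category
from   queries
where  matches(query, 'The soccer final drew a record crowd') > 0;
```

Because the text contains the word "soccer", this should return the Sports row and not the Health row.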

Classifying a Document

To classify a document, we use the MATCHES operator rather than the CONTAINS operator used with a CONTEXT index. Assume that the incoming documents are stored in a table like this:


      create table news (
         newsid         number,
         author         varchar2(40),
         source         varchar2(30),
         category       varchar2(50),
         news_article   clob
      );

We can create a trigger with MATCHES to classify each document:

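      ...
      begin
        -- Find every stored query that the incoming article satisfies
        for c in (select category from queries
                  where matches(query, :new.news_article) > 0)
        loop
          -- If several queries match, the last one fetched wins
          :new.category := c.category;
        end loop;
      end;
      ...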

An article can match more than one category. The trigger above keeps only one of the matching categories in news.category; recording all of them would require another table that stores (article id, category) pairs.
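
One way to record every matching category is sketched below. The table name news_categories and the trigger name news_classify_trg are hypothetical, and the BEFORE INSERT row-level timing is an assumption, since the original trigger header is elided. DISTINCT guards against two queries that share a category name.

```sql
-- Hypothetical link table: one row per (article, category) match
create table news_categories (
  newsid    number,
  category  varchar2(40)
);

-- Hypothetical trigger; timing (before insert, for each row) is an assumption
create or replace trigger news_classify_trg
before insert on news
for each row
begin
  -- Store every category whose query matches the new article
  insert into news_categories (newsid, category)
  select distinct :new.newsid, q.category
  from   queries q
  where  matches(q.query, :new.news_article) > 0;
end;
/
```

With this layout, the category column on the news table is no longer needed, and an article that matches no query simply has no rows in news_categories.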

Instead of indexing the article text and periodically running every stored query against it, we index the query strings and use the text of each incoming document to find the queries it satisfies. This addresses the scalability problem: each document is evaluated once, on arrival, against the index of queries, so we never have to re-run the full set of queries.

CTXRULE indexes require sync and optimize, just like a CONTEXT index. Simply use the ctx_ddl.sync_index and ctx_ddl.optimize_index calls, passing a CTXRULE index name instead of a CONTEXT index name.
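
A minimal sketch of those two calls for the index created earlier follows. The 'FULL' optimization level is an assumption chosen for illustration, and the calling user is assumed to have execute privilege on the ctx_ddl package.

```sql
begin
  -- Apply pending inserts and updates on the queries table to the index
  ctx_ddl.sync_index('query_idx');

  -- Defragment the index; the 'FULL' level is an illustrative choice
  ctx_ddl.optimize_index('query_idx', 'FULL');
end;
/
```

As with a CONTEXT index, a query row added after the index was created is not considered by MATCHES until the index has been synchronized.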

More Info
Visit the Oracle9i Text page for more information
Visit the OTN Discussion Forum that the Oracle Text team monitors
Oracle9i Text Reference
Oracle9i Text Application Developers' Guide
   
Oracle9i Database Daily Features
Archives
   

Copyright © 2003, Oracle Corporation. All Rights Reserved.
