DELIVERY OF W5: I KNOW THAT I HAVE BEEN ASKING FOR URLS, BUT FOR THIS LAST ASSIGNMENT, SEND ME A .TXT FILE AS AN ATTACHMENT IN EMAIL. DO NOT PUT IT ON YOUR WEB PAGE.
Oracle has a CTXRULE index type that allows you to build a document classification application.
Previously, we used the CONTEXT index type. We built an index on a table and searched the table on the indexed field. We loaded files from URLs or by direct inserts, and searched them with text operations. However, we had to rebuild the index whenever we added something new.
Some businesses want to group documents in categories. For example, when a news article comes in, they wish to put all the sports articles together under one category. The CTXRULE index type allows one to do this. The advantage of this approach is that as new documents come in, the index does not have to be rebuilt. The index is on the query structure and not on the incoming news articles.
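As a rough illustration of the difference, the sketch below contrasts the two approaches. The names docs, text and docs_idx are hypothetical, default Oracle Text preferences are assumed, and the second query assumes that a CTXRULE index already exists on queries(query).

```sql
-- CONTEXT: the index is built on the documents, and you search with CONTAINS.
-- (docs, text and docs_idx are hypothetical names.)
create index docs_idx on docs(text) indextype is ctxsys.context;

select * from docs where contains(text, 'sports') > 0;

-- Rows inserted later are not searchable until the index is
-- synchronized (or rebuilt).
begin
  ctx_ddl.sync_index('docs_idx');
end;
/

-- CTXRULE: the index is built on the stored queries, and each incoming
-- document is passed to MATCHES. New documents never touch the index.
select category
  from queries
 where matches(query, 'The home team won the match last night.') > 0;
```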
Complete the following steps. For reference, see the Oracle document "CTXRULE Index Type".
Step 1. Create a table of queries that define the classification. The queries in your query set should use the operators ABOUT, STEM, AND, and OR. You can see in the Oracle document above that the table has both a category field and a query field. You should insert 4 categories into this table.
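A minimal sketch of what such a table could look like is shown here. The column names and sizes, the category names, and the query terms are all illustrative assumptions (only the 'insect' rule comes from the example later in this assignment). Whether ABOUT and STEM match as expected can depend on the lexer settings and knowledge base of your installation; the defaults are assumed here.

```sql
-- Hypothetical query table: one row per classification rule.
create table queries (
  query_id  number primary key,
  category  varchar2(30),
  query     varchar2(2000)
);

-- Illustrative rules only; choose your own categories and terms.
insert into queries values (1, 'insect',  'fly or wasp or bee');   -- OR
insert into queries values (2, 'sports',  'about(sports)');        -- ABOUT
insert into queries values (3, 'travel',  '$travel and hotel');    -- STEM ($) and AND
insert into queries values (4, 'weather', 'storm or $rain');       -- OR with STEM
commit;
```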
Step 2. Create a CTXRULE index on your query table.
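The statement has the following general form. The index name queries_idx is an assumption, and the indexed column is assumed to be called query, as in Step 1.

```sql
-- Build the rule index on the column that holds the query text.
create index queries_idx on queries(query)
  indextype is ctxsys.ctxrule;
```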
Step 3. Create a news table like the one in the Oracle document above.
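One possible layout is sketched here. The column names and sizes are assumptions, and the column order is chosen to fit the sample insert statement given later in this assignment (id, two short text fields, the category, then the article text). Your table may differ as long as it has a category field and a clob field.

```sql
-- Hypothetical news table; names and sizes are assumptions.
create table news (
  news_id   number primary key,
  source    varchar2(30),
  author    varchar2(30),
  category  varchar2(200),  -- wide enough to hold several concatenated category names
  article   clob            -- the incoming article text
);
```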
Step 4. Create a trigger that connects incoming articles to the "queries" table. This trigger must be created before you do Step 5. An outline is given below. The example in the Oracle reference document is slightly incorrect: one mistake is that "=" should be ":=". Also, an article can belong to multiple categories, so you can concatenate category names in the assignment statement using the || operator.
create trigger news_categorization
before insert on news
for each row
begin
(YOU INCLUDE YOUR MATCHING AND ASSIGNMENT STATEMENT HERE.)
end;
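One possible shape for the body is sketched here, following the loop pattern of the Oracle document with the two corrections described in Step 4. It assumes the news table has fields named article (the clob) and category, and that the queries table has fields named category and query; adjust the names to your own tables.

```sql
create or replace trigger news_categorization
before insert on news
for each row
begin
  -- Loop over every stored query that matches the incoming article.
  for c1 in (select category
               from queries
              where matches(query, :new.article) > 0)
  loop
    -- Use ":=" for assignment and "||" to append, so that an article
    -- matching several queries collects several category names.
    :new.category := :new.category || c1.category || ' ';
  end loop;
end;
/
```

The trailing "/" tells SQL*Plus to run the PL/SQL block. Separating the names with a space keeps each category a distinct word in the category field.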
Step 5. Insert 8 rows into the news table. Leave the category field empty. This field will be assigned by your trigger (if it works). The values you insert in this exercise are short texts, but they are supposed to represent entire news articles coming in. So, put a few words in the clob field that let some of the inserts fit under 2 or more categories.
The insert can be a direct insert statement, but make sure the clob field is at least 2 or 3 lines of text in length. Also, make sure some of the words in your clob field will match your queries.
To explain further, suppose I made a "queries" table that has an entry where "category" is 'insect' and "query" is 'fly or wasp or bee'. Then I might have an insert such as:
Insert into news Values (105,'MFA', 'Judith', '','Molde Fekte Akademi is a little known sports club in Molde. If I say the epee stings like a bee, then this document will be categorized with insects. But maybe it comes under sports too.');
Step 6. Do 2 select statements on "queries" to show hits and misses. With the two statements below on my queries table, the first is a miss and the second is a hit.
Select * from queries where matches (query,'Is a spider an insect?') > 0;
Select * from queries where matches (query,'Is a bee sting dangerous?') > 0;
Step 7. Do the select statement:
Select * from news;
This will show that the category field was populated by the trigger.
Step 8. Create a new CONTEXT index called "con-index" on the CATEGORY field of the NEWS table. Do a select from news that uses the con-index. Explain the difference between this select statement and its results and the select statements and results in Step 6.
Step 9. Include all of your SQL input statements and their output in a text file and send it to me by email.
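A sketch of the two statements is given here. A hyphen is not allowed in an unquoted Oracle identifier, so this sketch assumes the name con_index instead; the search term 'insect' is only an example and should be one of your own category names.

```sql
-- CONTEXT index on the category text that the trigger filled in.
-- con_index is used because "con-index" is not a valid unquoted name.
create index con_index on news(category)
  indextype is ctxsys.context;

-- CONTAINS searches the indexed category field of the news rows.
select * from news
 where contains(category, 'insect') > 0;
```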
The most difficult part of this is the trigger, and it is not that much to write. You can also look at the following document for help, or Oracle's OTN site. http://www-db.stanford.edu/~ullman/fcdb/oracle/or-triggers.html
Good luck with the last assignment, W5. It should be finished by November 28th. If you finish early, congratulations. Please make sure you understand the statements that you send in.
-Judith