Wednesday, January 07, 2009

Searching over structured and unstructured data in 10G

In my previous blog post I've explained a nice little feature introduced to Oracle Text Context indexes in 11G which allows you to do efficient searches involving both structured and unstructured data. It is time to talk about what could be done by those of you using previous Oracle versions.

I'll start with the same table and data which I've used in my previous blog post.

Tag it

Literally. Because the best way to search over structured and unstructured data is, well, not to mix it. I'll show you how to use your own datastore procedure to archive this goal.

First of all, we need to declare an author tag, we will use it for our searches later:

SQL> begin
2 ctx_ddl.create_section_group('books_section_group', 'basic_section_group');
3 ctx_ddl.add_field_section('books_section_group', 'author', 'author', false);
4 end;
5 /

PL/SQL procedure successfully completed
All we need after that is a simple custom datastore procedure to merge it all together (this function is what Oracle Text will index as well) and tag the author:
SQL> create or replace procedure books_ds(
2 p_rid in rowid,
3 p_clob in out clob
4 ) is
5 begin
6 for cur in (select '<author>'||author||'</author>' author, text from books where rowid=p_rid)
7 loop
8 dbms_lob.copy(p_clob, cur.author, dbms_lob.getlength(cur.author));
9 dbms_lob.append(p_clob, cur.text);
10 end loop;
11 end;
12 /

Procedure created
Let's create an index now:
SQL> alter table books add (books_info varchar2(1));

Table altered

SQL> begin
2 ctx_ddl.create_preference('books_ds', 'user_datastore');
3 ctx_ddl.set_attribute('books_ds', 'procedure', 'books_ds');
4 end;
5 /

PL/SQL procedure successfully completed

SQL> CREATE INDEX ctx_books_info
2 ON books (books_info)
3 INDEXTYPE IS CTXSYS.CONTEXT
4 PARAMETERS('filter ctxsys.null_filter section group books_section_group lexer books_lexer datastore books_ds memory 64m');

Index created
Continuing on the example I've used in my previous post, searching on books which mention procedures and are written by author XDB will look like this:
SQL> select author, title, published
2 from books
3 where contains(books_info, 'xdb within author and procedure', 1) > 0
4 order by score(1) desc;

37 rows selected.


Statistics
----------------------------------------------------------
13 recursive calls
0 db block gets
40 consistent gets
0 physical reads
0 redo size
1379 bytes sent via SQL*Net to client
372 bytes received via SQL*Net from client
4 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
37 rows processed

No comments:

Post a Comment