If you can’t load FTS3 on a Mac, this post on StackOverflow tells you how to solve the problem.
The final bit of magic is in:
from pysqlite2 import dbapi2 as sqlite3
If you can’t load FTS3 on a Mac, this post on StackOverflow tells you how to solve the problem.
The final bit of magic is in:
from pysqlite2 import dbapi2 as sqlite3
I always seem to forget this:
$ mysql -u root -p Enter password: enter mysql-root-user passwd mysql> CREATE DATABASE DATABASENAME mysql> GRANT ALL PRIVILEGES ON DBNAME.* to "DBUSER"@"localhost" IDENTIFIED BY "DBPASSWORD"; mysql> FLUSH PRIVILEGES; mysql> EXIT $
Freebase is a user-editable, user-extensible structured database, a sort of one-stop shop semantic web/Wikipedia application. I started playing with Freebase about a year ago and the application has made significant strides over that period, especially in the usability department. Freebase also provides a very nice API which I’m using in GenX, with the caveat that it’s currently almost useless because of query timeouts.
I just came across the following page on Freebase: http://vancouver.freebase.com/. This page is what Freebase calls a Base, which is a collection of Tables/Views, which are things like “Vancouver Bloggers“, “Mayoral Candidates 2008” and so forth. A Table/View is a list of Topics, which are basically the equivalent of a Wikipedia page. Get all that? It makes sense after a while
A few observations:
Here’s a few things I was reading about over the weekend.
SQLAlchemy is a full-featured Design Pattern-heavy pythonic database ORM. I am totally going to use this for my next Python SQL database project and may even do some playing with old datasets (using the reflection features, yum) soon. If you are considering doing SQL work on your next Python project, don’t even bother with the usual PEP 249 stuff, start with this.
Note that if you’re working with Django it handles the DB in its own way so SQLAlchemy may be of limited utility.
CouchDB “is a distributed, fault-tolerant and schema-free document-oriented database accessible via a RESTful HTTP/JSON API”. I couldn’t have written that more succently myself, so I didn’t. I qualified the paragraph above on SQLAlchemy that I’m going to use that for my next SQL project because I’m really biting at the bit to try CouchDB out. The CouchDB design philosophy – a REST API a returning lists of JSON-objects – reflects my current design paradigm very closely, and the only question I have is whether in practically scales to millions of rows.
A caveat that it’s written in the-cool-nerds-are-doing-it language Erlang, but because you don’t have to interact with that it should be OK for us mortals.
CouchDB is about to officially become a “top level” Apache project, though none of the documentation on the Apache.org site reflects this yet.
Virtuoso is a “high-performance object-relational SQL database”. It apparently can perform well. As I came across through the Planet RDF aggregator, this may be something you want to look into if you’re working on an RDF/SPARQL project.
That’s a mouthfull, isn’t it? Amazon is offering to host public datasets on EC2 for free. What’s the catch? It will host the data, but you have to pay for the computing resources to use that data in the normal EC2 manner. Still, if you’re using a large public dataset and you’re already EC2-friendly, you might want to consider this program. An even more interesting thought occurs (though I’m not sure if it will fly): if you’re using large amounts of your own data on EC2, you may want to offer it up as a free resource.
There’s more on this on by Lidija Davis on Read/Write Web.
Here’s how you do full text searching across multiple different columns in a multilingual environment on Oracle 9 and better, quick and easy. I’m more of an MySQL guy, so you’ll have to excuse me if my Oracle lingo isn’t up to scratch.
This has to be execute by a privileged Oracle user. If there is an error message, it’s probably because CTX_DDL has not been installed and you’ll need to talk to a skilled DBA or figure out how to do it yourself.
GRANT EXECUTE ON CTX_DDL TO mydb;
One the step above is done, you can do everything as the Oracle user ID you normally work under.
This step sets up the ‘global_lexer’ which determine how Oracle understands text (e.g. that oxen is the plural of ox in the English locale). If you are only working in English I believe you can just drop all the French references.
begin
ctx_ddl.drop_preference('global_lexer');
end;
/
begin
ctx_ddl.drop_preference('english_lexer');
end;
/
begin
ctx_ddl.drop_preference('french_lexer');
end;
/
begin
ctx_ddl.create_preference('english_lexer','basic_lexer');
ctx_ddl.set_attribute('english_lexer','index_themes','yes');
ctx_ddl.set_attribute('english_lexer','printjoins','$_#@*&^%/\()');
ctx_ddl.set_attribute('english_lexer', 'skipjoins', '-');
ctx_ddl.create_preference('french_lexer','basic_lexer');
ctx_ddl.set_attribute('french_lexer','index_themes','no');
ctx_ddl.set_attribute('french_lexer','base_letter','yes');
end;
/
exec ctx_ddl.create_preference('global_lexer','multi_lexer') ;
begin
ctx_ddl.add_sub_lexer('global_lexer', 'french', 'french_lexer');
ctx_ddl.add_sub_lexer('global_lexer', 'default','english_lexer');
end;
/
If you’re doing a multilingual set up, you’ll need a field in your table that specifies the language. One can never be quite sure about how things are done in Oracle, but the values EN and FR seem to do the right thing. Our table looks something like this (we’re just showing the important stuff here):
CREATE TABLE Post
(
search CLOB,
subject NVARCHAR2(256) NOT NULL,
body CLOB,
lang VARCHAR2(6) NOT NULL
);
The subject and body fields have the data we want to search. Because Oracle doesn’t let you search multiple fields, we concatenate them into search at UPDATE/INSERT time. lang is storing the language code for this particular row, i.e. EN or FR.
Next we must set up a trigger that will maintain the search field for you:
CREATE OR REPLACE TRIGGER post_text_trigger
BEFORE UPDATE OR INSERT ON post
FOR EACH ROW
DECLARE
a CLOB;
BEGIN
a := :NEW.subject || ' ';
a := a || :NEW.body;
:NEW.search := a;
END post_text_trigger ;
/
Now you may be asking yourself: why is that ‘a’ assignment all over the place? Simple: who the hell knows, it’s Oracle.
After that we create a new INDEX on the table to do the full text searching:
CREATE INDEX post_text ON Post(search)
INDEXTYPE IS CTXSYS.CONTEXT
PARAMETERS('LEXER global_lexer STOPLIST ctxsys.default_stoplist LANGUAGE COLUMN lang');
Unlike what you might expect, Oracle doesn’t magically keep the full text search index up-to-date. This is something you have to maintain on your own. If you were to execute a search, you wouldn’t find anything right now.
To bring the index up to date, use the following command:
EXEC CTX_DDL.SYNC_INDEX('post_text');
In our implementation we just run that command from a cron script every 30 minutes. There’s also a way to do this in Oracle I understand, but we’re quite comfortable with UNIX commands. How often you want to run will vary depending on how much text you have and how often you update it.
You also have another option: create a trigger that will update the index every time a row is modified. There may be performance issues involved with this, but if you want to try it, here’s the magic:
CREATE OR REPLACE TRIGGER post_text_trigger2
AFTER INSERT OR UPDATE OR DELETE ON post
DECLARE
v_job NUMBER;
BEGIN
IF deleting THEN
DBMS_JOB.SUBMIT(v_job, 'ctx_ddl.optimize_index(''post_text'',''FULL'');', SYSDATE);
ELSE
DBMS_JOB.SUBMIT(v_job, 'ctx_ddl.sync_index(''post_text'');', SYSDATE);
END IF;
END;
/
Oracle provides all sorts of various methods, obtusely documented. See the references below for more. What you probably want to do is, well, look for stuff. Here’s what we did
First, we convert the search string into a safe list words — no punctuation, etc.. The we create a search string that looks something like the following:
SELECT * FROM Post WHERE contains(search, '${mutt} AND ${and} AND ${jeff}') > 0;
Note the {}: this stops the word ‘and’ being searched from from being recognized as a Oracle Keyword
There’s lots of info here and the first reference in particular told me most of the information I needed to know.
Powered by WordPress