[Users] Claws mail to relational database?

Troy Vitullo troy at troyvit.com
Sun Jul 7 18:32:09 CEST 2013


On Fri, 5 Jul 2013 10:10:03 -0700
"Victoria S." <1 at VictoriasJourney.com> wrote:

> I'm thinking ahead here, but I hope to one day (soon?) attempt to
> copy most of my (many thousands) of archived mail messages (science
> research-related) into a relational database, for data
> mining/visualization purposes - and I'd like to ask you folks for
> ideas/suggestions, on the best approach (if any).
> 
> I'm interested (e.g.) in key genes/functional genomics ... I work
> from home (self-employed; other areas) but have a background in
> genomics/myriad other areas of molecular genetics.
> 
> Any ideas would be much appreciated. Thank you!  :-)

Victoria,

My main experience is with relational DBs. Despite that (or maybe because of it), my first thought was to look at other ways to store the data, such as:

- the recoll method posted in a separate thread
- a pure search engine like Solr, which could be faster and allow for detailed searches without burdening yourself with relationships
- a NoSQL solution like Hadoop and/or Hive (although those look tough to set up).

All that said, whether you go with an SQL solution or another search option, the biggest challenges I see would be:

- building a solid schema,
- cleaning the mails for insertion into the data source,
- building reusable queries,
- automating the transfer of future mails to your data source.
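To make the first three challenges a bit more concrete, here is a minimal sketch using only Python's standard library (`sqlite3` and `email`). The table layout, the column names and the helpers `clean_message`, `store` and `messages_mentioning` are hypothetical illustrations, not anything Claws provides. It assumes each message is available as raw RFC 5322 bytes and keeps only the plain-text part of the body.

```python
import sqlite3
from email import message_from_bytes
from email.policy import default
from email.utils import parsedate_to_datetime

# Hypothetical schema: one row per message, plain-text body only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id         INTEGER PRIMARY KEY,
    message_id TEXT UNIQUE,  -- Message-ID header; NULLs are not deduplicated
    sender     TEXT,
    subject    TEXT,
    sent_at    TEXT,         -- ISO 8601 string built from the Date header
    body       TEXT
);
"""


def clean_message(raw: bytes) -> dict:
    """Parse one raw RFC 5322 message into the columns of the messages table."""
    msg = message_from_bytes(raw, policy=default)
    # Prefer the text/plain part; HTML-only mail ends up with an empty body here.
    part = msg.get_body(preferencelist=("plain",))
    body = part.get_content().strip() if part is not None else ""
    # Normalise the Date header to an ISO 8601 string.
    try:
        sent_at = parsedate_to_datetime(str(msg["Date"])).isoformat()
    except (TypeError, ValueError):
        sent_at = None  # missing or unparsable Date header
    message_id = msg["Message-ID"]
    return {
        "message_id": str(message_id) if message_id else None,
        "sender": str(msg["From"] or ""),
        "subject": str(msg["Subject"] or ""),
        "sent_at": sent_at,
        "body": body,
    }


def store(conn: sqlite3.Connection, raw: bytes) -> None:
    """Insert one message; a repeated Message-ID is silently skipped."""
    conn.execute(
        "INSERT OR IGNORE INTO messages (message_id, sender, subject, sent_at, body) "
        "VALUES (:message_id, :sender, :subject, :sent_at, :body)",
        clean_message(raw),
    )


def messages_mentioning(conn: sqlite3.Connection, term: str) -> list:
    """Reusable query: (subject, sent_at) of messages whose body contains term."""
    # SQLite's LIKE is case-insensitive for ASCII letters by default.
    cur = conn.execute(
        "SELECT subject, sent_at FROM messages WHERE body LIKE ? ORDER BY sent_at",
        (f"%{term}%",),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    raw = (b"From: Alice <alice@example.org>\r\n"
           b"Subject: BRCA1 results\r\n"
           b"Date: Fri, 05 Jul 2013 10:10:03 -0700\r\n"
           b"Message-ID: <abc123@example.org>\r\n"
           b"\r\n"
           b"Expression of BRCA1 was elevated.\r\n")
    store(conn, raw)
    store(conn, raw)  # duplicate Message-ID, ignored
    conn.commit()
    print(messages_mentioning(conn, "brca1"))
    # prints [('BRCA1 results', '2013-07-05T10:10:03-07:00')]
```

Keeping the schema this flat makes it easy to add tables later (for example one mapping messages to gene names found in them) without re-importing everything.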

Claws seems like it would be helpful with the second and fourth challenges.
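For the fourth challenge, one option is a small script run periodically (for example from cron) that picks up only the mail that arrived since the last run. This sketch assumes the folder is stored MH-style, one file per message named with a number, which as far as I know is how Claws Mail stores local folders by default; check your own mailbox directory before relying on it. The `import_state` table and the `new_message_files` and `sync_folder` helpers are hypothetical, and the `handle_raw` callback stands in for whatever parses and inserts a message.

```python
from __future__ import annotations

import sqlite3
from pathlib import Path
from typing import Callable

# Hypothetical bookkeeping table: highest message number imported per folder.
STATE = ("CREATE TABLE IF NOT EXISTS import_state "
         "(folder TEXT PRIMARY KEY, last_seen INTEGER NOT NULL)")


def new_message_files(folder: Path, last_seen: int) -> list[tuple[int, Path]]:
    """Return (number, path) for message files numbered above last_seen, oldest first."""
    found = []
    for entry in folder.iterdir():
        # Assumption: MH-style layout; any file whose name is not all digits is skipped.
        if entry.is_file() and entry.name.isdigit():
            num = int(entry.name)
            if num > last_seen:
                found.append((num, entry))
    return sorted(found)


def sync_folder(conn: sqlite3.Connection, folder: Path,
                handle_raw: Callable[[bytes], object]) -> int:
    """Pass every new message in folder to handle_raw; return how many were handled."""
    conn.execute(STATE)
    row = conn.execute("SELECT last_seen FROM import_state WHERE folder = ?",
                       (str(folder),)).fetchone()
    last_seen = row[0] if row else 0
    imported = 0
    for num, path in new_message_files(folder, last_seen):
        handle_raw(path.read_bytes())
        last_seen = num
        imported += 1
    # Remember the high-water mark so the next run starts where this one stopped.
    conn.execute("INSERT OR REPLACE INTO import_state (folder, last_seen) VALUES (?, ?)",
                 (str(folder), last_seen))
    conn.commit()
    return imported
```

A high-water mark is cheap, but if messages are moved or a folder gets renumbered the numbers can change, so deduplicating on Message-ID at insert time is a sensible backstop.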

... Maybe SQL wouldn't be bad if you have fewer than a few million records. Solr or a search engine like it would allow you to worry a little less about cleaning your data, so you could get away with less pre-processing of your mails.

Then the next step would be to build a text classifier ... </evil laugh>

Anyway, that's my 2 cents. Sorry for the diversion from the topic of the mailing list.

Troy
