Apache Solr, indexed attachments and search through Drupal Views 3.x

I now have my Apache Solr server set up, and it appears that all the tales you hear in the Drupalverse about Solr being the way to go are true. But there is quite a bit of work that goes into a Solr setup unless you decide to take an outsourced route, such as Acquia.

Search for Drupal may be the least documented feature of the whole CMS, which is kinda weird considering how important search is overall.

I plan on putting together something fairly comprehensive for the setup and implementation of Solr for Drupal, including the indexing of attachments (15k PDFs, in my case) and a search UI built with Drupal Views 3.x. But for now, here are a few of the high-level items for Solr running on CentOS 6.4 Linux.

  • Apache Tomcat
  • Java
  • Tika (to parse documents)
  • Search API
        Solr search (for Search API) 7.x-1.2

    or

  • Apache Solr framework 7.x-1.4
        Apache Solr search 7.x-1.4
  • Attachments
  • Facets
  • Highlighting
  • Drupal Views with exposed operators for flexible searches

To be honest, it’s kind of a mess, and there are many other options besides the ones listed here. And the documentation is not as thorough as it might be.

Good book for PHP, MySQL and Apache

I have been working with this particular book for a couple of months now, and I have to say that it is one of my favorites.

Sams Teach Yourself PHP, MySQL and Apache (this link goes to Amazon).

This book is perfect for where I want to go with Drupal. xAMP is the Drupal backbone, and this book covers everything that I need from the last three letters of this acronym. The code samples are well documented and helpful. Check it out if you plan to be able to develop for Drupal and not just implement the modules and themes that are on Drupal.org.

SSL on Production

I was able to install the SSL cert on prod in less than 5 minutes. That is the advantage of setting up a quality environment that truly matches prod: you can make the prod changes easily and keep your prod environment pristine.

now back to the multilingual stuff. i hate to get distracted but i really hate creds in clear text. ssl had to be done.

SSL, Linux and you

Today I am going to place an SSL cert on my quality site and then, following change management best practices, on my prod site.

i’ve worked with ssl for years in the corporate environment and it is relatively easy to do, and simpler than a lot of people realize. you don’t need to spend four figures with verisign to leverage ssl. ssl can be free, and the only trade off is that you’ll need to install the cert in your local store (not hard to do at all) so that your browser will recognize and trust it, or just deal with a message that the cert may not be valid because it doesn’t “chain” in the local store.

Here are a couple of good links:

http://slacksite.com/apache/webserver.php

http://wiki.centos.org/HowTos/Https – this is applicable to centos, which is what i am using in qual/prod
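
For the free, self-signed route described above, here is a minimal sketch of generating the key and cert with openssl. The function name, the server.key/server.crt file names, the 365-day lifetime and the example host name are all my assumptions, not something taken from the linked how-tos, so adjust them to your own setup.

```shell
#!/bin/sh
# make_self_signed_cert DIR HOSTNAME
# Hypothetical helper: writes DIR/server.key and DIR/server.crt.
# The file names, key size and 365-day lifetime are assumptions.
make_self_signed_cert() {
    dir=$1
    host=$2
    # -x509 emits a self-signed cert instead of a signing request,
    # -nodes leaves the private key unencrypted so Apache can start unattended.
    openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
        -subj "/CN=$host" \
        -keyout "$dir/server.key" \
        -out "$dir/server.crt"
}

# Example (host name is a placeholder):
#   make_self_signed_cert /tmp qual.example.com
```

The two files would then be referenced from the Apache SSL config through the mod_ssl directives SSLCertificateFile and SSLCertificateKeyFile. Browsers will still warn about the cert until it is added to the local store.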

if you want to see all the certs in your local store you can do it with the certificates snap in for the mmc on your windows box. i won’t go into all the steps but if someone has questions, post and i’ll help.

the bottom line is, use ssl. you don’t want to use a user name and password field over port 80. that is just inviting trouble. and even the cheap certs like digicert are fine. with verisign, you’re paying for a cert that will be recognized by as many browsers worldwide as possible. and you’re paying for their cust service. which is really good. if your site will be local to the us, buy a cheap cert and use it.

WAMP and Aptana code editor

well, i have all my great drupal environments setup. but as i started working with php code in aptana, i realized pretty quickly that i was going to need a local web server to test the php pages. so, install apache for windows xp. let me digress. windows xp is one of the best OSs ever developed. this MCSE isn’t big on MS much these days but XP is still a masterpiece. it will run so much stuff and run it well. i have a nearly ten year old dell running XP and it works great. it will run all the modern open source crap that i want and it is as solid as a rock. there is a reason that the biz community ran XP for as long as it did. and DOES. there are still millions of PCs out there in offices all over the place that run xp. there just isn’t a good reason to get rid of it when it will do 96% of what you need on hardware that is now pretty old. the only thing it won’t run that i actually care about is IE9. and i would never use that anyway if i didn’t have to plan my development around its css quirks.

i love XP. it’s great and it is a product like americans used to make: no planned obsolescence.

so, i’m installing a full WAMP server. most of the config for WAMP was pretty easy. the hard part was getting aptana to work properly with apache and the installed browsers so that the preview features could be used. for clues on that i am attaching a SS of what my server config looked like in aptana. it was a pain and i don’t believe that it was well documented. so, i posted what i did to stackoverflow here: http://stackoverflow.com/questions/12188045/aptana-studio-3-generic-server-doesnt-support-start/14402908#14402908

Import/export Aggregator feeds part II

well, i now have it running in prod and there were a few minor differences that i wanted to document.

since i don’t have an X window type desktop, i had to access phpmyadmin via a remote web browser. so, i had to change the file /etc/conf.d/phpmyadmin.conf in three places to allow connections from all instead of just localhost. i was then able to access the web gui, import the file, and test the aggregator feeds. i had to change some permissions as well to be able to access the file so i could edit it remotely. i could have done it locally via the vi editor, but i hate that thing. it sucks. so after the changes were made i made sure to change the perms back as well as change the file itself so that attempting to access the phpmyadmin gui from the remote host yielded again a “forbidden” error.
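
One way to make the "change it, then put it all back" part less error prone is to save a copy of the config before touching it and move that copy back afterwards. This is only a sketch: the function names and the .orig suffix are made up for illustration, and the three edits themselves still have to be done by hand.

```shell
#!/bin/sh
# Hypothetical helpers; the ".orig" suffix is an assumption.

# Save a copy of the file, preserving its mode and timestamps.
backup_conf() {
    cp -p "$1" "$1.orig"
}

# Move the saved copy back over the edited file. Because the saved copy
# kept the original mode, this restores content and permissions together.
restore_conf() {
    mv -f "$1.orig" "$1"
}

# Example:
#   backup_conf /etc/conf.d/phpmyadmin.conf
#   ... loosen perms, edit, import the feeds, test ...
#   restore_conf /etc/conf.d/phpmyadmin.conf
```

Apache still needs a reload after the restore before the remote host gets the "forbidden" error again.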

Getting all the environments ready.

So, I have all three environments up and running. I have the prod site out of DC hosted by blackmesh.com, the best drupal hosting company there is. The quality site mimics the prod site perfectly past the virtualization level: the drupal core level, the OS and patch installs, the modules and themes installed on drupal, the file structure, the user accounts, the access methods (although prod uses sftp and qual uses ftp) and OS components like apache and mysql. very nice.

i had to reenable clean urls on the q and d boxes. which was OK. i used most of the proc that i uploaded earlier today with a few differences that are not in the instructions. but google and drupal.org were very helpful and I really didn’t have much trouble. just be aware that linux permissions seem to be trickier than perms on a windows server.

it’s getting late, and i’ll write more tomorrow. but i am excited because i am finally going to be able to get to work on the site itself and not have to worry so much about all the infrastructure crap. i did get a few issues with file transfer worked out too. but i have a question: is there any way to get drupal’s install module gui to use a protocol other than ftp? that’s the only choice in the drop down. i can ssh files to the server and i’m sure that i could install/enable the modules via the command line, but from the web admin gui running remotely, i don’t see how you could. i’ll pick that back up tomorrow maybe.
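
For the command line route mentioned above, a rough sketch using Drush might look like the following. This assumes Drush is installed on the server and that you run it over ssh from the Drupal root; neither is part of the setup described so far, and the function name and the DRUSH override are made up for illustration.

```shell
#!/bin/sh
# install_module NAME
# Hypothetical helper: downloads and then enables one contrib module.
# Assumes Drush is installed and the current directory is the Drupal root.
# Set DRUSH=echo to preview the commands instead of running them.
install_module() {
    "${DRUSH:-drush}" dl -y "$1" &&
    "${DRUSH:-drush}" en -y "$1"
}

# Example:
#   install_module views
```

This avoids ftp entirely, since the files are fetched on the server itself.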

Enabling Clean URLs

I thought that i had posted this proc months ago until i went looking for it and couldn’t find it. so here is the proc that i used to enable clean urls on drupal 7.17 on ubuntu desktop 12.04

Procedure – Enable clean URLs for Drupal 7 on Ubuntu Desktop 12_04 rev_1.doc

Date: 11/13/2012

Author: PJ McGhee

Email: PJMcGhee@hotmail.com

Procedure: New

Replace Existing

Intended Audience: Drupal admins

Purpose: To explain the steps needed to enable clean URLs in Drupal by enabling the rewrite_module component and changing the httpd.conf file of Apache2 to

Web References:

http://drupal.org/getting-started/clean-urls

http://httpd.apache.org/docs/2.0/mod/mod_rewrite.html

Procedure:

  1. Navigate to the Configuration, Search and Metadata section of Drupal.
  2. Click Clean URLs.
  3. In Drupal 7 the test will be run automatically and will give the result right away.
  4. When the test fails take a good look at the links above for background on the issue.
  5. Then, on this particular install, since you are running a dedicated server, you can do this first.
  6. At the command line, type apache2ctl -M and hit enter.
  7. Look for rewrite_module (shared). You should not see it (that’s to be expected).
  8. Type sudo a2enmod rewrite and hit enter.
  9. Type apache2ctl -M again and hit enter. Now you should see the module listed there.
  10. Go to /etc/apache2 and edit the httpd.conf file, which should be empty.
  11. Add the lines:

<Directory /var/www/drupal/>
   RewriteEngine on
   RewriteBase /drupal
   RewriteCond %{REQUEST_FILENAME} !-f
   RewriteCond %{REQUEST_FILENAME} !-d
   RewriteRule ^(.*)$ index.php?q=$1 [L,QSA]
</Directory>

  12. In this scenario, the directory structure shown is correct, but it should reflect where you actually have Drupal installed.
  13. Then type sudo /etc/init.d/apache2 force-reload and hit enter.
  14. Then type sudo /etc/init.d/apache2 restart and hit enter.
  15. Then test clean URLs in Drupal.
  16. You should now see a check box that will allow you to enable this feature.
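
The command line part of the procedure above can be sketched as a small script. The function names are made up for illustration, and it assumes the same Ubuntu layout as the procedure (apache2ctl, a2enmod and the /etc/init.d/apache2 script). It does not edit httpd.conf; the Directory block still has to be added by hand.

```shell
#!/bin/sh
# Hypothetical helpers wrapping the module check and enable steps.

# Reads a module list (the output of apache2ctl -M) on stdin and
# succeeds if rewrite_module is in it.
has_rewrite_module() {
    grep -q 'rewrite_module'
}

# Enables mod_rewrite only if it is not already loaded, then restarts Apache.
enable_rewrite() {
    if apache2ctl -M 2>/dev/null | has_rewrite_module; then
        echo "rewrite_module already loaded"
    else
        sudo a2enmod rewrite
    fi
    sudo /etc/init.d/apache2 restart
}

# Example (run on the server):
#   enable_rewrite
```

After it runs, go back to the Clean URLs page in Drupal and rerun the test.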