Sphider Search Engine

ThomasKimmich · April 1, 2010, 1:08pm

Hi all,

A question on which David Owen can probably shed some light.

As he described, he set up Sphider successfully on his printlineadvertising site (nice site, well done).

Now that I have installed Sphider, I'm able to index nearly every page on the planet, but not mine. Is there something I missed here? I wonder what the reason could be.

Could David perhaps have a look at it?

Thanks in advance for any suggestions.

Greetz from Germany

Thomas

offtopic mailing list
email@hidden
Update your subscriptions at:
http://freewaytalk.net/person/options

davedesigner · April 1, 2010, 4:27pm

Hi Thomas,

Good point.

It’s a while since I set this up, so this is quickly from memory, without looking it up: not all servers allow the process to crawl its own site.

The way I got round this is to set up the admin side of Sphider on a local machine using MAMP, but write the results to the database on the server (by changing the SQL details in the admin side).

Whenever there is a site change, I open up the Admin section locally in a browser and spider the site. I’ve even got an Action to add “Sphider no index” on divs like menus or footers that would confuse the results with duplicated content.

We use it to search two sites, the http://www.printlineadvertising.co.uk site and this one:

http://www.promotion-shop.co.uk/printline/

and combine the results.

David Owen { Freeway Friendly Web hosting and Domains }

http://www.ineedwebhosting.co.uk | http://www.PrintlineAdvertising.co.uk


ThomasKimmich · April 1, 2010, 5:34pm

Thanks David for the light.

I've never built a “local machine” with MAMP, so I have to add it to my to-do list as point 103 :-)

But I'll try it, because I think that Sphider is really nice and handy, although I don't yet know how to turn it into a solution for customers.

Or do you see any alternatives to it (a Freeway solution)? I don't think so; the big search-engine thread has stopped, and everybody is waiting for someone else, aren't they?

Best Regards

Thomas


waltd · April 1, 2010, 6:47pm

Tim Plumb made an Action a while back that used a Google personal
search product. It worked really well until Google discontinued the
“beta” for that product. I know there are some ways to do this using
their engine, and styling the results (although you have to keep, but
not love, the ads). Perhaps there could be another run at that
particular windmill, especially if Mr. Plumb finds himself with a
spare moment. (I kid, I kid…)

Most of the good search products require you to kick off a “spider”
program at a regular interval in order to keep the index current with
the site. Getting that part to work (as you have noted here) is the
real rub of any such system.

Another approach occurs to me as I write this: It might be possible to
write the spider in a Freeway Action, and upload the index along with
the site. I have written one search engine in JavaScript – I had to
do that because it was to be distributed on a DVD. It’s a very naive
effort, but within the very limited domain of a book collection, it
works quickly and effectively. The really hard work is done on the
server when the site is generated. The entire virtual directory of the
site is crawled and converted to a JSON object. This object is then
used as the index, and allows such niceties as type-ahead search
suggestions.

http://files.libertyfund.org/pll

Doing something similar in an Action would have some strict limits.
For one, this would only work for static pages. For another, there
would probably need to be a limit to the number of pages in the site,
both for the reason of limiting the resulting JSON object’s size, but
also because Actions tend to time out during publishing and that could
become a brittle point in the process.
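
To make the JSON-index idea a little more concrete, here is a minimal JavaScript sketch. It is not Walter's actual code: the index shape (an array of objects with `url`, `title` and `text` fields), the sample pages and the function names `search` and `suggest` are all assumptions made up for illustration.

```javascript
// Hypothetical index, as a publish-time crawler might emit it.
// Field names and sample pages are invented for this sketch.
const index = [
  { url: "/books/a.html", title: "Orange Groves", text: "Growing oranges in warm climates." },
  { url: "/books/b.html", title: "Apple Orchards", text: "Pruning and harvesting apples." },
  { url: "/books/c.html", title: "Citrus Basics", text: "Lemons, limes and oranges compared." }
];

// Return the pages whose title or text contains every word of the query.
function search(pages, query) {
  const words = query.toLowerCase().split(/\s+/).filter(Boolean);
  if (words.length === 0) return [];
  return pages.filter((page) => {
    const haystack = (page.title + " " + page.text).toLowerCase();
    return words.every((word) => haystack.includes(word));
  });
}

// Type-ahead: suggest titles that begin with what has been typed so far.
function suggest(pages, prefix, limit = 5) {
  const typed = prefix.trim().toLowerCase();
  if (typed === "") return [];
  return pages
    .map((page) => page.title)
    .filter((title) => title.toLowerCase().startsWith(typed))
    .slice(0, limit);
}

console.log(search(index, "oranges").map((page) => page.url)); // [ '/books/a.html', '/books/c.html' ]
console.log(suggest(index, "or")); // [ 'Orange Groves' ]
```

Because the whole index travels to the browser, its size grows with the site, which is the limit on page count mentioned above.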

Anyway, some things to think about.

Walter


ThomasKimmich · April 8, 2010, 9:06am

Hi Walter and all,

Anyway, I'd like to keep the thread warm, and it's no accident that I chose “OFF TOPIC” for it.

So let me summarize:

Using Actions:

Only for static sites, where an update means changing the content in Freeway and uploading it? Not too bad, but 99% of (my) pages will be more or less dynamic, using WebYep. Believe me, I'm not really waiting for an Action, because updating will be “my customers' beer” (as we say in Germany: their business, not mine).

Sphider Search Engine:

Nice engine with a small backend, where I could say to my customer: “If you change content, please make sure to reindex your search engine.” In Sphider that's one click away, and that's the reason why I could love it. The only problem here is that some “providers” do not allow indexing of their own hosts. The workaround via one's own platform does not seem suitable for my customers, does it?

So Walter, in my opinion it is not an Action that will solve the problems, but small apps surely can. Could you or someone else point to some other small solutions that are worth a look?

In another thread I remember you said something like:

It's not too difficult to write a search engine if MySQL is available, just 20 lines of code or something like that. What could we expect of a solution like that?

I'd like to say: one person delivers the backend, I deliver the graphics, we put it together into a small app of our own, maybe shared for a few bucks to defray development costs, and that's it. Or is this too optimistic?

Best regards

Thomas


waltd · April 8, 2010, 2:02pm

I’ll try to answer these inline below.

On Apr 8, 2010, at 5:06 AM, Thomas Kimmich wrote:

Hi Walter and all,

Anyway, I'd like to keep the thread warm, and it's no accident that I
chose “OFF TOPIC” for it.

So let me summarize:

Using Actions:

Only for static sites, where an update means changing the content in
Freeway and uploading it? Not too bad, but 99% of (my) pages will be
more or less dynamic, using WebYep. Believe me, I'm not really waiting
for an Action, because updating will be “my customers' beer” (as we
say in Germany: their business, not mine).

Yes, the Action method would not work for any of the dynamic parts,
only for the static things that Freeway can “see” while publishing.

Sphider Search Engine:

Nice engine with a small backend, where I could say to my customer:
“If you change content, please make sure to reindex your search
engine.” In Sphider that's one click away, and that's the reason why I could love it. The only problem here is that some “providers” do not allow indexing of their own hosts. The workaround via one's own platform does not seem suitable for my customers, does it?

Hosting often turns up as the elephant in the living room (amusing
English-language saying, might be American, but I’m not sure). For
example, I can’t count the number of problems people have doing what I
consider to be the most basic things on the (still popular for some
reason, must be the page 3-style commercials) GoDaddy service.

Sphider is a very very nice system, and ideally suited to a mixture of
static and dynamic content. If you can guide your customers to a
hosting provider that actually works, then you can implement it.

So Walter, in my opinion it is not an Action that will solve the
problems, but small apps surely can. Could you or someone
else point to some other small solutions that are worth a look?

In another thread I remember you said something like:

It's not too difficult to write a search engine if MySQL is
available, just 20 lines of code or something like that. What
could we expect of a solution like that?

The problem with this statement is that it presumes that all of your
content lives in the database, or that you have a “crawler” like
Sphider’s that indexes the static content periodically and stores it
in the database. MySQL includes some very nice search engine functions
as a part of its flavor of the SQL language standard. (Ranking, stop
words, a rudimentary proximity feature, boolean operators, etc.)

The devil there is in the details. If you are using a content
management system or a crawler, if the search engine has to
“normalize” the content (by stripping out all the extraneous HTML
fluff that surrounds the actual content) then it also has to make some
qualitative decisions about that content. What value should it place
on the document’s title, meta tags, H1 tag, etc.? In the case of a
normal layer-based Freeway layout, content order is not a predictor of
importance or even logical flow. The first paragraph you see on the
page could very easily be located near the end of the HTML source, and
the search engine cannot see that.
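
As a rough illustration of the "what value should it place on each part" decision, here is a small JavaScript sketch of field weighting. The weights, the field names (`title`, `h1`, `body`) and the sample pages are assumptions chosen for the example, not anything Sphider or MySQL prescribes.

```javascript
// Hypothetical weights: how much a hit in each part of a page is worth.
const WEIGHTS = { title: 10, h1: 5, body: 1 };

// Count non-overlapping, case-insensitive occurrences of a word in a text.
function countOccurrences(text, word) {
  const hay = text.toLowerCase();
  const needle = word.toLowerCase();
  if (needle === "") return 0;
  let count = 0;
  let pos = hay.indexOf(needle);
  while (pos !== -1) {
    count += 1;
    pos = hay.indexOf(needle, pos + needle.length);
  }
  return count;
}

// Weighted sum of hits over all fields of one page.
function score(page, word) {
  return Object.keys(WEIGHTS).reduce(
    (total, field) => total + WEIGHTS[field] * countOccurrences(page[field] || "", word),
    0
  );
}

// Keep only pages with at least one hit, best score first.
function rank(pages, word) {
  return pages
    .map((page) => ({ url: page.url, score: score(page, word) }))
    .filter((hit) => hit.score > 0)
    .sort((a, b) => b.score - a.score);
}

// Invented sample pages, already split into plain-text fields.
const pages = [
  { url: "/a.html", title: "Contact", h1: "Contact us", body: "We print brochures. Print quality matters." },
  { url: "/b.html", title: "Print services", h1: "Print", body: "Offset and digital." }
];

console.log(rank(pages, "print"));
```

Here `/b.html` outranks `/a.html` even though the word appears more often in the body of `/a.html`, because title and H1 hits count for more. Note that the sketch assumes the fields have already been extracted cleanly, which is exactly the hard part described above.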

Any of the search engines I have written myself have been centered
around a document model that I could control: usually the content was
fully separated from the presentation, so I had nice logical blocks of
plain text to search – any styling was added separately when that
content was filtered through the application, so I didn’t have to step
around that in order to find the actual content in the midst of a lot
of HTML “noise”.

A search engine can also be extremely naive. For example, here’s a SQL
query that will find the word “oranges”:

SELECT * FROM `content` WHERE `plain_text` LIKE '%oranges%';

This query will not find the word “orange”, though. For that, you need
to perform an operation called “stemming”, where plurals are reduced
to their singular equivalents, complex words are broken down into
their components and then searched for individually, etc. None of this
would fit into the mythical 20 lines of code.
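
To show what stemming means in practice, here is a deliberately naive JavaScript sketch. It only strips a few common English plural endings; the function names are invented for this example, and a real stemmer (the Porter algorithm, for instance) covers far more cases.

```javascript
// Naive stemmer: lower-case a word and strip a few common English
// plural endings. This illustrates the idea only.
function naiveStem(word) {
  const w = word.toLowerCase();
  if (w.length > 4 && w.endsWith("ies")) return w.slice(0, -3) + "y"; // berries -> berry
  if (w.length > 4 && /(ch|sh|ss|x)es$/.test(w)) return w.slice(0, -2); // boxes -> box
  if (w.length > 3 && w.endsWith("s") && !w.endsWith("ss")) return w.slice(0, -1); // oranges -> orange
  return w;
}

// Split text into words and reduce each one to its stem.
function stemAll(text) {
  return (text.toLowerCase().match(/[a-z]+/g) || []).map(naiveStem);
}

// A query matches when every stemmed query word occurs among the
// stemmed words of the text.
function matches(text, query) {
  const stems = new Set(stemAll(text));
  return stemAll(query).every((stem) => stems.has(stem));
}

console.log(matches("We sell oranges and lemons", "orange")); // true
console.log(matches("One orange left", "oranges")); // true
```

Both the indexed text and the query go through the same stemmer, so "orange" and "oranges" meet in the middle. Even this toy version, with no irregular plurals and no ranking, already uses up most of a 20-line budget.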

Further, none of this gets you the quality of search that you would get
from something like Sphinx or Solr, two open-source search engines I
have used in the past. Those engines give you ranked results, and you
can weight the results based on any parameter you choose, or perform
boolean queries and other more exacting techniques to refine the
results. They are also considerably faster for large content sets.

I'd like to say: one person delivers the backend, I deliver
the graphics, we put it together into a small app of our own, maybe
shared for a few bucks to defray development costs, and that's it. Or
is this too optimistic?

Not at all, it just depends on where you set your expectations. If you
want something that does a “good-enough” job of searching for average-
sized sites, but doesn’t set itself up to be the next Google, then
yes, it’s quite possible. But if you create something like this, for
money or for free, I can tell you with certainty that you will need to
support it for the rest of its life, and it can become a full-time job
to explain it to the end users.

Walter


ThomasKimmich · April 9, 2010, 7:12am

Good morning Walter,

Haaa, you make me laugh, and you got me: yes, I'd like to conquer the world. But working on it until I'm old and grey was not in the plan.

So I would prefer the “good-enough” variant.

And since you say that Sphider is a really nice system, I'll try to build a workaround with it. For hosts that support Sphider directly there is no problem anyway; for those that won't, I'll put it on mine.

But another problem for me is always the integration into a Freeway document. It would be very helpful if I could come back to you when that comes up (and it will, I'm quite sure).

Best regards

Thomas


davedesigner · April 9, 2010, 9:20am

I only ran the indexing locally because it’s indexing our site/s. I know when they change.

There is nothing stopping you from running the search from another online host to search another site, e.g. on your own site to search a client site. You could set up a cron task to automate when the indexing takes place for the client.

David


ThomasKimmich · April 9, 2010, 9:49am

Arrgnn

What I'm about to report, I can hardly believe myself.

After having had no luck indexing my own site on my online host (as reported), I made another attempt, and with what result?

Yes, it indexed my site.

Conclusion?

The elephant moved (yes, Walter, he did); you just have to wait for the right moment? That is not really a satisfying conclusion, is it?

… but David,

could you give me some tips for the integration into Freeway? At the moment I just have the bare search.php in my sphider folder, and I wonder how you did it. I would be very happy about any small piece of advice (and it could be nice for others to follow in your footsteps).

Many thanks in advance

Thomas


davedesigner · April 9, 2010, 5:16pm

Hi Thomas,

It's a while since I did this, but from memory…

Sphider provides templates for you to style (header, footer, etc.), so rather than cutting up a Freeway page and inserting Freeway code into Sphider, I removed all unwanted HTML from the templates, leaving a basic form page to include in my Freeway page.

http://www.printlineadvertising.co.uk/search/search.php

Then, in a Freeway page in the same folder, just put a PHP include:

<?php require_once ("search.php"); ?>

The form and the results then appear in your Freeway page.

David


davedesigner · April 9, 2010, 6:12pm

There are a few other hoops to go through, like amending where the search
form is ‘posted’ to, so that the results show in the same page as the
search. You amend the Sphider code for that.

David


Helveticus · May 4, 2010, 11:32pm

David, can you please give a bit more info? I’m trying to get the results to display on my page but I can’t get it to work. I have tried a few of the suggestions on the Sphider forum, but without much success.

Cheers, Marcel


ThomasKimmich · May 5, 2010, 6:42am

Hi Marcel,

The problem is that I am far from being able to describe the recommended way to do this, because most of my work is just trial and error. But I got a result. So maybe I should divide it into a few parts:

First: What I wanted to have:

A search box on each of my pages. When a visitor types a search word in, he is sent to the search results. David does it a bit differently, because he links to the search page, from where you have to start.

Second: The search result page:

I renamed the Sphider folder to “sphider”. In Freeway I created a new folder, also named “sphider”. In that folder I put a page named search.php (with my design), matching the existing one in the original “sphider” folder. In my Freeway search.php I inserted an HTML Markup item, and into it I put the code from the original search.php, starting with:

<?php /******************************************* * Sphider Version 1.3.x

and so on.

The search box on each page:

That's a point I can't remember exactly. But it went like this: in the browser I called up the basic, simple page where only the search box is shown. I can't find it any more, but the code started with: ... and so on. I just copied the code and pasted it into another Markup item, positioned on each of my Freeway pages.

I'm sure that this can be done much more simply, but I'm not a coder, so that's how I try to build a workaround ... naive, isn't it?

You can check how it looks here: http://www.kimmich-dm.de/beta/index.php

Please keep in mind that this is a beta site, not really and fully indexed. But if you type in, for example, "print", you will receive some results.

Best regards

Thomas

davedesigner · May 5, 2010, 10:27am

The key is the templates folder in Sphider.

Instead of chopping up my Freeway page and adding the Freeway HTML code to the Sphider templates, I’m removing any unwanted header/styling code from the templates (header, footer, etc.) and including the whole unstyled Sphider search.php page as an include in the Freeway …/search/index.php page.

You need to make sure (in the Sphider code) that the search is submitted as GET back to your Freeway search page, in my case …/search/index.php (it’s a while since this was done; I would have to recap when I’ve got a minute).

David Owen { Freeway Friendly Web hosting and Domains }

http://www.ineedwebhosting.co.uk | http://www.PrintlineAdvertising.co.uk


waltd · May 5, 2010, 12:27pm

If you go this route, be sure to really really chop away at the search results page code. When you’re done, it should be entirely missing the outermost parts of an HTML page. Where a normal page (in code) looks like this:

DOCTYPE
HTML
    HEAD
        (bunch of tags)
    /HEAD
    BODY
        (bunch more code, your entire visible page here)
    /BODY
/HTML

You need to chop away until all you have is the middle bit:

        (bunch more code, your entire visible page here)

…and nothing else. This rule doesn’t apply to anything you find enclosed within <?php ?> tags, because those are read by your server and disappear before they reach the browser. But what you really want to avoid is ending up with more than one HEAD or BODY tag, because that will confuse many browsers to the point of pain.
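
In this thread the chopping is done by hand in the Sphider templates, but a short JavaScript sketch can show which part is meant to survive. The function name `bodyFragment` and the sample page are assumptions for illustration, and a regular expression is only a rough stand-in for a real HTML parser.

```javascript
// Return only what sits between <body ...> and </body>. If no body tags
// are found, assume the input is already a fragment and return it as is.
function bodyFragment(html) {
  const match = html.match(/<body\b[^>]*>([\s\S]*?)<\/body\s*>/i);
  return (match ? match[1] : html).trim();
}

// Invented sample: a complete page wrapped around a search form.
const fullPage = [
  "<!DOCTYPE html>",
  "<html>",
  "<head><title>Search</title></head>",
  "<body class=\"results\">",
  "<form action=\"search.php\" method=\"get\"><input name=\"query\"></form>",
  "</body>",
  "</html>"
].join("\n");

console.log(bodyFragment(fullPage));
// <form action="search.php" method="get"><input name="query"></form>
```

The output is the "middle bit": safe to include inside another page, because it brings no DOCTYPE, HTML, HEAD or BODY tags of its own.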

Walter


davedesigner · May 5, 2010, 1:17pm

Exactly: all the Sphider header and footer templates are blank, leaving only the search bits to include back into the middle of my Freeway page.

View the stripped code, less header and footer, here: http://www.printlineadvertising.co.uk/search/search.php

I considered that the easiest route, as you do not need to mess with the Sphider code. One of the benefits of the include is that some of the results are styled by the Freeway page’s CSS.

TWEAKS & EXTRAS:

One thing you do need to do with Sphider to tweak the results is to keep the common words in your pages’ CSS menus and footers out of the index.

menu

Walt, I hope you don’t mind, but I amended your HTML Comments Action to make a Sphider Action, to make life easier for me.

This Action can be applied to Freeway divs. If you don’t do this, your results can get contaminated with common words in menus (e.g. 1 of 40 pages of “your product”).
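
The effect of such no-index markers can be sketched in JavaScript as follows. The marker spelling (`<!--sphider_noindex-->` and `<!--/sphider_noindex-->`) is an assumption here and should be checked against the documentation of your Sphider version; the function name and the sample markup are invented, and this is not Sphider's own code.

```javascript
// Assumed marker comments; verify the spelling against your Sphider version.
const OPEN = "<!--sphider_noindex-->";
const CLOSE = "<!--/sphider_noindex-->";

// Remove every marked region (markers included) before the text is indexed.
function stripNoIndex(html) {
  let out = "";
  let pos = 0;
  while (true) {
    const start = html.indexOf(OPEN, pos);
    if (start === -1) break;
    const end = html.indexOf(CLOSE, start + OPEN.length);
    if (end === -1) break; // unclosed marker: leave the rest untouched
    out += html.slice(pos, start);
    pos = end + CLOSE.length;
  }
  return out + html.slice(pos);
}

// Invented sample: a menu wrapped in markers, followed by real content.
const page =
  OPEN + "<div id=\"menu\">Home | Products | Contact</div>" + CLOSE +
  "<p>Printed mugs and pens.</p>";

console.log(stripNoIndex(page)); // <p>Printed mugs and pens.</p>
```

With the menu removed before indexing, a search for "Products" no longer returns every page that merely carries the navigation.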

David


waltd · May 5, 2010, 1:43pm

Not at all. All of my stuff is open source, so fork away! Consider
posting your finished product on ActionsForge, too.

Walter


ThomasKimmich · May 6, 2010, 2:28pm

Hi David and Walter,

First, thanks for guiding us through this (off-topic) thread.

Now I have tried to strip the search page down to its basics. I removed all code from categories.html, header.html, and footer.html in the templates folder.

In search_form.html I removed the center tag;
search_results.html I left untouched.

This is the result:

http://www.kimmich-dm.de/beta/sphider-1.3.5/search.php

I hope this matches your expectations.

But now the integration in Freeway:

What do I have to do to include the PHP ( <?php require_once ("search.php"); ?> )? I have never done an operation like that and have no idea what it means, but it sounds great.

Thanks for your help.

Cheers

Thomas


waltd · May 6, 2010, 2:36pm

The PHP code you listed here, is that meant to load a library at the beginning of the page, or is that what you put in place when you want to display the results? The two are very different things, and it’s going to stand or fall depending on how search.php is written and what it expects from a template.

If in the sample page, you see this code appear at the very top of the page code (many libraries do this, especially if they need to redirect the browser) then open the Page / HTML Markup dialog, and select Before HTML from the picker in its lower-left corner.

If in the sample page you see this code appear in the middle of the page, where the results would be printed, then use the Crowbar Action to add it within an HTML box that you have drawn on the page. Draw the HTML box the size you expect your search results to be, then double-click inside the box (so you see a text cursor) and from the main menu choose Insert / Action Item / Crowbar. Click once on Crowbar and use the Code button in the Actions palette to enter your code.

If you don’t have Crowbar, you can get it at ActionsForge. It is, by the way, my vision of how Markup Items should be able to work. If you add a Crowbar as the only element on a line of text (so it’s alone in a paragraph) then it will delete the surrounding paragraph. If you enter it as the first character of text in a paragraph, then its output will be moved before the opening paragraph tag on that line. Likewise if it’s the last thing in a paragraph – the code will be moved after the /P tag.

Walter


ThomasKimmich · May 6, 2010, 3:07pm

Hi Walter,

If you ask me that way: at the moment I really do not know what I am doing ( … the worst case).

By using “my method” (written above), just inserting HTML Markup items, I thought I had the solution.

But nobody said “OK, do it that way” or “not possible” or something like that, so I am trying to follow in David's footsteps, as his seems to be a somewhat better way (with cleaner code). And now … hm, I think I have to cry for a while and come back to you later with some more detailed questions.

Thomas
