md5

For various security-related reasons I’m using an MD5 hash to obscure part of the URL generated by a front-end file manager that I’ve heavily modified/customized. For example, if you insert an image, video, etc. via the file manager into your article, the resulting URL will look something like this:

http://mysite.com/member/qlu62f8d6dh65d1ce91064ayh27he0pl/video.mp4

Ugly? Yes it is.

But will this cause any bad mojo with regard to search engines? I’ve been looking around but haven’t found anything definitive one way or another.

Todd
https://xiiro.com


offtopic mailing list
email@hidden
Update your subscriptions at:
http://freewaytalk.net/person/options

It won’t break anything, but you will be giving up a great deal of search engine mojo. If you can add another segment to the URL with a “slug-ified” version of the name, then a search by name is more likely to turn up this page. I presume that these pages are public, right? So your URL could look like:

http://example.com/members/asdfasdfasdfasdfasdfasfa/bob-smith

and then a search for Bob Smith would have a much better chance of including that page.
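For anyone unsure how to build the slug segment, here is a minimal Ruby sketch. The helper names `slugify` and `member_url` are made up for illustration, and the rules (lowercase, hyphens for everything that is not a letter or digit) are only one reasonable convention, not something any particular framework prescribes.

```ruby
# Hypothetical helper: turn a display name into a URL-safe slug.
def slugify(name)
  name.downcase
      .gsub(/[^a-z0-9]+/, '-') # collapse each run of other characters into one hyphen
      .gsub(/\A-+|-+\z/, '')   # trim hyphens left over at either end
end

# Hypothetical helper: opaque token first, readable slug second.
def member_url(token, name)
  "http://example.com/members/#{token}/#{slugify(name)}"
end

puts member_url('asdfasdfasdfasdfasdfasfa', 'Bob Smith')
```

Note that this simple version drops accented and non-Latin characters rather than transliterating them, which may or may not suit your member names.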

You may want to use a more compact token than MD5, too. Have a hunt for “URL shortener in [your programming language here]” for examples of different approaches that return something tidy like aFe5 (cryptic, short, yet unique) rather than the guaranteed 32 characters of an MD5 hash. If you store this in the database (rather than calculating it on the fly) and index it, it makes no difference for performance whether you find by ID or find by hash. You pay once, at account creation, for the conversion.
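As a rough illustration of the two usual shortener approaches, the Ruby sketch below encodes a numeric ID in base 62 and, alternatively, draws a random alphanumeric token. The names `ALPHABET`, `base62` and `random_token` are hypothetical, and `SecureRandom.alphanumeric` assumes Ruby 2.5 or later.

```ruby
require 'securerandom'

ALPHABET = [*'0'..'9', *'a'..'z', *'A'..'Z'].freeze # 62 URL-safe symbols

# Encode a non-negative integer (e.g. a database ID) in base 62.
def base62(number)
  return ALPHABET[0] if number.zero?

  encoded = +''
  while number.positive?
    number, remainder = number.divmod(62)
    encoded.prepend(ALPHABET[remainder])
  end
  encoded
end

# Alternative: a random token that is not derived from the ID at all.
def random_token(length = 8)
  SecureRandom.alphanumeric(length)
end

puts base62(1_000_000) # short, but guessable if IDs are sequential
puts random_token      # short and unpredictable; needs a uniqueness check
```

If the point of the token is to hide account details, the random variant is probably the safer of the two, since a base-62 ID can be decoded and incremented by anyone who recognizes the scheme.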

Walter

(In reply to Todd's message of Mar 27, 2015, 9:57 PM.)

Yes, these are public pages, but perhaps I wasn’t clear: the hash is only used to reference embedded images, videos, etc. It is not part of the public-facing article, which uses friendly URLs, e.g., http://mysite.com/css/sass/my-sassy-tutorial.html. So really, it’s only when a member embeds their own image etc. via their personal file manager that certain account-specific info could be exposed, hence the hash.

I first tried it with 8 characters instead of 32 which I’m sure would be perfectly acceptable for my needs.

The hash is created on the fly the first time the member launches the file browser, which creates each member’s unique personalized media folder. The hash is not stored in the database.

Besides security, I also need to ensure there are no duplicates. Yes, a 32-character hash is overkill for this and would assume a mind-numbingly absurd number of members, which simply will not be the case. Like I said, 8 characters should be perfectly fine (maybe even fewer), but I’m trying to determine an appropriate balance without knowing how many members there may ultimately be.
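One way to weigh hash length against member count is the standard birthday approximation, p ≈ 1 − exp(−n(n−1)/(2N)), where n is the number of members and N the number of possible tokens. The Ruby sketch below assumes the tokens behave like uniformly random hexadecimal strings (a truncated MD5 digest is treated as such here); `collision_probability` is a hypothetical helper name.

```ruby
# Approximate chance that at least two of `members` random tokens collide,
# when each token is `hex_chars` hexadecimal characters long (birthday bound).
def collision_probability(members, hex_chars)
  space = 16.0**hex_chars # number of possible tokens
  1.0 - Math.exp(-members * (members - 1) / (2.0 * space))
end

[1_000, 10_000, 100_000].each do |members|
  [6, 8, 12].each do |chars|
    pct = collision_probability(members, chars) * 100
    puts format('%7d members, %2d chars: %.6f%%', members, chars, pct)
  end
end
```

Under these assumptions, 8 characters and 10,000 members gives a little over a 1% chance of at least one duplicate somewhere, so a collision becomes plausible long before the number of members approaches the number of possible hashes.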

The slug is an interesting idea but I’m not sure how to pull it off.

Todd
https://xiiro.com

If the hash is not stored in the database, then you need to ensure that it is all but guaranteed to be unique, because you won’t have any means of testing for uniqueness. I would actually urge you to add this as a user-level attribute in your database, so you know which media folder belongs to each user at a very basic level. As for making certain that a value doesn’t repeat, there are a couple of easy ways forward here (once you accept that you do need to involve the database):

  1. Seed the random string with a deterministic value, like the user id from the database.
  2. Do a recursive lookup to see if the random string has already been used.

Here’s a Ruby implementation of the latter, but you can easily do this in PHP as well:

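def set_order_code
  code = Array.new(7) { [*'a'..'z', *'0'..'9'].sample }.join # 7 random characters from a-z and 0-9
  if WidgetOrder.find_by_order_code(code)                    # has this code already been used?
    return set_order_code                                    # yes: recurse and try a fresh code
  else
    self.order_code = code                                   # no: keep it
  end
end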

The key is in the third and fourth lines: if the database returns a result, the function calls itself (recursion). This won’t become expensive unless you make your keys very short or you end up with a whole lot of users.
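The first option in the list has no code above, so here is one possible reading of it in Ruby. Note that this swaps in a keyed hash (HMAC-SHA256) of the user ID instead of a seeded random string: the token is then reproducible from the ID alone and cannot be guessed without the secret. `SECRET` and `folder_token` are hypothetical names, and the secret shown is a placeholder.

```ruby
require 'openssl'

# Assumption: a long random secret kept outside the code base and the URL.
SECRET = 'replace-with-a-long-random-secret'

# Derive a stable, hard-to-guess folder token from a user ID.
def folder_token(user_id, length = 16)
  digest = OpenSSL::Digest.new('SHA256')
  OpenSSL::HMAC.hexdigest(digest, SECRET, user_id.to_s)[0, length]
end

puts folder_token(42) # same value on every call for user 42
```

Truncating the digest brings back a small chance of two users sharing a token, so if you shorten it aggressively a uniqueness check like the one above is still worth having.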

Walter

(In reply to Todd's message of Mar 28, 2015, 11:11 AM.)

This is something I’ve thought about since I first considered using a hash, but admittedly it’s beyond my current understanding.

The file manager is a completely separate app from the primary framework, and most of the work required to bridge the two was done by someone else. The MD5 aspect was my suggestion and was added after the fact; I simply don’t know enough to weave your suggestion into the mix.

If you want to look at the source, you’re welcome to. Maybe it’s a simple matter, maybe not. But on this one I think I’m over my skis.

Todd
https://xiiro.com

If you’re doing an MD5 of the current time in milliseconds, then the odds of a repeat are very slight indeed. But as you have seen, the URLs will be long, and that will be out of your control.

More importantly, where this is being done, and where the path is stored relative to the user’s account or the page they are editing, are the primary questions I would have. If the file manager simply creates a path and stores a file there, and then returns that path to the editor (WYSIWYG or otherwise) in your application, probably in the JavaScript realm, then there’s no actual integration going on here. And there would be no issue with a particular user having any number of these paths, each one being unique.

If the file manager is creating an account per user, and making some connection between that account and the user account in your system, then I don’t know what you’re going to do here without persisting that connection in one or more databases. Either you pass a token from the user account to the file manager, identifying the user to the file manager for future reference (say they want to browse all the photos they have uploaded), or you store a file_manager_user_id in your user account system, for the same reason.
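To make "persisting that connection" concrete, here is a small Ruby sketch in which two in-memory hashes stand in for an indexed column on the users table. The class `MediaFolderRegistry` and its method names are invented for illustration; a real application would read and write the database instead.

```ruby
require 'securerandom'

# In-memory stand-in for a users table with an indexed media-token column.
class MediaFolderRegistry
  def initialize
    @token_by_user = {}
    @user_by_token = {}
  end

  # Return the user's existing token, or create and record one on first use.
  def token_for(user_id)
    @token_by_user[user_id] ||= begin
      token = SecureRandom.hex(4)                                  # 8 hex characters
      token = SecureRandom.hex(4) while @user_by_token.key?(token) # retry on a duplicate
      @user_by_token[token] = user_id
      token
    end
  end

  # Reverse lookup, e.g. for an admin who wants to know whose folder this is.
  def user_for(token)
    @user_by_token[token]
  end
end

registry = MediaFolderRegistry.new
token = registry.token_for(7)
puts "user 7 -> #{token} -> user #{registry.user_for(token)}"
```

Because the mapping is stored, the token can be as short as you like: duplicates are caught at creation time, and the reverse lookup comes for free.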

But getting back to your initial question, I personally don’t put much stock in sub-page filenames (images and other resources) having any material impact on SEO. I have heard people say that it does, but then I have heard a lot of crazy things in my life, and I choose which to believe fairly carefully.

Walter

(In reply to Todd's message of Mar 28, 2015, 11:54 AM.)

> If you’re doing an MD5 of the current time in milliseconds, then the odds of a repeat are very slight indeed. But as you have seen, the URLs will be long and that will be out of your control. But more importantly, where this is being done, and where the path is being stored relative to the user’s account or the page they are editing are the primary questions I would have about this. If the file manager simply creates a path and stores a file there, and then returns that path to the editor (WYSIWYG or otherwise) in your application, probably in the JavaScript realm, then there’s no actual integration going on here. And there would not be any issue with a particular user having any number of these paths, each one being unique and all.

The file manager is only creating a unique path between itself and the RTE on a user-by-user basis. There is some integration between the main framework and the file manager (mostly relating to access rights/permissions), but as it applies to the actual hash (path), that part runs independently of, and is not tied to, any core account functions. It’s only obscuring a name. As long as there aren’t more users than hash combinations, everything should be fine. 32 characters is safe but probably unnecessary. 8 seems more reasonable yet still safe.

That said, it would be nice to directly associate the hash with the back-end user account for the convenience of the admin. Perhaps at some point that will happen, but for now it will require some minor backtracking. Doable, yes, but somewhat inconvenient.

> If the file manager is creating an account per user, and making some connection between that account and the user account in your system, then I don’t know what you’re going to do here without persisting that connection in one or more databases. Either you pass a token from the user account to the file manager, identifying the user to the file manager for future reference (say they want to browse all the photos they have uploaded), or you are storing a file_manager_user_id on your user account system, for the same reason.

There’s nothing that low-level going on with regard to the user account and hash.

> But getting back to your initial question, I personally don’t put much stock in sub-page filenames (images and other resources) having any material impact on SEO. I have heard people say that it does, but then I have heard a lot of crazy things in my life, and I choose which to believe fairly carefully.

That’s what I assumed, but I thought I would ask anyway. You never know.

Thanks,

Todd
https://xiiro.com

