-
Notifications
You must be signed in to change notification settings - Fork 9.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
core(link-text): localize link text audit #14927
base: main
Are you sure you want to change the base?
Conversation
also added German strings to block list
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
no functional change, but less if/else soup and more descriptive function name
if opening html tag is the only place with a lang attribute
@@ -5,7 +5,7 @@ | |||
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | |||
--> | |||
|
|||
<html> | |||
<html lang="en"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be good to update this smoke test to include cases that exercise the new logic (getLangOfInnerText
). The expectations for this smoke tests are found in seo-failing.js
. If the current smoke expectations were more explicit, it would have this for the link-text
audit (currently it just checks for 4 failures, but it'd be good to be more explicit now):
'link-text': {
score: 0,
displayValue: '4 links found',
details: {
items: [
{text: 'click this'},
{text: 'click this'},
{text: 'CLICK THIS'},
{text: 'click this'},
],
},
}
Can you devise some cases here to make sure getLangOfInnerText
works as expected? If you need to create a new file HTML in fixtures/seo
and a new accompanying expectation file in test-definitions/seo-mixed-language.js
, that's OK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I changed the smoke test seo-failing
to be more explicit, including the language attribute and a German example. I also added a new smoke test seo-mixed-language
with many edge cases (German and English examples).
const langElements = getElementsInDocument('[lang]'); // eslint-disable-line no-undef | ||
const docHasOnlyHtmlLangAttr = (langElements.length === 1) && | ||
langElements[0].nodeName === 'HTML'; | ||
const lang = !docHasOnlyHtmlLangAttr ? null : langElements[0].getAttribute('lang'); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this needs to be normalized some how. Consider the valid language code en-US
.
I don't recall how we could actually normalize to the base language code. In shared/localization/locales.js
we form a mapping explicitly. I suppose here we could just split by -
and return the first part to use as the loclae.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't normalize here. The gatherer just collects the lang code and passes it to the audit. If it should get normalized, then the last tool in the chain (the audit) should do that.
See also my response below to @adamraine about keeping the loop over language code segments.
@@ -5,7 +5,7 @@ | |||
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | |||
--> | |||
|
|||
<html> | |||
<html lang="en"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a little uneasy how this is now required for the audit to function at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this, and we think it'd be best to do the current behavior of the audit when the document does not declare a language. I think that's what the PR is currently doing, but some tests in the audit would be good to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure - What do you mean with "current behavior"? The current behavior before this PR or the current behavior of this PR?
My implementation ignores links if the language is not declared. I also added a new test case to core/test/audits/seo/link-text-test.js
to reflect that new behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure - What do you mean with "current behavior"? The current behavior before this PR or the current behavior of this PR?
The current behavior before this PR is what we want to fall back to if a lang
is not specified.
@connorjclark @adamraine Thanks for the review. The easy ones are fixed. I wrote a smoke test, but I don't have more time today. I'll do some cleanup and push the changes tomorrow. I'll have a look at the other suggestions this week (or weekend). |
This comment was marked as spam.
This comment was marked as spam.
@@ -10,15 +10,15 @@ import LinkTextAudit from '../../../audits/seo/link-text.js'; | |||
|
|||
describe('SEO: link text audit', () => { | |||
it('fails when link with non descriptive text is found', () => { | |||
const invalidLink = {href: 'https://example.com/otherpage.html', text: 'click here', rel: ''}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think adding a new test or two that has non-english text that is bad would be clearer. All these existing test cases could leave off lang: 'en'
since it doesn't modify the thing they are asserting; plus that gives us coverage for the common case of there being no document language.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned it up and added more tests.
I think, I addressed all annotations. If you really want to change the audit to use only the first segment of a language code (e. g. A few questions are still open from #14845 (comment):
|
@connorjclark @adamraine Can I do anything to finish this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay @raffaelj, and thanks for being patient
@@ -5,7 +5,7 @@ | |||
* Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. | |||
--> | |||
|
|||
<html> | |||
<html lang="en"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure - What do you mean with "current behavior"? The current behavior before this PR or the current behavior of this PR?
The current behavior before this PR is what we want to fall back to if a lang
is not specified.
* @param {string|null} currentLang | ||
* @return {string} | ||
*/ | ||
function getLangOfInnerText(node, currentLang = null) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function adds a lot of complexity just so we can check the edge case where a child of the anchor defines a new language. We could get most of the benefit if we just check for node.closest('[lang]')?.getAttribute('lang') || ''
and set the lang to unknown (''
) if any child nodes define a lang. If the language is unknown we just check every language like Connor mentioned.
I think the slightly worse experience for those edge cases is preferable to introducing all of this complexity.
nonDescriptiveLinkTexts[lang].has(searchTerm)) { | ||
return true; | ||
} | ||
lang = lang.substring(0, lang.lastIndexOf('-')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future-proofing sounds fine 👍
…o localize-link-text-audit
Now I have to add another delay. I see 10 changed files from 3-6 months ago and I have to read them all again before I can answer. I'm not sure, if I find some time during the next weeks. If someone else wants to take over to speed things up - just go ahead. |
Summary
The link-text audit produces false positives, because it is not localized. I changed the anchor-elements gatherer and the link-text audit to check for the current language.
Related Issues/PRs
Fixes #14845