For a while now I've been unhappy with the ruleset names in PMD. Some rulesets have a clear set of rules of a certain type (such as 'strictexception'), but the 'basic', 'design' and 'controversial' rulesets have a very diverse set of rules in them. The result is that:
- I find it hard to guess where a specific rule would be (when I search for a rule).
- Scanning the docs for the 'diverse' rulesets takes too long because of all the rules in them.
- Finding out what a rule does exactly takes longer, since the ruleset name doesn't help you.
- It's not clear where new rules should go.
I believe that the reason for this mess is that we are using the ruleset names to define two different types of groupings. First of all, we want rules to be grouped by their kind for easy searching (as is now done for rulesets like 'unusedcode' and 'strictexception'). Secondly, we want rules to be grouped by their usefulness (like the good practices of 'basic' and the more optional rules in 'controversial') so we can easily enable a set of rules that is useful for most people and can then add a few of the more optional rules later.
My suggestion is to separate these two types of groupings. The rule definitions should be grouped by their type. Then rulesets with only rule references can be used to define usefulness groupings. Since PMD already supports rule references, this is mostly a simple reorganization.
A big advantage of this is also that we can define different standard groups of rule references in the standard PMD distribution. For instance, we can have groups like:
- Errors, good practices, eye candy
- No false positives, few false positives, many false positives
- Basic, design, controversial
Regards,
Wouter Zelle
PS. I'll try to find some time to work on a proof of concept.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
> but the 'basic', 'design' and 'controversial' rulesets
> have a very diverse set of rules in them
I agree. Especially "design", which is all over the place.
> we are using the ruleset names to define
> two different types of groupings.
Yup, right on.
> Then rulesets with only rule references can be
> used to define usefulness groupings.
Sounds like a very good idea. And we've been moving some rules around between releases anyhow, so this wouldn't be completely unexpected. And it'll make categorizing future rules easier as well, perhaps.
> I'll try to find some time to work on a proof of concept.
Great, thanks for thinking/working through this!
Yours,
tom
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I've uploaded the prototype as the new Patch 1243975.
What I've done is: I renamed basic.xml to basic-defs.xml. Then I turned basic.xml into a ruleset with links (mirroring the exact same rules in basic-defs.xml). Then I moved as many rules as possible from basic-defs.xml to other ruleset files (some existing and some new). I then did the same for controversial.xml and design.xml.
I've also renamed unusedcode.xml to uselesscode.xml, so I could add similar rules to that ruleset.
The issue remains that rule definition rulesets and rule links rulesets are mixed in the directory pmd/rulesets. I think they should be seperated, but I don't know what the best choice is. For instance, this is one possibility:
Yup, thanks for uploading the patch, and sorry for the delay in commenting.
I've really been thinking about this for the past few days... I'm just trying to think of any way to minimize the complete breakage of custom rulesets. I agree with the concept - i.e., creating rulesets for types of violations - I'm just still mulling over a better way to do it.
Well, anyhow, just wanted to touch base,
Yours,
Tom
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I've upped a new patch that does not result in any breakage. The actual rules definitions are now in /rules and /rulesets now only contains references. The result is that the exported rulesets are no longer directly linked to the rule definitions and you can now have the same rule in multiple rulesets.
The result is that once this patch is applied, we can move all 'basic', 'design' and 'controversial' rule definitions to properly named files (this is not yet done for the patch I supplied) and many more interesting rulesets can be created. The patch itself is only groundwork.
I've not touched scratchpad.xml, since I do not exactly know how you use it. If you only use it for development purposes, I suggest it to be moved to /rules.
Regards,
Wouter
PS. Ruleset docs generation will have to be fixed after this patch too.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
It's been a while since we discussed this "move all rules to one big ruleset and then just ref them" thing. What do you think about this now - should we revive this suggestion?
Yours,
Tom
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
That sounds like a good idea, although I would keep seperate files for the rule definitions. An advantage of splitting the 'public' rulesets from the 'private' rule definitions is that we can sort the rule definitions by topic, regardless of whether we want to publish them by topic or as controversial, design or basic. This makes it easier for us to find rules for development work, without having to do a search.
I would suggest creating a rulesets/ruledefs directory and moving the current rulesets there (except scratchpad and the rulesets that refer to other rulesets). Then we create new rulesets files with only references to replace the moved rulesets:
rulesets/ruledefs/basic-def.xml <- Same as the old basic.xml
rulesets/basic.xml <- refers to basic-def.xml
This is completely transparant to current users. Then step 2 is to clean up the ruledefs, by moving the controversial, design and basic rule definitions to topical ruledef files. After a bit of cleanup we can create new 'public' rulesets at will, with various combinations of rules.
Regards,
Wouter
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I think we can look at doing something about this post 4.2, as we're going to be making other big changes then too.
My thoughts on this...
1) Only a Rule Name should be exported to the public.
2) Rule names should be globally unique within PMD.
3) RuleSets should enforce name uniqueness, and perform auto-renaming on collision. Current definition of rule equality is (name && priority) which is just weird to me.
4) Categorizations we use for documentation purposes should not be affect the usage of a Rule.
5) There should be 1 public RuleSet containing all published rules, which contains references to the PMD internal rulesets.
6) We should be able to figure out how to use the existing RuleSet name to serve as a means of category annotation, which will follow the Rule around as it gets referenced in other used rulesets. Thus the original definer of a Rule dictates the Category. This will allow IDEs (e.g. the Eclipse plugin) to still show User something currently like it does with RuleSet name.
Something like this would simplify things from a Rule users perspective. And, we can totally reorganize/recategorize Rules all we want, and not affect the users.
Further, PMD 4.2 will provide built-in Rule serialization support, with total support for Rule references. This means all RuleSets used by Users should be composed of only rule references to existing PMD rules, never a copy (the Eclipse plugin used to make copies, making PMD upgrades non-effective). Combined with the RuleSet organization above, and PMD upgrades will be a bit more seamless for more of our users in the future.
Ryan
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
While thinking about this more, one of the goals for post 4.2 PMD is to expand language support. Perhaps this is an opportunity to factor that into how we have users think about rules (and us devs for that matter). There's two primary considerations to choosing a Rule
1) Identify the Language (version too?)
2) Identify the Rule name
User's will need to specify at least that much information. Perhaps that tells us how we should be organizing the rules from an external perspective? One public RuleSet per language? (or not see next comment).
I also am skeptical of the utility of the current approach of having a RuleSet dictate the Language, when I believe it properly belongs on the Rule. If we can fix that, then multi-lingual RuleSets can be used. PMD can already tell what files go with what language by default. We should support that in a fashion that doesn't cause errors for people (which currently happens when you mix languages).
We should also have rules specify Language version information. Thus PMD can apply rules which apply only to the given language version of the source file be processed. Currently the user has to make sure they specify rules which make sense with the language version they are using.
It's not clear to me if we should have Rule names be unique with in Language, or just make the universally unique to keep things simple.
More food for thought. :)
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
I think that my patch will have limited breakage if you add an unusedcode.xml with rule links. The only problem that then remains is that people who use entire rulesets (such as imports.xml) will get the moved rules. I think that is not such a big problem, but it can also be solved (see bottom of post).
Of course, my patch did not resolve the problem of having rule definitions and rule links rulesets mixed together. A transitory solution may be to put rulesets with links that mirror the current rule definition files in the rulesets directory. For example:
rulesets/rulelinks/basic.xml #docs say: Use this
rulesets/basic.xml #docs say: Don't use, deprecated
In this case, basic.xml is the ruleset links file that I put in the patch. For the other setup, you can do this:
rulesets/imports.xml #docs say: Don't use, deprecated
ruledefs/imports.xml #docs say: Use this
In this case, the first imports.xml file is a ruleset links file. This file does not have to include the rules that I moved in the patch rules, so this solution allows for full compatibility for current users. Then you are free to update your docs and you can remove the deprecated functionality in 4.0 or 5.0.
If you want I can create another patch for you that does maintain custom ruleset compatibility.
Regards,
Wouter
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
For a while now I've been unhappy with the ruleset names in PMD. Some rulesets have a clear set of rules of a certain type (such as 'strictexception'), but the 'basic', 'design' and 'controversial' rulesets have a very diverse set of rules in them. The result is that:
- I find it hard to guess where a specific rule would be (when I search for a rule).
- Scanning the docs for the 'diverse' rulesets takes too long because of all the rules in them.
- Finding out what a rule does exactly takes longer, since the ruleset name doesn't help you.
- It's not clear where new rules should go.
I believe that the reason for this mess is that we are using the ruleset names to define two different types of groupings. First of all, we want rules to be grouped by their kind for easy searching (as is now done for rulesets like 'unusedcode' and 'strictexception'). Secondly, we want rules to be grouped by their usefulness (like the good practices of 'basic' and the more optional rules in 'controversial') so we can easily enable a set of rules that is useful for most people and can then add a few of the more optional rules later.
My suggestion is to separate these two types of groupings. The rule definitions should be grouped by their type. Then rulesets with only rule references can be used to define usefulness groupings. Since PMD already supports rule references, this is mostly a simple reorganization.
A big advantage of this is also that we can define different standard groups of rule references in the standard PMD distribution. For instance, we can have groups like:
- Errors, good practices, eye candy
- No false positives, few false positives, many false positives
- Basic, design, controversial
Regards,
Wouter Zelle
PS. I'll try to find some time to work on a proof of concept.
> but the 'basic', 'design' and 'controversial' rulesets
> have a very diverse set of rules in them
I agree. Especially "design", which is all over the place.
> we are using the ruleset names to define
> two different types of groupings.
Yup, right on.
> Then rulesets with only rule references can be
> used to define usefulness groupings.
Sounds like a very good idea. And we've been moving some rules around between releases anyhow, so this wouldn't be completely unexpected. And it'll make categorizing future rules easier as well, perhaps.
> I'll try to find some time to work on a proof of concept.
Great, thanks for thinking/working through this!
Yours,
tom
Tom,
I've uploaded the prototype as the new Patch 1243975.
What I've done is: I renamed basic.xml to basic-defs.xml. Then I turned basic.xml into a ruleset with links (mirroring the exact same rules in basic-defs.xml). Then I moved as many rules as possible from basic-defs.xml to other ruleset files (some existing and some new). I then did the same for controversial.xml and design.xml.
I've also renamed unusedcode.xml to uselesscode.xml, so I could add similar rules to that ruleset.
The issue remains that rule definition rulesets and rule links rulesets are mixed in the directory pmd/rulesets. I think they should be seperated, but I don't know what the best choice is. For instance, this is one possibility:
pmd/rulesets #for rule definitions
pmd/rulesets/rulelinks #for rule links rulesets
and this is another (note, directory names are just examples):
pmd/ruledefs #for rule definitions
pmd/rulesets #for rule links rulesets
Another issue is that the docs may have to be adapted or fixed (for the generated docs).
Regards,
Wouter
Hi Wouter -
Yup, thanks for uploading the patch, and sorry for the delay in commenting.
I've really been thinking about this for the past few days... I'm just trying to think of any way to minimize the complete breakage of custom rulesets. I agree with the concept - i.e., creating rulesets for types of violations - I'm just still mulling over a better way to do it.
Well, anyhow, just wanted to touch base,
Yours,
Tom
Tom,
I've upped a new patch that does not result in any breakage. The actual rules definitions are now in /rules and /rulesets now only contains references. The result is that the exported rulesets are no longer directly linked to the rule definitions and you can now have the same rule in multiple rulesets.
The result is that once this patch is applied, we can move all 'basic', 'design' and 'controversial' rule definitions to properly named files (this is not yet done for the patch I supplied) and many more interesting rulesets can be created. The patch itself is only groundwork.
I've not touched scratchpad.xml, since I do not exactly know how you use it. If you only use it for development purposes, I suggest it to be moved to /rules.
Regards,
Wouter
PS. Ruleset docs generation will have to be fixed after this patch too.
Hi Wouter -
It's been a while since we discussed this "move all rules to one big ruleset and then just ref them" thing. What do you think about this now - should we revive this suggestion?
Yours,
Tom
Hi Tom,
That sounds like a good idea, although I would keep seperate files for the rule definitions. An advantage of splitting the 'public' rulesets from the 'private' rule definitions is that we can sort the rule definitions by topic, regardless of whether we want to publish them by topic or as controversial, design or basic. This makes it easier for us to find rules for development work, without having to do a search.
I would suggest creating a rulesets/ruledefs directory and moving the current rulesets there (except scratchpad and the rulesets that refer to other rulesets). Then we create new rulesets files with only references to replace the moved rulesets:
rulesets/ruledefs/basic-def.xml <- Same as the old basic.xml
rulesets/basic.xml <- refers to basic-def.xml
This is completely transparant to current users. Then step 2 is to clean up the ruledefs, by moving the controversial, design and basic rule definitions to topical ruledef files. After a bit of cleanup we can create new 'public' rulesets at will, with various combinations of rules.
Regards,
Wouter
I think we can look at doing something about this post 4.2, as we're going to be making other big changes then too.
My thoughts on this...
1) Only a Rule Name should be exported to the public.
2) Rule names should be globally unique within PMD.
3) RuleSets should enforce name uniqueness, and perform auto-renaming on collision. Current definition of rule equality is (name && priority) which is just weird to me.
4) Categorizations we use for documentation purposes should not be affect the usage of a Rule.
5) There should be 1 public RuleSet containing all published rules, which contains references to the PMD internal rulesets.
6) We should be able to figure out how to use the existing RuleSet name to serve as a means of category annotation, which will follow the Rule around as it gets referenced in other used rulesets. Thus the original definer of a Rule dictates the Category. This will allow IDEs (e.g. the Eclipse plugin) to still show User something currently like it does with RuleSet name.
Something like this would simplify things from a Rule users perspective. And, we can totally reorganize/recategorize Rules all we want, and not affect the users.
Further, PMD 4.2 will provide built-in Rule serialization support, with total support for Rule references. This means all RuleSets used by Users should be composed of only rule references to existing PMD rules, never a copy (the Eclipse plugin used to make copies, making PMD upgrades non-effective). Combined with the RuleSet organization above, and PMD upgrades will be a bit more seamless for more of our users in the future.
Ryan
While thinking about this more, one of the goals for post 4.2 PMD is to expand language support. Perhaps this is an opportunity to factor that into how we have users think about rules (and us devs for that matter). There's two primary considerations to choosing a Rule
1) Identify the Language (version too?)
2) Identify the Rule name
User's will need to specify at least that much information. Perhaps that tells us how we should be organizing the rules from an external perspective? One public RuleSet per language? (or not see next comment).
I also am skeptical of the utility of the current approach of having a RuleSet dictate the Language, when I believe it properly belongs on the Rule. If we can fix that, then multi-lingual RuleSets can be used. PMD can already tell what files go with what language by default. We should support that in a fashion that doesn't cause errors for people (which currently happens when you mix languages).
We should also have rules specify Language version information. Thus PMD can apply rules which apply only to the given language version of the source file be processed. Currently the user has to make sure they specify rules which make sense with the language version they are using.
It's not clear to me if we should have Rule names be unique with in Language, or just make the universally unique to keep things simple.
More food for thought. :)
I have an xslt that aggregagtes all the rulesets in one 'big ruleset file'.
All it's requires is a small xml files such as this :
<rulesets>
<ruleset filename="..."/>
<ruleset filename="..."/>
<ruleset filename="..."/>
</rulesets>
Which is easily generated on *nix:
echo "<rulesets>"
adding childrens...
ls -1 ${RULESETS_DIR}/*.xml | \ while
read FILE
do
# adapting path to xslt engine
FILENAME=
echo ${FILE} | sed -e 's/\.\/src\/main/\.\./g' -e 's/\/\//\//g'
echo " <ruleset filename=\"${FILENAME}\"/>"
done
echo '</rulesets>'
If you think the xslt will be help, i'll commit it, along the shell script if you want too, in etc/...
Tom,
I think that my patch will have limited breakage if you add an unusedcode.xml with rule links. The only problem that then remains is that people who use entire rulesets (such as imports.xml) will get the moved rules. I think that is not such a big problem, but it can also be solved (see bottom of post).
Of course, my patch did not resolve the problem of having rule definitions and rule links rulesets mixed together. A transitory solution may be to put rulesets with links that mirror the current rule definition files in the rulesets directory. For example:
rulesets/rulelinks/basic.xml #docs say: Use this
rulesets/basic.xml #docs say: Don't use, deprecated
In this case, basic.xml is the ruleset links file that I put in the patch. For the other setup, you can do this:
rulesets/imports.xml #docs say: Don't use, deprecated
ruledefs/imports.xml #docs say: Use this
In this case, the first imports.xml file is a ruleset links file. This file does not have to include the rules that I moved in the patch rules, so this solution allows for full compatibility for current users. Then you are free to update your docs and you can remove the deprecated functionality in 4.0 or 5.0.
If you want I can create another patch for you that does maintain custom ruleset compatibility.
Regards,
Wouter